1. 19 3月, 2018 4 次提交
  2. 26 2月, 2018 1 次提交
  3. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  4. 07 2月, 2018 7 次提交
  5. 01 2月, 2018 3 次提交
    • D
      mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks · 5ff7091f
      David Rientjes 提交于
      Commit 4d4bbd85 ("mm, oom_reaper: skip mm structs with mmu
      notifiers") prevented the oom reaper from unmapping private anonymous
      memory with the oom reaper when the oom victim mm had mmu notifiers
      registered.
      
      The rationale is that doing mmu_notifier_invalidate_range_{start,end}()
      around the unmap_page_range(), which is needed, can block and the oom
      killer will stall forever waiting for the victim to exit, which may not
      be possible without reaping.
      
      That concern is real, but only true for mmu notifiers that have
      blockable invalidate_range_{start,end}() callbacks.  This patch adds a
      "flags" field to mmu notifier ops that can set a bit to indicate that
      these callbacks do not block.
      
      The implementation is steered toward an expensive slowpath, such as
      after the oom reaper has grabbed mm->mmap_sem of a still alive oom
      victim.
      
      [rientjes@google.com: mmu_notifier_invalidate_range_end() can also call the invalidate_range() must not block, fix comment]
        Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1801091339570.240101@chino.kir.corp.google.com
      [akpm@linux-foundation.org: make mm_has_blockable_invalidate_notifiers() return bool, use rwsem_is_locked()]
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1712141329500.74052@chino.kir.corp.google.comSigned-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NChristian König <christian.koenig@amd.com>
      Acked-by: NDimitri Sivanich <sivanich@hpe.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ff7091f
    • M
      kvm: embed vcpu id to dentry of vcpu anon inode · e46b4692
      Masatake YAMATO 提交于
      All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means
      it is impossible to know the mapping between fds for vcpu and vcpu
      from userland.
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu
      
      It is also impossible to know the mapping between vma for kvm_run
      structure and vcpu from userland.
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
      
      This change adds vcpu id to d-entries for vcpu. With this change
      you can get the following output:
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu:0
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu:1
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:0
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:1
      
      With the mappings known from the output, a tool like strace can report more details
      of qemu-kvm process activities. Here is the strace output of my local prototype:
      
          # ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K'
          ...
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          ...
      Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      e46b4692
    • K
      kvm: Map PFN-type memory regions as writable (if possible) · a340b3e2
      KarimAllah Ahmed 提交于
      For EPT-violations that are triggered by a read, the pages are also mapped with
      write permissions (if their memory region is also writable). That would avoid
      getting yet another fault on the same page when a write occurs.
      
      This optimization only happens when you have a "struct page" backing the memory
      region. So also enable it for memory regions that do not have a "struct page".
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a340b3e2
  6. 31 1月, 2018 3 次提交
    • C
      KVM: arm/arm64: Fixup userspace irqchip static key optimization · cd15d205
      Christoffer Dall 提交于
      When I introduced a static key to avoid work in the critical path for
      userspace irqchips which is very rarely used, I accidentally messed up
      my logic and used && where I should have used ||, because the point was
      to short-circuit the evaluation in case userspace irqchips weren't even
      in use.
      
      This fixes an issue when running in-kernel irqchip VMs alongside
      userspace irqchip VMs.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Fixes: c44c232ee2d3 ("KVM: arm/arm64: Avoid work when userspace iqchips are not used")
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      cd15d205
    • C
      KVM: arm/arm64: Fix userspace_irqchip_in_use counting · f1d7231c
      Christoffer Dall 提交于
      We were not decrementing the static key count in the right location.
      kvm_arch_vcpu_destroy() is only called to clean up after a failed
      VCPU create attempt, whereas kvm_arch_vcpu_free() is called on teardown
      of the VM as well.  Move the static key decrement call to
      kvm_arch_vcpu_free().
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      f1d7231c
    • C
      KVM: arm/arm64: Fix incorrect timer_is_pending logic · 13e59ece
      Christoffer Dall 提交于
      After the recently introduced support for level-triggered mapped
      interrupt, I accidentally left the VCPU thread busily going back and
      forward between the guest and the hypervisor whenever the guest was
      blocking, because I would always incorrectly report that a timer
      interrupt was pending.
      
      This is because the timer->irq.level field is not valid for mapped
      interrupts, where we offload the level state to the hardware, and as a
      result this field is always true.
      
      Luckily the problem can be relatively easily solved by not checking the
      cached signal state of either timer in kvm_timer_should_fire() but
      instead compute the timer state on the fly, which we do already if the
      cached signal state wasn't high.  In fact, the only reason for checking
      the cached signal state was a tiny optimization which would only be
      potentially faster when the polling loop detects a pending timer
      interrupt, which is quite unlikely.
      
      Instead of duplicating the logic from kvm_arch_timer_handler(), we
      enlighten kvm_timer_should_fire() to report something valid when the
      timer state is loaded onto the hardware.  We can then call this from
      kvm_arch_timer_handler() as well and avoid the call to
      __timer_snapshot_state() in kvm_arch_timer_get_input_level().
      Reported-by: NTomasz Nowicki <tn@semihalf.com>
      Tested-by: NTomasz Nowicki <tn@semihalf.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      13e59ece
  7. 23 1月, 2018 1 次提交
    • J
      KVM: arm/arm64: Handle CPU_PM_ENTER_FAILED · 58d6b15e
      James Morse 提交于
      cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if
      there is a failure: CPU_PM_ENTER_FAILED.
      
      When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will
      return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED,
      KVM does nothing, leaving the CPU running with the hyp-stub, at odds
      with kvm_arm_hardware_enabled.
      
      Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads
      KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER
      never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset()
      to make sure the hyp-stub is loaded before reloading KVM.
      
      Fixes: 67f69197 ("arm64: kvm: allows kvm cpu hotplug")
      Cc: <stable@vger.kernel.org> # v4.7+
      CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      58d6b15e
  8. 16 1月, 2018 4 次提交
    • J
      KVM: arm64: Handle RAS SErrors from EL1 on guest exit · 3368bd80
      James Morse 提交于
      We expect to have firmware-first handling of RAS SErrors, with errors
      notified via an APEI method. For systems without firmware-first, add
      some minimal handling to KVM.
      
      There are two ways KVM can take an SError due to a guest, either may be a
      RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
      or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
      
      For SError that interrupt a guest and are routed to EL2 the existing
      behaviour is to inject an impdef SError into the guest.
      
      Add code to handle RAS SError based on the ESR. For uncontained and
      uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
      errors compromise the host too. All other error types are contained:
      For the fatal errors the vCPU can't make progress, so we inject a virtual
      SError. We ignore contained errors where we can make progress as if
      we're lucky, we may not hit them again.
      
      If only some of the CPUs support RAS the guest will see the cpufeature
      sanitised version of the id registers, but we may still take RAS SError
      on this CPU. Move the SError handling out of handle_exit() into a new
      handler that runs before we can be preempted. This allows us to use
      this_cpu_has_cap(), via arm64_is_ras_serror().
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      3368bd80
    • J
      KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9
      James Morse 提交于
      Non-VHE systems take an exception to EL2 in order to world-switch into the
      guest. When returning from the guest KVM implicitly restores the DAIF
      flags when it returns to the kernel at EL1.
      
      With VHE none of this exception-level jumping happens, so KVMs
      world-switch code is exposed to the host kernel's DAIF values, and KVM
      spills the guest-exit DAIF values back into the host kernel.
      On entry to a guest we have Debug and SError exceptions unmasked, KVM
      has switched VBAR but isn't prepared to handle these. On guest exit
      Debug exceptions are left disabled once we return to the host and will
      stay this way until we enter user space.
      
      Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
      happen after the hosts VBAR value has been synchronised by the isb in
      __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
      setting KVMs VBAR value, but is kept here for symmetry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4f5abad9
    • P
      kvm: whitelist struct kvm_vcpu_arch · 46515736
      Paolo Bonzini 提交于
      On x86, ARM and s390, struct kvm_vcpu_arch has a usercopy region
      that is read and written by the KVM_GET/SET_CPUID2 ioctls (x86)
      or KVM_GET/SET_ONE_REG (ARM/s390).  Without whitelisting the area,
      KVM is completely broken on those architectures with usercopy hardening
      enabled.
      
      For now, allow writing to the entire struct on all architectures.
      The KVM tree will not refine this to an architecture-specific
      subset of struct kvm_vcpu_arch.
      
      Cc: kernel-hardening@lists.openwall.com
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Christian Borntraeger <borntraeger@redhat.com>
      Cc: Christoffer Dall <cdall@linaro.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      46515736
    • K
      KVM: arm/arm64: fix HYP ID map extension to 52 bits · 98732d1b
      Kristina Martsenko 提交于
      Commit fa2a8445 incorrectly masks the index of the HYP ID map pgd
      entry, causing a non-VHE kernel to hang during boot. This happens when
      VA_BITS=48 and the ID map text is in 52-bit physical memory. In this
      case we don't need an extra table level but need more entries in the
      top-level table, so we need to map into hyp_pgd and need to use
      __kvm_idmap_ptrs_per_pgd to mask in the extra bits. However,
      __create_hyp_mappings currently masks by PTRS_PER_PGD instead.
      
      Fix it so that we always use __kvm_idmap_ptrs_per_pgd for the HYP ID
      map. This ensures that we use the larger mask for the top-level ID map
      table when it has more entries. In all other cases, PTRS_PER_PGD is used
      as normal.
      
      Fixes: fa2a8445 ("arm64: allow ID map to be extended to 52 bits")
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      98732d1b
  9. 13 1月, 2018 1 次提交
  10. 12 1月, 2018 1 次提交
  11. 11 1月, 2018 1 次提交
  12. 09 1月, 2018 1 次提交
  13. 08 1月, 2018 6 次提交
  14. 02 1月, 2018 6 次提交
    • C
      KVM: arm/arm64: Avoid work when userspace iqchips are not used · 61bbe380
      Christoffer Dall 提交于
      We currently check if the VM has a userspace irqchip in several places
      along the critical path, and if so, we do some work which is only
      required for having an irqchip in userspace.  This is unfortunate, as we
      could avoid doing any work entirely, if we didn't have to support
      irqchip in userspace.
      
      Realizing the userspace irqchip on ARM is mostly a developer or hobby
      feature, and is unlikely to be used in servers or other scenarios where
      performance is a priority, we can use a refcounted static key to only
      check the irqchip configuration when we have at least one VM that uses
      an irqchip in userspace.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      61bbe380
    • C
      KVM: arm/arm64: Provide a get_input_level for the arch timer · 4c60e360
      Christoffer Dall 提交于
      The VGIC can now support the life-cycle of mapped level-triggered
      interrupts, and we no longer have to read back the timer state on every
      exit from the VM if we had an asserted timer interrupt signal, because
      the VGIC already knows if we hit the unlikely case where the guest
      disables the timer without ACKing the virtual timer interrupt.
      
      This means we rework a bit of the code to factor out the functionality
      to snapshot the timer state from vtimer_save_state(), and we can reuse
      this functionality in the sync path when we have an irqchip in
      userspace, and also to support our implementation of the
      get_input_level() function for the timer.
      
      This change also means that we can no longer rely on the timer's view of
      the interrupt line to set the active state, because we no longer
      maintain this state for mapped interrupts when exiting from the guest.
      Instead, we only set the active state if the virtual interrupt is
      active, and otherwise we simply let the timer fire again and raise the
      virtual interrupt from the ISR.
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      4c60e360
    • C
      KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQs · df635c5b
      Christoffer Dall 提交于
      For mapped IRQs (with the HW bit set in the LR) we have to follow some
      rules of the architecture.  One of these rules is that VM must not be
      allowed to deactivate a virtual interrupt with the HW bit set unless the
      physical interrupt is also active.
      
      This works fine when injecting mapped interrupts, because we leave it up
      to the injector to either set EOImode==1 or manually set the active
      state of the physical interrupt.
      
      However, the guest can set virtual interrupt to be pending or active by
      writing to the virtual distributor, which could lead to deactivating a
      virtual interrupt with the HW bit set without the physical interrupt
      being active.
      
      We could set the physical interrupt to active whenever we are about to
      enter the VM with a HW interrupt either pending or active, but that
      would be really slow, especially on GICv2.  So we take the long way
      around and do the hard work when needed, which is expected to be
      extremely rare.
      
      When the VM sets the pending state for a HW interrupt on the virtual
      distributor we set the active state on the physical distributor, because
      the virtual interrupt can become active and then the guest can
      deactivate it.
      
      When the VM clears the pending state we also clear it on the physical
      side, because the injector might otherwise raise the interrupt.  We also
      clear the physical active state when the virtual interrupt is not
      active, since otherwise a SPEND/CPEND sequence from the guest would
      prevent signaling of future interrupts.
      
      Changing the state of mapped interrupts from userspace is not supported,
      and it's expected that userspace unmaps devices from VFIO before
      attempting to set the interrupt state, because the interrupt state is
      driven by hardware.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      df635c5b
    • C
      KVM: arm/arm64: Support a vgic interrupt line level sample function · b6909a65
      Christoffer Dall 提交于
      The GIC sometimes need to sample the physical line of a mapped
      interrupt.  As we know this to be notoriously slow, provide a callback
      function for devices (such as the timer) which can do this much faster
      than talking to the distributor, for example by comparing a few
      in-memory values.  Fall back to the good old method of poking the
      physical GIC if no callback is provided.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      b6909a65
    • C
      KVM: arm/arm64: vgic: Support level-triggered mapped interrupts · e40cc57b
      Christoffer Dall 提交于
      Level-triggered mapped IRQs are special because we only observe rising
      edges as input to the VGIC, and we don't set the EOI flag and therefore
      are not told when the level goes down, so that we can re-queue a new
      interrupt when the level goes up.
      
      One way to solve this problem is to side-step the logic of the VGIC and
      special case the validation in the injection path, but it has the
      unfortunate drawback of having to peak into the physical GIC state
      whenever we want to know if the interrupt is pending on the virtual
      distributor.
      
      Instead, we can maintain the current semantics of a level triggered
      interrupt by sort of treating it as an edge-triggered interrupt,
      following from the fact that we only observe an asserting edge.  This
      requires us to be a bit careful when populating the LRs and when folding
      the state back in though:
      
       * We lower the line level when populating the LR, so that when
         subsequently observing an asserting edge, the VGIC will do the right
         thing.
      
       * If the guest never acked the interrupt while running (for example if
         it had masked interrupts at the CPU level while running), we have
         to preserve the pending state of the LR and move it back to the
         line_level field of the struct irq when folding LR state.
      
         If the guest never acked the interrupt while running, but changed the
         device state and lowered the line (again with interrupts masked) then
         we need to observe this change in the line_level.
      
         Both of the above situations are solved by sampling the physical line
         and set the line level when folding the LR back.
      
       * Finally, if the guest never acked the interrupt while running and
         sampling the line reveals that the device state has changed and the
         line has been lowered, we must clear the physical active state, since
         we will otherwise never be told when the interrupt becomes asserted
         again.
      
      This has the added benefit of making the timer optimization patches
      (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
      bit simpler, because the timer code doesn't have to clear the active
      state on the sync anymore.  It also potentially improves the performance
      of the timer implementation because the GIC knows the state or the LR
      and only needs to clear the
      active state when the pending bit in the LR is still set, where the
      timer has to always clear it when returning from running the guest with
      an injected timer interrupt.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e40cc57b
    • C
      KVM: arm/arm64: Don't cache the timer IRQ level · 70450a9f
      Christoffer Dall 提交于
      The timer logic was designed after a strict idea of modeling an
      interrupt line level in software, meaning that only transitions in the
      level need to be reported to the VGIC.  This works well for the timer,
      because the arch timer code is in complete control of the device and can
      track the transitions of the line.
      
      However, as we are about to support using the HW bit in the VGIC not
      just for the timer, but also for VFIO which cannot track transitions of
      the interrupt line, we have to decide on an interface between the GIC
      and other subsystems for level triggered mapped interrupts, which both
      the timer and VFIO can use.
      
      VFIO only sees an asserting transition of the physical interrupt line,
      and tells the VGIC when that happens.  That means that part of the
      interrupt flow is offloaded to the hardware.
      
      To use the same interface for VFIO devices and the timer, we therefore
      have to change the timer (we cannot change VFIO because it doesn't know
      the details of the device it is assigning to a VM).
      
      Luckily, changing the timer is simple, we just need to stop 'caching'
      the line level, but instead let the VGIC know the state of the timer
      every time there is a potential change in the line level, and when the
      line level should be asserted from the timer ISR.  The VGIC can ignore
      extra notifications using its validate mechanism.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NAndre Przywara <andre.przywara@arm.com>
      Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      70450a9f