1. 13 12月, 2019 1 次提交
  2. 06 9月, 2019 1 次提交
  3. 25 8月, 2019 1 次提交
    • M
      KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 8c7053d1
      Marc Zyngier 提交于
      commit 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c upstream.
      
      Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point were we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      changes to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #ff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so samping of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortuante situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      8c7053d1
  4. 04 5月, 2019 1 次提交
    • M
      KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory · 0371fa03
      Marc Zyngier 提交于
      [ Upstream commit a6ecfb11bf37743c1ac49b266595582b107b61d4 ]
      
      When halting a guest, QEMU flushes the virtual ITS caches, which
      amounts to writing to the various tables that the guest has allocated.
      
      When doing this, we fail to take the srcu lock, and the kernel
      shouts loudly if running a lockdep kernel:
      
      [   69.680416] =============================
      [   69.680819] WARNING: suspicious RCU usage
      [   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
      [   69.682096] -----------------------------
      [   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
      [   69.683225]
      [   69.683225] other info that might help us debug this:
      [   69.683225]
      [   69.683975]
      [   69.683975] rcu_scheduler_active = 2, debug_locks = 1
      [   69.684598] 6 locks held by qemu-system-aar/4097:
      [   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
      [   69.686087]  #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
      [   69.686919]  #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.687698]  #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.688475]  #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.689978]  #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.690729]
      [   69.690729] stack backtrace:
      [   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
      [   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
      [   69.692831] Call trace:
      [   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
      [   69.694490]  gfn_to_memslot+0x174/0x190
      [   69.694853]  kvm_write_guest+0x50/0xb0
      [   69.695209]  vgic_its_save_tables_v0+0x248/0x330
      [   69.695639]  vgic_its_set_attr+0x298/0x3a0
      [   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
      [   69.696424]  kvm_device_ioctl+0x8c/0xf8
      [   69.696788]  do_vfs_ioctl+0xc8/0x960
      [   69.697128]  ksys_ioctl+0x8c/0xa0
      [   69.697445]  __arm64_sys_ioctl+0x28/0x38
      [   69.697817]  el0_svc_common+0xd8/0x138
      [   69.698173]  el0_svc_handler+0x38/0x78
      [   69.698528]  el0_svc+0x8/0xc
      
      The fix is to obviously take the srcu lock, just like we do on the
      read side of things since bf308242. One wonders why this wasn't
      fixed at the same time, but hey...
      
      Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      0371fa03
  5. 12 8月, 2018 1 次提交
  6. 21 7月, 2018 1 次提交
  7. 21 6月, 2018 1 次提交
    • A
      KVM: arm/arm64: Drop resource size check for GICV window · ba56bc3a
      Ard Biesheuvel 提交于
      When booting a 64 KB pages kernel on a ACPI GICv3 system that
      implements support for v2 emulation, the following warning is
      produced
      
        GICV size 0x2000 not a multiple of page size 0x10000
      
      and support for v2 emulation is disabled, preventing GICv2 VMs
      from being able to run on such hosts.
      
      The reason is that vgic_v3_probe() performs a sanity check on the
      size of the window (it should be a multiple of the page size),
      while the ACPI MADT parsing code hardcodes the size of the window
      to 8 KB. This makes sense, considering that ACPI does not bother
      to describe the size in the first place, under the assumption that
      platforms implementing ACPI will follow the architecture and not
      put anything else in the same 64 KB window.
      
      So let's just drop the sanity check altogether, and assume that
      the window is at least 64 KB in size.
      
      Fixes: 90977732 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ba56bc3a
  8. 25 5月, 2018 5 次提交
  9. 15 5月, 2018 1 次提交
  10. 27 4月, 2018 1 次提交
    • M
      KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI · 53692908
      Marc Zyngier 提交于
      Now that we make sure we don't inject multiple instances of the
      same GICv2 SGI at the same time, we've made another bug more
      obvious:
      
      If we exit with an active SGI, we completely lose track of which
      vcpu it came from. On the next entry, we restore it with 0 as a
      source, and if that wasn't the right one, too bad. While this
      doesn't seem to trouble GIC-400, the architectural model gets
      offended and doesn't deactivate the interrupt on EOI.
      
      Another connected issue is that we will happilly make pending
      an interrupt from another vcpu, overriding the above zero with
      something that is just as inconsistent. Don't do that.
      
      The final issue is that we signal a maintenance interrupt when
      no pending interrupts are present in the LR. Assuming we've fixed
      the two issues above, we end-up in a situation where we keep
      exiting as soon as we've reached the active state, and not be
      able to inject the following pending.
      
      The fix comes in 3 parts:
      - GICv2 SGIs have their source vcpu saved if they are active on
        exit, and restored on entry
      - Multi-SGIs cannot go via the Pending+Active state, as this would
        corrupt the source field
      - Multi-SGIs are converted to using MI on EOI instead of NPIE
      
      Fixes: 16ca6a60 ("KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid")
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      53692908
  11. 26 3月, 2018 1 次提交
    • M
      KVM: arm/arm64: vgic: Disallow Active+Pending for level interrupts · 67b5b673
      Marc Zyngier 提交于
      It was recently reported that VFIO mediated devices, and anything
      that VFIO exposes as level interrupts, do no strictly follow the
      expected logic of such interrupts as it only lowers the input
      line when the guest has EOId the interrupt at the GIC level, rather
      than when it Acked the interrupt at the device level.
      
      THe GIC's Active+Pending state is fundamentally incompatible with
      this behaviour, as it prevents KVM from observing the EOI, and in
      turn results in VFIO never dropping the line. This results in an
      interrupt storm in the guest, which it really never expected.
      
      As we cannot really change VFIO to follow the strict rules of level
      signalling, let's forbid the A+P state altogether, as it is in the
      end only an optimization. It ensures that we will transition via
      an invalid state, which we can use to notify VFIO of the EOI.
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Tested-by: NEric Auger <eric.auger@redhat.com>
      Tested-by: NShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      67b5b673
  12. 19 3月, 2018 3 次提交
    • C
      KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs · 2d0e63e0
      Christoffer Dall 提交于
      We can finally get completely rid of any calls to the VGICv3
      save/restore functions when the AP lists are empty on VHE systems.  This
      requires carefully factoring out trap configuration from saving and
      restoring state, and carefully choosing what to do on the VHE and
      non-VHE path.
      
      One of the challenges is that we cannot save/restore the VMCR lazily
      because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when
      emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end
      up being delivered as FIQ.
      
      To solve this problem, and still provide fast performance in the fast
      path of exiting a VM when no interrupts are pending (which also
      optimized the latency for actually delivering virtual interrupts coming
      from physical interrupts), we orchestrate a dance of only doing the
      activate/deactivate traps in vgic load/put for VHE systems (which can
      have ICC_SRE_EL1.SRE cleared when running in the host), and doing the
      configuration on every round-trip on non-VHE systems.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      2d0e63e0
    • C
      KVM: arm/arm64: Move VGIC APR save/restore to vgic put/load · 923a2e30
      Christoffer Dall 提交于
      The APRs can only have bits set when the guest acknowledges an interrupt
      in the LR and can only have a bit cleared when the guest EOIs an
      interrupt in the LR.  Therefore, if we have no LRs with any
      pending/active interrupts, the APR cannot change value and there is no
      need to clear it on every exit from the VM (hint: it will have already
      been cleared when we exited the guest the last time with the LRs all
      EOIed).
      
      The only case we need to take care of is when we migrate the VCPU away
      from a CPU or migrate a new VCPU onto a CPU, or when we return to
      userspace to capture the state of the VCPU for migration.  To make sure
      this works, factor out the APR save/restore functionality into separate
      functions called from the VCPU (and by extension VGIC) put/load hooks.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      923a2e30
    • C
      KVM: arm/arm64: Get rid of vgic_elrsr · bb5ed703
      Christoffer Dall 提交于
      There is really no need to store the vgic_elrsr on the VGIC data
      structures as the only need we have for the elrsr is to figure out if an
      LR is inactive when we save the VGIC state upon returning from the
      guest.  We can might as well store this in a temporary local variable.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      bb5ed703
  13. 15 3月, 2018 1 次提交
    • M
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier 提交于
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenario where the physical CPUs are severely overcomitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      16ca6a60
  14. 02 1月, 2018 1 次提交
    • C
      KVM: arm/arm64: vgic: Support level-triggered mapped interrupts · e40cc57b
      Christoffer Dall 提交于
      Level-triggered mapped IRQs are special because we only observe rising
      edges as input to the VGIC, and we don't set the EOI flag and therefore
      are not told when the level goes down, so that we can re-queue a new
      interrupt when the level goes up.
      
      One way to solve this problem is to side-step the logic of the VGIC and
      special case the validation in the injection path, but it has the
      unfortunate drawback of having to peak into the physical GIC state
      whenever we want to know if the interrupt is pending on the virtual
      distributor.
      
      Instead, we can maintain the current semantics of a level triggered
      interrupt by sort of treating it as an edge-triggered interrupt,
      following from the fact that we only observe an asserting edge.  This
      requires us to be a bit careful when populating the LRs and when folding
      the state back in though:
      
       * We lower the line level when populating the LR, so that when
         subsequently observing an asserting edge, the VGIC will do the right
         thing.
      
       * If the guest never acked the interrupt while running (for example if
         it had masked interrupts at the CPU level while running), we have
         to preserve the pending state of the LR and move it back to the
         line_level field of the struct irq when folding LR state.
      
         If the guest never acked the interrupt while running, but changed the
         device state and lowered the line (again with interrupts masked) then
         we need to observe this change in the line_level.
      
         Both of the above situations are solved by sampling the physical line
         and set the line level when folding the LR back.
      
       * Finally, if the guest never acked the interrupt while running and
         sampling the line reveals that the device state has changed and the
         line has been lowered, we must clear the physical active state, since
         we will otherwise never be told when the interrupt becomes asserted
         again.
      
      This has the added benefit of making the timer optimization patches
      (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
      bit simpler, because the timer code doesn't have to clear the active
      state on the sync anymore.  It also potentially improves the performance
      of the timer implementation because the GIC knows the state or the LR
      and only needs to clear the
      active state when the pending bit in the LR is still set, where the
      timer has to always clear it when returning from running the guest with
      an injected timer interrupt.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e40cc57b
  15. 29 11月, 2017 1 次提交
  16. 10 11月, 2017 1 次提交
  17. 06 11月, 2017 1 次提交
  18. 15 6月, 2017 8 次提交
  19. 24 5月, 2017 1 次提交
  20. 15 5月, 2017 1 次提交
  21. 09 5月, 2017 4 次提交
  22. 08 5月, 2017 2 次提交
  23. 19 4月, 2017 1 次提交
    • M
      KVM: arm/arm64: vgic-v3: De-optimize VMCR save/restore when emulating a GICv2 · ff567614
      Marc Zyngier 提交于
      When emulating a GICv2-on-GICv3, special care must be taken to only
      save/restore VMCR_EL2 when ICC_SRE_EL1.SRE is cleared. Otherwise,
      all Group-0 interrupts end-up being delivered as FIQ, which is
      probably not what the guest expects, as demonstrated here with
      an unhappy EFI:
      
      	FIQ Exception at 0x000000013BD21CC4
      
      This means that we cannot perform the load/put trick when dealing
      with VMCR_EL2 (because the host has SRE set), and we have to deal
      with it in the world-switch.
      
      Fortunately, this is not the most common case (modern guests should
      be able to deal with GICv3 directly), and the performance is not worse
      than what it was before the VMCR optimization.
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <cdall@linaro.org>
      ff567614