1. 05 Aug, 2019 1 commit
    • KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e
      Marc Zyngier authored
      Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point where we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and any
      change to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #0xff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so no sampling of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortunate situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
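The stale-PMR scenario above can be replayed with a small model. This is only a sketch: `struct vcpu_model`, `sync_vmcr()`, `irq_deliverable()` and `replay_wfi()` are hypothetical names for illustration, not kernel functions, and the interrupt priority 0xa0 is an arbitrary assumed value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct vcpu_model {
	uint8_t hw_pmr;     /* live priority mask, as last written by the guest */
	uint8_t cached_pmr; /* in-memory copy that the pending check evaluates */
};

/* Copy the live PMR into memory, as a VMCR sync would. */
static void sync_vmcr(struct vcpu_model *v)
{
	v->cached_pmr = v->hw_pmr;
}

/* Lower value means higher priority: deliverable only if below the mask. */
static bool irq_deliverable(const struct vcpu_model *v, uint8_t irq_prio)
{
	return irq_prio < v->cached_pmr;
}

/* Replay the guest sequence; returns whether the pending check sees the IRQ. */
static bool replay_wfi(bool resync_before_check)
{
	struct vcpu_model v = { .hw_pmr = 0x00, .cached_pmr = 0x00 };

	sync_vmcr(&v);         /* vcpu preempted: VMCR sampled with PMR == 0 */
	v.hw_pmr = 0xff;       /* guest unmasks everything, then executes WFI */

	if (resync_before_check)
		sync_vmcr(&v); /* the fix: resync when about to block */

	return irq_deliverable(&v, 0xa0); /* assumed priority of the new IRQ */
}
```

Without the resync the check evaluates the mask sampled at preemption time and misses the interrupt; with it, the check sees the guest's latest PMR.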
  2. 19 Jun, 2019 1 commit
  3. 03 Apr, 2019 1 commit
  4. 20 Mar, 2019 1 commit
  5. 24 Jan, 2019 3 commits
  6. 20 Dec, 2018 2 commits
    • KVM: arm/arm64: vgic: Consider priority and active state for pending irq · 9009782a
      Christoffer Dall authored
      When checking if there are any pending IRQs for the VM, consider the
      active state and priority of the IRQs as well.
      
      Otherwise we could be continuously scheduling a guest hypervisor without
      it seeing an IRQ.
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
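As a rough illustration of the check described above, the predicate below treats an interrupt as deliverable only when it is pending, enabled, not already active, and of higher priority than the current mask. `struct irq_state` and `irq_can_fire()` are assumed names for this sketch, not the kernel's own structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-interrupt state, for illustration only. */
struct irq_state {
	bool pending;
	bool enabled;
	bool active;
	uint8_t priority; /* lower value means higher priority */
};

/* True only if the interrupt could actually be taken by the guest now. */
static bool irq_can_fire(const struct irq_state *irq, uint8_t pmr)
{
	return irq->pending && irq->enabled &&
	       !irq->active && irq->priority < pmr;
}
```

An interrupt that is pending but already active, or masked by priority, therefore no longer counts as a reason to keep scheduling the vcpu.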
    • KVM: arm/arm64: vgic: Fix off-by-one bug in vgic_get_irq() · c23b2e6f
      Gustavo A. R. Silva authored
      When using the nospec API, it should be taken into account that:
      
      "...if the CPU speculates past the bounds check then
       * array_index_nospec() will clamp the index within the range of [0,
       * size)."
      
      The above is part of the header for macro array_index_nospec() in
      linux/nospec.h
      
      Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI
      or to exactly VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up
      returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead
      of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic:
      
      	/* SGIs and PPIs */
      	if (intid <= VGIC_MAX_PRIVATE)
       		return &vcpu->arch.vgic_cpu.private_irqs[intid];
      
       	/* SPIs */
      	if (intid <= VGIC_MAX_SPI)
       		return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
      
      are valid values for intid.
      
      Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1
      and VGIC_MAX_SPI + 1 as arguments for its parameter size.
      
      Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      [dropped the SPI part which was fixed separately]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
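The off-by-one can be seen with a simplified stand-in for the clamping. This is a sketch under assumptions: `index_nospec_model()` is not the kernel macro (which builds its mask without a branch), and VGIC_MAX_PRIVATE is assumed to be 31 here. In this model an index outside [0, size) collapses to 0; the point is only that a valid intid equal to `size` is not preserved.

```c
#include <stddef.h>

/* Assumed value for illustration: the last valid private interrupt ID. */
#define VGIC_MAX_PRIVATE 31

/*
 * Simplified model of the clamping: an index inside [0, size) is kept,
 * anything else is masked to 0. Only the result is modelled here, not the
 * branch-free way the real macro computes it.
 */
static size_t index_nospec_model(size_t index, size_t size)
{
	size_t mask = (index < size) ? ~(size_t)0 : 0;

	return index & mask;
}

/* Size passed as VGIC_MAX_PRIVATE: the last valid ID falls outside [0, size). */
static size_t lookup_buggy(size_t intid)
{
	return index_nospec_model(intid, VGIC_MAX_PRIVATE);
}

/* Size passed as VGIC_MAX_PRIVATE + 1: every valid ID is preserved. */
static size_t lookup_fixed(size_t intid)
{
	return index_nospec_model(intid, VGIC_MAX_PRIVATE + 1);
}
```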
  7. 18 Dec, 2018 1 commit
  8. 13 Nov, 2018 1 commit
  9. 12 Aug, 2018 2 commits
  10. 15 May, 2018 1 commit
  11. 27 Apr, 2018 2 commits
  12. 17 Apr, 2018 1 commit
  13. 19 Mar, 2018 3 commits
    • KVM: arm/arm64: Avoid VGICv3 save/restore on VHE with no IRQs · 2d0e63e0
      Christoffer Dall authored
      We can finally get completely rid of any calls to the VGICv3
      save/restore functions when the AP lists are empty on VHE systems.  This
      requires carefully factoring out trap configuration from saving and
      restoring state, and carefully choosing what to do on the VHE and
      non-VHE path.
      
      One of the challenges is that we cannot save/restore the VMCR lazily
      because we can only write the VMCR when ICC_SRE_EL1.SRE is cleared when
      emulating a GICv2-on-GICv3, since otherwise all Group-0 interrupts end
      up being delivered as FIQ.
      
      To solve this problem, and still provide fast performance in the fast
      path of exiting a VM when no interrupts are pending (which also
      optimizes the latency for actually delivering virtual interrupts coming
      from physical interrupts), we orchestrate a dance of only doing the
      activate/deactivate traps in vgic load/put for VHE systems (which can
      have ICC_SRE_EL1.SRE cleared when running in the host), and doing the
      configuration on every round-trip on non-VHE systems.
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Handle VGICv3 save/restore from the main VGIC code on VHE · 771621b0
      Christoffer Dall authored
      Just like we can program the GICv2 hypervisor control interface directly
      from the core vgic code, we can do the same for the GICv3 hypervisor
      control interface on VHE systems.
      
      We do this by simply calling the save/restore functions when we have VHE
      and we can then get rid of the save/restore function calls from the VHE
      world switch function.
      
      One caveat is that we now write GICv3 system register state before the
      potential early exit path in the run loop, and because we sync back
      state in the early exit path, we have to ensure that we read a
      consistent GIC state from the sync path, even though we have never
      actually run the guest with the newly written GIC state.  We solve this
      by inserting an ISB in the early exit path.
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code · 75174ba6
      Christoffer Dall authored
      We can program the GICv2 hypervisor control interface logic directly
      from the core vgic code and can instead do the save/restore directly
      from the flush/sync functions, which can lead to a number of future
      optimizations.
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  14. 15 Mar, 2018 2 commits
    • KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier authored
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenarios where the physical CPUs are severely overcommitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
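The second and third stages of the strategy above can be sketched as a loop over an AP list that is assumed to be already sorted by priority. `struct irq_model` and `populate_lrs()` are hypothetical names for this illustration; the `npie` output stands in for requesting the HCR_NPIE maintenance interrupt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AP-list entry, for illustration only. */
struct irq_model {
	uint32_t intid;
	uint8_t priority;        /* lower value means higher priority */
	unsigned int nr_sources; /* more than 1 means a multi-source SGI */
};

/*
 * Fill up to nr_lrs list registers from an AP list sorted by priority.
 * Returns the number of LRs used. *npie is set when a multi-source SGI
 * was injected, modelling the maintenance interrupt request.
 */
static size_t populate_lrs(const struct irq_model *ap, size_t n,
			   uint32_t *lrs, size_t nr_lrs, bool *npie)
{
	size_t used = 0;
	bool multi_sgi = false;
	uint8_t cutoff = 0;

	*npie = false;
	for (size_t i = 0; i < n && used < nr_lrs; i++) {
		/* After a multi-source SGI, stop at anything of lower priority. */
		if (multi_sgi && ap[i].priority > cutoff)
			break;

		lrs[used++] = ap[i].intid; /* one source per guest entry */

		if (ap[i].nr_sources > 1 && !multi_sgi) {
			multi_sgi = true;
			cutoff = ap[i].priority;
			*npie = true;
		}
	}
	return used;
}
```

Because the list is sorted, breaking out of the loop is enough to keep lower-priority interrupts from overtaking the remaining sources of the SGI.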
    • KVM: arm/arm64: Reset mapped IRQs on VM reset · 413aa807
      Christoffer Dall authored
      We currently don't allow resetting mapped IRQs from userspace, because
      their state is controlled by the hardware.  But we do need to reset the
      state when the VM is reset, so we provide a function for the 'owner' of
      the mapped interrupt to reset the interrupt state.
      
      Currently only the timer uses mapped interrupts, so we call this
      function from the timer reset logic.
      
      Cc: stable@vger.kernel.org
      Fixes: 4c60e360 ("KVM: arm/arm64: Provide a get_input_level for the arch timer")
      Signed-off-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  15. 02 Jan, 2018 3 commits
    • KVM: arm/arm64: Support VGIC dist pend/active changes for mapped IRQs · df635c5b
      Christoffer Dall authored
      For mapped IRQs (with the HW bit set in the LR) we have to follow some
      rules of the architecture.  One of these rules is that a VM must not be
      allowed to deactivate a virtual interrupt with the HW bit set unless the
      physical interrupt is also active.
      
      This works fine when injecting mapped interrupts, because we leave it up
      to the injector to either set EOImode==1 or manually set the active
      state of the physical interrupt.
      
      However, the guest can set a virtual interrupt to be pending or active by
      writing to the virtual distributor, which could lead to deactivating a
      virtual interrupt with the HW bit set without the physical interrupt
      being active.
      
      We could set the physical interrupt to active whenever we are about to
      enter the VM with a HW interrupt either pending or active, but that
      would be really slow, especially on GICv2.  So we take the long way
      around and do the hard work when needed, which is expected to be
      extremely rare.
      
      When the VM sets the pending state for a HW interrupt on the virtual
      distributor we set the active state on the physical distributor, because
      the virtual interrupt can become active and then the guest can
      deactivate it.
      
      When the VM clears the pending state we also clear it on the physical
      side, because the injector might otherwise raise the interrupt.  We also
      clear the physical active state when the virtual interrupt is not
      active, since otherwise a SPEND/CPEND sequence from the guest would
      prevent signaling of future interrupts.
      
      Changing the state of mapped interrupts from userspace is not supported,
      and it's expected that userspace unmaps devices from VFIO before
      attempting to set the interrupt state, because the interrupt state is
      driven by hardware.
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm/arm64: Support a vgic interrupt line level sample function · b6909a65
      Christoffer Dall authored
      The GIC sometimes needs to sample the physical line of a mapped
      interrupt.  As we know this to be notoriously slow, provide a callback
      function for devices (such as the timer) which can do this much faster
      than talking to the distributor, for example by comparing a few
      in-memory values.  Fall back to the good old method of poking the
      physical GIC if no callback is provided.
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
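The callback-with-fallback pattern described above might look roughly like this. Everything here is an assumed shape for illustration: `struct mapped_irq`, `sample_line_level()`, the stubbed `query_physical_gic()` and the timer variables are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapped-interrupt descriptor, for illustration only. */
struct mapped_irq {
	int host_irq;
	int vintid;
	bool (*get_input_level)(int vintid); /* optional fast path, may be NULL */
};

/* Stand-in for the slow path; real code would query the physical GIC. */
static bool slow_line_state[8];

static bool query_physical_gic(int host_irq)
{
	return slow_line_state[host_irq];
}

/* Prefer the device-provided callback, fall back to the physical GIC. */
static bool sample_line_level(const struct mapped_irq *irq)
{
	if (irq->get_input_level)
		return irq->get_input_level(irq->vintid);
	return query_physical_gic(irq->host_irq);
}

/* Example fast path: a timer that only compares in-memory values. */
static unsigned long timer_now, timer_cval;

static bool timer_input_level(int vintid)
{
	(void)vintid;
	return timer_now >= timer_cval;
}
```

A device that cannot answer cheaply simply leaves the callback unset and gets the slow path.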
    • KVM: arm/arm64: vgic: Support level-triggered mapped interrupts · e40cc57b
      Christoffer Dall authored
      Level-triggered mapped IRQs are special because we only observe rising
      edges as input to the VGIC, and we don't set the EOI flag and therefore
      are not told when the level goes down, so that we can re-queue a new
      interrupt when the level goes up.
      
      One way to solve this problem is to side-step the logic of the VGIC and
      special case the validation in the injection path, but it has the
      unfortunate drawback of having to peek into the physical GIC state
      whenever we want to know if the interrupt is pending on the virtual
      distributor.
      
      Instead, we can maintain the current semantics of a level triggered
      interrupt by sort of treating it as an edge-triggered interrupt,
      following from the fact that we only observe an asserting edge.  This
      requires us to be a bit careful when populating the LRs and when folding
      the state back in though:
      
       * We lower the line level when populating the LR, so that when
         subsequently observing an asserting edge, the VGIC will do the right
         thing.
      
       * If the guest never acked the interrupt while running (for example if
         it had masked interrupts at the CPU level while running), we have
         to preserve the pending state of the LR and move it back to the
         line_level field of the struct irq when folding LR state.
      
         If the guest never acked the interrupt while running, but changed the
         device state and lowered the line (again with interrupts masked) then
         we need to observe this change in the line_level.
      
         Both of the above situations are solved by sampling the physical line
         and setting the line level when folding the LR back.
      
       * Finally, if the guest never acked the interrupt while running and
         sampling the line reveals that the device state has changed and the
         line has been lowered, we must clear the physical active state, since
         we will otherwise never be told when the interrupt becomes asserted
         again.
      
      This has the added benefit of making the timer optimization patches
      (https://lists.cs.columbia.edu/pipermail/kvmarm/2017-July/026343.html) a
      bit simpler, because the timer code doesn't have to clear the active
      state on the sync anymore.  It also potentially improves the performance
      of the timer implementation because the GIC knows the state of the LR
      and only needs to clear the active state when the pending bit in the LR
      is still set, whereas the timer has to always clear it when returning
      from running the guest with an injected timer interrupt.
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
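The fold-back rules listed above can be condensed into a small sketch. `struct level_irq` and `fold_lr()` are hypothetical names, and the sampled physical line is passed in as a plain parameter rather than read from hardware.

```c
#include <stdbool.h>

/* Hypothetical state of a level-triggered mapped interrupt. */
struct level_irq {
	bool line_level;  /* software view of the input line */
	bool phys_active; /* active state of the physical interrupt */
};

/*
 * Fold LR state back after running the guest.
 * lr_pending: the LR still holds the interrupt as pending (never acked).
 * phys_line:  freshly sampled level of the physical line.
 */
static void fold_lr(struct level_irq *irq, bool lr_pending, bool phys_line)
{
	if (!lr_pending)
		return; /* guest acked it: nothing to preserve */

	/* Preserve or update the pending state from the real line level. */
	irq->line_level = phys_line;

	/* Line went down unacked: clear physical active to see the next edge. */
	if (!irq->line_level)
		irq->phys_active = false;
}
```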
  16. 01 Dec, 2017 1 commit
  17. 29 Nov, 2017 1 commit
  18. 10 Nov, 2017 2 commits
  19. 07 Nov, 2017 1 commit
  20. 06 Nov, 2017 2 commits
  21. 08 Jun, 2017 2 commits
  22. 04 Jun, 2017 1 commit
    • KVM: arm/arm64: use vcpu requests for irq injection · 325f9c64
      Andrew Jones authored
      Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
      kick meant to trigger the interrupt injection could be sent while
      the VCPU is outside guest mode, which means no IPI is sent, and
      after it has called kvm_vgic_flush_hwstate(), meaning it won't see
      the updated GIC state until its next exit some time later for some
      other reason.  The receiving VCPU only needs to check this request
      in VCPU RUN to handle it.  By checking it, if it's pending, a
      memory barrier will be issued that ensures all state is visible.
      See "Ensuring Requests Are Seen" of
      Documentation/virtual/kvm/vcpu-requests.rst
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Christoffer Dall <cdall@linaro.org>
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
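The idea of pairing a kick with a request bit can be illustrated with C11 atomics. This is a loose sketch, not the KVM request API: `struct vcpu_req`, `make_request()`, `check_request()` and `REQ_IRQ_PENDING` are assumed names, and the sequentially consistent atomics stand in for the barriers the kernel documentation describes.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_IRQ_PENDING 0u /* hypothetical request bit number */

/* Hypothetical per-vcpu request word. */
struct vcpu_req {
	atomic_uint requests;
};

/* Sender: record the request before kicking the vcpu. */
static void make_request(struct vcpu_req *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, 1u << req);
}

/* Receiver: test and clear the request on the way into the guest. */
static bool check_request(struct vcpu_req *v, unsigned int req)
{
	unsigned int bit = 1u << req;

	return (atomic_fetch_and(&v->requests, ~bit) & bit) != 0;
}
```

Even if the kick itself is lost because the vcpu was outside guest mode, the request bit remains set and is noticed on the next entry.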
  23. 23 May, 2017 1 commit
  24. 04 May, 2017 1 commit
    • KVM: arm/arm64: Move shared files to virt/kvm/arm · 35d2d5d4
      Christoffer Dall authored
      For some time now we have been having a lot of shared functionality
      between the arm and arm64 KVM support in arch/arm, which not only
      required a horrible inter-arch reference from the Makefile in
      arch/arm64/kvm, but also created confusion for newcomers to the code
      base, as was recently seen on the mailing list.
      
      Further, it causes confusion for things like cscope, which needs special
      attention to index specific shared files for arm64 from the arm tree.
      
      Move the shared files into virt/kvm/arm and move the trace points along
      with it.  When moving the tracepoints we have to modify the way the vgic
      creates definitions of the trace points, so we take the chance to
      include the VGIC tracepoints in its very own special vgic trace.h file.
      Signed-off-by: Christoffer Dall <cdall@linaro.org>
  25. 09 Apr, 2017 3 commits