1. 15 May, 2022 1 commit
  2. 22 January, 2022 1 commit
  3. 16 December, 2021 1 commit
  4. 15 December, 2021 1 commit
  5. 08 December, 2021 1 commit
  6. 17 October, 2021 2 commits
    • KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible · 0924729b
      Committed by Marc Zyngier
      On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
      accesses from the guest. From a performance perspective, this is OK
      as long as the guest doesn't hammer the GICv3 CPU interface.
      
      In most cases, this is fine, unless the guest actively uses
      priorities and switches PMR_EL1 very often. Which is exactly what
      happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
      In these conditions, the performance plummets as we hit PMR each time
      we mask/unmask interrupts. Not good.
      
      There is however an opportunity for improvement. Careful reading
      of the architecture specification indicates that the only GICv3
      sysreg belonging to the common group (which contains the SGI
      registers, PMR, DIR, CTLR and RPR) that is allowed to generate
      a SError is DIR. Everything else is safe.
      
      It is thus possible to substitute the trapping of all the common
      group with just that of DIR if it is supported by the implementation.
      Yes, that's yet another optional bit of the architecture.
      So let's just do that, as it leads to some impressive results on
      the M1:
      
      Without this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 56.596
      
      With this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 8.649
      
      which is a pretty convincing result.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
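The trapping decision described in this commit can be sketched as a small piece of C. This is only an illustration under stated assumptions: the enum, the function name `pick_common_trap` and its two boolean inputs are hypothetical and do not reproduce the kernel's code or any register encoding. It models the common group only; the other trapped groups are left out.

```c
#include <stdbool.h>

/* Hypothetical trap choices for the GICv3 "common" sysreg group. */
enum trap_mode {
	TRAP_NONE,         /* no SEIS: the common group does not need trapping */
	TRAP_DIR_ONLY,     /* SEIS set and DIR-only trapping is available */
	TRAP_COMMON_GROUP, /* SEIS set, no DIR-only trapping: trap the whole group */
};

/*
 * has_seis:     the implementation advertises locally generated SErrors.
 * has_dir_trap: the implementation offers the optional DIR-only trap.
 */
static inline enum trap_mode pick_common_trap(bool has_seis, bool has_dir_trap)
{
	if (!has_seis)
		return TRAP_NONE;

	/* Only DIR can raise an SError, so trap just DIR when possible. */
	return has_dir_trap ? TRAP_DIR_ONLY : TRAP_COMMON_GROUP;
}
```

With DIR-only trapping, guest writes to PMR no longer exit to the hypervisor, which is where the hackbench improvement quoted above comes from.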
    • KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors · df652bcf
      Committed by Marc Zyngier
      The infamous M1 has a feature nobody else ever implemented,
      in the form of the "GIC locally generated SError interrupts",
      also known as SEIS for short.
      
      These SErrors are generated when a guest does something that violates
      the GIC state machine. It would have been simpler to just *ignore*
      the damned thing, but that's not what this HW does. Oh well.
      
      This part of the architecture is also amazingly under-specified.
      There are a whole 10 lines that describe the feature in a spec that
      is 930 pages long, and some of these lines are factually wrong.
      Oh, and it is deprecated, so the incentive to clarify it is low.
      
      Now, the spec says that this should be a *virtual* SError when
      HCR_EL2.AMO is set. As it turns out, that's not always the case
      on this CPU, and the SError sometimes fires on the host as a
      physical SError. Goodbye, cruel world. This clearly is a HW bug,
      and it means that a guest can easily take the host down, on demand.
      
      Thankfully, we have seen systems that were just as broken in the
      past, and we have the perfect vaccine for it.
      
      Apple M1, please meet the Cavium ThunderX workaround. All your
      GIC accesses will be trapped, sanitised, and emulated. Only the
      signalling aspect of the HW will be used. It won't be super speedy,
      but it will at least be safe. You're most welcome.
      
      Given that this has only ever been seen on this single implementation,
      that the spec is unclear at best and that we cannot trust it to ever
      be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
      being set.
      Tested-by: Joey Gouly <joey.gouly@arm.com>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
  7. 11 October, 2021 1 commit
    • KVM: arm64: vgic-v3: Check redist region is not above the VM IPA size · 4612d98f
      Committed by Ricardo Koller
      Verify that the redistributor regions do not extend beyond the
      VM-specified IPA range (phys_size). This can happen when using
      KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
      with:
      
        base + size > phys_size AND base < phys_size
      
      Add the missing check into vgic_v3_alloc_redist_region() which is called
      when setting the regions, and into vgic_v3_check_base() which is called
      when attempting the first vcpu-run. The vcpu-run check does not apply to
      KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
      before the first vcpu-run. Note that using the REDIST_REGIONS API
      results in a different check, which already exists, at first vcpu run:
      that the number of redist regions is enough for all vcpus.
      
      Finally, this patch also enables some extra tests in
      vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
      redist api: like checking that the REDIST region can fit all the already
      created vcpus.
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Ricardo Koller <ricarkol@google.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com
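The range condition quoted in this commit (base + size > phys_size while base < phys_size) can be illustrated with a standalone check. This is a minimal sketch, not the kernel's implementation: `region_exceeds_ipa` is a hypothetical helper, and the wrap-around guard is an extra assumption added so the sum cannot overflow silently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the region [base, base + size)
 * does not fit entirely below phys_size (the VM's IPA limit).
 */
static inline bool region_exceeds_ipa(uint64_t base, uint64_t size,
				      uint64_t phys_size)
{
	/* Treat a wrapping base + size as out of range. */
	if (base + size < base)
		return true;

	return base + size > phys_size;
}
```

A region that starts below the limit but ends above it is rejected, which is exactly the case the commit message singles out.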
  8. 20 August, 2021 1 commit
  9. 01 June, 2021 1 commit
    • KVM: arm64: vgic: Implement SW-driven deactivation · 354920e7
      Committed by Marc Zyngier
      In order to deal with systems that do not offer HW-based
      deactivation of interrupts, let's implement a SW-based approach:
      
      - When the irq is queued into a LR, treat it as a pure virtual
        interrupt and set the EOI flag in the LR.
      
      - When the interrupt state is read back from the LR, force a
        deactivation when the state is invalid (neither active nor
        pending).
      
      Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
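The two steps listed in this commit can be modelled in a few lines of C. Everything here is a toy model under stated assumptions: the structs, field names and the functions `queue_irq` and `fold_lr` are hypothetical and only mirror the logic described above, with `sw_resample` standing in for the VGIC_SW_RESAMPLE flag.

```c
#include <stdbool.h>

/* Toy view of a list register (LR) entry. */
struct lr_model {
	bool pending;
	bool active;
	bool eoi_flag; /* ask to be notified when the guest EOIs the interrupt */
};

/* Toy view of an interrupt tracked by the hypervisor. */
struct irq_model {
	bool sw_resample; /* stands in for VGIC_SW_RESAMPLE */
	bool phys_active; /* the physical interrupt still needs deactivation */
};

/* Step 1: queue as a pure virtual interrupt, with the EOI flag set. */
static inline struct lr_model queue_irq(const struct irq_model *irq)
{
	struct lr_model lr = {
		.pending = true,
		.active = false,
		.eoi_flag = irq->sw_resample,
	};
	return lr;
}

/* Step 2: on read-back, deactivate once the LR state is invalid. */
static inline void fold_lr(struct irq_model *irq, const struct lr_model *lr)
{
	if (irq->sw_resample && !lr->pending && !lr->active)
		irq->phys_active = false;
}
```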
  10. 25 March, 2021 1 commit
  11. 06 March, 2021 2 commits
  12. 27 December, 2020 1 commit
  13. 24 December, 2020 1 commit
  14. 16 September, 2020 1 commit
  15. 28 May, 2020 1 commit
  16. 16 May, 2020 2 commits
  17. 24 March, 2020 2 commits
  18. 29 October, 2019 3 commits
  19. 28 August, 2019 1 commit
  20. 19 August, 2019 1 commit
  21. 05 August, 2019 1 commit
    • KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e
      Committed by Marc Zyngier
      Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point where we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      change was made to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #0xff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so sampling of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortunate situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
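The stale-PMR problem in this commit can be reproduced with a toy model. This sketch assumes a simplified view of priority masking (an interrupt is signalled only when its priority value is numerically below PMR); the struct and the functions `sync_pmr` and `pending_irq` are hypothetical and are not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: the live PMR and the copy the hypervisor keeps in memory. */
struct vcpu_model {
	uint8_t hw_pmr;     /* value the guest last wrote to the CPU interface */
	uint8_t cached_pmr; /* in-memory copy, refreshed only on a sync */
};

/* Lower values mean higher priority; PMR masks everything at or above it. */
static inline bool irq_deliverable(uint8_t prio, uint8_t pmr)
{
	return prio < pmr;
}

/* Stands in for resyncing VMCR before deciding whether to block. */
static inline void sync_pmr(struct vcpu_model *v)
{
	v->cached_pmr = v->hw_pmr;
}

/* The pending check only ever sees the in-memory copy. */
static inline bool pending_irq(const struct vcpu_model *v, uint8_t prio)
{
	return irq_deliverable(prio, v->cached_pmr);
}
```

If the guest raises PMR from 0 to 0xff after the last sync, `pending_irq()` keeps reporting nothing pending until `sync_pmr()` runs, which corresponds to the needless blocking described above.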
  22. 19 June, 2019 1 commit
  23. 20 March, 2019 1 commit
    • KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memory · a6ecfb11
      Committed by Marc Zyngier
      When halting a guest, QEMU flushes the virtual ITS caches, which
      amounts to writing to the various tables that the guest has allocated.
      
      When doing this, we fail to take the srcu lock, and the kernel
      shouts loudly if running a lockdep kernel:
      
      [   69.680416] =============================
      [   69.680819] WARNING: suspicious RCU usage
      [   69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted
      [   69.682096] -----------------------------
      [   69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage!
      [   69.683225]
      [   69.683225] other info that might help us debug this:
      [   69.683225]
      [   69.683975]
      [   69.683975] rcu_scheduler_active = 2, debug_locks = 1
      [   69.684598] 6 locks held by qemu-system-aar/4097:
      [   69.685059]  #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0
      [   69.686087]  #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0
      [   69.686919]  #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.687698]  #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.688475]  #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.689978]  #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0
      [   69.690729]
      [   69.690729] stack backtrace:
      [   69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18
      [   69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019
      [   69.692831] Call trace:
      [   69.694072]  lockdep_rcu_suspicious+0xcc/0x110
      [   69.694490]  gfn_to_memslot+0x174/0x190
      [   69.694853]  kvm_write_guest+0x50/0xb0
      [   69.695209]  vgic_its_save_tables_v0+0x248/0x330
      [   69.695639]  vgic_its_set_attr+0x298/0x3a0
      [   69.696024]  kvm_device_ioctl_attr+0x9c/0xd8
      [   69.696424]  kvm_device_ioctl+0x8c/0xf8
      [   69.696788]  do_vfs_ioctl+0xc8/0x960
      [   69.697128]  ksys_ioctl+0x8c/0xa0
      [   69.697445]  __arm64_sys_ioctl+0x28/0x38
      [   69.697817]  el0_svc_common+0xd8/0x138
      [   69.698173]  el0_svc_handler+0x38/0x78
      [   69.698528]  el0_svc+0x8/0xc
      
      The fix is obviously to take the srcu lock, just like we do on the
      read side of things since bf308242. One wonders why this wasn't
      fixed at the same time, but hey...
      
      Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  24. 20 February, 2019 1 commit
  25. 24 January, 2019 1 commit
  26. 12 August, 2018 1 commit
  27. 21 July, 2018 1 commit
  28. 21 June, 2018 1 commit
    • KVM: arm/arm64: Drop resource size check for GICV window · ba56bc3a
      Committed by Ard Biesheuvel
      When booting a 64 KB pages kernel on an ACPI GICv3 system that
      implements support for v2 emulation, the following warning is
      produced
      
        GICV size 0x2000 not a multiple of page size 0x10000
      
      and support for v2 emulation is disabled, preventing GICv2 VMs
      from being able to run on such hosts.
      
      The reason is that vgic_v3_probe() performs a sanity check on the
      size of the window (it should be a multiple of the page size),
      while the ACPI MADT parsing code hardcodes the size of the window
      to 8 KB. This makes sense, considering that ACPI does not bother
      to describe the size in the first place, under the assumption that
      platforms implementing ACPI will follow the architecture and not
      put anything else in the same 64 KB window.
      
      So let's just drop the sanity check altogether, and assume that
      the window is at least 64 KB in size.
      
      Fixes: 90977732 ("KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
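The sanity check that this commit drops can be illustrated with the numbers from the warning. This is a minimal sketch; `is_page_multiple` is a hypothetical helper, not the code that was in vgic_v3_probe().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is the window size a whole number of pages? */
static inline bool is_page_multiple(uint64_t size, uint64_t page_size)
{
	return (size % page_size) == 0;
}
```

With the hardcoded 8 KB window, `is_page_multiple(0x2000, 0x10000)` is false on a 64 KB pages kernel, while the same window passes with 4 KB pages. That matches the report that only 64 KB pages kernels lost v2 emulation.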
  29. 25 May, 2018 5 commits
  30. 15 May, 2018 1 commit