1. 25 8月, 2019 1 次提交
    • M
      KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 8c7053d1
      Marc Zyngier 提交于
      commit 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c upstream.
      
      Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point were we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      changes to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #ff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so samping of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortuante situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      8c7053d1
  2. 09 6月, 2019 1 次提交
    • T
      KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · 6a2fbec7
      Thomas Huth 提交于
      commit a86cb413f4bf273a9d341a3ab2c2ca44e12eb317 upstream.
      
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the amount of usable CPUs is determined
      during runtime - it is depending on the features of the machine the code
      is running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the amount of CPUs that is limited by the hard-
      ware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cohuck@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NThomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      6a2fbec7
  3. 26 5月, 2019 1 次提交
  4. 24 3月, 2019 1 次提交
  5. 17 1月, 2019 1 次提交
    • C
      KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · 4f14f446
      Christoffer Dall 提交于
      commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream.
      
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      
      As far as I can tell, this can lead to the same kind of race:
      
        VM 0, VCPU 0			VM 0, VCPU 1
        ------------			------------
        update_vttbr (vmid 254)
        				update_vttbr (vmid 1) // roll over
      				read_lock(kvm_vmid_lock);
      				force_vm_exit()
        local_irq_disable
        need_new_vmid_gen == false //because vmid gen matches
      
        enter_guest (vmid 254)
        				kvm_arch.vttbr = <PGD>:<VMID 1>
      				read_unlock(kvm_vmid_lock);
      
        				enter_guest (vmid 1)
      
      Which results in running two VCPUs in the same VM with different VMIDs
      and (even worse) other VCPUs from other VMs could now allocate clashing
      VMID 254 from the new generation as long as VCPU 0 is not exiting.
      
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
      
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race"
      Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f14f446
  6. 14 11月, 2018 1 次提交
    • M
      KVM: arm64: Fix caching of host MDCR_EL2 value · 59571785
      Mark Rutland 提交于
      commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream.
      
      At boot time, KVM stashes the host MDCR_EL2 value, but only does this
      when the kernel is not running in hyp mode (i.e. is non-VHE). In these
      cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can
      lead to CONSTRAINED UNPREDICTABLE behaviour.
      
      Since we use this value to derive the MDCR_EL2 value when switching
      to/from a guest, after a guest have been run, the performance counters
      do not behave as expected. This has been observed to result in accesses
      via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant
      counters, resulting in events not being counted. In these cases, only
      the fixed-purpose cycle counter appears to work as expected.
      
      Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.
      
      Cc: Christopher Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
      Tested-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59571785
  7. 21 7月, 2018 3 次提交
  8. 09 7月, 2018 1 次提交
  9. 20 6月, 2018 1 次提交
  10. 02 6月, 2018 2 次提交
  11. 01 6月, 2018 1 次提交
  12. 25 5月, 2018 4 次提交
    • E
      KVM: arm/arm64: Remove kvm_vgic_vcpu_early_init · 5ec17fba
      Eric Auger 提交于
      kvm_vgic_vcpu_early_init gets called after kvm_vgic_cpu_init which
      is confusing. The call path is as follows:
      kvm_vm_ioctl_create_vcpu
      |_ kvm_arch_cpu_create
         |_ kvm_vcpu_init
            |_ kvm_arch_vcpu_init
               |_ kvm_vgic_vcpu_init
      |_ kvm_arch_vcpu_postcreate
         |_ kvm_vgic_vcpu_early_init
      
      Static initialization currently done in kvm_vgic_vcpu_early_init()
      can be moved to kvm_vgic_vcpu_init(). So let's move the code and
      remove kvm_vgic_vcpu_early_init(). kvm_arch_vcpu_postcreate() does
      nothing.
      Signed-off-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      5ec17fba
    • D
      KVM: arm64: Remove eager host SVE state saving · 21cdd7fd
      Dave Martin 提交于
      Now that the host SVE context can be saved on demand from Hyp,
      there is no longer any need to save this state in advance before
      entering the guest.
      
      This patch removes the relevant call to
      kvm_fpsimd_flush_cpu_state().
      
      Since the problem that function was intended to solve now no longer
      exists, the function and its dependencies are also deleted.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      21cdd7fd
    • D
      KVM: arm64: Save host SVE context as appropriate · 85acda3b
      Dave Martin 提交于
      This patch adds SVE context saving to the hyp FPSIMD context switch
      path.  This means that it is no longer necessary to save the host
      SVE state in advance of entering the guest, when in use.
      
      In order to avoid adding pointless complexity to the code, VHE is
      assumed if SVE is in use.  VHE is an architectural prerequisite for
      SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in
      kernels that support both SVE and KVM.
      
      Historically, software models exist that can expose the
      architecturally invalid configuration of SVE without VHE, so if
      this situation is detected at kvm_init() time then KVM will be
      disabled.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85acda3b
    • D
      KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing · e6b673b7
      Dave Martin 提交于
      This patch refactors KVM to align the host and guest FPSIMD
      save/restore logic with each other for arm64.  This reduces the
      number of redundant save/restore operations that must occur, and
      reduces the common-case IRQ blackout time during guest exit storms
      by saving the host state lazily and optimising away the need to
      restore the host state before returning to the run loop.
      
      Four hooks are defined in order to enable this:
      
       * kvm_arch_vcpu_run_map_fp():
         Called on PID change to map necessary bits of current to Hyp.
      
       * kvm_arch_vcpu_load_fp():
         Set up FP/SIMD for entering the KVM run loop (parse as
         "vcpu_load fp").
      
       * kvm_arch_vcpu_ctxsync_fp():
         Get FP/SIMD into a safe state for re-enabling interrupts after a
         guest exit back to the run loop.
      
         For arm64 specifically, this involves updating the host kernel's
         FPSIMD context tracking metadata so that kernel-mode NEON use
         will cause the vcpu's FPSIMD state to be saved back correctly
         into the vcpu struct.  This must be done before re-enabling
         interrupts because kernel-mode NEON may be used by softirqs.
      
       * kvm_arch_vcpu_put_fp():
         Save guest FP/SIMD state back to memory and dissociate from the
         CPU ("vcpu_put fp").
      
      Also, the arm64 FPSIMD context switch code is updated to enable it
      to save back FPSIMD state for a vcpu, not just current.  A few
      helpers drive this:
      
       * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
         mark this CPU as having context fp (which may belong to a vcpu)
         currently loaded in its registers.  This is the non-task
         equivalent of the static function fpsimd_bind_to_cpu() in
         fpsimd.c.
      
       * task_fpsimd_save():
         exported to allow KVM to save the guest's FPSIMD state back to
         memory on exit from the run loop.
      
       * fpsimd_flush_state():
         invalidate any context's FPSIMD state that is currently loaded.
         Used to disassociate the vcpu from the CPU regs on run loop exit.
      
      These changes allow the run loop to enable interrupts (and thus
      softirqs that may use kernel-mode NEON) without having to save the
      guest's FPSIMD state eagerly.
      
      Some new vcpu_arch fields are added to make all this work.  Because
      host FPSIMD state can now be saved back directly into current's
      thread_struct as appropriate, host_cpu_context is no longer used
      for preserving the FPSIMD state.  However, it is still needed for
      preserving other things such as the host's system registers.  To
      avoid ABI churn, the redundant storage space in host_cpu_context is
      not removed for now.
      
      arch/arm is not addressed by this patch and continues to use its
      current save/restore logic.  It could provide implementations of
      the helpers later if desired.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      e6b673b7
  13. 17 4月, 2018 1 次提交
    • M
      KVM: arm/arm64: Close VMID generation race · f0cf47d9
      Marc Zyngier 提交于
      Before entering the guest, we check whether our VMID is still
      part of the current generation. In order to avoid taking a lock,
      we start with checking that the generation is still current, and
      only if not current do we take the lock, recheck, and update the
      generation and VMID.
      
      This leaves open a small race: A vcpu can bump up the global
      generation number as well as the VM's, but has not updated
      the VMID itself yet.
      
      At that point another vcpu from the same VM comes in, checks
      the generation (and finds it not needing anything), and jumps
      into the guest. At this point, we end-up with two vcpus belonging
      to the same VM running with two different VMIDs. Eventually, the
      VMID used by the second vcpu will get reassigned, and things will
      really go wrong...
      
      A simple solution would be to drop this initial check, and always take
      the lock. This is likely to cause performance issues. A middle ground
      is to convert the spinlock to a rwlock, and only take the read lock
      on the fast path. If the check fails at that point, drop it and
      acquire the write lock, rechecking the condition.
      
      This ensures that the above scenario doesn't occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NShannon Zhao <zhaoshenglong@huawei.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      f0cf47d9
  14. 19 3月, 2018 6 次提交
  15. 15 3月, 2018 1 次提交
    • C
      KVM: arm/arm64: Avoid vcpu_load for other vcpu ioctls than KVM_RUN · e21a4f3a
      Christoffer Dall 提交于
      Calling vcpu_load() registers preempt notifiers for this vcpu and calls
      kvm_arch_vcpu_load().  The latter will soon be doing a lot of heavy
      lifting on arm/arm64 and will try to do things such as enabling the
      virtual timer and setting us up to handle interrupts from the timer
      hardware.
      
      Loading state onto hardware registers and enabling hardware to signal
      interrupts can be problematic when we're not actually about to run the
      VCPU, because it makes it difficult to establish the right context when
      handling interrupts from the timer, and it makes the register access
      code difficult to reason about.
      
      Luckily, now when we call vcpu_load in each ioctl implementation, we can
      simply remove the call from the non-KVM_RUN vcpu ioctls, and our
      kvm_arch_vcpu_load() is only used for loading vcpu content to the
      physical CPU when we're actually going to run the vcpu.
      
      Cc: stable@vger.kernel.org
      Fixes: 9b062471 ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl")
      Reviewed-by: NJulien Grall <julien.grall@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      e21a4f3a
  16. 07 2月, 2018 1 次提交
  17. 31 1月, 2018 1 次提交
  18. 23 1月, 2018 1 次提交
    • J
      KVM: arm/arm64: Handle CPU_PM_ENTER_FAILED · 58d6b15e
      James Morse 提交于
      cpu_pm_enter() calls the pm notifier chain with CPU_PM_ENTER, then if
      there is a failure: CPU_PM_ENTER_FAILED.
      
      When KVM receives CPU_PM_ENTER it calls cpu_hyp_reset() which will
      return us to the hyp-stub. If we subsequently get a CPU_PM_ENTER_FAILED,
      KVM does nothing, leaving the CPU running with the hyp-stub, at odds
      with kvm_arm_hardware_enabled.
      
      Add CPU_PM_ENTER_FAILED as a fallthrough for CPU_PM_EXIT, this reloads
      KVM based on kvm_arm_hardware_enabled. This is safe even if CPU_PM_ENTER
      never gets as far as KVM, as cpu_hyp_reinit() calls cpu_hyp_reset()
      to make sure the hyp-stub is loaded before reloading KVM.
      
      Fixes: 67f69197 ("arm64: kvm: allows kvm cpu hotplug")
      Cc: <stable@vger.kernel.org> # v4.7+
      CC: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      58d6b15e
  19. 16 1月, 2018 2 次提交
    • J
      KVM: arm64: Handle RAS SErrors from EL1 on guest exit · 3368bd80
      James Morse 提交于
      We expect to have firmware-first handling of RAS SErrors, with errors
      notified via an APEI method. For systems without firmware-first, add
      some minimal handling to KVM.
      
      There are two ways KVM can take an SError due to a guest, either may be a
      RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
      or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
      
      For SError that interrupt a guest and are routed to EL2 the existing
      behaviour is to inject an impdef SError into the guest.
      
      Add code to handle RAS SError based on the ESR. For uncontained and
      uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
      errors compromise the host too. All other error types are contained:
      For the fatal errors the vCPU can't make progress, so we inject a virtual
      SError. We ignore contained errors where we can make progress as if
      we're lucky, we may not hit them again.
      
      If only some of the CPUs support RAS the guest will see the cpufeature
      sanitised version of the id registers, but we may still take RAS SError
      on this CPU. Move the SError handling out of handle_exit() into a new
      handler that runs before we can be preempted. This allows us to use
      this_cpu_has_cap(), via arm64_is_ras_serror().
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      3368bd80
    • J
      KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9
      James Morse 提交于
      Non-VHE systems take an exception to EL2 in order to world-switch into the
      guest. When returning from the guest KVM implicitly restores the DAIF
      flags when it returns to the kernel at EL1.
      
      With VHE none of this exception-level jumping happens, so KVMs
      world-switch code is exposed to the host kernel's DAIF values, and KVM
      spills the guest-exit DAIF values back into the host kernel.
      On entry to a guest we have Debug and SError exceptions unmasked, KVM
      has switched VBAR but isn't prepared to handle these. On guest exit
      Debug exceptions are left disabled once we return to the host and will
      stay this way until we enter user space.
      
      Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
      happen after the hosts VBAR value has been synchronised by the isb in
      __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
      setting KVMs VBAR value, but is kept here for symmetry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4f5abad9
  20. 13 1月, 2018 1 次提交
  21. 09 1月, 2018 1 次提交
  22. 02 1月, 2018 2 次提交
  23. 23 12月, 2017 1 次提交
  24. 18 12月, 2017 1 次提交
  25. 14 12月, 2017 3 次提交