1. 25 5月, 2018 2 次提交
    • D
      KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing · e6b673b7
      Dave Martin 提交于
      This patch refactors KVM to align the host and guest FPSIMD
      save/restore logic with each other for arm64.  This reduces the
      number of redundant save/restore operations that must occur, and
      reduces the common-case IRQ blackout time during guest exit storms
      by saving the host state lazily and optimising away the need to
      restore the host state before returning to the run loop.
      
      Four hooks are defined in order to enable this:
      
       * kvm_arch_vcpu_run_map_fp():
         Called on PID change to map necessary bits of current to Hyp.
      
       * kvm_arch_vcpu_load_fp():
         Set up FP/SIMD for entering the KVM run loop (parse as
         "vcpu_load fp").
      
       * kvm_arch_vcpu_ctxsync_fp():
         Get FP/SIMD into a safe state for re-enabling interrupts after a
         guest exit back to the run loop.
      
         For arm64 specifically, this involves updating the host kernel's
         FPSIMD context tracking metadata so that kernel-mode NEON use
         will cause the vcpu's FPSIMD state to be saved back correctly
         into the vcpu struct.  This must be done before re-enabling
         interrupts because kernel-mode NEON may be used by softirqs.
      
       * kvm_arch_vcpu_put_fp():
         Save guest FP/SIMD state back to memory and dissociate from the
         CPU ("vcpu_put fp").
      
      Also, the arm64 FPSIMD context switch code is updated to enable it
      to save back FPSIMD state for a vcpu, not just current.  A few
      helpers drive this:
      
       * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp):
         mark this CPU as having context fp (which may belong to a vcpu)
         currently loaded in its registers.  This is the non-task
         equivalent of the static function fpsimd_bind_to_cpu() in
         fpsimd.c.
      
       * task_fpsimd_save():
         exported to allow KVM to save the guest's FPSIMD state back to
         memory on exit from the run loop.
      
       * fpsimd_flush_state():
         invalidate any context's FPSIMD state that is currently loaded.
         Used to disassociate the vcpu from the CPU regs on run loop exit.
      
      These changes allow the run loop to enable interrupts (and thus
      softirqs that may use kernel-mode NEON) without having to save the
      guest's FPSIMD state eagerly.
      
      Some new vcpu_arch fields are added to make all this work.  Because
      host FPSIMD state can now be saved back directly into current's
      thread_struct as appropriate, host_cpu_context is no longer used
      for preserving the FPSIMD state.  However, it is still needed for
      preserving other things such as the host's system registers.  To
      avoid ABI churn, the redundant storage space in host_cpu_context is
      not removed for now.
      
      arch/arm is not addressed by this patch and continues to use its
      current save/restore logic.  It could provide implementations of
      the helpers later if desired.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      e6b673b7
    • D
      KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags · fa89d31c
      Dave Martin 提交于
      In struct vcpu_arch, the debug_flags field is used to store
      debug-related flags about the vcpu state.
      
      Since we are about to add some more flags related to FPSIMD and
      SVE, it makes sense to add them to the existing flags field rather
      than adding new fields.  Since there is only one debug_flags flag
      defined so far, there is plenty of free space for expansion.
      
      In preparation for adding more flags, this patch renames the
      debug_flags field to simply "flags", and updates comments
      appropriately.
      
      The flag definitions are also moved to <asm/kvm_host.h>, since
      their presence in <asm/kvm_asm.h> was for purely historical
      reasons:  these definitions are not used from asm any more, and not
      very likely to be as more Hyp asm is migrated to C.
      
      KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit
      1ea66d27 ("arm64: KVM: Move away from the assembly version of
      the world switch"), so this patch gets rid of that too.
      
      No functional change.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Acked-by: NChristoffer Dall <christoffer.dall@arm.com>
      [maz: fixed minor conflict]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      fa89d31c
  2. 20 4月, 2018 1 次提交
    • M
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier 提交于
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85bd0ba1
  3. 19 3月, 2018 7 次提交
  4. 07 2月, 2018 1 次提交
  5. 16 1月, 2018 5 次提交
    • J
      KVM: arm64: Handle RAS SErrors from EL2 on guest exit · 0067df41
      James Morse 提交于
      We expect to have firmware-first handling of RAS SErrors, with errors
      notified via an APEI method. For systems without firmware-first, add
      some minimal handling to KVM.
      
      There are two ways KVM can take an SError due to a guest, either may be a
      RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
      or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
      
      The current SError from EL2 code unmasks SError and tries to fence any
      pending SError into a single instruction window. It then leaves SError
      unmasked.
      
      With the v8.2 RAS Extensions we may take an SError for a 'corrected'
      error, but KVM is only able to handle SError from EL2 if they occur
      during this single instruction window...
      
      The RAS Extensions give us a new instruction to synchronise and
      consume SErrors. The RAS Extensions document (ARM DDI0587),
      '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
      SError interrupts generated by 'instructions, translation table walks,
      hardware updates to the translation tables, and instruction fetches on
      the same PE'. This makes ESB equivalent to KVMs existing
      'dsb, mrs-daifclr, isb' sequence.
      
      Use the alternatives to synchronise and consume any SError using ESB
      instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
      in the exit_code so that we can restart the vcpu if it turns out this
      SError has no impact on the vcpu.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      0067df41
    • J
      KVM: arm64: Handle RAS SErrors from EL1 on guest exit · 3368bd80
      James Morse 提交于
      We expect to have firmware-first handling of RAS SErrors, with errors
      notified via an APEI method. For systems without firmware-first, add
      some minimal handling to KVM.
      
      There are two ways KVM can take an SError due to a guest, either may be a
      RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
      or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
      
      For SError that interrupt a guest and are routed to EL2 the existing
      behaviour is to inject an impdef SError into the guest.
      
      Add code to handle RAS SError based on the ESR. For uncontained and
      uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
      errors compromise the host too. All other error types are contained:
      For the fatal errors the vCPU can't make progress, so we inject a virtual
      SError. We ignore contained errors where we can make progress as if
      we're lucky, we may not hit them again.
      
      If only some of the CPUs support RAS the guest will see the cpufeature
      sanitised version of the id registers, but we may still take RAS SError
      on this CPU. Move the SError handling out of handle_exit() into a new
      handler that runs before we can be preempted. This allows us to use
      this_cpu_has_cap(), via arm64_is_ras_serror().
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      3368bd80
    • J
      KVM: arm64: Save/Restore guest DISR_EL1 · c773ae2b
      James Morse 提交于
      If we deliver a virtual SError to the guest, the guest may defer it
      with an ESB instruction. The guest reads the deferred value via DISR_EL1,
      but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO
      is set.
      
      Add the KVM code to save/restore VDISR_EL2, and make it accessible to
      userspace as DISR_EL1.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c773ae2b
    • J
      KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2. · 4715c14b
      James Morse 提交于
      Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature
      generated an SError with an implementation defined ESR_EL1.ISS, because we
      had no mechanism to specify the ESR value.
      
      On Juno this generates an all-zero ESR, the most significant bit 'ISV'
      is clear indicating the remainder of the ISS field is invalid.
      
      With the RAS Extensions we have a mechanism to specify this value, and the
      most significant bit has a new meaning: 'IDS - Implementation Defined
      Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized'
      instead of 'no valid ISS'.
      
      Add KVM support for the VSESR_EL2 register to specify an ESR value when
      HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to
      specify an implementation-defined value.
      
      We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM
      save/restores this bit during __{,de}activate_traps() and hardware clears the
      bit once the guest has consumed the virtual-SError.
      
      Future patches may add an API (or KVM CAP) to pend a virtual SError with
      a specified ESR.
      
      Cc: Dongjiu Geng <gengdongjiu@huawei.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4715c14b
    • J
      KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9
      James Morse 提交于
      Non-VHE systems take an exception to EL2 in order to world-switch into the
      guest. When returning from the guest KVM implicitly restores the DAIF
      flags when it returns to the kernel at EL1.
      
      With VHE none of this exception-level jumping happens, so KVMs
      world-switch code is exposed to the host kernel's DAIF values, and KVM
      spills the guest-exit DAIF values back into the host kernel.
      On entry to a guest we have Debug and SError exceptions unmasked, KVM
      has switched VBAR but isn't prepared to handle these. On guest exit
      Debug exceptions are left disabled once we return to the host and will
      stay this way until we enter user space.
      
      Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
      happen after the hosts VBAR value has been synchronised by the isb in
      __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
      setting KVMs VBAR value, but is kept here for symmetry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4f5abad9
  6. 13 1月, 2018 1 次提交
    • J
      KVM: arm64: Change hyp_panic()s dependency on tpidr_el2 · c97e166e
      James Morse 提交于
      Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
      host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
      on VHE they can be the same register.
      
      KVM calls hyp_panic() when anything unexpected happens. This may occur
      while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
      tpidr_el2, which it uses to find the host context in order to restore
      the host EL1 registers before parachuting into the host's panic().
      
      The host context is a struct kvm_cpu_context allocated in the per-cpu
      area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
      easy to find. Change hyp_panic() to take a pointer to the
      struct kvm_cpu_context. Wrap these calls with an asm function that
      retrieves the struct kvm_cpu_context from the host's per-cpu area.
      
      Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
      kvm init. (Later patches will make this unnecessary for VHE hosts)
      
      We print out the vcpu pointer as part of the panic message. Add a back
      reference to the 'running vcpu' in the host cpu context to preserve this.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c97e166e
  7. 02 1月, 2018 1 次提交
    • C
      KVM: arm/arm64: Avoid work when userspace iqchips are not used · 61bbe380
      Christoffer Dall 提交于
      We currently check if the VM has a userspace irqchip in several places
      along the critical path, and if so, we do some work which is only
      required for having an irqchip in userspace.  This is unfortunate, as we
      could avoid doing any work entirely, if we didn't have to support
      irqchip in userspace.
      
      Realizing the userspace irqchip on ARM is mostly a developer or hobby
      feature, and is unlikely to be used in servers or other scenarios where
      performance is a priority, we can use a refcounted static key to only
      check the irqchip configuration when we have at least one VM that uses
      an irqchip in userspace.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      61bbe380
  8. 29 11月, 2017 1 次提交
  9. 03 11月, 2017 1 次提交
    • D
      arm64/sve: KVM: Prevent guests from using SVE · 17eed27b
      Dave Martin 提交于
      Until KVM has full SVE support, guests must not be allowed to
      execute SVE instructions.
      
      This patch enables the necessary traps, and also ensures that the
      traps are disabled again on exit from the guest so that the host
      can still use SVE if it wants to.
      
      On guest exit, high bits of the SVE Zn registers may have been
      clobbered as a side-effect the execution of FPSIMD instructions in
      the guest.  The existing KVM host FPSIMD restore code is not
      sufficient to restore these bits, so this patch explicitly marks
      the CPU as not containing cached vector state for any task, thus
      forcing a reload on the next return to userspace.  This is an
      interim measure, in advance of adding full SVE awareness to KVM.
      
      This marking of cached vector state in the CPU as invalid is done
      using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c.  Due
      to the repeated use of this rather obscure operation, it makes
      sense to factor it out as a separate helper with a clearer name.
      This patch factors it out as fpsimd_flush_cpu_state(), and ports
      all callers to use it.
      
      As a side effect of this refactoring, a this_cpu_write() in
      fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
      should be fine, since cpu_pm_enter() is supposed to be called only
      with interrupts disabled.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      17eed27b
  10. 01 9月, 2017 1 次提交
  11. 04 6月, 2017 3 次提交
  12. 23 5月, 2017 1 次提交
  13. 18 5月, 2017 1 次提交
    • M
      arm64/cpufeature: don't use mutex in bringup path · 63a1e1c9
      Mark Rutland 提交于
      Currently, cpus_set_cap() calls static_branch_enable_cpuslocked(), which
      must take the jump_label mutex.
      
      We call cpus_set_cap() in the secondary bringup path, from the idle
      thread where interrupts are disabled. Taking a mutex in this path "is a
      NONO" regardless of whether it's contended, and something we must avoid.
      We didn't spot this until recently, as ___might_sleep() won't warn for
      this case until all CPUs have been brought up.
      
      This patch avoids taking the mutex in the secondary bringup path. The
      poking of static keys is deferred until enable_cpu_capabilities(), which
      runs in a suitable context on the boot CPU. To account for the static
      keys being set later, cpus_have_const_cap() is updated to use another
      static key to check whether the const cap keys have been initialised,
      falling back to the caps bitmap until this is the case.
      
      This means that users of cpus_have_const_cap() gain should only gain a
      single additional NOP in the fast path once the const caps are
      initialised, but should always see the current cap value.
      
      The hyp code should never dereference the caps array, since the caps are
      initialized before we run the module initcall to initialise hyp. A check
      is added to the hyp init code to document this requirement.
      
      This change will sidestep a number of issues when the upcoming hotplug
      locking rework is merged.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NMarc Zyniger <marc.zyngier@arm.com>
      Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      63a1e1c9
  14. 27 4月, 2017 2 次提交
    • P
      KVM: mark requests that need synchronization · 7a97cec2
      Paolo Bonzini 提交于
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a97cec2
    • R
      KVM: mark requests that do not need a wakeup · 930f7fd6
      Radim Krčmář 提交于
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      First 8 bits were arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      930f7fd6
  15. 09 4月, 2017 2 次提交
  16. 07 4月, 2017 1 次提交
  17. 09 3月, 2017 2 次提交
  18. 08 2月, 2017 1 次提交
  19. 03 2月, 2017 1 次提交
    • W
      arm64: KVM: Save/restore the host SPE state when entering/leaving a VM · f85279b4
      Will Deacon 提交于
      The SPE buffer is virtually addressed, using the page tables of the CPU
      MMU. Unusually, this means that the EL0/1 page table may be live whilst
      we're executing at EL2 on non-VHE configurations. When VHE is in use,
      we can use the same property to profile the guest behind its back.
      
      This patch adds the relevant disabling and flushing code to KVM so that
      the host can make use of SPE without corrupting guest memory, and any
      attempts by a guest to use SPE will result in a trap.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Alex Bennée <alex.bennee@linaro.org>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      f85279b4
  20. 05 11月, 2016 1 次提交
    • M
      arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU · 94d0e598
      Marc Zyngier 提交于
      Architecturally, TLBs are private to the (physical) CPU they're
      associated with. But when multiple vcpus from the same VM are
      being multiplexed on the same CPU, the TLBs are not private
      to the vcpus (and are actually shared across the VMID).
      
      Let's consider the following scenario:
      
      - vcpu-0 maps PA to VA
      - vcpu-1 maps PA' to VA
      
      If run on the same physical CPU, vcpu-1 can hit TLB entries generated
      by vcpu-0 accesses, and access the wrong physical page.
      
      The solution to this is to keep a per-VM map of which vcpu ran last
      on each given physical CPU, and invalidate local TLBs when switching
      to a different vcpu from the same VM.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      94d0e598
  21. 08 9月, 2016 1 次提交
    • S
      KVM: Add provisioning for ulong vm stats and u64 vcpu stats · 8a7e75d4
      Suraj Jitindar Singh 提交于
      vms and vcpus have statistics associated with them which can be viewed
      within the debugfs. Currently it is assumed within the vcpu_stat_get() and
      vm_stat_get() functions that all of these statistics are represented as
      u32s, however the next patch adds some u64 vcpu statistics.
      
      Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
      Since vcpu statistics are per vcpu, they will only be updated by a single
      vcpu at a time so this shouldn't present a problem on 32-bit machines
      which can't atomically increment 64-bit numbers. However vm statistics
      could potentially be updated by multiple vcpus from that vm at a time.
      To avoid the overhead of atomics make all vm statistics ulong such that
      they are 64-bit on 64-bit systems where they can be atomically incremented
      and are 32-bit on 32-bit systems which may not be able to atomically
      increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8a7e75d4
  22. 19 7月, 2016 1 次提交
  23. 04 7月, 2016 2 次提交