1. 16 1月, 2018 1 次提交
    • J
      KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9
      James Morse 提交于
      Non-VHE systems take an exception to EL2 in order to world-switch into the
      guest. When returning from the guest KVM implicitly restores the DAIF
      flags when it returns to the kernel at EL1.
      
      With VHE none of this exception-level jumping happens, so KVMs
      world-switch code is exposed to the host kernel's DAIF values, and KVM
      spills the guest-exit DAIF values back into the host kernel.
      On entry to a guest we have Debug and SError exceptions unmasked, KVM
      has switched VBAR but isn't prepared to handle these. On guest exit
      Debug exceptions are left disabled once we return to the host and will
      stay this way until we enter user space.
      
      Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
      happen after the hosts VBAR value has been synchronised by the isb in
      __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
      setting KVMs VBAR value, but is kept here for symmetry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4f5abad9
  2. 13 1月, 2018 1 次提交
    • J
      KVM: arm64: Change hyp_panic()s dependency on tpidr_el2 · c97e166e
      James Morse 提交于
      Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
      host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
      on VHE they can be the same register.
      
      KVM calls hyp_panic() when anything unexpected happens. This may occur
      while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
      tpidr_el2, which it uses to find the host context in order to restore
      the host EL1 registers before parachuting into the host's panic().
      
      The host context is a struct kvm_cpu_context allocated in the per-cpu
      area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
      easy to find. Change hyp_panic() to take a pointer to the
      struct kvm_cpu_context. Wrap these calls with an asm function that
      retrieves the struct kvm_cpu_context from the host's per-cpu area.
      
      Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
      kvm init. (Later patches will make this unnecessary for VHE hosts)
      
      We print out the vcpu pointer as part of the panic message. Add a back
      reference to the 'running vcpu' in the host cpu context to preserve this.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <cdall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c97e166e
  3. 29 11月, 2017 1 次提交
  4. 03 11月, 2017 1 次提交
    • D
      arm64/sve: KVM: Prevent guests from using SVE · 17eed27b
      Dave Martin 提交于
      Until KVM has full SVE support, guests must not be allowed to
      execute SVE instructions.
      
      This patch enables the necessary traps, and also ensures that the
      traps are disabled again on exit from the guest so that the host
      can still use SVE if it wants to.
      
      On guest exit, high bits of the SVE Zn registers may have been
      clobbered as a side-effect the execution of FPSIMD instructions in
      the guest.  The existing KVM host FPSIMD restore code is not
      sufficient to restore these bits, so this patch explicitly marks
      the CPU as not containing cached vector state for any task, thus
      forcing a reload on the next return to userspace.  This is an
      interim measure, in advance of adding full SVE awareness to KVM.
      
      This marking of cached vector state in the CPU as invalid is done
      using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c.  Due
      to the repeated use of this rather obscure operation, it makes
      sense to factor it out as a separate helper with a clearer name.
      This patch factors it out as fpsimd_flush_cpu_state(), and ports
      all callers to use it.
      
      As a side effect of this refactoring, a this_cpu_write() in
      fpsimd_cpu_pm_notifier() is changed to __this_cpu_write().  This
      should be fine, since cpu_pm_enter() is supposed to be called only
      with interrupts disabled.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      17eed27b
  5. 01 9月, 2017 1 次提交
  6. 04 6月, 2017 3 次提交
  7. 23 5月, 2017 1 次提交
  8. 18 5月, 2017 1 次提交
    • M
      arm64/cpufeature: don't use mutex in bringup path · 63a1e1c9
      Mark Rutland 提交于
      Currently, cpus_set_cap() calls static_branch_enable_cpuslocked(), which
      must take the jump_label mutex.
      
      We call cpus_set_cap() in the secondary bringup path, from the idle
      thread where interrupts are disabled. Taking a mutex in this path "is a
      NONO" regardless of whether it's contended, and something we must avoid.
      We didn't spot this until recently, as ___might_sleep() won't warn for
      this case until all CPUs have been brought up.
      
      This patch avoids taking the mutex in the secondary bringup path. The
      poking of static keys is deferred until enable_cpu_capabilities(), which
      runs in a suitable context on the boot CPU. To account for the static
      keys being set later, cpus_have_const_cap() is updated to use another
      static key to check whether the const cap keys have been initialised,
      falling back to the caps bitmap until this is the case.
      
      This means that users of cpus_have_const_cap() gain should only gain a
      single additional NOP in the fast path once the const caps are
      initialised, but should always see the current cap value.
      
      The hyp code should never dereference the caps array, since the caps are
      initialized before we run the module initcall to initialise hyp. A check
      is added to the hyp init code to document this requirement.
      
      This change will sidestep a number of issues when the upcoming hotplug
      locking rework is merged.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NMarc Zyniger <marc.zyngier@arm.com>
      Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      63a1e1c9
  9. 27 4月, 2017 2 次提交
    • P
      KVM: mark requests that need synchronization · 7a97cec2
      Paolo Bonzini 提交于
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7a97cec2
    • R
      KVM: mark requests that do not need a wakeup · 930f7fd6
      Radim Krčmář 提交于
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      First 8 bits were arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      930f7fd6
  10. 09 4月, 2017 2 次提交
  11. 07 4月, 2017 1 次提交
  12. 09 3月, 2017 2 次提交
  13. 08 2月, 2017 1 次提交
  14. 03 2月, 2017 1 次提交
    • W
      arm64: KVM: Save/restore the host SPE state when entering/leaving a VM · f85279b4
      Will Deacon 提交于
      The SPE buffer is virtually addressed, using the page tables of the CPU
      MMU. Unusually, this means that the EL0/1 page table may be live whilst
      we're executing at EL2 on non-VHE configurations. When VHE is in use,
      we can use the same property to profile the guest behind its back.
      
      This patch adds the relevant disabling and flushing code to KVM so that
      the host can make use of SPE without corrupting guest memory, and any
      attempts by a guest to use SPE will result in a trap.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Alex Bennée <alex.bennee@linaro.org>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      f85279b4
  15. 05 11月, 2016 1 次提交
    • M
      arm/arm64: KVM: Perform local TLB invalidation when multiplexing vcpus on a single CPU · 94d0e598
      Marc Zyngier 提交于
      Architecturally, TLBs are private to the (physical) CPU they're
      associated with. But when multiple vcpus from the same VM are
      being multiplexed on the same CPU, the TLBs are not private
      to the vcpus (and are actually shared across the VMID).
      
      Let's consider the following scenario:
      
      - vcpu-0 maps PA to VA
      - vcpu-1 maps PA' to VA
      
      If run on the same physical CPU, vcpu-1 can hit TLB entries generated
      by vcpu-0 accesses, and access the wrong physical page.
      
      The solution to this is to keep a per-VM map of which vcpu ran last
      on each given physical CPU, and invalidate local TLBs when switching
      to a different vcpu from the same VM.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      94d0e598
  16. 08 9月, 2016 1 次提交
    • S
      KVM: Add provisioning for ulong vm stats and u64 vcpu stats · 8a7e75d4
      Suraj Jitindar Singh 提交于
      vms and vcpus have statistics associated with them which can be viewed
      within the debugfs. Currently it is assumed within the vcpu_stat_get() and
      vm_stat_get() functions that all of these statistics are represented as
      u32s, however the next patch adds some u64 vcpu statistics.
      
      Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
      Since vcpu statistics are per vcpu, they will only be updated by a single
      vcpu at a time so this shouldn't present a problem on 32-bit machines
      which can't atomically increment 64-bit numbers. However vm statistics
      could potentially be updated by multiple vcpus from that vm at a time.
      To avoid the overhead of atomics make all vm statistics ulong such that
      they are 64-bit on 64-bit systems where they can be atomically incremented
      and are 32-bit on 32-bit systems which may not be able to atomically
      increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
      Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NDavid Matlack <dmatlack@google.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      8a7e75d4
  17. 19 7月, 2016 1 次提交
  18. 04 7月, 2016 3 次提交
  19. 20 5月, 2016 2 次提交
    • C
      KVM: arm/arm64: vgic-new: Synchronize changes to active state · 35a2d585
      Christoffer Dall 提交于
      When modifying the active state of an interrupt via the MMIO interface,
      we should ensure that the write has the intended effect.
      
      If a guest sets an interrupt to active, but that interrupt is already
      flushed into a list register on a running VCPU, then that VCPU will
      write the active state back into the struct vgic_irq upon returning from
      the guest and syncing its state.  This is a non-benign race, because the
      guest can observe that an interrupt is not active, and it can have a
      reasonable expectations that other VCPUs will not ack any IRQs, and then
      set the state to active, and expect it to stay that way.  Currently we
      are not honoring this case.
      
      Thefore, change both the SACTIVE and CACTIVE mmio handlers to stop the
      world, change the irq state, potentially queue the irq if we're setting
      it to active, and then continue.
      
      We take this chance to slightly optimize these functions by not stopping
      the world when touching private interrupts where there is inherently no
      possible race.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      35a2d585
    • C
      KVM: arm/arm64: Provide functionality to pause and resume a guest · b13216cf
      Christoffer Dall 提交于
      For some rare corner cases in our VGIC emulation later we have to stop
      the guest to make sure the VGIC state is consistent.
      Provide the necessary framework to pause and resume a guest.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      b13216cf
  20. 13 5月, 2016 1 次提交
    • C
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger 提交于
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3491caf2
  21. 03 5月, 2016 1 次提交
    • J
      arm64: kvm: Fix kvm teardown for systems using the extended idmap · c612505f
      James Morse 提交于
      If memory is located above 1<<VA_BITS, kvm adds an extra level to its page
      tables, merging the runtime tables and boot tables that contain the idmap.
      This lets us avoid the trampoline dance during initialisation.
      
      This also means there is no trampoline page mapped, so
      __cpu_reset_hyp_mode() can't call __kvm_hyp_reset() in this page. The good
      news is the idmap is still mapped, so we don't need the trampoline page.
      The bad news is we can't call it directly as the idmap is above
      HYP_PAGE_OFFSET, so its address is masked by kvm_call_hyp.
      
      Add a function __extended_idmap_trampoline which will branch into
      __kvm_hyp_reset in the idmap, change kvm_hyp_reset_entry() to return
      this address if __kvm_cpu_uses_extended_idmap(). In this case
      __kvm_hyp_reset() will still switch to the boot tables (which are the
      merged tables that were already in use), and branch into the idmap (where
      it already was).
      
      This fixes boot failures on these systems, where we fail to execute the
      missing trampoline page when tearing down kvm in init_subsystems():
      [    2.508922] kvm [1]: 8-bit VMID
      [    2.512057] kvm [1]: Hyp mode initialized successfully
      [    2.517242] kvm [1]: interrupt-controller@e1140000 IRQ13
      [    2.522622] kvm [1]: timer IRQ3
      [    2.525783] Kernel panic - not syncing: HYP panic:
      [    2.525783] PS:200003c9 PC:0000007ffffff820 ESR:86000005
      [    2.525783] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
      [    2.525783] VCPU:          (null)
      [    2.525783]
      [    2.547667] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.6.0-rc5+ #1
      [    2.555137] Hardware name: Default string Default string/Default string, BIOS ROD0084E 09/03/2015
      [    2.563994] Call trace:
      [    2.566432] [<ffffff80080888d0>] dump_backtrace+0x0/0x240
      [    2.571818] [<ffffff8008088b24>] show_stack+0x14/0x20
      [    2.576858] [<ffffff80083423ac>] dump_stack+0x94/0xb8
      [    2.581899] [<ffffff8008152130>] panic+0x10c/0x250
      [    2.586677] [<ffffff8008152024>] panic+0x0/0x250
      [    2.591281] SMP: stopping secondary CPUs
      [    3.649692] SMP: failed to stop secondary CPUs 0-2,4-7
      [    3.654818] Kernel Offset: disabled
      [    3.658293] Memory Limit: none
      [    3.661337] ---[ end Kernel panic - not syncing: HYP panic:
      [    3.661337] PS:200003c9 PC:0000007ffffff820 ESR:86000005
      [    3.661337] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
      [    3.661337] VCPU:          (null)
      [    3.661337]
      Reported-by: NWill Deacon <will.deacon@arm.com>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      c612505f
  22. 28 4月, 2016 1 次提交
    • A
      arm64: kvm: allows kvm cpu hotplug · 67f69197
      AKASHI Takahiro 提交于
      The current kvm implementation on arm64 does cpu-specific initialization
      at system boot, and has no way to gracefully shutdown a core in terms of
      kvm. This prevents kexec from rebooting the system at EL2.
      
      This patch adds a cpu tear-down function and also puts an existing cpu-init
      code into a separate function, kvm_arch_hardware_disable() and
      kvm_arch_hardware_enable() respectively.
      We don't need the arm64 specific cpu hotplug hook any more.
      
      Since this patch modifies common code between arm and arm64, one stub
      definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
      compilation errors.
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      [Rebase, added separate VHE init/exit path, changed resets use of
       kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
       added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
       guest-enter after teardown handling]
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      67f69197
  23. 06 4月, 2016 1 次提交
    • M
      arm64: KVM: Warn when PARange is less than 40 bits · 6141570c
      Marc Zyngier 提交于
      We always thought that 40bits of PA range would be the minimum people
      would actually build. Anything less is terrifyingly small.
      
      Turns out that we were both right and wrong. Nobody has ever built
      such a system, but the ARM Foundation Model has a PARange set to 36bits.
      Just because we can. Oh well. Now, the KVM API explicitely says that
      we offer a 40bit PA space to the VM, so we shouldn't run KVM on
      the Foundation Model at all.
      
      That being said, this patch offers a less agressive alternative, and
      loudly warns about the configuration being unsupported. You'll still
      be able to run VMs (at your own risks, though).
      
      This is just a workaround until we have a proper userspace API where
      we report the PARange to userspace.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      6141570c
  24. 29 3月, 2016 1 次提交
  25. 05 3月, 2016 1 次提交
  26. 01 3月, 2016 7 次提交