1. 06 Dec, 2019 1 commit
  2. 08 Nov, 2019 1 commit
  3. 29 Oct, 2019 1 commit
    • M
      KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put · 8e01d9a3
      Committed by Marc Zyngier
      When the VHE code was reworked, a lot of the vgic stuff was moved around,
      but the GICv4 residency code stayed untouched, meaning that we come
      in and out of residency on each flush/sync, which is obviously suboptimal.
      
      To address this, let's move things around a bit:
      
      - Residency entry (flush) moves to vcpu_load
      - Residency exit (sync) moves to vcpu_put
      - On blocking (entry to WFI), we "put"
      - On unblocking (exit from WFI), we "load"
      
      Because these can nest (load/block/put/load/unblock/put, for example),
      we now have per-VPE tracking of the residency state.
      
      Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
      gets set to true when blocking because of a WFI. This allows a finer
      control of the doorbell, which now also gets disabled as soon as
      it gets signaled.
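
The nesting described above can be illustrated with a toy model. This is only a sketch of the idea, not the kernel code: the `Vpe` class, its fields, and the `transitions` counter are hypothetical names introduced for illustration. It assumes that a load on an already-resident VPE and a put on a non-resident VPE are both no-ops, and that the doorbell is only armed when the put is caused by blocking.

```python
class Vpe:
    """Toy model of per-VPE residency tracking (all names are hypothetical)."""

    def __init__(self):
        self.resident = False
        self.doorbell_enabled = False
        self.transitions = 0  # counts real residency changes only

    def load(self):
        # Nested load (e.g. unblock while already loaded): nothing to do.
        if self.resident:
            return
        self.doorbell_enabled = False
        self.resident = True
        self.transitions += 1

    def put(self, need_doorbell):
        # Nested put (e.g. vcpu_put while already blocked): nothing to do,
        # and the doorbell state chosen by the earlier put is preserved.
        if not self.resident:
            return
        self.resident = False
        self.doorbell_enabled = need_doorbell
        self.transitions += 1


# The load/block/put/load/unblock/put sequence from the text:
vpe = Vpe()
vpe.load()                     # vcpu_load
vpe.put(need_doorbell=True)    # block on WFI
vpe.put(need_doorbell=False)   # vcpu_put while blocked: no-op
vpe.load()                     # vcpu_load
vpe.load()                     # unblock: no-op
vpe.put(need_doorbell=False)   # vcpu_put
```

Only four of the six calls change the residency state, which is the point of tracking it per VPE.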
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
  4. 22 Oct, 2019 3 commits
    • S
      KVM: arm64: Support stolen time reporting via shared structure · 8564d637
      Committed by Steven Price
      Implement the service call for configuring a shared structure between a
      VCPU and the hypervisor in which the hypervisor can write the time
      stolen from the VCPU's execution time by other tasks on the host.
      
      User space allocates memory which is placed at an IPA also chosen by user
      space. The hypervisor then updates the shared structure using
      kvm_put_guest() to ensure single copy atomicity of the 64-bit value
      reporting the stolen time in nanoseconds.
      
      Whenever stolen time is enabled by the guest, the stolen time counter is
      reset.
      
      The stolen time itself is retrieved from the sched_info structure
      maintained by the Linux scheduler code. We enable SCHEDSTATS when
      selecting KVM Kconfig to ensure this value is meaningful.
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • C
      KVM: arm/arm64: Allow user injection of external data aborts · da345174
      Committed by Christoffer Dall
      In some scenarios, such as a buggy guest or incorrect configuration of the
      VMM and firmware description data, userspace will detect a memory access
      to a portion of the IPA space which is not mapped to any MMIO region.

      In this case, the appropriate action is to inject an external abort
      to the guest.  The kernel already has functionality to inject an
      external abort, but we need to wire up a signal from user space that
      lets user space tell the kernel to do this.
      
      It turns out, we already have the set event functionality which we can
      perfectly reuse for this.
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • C
      KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Committed by Christoffer Dall
      For a long time, if a guest accessed memory outside of a memslot using
      any of the load/store instructions in the architecture which don't
      supply decoding information in the ESR_EL2 (the ISV bit is not set), the
      kernel would print the following message and terminate the VM as a
      result of returning -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes that all accesses
      outside a memslot are MMIO accesses which should be handled by
      userspace, and we originally expected to eventually implement some sort
      of decoding of load/store instructions where the ISV bit was not set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured his/her guest
      and is actually not accessing an MMIO region, but just hitting some
      random hole in the IPA space.  In this scenario, the error message above
      is almost misleading and has led to a great deal of confusion over the
      years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  5. 09 Sep, 2019 1 commit
    • M
      KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Committed by Marc Zyngier
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bit wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditioned by KVM_CAP_IRQCHIP.
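
The new field layout can be sketched as a pair of pack/unpack helpers. This is an illustration rather than kernel or VMM code: the function names are hypothetical, and the exact bit positions (vcpu2_index in bits 31..28, irq_type in 27..24, vcpu_index in 23..16, the interrupt number in 15..0) are an assumption based on the KVM API documentation rather than something stated in the message above.

```python
# Assumed layout of the 32-bit irq field (hypothetical helper names):
#   bits 31..28  vcpu2_index
#   bits 27..24  irq_type
#   bits 23..16  vcpu_index
#   bits 15..0   irq_id

def pack_irq(irq_type, vcpu_id, irq_id):
    """Build the irq field from a type, a full vcpu id and an interrupt id."""
    if not 0 <= irq_type < 16:
        raise ValueError("irq_type must fit in 4 bits")
    if not 0 <= vcpu_id < 4096:
        raise ValueError("vcpu_id must be below 4096")
    if not 0 <= irq_id < 65536:
        raise ValueError("irq_id must fit in 16 bits")
    # vcpu_id = 256 * vcpu2_index + vcpu_index
    vcpu2_index, vcpu_index = divmod(vcpu_id, 256)
    return (vcpu2_index << 28) | (irq_type << 24) | (vcpu_index << 16) | irq_id


def unpack_irq(field):
    """Split the irq field back into (irq_type, vcpu_id, irq_id)."""
    vcpu2_index = (field >> 28) & 0xF
    irq_type = (field >> 24) & 0xF
    vcpu_index = (field >> 16) & 0xFF
    irq_id = field & 0xFFFF
    return irq_type, 256 * vcpu2_index + vcpu_index, irq_id
```

For vcpu ids below 256 the packed value is identical to the old layout, since vcpu2_index is zero, which is why existing userspace keeps working.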
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  6. 05 Aug, 2019 2 commits
    • M
      KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e
      Committed by Marc Zyngier
      Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point where we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      changes were made to PMR are not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #0xff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, no sampling of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortunate situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
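
The stale-PMR problem can be reproduced in a small model. This is a sketch under simplifying assumptions, not the vgic code: the class and function names are hypothetical, the "hardware" and "memory" copies of PMR are plain integers, and an interrupt is considered deliverable only when its priority value is numerically lower than PMR.

```python
class ToyVcpu:
    """Toy model of a loaded vcpu whose PMR lives in hardware (hypothetical names)."""

    def __init__(self):
        self.hw_pmr = 0xFF   # live value while the vcpu is loaded
        self.mem_pmr = 0xFF  # the hypervisor's in-memory copy

    def guest_write_pmr(self, value):
        # The guest writes the register directly; memory is not updated.
        self.hw_pmr = value

    def sync_vmcr(self):
        # What happens on vcpu_put (and, with the fix, when about to block).
        self.mem_pmr = self.hw_pmr

    def pending_irq(self, irq_priority):
        # The check uses the in-memory copy, as in the scenario above.
        return irq_priority < self.mem_pmr


def would_wake(resync_on_block):
    vcpu = ToyVcpu()
    vcpu.guest_write_pmr(0x00)   # guest masks interrupts
    vcpu.sync_vmcr()             # preempted, then rescheduled: VMCR sampled
    vcpu.guest_write_pmr(0xFF)   # guest allows all interrupts
    if resync_on_block:          # the fix: resync when about to block
        vcpu.sync_vmcr()
    return vcpu.pending_irq(0xA0)  # interrupt arrives just after WFI
```

Without the resync the model reports no pending interrupt and the vcpu would block; with it, the interrupt is seen.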
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • P
      KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae
      Committed by Paolo Bonzini
      There is no need for this function as all arches have to implement
      kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
      lets us actually simplify the code.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 24 Jul, 2019 1 commit
  8. 23 Jul, 2019 1 commit
  9. 08 Jul, 2019 1 commit
    • M
      KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register · 1e0cf16c
      Committed by Marc Zyngier
      As part of setting up the host context, we populate its
      MPIDR by using cpu_logical_map(). It turns out that contrary
      to arm64, cpu_logical_map() on 32bit ARM doesn't return the
      *full* MPIDR, but a truncated version.
      
      This leaves the host MPIDR slightly corrupted after the first
      run of a VM, since we won't correctly restore the MPIDR on
      exit. Oops.
      
      Since we cannot trust cpu_logical_map(), let's adopt a different
      strategy. We move the initialization of the host CPU context as
      part of the per-CPU initialization (which, in retrospect, makes
      a lot of sense), and directly read the MPIDR from the HW. This
      is guaranteed to work on both arm and arm64.
      Reported-by: Andre Przywara <Andre.Przywara@arm.com>
      Tested-by: Andre Przywara <Andre.Przywara@arm.com>
      Fixes: 32f13955 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  10. 05 Jun, 2019 2 commits
  11. 28 May, 2019 1 commit
    • T
      KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · a86cb413
      Committed by Thomas Huth
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the number of usable CPUs is determined
      at runtime: it depends on the features of the machine the code
      is running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the number of CPUs that is limited by the
      hardware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined at runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  12. 25 Apr, 2019 1 commit
  13. 24 Apr, 2019 3 commits
    • A
      arm64: KVM: Enable VHE support for :G/:H perf event modifiers · 435e53fb
      Committed by Andrew Murray
      With VHE different exception levels are used between the host (EL2) and
      guest (EL1) with a shared exception level for userspace (EL0). We can take
      advantage of this and use the PMU's exception level filtering to avoid
      enabling/disabling counters in the world-switch code. Instead we just
      modify the counter type to include or exclude EL0 at vcpu_{load,put} time.
      
      We also ensure that trapped PMU system register writes do not re-enable
      EL0 when reconfiguring the backing perf events.
      
      This approach completely avoids blackout windows seen with !VHE.
      Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • A
      arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data · 630a1685
      Committed by Andrew Murray
      The virt/arm core allocates a kvm_cpu_context_t percpu. At present this is
      a typedef to kvm_cpu_context and is used to store host cpu context. The
      kvm_cpu_context structure is also used elsewhere to hold vcpu context.
      In order to use the percpu to hold additional future host information we
      encapsulate kvm_cpu_context in a new structure and rename the typedef and
      percpu to match.
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • M
      KVM: arm/arm64: Context-switch ptrauth registers · 384b40ca
      Committed by Mark Rutland
      When pointer authentication is supported, a guest may wish to use it.
      This patch adds the necessary KVM infrastructure for this to work, with
      a semi-lazy context switch of the pointer auth state.
      
      The pointer authentication feature is only enabled when VHE is built
      in the kernel and present in the CPU implementation so only VHE code
      paths are modified.
      
      When we schedule a vcpu, we disable guest usage of pointer
      authentication instructions and accesses to the keys. While these are
      disabled, we avoid context-switching the keys. When we trap the guest
      trying to use pointer authentication functionality, we change to eagerly
      context-switching the keys, and enable the feature. The next time the
      vcpu is scheduled out/in, we start again. However the host key save is
      optimized and implemented inside ptrauth instruction/register access
      trap.
      
      Pointer authentication consists of address authentication and generic
      authentication, and CPUs in a system might have varied support for
      either. Where support for either feature is not uniform, it is hidden
      from guests via ID register emulation, as a result of the cpufeature
      framework in the host.
      
      Unfortunately, address authentication and generic authentication cannot
      be trapped separately, as the architecture provides a single EL2 trap
      covering both. If we wish to expose one without the other, we cannot
      prevent a (badly-written) guest from intermittently using a feature
      which is not uniformly supported (when scheduled on a physical CPU which
      supports the relevant feature). Hence, this patch expects both types of
      authentication to be present in a CPU.
      
      This switch of key is done from guest enter/exit assembly as preparation
      for the upcoming in-kernel pointer authentication support. Hence, these
      key switching routines are not implemented in C code as they may cause
      pointer authentication key signing error in some situations.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks,
      save host key in ptrauth exception trap]
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      [maz: various fixups]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  14. 19 Apr, 2019 1 commit
  15. 16 Apr, 2019 1 commit
  16. 29 Mar, 2019 2 commits
    • D
      KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl · 7dd32a0d
      Committed by Dave Martin
      Some aspects of vcpu configuration may be too complex to be
      completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
      requirement for userspace to do some additional configuration
      before various other ioctls will work in a consistent way.
      
      In particular this will be the case for SVE, where userspace will
      need to negotiate the set of vector lengths to be made available to
      the guest before the vcpu becomes fully usable.
      
      In order to provide an explicit way for userspace to confirm that
      it has finished setting up a particular vcpu feature, this patch
      adds a new ioctl KVM_ARM_VCPU_FINALIZE.
      
      When userspace has opted into a feature that requires finalization,
      typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
      matching call to KVM_ARM_VCPU_FINALIZE is now required before
      KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
      impose additional restrictions where appropriate.
      
      No existing vcpu features are affected by this, so current
      userspace implementations will continue to work exactly as before,
      with no need to issue KVM_ARM_VCPU_FINALIZE.
      
      As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
      placeholder: no finalizable features exist yet, so the ioctl is not
      required and will always yield EINVAL.  Subsequent patches will add
      the finalization logic to make use of this ioctl for SVE.
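
The ordering rule can be pictured as a small state machine. This is a toy sketch, not the ioctl implementation: the class, method names and the "sve" feature string are hypothetical, and the use of EPERM for a run attempted before finalization is an assumption made for illustration. Only the EINVAL result for a feature that is not finalizable comes from the text above.

```python
import errno


class ToyVcpu:
    """Toy model of the init/finalize/run ordering (hypothetical names)."""

    def __init__(self, features_needing_finalize=()):
        # Features opted into at init time that still await finalization.
        self.pending = set(features_needing_finalize)

    def finalize(self, feature):
        # Nothing to finalize for this feature: reject the call.
        if feature not in self.pending:
            return -errno.EINVAL
        self.pending.discard(feature)
        return 0

    def run(self):
        # Assumption: running with unfinalized features is refused.
        if self.pending:
            return -errno.EPERM
        return 0


# Existing userspace: no finalizable feature, so nothing changes.
legacy = ToyVcpu()
legacy_run = legacy.run()               # allowed without any finalize call
legacy_fin = legacy.finalize("sve")     # rejected, as in the placeholder

# Future userspace opting into a finalizable feature.
vcpu = ToyVcpu(["sve"])
early_run = vcpu.run()                  # refused until finalized
fin = vcpu.finalize("sve")
late_run = vcpu.run()                   # now allowed
```

With no finalizable features, the model behaves like the placeholder described above: `run()` succeeds immediately and `finalize()` always fails.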
      
      No functional change for existing userspace.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • D
      KVM: arm/arm64: Add hook for arch-specific KVM initialisation · 0f062bfe
      Committed by Dave Martin
      This patch adds a kvm_arm_init_arch_resources() hook to perform
      subarch-specific initialisation when starting up KVM.
      
      This will be used in a subsequent patch for global SVE-related
      setup on arm64.
      
      No functional change.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  17. 20 Feb, 2019 5 commits
  18. 07 Feb, 2019 1 commit
  19. 18 Dec, 2018 2 commits
    • C
      KVM: arm/arm64: Fix VMID alloc race by reverting to lock-less · fb544d1c
      Committed by Christoffer Dall
      We recently addressed a VMID generation race by introducing a read/write
      lock around accesses and updates to the vmid generation values.
      
      However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
      does so without taking the read lock.
      
      As far as I can tell, this can lead to the same kind of race:
      
        VM 0, VCPU 0			VM 0, VCPU 1
        ------------			------------
        update_vttbr (vmid 254)
        				update_vttbr (vmid 1) // roll over
      				read_lock(kvm_vmid_lock);
      				force_vm_exit()
        local_irq_disable
        need_new_vmid_gen == false //because vmid gen matches
      
        enter_guest (vmid 254)
        				kvm_arch.vttbr = <PGD>:<VMID 1>
      				read_unlock(kvm_vmid_lock);
      
        				enter_guest (vmid 1)
      
      Which results in running two VCPUs in the same VM with different VMIDs
      and (even worse) other VCPUs from other VMs could now allocate clashing
      VMID 254 from the new generation as long as VCPU 0 is not exiting.
      
      Attempt to solve this by making sure vttbr is updated before another CPU
      can observe the updated VMID generation.
      
      Cc: stable@vger.kernel.org
      Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race"
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • M
      arm64: KVM: Consistently advance singlestep when emulating instructions · bd7d95ca
      Committed by Mark Rutland
      When we emulate a guest instruction, we don't advance the hardware
      singlestep state machine, and thus the guest will receive a software
      step exception after a next instruction which is not emulated by the
      host.
      
      We bodge around this in an ad-hoc fashion. Sometimes we explicitly check
      whether userspace requested a single step, and fake a debug exception
      from within the kernel. Other times, we advance the HW singlestep state
      and rely on the HW to generate the exception for us. Thus, the observed step
      behaviour differs for host and guest.
      
      Let's make this simpler and consistent by always advancing the HW
      singlestep state machine when we skip an instruction. Thus we can rely
      on the hardware to generate the singlestep exception for us, and never
      need to explicitly check for an active-pending step, nor do we need to
      fake a debug exception from the guest.
      
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  20. 14 Dec, 2018 2 commits
    • P
      kvm: introduce manual dirty log reprotect · 2a31b9db
      Committed by Paolo Bonzini
      There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
      it can take kvm->mmu_lock for an extended period of time.  Second, its user
      can actually see many false positives in some cases.  The latter is due
      to a benign race like this:
      
        1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
           them.
        2. The guest modifies the pages, causing them to be marked dirty.
        3. Userspace actually copies the pages.
        4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
           they were not written to since (3).
      
      This is especially a problem for large guests, where the time between
      (1) and (3) can be substantial.  This patch introduces a new
      capability which, when enabled, makes KVM_GET_DIRTY_LOG not
      write-protect the pages it returns.  Instead, userspace has to
      explicitly clear the dirty log bits just before using the content
      of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
      64-page granularity rather than requiring to sync a full memslot;
      this way, the mmu_lock is taken for small amounts of time, and
      only a small amount of time will pass between write protection
      of pages and the sending of their content.
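
The false positive and its fix can be reproduced with a toy model of a memslot. This is a simplified sketch, not KVM code: the `Memslot` class and its methods are hypothetical, dirty tracking is modelled purely through write-protection faults, and pages are tracked individually rather than in 64-page blocks.

```python
class Memslot:
    """Toy dirty-log model; all pages start write-protected (hypothetical names)."""

    def __init__(self, manual_protect):
        self.manual_protect = manual_protect
        self.dirty = set()      # pages with their dirty bit set
        self.writable = set()   # pages the guest can write without faulting

    def guest_write(self, page):
        # Only a write to a write-protected page faults and sets the dirty bit.
        if page not in self.writable:
            self.writable.add(page)
            self.dirty.add(page)

    def get_dirty_log(self):
        snapshot = set(self.dirty)
        if not self.manual_protect:
            # Old behaviour: clear the bits and write-protect immediately.
            self.dirty.clear()
            self.writable -= snapshot
        return snapshot

    def clear_dirty_log(self, pages):
        # New behaviour: userspace clears and re-protects just before copying.
        self.dirty -= pages
        self.writable -= pages


def second_log(manual_protect):
    slot = Memslot(manual_protect)
    slot.guest_write(0)
    to_send = slot.get_dirty_log()      # step 1
    slot.guest_write(0)                 # step 2: guest writes again
    if manual_protect:
        slot.clear_dirty_log(to_send)   # just before using the page content
    # step 3: userspace copies the pages in to_send here
    return slot.get_dirty_log()         # step 4
```

In this model the old behaviour reports page 0 a second time even though it was not written after the copy, while the manual mode returns an empty set.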
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • P
      kvm: rename last argument to kvm_get_dirty_log_protect · 8fe65a82
      Committed by Paolo Bonzini
      Once manual dirty log reprotect is enabled, kvm_get_dirty_log_protect's
      pointer argument will always be false on exit, because no TLB flush is needed
      until the manual re-protection operation.  Rename it from "is_dirty" to "flush",
      which more accurately tells the caller what they have to do with it.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  21. 10 Dec, 2018 1 commit
  22. 18 Oct, 2018 3 commits
  23. 03 Oct, 2018 3 commits