1. 09 June 2020, 1 commit
    • KVM: arm64: Save the host's PtrAuth keys in non-preemptible context · ef3e40a7
      Authored by Marc Zyngier
      When using the PtrAuth feature in a guest, we need to save the host's
      keys before allowing the guest to program them. For that, we dump
      them in a per-CPU data structure (the so-called host context).
      
      But both call sites that do this are in preemptible context,
      which may end up in disaster should the vcpu thread get preempted
      before reentering the guest.
      
      Instead, save the keys eagerly on each vcpu_load(). This has an
      increased overhead, but is at least safe.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
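      A minimal sketch of the resulting vcpu_load() path (paraphrased from the
      patch; the helper name here is illustrative, while __ptrauth_save_key and
      the host_cpu_context field follow the arm64 code of that era and may
      differ in later kernels):

          #define __ptrauth_save_key(regs, key)                                     \
          ({                                                                        \
                  regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
                  regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1); \
          })

          /* Called from kvm_arch_vcpu_load(); vcpu_load() has already
           * disabled preemption, so the per-CPU host context is stable. */
          static void kvm_vcpu_load_ptrauth_keys(struct kvm_vcpu *vcpu)
          {
                  struct kvm_cpu_context *ctxt = vcpu->arch.host_cpu_context;

                  if (!vcpu_has_ptrauth(vcpu))
                          return;

                  __ptrauth_save_key(ctxt->sys_regs, APIA);
                  __ptrauth_save_key(ctxt->sys_regs, APIB);
                  __ptrauth_save_key(ctxt->sys_regs, APDA);
                  __ptrauth_save_key(ctxt->sys_regs, APDB);
                  __ptrauth_save_key(ctxt->sys_regs, APGA);
          }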
  2. 31 May 2020, 1 commit
  3. 25 May 2020, 1 commit
  4. 16 May 2020, 4 commits
  5. 31 March 2020, 1 commit
  6. 24 March 2020, 1 commit
  7. 17 March 2020, 1 commit
    • KVM: Provide common implementation for generic dirty log functions · 0dff0846
      Authored by Sean Christopherson
      Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
      for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
      The arch-specific implementations are extremely similar, differing
      only in whether the dirty log needs to be sync'd from hardware (x86)
      and how the TLBs are flushed.  Add new arch hooks to handle sync
      and TLB flush; the sync will also be used for non-generic dirty log
      support in a future patch (s390).
      
      The ulterior motive for providing a common implementation is to
      eliminate the dependency between arch and common code with respect to
      the memslot referenced by the dirty log, i.e. to make it obvious in the
      code that the validity of the memslot is guaranteed, as a future patch
      will rework memslot handling such that id_to_memslot() can return NULL.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
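      The two new hooks look like this (prototypes taken from the patch; the
      comments paraphrase how the common code uses them):

          /* Sync hardware-buffered dirty state (e.g. x86 PML) into the
           * memslot's dirty bitmap before the common code copies it to
           * userspace. A no-op on architectures without such buffering. */
          void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);

          /* Flush TLBs after the common code has write-protected or cleared
           * dirty bits for pages in the memslot. */
          void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                  struct kvm_memory_slot *memslot);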
  8. 17 February 2020, 1 commit
  9. 28 January 2020, 5 commits
  10. 24 January 2020, 3 commits
  11. 20 January 2020, 2 commits
  12. 06 December 2019, 1 commit
  13. 08 November 2019, 1 commit
  14. 29 October 2019, 1 commit
    • KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put · 8e01d9a3
      Authored by Marc Zyngier
      When the VHE code was reworked, a lot of the vgic stuff was moved around,
      but the GICv4 residency code stayed untouched, meaning that we come
      in and out of residency on each flush/sync, which is obviously suboptimal.
      
      To address this, let's move things around a bit:
      
      - Residency entry (flush) moves to vcpu_load
      - Residency exit (sync) moves to vcpu_put
      - On blocking (entry to WFI), we "put"
      - On unblocking (exit from WFI), we "load"
      
      Because these can nest (load/block/put/load/unblock/put, for example),
      we now have per-VPE tracking of the residency state.
      
      Additionally, vgic_v4_put gains a "need doorbell" parameter, which only
      gets set to true when blocking because of a WFI. This allows a finer
      control of the doorbell, which now also gets disabled as soon as
      it gets signaled.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
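      The blocking/unblocking side ends up looking roughly like this
      (paraphrased from the patch):

          static void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
          {
                  /* Entering WFI: leave residency, and request a doorbell so
                   * a pending vLPI can kick the vcpu out of its wait. */
                  preempt_disable();
                  vgic_v4_put(vcpu, true);
                  preempt_enable();
          }

          static void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
          {
                  /* Leaving WFI: become resident again. */
                  preempt_disable();
                  vgic_v4_load(vcpu);
                  preempt_enable();
          }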
  15. 22 October 2019, 3 commits
    • KVM: arm64: Support stolen time reporting via shared structure · 8564d637
      Authored by Steven Price
      Implement the service call for configuring a shared structure between a
      VCPU and the hypervisor in which the hypervisor can write the time
      stolen from the VCPU's execution time by other tasks on the host.
      
      User space allocates memory which is placed at an IPA also chosen by user
      space. The hypervisor then updates the shared structure using
      kvm_put_guest() to ensure single copy atomicity of the 64-bit value
      reporting the stolen time in nanoseconds.
      
      Whenever stolen time is enabled by the guest, the stolen time counter is
      reset.
      
      The stolen time itself is retrieved from the sched_info structure
      maintained by the Linux scheduler code. We enable SCHEDSTATS when
      KVM is selected in Kconfig to ensure this value is meaningful.
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
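      A condensed sketch of the update path (paraphrased; the steal-time
      bookkeeping fields and the GPA_INVALID sentinel follow the patch and may
      differ in later kernels):

          static void kvm_update_stolen_time(struct kvm_vcpu *vcpu)
          {
                  struct kvm *kvm = vcpu->kvm;
                  u64 base = vcpu->arch.steal.base;
                  u64 steal;
                  int idx;

                  if (base == GPA_INVALID)    /* guest hasn't enabled stolen time */
                          return;

                  /* run_delay is maintained in the scheduler's sched_info,
                   * hence the SCHEDSTATS requirement. */
                  steal = vcpu->arch.steal.steal;
                  steal += current->sched_info.run_delay - vcpu->arch.steal.last_steal;
                  vcpu->arch.steal.last_steal = current->sched_info.run_delay;
                  vcpu->arch.steal.steal = steal;

                  idx = srcu_read_lock(&kvm->srcu);
                  /* kvm_put_guest() provides single-copy atomicity for the u64 */
                  kvm_put_guest(kvm,
                                base + offsetof(struct pvclock_vcpu_stolen_time,
                                                stolen_time),
                                cpu_to_le64(steal), u64);
                  srcu_read_unlock(&kvm->srcu, idx);
          }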
    • KVM: arm/arm64: Allow user injection of external data aborts · da345174
      Authored by Christoffer Dall
      In some scenarios, such as a buggy guest or an incorrect configuration of
      the VMM and firmware description data, userspace will detect a memory
      access to a portion of the IPA space that is not mapped to any MMIO region.
      
      For this purpose, the appropriate action is to inject an external abort
      to the guest.  The kernel already has functionality to inject an
      external abort, but we need to wire up a signal from user space that
      lets user space tell the kernel to do this.
      
      It turns out we already have the set-event functionality
      (KVM_SET_VCPU_EVENTS), which we can reuse for this.
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
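      The uapi side is a new flag in the existing events structure (arm64
      layout as extended by the patch): userspace sets ext_dabt_pending and
      calls KVM_SET_VCPU_EVENTS to make the kernel inject the abort.

          struct kvm_vcpu_events {
                  struct {
                          __u8 serror_pending;
                          __u8 serror_has_esr;
                          __u8 ext_dabt_pending;  /* new: request injection of
                                                   * an external data abort */
                          /* Align it to 8 bytes */
                          __u8 pad[5];
                          __u64 serror_esr;
                  } exception;
                  __u32 reserved[12];
          };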
    • KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Authored by Christoffer Dall
      For a long time, if a guest accessed memory outside of a memslot using
      any of the load/store instructions in the architecture that don't
      supply decoding information in ESR_EL2 (the ISV bit is not set), the
      kernel would print the following message and terminate the VM as a
      result of returning -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes that all accesses
      outside a memslot are MMIO accesses which should be handled by
      userspace, and we originally expected to eventually implement some sort
      of decoding of load/store instructions where the ISV bit was not set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured his/her guest
      and is actually not accessing an MMIO region, but just hitting some
      random hole in the IPA space.  In this scenario, the error message above
      is misleading and has led to a great deal of confusion over the
      years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
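      With the capability enabled, the kernel returns to userspace with the new
      exit reason and the information it has about the access (excerpt from the
      patch's kvm_run uapi changes):

          /* kvm_run->exit_reason == KVM_EXIT_ARM_NISV */
          struct {
                  __u64 esr_iss;          /* the valid ISS syndrome bits */
                  __u64 fault_ipa;        /* the faulting guest physical address */
          } arm_nisv;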
  16. 09 September 2019, 1 commit
    • KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Authored by Marc Zyngier
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bits wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditioned by KVM_CAP_IRQCHIP.
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
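      A hypothetical helper showing the resulting encoding (the field layout is
      from the patch's documentation update; the function itself is purely
      illustrative, not part of the kernel):

          static inline __u32 kvm_arm_irq_line_ppi(__u32 vcpu_id, __u32 irq_id)
          {
                  /* bits:  | 31 ... 28   | 27 ... 24 | 23 ... 16  | 15 ... 0 | */
                  /* field: | vcpu2_index | irq_type  | vcpu_index | irq_id   | */
                  return ((vcpu_id / 256) << 28) |
                         (KVM_ARM_IRQ_TYPE_PPI << 24) |
                         ((vcpu_id % 256) << 16) |
                         irq_id;
          }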
  17. 05 August 2019, 2 commits
    • KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block · 5eeaf10e
      Authored by Marc Zyngier
      Since commit 328e5664 ("KVM: arm/arm64: vgic: Defer
      touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
      its GICv2 equivalent) loaded as long as we can, only syncing it
      back when we're scheduled out.
      
      There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
      which is indirectly called from kvm_vcpu_check_block(), needs to
      evaluate the guest's view of ICC_PMR_EL1. At the point where we
      call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
      change was made to PMR is not visible in memory until we do a vcpu_put().
      
      Things go really south if the guest does the following:
      
      	mov x0, #0	// or any small value masking interrupts
      	msr ICC_PMR_EL1, x0
      
      	[vcpu preempted, then rescheduled, VMCR sampled]
      
      	mov x0, #0xff	// allow all interrupts
      	msr ICC_PMR_EL1, x0
      	wfi		// traps to EL2, so sampling of VMCR
      
      	[interrupt arrives just after WFI]
      
      Here, the hypervisor's view of PMR is zero, while the guest has enabled
      its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
      interrupts are pending (despite an interrupt being received) and we'll
      block for no reason. If the guest doesn't have a periodic interrupt
      firing once it has blocked, it will stay there forever.
      
      To avoid this unfortunate situation, let's resync VMCR from
      kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
      will observe the latest value of PMR.
      
      This has been found by booting an arm64 Linux guest with the pseudo NMI
      feature, and thus using interrupt priorities to mask interrupts instead
      of the usual PSTATE masking.
      
      Cc: stable@vger.kernel.org # 4.12
      Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
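      The fix itself is small (paraphrased from the patch):

          static void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
          {
                  /* The vcpu is still loaded, so VMCR lives in the CPU
                   * interface; pull it back to memory so that a subsequent
                   * kvm_vgic_vcpu_pending_irq() sees the latest PMR. */
                  preempt_disable();
                  kvm_vgic_vmcr_sync(vcpu);
                  preempt_enable();
          }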
    • KVM: remove kvm_arch_has_vcpu_debugfs() · 741cbbae
      Authored by Paolo Bonzini
      There is no need for this function as all arches have to implement
      kvm_arch_create_vcpu_debugfs() no matter what.  A #define symbol
      lets us actually simplify the code.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  18. 24 July 2019, 1 commit
  19. 23 July 2019, 1 commit
  20. 08 July 2019, 1 commit
    • KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register · 1e0cf16c
      Authored by Marc Zyngier
      As part of setting up the host context, we populate its
      MPIDR by using cpu_logical_map(). It turns out that contrary
      to arm64, cpu_logical_map() on 32-bit ARM doesn't return the
      *full* MPIDR, but a truncated version.
      
      This leaves the host MPIDR slightly corrupted after the first
      run of a VM, since we won't correctly restore the MPIDR on
      exit. Oops.
      
      Since we cannot trust cpu_logical_map(), let's adopt a different
      strategy. We move the initialization of the host CPU context as
      part of the per-CPU initialization (which, in retrospect, makes
      a lot of sense), and directly read the MPIDR from the HW. This
      is guaranteed to work on both arm and arm64.
      Reported-by: Andre Przywara <Andre.Przywara@arm.com>
      Tested-by: Andre Przywara <Andre.Przywara@arm.com>
      Fixes: 32f13955 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
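      The arm64 side of the change is essentially this (the 32-bit version does
      the same through its coprocessor register array):

          static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
          {
                  /* The host's MPIDR is immutable, so set it up at boot time
                   * straight from the hardware register. */
                  cpu_ctxt->sys_regs[MPIDR_EL1] = read_cpuid_mpidr();
          }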
  21. 05 June 2019, 2 commits
  22. 28 May 2019, 1 commit
    • KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · a86cb413
      Authored by Thomas Huth
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the number of usable CPUs is determined
      at runtime - it depends on the features of the machine the code is
      running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the number of CPUs that is limited by the
      hardware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined at runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture-specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
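      On s390x, the capability check then collapses into the existing
      KVM_CAP_MAX_VCPUS logic (excerpt from the kvm_vm_ioctl_check_extension()
      switch, paraphrased):

          case KVM_CAP_MAX_VCPUS:
          case KVM_CAP_MAX_VCPU_ID:
                  r = KVM_S390_BSCA_CPU_SLOTS;
                  if (!kvm_s390_use_sca_entries())
                          r = KVM_MAX_VCPUS;
                  else if (sclp.has_esca && sclp.has_64bscao)
                          r = KVM_S390_ESCA_CPU_SLOTS;
                  break;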
  23. 25 April 2019, 1 commit
  24. 24 April 2019, 3 commits
    • arm64: KVM: Enable VHE support for :G/:H perf event modifiers · 435e53fb
      Authored by Andrew Murray
      With VHE, different exception levels are used by the host (EL2) and
      guest (EL1), with a shared exception level for userspace (EL0). We can take
      advantage of this and use the PMU's exception level filtering to avoid
      enabling/disabling counters in the world-switch code. Instead we just
      modify the counter type to include or exclude EL0 at vcpu_{load,put} time.
      
      We also ensure that trapped PMU system register writes do not re-enable
      EL0 when reconfiguring the backing perf events.
      
      This approach completely avoids blackout windows seen with !VHE.
      Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data · 630a1685
      Authored by Andrew Murray
      The virt/arm core allocates a kvm_cpu_context_t percpu; at present this is
      a typedef to kvm_cpu_context and is used to store the host CPU context. The
      kvm_cpu_context structure is also used elsewhere to hold vcpu context.
      In order to use the percpu to hold additional future host information, we
      encapsulate kvm_cpu_context in a new structure and rename the typedef and
      percpu to match.
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
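      The wrapper itself (as introduced by the patch):

          struct kvm_host_data {
                  struct kvm_cpu_context host_ctxt;
                  /* additional host state, e.g. PMU events, can be added
                   * here later without changing the percpu plumbing */
          };

          typedef struct kvm_host_data kvm_host_data_t;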
    • KVM: arm/arm64: Context-switch ptrauth registers · 384b40ca
      Authored by Mark Rutland
      When pointer authentication is supported, a guest may wish to use it.
      This patch adds the necessary KVM infrastructure for this to work, with
      a semi-lazy context switch of the pointer auth state.
      
      The pointer authentication feature is only enabled when VHE is built
      into the kernel and present in the CPU implementation, so only VHE code
      paths are modified.
      
      When we schedule a vcpu, we disable guest usage of pointer
      authentication instructions and accesses to the keys. While these are
      disabled, we avoid context-switching the keys. When we trap the guest
      trying to use pointer authentication functionality, we change to eagerly
      context-switching the keys, and enable the feature. The next time the
      vcpu is scheduled out/in, we start again. However, the host key save is
      optimized and implemented inside the ptrauth instruction/register access
      trap.
      
      Pointer authentication consists of address authentication and generic
      authentication, and CPUs in a system might have varied support for
      either. Where support for either feature is not uniform, it is hidden
      from guests via ID register emulation, as a result of the cpufeature
      framework in the host.
      
      Unfortunately, address authentication and generic authentication cannot
      be trapped separately, as the architecture provides a single EL2 trap
      covering both. If we wish to expose one without the other, we cannot
      prevent a (badly-written) guest from intermittently using a feature
      which is not uniformly supported (when scheduled on a physical CPU which
      supports the relevant feature). Hence, this patch expects both types of
      authentication to be present in a CPU.
      
      This key switch is done from the guest enter/exit assembly as preparation
      for the upcoming in-kernel pointer authentication support. Hence, these
      key-switching routines are not implemented in C code, as they could cause
      pointer authentication key signing errors in some situations.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks,
      save host key in ptrauth exception trap]
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      [maz: various fixups]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
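      The gating itself is done through the HCR_EL2.{API,APK} bits (helpers as
      added by the patch):

          static inline void vcpu_ptrauth_enable(struct kvm_vcpu *vcpu)
          {
                  /* first trapped ptrauth use: stop trapping and start
                   * eagerly context-switching the keys */
                  vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
          }

          static inline void vcpu_ptrauth_disable(struct kvm_vcpu *vcpu)
          {
                  /* each time the vcpu is (re)scheduled, trap again */
                  vcpu->arch.hcr_el2 &= ~(HCR_API | HCR_APK);
          }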