1. 25 May 2020 (1 commit)
  2. 16 May 2020 (1 commit)
    • KVM: arm64: Support enabling dirty log gradually in small chunks · c862626e
      Keqian Zhu committed
      Support for enabling the dirty log gradually in small chunks already
      exists for x86, added in commit 3c9bd400 ("KVM: x86: enable dirty log
      gradually in small chunks"). This adds the same support for arm64.
      
      x86 still write-protects all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
      is enabled. On arm64, however, both huge pages and normal pages can be
      write-protected gradually by userspace.
      
      On a Huawei Kunpeng 920 2.6GHz platform, I ran some tests on 128G
      Linux VMs with different page sizes, with 127G of memory pressure in
      each case. The time taken by memory_global_dirty_log_start in QEMU is
      listed below:
      
      Page Size      Before    After Optimization
        4K            650ms         1.8ms
        2M             4ms          1.8ms
        1G             2ms          1.8ms
      
      Besides the time reduction, the biggest improvement is that we minimize
      the performance side effects on the guest after enabling the dirty log
      (caused by dissolving huge pages and marking memslots dirty).
      Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com
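      A sketch of the userspace side this pairs with (the helper name is
      illustrative; the capability and flags are the UAPI constants from
      <linux/kvm.h>):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          /* Opt a VM in to gradual write protection: with
           * KVM_DIRTY_LOG_INITIALLY_SET, memslot dirty bitmaps start out
           * all-set and pages are write-protected lazily by
           * KVM_CLEAR_DIRTY_LOG, not all at once when logging starts. */
          static int enable_gradual_dirty_log(int vm_fd)
          {
              struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                  .args = { KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                            KVM_DIRTY_LOG_INITIALLY_SET },
              };

              return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }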
  3. 24 March 2020 (1 commit)
  4. 17 February 2020 (1 commit)
  5. 28 January 2020 (3 commits)
  6. 23 January 2020 (1 commit)
    • KVM: arm/arm64: Cleanup MMIO handling · 0e20f5e2
      Marc Zyngier committed
      Our MMIO handling is a bit odd, in the sense that it uses an
      intermediate per-vcpu structure to store the various decoded
      information that describe the access.
      
      But the same information is readily available in the HSR/ESR_EL2
      field, and we actually use this field to populate the structure.
      
      Let's simplify the whole thing by getting rid of the superfluous
      structure, saving a (tiny) bit of space in the vcpu structure.
      
      [32bit fix courtesy of Olof Johansson <olof@lixom.net>]
      Signed-off-by: Marc Zyngier <maz@kernel.org>
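      For illustration, the kind of information the ESR_EL2 ISS field
      already carries for a data abort (bit positions per the ARMv8 ARM;
      these are standalone accessors, not the kernel's kvm_vcpu_dabt_*()
      helpers):

          #include <stdbool.h>
          #include <stdint.h>

          static bool dabt_is_valid(uint64_t esr)    { return esr & (1UL << 24); }      /* ISV */
          static int  dabt_access_size(uint64_t esr) { return 1 << ((esr >> 22) & 3); } /* SAS */
          static int  dabt_reg(uint64_t esr)         { return (esr >> 16) & 0x1f; }     /* SRT */
          static bool dabt_is_write(uint64_t esr)    { return esr & (1UL << 6); }       /* WnR */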
  7. 16 January 2020 (1 commit)
  8. 15 January 2020 (1 commit)
    • arm64: Introduce system_capabilities_finalized() marker · b51c6ac2
      Suzuki K Poulose committed
      We finalize the system-wide capabilities after the SMP CPUs are booted
      by the kernel, and use that point as a marker for various checks in
      the kernel, e.g. sanity-checking hotplugged CPUs for missing mandatory
      features.
      
      However, there is no explicit helper for this in the kernel. There is
      sys_caps_initialised, but it is not exposed. The closest thing we have
      is the jump label arm64_const_caps_ready, which denotes that the
      capabilities are set and that capability checks can use the individual
      jump labels for the fast path. It is set before the ELF HWCAPs, which
      must be checked against the new CPUs, are established. We also perform
      some other initialization at this stage, e.g. SVE setup, which matters
      for the use of FP/SIMD where SVE is supported. Normally userspace
      doesn't get to run before we finish all of this, but in-kernel users
      may start using the NEON mode earlier, so we need to reject NEON use
      until the setup is complete. Instead of defining a new marker for the
      completion of SVE setup, we can simply reuse arm64_const_caps_ready
      and enable it once all the setup has finished. We also expose this to
      the various users as "system_capabilities_finalized()", which is more
      meaningful than "const_caps_ready".
      
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
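      The helper itself is little more than a readable wrapper around the
      jump label; a sketch of its shape (the real definition lives in
      arch/arm64/include/asm/cpufeature.h):

          static inline bool system_capabilities_finalized(void)
          {
              /* True once all boot-time CPU feature setup has completed. */
              return static_branch_likely(&arm64_const_caps_ready);
          }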
  9. 22 October 2019 (4 commits)
    • KVM: arm64: Provide VCPU attributes for stolen time · 58772e9a
      Steven Price committed
      Allow user space to inform the KVM host where in the physical memory
      map the paravirtualized time structures should be located.
      
      User space can set an attribute on the VCPU providing the IPA base
      address of the stolen time structure for that VCPU. This must be
      repeated for every VCPU in the VM.
      
      The address is given in terms of the physical address visible to the
      guest and must be 64-byte aligned. The guest will discover the
      address via a hypercall.
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
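      A sketch of the userspace side (illustrative helper name; the group
      and attribute constants are the UAPI ones this patch adds):

          #include <linux/kvm.h>
          #include <stdint.h>
          #include <sys/ioctl.h>

          /* Point one VCPU's stolen-time record at a 64-byte aligned IPA
           * chosen by the VMM; this must be repeated for every VCPU. */
          static int set_pvtime_ipa(int vcpu_fd, uint64_t ipa)
          {
              struct kvm_device_attr attr = {
                  .group = KVM_ARM_VCPU_PVTIME_CTRL,
                  .attr  = KVM_ARM_VCPU_PVTIME_IPA,
                  .addr  = (uint64_t)&ipa,  /* pointer to the IPA value */
              };

              return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
          }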
    • KVM: arm64: Support stolen time reporting via shared structure · 8564d637
      Steven Price committed
      Implement the service call for configuring a shared structure between a
      VCPU and the hypervisor in which the hypervisor can write the time
      stolen from the VCPU's execution time by other tasks on the host.
      
      User space allocates memory which is placed at an IPA also chosen by user
      space. The hypervisor then updates the shared structure using
      kvm_put_guest() to ensure single copy atomicity of the 64-bit value
      reporting the stolen time in nanoseconds.
      
      Whenever stolen time is enabled by the guest, the stolen time counter is
      reset.
      
      The stolen time itself is retrieved from the sched_info structure
      maintained by the Linux scheduler code. We enable SCHEDSTATS when
      selecting KVM Kconfig to ensure this value is meaningful.
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
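      The shared record itself is small; a sketch of its layout per the
      Paravirtualized Time specification (ARM DEN0057A), with illustrative
      field names:

          #include <stdint.h>

          struct stolen_time_record {
              uint32_t revision;     /* 0 for this revision of the spec */
              uint32_t attributes;   /* 0                               */
              uint64_t stolen_time;  /* nanoseconds; single-copy atomic */
              uint8_t  padding[48];  /* pads the record to 64 bytes     */
          };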
    • KVM: arm64: Implement PV_TIME_FEATURES call · b48c1a45
      Steven Price committed
      This provides a mechanism for querying which paravirtualized time
      features are available in this hypervisor.
      
      Also add the header file which defines the ABI for the paravirtualized
      time features we're about to add.
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
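      A guest-side sketch of probing the call (mirroring the shape of the
      guest support that accompanies this series; constants from
      <linux/arm-smccc.h>):

          #include <linux/arm-smccc.h>

          /* Ask whether stolen time (PV_TIME_ST) is supported before
           * enabling it. */
          static bool has_pv_steal_clock(void)
          {
              struct arm_smccc_res res;

              arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_FEATURES,
                                   ARM_SMCCC_HV_PV_TIME_ST, &res);

              return res.a0 == SMCCC_RET_SUCCESS;
          }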
    • KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Christoffer Dall committed
      For a long time, if a guest accessed memory outside of a memslot using
      any of the load/store instructions in the architecture that don't
      supply decoding information in the ESR_EL2 (the ISV bit is not set),
      the kernel would print the following message and terminate the VM as
      a result of returning -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes all accesses
      outside a memslot are MMIO accesses which should be handled by
      userspace, and we originally expected to eventually implement some
      sort of decoding of load/store instructions where the ISV bit was not
      set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured his/her
      guest and is actually not accessing an MMIO region, but just hitting
      some random hole in the IPA space.  In this scenario, the error
      message above is quite misleading and has led to a great deal of
      confusion over the years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
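      A VMM-side sketch (illustrative helpers; the capability, exit reason
      and kvm_run fields are the UAPI ones this patch adds):

          #include <linux/kvm.h>
          #include <stdio.h>
          #include <sys/ioctl.h>

          /* Once, at VM creation time. */
          static int enable_nisv_to_user(int vm_fd)
          {
              struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_NISV_TO_USER };

              return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }

          /* In the vcpu run loop: the ESR ISS bits and the faulting IPA are
           * all KVM can report; what to do next is VMM policy. */
          static void handle_nisv(struct kvm_run *run)
          {
              if (run->exit_reason == KVM_EXIT_ARM_NISV)
                  fprintf(stderr, "non-ISV abort: IPA=0x%llx ISS=0x%llx\n",
                          (unsigned long long)run->arm_nisv.fault_ipa,
                          (unsigned long long)run->arm_nisv.esr_iss);
          }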
  10. 15 October 2019 (1 commit)
    • arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear · f2266504
      Marc Zyngier committed
      The GICv3 architecture specification is incredibly misleading when it
      comes to PMR and the requirement for a DSB. It turns out that this DSB
      is only required if the CPU interface sends an Upstream Control
      message to the redistributor in order to update the RD's view of PMR.
      
      This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't
      the case in Linux. It can still be set from EL3, so some special care
      is required. But the upshot is that in the (hopefully large) majority
      of cases, we can drop the DSB altogether.
      
      This relies on a new static key being set if the boot CPU has PMHE
      set. The drawback is that this static key has to be exported to
      modules.
      
      Cc: Will Deacon <will@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
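      The resulting barrier shape, sketched (the kernel wraps this as
      pmr_sync(), with gic_pmr_sync being the exported static key):

          static inline void pmr_sync(void)
          {
              /* Only publish PMR to the redistributor when the boot CPU
               * had ICC_CTLR_EL1.PMHE set. */
              if (static_branch_unlikely(&gic_pmr_sync))
                  dsb(sy);
          }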
  11. 08 July 2019 (1 commit)
    • KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register · 1e0cf16c
      Marc Zyngier committed
      As part of setting up the host context, we populate its MPIDR by
      using cpu_logical_map(). It turns out that, contrary to arm64,
      cpu_logical_map() on 32-bit ARM doesn't return the *full* MPIDR, but
      a truncated version.
      
      This leaves the host MPIDR slightly corrupted after the first
      run of a VM, since we won't correctly restore the MPIDR on
      exit. Oops.
      
      Since we cannot trust cpu_logical_map(), let's adopt a different
      strategy. We move the initialization of the host CPU context as
      part of the per-CPU initialization (which, in retrospect, makes
      a lot of sense), and directly read the MPIDR from the HW. This
      is guaranteed to work on both arm and arm64.
      Reported-by: Andre Przywara <Andre.Przywara@arm.com>
      Tested-by: Andre Przywara <Andre.Przywara@arm.com>
      Fixes: 32f13955 ("arm/arm64: KVM: Statically configure the host's view of MPIDR")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
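      The arm64 side of the fix, sketched: during per-CPU init, take MPIDR
      from the hardware itself rather than from cpu_logical_map():

          static void kvm_init_host_cpu_context(struct kvm_cpu_context *ctxt)
          {
              /* read_cpuid_mpidr() reads the register directly and is
               * guaranteed to return the full value on arm and arm64. */
              ctxt->sys_regs[MPIDR_EL1] = read_cpuid_mpidr();
          }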
  12. 05 July 2019 (1 commit)
  13. 21 June 2019 (1 commit)
    • arm64: Fix incorrect irqflag restore for priority masking · bd82d4bd
      Julien Thierry committed
      When using IRQ priority masking to disable interrupts, in order to
      deal with the PSR.I state, local_irq_save() would convert the I bit
      into a PMR value (GIC_PRIO_IRQOFF). This resulted in
      local_irq_restore() potentially modifying the value of PMR at
      undesired points, depending on the state of PSR.I upon flag
      saving [1].
      
      In an attempt to solve this issue in a less hackish manner, introduce
      a bit (GIC_PRIO_PSR_I_SET) in the PMR values that can represent
      whether PSR.I is being used to disable interrupts, in which case it
      takes precedence over the status of interrupt masking via PMR.
      
      GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
      GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value>,
      as some sections (e.g. arch_cpu_idle(), the interrupt acknowledge
      path) require PMR not to mask interrupts that could be signaled to
      the CPU when using only PSR.I.
      
      [1] https://www.spinics.net/lists/arm-kernel/msg716956.html
      
      Fixes: 4a503217 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
      Cc: <stable@vger.kernel.org> # 5.1.x-
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Wei Li <liwei391@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
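      An illustration with the arm64 values of the period (from
      arch/arm64/include/asm/ptrace.h): ORing in GIC_PRIO_PSR_I_SET yields
      a numerically higher PMR, which masks strictly fewer interrupts.

          #define GIC_PRIO_IRQON      0xe0
          #define GIC_PRIO_IRQOFF     0x60
          #define GIC_PRIO_PSR_I_SET  (1 << 4)

          /* GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET == 0x70: priorities
           * 0x60..0x6f remain deliverable, matching what the GIC could
           * signal to the CPU if only PSR.I were masking interrupts. */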
  14. 19 June 2019 (1 commit)
  15. 24 May 2019 (1 commit)
    • KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation · b7c50fab
      James Morse committed
      KVM's pmu.c contains the __hyp_text needed to switch the pmu registers
      between host and guest. Because this isn't covered by the 'hyp' Makefile,
      it can be built with kasan and friends when these are enabled in Kconfig.
      
      When starting a guest, this results in:
      | Kernel panic - not syncing: HYP panic:
      | PS:a00003c9 PC:000083000028ada0 ESR:86000007
      | FAR:000083000028ada0 HPFAR:0000000029df5300 PAR:0000000000000000
      | VCPU:000000004e10b7d6
      | CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026
      | Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat
      | Call trace:
      |  dump_backtrace+0x0/0x200
      |  show_stack+0x20/0x30
      |  dump_stack+0xec/0x158
      |  panic+0x1ec/0x420
      |  panic+0x0/0x420
      | SMP: stopping secondary CPUs
      | Kernel Offset: disabled
      | CPU features: 0x002,25006082
      | Memory Limit: none
      | ---[ end Kernel panic - not syncing: HYP panic:
      
      This is caused by functions in pmu.c calling the instrumented
      code, which isn't mapped to hyp. From objdump -r:
      | RELOCATION RECORDS FOR [.hyp.text]:
      | OFFSET           TYPE              VALUE
      | 0000000000000010 R_AARCH64_CALL26  __sanitizer_cov_trace_pc
      | 0000000000000018 R_AARCH64_CALL26  __asan_load4_noabort
      | 0000000000000024 R_AARCH64_CALL26  __asan_load4_noabort
      
      Move the affected code to a new file under 'hyp's Makefile.
      
      Fixes: 3d91befb ("arm64: KVM: Enable !VHE support for :G/:H perf event modifiers")
      Cc: Andrew Murray <Andrew.Murray@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  16. 24 April 2019 (6 commits)
    • arm64: KVM: Enable VHE support for :G/:H perf event modifiers · 435e53fb
      Andrew Murray committed
      With VHE, different exception levels are used between the host (EL2)
      and guest (EL1), with a shared exception level for userspace (EL0).
      We can take advantage of this and use the PMU's exception-level
      filtering to avoid enabling/disabling counters in the world-switch
      code. Instead we just modify the counter type to include or exclude
      EL0 at vcpu_{load,put} time.
      
      We also ensure that trapped PMU system register writes do not re-enable
      EL0 when reconfiguring the backing perf events.
      
      This approach completely avoids blackout windows seen with !VHE.
      Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
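      A sketch of the vcpu_{load,put}-time idea (illustrative helper; the
      filter bits are the ARMv8 PMU event-type register fields):

          #define ARMV8_PMU_EXCLUDE_EL1  (1U << 31)  /* P: don't count EL1   */
          #define ARMV8_PMU_EXCLUDE_EL0  (1U << 30)  /* U: don't count EL0   */
          #define ARMV8_PMU_INCLUDE_EL2  (1U << 27)  /* NSH: count EL2 (VHE) */

          /* On VHE the host kernel runs at EL2, so a host-only event keeps
           * counting across world switch; only EL0 changes hands. */
          static u32 host_only_evtyper(u32 evtyper, bool guest_resident)
          {
              evtyper |= ARMV8_PMU_EXCLUDE_EL1 | ARMV8_PMU_INCLUDE_EL2;
              if (guest_resident)
                  evtyper |= ARMV8_PMU_EXCLUDE_EL0;   /* EL0 = guest user */
              else
                  evtyper &= ~ARMV8_PMU_EXCLUDE_EL0;  /* EL0 = host user  */
              return evtyper;
          }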
    • arm64: KVM: Enable !VHE support for :G/:H perf event modifiers · 3d91befb
      Andrew Murray committed
      Enable/disable event counters as appropriate when entering and exiting
      the guest to enable support for guest or host only event counting.
      
      For both VHE and non-VHE we switch the counters between host/guest at
      EL2.
      
      The PMU may be on when we change which counters are enabled; however,
      we avoid adding an isb, as we instead rely on existing context
      synchronisation events: the eret to enter the guest (__guest_enter)
      and the eret in kvm_call_hyp for __kvm_vcpu_run_nvhe on returning.
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • arm64: KVM: Add accessors to track guest/host only counters · eb41238c
      Andrew Murray committed
      In order to efficiently switch the events_{guest,host} perf counters
      at guest entry/exit, we add bitfields to kvm_cpu_context for guest and
      host events, as well as accessors for updating them.
      
      A function is also provided which allows the PMU driver to determine
      if a counter should start counting when it is enabled. With exclude_host,
      we may only start counting when entering the guest.
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • arm64: KVM: Encapsulate kvm_cpu_context in kvm_host_data · 630a1685
      Andrew Murray committed
      The virt/arm core allocates a kvm_cpu_context_t percpu; at present
      this is a typedef to kvm_cpu_context and is used to store the host
      CPU context. The kvm_cpu_context structure is also used elsewhere to
      hold vcpu context. In order to use the percpu to hold additional
      future host information, we encapsulate kvm_cpu_context in a new
      structure and rename the typedef and percpu to match.
      Signed-off-by: Andrew Murray <andrew.murray@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
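      The encapsulation, sketched (follow-up patches hang more per-host-CPU
      state, such as the PMU event bitfields, off this structure):

          struct kvm_host_data {
              struct kvm_cpu_context host_ctxt;
              /* room for additional host state added later in the series */
          };

          typedef struct kvm_host_data kvm_host_data_t;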
    • KVM: arm64: Add userspace flag to enable pointer authentication · a22fa321
      Amit Daniel Kachhap committed
      Now that the building blocks of pointer authentication are present,
      let's add the userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and
      KVM_ARM_VCPU_PTRAUTH_GENERIC. These flags enable pointer
      authentication for the KVM guest on a per-vcpu basis through the
      KVM_ARM_VCPU_INIT ioctl.

      These flags allow the KVM guest to handle pointer authentication
      instructions; when they are not set, such instructions are treated
      as undefined.

      The necessary documentation is added to reflect these changes.
      Reviewed-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
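      A VMM-side sketch of opting in (the flag constants are the new UAPI
      ones; the helper is illustrative and error handling is omitted):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int vcpu_init_with_ptrauth(int vcpu_fd,
                                            struct kvm_vcpu_init *init)
          {
              /* Request both flavours; the architecture only lets KVM
               * trap them together. */
              init->features[KVM_ARM_VCPU_PTRAUTH_ADDRESS / 32] |=
                      1U << (KVM_ARM_VCPU_PTRAUTH_ADDRESS % 32);
              init->features[KVM_ARM_VCPU_PTRAUTH_GENERIC / 32] |=
                      1U << (KVM_ARM_VCPU_PTRAUTH_GENERIC % 32);

              return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init);
          }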
    • KVM: arm/arm64: Context-switch ptrauth registers · 384b40ca
      Mark Rutland committed
      When pointer authentication is supported, a guest may wish to use it.
      This patch adds the necessary KVM infrastructure for this to work, with
      a semi-lazy context switch of the pointer auth state.
      
      The pointer authentication feature is only enabled when VHE is built
      into the kernel and present in the CPU implementation, so only VHE
      code paths are modified.

      When we schedule a vcpu, we disable guest usage of pointer
      authentication instructions and accesses to the keys. While these are
      disabled, we avoid context-switching the keys. When we trap the guest
      trying to use pointer authentication functionality, we change to
      eagerly context-switching the keys, and enable the feature. The next
      time the vcpu is scheduled out/in, we start again. However, the host
      key save is optimized and implemented inside the ptrauth
      instruction/register access trap.
      
      Pointer authentication consists of address authentication and generic
      authentication, and CPUs in a system might have varied support for
      either. Where support for either feature is not uniform, it is hidden
      from guests via ID register emulation, as a result of the cpufeature
      framework in the host.
      
      Unfortunately, address authentication and generic authentication cannot
      be trapped separately, as the architecture provides a single EL2 trap
      covering both. If we wish to expose one without the other, we cannot
      prevent a (badly-written) guest from intermittently using a feature
      which is not uniformly supported (when scheduled on a physical CPU which
      supports the relevant feature). Hence, this patch expects both types
      of authentication to be present in a CPU.
      
      This switch of key is done from guest enter/exit assembly as preparation
      for the upcoming in-kernel pointer authentication support. Hence, these
      key switching routines are not implemented in C code as they may cause
      pointer authentication key signing error in some situations.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks,
      save host key in ptrauth exception trap]
      Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      [maz: various fixups]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
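      The semi-lazy policy, sketched in C (the actual key switch is in the
      entry/exit assembly; the two save/restore helpers here are
      illustrative names):

          static void kvm_vcpu_ptrauth_trap_handler(struct kvm_vcpu *vcpu)
          {
              /* First guest use of ptrauth: switch keys eagerly from now
               * on and stop trapping (clears the HCR_EL2 API/APK traps). */
              vcpu_ptrauth_enable(vcpu);
              ptrauth_save_host_keys(vcpu);      /* illustrative */
              ptrauth_restore_guest_keys(vcpu);  /* illustrative */
          }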
  17. 23 April 2019 (1 commit)
  18. 19 April 2019 (2 commits)
  19. 29 March 2019 (9 commits)
    • KVM: arm64/sve: Allow userspace to enable SVE for vcpus · 9a3cdf26
      Dave Martin committed
      Now that all the pieces are in place, this patch offers a new flag
      KVM_ARM_VCPU_SVE that userspace can pass to KVM_ARM_VCPU_INIT to
      turn on SVE for the guest, on a per-vcpu basis.
      
      As part of this, support for initialisation and reset of the SVE
      vector length set and registers is added in the appropriate places,
      as well as finally setting the KVM_ARM64_GUEST_HAS_SVE vcpu flag,
      to turn on the SVE support code.
      
      Allocation of the SVE register storage in vcpu->arch.sve_state is
      deferred until the SVE configuration is finalized, by which time
      the size of the registers is known.
      
      Setting the vector lengths supported by the vcpu is considered
      configuration of the emulated hardware rather than runtime
      configuration, so no support is offered for changing the vector
      lengths available to an existing vcpu across reset.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
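      A VMM-side sketch of the whole dance (illustrative helper; the flag,
      ioctl and finalize argument are the new UAPI pieces):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int vcpu_enable_sve(int vcpu_fd, struct kvm_vcpu_init *init)
          {
              int feature = KVM_ARM_VCPU_SVE;

              init->features[0] |= 1U << KVM_ARM_VCPU_SVE;
              if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init))
                  return -1;

              /* Optionally adjust KVM_REG_ARM64_SVE_VLS here, then: */
              return ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature);
          }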
    • KVM: arm64/sve: Add pseudo-register for the guest's vector lengths · 9033bba4
      Dave Martin committed
      This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
      allow userspace to set and query the set of vector lengths visible
      to the guest.
      
      In the future, multiple register slices per SVE register may be
      visible through the ioctl interface.  Once the set of slices has
      been determined we would not be able to allow the vector length set
      to be changed any more, in order to avoid userspace seeing
      inconsistent sets of registers.  For this reason, this patch adds
      support for explicit finalization of the SVE configuration via the
      KVM_ARM_VCPU_FINALIZE ioctl.
      
      Finalization is the proper place to allocate the SVE register state
      storage in vcpu->arch.sve_state, so this patch adds that as
      appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
      was previously a no-op on arm64.
      
      To simplify the logic for determining what vector lengths can be
      supported, some code is added to KVM init to work this out, in the
      kvm_arm_init_arch_resources() hook.
      
      The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
      Subsequent patches will allow SVE to be turned on for guest vcpus,
      making it visible.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
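      A sketch of querying the pseudo-register (illustrative helper; per
      the ABI, bit (vq - 1) set in the 512-bit bitmap means a vector length
      of 128 * vq bits is available):

          #include <linux/kvm.h>
          #include <stdint.h>
          #include <sys/ioctl.h>

          static int get_sve_vls(int vcpu_fd, uint64_t vls[8])
          {
              struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM64_SVE_VLS,
                  .addr = (uint64_t)vls,
              };

              return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
          }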
    • KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl · 7dd32a0d
      Dave Martin committed
      Some aspects of vcpu configuration may be too complex to be
      completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
      requirement for userspace to do some additional configuration
      before various other ioctls will work in a consistent way.
      
      In particular this will be the case for SVE, where userspace will
      need to negotiate the set of vector lengths to be made available to
      the guest before the vcpu becomes fully usable.
      
      In order to provide an explicit way for userspace to confirm that
      it has finished setting up a particular vcpu feature, this patch
      adds a new ioctl KVM_ARM_VCPU_FINALIZE.
      
      When userspace has opted into a feature that requires finalization,
      typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
      matching call to KVM_ARM_VCPU_FINALIZE is now required before
      KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
      impose additional restrictions where appropriate.
      
      No existing vcpu features are affected by this, so current
      userspace implementations will continue to work exactly as before,
      with no need to issue KVM_ARM_VCPU_FINALIZE.
      
      As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
      placeholder: no finalizable features exist yet, so the ioctl is not
      required and will always yield EINVAL.  Subsequent patches will add
      the finalization logic to make use of this ioctl for SVE.
      
      No functional change for existing userspace.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm/arm64: Add hook for arch-specific KVM initialisation · 0f062bfe
      Dave Martin committed
      This patch adds a kvm_arm_init_arch_resources() hook to perform
      subarch-specific initialisation when starting up KVM.
      
      This will be used in a subsequent patch for global SVE-related
      setup on arm64.
      
      No functional change.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm64/sve: Add SVE support to register access ioctl interface · e1c9c983
      Dave Martin committed
      This patch adds the following registers for access via the
      KVM_{GET,SET}_ONE_REG interface:
      
       * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
       * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
       * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
      
      In order to adapt gracefully to future architectural extensions,
      the registers are logically divided up into slices as noted above:
      the i parameter denotes the slice index.
      
      This allows us to reserve space in the ABI for future expansion of
      these registers.  However, as of today the architecture does not
      permit registers to be larger than a single slice, so no code is
      needed in the kernel to expose additional slices, for now.  The
      code can be extended later as needed to expose them up to a maximum
      of 32 slices (as carved out in the architecture itself) if they
      really exist someday.
      
      The registers are only visible for vcpus that have SVE enabled.
      They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
      have SVE.
      
      Accesses to the FPSIMD registers via KVM_REG_ARM_CORE are not
      allowed for SVE-enabled vcpus: SVE-aware userspace can use the
      KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
      register state.  This avoids some complex and pointless emulation
      in the kernel to convert between the two views of these aliased
      registers.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
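      A sketch of reading one slice (illustrative helper; each ZREG slice
      is 2048 bits, i.e. 256 bytes, regardless of the vcpu's configured
      vector length):

          #include <linux/kvm.h>
          #include <stdint.h>
          #include <sys/ioctl.h>

          static int get_z0_slice0(int vcpu_fd, uint8_t buf[256])
          {
              struct kvm_one_reg reg = {
                  .id   = KVM_REG_ARM64_SVE_ZREG(0, 0),  /* Z0, slice 0 */
                  .addr = (uint64_t)buf,
              };

              return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
          }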
    • KVM: arm64/sve: Context switch the SVE registers · b43b5dd9
      Dave Martin committed
      In order to give each vcpu its own view of the SVE registers, this
      patch adds context storage via a new sve_state pointer in struct
      vcpu_arch.  An additional member sve_max_vl is also added for each
      vcpu, to determine the maximum vector length visible to the guest
      and thus the value to be configured in ZCR_EL2.LEN while the vcpu
      is active.  This also determines the layout and size of the storage
      in sve_state, which is read and written by the same backend
      functions that are used for context-switching the SVE state for
      host tasks.
      
      On SVE-enabled vcpus, SVE access traps are now handled by switching
      in the vcpu's SVE context and disabling the trap before returning
      to the guest.  On other vcpus, the trap is not handled and an exit
      back to the host occurs, where the handle_sve() fallback path
      reflects an undefined instruction exception back to the guest,
      consistently with the behaviour of non-SVE-capable hardware (as was
      done unconditionally prior to this patch).
      
      No SVE handling is added on non-VHE-only paths, since VHE is an
      architectural and Kconfig prerequisite of SVE.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm64/sve: System register context switch and access support · 73433762
      Dave Martin committed
      This patch adds the necessary support for context switching ZCR_EL1
      for each vcpu.
      
      ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
      sense for it to be handled as part of the guest FPSIMD/SVE context
      for context switch purposes instead of handling it as a general
      system register.  This means that it can be switched in lazily at
      the appropriate time.  No effort is made to track host context for
      this register, since SVE requires VHE: thus the host's value for
      this register lives permanently in ZCR_EL2 and does not alias the
      guest's value at any time.
      
      The Hyp switch and fpsimd context handling code is extended
      appropriately.
      
      Accessors are added in sys_regs.c to expose the SVE system
      registers and ID register fields.  Because these need to be
      conditionally visible based on the guest configuration, they are
      implemented separately for now rather than by use of the generic
      system register helpers.  This may be abstracted better later on
      when/if there are more features requiring this model.
      
      ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
      guest, but for compatibility with non-SVE aware KVM implementations
      the register should not be enumerated at all for KVM_GET_REG_LIST
      in this case.  For consistency we also reject ioctl access to the
      register.  This ensures that a non-SVE-enabled guest looks the same
      to userspace, irrespective of whether the kernel KVM implementation
      supports SVE.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
    • KVM: arm64: Add a vcpu flag to control SVE visibility for the guest · 1765edba
      Dave Martin committed
      Since SVE will be enabled or disabled on a per-vcpu basis, a flag
      is needed in order to track which vcpus have it enabled.
      
      This patch adds a suitable flag and a helper for checking it.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
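      The helper is a one-liner; a sketch of its shape in kvm_host.h:

          #define vcpu_has_sve(vcpu) (system_supports_sve() && \
                                      ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))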
    • KVM: arm64: Add missing #includes to kvm_host.h · 3f61f409
      Dave Martin committed
      kvm_host.h uses some declarations from other headers that are
      currently included by accident, without an explicit #include.
      
      This patch adds a few #includes that are clearly missing.  Although
      the header builds without them today, this should help to avoid
      future surprises.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
  20. 20 February 2019 (2 commits)
    • KVM: arm/arm64: Factor out VMID into struct kvm_vmid · e329fb75
      Christoffer Dall committed
      In preparation for nested virtualization where we are going to have more
      than a single VMID per VM, let's factor out the VMID data into a
      separate VMID data structure and change the VMID allocator to operate on
      this new structure instead of using a struct kvm.
      
      This also means that update_vttbr now becomes update_vmid, and that
      the vttbr itself is generated on the fly based on the stage 2 page
      table base address and the vmid.
      
      We cache the physical address of the pgd when allocating the pgd to
      avoid doing the calculation on every entry to the guest and to avoid
      calling into potentially non-hyp-mapped code from hyp/EL2.
      
      If we wanted to merge the VMID allocator with the arm64 ASID allocator
      at some point in the future, it should actually become easier to do that
      after this patch.
      
      Note that to avoid mapping the kvm_vmid_bits variable into hyp, we
      simply forego the masking of the vmid value in kvm_get_vttbr and rely on
      update_vmid to always assign a valid vmid value (within the supported
      range).
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      [maz: minor cleanups]
      Reviewed-by: Julien Thierry <julien.thierry@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
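      The factored-out structure and the on-the-fly VTTBR, sketched from
      the commit's description:

          struct kvm_vmid {
              u64 vmid_gen;  /* allocator generation for this VMID */
              u32 vmid;
          };

          static inline u64 kvm_get_vttbr(struct kvm *kvm)
          {
              struct kvm_vmid *vmid = &kvm->arch.vmid;
              u64 vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;

              /* pgd_phys is cached at pgd allocation time; no masking of
               * vmid here, update_vmid keeps it within the valid range. */
              return kvm->arch.pgd_phys | vmid_field;
          }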
    • arm/arm64: KVM: Statically configure the host's view of MPIDR · 32f13955
      Marc Zyngier committed
      We currently eagerly save/restore MPIDR. It turns out to be
      slightly pointless:
      - On the host, this value is known as soon as we're scheduled on a
        physical CPU
      - In the guest, this value cannot change, as it is set by KVM
        (and this is a read-only register)
      
      The result of the above is that we can perfectly avoid the eager
      saving of MPIDR_EL1, and only keep the restore. We just have
      to set up the host contexts appropriately at boot time.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Acked-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>