1. 28 2月, 2020 4 次提交
  2. 31 1月, 2020 1 次提交
  3. 28 11月, 2019 1 次提交
  4. 22 10月, 2019 3 次提交
    • S
      KVM: arm64: Provide VCPU attributes for stolen time · 58772e9a
      Steven Price 提交于
      Allow user space to inform the KVM host where in the physical memory
      map the paravirtualized time structures should be located.
      
      User space can set an attribute on the VCPU providing the IPA base
      address of the stolen time structure for that VCPU. This must be
      repeated for every VCPU in the VM.
      
      The address is given in terms of the physical address visible to
      the guest and must be 64 byte aligned. The guest will discover the
      address via a hypercall.
      Signed-off-by: NSteven Price <steven.price@arm.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      58772e9a
    • C
      KVM: arm/arm64: Allow user injection of external data aborts · da345174
      Christoffer Dall 提交于
      In some scenarios, such as buggy guest or incorrect configuration of the
      VMM and firmware description data, userspace will detect a memory access
      to a portion of the IPA, which is not mapped to any MMIO region.
      
      For this purpose, the appropriate action is to inject an external abort
      to the guest.  The kernel already has functionality to inject an
      external abort, but we need to wire up a signal from user space that
      lets user space tell the kernel to do this.
      
      It turns out, we already have the set event functionality which we can
      perfectly reuse for this.
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      da345174
    • C
      KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Christoffer Dall 提交于
      For a long time, if a guest accessed memory outside of a memslot using
      any of the load/store instructions in the architecture which doesn't
      supply decoding information in the ESR_EL2 (the ISV bit is not set), the
      kernel would print the following message and terminate the VM as a
      result of returning -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes that all accesses
      outside a memslot is an MMIO access which should be handled by
      userspace, and we originally expected to eventually implement some sort
      of decoding of load/store instructions where the ISV bit was not set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured his/her guest
      and is actually not accessing an MMIO region, but just hitting some
      random hole in the IPA space.  In this scenario, the error message above
      is almost misleading and has led to a great deal of confusion over the
      years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
      Reported-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: NAlexander Graf <graf@amazon.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      c726200d
  5. 21 10月, 2019 1 次提交
    • F
      KVM: PPC: Report single stepping capability · 1a9167a2
      Fabiano Rosas 提交于
      When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
      the next instruction to be single stepped via the
      KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
      
      This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
      to inform userspace about the state of single stepping support.
      
      We currently don't have support for guest single stepping implemented
      in Book3S HV so the capability is only present for Book3S PR and
      BookE.
      Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1a9167a2
  6. 24 9月, 2019 1 次提交
  7. 20 9月, 2019 1 次提交
  8. 11 9月, 2019 1 次提交
  9. 09 9月, 2019 1 次提交
    • M
      KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Marc Zyngier 提交于
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bit wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
      Reported-by: NZenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      92f35b75
  10. 24 7月, 2019 1 次提交
  11. 11 7月, 2019 1 次提交
    • E
      KVM: x86: PMU Event Filter · 66bb8a06
      Eric Hankland 提交于
      Some events can provide a guest with information about other guests or the
      host (e.g. L3 cache stats); providing the capability to restrict access
      to a "safe" set of events would limit the potential for the PMU to be used
      in any side channel attacks. This change introduces a new VM ioctl that
      sets an event filter. If the guest attempts to program a counter for
      any blacklisted or non-whitelisted event, the kernel counter won't be
      created, so any RDPMC/RDMSR will show 0 instances of that event.
      Signed-off-by: NEric Hankland <ehankland@google.com>
      [Lots of changes. All remaining bugs are probably mine. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      66bb8a06
  12. 05 6月, 2019 1 次提交
  13. 08 5月, 2019 1 次提交
  14. 30 4月, 2019 2 次提交
  15. 24 4月, 2019 1 次提交
    • A
      KVM: arm64: Add capability to advertise ptrauth for guest · a243c16d
      Amit Daniel Kachhap 提交于
      This patch advertises the capability of two cpu feature called address
      pointer authentication and generic pointer authentication. These
      capabilities depend upon system support for pointer authentication and
      VHE mode.
      
      The current arm64 KVM partially implements pointer authentication and
      support of address/generic authentication are tied together. However,
      separate ABI requirements for both of them is added so that any future
      isolated implementation will not require any ABI changes.
      Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      a243c16d
  16. 29 3月, 2019 3 次提交
    • D
      KVM: arm64: Add a capability to advertise SVE support · 555f3d03
      Dave Martin 提交于
      To provide a uniform way to check for KVM SVE support amongst other
      features, this patch adds a suitable capability KVM_CAP_ARM_SVE,
      and reports it as present when SVE is available.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
      Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      555f3d03
    • D
      KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl · 7dd32a0d
      Dave Martin 提交于
      Some aspects of vcpu configuration may be too complex to be
      completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
      requirement for userspace to do some additional configuration
      before various other ioctls will work in a consistent way.
      
      In particular this will be the case for SVE, where userspace will
      need to negotiate the set of vector lengths to be made available to
      the guest before the vcpu becomes fully usable.
      
      In order to provide an explicit way for userspace to confirm that
      it has finished setting up a particular vcpu feature, this patch
      adds a new ioctl KVM_ARM_VCPU_FINALIZE.
      
      When userspace has opted into a feature that requires finalization,
      typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
      matching call to KVM_ARM_VCPU_FINALIZE is now required before
      KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
      impose additional restrictions where appropriate.
      
      No existing vcpu features are affected by this, so current
      userspace implementations will continue to work exactly as before,
      with no need to issue KVM_ARM_VCPU_FINALIZE.
      
      As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
      placeholder: no finalizable features exist yet, so ioctl is not
      required and will always yield EINVAL.  Subsequent patches will add
      the finalization logic to make use of this ioctl for SVE.
      
      No functional change for existing userspace.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NJulien Thierry <julien.thierry@arm.com>
      Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      7dd32a0d
    • D
      KVM: Allow 2048-bit register access via ioctl interface · 2b953ea3
      Dave Martin 提交于
      The Arm SVE architecture defines registers that are up to 2048 bits
      in size (with some possibility of further future expansion).
      
      In order to avoid the need for an excessively large number of
      ioctls when saving and restoring a vcpu's registers, this patch
      adds a #define to make support for individual 2048-bit registers
      through the KVM_{GET,SET}_ONE_REG ioctl interface official.  This
      will allow each SVE register to be accessed in a single call.
      
      There are sufficient spare bits in the register id size field for
      this change, so there is no ABI impact, providing that
      KVM_GET_REG_LIST does not enumerate any 2048-bit register unless
      userspace explicitly opts in to the relevant architecture-specific
      features.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NAlex Bennée <alex.bennee@linaro.org>
      Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      2b953ea3
  17. 15 12月, 2018 1 次提交
    • V
      x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID · 2bc39970
      Vitaly Kuznetsov 提交于
      With every new Hyper-V Enlightenment we implement we're forced to add a
      KVM_CAP_HYPERV_* capability. While this approach works it is fairly
      inconvenient: the majority of the enlightenments we do have corresponding
      CPUID feature bit(s) and userspace has to know this anyways to be able to
      expose the feature to the guest.
      
      Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
      cap to rule them all!") returning all Hyper-V CPUID feature leaves.
      
      Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
      Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
      0x40000001) and we would probably confuse userspace in case we decide to
      return these twice.
      
      KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
      KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2bc39970
  18. 14 12月, 2018 1 次提交
    • P
      kvm: introduce manual dirty log reprotect · 2a31b9db
      Paolo Bonzini 提交于
      There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
      it can take kvm->mmu_lock for an extended period of time.  Second, its user
      can actually see many false positives in some cases.  The latter is due
      to a benign race like this:
      
        1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
           them.
        2. The guest modifies the pages, causing them to be marked ditry.
        3. Userspace actually copies the pages.
        4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
           they were not written to since (3).
      
      This is especially a problem for large guests, where the time between
      (1) and (3) can be substantial.  This patch introduces a new
      capability which, when enabled, makes KVM_GET_DIRTY_LOG not
      write-protect the pages it returns.  Instead, userspace has to
      explicitly clear the dirty log bits just before using the content
      of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
      64-page granularity rather than requiring to sync a full memslot;
      this way, the mmu_lock is taken for small amounts of time, and
      only a small amount of time will pass between write protection
      of pages and the sending of their content.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2a31b9db
  19. 18 10月, 2018 1 次提交
    • J
      kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD · c4f55198
      Jim Mattson 提交于
      This is a per-VM capability which can be enabled by userspace so that
      the faulting linear address will be included with the information
      about a pending #PF in L2, and the "new DR6 bits" will be included
      with the information about a pending #DB in L2. With this capability
      enabled, the L1 hypervisor can now intercept #PF before CR2 is
      modified. Under VMX, the L1 hypervisor can now intercept #DB before
      DR6 and DR7 are modified.
      
      When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
      generally provide an appropriate payload when injecting a #PF or #DB
      exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
      checkpoints, this payload is not required.
      
      Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
      exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
      region while advanced debugging of RTM transactional regions was
      enabled. This is the reverse of DR6.RTM, which is cleared in this
      scenario.
      
      This capability also enables exception.pending in struct
      kvm_vcpu_events, which allows userspace to distinguish between pending
      and injected exceptions.
      Reported-by: NJim Mattson <jmattson@google.com>
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c4f55198
  20. 17 10月, 2018 3 次提交
    • V
      KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability · 57b119da
      Vitaly Kuznetsov 提交于
      Enlightened VMCS is opt-in. The current version does not contain all
      fields supported by nested VMX so we must not advertise the
      corresponding VMX features if enlightened VMCS is enabled.
      
      Userspace is given the enlightened VMCS version supported by KVM as
      part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
      be advertised to the nested hypervisor, currently done via a cpuid
      leaf for Hyper-V.
      Suggested-by: NLadi Prosek <lprosek@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      57b119da
    • P
      kvm/x86 : add coalesced pio support · 0804c849
      Peng Hao 提交于
      Coalesced pio is based on coalesced mmio and can be used for some port
      like rtc port, pci-host config port and so on.
      
      Specially in case of rtc as coalesced pio, some versions of windows guest
      access rtc frequently because of rtc as system tick. guest access rtc like
      this: write register index to 0x70, then write or read data from 0x71.
      writing 0x70 port is just as index and do nothing else. So we can use
      coalesced pio to handle this scene to reduce VM-EXIT time.
      
      When starting and closing a virtual machine, it will access pci-host config
      port frequently. So setting these port as coalesced pio can reduce startup
      and shutdown time.
      
      without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
      using perf tools: (guest OS : windows 7 64bit)
      IO Port Access  Samples Samples%  Time%  Min Time  Max Time  Avg time
      0x70:POUT        86     30.99%    74.59%   9us      29us    10.75us (+- 3.41%)
      0xcf8:POUT     1119     2.60%     2.12%   2.79us    56.83us 3.41us (+- 2.23%)
      
      with my patch
      IO Port Access  Samples Samples%  Time%   Min Time  Max Time   Avg time
      0x70:POUT       106    32.02%    29.47%    0us      10us     1.57us (+- 7.38%)
      0xcf8:POUT      1065    1.67%     0.28%   0.41us    65.44us   0.66us (+- 10.55%)
      Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0804c849
    • V
      KVM: x86: hyperv: implement PV IPI send hypercalls · 214ff83d
      Vitaly Kuznetsov 提交于
      Using hypercall for sending IPIs is faster because this allows to specify
      any number of vCPUs (even > 64 with sparse CPU set), the whole procedure
      will take only one VMEXIT.
      
      Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
      hypercall can't be 'fast' (passing parameters through registers) but
      apparently this is not true, Windows always uses it as 'fast' so we need
      to support that.
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      214ff83d
  21. 09 10月, 2018 2 次提交
  22. 03 10月, 2018 1 次提交
    • S
      kvm: arm64: Allow tuning the physical address size for VM · 233a7cb2
      Suzuki K Poulose 提交于
      Allow specifying the physical address size limit for a new
      VM via the kvm_type argument for the KVM_CREATE_VM ioctl. This
      allows us to finalise the stage2 page table as early as possible
      and hence perform the right checks on the memory slots
      without complication. The size is encoded as Log2(PA_Size) in
      bits[7:0] of the type field. For backward compatibility the
      value 0 is reserved and implies 40bits. Also, lift the limit
      of the IPA to host limit and allow lower IPA sizes (e.g, 32).
      
      The userspace could check the extension KVM_CAP_ARM_VM_IPA_SIZE
      for the availability of this feature. The cap check returns the
      maximum limit for the physical address shift supported by the host.
      
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <cdall@kernel.org>
      Cc: Peter Maydell <peter.maydell@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      233a7cb2
  23. 20 9月, 2018 1 次提交
    • D
      KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Drew Schmitt 提交于
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
      to reads of MSR_PLATFORM_INFO.
      
      Disabling access to reads of this MSR gives userspace the control to "expose"
      this platform-dependent information to guests in a clear way. As it exists
      today, guests that read this MSR would get unpopulated information if userspace
      hadn't already set it (and prior to this patch series, only the CPUID faulting
      information could have been populated). This existing interface could be
      confusing if guests don't handle the potential for incorrect/incomplete
      information gracefully (e.g. zero reported for base frequency).
      Signed-off-by: NDrew Schmitt <dasch@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6fbbde9a
  24. 06 8月, 2018 1 次提交
    • J
      kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59
      Jim Mattson 提交于
      For nested virtualization L0 KVM is managing a bit of state for L2 guests,
      this state can not be captured through the currently available IOCTLs. In
      fact the state captured through all of these IOCTLs is usually a mix of L1
      and L2 state. It is also dependent on whether the L2 guest was running at
      the moment when the process was interrupted to save its state.
      
      With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
      and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
      that is in VMX operation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [karahmed@ - rename structs and functions and make them ready for AMD and
                   address previous comments.
                 - handle nested.smm state.
                 - rebase & a bit of refactoring.
                 - Merge 7/8 and 8/8 into one patch. ]
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8fcc4b59
  25. 31 7月, 2018 1 次提交
    • J
      KVM: s390: Add huge page enablement control · a4499382
      Janosch Frank 提交于
      General KVM huge page support on s390 has to be enabled via the
      kvm.hpage module parameter. Either nested or hpage can be enabled, as
      we currently do not support vSIE for huge backed guests. Once the vSIE
      support is added we will either drop the parameter or enable it as
      default.
      
      For a guest the feature has to be enabled through the new
      KVM_CAP_S390_HPAGE_1M capability and the hpage module
      parameter. Enabling it means that cmm can't be enabled for the vm and
      disables pfmf and storage key interpretation.
      
      This is due to the fact that in some cases, in upcoming patches, we
      have to split huge pages in the guest mapping to be able to set more
      granular memory protection on 4k pages. These split pages have fake
      page tables that are not visible to the Linux memory management which
      subsequently will not manage its PGSTEs, while the SIE will. Disabling
      these features lets us manage PGSTE data in a consistent matter and
      solve that problem.
      Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      a4499382
  26. 21 7月, 2018 1 次提交
  27. 12 6月, 2018 1 次提交
  28. 26 5月, 2018 1 次提交
  29. 28 4月, 2018 1 次提交