1. 09 10月, 2018 2 次提交
  2. 20 9月, 2018 1 次提交
    • D
      KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Drew Schmitt 提交于
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
      to reads of MSR_PLATFORM_INFO.
      
      Disabling access to reads of this MSR gives userspace the control to "expose"
      this platform-dependent information to guests in a clear way. As it exists
      today, guests that read this MSR would get unpopulated information if userspace
      hadn't already set it (and prior to this patch series, only the CPUID faulting
      information could have been populated). This existing interface could be
      confusing if guests don't handle the potential for incorrect/incomplete
      information gracefully (e.g. zero reported for base frequency).
      Signed-off-by: NDrew Schmitt <dasch@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6fbbde9a
  3. 12 9月, 2018 1 次提交
  4. 22 8月, 2018 1 次提交
  5. 06 8月, 2018 2 次提交
    • W
      KVM: X86: Implement "send IPI" hypercall · 4180bf1b
      Wanpeng Li 提交于
      Using hypercall to send IPIs by one vmexit instead of one by one for
      xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster
      mode. Intel guest can enter x2apic cluster mode when interrupt remmaping
      is enabled in qemu, however, latest AMD EPYC still just supports xapic
      mode which can get great improvement by Exit-less IPIs. This patchset
      lets a guest send multicast IPIs, with at most 128 destinations per
      hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode.
      
      Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM
      is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141):
      
      x2apic cluster mode, vanilla
      
       Dry-run:                         0,            2392199 ns
       Self-IPI:                  6907514,           15027589 ns
       Normal IPI:              223910476,          251301666 ns
       Broadcast IPI:                   0,         9282161150 ns
       Broadcast lock:                  0,         8812934104 ns
      
      x2apic cluster mode, pv-ipi
      
       Dry-run:                         0,            2449341 ns
       Self-IPI:                  6720360,           15028732 ns
       Normal IPI:              228643307,          255708477 ns
       Broadcast IPI:                   0,         7572293590 ns  => 22% performance boost
       Broadcast lock:                  0,         8316124651 ns
      
      x2apic physical mode, vanilla
      
       Dry-run:                         0,            3135933 ns
       Self-IPI:                  8572670,           17901757 ns
       Normal IPI:              226444334,          255421709 ns
       Broadcast IPI:                   0,        19845070887 ns
       Broadcast lock:                  0,        19827383656 ns
      
      x2apic physical mode, pv-ipi
      
       Dry-run:                         0,            2446381 ns
       Self-IPI:                  6788217,           15021056 ns
       Normal IPI:              219454441,          249583458 ns
       Broadcast IPI:                   0,         7806540019 ns  => 154% performance boost
       Broadcast lock:                  0,         9143618799 ns
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4180bf1b
    • J
      kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59
      Jim Mattson 提交于
      For nested virtualization L0 KVM is managing a bit of state for L2 guests,
      this state can not be captured through the currently available IOCTLs. In
      fact the state captured through all of these IOCTLs is usually a mix of L1
      and L2 state. It is also dependent on whether the L2 guest was running at
      the moment when the process was interrupted to save its state.
      
      With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
      and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
      that is in VMX operation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [karahmed@ - rename structs and functions and make them ready for AMD and
                   address previous comments.
                 - handle nested.smm state.
                 - rebase & a bit of refactoring.
                 - Merge 7/8 and 8/8 into one patch. ]
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8fcc4b59
  6. 31 7月, 2018 1 次提交
    • J
      KVM: s390: Add huge page enablement control · a4499382
      Janosch Frank 提交于
      General KVM huge page support on s390 has to be enabled via the
      kvm.hpage module parameter. Either nested or hpage can be enabled, as
      we currently do not support vSIE for huge backed guests. Once the vSIE
      support is added we will either drop the parameter or enable it as
      default.
      
      For a guest the feature has to be enabled through the new
      KVM_CAP_S390_HPAGE_1M capability and the hpage module
      parameter. Enabling it means that cmm can't be enabled for the vm and
      disables pfmf and storage key interpretation.
      
      This is due to the fact that in some cases, in upcoming patches, we
      have to split huge pages in the guest mapping to be able to set more
      granular memory protection on 4k pages. These split pages have fake
      page tables that are not visible to the Linux memory management which
      subsequently will not manage its PGSTEs, while the SIE will. Disabling
      these features lets us manage PGSTE data in a consistent matter and
      solve that problem.
      Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      a4499382
  7. 21 7月, 2018 4 次提交
  8. 22 6月, 2018 1 次提交
  9. 02 6月, 2018 2 次提交
  10. 26 5月, 2018 3 次提交
  11. 25 5月, 2018 1 次提交
  12. 18 5月, 2018 1 次提交
    • M
      kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Michael S. Tsirkin 提交于
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      Guest doesn't really care whether it's the only task running on a host
      CPU as long as it's not preempted.
      
      And there are more reasons for Guest to be preempted than host CPU
      sharing, for example, with memory overcommit it can get preempted on a
      memory access, post copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME which seems to better
      match what guests expect.
      
      Also, the flag most be set on all vCPUs - current guests assume this.
      Note so in the documentation.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      633711e8
  13. 20 4月, 2018 1 次提交
    • M
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier 提交于
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85bd0ba1
  14. 29 3月, 2018 1 次提交
  15. 17 3月, 2018 2 次提交
  16. 15 3月, 2018 1 次提交
  17. 07 3月, 2018 4 次提交
  18. 02 3月, 2018 1 次提交
  19. 24 2月, 2018 1 次提交
  20. 19 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds · 3214d01f
      Paul Mackerras 提交于
      This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace
      information about the underlying machine's level of vulnerability
      to the recently announced vulnerabilities CVE-2017-5715,
      CVE-2017-5753 and CVE-2017-5754, and whether the machine provides
      instructions to assist software to work around the vulnerabilities.
      
      The ioctl returns two u64 words describing characteristics of the
      CPU and required software behaviour respectively, plus two mask
      words which indicate which bits have been filled in by the kernel,
      for extensibility.  The bit definitions are the same as for the
      new H_GET_CPU_CHARACTERISTICS hypercall.
      
      There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which
      indicates whether the new ioctl is available.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3214d01f
  21. 16 1月, 2018 2 次提交
    • W
      KVM: X86: use paravirtualized TLB Shootdown · 858a43aa
      Wanpeng Li 提交于
      Remote TLB flush does a busy wait which is fine in bare-metal
      scenario. But with-in the guest, the vcpus might have been pre-empted or
      blocked. In this scenario, the initator vcpu would end up busy-waiting
      for a long amount of time; it also consumes CPU unnecessarily to wake
      up the target of the shootdown.
      
      This patch set adds support for KVM's new paravirtualized TLB flush;
      remote TLB flush does not wait for vcpus that are sleeping, instead
      KVM will flush the TLB as soon as the vCPU starts running again.
      
      The improvement is clearly visible when the host is overcommitted; in this
      case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
      prevents preempted vCPUs from stealing precious execution time from the
      running ones.
      
      Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
      so 64 pCPUs, and each VM is 64 vCPUs.
      
      ebizzy -M
                    vanilla    optimized     boost
      1VM            46799       48670         4%
      2VM            23962       42691        78%
      3VM            16152       37539       132%
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      858a43aa
    • P
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras 提交于
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5855564c
  22. 02 1月, 2018 1 次提交
  23. 06 12月, 2017 1 次提交
  24. 05 12月, 2017 4 次提交
    • B
      KVM: Define SEV key management command id · dc48bae0
      Brijesh Singh 提交于
      Define Secure Encrypted Virtualization (SEV) key management command id
      and structure. The command definition is available in SEV KM spec
      0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf)
      and Documentation/virtual/kvm/amd-memory-encryption.txt.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      dc48bae0
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Brijesh Singh 提交于
      If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to
      register/unregister the guest memory regions which may contain the encrypted
      data (e.g guest RAM, PCI BAR, SMRAM etc).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      69eaedee
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06
      Brijesh Singh 提交于
      If the hardware supports memory encryption then the
      KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform
      specific memory encryption commands.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      5acc5c06
    • B
      Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV) · b38defdb
      Brijesh Singh 提交于
      Create a Documentation entry to describe the AMD Secure Encrypted
      Virtualization (SEV) feature.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      b38defdb