1. 31 7月, 2018 1 次提交
    • J
      KVM: s390: Add huge page enablement control · a4499382
      Janosch Frank 提交于
      General KVM huge page support on s390 has to be enabled via the
      kvm.hpage module parameter. Either nested or hpage can be enabled, as
      we currently do not support vSIE for huge backed guests. Once the vSIE
      support is added we will either drop the parameter or enable it as
      default.
      
      For a guest the feature has to be enabled through the new
      KVM_CAP_S390_HPAGE_1M capability and the hpage module
      parameter. Enabling it means that cmm can't be enabled for the vm and
      disables pfmf and storage key interpretation.
      
      This is due to the fact that in some cases, in upcoming patches, we
      have to split huge pages in the guest mapping to be able to set more
      granular memory protection on 4k pages. These split pages have fake
      page tables that are not visible to the Linux memory management which
      subsequently will not manage its PGSTEs, while the SIE will. Disabling
      these features lets us manage PGSTE data in a consistent matter and
      solve that problem.
      Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      a4499382
  2. 22 6月, 2018 1 次提交
  3. 02 6月, 2018 2 次提交
  4. 26 5月, 2018 3 次提交
  5. 25 5月, 2018 1 次提交
  6. 18 5月, 2018 1 次提交
    • M
      kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      Michael S. Tsirkin 提交于
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      Guest doesn't really care whether it's the only task running on a host
      CPU as long as it's not preempted.
      
      And there are more reasons for Guest to be preempted than host CPU
      sharing, for example, with memory overcommit it can get preempted on a
      memory access, post copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME which seems to better
      match what guests expect.
      
      Also, the flag most be set on all vCPUs - current guests assume this.
      Note so in the documentation.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      633711e8
  7. 20 4月, 2018 1 次提交
    • M
      arm/arm64: KVM: Add PSCI version selection API · 85bd0ba1
      Marc Zyngier 提交于
      Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
      or 1.0 to a guest, defaulting to the latest version of the PSCI
      implementation that is compatible with the requested version. This is
      no different from doing a firmware upgrade on KVM.
      
      But in order to give a chance to hypothetical badly implemented guests
      that would have a fit by discovering something other than PSCI 0.2,
      let's provide a new API that allows userspace to pick one particular
      version of the API.
      
      This is implemented as a new class of "firmware" registers, where
      we expose the PSCI version. This allows the PSCI version to be
      save/restored as part of a guest migration, and also set to
      any supported version if the guest requires it.
      
      Cc: stable@vger.kernel.org #4.16
      Reviewed-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85bd0ba1
  8. 29 3月, 2018 1 次提交
  9. 17 3月, 2018 2 次提交
  10. 15 3月, 2018 1 次提交
  11. 07 3月, 2018 4 次提交
  12. 02 3月, 2018 1 次提交
  13. 24 2月, 2018 1 次提交
  14. 19 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds · 3214d01f
      Paul Mackerras 提交于
      This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace
      information about the underlying machine's level of vulnerability
      to the recently announced vulnerabilities CVE-2017-5715,
      CVE-2017-5753 and CVE-2017-5754, and whether the machine provides
      instructions to assist software to work around the vulnerabilities.
      
      The ioctl returns two u64 words describing characteristics of the
      CPU and required software behaviour respectively, plus two mask
      words which indicate which bits have been filled in by the kernel,
      for extensibility.  The bit definitions are the same as for the
      new H_GET_CPU_CHARACTERISTICS hypercall.
      
      There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which
      indicates whether the new ioctl is available.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3214d01f
  15. 16 1月, 2018 2 次提交
    • W
      KVM: X86: use paravirtualized TLB Shootdown · 858a43aa
      Wanpeng Li 提交于
      Remote TLB flush does a busy wait which is fine in bare-metal
      scenario. But with-in the guest, the vcpus might have been pre-empted or
      blocked. In this scenario, the initator vcpu would end up busy-waiting
      for a long amount of time; it also consumes CPU unnecessarily to wake
      up the target of the shootdown.
      
      This patch set adds support for KVM's new paravirtualized TLB flush;
      remote TLB flush does not wait for vcpus that are sleeping, instead
      KVM will flush the TLB as soon as the vCPU starts running again.
      
      The improvement is clearly visible when the host is overcommitted; in this
      case, the PV TLB flush (in addition to avoiding the wait on the main CPU)
      prevents preempted vCPUs from stealing precious execution time from the
      running ones.
      
      Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads,
      so 64 pCPUs, and each VM is 64 vCPUs.
      
      ebizzy -M
                    vanilla    optimized     boost
      1VM            46799       48670         4%
      2VM            23962       42691        78%
      3VM            16152       37539       132%
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      858a43aa
    • P
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras 提交于
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5855564c
  16. 02 1月, 2018 1 次提交
  17. 06 12月, 2017 1 次提交
  18. 05 12月, 2017 4 次提交
    • B
      KVM: Define SEV key management command id · dc48bae0
      Brijesh Singh 提交于
      Define Secure Encrypted Virtualization (SEV) key management command id
      and structure. The command definition is available in SEV KM spec
      0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf)
      and Documentation/virtual/kvm/amd-memory-encryption.txt.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      dc48bae0
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Brijesh Singh 提交于
      If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to
      register/unregister the guest memory regions which may contain the encrypted
      data (e.g guest RAM, PCI BAR, SMRAM etc).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      69eaedee
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06
      Brijesh Singh 提交于
      If the hardware supports memory encryption then the
      KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform
      specific memory encryption commands.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      5acc5c06
    • B
      Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV) · b38defdb
      Brijesh Singh 提交于
      Create a Documentation entry to describe the AMD Secure Encrypted
      Virtualization (SEV) feature.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      b38defdb
  19. 10 11月, 2017 1 次提交
  20. 09 11月, 2017 2 次提交
  21. 06 11月, 2017 1 次提交
    • E
      KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET · ae204f80
      Eric Auger 提交于
      At the moment, the in-kernel emulated ITS is not properly reset.
      On guest restart/reset some registers keep their old values and
      internal structures like device, ITE, and collection lists are not
      freed.
      
      This may lead to various bugs. Among them, we can have incorrect state
      backup or failure when saving the ITS state at early guest boot stage.
      
      This patch documents a new attribute, KVM_DEV_ARM_ITS_CTRL_RESET in
      the KVM_DEV_ARM_VGIC_GRP_CTRL group.
      
      Upon this action, we can reset registers and especially those
      pointing to tables previously allocated by the guest and free
      the internal data structures storing the list of devices, collections
      and lpis.
      
      The usual approach for device reset of having userspace write
      the reset values of the registers to the kernel via the register
      read/write APIs doesn't work for the ITS because it has some
      internal state (caches) which is not exposed as registers,
      and there is no register interface for "drop cached data without
      writing it back to RAM". So we need a KVM API which mimics the
      hardware's reset line, to provide the equivalent behaviour to
      a "pull the power cord out of the back of the machine" reset.
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NEric Auger <eric.auger@redhat.com>
      Reported-by: Nwanghaibin <wanghaibin.wang@huawei.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      ae204f80
  22. 12 10月, 2017 2 次提交
  23. 05 9月, 2017 1 次提交
    • C
      KVM: arm/arm64: Support uaccess of GICC_APRn · 9b87e7a8
      Christoffer Dall 提交于
      When migrating guests around we need to know the active priorities to
      ensure functional virtual interrupt prioritization by the GIC.
      
      This commit clarifies the API and how active priorities of interrupts in
      different groups are represented, and implements the accessor functions
      for the uaccess register range.
      
      We live with a slight layering violation in accessing GICv3 data
      structures from vgic-mmio-v2.c, because anything else just adds too much
      complexity for us to deal with (it's not like there's a benefit
      elsewhere in the code of an intermediate representation as is the case
      with the VMCR).  We accept this, because while doing v3 processing from
      a file named something-v2.c can look strange at first, this really is
      specific to dealing with the user space interface for something that
      looks like a GICv2.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <cdall@linaro.org>
      9b87e7a8
  24. 29 8月, 2017 1 次提交
  25. 14 7月, 2017 2 次提交
    • R
      kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87
      Roman Kagan 提交于
      Hyper-V identifies vCPUs by Virtual Processor Index, which can be
      queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
      sequential number which can't exceed the maximum number of vCPUs per VM.
      APIC ids can be sparse and thus aren't a valid replacement for VP
      indices.
      
      Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
      it predictable and persistent across VM migrations, the userspace has to
      control the value of VP_INDEX.
      
      This patch achieves that, by storing vp_index explicitly on vcpu, and
      allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
      compatibility it's initialized to KVM vcpu index.  Also a few variables
      are renamed to make clear distinction betweed this Hyper-V vp_index and
      KVM vcpu_id (== APIC id).  Besides, a new capability,
      KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
      attempting msr writes where unsupported, to avoid spamming error logs.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d3457c87
    • W
      KVM: async_pf: Let guest support delivery of async_pf from guest mode · 52a5c155
      Wanpeng Li 提交于
      Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
      async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
      kvm_can_do_async_pf returns 0 if in guest mode.
      
      This is similar to what svm.c wanted to do all along, but it is only
      enabled for Linux as L1 hypervisor.  Foreign hypervisors must never
      receive async page faults as vmexits, because they'd probably be very
      confused about that.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      52a5c155
  26. 13 7月, 2017 1 次提交
    • R
      kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Roman Kagan 提交于
      There is a flaw in the Hyper-V SynIC implementation in KVM: when message
      page or event flags page is enabled by setting the corresponding msr,
      KVM zeroes it out.  This is problematic because on migration the
      corresponding MSRs are loaded on the destination, so the content of
      those pages is lost.
      
      This went unnoticed so far because the only user of those pages was
      in-KVM hyperv synic timers, which could continue working despite that
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      efc479e6