1. 09 10月, 2018 1 次提交
  2. 20 9月, 2018 1 次提交
    • D
      KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Drew Schmitt 提交于
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access
      to reads of MSR_PLATFORM_INFO.
      
      Disabling access to reads of this MSR gives userspace the control to "expose"
      this platform-dependent information to guests in a clear way. As it exists
      today, guests that read this MSR would get unpopulated information if userspace
      hadn't already set it (and prior to this patch series, only the CPUID faulting
      information could have been populated). This existing interface could be
      confusing if guests don't handle the potential for incorrect/incomplete
      information gracefully (e.g. zero reported for base frequency).
      Signed-off-by: NDrew Schmitt <dasch@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6fbbde9a
  3. 06 8月, 2018 1 次提交
    • J
      kvm: nVMX: Introduce KVM_CAP_NESTED_STATE · 8fcc4b59
      Jim Mattson 提交于
      For nested virtualization L0 KVM is managing a bit of state for L2 guests,
      this state can not be captured through the currently available IOCTLs. In
      fact the state captured through all of these IOCTLs is usually a mix of L1
      and L2 state. It is also dependent on whether the L2 guest was running at
      the moment when the process was interrupted to save its state.
      
      With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
      and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
      that is in VMX operation.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NJim Mattson <jmattson@google.com>
      [karahmed@ - rename structs and functions and make them ready for AMD and
                   address previous comments.
                 - handle nested.smm state.
                 - rebase & a bit of refactoring.
                 - Merge 7/8 and 8/8 into one patch. ]
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      8fcc4b59
  4. 31 7月, 2018 1 次提交
    • J
      KVM: s390: Add huge page enablement control · a4499382
      Janosch Frank 提交于
      General KVM huge page support on s390 has to be enabled via the
      kvm.hpage module parameter. Either nested or hpage can be enabled, as
      we currently do not support vSIE for huge backed guests. Once the vSIE
      support is added we will either drop the parameter or enable it as
      default.
      
      For a guest the feature has to be enabled through the new
      KVM_CAP_S390_HPAGE_1M capability and the hpage module
      parameter. Enabling it means that cmm can't be enabled for the vm and
      disables pfmf and storage key interpretation.
      
      This is due to the fact that in some cases, in upcoming patches, we
      have to split huge pages in the guest mapping to be able to set more
      granular memory protection on 4k pages. These split pages have fake
      page tables that are not visible to the Linux memory management which
      subsequently will not manage its PGSTEs, while the SIE will. Disabling
      these features lets us manage PGSTE data in a consistent matter and
      solve that problem.
      Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      a4499382
  5. 21 7月, 2018 1 次提交
  6. 12 6月, 2018 1 次提交
  7. 26 5月, 2018 1 次提交
  8. 28 4月, 2018 1 次提交
  9. 17 3月, 2018 1 次提交
    • W
      KVM: X86: Provide a capability to disable MWAIT intercepts · 4d5422ce
      Wanpeng Li 提交于
      Allowing a guest to execute MWAIT without interception enables a guest
      to put a (physical) CPU into a power saving state, where it takes
      longer to return from than what may be desired by the host.
      
      Don't give a guest that power over a host by default. (Especially,
      since nothing prevents a guest from using MWAIT even when it is not
      advertised via CPUID.)
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4d5422ce
  10. 07 3月, 2018 2 次提交
  11. 02 3月, 2018 1 次提交
  12. 21 1月, 2018 1 次提交
  13. 19 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds · 3214d01f
      Paul Mackerras 提交于
      This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace
      information about the underlying machine's level of vulnerability
      to the recently announced vulnerabilities CVE-2017-5715,
      CVE-2017-5753 and CVE-2017-5754, and whether the machine provides
      instructions to assist software to work around the vulnerabilities.
      
      The ioctl returns two u64 words describing characteristics of the
      CPU and required software behaviour respectively, plus two mask
      words which indicate which bits have been filled in by the kernel,
      for extensibility.  The bit definitions are the same as for the
      new H_GET_CPU_CHARACTERISTICS hypercall.
      
      There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which
      indicates whether the new ioctl is available.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      3214d01f
  14. 06 12月, 2017 1 次提交
  15. 05 12月, 2017 3 次提交
    • B
      KVM: Define SEV key management command id · dc48bae0
      Brijesh Singh 提交于
      Define Secure Encrypted Virtualization (SEV) key management command id
      and structure. The command definition is available in SEV KM spec
      0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf)
      and Documentation/virtual/kvm/amd-memory-encryption.txt.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      dc48bae0
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl · 69eaedee
      Brijesh Singh 提交于
      If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION
      and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to
      register/unregister the guest memory regions which may contain the encrypted
      data (e.g guest RAM, PCI BAR, SMRAM etc).
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Improvements-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      69eaedee
    • B
      KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl · 5acc5c06
      Brijesh Singh 提交于
      If the hardware supports memory encryption then the
      KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform
      specific memory encryption commands.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      5acc5c06
  16. 09 11月, 2017 1 次提交
  17. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  18. 31 8月, 2017 1 次提交
  19. 14 7月, 2017 1 次提交
    • R
      kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87
      Roman Kagan 提交于
      Hyper-V identifies vCPUs by Virtual Processor Index, which can be
      queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
      sequential number which can't exceed the maximum number of vCPUs per VM.
      APIC ids can be sparse and thus aren't a valid replacement for VP
      indices.
      
      Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
      it predictable and persistent across VM migrations, the userspace has to
      control the value of VP_INDEX.
      
      This patch achieves that, by storing vp_index explicitly on vcpu, and
      allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
      compatibility it's initialized to KVM vcpu index.  Also a few variables
      are renamed to make clear distinction betweed this Hyper-V vp_index and
      KVM vcpu_id (== APIC id).  Besides, a new capability,
      KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
      attempting msr writes where unsupported, to avoid spamming error logs.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d3457c87
  20. 13 7月, 2017 2 次提交
    • R
      kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Roman Kagan 提交于
      There is a flaw in the Hyper-V SynIC implementation in KVM: when message
      page or event flags page is enabled by setting the corresponding msr,
      KVM zeroes it out.  This is problematic because on migration the
      corresponding MSRs are loaded on the destination, so the content of
      those pages is lost.
      
      This went unnoticed so far because the only user of those pages was
      in-KVM hyperv synic timers, which could continue working despite that
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      efc479e6
    • G
      KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition · 949c0336
      Gleb Fotengauer-Malinovskiy 提交于
      In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
      kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
      it also writes back a return value (which constitutes _IOC_READ) making
      this an _IOWR ioctl instead of _IOW.
      
      Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
      Signed-off-by: NGleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      949c0336
  21. 22 6月, 2017 2 次提交
  22. 21 6月, 2017 1 次提交
    • A
      KVM: PPC: Book3S HV: Add new capability to control MCE behaviour · 134764ed
      Aravinda Prasad 提交于
      This introduces a new KVM capability to control how KVM behaves
      on machine check exception (MCE) in HV KVM guests.
      
      If this capability has not been enabled, KVM redirects machine check
      exceptions to guest's 0x200 vector, if the address in error belongs to
      the guest. With this capability enabled, KVM will cause a guest exit
      with the exit reason indicating an NMI.
      
      The new capability is required to avoid problems if a new kernel/KVM
      is used with an old QEMU, running a guest that doesn't issue
      "ibm,nmi-register".  As old QEMU does not understand the NMI exit
      type, it treats it as a fatal error.  However, the guest could have
      handled the machine check error if the exception was delivered to
      guest's 0x200 interrupt vector instead of NMI exit in case of old
      QEMU.
      
      [paulus@ozlabs.org - Reworded the commit message to be clearer,
       enable only on HV KVM.]
      Signed-off-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      134764ed
  23. 21 4月, 2017 1 次提交
    • M
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin 提交于
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: N"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      668fffa3
  24. 20 4月, 2017 2 次提交
    • A
      KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Alexey Kardashevskiy 提交于
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
      without passing them to user space which saves time on switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in the real mode, if failed
      it passes the request to the virtual mode to complete the operation.
      If it a virtual mode handler fails, the request is passed to
      the user space; this is not expected to happen though.
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table unexpected reason, we just
      clear it and move on as there is nothing really we can do about it -
      for example, if we hot plug a VFIO device to a guest, existing TCE tables
      will be mirrored automatically to the hardware and there is no interface
      to report to the guest about possible failures.
      
      This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
      the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look up for it in real mode.
      
      This does not implement the UNSET counterpart as there is no use for it -
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
      space.
      
      This adds real mode version of WARN_ON_ONCE() as the generic version
      causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
      returns in the code, this also adds a check for already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      121f80ba
    • A
      KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number · 4898d3f4
      Alexey Kardashevskiy 提交于
      This adds a capability number for in-kernel support for VFIO on
      SPAPR platform.
      
      The capability will tell the user space whether in-kernel handlers of
      H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
      must not attempt allocating a TCE table in the host kernel via
      the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
      will not be passed to the user space which is desired action in
      the situation like that.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4898d3f4
  25. 09 4月, 2017 1 次提交
  26. 07 4月, 2017 1 次提交
  27. 28 3月, 2017 2 次提交
    • J
      KVM: MIPS: Add 64BIT capability · 578fd61d
      James Hogan 提交于
      Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS
      guests are available and supported. In this case it should still be
      possible to run 32-bit guest code. If not available it won't be possible
      to run 64-bit guest code and the instructions may not be available, or
      the kernel may not support full context switching of 64-bit registers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      578fd61d
    • J
      KVM: MIPS: Add VZ & TE capabilities · a8a3c426
      James Hogan 提交于
      Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
      to allow MIPS KVM to support VZ without confusing old users (which
      expect the trap & emulate implementation), define and start checking
      KVM_CREATE_VM type codes.
      
      The codes available are:
      
       - KVM_VM_MIPS_TE = 0
      
         This is the current value expected from the user, and will create a
         VM using trap & emulate in user mode, confined to the user mode
         address space. This may in future become unavailable if the kernel is
         only configured to support VZ, in which case the EINVAL error will be
         returned and KVM_CAP_MIPS_TE won't be available even though
         KVM_CAP_MIPS_VZ is.
      
       - KVM_VM_MIPS_VZ = 1
      
         This can be provided when the KVM_CAP_MIPS_VZ capability is available
         to create a VM using VZ, with a fully virtualized guest virtual
         address space. If VZ support is unavailable in the kernel, the EINVAL
         error will be returned (although old kernels without the
         KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
         emulate VM).
      
      This is designed to allow the desired implementation (T&E vs VZ) to be
      potentially chosen at runtime rather than being fixed in the kernel
      configuration.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      a8a3c426
  28. 23 3月, 2017 1 次提交
  29. 17 2月, 2017 1 次提交
    • P
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini 提交于
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      460df4c1
  30. 31 1月, 2017 2 次提交
    • D
      KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers · ef1ead0c
      David Gibson 提交于
      This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
      advertise whether KVM is capable of handling the PAPR extensions for
      resizing the hashed page table during guest runtime.  It also adds
      definitions for two new VM ioctl()s to implement this extension, and
      documentation of the same.
      
      Note that, HPT resizing is already possible with KVM PR without kernel
      modification, since the HPT is managed within userspace (qemu).  The
      capability defined here will only be set where an in-kernel implementation
      of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
      resize implementation can be used, it's necessary to check
      KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
      KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
      resizing with KVM PR on such kernels, it will need a workaround.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef1ead0c
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
  31. 24 11月, 2016 1 次提交
  32. 20 11月, 2016 1 次提交