1. 06 12月, 2017 1 次提交
  2. 09 11月, 2017 1 次提交
  3. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX license identifier to uapi header files with no license · 6f52b16c
      Greg Kroah-Hartman 提交于
      Many user space API headers are missing licensing information, which
      makes it hard for compliance tools to determine the correct license.
      
      By default are files without license information under the default
      license of the kernel, which is GPLV2.  Marking them GPLV2 would exclude
      them from being included in non GPLV2 code, which is obviously not
      intended. The user space API headers fall under the syscall exception
      which is in the kernels COPYING file:
      
         NOTE! This copyright does *not* cover user programs that use kernel
         services by normal system calls - this is merely considered normal use
         of the kernel, and does *not* fall under the heading of "derived work".
      
      otherwise syscall usage would not be possible.
      
      Update the files which contain no license information with an SPDX
      license identifier.  The chosen identifier is 'GPL-2.0 WITH
      Linux-syscall-note' which is the officially assigned identifier for the
      Linux syscall exception.  SPDX license identifiers are a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.  See the previous patch in this series for the
      methodology of how this patch was researched.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f52b16c
  4. 31 8月, 2017 1 次提交
  5. 14 7月, 2017 1 次提交
    • R
      kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87
      Roman Kagan 提交于
      Hyper-V identifies vCPUs by Virtual Processor Index, which can be
      queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
      sequential number which can't exceed the maximum number of vCPUs per VM.
      APIC ids can be sparse and thus aren't a valid replacement for VP
      indices.
      
      Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
      it predictable and persistent across VM migrations, the userspace has to
      control the value of VP_INDEX.
      
      This patch achieves that, by storing vp_index explicitly on vcpu, and
      allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
      compatibility it's initialized to KVM vcpu index.  Also a few variables
      are renamed to make clear distinction betweed this Hyper-V vp_index and
      KVM vcpu_id (== APIC id).  Besides, a new capability,
      KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
      attempting msr writes where unsupported, to avoid spamming error logs.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d3457c87
  6. 13 7月, 2017 2 次提交
    • R
      kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Roman Kagan 提交于
      There is a flaw in the Hyper-V SynIC implementation in KVM: when message
      page or event flags page is enabled by setting the corresponding msr,
      KVM zeroes it out.  This is problematic because on migration the
      corresponding MSRs are loaded on the destination, so the content of
      those pages is lost.
      
      This went unnoticed so far because the only user of those pages was
      in-KVM hyperv synic timers, which could continue working despite that
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      efc479e6
    • G
      KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition · 949c0336
      Gleb Fotengauer-Malinovskiy 提交于
      In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
      kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
      it also writes back a return value (which constitutes _IOC_READ) making
      this an _IOWR ioctl instead of _IOW.
      
      Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
      Signed-off-by: NGleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      949c0336
  7. 22 6月, 2017 2 次提交
  8. 21 6月, 2017 1 次提交
    • A
      KVM: PPC: Book3S HV: Add new capability to control MCE behaviour · 134764ed
      Aravinda Prasad 提交于
      This introduces a new KVM capability to control how KVM behaves
      on machine check exception (MCE) in HV KVM guests.
      
      If this capability has not been enabled, KVM redirects machine check
      exceptions to guest's 0x200 vector, if the address in error belongs to
      the guest. With this capability enabled, KVM will cause a guest exit
      with the exit reason indicating an NMI.
      
      The new capability is required to avoid problems if a new kernel/KVM
      is used with an old QEMU, running a guest that doesn't issue
      "ibm,nmi-register".  As old QEMU does not understand the NMI exit
      type, it treats it as a fatal error.  However, the guest could have
      handled the machine check error if the exception was delivered to
      guest's 0x200 interrupt vector instead of NMI exit in case of old
      QEMU.
      
      [paulus@ozlabs.org - Reworded the commit message to be clearer,
       enable only on HV KVM.]
      Signed-off-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      134764ed
  9. 21 4月, 2017 1 次提交
    • M
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin 提交于
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: N"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      668fffa3
  10. 20 4月, 2017 2 次提交
    • A
      KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Alexey Kardashevskiy 提交于
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
      without passing them to user space which saves time on switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in the real mode, if failed
      it passes the request to the virtual mode to complete the operation.
      If it a virtual mode handler fails, the request is passed to
      the user space; this is not expected to happen though.
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table unexpected reason, we just
      clear it and move on as there is nothing really we can do about it -
      for example, if we hot plug a VFIO device to a guest, existing TCE tables
      will be mirrored automatically to the hardware and there is no interface
      to report to the guest about possible failures.
      
      This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
      the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look up for it in real mode.
      
      This does not implement the UNSET counterpart as there is no use for it -
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
      space.
      
      This adds real mode version of WARN_ON_ONCE() as the generic version
      causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
      returns in the code, this also adds a check for already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      121f80ba
    • A
      KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number · 4898d3f4
      Alexey Kardashevskiy 提交于
      This adds a capability number for in-kernel support for VFIO on
      SPAPR platform.
      
      The capability will tell the user space whether in-kernel handlers of
      H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
      must not attempt allocating a TCE table in the host kernel via
      the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
      will not be passed to the user space which is desired action in
      the situation like that.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4898d3f4
  11. 09 4月, 2017 1 次提交
  12. 07 4月, 2017 1 次提交
  13. 28 3月, 2017 2 次提交
    • J
      KVM: MIPS: Add 64BIT capability · 578fd61d
      James Hogan 提交于
      Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS
      guests are available and supported. In this case it should still be
      possible to run 32-bit guest code. If not available it won't be possible
      to run 64-bit guest code and the instructions may not be available, or
      the kernel may not support full context switching of 64-bit registers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      578fd61d
    • J
      KVM: MIPS: Add VZ & TE capabilities · a8a3c426
      James Hogan 提交于
      Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
      to allow MIPS KVM to support VZ without confusing old users (which
      expect the trap & emulate implementation), define and start checking
      KVM_CREATE_VM type codes.
      
      The codes available are:
      
       - KVM_VM_MIPS_TE = 0
      
         This is the current value expected from the user, and will create a
         VM using trap & emulate in user mode, confined to the user mode
         address space. This may in future become unavailable if the kernel is
         only configured to support VZ, in which case the EINVAL error will be
         returned and KVM_CAP_MIPS_TE won't be available even though
         KVM_CAP_MIPS_VZ is.
      
       - KVM_VM_MIPS_VZ = 1
      
         This can be provided when the KVM_CAP_MIPS_VZ capability is available
         to create a VM using VZ, with a fully virtualized guest virtual
         address space. If VZ support is unavailable in the kernel, the EINVAL
         error will be returned (although old kernels without the
         KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
         emulate VM).
      
      This is designed to allow the desired implementation (T&E vs VZ) to be
      potentially chosen at runtime rather than being fixed in the kernel
      configuration.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      a8a3c426
  14. 23 3月, 2017 1 次提交
  15. 17 2月, 2017 1 次提交
    • P
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini 提交于
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      460df4c1
  16. 31 1月, 2017 2 次提交
    • D
      KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers · ef1ead0c
      David Gibson 提交于
      This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
      advertise whether KVM is capable of handling the PAPR extensions for
      resizing the hashed page table during guest runtime.  It also adds
      definitions for two new VM ioctl()s to implement this extension, and
      documentation of the same.
      
      Note that, HPT resizing is already possible with KVM PR without kernel
      modification, since the HPT is managed within userspace (qemu).  The
      capability defined here will only be set where an in-kernel implementation
      of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
      resize implementation can be used, it's necessary to check
      KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
      KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
      resizing with KVM PR on such kernels, it will need a workaround.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef1ead0c
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
  17. 24 11月, 2016 1 次提交
  18. 20 11月, 2016 1 次提交
  19. 02 8月, 2016 1 次提交
  20. 23 7月, 2016 1 次提交
  21. 19 7月, 2016 2 次提交
  22. 18 7月, 2016 1 次提交
  23. 14 7月, 2016 2 次提交
    • R
      KVM: x86: add a flag to disable KVM x2apic broadcast quirk · c519265f
      Radim Krčmář 提交于
      Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
      KVM_CAP_X2APIC_API.
      
      The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
      The enableable capability is needed in order to support standard x2APIC and
      remain backward compatible.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      [Expand kvm_apic_mda comment. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c519265f
    • R
      KVM: x86: add KVM_CAP_X2APIC_API · 37131313
      Radim Krčmář 提交于
      KVM_CAP_X2APIC_API is a capability for features related to x2APIC
      enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
      extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
      Both are needed to support x2APIC.
      
      The feature has to be enableable and disabled by default, because
      get/set ioctl shifted and truncated APIC ID to 8 bits by using a
      non-standard protocol inspired by xAPIC and the change is not
      backward-compatible.
      
      Changes to MSI addresses follow the format used by interrupt remapping
      unit.  The upper address word, that used to be 0, contains upper 24 bits
      of the LAPIC address in its upper 24 bits.  Lower 8 bits are reserved as
      0.  Using the upper address word is not backward-compatible either as we
      didn't check that userspace zeroed the word.  Reserved bits are still
      not explicitly checked, but non-zero data will affect LAPIC addresses,
      which will cause a bug.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      37131313
  24. 12 5月, 2016 1 次提交
    • G
      kvm: introduce KVM_MAX_VCPU_ID · 0b1b1dfd
      Greg Kurz 提交于
      The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
      also the upper limit for vCPU ids. This is okay for all archs except PowerPC
      which can have higher ids, depending on the cpu/core/thread topology. In the
      worst case (single threaded guest, host with 8 threads per core), it limits
      the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
      
      This patch separates the vCPU numbering from the total number of vCPUs, with
      the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
      plus one.
      
      The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
      before passing them to KVM_CREATE_VCPU.
      
      This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
      Other archs continue to return KVM_MAX_VCPUS instead.
      Suggested-by: NRadim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b1b1dfd
  25. 02 3月, 2016 2 次提交
  26. 01 3月, 2016 2 次提交
  27. 17 2月, 2016 1 次提交
    • A
      kvm/x86: Hyper-V VMBus hypercall userspace exit · 83326e43
      Andrey Smetanin 提交于
      The patch implements KVM_EXIT_HYPERV userspace exit
      functionality for Hyper-V VMBus hypercalls:
      HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.
      
      Changes v3:
      * use vcpu->arch.complete_userspace_io to setup hypercall
      result
      
      Changes v2:
      * use KVM_EXIT_HYPERV for hypercalls
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      83326e43
  28. 10 2月, 2016 2 次提交
  29. 07 1月, 2016 1 次提交
  30. 26 11月, 2015 1 次提交