1. 17 3月, 2018 1 次提交
    • W
      KVM: X86: Provide a capability to disable MWAIT intercepts · 4d5422ce
      Wanpeng Li 提交于
      Allowing a guest to execute MWAIT without interception enables a guest
      to put a (physical) CPU into a power saving state, where it takes
      longer to return from than what may be desired by the host.
      
      Don't give a guest that power over a host by default. (Especially,
      since nothing prevents a guest from using MWAIT even when it is not
      advertised via CPUID.)
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4d5422ce
  2. 15 2月, 2018 1 次提交
    • I
      tools/headers: Synchronize kernel ABI headers, v4.16-rc1 · f091f1d6
      Ingo Molnar 提交于
      Sync the following tooling headers with the latest kernel version:
      
        tools/arch/powerpc/include/uapi/asm/kvm.h
        tools/arch/x86/include/asm/cpufeatures.h
        tools/include/uapi/drm/i915_drm.h
        tools/include/uapi/linux/if_link.h
        tools/include/uapi/linux/kvm.h
      
      All the changes are new ABI additions which don't impact their use
      in existing tooling.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f091f1d6
  3. 03 2月, 2018 1 次提交
    • A
      tools headers: Sync {tools/,}arch/powerpc/include/uapi/asm/kvm.h · 1b8f5160
      Arnaldo Carvalho de Melo 提交于
      The changes in the 3214d01f ("KVM: PPC: Book3S: Provide information
      about hardware/firmware CVE workarounds") commit right now will not
      produce any change in the tools, but that is because we still need to
      improve tools/perf/trace/beauty/kvm_ioctl.sh to build per arch string
      tables, so that we avoid assigning multiple times to the same command
      string entry, i.e. multiple defines, for different arches, have the same
      value, causing this:
      
        In file included from trace/beauty/ioctl.c:82:0:
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c: In function ‘ioctl__scnprintf_kvm_cmd’:
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:76:11: error: initialized field overwritten [-Werror=override-init]
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:88:11: note: (near initialization for ‘kvm_ioctl_cmds[165]’)
        /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c:90:11: error: initialized field overwritten [-Werror=override-init]
          [0xa6] = "PPC_GET_SMMU_INFO",
                   ^~~~~~~~~~~~~~~~~~~
      
      So the onlye effect of updating the tools/ copy of ppc's kvm.h header
      is to silence these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/kvm.h' differs from latest version at 'arch/powerpc/include/uapi/asm/kvm.h'
      
      At some point we should do what we did for the errno tables and create
      per-arch string translation tables for the KVM ioctl commands for the
      architectures supporting KVM, such as s/390, PowerPC, x86_64 and ARM.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-jmcf78tqiudgn46zqfw2tgt2@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1b8f5160
  4. 15 12月, 2017 1 次提交
    • I
      tools/headers: Synchronize kernel <-> tooling headers · 643e345c
      Ingo Molnar 提交于
      Two kernel headers got modified recently, which are used by tooling as well:
      
       tools/include/uapi/linux/kvm.h
       arch/x86/include/asm/cpufeatures.h
      
      None of those changes have an effect on tooling, so do a plain copy.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      643e345c
  5. 29 11月, 2017 1 次提交
  6. 04 11月, 2017 1 次提交
    • I
      tools/headers: Synchronize kernel ABI headers · fb7df12d
      Ingo Molnar 提交于
      After the SPDX license tags were added a number of tooling headers got out of
      sync with their kernel variants, generating lots of build warnings.
      
      Sync them:
      
       - tools/arch/x86/include/asm/disabled-features.h,
         tools/arch/x86/include/asm/required-features.h,
         tools/include/linux/hash.h:
      
           Remove the SPDX tag where the kernel version does not have it.
      
       - tools/include/asm-generic/bitops/__fls.h,
         tools/include/asm-generic/bitops/arch_hweight.h,
         tools/include/asm-generic/bitops/const_hweight.h,
         tools/include/asm-generic/bitops/fls.h,
         tools/include/asm-generic/bitops/fls64.h,
         tools/include/uapi/asm-generic/ioctls.h,
         tools/include/uapi/asm-generic/mman-common.h,
         tools/include/uapi/sound/asound.h,
         tools/include/uapi/linux/kvm.h,
         tools/include/uapi/linux/perf_event.h,
         tools/include/uapi/linux/sched.h,
         tools/include/uapi/linux/vhost.h,
         tools/include/uapi/sound/asound.h:
      
           Add the SPDX tag of the respective kernel header.
      
       - tools/include/uapi/linux/bpf_common.h,
         tools/include/uapi/linux/fcntl.h,
         tools/include/uapi/linux/hw_breakpoint.h,
         tools/include/uapi/linux/mman.h,
         tools/include/uapi/linux/stat.h,
      
           Change the tag to the kernel header version:
      
             -/* SPDX-License-Identifier: GPL-2.0 */
             +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
      
      Also sync other header details:
      
       - include/uapi/sound/asound.h:
      
           Fix pointless end of line whitespace noise the header grew in this cycle.
      
       - tools/arch/x86/lib/memcpy_64.S:
      
           Sync the code and add tools/include/asm/export.h with dummy wrappers
           to support building the kernel side code in a tooling header environment.
      
       - tools/include/uapi/asm-generic/mman.h,
         tools/include/uapi/linux/bpf.h:
      
           Sync other details that don't impact tooling's use of the ABIs.
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fb7df12d
  7. 25 9月, 2017 1 次提交
    • I
      tools include: Sync kernel ABI headers with tooling headers · 549a3976
      Ingo Molnar 提交于
      Time for a sync with ABI/uapi headers with the upcoming v4.14 kernel.
      
      None of the ABI changes require any source code level changes to our
      existing in-kernel tooling code:
      
        - tools/arch/s390/include/uapi/asm/kvm.h:
      
            New KVM_S390_VM_TOD_EXT ABI, not used by in-kernel tooling.
      
        - tools/arch/x86/include/asm/cpufeatures.h:
          tools/arch/x86/include/asm/disabled-features.h:
      
            New PCID, SME and VGIF x86 CPU feature bits defined.
      
        - tools/include/asm-generic/hugetlb_encode.h:
          tools/include/uapi/asm-generic/mman-common.h:
          tools/include/uapi/linux/mman.h:
      
            Two new madvise() flags, plus a hugetlb system call mmap flags
            restructuring/extension changes.
      
        - tools/include/uapi/drm/drm.h:
          tools/include/uapi/drm/i915_drm.h:
      
            New drm_syncobj_create flags definitions, new drm_syncobj_wait
            and drm_syncobj_array ABIs. DRM_I915_PERF_* calls and a new
            I915_PARAM_HAS_EXEC_FENCE_ARRAY ABI for the Intel driver.
      
        - tools/include/uapi/linux/bpf.h:
      
            New bpf_sock fields (::mark and ::priority), new XDP_REDIRECT
            action, new kvm_ppc_smmu_info fields (::data_keys, instr_keys)
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/20170913073823.lxmi4c7ejqlfabjx@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      549a3976
  8. 02 8月, 2017 1 次提交
  9. 14 7月, 2017 1 次提交
    • R
      kvm: x86: hyperv: make VP_INDEX managed by userspace · d3457c87
      Roman Kagan 提交于
      Hyper-V identifies vCPUs by Virtual Processor Index, which can be
      queried via HV_X64_MSR_VP_INDEX msr.  It is defined by the spec as a
      sequential number which can't exceed the maximum number of vCPUs per VM.
      APIC ids can be sparse and thus aren't a valid replacement for VP
      indices.
      
      Current KVM uses its internal vcpu index as VP_INDEX.  However, to make
      it predictable and persistent across VM migrations, the userspace has to
      control the value of VP_INDEX.
      
      This patch achieves that, by storing vp_index explicitly on vcpu, and
      allowing HV_X64_MSR_VP_INDEX to be set from the host side.  For
      compatibility it's initialized to KVM vcpu index.  Also a few variables
      are renamed to make clear distinction betweed this Hyper-V vp_index and
      KVM vcpu_id (== APIC id).  Besides, a new capability,
      KVM_CAP_HYPERV_VP_INDEX, is added to allow the userspace to skip
      attempting msr writes where unsupported, to avoid spamming error logs.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d3457c87
  10. 13 7月, 2017 2 次提交
    • R
      kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Roman Kagan 提交于
      There is a flaw in the Hyper-V SynIC implementation in KVM: when message
      page or event flags page is enabled by setting the corresponding msr,
      KVM zeroes it out.  This is problematic because on migration the
      corresponding MSRs are loaded on the destination, so the content of
      those pages is lost.
      
      This went unnoticed so far because the only user of those pages was
      in-KVM hyperv synic timers, which could continue working despite that
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      efc479e6
    • G
      KVM: s390: Fix KVM_S390_GET_CMMA_BITS ioctl definition · 949c0336
      Gleb Fotengauer-Malinovskiy 提交于
      In case of KVM_S390_GET_CMMA_BITS, the kernel does not only read struct
      kvm_s390_cmma_log passed from userspace (which constitutes _IOC_WRITE),
      it also writes back a return value (which constitutes _IOC_READ) making
      this an _IOWR ioctl instead of _IOW.
      
      Fixes: 4036e387 ("KVM: s390: ioctls to get and set guest storage attributes")
      Signed-off-by: NGleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      949c0336
  11. 22 6月, 2017 2 次提交
  12. 21 6月, 2017 1 次提交
    • A
      KVM: PPC: Book3S HV: Add new capability to control MCE behaviour · 134764ed
      Aravinda Prasad 提交于
      This introduces a new KVM capability to control how KVM behaves
      on machine check exception (MCE) in HV KVM guests.
      
      If this capability has not been enabled, KVM redirects machine check
      exceptions to guest's 0x200 vector, if the address in error belongs to
      the guest. With this capability enabled, KVM will cause a guest exit
      with the exit reason indicating an NMI.
      
      The new capability is required to avoid problems if a new kernel/KVM
      is used with an old QEMU, running a guest that doesn't issue
      "ibm,nmi-register".  As old QEMU does not understand the NMI exit
      type, it treats it as a fatal error.  However, the guest could have
      handled the machine check error if the exception was delivered to
      guest's 0x200 interrupt vector instead of NMI exit in case of old
      QEMU.
      
      [paulus@ozlabs.org - Reworded the commit message to be clearer,
       enable only on HV KVM.]
      Signed-off-by: NAravinda Prasad <aravinda@linux.vnet.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      134764ed
  13. 21 4月, 2017 1 次提交
    • M
      kvm: better MWAIT emulation for guests · 668fffa3
      Michael S. Tsirkin 提交于
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, so to user space. We can however allow MWAIT execution on systems
      that support it properly hardware wise.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for over commit.
      Reported-by: N"Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      668fffa3
  14. 20 4月, 2017 2 次提交
    • A
      KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Alexey Kardashevskiy 提交于
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
      without passing them to user space which saves time on switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in the real mode, if failed
      it passes the request to the virtual mode to complete the operation.
      If it a virtual mode handler fails, the request is passed to
      the user space; this is not expected to happen though.
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table unexpected reason, we just
      clear it and move on as there is nothing really we can do about it -
      for example, if we hot plug a VFIO device to a guest, existing TCE tables
      will be mirrored automatically to the hardware and there is no interface
      to report to the guest about possible failures.
      
      This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
      the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look up for it in real mode.
      
      This does not implement the UNSET counterpart as there is no use for it -
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
      space.
      
      This adds real mode version of WARN_ON_ONCE() as the generic version
      causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
      returns in the code, this also adds a check for already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: NAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      121f80ba
    • A
      KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number · 4898d3f4
      Alexey Kardashevskiy 提交于
      This adds a capability number for in-kernel support for VFIO on
      SPAPR platform.
      
      The capability will tell the user space whether in-kernel handlers of
      H_PUT_TCE can handle VFIO-targeted requests or not. If not, the user space
      must not attempt allocating a TCE table in the host kernel via
      the KVM_CREATE_SPAPR_TCE KVM ioctl because in that case TCE requests
      will not be passed to the user space which is desired action in
      the situation like that.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4898d3f4
  15. 09 4月, 2017 1 次提交
  16. 07 4月, 2017 1 次提交
  17. 28 3月, 2017 2 次提交
    • J
      KVM: MIPS: Add 64BIT capability · 578fd61d
      James Hogan 提交于
      Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS
      guests are available and supported. In this case it should still be
      possible to run 32-bit guest code. If not available it won't be possible
      to run 64-bit guest code and the instructions may not be available, or
      the kernel may not support full context switching of 64-bit registers.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      578fd61d
    • J
      KVM: MIPS: Add VZ & TE capabilities · a8a3c426
      James Hogan 提交于
      Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
      to allow MIPS KVM to support VZ without confusing old users (which
      expect the trap & emulate implementation), define and start checking
      KVM_CREATE_VM type codes.
      
      The codes available are:
      
       - KVM_VM_MIPS_TE = 0
      
         This is the current value expected from the user, and will create a
         VM using trap & emulate in user mode, confined to the user mode
         address space. This may in future become unavailable if the kernel is
         only configured to support VZ, in which case the EINVAL error will be
         returned and KVM_CAP_MIPS_TE won't be available even though
         KVM_CAP_MIPS_VZ is.
      
       - KVM_VM_MIPS_VZ = 1
      
         This can be provided when the KVM_CAP_MIPS_VZ capability is available
         to create a VM using VZ, with a fully virtualized guest virtual
         address space. If VZ support is unavailable in the kernel, the EINVAL
         error will be returned (although old kernels without the
         KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
         emulate VM).
      
      This is designed to allow the desired implementation (T&E vs VZ) to be
      potentially chosen at runtime rather than being fixed in the kernel
      configuration.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      a8a3c426
  18. 23 3月, 2017 1 次提交
  19. 17 2月, 2017 1 次提交
    • P
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini 提交于
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      460df4c1
  20. 31 1月, 2017 2 次提交
    • D
      KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers · ef1ead0c
      David Gibson 提交于
      This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
      advertise whether KVM is capable of handling the PAPR extensions for
      resizing the hashed page table during guest runtime.  It also adds
      definitions for two new VM ioctl()s to implement this extension, and
      documentation of the same.
      
      Note that, HPT resizing is already possible with KVM PR without kernel
      modification, since the HPT is managed within userspace (qemu).  The
      capability defined here will only be set where an in-kernel implementation
      of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
      resize implementation can be used, it's necessary to check
      KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
      KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
      resizing with KVM PR on such kernels, it will need a workaround.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef1ead0c
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
  21. 24 11月, 2016 1 次提交
  22. 20 11月, 2016 1 次提交
  23. 02 8月, 2016 1 次提交
  24. 23 7月, 2016 1 次提交
  25. 19 7月, 2016 2 次提交
  26. 18 7月, 2016 1 次提交
  27. 14 7月, 2016 2 次提交
    • R
      KVM: x86: add a flag to disable KVM x2apic broadcast quirk · c519265f
      Radim Krčmář 提交于
      Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
      KVM_CAP_X2APIC_API.
      
      The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
      The enableable capability is needed in order to support standard x2APIC and
      remain backward compatible.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      [Expand kvm_apic_mda comment. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c519265f
    • R
      KVM: x86: add KVM_CAP_X2APIC_API · 37131313
      Radim Krčmář 提交于
      KVM_CAP_X2APIC_API is a capability for features related to x2APIC
      enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
      extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
      Both are needed to support x2APIC.
      
      The feature has to be enableable and disabled by default, because
      get/set ioctl shifted and truncated APIC ID to 8 bits by using a
      non-standard protocol inspired by xAPIC and the change is not
      backward-compatible.
      
      Changes to MSI addresses follow the format used by interrupt remapping
      unit.  The upper address word, that used to be 0, contains upper 24 bits
      of the LAPIC address in its upper 24 bits.  Lower 8 bits are reserved as
      0.  Using the upper address word is not backward-compatible either as we
      didn't check that userspace zeroed the word.  Reserved bits are still
      not explicitly checked, but non-zero data will affect LAPIC addresses,
      which will cause a bug.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      37131313
  28. 12 5月, 2016 1 次提交
    • G
      kvm: introduce KVM_MAX_VCPU_ID · 0b1b1dfd
      Greg Kurz 提交于
      The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
      also the upper limit for vCPU ids. This is okay for all archs except PowerPC
      which can have higher ids, depending on the cpu/core/thread topology. In the
      worst case (single threaded guest, host with 8 threads per core), it limits
      the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
      
      This patch separates the vCPU numbering from the total number of vCPUs, with
      the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
      plus one.
      
      The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
      before passing them to KVM_CREATE_VCPU.
      
      This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
      Other archs continue to return KVM_MAX_VCPUS instead.
      Suggested-by: NRadim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b1b1dfd
  29. 02 3月, 2016 2 次提交
  30. 01 3月, 2016 2 次提交
  31. 17 2月, 2016 1 次提交
    • A
      kvm/x86: Hyper-V VMBus hypercall userspace exit · 83326e43
      Andrey Smetanin 提交于
      The patch implements KVM_EXIT_HYPERV userspace exit
      functionality for Hyper-V VMBus hypercalls:
      HV_X64_HCALL_POST_MESSAGE, HV_X64_HCALL_SIGNAL_EVENT.
      
      Changes v3:
      * use vcpu->arch.complete_userspace_io to setup hypercall
      result
      
      Changes v2:
      * use KVM_EXIT_HYPERV for hypercalls
      Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: NRoman Kagan <rkagan@virtuozzo.com>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Joerg Roedel <joro@8bytes.org>
      CC: "K. Y. Srinivasan" <kys@microsoft.com>
      CC: Haiyang Zhang <haiyangz@microsoft.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      83326e43