1. 09 3月, 2017 1 次提交
  2. 17 2月, 2017 1 次提交
    • P
      KVM: race-free exit from KVM_RUN without POSIX signals · 460df4c1
      Paolo Bonzini 提交于
      The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick"
      a VCPU out of KVM_RUN through a POSIX signal.  A signal is attached
      to a dummy signal handler; by blocking the signal outside KVM_RUN and
      unblocking it inside, this possible race is closed:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
              check flag
                                                set flag
                                                raise signal
              (signal handler does nothing)
              KVM_RUN
      
      However, one issue with KVM_SET_SIGNAL_MASK is that it has to take
      tsk->sighand->siglock on every KVM_RUN.  This lock is often on a
      remote NUMA node, because it is on the node of a thread's creator.
      Taking this lock can be very expensive if there are many userspace
      exits (as is the case for SMP Windows VMs without Hyper-V reference
      time counter).
      
      As an alternative, we can put the flag directly in kvm_run so that
      KVM can see it:
      
                VCPU thread                     service thread
         --------------------------------------------------------------
                                                raise signal
              signal handler
                set run->immediate_exit
              KVM_RUN
                check run->immediate_exit
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      460df4c1
  3. 08 2月, 2017 1 次提交
  4. 03 2月, 2017 3 次提交
    • J
      KVM: MIPS/T&E: Expose read-only CP0_IntCtl register · ad58d4d4
      James Hogan 提交于
      Expose the CP0_IntCtl register through the KVM register access API,
      which is a required register since MIPS32r2. It is currently read-only
      since the VS field isn't implemented due to lack of Config3.VInt or
      Config3.VEIC.
      
      It is implemented in trap_emul.c so that a VZ implementation can allow
      writes.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      ad58d4d4
    • J
      KVM: MIPS/T&E: Expose CP0_EntryLo0/1 registers · 013044cc
      James Hogan 提交于
      Expose the CP0_EntryLo0 and CP0_EntryLo1 registers through the KVM
      register access API. This is fairly straightforward for trap & emulate
      since we don't support the RI and XI bits. For the sake of future
      proofing (particularly for VZ) it is explicitly specified that the API
      always exposes the 64-bit version of these registers (i.e. with the RI
      and XI bits in bit positions 63 and 62 respectively), and they are
      implemented in trap_emul.c rather than mips.c to allow them to be
      implemented differently for VZ.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      013044cc
    • J
      KVM: MIPS/T&E: Implement CP0_EBase register · 7801bbe1
      James Hogan 提交于
      The CP0_EBase register is a standard feature of MIPS32r2, so we should
      always have been implementing it properly. However the register value
      was ignored and wasn't exposed to userland.
      
      Fix the emulation of exceptions and interrupts to use the value stored
      in guest CP0_EBase, and fix the masks so that the top 3 bits (rather
      than the standard 2) are fixed, so that it is always in the guest KSeg0
      segment.
      
      Also add CP0_EBASE to the KVM one_reg interface so it can be accessed by
      userland, also allowing the CPU number field to be written (which isn't
      permitted by the guest).
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      7801bbe1
  5. 31 1月, 2017 4 次提交
    • D
      KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size · f98a8bf9
      David Gibson 提交于
      The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
      table (HPT) that userspace expects a guest VM to have, and is also used to
      clear that HPT when necessary (e.g. guest reboot).
      
      At present, once the ioctl() is called for the first time, the HPT size can
      never be changed thereafter - it will be cleared but always sized as from
      the first call.
      
      With upcoming HPT resize implementation, we're going to need to allow
      userspace to resize the HPT at reset (to change it back to the default size
      if the guest changed it).
      
      So, we need to allow this ioctl() to change the HPT size.
      
      This patch also updates Documentation/virtual/kvm/api.txt to reflect
      the new behaviour.  In fact the documentation was already slightly
      incorrect since 572abd56 "KVM: PPC: Book3S HV: Don't fall back to
      smaller HPT size in allocation ioctl"
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      f98a8bf9
    • D
      KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers · ef1ead0c
      David Gibson 提交于
      This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
      advertise whether KVM is capable of handling the PAPR extensions for
      resizing the hashed page table during guest runtime.  It also adds
      definitions for two new VM ioctl()s to implement this extension, and
      documentation of the same.
      
      Note that, HPT resizing is already possible with KVM PR without kernel
      modification, since the HPT is managed within userspace (qemu).  The
      capability defined here will only be set where an in-kernel implementation
      of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
      resize implementation can be used, it's necessary to check
      KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
      KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
      resizing with KVM PR on such kernels, it will need a workaround.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ef1ead0c
    • D
      Documentation: Correct duplicate section number in kvm/api.txt · ccc4df4e
      David Gibson 提交于
      Both KVM_CREATE_SPAPR_TCE_64 and KVM_REINJECT_CONTROL have section number
      4.98 in Documentation/virtual/kvm/api.txt, presumably due to a naive merge.
      This corrects the duplication.
      
      [paulus@ozlabs.org - correct section numbers for following sections,
       KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO, as well.]
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ccc4df4e
    • P
      KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU · c9270132
      Paul Mackerras 提交于
      This adds two capabilities and two ioctls to allow userspace to
      find out about and configure the POWER9 MMU in a guest.  The two
      capabilities tell userspace whether KVM can support a guest using
      the radix MMU, or using the hashed page table (HPT) MMU with a
      process table and segment tables.  (Note that the MMUs in the
      POWER9 processor cores do not use the process and segment tables
      when in HPT mode, but the nest MMU does).
      
      The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
      whether a guest will use the radix MMU or the HPT MMU, and to
      specify the size and location (in guest space) of the process
      table.
      
      The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
      the radix MMU.  It returns a list of supported radix tree geometries
      (base page size and number of bits indexed at each level of the
      radix tree) and the encoding used to specify the various page
      sizes for the TLB invalidate entry instruction.
      
      Initially, both capabilities return 0 and the ioctls return -EINVAL,
      until the necessary infrastructure for them to operate correctly
      is added.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9270132
  6. 30 1月, 2017 1 次提交
  7. 09 1月, 2017 1 次提交
  8. 17 12月, 2016 1 次提交
  9. 28 11月, 2016 1 次提交
  10. 24 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Add new POWER9 guest-accessible SPRs · e9cf1e08
      Paul Mackerras 提交于
      This adds code to handle two new guest-accessible special-purpose
      registers on POWER9: TIDR (thread ID register) and PSSCR (processor
      stop status and control register).  They are context-switched
      between host and guest, and the guest values can be read and set
      via the one_reg interface.
      
      The PSSCR contains some fields which are guest-accessible and some
      which are only accessible in hypervisor mode.  We only allow the
      guest-accessible fields to be read or set by userspace.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      e9cf1e08
  11. 22 11月, 2016 1 次提交
  12. 21 11月, 2016 1 次提交
    • P
      KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state · 0d808df0
      Paul Mackerras 提交于
      When switching from/to a guest that has a transaction in progress,
      we need to save/restore the checkpointed register state.  Although
      XER is part of the CPU state that gets checkpointed, the code that
      does this saving and restoring doesn't save/restore XER.
      
      This fixes it by saving and restoring the XER.  To allow userspace
      to read/write the checkpointed XER value, we also add a new ONE_REG
      specifier.
      
      The visible effect of this bug is that the guest may see its XER
      value being corrupted when it uses transactions.
      
      Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
      Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NThomas Huth <thuth@redhat.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      0d808df0
  13. 20 11月, 2016 1 次提交
  14. 14 11月, 2016 1 次提交
  15. 27 10月, 2016 1 次提交
  16. 24 10月, 2016 1 次提交
  17. 28 9月, 2016 1 次提交
  18. 08 9月, 2016 1 次提交
  19. 04 8月, 2016 1 次提交
  20. 23 7月, 2016 3 次提交
  21. 19 7月, 2016 3 次提交
  22. 18 7月, 2016 1 次提交
  23. 14 7月, 2016 2 次提交
    • R
      KVM: x86: add a flag to disable KVM x2apic broadcast quirk · c519265f
      Radim Krčmář 提交于
      Add KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK as a feature flag to
      KVM_CAP_X2APIC_API.
      
      The quirk made KVM interpret 0xff as a broadcast even in x2APIC mode.
      The enableable capability is needed in order to support standard x2APIC and
      remain backward compatible.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      [Expand kvm_apic_mda comment. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c519265f
    • R
      KVM: x86: add KVM_CAP_X2APIC_API · 37131313
      Radim Krčmář 提交于
      KVM_CAP_X2APIC_API is a capability for features related to x2APIC
      enablement.  KVM_X2APIC_API_32BIT_FORMAT feature can be enabled to
      extend APIC ID in get/set ioctl and MSI addresses to 32 bits.
      Both are needed to support x2APIC.
      
      The feature has to be enableable and disabled by default, because
      get/set ioctl shifted and truncated APIC ID to 8 bits by using a
      non-standard protocol inspired by xAPIC and the change is not
      backward-compatible.
      
      Changes to MSI addresses follow the format used by interrupt remapping
      unit.  The upper address word, that used to be 0, contains upper 24 bits
      of the LAPIC address in its upper 24 bits.  Lower 8 bits are reserved as
      0.  Using the upper address word is not backward-compatible either as we
      didn't check that userspace zeroed the word.  Reserved bits are still
      not explicitly checked, but non-zero data will affect LAPIC addresses,
      which will cause a bug.
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      37131313
  24. 16 6月, 2016 1 次提交
    • J
      MIPS: KVM: Add KScratch registers · 05108709
      James Hogan 提交于
      Allow up to 6 KVM guest KScratch registers to be enabled and accessed
      via the KVM guest register API and from the guest itself (the fallback
      reading and writing of commpage registers is sufficient for KScratch
      registers to work as expected).
      
      User mode can expose the registers by setting the appropriate bits of
      the guest Config4.KScrExist field. KScratch registers that aren't usable
      won't be writeable via the KVM Ioctl API.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      05108709
  25. 14 6月, 2016 1 次提交
  26. 10 6月, 2016 3 次提交
    • D
      KVM: s390: provide CMMA attributes only if available · f9cbd9b0
      David Hildenbrand 提交于
      Let's not provide the device attribute for cmma enabling and clearing
      if the hardware doesn't support it.
      
      This also helps getting rid of the undocumented return value "-EINVAL"
      in case CMMA is not available when trying to enable it.
      
      Also properly document the meaning of -EINVAL for CMMA clearing.
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      f9cbd9b0
    • D
      KVM: s390: interface to query and configure cpu subfunctions · 0a763c78
      David Hildenbrand 提交于
      We have certain instructions that indicate available subfunctions via
      a query subfunction (crypto functions and ptff), or via a test bit
      function (plo).
      
      By exposing these "subfunction blocks" to user space, we allow user space
      to
      1) query available subfunctions and make sure subfunctions won't get lost
         during migration - e.g. properly indicate them via a CPU model
      2) change the subfunctions to be reported to the guest (even adding
         unavailable ones)
      
      This mechanism works just like the way we indicate the stfl(e) list to
      user space.
      
      This way, user space could even emulate some subfunctions in QEMU in the
      future. If this is ever applicable, we have to make sure later on, that
      unsupported subfunctions result in an intercept to QEMU.
      
      Please note that support to indicate them to the guest is still missing
      and requires hardware support. Usually, the IBC takes already care of these
      subfunctions for migration safety. QEMU should make sure to always set
      these bits properly according to the machine generation to be emulated.
      
      Available subfunctions are only valid in combination with STFLE bits
      retrieved via KVM_S390_VM_CPU_MACHINE and enabled via
      KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the
      indicated subfunctions are guaranteed to be correct.
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      0a763c78
    • D
      KVM: s390: interface to query and configure cpu features · 15c9705f
      David Hildenbrand 提交于
      For now, we only have an interface to query and configure facilities
      indicated via STFL(E). However, we also have features indicated via
      SCLP, that have to be indicated to the guest by user space and usually
      require KVM support.
      
      This patch allows user space to query and configure available cpu features
      for the guest.
      
      Please note that disabling a feature doesn't necessarily mean that it is
      completely disabled (e.g. ESOP is mostly handled by the SIE). We will try
      our best to disable it.
      
      Most features (e.g. SCLP) can't directly be forwarded, as most of them need
      in addition to hardware support, support in KVM. As we later on want to
      turn these features in KVM explicitly on/off (to simulate different
      behavior), we have to filter all features provided by the hardware and
      make them configurable.
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      15c9705f
  27. 12 5月, 2016 1 次提交
    • G
      kvm: introduce KVM_MAX_VCPU_ID · 0b1b1dfd
      Greg Kurz 提交于
      The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
      also the upper limit for vCPU ids. This is okay for all archs except PowerPC
      which can have higher ids, depending on the cpu/core/thread topology. In the
      worst case (single threaded guest, host with 8 threads per core), it limits
      the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
      
      This patch separates the vCPU numbering from the total number of vCPUs, with
      the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
      plus one.
      
      The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
      before passing them to KVM_CREATE_VCPU.
      
      This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
      Other archs continue to return KVM_MAX_VCPUS instead.
      Suggested-by: NRadim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0b1b1dfd
  28. 09 5月, 2016 1 次提交