1. 22 Oct 2020, 2 commits
    • KVM: VMX: Forbid userspace MSR filters for x2APIC · 043248b3
      Committed by Paolo Bonzini
      Allowing userspace to intercept reads to x2APIC MSRs when APICv is
      fully enabled for the guest simply can't work.  More generally, the
      LAPIC could be switched to in-kernel after the MSR filter is set up,
      and allowing accesses by userspace would be very confusing.
      
      We could in principle allow userspace to intercept reads and writes to TPR,
      and writes to EOI and SELF_IPI, but while that could be made to work, it
      would still be silly.
      
      Cc: Alexander Graf <graf@amazon.com>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: Ignore userspace MSR filters for x2APIC · 9389b9d5
      Committed by Sean Christopherson
      Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace
      filtering.  Allowing userspace to intercept reads to x2APIC MSRs when
      APICv is fully enabled for the guest simply can't work; the LAPIC, and
      thus the virtual APIC, is in-kernel and cannot be directly accessed by
      userspace.  To keep things simple, we in fact forbid intercepting x2APIC
      MSRs altogether, independent of the default_allow setting.
      
      Cc: Alexander Graf <graf@amazon.com>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com>
      [Modified to operate even if APICv is disabled, adjust documentation. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 28 Sep 2020, 3 commits
    • KVM: x86: Introduce MSR filtering · 1a155254
      Committed by Alexander Graf
      It's not desirable to have all MSRs always handled by KVM kernel space. Some
      MSRs would be useful to handle in user space, either to emulate behavior (like
      uCode updates) or to decide whether they are valid based on the CPU model.
      
      To allow user space to specify which MSRs it wants to see handled by KVM,
      this patch introduces a new ioctl to push filter rules with bitmaps into
      KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
      With the addition of KVM_CAP_X86_USER_SPACE_MSR, it can also deflect
      denied MSR events to user space to operate on.
      
      If no filter is populated, MSR handling stays identical to before.
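
      A minimal userspace sketch of the new interface follows; this is an
      illustration under assumptions, not part of the patch: vm_fd is taken
      to be an already created VM file descriptor, and the filtered MSR
      range is arbitrary.

      ```c
      #include <string.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      /* Deny guest writes to 0x100 MSRs starting at index 0xc0000000 while
       * keeping the default-allow policy for everything else. A set bit in
       * the bitmap allows an access; a clear bit denies it. */
      static int set_example_msr_filter(int vm_fd)
      {
              __u8 bitmap[0x100 / 8];
              struct kvm_msr_filter filter = {
                      .flags = KVM_MSR_FILTER_DEFAULT_ALLOW,
              };

              memset(bitmap, 0, sizeof(bitmap));      /* all clear: deny */
              filter.ranges[0] = (struct kvm_msr_filter_range) {
                      .flags  = KVM_MSR_FILTER_WRITE, /* filter writes only */
                      .base   = 0xc0000000,           /* first MSR covered */
                      .nmsrs  = 0x100,                /* bits in the bitmap */
                      .bitmap = bitmap,
              };

              return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter);
      }
      ```

      A denied access results in a #GP inside the guest unless
      KVM_CAP_X86_USER_SPACE_MSR is also enabled, in which case the event
      can be deflected to user space instead.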
      Signed-off-by: Alexander Graf <graf@amazon.com>
      
      Message-Id: <20200925143422.21718-8-graf@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allow deflecting unknown MSR accesses to user space · 1ae09954
      Committed by Alexander Graf
      MSRs are weird. Some of them are normal control registers, such as EFER.
      Some, however, are registers that really are model specific, not very
      interesting to virtualization workloads, and not performance critical.
      Others are really just windows into package configuration.
      
      Of these MSRs, only the first category needs to be implemented in
      kernel space. Rarely accessed MSRs, MSRs that should be fine-tuned against
      certain CPU models, and MSRs that contain package-level information are
      much better suited for user space to process. However, over time we have
      accumulated a lot of MSRs that are not in the first category but are still
      handled by in-kernel KVM code.
      
      This patch adds a generic interface to handle WRMSR and RDMSR from user
      space. With this, any future MSR that is part of the latter categories can
      be handled in user space.
      
      Furthermore, it allows us to replace the existing "ignore_msrs" logic with
      something that applies per-VM rather than to the whole system. That way you
      can run production VMs in parallel with experimental ones where you don't care
      about proper MSR handling.
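
      A hedged sketch of the userspace side, assuming the capability was
      enabled via KVM_ENABLE_CAP and that run points at the vcpu's mmapped
      kvm_run structure; emulate_rdmsr() and emulate_wrmsr() are
      hypothetical VMM helpers, not part of this patch:

      ```c
      #include <linux/kvm.h>

      /* Hypothetical VMM helpers; a non-zero return asks KVM to inject a
       * #GP into the guest instead of completing the access. */
      extern __u8 emulate_rdmsr(__u32 index, __u64 *data);
      extern __u8 emulate_wrmsr(__u32 index, __u64 data);

      void handle_msr_exit(struct kvm_run *run)
      {
              switch (run->exit_reason) {
              case KVM_EXIT_X86_RDMSR:
                      /* Fill run->msr.data with the value for the guest. */
                      run->msr.error = emulate_rdmsr(run->msr.index,
                                                     &run->msr.data);
                      break;
              case KVM_EXIT_X86_WRMSR:
                      /* run->msr.data is the value the guest tried to write. */
                      run->msr.error = emulate_wrmsr(run->msr.index,
                                                     run->msr.data);
                      break;
              }
      }
      ```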
      Signed-off-by: Alexander Graf <graf@amazon.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      
      Message-Id: <20200925143422.21718-3-graf@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: hyper-v: Mention SynDBG CPUID leaves in api.rst · b44f50d8
      Committed by Vitaly Kuznetsov
      We forgot to update KVM_GET_SUPPORTED_HV_CPUID's documentation in api.rst
      when SynDBG leaves were added.
      
      While at it, fix a 'KVM_GET_SUPPORTED_CPUID' copy-paste error.
      
      Fixes: f97f5a56 ("x86/kvm/hyper-v: Add support for synthetic debugger interface")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200924145757.1035782-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 14 Sep 2020, 1 commit
  4. 21 Aug 2020, 2 commits
  5. 22 Jul 2020, 1 commit
  6. 13 Jul 2020, 1 commit
  7. 10 Jul 2020, 1 commit
  8. 09 Jul 2020, 2 commits
  9. 06 Jul 2020, 1 commit
  10. 01 Jun 2020, 3 commits
  11. 16 May 2020, 1 commit
    • KVM: arm64: Support enabling dirty log gradually in small chunks · c862626e
      Committed by Keqian Zhu
      There is already support for enabling the dirty log gradually in small
      chunks for x86 in commit 3c9bd400 ("KVM: x86: enable dirty log gradually
      in small chunks"). This adds support for arm64.
      
      x86 still write-protects all huge pages when DIRTY_LOG_INITIALLY_ALL_SET
      is enabled. However, for arm64, both huge pages and normal pages can be
      write-protected gradually by userspace.
      
      On a Huawei Kunpeng 920 2.6GHz platform, I ran some tests on 128G
      Linux VMs with different page sizes. The memory pressure was 127G in
      each case. The time taken by memory_global_dirty_log_start in QEMU is
      listed below:
      
      Page Size      Before    After Optimization
        4K            650ms         1.8ms
        2M             4ms          1.8ms
        1G             2ms          1.8ms
      
      Besides the time reduction, the biggest improvement is that we minimize
      the performance side effects (from dissolving huge pages and marking
      memslots dirty) on the guest after enabling the dirty log.
      Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com
  12. 05 May 2020, 1 commit
  13. 25 Apr 2020, 1 commit
  14. 27 Mar 2020, 1 commit
  15. 26 Mar 2020, 1 commit
    • KVM: PPC: Book3S HV: Add a capability for enabling secure guests · 9a5788c6
      Committed by Paul Mackerras
      At present, on Power systems with Protected Execution Facility
      hardware and an ultravisor, a KVM guest can transition to being a
      secure guest at will.  Userspace (QEMU) has no way of knowing
      whether a host system is capable of running secure guests.  This
      will present a problem in the future when the ultravisor is capable of
      migrating secure guests from one host to another, because
      virtualization management software will have no way to ensure that
      secure guests only run in domains where all of the hosts can
      support secure guests.
      
      This adds a VM capability which has two functions: (a) userspace
      can query it to find out whether the host can support secure guests,
      and (b) userspace can enable it for a guest, which allows that
      guest to become a secure guest.  If userspace does not enable it,
      KVM will return an error when the ultravisor does the hypercall
      that indicates that the guest is starting to transition to a
      secure guest.  The ultravisor will then abort the transition and
      the guest will terminate.
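
      A minimal sketch of the two functions described above, assuming vm_fd
      is an open VM file descriptor:

      ```c
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static int allow_secure_guest(int vm_fd)
      {
              struct kvm_enable_cap cap = {
                      .cap = KVM_CAP_PPC_SECURE_GUEST,
              };

              /* (a) Query whether the host can run secure guests at all. */
              if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_SECURE_GUEST) <= 0)
                      return -1;

              /* (b) Allow this particular guest to become a secure guest. */
              return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
      }
      ```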
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
  16. 17 Mar 2020, 2 commits
    • KVM: x86: Remove stateful CPUID handling · 7ff6c035
      Committed by Sean Christopherson
      Remove the code for handling stateful CPUID 0x2 and mark the associated
      flags as deprecated.  WARN if host CPUID 0x2.0.AL > 1, i.e. if by some
      miracle a host with stateful CPUID 0x2 is encountered.
      
      No known CPU exists that supports hardware accelerated virtualization
      _and_ a stateful CPUID 0x2.  Barring an extremely contrived nested
      virtualization scenario, stateful CPUID support is dead code.
      Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: enable dirty log gradually in small chunks · 3c9bd400
      Committed by Jay Zhou
      Enabling the dirty log for the first time can hold kvm->mmu_lock for
      an extended period of time. The main cost is clearing the D-bits of
      all last-level SPTEs. This situation can benefit from manual dirty
      log protect as well, which can reduce the time mmu_lock is held. The
      sequence is as follows (a userspace sketch follows the list):
      
      1. Initialize all the bits of the dirty bitmap to 1 when enabling
         dirty log for the first time
      2. Only write protect the huge pages
      3. KVM_GET_DIRTY_LOG returns the dirty bitmap info
      4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
         SPTEs gradually in small chunks
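
      A minimal sketch of how userspace opts in to this mode, assuming an
      open vm_fd; both flags belong to the manual dirty log protect
      interface this patch extends:

      ```c
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static int enable_gradual_dirty_log(int vm_fd)
      {
              struct kvm_enable_cap cap = {
                      .cap     = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                      /* Start with all bits set and clear D-bits lazily. */
                      .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                                 KVM_DIRTY_LOG_INITIALLY_SET,
              };

              return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
      }
      ```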
      
      On an Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz, I ran some tests with
      a 128G Windows VM and measured the time taken by
      memory_global_dirty_log_start; here are the numbers:
      
      VM Size        Before    After optimization
      128G           460ms     10ms
      Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  17. 28 Feb 2020, 2 commits
  18. 25 Feb 2020, 1 commit
  19. 13 Feb 2020, 1 commit
  20. 31 Jan 2020, 1 commit
  21. 23 Jan 2020, 1 commit
  22. 30 Nov 2019, 1 commit
  23. 28 Nov 2019, 1 commit
  24. 22 Oct 2019, 2 commits
    • KVM: arm/arm64: Allow user injection of external data aborts · da345174
      Committed by Christoffer Dall
      In some scenarios, such as a buggy guest or an incorrect configuration
      of the VMM and firmware description data, userspace will detect a memory
      access to a portion of the IPA space that is not mapped to any MMIO region.
      
      In this case, the appropriate action is to inject an external abort
      into the guest.  The kernel already has the functionality to inject an
      external abort, but we need to wire up a signal from user space that
      lets user space tell the kernel to do this.
      
      It turns out we already have the set-event functionality
      (KVM_SET_VCPU_EVENTS), which we can reuse for this, as sketched below.
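
      A minimal sketch of that reuse on arm64, assuming vcpu_fd is an open
      vcpu file descriptor and the accompanying KVM_CAP_ARM_INJECT_EXT_DABT
      capability has been confirmed:

      ```c
      #include <string.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static int inject_external_dabt(int vcpu_fd)
      {
              struct kvm_vcpu_events events;

              memset(&events, 0, sizeof(events));
              /* Queue an external data abort for the next guest entry. */
              events.exception.ext_dabt_pending = 1;

              return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
      }
      ```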
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace · c726200d
      Committed by Christoffer Dall
      For a long time, if a guest accessed memory outside of a memslot using
      any of the architecture's load/store instructions that don't supply
      decoding information in ESR_EL2 (the ISV bit is not set), the kernel
      would print the following message and terminate the VM by returning
      -ENOSYS to userspace:
      
        load/store instruction decoding not implemented
      
      The reason behind this message is that KVM assumes all accesses
      outside a memslot are MMIO accesses that should be handled by
      userspace, and we originally expected to eventually implement some sort
      of decoding of load/store instructions where the ISV bit was not set.
      
      However, it turns out that many of the instructions which don't provide
      decoding information on abort are not safe to use for MMIO accesses, and
      the remaining few that would potentially make sense to use on MMIO
      accesses, such as those with register writeback, are not used in
      practice.  It also turns out that fetching an instruction from guest
      memory can be a pretty horrible affair, involving stopping all CPUs on
      SMP systems, handling multiple corner cases of address translation in
      software, and more.  It doesn't appear likely that we'll ever implement
      this in the kernel.
      
      What is much more common is that a user has misconfigured their guest
      and is actually not accessing an MMIO region, but just hitting some
      random hole in the IPA space.  In this scenario, the error message above
      is misleading and has led to a great deal of confusion over the years.
      
      It is, nevertheless, ABI to userspace, and we therefore need to
      introduce a new capability that userspace explicitly enables to change
      behavior.
      
      This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV)
      which does exactly that, and introduces a new exit reason to report the
      event to userspace.  User space can then emulate an exception to the
      guest, restart the guest, suspend the guest, or take any other
      appropriate action as per the policy of the running system.
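
      A hedged sketch of consuming the new exit reason, assuming
      KVM_CAP_ARM_NISV_TO_USER was enabled at VM creation, run points at
      the vcpu's mmapped kvm_run, and apply_nisv_policy() is a hypothetical
      VMM policy hook:

      ```c
      #include <linux/kvm.h>

      /* Hypothetical policy hook: emulate, inject an abort, suspend, ... */
      extern int apply_nisv_policy(__u64 esr_iss, __u64 fault_ipa);

      int handle_nisv_exit(struct kvm_run *run)
      {
              if (run->exit_reason != KVM_EXIT_ARM_NISV)
                      return 0;

              /* ESR_EL2 ISS bits (without ISV decoding) and faulting IPA. */
              return apply_nisv_policy(run->arm_nisv.esr_iss,
                                       run->arm_nisv.fault_ipa);
      }
      ```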
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
      Reviewed-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  25. 21 Oct 2019, 1 commit
    • KVM: PPC: Report single stepping capability · 1a9167a2
      Committed by Fabiano Rosas
      When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
      the next instruction to be single stepped via the
      KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
      
      This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
      to inform userspace about the state of single stepping support.
      
      We currently don't have support for guest single stepping implemented
      in Book3S HV, so the capability is only present for Book3S PR and
      BookE.
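
      A minimal sketch of the intended usage, assuming open vm_fd and
      vcpu_fd descriptors:

      ```c
      #include <string.h>
      #include <sys/ioctl.h>
      #include <linux/kvm.h>

      static int request_single_step(int vm_fd, int vcpu_fd)
      {
              struct kvm_guest_debug dbg;

              /* Absent on Book3S HV; present on Book3S PR and BookE. */
              if (ioctl(vm_fd, KVM_CHECK_EXTENSION,
                        KVM_CAP_PPC_GUEST_DEBUG_SSTEP) <= 0)
                      return -1;

              memset(&dbg, 0, sizeof(dbg));
              dbg.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;

              return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
      }
      ```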
      Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  26. 24 Sep 2019, 1 commit
  27. 11 Sep 2019, 1 commit
  28. 09 Sep 2019, 1 commit
    • KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Committed by Marc Zyngier
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bits wide) currently only takes
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
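
      A small helper sketching the resulting encoding, assuming the layout
      documented with this change: bits [31:28] vcpu2_index, [27:24]
      irq_type, [23:16] vcpu_index, [15:0] irq_id:

      ```c
      #include <linux/types.h>

      static inline __u32 arm_irq_line_encode(__u32 vcpu_id, __u32 irq_type,
                                              __u32 irq_id)
      {
              __u32 vcpu_index  = vcpu_id % 256; /* low 8 bits of target */
              __u32 vcpu2_index = vcpu_id / 256; /* 4-bit multiplier */

              return (vcpu2_index << 28) | (irq_type << 24) |
                     (vcpu_index << 16) | irq_id;
      }
      ```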
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditioned on KVM_CAP_IRQCHIP.
      Reported-by: Zenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
  29. 29 Aug 2019, 1 commit
  30. 24 Jul 2019, 1 commit