1. 14 Sep 2017, 1 commit
  2. 13 Sep 2017, 1 commit
  3. 01 Sep 2017, 1 commit
  4. 25 Aug 2017, 6 commits
  5. 18 Aug 2017, 1 commit
  6. 10 Aug 2017, 1 commit
  7. 08 Aug 2017, 1 commit
  8. 18 Jul 2017, 1 commit
    • kvm/x86/svm: Support Secure Memory Encryption within KVM · d0ec49d4
      Committed by Tom Lendacky
      Update the KVM support to work with SME. The VMCB has a number of fields
      where physical addresses are used and these addresses must contain the
      memory encryption mask in order to properly access the encrypted memory.
      Also, use the memory encryption mask when creating and using the nested
      page tables.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/89146eccfa50334409801ff20acd52a90fb5efcf.1500319216.git.thomas.lendacky@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d0ec49d4
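      As a rough illustration of what the commit above describes, the sketch
      below ORs an encryption mask into a physical address before it is written
      into a VMCB field and strips it again afterwards. The helper names mirror
      the kernel's __sme_set()/__sme_clr() and sme_me_mask from this series, but
      the program is standalone and the bit position (47) is only an example,
      not a value read from hardware.

      #include <inttypes.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Illustrative only: on real hardware the mask position comes from
       * CPUID 0x8000001F; bit 47 is just an example value. */
      static const uint64_t sme_me_mask = 1ULL << 47;

      /* Mirrors the kernel's __sme_set()/__sme_clr() helpers. */
      static inline uint64_t sme_set(uint64_t paddr) { return paddr | sme_me_mask; }
      static inline uint64_t sme_clr(uint64_t paddr) { return paddr & ~sme_me_mask; }

      int main(void)
      {
          uint64_t npt_root = 0x123456000ULL;   /* hypothetical NPT root address */

          /* Conceptually what the patch does: physical addresses written into
           * VMCB fields (e.g. the nested CR3) carry the encryption bit. */
          uint64_t vmcb_nested_cr3 = sme_set(npt_root);

          printf("plain    : %#" PRIx64 "\n", npt_root);
          printf("in VMCB  : %#" PRIx64 "\n", vmcb_nested_cr3);
          printf("stripped : %#" PRIx64 "\n", sme_clr(vmcb_nested_cr3));
          return 0;
      }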
  9. 14 Jul 2017, 5 commits
  10. 13 Jul 2017, 2 commits
    • kvm: x86: hyperv: add KVM_CAP_HYPERV_SYNIC2 · efc479e6
      Committed by Roman Kagan
      There is a flaw in the Hyper-V SynIC implementation in KVM: when the
      message page or the event flags page is enabled by setting the
      corresponding MSR, KVM zeroes it out.  This is problematic because on
      migration the corresponding MSRs are loaded on the destination, so the
      contents of those pages are lost.
      
      This went unnoticed so far because the only user of those pages was the
      in-KVM Hyper-V SynIC timers, which could continue working despite the
      zeroing.
      
      Newer QEMU uses those pages for Hyper-V VMBus implementation, and
      zeroing them breaks the migration.
      
      Besides, in newer QEMU the content of those pages is fully managed by
      QEMU, so zeroing them is undesirable even when writing the MSRs from the
      guest side.
      
      To support this new scheme, introduce a new capability,
      KVM_CAP_HYPERV_SYNIC2, which, when enabled, makes sure that the synic
      pages aren't zeroed out in KVM.
      Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      efc479e6
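      For reference, a minimal userspace sketch of how the new capability would
      be turned on: KVM_CAP_HYPERV_SYNIC2 is enabled per vCPU through the
      existing KVM_ENABLE_CAP ioctl. This assumes a linux/kvm.h recent enough to
      define the capability and an already-created vCPU fd; error handling is
      left to the caller.

      #include <linux/kvm.h>
      #include <string.h>
      #include <sys/ioctl.h>

      /* Sketch: enable KVM_CAP_HYPERV_SYNIC2 on an existing vCPU fd so KVM
       * leaves the SynIC message/event pages alone instead of zeroing them.
       * Returns 0 on success, -1 on failure (errno set by ioctl). */
      static int enable_synic2(int vcpu_fd)
      {
          struct kvm_enable_cap cap;

          memset(&cap, 0, sizeof(cap));
          cap.cap = KVM_CAP_HYPERV_SYNIC2;
          return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
      }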
    • KVM: x86: make backwards_tsc_observed a per-VM variable · a826faf1
      Committed by Ladi Prosek
      The backwards_tsc_observed global introduced in commit 16a96021 is never
      reset to false. If a VM happens to be running while the host is suspended
      (a common source of the TSC jumping backwards), the master clock is never
      enabled again for any VM. In contrast, if no VM is running while the
      host is suspended, the master clock is unaffected. This is inconsistent and
      unnecessarily strict. Let's track the backwards_tsc_observed variable
      separately for each VM and let every VM start with a clean slate.
      
      Real world impact: My Windows VMs get slower after my laptop undergoes a
      suspend/resume cycle. The only way to get the perf back is unloading and
      reloading the kvm module.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a826faf1
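      The shape of the change, as a simplified sketch: the struct and helper
      names below are illustrative stand-ins (the real field moves into
      struct kvm_arch in arch/x86/kvm/x86.c).

      #include <stdbool.h>

      /* Before: a single global flag, never reset, so one backwards-TSC event
       * disabled the master clock for every VM until the module was reloaded:
       *
       *     static bool backwards_tsc_observed;
       *
       * After (sketch): the flag is per-VM, so a newly created VM starts with
       * a clean slate. */
      struct vm_arch_sketch {
          bool backwards_tsc_observed;
          /* ... other per-VM state ... */
      };

      static inline bool can_use_master_clock(const struct vm_arch_sketch *arch)
      {
          return !arch->backwards_tsc_observed;
      }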
  11. 03 Jul 2017, 1 commit
    • kvm: x86: mmu: allow A/D bits to be disabled in an mmu · ac8d57e5
      Committed by Peter Feiner
      Add the plumbing to disable A/D bits in the MMU based on a new role
      bit, ad_disabled. When A/D bits are disabled, the MMU operates as though
      they weren't available (i.e., using access-tracking faults instead).
      
      To avoid SP -> kvm_mmu_page.role.ad_disabled lookups all over the
      place, A/D disablement is now stored in the SPTE. This state is stored
      in the SPTE by tweaking the use of SPTE_SPECIAL_MASK for access
      tracking. Rather than just setting SPTE_SPECIAL_MASK when an
      access-tracking SPTE is non-present, we now always set
      SPTE_SPECIAL_MASK for access-tracking SPTEs.
      Signed-off-by: Peter Feiner <pfeiner@google.com>
      [Use role.ad_disabled even for direct (non-shadow) EPT page tables.  Add
       documentation and a few MMU_WARN_ONs. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ac8d57e5
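      A minimal sketch of the SPTE encoding described above, assuming bit 62 as
      SPTE_SPECIAL_MASK (per the "Do not use bit 63" patch further down); the
      helper name is illustrative rather than the exact KVM function.

      #include <stdbool.h>
      #include <stdint.h>

      /* Bit 62 is the "special" software bit.  With this patch it is set on
       * every access-tracking SPTE, present or not, so the MMU can tell from
       * the SPTE alone that A/D bits are not in use for it. */
      #define SPTE_SPECIAL_MASK (1ULL << 62)

      static inline bool spte_ad_enabled_sketch(uint64_t spte)
      {
          /* Special bit set => access-tracked SPTE => A/D bits disabled. */
          return !(spte & SPTE_SPECIAL_MASK);
      }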
  12. 04 Jun 2017, 1 commit
  13. 17 May 2017, 1 commit
    • KVM: x86: lower default for halt_poll_ns · b401ee0b
      Committed by Paolo Bonzini
      In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
      increase heavily even in cases where the performance improvement was
      small.  In particular, bandwidth divided by CPU usage was as much as
      60% lower.
      
      To some extent this is the expected effect of the patch, and the
      additional CPU utilization is only visible when running the
      benchmarks.  However, halving the threshold also halves the extra
      CPU utilization (from +30-130% to +20-70%) and has no negative
      effect on performance.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      b401ee0b
  14. 09 May 2017, 1 commit
  15. 02 May 2017, 1 commit
  16. 27 Apr 2017, 2 commits
    • KVM: mark requests that need synchronization · 7a97cec2
      Committed by Paolo Bonzini
      kvm_make_all_requests() provides a synchronization that waits until all
      kicked VCPUs have acknowledged the kick.  This is important for
      KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is
      underway.
      
      This patch adds the synchronization property into all requests that are
      currently being used with kvm_make_all_requests() in order to preserve
      the current behavior and only introduce a new framework.  Removing it
      from requests where it is not necessary is left for future patches.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7a97cec2
    • KVM: mark requests that do not need a wakeup · 930f7fd6
      Committed by Radim Krčmář
      Some operations must ensure that the guest is not running with stale
      data, but if the guest is halted, then the update can wait until another
      event happens.  kvm_make_all_requests() currently doesn't wake up, so we
      can mark all requests used with it.
      
      The first 8 bits are arbitrarily reserved for request numbers.
      
      Most uses of requests have the request type as a constant, so a compiler
      will optimize the '&'.
      
      An alternative would be to have an inline function that would return
      whether the request needs a wake-up or not, but I like this one better
      even though it might produce worse assembly.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      930f7fd6
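      Together with the preceding "needs synchronization" patch, a request is
      now a plain bit number with behaviour flags OR'd into higher bits, and the
      low 8 bits stay reserved for the number itself. A hedged sketch of that
      encoding (the flag names mirror the kernel's; the request number in the
      example is hypothetical):

      /* Low 8 bits: the request number (used as a bit index in vcpu->requests).
       * Higher bits: behaviour flags, masked off before the bit is set. */
      #define KVM_REQUEST_MASK       0xFFu
      #define KVM_REQUEST_NO_WAKEUP  (1u << 8)   /* don't wake a halted vCPU */
      #define KVM_REQUEST_WAIT       (1u << 9)   /* sender waits for the ack (kick) */

      /* Example request definition in this scheme (the number 4 is hypothetical). */
      #define REQ_TLB_FLUSH          (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

      static inline unsigned int request_bit(unsigned int req)
      {
          /* When req is a compile-time constant, the '&' folds to a constant,
           * which is the optimization the commit message refers to. */
          return req & KVM_REQUEST_MASK;
      }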
  17. 21 Apr 2017, 1 commit
  18. 13 Apr 2017, 1 commit
  19. 07 Apr 2017, 2 commits
    • kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public · 4b4357e0
      Committed by Paolo Bonzini
      Its value has never changed; we might as well make it part of the ABI instead
      of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).
      
      Because PPC does not always make MMIO available, the code has to be made
      dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      4b4357e0
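      From userspace, "part of the ABI" means the page offset can be used as the
      constant KVM_COALESCED_MMIO_PAGE_OFFSET rather than taken from the
      KVM_CHECK_EXTENSION return value, though the extension still has to be
      probed for availability. A hedged sketch (helper name ours, minimal error
      handling):

      #include <linux/kvm.h>
      #include <sys/ioctl.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* Sketch: map the coalesced-MMIO ring page of a vCPU.  Returns the ring
       * or MAP_FAILED.  kvm_fd is /dev/kvm, vcpu_fd an already-created vCPU. */
      static struct kvm_coalesced_mmio_ring *map_coalesced_ring(int kvm_fd, int vcpu_fd)
      {
          long page_size = sysconf(_SC_PAGESIZE);

          /* The capability must still exist on this kernel/arch. */
          if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO) <= 0)
              return MAP_FAILED;

          /* The offset is now simply the ABI constant. */
          return mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      vcpu_fd, KVM_COALESCED_MMIO_PAGE_OFFSET * page_size);
      }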
    • kvm: nVMX: support EPT accessed/dirty bits · ae1e2d10
      Committed by Paolo Bonzini
      Now use bit 6 of EPTP to optionally enable A/D bits for EPT.  Another
      thing to change is that, when EPT accessed and dirty bits are not in use,
      VMX treats accesses to guest paging structures as data reads.  When they
      are in use (bit 6 of EPTP is set), they are treated as writes and the
      corresponding EPT dirty bit is set.  The MMU didn't know this detail,
      so this patch adds it.
      
      We also have to fix up the exit qualification.  It may be wrong because
      KVM sets bit 6 but the guest might not.
      
      L1 emulates EPT A/D bits using write permissions, so in principle it may
      be possible for EPT A/D bits to be used by L1 even though not available
      in hardware.  The problem is that guest page-table walks will be treated
      as reads rather than writes, so they would not cause an EPT violation.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      [Fixed typo in walk_addr_generic() comment and changed bit clear +
       conditional-set pattern in handle_ept_violation() to conditional-clear]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      ae1e2d10
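      For orientation, a sketch of the EPTP layout this refers to: bit 6 is the
      A/D-enable bit, next to the memory type and page-walk length fields. The
      constant and function names are illustrative, not the exact kernel defines.

      #include <stdint.h>

      #define EPTP_MEMTYPE_WB      6ULL          /* bits 2:0: write-back */
      #define EPTP_WALK_LENGTH_4   (3ULL << 3)   /* bits 5:3: (levels - 1) = 3 */
      #define EPTP_AD_ENABLE       (1ULL << 6)   /* bit 6: EPT A/D bits in use */

      /* Build an EPTP value for a 4-level EPT rooted at 'root_hpa'. */
      static inline uint64_t construct_eptp_sketch(uint64_t root_hpa, int enable_ad)
      {
          uint64_t eptp = EPTP_MEMTYPE_WB | EPTP_WALK_LENGTH_4;

          if (enable_ad)
              eptp |= EPTP_AD_ENABLE;            /* what this patch keys off */
          return eptp | (root_hpa & ~0xFFFULL);  /* physical address, 4K aligned */
      }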
  20. 17 Feb 2017, 1 commit
  21. 15 Feb 2017, 1 commit
    • KVM: x86: do not scan IRR twice on APICv vmentry · 76dfafd5
      Committed by Paolo Bonzini
      Calls to apic_find_highest_irr are scanning IRR twice, once
      in vmx_sync_pir_from_irr and once in apic_search_irr.  Change
      sync_pir_from_irr to get the new maximum IRR from kvm_apic_update_irr;
      now that it does the computation, it can also do the RVI write.
      
      In order to avoid complications in svm.c, make the callback optional.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      76dfafd5
  22. 09 Jan 2017, 7 commits
    • KVM: x86: add VCPU stat for KVM_REQ_EVENT processing · 0f1e261e
      Committed by Paolo Bonzini
      This statistic can be useful to estimate the cost of an IRQ injection
      scenario, by comparing it with irq_injections.  For example the stat
      shows that sti;hlt triggers more KVM_REQ_EVENT than sti;nop.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0f1e261e
    • kvm: svm: Use the hardware provided GPA instead of page walk · 0f89b207
      Committed by Tom Lendacky
      When a guest causes an NPF which requires emulation, KVM sometimes walks
      the guest page tables to translate the GVA to a GPA. This is unnecessary
      most of the time on AMD hardware, since the hardware provides the GPA in
      EXITINFO2.
      
      The only exceptions involve rep-prefixed string operations and
      instructions that use two memory locations. With rep, the GPA is only
      valid for the initial NPF, and with two memory operands we can't tell
      which address was translated into EXITINFO2.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0f89b207
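      The resulting decision can be sketched as below; the struct and field
      names are simplified stand-ins for the SVM exit information rather than
      the exact KVM symbols.

      #include <stdbool.h>
      #include <stdint.h>

      struct npf_exit_sketch {
          uint64_t exitinfo2_gpa;    /* GPA the hardware reported for the NPF */
          bool     is_rep_string;    /* rep movs/stos/...: GPA only valid for
                                        the first iteration */
          bool     two_mem_operands; /* e.g. movs: can't tell which operand the
                                        reported GPA belongs to */
      };

      /* Returns true when emulation may trust the hardware-provided GPA and
       * skip the software guest page-table walk. */
      static bool can_use_hw_gpa(const struct npf_exit_sketch *e)
      {
          return !e->is_rep_string && !e->two_mem_operands;
      }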
    • kvm: x86: mmu: Lockless access tracking for Intel CPUs without EPT A bits. · f160c7b7
      Committed by Junaid Shahid
      This change implements lockless access tracking for Intel CPUs without EPT
      A bits. This is achieved by marking the PTEs as not-present (but not
      completely clearing them) when clear_flush_young() is called after marking
      the pages as accessed. When an EPT Violation is generated as a result of
      the VM accessing those pages, the PTEs are restored to their original values.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f160c7b7
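      A self-contained sketch of the save/restore trick: the original R/W/X
      permission bits are stashed in high software-available bits of the SPTE
      while the low bits are cleared (so the PTE is not-present to hardware),
      then put back when the EPT violation arrives. The bit positions and masks
      are illustrative, not the exact ones KVM uses.

      #include <stdint.h>

      #define RWX_MASK          0x7ULL            /* read/write/execute bits */
      #define SAVED_BITS_SHIFT  52                /* software-available high bits */
      #define ACC_TRACK_MASK    (1ULL << 62)      /* marks an access-tracked SPTE */

      /* Make the SPTE non-present but remember its permissions. */
      static inline uint64_t mark_for_access_tracking(uint64_t spte)
      {
          uint64_t saved = (spte & RWX_MASK) << SAVED_BITS_SHIFT;

          return (spte & ~RWX_MASK) | saved | ACC_TRACK_MASK;
      }

      /* On the resulting EPT violation, restore the original permissions. */
      static inline uint64_t restore_access_tracked(uint64_t spte)
      {
          uint64_t saved = (spte >> SAVED_BITS_SHIFT) & RWX_MASK;

          return (spte & ~(ACC_TRACK_MASK | (RWX_MASK << SAVED_BITS_SHIFT))) | saved;
      }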
    • kvm: x86: mmu: Do not use bit 63 for tracking special SPTEs · 37f0e8fe
      Committed by Junaid Shahid
      MMIO SPTEs currently set both bits 62 and 63 to distinguish them as special
      PTEs. However, bit 63 is used as the SVE bit in Intel EPT PTEs. The SVE bit
      is ignored for misconfigured PTEs but not necessarily for not-present PTEs.
      Since MMIO SPTEs use an EPT misconfiguration, using bit 63 for them is
      acceptable. However, the upcoming fast access tracking feature adds another
      type of special tracking PTE, which uses not-present PTEs and hence should
      not set bit 63.
      
      In order to use common bits to distinguish both types of special PTEs, we
      now use only bit 62 as the special bit.
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37f0e8fe
    • kvm: x86: reduce collisions in mmu_page_hash · 114df303
      Committed by David Matlack
      When using two-dimensional paging, the mmu_page_hash (which provides
      lookups for existing kvm_mmu_page structs) becomes imbalanced, with
      too many collisions in buckets 0 and 512. This has been seen to cause
      mmu_lock to be held for multiple milliseconds in kvm_mmu_get_page on
      VMs with a large amount of RAM mapped with 4K pages.
      
      The current hash function uses the lower 10 bits of gfn to index into
      mmu_page_hash. When doing shadow paging, gfn is the address of the
      guest page table being shadowed. These tables are 4K-aligned, which
      makes the low bits of gfn a good hash. However, with two-dimensional
      paging, no guest page tables are being shadowed, so gfn is the base
      address that is mapped by the table. Thus page tables (level=1) have
      a 2MB-aligned gfn, page directories (level=2) have a 1GB-aligned gfn,
      etc. This means hashes will only differ in their 10th bit.
      
      hash_64() provides a better hash. For example, on a VM with ~200G
      (99458 direct=1 kvm_mmu_page structs):
      
      hash            max_mmu_page_hash_collisions
      --------------------------------------------
      low 10 bits     49847
      hash_64         105
      perfect         97
      
      While we're changing the hash, increase the table size by 4x to better
      support large VMs (further reduces number of collisions in 200G VM to
      29).
      
      Note that hash_64() does not provide a good distribution prior to commit
      ef703f49 ("Eliminate bad hash multipliers from hash_32() and
      hash_64()").
      Signed-off-by: David Matlack <dmatlack@google.com>
      Change-Id: I5aa6b13c834722813c6cca46b8b1ed6f53368ade
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      114df303
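      The imbalance is easy to reproduce outside the kernel. The small standalone
      program below hashes 2MB-aligned gfns (the direct, level=1 case from the
      table above) with the low-bits scheme and with a hash_64()-style
      multiplicative hash; the golden-ratio constant follows linux/hash.h, but
      the program itself is only a demonstration, not the KVM code.

      #include <stdint.h>
      #include <stdio.h>

      #define OLD_BITS 10                        /* original 1024-bucket table */
      #define NEW_BITS 12                        /* 4x larger table from this patch */

      /* Multiplicative hash in the style of hash_64() after commit ef703f49
       * (64-bit golden-ratio constant). */
      static unsigned hash64_style(uint64_t val, unsigned bits)
      {
          return (unsigned)((val * 0x61C8864680B583EBULL) >> (64 - bits));
      }

      int main(void)
      {
          static unsigned old_tab[1 << OLD_BITS], new_tab[1 << NEW_BITS];
          unsigned max_old = 0, max_new = 0;

          /* Direct (TDP) kvm_mmu_page structs: a level-1 table maps 2MB, so its
           * gfn is 512-page aligned and only bit 9 of the low 10 bits varies. */
          for (uint64_t i = 0; i < 100000; i++) {
              uint64_t gfn = i * 512;
              unsigned o = (unsigned)(gfn & ((1 << OLD_BITS) - 1));
              unsigned n = hash64_style(gfn, NEW_BITS);

              if (++old_tab[o] > max_old) max_old = old_tab[o];
              if (++new_tab[n] > max_new) max_new = new_tab[n];
          }
          printf("low 10 bits : max %u collisions in one bucket\n", max_old);
          printf("hash_64-like: max %u collisions in one bucket\n", max_new);
          return 0;
      }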
    • kvm: x86: export maximum number of mmu_page_hash collisions · f3414bc7
      Committed by David Matlack
      Report the maximum number of mmu_page_hash collisions as a per-VM stat.
      This will make it easy to identify problems with the mmu_page_hash in
      the future.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f3414bc7
    • KVM: x86: decouple irqchip_in_kernel() and pic_irqchip() · 49776faf
      Committed by Radim Krčmář
      irqchip_in_kernel() tried to save a bit by reusing pic_irqchip(), but it
      just complicated the code.
      Add a separate state for the irqchip mode.
      Reviewed-by: David Hildenbrand <david@redhat.com>
      [Used Paolo's version of condition in irqchip_in_kernel().]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      49776faf
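      The "separate state" is an explicit mode enum instead of inferring the
      mode from whether a PIC object exists. A hedged sketch (the values mirror
      the kvm_irqchip_mode enum this change introduces; the names carry a
      _sketch suffix to mark them as illustrative):

      #include <stdbool.h>

      /* Sketch of the explicit irqchip mode introduced by this change. */
      enum irqchip_mode_sketch {
          IRQCHIP_NONE,     /* no in-kernel interrupt controller */
          IRQCHIP_KERNEL,   /* full in-kernel PIC + IOAPIC + LAPIC */
          IRQCHIP_SPLIT,    /* LAPIC in kernel, PIC/IOAPIC in userspace */
      };

      static bool irqchip_in_kernel_sketch(enum irqchip_mode_sketch mode)
      {
          /* No longer derived from pic_irqchip(): any mode other than NONE
           * means an in-kernel local APIC is present. */
          return mode != IRQCHIP_NONE;
      }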