1. 13 Mar 2021, 1 commit
    • KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode · 8df9f1af
      Committed by Sean Christopherson
      If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
      REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
      removal.  The concurrent write protections provided by REMOVED_SPTE are
      not needed, there are no backing page side effects to record, and MMIO
      SPTEs can be left as is since they are protected by the memslot
      generation, not by ensuring that the MMIO SPTE is unreachable (which
      is racy with respect to lockless walks regardless of zapping behavior).
      
      Skipping !PRESENT drastically reduces the number of updates needed to
      tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that
      didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
      be skipped.
      
      Avoiding the write itself is likely close to a wash, but avoiding
      __handle_changed_spte() is a clear-cut win as that involves saving and
      restoring all non-volatile GPRs (it's a subtly big function), as well as
      several conditional branches before bailing out.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210310003029.1250571-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
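      A minimal standalone sketch of the skip described above (simplified types and a
      made-up "MMU-present" bit, not the kernel's real SPTE layout or helpers): with the
      lock held for write, entries that are not MMU-present are passed over instead of
      being written and run through the change handler.

      #include <stdint.h>
      #include <stdbool.h>

      #define SPTES_PER_PAGE   512
      #define SPTE_MMU_PRESENT (1ULL << 11)   /* placeholder "MMU-present" bit */

      static bool spte_present(uint64_t spte)
      {
              return spte & SPTE_MMU_PRESENT;
      }

      /* Tear down one shadow page's SPTE array with mmu_lock held for write. */
      static void zap_sp_exclusive(uint64_t sptes[SPTES_PER_PAGE])
      {
              for (int i = 0; i < SPTES_PER_PAGE; i++) {
                      if (!spte_present(sptes[i]))
                              continue;       /* nothing to record: skip the write */
                      sptes[i] = 0;           /* real code also processes the old value */
              }
      }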
  2. 19 Feb 2021, 3 commits
    • KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML · b6e16ae5
      Committed by Sean Christopherson
      Stop setting dirty bits for MMU pages when dirty logging is disabled for
      a memslot, as PML is now completely disabled when there are no memslots
      with dirty logging enabled.
      
      This means that spurious PML entries will be created for memslots with
      dirty logging disabled if at least one other memslot has dirty logging
      enabled.  However, spurious PML entries are already possible since
      dirty bits are set only when dirty logging is turned off, i.e. memslots
      that are never dirty logged will have dirty bits cleared.
      
      In the end, it's faster overall to eat a few spurious PML entries in the
      window where dirty logging is being disabled across all memslots.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs · 9eba50f8
      Committed by Sean Christopherson
      When zapping SPTEs in order to rebuild them as huge pages, use the new
      helper that computes the max mapping level to detect whether or not a
      SPTE should be zapped.  Doing so avoids zapping SPTEs that can't
      possibly be rebuilt as huge pages, e.g. due to hardware constraints,
      memslot alignment, etc...
      
      This also avoids zapping SPTEs that are still large, e.g. if migration
      was canceled before write-protected huge pages were shattered to enable
      dirty logging.  Note, such pages are still write-protected at this time,
      i.e. a page fault VM-Exit will still occur.  This will hopefully be
      addressed in a future patch.
      
      Sadly, TDP MMU loses its const on the memslot, but that's a pervasive
      problem that's been around for quite some time.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
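      In shape, the zap decision reduces to the predicate below (a sketch with
      illustrative names; the series' real helper is the "max mapping level"
      computation mentioned above): only SPTEs that could actually be rebuilt
      at a larger level are worth zapping.

      #include <stdbool.h>

      enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

      /*
       * Zap for collapsing only if the backing memory could be mapped at a
       * higher level than the SPTE currently uses; this skips 4K mappings
       * that can never become huge and mappings that are already as large
       * as the slot/backing page allows.
       */
      static bool worth_zapping_for_collapse(int spte_level, int max_mapping_level)
      {
              return spte_level < max_mapping_level;
      }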
    • KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages · c060c72f
      Committed by Sean Christopherson
      Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to
      rebuild them as huge pages in the TDP MMU.  ZONE_DEVICE huge pages are
      managed differently than "regular" pages and are not compound pages.
      Likewise, PageTransCompoundMap() will not detect HugeTLB, so switch
      to PageCompound().
      
      This matches the similar check in kvm_mmu_zap_collapsible_spte.
      
      Cc: Ben Gardon <bgardon@google.com>
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
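      A rough sketch of the revised skip test, using hypothetical pfn predicates in
      place of the real page-flag helpers (the exact kernel expression may differ;
      this only mirrors the reasoning in the message above): a pfn is skipped only if
      it is reserved, or if it is neither compound nor ZONE_DEVICE-backed.

      #include <stdbool.h>

      /* Hypothetical stand-ins for the real pfn/page-flag helpers. */
      bool pfn_is_reserved(unsigned long pfn);
      bool pfn_is_compound(unsigned long pfn);     /* covers THP as well as HugeTLB */
      bool pfn_is_zone_device(unsigned long pfn);  /* ZONE_DEVICE pages aren't compound */

      /* Return true if this SPTE should be left alone during the collapse pass. */
      static bool skip_for_collapse(unsigned long pfn)
      {
              return pfn_is_reserved(pfn) ||
                     (!pfn_is_compound(pfn) && !pfn_is_zone_device(pfn));
      }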
  3. 09 Feb 2021, 2 commits
    • KVM: x86/mmu: Make HVA handler retpoline-friendly · 8f5c44f9
      Committed by Maciej S. Szmigiero
      When retpolines are enabled they have high overhead in the inner loop
      inside kvm_handle_hva_range() that iterates over the provided memory area.
      
      Let's mark this function and its TDP MMU equivalent __always_inline so
      the compiler will be able to change the call to the actual handler
      function inside each of them into a direct one.
      
      This significantly improves performance on the unmap test on the existing
      kernel memslot code (tested on a Xeon 8167M machine):
      30 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0353s  0.0334s   5%
      Unmap 2M   0.00104s 0.000407s 61%
      
      509 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0742s  0.0740s   None
      Unmap 2M   0.00221s 0.00159s  28%
      
      Looks like having an indirect call in these functions (and, so, a
      retpoline) might have interfered with unrolling of the whole loop in the
      CPU.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
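      A self-contained illustration of the technique (names are made up; this is not
      kvm_handle_hva_range() itself): forcing the iterating wrapper to inline makes
      the handler a compile-time constant at each call site, so the per-iteration
      indirect call, and with it the retpoline, can become a direct call.

      #include <stddef.h>

      typedef int (*range_handler_t)(unsigned long start, unsigned long end);

      /*
       * __always_inline in the kernel expands to this attribute; inlining the
       * wrapper lets the compiler resolve 'handler' and drop the indirect call.
       */
      __attribute__((always_inline))
      static inline int handle_ranges(const unsigned long *start,
                                      const unsigned long *end, size_t n,
                                      range_handler_t handler)
      {
              int ret = 0;

              for (size_t i = 0; i < n; i++)
                      ret |= handler(start[i], end[i]);
              return ret;
      }

      static int unmap_range(unsigned long start, unsigned long end)
      {
              (void)start; (void)end;         /* per-range work would go here */
              return 0;
      }

      int unmap_all(const unsigned long *start, const unsigned long *end, size_t n)
      {
              /* Inlined copy: unmap_range() is called directly, no retpoline. */
              return handle_ranges(start, end, n, unmap_range);
      }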
    • KVM: x86: compile out TDP MMU on 32-bit systems · 897218ff
      Committed by Paolo Bonzini
      The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs.
      Rather than just disabling it, compile it out completely so that it
      is possible to use, for example, 64-bit xchg.
      
      To limit the number of stubs, wrap all accesses to tdp_mmu_enabled
      or tdp_mmu_page with a function.  Calls to all other functions in
      tdp_mmu.c are eliminated and do not even reach the linker.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
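      A sketch of the wrapping pattern (illustrative names and a simplified struct;
      the real code keys off CONFIG_X86_64 and wraps the tdp_mmu_enabled/tdp_mmu_page
      accesses): on 32-bit builds the accessor is constant false, so calls into
      tdp_mmu.c become dead code and never reach the linker.

      #include <stdbool.h>

      struct kvm { bool tdp_mmu_enabled; /* ... */ };

      void tdp_mmu_zap_all(struct kvm *kvm);     /* defined only in tdp_mmu.c */
      void shadow_mmu_zap_all(struct kvm *kvm);

      #ifdef CONFIG_X86_64
      static inline bool is_tdp_mmu_active(struct kvm *kvm)
      {
              return kvm->tdp_mmu_enabled;
      }
      #else
      static inline bool is_tdp_mmu_active(struct kvm *kvm)
      {
              return false;   /* compile-time constant on 32-bit builds */
      }
      #endif

      void kvm_zap_all(struct kvm *kvm)
      {
              if (is_tdp_mmu_active(kvm))
                      tdp_mmu_zap_all(kvm);   /* eliminated entirely when !CONFIG_X86_64 */
              else
                      shadow_mmu_zap_all(kvm);
      }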
  4. 04 Feb 2021, 17 commits
  5. 08 Jan 2021, 4 commits
    • KVM: x86/mmu: Ensure TDP MMU roots are freed after yield · a889ea54
      Committed by Ben Gardon
      Many TDP MMU functions which need to perform some action on all TDP MMU
      roots hold a reference on that root so that they can safely drop the MMU
      lock in order to yield to other threads. However, when releasing the
      reference on the root, there is a bug: the root will not be freed even
      if its reference count (root_count) is reduced to 0.
      
      To simplify acquiring and releasing references on TDP MMU root pages, and
      to ensure that these roots are properly freed, move the get/put operations
      into another TDP MMU root iterator macro.
      
      Moving the get/put operations into an iterator macro also helps
      simplify control flow when a root does need to be freed. Note that using
      the list_for_each_entry_safe macro would not have been appropriate in
      this situation because it could keep a pointer to the next root across
      an MMU lock release + reacquire, during which time that root could be
      freed.
      Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 063afacd ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210107001935.3732070-1-bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
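      A simplified, self-contained model of the iterator idea (not the kernel's actual
      macro or types): the iterator itself takes a reference on the current root and
      drops the previous root's reference when it advances, so the put path, rather
      than the loop body, is what frees a root whose count hits zero, and the body is
      free to yield the lock.

      #include <stdlib.h>

      struct root {
              struct root *next;
              int refcount;           /* roots start on the list with one reference */
      };

      static void root_get(struct root *r) { r->refcount++; }

      static void root_put(struct root *r)
      {
              if (--r->refcount == 0)
                      free(r);        /* freed here, never skipped */
      }

      /* Take the next root's reference before dropping the previous one. */
      static struct root *next_root(struct root *prev)
      {
              struct root *next = prev->next;

              if (next)
                      root_get(next);
              root_put(prev);
              return next;
      }

      /* Assumes a non-empty list for brevity; the body may drop/retake the lock. */
      #define for_each_root(r, head) \
              for ((r) = (head), root_get(r); (r); (r) = next_root(r))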
    • KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array · dde81f94
      Committed by Sean Christopherson
      Bump the size of the sptes array by one and use the raw level of the
      SPTE to index into the sptes array.  Using the SPTE level directly
      improves readability by eliminating the need to reason out why the level
      is being adjusted when indexing the array.  The array is on the stack
      and is not explicitly initialized; bumping its size is nothing more than
      a superficial adjustment to the stack frame.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-4-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
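      In outline, the change is just the indexing scheme below (the array names and
      the max-level constant are illustrative, not the kernel's declarations):

      #include <stdint.h>

      #define ROOT_MAX_LEVEL 5

      /* Before: levels 1..N stored at index level - 1. */
      uint64_t sptes_packed[ROOT_MAX_LEVEL];          /* sptes_packed[level - 1] */

      /* After: one extra (unused) slot 0, so the raw level indexes directly. */
      uint64_t sptes_by_level[ROOT_MAX_LEVEL + 1];    /* sptes_by_level[level]   */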
    • KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE · 39b4d43e
      Committed by Sean Christopherson
      Get the so called "root" level from the low level shadow page table
      walkers instead of manually attempting to calculate it higher up the
      stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
      the starting level of the walk, from the caller's perspective, is not
      the CR3 root but rather the PDPTR "root".  Checking for reserved bits
      from the CR3 root causes get_mmio_spte() to consume uninitialized stack
      data due to indexing into sptes[] for a level that was not filled by
      get_walk().  This can result in false positives and/or negatives
      depending on what garbage happens to be on the stack.
      
      Opportunistically nuke a few extra newlines.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Reported-by: Richard Herbert <rherbert@sympatico.ca>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() · 2aa07893
      Committed by Sean Christopherson
      Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
      least one spte, which can theoretically happen if the walk hits a
      not-present PDPTR.  Returning the root level in such a case will cause
      get_mmio_spte() to return garbage (uninitialized stack data).  In
      practice, such a scenario should be impossible as KVM shouldn't get a
      reserved-bit page fault with a not-present PDPTR.
      
      Note, using mmu->root_level in get_walk() is wrong for other reasons,
      too, but that's now a moot point.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
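      A condensed sketch of the contract these two fixes establish (standalone, with
      hypothetical names; the real code is get_walk()/get_mmio_spte()): the walker
      reports the root level it actually used, returns the lowest level it filled or
      -1 if it filled none, and the caller bails out instead of reading uninitialized
      stack slots.

      #include <stdint.h>
      #include <stdbool.h>

      #define ROOT_MAX_LEVEL 5

      /*
       * Hypothetical walker: fills sptes[leaf..*root_level] and returns the
       * leaf level, or -1 if not even the root entry could be read (e.g. a
       * not-present PDPTR when using PAE paging).
       */
      int walk_sptes(uint64_t gpa, uint64_t sptes[ROOT_MAX_LEVEL + 1],
                     int *root_level);

      static bool fetch_mmio_spte(uint64_t gpa, uint64_t *spte)
      {
              uint64_t sptes[ROOT_MAX_LEVEL + 1];     /* deliberately uninitialized */
              int root_level;
              int leaf = walk_sptes(gpa, sptes, &root_level);

              if (leaf < 0)
                      return false;   /* nothing was filled; don't touch sptes[] */

              /* Reserved-bit checks would cover only sptes[leaf..root_level]. */
              *spte = sptes[leaf];
              return true;
      }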
  6. 04 Dec 2020, 1 commit
  7. 19 Nov 2020, 2 commits
  8. 15 Nov 2020, 2 commits
    • KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Committed by Peter Xu
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue exists for live migration when the guest memory is huge
      while the page dirty procedure is trivial.  In that case, for each dirty
      sync we need to pull the whole dirty bitmap to userspace and analyse
      every bit even if it's mostly zeros.
      
      The preferred data structure for the above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables the dirty ring for X86 only.  However, it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/
      Signed-off-by: Lei Cao <lei.cao@stratus.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
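      A self-contained toy model of the ring idea (this is not KVM's uapi layout,
      ioctl flow, or memory-ordering discipline, just the data-structure shape): the
      kernel appends one record per dirtied guest page to a fixed-size ring that
      userspace has mapped, and userspace harvests records instead of scanning a
      bitmap.

      #include <stdint.h>

      #define RING_SIZE 4096u                 /* entries; power of two */

      struct dirty_rec {
              uint32_t slot;                  /* which memslot was written */
              uint64_t offset;                /* page offset within that slot */
      };

      struct dirty_ring {
              struct dirty_rec recs[RING_SIZE];
              uint32_t head;                  /* next entry userspace will read */
              uint32_t tail;                  /* next entry the kernel will fill */
      };

      /* Producer side (kernel model): queue one dirtied guest page. */
      static int ring_push(struct dirty_ring *r, uint32_t slot, uint64_t offset)
      {
              if (r->tail - r->head == RING_SIZE)
                      return -1;              /* full: pause until userspace harvests */
              r->recs[r->tail % RING_SIZE] =
                      (struct dirty_rec){ .slot = slot, .offset = offset };
              r->tail++;
              return 0;
      }

      /* Consumer side (userspace model): harvest everything currently queued. */
      static uint32_t ring_harvest(struct dirty_ring *r,
                                   void (*sink)(uint32_t slot, uint64_t offset))
      {
              uint32_t n = 0;

              while (r->head != r->tail) {
                      struct dirty_rec *rec = &r->recs[r->head % RING_SIZE];

                      sink(rec->slot, rec->offset);
                      r->head++;
                      n++;
              }
              return n;
      }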
    • kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Committed by Paolo Bonzini
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
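      The fix boils down to an early-out of this shape (illustrative names; the real
      check is the TDP-MMU root test the message refers to): when the TDP MMU is not
      in use, the root cannot be a TDP MMU page, so return false before looking up a
      struct kvm_mmu_page that may not exist.

      #include <stdbool.h>

      struct kvm { bool tdp_mmu_enabled; /* ... */ };

      bool root_sp_is_tdp_mmu(struct kvm *kvm, unsigned long root_hpa); /* hypothetical */

      static bool is_tdp_mmu_root_sketch(struct kvm *kvm, unsigned long root_hpa)
      {
              /*
               * With shadow paging the root may be a bare page (pae_root or
               * lm_root) that has no struct kvm_mmu_page behind it; bail out
               * before trying to look one up.
               */
              if (!kvm->tdp_mmu_enabled)
                      return false;

              return root_sp_is_tdp_mmu(kvm, root_hpa);
      }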
  9. 24 Oct 2020, 1 commit
  10. 23 Oct 2020, 7 commits