1. 15 March 2021 (2 commits)
  2. 13 March 2021 (1 commit)
    • KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode · 8df9f1af
      Committed by Sean Christopherson
      If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
      REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
      removal.  The concurrent write protections provided by REMOVED_SPTE are
      not needed; there are no backing page side effects to record, and MMIO
      SPTEs can be left as-is since they are protected by the memslot
      generation, not by ensuring that the MMIO SPTE is unreachable (which
      is racy with respect to lockless walks regardless of zapping behavior).
      
      Skipping !PRESENT drastically reduces the number of updates needed to
      tear down sparsely populated MMUs.  For example, when tearing down a 6GB
      VM that didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0'
      and could be skipped.
      
      Avoiding the write itself is likely close to a wash, but avoiding
      __handle_changed_spte() is a clear-cut win as that involves saving and
      restoring all non-volatile GPRs (it's a subtly big function), as well as
      several conditional branches before bailing out.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210310003029.1250571-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8df9f1af
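      A stand-alone C sketch of the skip on the exclusive (write-locked) path is
      below.  It only illustrates the idea described in the commit above: the bit
      layout, the REMOVED_SPTE value and the helper names are assumptions made
      for the example, not the kernel's actual definitions.

      /*
       * Minimal illustration: zapping the children of a page table and skipping
       * SPTEs that are not MMU-present when mmu_lock is held for write.
       */
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define SPTE_MMU_PRESENT_MASK  (1ULL << 11)   /* assumed software "MMU-present" bit */
      #define REMOVED_SPTE           0x5a0ULL       /* assumed non-present marker value */
      #define SPTES_PER_PAGE         512

      static bool is_mmu_present(uint64_t spte)
      {
          return spte & SPTE_MMU_PRESENT_MASK;
      }

      /* Zap all child SPTEs of a page table that is being removed. */
      static int zap_children(uint64_t *pt, bool shared)
      {
          int handled = 0;

          for (int i = 0; i < SPTES_PER_PAGE; i++) {
              /*
               * Exclusive path: a !PRESENT SPTE has no backing-page side
               * effects to record and cannot become present while mmu_lock
               * is held for write, so skip it instead of writing
               * REMOVED_SPTE and running the changed-SPTE handler.
               */
              if (!shared && !is_mmu_present(pt[i]))
                  continue;

              pt[i] = REMOVED_SPTE;   /* stands in for the real zap + bookkeeping */
              handled++;
          }
          return handled;
      }

      int main(void)
      {
          static uint64_t pt[SPTES_PER_PAGE];   /* sparsely populated table */

          pt[0] = pt[7] = pt[300] = SPTE_MMU_PRESENT_MASK;
          printf("exclusive zap touched %d of %d SPTEs\n",
                 zap_children(pt, false), SPTES_PER_PAGE);
          return 0;
      }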
  3. 26 February 2021 (1 commit)
  4. 23 February 2021 (2 commits)
    • KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848
      Committed by David Stevens
      Track the range being invalidated by mmu_notifier and skip page fault
      retries if the fault address is not affected by the in-progress
      invalidation. Handle concurrent invalidations by finding the minimal
      range which includes all ranges being invalidated. Although the combined
      range may include unrelated addresses and cannot be shrunk as individual
      invalidation operations complete, it is unlikely the marginal gains of
      proper range tracking are worth the additional complexity.
      
      The primary benefit of this change is the reduction in the likelihood of
      extreme latency when handling a page fault due to another thread having
      been preempted while modifying host virtual addresses.

      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210222024522.1751719-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a42d848
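      A stand-alone sketch of the range tracking described above is below.  The
      struct, field and function names are illustrative; the kernel's equivalents
      are similar in spirit but not identical.

      #include <stdbool.h>
      #include <stdint.h>

      struct inv_state {
          unsigned long in_progress;   /* invalidations currently in flight */
          unsigned long seq;           /* bumped when an invalidation completes */
          uint64_t start, end;         /* combined hva range being invalidated */
      };

      void inv_range_start(struct inv_state *s, uint64_t start, uint64_t end)
      {
          if (!s->in_progress++) {
              s->start = start;
              s->end   = end;
          } else {
              /*
               * Concurrent invalidations: keep the minimal range that covers
               * all of them.  It may include unrelated addresses and is not
               * shrunk as individual invalidations complete.
               */
              if (start < s->start)
                  s->start = start;
              if (end > s->end)
                  s->end = end;
          }
      }

      void inv_range_end(struct inv_state *s)
      {
          s->seq++;
          s->in_progress--;
      }

      /*
       * Page-fault side: retry only if the faulting hva falls inside the range
       * currently being invalidated, or an invalidation completed after the
       * fault snapshotted 'seq'.
       */
      bool fault_needs_retry(const struct inv_state *s,
                             unsigned long snapshot_seq, uint64_t hva)
      {
          if (s->in_progress && hva >= s->start && hva < s->end)
              return true;
          return s->seq != snapshot_seq;
      }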
    • KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault · 5f8a7cf2
      Committed by Sean Christopherson
      Don't retry a page fault due to an mmu_notifier invalidation when
      handling a page fault for a GPA that did not resolve to a memslot, i.e.
      an MMIO page fault.  Invalidations from the mmu_notifier signal a change
      in a host virtual address (HVA) mapping; without a memslot, there is no
      HVA and thus no possibility that the invalidation is relevant to the
      page fault being handled.
      
      Note, the MMIO vs. memslot generation checks handle the case where a
      pending memslot will create a memslot overlapping the faulting GPA.  The
      mmu_notifier checks are orthogonal to memslot updates.

      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210222024522.1751719-2-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5f8a7cf2
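      Building on the sketch under the previous entry, the MMIO case can be
      expressed as an early-out before the range check; the names are again
      illustrative rather than the kernel's.

      /*
       * Skip the mmu_notifier retry check when the faulting GPA resolved to no
       * memslot (an MMIO fault): with no memslot there is no hva for an
       * invalidation to affect.  Races with memslot creation are covered by the
       * MMIO vs. memslot generation checks instead.
       */
      bool should_retry_fault(const struct inv_state *s, unsigned long snapshot_seq,
                              bool gfn_has_memslot, uint64_t hva)
      {
          if (!gfn_has_memslot)
              return false;

          return fault_needs_retry(s, snapshot_seq, hva);
      }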
  5. 19 February 2021 (10 commits)
  6. 09 February 2021 (2 commits)
    • KVM: x86/mmu: Make HVA handler retpoline-friendly · 8f5c44f9
      Committed by Maciej S. Szmigiero
      When retpolines are enabled they have high overhead in the inner loop
      inside kvm_handle_hva_range() that iterates over the provided memory area.
      
      Let's mark this function and its TDP MMU equivalent __always_inline so
      the compiler will be able to change the call to the actual handler function
      inside each of them into a direct one.
      
      This significantly improves performance on the unmap test on the existing
      kernel memslot code (tested on a Xeon 8167M machine):
      30 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0353s  0.0334s   5%
      Unmap 2M   0.00104s 0.000407s 61%
      
      509 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0742s  0.0740s   None
      Unmap 2M   0.00221s 0.00159s  28%
      
      It looks like having an indirect call (and therefore a retpoline) in
      these functions may have interfered with unrolling of the whole loop
      by the CPU.

      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8f5c44f9
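      The effect of the __always_inline annotation can be illustrated stand-alone.
      In the sketch below the walker, handler and loop body are made-up stand-ins;
      the point is only that inlining the walker into each caller lets the compiler
      turn the per-iteration indirect call (a retpoline thunk when retpolines are
      enabled) into a direct one.

      #include <stddef.h>

      #define __always_inline inline __attribute__((always_inline))

      typedef int (*hva_handler_t)(unsigned long hva, void *data);

      static __always_inline int handle_hva_range(unsigned long start,
                                                  unsigned long end, void *data,
                                                  hva_handler_t handler)
      {
          int ret = 0;

          for (unsigned long hva = start; hva < end; hva += 4096)
              ret |= handler(hva, data);   /* direct call once inlined into the caller */

          return ret;
      }

      static int unmap_one(unsigned long hva, void *data)
      {
          (void)hva;
          (void)data;
          /* ... zap whatever backs this hva ... */
          return 0;
      }

      int unmap_hva_range(unsigned long start, unsigned long end)
      {
          /* The walker body is inlined here, so unmap_one() is called directly. */
          return handle_hva_range(start, end, NULL, unmap_one);
      }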
    • KVM: x86: compile out TDP MMU on 32-bit systems · 897218ff
      Committed by Paolo Bonzini
      The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs.
      Rather than just disabling it, compile it out completely so that it is
      possible to use, for example, 64-bit xchg.
      
      To limit the number of stubs, wrap all accesses to tdp_mmu_enabled
      or tdp_mmu_page with a function.  Calls to all other functions in
      tdp_mmu.c are eliminated and do not even reach the linker.

      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      897218ff
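      The compile-out pattern can be sketched as below.  CONFIG_X86_64 is the real
      Kconfig gate; the struct and helper names are simplified stand-ins for the
      kernel's wrappers around tdp_mmu_enabled and tdp_mmu_page, not the actual
      declarations.

      #include <stdbool.h>

      struct kvm;
      struct kvm_mmu_page;

      #ifdef CONFIG_X86_64
      bool kvm_tdp_mmu_enabled(struct kvm *kvm);
      bool kvm_is_tdp_mmu_page(struct kvm_mmu_page *sp);
      void kvm_tdp_mmu_zap_all(struct kvm *kvm);
      #else
      /*
       * On 32-bit builds the wrappers collapse to constants, so every TDP MMU
       * branch below them is dead code: calls into tdp_mmu.c are never emitted
       * and its symbols never reach the linker.
       */
      static inline bool kvm_tdp_mmu_enabled(struct kvm *kvm) { (void)kvm; return false; }
      static inline bool kvm_is_tdp_mmu_page(struct kvm_mmu_page *sp) { (void)sp; return false; }
      static inline void kvm_tdp_mmu_zap_all(struct kvm *kvm) { (void)kvm; }
      #endif

      /* Callers stay unconditional; the #ifdef lives only in the declarations above. */
      void example_zap_all(struct kvm *kvm)
      {
          if (kvm_tdp_mmu_enabled(kvm))
              kvm_tdp_mmu_zap_all(kvm);
      }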
  7. 04 February 2021 (22 commits)