1. 17 April 2021 (14 commits)
    • KVM: Move x86's MMU notifier memslot walkers to generic code · 3039bcc7
      Committed by Sean Christopherson
      Move the hva->gfn lookup for MMU notifiers into common code.  Every arch
      does a similar lookup, and some arch code is all but identical across
      multiple architectures.
      
      In addition to consolidating code, this will allow introducing
      optimizations that will benefit all architectures without incurring
      multiple walks of the memslots, e.g. by taking mmu_lock if and only if a
      relevant range exists in the memslots.
      
      The use of __always_inline to avoid indirect call retpolines, as done by
      x86, may also benefit other architectures.
      
      Consolidating the lookups also fixes a wart in x86, where the legacy MMU
      and TDP MMU each do their own memslot walks.
      
      Lastly, future enhancements to the memslot implementation, e.g. adding an
      interval tree to track host addresses, will need to touch far less
      arch-specific code.
      
      MIPS, PPC, and arm64 will be converted one at a time in future patches.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3039bcc7
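      A minimal sketch of the consolidated hva-range walk described above, assuming
      the common-code helper is shaped roughly like the x86 walker it replaces; the
      name handle_hva_range and the handler signature are illustrative, not the
      exact upstream API:

         /* Walk every memslot overlapping [start, end) and invoke a per-range
          * arch handler with the corresponding gfn range. */
         static bool handle_hva_range(struct kvm *kvm, unsigned long start,
                                      unsigned long end,
                                      bool (*handler)(struct kvm *kvm,
                                                      gfn_t gfn_start, gfn_t gfn_end))
         {
                 struct kvm_memslots *slots;
                 struct kvm_memory_slot *slot;
                 bool ret = false;
                 int i;

                 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                         slots = __kvm_memslots(kvm, i);
                         kvm_for_each_memslot(slot, slots) {
                                 unsigned long hva_start, hva_end;
                                 gfn_t gfn_start, gfn_end;

                                 hva_start = max(start, slot->userspace_addr);
                                 hva_end = min(end, slot->userspace_addr +
                                                    (slot->npages << PAGE_SHIFT));
                                 if (hva_start >= hva_end)
                                         continue;

                                 /* Each memslot maps a contiguous hva range, so the
                                  * hva->gfn translation is a simple offset. */
                                 gfn_start = hva_to_gfn_memslot(hva_start, slot);
                                 gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);

                                 ret |= handler(kvm, gfn_start, gfn_end);
                         }
                 }
                 return ret;
         }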
    • KVM: constify kvm_arch_flush_remote_tlbs_memslot · 6c9dd6d2
      Committed by Paolo Bonzini
      Memslots are RCU-protected and there should be no need to
      change them.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c9dd6d2
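      In practice the change amounts to constifying the memslot parameter; a sketch
      of the prototype change (the exact declaration site may differ):

         /* Before */
         void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                 struct kvm_memory_slot *memslot);

         /* After: memslots are RCU-protected and the flush path never modifies
          * them, so it can take a const pointer. */
         void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                 const struct kvm_memory_slot *memslot);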
    • KVM: MMU: protect TDP MMU pages only down to required level · dbb6964e
      Committed by Paolo Bonzini
      When using manual protection of dirty pages, it is not necessary
      to protect nested page tables down to the 4K level; instead KVM
      can protect only hugepages in order to split them lazily, and
      delay write protection at 4K-granularity until KVM_CLEAR_DIRTY_LOG.
      This was overlooked in the TDP MMU, so do it there as well.
      
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbb6964e
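      A hedged sketch of the idea (the call site and helper names are illustrative,
      not the verbatim patch): when userspace enables manual dirty-log protection
      with KVM_DIRTY_LOG_INITIALLY_SET, write-protect only down to 2M hugepages and
      leave 4K write protection to KVM_CLEAR_DIRTY_LOG.

         bool init_set = READ_ONCE(kvm->manual_dirty_log_protect) &
                         KVM_DIRTY_LOG_INITIALLY_SET;
         int min_level = init_set ? PG_LEVEL_2M : PG_LEVEL_4K;

         /* Both the legacy and TDP MMU walkers stop at min_level; the TDP MMU
          * previously ignored it and always went down to 4K. */
         if (is_tdp_mmu_enabled(kvm))
                 flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, min_level);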
    • KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU · 8f8f52a4
      Committed by Sean Christopherson
      Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs,
      as opposed to the fancy ffs()+clear_bit() logic that was copied from the
      legacy MMU.  The legacy MMU uses clear_bit() because it is operating on
      the SPTE itself, i.e. clearing needs to be atomic.  The TDP MMU operates
      on a local variable that it later writes to the SPTE, and so doesn't need
      to be atomic or even resident in memory.
      
      Opportunistically drop the unnecessary initialization of new_spte; it's
      guaranteed to be written before being accessed.
      
      Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from:
      
         0x0000000000058be6 <+134>:	test   %rax,%rax
         0x0000000000058be9 <+137>:	je     0x58bf4 <age_gfn_range+148>
         0x0000000000058beb <+139>:	test   %rax,%rdi
         0x0000000000058bee <+142>:	je     0x58cdc <age_gfn_range+380>
         0x0000000000058bf4 <+148>:	mov    %rdi,0x8(%rsp)
         0x0000000000058bf9 <+153>:	mov    $0xffffffff,%edx
         0x0000000000058bfe <+158>:	bsf    %eax,%edx
         0x0000000000058c01 <+161>:	movslq %edx,%rdx
         0x0000000000058c04 <+164>:	lock btr %rdx,0x8(%rsp)
         0x0000000000058c0b <+171>:	mov    0x8(%rsp),%r15
      
      to:
      
         0x0000000000058bdd <+125>:	test   %rax,%rax
         0x0000000000058be0 <+128>:	je     0x58beb <age_gfn_range+139>
         0x0000000000058be2 <+130>:	test   %rax,%r8
         0x0000000000058be5 <+133>:	je     0x58cc0 <age_gfn_range+352>
         0x0000000000058beb <+139>:	not    %rax
         0x0000000000058bee <+142>:	and    %r8,%rax
         0x0000000000058bf1 <+145>:	mov    %rax,%r15
      
      thus eliminating several memory accesses, including a locked access.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8f8f52a4
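      A sketch of the before/after forms the commit describes (names follow the TDP
      MMU iterator; the surrounding loop and the EPT no-A/D case are omitted):

         /* Before, copied from the legacy MMU, which must clear the bit atomically
          * because it operates on the SPTE in place: */
         new_spte = iter.old_spte;
         clear_bit((ffs(shadow_accessed_mask) - 1), (unsigned long *)&new_spte);

         /* After: new_spte is a local copy, so a plain NOT+AND is sufficient. */
         new_spte = iter.old_spte & ~shadow_accessed_mask;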
    • KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE · 6d9aafb9
      Committed by Sean Christopherson
      Don't clear the dirty bit when aging a TDP MMU SPTE (in response to an MMU
      notifier event).  Prematurely clearing the dirty bit could cause spurious
      PML updates if aging a page happened to coincide with dirty logging.
      
      Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(),
      so the host PFN will be marked dirty, i.e. there is no potential for data
      corruption.
      
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6d9aafb9
    • KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint · 6dfbd6b5
      Committed by Sean Christopherson
      Remove x86's trace_kvm_age_page() tracepoint.  It's mostly redundant with
      the common trace_kvm_age_hva() tracepoint, and if there is a need for the
      extra details, e.g. gfn, referenced, etc... those details should be added
      to the common tracepoint so that all architectures and MMUs benefit from
      the info.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-19-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6dfbd6b5
    • KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE · aaaac889
      Committed by Sean Christopherson
      Use the leaf-only TDP iterator when changing the SPTE in reaction to an
      MMU notifier.  Practically speaking, this is a nop since the guts of the
      loop explicitly looks for 4k SPTEs, which are always leaf SPTEs.  Switch
      the iterator to match age_gfn_range() and test_age_gfn() so that a future
      patch can consolidate the core iterating logic.
      
      No real functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aaaac889
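      A sketch of the switch (macro names are the TDP MMU's; the loop body is
      elided):

         /* Before: full iterator plus an explicit level check. */
         tdp_root_for_each_pte(iter, root, gfn, gfn + 1) {
                 if (iter.level != PG_LEVEL_4K)
                         continue;
                 /* ... modify the SPTE ... */
         }

         /* After: leaf-only iterator, matching age_gfn_range() and test_age_gfn(). */
         tdp_root_for_each_leaf_pte(iter, root, gfn, gfn + 1) {
                 /* ... modify the SPTE ... */
         }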
    • KVM: x86/mmu: Pass address space ID to TDP MMU root walkers · a3f15bda
      Committed by Sean Christopherson
      Move the address space ID check that is performed when iterating over
      roots into the macro helpers to consolidate code.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a3f15bda
    • KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() · 2b9663d8
      Committed by Sean Christopherson
      Pass the address space ID to TDP MMU's primary "zap gfn range" helper to
      allow the MMU notifier paths to iterate over memslots exactly once.
      Currently, both the legacy MMU and TDP MMU iterate over memslots when
      looking for an overlapping hva range, which can be quite costly if there
      are a large number of memslots.
      
      Add a "flush" parameter so that iterating over multiple address spaces
      in the caller will continue to do the right thing when yielding while a
      flush is pending from a previous address space.
      
      Note, this also has a functional change in the form of coalescing TLB
      flushes across multiple address spaces in kvm_zap_gfn_range(), and also
      optimizes the TDP MMU to utilize range-based flushing when running as L1
      with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-6-seanjc@google.com>
      [Keep separate for loops to prepare for other incoming patches. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2b9663d8
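      A sketch of the caller pattern this enables (simplified; the exact signatures
      shifted over the course of this series, so treat the parameters as
      illustrative):

         bool flush = false;
         int i;

         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
                 flush = __kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end,
                                                     true /* can_yield */, flush);

         /* A single remote TLB flush covers everything zapped in either
          * address space. */
         if (flush)
                 kvm_flush_remote_tlbs(kvm);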
    • KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zap · 1a61b7db
      Committed by Sean Christopherson
      Gather pending TLB flushes across both address spaces when zapping a
      given gfn range.  This requires feeding "flush" back into subsequent
      calls, but on the plus side sets the stage for further batching
      between the legacy MMU and TDP MMU.  It also allows refactoring the
      address space iteration to cover the legacy and TDP MMUs without
      introducing truly ugly code.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1a61b7db
    • KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs · 142ccde1
      Committed by Sean Christopherson
      Gather pending TLB flushes across both the legacy and TDP MMUs when
      zapping collapsible SPTEs to avoid multiple flushes if both the legacy
      MMU (for nested guests) and TDP MMU have mappings for the memslot.
      
      Note, this also optimizes the TDP MMU to flush only the relevant range
      when running as L1 with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      142ccde1
    • KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU · 302695a5
      Committed by Sean Christopherson
      Place the onus on the caller of slot_handle_*() to flush the TLB, rather
      than handling the flush in the helper, and rename parameters accordingly.
      This will allow future patches to coalesce flushes between address spaces
      and between the legacy and TDP MMUs.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      302695a5
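      A sketch of the new caller-side contract (helper names are the x86 MMU's; the
      body is illustrative): slot_handle_*() now reports whether a flush is needed
      instead of flushing on its own.

         bool flush;

         write_lock(&kvm->mmu_lock);
         flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
                                   start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
         write_unlock(&kvm->mmu_lock);

         /* The caller decides when, and over what range, to flush, which is what
          * lets later patches batch flushes across address spaces and MMUs. */
         if (flush)
                 kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);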
    • KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs · af95b53e
      Committed by Sean Christopherson
      When zapping collapsible SPTEs across multiple roots, gather pending
      flushes and perform a single remote TLB flush at the end, as opposed to
      flushing after processing every root.
      
      Note, flush may be cleared by the result of zap_collapsible_spte_range().
      This is intended and correct, e.g. yielding may have serviced a prior
      pending flush.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      af95b53e
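      A short sketch of the per-root accumulation (parameters are illustrative):
      flush is deliberately overwritten with each call's return value, since a yield
      inside the helper may already have serviced the pending flush.

         bool flush = false;

         for_each_tdp_mmu_root_yield_safe(kvm, root)
                 flush = zap_collapsible_spte_range(kvm, root, slot, flush);

         if (flush)
                 kvm_flush_remote_tlbs(kvm);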
    • KVM: MMU: load PDPTRs outside mmu_lock · 4a38162e
      Committed by Paolo Bonzini
      On SVM, reading PDPTRs might access guest memory, which might fault
      and thus might sleep.  On the other hand, it is not possible to
      release the lock after make_mmu_pages_available has been called.
      
      Therefore, push the call to make_mmu_pages_available and the
      mmu_lock critical section down into mmu_alloc_direct_roots and
      mmu_alloc_shadow_roots.
      Reported-by: Wanpeng Li <wanpengli@tencent.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a38162e
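      A sketch of the resulting ordering inside mmu_alloc_shadow_roots() (simplified;
      the pdptrs[] caching and error handling are condensed): read the PDPTRs, which
      may fault on guest memory and sleep, before taking mmu_lock.

         u64 pdptrs[4];
         int i, r;

         if (mmu->root_level == PT32E_ROOT_LEVEL) {
                 for (i = 0; i < 4; ++i) {
                         pdptrs[i] = mmu->get_pdptr(vcpu, i);    /* may sleep */
                         if (!(pdptrs[i] & PT_PRESENT_MASK))
                                 pdptrs[i] = 0;
                 }
         }

         write_lock(&vcpu->kvm->mmu_lock);
         r = make_mmu_pages_available(vcpu);
         if (r < 0)
                 goto out_unlock;
         /* ... allocate the roots using the cached pdptrs ... */
 out_unlock:
         write_unlock(&vcpu->kvm->mmu_lock);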
  2. 31 March 2021 (3 commits)
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Committed by Sean Christopherson
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33a31641
    • KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980
      Committed by Sean Christopherson
      Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
      does the flush itself if and only if it yields (which it will never do in
      this particular scenario), and otherwise expects the caller to do the
      flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
      no flush will occur.
      
      Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-3-seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      048f4980
    • KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap · a835429c
      Committed by Sean Christopherson
      When flushing a range of GFNs across multiple roots, ensure any pending
      flush from a previous root is honored before yielding while walking the
      tables of the current root.
      
      Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
      "flush" with the result to avoid redundant flushes.  zap_gfn_range()
      preserves and returns the incoming "flush", unless of course the flush was
      performed prior to yielding and no new flush was triggered.
      
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Cc: stable@vger.kernel.org
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a835429c
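      A sketch of the fix inside zap_gfn_range() (simplified): feed the pending
      "flush" into the cond_resched helper so stale TLB entries are flushed before
      mmu_lock is dropped, and clear it once it has been serviced.

         tdp_root_for_each_pte(iter, root, start, end) {
                 if (can_yield &&
                     tdp_mmu_iter_cond_resched(kvm, &iter, flush)) {
                         flush = false;
                         continue;
                 }
                 /* ... zap the SPTE and set flush = true if anything changed ... */
         }
         return flush;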
  3. 17 March 2021 (4 commits)
  4. 15 March 2021 (19 commits)