1. 18 Jun 2021, 6 commits
  2. 27 May 2021, 1 commit
  3. 03 May 2021, 3 commits
  4. 20 Apr 2021, 2 commits
    • KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns · 4c6654bd
      Committed by Ben Gardon
      To avoid saddling a vCPU thread with the work of tearing down an entire
      paging structure, take a reference on each root before they become
      obsolete, so that the thread initiating the fast invalidation can tear
      down the paging structure and (most likely) release the last reference.
      As a bonus, this teardown can happen under the MMU lock in read mode so
      as not to block the progress of vCPU threads.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
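      The refcounting scheme the commit describes can be sketched as below. This is an illustrative model, not KVM's actual code: the struct, field names, and single-counter bookkeeping are all assumptions made for the example.

      ```c
      /* Illustrative sketch of "pin each root before it becomes obsolete":
       * whoever drops the last reference pays for the teardown, so the
       * invalidating thread takes an extra reference up front and (most
       * likely) ends up doing the teardown itself, off the vCPU path. */
      #include <assert.h>
      #include <stdbool.h>

      struct mmu_root {
          int  refcount;
          bool invalid;
          bool torn_down;
      };

      static void root_put(struct mmu_root *r)
      {
          if (--r->refcount == 0)
              r->torn_down = true;   /* last holder frees the paging structure */
      }

      int main(void)
      {
          /* Without the extra reference: the vCPU dropping its reference
           * is saddled with tearing down the whole paging structure. */
          struct mmu_root a = { .refcount = 1 };
          a.invalid = true;          /* fast invalidation marks root obsolete */
          root_put(&a);              /* vCPU thread does the teardown */
          assert(a.torn_down);

          /* With the commit's scheme: pin the root before marking it
           * obsolete, so the invalidation thread holds the likely-last
           * reference and can tear down under mmu_lock in read mode. */
          struct mmu_root b = { .refcount = 1 };
          b.refcount++;              /* invalidation thread pins the root */
          b.invalid = true;
          root_put(&b);              /* vCPU drops its reference: no teardown */
          assert(!b.torn_down);
          root_put(&b);              /* invalidation thread tears it down */
          assert(b.torn_down);
          return 0;
      }
      ```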
    • KVM: x86/mmu: Fast invalidation for TDP MMU · b7cccd39
      Committed by Ben Gardon
      Provide a real mechanism for fast invalidation by marking roots as
      invalid so that their reference count will quickly fall to zero
      and they will be torn down.
      
      One negative side effect of this approach is that a vCPU thread will
      likely drop the last reference to a root and be saddled with the work of
      tearing down an entire paging structure. This issue will be resolved in
      a later commit.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-13-bgardon@google.com>
      [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 19 Apr 2021, 12 commits
  6. 17 Apr 2021, 9 commits
    • KVM: Move x86's MMU notifier memslot walkers to generic code · 3039bcc7
      Committed by Sean Christopherson
      Move the hva->gfn lookup for MMU notifiers into common code.  Every arch
      does a similar lookup, and some arch code is all but identical across
      multiple architectures.
      
      In addition to consolidating code, this will allow introducing
      optimizations that will benefit all architectures without incurring
      multiple walks of the memslots, e.g. by taking mmu_lock if and only if a
      relevant range exists in the memslots.
      
      The use of __always_inline to avoid indirect call retpolines, as done by
      x86, may also benefit other architectures.
      
      Consolidating the lookups also fixes a wart in x86, where the legacy MMU
      and TDP MMU each do their own memslot walks.
      
      Lastly, future enhancements to the memslot implementation, e.g. to add an
      interval tree to track host address, will need to touch far less arch
      specific code.
      
      MIPS, PPC, and arm64 will be converted one at a time in future patches.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
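      The hva->gfn clamp that every arch was duplicating can be sketched as follows. The struct layout and helper name are illustrative (loosely modeled on a KVM memslot), not the generic API the commit introduces:

      ```c
      /* Illustrative hva->gfn lookup: given an hva range and a memslot,
       * decide whether they overlap and, if so, compute the gfn range to
       * walk.  If no slot overlaps, mmu_lock need not be taken at all. */
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define PAGE_SHIFT 12

      struct memslot {
          uint64_t userspace_addr;   /* hva where the slot starts */
          uint64_t npages;
          uint64_t base_gfn;
      };

      /* Returns true and fills [*gfn_start, *gfn_end) iff the hva range
       * [start, end) overlaps the slot. */
      static bool slot_hva_to_gfn(const struct memslot *s, uint64_t start,
                                  uint64_t end, uint64_t *gfn_start,
                                  uint64_t *gfn_end)
      {
          uint64_t hva_end = s->userspace_addr + (s->npages << PAGE_SHIFT);

          if (end <= s->userspace_addr || start >= hva_end)
              return false;

          uint64_t lo = start > s->userspace_addr ? start : s->userspace_addr;
          uint64_t hi = end < hva_end ? end : hva_end;

          *gfn_start = s->base_gfn + ((lo - s->userspace_addr) >> PAGE_SHIFT);
          *gfn_end   = s->base_gfn +
                       (((hi - s->userspace_addr) + ((1 << PAGE_SHIFT) - 1)) >> PAGE_SHIFT);
          return true;
      }

      int main(void)
      {
          struct memslot slot = {
              .userspace_addr = 0x100000, .npages = 16, .base_gfn = 0x40,
          };
          uint64_t gs, ge;

          /* Overlap: take mmu_lock and walk gfns [0x42, 0x44). */
          assert(slot_hva_to_gfn(&slot, 0x102000, 0x104000, &gs, &ge));
          assert(gs == 0x42 && ge == 0x44);

          /* No overlap: the notifier can skip this slot (and the lock). */
          assert(!slot_hva_to_gfn(&slot, 0x200000, 0x201000, &gs, &ge));
          return 0;
      }
      ```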
    • KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU · 8f8f52a4
      Committed by Sean Christopherson
      Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs,
      as opposed to the fancy ffs()+clear_bit() logic that was copied from the
      legacy MMU.  The legacy MMU uses clear_bit() because it is operating on
      the SPTE itself, i.e. clearing needs to be atomic.  The TDP MMU operates
      on a local variable that it later writes to the SPTE, and so doesn't need
      to be atomic or even resident in memory.
      
      Opportunistically drop unnecessary initialization of new_spte; it's
      guaranteed to be written before being accessed.
      
      Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from:
      
         0x0000000000058be6 <+134>:	test   %rax,%rax
         0x0000000000058be9 <+137>:	je     0x58bf4 <age_gfn_range+148>
         0x0000000000058beb <+139>:	test   %rax,%rdi
         0x0000000000058bee <+142>:	je     0x58cdc <age_gfn_range+380>
         0x0000000000058bf4 <+148>:	mov    %rdi,0x8(%rsp)
         0x0000000000058bf9 <+153>:	mov    $0xffffffff,%edx
         0x0000000000058bfe <+158>:	bsf    %eax,%edx
         0x0000000000058c01 <+161>:	movslq %edx,%rdx
         0x0000000000058c04 <+164>:	lock btr %rdx,0x8(%rsp)
         0x0000000000058c0b <+171>:	mov    0x8(%rsp),%r15
      
      to:
      
         0x0000000000058bdd <+125>:	test   %rax,%rax
         0x0000000000058be0 <+128>:	je     0x58beb <age_gfn_range+139>
         0x0000000000058be2 <+130>:	test   %rax,%r8
         0x0000000000058be5 <+133>:	je     0x58cc0 <age_gfn_range+352>
         0x0000000000058beb <+139>:	not    %rax
         0x0000000000058bee <+142>:	and    %r8,%rax
         0x0000000000058bf1 <+145>:	mov    %rax,%r15
      
      thus eliminating several memory accesses, including a locked access.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
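      The two instruction sequences above boil down to a one-line difference on a local copy of the SPTE. In this sketch the mask value is a placeholder (bit 5, the Accessed bit of legacy x86 PTEs), not necessarily the mask KVM computes:

      ```c
      /* The NOT+AND the commit describes: clearing the Accessed bit in a
       * local variable needs no ffs(), no lock btr, and no memory access,
       * because the result is only written back to the SPTE afterwards. */
      #include <assert.h>
      #include <stdint.h>

      #define SHADOW_ACCESSED_MASK (1ull << 5)   /* placeholder bit position */

      static uint64_t age_spte(uint64_t old_spte)
      {
          /* Non-atomic on purpose: operates on a local copy. */
          return old_spte & ~SHADOW_ACCESSED_MASK;
      }

      int main(void)
      {
          assert(age_spte(0x27) == 0x7);   /* bit 5 (0x20) cleared */
          assert(age_spte(0x7) == 0x7);    /* already clear: unchanged */
          return 0;
      }
      ```

      The atomic clear_bit() in the legacy MMU is only needed because it modifies the live SPTE in place, where a racing hardware walker could set bits concurrently.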
    • KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE · 6d9aafb9
      Committed by Sean Christopherson
      Don't clear the dirty bit when aging a TDP MMU SPTE (in response to an MMU
      notifier event).  Prematurely clearing the dirty bit could cause spurious
      PML updates if aging a page happened to coincide with dirty logging.
      
      Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(),
      so the host PFN will be marked dirty, i.e. there is no potential for data
      corruption.
      
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint · 6dfbd6b5
      Committed by Sean Christopherson
      Remove x86's trace_kvm_age_page() tracepoint.  It's mostly redundant with
      the common trace_kvm_age_hva() tracepoint, and if there is a need for the
      extra details, e.g. gfn, referenced, etc... those details should be added
      to the common tracepoint so that all architectures and MMUs benefit from
      the info.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-19-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE · aaaac889
      Committed by Sean Christopherson
      Use the leaf-only TDP iterator when changing the SPTE in reaction to an
      MMU notifier.  Practically speaking, this is a nop since the guts of the
      loop explicitly looks for 4k SPTEs, which are always leaf SPTEs.  Switch
      the iterator to match age_gfn_range() and test_age_gfn() so that a future
      patch can consolidate the core iterating logic.
      
      No real functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Pass address space ID to TDP MMU root walkers · a3f15bda
      Committed by Sean Christopherson
      Move the address space ID check that is performed when iterating over
      roots into the macro helpers to consolidate code.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() · 2b9663d8
      Committed by Sean Christopherson
      Pass the address space ID to TDP MMU's primary "zap gfn range" helper to
      allow the MMU notifier paths to iterate over memslots exactly once.
      Currently, both the legacy MMU and TDP MMU iterate over memslots when
      looking for an overlapping hva range, which can be quite costly if there
      are a large number of memslots.
      
      Add a "flush" parameter so that iterating over multiple address spaces
      in the caller will continue to do the right thing when yielding while a
      flush is pending from a previous address space.
      
      Note, this also has a functional change in the form of coalescing TLB
      flushes across multiple address spaces in kvm_zap_gfn_range(), and also
      optimizes the TDP MMU to utilize range-based flushing when running as L1
      with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-6-seanjc@google.com>
      [Keep separate for loops to prepare for other incoming patches. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
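      The caller-side shape this commit describes, an as_id parameter plus a carried "flush" flag so that iterating over multiple address spaces coalesces into one TLB flush, can be sketched as below. All names and the two-address-space constant are illustrative, not KVM's actual signatures:

      ```c
      /* Illustrative "zap gfn range per address space" shape: the flush
       * state is threaded through each call so a flush owed by a previous
       * address space is never lost, and only one remote flush happens. */
      #include <assert.h>
      #include <stdbool.h>

      #define ADDRESS_SPACE_NUM 2

      static int remote_flushes;

      static bool zap_gfn_range_as(int as_id, bool flush)
      {
          /* Walk this address space's roots; zapping any SPTE forces a
           * flush, and a flush inherited from the caller is preserved. */
          bool zapped = true;   /* pretend SPTEs were zapped */
          (void)as_id;
          return flush || zapped;
      }

      static void zap_gfn_range_all(void)
      {
          bool flush = false;

          for (int as_id = 0; as_id < ADDRESS_SPACE_NUM; as_id++)
              flush = zap_gfn_range_as(as_id, flush);

          if (flush)
              remote_flushes++;   /* single coalesced TLB flush */
      }

      int main(void)
      {
          zap_gfn_range_all();
          assert(remote_flushes == 1);   /* one flush for both spaces */
          return 0;
      }
      ```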
    • KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs · 142ccde1
      Committed by Sean Christopherson
      Gather pending TLB flushes across both the legacy and TDP MMUs when
      zapping collapsible SPTEs to avoid multiple flushes if both the legacy
      MMU (for nested guests) and TDP MMU have mappings for the memslot.
      
      Note, this also optimizes the TDP MMU to flush only the relevant range
      when running as L1 with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs · af95b53e
      Committed by Sean Christopherson
      When zapping collapsible SPTEs across multiple roots, gather pending
      flushes and perform a single remote TLB flush at the end, as opposed to
      flushing after processing every root.
      
      Note, flush may be cleared by the result of zap_collapsible_spte_range().
      This is intended and correct, e.g. yielding may have serviced a prior
      pending flush.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 31 Mar 2021, 2 commits
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Committed by Sean Christopherson
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap · a835429c
      Committed by Sean Christopherson
      When flushing a range of GFNs across multiple roots, ensure any pending
      flush from a previous root is honored before yielding while walking the
      tables of the current root.
      
      Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
      "flush" with the result to avoid redundant flushes.  zap_gfn_range()
      preserves and returns the incoming "flush", unless of course the flush was
      performed prior to yielding and no new flush was triggered.
      
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Cc: stable@vger.kernel.org
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
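      The rule this fix establishes, service any pending flush before dropping the lock at a resched point, can be sketched as below. The helper names and the boolean plumbing are illustrative, not the kernel's actual functions:

      ```c
      /* Illustrative flush-before-yield: a flush owed by a previous root
       * must be performed before mmu_lock is released, otherwise another
       * thread could run with stale TLB entries. */
      #include <assert.h>
      #include <stdbool.h>

      static int flushes;

      /* Called at a resched point inside the walk. */
      static bool cond_resched_flush(bool flush)
      {
          if (flush) {
              flushes++;        /* flush stale TLB entries before yielding */
              flush = false;    /* the pending flush has been serviced */
          }
          /* cond_resched() / drop and re-take mmu_lock would go here */
          return flush;
      }

      static bool zap_root(bool flush, bool zapped_something)
      {
          flush = cond_resched_flush(flush);   /* honor inherited flush */
          if (zapped_something)
              flush = true;
          return flush;
      }

      int main(void)
      {
          /* Root 1 zaps SPTEs; the walker then yields inside root 2: the
           * flush pending from root 1 happens at the yield point, and no
           * redundant flush is left over at the end. */
          bool flush = zap_root(false, true);   /* root 1: flush now pending */
          flush = zap_root(flush, false);       /* root 2: yields, flushes */
          assert(flushes == 1);
          assert(!flush);
          return 0;
      }
      ```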
  8. 18 Mar 2021, 1 commit
    • x86: Fix various typos in comments · d9f6e12f
      Committed by Ingo Molnar
      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
  9. 17 Mar 2021, 4 commits