1. 17 April 2021 (14 commits)
    • KVM: Move x86's MMU notifier memslot walkers to generic code · 3039bcc7
      Committed by Sean Christopherson
      Move the hva->gfn lookup for MMU notifiers into common code.  Every arch
      does a similar lookup, and some arch code is all but identical across
      multiple architectures.
      
      In addition to consolidating code, this will allow introducing
      optimizations that will benefit all architectures without incurring
      multiple walks of the memslots, e.g. by taking mmu_lock if and only if a
      relevant range exists in the memslots.
      
      The use of __always_inline to avoid indirect call retpolines, as done by
      x86, may also benefit other architectures.
      
      Consolidating the lookups also fixes a wart in x86, where the legacy MMU
      and TDP MMU each do their own memslot walks.
      
      Lastly, future enhancements to the memslot implementation, e.g. adding an
      interval tree to track host addresses, will need to touch far less
      arch-specific code.
      
      MIPS, PPC, and arm64 will be converted one at a time in future patches.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3039bcc7
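      A minimal sketch of the consolidated hva-range walk described above, assuming
      the common-code helper is shaped roughly like the x86 walker it replaces; the
      name handle_hva_range and the handler signature are illustrative, not the
      exact upstream API:

         /* Walk every memslot overlapping [start, end) and invoke a per-range
          * arch handler with the corresponding gfn range. */
         static bool handle_hva_range(struct kvm *kvm, unsigned long start,
                                      unsigned long end,
                                      bool (*handler)(struct kvm *kvm,
                                                      gfn_t gfn_start, gfn_t gfn_end))
         {
                 struct kvm_memslots *slots;
                 struct kvm_memory_slot *slot;
                 bool ret = false;
                 int i;

                 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
                         slots = __kvm_memslots(kvm, i);
                         kvm_for_each_memslot(slot, slots) {
                                 unsigned long hva_start, hva_end;
                                 gfn_t gfn_start, gfn_end;

                                 hva_start = max(start, slot->userspace_addr);
                                 hva_end = min(end, slot->userspace_addr +
                                                    (slot->npages << PAGE_SHIFT));
                                 if (hva_start >= hva_end)
                                         continue;

                                 /* Each memslot maps a contiguous hva range, so the
                                  * hva->gfn translation is a simple offset. */
                                 gfn_start = hva_to_gfn_memslot(hva_start, slot);
                                 gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);

                                 ret |= handler(kvm, gfn_start, gfn_end);
                         }
                 }
                 return ret;
         }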
    • KVM: constify kvm_arch_flush_remote_tlbs_memslot · 6c9dd6d2
      Committed by Paolo Bonzini
      Memslots are RCU-protected and there should be no need to
      change them.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c9dd6d2
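      In practice the change amounts to constifying the memslot parameter; a sketch
      of the prototype change (the exact declaration site may differ):

         /* Before */
         void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                 struct kvm_memory_slot *memslot);

         /* After: memslots are RCU-protected and the flush path never modifies
          * them, so it can take a const pointer. */
         void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
                                                 const struct kvm_memory_slot *memslot);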
    • KVM: MMU: protect TDP MMU pages only down to required level · dbb6964e
      Committed by Paolo Bonzini
      When using manual protection of dirty pages, it is not necessary
      to protect nested page tables down to the 4K level; instead KVM
      can protect only hugepages in order to split them lazily, and
      delay write protection at 4K-granularity until KVM_CLEAR_DIRTY_LOG.
      This was overlooked in the TDP MMU, so do it there as well.
      
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbb6964e
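      A hedged sketch of the idea (the call site and helper names are illustrative,
      not the verbatim patch): when userspace enables manual dirty-log protection
      with KVM_DIRTY_LOG_INITIALLY_SET, write-protect only down to 2M hugepages and
      leave 4K write protection to KVM_CLEAR_DIRTY_LOG.

         bool init_set = READ_ONCE(kvm->manual_dirty_log_protect) &
                         KVM_DIRTY_LOG_INITIALLY_SET;
         int min_level = init_set ? PG_LEVEL_2M : PG_LEVEL_4K;

         /* Both the legacy and TDP MMU walkers stop at min_level; the TDP MMU
          * previously ignored it and always went down to 4K. */
         if (is_tdp_mmu_enabled(kvm))
                 flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, min_level);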
    • KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU · 8f8f52a4
      Committed by Sean Christopherson
      Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs,
      as opposed to the fancy ffs()+clear_bit() logic that was copied from the
      legacy MMU.  The legacy MMU uses clear_bit() because it is operating on
      the SPTE itself, i.e. clearing needs to be atomic.  The TDP MMU operates
      on a local variable that it later writes to the SPTE, and so doesn't need
      to be atomic or even resident in memory.
      
      Opportunistically drop the unnecessary initialization of new_spte; it's
      guaranteed to be written before being accessed.
      
      Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from:
      
         0x0000000000058be6 <+134>:	test   %rax,%rax
         0x0000000000058be9 <+137>:	je     0x58bf4 <age_gfn_range+148>
         0x0000000000058beb <+139>:	test   %rax,%rdi
         0x0000000000058bee <+142>:	je     0x58cdc <age_gfn_range+380>
         0x0000000000058bf4 <+148>:	mov    %rdi,0x8(%rsp)
         0x0000000000058bf9 <+153>:	mov    $0xffffffff,%edx
         0x0000000000058bfe <+158>:	bsf    %eax,%edx
         0x0000000000058c01 <+161>:	movslq %edx,%rdx
         0x0000000000058c04 <+164>:	lock btr %rdx,0x8(%rsp)
         0x0000000000058c0b <+171>:	mov    0x8(%rsp),%r15
      
      to:
      
         0x0000000000058bdd <+125>:	test   %rax,%rax
         0x0000000000058be0 <+128>:	je     0x58beb <age_gfn_range+139>
         0x0000000000058be2 <+130>:	test   %rax,%r8
         0x0000000000058be5 <+133>:	je     0x58cc0 <age_gfn_range+352>
         0x0000000000058beb <+139>:	not    %rax
         0x0000000000058bee <+142>:	and    %r8,%rax
         0x0000000000058bf1 <+145>:	mov    %rax,%r15
      
      thus eliminating several memory accesses, including a locked access.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8f8f52a4
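      A sketch of the before/after forms the commit describes (names follow the TDP
      MMU iterator; the surrounding loop and the EPT no-A/D case are omitted):

         /* Before, copied from the legacy MMU, which must clear the bit atomically
          * because it operates on the SPTE in place: */
         new_spte = iter.old_spte;
         clear_bit((ffs(shadow_accessed_mask) - 1), (unsigned long *)&new_spte);

         /* After: new_spte is a local copy, so a plain NOT+AND is sufficient. */
         new_spte = iter.old_spte & ~shadow_accessed_mask;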
    • KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE · 6d9aafb9
      Committed by Sean Christopherson
      Don't clear the dirty bit when aging a TDP MMU SPTE (in response to an MMU
      notifier event).  Prematurely clearing the dirty bit could cause spurious
      PML updates if aging a page happened to coincide with dirty logging.
      
      Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(),
      so the host PFN will be marked dirty, i.e. there is no potential for data
      corruption.
      
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210331004942.2444916-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6d9aafb9
    • KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint · 6dfbd6b5
      Committed by Sean Christopherson
      Remove x86's trace_kvm_age_page() tracepoint.  It's mostly redundant with
      the common trace_kvm_age_hva() tracepoint, and if there is a need for the
      extra details, e.g. gfn, referenced, etc... those details should be added
      to the common tracepoint so that all architectures and MMUs benefit from
      the info.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-19-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6dfbd6b5
    • KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE · aaaac889
      Committed by Sean Christopherson
      Use the leaf-only TDP iterator when changing the SPTE in reaction to an
      MMU notifier.  Practically speaking, this is a nop since the guts of the
      loop explicitly looks for 4k SPTEs, which are always leaf SPTEs.  Switch
      the iterator to match age_gfn_range() and test_age_gfn() so that a future
      patch can consolidate the core iterating logic.
      
      No real functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      aaaac889
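      A sketch of the switch (macro names are the TDP MMU's; the loop body is
      elided):

         /* Before: full iterator plus an explicit level check. */
         tdp_root_for_each_pte(iter, root, gfn, gfn + 1) {
                 if (iter.level != PG_LEVEL_4K)
                         continue;
                 /* ... modify the SPTE ... */
         }

         /* After: leaf-only iterator, matching age_gfn_range() and test_age_gfn(). */
         tdp_root_for_each_leaf_pte(iter, root, gfn, gfn + 1) {
                 /* ... modify the SPTE ... */
         }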
    • KVM: x86/mmu: Pass address space ID to TDP MMU root walkers · a3f15bda
      Committed by Sean Christopherson
      Move the address space ID check that is performed when iterating over
      roots into the macro helpers to consolidate code.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a3f15bda
    • KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() · 2b9663d8
      Committed by Sean Christopherson
      Pass the address space ID to TDP MMU's primary "zap gfn range" helper to
      allow the MMU notifier paths to iterate over memslots exactly once.
      Currently, both the legacy MMU and TDP MMU iterate over memslots when
      looking for an overlapping hva range, which can be quite costly if there
      are a large number of memslots.
      
      Add a "flush" parameter so that iterating over multiple address spaces
      in the caller will continue to do the right thing when yielding while a
      flush is pending from a previous address space.
      
      Note, this also has a functional change in the form of coalescing TLB
      flushes across multiple address spaces in kvm_zap_gfn_range(), and also
      optimizes the TDP MMU to utilize range-based flushing when running as L1
      with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-6-seanjc@google.com>
      [Keep separate for loops to prepare for other incoming patches. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2b9663d8
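      A sketch of the caller pattern this enables (simplified; the exact signatures
      shifted over the course of this series, so treat the parameters as
      illustrative):

         bool flush = false;
         int i;

         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
                 flush = __kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end,
                                                     true /* can_yield */, flush);

         /* A single remote TLB flush covers everything zapped in either
          * address space. */
         if (flush)
                 kvm_flush_remote_tlbs(kvm);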
    • KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zap · 1a61b7db
      Committed by Sean Christopherson
      Gather pending TLB flushes across both address spaces when zapping a
      given gfn range.  This requires feeding "flush" back into subsequent
      calls, but on the plus side sets the stage for further batching
      between the legacy MMU and TDP MMU.  It also allows refactoring the
      address space iteration to cover the legacy and TDP MMUs without
      introducing truly ugly code.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1a61b7db
    • KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs · 142ccde1
      Committed by Sean Christopherson
      Gather pending TLB flushes across both the legacy and TDP MMUs when
      zapping collapsible SPTEs to avoid multiple flushes if both the legacy
      MMU (for nested guests) and TDP MMU have mappings for the memslot.
      
      Note, this also optimizes the TDP MMU to flush only the relevant range
      when running as L1 with Hyper-V enlightenments.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      142ccde1
    • KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU · 302695a5
      Committed by Sean Christopherson
      Place the onus on the caller of slot_handle_*() to flush the TLB, rather
      than handling the flush in the helper, and rename parameters accordingly.
      This will allow future patches to coalesce flushes between address spaces
      and between the legacy and TDP MMUs.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      302695a5
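      A sketch of the new caller-side contract (helper names are the x86 MMU's; the
      body is illustrative): slot_handle_*() now reports whether a flush is needed
      instead of flushing on its own.

         bool flush;

         write_lock(&kvm->mmu_lock);
         flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
                                   start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
         write_unlock(&kvm->mmu_lock);

         /* The caller decides when, and over what range, to flush, which is what
          * lets later patches batch flushes across address spaces and MMUs. */
         if (flush)
                 kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);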
    • KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs · af95b53e
      Committed by Sean Christopherson
      When zapping collapsible SPTEs across multiple roots, gather pending
      flushes and perform a single remote TLB flush at the end, as opposed to
      flushing after processing every root.
      
      Note, flush may be cleared by the result of zap_collapsible_spte_range().
      This is intended and correct, e.g. yielding may have serviced a prior
      pending flush.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      af95b53e
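      A short sketch of the per-root accumulation (parameters are illustrative):
      flush is deliberately overwritten with each call's return value, since a yield
      inside the helper may already have serviced the pending flush.

         bool flush = false;

         for_each_tdp_mmu_root_yield_safe(kvm, root)
                 flush = zap_collapsible_spte_range(kvm, root, slot, flush);

         if (flush)
                 kvm_flush_remote_tlbs(kvm);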
    • KVM: MMU: load PDPTRs outside mmu_lock · 4a38162e
      Committed by Paolo Bonzini
      On SVM, reading PDPTRs might access guest memory, which might fault
      and thus might sleep.  On the other hand, it is not possible to
      release the lock after make_mmu_pages_available has been called.
      
      Therefore, push the call to make_mmu_pages_available and the
      mmu_lock critical section down into mmu_alloc_direct_roots and
      mmu_alloc_shadow_roots.
      Reported-by: Wanpeng Li <wanpengli@tencent.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a38162e
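      A sketch of the resulting ordering inside mmu_alloc_shadow_roots() (simplified;
      the pdptrs[] caching and error handling are condensed): read the PDPTRs, which
      may fault on guest memory and sleep, before taking mmu_lock.

         u64 pdptrs[4];
         int i, r;

         if (mmu->root_level == PT32E_ROOT_LEVEL) {
                 for (i = 0; i < 4; ++i) {
                         pdptrs[i] = mmu->get_pdptr(vcpu, i);    /* may sleep */
                         if (!(pdptrs[i] & PT_PRESENT_MASK))
                                 pdptrs[i] = 0;
                 }
         }

         write_lock(&vcpu->kvm->mmu_lock);
         r = make_mmu_pages_available(vcpu);
         if (r < 0)
                 goto out_unlock;
         /* ... allocate the roots using the cached pdptrs ... */
 out_unlock:
         write_unlock(&vcpu->kvm->mmu_lock);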
  2. 31 March 2021 (3 commits)
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Committed by Sean Christopherson
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      33a31641
    • KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980
      Committed by Sean Christopherson
      Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
      does the flush itself if and only if it yields (which it will never do in
      this particular scenario), and otherwise expects the caller to do the
      flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
      no flush will occur.
      
      Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-3-seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      048f4980
    • KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap · a835429c
      Committed by Sean Christopherson
      When flushing a range of GFNs across multiple roots, ensure any pending
      flush from a previous root is honored before yielding while walking the
      tables of the current root.
      
      Note, kvm_tdp_mmu_zap_gfn_range() now intentionally overwrites its local
      "flush" with the result to avoid redundant flushes.  zap_gfn_range()
      preserves and returns the incoming "flush", unless of course the flush was
      performed prior to yielding and no new flush was triggered.
      
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Cc: stable@vger.kernel.org
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a835429c
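      A sketch of the fix inside zap_gfn_range() (simplified): feed the pending
      "flush" into the cond_resched helper so stale TLB entries are flushed before
      mmu_lock is dropped, and clear it once it has been serviced.

         tdp_root_for_each_pte(iter, root, start, end) {
                 if (can_yield &&
                     tdp_mmu_iter_cond_resched(kvm, &iter, flush)) {
                         flush = false;
                         continue;
                 }
                 /* ... zap the SPTE and set flush = true if anything changed ... */
         }
         return flush;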
  3. 17 March 2021 (4 commits)
  4. 15 March 2021 (19 commits)