  1. 17 Apr 2021 (1 commit)
  2. 31 Mar 2021 (2 commits)
    • KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages · 33a31641
      Authored by Sean Christopherson
      Prevent the TDP MMU from yielding when zapping a gfn range during NX
      page recovery.  If a flush is pending from a previous invocation of the
      zapping helper, either in the TDP MMU or the legacy MMU, but the TDP MMU
      has not accumulated a flush for the current invocation, then yielding
      will release mmu_lock with stale TLB entries.
      
      That being said, this isn't technically a bug fix in the current code, as
      the TDP MMU will never yield in this case.  tdp_mmu_iter_cond_resched()
      will yield if and only if it has made forward progress, as defined by the
      current gfn vs. the last yielded (or starting) gfn.  Because zapping a
      single shadow page is guaranteed to (a) find that page and (b) step
      sideways at the level of the shadow page, the TDP iter will break its loop
      before getting a chance to yield.
      
      But that is all very, very subtle, and will break at the slightest sneeze,
      e.g. zapping while holding mmu_lock for read would break as the TDP MMU
      wouldn't be guaranteed to see the present shadow page, and thus could step
      sideways at a lower level.
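
      A minimal sketch of the hazard, assuming toy names (zap_ctx, try_yield)
      that are not the kernel's identifiers; it only models why a still-pending
      flush must forbid yielding:

          /* Toy model, not kernel code: yield decision during NX recovery. */
          #include <stdbool.h>
          #include <stdio.h>

          struct zap_ctx {
                  bool flush_pending;   /* TLB flush owed from a previous zap call */
                  bool can_yield;       /* NX recovery effectively runs with false */
          };

          /* Yielding is safe only if no flush is still owed; otherwise dropping
           * mmu_lock would leave stale TLB entries visible to other vCPUs. */
          static bool try_yield(const struct zap_ctx *ctx)
          {
                  return ctx->can_yield && !ctx->flush_pending;
          }

          int main(void)
          {
                  struct zap_ctx nx = { .flush_pending = true, .can_yield = false };
                  printf("yield allowed: %d\n", try_yield(&nx));   /* prints 0 */
                  return 0;
          }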
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-4-seanjc@google.com>
      [Add lockdep assertion. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping · 048f4980
      Authored by Sean Christopherson
      Honor the "flush needed" return from kvm_tdp_mmu_zap_gfn_range(), which
      does the flush itself if and only if it yields (which it will never do in
      this particular scenario), and otherwise expects the caller to do the
      flush.  If pages are zapped from the TDP MMU but not the legacy MMU, then
      no flush will occur.
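
      The caller-side pattern being fixed, as a hedged sketch: the helpers below
      (legacy_mmu_zap, tdp_mmu_zap_range, flush_remote_tlbs, recover_nx_pages)
      are illustrative stand-ins, not the real KVM symbols.

          #include <stdbool.h>

          /* Stand-ins for the real helpers; the return value means "flush needed". */
          static bool legacy_mmu_zap(void)    { return false; }
          static bool tdp_mmu_zap_range(void) { return true;  }
          static void flush_remote_tlbs(void) { /* remote TLB shootdown here */ }

          static void recover_nx_pages(void)
          {
                  bool flush = legacy_mmu_zap();

                  /* The fix: accumulate the TDP MMU's "flush needed" result too,
                   * instead of dropping it on the floor. */
                  flush |= tdp_mmu_zap_range();

                  if (flush)
                          flush_remote_tlbs();
          }

          int main(void) { recover_nx_pages(); return 0; }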
      
      Fixes: 29cf0f50 ("kvm: x86/mmu: NX largepage recovery for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210325200119.1359384-3-seanjc@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 15 Mar 2021 (25 commits)
  4. 23 Feb 2021 (2 commits)
    • KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848
      Authored by David Stevens
      Track the range being invalidated by mmu_notifier and skip page fault
      retries if the fault address is not affected by the in-progress
      invalidation. Handle concurrent invalidations by finding the minimal
      range which includes all ranges being invalidated. Although the combined
      range may include unrelated addresses and cannot be shrunk as individual
      invalidation operations complete, it is unlikely the marginal gains of
      proper range tracking are worth the additional complexity.
      
      The primary benefit of this change is the reduction in the likelihood of
      extreme latency when handling a page fault due to another thread having
      been preempted while modifying host virtual addresses.
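
      A hedged sketch of the range tracking (struct, field, and function names
      below are invented for illustration): concurrent invalidations are folded
      into one covering range, and a fault retries only if its HVA lies inside
      that range while an invalidation is still in progress.

          #include <stdbool.h>
          #include <stdint.h>

          /* Toy state; not the actual struct kvm fields. */
          struct mmu_inv_state {
                  int      count;        /* invalidations currently in progress */
                  uint64_t start, end;   /* minimal range covering all of them  */
          };

          static void invalidate_range_start(struct mmu_inv_state *s,
                                             uint64_t start, uint64_t end)
          {
                  if (s->count++ == 0) {
                          s->start = start;
                          s->end   = end;
                  } else {
                          /* Grow to cover every in-progress invalidation; the
                           * union is never shrunk until the count drops to zero. */
                          if (start < s->start) s->start = start;
                          if (end   > s->end)   s->end   = end;
                  }
          }

          /* Retry the page fault only when its HVA is actually affected. */
          static bool fault_needs_retry(const struct mmu_inv_state *s, uint64_t hva)
          {
                  return s->count > 0 && hva >= s->start && hva < s->end;
          }

          int main(void)
          {
                  struct mmu_inv_state s = { 0 };
                  invalidate_range_start(&s, 0x1000, 0x2000);
                  return fault_needs_retry(&s, 0x3000);   /* 0: fault can proceed */
          }
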
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210222024522.1751719-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault · 5f8a7cf2
      Authored by Sean Christopherson
      Don't retry a page fault due to an mmu_notifier invalidation when
      handling a page fault for a GPA that did not resolve to a memslot, i.e.
      an MMIO page fault.  Invalidations from the mmu_notifier signal a change
      in a host virtual address (HVA) mapping; without a memslot, there is no
      HVA and thus no possibility that the invalidation is relevant to the
      page fault being handled.
      
      Note, the MMIO vs. memslot generation checks handle the case where a
      pending memslot will create a memslot overlapping the faulting GPA.  The
      mmu_notifier checks are orthogonal to memslot updates.
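
      A toy illustration of the ordering (should_retry_fault and the stubbed
      retry helper are invented names): the mmu_notifier retry check is only
      consulted when the GPA resolved to a memslot, because an MMIO fault has
      no HVA for an invalidation to affect.

          #include <stdbool.h>
          #include <stddef.h>
          #include <stdint.h>

          struct memslot;   /* opaque in this sketch */

          /* Stand-in for the real mmu_notifier retry check. */
          static bool mmu_notifier_retry_hva(uint64_t hva) { (void)hva; return true; }

          static bool should_retry_fault(const struct memslot *slot, uint64_t hva)
          {
                  /* No memslot (MMIO fault): no HVA, so no invalidation can be
                   * relevant; skip the retry check entirely. */
                  if (!slot)
                          return false;
                  return mmu_notifier_retry_hva(hva);
          }

          int main(void) { return should_retry_fault(NULL, 0); }
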
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210222024522.1751719-2-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 19 Feb 2021 (8 commits)
  6. 09 Feb 2021 (2 commits)
    • KVM: x86/mmu: Make HVA handler retpoline-friendly · 8f5c44f9
      Authored by Maciej S. Szmigiero
      When retpolines are enabled they have high overhead in the inner loop
      inside kvm_handle_hva_range() that iterates over the provided memory area.
      
      Let's mark this function and its TDP MMU equivalent __always_inline so
      the compiler will be able to change the indirect call to the actual
      handler function inside each of them into a direct one.
      
      This significantly improves performance on the unmap test on the existing
      kernel memslot code (tested on a Xeon 8167M machine):
      30 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0353s  0.0334s   5%
      Unmap 2M   0.00104s 0.000407s 61%
      
      509 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0742s  0.0740s   None
      Unmap 2M   0.00221s 0.00159s  28%
      
      It looks like having an indirect call in these functions (and, therefore,
      a retpoline) might have interfered with the CPU's unrolling of the whole
      loop.
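
      A hedged sketch of the technique (the real functions live in
      arch/x86/kvm/mmu; handle_hva_range, hva_handler_t, and unmap_one below
      are illustrative): force-inlining the range iterator into each caller
      makes the handler a compile-time constant there, so the indirect,
      retpolined call in the hot loop becomes a direct one.

          typedef int (*hva_handler_t)(unsigned long hva, void *data);

          /* Once inlined into a caller, 'handler' is known at compile time and
           * the call below can be emitted as a direct call (no retpoline). */
          static __attribute__((always_inline)) inline
          int handle_hva_range(unsigned long start, unsigned long end,
                               hva_handler_t handler, void *data)
          {
                  int ret = 0;

                  for (unsigned long hva = start; hva < end; hva += 4096)
                          ret |= handler(hva, data);
                  return ret;
          }

          static int unmap_one(unsigned long hva, void *data)
          {
                  (void)hva; (void)data;
                  return 0;
          }

          int unmap_hva_range_example(unsigned long start, unsigned long end)
          {
                  return handle_hva_range(start, end, unmap_one, 0);
          }
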
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: compile out TDP MMU on 32-bit systems · 897218ff
      Authored by Paolo Bonzini
      The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs.
      Rather than just disabling it on 32-bit systems, compile it out
      completely, so that the code is free to use, for example, a 64-bit xchg.
      
      To limit the number of stubs, wrap all accesses to tdp_mmu_enabled
      or tdp_mmu_page with a function.  Calls to all other functions in
      tdp_mmu.c are eliminated and do not even reach the linker.
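
      A hedged sketch of the compile-out pattern (BUILD_64BIT, is_tdp_mmu_enabled,
      tdp_mmu_zap_all, and zap_all_roots are illustrative names, not the kernel's):
      one wrapper constant-folds to false on 32-bit builds, so every call guarded
      by it is dead-code-eliminated and never needs a stub at link time.

          #include <stdbool.h>

          #ifdef BUILD_64BIT                       /* stand-in for CONFIG_X86_64 */
          static bool tdp_mmu_enabled = true;
          static inline bool is_tdp_mmu_enabled(void) { return tdp_mmu_enabled; }
          static bool tdp_mmu_zap_all(void) { return true; }  /* real work elsewhere */
          #else
          /* The wrapper folds to false, so guarded calls are eliminated by the
           * compiler and tdp_mmu_zap_all() needs no 32-bit stub at all.  This
           * relies on building with optimization, as the kernel always does. */
          static inline bool is_tdp_mmu_enabled(void) { return false; }
          bool tdp_mmu_zap_all(void);              /* declaration only */
          #endif

          static bool zap_all_roots(void)
          {
                  bool flush = false;

                  if (is_tdp_mmu_enabled())        /* constant false on 32-bit */
                          flush |= tdp_mmu_zap_all();
                  return flush;
          }

          int main(void) { return zap_all_roots() ? 1 : 0; }
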
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>