1. 13 Mar 2021, 1 commit
    • KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode · 8df9f1af
      Committed by Sean Christopherson
      If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to
      REMOVED_SPTE when recursively zapping SPTEs as part of shadow page
      removal.  The concurrent write protections provided by REMOVED_SPTE are
      not needed, there are no backing page side effects to record, and MMIO
      SPTEs can be left as is since they are protected by the memslot
      generation, not by ensuring that the MMIO SPTE is unreachable (which
      is racy with respect to lockless walks regardless of zapping behavior).
      
      Skipping !PRESENT drastically reduces the number of updates needed to
      tear down sparsely populated MMUs, e.g. when tearing down a 6gb VM that
      didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could
      be skipped.
      
      Avoiding the write itself is likely close to a wash, but avoiding
      __handle_changed_spte() is a clear-cut win as that involves saving and
      restoring all non-volatile GPRs (it's a subtly big function), as well as
      several conditional branches before bailing out.
      
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210310003029.1250571-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
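      A minimal standalone sketch of the skip described above (simplified types and a
      made-up "MMU-present" bit, not the kernel's real SPTE layout or helpers): with the
      lock held for write, entries that are not MMU-present are passed over instead of
      being written and run through the change handler.

      #include <stdint.h>
      #include <stdbool.h>

      #define SPTES_PER_PAGE   512
      #define SPTE_MMU_PRESENT (1ULL << 11)   /* placeholder "MMU-present" bit */

      static bool spte_present(uint64_t spte)
      {
              return spte & SPTE_MMU_PRESENT;
      }

      /* Tear down one shadow page's SPTE array with mmu_lock held for write. */
      static void zap_sp_exclusive(uint64_t sptes[SPTES_PER_PAGE])
      {
              for (int i = 0; i < SPTES_PER_PAGE; i++) {
                      if (!spte_present(sptes[i]))
                              continue;       /* nothing to record: skip the write */
                      sptes[i] = 0;           /* real code also processes the old value */
              }
      }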
  2. 19 Feb 2021, 3 commits
    • KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML · b6e16ae5
      Committed by Sean Christopherson
      Stop setting dirty bits for MMU pages when dirty logging is disabled for
      a memslot, as PML is now completely disabled when there are no memslots
      with dirty logging enabled.
      
      This means that spurious PML entries will be created for memslots with
      dirty logging disabled if at least one other memslot has dirty logging
      enabled.  However, spurious PML entries are already possible since
      dirty bits are set only when dirty logging is turned off, i.e. memslots
      that are never dirty logged will have dirty bits cleared.
      
      In the end, it's faster overall to eat a few spurious PML entries in the
      window where dirty logging is being disabled across all memslots.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-13-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs · 9eba50f8
      Committed by Sean Christopherson
      When zapping SPTEs in order to rebuild them as huge pages, use the new
      helper that computes the max mapping level to detect whether or not a
      SPTE should be zapped.  Doing so avoids zapping SPTEs that can't
      possibly be rebuilt as huge pages, e.g. due to hardware constraints,
      memslot alignment, etc...
      
      This also avoids zapping SPTEs that are still large, e.g. if migration
      was canceled before write-protected huge pages were shattered to enable
      dirty logging.  Note, such pages are still write-protected at this time,
      i.e. a page fault VM-Exit will still occur.  This will hopefully be
      addressed in a future patch.
      
      Sadly, TDP MMU loses its const on the memslot, but that's a pervasive
      problem that's been around for quite some time.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
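      In shape, the zap decision reduces to the predicate below (a sketch with
      illustrative names; the series' real helper is the "max mapping level"
      computation mentioned above): only SPTEs that could actually be rebuilt
      at a larger level are worth zapping.

      #include <stdbool.h>

      enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

      /*
       * Zap for collapsing only if the backing memory could be mapped at a
       * higher level than the SPTE currently uses; this skips 4K mappings
       * that can never become huge and mappings that are already as large
       * as the slot/backing page allows.
       */
      static bool worth_zapping_for_collapse(int spte_level, int max_mapping_level)
      {
              return spte_level < max_mapping_level;
      }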
    • KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages · c060c72f
      Committed by Sean Christopherson
      Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to
      rebuild them as huge pages in the TDP MMU.  ZONE_DEVICE huge pages are
      managed differently than "regular" pages and are not compound pages.
      Likewise, PageTransCompoundMap() will not detect HugeTLB, so switch
      to PageCompound().
      
      This matches the similar check in kvm_mmu_zap_collapsible_spte.
      
      Cc: Ben Gardon <bgardon@google.com>
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210213005015.1651772-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
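      A rough sketch of the revised skip test, using hypothetical pfn predicates in
      place of the real page-flag helpers (the exact kernel expression may differ;
      this only mirrors the reasoning in the message above): a pfn is skipped only if
      it is reserved, or if it is neither compound nor ZONE_DEVICE-backed.

      #include <stdbool.h>

      /* Hypothetical stand-ins for the real pfn/page-flag helpers. */
      bool pfn_is_reserved(unsigned long pfn);
      bool pfn_is_compound(unsigned long pfn);     /* covers THP as well as HugeTLB */
      bool pfn_is_zone_device(unsigned long pfn);  /* ZONE_DEVICE pages aren't compound */

      /* Return true if this SPTE should be left alone during the collapse pass. */
      static bool skip_for_collapse(unsigned long pfn)
      {
              return pfn_is_reserved(pfn) ||
                     (!pfn_is_compound(pfn) && !pfn_is_zone_device(pfn));
      }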
  3. 09 Feb 2021, 2 commits
    • KVM: x86/mmu: Make HVA handler retpoline-friendly · 8f5c44f9
      Committed by Maciej S. Szmigiero
      When retpolines are enabled they have high overhead in the inner loop
      inside kvm_handle_hva_range() that iterates over the provided memory area.
      
      Let's mark this function and its TDP MMU equivalent __always_inline so
      the compiler will be able to change the call to the actual handler
      function inside each of them into a direct one.
      
      This significantly improves performance on the unmap test on the existing
      kernel memslot code (tested on a Xeon 8167M machine):
      30 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0353s  0.0334s   5%
      Unmap 2M   0.00104s 0.000407s 61%
      
      509 slots in use:
      Test       Before   After     Improvement
      Unmap      0.0742s  0.0740s   None
      Unmap 2M   0.00221s 0.00159s  28%
      
      Looks like having an indirect call in these functions (and, so, a
      retpoline) might have interfered with unrolling of the whole loop in the
      CPU.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <732d3fe9eb68aa08402a638ab0309199fa89ae56.1612810129.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
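      A self-contained illustration of the technique (names are made up; this is not
      kvm_handle_hva_range() itself): forcing the iterating wrapper to inline makes
      the handler a compile-time constant at each call site, so the per-iteration
      indirect call, and with it the retpoline, can become a direct call.

      #include <stddef.h>

      typedef int (*range_handler_t)(unsigned long start, unsigned long end);

      /*
       * __always_inline in the kernel expands to this attribute; inlining the
       * wrapper lets the compiler resolve 'handler' and drop the indirect call.
       */
      __attribute__((always_inline))
      static inline int handle_ranges(const unsigned long *start,
                                      const unsigned long *end, size_t n,
                                      range_handler_t handler)
      {
              int ret = 0;

              for (size_t i = 0; i < n; i++)
                      ret |= handler(start[i], end[i]);
              return ret;
      }

      static int unmap_range(unsigned long start, unsigned long end)
      {
              (void)start; (void)end;         /* per-range work would go here */
              return 0;
      }

      int unmap_all(const unsigned long *start, const unsigned long *end, size_t n)
      {
              /* Inlined copy: unmap_range() is called directly, no retpoline. */
              return handle_ranges(start, end, n, unmap_range);
      }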
    • KVM: x86: compile out TDP MMU on 32-bit systems · 897218ff
      Committed by Paolo Bonzini
      The TDP MMU assumes that it can do atomic accesses to 64-bit PTEs.
      Rather than just disabling it, compile it out completely so that it
      is possible to use, for example, 64-bit xchg.
      
      To limit the number of stubs, wrap all accesses to tdp_mmu_enabled
      or tdp_mmu_page with a function.  Calls to all other functions in
      tdp_mmu.c are eliminated and do not even reach the linker.
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
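      A sketch of the wrapping pattern (illustrative names and a simplified struct;
      the real code keys off CONFIG_X86_64 and wraps the tdp_mmu_enabled/tdp_mmu_page
      accesses): on 32-bit builds the accessor is constant false, so calls into
      tdp_mmu.c become dead code and never reach the linker.

      #include <stdbool.h>

      struct kvm { bool tdp_mmu_enabled; /* ... */ };

      void tdp_mmu_zap_all(struct kvm *kvm);     /* defined only in tdp_mmu.c */
      void shadow_mmu_zap_all(struct kvm *kvm);

      #ifdef CONFIG_X86_64
      static inline bool is_tdp_mmu_active(struct kvm *kvm)
      {
              return kvm->tdp_mmu_enabled;
      }
      #else
      static inline bool is_tdp_mmu_active(struct kvm *kvm)
      {
              return false;   /* compile-time constant on 32-bit builds */
      }
      #endif

      void kvm_zap_all(struct kvm *kvm)
      {
              if (is_tdp_mmu_active(kvm))
                      tdp_mmu_zap_all(kvm);   /* eliminated entirely when !CONFIG_X86_64 */
              else
                      shadow_mmu_zap_all(kvm);
      }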
  4. 04 Feb 2021, 17 commits
  5. 08 Jan 2021, 4 commits
    • KVM: x86/mmu: Ensure TDP MMU roots are freed after yield · a889ea54
      Committed by Ben Gardon
      Many TDP MMU functions which need to perform some action on all TDP MMU
      roots hold a reference on that root so that they can safely drop the MMU
      lock in order to yield to other threads. However, when releasing the
      reference on the root, there is a bug: the root will not be freed even
      if its reference count (root_count) is reduced to 0.
      
      To simplify acquiring and releasing references on TDP MMU root pages, and
      to ensure that these roots are properly freed, move the get/put operations
      into another TDP MMU root iterator macro.
      
      Moving the get/put operations into an iterator macro also helps
      simplify control flow when a root does need to be freed. Note that using
      the list_for_each_entry_safe macro would not have been appropriate in
      this situation because it could keep a pointer to the next root across
      an MMU lock release + reacquire, during which time that root could be
      freed.
      Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 063afacd ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
      Fixes: a6a0b05d ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
      Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210107001935.3732070-1-bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
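      A simplified, self-contained model of the iterator idea (not the kernel's actual
      macro or types): the iterator itself takes a reference on the current root and
      drops the previous root's reference when it advances, so the put path, rather
      than the loop body, is what frees a root whose count hits zero, and the body is
      free to yield the lock.

      #include <stdlib.h>

      struct root {
              struct root *next;
              int refcount;           /* roots start on the list with one reference */
      };

      static void root_get(struct root *r) { r->refcount++; }

      static void root_put(struct root *r)
      {
              if (--r->refcount == 0)
                      free(r);        /* freed here, never skipped */
      }

      /* Take the next root's reference before dropping the previous one. */
      static struct root *next_root(struct root *prev)
      {
              struct root *next = prev->next;

              if (next)
                      root_get(next);
              root_put(prev);
              return next;
      }

      /* Assumes a non-empty list for brevity; the body may drop/retake the lock. */
      #define for_each_root(r, head) \
              for ((r) = (head), root_get(r); (r); (r) = next_root(r))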
    • KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array · dde81f94
      Committed by Sean Christopherson
      Bump the size of the sptes array by one and use the raw level of the
      SPTE to index into the sptes array.  Using the SPTE level directly
      improves readability by eliminating the need to reason out why the level
      is being adjusted when indexing the array.  The array is on the stack
      and is not explicitly initialized; bumping its size is nothing more than
      a superficial adjustment to the stack frame.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-4-seanjc@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
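      In outline, the change is just the indexing scheme below (the array names and
      the max-level constant are illustrative, not the kernel's declarations):

      #include <stdint.h>

      #define ROOT_MAX_LEVEL 5

      /* Before: levels 1..N stored at index level - 1. */
      uint64_t sptes_packed[ROOT_MAX_LEVEL];          /* sptes_packed[level - 1] */

      /* After: one extra (unused) slot 0, so the raw level indexes directly. */
      uint64_t sptes_by_level[ROOT_MAX_LEVEL + 1];    /* sptes_by_level[level]   */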
    • KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE · 39b4d43e
      Committed by Sean Christopherson
      Get the so called "root" level from the low level shadow page table
      walkers instead of manually attempting to calculate it higher up the
      stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
      the starting level of the walk, from the caller's perspective, is not
      the CR3 root but rather the PDPTR "root".  Checking for reserved bits
      from the CR3 root causes get_mmio_spte() to consume uninitialized stack
      data due to indexing into sptes[] for a level that was not filled by
      get_walk().  This can result in false positives and/or negatives
      depending on what garbage happens to be on the stack.
      
      Opportunistically nuke a few extra newlines.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Reported-by: Richard Herbert <rherbert@sympatico.ca>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() · 2aa07893
      Committed by Sean Christopherson
      Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
      least one spte, which can theoretically happen if the walk hits a
      not-present PDPTR.  Returning the root level in such a case will cause
      get_mmio_spte() to return garbage (uninitialized stack data).  In
      practice, such a scenario should be impossible as KVM shouldn't get a
      reserved-bit page fault with a not-present PDPTR.
      
      Note, using mmu->root_level in get_walk() is wrong for other reasons,
      too, but that's now a moot point.
      
      Fixes: 95fb5b02 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
      Cc: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20201218003139.2167891-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
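      A condensed sketch of the contract these two fixes establish (standalone, with
      hypothetical names; the real code is get_walk()/get_mmio_spte()): the walker
      reports the root level it actually used, returns the lowest level it filled or
      -1 if it filled none, and the caller bails out instead of reading uninitialized
      stack slots.

      #include <stdint.h>
      #include <stdbool.h>

      #define ROOT_MAX_LEVEL 5

      /*
       * Hypothetical walker: fills sptes[leaf..*root_level] and returns the
       * leaf level, or -1 if not even the root entry could be read (e.g. a
       * not-present PDPTR when using PAE paging).
       */
      int walk_sptes(uint64_t gpa, uint64_t sptes[ROOT_MAX_LEVEL + 1],
                     int *root_level);

      static bool fetch_mmio_spte(uint64_t gpa, uint64_t *spte)
      {
              uint64_t sptes[ROOT_MAX_LEVEL + 1];     /* deliberately uninitialized */
              int root_level;
              int leaf = walk_sptes(gpa, sptes, &root_level);

              if (leaf < 0)
                      return false;   /* nothing was filled; don't touch sptes[] */

              /* Reserved-bit checks would cover only sptes[leaf..root_level]. */
              *spte = sptes[leaf];
              return true;
      }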
  6. 04 Dec 2020, 1 commit
  7. 19 Nov 2020, 2 commits
  8. 15 Nov 2020, 2 commits
    • KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Committed by Peter Xu
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue exists for live migration when the guest memory is huge
      while the page dirty procedure is trivial.  In that case, for each dirty
      sync we need to pull the whole dirty bitmap to userspace and analyse
      every bit even if it's mostly zeros.
      
      The preferred data structure for the above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables the dirty ring for X86 only.  However, it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/
      Signed-off-by: Lei Cao <lei.cao@stratus.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
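      A self-contained toy model of the ring idea (this is not KVM's uapi layout,
      ioctl flow, or memory-ordering discipline, just the data-structure shape): the
      kernel appends one record per dirtied guest page to a fixed-size ring that
      userspace has mapped, and userspace harvests records instead of scanning a
      bitmap.

      #include <stdint.h>

      #define RING_SIZE 4096u                 /* entries; power of two */

      struct dirty_rec {
              uint32_t slot;                  /* which memslot was written */
              uint64_t offset;                /* page offset within that slot */
      };

      struct dirty_ring {
              struct dirty_rec recs[RING_SIZE];
              uint32_t head;                  /* next entry userspace will read */
              uint32_t tail;                  /* next entry the kernel will fill */
      };

      /* Producer side (kernel model): queue one dirtied guest page. */
      static int ring_push(struct dirty_ring *r, uint32_t slot, uint64_t offset)
      {
              if (r->tail - r->head == RING_SIZE)
                      return -1;              /* full: pause until userspace harvests */
              r->recs[r->tail % RING_SIZE] =
                      (struct dirty_rec){ .slot = slot, .offset = offset };
              r->tail++;
              return 0;
      }

      /* Consumer side (userspace model): harvest everything currently queued. */
      static uint32_t ring_harvest(struct dirty_ring *r,
                                   void (*sink)(uint32_t slot, uint64_t offset))
      {
              uint32_t n = 0;

              while (r->head != r->tail) {
                      struct dirty_rec *rec = &r->recs[r->head % RING_SIZE];

                      sink(rec->slot, rec->offset);
                      r->head++;
                      n++;
              }
              return n;
      }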
    • kvm: mmu: fix is_tdp_mmu_check when the TDP MMU is not in use · c887c9b9
      Committed by Paolo Bonzini
      In some cases where shadow paging is in use, the root page will
      be either mmu->pae_root or vcpu->arch.mmu->lm_root.  Then it will
      not have an associated struct kvm_mmu_page, because it is allocated
      with alloc_page instead of kvm_mmu_alloc_page.
      
      Just return false quickly from is_tdp_mmu_root if the TDP MMU is
      not in use, which also includes the case where shadow paging is
      enabled.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
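      The fix boils down to an early-out of this shape (illustrative names; the real
      check is the TDP-MMU root test the message refers to): when the TDP MMU is not
      in use, the root cannot be a TDP MMU page, so return false before looking up a
      struct kvm_mmu_page that may not exist.

      #include <stdbool.h>

      struct kvm { bool tdp_mmu_enabled; /* ... */ };

      bool root_sp_is_tdp_mmu(struct kvm *kvm, unsigned long root_hpa); /* hypothetical */

      static bool is_tdp_mmu_root_sketch(struct kvm *kvm, unsigned long root_hpa)
      {
              /*
               * With shadow paging the root may be a bare page (pae_root or
               * lm_root) that has no struct kvm_mmu_page behind it; bail out
               * before trying to look one up.
               */
              if (!kvm->tdp_mmu_enabled)
                      return false;

              return root_sp_is_tdp_mmu(kvm, root_hpa);
      }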
  9. 24 Oct 2020, 1 commit
  10. 23 Oct 2020, 7 commits