1. 11 Feb 2022 (8 commits)
    • KVM: x86/mmu: Rename TDP MMU functions that handle shadow pages · c298a30c
      Authored by David Matlack
      Rename 3 functions in tdp_mmu.c that handle shadow pages:
      
        alloc_tdp_mmu_page()  -> tdp_mmu_alloc_sp()
        tdp_mmu_link_page()   -> tdp_mmu_link_sp()
        tdp_mmu_unlink_page() -> tdp_mmu_unlink_sp()
      
      These changes make tdp_mmu a consistent prefix before the verb in each
      function name, and make it clearer that these functions deal with
      kvm_mmu_page structs rather than struct pages.
      
      One could argue that "shadow page" is the wrong term for a page table in
      the TDP MMU since it never actually shadows a guest page table.
      However, "shadow page" (or "sp" for short) has evolved to become the
      standard term in KVM when referring to a kvm_mmu_page struct, and its
      associated page table and other metadata, regardless of whether the page
      table shadows a guest page table. So this commit just makes the TDP MMU
      more consistent with the rest of KVM.
      
      No functional change intended.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-6-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Change tdp_mmu_{set,zap}_spte_atomic() to return 0/-EBUSY · 3e72c791
      Authored by David Matlack
      tdp_mmu_set_spte_atomic() and tdp_mmu_zap_spte_atomic() return a bool
      with true indicating the SPTE modification was successful and false
      indicating failure. Change these functions to return an int instead
      since that is the common practice.
      
      Opportunistically fix up the kernel-doc style for the Return section
      above tdp_mmu_set_spte_atomic().
      
      No functional change intended.
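
      As a minimal sketch of the convention change (illustrative only; the
      real functions take a kvm and TDP iterator and carry more logic, and the
      helper names below are hypothetical):

        /* Before: true on success, false if another thread changed the SPTE. */
        static bool set_spte_atomic_bool(u64 *sptep, u64 old_spte, u64 new_spte)
        {
                return cmpxchg64(sptep, old_spte, new_spte) == old_spte;
        }

        /* After: 0 on success, -EBUSY if the cmpxchg lost the race. */
        static int set_spte_atomic(u64 *sptep, u64 old_spte, u64 new_spte)
        {
                if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
                        return -EBUSY;

                return 0;
        }
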
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-5-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Automatically update iter->old_spte if cmpxchg fails · 3255530a
      Authored by David Matlack
      Consolidate a bunch of code that was manually re-reading the spte if the
      cmpxchg failed. There is no extra cost of doing this because we already
      have the spte value as a result of the cmpxchg (and in fact this
      eliminates re-reading the spte), and none of the call sites depend on
      iter->old_spte retaining the stale spte value.
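
      A rough sketch of the consolidated helper (simplified; the real code
      also handles REMOVED_SPTE and dereferences the SPTE pointer under RCU):

        static inline bool tdp_mmu_set_spte_atomic(struct tdp_iter *iter, u64 new_spte)
        {
                u64 old_spte = cmpxchg64(iter->sptep, iter->old_spte, new_spte);

                if (old_spte != iter->old_spte) {
                        /*
                         * The failed cmpxchg handed back the current value;
                         * cache it instead of re-reading the SPTE so the
                         * caller can simply retry.
                         */
                        iter->old_spte = old_spte;
                        return false;
                }

                return true;
        }
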
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220119230739.2234394-4-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Check SPTE writable invariants when setting leaf SPTEs · 115111ef
      Authored by David Matlack
      Check SPTE writable invariants when setting SPTEs rather than in
      spte_can_locklessly_be_made_writable(). By the time KVM checks
      spte_can_locklessly_be_made_writable(), the SPTE has long since been
      corrupted.
      
      Note that these invariants only apply to shadow-present leaf SPTEs (i.e.
      not to MMIO SPTEs, non-leaf SPTEs, etc.). Add a comment explaining the
      restriction and only instrument the code paths that set shadow-present
      leaf SPTEs.
      
      To account for access tracking, also check the SPTE writable invariants
      when marking an SPTE as an access track SPTE. This also lets us remove
      a redundant WARN from mark_spte_for_access_track().
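
      One of the invariants, roughly sketched (mask and helper names as used
      elsewhere in KVM's SPTE code; simplified):

        static inline void check_spte_writable_invariants(u64 spte)
        {
                /* A hardware-writable SPTE must also be MMU-writable. */
                WARN_ONCE((spte & PT_WRITABLE_MASK) &&
                          !(spte & shadow_mmu_writable_mask),
                          "kvm: writable SPTE is not MMU-writable: %llx", spte);
        }
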
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220125230518.1697048-3-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/tdp_mmu: Remove unused "kvm" of kvm_tdp_mmu_get_root() · ad6d6b94
      Authored by Jinrong Liang
      The "struct kvm *kvm" parameter of kvm_tdp_mmu_get_root() is not used,
      so remove it. No functional change intended.
      Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
      Message-Id: <20220125095909.38122-5-cloudliang@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Zap _all_ roots when unmapping gfn range in TDP MMU · d62007ed
      Authored by Sean Christopherson
      Zap both valid and invalid roots when zapping/unmapping a gfn range, as
      KVM must ensure it holds no references to the freed page after returning
      from the unmap operation.  Most notably, the TDP MMU doesn't zap invalid
      roots in mmu_notifier callbacks.  This leads to use-after-free and other
      issues if the mmu_notifier runs to completion while an invalid root
      zapper yields as KVM fails to honor the requirement that there must be
      _no_ references to the page after the mmu_notifier returns.
      
      The bug is most easily reproduced by hacking KVM to cause a collision
      between set_nx_huge_pages() and kvm_mmu_notifier_release(), but the bug
      exists between kvm_mmu_notifier_invalidate_range_start() and memslot
      updates as well.  Invalidating a root ensures pages aren't accessible by
      the guest, and KVM won't read or write page data itself, but KVM will
      trigger e.g. kvm_set_pfn_dirty() when zapping SPTEs, and thus completing
      a zap of an invalid root _after_ the mmu_notifier returns is fatal.
      
        WARNING: CPU: 24 PID: 1496 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:173 [kvm]
        RIP: 0010:kvm_is_zone_device_pfn+0x96/0xa0 [kvm]
        Call Trace:
         <TASK>
         kvm_set_pfn_dirty+0xa8/0xe0 [kvm]
         __handle_changed_spte+0x2ab/0x5e0 [kvm]
         __handle_changed_spte+0x2ab/0x5e0 [kvm]
         __handle_changed_spte+0x2ab/0x5e0 [kvm]
         zap_gfn_range+0x1f3/0x310 [kvm]
         kvm_tdp_mmu_zap_invalidated_roots+0x50/0x90 [kvm]
         kvm_mmu_zap_all_fast+0x177/0x1a0 [kvm]
         set_nx_huge_pages+0xb4/0x190 [kvm]
         param_attr_store+0x70/0x100
         module_attr_store+0x19/0x30
         kernfs_fop_write_iter+0x119/0x1b0
         new_sync_write+0x11c/0x1b0
         vfs_write+0x1cc/0x270
         ksys_write+0x5f/0xe0
         do_syscall_64+0x38/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
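
      Roughly, the unmap path must walk every root, as sketched below
      (simplified; the real iteration macro and zap helper take additional
      arguments):

        bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm,
                                         struct kvm_gfn_range *range, bool flush)
        {
                struct kvm_mmu_page *root;

                /*
                 * Zap invalid roots too: an in-progress zap of an invalid root
                 * may still hold references that the mmu_notifier caller
                 * expects to be gone once this returns.
                 */
                for_each_tdp_mmu_root(kvm, root, range->slot->as_id)
                        flush = zap_gfn_range(kvm, root, range->start,
                                              range->end, flush);

                return flush;
        }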
      
      Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211215011557.399940-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Move "invalid" check out of kvm_tdp_mmu_get_root() · 04dc4e6c
      Authored by Sean Christopherson
      Move the check for an invalid root out of kvm_tdp_mmu_get_root() and into
      the one place it actually matters, tdp_mmu_next_root(), as the other user
      already has an implicit validity check.  A future bug fix will need to
      get references to invalid roots to honor mmu_notifier requests; there's
      no point in forcing what will be a common path to open code getting a
      reference to a root.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211215011557.399940-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Use common TDP MMU zap helper for MMU notifier unmap hook · 83b83a02
      Authored by Sean Christopherson
      Use the common TDP MMU zap helper when handling an MMU notifier unmap
      event, the two flows are semantically identical.  Consolidate the code in
      preparation for a future bug fix, as both kvm_tdp_mmu_unmap_gfn_range()
      and __kvm_tdp_mmu_zap_gfn_range() are guilty of not zapping SPTEs in
      invalid roots.
      
      No functional change intended.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211215011557.399940-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 20 Jan 2022 (1 commit)
    • KVM: x86/mmu: Fix write-protection of PTs mapped by the TDP MMU · 7c8a4742
      Authored by David Matlack
      When the TDP MMU is write-protecting GFNs for page table protection (as
      opposed to for dirty logging, or due to the HVA not being writable), it
      checks if the SPTE is already write-protected and if so skips modifying
      the SPTE and the TLB flush.
      
      This behavior is incorrect because it fails to check if the SPTE
      is write-protected for page table protection, i.e. fails to check
      that MMU-writable is '0'.  If the SPTE was write-protected for dirty
      logging but not page table protection, the SPTE could locklessly be made
      writable, and vCPUs could still be running with writable mappings cached
      in their TLB.
      
      Fix this by only skipping setting the SPTE if the SPTE is already
      write-protected *and* MMU-writable is already clear.  Technically,
      checking only MMU-writable would suffice; a SPTE cannot be writable
      without MMU-writable being set.  But check both to be paranoid and
      because it arguably yields more readable code.
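
      The shape of the fix, roughly (a simplified excerpt; the real loop also
      guards on shadow-present and non-leaf SPTEs):

                new_spte = iter.old_spte &
                           ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);

                /*
                 * Skip only if nothing would change, i.e. both the hardware
                 * writable bit and MMU-writable are already clear.
                 */
                if (new_spte == iter.old_spte)
                        break;

                tdp_mmu_set_spte(kvm, &iter, new_spte);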
      
      Fixes: 46044f72 ("kvm: x86/mmu: Support write protection for nesting in tdp MMU")
      Cc: stable@vger.kernel.org
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20220113233020.3986005-2-dmatlack@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 20 Dec 2021 (1 commit)
    • KVM: x86/mmu: Don't advance iterator after restart due to yielding · 3a0f64de
      Authored by Sean Christopherson
      After dropping mmu_lock in the TDP MMU, restart the iterator during
      tdp_iter_next() and do not advance the iterator.  Advancing the iterator
      results in skipping the top-level SPTE and all its children, which is
      fatal if any of the skipped SPTEs were not visited before yielding.
      
      When zapping all SPTEs, i.e. when min_level == root_level, restarting the
      iter and then invoking tdp_iter_next() is always fatal if the current gfn
      has a valid SPTE, as advancing the iterator results in try_step_side()
      skipping the current gfn, which wasn't visited before yielding.
      
      Sprinkle WARNs on iter->yielded being true in various helpers that are
      often used in conjunction with yielding, and tag the helper with
      __must_check to reduce the probability of improper usage.
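
      A sketch of the resulting iterator behavior (simplified):

        void tdp_iter_next(struct tdp_iter *iter)
        {
                if (iter->yielded) {
                        /*
                         * The walk dropped mmu_lock; restart from the root
                         * instead of advancing so the current entry and its
                         * children are revisited rather than skipped.
                         */
                        tdp_iter_restart(iter);
                        return;
                }

                /* ... otherwise, step down/side/up as usual ... */
        }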
      
      Failing to zap a top-level SPTE manifests in one of two ways.  If a valid
      SPTE is skipped by both kvm_tdp_mmu_zap_all() and kvm_tdp_mmu_put_root(),
      the shadow page will be leaked and KVM will WARN accordingly.
      
        WARNING: CPU: 1 PID: 3509 at arch/x86/kvm/mmu/tdp_mmu.c:46 [kvm]
        RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x3e/0x50 [kvm]
        Call Trace:
         <TASK>
         kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
         kvm_destroy_vm+0x162/0x2a0 [kvm]
         kvm_vcpu_release+0x34/0x60 [kvm]
         __fput+0x82/0x240
         task_work_run+0x5c/0x90
         do_exit+0x364/0xa10
         ? futex_unqueue+0x38/0x60
         do_group_exit+0x33/0xa0
         get_signal+0x155/0x850
         arch_do_signal_or_restart+0xed/0x750
         exit_to_user_mode_prepare+0xc5/0x120
         syscall_exit_to_user_mode+0x1d/0x40
         do_syscall_64+0x48/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      If kvm_tdp_mmu_zap_all() skips a gfn/SPTE but that SPTE is then zapped by
      kvm_tdp_mmu_put_root(), KVM triggers a use-after-free in the form of
      marking a struct page as dirty/accessed after it has been put back on the
      free list.  This directly triggers a WARN due to encountering a page with
      page_count() == 0, but it can also lead to data corruption and additional
      errors in the kernel.
      
        WARNING: CPU: 7 PID: 1995658 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:171
        RIP: 0010:kvm_is_zone_device_pfn.part.0+0x9e/0xd0 [kvm]
        Call Trace:
         <TASK>
         kvm_set_pfn_dirty+0x120/0x1d0 [kvm]
         __handle_changed_spte+0x92e/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         __handle_changed_spte+0x63c/0xca0 [kvm]
         zap_gfn_range+0x549/0x620 [kvm]
         kvm_tdp_mmu_put_root+0x1b6/0x270 [kvm]
         mmu_free_root_page+0x219/0x2c0 [kvm]
         kvm_mmu_free_roots+0x1b4/0x4e0 [kvm]
         kvm_mmu_unload+0x1c/0xa0 [kvm]
         kvm_arch_destroy_vm+0x1f2/0x5c0 [kvm]
         kvm_put_kvm+0x3b1/0x8b0 [kvm]
         kvm_vcpu_release+0x4e/0x70 [kvm]
         __fput+0x1f7/0x8c0
         task_work_run+0xf8/0x1a0
         do_exit+0x97b/0x2230
         do_group_exit+0xda/0x2a0
         get_signal+0x3be/0x1e50
         arch_do_signal_or_restart+0x244/0x17f0
         exit_to_user_mode_prepare+0xcb/0x120
         syscall_exit_to_user_mode+0x1d/0x40
         do_syscall_64+0x4d/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Note, the underlying bug existed even before commit 1af4a960 ("KVM:
      x86/mmu: Yield in TDU MMU iter even if no SPTES changed") moved calls to
      tdp_mmu_iter_cond_resched() to the beginning of loops, as KVM could still
      incorrectly advance past a top-level entry when yielding on a lower-level
      entry.  But with respect to leaking shadow pages, the bug was introduced
      by yielding before processing the current gfn.
      
      Alternatively, tdp_mmu_iter_cond_resched() could simply fall through, or
      callers could jump to their "retry" label.  The downside of that approach
      is that tdp_mmu_iter_cond_resched() _must_ be called before anything else
      in the loop, and there's no easy way to enforce that requirement.
      
      Ideally, KVM would handle the cond_resched() fully within the iterator
      macro (the code is actually quite clean) and avoid this entire class of
      bugs, but that is extremely difficult to do while also supporting yielding
      after tdp_mmu_set_spte_atomic() fails.  Yielding after failing to set a
      SPTE is very desirable as the "owner" of the REMOVED_SPTE isn't strictly
      bounded, e.g. if it's zapping a high-level shadow page, the REMOVED_SPTE
      may block operations on the SPTE for a significant amount of time.
      
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Fixes: 1af4a960 ("KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed")
      Reported-by: Ignat Korchagin <ignat@cloudflare.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211214033528.123268-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 08 Dec 2021 (1 commit)
  5. 30 Nov 2021 (2 commits)
  6. 18 Nov 2021 (2 commits)
  7. 11 Nov 2021 (1 commit)
  8. 22 Oct 2021 (1 commit)
  9. 01 Oct 2021 (12 commits)
  10. 30 Sep 2021 (1 commit)
    • KVM: X86: Don't flush current tlb on shadow page modification · bd047e54
      Authored by Lai Jiangshan
      After any shadow page modification, flushing the TLB only on the current
      vCPU is odd, since other vCPUs' TLBs might still be stale.
      
      In other words, if any mandatory TLB flush is needed after a shadow page
      modification, SET_SPTE_NEED_REMOTE_TLB_FLUSH or remote_flush should be
      set and the TLBs of all vCPUs should be flushed.  There is no point in
      flushing only the current TLB except when the request stems from the
      vCPU's or pCPU's own activities.
      
      If there is a bug where a mandatory TLB flush is required but
      SET_SPTE_NEED_REMOTE_TLB_FLUSH/remote_flush fails to be set, this patch
      will expose the bug in a more destructive way.  The related code paths
      have been checked and no missing SET_SPTE_NEED_REMOTE_TLB_FLUSH has been
      found so far.
      
      Currently, no optional TLB flushing remains now that the sync-page code
      has been changed to flush the TLB in a timely manner, so the local
      flushing code can simply be removed.
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210918005636.3675-5-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  11. 21 Aug 2021 (4 commits)
  12. 13 Aug 2021 (2 commits)
    • KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs · 0103098f
      Authored by Sean Christopherson
      Set the min_level for the TDP iterator at the root level when zapping all
      SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
      SPTE will recursively zap all its children, thus there is no need for the
      iterator to attempt to step down.  This avoids rereading the top-level
      SPTEs after they are zapped by causing try_step_down() to short-circuit.
      
      In most cases, optimizing try_step_down() will be in the noise as the cost
      of zapping SPTEs completely dominates the overall time.  The optimization
      is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
      is zapping in response to multiple memslot updates when userspace is adding
      and removing read-only memslots for option ROMs.  In that case, the task
      doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
      for read and thus can be a noisy neighbor of sorts.
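
      A hedged sketch of the optimization (argument list simplified, and the
      zap helper below is hypothetical):

                /*
                 * When zapping everything, children of a zapped SPTE are torn
                 * down recursively anyway, so never step below the root level.
                 */
                int min_level = zap_all ? root->role.level : PG_LEVEL_4K;

                for_each_tdp_pte_min_level(iter, root, min_level, start, end)
                        tdp_mmu_zap_spte(kvm, &iter);
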
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs · 524a1e4e
      Authored by Sean Christopherson
      Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
      really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
      SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
      not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
      paging, the "zap all" flows will skip top-level SPTEs whose range extends
      beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
      
      Use the current upper bound (based on host.MAXPHYADDR) to detect that the
      caller wants to zap all SPTEs, e.g. instead of using the max theoretical
      gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
      to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
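
      A sketch of the bound computation (illustrative; variable names are
      approximate):

                /*
                 * "Zap all" passes end == -1ull; clamp the walk to the host's
                 * addressable GFN range rather than the theoretical 2^52
                 * maximum so the iterator can terminate early.
                 */
                gfn_t max_gfn_host = 1ULL << (shadow_phys_bits - PAGE_SHIFT);
                bool zap_all = (start == 0 && end >= max_gfn_host);
                gfn_t last_gfn = min(end, max_gfn_host);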
      
      Add a WARN on kvm->arch.tdp_mmu_pages when the TDP MMU is destroyed to
      help future debuggers should KVM decide to leak SPTEs again.
      
      The bug is most easily reproduced by running (and unloading!) KVM in a
      VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
      
        =============================================================================
        BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
        Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
        CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack_lvl+0x45/0x59
         slab_err+0x95/0xc9
         __kmem_cache_shutdown.cold+0x3c/0x158
         kmem_cache_destroy+0x3d/0xf0
         kvm_mmu_module_exit+0xa/0x30 [kvm]
         kvm_arch_exit+0x5d/0x90 [kvm]
         kvm_exit+0x78/0x90 [kvm]
         vmx_exit+0x1a/0x50 [kvm_intel]
         __x64_sys_delete_module+0x13f/0x220
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 06 Aug 2021 (1 commit)
    • KVM: x86/mmu: Leverage vcpu->last_used_slot in tdp_mmu_map_handle_target_level · 081de470
      Authored by David Matlack
      The existing TDP MMU methods to handle dirty logging are vcpu-agnostic
      since they can be driven by MMU notifiers and other non-vcpu-specific
      events in addition to page faults. However, this means that the TDP MMU
      is not benefiting from the new vcpu->last_used_slot. Fix that by
      introducing a tdp_mmu_map_set_spte_atomic() which is only called during
      a TDP page fault and has access to the kvm_vcpu for fast slot lookups.
      
      This improves "Populate memory time" in dirty_log_perf_test by 5%:
      
      Command                         | Before           | After
      ------------------------------- | ---------------- | -------------
      ./dirty_log_perf_test -v64 -x64 | 5.472321072s     | 5.169832886s
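
      A hedged sketch of what the new fault-path helper plausibly looks like
      (simplified; not the verbatim patch):

        /* Page-fault path: a vCPU is in hand, so the memslot lookup for dirty
         * logging can hit vcpu->last_used_slot instead of a full search. */
        static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
                                                       struct tdp_iter *iter,
                                                       u64 new_spte)
        {
                struct kvm_memory_slot *slot;

                if (!tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, iter, new_spte))
                        return false;

                if (is_writable_pte(new_spte)) {
                        /* Checks vcpu->last_used_slot before searching memslots. */
                        slot = kvm_vcpu_gfn_to_memslot(vcpu, iter->gfn);
                        if (slot && kvm_slot_dirty_track_enabled(slot))
                                mark_page_dirty_in_slot(vcpu->kvm, slot, iter->gfn);
                }

                return true;
        }
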
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210804222844.1419481-5-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  14. 03 Aug 2021 (1 commit)
  15. 02 Aug 2021 (2 commits)