- 11 November 2021, 1 commit
-
-
Submitted by Sean Christopherson
Wrap the read of iter->sptep in tdp_mmu_map_handle_target_level() with rcu_dereference(). Shadow pages in the TDP MMU, and thus their SPTEs, are protected by RCU.

This fixes a Sparse warning at tdp_mmu.c:900:51:

    warning: incorrect type in argument 1 (different address spaces)
    expected unsigned long long [usertype] *sptep
    got unsigned long long [noderef] [usertype] __rcu *[usertype] sptep

Fixes: 7158bee4 ("KVM: MMU: pass kvm_mmu_page struct to make_spte")
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211103161833.3769487-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
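A minimal sketch of the access pattern being fixed (surrounding code assumed, not quoted from the patch):

	/* iter->sptep carries the __rcu annotation; rcu_dereference()
	 * strips it and documents that the pointer is RCU-protected,
	 * which is what silences the Sparse address-space warning. */
	u64 *sptep = rcu_dereference(iter->sptep);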
-
- 22 October 2021, 1 commit
-
-
Submitted by Paolo Bonzini
"prefetch", "prefault" and "speculative" are used throughout KVM to mean the same thing. Use a single name, standardizing on "prefetch" which is already used by various functions such as direct_pte_prefetch, FNAME(prefetch_gpte), FNAME(pte_prefetch), etc. Suggested-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 October 2021, 12 commits
-
-
Submitted by David Matlack
mmu_try_to_unsync_pages checks if page tracking is active for the given gfn, which requires knowing the memslot. We can pass down the memslot via make_spte to avoid this lookup. The memslot is also handy for make_spte's marking of the gfn as dirty: we can test whether dirty page tracking is enabled, and if so ensure that pages are mapped as writable with 4K granularity. Apart from the warning, no functional change is intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
The level and A/D bit support of the new SPTE can be found in the role, which is stored in the kvm_mmu_page struct. This merges two arguments into one. For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does not use it if the SPTE is already present), so we fetch it just before calling make_spte.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Prepare for removing the ad_disabled argument of make_spte; instead it can be found in the role of a struct kvm_mmu_page. First of all, the TDP MMU must set the role accurately.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Now that make_spte is called directly by the shadow MMU (rather than wrapped by set_spte), it only has to return one boolean value.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by David Matlack
The memslot for the faulting gfn is used throughout the page fault handling code, so capture it in kvm_page_fault as soon as we know the gfn and use it in the page fault handling code that has direct access to the kvm_page_fault struct. Replace various tests using is_noslot_pfn with more direct tests on fault->slot being NULL.

This, in combination with the subsequent patch, improves "Populate memory time" in dirty_log_perf_test by 5% when using the legacy MMU. There is no discernible improvement to the performance of the TDP MMU.

No functional change intended.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
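An illustrative before/after sketch of the kind of test being replaced (not the actual diff; the no_slot label is hypothetical):

	/* Before: derive "no memslot" from the pfn */
	if (unlikely(is_noslot_pfn(fault->pfn)))
		goto no_slot;	/* hypothetical label */

	/* After: test the memslot captured in kvm_page_fault directly */
	if (unlikely(!fault->slot))
		goto no_slot;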
-
Submitted by Paolo Bonzini
tdp_mmu_map_set_spte_atomic is not taking care of dirty logging anymore; the only difference that remains is that it takes a vCPU instead of the struct kvm. Merge the two functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
This simplifies set_spte, which we want to remove, and unifies code between the shadow MMU and the TDP MMU. The warning will be added back later to make_spte as well.

There is a small disadvantage in the TDP MMU: it may unnecessarily mark a page as dirty twice if two vCPUs end up mapping the same page twice. However, this is a very small cost for a case that is already rare.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Pass struct kvm_page_fault to tracepoints instead of extracting the arguments from the struct. This also lets the kvm_mmu_spte_requested tracepoint pick the gfn directly from fault->gfn, instead of using the address.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of extracting the arguments from the struct. Tweak the conditions a bit to avoid long lines.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of extracting the arguments from the struct; the results are also stored in the struct, so the callers are adjusted accordingly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Pass struct kvm_page_fault to tdp_mmu_map_handle_target_level() instead of extracting the arguments from the struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Paolo Bonzini
Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of extracting the arguments from the struct.

Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 30 September 2021, 1 commit
-
-
Submitted by Lai Jiangshan
After any shadow page modification, flushing the TLB only on the current vCPU is odd, since other vCPUs' TLBs might still be stale. In other words, if any mandatory TLB flush is needed after a shadow page modification, SET_SPTE_NEED_REMOTE_TLB_FLUSH or remote_flush should be set and the TLBs of all vCPUs should be flushed. There is no point in flushing only the current TLB except when the request comes from the vCPU's or pCPU's own activities.

If there were a bug where a mandatory TLB flush is required but SET_SPTE_NEED_REMOTE_TLB_FLUSH/remote_flush fails to be set, this patch would expose it in a more destructive way. The related code paths have been checked and no missing SET_SPTE_NEED_REMOTE_TLB_FLUSH has been found yet.

Currently there is no optional TLB flushing left after the sync-page related code was changed to flush the TLB in a timely manner, so the local flushing code can simply be removed.

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-5-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
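Conceptually, the change drops vCPU-local flushes in favor of the existing remote-flush machinery (a sketch of the intent only, not the actual diff; the surrounding flush bookkeeping is assumed):

	/* Before: only the current vCPU's TLB was flushed */
	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);

	/* After: if a flush is mandatory, flush every vCPU's TLB */
	if (flush)
		kvm_flush_remote_tlbs(vcpu->kvm);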
-
- 21 August 2021, 4 commits
-
-
Submitted by Sean Christopherson
Drop @shared from tdp_mmu_link_page() and hardcode it to work for mmu_lock being held for read. The helper has exactly one caller and in all likelihood will only ever have exactly one caller. Even if KVM adds a path to install translations without an initiating page fault, odds are very, very good that the path will just be a wrapper to the "page fault" handler (both SNP and TDX RFCs propose patches to do exactly that).

No functional change intended.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810224554.2978735-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Mingwei Zhang
Existing KVM code tracks the number of large pages regardless of their size. Therefore, when 1GB (or larger) pages are used, the information becomes less useful because lpages counts a mix of 1G and 2M pages. So remove lpages, since it is easy for user space to aggregate the information itself; instead, provide comprehensive page stats for all sizes from 4K to 512G.

Suggested-by: Ben Gardon <bgardon@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Cc: Jing Zhang <jingzhangos@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Message-Id: <20210803044607.599629-4-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Factor in whether or not the old/new SPTEs are shadow-present when adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can toggle the page size bit, as bit 7 is used to store the MMIO generation, i.e. is_large_pte() can get a false positive when called on an MMIO SPTE. Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic value.

Opportunistically move the logic below the check that verifies at least one of the old/new SPTEs is shadow-present.

Use is/was_leaf even though is/was_present would suffice. The code generation is roughly equivalent since all flags need to be computed prior to the code in question, and using the *_leaf flags will minimize the diff in a future enhancement to account all pages, i.e. will change the check to "is_leaf != was_leaf".

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Fixes: 1699f65c ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20210803044607.599629-3-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
This, together with the next patch, will fix a future race between kvm_zap_gfn_range and the page fault handler, which will happen when the AVIC memslot is going to be only partially disabled.

The performance impact is minimal since kvm_zap_gfn_range has only two users, update_mtrr() and kvm_post_set_cr0(). Both only use it if the guest has non-coherent DMA, in order to honor the guest's UC memtype. MTRR and CD setup only happens at boot, and generally in an area where the page tables should be small (for CD) or should not include the affected GFNs at all (for MTRRs).

This is based on a patch suggested by Sean Christopherson:
https://lkml.org/lkml/2021/7/22/1025

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 13 August 2021, 2 commits
-
-
Submitted by Sean Christopherson
Set the min_level for the TDP iterator at the root level when zapping all SPTEs to optimize the iterator's try_step_down(). Zapping a non-leaf SPTE will recursively zap all its children, thus there is no need for the iterator to attempt to step down. This avoids rereading the top-level SPTEs after they are zapped by causing try_step_down() to short-circuit.

In most cases, optimizing try_step_down() will be in the noise as the cost of zapping SPTEs completely dominates the overall time. The optimization is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM is zapping in response to multiple memslot updates when userspace is adding and removing read-only memslots for option ROMs. In that case, the task doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock for read and thus can be a noisy neighbor of sorts.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812181414.3376143-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
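A sketch of the idea (iterator macro and field names assumed from the surrounding TDP MMU code, not quoted from the patch):

	/* When zapping everything, start and stay at the root level:
	 * zapping a non-leaf SPTE already zaps its entire subtree, so
	 * try_step_down() never needs to descend. */
	int min_level = zap_all ? root->role.level : PG_LEVEL_4K;

	for_each_tdp_pte_min_level(iter, root->spt, root->role.level,
				   min_level, start, end) {
		/* zap iter.old_spte as before */
	}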
-
Submitted by Sean Christopherson
Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and really zap all SPTEs in this case. As is, zap_gfn_range() skips non-leaf SPTEs whose range exceeds the range to be zapped. If shadow_phys_bits is not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level paging, the "zap all" flows will skip top-level SPTEs whose range extends beyond shadow_phys_bits and leak their SPs when the VM is destroyed. Use the current upper bound (based on host.MAXPHYADDR) to detect that the caller wants to zap all SPTEs, e.g. instead of using the max theoretical gfn, 1 << (52 - 12). The more precise upper bound allows the TDP iterator to terminate its walk earlier when running on hosts with MAXPHYADDR < 52. Add a WARN on kmv->arch.tdp_mmu_pages when the TDP MMU is destroyed to help future debuggers should KVM decide to leak SPTEs again. The bug is most easily reproduced by running (and unloading!) KVM in a VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped. ============================================================================= BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1) CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x45/0x59 slab_err+0x95/0xc9 __kmem_cache_shutdown.cold+0x3c/0x158 kmem_cache_destroy+0x3d/0xf0 kvm_mmu_module_exit+0xa/0x30 [kvm] kvm_arch_exit+0x5d/0x90 [kvm] kvm_exit+0x78/0x90 [kvm] vmx_exit+0x1a/0x50 [kvm_intel] __x64_sys_delete_module+0x13f/0x220 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Cc: stable@vger.kernel.org Cc: Ben Gardon <bgardon@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210812181414.3376143-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 August 2021, 1 commit
-
-
Submitted by David Matlack
The existing TDP MMU methods to handle dirty logging are vCPU-agnostic since they can be driven by MMU notifiers and other non-vCPU-specific events in addition to page faults. However this means that the TDP MMU is not benefiting from the new vcpu->last_used_slot. Fix that by introducing a tdp_mmu_map_set_spte_atomic() which is only called during a TDP page fault and has access to the kvm_vcpu for fast slot lookups.

This improves "Populate memory time" in dirty_log_perf_test by 5%:

    Command                          | Before           | After
    -------------------------------- | ---------------- | -------------
    ./dirty_log_perf_test -v64 -x64  | 5.472321072s     | 5.169832886s

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 August 2021, 1 commit
-
-
Submitted by Hamza Mahfooz
As alluded to in commit f36f3f28 ("KVM: add "new" argument to kvm_arch_commit_memory_region"), a bunch of other places where struct kvm_memory_slot is used need to be refactored to preserve the "const"ness of struct kvm_memory_slot across the board.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Message-Id: <20210713023338.57108-1-someguy@effective-light.com>
[Do not touch body of slot_rmap_walk_init. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 02 August 2021, 3 commits
-
-
Submitted by Paolo Bonzini
With the addition of fast page fault support, the TDP-specific MMU has reached feature parity with the original MMU. All my testing in the last few months has been done with the TDP MMU; switch the default on 64-bit machines.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by David Matlack
Make fast_page_fault interoperate with the TDP MMU by leveraging walk_shadow_page_lockless_{begin,end} to acquire the RCU read lock and introducing a new helper function kvm_tdp_mmu_fast_pf_get_last_sptep to grab the lowest level sptep.

Suggested-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by David Matlack
Acquire the RCU read lock in walk_shadow_page_lockless_begin and release it in walk_shadow_page_lockless_end when the TDP MMU is enabled. This should not introduce any functional changes but is used in the following commit to make fast_page_fault interoperate with the TDP MMU.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-4-dmatlack@google.com>
[Use if...else instead of if(){return;}]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
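A simplified sketch of the shape of the change (helper names assumed; the legacy shadow-MMU path is reduced to its core and omits the vCPU-mode handshake):

static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
{
	if (is_tdp_mmu(vcpu->arch.mmu))
		rcu_read_lock();	/* TDP MMU SPTEs are RCU-protected */
	else
		local_irq_disable();	/* legacy lockless-walk protection */
}

static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
{
	if (is_tdp_mmu(vcpu->arch.mmu))
		rcu_read_unlock();
	else
		local_irq_enable();
}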
-
- 18 June 2021, 7 commits
-
-
Submitted by David Matlack
The root_hpa checks below the top-level check in kvm_mmu_page_fault are theoretically redundant since there is no longer a way for the root_hpa to be reset during a page fault. The details of why are described in commit ddce6208 ("KVM: x86/mmu: Move root_hpa validity checks to top of page fault handler").

__direct_map, kvm_tdp_mmu_map, and get_mmio_spte are all only reachable through kvm_mmu_page_fault, therefore their root_hpa checks are redundant.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210617231948.2591431-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by David Matlack
The check for is_tdp_mmu_root in kvm_tdp_mmu_map is redundant because kvm_tdp_mmu_map's only caller (direct_page_fault) already checks is_tdp_mmu_root.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210617231948.2591431-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Kai Huang
The TDP MMU iterator's level is identical to the page table's actual level. For instance, for the last level page table (whose entries point to 4K pages), iter->level is 1 (PG_LEVEL_4K), and in the case of 5-level paging, iter->level is mmu->shadow_root_level, which is 5. However, struct kvm_mmu_page's level is currently not set correctly when it is allocated in kvm_tdp_mmu_map(). When the iterator hits a non-present SPTE and needs to allocate a new child page table, it currently uses iter->level, which is the level of the page table where the non-present SPTE belongs. This results in struct kvm_mmu_page's level always being its parent's level (except the root table's level, which is initialized explicitly using mmu->shadow_root_level).

This is somewhat wrong, and not consistent with the existing non-TDP MMU code. Fortunately sp->role.level is only used in handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp(), and they are already aware of this and behave correctly. However, to make it consistent with the legacy MMU code (and fix the issue that both the root page table and its child page table have shadow_root_level), use iter->level - 1 in kvm_tdp_mmu_map(), and change handle_removed_tdp_mmu_page() and kvm_tdp_mmu_zap_sp() accordingly.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <bcb6569b6e96cb78aaa7b50640e6e6b53291a74e.1623717884.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
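A hypothetical sketch of the fix (allocation helper name assumed from context, not quoted from the patch):

	/* The child page table sits one level below the SPTE that points
	 * to it, so its struct kvm_mmu_page gets iter.level - 1 rather
	 * than the parent's iter.level. */
	sp = alloc_tdp_mmu_page(vcpu, iter.gfn, iter.level - 1);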
-
Submitted by Kai Huang
Currently pf_fixed is not increased when prefault is true. This is not correct, since prefault here really means "async page fault completed": in that case, the original page fault from the guest was morphed into an async page fault and pf_fixed was not increased. So when prefault indicates that an async page fault has completed, pf_fixed should be increased.

Additionally, pf_fixed is currently also increased even when the page fault is spurious, while the legacy MMU increases pf_fixed when the page fault returns RET_PF_EMULATE or RET_PF_FIXED.

To fix the above two issues, increase pf_fixed when the return value is not RET_PF_SPURIOUS (RET_PF_RETRY has already been ruled out by reaching here).

More information: https://lore.kernel.org/kvm/cover.1620200410.git.kai.huang@intel.com/T/#mbb5f8083e58a2cd262231512b9211cbe70fc3bd5

Fixes: bb18842e ("kvm: x86/mmu: Add TDP MMU PF handler")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <2ea8b7f5d4f03c99b32bc56fc982e1e4e3d3fc6b.1623717884.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
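The described condition amounts to something like the following (sketch):

	/* Count the fault as fixed unless it was spurious; RET_PF_RETRY
	 * cannot reach this point. */
	if (ret != RET_PF_SPURIOUS)
		vcpu->stat.pf_fixed++;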
-
Submitted by Kai Huang
Currently tdp_mmu_map_handle_target_level() returns 0, which is RET_PF_RETRY, when the page fault is actually fixed. This makes kvm_tdp_mmu_map() also return RET_PF_RETRY in this case, instead of RET_PF_FIXED. Fix it by initializing ret to RET_PF_FIXED.

Note that kvm_mmu_page_fault() resumes the guest on both RET_PF_RETRY and RET_PF_FIXED, which means that in practice returning either won't make a difference, so this fix alone won't be necessary for the stable tree.

Fixes: bb18842e ("kvm: x86/mmu: Add TDP MMU PF handler")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <f9e8956223a586cd28c090879a8ff40f5eb6d609.1623717884.git.kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
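The fix boils down to the initializer (sketch):

	/* Report RET_PF_FIXED on success; previously the implicit 0 was
	 * silently interpreted as RET_PF_RETRY. */
	int ret = RET_PF_FIXED;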
-
Submitted by Ben Gardon
If the TDP MMU is in use, wait to allocate the rmaps until the shadow MMU is actually used (i.e. a nested VM is launched). This saves memory equal to 0.2% of guest memory in cases where the TDP MMU is used and there are no nested guests involved.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210518173414.450044-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Keqian Zhu
Prepare for write-protecting large pages lazily during dirty log tracking, for which we will only need to write-protect gfns at large page granularity.

No functional or performance change expected.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Message-Id: <20210429034115.35560-2-zhukeqian1@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 27 May 2021, 1 commit
-
-
Submitted by David Matlack
This comment was left over from a previous version of the patch that introduced wrprot_gfn_range, when skip_4k was passed in instead of min_level.

Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210526163227.3113557-1-dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 13 May 2021, 1 commit
-
-
Submitted by Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 May 2021, 3 commits
-
-
Submitted by Kai Huang
The function name in the kdoc comment of __handle_changed_spte() should be the function itself, rather than handle_changed_spte(). Fix the typo.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210503042446.154695-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Md Shahadat Hossain Shahin

Large pages not being created properly may result in increased memory access time. The 'lpages' kvm stat used to keep track of the current number of large pages in the system, but with the TDP MMU enabled the stat does not show the correct number. This patch extends the lpages counter to cover the TDP case.

Signed-off-by: Md Shahadat Hossain Shahin <shahinmd@amazon.de>
Cc: Bartosz Szczepanek <bsz@amazon.de>
Message-Id: <1619783551459.35424@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Kai Huang
In kvm_tdp_mmu_map(), while iterating over TDP MMU page table entries, it is possible that an SPTE has already been frozen by another thread but the freeze is not done yet, for instance when another thread is still in the middle of zapping a large page. In this case, the !is_shadow_present_pte() check for the old SPTE in tdp_mmu_for_each_pte() may return true, and allocating a new page table is unnecessary since tdp_mmu_set_spte_atomic() will later return false and the page table would need to be freed. Add an is_removed_spte() check before allocating a new page table to avoid this.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210429041226.50279-1-kai.huang@intel.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
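A sketch of the added check (placement inside the iteration loop assumed from the description):

	/*
	 * If the SPTE has been frozen (replaced with REMOVED_SPTE) by
	 * another thread, give up and retry rather than allocating a
	 * child page table that would only be freed again.
	 */
	if (is_removed_spte(iter.old_spte))
		break;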
-
- 20 April 2021, 2 commits
-
-
Submitted by Ben Gardon
To avoid saddling a vCPU thread with the work of tearing down an entire paging structure, take a reference on each root before it becomes obsolete, so that the thread initiating the fast invalidation can tear down the paging structure and (most likely) release the last reference. As a bonus, this teardown can happen under the MMU lock in read mode so as not to block the progress of vCPU threads.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-14-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Ben Gardon
Provide a real mechanism for fast invalidation by marking roots as invalid so that their reference count will quickly fall to zero and they will be torn down. One negative side effect of this approach is that a vCPU thread will likely drop the last reference to a root and be saddled with the work of tearing down an entire paging structure. This issue will be resolved in a later commit.

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210401233736.638171-13-bgardon@google.com>
[Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
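Conceptually the invalidation amounts to something like this (sketch; list and field names assumed, not quoted from the patch):

	/* Mark every root invalid so vCPUs stop reusing it and its
	 * reference count quickly drops to zero. */
	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link)
		root->role.invalid = true;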
-