提交 · 2822da446640d82b7bf65800314ef2a825e8df13 · openeuler / Kernel

21 8月, 2021 3 次提交

KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address · 2822da44

由 Maxim Levitsky 提交于 8月 10, 2021

kvm_flush_remote_tlbs_with_address expects (start gfn, number of pages),
and not (start gfn, end gfn)
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2822da44

Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock" · 5a324c24

由 Sean Christopherson 提交于 8月 10, 2021

This together with the next patch will fix a future race between
kvm_zap_gfn_range and the page fault handler, which will happen
when AVIC memslot is going to be only partially disabled.

The performance impact is minimal since kvm_zap_gfn_range is only
called by users, update_mtrr() and kvm_post_set_cr0().

Both only use it if the guest has non-coherent DMA, in order to
honor the guest's UC memtype.

MTRR and CD setup only happens at boot, and generally in an area
where the page tables should be small (for CD) or should not
include the affected GFNs at all (for MTRRs).

This is based on a patch suggested by Sean Christopherson:
https://lkml.org/lkml/2021/7/22/1025Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210810205251.424103-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5a324c24

KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file · 3bcd0662

由 Peter Xu 提交于 7月 30, 2021

Use this file to dump rmap statistic information. The statistic is done by
calculating the rmap count and the result is log-2-based.

An example output of this looks like (idle 6GB guest, right after boot linux):

Rmap_Count: 0 1 2-3 4-7 8-15 16-31 32-63 64-127 128-255 256-511 512-1023
Level=4K: 3086676 53045 12330 1272 502 121 76 2 0 0 0
Level=2M: 5947 231 0 0 0 0 0 0 0 0 0
Level=1G: 32 0 0 0 0 0 0 0 0 0 0
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20210730220455.26054-5-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3bcd0662

13 8月, 2021 1 次提交

KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock · ce25681d

由 Sean Christopherson 提交于 8月 12, 2021

Add yet another spinlock for the TDP MMU and take it when marking indirect
shadow pages unsync. When using the TDP MMU and L1 is running L2(s) with
nested TDP, KVM may encounter shadow pages for the TDP entries managed by
L1 (controlling L2) when handling a TDP MMU page fault. The unsync logic
is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
which runs with mmu_lock held for read, not write.

Lack of a critical section manifests most visibly as an underflow of
unsync_children in clear_unsync_child_bit() due to unsync_children being
corrupted when multiple CPUs write it without a critical section and
without atomic operations. But underflow is the best case scenario. The
worst case scenario is that unsync_children prematurely hits '0' and
leads to guest memory corruption due to KVM neglecting to properly sync
shadow pages.

Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
would functionally be ok. Usurping the lock could degrade performance when
building upper level page tables on different vCPUs, especially since the
unsync flow could hold the lock for a comparatively long time depending on
the number of indirect shadow pages and the depth of the paging tree.

For simplicity, take the lock for all MMUs, even though KVM could fairly
easily know that mmu_lock is held for write. If mmu_lock is held for
write, there cannot be contention for the inner spinlock, and marking
shadow pages unsync across multiple vCPUs will be slow enough that
bouncing the kvm_arch cacheline should be in the noise.

Note, even though L2 could theoretically be given access to its own EPT
entries, a nested MMU must hold mmu_lock for write and thus cannot race
against a TDP MMU page fault. I.e. the additional spinlock only _needs_ to
be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
that is running with the TDP MMU enabled. Holding mmu_lock for read also
prevents the indirect shadow page from being freed. But as above, keep
it simple and always take the lock.

Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
effectively disable unsync behavior for nested TDP. Write protecting leaf
shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
VMMs typically don't modify TDP entries, but the same may not hold true for
non-standard use cases and/or VMMs that are migrating physical pages (from
L1's perspective).

Alternative #2, the unsync logic could be made thread safe. In theory,
simply converting all relevant kvm_mmu_page fields to atomics and using
atomic bitops for the bitmap would suffice. However, (a) an in-depth audit
would be required, (b) the code churn would be substantial, and (c) legacy
shadow paging would incur additional atomic operations in performance
sensitive paths for no benefit (to legacy shadow paging).

Fixes: a2855afc ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210812181815.3378104-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ce25681d

06 8月, 2021 2 次提交

KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap · 93e083d4

由 David Matlack 提交于 8月 04, 2021

gfn_to_rmap was removed in the previous patch so there is no need to
retain the double underscore on __gfn_to_rmap.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-7-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

93e083d4

KVM: x86/mmu: Leverage vcpu->last_used_slot for rmap_add and rmap_recycle · 601f8af0

由 David Matlack 提交于 8月 04, 2021

rmap_add() and rmap_recycle() both run in the context of the vCPU and
thus we can use kvm_vcpu_gfn_to_memslot() to look up the memslot. This
enables rmap_add() and rmap_recycle() to take advantage of
vcpu->last_used_slot and avoid expensive memslot searching.

This change improves the performance of "Populate memory time" in
dirty_log_perf_test with tdp_mmu=N. In addition to improving the
performance, "Populate memory time" no longer scales with the number
of memslots in the VM.

Command                         | Before           | After
------------------------------- | ---------------- | -------------
./dirty_log_perf_test -v64 -x1  | 15.18001570s     | 14.99469366s
./dirty_log_perf_test -v64 -x64 | 18.71336392s     | 14.98675076s
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210804222844.1419481-6-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

601f8af0

05 8月, 2021 1 次提交

KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds · d5aaad6f

由 Sean Christopherson 提交于 8月 04, 2021

Take a signed 'long' instead of an 'unsigned long' for the number of
pages to add/subtract to the total number of pages used by the MMU. This
fixes a zero-extension bug on 32-bit kernels that effectively corrupts
the per-cpu counter used by the shrinker.

Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
an unsigned 32-bit value on 32-bit kernels. As a result, the value used
to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to
4294967295 and effectively corrupts the counter.

This was found by a staggering amount of sheer dumb luck when running
kvm-unit-tests on a 32-bit KVM build. The shrinker just happened to kick
in while running tests and do_shrink_slab() logged an error about trying
to free a negative number of objects. The truly lucky part is that the
kernel just happened to be a slightly stale build, as the shrinker no
longer yells about negative objects as of commit 18bb473e ("mm:
vmscan: shrink deferred objects proportional to priority").

vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460

Fixes: bc8a3d89 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210804214609.1096003-1-seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d5aaad6f

04 8月, 2021 3 次提交

KVM: X86: Optimize zapping rmap · a75b5404

由 Peter Xu 提交于 7月 30, 2021

Using rmap_get_first() and rmap_remove() for zapping a huge rmap list could be
slow.  The easy way is to travers the rmap list, collecting the a/d bits and
free the slots along the way.

Provide a pte_list_destroy() and do exactly that.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20210730220605.26377-1-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a75b5404

KVM: X86: Optimize pte_list_desc with per-array counter · 13236e25

由 Peter Xu 提交于 7月 30, 2021

Add a counter field into pte_list_desc, so as to simplify the add/remove/loop
logic. E.g., we don't need to loop over the array any more for most reasons.

This will make more sense after we've switched the array size to be larger
otherwise the counter will be a waste.

Initially I wanted to store a tail pointer at the head of the array list so we
don't need to traverse the list at least for pushing new ones (if without the
counter we traverse both the list and the array). However that'll need
slightly more change without a huge lot benefit, e.g., after we grow entry
numbers per array the list traversing is not so expensive.

So let's be simple but still try to get as much benefit as we can with just
these extra few lines of changes (not to mention the code looks easier too
without looping over arrays).

I used the same a test case to fork 500 child and recycle them ("./rmap_fork
500" [1]), this patch further speeds up the total fork time of about 4%, which
is a total of 33% of vanilla kernel:

Vanilla: 473.90 (+-5.93%)
3->15 slots: 366.10 (+-4.94%)
Add counter: 351.00 (+-3.70%)

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20210730220602.26327-1-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13236e25

KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger · dc1cff96

由 Peter Xu 提交于 7月 30, 2021

Currently rmap array element only contains 3 entries. However for EPT=N there
could have a lot of guest pages that got tens of even hundreds of rmap entry.

A normal distribution of a 6G guest (even if idle) shows this with rmap count
statistics:

Rmap_Count: 0 1 2-3 4-7 8-15 16-31 32-63 64-127 128-255 256-511 512-1023
Level=4K: 3089171 49005 14016 1363 235 212 15 7 0 0 0
Level=2M: 5951 227 0 0 0 0 0 0 0 0 0
Level=1G: 32 0 0 0 0 0 0 0 0 0 0

If we do some more fork some pages will grow even larger rmap counts.

This patch makes PTE_LIST_EXT bigger so it'll be more efficient for the general
use case of EPT=N as we do list reference less and the loops over PTE_LIST_EXT
will be slightly more efficient; but still not too large so less waste when
array not full.

It should not affecting EPT=Y since EPT normally only has zero or one rmap
entry for each page, so no array is even allocated.

With a test case to fork 500 child and recycle them ("./rmap_fork 500" [1]),
this patch speeds up fork time of about 29%.

Before: 473.90 (+-5.93%)
After: 366.10 (+-4.94%)

[1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20210730220455.26054-6-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dc1cff96

03 8月, 2021 1 次提交

KVM: const-ify all relevant uses of struct kvm_memory_slot · 269e9552

由 Hamza Mahfooz 提交于 7月 12, 2021

As alluded to in commit f36f3f28 ("KVM: add "new" argument to
kvm_arch_commit_memory_region"), a bunch of other places where struct
kvm_memory_slot is used, needs to be refactored to preserve the
"const"ness of struct kvm_memory_slot across-the-board.
Signed-off-by: NHamza Mahfooz <someguy@effective-light.com>
Message-Id: <20210713023338.57108-1-someguy@effective-light.com>
[Do not touch body of slot_rmap_walk_init. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

269e9552

02 8月, 2021 7 次提交

KVM: x86/mmu: fast_page_fault support for the TDP MMU · 6e8eb206

由 David Matlack 提交于 7月 13, 2021

Make fast_page_fault interoperate with the TDP MMU by leveraging
walk_shadow_page_lockless_{begin,end} to acquire the RCU read lock and
introducing a new helper function kvm_tdp_mmu_fast_pf_get_last_sptep to
grab the lowest level sptep.
Suggested-by: NBen Gardon <bgardon@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-5-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6e8eb206

KVM: x86/mmu: Make walk_shadow_page_lockless_{begin,end} interoperate with the TDP MMU · c5c8c7c5

由 David Matlack 提交于 7月 13, 2021

Acquire the RCU read lock in walk_shadow_page_lockless_begin and release
it in walk_shadow_page_lockless_end when the TDP MMU is enabled.  This
should not introduce any functional changes but is used in the following
commit to make fast_page_fault interoperate with the TDP MMU.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-4-dmatlack@google.com>
[Use if...else instead of if(){return;}]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5c8c7c5

KVM: x86/mmu: Rename cr2_or_gpa to gpa in fast_page_fault · 76cd325e

由 David Matlack 提交于 7月 13, 2021

fast_page_fault is only called from direct_page_fault where we know the
address is a gpa.

Fixes: 736c291c ("KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVM")
Reviewed-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-2-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

76cd325e

KVM: X86: Add per-vm stat for max rmap list size · ec1cf69c

由 Peter Xu 提交于 6月 25, 2021

Add a new statistic max_mmu_rmap_size, which stores the maximum size of rmap
for the vm.
Signed-off-by: NPeter Xu <peterx@redhat.com>
Message-Id: <20210625153214.43106-2-peterx@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ec1cf69c

KVM: x86/mmu: Return old SPTE from mmu_spte_clear_track_bits() · 7fa2a347

由 Sean Christopherson 提交于 7月 02, 2021

Return the old SPTE when clearing a SPTE and push the "old SPTE present"
check to the caller.  Private shadow page support will use the old SPTE
in rmap_remove() to determine whether or not there is a linked private
shadow page.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <b16bac1fd1357aaf39e425aab2177d3f89ee8318.1625186503.git.isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7fa2a347

KVM: x86/mmu: Refactor shadow walk in __direct_map() to reduce indentation · 03fffc54

由 Sean Christopherson 提交于 7月 02, 2021

Employ a 'continue' to reduce the indentation for linking a new shadow
page during __direct_map() in preparation for linking private pages.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <702419686d5700373123f6ea84e7a946c2cad8b4.1625186503.git.isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

03fffc54

KVM: x86/mmu: Mark VM as bugged if page fault returns RET_PF_INVALID · 19025e7b

由 Sean Christopherson 提交于 7月 02, 2021

Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <298980aa5fc5707184ac082287d13a800cd9c25f.1625186503.git.isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

19025e7b

15 7月, 2021 1 次提交

KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0

由 Sean Christopherson 提交于 6月 23, 2021

Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".

For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.

For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.

Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-5-seanjc@google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fc9bf2e0

25 6月, 2021 21 次提交

KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on · 27de9250

由 Sean Christopherson 提交于 6月 22, 2021

Let the guest use 1g hugepages if TDP is enabled and the host supports
GBPAGES, KVM can't actively prevent the guest from using 1g pages in this
case since they can't be disabled in the hardware page walker. While
injecting a page fault if a bogus 1g page is encountered during a
software page walk is perfectly reasonable since KVM is simply honoring
userspace's vCPU model, doing so arguably doesn't provide any meaningful
value, and at worst will be horribly confusing as the guest will see
inconsistent behavior and seemingly spurious page faults.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-55-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27de9250

KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT · f82fdaf5

由 Sean Christopherson 提交于 6月 22, 2021

Drop the extra reset of shadow_zero_bits in the nested NPT flow now
that shadow_mmu_init_context computes the correct level for nested NPT.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-52-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f82fdaf5

KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db

由 Sean Christopherson 提交于 6月 22, 2021

Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
best confusing.  Per the comment:

  Can have large pages at levels 2..last_nonleaf_level-1.

the intent of the variable would appear to be to track what levels can
_legally_ have large pages, but that intent doesn't align with reality.
The computed value will be wrong for 5-level paging, or if 1gb pages are
not supported.

The flawed code is not a problem in practice, because except for 32-bit
PSE paging, bit 7 is reserved if large pages aren't supported at the
level.  Take advantage of this invariant and simply omit the level magic
math for 64-bit page tables (including PAE).

For 32-bit paging (non-PAE), the adjustments are needed purely because
bit 7 is ignored if PSE=0.  Retain that logic as is, but make
is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.

Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
not PTEs and will blow up all the other gpte helpers.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-51-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cd138db

KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU · 961f8445

由 Sean Christopherson 提交于 6月 22, 2021

Extract the reserved SPTE check and print helpers in get_mmio_spte() to
new helpers so that KVM can also WARN on reserved badness when making a
SPTE.

Tag the checking helper with __always_inline to improve the probability
of the compiler generating optimal code for the checking loop, e.g. gcc
appears to avoid using %rbp when the helper is tagged with a vanilla
"inline".

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-48-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

961f8445

KVM: x86/mmu: Use MMU's role to determine PTTYPE · 36f26787

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role instead of vCPU state or role_regs to determine the
PTTYPE, i.e. which helpers to wire up.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-47-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

36f26787

KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers · fe660f72

由 Sean Christopherson 提交于 6月 22, 2021

Skip paging32E_init_context() and paging64_init_context_common() and go
directly to paging64_init_context() (was the common version) now that
the relevant flows don't need to distinguish between 64-bit PAE and
32-bit PAE for other reasons.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-46-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe660f72

KVM: x86/mmu: Add a helper to calculate root from role_regs · f4bd6f73

由 Sean Christopherson 提交于 6月 22, 2021

Add a helper to calculate the level for non-EPT page tables from the
MMU's role_regs.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-45-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f4bd6f73

KVM: x86/mmu: Add helper to update paging metadata · 533f9a4b

由 Sean Christopherson 提交于 6月 22, 2021

Consolidate MMU guest metadata updates into a common helper for TDP,
shadow, and nested MMUs.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-44-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

533f9a4b

KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 · af0eb17e

由 Sean Christopherson 提交于 6月 22, 2021

Don't bother updating the bitmasks and last-leaf information if paging is
disabled as the metadata will never be used.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-43-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

af0eb17e

KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls · fa4b5588

由 Sean Christopherson 提交于 6月 22, 2021

Move calls to reset_rsvds_bits_mask() out of the various mode statements
and under a more generic CR0.PG=1 check.  This will allow for additional
code consolidation in the future.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-42-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fa4b5588

KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper · 87e99d7d

由 Sean Christopherson 提交于 6月 22, 2021

Get LA57 from the role_regs, which are initialized from the vCPU even
though TDP is enabled, instead of pulling the value directly from the
vCPU when computing the guest's root_level for TDP MMUs.  Note, the check
is inside an is_long_mode() statement, so that requirement is not lost.

Use role_regs even though the MMU's role is available and arguably
"better".  A future commit will consolidate the guest root level logic,
and it needs access to EFER.LMA, which is not tracked in the role (it
can't be toggled on VM-Exit, unlike LA57).

Drop is_la57_mode() as there are no remaining users, and to discourage
pulling MMU state from the vCPU (in the future).

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-41-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

87e99d7d

KVM: x86/mmu: Get nested MMU's root level from the MMU's role · 5472fcd4

由 Sean Christopherson 提交于 6月 22, 2021

Initialize the MMU's (guest) root_level using its mmu_role instead of
redoing the calculations.  The role_regs used to calculate the mmu_role
are initialized from the vCPU, i.e. this should be a complete nop.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-40-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5472fcd4

KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers · a4c93252

由 Sean Christopherson 提交于 6月 22, 2021

Drop kvm_mmu.nx as there no consumers left.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-39-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a4c93252

KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration · 90599c28

由 Sean Christopherson 提交于 6月 22, 2021

Get the MMU's effective EFER.NX from its role instead of using the
one-off, dedicated flag.  This will allow dropping said flag in a
future commit.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-38-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

90599c28

KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata · 84a16226

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role and role_regs to calculate the MMU's guest root level
and NX bit.  For some flows, the vCPU state may not be correct (or
relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will
configure the guest_mmu with possibly-stale vCPU state.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-37-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

84a16226

KVM: x86/mmu: Use MMU's roles to compute last non-leaf level · b67a93a8

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role to get CR4.PSE when determining the last level at
which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a
huge page.

Note, the existing logic is arguably wrong when considering 5-level
paging and the case where 1gb pages aren't supported. In practice, the
logic is confusing but not broken, because except for 32-bit non-PAE
paging, bit 7 (_PAGE_PSE) bit is reserved when a huge page isn't supported at
that level. I.e. setting bit 7 will terminate the guest walk one way or
another. Furthermore, last_nonleaf_level is only consulted after KVM has
verified there are no reserved bits set.

All that confusion will be addressed in a future patch by dropping
last_nonleaf_level entirely. For now, massage the code to continue the
march toward using mmu_role for (almost) all MMU computations.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-35-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b67a93a8

KVM: x86/mmu: Use MMU's role to compute PKRU bitmask · 2e4c0661

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role to calculate the Protection Keys (Restrict Userspace)
bitmask instead of pulling bits from current vCPU state. For some flows,
the vCPU state may not be correct (or relevant), e.g. EPT doesn't
interact with PKRU. Case in point, the "ept" param simply disappears.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-34-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2e4c0661

KVM: x86/mmu: Use MMU's role to compute permission bitmask · c596f147

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role to generate the permission bitmasks for the MMU.
For some flows, the vCPU state may not be correct (or relevant), e.g.
the nested NPT MMU can be initialized with incoherent vCPU state.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-33-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c596f147

KVM: x86/mmu: Drop vCPU param from reserved bits calculator · b705a277

由 Sean Christopherson 提交于 6月 22, 2021

Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused,
and ideally will remain unused in the future.  Any information that's
needed by the low level helper should be explicitly provided as it's used
for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be
meaningless or simply wrong.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-32-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b705a277

KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits · 4e9c0d80

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role to get CR4.PSE when calculating reserved bits for the
guest's PTEs.  Practically speaking, this is a glorified nop as the role
always come from vCPU state for the relevant flows, but converting to
the roles will provide consistency once everything else is converted, and
will Just Work if the "always comes from vCPU" behavior were ever to
change (unlikely).
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-31-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4e9c0d80

KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits · 8c985b2d

由 Sean Christopherson 提交于 6月 22, 2021

Unconditionally pass pse=false when calculating reserved bits for shadow
PTEs.  CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does
not use for shadow paging (including nested NPT).
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-30-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8c985b2d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功