提交 · 90bf8d98b422c09625762d55644bad64ab74c509 · openeuler / Kernel

02 12月, 2021 1 次提交

KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8

由 Sean Christopherson 提交于 11月 20, 2021

Bail from the page fault handler if the root shadow page was obsoleted by
a memslot update.  Do the check _after_ acuiring mmu_lock, as the TDP MMU
doesn't rely on the memslot/MMU generation, and instead relies on the
root being explicit marked invalid by kvm_mmu_zap_all_fast(), which takes
mmu_lock for write.

For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
moved past the gfn associated with the SP.

For other MMUs, the resulting behavior is far more convoluted, though
unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
root isn't directly problematic, as the obsolete root will be unloaded
and dropped before the vCPU re-enters the guest.  But because the legacy
MMU tracks shadow pages by their role, any SP created by the fault can
can be reused in the new post-reload root.  Again, that _shouldn't_ be
problematic as any leaf child SPTEs will be created for the current/valid
memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
the old generation as they will be flagged as obsolete.  But, given that
continuing with the fault is pointess (the root will be unloaded), apply
the check to all MMUs.

Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211120045046.3940942-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a955cad8

22 10月, 2021 1 次提交

KVM: x86/mmu: clean up prefetch/prefault/speculative naming · 2839180c

由 Paolo Bonzini 提交于 9月 29, 2021

"prefetch", "prefault" and "speculative" are used throughout KVM to mean
the same thing.  Use a single name, standardizing on "prefetch" which
is already used by various functions such as direct_pte_prefetch,
FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.
Suggested-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2839180c

01 10月, 2021 18 次提交

KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages · 53597858

由 David Matlack 提交于 8月 17, 2021

mmu_try_to_unsync_pages checks if page tracking is active for the given
gfn, which requires knowing the memslot. We can pass down the memslot
via make_spte to avoid this lookup.

The memslot is also handy for make_spte's marking of the gfn as dirty:
we can test whether dirty page tracking is enabled, and if so ensure that
pages are mapped as writable with 4K granularity.  Apart from the warning,
no functional change is intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53597858

KVM: x86/mmu: Avoid memslot lookup in rmap_add · 8a9f566a

由 David Matlack 提交于 8月 13, 2021

Avoid the memslot lookup in rmap_add, by passing it down from the fault
handling code to mmu_set_spte and then to rmap_add.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-6-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a9f566a

KVM: MMU: pass struct kvm_page_fault to mmu_set_spte · a12f4381

由 Paolo Bonzini 提交于 8月 17, 2021

mmu_set_spte is called for either PTE prefetching or page faults.  The
three boolean arguments write_fault, speculative and host_writable are
always respectively false/true/true for prefetching and coming from
a struct kvm_page_fault for page faults.

Let mmu_set_spte distinguish these two situation by accepting a
possibly NULL struct kvm_page_fault argument.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a12f4381

KVM: MMU: pass kvm_mmu_page struct to make_spte · 7158bee4

由 Paolo Bonzini 提交于 8月 17, 2021

The level and A/D bit support of the new SPTE can be found in the role,
which is stored in the kvm_mmu_page struct.  This merges two arguments
into one.

For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
not use it if the SPTE is already present) so we fetch it just before
calling make_spte.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7158bee4

KVM: MMU: remove unnecessary argument to mmu_set_spte · eb5cd7ff

由 Paolo Bonzini 提交于 8月 17, 2021

The level of the new SPTE can be found in the kvm_mmu_page struct; there
is no need to pass it down.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eb5cd7ff

KVM: MMU: inline set_spte in FNAME(sync_page) · 4758d47e

由 Paolo Bonzini 提交于 8月 17, 2021

Since the two callers of set_spte do different things with the results,
inlining it actually makes the code simpler to reason about. For example,
FNAME(sync_page) already has a struct kvm_mmu_page *, but set_spte had to
fish it back out of sptep's private page data.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4758d47e

KVM: x86/mmu: Pass the memslot around via struct kvm_page_fault · e710c5f6

由 David Matlack 提交于 9月 24, 2021

The memslot for the faulting gfn is used throughout the page fault
handling code, so capture it in kvm_page_fault as soon as we know the
gfn and use it in the page fault handling code that has direct access
to the kvm_page_fault struct.  Replace various tests using is_noslot_pfn
with more direct tests on fault->slot being NULL.

This, in combination with the subsequent patch, improves "Populate
memory time" in dirty_log_perf_test by 5% when using the legacy MMU.
There is no discerable improvement to the performance of the TDP MMU.

No functional change intended.
Suggested-by: NBen Gardon <bgardon@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-4-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e710c5f6

KVM: x86/mmu: Verify shadow walk doesn't terminate early in page faults · b1a429fb

由 Sean Christopherson 提交于 9月 06, 2021

WARN and bail if the shadow walk for faulting in a SPTE terminates early,
i.e. doesn't reach the expected level because the walk encountered a
terminal SPTE.  The shadow walks for page faults are subtle in that they
install non-leaf SPTEs (zapping leaf SPTEs if necessary!) in the loop
body, and consume the newly created non-leaf SPTE in the loop control,
e.g. __shadow_walk_next().  In other words, the walks guarantee that the
walk will stop if and only if the target level is reached by installing
non-leaf SPTEs to guarantee the walk remains valid.

Opportunistically use fault->goal-level instead of it.level in
FNAME(fetch) to further clarify that KVM always installs the leaf SPTE at
the target level.
Reviewed-by: NLai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210906122547.263316-1-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b1a429fb

KVM: MMU: change tracepoints arguments to kvm_page_fault · f0066d94

由 Paolo Bonzini 提交于 8月 06, 2021

Pass struct kvm_page_fault to tracepoints instead of extracting the
arguments from the struct.  This also lets the kvm_mmu_spte_requested
tracepoint pick the gfn directly from fault->gfn, instead of using
the address.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f0066d94

KVM: MMU: change disallowed_hugepage_adjust() arguments to kvm_page_fault · 536f0e6a

由 Paolo Bonzini 提交于 8月 06, 2021

Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of
extracting the arguments from the struct.  Tweak a bit the conditions
to avoid long lines.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

536f0e6a

KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault · 73a3c659

由 Paolo Bonzini 提交于 8月 07, 2021

Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of
extracting the arguments from the struct; the results are also stored
in the struct, so the callers are adjusted consequently.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73a3c659

KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault · 9c03b182

由 Paolo Bonzini 提交于 8月 06, 2021

Pass struct kvm_page_fault to FNAME(fetch)() instead of
extracting the arguments from the struct.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c03b182

KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault · 3a13f4fe

由 Paolo Bonzini 提交于 8月 06, 2021

Pass struct kvm_page_fault to handle_abnormal_pfn() instead of
extracting the arguments from the struct.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3a13f4fe

KVM: MMU: change kvm_faultin_pfn() arguments to kvm_page_fault · 3647cd04

由 Paolo Bonzini 提交于 8月 07, 2021

Add fields to struct kvm_page_fault corresponding to outputs of
kvm_faultin_pfn().  For now they have to be extracted again from struct
kvm_page_fault in the subsequent steps, but this is temporary until
other functions in the chain are switched over as well.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3647cd04

KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page_fault · b8a5d551

由 Paolo Bonzini 提交于 8月 06, 2021

Add fields to struct kvm_page_fault corresponding to the arguments
of page_fault_handle_page_track(). The fields are initialized in the
callers, and page_fault_handle_page_track() receives a struct
kvm_page_fault instead of having to extract the arguments out of it.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b8a5d551

KVM: MMU: change direct_page_fault() arguments to kvm_page_fault · 4326e57e

由 Paolo Bonzini 提交于 8月 06, 2021

Add fields to struct kvm_page_fault corresponding to
the arguments of direct_page_fault().  The fields are
initialized in the callers, and direct_page_fault()
receives a struct kvm_page_fault instead of having to
extract the arguments out of it.

Also adjust FNAME(page_fault) to store the max_level in
struct kvm_page_fault, to keep it similar to the direct
map path.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4326e57e

KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault · c501040a

由 Paolo Bonzini 提交于 8月 06, 2021

Pass struct kvm_page_fault to mmu->page_fault() instead of
extracting the arguments from the struct.  FNAME(page_fault) can use
the precomputed bools from the error code.
Suggested-by: NIsaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c501040a

KVM: X86: Move PTE present check from loop body to __shadow_walk_next() · 3e44dce4

由 Lai Jiangshan 提交于 9月 06, 2021

So far, the loop bodies already ensure the PTE is present before calling
__shadow_walk_next():  Some loop bodies simply exit with a !PRESENT
directly and some other loop bodies, i.e. FNAME(fetch) and __direct_map()
do not currently guard their walks with is_shadow_present_pte, but only
because they install present non-leaf SPTEs in the loop itself.

But checking pte present in __shadow_walk_next() (which is called from
shadow_walk_okay()) is more prudent; walking past a !PRESENT SPTE
would lead to attempting to read a the next level SPTE from a garbage
iter->shadow_addr.  It also allows to remove the is_shadow_present_pte()
checks from the loop bodies.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210906122547.263316-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3e44dce4

30 9月, 2021 2 次提交

KVM: X86: Remove FNAME(update_pte) · cc2a8e66

由 Lai Jiangshan 提交于 9月 18, 2021

Its solo caller is changed to use FNAME(prefetch_gpte) directly.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-9-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc2a8e66

KVM: X86: Change kvm_sync_page() to return true when remote flush is needed · c3e5e415

由 Lai Jiangshan 提交于 9月 18, 2021

Currently kvm_sync_page() returns true when there is any present spte.
But the return value is ignored in the callers.

Changing kvm_sync_page() to return true when remote flush is needed and
changing mmu->sync_page() not to directly flush can combine and reduce
remote flush requests.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-7-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c3e5e415

23 9月, 2021 2 次提交

KVM: X86: Synchronize the shadow pagetable before link it · 65855ed8

由 Lai Jiangshan 提交于 9月 18, 2021

If gpte is changed from non-present to present, the guest doesn't need
to flush tlb per SDM.  So the host must synchronze sp before
link it.  Otherwise the guest might use a wrong mapping.

For example: the guest first changes a level-1 pagetable, and then
links its parent to a new place where the original gpte is non-present.
Finally the guest can access the remapped area without flushing
the tlb.  The guest's behavior should be allowed per SDM, but the host
kvm mmu makes it wrong.

Fixes: 4731d4c7 ("KVM: MMU: out of sync shadow core")
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

65855ed8

KVM: X86: Fix missed remote tlb flush in rmap_write_protect() · f8160295

由 Lai Jiangshan 提交于 9月 18, 2021

When kvm->tlbs_dirty > 0, some rmaps might have been deleted
without flushing tlb remotely after kvm_sync_page().  If @gfn
was writable before and it's rmaps was deleted in kvm_sync_page(),
and if the tlb entry is still in a remote running VCPU,  the @gfn
is not safely protected.

To fix the problem, kvm_sync_page() does the remote flush when
needed to avoid the problem.

Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f8160295

21 8月, 2021 2 次提交

KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code · 8f32d5e5

由 Maxim Levitsky 提交于 8月 10, 2021

This will allow it to return RET_PF_EMULATE for APIC mmio
emulation.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210810205251.424103-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8f32d5e5

KVM: x86/mmu: rename try_async_pf to kvm_faultin_pfn · 33a5c000

由 Maxim Levitsky 提交于 8月 10, 2021

try_async_pf is a wrong name for this function, since this code
is used when asynchronous page fault is not enabled as well.

This code is based on a patch from Sean Christopherson:
https://lkml.org/lkml/2021/7/19/2970Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210810205251.424103-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

33a5c000

15 7月, 2021 1 次提交

KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs · fc9bf2e0

由 Sean Christopherson 提交于 6月 23, 2021

Ignore "dynamic" host adjustments to the physical address mask when
generating the masks for guest PTEs, i.e. the guest PA masks. The host
physical address space and guest physical address space are two different
beasts, e.g. even though SEV's C-bit is the same bit location for both
host and guest, disabling SME in the host (which clears shadow_me_mask)
does not affect the guest PTE->GPA "translation".

For non-SEV guests, not dropping bits is the correct behavior. Assuming
KVM and userspace correctly enumerate/configure guest MAXPHYADDR, bits
that are lost as collateral damage from memory encryption are treated as
reserved bits, i.e. KVM will never get to the point where it attempts to
generate a gfn using the affected bits. And if userspace wants to create
a bogus vCPU, then userspace gets to deal with the fallout of hardware
doing odd things with bad GPAs.

For SEV guests, not dropping the C-bit is technically wrong, but it's a
moot point because KVM can't read SEV guest's page tables in any case
since they're always encrypted. Not to mention that the current KVM code
is also broken since sme_me_mask does not have to be non-zero for SEV to
be supported by KVM. The proper fix would be to teach all of KVM to
correctly handle guest private memory, but that's a task for the future.

Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210623230552.4027702-5-seanjc@google.com>
[Use a new header instead of adding header guards to paging_tmpl.h. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fc9bf2e0

25 6月, 2021 7 次提交

KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault · 9a65d0b7

由 Sean Christopherson 提交于 6月 22, 2021

Use the current MMU instead of vCPU state to query CR4.SMEP when handling
a page fault. In the nested NPT case, the current CR4.SMEP reflects L2,
whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
Practically speaking, this is a nop a NPT walks are always user faults,
i.e. this code will never be reached, but fix it up for consistency.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-54-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9a65d0b7

KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault · fdaa2935

由 Sean Christopherson 提交于 6月 22, 2021

Use the current MMU instead of vCPU state to query CR0.WP when handling
a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
whereas the page fault is shadowing L1's NPT.  Practically speaking, this
is a nop a NPT walks are always user faults, but fix it up for
consistency.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-53-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fdaa2935

KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db

由 Sean Christopherson 提交于 6月 22, 2021

Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
best confusing.  Per the comment:

  Can have large pages at levels 2..last_nonleaf_level-1.

the intent of the variable would appear to be to track what levels can
_legally_ have large pages, but that intent doesn't align with reality.
The computed value will be wrong for 5-level paging, or if 1gb pages are
not supported.

The flawed code is not a problem in practice, because except for 32-bit
PSE paging, bit 7 is reserved if large pages aren't supported at the
level.  Take advantage of this invariant and simply omit the level magic
math for 64-bit page tables (including PAE).

For 32-bit paging (non-PAE), the adjustments are needed purely because
bit 7 is ignored if PSE=0.  Retain that logic as is, but make
is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.

Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
not PTEs and will blow up all the other gpte helpers.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-51-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cd138db

KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk · cd628f0f

由 Sean Christopherson 提交于 6月 22, 2021

Use the NX bit from the MMU's role instead of the MMU itself so that the
redundant, dedicated "nx" flag can be dropped.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-36-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cd628f0f

KVM: x86/mmu: Add accessors to query mmu_role bits · 60667724

由 Sean Christopherson 提交于 6月 22, 2021

Add accessors via a builder macro for all mmu_role bits that track a CR0,
CR4, or EFER bit, abstracting whether the bits are in the base or the
extended role.

Future commits will switch to using mmu_role instead of vCPU state to
configure the MMU, i.e. there are about to be a large number of users.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-26-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

60667724

KVM: x86/mmu: WARN and zap SP when sync'ing if MMU role mismatches · 2640b086

由 Sean Christopherson 提交于 6月 22, 2021

When synchronizing a shadow page, WARN and zap the page if its mmu role
isn't compatible with the current MMU context, where "compatible" is an
exact match sans the bits that have no meaning in the overall MMU context
or will be explicitly overwritten during the sync.  Many of the helpers
used by sync_page() are specific to the current context, updating a SMM
vs. non-SMM shadow page would use the wrong memslots, updating L1 vs. L2
PTEs might work but would be extremely bizaree, and so on and so forth.

Drop the guard with respect to 8-byte vs. 4-byte PTEs in
__kvm_sync_page(), it was made useless when kvm_mmu_get_page() stopped
trying to sync shadow pages irrespective of the current MMU context.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-12-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2640b086

KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in nested NPT walk · ef318b9e

由 Sean Christopherson 提交于 6月 22, 2021

Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest.  When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc...  If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.

Fixes: e57d4a35 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ef318b9e

09 6月, 2021 1 次提交

KVM: X86: MMU: Use the correct inherited permissions to get shadow page · b1bd5cba

由 Lai Jiangshan 提交于 6月 03, 2021

When computing the access permissions of a shadow page, use the effective
permissions of the walk up to that point, i.e. the logic AND of its parents'
permissions.  Two guest PxE entries that point at the same table gfn need to
be shadowed with different shadow pages if their parents' permissions are
different.  KVM currently uses the effective permissions of the last
non-leaf entry for all non-leaf entries.  Because all non-leaf SPTEs have
full ("uwx") permissions, and the effective permissions are recorded only
in role.access and merged into the leaves, this can lead to incorrect
reuse of a shadow page and eventually to a missing guest protection page
fault.

For example, here is a shared pagetable:

   pgd[]   pud[]        pmd[]            virtual address pointers
                     /->pmd1(u--)->pte1(uw-)->page1 <- ptr1 (u--)
        /->pud1(uw-)--->pmd2(uw-)->pte2(uw-)->page2 <- ptr2 (uw-)
   pgd-|           (shared pmd[] as above)
        \->pud2(u--)--->pmd1(u--)->pte1(uw-)->page1 <- ptr3 (u--)
                     \->pmd2(uw-)->pte2(uw-)->page2 <- ptr4 (u--)

  pud1 and pud2 point to the same pmd table, so:
  - ptr1 and ptr3 points to the same page.
  - ptr2 and ptr4 points to the same page.

(pud1 and pud2 here are pud entries, while pmd1 and pmd2 here are pmd entries)

- First, the guest reads from ptr1 first and KVM prepares a shadow
  page table with role.access=u--, from ptr1's pud1 and ptr1's pmd1.
  "u--" comes from the effective permissions of pgd, pud1 and
  pmd1, which are stored in pt->access.  "u--" is used also to get
  the pagetable for pud1, instead of "uw-".

- Then the guest writes to ptr2 and KVM reuses pud1 which is present.
  The hypervisor set up a shadow page for ptr2 with pt->access is "uw-"
  even though the pud1 pmd (because of the incorrect argument to
  kvm_mmu_get_page in the previous step) has role.access="u--".

- Then the guest reads from ptr3.  The hypervisor reuses pud1's
  shadow pmd for pud2, because both use "u--" for their permissions.
  Thus, the shadow pmd already includes entries for both pmd1 and pmd2.

- At last, the guest writes to ptr4.  This causes no vmexit or pagefault,
  because pud1's shadow page structures included an "uw-" page even though
  its role.access was "u--".

Any kind of shared pagetable might have the similar problem when in
virtual machine without TDP enabled if the permissions are different
from different ancestors.

In order to fix the problem, we change pt->access to be an array, and
any access in it will not include permissions ANDed from child ptes.

The test code is: https://lore.kernel.org/kvm/20210603050537.19605-1-jiangshanlai@gmail.com/
Remember to test it with TDP disabled.

The problem had existed long before the commit 41074d07 ("KVM: MMU:
Fix inherited permissions for emulated guest pte updates"), and it
is hard to find which is the culprit.  So there is no fixes tag here.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210603052455.21023-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Fixes: cea0f0e7 ("[PATCH] KVM: MMU: Shadow page table caching")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b1bd5cba

15 3月, 2021 2 次提交

KVM: x86/mmu: Make Host-writable and MMU-writable bit locations dynamic · 5fc3424f

由 Sean Christopherson 提交于 2月 25, 2021

Make the location of the HOST_WRITABLE and MMU_WRITABLE configurable for
a given KVM instance.  This will allow EPT to use high available bits,
which in turn will free up bit 11 for a constant MMU_PRESENT bit.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-19-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5fc3424f

KVM: x86: mmu: initialize fault.async_page_fault in walk_addr_generic · 422e2e17

由 Maxim Levitsky 提交于 2月 25, 2021

This field was left uninitialized by a mistake.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210225154135.405125-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

422e2e17

23 2月, 2021 2 次提交

KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848

由 David Stevens 提交于 2月 22, 2021

Track the range being invalidated by mmu_notifier and skip page fault
retries if the fault address is not affected by the in-progress
invalidation. Handle concurrent invalidations by finding the minimal
range which includes all ranges being invalidated. Although the combined
range may include unrelated addresses and cannot be shrunk as individual
invalidation operations complete, it is unlikely the marginal gains of
proper range tracking are worth the additional complexity.

The primary benefit of this change is the reduction in the likelihood of
extreme latency when handing a page fault due to another thread having
been preempted while modifying host virtual addresses.
Signed-off-by: NDavid Stevens <stevensd@chromium.org>
Message-Id: <20210222024522.1751719-3-stevensd@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a42d848

KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault · 5f8a7cf2

由 Sean Christopherson 提交于 2月 22, 2021

Don't retry a page fault due to an mmu_notifier invalidation when
handling a page fault for a GPA that did not resolve to a memslot, i.e.
an MMIO page fault.  Invalidations from the mmu_notifier signal a change
in a host virtual address (HVA) mapping; without a memslot, there is no
HVA and thus no possibility that the invalidation is relevant to the
page fault being handled.

Note, the MMIO vs. memslot generation checks handle the case where a
pending memslot will create a memslot overlapping the faulting GPA.  The
mmu_notifier checks are orthogonal to memslot updates.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210222024522.1751719-2-stevensd@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5f8a7cf2

04 2月, 2021 1 次提交

KVM: x86/mmu: Use an rwlock for the x86 MMU · 531810ca

由 Ben Gardon 提交于 2月 02, 2021

Add a read / write lock to be used in place of the MMU spinlock on x86.
The rwlock will enable the TDP MMU to handle page faults, and other
operations in parallel in future commits.
Reviewed-by: NPeter Feiner <pfeiner@google.com>
Signed-off-by: NBen Gardon <bgardon@google.com>

Message-Id: <20210202185734.1680553-19-bgardon@google.com>
[Introduce virt/kvm/mmu_lock.h - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

531810ca

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功