提交 · a82070b6e71a6642f87ef9e483ddc062c3571678 · openeuler / Kernel

11 2月, 2022 12 次提交

KVM: x86/mmu: Move restore_acc_track_spte() to spte.h · 315d86da

由 David Matlack 提交于 1月 19, 2022

restore_acc_track_spte() is pure SPTE bit manipulation, making it a good
fit for spte.h. And now that the WARN_ON_ONCE() calls have been removed,
there isn't any good reason to not inline it.

This move also prepares for a follow-up commit that will need to call
restore_acc_track_spte() from spte.c

No functional change intended.
Reviewed-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-11-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

315d86da

KVM: x86/mmu: Drop new_spte local variable from restore_acc_track_spte() · 77c23c77

由 David Matlack 提交于 1月 19, 2022

The new_spte local variable is unnecessary. Deleting it can save a line
of code and simplify the remaining lines a bit.

No functional change intended.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-10-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

77c23c77

KVM: x86/mmu: Remove unnecessary warnings from restore_acc_track_spte() · 59940e76

由 David Matlack 提交于 1月 19, 2022

The warnings in restore_acc_track_spte() can be removed because the only
caller checks is_access_track_spte(), and is_access_track_spte() checks
!spte_ad_enabled(). In other words, the warning can never be triggered.

No functional change intended.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-9-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

59940e76

KVM: x86/mmu: Rename __rmap_write_protect() to rmap_write_protect() · 1346bbb6

由 David Matlack 提交于 1月 19, 2022

The function formerly known as rmap_write_protect() has been renamed to
kvm_vcpu_write_protect_gfn(), so we can get rid of the double
underscores in front of __rmap_write_protect().

No functional change intended.
Reviewed-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-3-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1346bbb6

KVM: x86/mmu: Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn() · cf48f9e2

由 David Matlack 提交于 1月 19, 2022

rmap_write_protect() is a poor name because it also write-protects SPTEs
in the TDP MMU, not just SPTEs in the rmap. It is also confusing that
rmap_write_protect() is not a simple wrapper around
__rmap_write_protect(), since that is the common pattern for functions
with double-underscore names.

Rename rmap_write_protect() to kvm_vcpu_write_protect_gfn() to convey
that KVM is write-protecting a specific gfn in the context of a vCPU.

No functional change intended.
Reviewed-by: NBen Gardon <bgardon@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220119230739.2234394-2-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cf48f9e2

KVM: x86/mmu: Consolidate comments about {Host,MMU}-writable · 02844ac1

由 David Matlack 提交于 1月 25, 2022

Consolidate the large comment above DEFAULT_SPTE_HOST_WRITABLE with the
large comment above is_writable_pte() into one comment. This comment
explains the different reasons why an SPTE may be non-writable and KVM
keeps track of that with the {Host,MMU}-writable bits.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220125230723.1701061-1-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

02844ac1

KVM: x86/mmu: Rename DEFAULT_SPTE_MMU_WRITEABLE to DEFAULT_SPTE_MMU_WRITABLE · 1ca87e01

由 David Matlack 提交于 1月 25, 2022

Both "writeable" and "writable" are valid, but we should be consistent
about which we use. DEFAULT_SPTE_MMU_WRITEABLE was the odd one out in
the SPTE code, so rename it to DEFAULT_SPTE_MMU_WRITABLE.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220125230713.1700406-1-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ca87e01

KVM: x86/mmu: Check SPTE writable invariants when setting leaf SPTEs · 115111ef

由 David Matlack 提交于 1月 25, 2022

Check SPTE writable invariants when setting SPTEs rather than in
spte_can_locklessly_be_made_writable(). By the time KVM checks
spte_can_locklessly_be_made_writable(), the SPTE has long been since
corrupted.

Note that these invariants only apply to shadow-present leaf SPTEs (i.e.
not to MMIO SPTEs, non-leaf SPTEs, etc.). Add a comment explaining the
restriction and only instrument the code paths that set shadow-present
leaf SPTEs.

To account for access tracking, also check the SPTE writable invariants
when marking an SPTE as an access track SPTE. This also lets us remove
a redundant WARN from mark_spte_for_access_track().
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220125230518.1697048-3-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

115111ef

KVM: x86: Rename kvm_x86_ops pointers to align w/ preferred vendor names · e27bc044

由 Sean Christopherson 提交于 1月 28, 2022

Rename a variety of kvm_x86_op function pointers so that preferred name
for vendor implementations follows the pattern <vendor>_<function>, e.g.
rename .run() to .vcpu_run() to match {svm,vmx}_vcpu_run().  This will
allow vendor implementations to be wired up via the KVM_X86_OP macro.

In many cases, VMX and SVM "disagree" on the preferred name, though in
reality it's VMX and x86 that disagree as SVM blindly prepended _svm to
the kvm_x86_ops name.  Justification for using the VMX nomenclature:

  - set_{irq,nmi} => inject_{irq,nmi} because the helper is injecting an
    event that has already been "set" in e.g. the vIRR.  SVM's relevant
    VMCB field is even named event_inj, and KVM's stat is irq_injections.

  - prepare_guest_switch => prepare_switch_to_guest because the former is
    ambiguous, e.g. it could mean switching between multiple guests,
    switching from the guest to host, etc...

  - update_pi_irte => pi_update_irte to allow for matching match the rest
    of VMX's posted interrupt naming scheme, which is vmx_pi_<blah>().

  - start_assignment => pi_start_assignment to again follow VMX's posted
    interrupt naming scheme, and to provide context for what bit of code
    might care about an otherwise undescribed "assignment".

The "tlb_flush" => "flush_tlb" creates an inconsistency with respect to
Hyper-V's "tlb_remote_flush" hooks, but Hyper-V really is the one that's
wrong.  x86, VMX, and SVM all use flush_tlb, and even common KVM is on a
variant of the bandwagon with "kvm_flush_remote_tlbs", e.g. a more
appropriate name for the Hyper-V hooks would be flush_remote_tlbs.  Leave
that change for another time as the Hyper-V hooks always start as NULL,
i.e. the name doesn't matter for using kvm-x86-ops.h, and changing all
names requires an astounding amount of churn.

VMX and SVM function names are intentionally left as is to minimize the
diff.  Both VMX and SVM will need to rename even more functions in order
to fully utilize KVM_X86_OPS, i.e. an additional patch for each is
inevitable.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e27bc044

KVM: x86/mmu: Remove unused "vcpu" of reset_{tdp,ept}_shadow_zero_bits_mask() · e8f6e738

由 Jinrong Liang 提交于 1月 25, 2022

The "struct kvm_vcpu *vcpu" parameter of reset_ept_shadow_zero_bits_mask()
and reset_tdp_shadow_zero_bits_mask() is not used, so remove it.

No functional change intended.
Signed-off-by: NJinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-4-cloudliang@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8f6e738

KVM: x86/mmu: Remove unused "kvm" of __rmap_write_protect() · a0e72cd1

由 Jinrong Liang 提交于 1月 25, 2022

The "struct kvm *kvm" parameter of __rmap_write_protect()
is not used, so remove it. No functional change intended.
Signed-off-by: NJinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-3-cloudliang@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a0e72cd1

KVM: x86/mmu: Remove unused "kvm" of kvm_mmu_unlink_parents() · 61827671

由 Jinrong Liang 提交于 1月 25, 2022

The "struct kvm *kvm" parameter of kvm_mmu_unlink_parents()
is not used, so remove it. No functional change intended.
Signed-off-by: NJinrong Liang <cloudliang@tencent.com>
Message-Id: <20220125095909.38122-2-cloudliang@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

61827671

20 1月, 2022 1 次提交

KVM: x86/mmu: Improve TLB flush comment in kvm_mmu_slot_remove_write_access() · 6ff94f27

由 David Matlack 提交于 1月 13, 2022

Rewrite the comment in kvm_mmu_slot_remove_write_access() that explains
why it is safe to flush TLBs outside of the MMU lock after
write-protecting SPTEs for dirty logging. The current comment is a long
run-on sentence that was difficult to understand. In addition it was
specific to the shadow MMU (mentioning mmu_spte_update()) when the TDP
MMU has to handle this as well.

The new comment explains:
 - Why the TLB flush is necessary at all.
 - Why it is desirable to do the TLB flush outside of the MMU lock.
 - Why it is safe to do the TLB flush outside of the MMU lock.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220113233020.3986005-5-dmatlack@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6ff94f27

20 12月, 2021 1 次提交

KVM: x86: Retry page fault if MMU reload is pending and root has no sp · 18c841e1

由 Sean Christopherson 提交于 12月 09, 2021

Play nice with a NULL shadow page when checking for an obsolete root in
the page fault handler by flagging the page fault as stale if there's no
shadow page associated with the root and KVM_REQ_MMU_RELOAD is pending.
Invalidating memslots, which is the only case where _all_ roots need to
be reloaded, requests all vCPUs to reload their MMUs while holding
mmu_lock for lock.

The "special" roots, e.g. pae_root when KVM uses PAE paging, are not
backed by a shadow page. Running with TDP disabled or with nested NPT
explodes spectaculary due to dereferencing a NULL shadow page pointer.

Skip the KVM_REQ_MMU_RELOAD check if there is a valid shadow page for the
root. Zapping shadow pages in response to guest activity, e.g. when the
guest frees a PGD, can trigger KVM_REQ_MMU_RELOAD even if the current
vCPU isn't using the affected root. I.e. KVM_REQ_MMU_RELOAD can be seen
with a completely valid root shadow page. This is a bit of a moot point
as KVM currently unloads all roots on KVM_REQ_MMU_RELOAD, but that will
be cleaned up in the future.

Fixes: a955cad8 ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211209060552.2956723-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

18c841e1

08 12月, 2021 16 次提交

KVM: X86: Rename gpte_is_8_bytes to has_4_byte_gpte and invert the direction · bb3b394d

由 Lai Jiangshan 提交于 11月 24, 2021

This bit is very close to mean "role.quadrant is not in use", except that
it is false also when the MMU is mapping guest physical addresses
directly.  In that case, role.quadrant is indeed not in use, but there
are no guest PTEs at all.

Changing the name and direction of the bit removes the special case,
since a guest with paging disabled, or not considering guest paging
structures as is the case for two-dimensional paging, does not have
to deal with 4-byte guest PTEs.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-10-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb3b394d

KVM: X86: Add parameter huge_page_level to kvm_init_shadow_ept_mmu() · cc022ae1

由 Lai Jiangshan 提交于 11月 24, 2021

The level of supported large page on nEPT affects the rsvds_bits_mask.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-8-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc022ae1

KVM: X86: Add huge_page_level to __reset_rsvds_bits_mask_ept() · 84ea5c09

由 Lai Jiangshan 提交于 11月 24, 2021

Bit 7 on pte depends on the level of supported large page.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-7-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

84ea5c09

KVM: X86: Remove mmu->translate_gpa · c59a0f57

由 Lai Jiangshan 提交于 11月 24, 2021

Reduce an indirect function call (retpoline) and some intialization
code.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-4-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c59a0f57

KVM: X86: Add parameter struct kvm_mmu *mmu into mmu->gva_to_gpa() · 1f5a21ee

由 Lai Jiangshan 提交于 11月 24, 2021

The mmu->gva_to_gpa() has no "struct kvm_mmu *mmu", so an extra
FNAME(gva_to_gpa_nested) is needed.

Add the parameter can simplify the code.  And it makes it explicit that
the walk is upon vcpu->arch.walk_mmu for gva and vcpu->arch.mmu for L2
gpa in translate_nested_gpa() via the new parameter.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1f5a21ee

KVM: X86: Calculate quadrant when !role.gpte_is_8_bytes · b46a13cb

由 Lai Jiangshan 提交于 11月 18, 2021

role.quadrant is only valid when gpte size is 4 bytes and only be
calculated when gpte size is 4 bytes.

Although "vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL" also means
gpte size is 4 bytes, but using "!role.gpte_is_8_bytes" is clearer
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211118110814.2568-15-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b46a13cb

KVM: X86: Remove useless code to set role.gpte_is_8_bytes when role.direct · 41e35604

由 Lai Jiangshan 提交于 11月 18, 2021

role.gpte_is_8_bytes is unused when role.direct; there is no
point in changing a bit in the role, the value that was set
when the MMU is initialized is just fine.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211118110814.2568-14-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

41e35604

KVM: X86: Fix comment in __kvm_mmu_create() · 84432316

由 Lai Jiangshan 提交于 11月 18, 2021

The allocation of special roots is moved to mmu_alloc_special_roots().
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211118110814.2568-12-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

84432316

KVM: X86: Skip allocating pae_root for vcpu->arch.guest_mmu when !tdp_enabled · 27f4fca2

由 Lai Jiangshan 提交于 11月 18, 2021

It is never used.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211118110814.2568-11-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27f4fca2

KVM: x86: change TLB flush indicator to bool · 98a26b69

由 Vihas Mak 提交于 11月 14, 2021

change 0 to false and 1 to true to fix following cocci warnings:

        arch/x86/kvm/mmu/mmu.c:1485:9-10: WARNING: return of 0/1 in function 'kvm_set_pte_rmapp' with return type bool
        arch/x86/kvm/mmu/mmu.c:1636:10-11: WARNING: return of 0/1 in function 'kvm_test_age_rmapp' with return type bool
Signed-off-by: NVihas Mak <makvihas@gmail.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Message-Id: <20211114164312.GA28736@makvihas>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

98a26b69

KVM: x86/mmu: Propagate memslot const qualifier · 8283e36a

由 Ben Gardon 提交于 11月 15, 2021

In preparation for implementing in-place hugepage promotion, various
functions will need to be called from zap_collapsible_spte_range, which
has the const qualifier on its memslot argument. Propagate the const
qualifier to the various functions which will be needed. This just serves
to simplify the following patch.

No functional change intended.
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20211115234603.2908381-11-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8283e36a

KVM: x86/mmu: Remove need for a vcpu from mmu_try_to_unsync_pages · 4d78d0b3

由 Ben Gardon 提交于 11月 15, 2021

The vCPU argument to mmu_try_to_unsync_pages is now only used to get a
pointer to the associated struct kvm, so pass in the kvm pointer from
the beginning to remove the need for a vCPU when calling the function.
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20211115234603.2908381-7-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4d78d0b3

KVM: x86/mmu: Remove need for a vcpu from kvm_slot_page_track_is_active · 9d395a0a

由 Ben Gardon 提交于 11月 15, 2021

kvm_slot_page_track_is_active only uses its vCPU argument to get a
pointer to the assoicated struct kvm, so just pass in the struct KVM to
remove the need for a vCPU pointer.

No functional change intended.
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20211115234603.2908381-6-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9d395a0a

KVM: Optimize gfn lookup in kvm_zap_gfn_range() · f4209439

由 Maciej S. Szmigiero 提交于 12月 06, 2021

Introduce a memslots gfn upper bound operation and use it to optimize
kvm_zap_gfn_range().
This way this handler can do a quick lookup for intersecting gfns and won't
have to do a linear scan of the whole memslot set.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <ef242146a87a335ee93b441dcf01665cb847c902.1638817641.git.maciej.szmigiero@oracle.com>

f4209439

KVM: Keep memslots in tree-based structures instead of array-based ones · a54d8066

由 Maciej S. Szmigiero 提交于 12月 06, 2021

The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of them.

Because the memslot array that is currently in use cannot be modified
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make copy of the
memslot that is being modified, copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of actual memslots
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should be also kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy. After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used since memslots cannot
overlap in the guest address space and so this data structure is
sufficient for ensuring that lookups are done quickly.

The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
that keep track of the last found-by-gfn memslot, are still present in the
new code.
Co-developed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>

a54d8066

KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array · f5756029

由 Maciej S. Szmigiero 提交于 12月 06, 2021

There is no point in recalculating from scratch the total number of pages
in all memslots each time a memslot is created or deleted.  Use KVM's
cached nr_memslot_pages to compute the default max number of MMU pages.

Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
possibly overflow an unsigned long variable.

Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
overflow.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
[sean: use common KVM field and rework changelog accordingly]
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <d14c5a24535269606675437d5602b7dac4ad8c0e.1638817640.git.maciej.szmigiero@oracle.com>

f5756029

02 12月, 2021 1 次提交

KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8

由 Sean Christopherson 提交于 11月 20, 2021

Bail from the page fault handler if the root shadow page was obsoleted by
a memslot update.  Do the check _after_ acuiring mmu_lock, as the TDP MMU
doesn't rely on the memslot/MMU generation, and instead relies on the
root being explicit marked invalid by kvm_mmu_zap_all_fast(), which takes
mmu_lock for write.

For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
moved past the gfn associated with the SP.

For other MMUs, the resulting behavior is far more convoluted, though
unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
root isn't directly problematic, as the obsolete root will be unloaded
and dropped before the vCPU re-enters the guest.  But because the legacy
MMU tracks shadow pages by their role, any SP created by the fault can
can be reused in the new post-reload root.  Again, that _shouldn't_ be
problematic as any leaf child SPTEs will be created for the current/valid
memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
the old generation as they will be flagged as obsolete.  But, given that
continuing with the fault is pointess (the root will be unloaded), apply
the check to all MMUs.

Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211120045046.3940942-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a955cad8

30 11月, 2021 3 次提交

KVM: x86/mmu: Handle "default" period when selectively waking kthread · f47491d7

由 Sean Christopherson 提交于 11月 20, 2021

Account for the '0' being a default, "let KVM choose" period, when
determining whether or not the recovery worker needs to be awakened in
response to userspace reducing the period.  Failure to do so results in
the worker not being awakened properly, e.g. when changing the period
from '0' to any small-ish value.

Fixes: 4dfe4f40 ("kvm: x86: mmu: Make NX huge page recovery period configurable")
Cc: stable@vger.kernel.org
Cc: Junaid Shahid <junaids@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211120015706.3830341-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f47491d7

KVM: MMU: shadow nested paging does not have PKU · 28f091bc

由 Paolo Bonzini 提交于 11月 22, 2021

Initialize the mask for PKU permissions as if CR4.PKE=0, avoiding
incorrect interpretations of the nested hypervisor's page tables.

Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

28f091bc

KVM: x86/mmu: Remove spurious TLB flushes in TDP MMU zap collapsible path · 4b85c921

由 Sean Christopherson 提交于 11月 20, 2021

Drop the "flush" param and return values to/from the TDP MMU's helper for
zapping collapsible SPTEs.  Because the helper runs with mmu_lock held
for read, not write, it uses tdp_mmu_zap_spte_atomic(), and the atomic
zap handles the necessary remote TLB flush.

Similarly, because mmu_lock is dropped and re-acquired between zapping
legacy MMUs and zapping TDP MMUs, kvm_mmu_zap_collapsible_sptes() must
handle remote TLB flushes from the legacy MMU before calling into the TDP
MMU.

Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211120045046.3940942-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4b85c921

26 11月, 2021 3 次提交

KVM: X86: Use vcpu->arch.walk_mmu for kvm_mmu_invlpg() · 05b29633

由 Lai Jiangshan 提交于 11月 24, 2021

INVLPG operates on guest virtual address, which are represented by
vcpu->arch.walk_mmu.  In nested virtualization scenarios,
kvm_mmu_invlpg() was using the wrong MMU structure; if L2's invlpg were
emulated by L0 (in practice, it hardly happen) when nested two-dimensional
paging is enabled, the call to ->tlb_flush_gva() would be skipped and
the hardware TLB entry would not be invalidated.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-5-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

05b29633

KVM: X86: Fix when shadow_root_level=5 && guest root_level<4 · 12ec33a7

由 Lai Jiangshan 提交于 11月 24, 2021

If the is an L1 with nNPT in 32bit, the shadow walk starts with
pae_root.

Fixes: a717a780 ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host)
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211124122055.64424-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

12ec33a7

KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN · feb627e8

由 Vitaly Kuznetsov 提交于 11月 22, 2021

Commit 63f5a190 ("KVM: x86: Alert userspace that KVM_SET_CPUID{,2}
after KVM_RUN is broken") officially deprecated KVM_SET_CPUID{,2} ioctls
after first successful KVM_RUN and promissed to make this sequence forbiden
in 5.16. It's time to fulfil the promise.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211122175818.608220-3-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

feb627e8

18 11月, 2021 3 次提交

KVM: x86/mmu: Pass parameter flush as false in kvm_tdp_mmu_zap_collapsible_sptes() · 8ed716ca

由 Hou Wenlong 提交于 11月 17, 2021

Since tlb flush has been done for legacy MMU before
kvm_tdp_mmu_zap_collapsible_sptes(), so the parameter flush
should be false for kvm_tdp_mmu_zap_collapsible_sptes().

Fixes: e2209710 ("KVM: x86/mmu: Skip rmap operations if rmaps not allocated")
Signed-off-by: NHou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <21453a1d2533afb6e59fb6c729af89e771ff2e76.1637140154.git.houwenlong93@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8ed716ca

KVM: x86/mmu: Skip tlb flush if it has been done in zap_gfn_range() · c7785d85

由 Hou Wenlong 提交于 11月 17, 2021

If the parameter flush is set, zap_gfn_range() would flush remote tlb
when yield, then tlb flush is not needed outside. So use the return
value of zap_gfn_range() directly instead of OR on it in
kvm_unmap_gfn_range() and kvm_tdp_mmu_unmap_gfn_range().

Fixes: 3039bcc7 ("KVM: Move x86's MMU notifier memslot walkers to generic code")
Signed-off-by: NHou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <5e16546e228877a4d974f8c0e448a93d52c7a5a9.1637140154.git.houwenlong93@linux.alibaba.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c7785d85

KVM: x86/mmu: include EFER.LMA in extended mmu role · b8453cdc

由 Maxim Levitsky 提交于 11月 15, 2021

Incorporate EFER.LMA into kvm_mmu_extended_role, as it used to compute the
guest root level and is not reflected in kvm_mmu_page_role.level when TDP
is in use. When simply running the guest, it is impossible for EFER.LMA
and kvm_mmu.root_level to get out of sync, as the guest cannot transition
from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
first bouncing through a different MMU context. And stuffing guest state
via KVM_SET_SREGS{,2} also ensures a full MMU context reset.

However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
set guest state when migrating the VM while L2 is active, the vCPU state
will reflect L2, not L1. If L1 is using TDP for L2, then root_mmu will
have been configured using L2's state, despite not being used for L2. If
L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
be configured for guest PAE paging, but will match the mmu_role for 64-bit
paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.

Alternatively, the root_mmu's role could be invalidated after a successful
KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
tricky, and not even needed if L1 and L2 do have the same role (e.g., they
are both 64-bit guests and run with the same CR4).
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b8453cdc

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功