提交 · 355d0473b1a11d7cf526fbd43c3908224e08a909 · openeuler / Kernel

08 12月, 2021 27 次提交

KVM: nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU races · 355d0473

由 Emanuele Giuseppe Esposito 提交于 11月 03, 2021

Use the already checked svm->nested.save cached fields
(EFER, CR0, CR4, ...) instead of vmcb12's in
nested_vmcb02_prepare_save().
This prevents from creating TOC/TOU races, since the
guest could modify the vmcb12 fields.

This also avoids the need of force-setting EFER_SVME in
nested_vmcb02_prepare_save.
Signed-off-by: NEmanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211103140527.752797-6-eesposit@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

355d0473

KVM: nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs() · b7a3d8b6

由 Emanuele Giuseppe Esposito 提交于 11月 03, 2021

Now that struct vmcb_save_area_cached contains the required
vmcb fields values (done in nested_load_save_from_vmcb12()),
check them to see if they are correct in nested_vmcb_valid_sregs().

While at it, rename nested_vmcb_valid_sregs in nested_vmcb_check_save.
__nested_vmcb_check_save takes the additional @save parameter, so it
is helpful when we want to check a non-svm save state, like in
svm_set_nested_state. The reason for that is that save is the L1
state, not L2, so we check it without moving it to svm->nested.save.
Signed-off-by: NEmanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20211103140527.752797-5-eesposit@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b7a3d8b6

KVM: nSVM: rename nested_load_control_from_vmcb12 in nested_copy_vmcb_control_to_cache · 7907160d

由 Emanuele Giuseppe Esposito 提交于 11月 03, 2021

Following the same naming convention of the previous patch,
rename nested_load_control_from_vmcb12.
In addition, inline copy_vmcb_control_area as it is only called
by this function.

__nested_copy_vmcb_control_to_cache() works with vmcb_control_area
parameters and it will be useful in next patches, when we use
local variables instead of svm cached state.
Signed-off-by: NEmanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20211103140527.752797-4-eesposit@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7907160d

KVM: nSVM: introduce svm->nested.save to cache save area before checks · f2740a8d

由 Emanuele Giuseppe Esposito 提交于 11月 03, 2021

This is useful in the next patch, to keep a saved copy
of vmcb12 registers and pass it around more easily.

Instead of blindly copying everything, we just copy EFER, CR0, CR3, CR4,
DR6 and DR7 which are needed by the VMRUN checks.  If more fields will
need to be checked, it will be quite obvious to see that they must be added
in struct vmcb_save_area_cached and in nested_copy_vmcb_save_to_cache().

__nested_copy_vmcb_save_to_cache() takes a vmcb_save_area_cached
parameter, which is useful in order to save the state to a local
variable.
Signed-off-by: NEmanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20211103140527.752797-3-eesposit@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f2740a8d

KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in nested_vmcb_valid_sregs · 907afa48

由 Emanuele Giuseppe Esposito 提交于 11月 03, 2021

Inline nested_vmcb_check_cr3_cr4 as it is not called by anyone else.
Doing so simplifies next patches.
Signed-off-by: NEmanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211103140527.752797-2-eesposit@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

907afa48

KVM: Optimize gfn lookup in kvm_zap_gfn_range() · f4209439

由 Maciej S. Szmigiero 提交于 12月 06, 2021

Introduce a memslots gfn upper bound operation and use it to optimize
kvm_zap_gfn_range().
This way this handler can do a quick lookup for intersecting gfns and won't
have to do a linear scan of the whole memslot set.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <ef242146a87a335ee93b441dcf01665cb847c902.1638817641.git.maciej.szmigiero@oracle.com>

f4209439

KVM: Keep memslots in tree-based structures instead of array-based ones · a54d8066

由 Maciej S. Szmigiero 提交于 12月 06, 2021

The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of them.

Because the memslot array that is currently in use cannot be modified
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make copy of the
memslot that is being modified, copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of actual memslots
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should be also kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy. After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used since memslots cannot
overlap in the guest address space and so this data structure is
sufficient for ensuring that lookups are done quickly.

The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
that keep track of the last found-by-gfn memslot, are still present in the
new code.
Co-developed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>

a54d8066

KVM: s390: Introduce kvm_s390_get_gfn_end() · 6a656832

由 Maciej S. Szmigiero 提交于 12月 06, 2021

And use it where s390 code would just access the memslot with the highest
gfn directly.

No functional change intended.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <42496041d6af1c23b1cbba2636b344ca8d5fc3af.1638817641.git.maciej.szmigiero@oracle.com>

6a656832

KVM: Use interval tree to do fast hva lookup in memslots · ed922739

由 Maciej S. Szmigiero 提交于 12月 06, 2021

The current memslots implementation only allows quick binary search by gfn,
quick lookup by hva is not possible - the implementation has to do a linear
scan of the whole memslots array, even though the operation being performed
might apply just to a single memslot.

This significantly hurts performance of per-hva operations with higher
memslot counts.

Since hva ranges can overlap between memslots an interval tree is needed
for tracking them.

[sean: handle interval tree updates in kvm_replace_memslot()]
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <d66b9974becaa9839be9c4e1a5de97b177b4ac20.1638817640.git.maciej.szmigiero@oracle.com>

ed922739

KVM: Integrate gfn_to_memslot_approx() into search_memslots() · c928bfc2

由 Maciej S. Szmigiero 提交于 12月 06, 2021

s390 arch has gfn_to_memslot_approx() which is almost identical to
search_memslots(), differing only in that in case the gfn falls in a hole
one of the memslots bordering the hole is returned.

Add this lookup mode as an option to search_memslots() so we don't have two
almost identical functions for looking up a memslot by its gfn.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
[sean: tweaked helper names to keep gfn_to_memslot_approx() in s390]
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <171cd89b52c718dbe180ecd909b4437a64a7e2ec.1638817640.git.maciej.szmigiero@oracle.com>

c928bfc2

KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array · f5756029

由 Maciej S. Szmigiero 提交于 12月 06, 2021

There is no point in recalculating from scratch the total number of pages
in all memslots each time a memslot is created or deleted.  Use KVM's
cached nr_memslot_pages to compute the default max number of MMU pages.

Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
possibly overflow an unsigned long variable.

Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
overflow.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
[sean: use common KVM field and rework changelog accordingly]
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <d14c5a24535269606675437d5602b7dac4ad8c0e.1638817640.git.maciej.szmigiero@oracle.com>

f5756029

KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't changed · e0c2b633

由 Maciej S. Szmigiero 提交于 12月 06, 2021

There is no point in calling kvm_mmu_change_mmu_pages() for memslot
operations that don't change the total page count, so do it just for
KVM_MR_CREATE and KVM_MR_DELETE.
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <9e56b7616a11f5654e4ab486b3237366b7ba9f2a.1638817640.git.maciej.szmigiero@oracle.com>

e0c2b633

KVM: s390: Skip gfn/size sanity checks on memslot DELETE or FLAGS_ONLY · ec5c8697

由 Sean Christopherson 提交于 12月 06, 2021

Sanity check the hva, gfn, and size of a userspace memory region only if
any of those properties can change, i.e. skip the checks for DELETE and
FLAGS_ONLY. KVM doesn't allow moving the hva or changing the size, a gfn
change shows up as a MOVE even if flags are being modified, and the
checks are pointless for the DELETE case as userspace_addr and gfn_base
are zeroed by common KVM.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <05430738437ac2c9c7371ac4e11f4a533e1677da.1638817640.git.maciej.szmigiero@oracle.com>

ec5c8697

KVM: x86: Don't assume old/new memslots are non-NULL at memslot commit · 77aedf26

由 Sean Christopherson 提交于 12月 06, 2021

Play nice with a NULL @old or @new when handling memslot updates so that
common KVM can pass NULL for one or the other in CREATE and DELETE cases
instead of having to synthesize a dummy memslot.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <2eb7788adbdc2bc9a9c5f86844dd8ee5c8428732.1638817640.git.maciej.szmigiero@oracle.com>

77aedf26

KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks · 6a99c6e3

由 Sean Christopherson 提交于 12月 06, 2021

Drop the @mem param from kvm_arch_{prepare,commit}_memory_region() now
that its use has been removed in all architectures.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <aa5ed3e62c27e881d0d8bc0acbc1572bc336dc19.1638817640.git.maciej.szmigiero@oracle.com>

6a99c6e3

KVM: RISC-V: Use "new" memslot instead of userspace memory region · d01495d4

由 Sean Christopherson 提交于 12月 06, 2021

Get the slot ID, hva, etc... from the "new" memslot instead of the
userspace memory region when preparing/committing a memory region. This
will allow a future commit to drop @mem from the prepare/commit hooks
once all architectures convert to using "new".

Opportunistically wait to get the various "new" values until after
filtering out the DELETE case in anticipation of a future commit passing
NULL for @new when deleting a memslot.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <543608ab88a1190e73a958efffafc98d2652c067.1638817640.git.maciej.szmigiero@oracle.com>

d01495d4

KVM: x86: Use "new" memslot instead of userspace memory region · 9d7d18ee

由 Sean Christopherson 提交于 12月 06, 2021

Get the number of pages directly from the new memslot instead of
computing the same from the userspace memory region when allocating
memslot metadata. This will allow a future patch to drop @mem.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <ef44892eb615f5c28e682bbe06af96aff9ce2a9f.1638817639.git.maciej.szmigiero@oracle.com>

9d7d18ee

KVM: s390: Use "new" memslot instead of userspace memory region · cf5b4869

由 Sean Christopherson 提交于 12月 06, 2021

Get the gfn, size, and hva from the new memslot instead of the userspace
memory region when preparing/committing memory region changes. This will
allow a future commit to drop the @mem param.

Note, this has a subtle functional change as KVM would previously reject
DELETE if userspace provided a garbage userspace_addr or guest_phys_addr,
whereas KVM zeros those fields in the "new" memslot when deleting an
existing memslot. Arguably the old behavior is more correct, but there's
zero benefit into requiring userspace to provide sane values for hva and
gfn.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <917ed131c06a4c7b35dd7fb7ed7955be899ad8cc.1638817639.git.maciej.szmigiero@oracle.com>

cf5b4869

KVM: PPC: Avoid referencing userspace memory region in memslot updates · eaaaed13

由 Sean Christopherson 提交于 12月 06, 2021

For PPC HV, get the number of pages directly from the new memslot instead
of computing the same from the userspace memory region, and explicitly
check for !DELETE instead of inferring the same when toggling mmio_update.
The motivation for these changes is to avoid referencing the @mem param
so that it can be dropped in a future commit.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <1e97fb5198be25f98ef82e63a8d770c682264cc9.1638817639.git.maciej.szmigiero@oracle.com>

eaaaed13

KVM: MIPS: Drop pr_debug from memslot commit to avoid using "mem" · 3b181617

由 Sean Christopherson 提交于 12月 06, 2021

Remove an old (circa 2012) kvm_debug from kvm_arch_commit_memory_region()
to print basic information when committing a memslot change. The primary
motivation for removing the kvm_debug is to avoid using @mem, the user
memory region, so that said param can be removed.

Alternatively, the debug message could be converted to use @new, but that
would require synthesizing select state to play nice with the DELETED
case, which will pass NULL for @new in the future. And there's no
argument to be had for dumping generic information in an arch callback,
i.e. if there's a good reason for the debug message, then it belongs in
common KVM code where all architectures can benefit.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <446929a668f6e1346751571b71db41e94e976cdf.1638817639.git.maciej.szmigiero@oracle.com>

3b181617

KVM: arm64: Use "new" memslot instead of userspace memory region · 509c594c

由 Sean Christopherson 提交于 12月 06, 2021

Opportunistically wait to get the hva begin+end until after filtering out
the DELETE case in anticipation of a future commit passing NULL for @new
when deleting a memslot.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <c019d00c2531520c52e0b52dfda1be5aa898103c.1638817639.git.maciej.szmigiero@oracle.com>

509c594c

KVM: Let/force architectures to deal with arch specific memslot data · 537a17b3

由 Sean Christopherson 提交于 12月 06, 2021

Pass the "old" slot to kvm_arch_prepare_memory_region() and force arch
code to handle propagating arch specific data from "new" to "old" when
necessary.  This is a baby step towards dynamically allocating "new" from
the get go, and is a (very) minor performance boost on x86 due to not
unnecessarily copying arch data.

For PPC HV, copy the rmap in the !CREATE and !DELETE paths, i.e. for MOVE
and FLAGS_ONLY.  This is functionally a nop as the previous behavior
would overwrite the pointer for CREATE, and eventually discard/ignore it
for DELETE.

For x86, copy the arch data only for FLAGS_ONLY changes.  Unlike PPC HV,
x86 needs to reallocate arch data in the MOVE case as the size of x86's
allocations depend on the alignment of the memslot's gfn.

Opportunistically tweak kvm_arch_prepare_memory_region()'s param order to
match the "commit" prototype.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
[mss: add missing RISCV kvm_arch_prepare_memory_region() change]
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <67dea5f11bbcfd71e3da5986f11e87f5dd4013f9.1638817639.git.maciej.szmigiero@oracle.com>

537a17b3

KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index · 46808a4c

由 Marc Zyngier 提交于 11月 16, 2021

Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.

Let's bite the bullet and repaint all of it in one go.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

46808a4c

KVM: s390: Use kvm_get_vcpu() instead of open-coded access · 113d10bc

由 Marc Zyngier 提交于 11月 16, 2021

As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-4-maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

113d10bc

KVM: mips: Use kvm_get_vcpu() instead of open-coded access · 75a9869f

由 Marc Zyngier 提交于 11月 16, 2021

As we are about to change the way vcpus are allocated, mandate
the use of kvm_get_vcpu() instead of open-coding the access.
Reviewed-by: NPhilippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-3-maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

75a9869f

KVM: Move wiping of the kvm->vcpus array to common code · 27592ae8

由 Marc Zyngier 提交于 11月 16, 2021

All architectures have similar loops iterating over the vcpus,
freeing one vcpu at a time, and eventually wiping the reference
off the vcpus array. They are also inconsistently taking
the kvm->lock mutex when wiping the references from the array.

Make this code common, which will simplify further changes.
The locking is dropped altogether, as this should only be called
when there is no further references on the kvm structure.
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-2-maz@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27592ae8

KVM: MMU: update comment on the number of page role combinations · dc1ce455

由 Paolo Bonzini 提交于 11月 16, 2021

Fix the number of bits in the role, and simplify the explanation of
why several bits or combinations of bits are redundant.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dc1ce455

05 12月, 2021 3 次提交

KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure · ad5b3532

由 Tom Lendacky 提交于 12月 02, 2021

Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT
exit code or exit parameters fails.

The VMGEXIT instruction can be issued from userspace, even though
userspace (likely) can't update the GHCB. To prevent userspace from being
able to kill the guest, return an error through the GHCB when validation
fails rather than terminating the guest. For cases where the GHCB can't be
updated (e.g. the GHCB can't be mapped, etc.), just return back to the
guest.

The new error codes are documented in the lasest update to the GHCB
specification.

Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad5b3532

KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary · a655276a

由 Sean Christopherson 提交于 11月 09, 2021

Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so
that KVM falls back to __vmalloc() if physically contiguous memory isn't
available.  The buffer is purely a KVM software construct, i.e. there's
no need for it to be physically contiguous.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a655276a

KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails · 75236f5f

由 Sean Christopherson 提交于 11月 09, 2021

Return appropriate error codes if setting up the GHCB scratch area for an
SEV-ES guest fails.  In particular, returning -EINVAL instead of -ENOMEM
when allocating the kernel buffer could be confusing as userspace would
likely suspect a guest issue.

Fixes: 8f423a80 ("KVM: SVM: Support MMIO for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211109222350.2266045-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

75236f5f

02 12月, 2021 4 次提交

KVM: x86/mmu: Retry page fault if root is invalidated by memslot update · a955cad8

由 Sean Christopherson 提交于 11月 20, 2021

Bail from the page fault handler if the root shadow page was obsoleted by
a memslot update.  Do the check _after_ acuiring mmu_lock, as the TDP MMU
doesn't rely on the memslot/MMU generation, and instead relies on the
root being explicit marked invalid by kvm_mmu_zap_all_fast(), which takes
mmu_lock for write.

For the TDP MMU, inserting a SPTE into an obsolete root can leak a SP if
kvm_tdp_mmu_zap_invalidated_roots() has already zapped the SP, i.e. has
moved past the gfn associated with the SP.

For other MMUs, the resulting behavior is far more convoluted, though
unlikely to be truly problematic.  Installing SPs/SPTEs into the obsolete
root isn't directly problematic, as the obsolete root will be unloaded
and dropped before the vCPU re-enters the guest.  But because the legacy
MMU tracks shadow pages by their role, any SP created by the fault can
can be reused in the new post-reload root.  Again, that _shouldn't_ be
problematic as any leaf child SPTEs will be created for the current/valid
memslot generation, and kvm_mmu_get_page() will not reuse child SPs from
the old generation as they will be flagged as obsolete.  But, given that
continuing with the fault is pointess (the root will be unloaded), apply
the check to all MMUs.

Fixes: b7cccd39 ("KVM: x86/mmu: Fast invalidation for TDP MMU")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211120045046.3940942-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a955cad8

KVM: VMX: Set failure code in prepare_vmcs02() · bfbb307c

由 Dan Carpenter 提交于 11月 30, 2021

The error paths in the prepare_vmcs02() function are supposed to set
*entry_failure_code but this path does not.  It leads to using an
uninitialized variable in the caller.

Fixes: 71f73470 ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20211130125337.GB24578@kili>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bfbb307c

KVM: ensure APICv is considered inactive if there is no APIC · ef8b4b72

由 Paolo Bonzini 提交于 11月 30, 2021

kvm_vcpu_apicv_active() returns false if a virtual machine has no in-kernel
local APIC, however kvm_apicv_activated might still be true if there are
no reasons to disable APICv; in fact it is quite likely that there is none
because APICv is inhibited by specific configurations of the local APIC
and those configurations cannot be programmed. This triggers a WARN:

WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu));

To avoid this, introduce another cause for APICv inhibition, namely the
absence of an in-kernel local APIC. This cause is enabled by default,
and is dropped by either KVM_CREATE_IRQCHIP or the enabling of
KVM_CAP_IRQCHIP_SPLIT.
Reported-by: NIgnat Korchagin <ignat@cloudflare.com>
Fixes: ee49a893 ("KVM: x86: Move SVM's APICv sanity check to common x86", 2021-10-22)
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Tested-by: NIgnat Korchagin <ignat@cloudflare.com>
Message-Id: <20211130123746.293379-1-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ef8b4b72

KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register · cb1d220d

由 Like Xu 提交于 11月 18, 2021

If we run the following perf command in an AMD Milan guest:

  perf stat \
  -e cpu/event=0x1d0/ \
  -e cpu/event=0x1c7/ \
  -e cpu/umask=0x1f,event=0x18e/ \
  -e cpu/umask=0x7,event=0x18e/ \
  -e cpu/umask=0x18,event=0x18e/ \
  ./workload

dmesg will report a #GP warning from an unchecked MSR access
error on MSR_F15H_PERF_CTLx.

This is because according to APM (Revision: 4.03) Figure 13-7,
the bits [35:32] of AMD PerfEvtSeln register is a part of the
event select encoding, which extends the EVENT_SELECT field
from 8 bits to 12 bits.

Opportunistically update pmu->reserved_bits for reserved bit 19.
Reported-by: NJim Mattson <jmattson@google.com>
Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211118130320.95997-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cb1d220d

30 11月, 2021 6 次提交

KVM: fix avic_set_running for preemptable kernels · 7cfc5c65

由 Paolo Bonzini 提交于 11月 30, 2021

avic_set_running() passes the current CPU to avic_vcpu_load(), albeit
via vcpu->cpu rather than smp_processor_id().  If the thread is migrated
while avic_set_running runs, the call to avic_vcpu_load() can use a stale
value for the processor id.  Avoid this by blocking preemption over the
entire execution of avic_set_running().
Reported-by: NSean Christopherson <seanjc@google.com>
Fixes: 8221c137 ("svm: Manage vcpu load/unload when enable AVIC")
Cc: stable@vger.kernel.org
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7cfc5c65

KVM: VMX: clear vmx_x86_ops.sync_pir_to_irr if APICv is disabled · e90e51d5

由 Paolo Bonzini 提交于 11月 30, 2021

There is nothing to synchronize if APICv is disabled, since neither
other vCPUs nor assigned devices can set PIR.ON.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e90e51d5

KVM: SEV: accept signals in sev_lock_two_vms · c9d61dcb

由 Paolo Bonzini 提交于 11月 22, 2021

Generally, kvm->lock is not taken for a long time, but
sev_lock_two_vms is different: it takes vCPU locks
inside, so userspace can hold it back just by calling
a vCPU ioctl.  Play it safe and use mutex_lock_killable.

Message-Id: <20211123005036.2954379-13-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9d61dcb

KVM: SEV: do not take kvm->lock when destroying · 10a37929

由 Paolo Bonzini 提交于 11月 22, 2021

Taking the lock is useless since there are no other references,
and there are already accesses (e.g. to sev->enc_context_owner)
that do not take it.  So get rid of it.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-12-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

10a37929

KVM: SEV: Prohibit migration of a VM that has mirrors · 17d44a96

由 Paolo Bonzini 提交于 11月 22, 2021

VMs that mirror an encryption context rely on the owner to keep the
ASID allocated.  Performing a KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
would cause a dangling ASID:

1. copy context from A to B (gets ref to A)
2. move context from A to L (moves ASID from A to L)
3. close L (releases ASID from L, B still references it)

The right way to do the handoff instead is to create a fresh mirror VM
on the destination first:

1. copy context from A to B (gets ref to A)
[later] 2. close B (releases ref to A)
3. move context from A to L (moves ASID from A to L)
4. copy context from L to M

So, catch the situation by adding a count of how many VMs are
mirroring this one's encryption context.

Fixes: 0b020f5a ("KVM: SEV: Add support for SEV-ES intra host migration")
Message-Id: <20211123005036.2954379-11-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

17d44a96

KVM: SEV: Do COPY_ENC_CONTEXT_FROM with both VMs locked · bf42b02b

由 Paolo Bonzini 提交于 11月 22, 2021

Now that we have a facility to lock two VMs with deadlock
protection, use it for the creation of mirror VMs as well.  One of
COPY_ENC_CONTEXT_FROM(dst, src) and COPY_ENC_CONTEXT_FROM(src, dst)
would always fail, so the combination is nonsensical and it is okay to
return -EBUSY if it is attempted.

This sidesteps the question of what happens if a VM is
MOVE_ENC_CONTEXT_FROM'd at the same time as it is
COPY_ENC_CONTEXT_FROM'd: the locking prevents that from
happening.

Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211123005036.2954379-10-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bf42b02b

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功