- May 7, 2021 (29 commits)
-
-
Authored by Nicholas Piggin

Commit b1c5356e ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") causes unmap_gfn_range and age_gfn callbacks to only work on the first gfn in the range. It also makes the aging callbacks call into both radix and hash aging functions for radix guests. Fix this.

Add warnings for the single-gfn calls that have been converted to range callbacks, in case they ever receive ranges greater than 1.

Fixes: b1c5356e ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks")
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20210505121509.1470207-1-npiggin@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
If probing MSR_TSC_AUX failed, hide RDTSCP and RDPID, and WARN if either feature was reported as supported. In theory, such a scenario should never happen as both Intel and AMD state that MSR_TSC_AUX is available if RDTSCP or RDPID is supported. But, KVM injects #GP on MSR_TSC_AUX accesses if probing failed, faults on WRMSR(MSR_TSC_AUX) may be fatal to the guest (because they happen during early CPU bringup), and KVM itself has effectively misreported RDPID support in the past.

Note, this also has the happy side effect of omitting MSR_TSC_AUX from the list of MSRs that are exposed to userspace if probing the MSR fails.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
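A minimal sketch of the probe-and-hide flow this describes, assuming KVM's kvm_cpu_cap_* helpers and the kvm_probe_user_return_msr() helper from later in this series; the exact call site is an assumption, not the literal patch:

    /*
     * Hypothetical sketch: if MSR_TSC_AUX can't be probed, RDTSCP and
     * RDPID must not be advertised, since emulating them relies on it.
     */
    if (kvm_probe_user_return_msr(MSR_TSC_AUX)) {
            WARN_ON_ONCE(kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
                         kvm_cpu_cap_has(X86_FEATURE_RDPID));
            kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
            kvm_cpu_cap_clear(X86_FEATURE_RDPID);
    }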
-
Authored by Sean Christopherson
Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to the guest CPU model instead of the host CPU behavior. While not strictly necessary to avoid guest breakage, emulating cross-vendor "architecture" will provide consistent behavior for the guest, e.g. WRMSR fault behavior won't change if the vCPU is migrated to a host with divergent behavior.

Note, the "new" kvm_is_supported_user_return_msr() checks do not add new functionality on either SVM or VMX. On SVM, the equivalent was "tsc_aux_uret_slot < 0", and on VMX the check was buried in the vmx_find_uret_msr() call at the find_uret_msr label.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Now that SVM and VMX both probe MSRs before "defining" user return slots for them, consolidate the code for probe+define into common x86 and eliminate the odd behavior of having the vendor code define the slot for a given MSR.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Split out and export the number of configured user return MSRs so that VMX can iterate over the set of MSRs without having to do its own tracking. Keep the list itself internal to x86 so that vendor code still has to go through the "official" APIs to add/modify entries.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Tag TSX_CTRL as not needing to be loaded when RTM isn't supported in the host. Crushing the write mask to '0' has the same effect, but requires more mental gymnastics to understand.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Drop VMX's global list of user return MSRs now that VMX doesn't re-sort said list to isolate "active" MSRs, i.e. now that VMX's list and x86's list have the same MSRs in the same order.

In addition to eliminating the redundant list, this will also allow moving more of the list management into common x86.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Explicitly flag a uret MSR as needing to be loaded into hardware instead of re-sorting the list of "active" MSRs and tracking how many MSRs in total need to be loaded. The only benefit to sorting the list is that the loop to load MSRs during vmx_prepare_switch_to_guest() doesn't need to iterate over all supported uret MSRs, only those that are active. But that is a pointless optimization, as the most common case, running a 64-bit guest, will load the vast majority of MSRs. Not to mention that a single WRMSR is far more expensive than iterating over the list.

Providing a stable list order obviates the need to track a given MSR's "slot" in the per-CPU list of user return MSRs; all lists simply use the same ordering. Future patches will take advantage of the stable order to further simplify the related code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
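As a rough illustration (the field names follow the commit's description, but the exact layout and loop are assumptions, not the literal patch), the flagged approach looks like this:

    /* Sketch: a per-MSR flag replaces re-sorting an "active" prefix. */
    struct vmx_uret_msr_sketch {
            bool load_into_hardware;  /* load before entering the guest */
            u64 data;                 /* guest value */
            u64 mask;                 /* write mask applied on load */
    };

    /* The switch-to-guest loop walks the full, stably ordered list. */
    for (i = 0; i < ARRAY_SIZE(uret_msrs); i++) {
            if (uret_msrs[i].load_into_hardware)
                    kvm_set_user_return_msr(i, uret_msrs[i].data,
                                            uret_msrs[i].mask);
    }

Skipping inactive slots in the loop costs almost nothing next to a WRMSR, which is the commit's point: a stable order is worth more than a shorter loop.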
-
Authored by Sean Christopherson

Configure the list of user return MSRs that are actually supported at module init instead of reprobing the list of possible MSRs every time a vCPU is created. Curating the list on a per-vCPU basis is pointless; KVM is completely hosed if the set of supported MSRs changes after module init, or if the set of MSRs differs per physical CPU. The per-vCPU lists also increase complexity (see __vmx_find_uret_msr()) and create corner cases that _should_ be impossible, but theoretically exist in KVM, e.g. advertising RDTSCP to userspace without actually being able to virtualize RDTSCP if probing MSR_TSC_AUX fails.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Allow userspace to enable RDPID for a guest without also enabling RDTSCP. Aside from checking for RDPID support in the obvious flows, VMX also needs to set ENABLE_RDTSCP=1 when RDPID is exposed.

For the record, there is no known scenario where enabling RDPID without RDTSCP is desirable. But, both AMD and Intel architectures allow for the condition, i.e. this is purely to make KVM more architecturally accurate.

Fixes: 41cd02c6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to VMRUN. Because SVM doesn't support interception of RDPID, RDPID cannot be disallowed in the guest (without resorting to binary translation). Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to the guest if RDTSCP is not supported.

Note, there is also a kernel bug that prevents leaking the host's value. The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID. I.e. if RDTSCP is not supported, there is no host value to leak. But, if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX in the case where hardware supports RDPID but RDTSCP is unavailable for whatever reason.

Probing MSR_TSC_AUX will also allow consolidating the probe and define logic in common x86, and will make it simpler to condition the existence of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Disable preemption when probing a user return MSR via RDMSR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU.

Opportunistically land the helper in common x86; SVM will use the helper in a future commit.

Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: stable@vger.kernel.org
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-6-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
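The helper boils down to a read-then-write-back under preempt_disable(); a sketch close to the shape the commit describes, though the exact final code is an assumption:

    int kvm_probe_user_return_msr(u32 msr)
    {
            u64 val;
            int ret;

            preempt_disable();               /* stay on one CPU for RDMSR+WRMSR */
            ret = rdmsrl_safe(msr, &val);    /* does the MSR exist at all? */
            if (ret)
                    goto out;
            ret = wrmsrl_safe(msr, val);     /* write back this CPU's own value */
    out:
            preempt_enable();
            return ret;
    }

Without the preempt_disable(), the RDMSR could read CPU A's value and the WRMSR could land on CPU B, clobbering B's per-CPU value, which is exactly the corruption the commit fixes.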
-
Authored by Sean Christopherson
Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-5-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson

Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest. Note, SVM does not support intercepting RDPID. Unlike VMX's ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.

This is a benign virtualization hole as the host kernel (incorrectly) sets MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX into hardware if RDTSCP is supported in the host, i.e. KVM will not leak the host's MSR_TSC_AUX to the guest. But, when the kernel bug is fixed, KVM will start leaking the host's MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available for whatever reason. This leak will be remedied in a future commit.

Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-4-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
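A hedged sketch of the interception logic; the helper name and the exact conditions are assumptions based on the description, not the literal patch:

    /* Sketch: intercept RDTSCP so KVM can inject #UD when the guest lacks it. */
    static void svm_recalc_rdtscp_intercept_sketch(struct kvm_vcpu *vcpu,
                                                   struct vcpu_svm *svm)
    {
            if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
                    return;  /* RDTSCP is not exposed at all */

            if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
                    svm_clr_intercept(svm, INTERCEPT_RDTSCP);
            else
                    svm_set_intercept(svm, INTERCEPT_RDTSCP); /* exit handler injects #UD */
    }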
-
Authored by Sean Christopherson
Do not advertise emulation support for RDPID if RDTSCP is unsupported. RDPID emulation subtly relies on MSR_TSC_AUX to exist in hardware, as both vmx_get_msr() and svm_get_msr() will return an error if the MSR is unsupported, i.e. ctxt->ops->get_msr() will fail and the emulator will inject a #UD.

Note, RDPID emulation also relies on RDTSCP being enabled in the guest, but this is a KVM bug and will eventually be fixed.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-3-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Sean Christopherson
Clear KVM's RDPID capability if the ENABLE_RDTSCP secondary exec control is unsupported. Despite being enumerated in a separate CPUID flag, RDPID is bundled under the same VMCS control as RDTSCP and will #UD in VMX non-root if ENABLE_RDTSCP is not enabled.

Fixes: 41cd02c6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-2-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
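In vmx_set_cpu_caps()-style terms, the fix amounts to clearing the two capabilities together; a sketch under the assumption that this is where the check lives:

    /*
     * RDPID #UDs in VMX non-root unless ENABLE_RDTSCP is set, so the
     * two features are only supportable as a pair on VMX.
     */
    if (!cpu_has_vmx_rdtscp()) {
            kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
            kvm_cpu_cap_clear(X86_FEATURE_RDPID);
    }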
-
Authored by Maxim Levitsky

While in most cases, when returning to use the VMCB01, the exit reason stored in it will be SVM_EXIT_VMRUN, on the first VM exit after a nested migration this field can contain anything, since the VM entry happened before the migration. Remove this warning to avoid the false positive.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-3-mlevitsk@redhat.com>
Fixes: 9a7de6ec ("KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

While usually L1's GIF is set while L2 runs, and migrated nested state is usually loaded after a vCPU reset (which also sets L1's GIF to true), this is not guaranteed.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210504143936.1644378-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Siddharth Chandrasekaran
In ioctl KVM_X86_SET_MSR_FILTER, input from user space is validated after a memdup_user(). For invalid inputs we'd memdup and then call kfree unnecessarily. Hoist input validation to avoid kfree altogether.

Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210503122111.13775-1-sidcha@amazon.de>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
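The general pattern is to reject bad input before any allocation happens; a hypothetical sketch where filter_range_is_invalid() stands in for the actual filter checks:

    /* Sketch: validate user input first so the error path needs no kfree(). */
    if (filter_range_is_invalid(&range))      /* hypothetical predicate */
            return -EINVAL;

    bitmap = memdup_user(user_bitmap_ptr, bitmap_size);
    if (IS_ERR(bitmap))
            return PTR_ERR(bitmap);
    /* only success paths reach this point with memory to manage */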
-
Authored by Bill Wendling
Clang's integrated assembler does not allow symbols with non-absolute values to be reassigned. Modify the interrupt entry loop macro to be compatible with IAS by using a label and an offset.

Cc: Jian Cai <caij2003@gmail.com>
Signed-off-by: Bill Wendling <morbo@google.com>
References: https://lore.kernel.org/lkml/20200714233024.1789985-1-caij2003@gmail.com/
Message-Id: <20201211012317.3722214-1-morbo@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov
Eliminate the probably unwanted hole in 'struct kvm_vmx_nested_state_hdr':

Pre-patch:

    struct kvm_vmx_nested_state_hdr {
            __u64 vmxon_pa;                       /*  0  8 */
            __u64 vmcs12_pa;                      /*  8  8 */
            struct {
                    __u16 flags;                  /* 16  2 */
            } smm;                                /* 16  2 */

            /* XXX 2 bytes hole, try to pack */

            __u32 flags;                          /* 20  4 */
            __u64 preemption_timer_deadline;      /* 24  8 */
    };

Post-patch:

    struct kvm_vmx_nested_state_hdr {
            __u64 vmxon_pa;                       /*  0  8 */
            __u64 vmcs12_pa;                      /*  8  8 */
            struct {
                    __u16 flags;                  /* 16  2 */
            } smm;                                /* 16  2 */
            __u16 pad;                            /* 18  2 */
            __u32 flags;                          /* 20  4 */
            __u64 preemption_timer_deadline;      /* 24  8 */
    };

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

Add a test for the regression introduced by commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit"). When an L2->L1 exit is forced immediately after restoring nested state, the KVM_REQ_GET_NESTED_STATE_PAGES request is cleared and VMCS12 changes (e.g. a fresh RIP) are not reflected to eVMCS, so the subsequent nested vCPU run is broken. Utilize NMI injection to do the job.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210505151823.1341678-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

The 'run->exit_reason == KVM_EXIT_SHUTDOWN' check is not ideal, as we may be getting some unexpected exception. Directly check for #UD instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210505151823.1341678-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

When enlightened VMCS is in use and nested state is migrated with vmx_get_nested_state()/vmx_set_nested_state(), KVM can't map the evmcs page right away: the evmcs gpa is not part of 'struct kvm_vmx_nested_state_hdr', and we can't read it from the VP assist page because userspace may decide to restore HV_X64_MSR_VP_ASSIST_PAGE after restoring nested state (and QEMU, for example, does exactly that). To make sure eVMCS is mapped, vmx_set_nested_state() raises the KVM_REQ_GET_NESTED_STATE_PAGES request.

Commit f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit") added KVM_REQ_GET_NESTED_STATE_PAGES clearing to nested_vmx_vmexit() to make sure the MSR permission bitmap is not switched when an immediate exit from L2 to L1 happens right after migration (caused by a pending event, for example). Unfortunately, in the exact same situation we still need to have eVMCS mapped so nested_sync_vmcs12_to_shadow() reflects changes in VMCS12 to eVMCS.

As a band-aid, restore nested_get_evmcs_page() when clearing KVM_REQ_GET_NESTED_STATE_PAGES in nested_vmx_vmexit(). The 'fix' is far from ideal, as we can't easily propagate possible failures, and even if we could, this is most likely already too late to do so. The whole 'KVM_REQ_GET_NESTED_STATE_PAGES' idea for mapping eVMCS after migration seems to be fragile, as we diverge too much from the 'native' path where vmptr loading happens on vmx_set_nested_state().

Fixes: f2c7ef3b ("KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210503150854.1144255-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Siddharth Chandrasekaran

The capability that exposes the new ioctl KVM_X86_SET_MSR_FILTER to userspace is specified incorrectly as the ioctl itself (instead of KVM_CAP_X86_MSR_FILTER). This patch fixes it.

Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Reviewed-by: Alexander Graf <graf@amazon.de>
Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
Message-Id: <20210503120059.9283-1-sidcha@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov
Simplify the code by making PV features shutdown happen in one place.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

The crash shutdown handler only disables kvmclock and steal time; other PV features remain active, so we risk corrupting memory or getting some side-effects in the kdump kernel. Move the crash handler to kvm.c and unify it with CPU offline.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Vitaly Kuznetsov

Currently, we disable kvmclock from the machine_shutdown() hook, and this only happens for the boot CPU. We need to disable it for all CPUs to guard against memory corruption, e.g. on restore from hibernate.

Note, writing '0' to the kvmclock MSR doesn't clear the memory location, it just prevents the hypervisor from updating the location. So for a short while after the write, and while the CPU is still alive, the clock remains usable and correct, and we don't need to switch to some other clocksource.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
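A sketch of the per-CPU disable; msr_kvm_system_time is the variable kvmclock.c keys the MSR off, but the function name and calling context here are assumptions:

    /* Writing 0 stops the hypervisor from updating this CPU's pvclock page. */
    static void kvmclock_disable_sketch(void)
    {
            native_write_msr(msr_kvm_system_time, 0, 0);
    }

Run on each CPU (e.g. from a CPU-offline or shutdown callback), this detaches every pvclock page before the kernel image they live in is replaced.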
-
Authored by Vitaly Kuznetsov

Various PV features (Async PF, PV EOI, steal time) work through memory shared with the hypervisor, and when we restore from hibernation we must properly tear down all these features to make sure the hypervisor doesn't write to stale locations after we jump to the previously hibernated kernel (which can try to place anything there). For secondary CPUs the job is already done by kvm_cpu_down_prepare(); register syscore ops to do the same for the boot CPU.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
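Syscore ops run on the boot CPU only, which is exactly the missing piece; a sketch with hypothetical callback names (the real teardown/re-registration bodies live elsewhere in kvm.c):

    static int kvm_pv_suspend_sketch(void)
    {
            /* tear down Async PF, PV EOI and steal time for the boot CPU */
            return 0;
    }

    static void kvm_pv_resume_sketch(void)
    {
            /* re-register the shared memory regions on resume */
    }

    static struct syscore_ops kvm_pv_syscore_ops_sketch = {
            .suspend = kvm_pv_suspend_sketch,
            .resume  = kvm_pv_resume_sketch,
    };

    /* register_syscore_ops(&kvm_pv_syscore_ops_sketch); from init code */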
-
- May 5, 2021 (1 commit)
-
-
Authored by Vitaly Kuznetsov

'pr_fmt' already has 'kvm-guest: ', so the 'KVM' prefix is redundant. "Unregister pv shared memory" is very ambiguous; it's hard to say which particular PV feature it relates to.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210414123544.1060604-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- May 3, 2021 (10 commits)
-
-
Authored by Colin Ian King

fget() can potentially return NULL, so the fput() on the error return path can cause a NULL pointer dereference. Fix this by checking for a NULL source_kvm_file before doing the fput().

Addresses-Coverity: ("Dereference null return")
Fixes: 54526d1f ("KVM: x86: Support KVM VMs sharing SEV context")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20210430170303.131924-1-colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
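The fix pattern, sketched; variable names follow the commit text, and the surrounding code is an assumption:

    source_kvm_file = fget(source_fd);
    if (!file_is_kvm(source_kvm_file)) {   /* tolerates a NULL file */
            ret = -EBADF;
            goto out_fput;
    }
    /* ... */
    out_fput:
    if (source_kvm_file)                   /* fput(NULL) would dereference NULL */
            fput(source_kvm_file);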
-
Authored by Kai Huang

The function name in the kdoc of __handle_changed_spte() should be __handle_changed_spte() itself, rather than handle_changed_spte(). Fix the typo.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210503042446.154695-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

This allows KVM to load nested state more than once without warnings.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky
* Define and use an invalid GPA (all ones) for init value of last and current nested vmcb physical addresses.

* Reset the current vmcb12 gpa to the invalid value when leaving the nested mode, similar to what is done on nested vmexit.

* Reset the last seen vmcb12 address when disabling the nested SVM, as it relies on vmcb02 fields which are freed at that point.

Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Maxim Levitsky

When forcibly leaving nested mode, we should switch to vmcb01.

Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210503125446.1353307-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Benjamin Segall

single_task_running() is usually more general than need_resched(), but CFS_BANDWIDTH throttling will use resched_task() when there is just one task, to get that task to block. This was causing long need_resched warnings and was likely allowing VMs to overrun their quota when halt polling.

Signed-off-by: Ben Segall <bsegall@google.com>
Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Message-Id: <20210429162233.116849-1-venkateshs@chromium.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
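The resulting polling condition is small; a sketch that should be close to the final helper, though treat the exact name and placement as assumptions:

    /* Keep halt-polling only while nothing else wants this CPU. */
    static inline bool kvm_vcpu_can_poll(ktime_t cur, ktime_t stop)
    {
            return single_task_running() && !need_resched() &&
                   ktime_before(cur, stop);
    }

Adding !need_resched() makes the poll loop yield when the scheduler asks the lone task to block, closing the quota-overrun window described above.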
-
Authored by Wanpeng Li

Commit ee66e453 ("KVM: lapic: Busy wait for timer to expire when using hv_timer") tries to set ktime->expired_tscdeadline by checking ktime->hv_timer_in_use, since lapic timer oneshot/periodic modes, which are emulated by the vmx preemption timer, also get advanced; they leverage the same vmx preemption timer logic as tsc-deadline mode. However, ktime->hv_timer_in_use is cleared before the apic_timer_expired() handling; let's delay this clearing to the preemption-disabled region.

Fixes: ee66e453 ("KVM: lapic: Busy wait for timer to expire when using hv_timer")
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619608082-4187-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Md Shahadat Hossain Shahin

Large pages not being created properly may result in increased memory access time. The 'lpages' kvm stat used to keep track of the current number of large pages in the system, but with the TDP MMU enabled the stat is not showing the correct number. This patch extends the lpages counter to cover the TDP case.

Signed-off-by: Md Shahadat Hossain Shahin <shahinmd@amazon.de>
Cc: Bartosz Szczepanek <bsz@amazon.de>
Message-Id: <1619783551459.35424@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Authored by Kai Huang

In kvm_tdp_mmu_map(), while iterating TDP MMU page table entries, it is possible that an SPTE has already been frozen by another thread but the freeze is not done yet, for instance, when another thread is still in the middle of zapping a large page. In this case, the !is_shadow_present_pte() check for the old SPTE in tdp_mmu_for_each_pte() may hit true, and allocating a new page table is unnecessary, since tdp_mmu_set_spte_atomic() will later return false and the page table will need to be freed. Add an is_removed_spte() check before allocating the new page table to avoid this.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20210429041226.50279-1-kai.huang@intel.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
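Inside the mapping loop the check is a one-liner; sketched in context with the surrounding loop abridged (placement within the loop is an assumption):

    tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
            /*
             * If the SPTE was frozen (marked removed) by another thread,
             * bail before allocating a page table that would only be freed.
             */
            if (is_removed_spte(iter.old_spte))
                    break;
            /* ... normal present/non-present handling ... */
    }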
-
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Authored by Linus Torvalds
Pull orangefs updates from Mike Marshall:
 "orangefs: implement orangefs_readahead

  mm/readahead.c/read_pages was quite a bit different back when I put my open-coded readahead logic into orangefs_readpage. That logic seemed to work as designed back then, it is a trainwreck now.

  This implements orangefs_readahead using the new xarray and readahead_expand features and removes all my open-coded readahead logic.

  This results in an extreme read performance improvement, these sample numbers are from my test VM:

  Here's an example of what's upstream in 5.11.8-200.fc33.x86_64:

    30+0 records in
    30+0 records out
    125829120 bytes (126 MB, 120 MiB) copied, 5.77943 s, 21.8 MB/s

  And here's this version of orangefs_readahead on top of 5.12.0-rc4:

    30+0 records in
    30+0 records out
    125829120 bytes (126 MB, 120 MiB) copied, 0.325919 s, 386 MB/s

  There are four xfstest regressions with this patch. David Howells and Matthew Wilcox have been helping me work with this code"

* tag 'for-linus-5.13-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: leave files in the page cache for a few micro seconds at least
  Orangef: implement orangefs_readahead.
-