提交 · 67f4b9969c305be515e47f809ecacfd86bd20a9c · openeuler / Kernel

11 11月, 2021 4 次提交

KVM: nVMX: Handle dynamic MSR intercept toggling · 67f4b996

由 Sean Christopherson 提交于 11月 09, 2021

Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02.  This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.

The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.

The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.

Fix both issues simultaneous as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.

Move the various accessor/mutators that are currently buried in vmx.c
into vmx.h so that they can be shared by the nested code.

Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

67f4b996

KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use · 7dfbc624

由 Sean Christopherson 提交于 11月 09, 2021

Check the current VMCS controls to determine if an MSR write will be
intercepted due to MSR bitmaps being disabled.  In the nested VMX case,
KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or
if KVM can't map L1's bitmaps for whatever reason.

Note, the bad behavior is relatively benign in the current code base as
KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and
only if L0 KVM also disables interception of an MSR, and only uses the
buggy helper for MSR_IA32_SPEC_CTRL.  Because KVM explicitly tests WRMSR
before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check
will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it
isn't strictly necessary.

Tag the fix for stable in case a future fix wants to use
msr_write_intercepted(), in which case a buggy implementation in older
kernels could prove subtly problematic.

Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7dfbc624

KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc

由 Maxim Levitsky 提交于 11月 08, 2021

KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using
standard kvm's inject_pending_event, and not via APICv/AVIC.

Since this is a debug feature, just inhibit APICv/AVIC while
KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.

Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Tested-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cae72dcc

kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool · e6cd31f1

由 Jim Mattson 提交于 11月 05, 2021

These function names sound like predicates, and they have siblings,
*is_valid_msr(), which _are_ predicates. Moreover, there are comments
that essentially warn that these functions behave unexpectedly.

Flip the polarity of the return values, so that they become
predicates, and convert the boolean result to a success/failure code
at the outer call site.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211105202058.1048757-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e6cd31f1

25 10月, 2021 3 次提交

KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol · 0d7d8449

由 David Edmondson 提交于 9月 20, 2021

When passing the failing address and size out to user space, SGX must
ensure not to trample on the earlier fields of the emulation_failure
sub-union of struct kvm_run.
Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0d7d8449

KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355

由 David Edmondson 提交于 9月 20, 2021

Should instruction emulation fail, include the VM exit reason, etc. in
the emulation_failure data passed to userspace, in order that the VMM
can report it as a debugging aid when describing the failure.
Suggested-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e615e355

KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info · 0a62a031

由 David Edmondson 提交于 9月 20, 2021

Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.
Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0a62a031

23 10月, 2021 1 次提交

KVM: VMX: Unregister posted interrupt wakeup handler on hardware unsetup · ec5a4919

由 Sean Christopherson 提交于 10月 08, 2021

Unregister KVM's posted interrupt wakeup handler during unsetup so that a
spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't
call into freed memory.

Fixes: bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211009001107.3936588-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ec5a4919

22 10月, 2021 6 次提交

KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages · 9dadfc4a

由 Sean Christopherson 提交于 10月 18, 2021

Paul pointed out the error messages when KVM fails to load are unhelpful
in understanding exactly what went wrong if userspace probes the "wrong"
module.

Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
and kvm_amd, and use the name for relevant error message when KVM fails
to load so that the user knows which module failed to load.

Opportunistically tweak the "disabled by bios" error message to clarify
that _support_ was disabled, not that the module itself was magically
disabled by BIOS.
Suggested-by: NPaul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dadfc4a

KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 · 540c7abe

由 Wanpeng Li 提交于 10月 19, 2021

SDM section 18.2.3 mentioned that:

  "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
   any general-purpose or fixed-function counters via a single WRMSR."

It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing,
the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop
global_ovf_ctrl variable.
Tested-by: NLike Xu <likexu@tencent.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

540c7abe

KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit · e099f3eb

由 Xiaoyao Li 提交于 8月 27, 2021

Per Intel SDM, RTIT_CTL_BRANCH_EN bit has no dependency on any CPUID
leaf 0x14.
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e099f3eb

KVM: VMX: Rename pt_desc.addr_range to pt_desc.num_address_ranges · f4d3a902

由 Xiaoyao Li 提交于 8月 27, 2021

To better self explain the meaning of this field and match the
PT_CAP_num_address_ranges constatn.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f4d3a902

KVM: VMX: Use precomputed vmx->pt_desc.addr_range · ba51d627

由 Xiaoyao Li 提交于 8月 27, 2021

The number of valid PT ADDR MSRs for the guest is precomputed in
vmx->pt_desc.addr_range. Use it instead of calculating again.
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba51d627

KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero · 2e6e0d68

由 Xiaoyao Li 提交于 8月 27, 2021

A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary.

Opportunistically refine the comment to call out that KVM requires
VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2e6e0d68

01 10月, 2021 1 次提交

KVM: x86: nSVM: implement nested TSC scaling · 5228eb96

由 Maxim Levitsky 提交于 9月 14, 2021

This was tested by booting a nested guest with TSC=1Ghz,
observing the clocks, and doing about 100 cycles of migration.

Note that qemu patch is needed to support migration because
of a new MSR that needs to be placed in the migration state.

The patch will be sent to the qemu mailing list soon.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210914154825.104886-14-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5228eb96

30 9月, 2021 5 次提交

KVM: VMX: Move RESET emulation to vmx_vcpu_reset() · 06692e4b

由 Sean Christopherson 提交于 9月 20, 2021

Move vCPU RESET emulation, including initializating of select VMCS state,
to vmx_vcpu_reset().  Drop the open coded "vCPU load" sequence, as
->vcpu_reset() is invoked while the vCPU is properly loaded (which is
kind of the point of ->vcpu_reset()...).  Hopefully KVM will someday
expose a dedicated RESET ioctl(), and in the meantime separating "create"
from "RESET" is a nice cleanup.

Deferring VMCS initialization is effectively a nop as it's impossible to
safely access the VMCS between the current call site and its new home, as
both the vCPU and the pCPU are put immediately after init_vmcs(), i.e.
the VMCS isn't guaranteed to be loaded.

Note, task preemption is not a problem as vmx_sched_in() _can't_ touch
the VMCS as ->sched_in() is invoked before the vCPU, and thus VMCS, is
reloaded.  I.e. the preemption path also can't consume VMCS state.

Cc: Reiji Watanabe <reijiw@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210921000303.400537-9-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

06692e4b

KVM: VMX: Drop explicit zeroing of MSR guest values at vCPU creation · d0656735

由 Sean Christopherson 提交于 9月 20, 2021

Don't zero out user return and nested MSRs during vCPU creation, and
instead rely on vcpu_vmx being zero-allocated.  Explicitly zeroing MSRs
is not wrong, and is in fact necessary if KVM ever emulates vCPU RESET
outside of vCPU creation, but zeroing only a subset of MSRs is confusing.

Poking directly into KVM's backing is also undesirable in that it doesn't
scale and is error prone.  Ideally KVM would have a common RESET path for
all MSRs, e.g. by expanding kvm_set_msr(), which would obviate the need
for this out-of-bad code (to support standalone RESET).

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210921000303.400537-8-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0656735

KVM: x86: Do not mark all registers as avail/dirty during RESET/INIT · ff8828c8

由 Sean Christopherson 提交于 9月 20, 2021

Do not blindly mark all registers as available+dirty at RESET/INIT, and
instead rely on writes to registers to go through the proper mutators or
to explicitly mark registers as dirty.  INIT in particular does not blindly
overwrite all registers, e.g. select bits in CR0 are preserved across INIT,
thus marking registers available+dirty without first reading the register
from hardware is incorrect.

In practice this is a benign bug as KVM doesn't let the guest control CR0
bits that are preserved across INIT, and all other true registers are
explicitly written during the RESET/INIT flows.  The PDPTRs and EX_INFO
"registers" are not explicitly written, but accessing those values during
RESET/INIT is nonsensical and would be a KVM bug regardless of register
caching.

Fixes: 66f7b72e ("KVM: x86: Make register state after reset conform to specification")
[sean: !!! NOT FOR STABLE !!!]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210921000303.400537-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ff8828c8

KVM: nVMX: Reset vmxon_ptr upon VMXOFF emulation. · feb3162f

由 Vitaly Kuznetsov 提交于 9月 30, 2021

Currently, 'vmx->nested.vmxon_ptr' is not reset upon VMXOFF
emulation. This is not a problem per se as we never access
it when !vmx->nested.vmxon. But this should be done to avoid
any issue in the future.

Also, initialize the vmxon_ptr when vcpu is created.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210929175154.11396-3-yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

feb3162f

KVM: nVMX: Use INVALID_GPA for pointers used in nVMX. · 64c78508

由 Yu Zhang 提交于 9月 30, 2021

Clean up nested.c and vmx.c by using INVALID_GPA instead of "-1ull",
to denote an invalid address in nested VMX. Affected addresses are
the ones of VMXON region, current VMCS, VMCS link pointer, virtual-
APIC page, ENCLS-exiting bitmap, and IO bitmap etc.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210929175154.11396-2-yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

64c78508

27 9月, 2021 1 次提交

KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue · 5c49d185

由 Zhenzhong Duan 提交于 9月 26, 2021

When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
Modifying guest_uret_msrs directly is completely broken as 'i' does not
point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an
out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior
loop. By sheer dumb luck, the fallout is limited to "only" failing to
preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is
benign as it's guaranteed to clear a bit in a guest MSR value, which are
always zero at vCPU creation on both x86-64 and i386.

Cc: stable@vger.kernel.org
Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list")
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c49d185

22 9月, 2021 7 次提交

KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit · dbab610a

由 Maxim Levitsky 提交于 9月 13, 2021

If L1 had invalid state on VM entry (can happen on SMM transactions
when we enter from real mode, straight to nested guest),

then after we load 'host' state from VMCS12, the state has to become
valid again, but since we load the segment registers with
__vmx_set_segment we weren't always updating emulation_required.

Update emulation_required explicitly at end of load_vmcs12_host_state.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dbab610a

KVM: x86: nVMX: don't fail nested VM entry on invalid guest state if !from_vmentry · c8607e4a

由 Maxim Levitsky 提交于 9月 13, 2021

It is possible that when non root mode is entered via special entry
(!from_vmentry), that is from SMM or from loading the nested state,
the L2 state could be invalid in regard to non unrestricted guest mode,
but later it can become valid.

(for example when RSM emulation restores segment registers from SMRAM)

Thus delay the check to VM entry, where we will check this and fail.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c8607e4a

KVM: x86: VMX: synthesize invalid VM exit when emulating invalid guest state · c42dec14

由 Maxim Levitsky 提交于 9月 13, 2021

Since no actual VM entry happened, the VM exit information is stale.
To avoid this, synthesize an invalid VM guest state VM exit.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c42dec14

KVM: nVMX: Filter out all unsupported controls when eVMCS was activated · 8d68bad6

由 Vitaly Kuznetsov 提交于 9月 07, 2021

Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when
enlightened VMCS is advertised. Debugging revealed there are two exposed
secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and
SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported,
as there are no corresponding fields in eVMCSv1 (see the comment above
EVMCS1_UNSUPPORTED_2NDEXEC definition).

Previously, commit 31de3d25 ("x86/kvm/hyper-v: move VMX controls
sanitization out of nested_enable_evmcs()") introduced the required
filtering mechanism for VMX MSRs but for some reason put only known
to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls
there.

Note, Windows Server 2022 seems to have gained some sanity check for VMX
MSRs: it doesn't even try to launch a guest when there's something it
doesn't like, nested_evmcs_check_controls() mechanism can't catch the
problem.

Let's be bold this time and instead of playing whack-a-mole just filter out
all unsupported controls from VMX MSRs.

Fixes: 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210907163530.110066-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8d68bad6

KVM: nVMX: Fix nested bus lock VM exit · 24a996ad

由 Chenyi Qiang 提交于 9月 14, 2021

Nested bus lock VM exits are not supported yet. If L2 triggers bus lock
VM exit, it will be directed to L1 VMM, which would cause unexpected
behavior. Therefore, handle L2's bus lock VM exits in L0 directly.

Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

24a996ad

KVM: nVMX: fix comments of handle_vmon() · ed7023a1

由 Yu Zhang 提交于 9月 09, 2021

"VMXON pointer" is saved in vmx->nested.vmxon_ptr since
commit 3573e22c ("KVM: nVMX: additional checks on
vmxon region"). Also, handle_vmptrld() & handle_vmclear()
now have logic to check the VMCS pointer against the VMXON
pointer.

So just remove the obsolete comments of handle_vmon().
Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com>
Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed7023a1

KVM: VMX: Remove defunct "nr_active_uret_msrs" field · cd36ae87

由 Sean Christopherson 提交于 9月 07, 2021

Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are
both defunct now that KVM keeps the list constant and instead explicitly
tracks which entries need to be loaded into hardware.

No functional change intended.

Fixes: ee9d22e0 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210908002401.1947049-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cd36ae87

06 9月, 2021 1 次提交

KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation · 81b4b56d

由 Maxim Levitsky 提交于 8月 26, 2021

If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

81b4b56d

21 8月, 2021 1 次提交

KVM: x86: Allow CPU to force vendor-specific TDP level · 746700d2

由 Wei Huang 提交于 8月 18, 2021

AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set.
To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level
on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force
a fixed TDP level.
Signed-off-by: NWei Huang <wei.huang2@amd.com>
Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

746700d2

13 8月, 2021 10 次提交

KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter · f7782bb8

由 Sean Christopherson 提交于 8月 10, 2021

Clear nested.pi_pending on nested VM-Enter even if L2 will run without
posted interrupts enabled. If nested.pi_pending is left set from a
previous L2, vmx_complete_nested_posted_interrupt() will pick up the
stale flag and exit to userspace with an "internal emulation error" due
the new L2 not having a valid nested.pi_desc.

Arguably, vmx_complete_nested_posted_interrupt() should first check for
posted interrupts being enabled, but it's also completely reasonable that
KVM wouldn't screw up a fundamental flag. Not to mention that the mere
existence of nested.pi_pending is a long-standing bug as KVM shouldn't
move the posted interrupt out of the IRR until it's actually processed,
e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit
with a "pending" posted interrupt. Fixing the mess is a future problem.

Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI
descriptor as an error, this was a benign bug as the null PI descriptor
effectively served as a check on PI not being enabled. Even then, the
new flow did not become problematic until KVM started checking the result
of kvm_check_nested_events().

Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing")
Fixes: 966eefb8 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210810144526.2662272-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f7782bb8

KVM: x86: Clean up redundant ROL16(val, n) macro definition · c1a527a1

由 Like Xu 提交于 8月 09, 2021

The ROL16(val, n) macro is repeatedly defined in several vmcs-related
files, and it has never been used outside the KVM context.

Let's move it to vmcs.h without any intended functional changes.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20210809093410.59304-4-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c1a527a1

KVM: x86: Move declaration of kvm_spurious_fault() to x86.h · 65297341

由 Uros Bizjak 提交于 8月 09, 2021

Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h,
it should never be called by anything other than low level KVM code.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: NUros Bizjak <ubizjak@gmail.com>
[sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210809173955.1710866-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

65297341

KVM: x86: Kill off __ex() and __kvm_handle_fault_on_reboot() · ad0577c3

由 Sean Christopherson 提交于 8月 09, 2021

Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all
VMX and SVM instructions use asm goto to handle the fault (or in the
case of VMREAD, completely custom logic).  Drop kvm_spurious_fault()'s
asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only
flow that invoked it from assembly code.

Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210809173955.1710866-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad0577c3

KVM: VMX: Hide VMCS control calculators in vmx.c · 2fba4fc1

由 Sean Christopherson 提交于 8月 10, 2021

Now that nested VMX pulls KVM's desired VMCS controls from vmcs01 instead
of re-calculating on the fly, bury the helpers that do the calcluations
in vmx.c.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2fba4fc1

KVM: VMX: Drop caching of KVM's desired sec exec controls for vmcs01 · b6247686

由 Sean Christopherson 提交于 8月 10, 2021

Remove the secondary execution controls cache now that it's effectively
dead code; it is only read immediately after it is written.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b6247686

KVM: nVMX: Pull KVM L0's desired controls directly from vmcs01 · 389ab252

由 Sean Christopherson 提交于 8月 10, 2021

When preparing controls for vmcs02, grab KVM's desired controls from
vmcs01's shadow state instead of recalculating the controls from scratch,
or in the secondary execution controls, instead of using the dedicated
cache. Calculating secondary exec controls is eye-poppingly expensive
due to the guest CPUID checks, hence the dedicated cache, but the other
calculations aren't exactly free either.

Explicitly clear several bits (x2APIC, DESC exiting, and load EFER on
exit) as appropriate as they may be set in vmcs01, whereas the previous
implementation relied on dynamic bits being cleared in the calculator.

Intentionally propagate VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL from
vmcs01 to vmcs02. Whether or not PERF_GLOBAL_CTRL is loaded depends on
whether or not perf itself is active, so unless perf stops between the
exit from L1 and entry to L2, vmcs01 will hold the desired value. This
is purely an optimization as atomic_switch_perf_msrs() will set/clear
the control as needed at VM-Enter, i.e. it avoids two extra VMWRITEs in
the case where perf is active (versus starting with the bits clear in
vmcs02, which was the previous behavior).

Cc: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

389ab252

KVM: VMX: Reset DR6 only when KVM_DEBUGREG_WONT_EXIT · 1ccb6f98

由 Paolo Bonzini 提交于 8月 10, 2021

The commit efdab992 ("KVM: x86: fix escape of guest dr6 to the host")
fixed a bug by resetting DR6 unconditionally when the vcpu being scheduled out.

But writing to debug registers is slow, and it can be visible in perf results
sometimes, even if neither the host nor the guest activate breakpoints.

Since KVM_DEBUGREG_WONT_EXIT on Intel processors is the only case
where DR6 gets the guest value, and it never happens at all on SVM,
the register can be cleared in vmx.c right after reading it.
Reported-by: NLai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ccb6f98

KVM: X86: Set host DR6 only on VMX and for KVM_DEBUGREG_WONT_EXIT · 375e28ff

由 Paolo Bonzini 提交于 8月 10, 2021

Commit c77fb5fe ("KVM: x86: Allow the guest to run with dirty debug
registers") allows the guest accessing to DRs without exiting when
KVM_DEBUGREG_WONT_EXIT and we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before the commit.

But the commit sets the hardware DR6 not only when KVM_DEBUGREG_WONT_EXIT,
but also when KVM_DEBUGREG_BP_ENABLED. The second case is unnecessary
and just leads to a more case which leaks stale DR6 to the host which has
to be resolved by unconditionally reseting DR6 in kvm_arch_vcpu_put().

Even if KVM_DEBUGREG_WONT_EXIT, however, setting the host DR6 only matters
on VMX because SVM always uses the DR6 value from the VMCB. So move this
line to vmx.c and make it conditional on KVM_DEBUGREG_WONT_EXIT.
Reported-by: NLai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

375e28ff

KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF · 18712c13

由 Sean Christopherson 提交于 8月 11, 2021

Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
in L2 or if the VM-Exit should be forwarded to L1.  The current logic fails
to account for the case where #PF is intercepted to handle
guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
L1.  At best, L1 will complain and inject the #PF back into L2.  At
worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
page faults.

Note, while the bug was technically introduced by the commit that added
support for the MAXPHYADDR madness, the shame is all on commit
a0c13434 ("KVM: VMX: introduce vmx_need_pf_intercept").

Fixes: 1dbf5d68 ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: stable@vger.kernel.org
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210812045615.3167686-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

18712c13

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功