- 11 11月, 2021 4 次提交
-
-
由 Sean Christopherson 提交于
Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2, and always update the relevant bits in vmcs02. This fixes two distinct, but intertwined bugs related to dynamic MSR bitmap modifications. The first issue is that KVM fails to enable MSR interception in vmcs02 for the FS/GS base MSRs if L1 first runs L2 with interception disabled, and later enables interception. The second issue is that KVM fails to honor userspace MSR filtering when preparing vmcs02. Fix both issues simultaneous as fixing only one of the issues (doesn't matter which) would create a mess that no one should have to bisect. Fixing only the first bug would exacerbate the MSR filtering issue as userspace would see inconsistent behavior depending on the whims of L1. Fixing only the second bug (MSR filtering) effectively requires fixing the first, as the nVMX code only knows how to transition vmcs02's bitmap from 1->0. Move the various accessor/mutators that are currently buried in vmx.c into vmx.h so that they can be shared by the nested code. Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering") Fixes: d69129b4 ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible") Cc: stable@vger.kernel.org Cc: Alexander Graf <graf@amazon.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Check the current VMCS controls to determine if an MSR write will be intercepted due to MSR bitmaps being disabled. In the nested VMX case, KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or if KVM can't map L1's bitmaps for whatever reason. Note, the bad behavior is relatively benign in the current code base as KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and only if L0 KVM also disables interception of an MSR, and only uses the buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it isn't strictly necessary. Tag the fix for stable in case a future fix wants to use msr_write_intercepted(), in which case a buggy implementation in older kernels could prove subtly problematic. Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211109013047.2041518-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected using standard kvm's inject_pending_event, and not via APICv/AVIC. Since this is a debug feature, just inhibit APICv/AVIC while KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU. Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ") Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Tested-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
These function names sound like predicates, and they have siblings, *is_valid_msr(), which _are_ predicates. Moreover, there are comments that essentially warn that these functions behave unexpectedly. Flip the polarity of the return values, so that they become predicates, and convert the boolean result to a success/failure code at the outer call site. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211105202058.1048757-1-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 10月, 2021 3 次提交
-
-
由 David Edmondson 提交于
When passing the failing address and size out to user space, SGX must ensure not to trample on the earlier fields of the emulation_failure sub-union of struct kvm_run. Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-5-david.edmondson@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Edmondson 提交于
Should instruction emulation fail, include the VM exit reason, etc. in the emulation_failure data passed to userspace, in order that the VMM can report it as a debugging aid when describing the failure. Suggested-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Edmondson 提交于
Extend the get_exit_info static call to provide the reason for the VM exit. Modify relevant trace points to use this rather than extracting the reason in the caller. Signed-off-by: NDavid Edmondson <david.edmondson@oracle.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 10月, 2021 1 次提交
-
-
由 Sean Christopherson 提交于
Unregister KVM's posted interrupt wakeup handler during unsetup so that a spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't call into freed memory. Fixes: bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211009001107.3936588-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2021 6 次提交
-
-
由 Sean Christopherson 提交于
Paul pointed out the error messages when KVM fails to load are unhelpful in understanding exactly what went wrong if userspace probes the "wrong" module. Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel and kvm_amd, and use the name for relevant error message when KVM fails to load so that the user knows which module failed to load. Opportunistically tweak the "disabled by bios" error message to clarify that _support_ was disabled, not that the module itself was magically disabled by BIOS. Suggested-by: NPaul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211018183929.897461-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
SDM section 18.2.3 mentioned that: "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of any general-purpose or fixed-function counters via a single WRMSR." It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing, the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop global_ovf_ctrl variable. Tested-by: NLike Xu <likexu@tencent.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
Per Intel SDM, RTIT_CTL_BRANCH_EN bit has no dependency on any CPUID leaf 0x14. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
To better self explain the meaning of this field and match the PT_CAP_num_address_ranges constatn. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
The number of valid PT ADDR MSRs for the guest is precomputed in vmx->pt_desc.addr_range. Use it instead of calculating again. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary. Opportunistically refine the comment to call out that KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 10月, 2021 1 次提交
-
-
由 Maxim Levitsky 提交于
This was tested by booting a nested guest with TSC=1Ghz, observing the clocks, and doing about 100 cycles of migration. Note that qemu patch is needed to support migration because of a new MSR that needs to be placed in the migration state. The patch will be sent to the qemu mailing list soon. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-14-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 9月, 2021 5 次提交
-
-
由 Sean Christopherson 提交于
Move vCPU RESET emulation, including initializating of select VMCS state, to vmx_vcpu_reset(). Drop the open coded "vCPU load" sequence, as ->vcpu_reset() is invoked while the vCPU is properly loaded (which is kind of the point of ->vcpu_reset()...). Hopefully KVM will someday expose a dedicated RESET ioctl(), and in the meantime separating "create" from "RESET" is a nice cleanup. Deferring VMCS initialization is effectively a nop as it's impossible to safely access the VMCS between the current call site and its new home, as both the vCPU and the pCPU are put immediately after init_vmcs(), i.e. the VMCS isn't guaranteed to be loaded. Note, task preemption is not a problem as vmx_sched_in() _can't_ touch the VMCS as ->sched_in() is invoked before the vCPU, and thus VMCS, is reloaded. I.e. the preemption path also can't consume VMCS state. Cc: Reiji Watanabe <reijiw@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210921000303.400537-9-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Don't zero out user return and nested MSRs during vCPU creation, and instead rely on vcpu_vmx being zero-allocated. Explicitly zeroing MSRs is not wrong, and is in fact necessary if KVM ever emulates vCPU RESET outside of vCPU creation, but zeroing only a subset of MSRs is confusing. Poking directly into KVM's backing is also undesirable in that it doesn't scale and is error prone. Ideally KVM would have a common RESET path for all MSRs, e.g. by expanding kvm_set_msr(), which would obviate the need for this out-of-bad code (to support standalone RESET). No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210921000303.400537-8-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Do not blindly mark all registers as available+dirty at RESET/INIT, and instead rely on writes to registers to go through the proper mutators or to explicitly mark registers as dirty. INIT in particular does not blindly overwrite all registers, e.g. select bits in CR0 are preserved across INIT, thus marking registers available+dirty without first reading the register from hardware is incorrect. In practice this is a benign bug as KVM doesn't let the guest control CR0 bits that are preserved across INIT, and all other true registers are explicitly written during the RESET/INIT flows. The PDPTRs and EX_INFO "registers" are not explicitly written, but accessing those values during RESET/INIT is nonsensical and would be a KVM bug regardless of register caching. Fixes: 66f7b72e ("KVM: x86: Make register state after reset conform to specification") [sean: !!! NOT FOR STABLE !!!] Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20210921000303.400537-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Currently, 'vmx->nested.vmxon_ptr' is not reset upon VMXOFF emulation. This is not a problem per se as we never access it when !vmx->nested.vmxon. But this should be done to avoid any issue in the future. Also, initialize the vmxon_ptr when vcpu is created. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210929175154.11396-3-yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
Clean up nested.c and vmx.c by using INVALID_GPA instead of "-1ull", to denote an invalid address in nested VMX. Affected addresses are the ones of VMXON region, current VMCS, VMCS link pointer, virtual- APIC page, ENCLS-exiting bitmap, and IO bitmap etc. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210929175154.11396-2-yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 9月, 2021 1 次提交
-
-
由 Zhenzhong Duan 提交于
When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry, clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i]. Modifying guest_uret_msrs directly is completely broken as 'i' does not point at the MSR_IA32_TSX_CTRL entry. In fact, it's guaranteed to be an out-of-bounds accesses as is always set to kvm_nr_uret_msrs in a prior loop. By sheer dumb luck, the fallout is limited to "only" failing to preserve the host's TSX_CTRL_CPUID_CLEAR. The out-of-bounds access is benign as it's guaranteed to clear a bit in a guest MSR value, which are always zero at vCPU creation on both x86-64 and i386. Cc: stable@vger.kernel.org Fixes: 8ea8b8d6 ("KVM: VMX: Use common x86's uret MSR list as the one true list") Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 9月, 2021 7 次提交
-
-
由 Maxim Levitsky 提交于
If L1 had invalid state on VM entry (can happen on SMM transactions when we enter from real mode, straight to nested guest), then after we load 'host' state from VMCS12, the state has to become valid again, but since we load the segment registers with __vmx_set_segment we weren't always updating emulation_required. Update emulation_required explicitly at end of load_vmcs12_host_state. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
It is possible that when non root mode is entered via special entry (!from_vmentry), that is from SMM or from loading the nested state, the L2 state could be invalid in regard to non unrestricted guest mode, but later it can become valid. (for example when RSM emulation restores segment registers from SMRAM) Thus delay the check to VM entry, where we will check this and fail. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-7-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Maxim Levitsky 提交于
Since no actual VM entry happened, the VM exit information is stale. To avoid this, synthesize an invalid VM guest state VM exit. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-6-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Windows Server 2022 with Hyper-V role enabled failed to boot on KVM when enlightened VMCS is advertised. Debugging revealed there are two exposed secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported, as there are no corresponding fields in eVMCSv1 (see the comment above EVMCS1_UNSUPPORTED_2NDEXEC definition). Previously, commit 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") introduced the required filtering mechanism for VMX MSRs but for some reason put only known to be problematic (and not full EVMCS1_UNSUPPORTED_* lists) controls there. Note, Windows Server 2022 seems to have gained some sanity check for VMX MSRs: it doesn't even try to launch a guest when there's something it doesn't like, nested_evmcs_check_controls() mechanism can't catch the problem. Let's be bold this time and instead of playing whack-a-mole just filter out all unsupported controls from VMX MSRs. Fixes: 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210907163530.110066-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
Nested bus lock VM exits are not supported yet. If L2 triggers bus lock VM exit, it will be directed to L1 VMM, which would cause unexpected behavior. Therefore, handle L2's bus lock VM exits in L0 directly. Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit") Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
"VMXON pointer" is saved in vmx->nested.vmxon_ptr since commit 3573e22c ("KVM: nVMX: additional checks on vmxon region"). Also, handle_vmptrld() & handle_vmclear() now have logic to check the VMCS pointer against the VMXON pointer. So just remove the obsolete comments of handle_vmon(). Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are both defunct now that KVM keeps the list constant and instead explicitly tracks which entries need to be loaded into hardware. No functional change intended. Fixes: ee9d22e0 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210908002401.1947049-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 9月, 2021 1 次提交
-
-
由 Maxim Levitsky 提交于
If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 8月, 2021 1 次提交
-
-
由 Wei Huang 提交于
AMD future CPUs will require a 5-level NPT if host CR4.LA57 is set. To prevent kvm_mmu_get_tdp_level() from incorrectly changing NPT level on behalf of CPUs, add a new parameter in kvm_configure_mmu() to force a fixed TDP level. Signed-off-by: NWei Huang <wei.huang2@amd.com> Message-Id: <20210818165549.3771014-2-wei.huang2@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 8月, 2021 10 次提交
-
-
由 Sean Christopherson 提交于
Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb8 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable@vger.kernel.org Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210810144526.2662272-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Like Xu 提交于
The ROL16(val, n) macro is repeatedly defined in several vmcs-related files, and it has never been used outside the KVM context. Let's move it to vmcs.h without any intended functional changes. Signed-off-by: NLike Xu <likexu@tencent.com> Message-Id: <20210809093410.59304-4-likexu@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Uros Bizjak 提交于
Move the declaration of kvm_spurious_fault() to KVM's "private" x86.h, it should never be called by anything other than low level KVM code. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NUros Bizjak <ubizjak@gmail.com> [sean: rebased to a series without __ex()/__kvm_handle_fault_on_reboot()] Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210809173955.1710866-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove the __kvm_handle_fault_on_reboot() and __ex() macros now that all VMX and SVM instructions use asm goto to handle the fault (or in the case of VMREAD, completely custom logic). Drop kvm_spurious_fault()'s asmlinkage annotation as __kvm_handle_fault_on_reboot() was the only flow that invoked it from assembly code. Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210809173955.1710866-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Now that nested VMX pulls KVM's desired VMCS controls from vmcs01 instead of re-calculating on the fly, bury the helpers that do the calcluations in vmx.c. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-5-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove the secondary execution controls cache now that it's effectively dead code; it is only read immediately after it is written. No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
When preparing controls for vmcs02, grab KVM's desired controls from vmcs01's shadow state instead of recalculating the controls from scratch, or in the secondary execution controls, instead of using the dedicated cache. Calculating secondary exec controls is eye-poppingly expensive due to the guest CPUID checks, hence the dedicated cache, but the other calculations aren't exactly free either. Explicitly clear several bits (x2APIC, DESC exiting, and load EFER on exit) as appropriate as they may be set in vmcs01, whereas the previous implementation relied on dynamic bits being cleared in the calculator. Intentionally propagate VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL from vmcs01 to vmcs02. Whether or not PERF_GLOBAL_CTRL is loaded depends on whether or not perf itself is active, so unless perf stops between the exit from L1 and entry to L2, vmcs01 will hold the desired value. This is purely an optimization as atomic_switch_perf_msrs() will set/clear the control as needed at VM-Enter, i.e. it avoids two extra VMWRITEs in the case where perf is active (versus starting with the bits clear in vmcs02, which was the previous behavior). Cc: Zeng Guang <guang.zeng@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210810171952.2758100-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The commit efdab992 ("KVM: x86: fix escape of guest dr6 to the host") fixed a bug by resetting DR6 unconditionally when the vcpu being scheduled out. But writing to debug registers is slow, and it can be visible in perf results sometimes, even if neither the host nor the guest activate breakpoints. Since KVM_DEBUGREG_WONT_EXIT on Intel processors is the only case where DR6 gets the guest value, and it never happens at all on SVM, the register can be cleared in vmx.c right after reading it. Reported-by: NLai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Commit c77fb5fe ("KVM: x86: Allow the guest to run with dirty debug registers") allows the guest accessing to DRs without exiting when KVM_DEBUGREG_WONT_EXIT and we need to ensure that they are synchronized on entry to the guest---including DR6 that was not synced before the commit. But the commit sets the hardware DR6 not only when KVM_DEBUGREG_WONT_EXIT, but also when KVM_DEBUGREG_BP_ENABLED. The second case is unnecessary and just leads to a more case which leaks stale DR6 to the host which has to be resolved by unconditionally reseting DR6 in kvm_arch_vcpu_put(). Even if KVM_DEBUGREG_WONT_EXIT, however, setting the host DR6 only matters on VMX because SVM always uses the DR6 value from the VMCB. So move this line to vmx.c and make it conditional on KVM_DEBUGREG_WONT_EXIT. Reported-by: NLai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults. Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c13434 ("KVM: VMX: introduce vmx_need_pf_intercept"). Fixes: 1dbf5d68 ("KVM: VMX: Add guest physical address check in EPT violation and misconfig") Cc: stable@vger.kernel.org Cc: Peter Shier <pshier@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210812045615.3167686-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-