- 18 June 2021, 38 commits
-
-
Committed by Sean Christopherson
Now that .post_leave_smm() is gone, drop "pre_" from the remaining helpers. The helpers aren't invoked purely before SMI/RSM processing, e.g. both helpers are invoked after state is snapshotted (from regs or SMRAM), and the RSM helper is invoked after some amount of register state has been stuffed. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop the .post_leave_smm() emulator callback, which at this point is just a wrapper to kvm_mmu_reset_context(). The manual context reset is unnecessary, because unlike enter_smm(), which calls vendor MSR/CR helpers directly, em_rsm() bounces through the KVM helpers, e.g. kvm_set_cr4(), which are responsible for processing side effects. em_rsm() is already subtly relying on this behavior as it doesn't manually do kvm_update_cpuid_runtime(), e.g. to recognize CR4.OSXSAVE changes. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Rename the SMM tracepoint, which handles both entering and exiting SMM, from kvm_enter_smm to kvm_smm_transition. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Invoke the "entering SMM" tracepoint from kvm_smm_changed() instead of enter_smm(), effectively moving it from before reading vCPU state to after reading state (but still before writing it to SMRAM!). The primary motivation is to consolidate code, but calling the tracepoint from kvm_smm_changed() also makes its invocation consistent with respect to SMI and RSM, and with respect to KVM_SET_VCPU_EVENTS (which previously only invoked the tracepoint when forcing the vCPU out of SMM). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Move the core of the SMM hflags modifications into kvm_smm_changed() and use kvm_smm_changed() in enter_smm(). Clear HF_SMM_INSIDE_NMI_MASK for leaving SMM but do not set it for entering SMM. If the vCPU is executing outside of SMM, the flag should unequivocally be cleared, e.g. this technically fixes a benign bug where the flag could be left set after KVM_SET_VCPU_EVENTS, but the reverse is not true as NMI blocking depends on pre-SMM state or userspace input. Note, this adds an extra kvm_mmu_reset_context() to enter_smm(). The extra/early reset isn't strictly necessary, and in a way can never be necessary since the vCPU/MMU context is in a half-baked state until the final context reset at the end of the function. But, enter_smm() is not a hot path, and exploding on an invalid root_hpa is probably better than having a stale SMM flag in the MMU role; it's at least no worse. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
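For illustration, a minimal sketch of what kvm_smm_changed() might look like once it owns the hflags updates, reconstructed from the description above; the exact code in the patch may differ.

```c
/* Sketch only: consolidate SMM hflags handling in one place. */
static void kvm_smm_changed(struct kvm_vcpu *vcpu, bool entering_smm)
{
	if (entering_smm) {
		vcpu->arch.hflags |= HF_SMM_MASK;
	} else {
		/* Outside of SMM, both SMM flags must unequivocally be clear. */
		vcpu->arch.hflags &= ~(HF_SMM_MASK | HF_SMM_INSIDE_NMI_MASK);
	}

	/* The SMM flag is part of the MMU role, so the context must be reset. */
	kvm_mmu_reset_context(vcpu);
}
```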
-
Committed by Sean Christopherson
Move RSM emulation's call to kvm_smm_changed() from .post_leave_smm() to .exiting_smm(), leaving behind the MMU context reset. The primary motivation is to allow for future cleanup, but this also fixes a bug of sorts by queueing KVM_REQ_EVENT even if RSM causes shutdown, e.g. to let an INIT wake the vCPU from shutdown. Of course, KVM doesn't properly emulate a shutdown state, e.g. KVM doesn't block SMIs after shutdown, and immediately exits to userspace, so the event request is a moot point in practice. Moving kvm_smm_changed() also moves the RSM tracepoint. This isn't strictly necessary, but will allow consolidating the SMI and RSM tracepoints in a future commit (by also moving the SMI tracepoint). Invoking the tracepoint before loading SMRAM state also means the SMBASE reported in the tracepoint will point at the state that will be used for RSM, as opposed to the SMBASE _after_ RSM completes, which is arguably a good thing if the tracepoint is being used to debug an RSM/SMM issue. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Replace the .set_hflags() emulator hook with a dedicated .exiting_smm(), moving the SMM and SMM_INSIDE_NMI flag handling out of the emulator in the process. This is a step towards consolidating much of the logic in kvm_smm_changed(), including the SMM hflags updates. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Use the recently introduced KVM_REQ_TRIPLE_FAULT to properly emulate shutdown if RSM from SMM fails. Note, entering shutdown after clearing the SMM flag and restoring NMI blocking is architecturally correct with respect to AMD's APM, which KVM also uses for SMRAM layout and RSM NMI blocking behavior. The APM says: "An RSM causes a processor shutdown if an invalid-state condition is found in the SMRAM state-save area. Only an external reset, external processor-initialization, or non-maskable external interrupt (NMI) can cause the processor to leave the shutdown state." Of note is processor-initialization (INIT) as a valid shutdown wake event, as INIT is blocked by SMM, implying that entering shutdown also forces the CPU out of SMM. For recent Intel CPUs, restoring NMI blocking is technically wrong, but so is restoring NMI blocking in the first place, and Intel's RSM "architecture" is such a mess that just about anything is allowed and can be justified as micro-architectural behavior. Per the SDM: "On Pentium 4 and later processors, shutdown will inhibit INTR and A20M but will not change any of the other inhibits. On these processors, NMIs will be inhibited if no action is taken in the SMI handler to uninhibit them (see Section 34.8)." where Section 34.8 says: "When the processor enters SMM while executing an NMI handler, the processor saves the SMRAM state save map but does not save the attribute to keep NMI interrupts disabled. Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit of SMM even though the previous NMI handler has still not completed." I.e. RSM unconditionally unblocks NMI, but shutdown on RSM does not, which is in direct contradiction of KVM's behavior. But, as mentioned above, KVM follows AMD architecture and restores NMI blocking on RSM, so that micro-architectural detail is already lost. And for Pentium-era CPUs, SMI# can break shutdown, meaning that at least some Intel CPUs fully leave SMM when entering shutdown: "In the shutdown state, Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted. While Pentium family processors recognize the SMI# signal in shutdown state, P6 family and Intel486 processors do not." In other words, the fact that Intel CPUs have implemented the two extremes gives KVM carte blanche when it comes to honoring Intel's architecture for handling shutdown during RSM. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-3-seanjc@google.com> [Return X86EMUL_CONTINUE after triple fault. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
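A rough sketch of the resulting RSM failure path, with rsm_load_state() standing in for the real 32/64-bit state loaders and the triple_fault emulator op assumed from the description above:

```c
/* Sketch only: pend a triple fault (shutdown) instead of failing emulation. */
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
	int ret;

	/* SMM checks and SMRAM state restoration elided. */
	ret = rsm_load_state(ctxt);	/* stand-in for the real state loaders */
	if (ret != X86EMUL_CONTINUE)
		ctxt->ops->triple_fault(ctxt);	/* invalid SMRAM state => shutdown */

	return X86EMUL_CONTINUE;
}

/* On the KVM side, the op simply queues the new request: */
static void emulator_triple_fault(struct x86_emulate_ctxt *ctxt)
{
	kvm_make_request(KVM_REQ_TRIPLE_FAULT, emul_to_vcpu(ctxt));
}
```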
-
Committed by Vitaly Kuznetsov
Now that APICv/AVIC enablement is kept in the common 'enable_apicv' variable, there's no need to call kvm_apicv_init() from vendor specific code. No functional change intended. Reviewed-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882c-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Vitaly Kuznetsov
Unify VMX and SVM code by moving APICv/AVIC enablement tracking to the common 'enable_apicv' variable. Note: unlike APICv, AVIC is disabled by default. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210609150911.1471882c-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sergey Senozhatsky
Implement PM hibernation/suspend prepare notifiers so that KVM can reliably set PVCLOCK_GUEST_STOPPED on VCPUs and properly suspend VMs. Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Message-Id: <20210606021045.14159-2-senozhatsky@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
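For illustration, a hedged sketch of how such a notifier can be wired up per VM; the callback and arch-hook names here are assumptions, not the patch's actual identifiers.

```c
#include <linux/notifier.h>
#include <linux/suspend.h>

/* Sketch: let arch code mark vCPU clocks as guest-stopped before suspend. */
static int kvm_pm_notifier_call(struct notifier_block *bl,
				unsigned long state, void *unused)
{
	struct kvm *kvm = container_of(bl, struct kvm, pm_notifier);

	switch (state) {
	case PM_HIBERNATION_PREPARE:
	case PM_SUSPEND_PREPARE:
		return kvm_arch_pm_notifier(kvm, state);  /* hypothetical arch hook */
	}
	return NOTIFY_DONE;
}

static void kvm_init_pm_notifier(struct kvm *kvm)
{
	kvm->pm_notifier.notifier_call = kvm_pm_notifier_call;
	register_pm_notifier(&kvm->pm_notifier);
}
```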
-
Committed by Jim Mattson
Don't allow posted interrupts to modify a stale posted interrupt descriptor (including the initial value of 0). Empirical tests on real hardware reveal that a posted interrupt descriptor referencing an unbacked address has PCI bus error semantics (reads as all 1's; writes are ignored). However, KVM can't distinguish unbacked addresses from device-backed (MMIO) addresses, so it should really ask userspace for an MMIO completion. That's overly complicated, so just punt with KVM_INTERNAL_ERROR. Don't return the error until the posted interrupt descriptor is actually accessed. We don't want to break the existing kvm-unit-tests that assume they can launch an L2 VM with a posted interrupt descriptor that references MMIO space in L1. Fixes: 6beb7bd5 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-8-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson
When the kernel has no mapping for the vmcs02 virtual APIC page, userspace MMIO completion is necessary to process nested posted interrupts. This is not a configuration that KVM supports. Rather than silently ignoring the problem, try to exit to userspace with KVM_INTERNAL_ERROR. Note that the event that triggers this error is consumed as a side-effect of a call to kvm_check_nested_events. On some paths (notably through kvm_vcpu_check_block), the error is dropped. In any case, this is an incremental improvement over always ignoring the error. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-7-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson
No functional change intended. At present, the only negative value returned by kvm_check_nested_events is -EBUSY. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-6-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson
No functional change intended. At present, 'r' will always be -EBUSY on a control transfer to the 'out' label. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-5-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson
No functional change intended. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20210604172611.281819-4-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jim Mattson
A survey of the callsites reveals that they all ensure the vCPU is in guest mode before calling kvm_check_nested_events. Remove this dead code so that the only negative value this function returns (at the moment) is -EBUSY. Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20210604172611.281819-2-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
Calculate the TSC offset and multiplier on nested transitions and expose the TSC scaling feature to L1. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-11-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
Currently vmx_vcpu_load_vmcs() writes the TSC_MULTIPLIER field of the VMCS every time the VMCS is loaded. Instead of doing this, set this field from common code on initialization and whenever the scaling ratio changes. Additionally remove vmx->current_tsc_ratio. This field is redundant, as vcpu->arch.tsc_scaling_ratio already tracks the current TSC scaling ratio. The vmx->current_tsc_ratio field is only used for avoiding unnecessary writes, but it is no longer needed after removing the code from the VMCS load path. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Message-Id: <20210607105438.16541-1-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
The write_l1_tsc_offset() callback has a misleading name. It does not set L1's TSC offset, it rather updates the current TSC offset, which might be different if a nested guest is executing. Additionally, both the vmx and svm implementations use the same logic for calculating the current TSC before writing it to hardware. Rename the function and move the common logic to the caller. The vmx/svm specific code now merely sets the given offset to the corresponding hardware structure. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-9-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
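A sketch of what the common caller might look like after the rename, assuming the get_l2_tsc_offset/get_l2_tsc_multiplier hooks and nested-offset helper introduced elsewhere in this series; not copied from the patch.

```c
/* Sketch only: the vendor hook now just writes the offset to hardware. */
static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
{
	vcpu->arch.l1_tsc_offset = l1_offset;

	if (is_guest_mode(vcpu))
		/* Combine L1's offset with the L1->L2 offset/multiplier. */
		vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
				l1_offset,
				kvm_x86_ops.get_l2_tsc_offset(vcpu),
				kvm_x86_ops.get_l2_tsc_multiplier(vcpu));
	else
		vcpu->arch.tsc_offset = l1_offset;

	kvm_x86_ops.write_tsc_offset(vcpu, vcpu->arch.tsc_offset);
}
```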
-
Committed by Ilias Stamatis
When L2 is entered we need to "merge" the TSC multiplier and TSC offset values of 01 and 12 together. The merging is done using the following equations: offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12, and mult_02 = (mult_01 * mult_12) >> shift_bits, where shift_bits is kvm_tsc_scaling_ratio_frac_bits. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-8-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
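For illustration, the two equations expressed with the kernel's 128-bit multiply-shift helpers; the function names here are illustrative, not the patch's.

```c
#include <linux/types.h>
#include <linux/math64.h>

/* mult_02 = (mult_01 * mult_12) >> shift_bits */
static u64 nested_tsc_multiplier(u64 mult_01, u64 mult_12, unsigned int frac_bits)
{
	return mul_u64_u64_shr(mult_01, mult_12, frac_bits);
}

/* offset_02 = ((offset_01 * mult_12) >> shift_bits) + offset_12 */
static u64 nested_tsc_offset(u64 offset_01, u64 offset_12, u64 mult_12,
			     unsigned int frac_bits)
{
	/* The offset is signed, so use the signed multiply-shift variant. */
	return mul_s64_u64_shr((s64)offset_01, mult_12, frac_bits) + offset_12;
}
```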
-
Committed by Ilias Stamatis
In order to implement as much of the nested TSC scaling logic as possible in common code, we need these vendor callbacks for retrieving the TSC offset and the TSC multiplier that L1 has set for L2. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-7-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
This is required for supporting nested TSC scaling. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-6-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
Sometimes kvm_scale_tsc() needs to use the current scaling ratio and other times (like when reading the TSC from user space) it needs to use L1's scaling ratio. Have the caller specify this by passing the ratio as a parameter. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-5-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
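A minimal sketch of the reworked helper, assuming the ratio is simply applied as a fixed-point multiply when it differs from the default; reconstructed from the description rather than the patch.

```c
/*
 * Sketch only: the caller passes the ratio explicitly, e.g. the current
 * (possibly L2) ratio or vcpu->arch.l1_tsc_scaling_ratio for userspace reads.
 */
u64 kvm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc, u64 ratio)
{
	u64 _tsc = tsc;

	if (ratio != kvm_default_tsc_scaling_ratio)
		_tsc = mul_u64_u64_shr(tsc, ratio,
				       kvm_tsc_scaling_ratio_frac_bits);

	return _tsc;
}
```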
-
Committed by Ilias Stamatis
All existing code uses kvm_compute_tsc_offset() passing L1 TSC values to it. Let's document this by renaming it to kvm_compute_l1_tsc_offset(). Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-4-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ilias Stamatis
Store L1's scaling ratio in the kvm_vcpu_arch struct like we already do for L1's TSC offset. This allows for easy save/restore when we enter and then exit the nested guest. Signed-off-by: Ilias Stamatis <ilstam@amazon.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210526184418.28881-3-ilstam@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
If the TDP MMU is in use, wait to allocate the rmaps until the shadow MMU is actually used (i.e., when a nested VM is launched). This saves memory equal to 0.2% of guest memory in cases where the TDP MMU is used and there are no nested guests involved. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-8-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
If only the TDP MMU is being used to manage the memory mappings for a VM, then many rmap operations can be skipped as they are guaranteed to be no-ops. This saves some time that would be spent on the rmap operations. It also avoids acquiring the MMU lock in write mode for many operations. This makes it safe to run the VM without rmaps allocated, when only using the TDP MMU, and sets the stage for waiting to allocate the rmaps until they're needed. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-7-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
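For illustration, a sketch of the pattern this enables, modeled on a slot write-protection path; kvm_memslots_have_rmaps() is assumed from the description, and the rmap/TDP bodies are elided.

```c
/* Sketch only: skip rmap-based (shadow MMU) work when rmaps don't exist. */
static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
{
	return kvm->arch.memslots_have_rmaps;
}

void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
				      struct kvm_memory_slot *memslot,
				      int start_level)
{
	if (kvm_memslots_have_rmaps(kvm)) {
		write_lock(&kvm->mmu_lock);
		/* Walk the rmaps and write-protect SPTEs (legacy/shadow MMU). */
		write_unlock(&kvm->mmu_lock);
	}

	if (is_tdp_mmu_enabled(kvm)) {
		read_lock(&kvm->mmu_lock);
		/* Write-protect the slot via the TDP MMU iterators. */
		read_unlock(&kvm->mmu_lock);
	}
}
```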
-
Committed by Ben Gardon
Add a field to control whether new memslots should have rmaps allocated for them. As of this change, it's not safe to skip allocating rmaps, so the field is always set to allocate rmaps. Future changes will make it safe to operate without rmaps, using the TDP MMU. Then further changes will allow the rmaps to be allocated lazily when needed for nested operation. No functional change expected. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Small refactor to facilitate allocating rmaps for all memslots at once. No functional change expected. Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Ben Gardon
Small code deduplication. No functional change expected. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210518173414.450044-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Keqian Zhu
Currently, when dirty logging is started in initially-all-set mode, we write protect huge pages to prepare for splitting them into 4K pages, and leave normal pages untouched as the logging will be enabled lazily as dirty bits are cleared. However, enabling dirty logging lazily is also feasible for huge pages. This not only reduces the time needed to start dirty logging, but also greatly reduces the side effects on the guest when there is a high dirty rate. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210429034115.35560-3-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Keqian Zhu
Prepare for lazily write protecting large pages during dirty log tracking, for which we will only need to write protect gfns at large-page granularity. No functional or performance change expected. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20210429034115.35560-2-zhukeqian1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Siddharth Chandrasekaran
Now that kvm_hv_flush_tlb() has been patched to support XMM hypercall inputs, we can start advertising this feature to guests. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <e63fc1c61dd2efecbefef239f4f0a598bd552750.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Siddharth Chandrasekaran
Hyper-V supports the use of XMM registers to perform fast hypercalls. This allows guests to take advantage of the improved performance of the fast hypercall interface even though a hypercall may require more than (the current maximum of) two input registers. The XMM fast hypercall interface uses six additional XMM registers (XMM0 to XMM5) to allow the guest to pass an input parameter block of up to 112 bytes. Add framework to read from XMM registers in kvm_hv_hypercall() and use the additional hypercall inputs from XMM registers in kvm_hv_flush_tlb() when possible. Cc: Alexander Graf <graf@amazon.com> Co-developed-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <fc62edad33f1920fe5c74dde47d7d0b4275a9012.1622019134.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
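A hedged sketch of the XMM input gathering; the accessor and field names below are assumptions based on the description, not the patch's actual identifiers. XMM0..XMM5 contribute 96 bytes that, together with the two GPR parameters, form the up-to-112-byte input block.

```c
#define KVM_HV_XMM_INPUT_REGISTERS 6	/* XMM0..XMM5 */

/* Sketch only: capture XMM inputs up front for fast hypercalls. */
static void kvm_hv_hypercall_read_xmm(struct kvm_hv_hcall *hc)
{
	int reg;

	kvm_fpu_get();
	for (reg = 0; reg < KVM_HV_XMM_INPUT_REGISTERS; reg++)
		kvm_read_sse_reg(reg, &hc->xmm[reg]);	/* assumed accessor */
	kvm_fpu_put();
}
```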
-
Committed by Siddharth Chandrasekaran
As of now there are 7 parameters (and flags) that are used in various Hyper-V hypercall handlers. There are 6 more input/output parameters passed from XMM registers which are to be added in an upcoming patch. To make passing arguments to the handlers more readable, capture all these parameters into a single structure. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <273f7ed510a1f6ba177e61b73a5c7bfbee4a4a87.1622019133.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
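For illustration, a sketch of such a consolidated parameter structure; the field names are assumptions derived from the description rather than the actual patch.

```c
/* Sketch only: one struct instead of 7+ loose arguments per handler. */
struct kvm_hv_hcall {
	u64 param;	/* raw hypercall input value */
	u64 ingpa;	/* input parameter GPA (or first fast-call register) */
	u64 outgpa;	/* output parameter GPA */
	u16 code;	/* hypercall code */
	u16 rep_cnt;	/* rep count for rep hypercalls */
	u16 rep_idx;	/* rep start index */
	bool fast;	/* fast (register-based) hypercall */
	bool rep;	/* rep hypercall */
};
```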
-
Committed by Siddharth Chandrasekaran
Hyper-V XMM fast hypercalls use XMM registers to pass input/output parameters. To access these, hyperv.c can reuse some FPU register accessors defined in emulator.c. Move them to a common location so both can access them. While at it, reorder the parameters of these accessor methods to make them more readable. Cc: Alexander Graf <graf@amazon.com> Cc: Evgeny Iakovlev <eyakovl@amazon.de> Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de> Message-Id: <01a85a6560714d4d3637d3d86e5eba65073318fa.1622019133.git.sidcha@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Shaokun Zhang
Function 'is_nx_huge_page_enabled' is called only by kvm/mmu, so make it an inline function and remove the unnecessary declaration. Cc: Ben Gardon <bgardon@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Message-Id: <1622102271-63107-1-git-send-email-zhangshaokun@hisilicon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 June 2021, 2 commits
-
-
Committed by Sean Christopherson
Calculate and check the full mmu_role when initializing the MMU context for the nested MMU, where "full" means the bits and pieces of the role that aren't handled by kvm_calc_mmu_role_common(). While the nested MMU isn't used for shadow paging, things like the number of levels in the guest's page tables are surprisingly important when walking the guest page tables. Failure to reinitialize the nested MMU context if L2's paging mode changes can result in unexpected and/or missed page faults, and likely other explosions. E.g. if an L1 vCPU is running both a 32-bit PAE L2 and a 64-bit L2, the "common" role calculation will yield the same role for both L2s. If the 64-bit L2 is run after the 32-bit PAE L2, L0 will fail to reinitialize the nested MMU context, ultimately resulting in a bad walk of L2's page tables as the MMU will still have a guest root_level of PT32E_ROOT_LEVEL.
WARNING: CPU: 4 PID: 167334 at arch/x86/kvm/vmx/vmx.c:3075 ept_save_pdptrs+0x15/0xe0 [kvm_intel]
Modules linked in: kvm_intel
CPU: 4 PID: 167334 Comm: CPU 3/KVM Not tainted 5.13.0-rc1-d849817d5673-reqs #185
Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
RIP: 0010:ept_save_pdptrs+0x15/0xe0 [kvm_intel]
Code: <0f> 0b c3 f6 87 d8 02 00f
RSP: 0018:ffffbba702dbba00 EFLAGS: 00010202
RAX: 0000000000000011 RBX: 0000000000000002 RCX: ffffffff810a2c08
RDX: ffff91d7bc30acc0 RSI: 0000000000000011 RDI: ffff91d7bc30a600
RBP: ffff91d7bc30a600 R08: 0000000000000010 R09: 0000000000000007
R10: 0000000000000000 R11: 0000000000000000 R12: ffff91d7bc30a600
R13: ffff91d7bc30acc0 R14: ffff91d67c123460 R15: 0000000115d7e005
FS:  00007fe8e9ffb700(0000) GS:ffff91d90fb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000029f15a001 CR4: 00000000001726e0
Call Trace:
 kvm_pdptr_read+0x3a/0x40 [kvm]
 paging64_walk_addr_generic+0x327/0x6a0 [kvm]
 paging64_gva_to_gpa_nested+0x3f/0xb0 [kvm]
 kvm_fetch_guest_virt+0x4c/0xb0 [kvm]
 __do_insn_fetch_bytes+0x11a/0x1f0 [kvm]
 x86_decode_insn+0x787/0x1490 [kvm]
 x86_decode_emulated_instruction+0x58/0x1e0 [kvm]
 x86_emulate_instruction+0x122/0x4f0 [kvm]
 vmx_handle_exit+0x120/0x660 [kvm_intel]
 kvm_arch_vcpu_ioctl_run+0xe25/0x1cb0 [kvm]
 kvm_vcpu_ioctl+0x211/0x5a0 [kvm]
 __x64_sys_ioctl+0x83/0xb0
 do_syscall_64+0x40/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Fixes: bf627a92 ("x86/kvm/mmu: check if MMU reconfiguration is needed in init_kvm_nested_mmu()") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210610220026.1364486-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
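A sketch of the essence of the fix, reconstructed from the description (helper names assumed): compute the full role, including the guest paging level, and skip reinitialization only when that full role is unchanged.

```c
/* Sketch only: compare the *full* nested MMU role before skipping reinit. */
static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
{
	union kvm_mmu_role new_role = kvm_calc_nested_mmu_role(vcpu);
	struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;

	if (new_role.as_u64 == g_context->mmu_role.as_u64)
		return;

	g_context->mmu_role.as_u64 = new_role.as_u64;
	/* Re-derive gva_to_gpa, root_level, etc. for the new paging mode. */
}
```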
-
Committed by Wanpeng Li
Commit c9b8b07c ("KVM: x86: Dynamically allocate per-vCPU emulation context") allocates the per-vCPU emulation context dynamically; however, the x86_emulator slab cache still exists after destroying the VM and unloading the kvm module, as shown below:
grep x86_emulator /proc/slabinfo
x86_emulator 36 36 2672 12 8 : tunables 0 0 0 : slabdata 3 3 0
This patch fixes the slab cache leak by destroying the x86_emulator slab cache when the kvm module is unloaded. Fixes: c9b8b07c ("KVM: x86: Dynamically allocate per-vCPU emulation context") Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
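A sketch of the fix under the assumption that the cache is torn down in the arch exit path, mirroring where it is created; the surrounding teardown is elided.

```c
/* Sketch only: free the emulator slab cache on module unload. */
void kvm_arch_exit(void)
{
	/* Existing module teardown elided. */
	kmem_cache_destroy(x86_emulator_cache);
}
```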
-