- 16 4月, 2019 4 次提交
-
-
由 Sean Christopherson 提交于
RSM emulation is currently broken on VMX when the interrupted guest has CR4.VMXE=1. Rather than dance around the issue of HF_SMM_MASK being set when loading SMSTATE into architectural state, ideally RSM emulation itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM architectural state. Ostensibly, the only motivation for having HF_SMM_MASK set throughout the loading of state from the SMRAM save state area is so that the memory accesses from GET_SMSTATE() are tagged with role.smm. Load all of the SMRAM save state area from guest memory at the beginning of RSM emulation, and load state from the buffer instead of reading guest memory one-by-one. This paves the way for clearing HF_SMM_MASK prior to loading state, and also aligns RSM with the enter_smm() behavior, which fills a buffer and writes SMRAM save state in a single go. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
Issue was discovered when running kvm-unit-tests on KVM running as L1 on top of Hyper-V. When vmx_instruction_intercept unit-test attempts to run RDPMC to test RDPMC-exiting, it is intercepted by L1 KVM which it's EXIT_REASON_RDPMC handler raise #GP because vCPU exposed by Hyper-V doesn't support PMU. Instead of unit-test expectation to be reflected with EXIT_REASON_RDPMC. The reason vmx_instruction_intercept unit-test attempts to run RDPMC even though Hyper-V doesn't support PMU is because L1 expose to L2 support for RDPMC-exiting. Which is reasonable to assume that is supported only in case CPU supports PMU to being with. Above issue can easily be simulated by modifying vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with "-cpu host,+vmx,-pmu" and run unit-test. To handle issue, change KVM to expose RDPMC-exiting only when guest supports PMU. Reported-by: NSaar Amar <saaramar@microsoft.com> Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 WANG Chao 提交于
guest xcr0 could leak into host when MCE happens in guest mode. Because do_machine_check() could schedule out at a few places. For example: kvm_load_guest_xcr0 ... kvm_x86_ops->run(vcpu) { vmx_vcpu_run vmx_complete_atomic_exit kvm_machine_check do_machine_check do_memory_failure memory_failure lock_page In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule out, host cpu has guest xcr0 loaded (0xff). In __switch_to { switch_fpu_finish copy_kernel_to_fpregs XRSTORS If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in and tries to reinitialize fpu by restoring init fpu state. Same story as last #GP, except we get DOUBLE FAULT this time. Cc: stable@vger.kernel.org Signed-off-by: NWANG Chao <chao.wang@ucloud.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
As mentioned in the comment, there are some special cases where we can simply clear the TPR shadow bit from the CPU-based execution controls in the vmcs02. Handle them so that we can remove some XFAILs from vmx.flat. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 3月, 2019 2 次提交
-
-
由 Sean Christopherson 提交于
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: NXiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Singh, Brijesh 提交于
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: NVenkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 3月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
Eliminate a gratuitous conflict with 5.0. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 2月, 2019 12 次提交
-
-
由 Ben Gardon 提交于
There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. If the allocations aren't tied to the process, the OOM killer will not know that killing the process will free the associated kernel memory. Add __GFP_ACCOUNT flags to many of the allocations which are not yet being charged to the VM process's cgroup. Tested: Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch introduced no new failures. Ran a kernel memory accounting test which creates a VM to touch memory and then checks that the kernel memory allocated for the process is within certain bounds. With this patch we account for much more of the vmalloc and slab memory allocated for the VM. Signed-off-by: NBen Gardon <bgardon@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Ensure that the VCPU free path goes through vmx_leave_nested and thus nested_vmx_vmexit, so that the cancellation of the timer does not have to be in free_nested. In addition, because some paths through nested_vmx_vmexit do not go through sync_vmcs12, the cancellation of the timer is moved there. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Luwei Kang 提交于
Some Posted-Interrupts from passthrough devices may be lost or overwritten when the vCPU is in runnable state. The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but not running on physical CPU). If a posted interrupt coming at this time, the irq remmaping facility will set the bit of PIR (Posted Interrupt Requests) without ON (Outstanding Notification). So this interrupt can't be sync to APIC virtualization register and will not be handled by Guest because ON is zero. Signed-off-by: NLuwei Kang <luwei.kang@intel.com> [Eliminate the pi_clear_sn fast path. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...now that the sub-routine follows standard calling conventions. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...to make it callable from C code. Note that because KVM chooses to be ultra paranoid about guest register values, all callee-save registers are still cleared after VM-Exit even though the host's values are now reloaded from the stack. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...to prepare for making the assembly sub-routine callable from C code. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...to prepare for making the sub-routine callable from C code. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...now that the name is no longer usurped by a defunct helper function. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...now that the code is no longer tagged with STACK_FRAME_NON_STANDARD. Arguably, providing __vmx_vcpu_run() to break up vmx_vcpu_run() is valuable on its own, but the previous split was purposely made as small as possible to limit the effects STACK_FRAME_NON_STANDARD. In other words, the current split is now completely arbitrary and likely not the most logical. This also allows renaming ____vmx_vcpu_run() to __vmx_vcpu_run() in a future patch. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
As evidenced by the myriad patches leading up to this moment, using an inline asm blob for vCPU-run is nothing short of horrific. It's also been called "unholy", "an abomination" and likely a whole host of other names that would violate the Code of Conduct if recorded here and now. The code is relocated nearly verbatim, e.g. quotes, newlines, tabs and __stringify need to be dropped, but other than those cosmetic changes the only functional changees are to add the "call" and replace the final "jmp" with a "ret". Note that STACK_FRAME_NON_STANDARD is also dropped from __vmx_vcpu_run(). Suggested-by: NAndi Kleen <ak@linux.intel.com> Suggested-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...in preparation for moving to a proper assembly sub-routnine. vCPU-run isn't a leaf function since it calls vmx_update_host_rsp() and vmx_vmenter(). And since we need to save/restore RBP anyways, unconditionally creating the frame costs a single MOV, i.e. don't bother keying off CONFIG_FRAME_POINTER or using FRAME_BEGIN, etc... Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...to prepare for moving the inline asm to a proper asm sub-routine. Eliminating the immediates allows a nearly verbatim move, e.g. quotes, newlines, tabs and __stringify need to be dropped, but other than those cosmetic changes the only function change will be to replace the final "jmp" with a "ret". Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 2月, 2019 2 次提交
-
-
由 Xiaoyao Li 提交于
Commit ca83b4a7 ("x86/KVM/VMX: Add find_msr() helper function") introduces the helper function find_msr(), which returns -ENOENT when not find the msr in vmx->msr_autoload.guest/host. Correct checking contion of no more available entry in vmx->msr_autoload. Fixes: ca83b4a7 ("x86/KVM/VMX: Add find_msr() helper function") Cc: stable@vger.kernel.org Signed-off-by: NXiaoyao Li <xiaoyao.li@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Luwei Kang 提交于
Some Posted-Interrupts from passthrough devices may be lost or overwritten when the vCPU is in runnable state. The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but not running on physical CPU). If a posted interrupt comes at this time, the irq remapping facility will set the bit of PIR (Posted Interrupt Requests) but not ON (Outstanding Notification). Then, the interrupt will not be seen by KVM, which always expects PID.ON=1 if PID.PIR=1 as documented in the Intel processor SDM but not in the VT-d specification. To fix this, restore the invariant after PID.SN is cleared. Signed-off-by: NLuwei Kang <luwei.kang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 2月, 2019 13 次提交
-
-
由 Sean Christopherson 提交于
...now that all other references to struct vcpu_vmx have been removed. Note that 'vmx' still needs to be passed into the asm blob in _ASM_ARG1 as it is consumed by vmx_update_host_rsp(). And similar to that code, use _ASM_ARG2 in the assembly code to prepare for moving to proper asm, while explicitly referencing the exact registers in the clobber list for clarity in the short term and to avoid additional precompiler games. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
A failed VM-Enter (obviously) didn't succeed, meaning the CPU never executed an instrunction in guest mode and so can't have changed the general purpose registers. In addition to saving some instructions in the VM-Fail case, this also provides a separate path entirely and thus an opportunity to propagate the fail condition to vmx->fail via register without introducing undue pain. Using a register, as opposed to directly referencing vmx->fail, eliminates the need to pass the offset of 'fail', which will simplify moving the code to proper assembly in future patches. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Switching the ordering allows for an out-of-line path for VM-Fail that elides saving guest state but still shares the register clearing with the VM-Exit path. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...and remove struct vcpu_vmx's temporary __launched variable. Eliminating __launched is a bonus, the real motivation is to get to the point where the only reference to struct vcpu_vmx in the asm code is to vcpu.arch.regs, which will simplify moving the blob to a proper asm file. Note that also means this approach is deliberately different than what is used in nested_vmx_check_vmentry_hw(). Use BL as it is a callee-save register in both 32-bit and 64-bit ABIs, i.e. it can't be modified by vmx_update_host_rsp(), to avoid having to temporarily save/restore the launched flag. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Providing a helper function to update HOST_RSP is visibly easier to read, and more importantly (for the future) eliminates two arguments to the VM-Enter assembly blob. Reducing the number of arguments to the asm blob is for all intents and purposes a prerequisite to moving the code to a proper assembly routine. It's not truly mandatory, but it greatly simplifies the future code, and the cost of the extra CALL+RET is negligible in the grand scheme. Note that although _ASM_ARG[1-3] can be used in the inline asm itself, the intput/output constraints need to be manually defined. gcc will actually compile with _ASM_ARG[1-3] specified as constraints, but what it actually ends up doing with the bogus constraint is unknown. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...to eliminate its parameter and struct vcpu_vmx offset definition from the assembly blob. Accessing CR2 from C versus assembly doesn't change the likelihood of taking a page fault (and modifying CR2) while it's loaded with the guest's value, so long as we don't do anything silly between accessing CR2 and VM-Enter/VM-Exit. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Currently, host_rsp is cached on a per-vCPU basis, i.e. it's stored in struct vcpu_vmx. In non-nested usage the caching is for all intents and purposes 100% effective, e.g. only the first VMLAUNCH needs to synchronize VMCS.HOST_RSP since the call stack to vmx_vcpu_run() is identical each and every time. But when running a nested guest, KVM must invalidate the cache when switching the current VMCS as it can't guarantee the new VMCS has the same HOST_RSP as the previous VMCS. In other words, the cache loses almost all of its efficacy when running a nested VM. Move host_rsp to struct vmcs_host_state, which is per-VMCS, so that it is cached on a per-VMCS basis and restores its 100% hit rate when nested VMs are in play. Note that the host_rsp cache for vmcs02 essentially "breaks" when nested early checks are enabled as nested_vmx_check_vmentry_hw() will see a different RSP at the time of its VM-Enter. While it's possible to avoid even that VMCS.HOST_RSP synchronization, e.g. by employing a dedicated VM-Exit stack, there is little motivation for doing so as the overhead of two VMWRITEs (~55 cycles) is dwarfed by the overhead of the extra VMX transition (600+ cycles) and is a proverbial drop in the ocean relative to the total cost of a nested transtion (10s of thousands of cycles). Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Per commit c2036300 ("KVM: VMX: Let gcc to choose which registers to save (x86_64)"), the only reason RDX is saved/loaded to/from the stack is because it was specified as an input, i.e. couldn't be marked as clobbered (ignoring the fact that "saving" it to a dummy output would indirectly mark it as clobbered). Now that RDX is no longer an input, clobber it. Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Load RDX with the VMCS.HOST_RSP field encoding on-demand instead of delegating to the compiler via an input constraint. In addition to saving one whole MOV instruction, this allows RDX to be properly clobbered (in a future patch) instead of being saved/loaded to/from the stack. Despite nested_vmx_check_vmentry_hw() having similar code, leave it alone, for now. In that case, RDX is unconditionally used and isn't clobbered, i.e. sending in HOST_RSP as an input is simpler. Note that because HOST_RSP is an enum and not a define, it must be redefined as an immediate instead of using __stringify(HOST_RSP). The naming "conflict" between host_rsp and HOST_RSP is slightly confusing, but the former will be removed in a future patch, at which point HOST_RSP is absolutely what is desired. Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
RSI is clobbered by the vCPU-run asm blob, but it's not marked as such, probably because GCC doesn't let you mark inputs as clobbered. "Save" RSI to a dummy output so that GCC recognizes it as being clobbered. Fixes: 773e8a04 ("x86/kvm: use Enlightened VMCS when running on Hyper-V") Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
In the vCPU-run asm blob, the guest's RCX is temporarily saved onto the stack after VM-Exit as the exit flow must first load a register with a pointer to the vCPU's save area in order to save the guest's registers. RCX is arbitrarily designated as the scratch register. Since the stack usage is to (1)save host, (2)save guest, (3)load host and (4)load guest, the code can't conform to the stack's natural FIFO semantics, i.e. it can't simply do PUSH/POP. Regardless of whether it is done for the host's value or guest's value, at some point the code needs to access the stack using a non-traditional method, e.g. MOV instead of POP. vCPU-run opts to create a placeholder on the stack for guest's RCX (by adjusting RSP) and saves RCX to its place immediately after VM-Exit (via MOV). In other words, the purpose of the first 'PUSH RCX' at the start of the vCPU-run asm blob is to adjust RSP down, i.e. there's no need to actually access memory. Use 'SUB $wordsize, RSP' instead of 'PUSH RCX' to make it more obvious that the intent is simply to create a gap on the stack for the guest's RCX. Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...except RSP, which is restored by hardware as part of VM-Exit. Paolo theorized that restoring registers from the stack after a VM-Exit in lieu of zeroing them could lead to speculative execution with the guest's values, e.g. if the stack accesses miss the L1 cache[1]. Zeroing XORs are dirt cheap, so just be ultra-paranoid. Note that the scratch register (currently RCX) used to save/restore the guest state is also zeroed as its host-defined value is loaded via the stack, just with a MOV instead of a POP. [1] https://patchwork.kernel.org/patch/10771539/#22441255 Fixes: 0cb5b306 ("kvm: vmx: Scrub hardware GPRs at VM-exit") Cc: <stable@vger.kernel.org> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
The vCPU-run asm blob does a manual comparison of a VMCS' launched status to execute the correct VM-Enter instruction, i.e. VMLAUNCH vs. VMRESUME. The launched flag is a bool, which is a typedef of _Bool. C99 does not define an exact size for _Bool, stating only that is must be large enough to hold '0' and '1'. Most, if not all, compilers use a single byte for _Bool, including gcc[1]. Originally, 'launched' was of type 'int' and so the asm blob used 'cmpl' to check the launch status. When 'launched' was moved to be stored on a per-VMCS basis, struct vcpu_vmx's "temporary" __launched flag was added in order to avoid having to pass the current VMCS into the asm blob. The new '__launched' was defined as a 'bool' and not an 'int', but the 'cmp' instruction was not updated. This has not caused any known problems, likely due to compilers aligning variables to 4-byte or 8-byte boundaries and KVM zeroing out struct vcpu_vmx during allocation. I.e. vCPU-run accesses "junk" data, it just happens to always be zero and so doesn't affect the result. [1] https://gcc.gnu.org/ml/gcc-patches/2000-10/msg01127.html Fixes: d462b819 ("KVM: VMX: Keep list of loaded VMCSs, instead of vcpus") Cc: <stable@vger.kernel.org> Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 1月, 2019 1 次提交
-
-
由 Josh Poimboeuf 提交于
With the following commit: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS") ... the hotplug code attempted to detect when SMT was disabled by BIOS, in which case it reported SMT as permanently disabled. However, that code broke a virt hotplug scenario, where the guest is booted with only primary CPU threads, and a sibling is brought online later. The problem is that there doesn't seem to be a way to reliably distinguish between the HW "SMT disabled by BIOS" case and the virt "sibling not yet brought online" case. So the above-mentioned commit was a bit misguided, as it permanently disabled SMT for both cases, preventing future virt sibling hotplugs. Going back and reviewing the original problems which were attempted to be solved by that commit, when SMT was disabled in BIOS: 1) /sys/devices/system/cpu/smt/control showed "on" instead of "notsupported"; and 2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning. I'd propose that we instead consider #1 above to not actually be a problem. Because, at least in the virt case, it's possible that SMT wasn't disabled by BIOS and a sibling thread could be brought online later. So it makes sense to just always default the smt control to "on" to allow for that possibility (assuming cpuid indicates that the CPU supports SMT). The real problem is #2, which has a simple fix: change vmx_vm_init() to query the actual current SMT state -- i.e., whether any siblings are currently online -- instead of looking at the SMT "control" sysfs value. So fix it by: a) reverting the original "fix" and its followup fix: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS") bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation") and b) changing vmx_vm_init() to query the actual current SMT state -- instead of the sysfs control value -- to determine whether the L1TF warning is needed. This also requires the 'sched_smt_present' variable to exported, instead of 'cpu_smt_control'. Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS") Reported-by: NIgor Mammedov <imammedo@redhat.com> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Joe Mario <jmario@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
-
- 26 1月, 2019 4 次提交
-
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
...along with the function's STACK_FRAME_NON_STANDARD tag. Moving the asm blob results in a significantly smaller amount of code that is marked with STACK_FRAME_NON_STANDARD, which makes it far less likely that gcc will split the function and trigger a spurious objtool warning. As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows the bulk of code to be properly checked by objtool. Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually save/restore the host's RBP and load the guest's RBP prior to calling vmx_vmenter(). Modifying %rbp triggers objtool's stack validation code, and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's impossible to avoid modifying %rbp. Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will split into separate functions, e.g. so that pieces of the function can be inlined. Splitting the function means that the compiled Elf file will contain one or more vmx_vcpu_run.part.* functions in addition to a vmx_vcpu_run function. Depending on where the function is split, objtool may warn about a "call without frame pointer save/setup" in vmx_vcpu_run.part.* since objtool's stack validation looks for exact names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD. Up until recently, the undesirable function splitting was effectively blocked because vmx_vcpu_run() was tagged with __noclone. At the time, __noclone had an unintended side effect that put vmx_vcpu_run() into a separate optimization unit, which in turn prevented gcc from inlining the function (or any of its own function calls) and thus eliminated gcc's motivation to split the function. Removing the __noclone attribute allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning. Kudos to Qian Cai for root causing that the fnsplit optimization is what caused objtool to complain. Fixes: 453eafbe ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines") Tested-by: NQian Cai <cai@lca.pw> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yi Wang 提交于
We get some warnings when building kernel with W=1: arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes] arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes] Make them static to fix this. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Fix a recently introduced bug that results in the wrong VMCS control field being updated when applying a IA32_PERF_GLOBAL_CTRL errata. Fixes: c73da3fc ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls") Reported-by: NHarald Arnesen <harald@skogtun.org> Tested-by: NHarald Arnesen <harald@skogtun.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 1月, 2019 1 次提交
-
-
由 Lan Tianyu 提交于
The "ret" is initialized to be ENOTSUPP. The return value of __hv_remote_flush_tlb_with_range() will be Or with "ret" when ept table potiners are mismatched. This will cause return ENOTSUPP even if flush tlb successfully. This patch is to fix the issue and set "ret" to 0. Fixes: a5c214da ("KVM/VMX: Change hv flush logic when ept tables are mismatched.") Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-