- 03 5月, 2018 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
A guest may modify the SPEC_CTRL MSR from the value used by the kernel. Since the kernel doesn't use IBRS, this means a value of zero is what is needed in the host. But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to the other bits as reserved so the kernel should respect the boot time SPEC_CTRL value and use that. This allows to deal with future extensions to the SPEC_CTRL interface if any at all. Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any difference as paravirt will over-write the callq *0xfff.. with the wrmsrl assembler code. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NIngo Molnar <mingo@kernel.org>
-
- 27 4月, 2018 1 次提交
-
-
由 Junaid Shahid 提交于
Currently, KVM flushes the TLB after a change to the APIC access page address or the APIC mode when EPT mode is enabled. However, even in shadow paging mode, a TLB flush is needed if VPIDs are being used, as specified in the Intel SDM Section 29.4.5. So replace vmx_flush_tlb_ept_only() with vmx_flush_tlb(), which will flush if either EPT or VPIDs are in use. Signed-off-by: NJunaid Shahid <junaids@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 16 4月, 2018 2 次提交
-
-
由 Paolo Bonzini 提交于
This is not specific to Intel/AMD anymore. The TSC offset is available in vcpu->arch.tsc_offset. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 KarimAllah Ahmed 提交于
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always captures the TSC_OFFSET of the running guest whether it is the L1 or L2 guest. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: NJim Mattson <jmattson@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> [AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 4月, 2018 1 次提交
-
-
由 Krish Sadhukhan 提交于
According to the sub-section titled 'VM-Execution Control Fields' in the section titled 'Basic VM-Entry Checks' in Intel SDM vol. 3C, the following vmentry check must be enforced: If the 'virtualize APIC-accesses' VM-execution control is 1, the APIC-access address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address should not set any bits beyond the processor's physical-address width. This patch adds the necessary check to conform to this rule. If the check fails, we cause the L2 VMENTRY to fail which is what the associated unit test (following patch) expects. Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 4月, 2018 1 次提交
-
-
由 hu huajun 提交于
In arch/x86/kvm/trace.h, this function is declared as host_irq the first input, and vcpu_id the second, instead of otherwise. Signed-off-by: Nhu huajun <huhuajun@linux.alibaba.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 10 4月, 2018 1 次提交
-
-
由 KarimAllah Ahmed 提交于
The VMX-preemption timer is used by KVM as a way to set deadlines for the guest (i.e. timer emulation). That was safe till very recently when capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was introduced. According to Intel SDM 25.5.1: """ The VMX-preemption timer operates in the C-states C0, C1, and C2; it also operates in the shutdown and wait-for-SIPI states. If the timer counts down to zero in any state other than the wait-for SIPI state, the logical processor transitions to the C0 C-state and causes a VM exit; the timer does not cause a VM exit if it counts down to zero in the wait-for-SIPI state. The timer is not decremented in C-states deeper than C2. """ Now once the guest issues the MWAIT with a c-state deeper than C2 the preemption timer will never wake it up again since it stopped ticking! Usually this is compensated by other activities in the system that would wake the core from the deep C-state (and cause a VMExit). For example, if the host itself is ticking or it received interrupts, etc! So disable the VMX-preemption timer if MWAIT is exposed to the guest! Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Fixes: 4d5422ceSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 07 4月, 2018 1 次提交
-
-
由 Peng Hao 提交于
Make the function static to avoid a warning: no previous prototype for ‘vmx_enable_tdp’ Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 4月, 2018 4 次提交
-
-
由 Wanpeng Li 提交于
Introduce handle_ud() to handle invalid opcode, this function will be used by later patches. Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have no other code between them. Simplify by reducing them to a single conditional. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arnd Bergmann 提交于
The local variable was newly introduced but is only accessed in one place on x86_64, but not on 32-bit: arch/x86/kvm/vmx.c: In function 'vmx_save_host_state': arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable] This puts it into another #ifdef. Fixes: 35060ed6 ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary and causes false positives. Return the unmodified result of kvm_mmu_page_fault() instead of converting a system error code to KVM_EXIT_UNKNOWN so that userspace sees the error code of the actual failure, not a generic "we don't know what went wrong". * kvm_mmu_page_fault() will WARN if reserved bits are set in the SPTEs, i.e. it covers the case where an EPT misconfig occurred because of a KVM bug. * The WARN_ON will fire on any system error code that is hit while handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches() while handling a legitmate MMIO EPT misconfig or -EFAULT from kvm_handle_bad_page() if the corresponding HVA is invalid. In either case, userspace should receive the original error code and firing a warning is incorrect behavior as KVM is operating as designed. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 4月, 2018 1 次提交
-
-
由 Sean Christopherson 提交于
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter an exception in Protected Mode while emulating guest due to invalid guest state. Unlike Big RM, KVM doesn't support emulating exceptions in PM, i.e. PM exceptions are always injected via the VMCS. Because we will never do VMRESUME due to emulation_required, the exception is never realized and we'll keep emulating the faulting instruction over and over until we receive a signal. Exit to userspace iff there is a pending exception, i.e. don't exit simply on a requested event. The purpose of this check and exit is to aid in debugging a guest that is in all likelihood already doomed. Invalid guest state in PM is extremely limited in normal operation, e.g. it generally only occurs for a few instructions early in BIOS, and any exception at this time is all but guaranteed to be fatal. Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly handled/emulated, while checking for vectored interrupts, e.g. INTR and NMI, without hitting false positives would add a fair amount of complexity for almost no benefit (getting hit by lightning seems more likely than encountering this specific scenario). Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an exception via the VMCS and emulation_required is true. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 29 3月, 2018 6 次提交
-
-
由 Liran Alon 提交于
When vCPU runs L2 and there is a pending event that requires to exit from L2 to L1 and nested_run_pending=1, vcpu_enter_guest() will request an immediate-exit from L2 (See req_immediate_exit). Since now handling of req_immediate_exit also makes sure to set KVM_REQ_EVENT, there is no need to also set it on vmx_vcpu_run() when nested_run_pending=1. This optimizes cases where VMRESUME was executed by L1 to enter L2 and there is no pending events that require exit from L2 to L1. Previously, this would have set KVM_REQ_EVENT unnecessarly. Signed-off-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Liran Alon 提交于
For exceptions & NMIs events, KVM code use the following coding convention: *) "pending" represents an event that should be injected to guest at some point but it's side-effects have not yet occurred. *) "injected" represents an event that it's side-effects have already occurred. However, interrupts don't conform to this coding convention. All current code flows mark interrupt.pending when it's side-effects have already taken place (For example, bit moved from LAPIC IRR to ISR). Therefore, it makes sense to just rename interrupt.pending to interrupt.injected. This change follows logic of previous commit 664f8e26 ("KVM: X86: Fix loss of exception which has not yet been injected") which changed exception to follow this coding convention as well. It is important to note that in case !lapic_in_kernel(vcpu), interrupt.pending usage was and still incorrect. In this case, interrrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt from QEMU's emulated LAPIC which reset bit in IRR and set bit in ISR before sending ioctl to KVM. So it seems that indeed "interrupt.pending" in this case is also suppose to represent "interrupt.injected". However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() is misusing (now named) interrupt.injected in order to return if there is a pending interrupt. This leads to nVMX/nSVM not be able to distinguish if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on pending interrupt or should re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue. This patch introduce no semantics change. Signed-off-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Vitaly Kuznetsov 提交于
Enlightened VMCS is just a structure in memory, the main benefit besides avoiding somewhat slower VMREAD/VMWRITE is using clean field mask: we tell the underlying hypervisor which fields were modified since VMEXIT so there's no need to inspect them all. Tight CPUID loop test shows significant speedup: Before: 18890 cycles After: 8304 cycles Static key is being used to avoid performance penalty for non-Hyper-V deployments. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Babu Moger 提交于
This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: NBabu Moger <babu.moger@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Babu Moger 提交于
Get rid of ple_window_actual_max, because its benefits are really minuscule and the logic is complicated. The overflows(and underflow) are controlled in __ple_window_grow and _ple_window_shrink respectively. Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NBabu Moger <babu.moger@amd.com> [Fixed potential wraparound and change the max to UINT_MAX. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Babu Moger 提交于
The vmx module parameters are supposed to be unsigned variants. Also fixed the checkpatch errors like the one below. WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. +module_param(ple_gap, uint, S_IRUGO); Signed-off-by: NBabu Moger <babu.moger@amd.com> [Expanded uint to unsigned int in code. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 28 3月, 2018 1 次提交
-
-
由 Andi Kleen 提交于
KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropiate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: NAndi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 24 3月, 2018 4 次提交
-
-
由 Sean Christopherson 提交于
Add struct kvm_vmx, which wraps struct kvm, and a helper to_kvm_vmx() that retrieves 'struct kvm_vmx *' from 'struct kvm *'. Move the VMX specific variables out of kvm_arch and into kvm_vmx. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add kvm_x86_ops->set_identity_map_addr and set ept_identity_map_addr in VMX specific code so that ept_identity_map_addr can be moved out of 'struct kvm_arch' in a future patch. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Define kvm_arch_[alloc|free]_vm in x86 as pass through functions to new kvm_x86_ops vm_alloc and vm_free, and move the current allocation logic as-is to SVM and VMX. Vendor specific alloc/free functions set the stage for SVM/VMX wrappers of 'struct kvm', which will allow us to move the growing number of SVM/VMX specific member variables out of 'struct kvm_arch'. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Segment registers must be synchronized prior to any code that may trigger a call to emulation_required()/guest_state_valid(), e.g. vmx_set_cr0(). Because preparing vmcs02 writes segmentation fields directly, i.e. doesn't use vmx_set_segment(), emulation_required will not be re-evaluated when synchronizing the segment registers, which can result in L0 incorrectly starting emulation of L2. Fixes: 8665c3f9 ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> [Move all of prepare_vmcs02_full earlier, not just segment registers. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 3月, 2018 2 次提交
-
-
由 Paolo Bonzini 提交于
Commit 2bb8cafe ("KVM: vVMX: signal failure for nested VMEntry if emulation_required", 2018-03-12) introduces a new error path which does not set *entry_failure_code. Fix that to avoid a leak of L0 stack to L1. Reported-by: NRadim Krčmář <rkrcmar@redhat.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Linus Torvalds 提交于
The undocumented 'icebp' instruction (aka 'int1') works pretty much like 'int3' in the absense of in-circuit probing equipment (except, obviously, that it raises #DB instead of raising #BP), and is used by some validation test-suites as such. But Andy Lutomirski noticed that his test suite acted differently in kvm than on bare hardware. The reason is that kvm used an inexact test for the icebp instruction: it just assumed that an all-zero VM exit qualification value meant that the VM exit was due to icebp. That is not unlike the guess that do_debug() does for the actual exception handling case, but it's purely a heuristic, not an absolute rule. do_debug() does it because it wants to ascribe _some_ reasons to the #DB that happened, and an empty %dr6 value means that 'icebp' is the most likely casue and we have no better information. But kvm can just do it right, because unlike the do_debug() case, kvm actually sees the real reason for the #DB in the VM-exit interruption information field. So instead of relying on an inexact heuristic, just use the actual VM exit information that says "it was 'icebp'". Right now the 'icebp' instruction isn't technically documented by Intel, but that will hopefully change. The special "privileged software exception" information _is_ actually mentioned in the Intel SDM, even though the cause of it isn't enumerated. Reported-by: NAndy Lutomirski <luto@kernel.org> Tested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 3月, 2018 14 次提交
-
-
由 Vitaly Kuznetsov 提交于
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run() so the context is pretty well defined and as we're past 'swapgs' MSR_GS_BASE should contain kernel's GS base which we point to irq_stack_union. Add new kernelmode_gs_base() API, irq_stack_union needs to be exported as KVM can be build as module. Acked-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
vmx_save_host_state() is only called from kvm_arch_vcpu_ioctl_run() so the context is pretty well defined. Read MSR_{FS,KERNEL_GS}_BASE from current->thread after calling save_fsgs() which takes care of X86_BUG_NULL_SEG case now and will do RD[FG,GS]BASE when FSGSBASE extensions are exposed to userspace (currently they are not). Acked-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Allow to disable pause loop exit/pause filtering on a per VM basis. If some VMs have dedicated host CPUs, they won't be negatively affected due to needlessly intercepted PAUSE instructions. Thanks to Jan H. Schönherr's initial patch. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
If host CPUs are dedicated to a VM, we can avoid VM exits on HLT. This patch adds the per-VM capability to disable them. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
If KVM enable_vmware_backdoor module parameter is set, the commit change VMX to now intercept #GP instead of being directly deliviered from CPU to guest. It is done to support access to VMware backdoor I/O ports even if TSS I/O permission denies it. In that case: 1. A #GP will be raised and intercepted. 2. #GP intercept handler will simulate I/O port access instruction. 3. I/O port access instruction simulation will allow access to VMware backdoor ports specifically even if TSS I/O permission bitmap denies it. Note that the above change introduce slight performance hit as now #GPs are not deliviered directly from CPU to guest but instead cause #VMExit and instruction emulation. However, this behavior is introduced only when enable_vmware_backdoor KVM module parameter is set. Signed-off-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add kvm_fast_pio() to consolidate duplicate code in VMX and SVM. Unexport kvm_fast_pio_in() and kvm_fast_pio_out(). Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Fast emulation of processor I/O for IN was disabled on x86 (both VMX and SVM) some years ago due to a buggy implementation. The addition of kvm_fast_pio_in(), used by SVM, re-introduced (functional!) fast emulation of IN. Piggyback SVM's work and use kvm_fast_pio_in() on VMX instead of performing full emulation of IN. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Fail a nested VMEntry with EXIT_REASON_INVALID_STATE if L2 guest state is invalid, i.e. vmcs12 contained invalid guest state, and unrestricted guest is disabled in L0 (and by extension disabled in L1). WARN_ON_ONCE in handle_invalid_guest_state() if we're attempting to emulate L2, i.e. nested_run_pending is true, to aid debug in the (hopefully unlikely) scenario that we somehow skip the nested VMEntry consistency check, e.g. due to a L0 bug. Note: KVM relies on hardware to detect the scenario where unrestricted guest is enabled in L0 but disabled in L1 and vmcs12 contains invalid guest state, i.e. checking emulation_required in prepare_vmcs02 is required only to handle the case were unrestricted guest is disabled in L0 since L0 never actually attempts VMLAUNCH/VMRESUME with vmcs02. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
CR3 load/store exiting are always off when unrestricted guest is enabled. WARN on the associated CR3 VMEXIT to detect code that would re-introduce CR3 load/store exiting for unrestricted guest. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Now CR3 is not forced to a host-controlled value when paging is disabled in an unrestricted guest, CR3 load/store exiting can be left disabled (for an unrestricted guest). And because CR0.WP and CR4.PAE/PSE are also not force to host-controlled values, all of ept_update_paging_mode_cr0() is no longer needed, i.e. skip ept_update_paging_mode_cr0() for an unrestricted guest. Because MOV CR3 no longer exits when paging is disabled for an unrestricted guest, vmx_decache_cr3() must always read GUEST_CR3 from the VMCS for an unrestricted guest. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
CR4.PAE - Unrestricted guest can only be enabled when EPT is enabled, and vmx_set_cr4() clears hardware CR0.PAE based on the guest's CR4.PAE, i.e. CR4.PAE always follows the guest's value when unrestricted guest is enabled. CR4.PSE - Unrestricted guest no longer uses the identity mapped IA32 page tables since CR0.PG can be cleared in hardware, thus there is no need to set CR4.PSE when paging is disabled in the guest (and EPT is enabled). Define KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST (to X86_CR4_VMXE) and use it in lieu of KVM_*MODE_VM_CR4_ALWAYS_ON when unrestricted guest is enabled, which removes the forcing of CR4.PAE. Skip the manipulation of CR4.PAE/PSE for EPT when unrestricted guest is enabled, as CR4.PAE isn't forced and so doesn't need to be manually cleared, and CR4.PSE does not need to be set when paging is disabled since the identity mapped IA32 page tables are not used. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Unrestricted guest can only be enabled when EPT is enabled, and when EPT is enabled, ept_update_paging_mode_cr0() will clear hardware CR0.WP based on the guest's CR0.WP, i.e. CR0.WP always follows the guest's value when unrestricted guest is enabled. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
An unrestricted guest can run with hardware CR0.PG==0, i.e. IA32 paging disabled, in which case there is no need to load the guest's CR3 with identity mapped IA32 page tables since hardware will effectively ignore CR3. If unrestricted guest is enabled, don't configure the identity mapped IA32 page table and always load the guest's desired CR3. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-