- 04 2月, 2021 5 次提交
-
-
由 Like Xu 提交于
Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Like Xu 提交于
To make code responsibilities clear, we may resue and invoke the vmx_set_intercept_for_msr() in other vmx-specific files (e.g. pmu_intel.c), so expose it to passthrough LBR msrs later. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in bits 15:0, and single-bit modifiers in bits 31:16. Historically, KVM has only had to worry about handling the "failed VM-Entry" modifier, which could only be set in very specific flows and required dedicated handling. I.e. manually stripping the FAILED_VMENTRY bit was a somewhat viable approach. But even with only a single bit to worry about, KVM has had several bugs related to comparing a basic exit reason against the full exit reason store in vcpu_vmx. Upcoming Intel features, e.g. SGX, will add new modifier bits that can be set on more or less any VM-Exit, as opposed to the significantly more restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off flows isn't scalable. Tracking exit reason in a union forces code to explicitly choose between consuming the full exit reason and the basic exit, and is a convenient way to document and access the modifiers. No functional change intended. Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 11月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(), and invoke the new hook from kvm_valid_cr4(). This fixes an issue where KVM_SET_SREGS would return success while failing to actually set CR4. Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return in __set_sregs() is not a viable option as KVM has already stuffed a variety of vCPU state. Note, kvm_valid_cr4() and is_valid_cr4() have different return types and inverted semantics. This will be remedied in a future patch. Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2020 1 次提交
-
-
由 Maxim Levitsky 提交于
This will be used to signal an error to the userspace, in case the vendor code failed during handling of this msr. (e.g -ENOMEM) Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 9月, 2020 15 次提交
-
-
由 Alexander Graf 提交于
We will introduce the concept of MSRs that may not be handled in kernel space soon. Some MSRs are directly passed through to the guest, effectively making them handled by KVM from user space's point of view. This patch introduces all logic required to ensure that MSRs that user space wants trapped are not marked as direct access for guests. Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-7-graf@amazon.com> [Replace "_idx" with "_slot". - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Aaron Lewis 提交于
Prepare vmx and svm for a subsequent change that ensures the MSR permission bitmap is set to allow an MSR that userspace is tracking to force a vmx_vmexit in the guest. Signed-off-by: NAaron Lewis <aaronlewis@google.com> Reviewed-by: NOliver Upton <oupton@google.com> [agraf: rebase, adapt SVM scheme to nested changes that came in between] Signed-off-by: NAlexander Graf <graf@amazon.com> Message-Id: <20200925143422.21718-5-graf@amazon.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename "index" to "slot" in struct vmx_uret_msr to align with the terminology used by common x86's kvm_user_return_msrs, and to avoid conflating "MSR's ECX index" with "MSR's index into an array". No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-16-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename "find_msr_entry" to scope it to VMX and to associate it with guest_uret_msrs. Drop the "entry" so that the function name pairs with the existing __vmx_find_uret_msr(), which intentionally uses a double underscore prefix instead of appending "index" or "slot" as those names are already claimed by other pieces of the user return MSR stack. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-13-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add "uret" to "guest_msrs_ready" to explicitly associate it with the "guest_uret_msrs" array, and replace "ready" with "loaded" to more precisely reflect what it tracks, e.g. "ready" could be interpreted as meaning ready for processing (setup_msrs() has run), which is wrong. "loaded" also aligns with the similar "guest_state_loaded" field. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-8-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add "uret" into the name of "save_nmsrs" to explicitly associate it with the guest_uret_msrs array, and replace "save" with "active" (for lack of a better word) to better describe what is being tracked. While "save" is more or less accurate when viewed as a literal description of the field, e.g. it holds the number of MSRs that were saved into the array the last time setup_msrs() was invoked, it can easily be misinterpreted by the reader, e.g. as meaning the number of MSRs that were saved from hardware at some point in the past, or as the number of MSRs that need to be saved at some point in the future, both of which are wrong. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-7-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename vcpu_vmx.nsmrs to vcpu_vmx.nr_uret_msrs to explicitly associate it with the guest_uret_msrs array. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-6-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename struct "shared_msr_entry" to "vmx_uret_msr" to align with x86's rename of "shared_msrs" to "user_return_msrs", and to call out that the struct is specific to VMX, i.e. not part of the generic "shared_msrs" framework. Abbreviate "user_return" as "uret" to keep line lengths marginally sane and code more or less readable. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add "loadstore" to vmx_find_msr_index() to differentiate it from the so called shared MSRs helpers (which will soon be renamed), and replace "index" with "slot" to better convey that the helper returns slot in the array, not the MSR index (the value that gets stuffed into ECX). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-4-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add "MAX" to the LOADSTORE and so called SHARED MSR defines to make it more clear that the define controls the array size, as opposed to the actual number of valid entries that are in the array. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923180409.32255-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Call guest_state_valid() directly instead of querying emulation_required when checking if L1 is attempting VM-Enter with invalid guest state. If emulate_invalid_guest_state is false, KVM will fixup segment regs to avoid emulation and will never set emulation_required, i.e. KVM will incorrectly miss the associated consistency checks because the nested path stuffs segments directly into vmcs02. Opportunsitically add Consistency Check tracing to make future debug suck a little less. Fixes: 2bb8cafe ("KVM: vVMX: signal failure for nested VMEntry if emulation_required") Fixes: 3184a995 ("KVM: nVMX: fix vmentry failure code when L2 state would require emulation") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923184452.980-4-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Rename ops.h to vmx_ops.h to allow adding a tdx_ops.h in the future without causing massive confusion. Trust Domain Extensions (TDX) is built on VMX, but KVM cannot directly access the VMCS(es) for a TDX guest, thus TDX will need its own "ops" implementation for wrapping the low level operations. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
Extract the posted interrupt code so that it can be reused for Trust Domain Extensions (TDX), which requires posted interrupts and can use KVM VMX's implementation almost verbatim. TDX is different enough from raw VMX that it is highly desirable to implement the guts of TDX in a separate file, i.e. reusing posted interrupt code by shoving TDX support into vmx.c would be a mess. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Co-developed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200923183112.3030-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Krish Sadhukhan 提交于
KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it Currently, prepare_vmcs02_early() does not check if the "unrestricted guest" VM-execution control in vmcs12 is turned off and leaves the corresponding bit on in vmcs02. Due to this setting, vmentry checks which are supposed to render the nested guest state as invalid when this VM-execution control is not set, are passing in hardware. This patch turns off the "unrestricted guest" VM-execution control in vmcs02 if vmcs12 has turned it off. Suggested-by: NJim Mattson <jmattson@google.com> Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200921081027.23047-2-krish.sadhukhan@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Babu Moger 提交于
Handling of kvm_read/write_guest_virt*() errors can be moved to common code. The same code can be used by both VMX and SVM. Signed-off-by: NBabu Moger <babu.moger@amd.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 9月, 2020 1 次提交
-
-
由 Mohammed Gamal 提交于
This patch exposes allow_smaller_maxphyaddr to the user as a module parameter. Since smaller physical address spaces are only supported on VMX, the parameter is only exposed in the kvm_intel module. For now disable support by default, and let the user decide if they want to enable it. Modifications to VMX page fault and EPT violation handling will depend on whether that parameter is enabled. Signed-off-by: NMohammed Gamal <mgamal@redhat.com> Message-Id: <20200903141122.72908-1-mgamal@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 9月, 2020 1 次提交
-
-
由 Peter Shier 提交于
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call load_pdptrs to read the possibly updated PDPTEs from the guest physical address referenced by CR3. It loads them into vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. At the subsequent assumed reentry into L2, the mmu will call vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works if the L2 CRn write intercept always resumes L2. The resume path calls vmx_check_nested_events which checks for exceptions, MTF, and expired VMX preemption timers. If vmx_check_nested_events finds any of these conditions pending it will reflect the corresponding exit into L1. Live migration at this point would also cause a missed immediate reentry into L2. After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale values stored at last L2 exit. A repro of this bug showed L2 entering triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values. When L2 is in PAE paging mode add a call to ept_load_pdptrs before leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in vcpu->arch.walk_mmu->pdptrs[]. Tested: kvm-unit-tests with new directed test: vmx_mtf_pdpte_test. Verified that test fails without the fix. Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix would repro readily. Ran 14 simultaneous L2s for 140 iterations with no failures. Signed-off-by: NPeter Shier <pshier@google.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20200820230545.2411347-1-pshier@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 7月, 2020 3 次提交
-
-
由 Sean Christopherson 提交于
Use the shadow_root_level from the current MMU as the root level for the PGD, i.e. for VMX's EPTP. This eliminates the weird dependency between VMX and the MMU where both must independently calculate the same root level for things to work correctly. Temporarily keep VMX's calculation of the level and use it to WARN if the incoming level diverges. Opportunistically refactor kvm_mmu_load_pgd() to avoid indentation hell, and rename a 'cr3' param in the load_mmu_pgd prototype that managed to survive the cr3 purge. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-6-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Make vmx_load_mmu_pgd() static as it is no longer invoked directly by nested VMX (or any code for that matter). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove an extra declaration of construct_eptp() from vmx.h. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-4-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2020 2 次提交
-
-
由 Mohammed Gamal 提交于
Check guest physical address against its maximum, which depends on the guest MAXPHYADDR. If the guest's physical address exceeds the maximum (i.e. has reserved bits set), inject a guest page fault with PFERR_RSVD_MASK set. This has to be done both in the EPT violation and page fault paths, as there are complications in both cases with respect to the computation of the correct error code. For EPT violations, unfortunately the only possibility is to emulate, because the access type in the exit qualification might refer to an access to a paging structure, rather than to the access performed by the program. Trapping page faults instead is needed in order to correct the error code, but the access type can be obtained from the original error code and passed to gva_to_gpa. The corrections required in the error code are subtle. For example, imagine that a PTE for a supervisor page has a reserved bit set. On a supervisor-mode access, the EPT violation path would trigger. However, on a user-mode access, the processor will not notice the reserved bit and not include PFERR_RSVD_MASK in the error code. Co-developed-by: NMohammed Gamal <mgamal@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-8-mgamal@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20200710154811.418214-7-mgamal@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 7月, 2020 2 次提交
-
-
由 Jim Mattson 提交于
Both the vcpu_vmx structure and the vcpu_svm structure have a 'last_cpu' field. Move the common field into the kvm_vcpu_arch structure. For clarity, rename it to 'last_vmentry_cpu.' Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NOliver Upton <oupton@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Message-Id: <20200603235623.245638-6-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
As we already do in svm, record the last logical processor on which a vCPU has run, so that it can be communicated to userspace for potential hardware errors. Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NOliver Upton <oupton@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Message-Id: <20200603235623.245638-4-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 6月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended. Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 6月, 2020 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
Syzbot reports the following issue: WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... Call Trace: ... RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618 ... nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline] handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067 'exception' we're trying to inject with kvm_inject_emulated_page_fault() comes from: nested_vmx_get_vmptr() kvm_read_guest_virt() kvm_read_guest_virt_helper() vcpu->arch.walk_mmu->gva_to_gpa() but it is only set when GVA to GPA conversion fails. In case it doesn't but we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed 'exception'. This happen when the argument is MMIO. Paolo also noticed that nested_vmx_get_vmptr() is not the only place in KVM code where kvm_read/write_guest_virt*() return result is mishandled. VMX instructions along with INVPCID have the same issue. This was already noticed before, e.g. see commit 541ab2ae ("KVM: x86: work around leak of uninitialized stack contents") but was never fully fixed. KVM could've handled the request correctly by going to userspace and performing I/O but there doesn't seem to be a good need for such requests in the first place. Introduce vmx_handle_memory_failure() as an interim solution. Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF, KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem to have a good enum describing this tristate, just add "int *ret" to nested_vmx_get_vmptr() interface to pass the information. Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200605115906.532682-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 6月, 2020 1 次提交
-
-
由 Peter Shier 提交于
Add new field to hold preemption timer expiration deadline appended to struct kvm_vmx_nested_state_hdr. This is to prevent the first VM-Enter after migration from incorrectly restarting the timer with the full timer value instead of partially decayed timer value. KVM_SET_NESTED_STATE restarts timer using migrated state regardless of whether L1 sets VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. Fixes: cf8b84f4 ("kvm: nVMX: Prepare for checkpointing L2 state") Signed-off-by: NPeter Shier <pshier@google.com> Signed-off-by: NMakarand Sonare <makarandsonare@google.com> Message-Id: <20200526215107.205814-2-makarandsonare@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 5月, 2020 5 次提交
-
-
由 Sean Christopherson 提交于
Move CR0 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0() is called multiple times in a single VM-Exit, and more importantly eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR0, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr0_guest_bits". No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move CR4 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs and retpolines (when configured) during nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times on each transition, e.g. when stuffing CR0 and CR3. As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR4, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr4_guest_bits". No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS switch when temporarily loading vmcs02 to synchronize it to vmcs12, i.e. give copy_vmcs02_to_vmcs12_rare() the same treatment as vmx_switch_vmcs(). Make vmx_vcpu_load() static now that it's only referenced within vmx.c. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200506235850.22600-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS switch when running with spectre_v2_user=on/auto if the switch is between two VMCSes in the same guest, i.e. between vmcs01 and vmcs02. The IBPB is intended to prevent one guest from attacking another, which is unnecessary in the nested case as it's the same guest from KVM's perspective. This all but eliminates the overhead observed for nested VMX transitions when running with CONFIG_RETPOLINE=y and spectre_v2_user=on/auto, which can be significant, e.g. roughly 3x on current systems. Reported-by: NAlexander Graf <graf@amazon.com> Cc: KarimAllah Raslan <karahmed@amazon.de> Cc: stable@vger.kernel.org Fixes: 15d45071 ("KVM/x86: Add IBPB support") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200501163117.4655-1-sean.j.christopherson@intel.com> [Invert direction of bool argument. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Move the architectural (non-KVM specific) interrupt/NMI blocking checks to a separate helper so that they can be used in a future patch by vmx_check_nested_events(). No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200423022550.15113-8-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 4月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Introduce a new "extended register" type, EXIT_INFO_2 (to pair with the nomenclature in .get_exit_info()), and use it to cache VMX's vmcs.EXIT_INTR_INFO. Drop a comment in vmx_recover_nmi_blocking() that is obsoleted by the generic caching mechanism. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200415203454.8296-6-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-