- 05 July 2018 (1 commit)

By Paolo Bonzini

Add the logic for flushing L1D on VMENTER. The flush depends on the static key being enabled and the new l1tf_flush_l1d flag being set.

The flag is set:
- Always, if the flush module parameter is 'always'
- Conditionally at:
  - Entry to vcpu_run(), i.e. after executing user space
  - From the sched_in notifier, i.e. when switching to a vCPU thread
  - From vmexit handlers which are considered unsafe, i.e. where sensitive data can be brought into L1D:
    - The emulator, which could be a good target for other speculative execution-based threats
    - The MMU, which can bring host page tables into the L1 cache
    - External interrupts
    - Nested operations that require the MMU (see above), i.e. vmptrld, vmptrst, vmclear, vmwrite, vmread
    - When handling invept, invvpid

[ tglx: Split out from combo patch and reduced to a single flag ]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
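
A minimal standalone model of the one-shot flag protocol described above, for illustration only; the enum, struct and function names here are invented, not the kernel's:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model: 'always' flushes unconditionally; 'cond' flushes
 * only when an unsafe path has set the one-shot flag since the last entry. */
enum l1d_flush_mode { L1D_FLUSH_COND, L1D_FLUSH_ALWAYS };

struct vcpu_model {
    bool l1tf_flush_l1d;    /* set by emulator, MMU, IRQ paths, ... */
};

static bool should_flush_l1d(enum l1d_flush_mode mode, struct vcpu_model *v)
{
    if (mode == L1D_FLUSH_ALWAYS)
        return true;
    bool flush = v->l1tf_flush_l1d;
    v->l1tf_flush_l1d = false;      /* one-shot: consumed at VMENTER */
    return flush;
}

int main(void)
{
    struct vcpu_model v = { .l1tf_flush_l1d = true };
    printf("cond, flag set:   %d\n", should_flush_l1d(L1D_FLUSH_COND, &v));
    printf("cond, flag clear: %d\n", should_flush_l1d(L1D_FLUSH_COND, &v));
    printf("always:           %d\n", should_flush_l1d(L1D_FLUSH_ALWAYS, &v));
    return 0;
}
```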

- 26 May 2018 (1 commit)

By Vitaly Kuznetsov

Implement the HvFlushVirtualAddress{List,Space} hypercalls in a simplistic way: do a full TLB flush with KVM_REQ_TLB_FLUSH and kick the vCPUs which are currently IN_GUEST_MODE.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
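
A toy model of the request-and-kick pattern this relies on; not the kernel code, and the struct, constant and function names are invented for the sketch:

```c
#include <stdio.h>

#define REQ_TLB_FLUSH (1u << 0)

enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu_model {
    int id;
    enum vcpu_mode mode;
    unsigned int requests;   /* pending request bits, checked before entry */
};

/* Post the flush request to every vCPU; only those already in guest mode
 * need a "kick" (an IPI in the real code) to exit and notice the request. */
static void flush_all_tlbs(struct vcpu_model *vcpus, int n)
{
    for (int i = 0; i < n; i++) {
        vcpus[i].requests |= REQ_TLB_FLUSH;
        if (vcpus[i].mode == IN_GUEST_MODE)
            printf("kick vcpu %d\n", vcpus[i].id);
    }
}

int main(void)
{
    struct vcpu_model vcpus[2] = {
        { .id = 0, .mode = IN_GUEST_MODE },
        { .id = 1, .mode = OUTSIDE_GUEST_MODE },
    };
    flush_all_tlbs(vcpus, 2);
    return 0;
}
```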

- 17 May 2018 (1 commit)

By Tom Lendacky

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

- 15 May 2018 (3 commits)

By Jim Mattson

L1 and L2 need to have disjoint mappings, so that L1's APIC access page (under VMX) can be omitted from L2's mappings.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Jim Mattson

Previously, we toggled between SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, depending on whether or not the EXTD bit was set in MSR_IA32_APICBASE. However, if the local APIC is disabled, we should not set either of these APIC virtualization control bits.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Junaid Shahid

Extract the logic to free a root page into a separate function to avoid code duplication in mmu_free_roots(). Also, change it to an exported function, i.e. kvm_mmu_free_roots().

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 16 April 2018 (1 commit)

By KarimAllah Ahmed

Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always captures the TSC_OFFSET of the running guest, whether it is the L1 or the L2 guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
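
For intuition about the arithmetic: the guest-visible TSC is the host TSC plus an offset, and with nesting the L1 and L2 offsets compose. A sketch under that assumption (values are made up, and this is not the kernel's code):

```c
#include <stdint.h>
#include <stdio.h>

/* guest_tsc = host_tsc + TSC_OFFSET (the VMCS/VMCB field). */
static uint64_t guest_tsc(uint64_t host_tsc, int64_t tsc_offset)
{
    return host_tsc + (uint64_t)tsc_offset;
}

int main(void)
{
    uint64_t host = 1000000;
    int64_t l1_offset = -400000;   /* L1's offset relative to the host */
    int64_t l2_extra  = -100000;   /* vmcs12's offset, relative to L1  */

    /* While L2 runs, the effective offset is the composition of the two;
     * 'tsc_offset' must track whichever guest is currently running. */
    printf("L1 TSC: %llu\n", (unsigned long long)guest_tsc(host, l1_offset));
    printf("L2 TSC: %llu\n", (unsigned long long)guest_tsc(host, l1_offset + l2_extra));
    return 0;
}
```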

- 29 March 2018 (2 commits)

By Liran Alon

For exception and NMI events, the KVM code uses the following convention:
- "pending" represents an event that should be injected into the guest at some point, but whose side effects have not yet occurred.
- "injected" represents an event whose side effects have already occurred.

However, interrupts don't conform to this convention. All current code flows mark interrupt.pending when the interrupt's side effects have already taken place (for example, the bit was moved from the LAPIC IRR to the ISR). Therefore, it makes sense to rename interrupt.pending to interrupt.injected. This change follows the logic of previous commit 664f8e26 ("KVM: X86: Fix loss of exception which has not yet been injected"), which changed exceptions to follow this convention as well.

It is important to note that in the !lapic_in_kernel(vcpu) case, interrupt.pending usage was and still is incorrect. In this case, interrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap), or when dispatching a new interrupt from QEMU's emulated LAPIC, which resets the bit in the IRR and sets the bit in the ISR before sending the ioctl to KVM. So it seems that "interrupt.pending" in this case is indeed also supposed to represent "interrupt.injected".

However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() misuse the (now named) interrupt.injected in order to report whether there is a pending interrupt. This leaves nVMX/nSVM unable to distinguish whether it should exit from L2 to L1 on EXTERNAL_INTERRUPT because of a pending interrupt, or re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue.

This patch introduces no semantic change.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

By Vitaly Kuznetsov

hyperv.h is not part of uapi; there are no (known) users outside of the kernel. We are making changes to this file to match the current Hyper-V Hypervisor Top-Level Functional Specification (TLFS, see: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs) and we don't want to maintain backwards compatibility.

Move the file, renaming it to hyperv-tlfs.h to avoid confusing it with mshyperv.h. In the future, all definitions from the TLFS should go there, and all kernel objects should go to mshyperv.h or include/linux/hyperv.h.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 24 March 2018 (4 commits)

By Sean Christopherson

Add struct kvm_svm, which is analogous to struct vcpu_svm, along with a helper to_kvm_svm() to retrieve a struct kvm_svm pointer from a struct kvm pointer. Move the SVM-specific variables and struct definitions out of kvm_arch and into kvm_svm.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Sean Christopherson

Add struct kvm_vmx, which wraps struct kvm, and a helper to_kvm_vmx() that retrieves 'struct kvm_vmx *' from 'struct kvm *'. Move the VMX-specific variables out of kvm_arch and into kvm_vmx.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
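
The to_kvm_svm()/to_kvm_vmx() helpers in these two commits follow the common container_of idiom. A self-contained sketch with stand-in fields (the struct members here are illustrative, not the kernel's):

```c
#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kvm { int nr_vcpus; };            /* stand-in for the generic VM */

struct kvm_vmx {
    struct kvm kvm;                      /* embedded generic VM */
    unsigned long ept_identity_map_addr; /* VMX-only state lives here */
};

static struct kvm_vmx *to_kvm_vmx(struct kvm *kvm)
{
    return container_of(kvm, struct kvm_vmx, kvm);
}

int main(void)
{
    struct kvm_vmx vmx = {
        .kvm = { .nr_vcpus = 4 },
        .ept_identity_map_addr = 0xfee00000,  /* arbitrary example value */
    };
    struct kvm *generic = &vmx.kvm;           /* common code sees only this */
    printf("%lx\n", to_kvm_vmx(generic)->ept_identity_map_addr);
    return 0;
}
```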

By Sean Christopherson

Add kvm_x86_ops->set_identity_map_addr and set ept_identity_map_addr in VMX-specific code so that ept_identity_map_addr can be moved out of 'struct kvm_arch' in a future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Sean Christopherson

Define kvm_arch_[alloc|free]_vm in x86 as pass-through functions to the new kvm_x86_ops vm_alloc and vm_free, and move the current allocation logic as-is to SVM and VMX. Vendor-specific alloc/free functions set the stage for SVM/VMX wrappers of 'struct kvm', which will allow us to move the growing number of SVM/VMX-specific member variables out of 'struct kvm_arch'.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 21 March 2018 (1 commit)

By Liran Alon

When the L1 IOAPIC redirection table is written, a KVM_REQ_SCAN_IOAPIC request is set on all vCPUs so that they all recalculate their IOAPIC handled vectors and load them into their EOI exit bitmaps. However, one of the vCPUs may currently be running L2. In this case, load_eoi_exitmap() will be called, which would write to vmcs02->eoi_exit_bitmap. This is wrong because vmcs02->eoi_exit_bitmap should always be equal to vmcs12->eoi_exit_bitmap. Furthermore, at this point KVM_REQ_SCAN_IOAPIC was already consumed, so vmcs01->eoi_exit_bitmap would never be updated. This could leave the remote_irr of some IOAPIC level-triggered entry set forever.

Fix this issue by delaying the load of the EOI exit bitmap until the vCPU is running L1.

One may wonder why not just delay the entire KVM_REQ_SCAN_IOAPIC processing until the vCPU is running L1. This is done in order to handle correctly the case where the LAPIC & IOAPIC of L1 are passed through to L2. In this case, vmcs12->virtual_interrupt_delivery should be 0. In the current nVMX implementation, that results in vmcs02->virtual_interrupt_delivery also being 0, so vmcs02->eoi_exit_bitmap is not used. Therefore, every L2 EOI causes a #VMExit into L0 (either on an MSR_WRITE to an x2APIC MSR or an APIC_ACCESS/APIC_WRITE/EPT_MISCONFIG on the APIC MMIO page). In order for such an L2 EOI to be broadcast, if needed, from the LAPIC to the IOAPIC, vcpu->arch.ioapic_handled_vectors must be updated while L2 is running. Therefore, the patch makes sure to delay only the loading of the EOI exit bitmap, but not the update of vcpu->arch.ioapic_handled_vectors.

Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 17 March 2018 (7 commits)

By Wanpeng Li

Allow disabling pause-loop exiting/PAUSE filtering on a per-VM basis. If some VMs have dedicated host CPUs, they won't be negatively affected by needlessly intercepted PAUSE instructions. Thanks to Jan H. Schönherr's initial patch.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Wanpeng Li

If host CPUs are dedicated to a VM, we can avoid VM exits on HLT. This patch adds the per-VM capability to disable them.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Wanpeng Li

Allowing a guest to execute MWAIT without interception enables the guest to put a (physical) CPU into a power-saving state, from which it may take longer to return than the host would like. Don't give a guest that power over the host by default. (Especially since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.)

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
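
These three per-VM knobs are exposed to userspace through KVM_ENABLE_CAP with KVM_CAP_X86_DISABLE_EXITS. A hedged sketch of the opt-in, assuming kernel headers new enough to define these constants; most error handling is elided:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Only sensible when this VM's vCPUs have dedicated host CPUs;
     * typically enabled before any vCPU is created. */
    struct kvm_enable_cap cap;
    memset(&cap, 0, sizeof(cap));
    cap.cap = KVM_CAP_X86_DISABLE_EXITS;
    cap.args[0] = KVM_X86_DISABLE_EXITS_MWAIT |
                  KVM_X86_DISABLE_EXITS_HLT |
                  KVM_X86_DISABLE_EXITS_PAUSE;

    if (ioctl(vm, KVM_ENABLE_CAP, &cap) < 0)
        perror("KVM_ENABLE_CAP");   /* e.g. an older kernel without the cap */
    else
        printf("MWAIT/HLT/PAUSE exits disabled for this VM\n");
    return 0;
}
```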

By Liran Alon

Access to the VMware backdoor ports is done by one of the IN/OUT/INS/OUTS instructions. These ports must be allowed access even if the TSS I/O permission bitmap doesn't allow it. To handle this, VMX/SVM will be changed in future commits to intercept the #GP raised by such an access and handle it by calling the x86 emulator to emulate the instruction. If it was one of these instructions, the x86 emulator already handles it correctly (since commit "KVM: x86: Always allow access to VMware backdoor I/O ports") by not checking these ports against the TSS I/O permission bitmap.

One may wonder why checking for specific instructions is necessary, as we could just forward all #GPs to the x86 emulator. There are multiple reasons for not doing so:
1. We don't want the x86 emulator to be easily reachable by a guest that just executes an instruction which raises #GP, as that exposes the x86 emulator as a bigger attack surface.
2. The x86 emulator is incomplete, and therefore certain instructions that can cause #GP cannot be emulated. One such example is "INT x" (opcode 0xcd), which reaches emulate_int(), which can only emulate the instruction if the vCPU is in real mode.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
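
For context, the de-facto VMware backdoor is driven by IN/OUT on two well-known ports with a magic value ("VMXh") in EAX. A small sketch of a port check along those lines; the constants follow the widely documented interface and are an assumption here, so KVM's exact list may differ:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VMW_MAGIC   0x564D5868u   /* "VMXh" in EAX selects the backdoor */
#define VMW_PORT    0x5658u       /* low-bandwidth command port */
#define VMW_PORT_HB 0x5659u       /* high-bandwidth data port */

/* Emulator-side idea: these ports bypass the TSS I/O permission bitmap. */
static bool is_vmware_backdoor_port(uint16_t port)
{
    return port == VMW_PORT || port == VMW_PORT_HB;
}

int main(void)
{
    printf("0x5658: %d, 0x80: %d\n",
           is_vmware_backdoor_port(0x5658), is_vmware_backdoor_port(0x80));
    printf("magic = %#x\n", VMW_MAGIC);
    return 0;
}
```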

By Liran Alon

The next commits are going to introduce support for accessing the VMware backdoor ports even though the guest's TSS I/O permission bitmap doesn't allow access. This mimics VMware hypervisor behavior. In order to support this, the next commits will change VMX/SVM to intercept the #GP raised by such an access and handle it by calling the x86 emulator to emulate the instruction. Since commit "KVM: x86: Always allow access to VMware backdoor I/O ports", the x86 emulator handles access to these I/O ports by not checking them against the TSS I/O permission bitmap.

However, there could be cases where the CPU raises a #GP on an instruction that the x86 emulator fails to disassemble (because of an incomplete implementation, for example). In those cases, we would like the #GP intercept to just forward the #GP as-is to the guest, as if there were no intercept to begin with. However, the current emulator code always queues a #UD exception when the emulator fails (including on disassembly failures), which is not what is wanted in this flow.

This commit addresses the issue by adding a new emulation_type flag that allows the #GP intercept handler to specify that it wishes to be notified when instruction emulation fails, and that it doesn't want a #UD exception to be queued.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Sean Christopherson

Add kvm_fast_pio() to consolidate duplicate code in VMX and SVM. Unexport kvm_fast_pio_in() and kvm_fast_pio_out().

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Vitaly Kuznetsov

A nested Hyper-V/Windows guest running on top of KVM will use the TSC page clocksource in two cases:
- L0 exposes invariant TSC (CPUID.80000007H:EDX[8]).
- L0 provides Hyper-V Reenlightenment support (CPUID.40000003H:EAX[13]).

Exposing invariant TSC effectively blocks migration to hosts with different TSC frequencies; providing reenlightenment support will be needed when we start migrating nested workloads.

Implement rudimentary support for the reenlightenment MSRs. For now, these are just read/write MSRs with no effect.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 07 March 2018 (1 commit)

By Roman Kagan

In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter: the connection ID to signal. Currently this hypercall incurs a user exit and requires userspace to decode the parameters and trigger the notification of the potentially different I/O context.

To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection ID and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR.

Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[asm/hyperv.h changes approved by KY Srinivasan. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
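
A hedged userspace sketch of wiring a connection ID to an eventfd with the new ioctl, assuming headers that define struct kvm_hyperv_eventfd; minimal error handling:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);
    int efd = eventfd(0, 0);

    /* Bind Hyper-V connection ID 1 to the eventfd: a guest SIGNAL_EVENT
     * hypercall for this ID is then completed in-kernel. */
    struct kvm_hyperv_eventfd hvefd;
    memset(&hvefd, 0, sizeof(hvefd));
    hvefd.conn_id = 1;
    hvefd.fd = efd;

    if (ioctl(vm, KVM_HYPERV_EVENTFD, &hvefd) < 0)
        perror("KVM_HYPERV_EVENTFD");   /* older kernels lack this ioctl */
    else
        printf("connection id 1 -> eventfd %d\n", efd);
    return 0;
}
```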

- 02 March 2018 (2 commits)

By Wanpeng Li

Linux (among others) has checks to make sure that certain features aren't enabled on a certain family/model/stepping if the microcode version isn't greater than or equal to a known good version. By exposing the real microcode version, we prevent buggy guests that don't check that they are running virtualized (i.e., they should trust the hypervisor) from disabling features that are effectively not buggy.

Suggested-by: Filippo Sironi <sironi@amazon.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

By Tom Lendacky

Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Tweaked documentation. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
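
The two system-fd ioctls are KVM_GET_MSR_FEATURE_INDEX_LIST and KVM_GET_MSRS. A sketch of enumerating and reading the feature MSRs, assuming recent headers; minimal error handling:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);

    /* A first call with nmsrs = 0 fails with E2BIG but reports the count. */
    struct kvm_msr_list probe = { .nmsrs = 0 };
    ioctl(kvm, KVM_GET_MSR_FEATURE_INDEX_LIST, &probe);

    struct kvm_msr_list *list =
        calloc(1, sizeof(*list) + probe.nmsrs * sizeof(__u32));
    list->nmsrs = probe.nmsrs;
    if (ioctl(kvm, KVM_GET_MSR_FEATURE_INDEX_LIST, list) < 0) {
        perror("KVM_GET_MSR_FEATURE_INDEX_LIST");
        return 1;
    }

    for (__u32 i = 0; i < list->nmsrs; i++) {
        /* KVM_GET_MSRS on the system fd returns the feature MSR's value. */
        struct { struct kvm_msrs hdr; struct kvm_msr_entry entry; } req = {
            .hdr.nmsrs = 1,
            .entry.index = list->indices[i],
        };
        if (ioctl(kvm, KVM_GET_MSRS, &req) == 1)
            printf("msr %#x = %#llx\n", req.entry.index,
                   (unsigned long long)req.entry.data);
    }
    free(list);
    return 0;
}
```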

- 24 February 2018 (1 commit)

By Sebastian Ott

Fix the following sparse warning by moving the prototype of kvm_arch_mmu_notifier_invalidate_range() to linux/kvm_host.h:

  CHECK   arch/s390/kvm/../../../virt/kvm/kvm_main.c
  arch/s390/kvm/../../../virt/kvm/kvm_main.c:138:13: warning: symbol 'kvm_arch_mmu_notifier_invalidate_range' was not declared. Should it be static?

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

- 01 February 2018 (1 commit)

By Longpeng(Mike)

efer_reload has been unused since commit 26bb0981 ("KVM: VMX: Use shared msr infrastructure"), so remove it.

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 16 January 2018 (1 commit)

By Wanpeng Li

Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush; it will be used by later patches to flush only the guest TLB. For VMX, this will use INVVPID instead of INVEPT, which will invalidate combined mappings while keeping guest-physical mappings.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 14 December 2017 (3 commits)

By Liran Alon

This MSR returns the number of #SMIs that have occurred on the CPU since boot. It was seen to be used frequently by ESXi guests.

The patch adds a new vCPU arch-specific variable called smi_count to save the number of #SMIs which occurred on the CPU since boot. It is exposed as a read-only MSR to the guest (WRMSR causes a #GP) in the RDMSR/WRMSR emulation code. MSR_SMI_COUNT is also added to emulated_msrs[] to make sure userspace can save/restore it for migration purposes.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
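
A tiny model of the read-only MSR semantics described here, illustrative only; in the real code, refusing the write is what surfaces to the guest as #GP:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MSR_SMI_COUNT 0x34   /* architectural index of the SMI counter */

struct vcpu_model { uint64_t smi_count; };

/* RDMSR model: the read succeeds and returns the saved count. */
static bool emul_rdmsr(struct vcpu_model *v, uint32_t msr, uint64_t *data)
{
    if (msr == MSR_SMI_COUNT) { *data = v->smi_count; return true; }
    return false;
}

/* WRMSR model: writes to the counter are rejected (false => inject #GP). */
static bool emul_wrmsr(struct vcpu_model *v, uint32_t msr, uint64_t data)
{
    (void)v; (void)data;
    return msr != MSR_SMI_COUNT;
}

int main(void)
{
    struct vcpu_model v = { .smi_count = 3 };
    uint64_t data = 0;
    bool ok = emul_rdmsr(&v, MSR_SMI_COUNT, &data);
    printf("rdmsr ok=%d val=%llu\n", ok, (unsigned long long)data);
    printf("wrmsr ok=%d\n", emul_wrmsr(&v, MSR_SMI_COUNT, 7));
    return 0;
}
```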

By Paolo Bonzini

The User-Mode Instruction Prevention feature present in recent Intel processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and str) from being executed with CPL > 0. Otherwise, a general protection fault is issued.

UMIP instructions in general are also able to trigger vmexits, so we can actually emulate UMIP on older processors. This commit sets up the infrastructure so that kvm-intel.ko and kvm-amd.ko can set the UMIP feature bit for CPUID even if the feature is not actually available in hardware.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

By Paolo Bonzini

Add the CPUID bits, make the CR4.UMIP bit no longer reserved, and add UMIP support for the instructions that are already emulated by KVM.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
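
To see what UMIP gates, here is a small user-mode probe, a sketch for x86-64 Linux with GCC or Clang: without UMIP, sgdt at CPL 3 simply reads the descriptor-table register; with UMIP enforced it faults, and a kernel may instead emulate it with dummy values:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* sgdt stores a 10-byte limit:base pair on x86-64; it is one of the
     * five instructions UMIP blocks at CPL > 0. */
    struct __attribute__((packed)) { uint16_t limit; uint64_t base; } gdtr = {0};
    __asm__ volatile("sgdt %0" : "=m"(gdtr));
    printf("GDT limit=%#x base=%#llx\n", gdtr.limit,
           (unsigned long long)gdtr.base);
    return 0;
}
```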

- 06 December 2017 (2 commits)

By Radim Krčmář

The implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed.

Re-introduce the arch-specific helper and call it from ...range_start.

Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

By Rik van Riel

Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again:

  [258270.527947]  __warn+0xcb/0xf0
  [258270.527948]  warn_slowpath_null+0x1d/0x20
  [258270.527951]  kernel_fpu_disable+0x3f/0x50
  [258270.527953]  __kernel_fpu_begin+0x49/0x100
  [258270.527955]  kernel_fpu_begin+0xe/0x10
  [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
  [258270.527961]  crypto_shash_update+0x3f/0x110
  [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
  [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
  [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
  [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
  [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
  [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
  [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
  [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
  [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
  [258270.528002]  shrink_slab.part.40+0x1f5/0x420
  [258270.528004]  shrink_node+0x22c/0x320
  [258270.528006]  do_try_to_free_pages+0xf5/0x330
  [258270.528008]  try_to_free_pages+0xe9/0x190
  [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
  [258270.528011]  __alloc_pages_nodemask+0x209/0x260
  [258270.528014]  alloc_pages_vma+0x1f1/0x250
  [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
  [258270.528021]  handle_mm_fault+0xfd3/0x1330
  [258270.528025]  __get_user_pages+0x113/0x640
  [258270.528027]  get_user_pages+0x4f/0x60
  [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
  [258270.528108]  try_async_pf+0x66/0x230 [kvm]
  [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
  [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
  [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
  [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on my 4 socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu, which happened inside the KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 05 December 2017 (6 commits)

By Brijesh Singh

The SEV memory encryption engine uses a tweak such that two identical plaintext pages at different locations will have different ciphertext. So swapping or moving the ciphertext of two pages will not result in the plaintext being swapped. Relocating (or migrating) the physical backing pages of a SEV guest will therefore require some additional steps.

The current SEV key management spec does not provide commands to swap or migrate (move) ciphertext pages. For now, we pin the guest memory registered through the KVM_MEMORY_ENCRYPT_REG_REGION ioctl.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

By Brijesh Singh

The command is used for encrypting the guest memory region using the VM encryption key (VEK) created during KVM_SEV_LAUNCH_START.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Improvements-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

By Brijesh Singh

The KVM_SEV_LAUNCH_START command is used to create a memory encryption context within the SEV firmware. In order to do so, the guest owner should provide the guest's policy, its public Diffie-Hellman (PDH) key and session information. The command implements the LAUNCH_START flow defined in SEV spec Section 6.2.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Improvements-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

By Brijesh Singh

The command initializes the SEV platform context and allocates a new ASID for this guest from the SEV ASID pool. The firmware must be initialized before we issue any guest launch commands to create a new memory encryption context.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

By Brijesh Singh

If the hardware supports memory encryption, then the KVM_MEMORY_ENCRYPT_REG_REGION and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctls can be used by userspace to register/unregister the guest memory regions which may contain encrypted data (e.g. guest RAM, PCI BARs, SMRAM, etc.).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Improvements-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

By Brijesh Singh

If the hardware supports memory encryption, then the KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue platform-specific memory encryption commands.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
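
Putting the ioctls from this series together, a hedged sketch of how a VMM might initialize SEV and register a pinned region; it assumes SEV-capable hardware, the /dev/sev PSP device, and headers defining struct kvm_sev_cmd and struct kvm_enc_region, with minimal error handling:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Initialize the SEV platform context before any launch commands. */
    struct kvm_sev_cmd cmd;
    memset(&cmd, 0, sizeof(cmd));
    cmd.id = KVM_SEV_INIT;
    cmd.sev_fd = open("/dev/sev", O_RDWR);
    if (ioctl(vm, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
        perror("KVM_SEV_INIT");      /* e.g. no SEV hardware/firmware */

    /* Register a guest RAM region so its backing pages stay pinned. */
    void *ram = mmap(NULL, 0x200000, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct kvm_enc_region region = {
        .addr = (unsigned long)ram,
        .size = 0x200000,
    };
    if (ioctl(vm, KVM_MEMORY_ENCRYPT_REG_REGION, &region) < 0)
        perror("KVM_MEMORY_ENCRYPT_REG_REGION");
    return 0;
}
```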

- 17 November 2017 (1 commit)

By Liran Alon

In case of an instruction-decode failure or an emulation failure, x86_emulate_instruction() will call reexecute_instruction(), which will attempt to use the cr2 value passed to x86_emulate_instruction(). However, when x86_emulate_instruction() is called from emulate_instruction(), cr2 is not passed (it is passed as 0), and therefore it doesn't make sense to execute the reexecute_instruction() logic at all.

Fixes: 51d8b661 ("KVM: cleanup emulate_instruction")
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

- 19 October 2017 (1 commit)

By Ladi Prosek

Commit 05cade71 ("KVM: nSVM: fix SMI injection in guest mode") made KVM mask SMIs if GIF=0, but it didn't do anything to unmask them when GIF is enabled. The issue manifests for me as a significantly longer boot time of Windows guests when running with SMM-enabled OVMF. This commit fixes it by intercepting STGI instead of requesting an immediate exit, if the reason why SMM was masked is GIF.

Fixes: 05cade71 ("KVM: nSVM: fix SMI injection in guest mode")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>