- 02 3月, 2018 2 次提交
-
-
由 Tom Lendacky 提交于
In order to determine if LFENCE is a serializing instruction on AMD processors, MSR 0xc0011029 (MSR_F10H_DECFG) must be read and the state of bit 1 checked. This patch will add support to allow a guest to properly make this determination. Add the MSR feature callback operation to svm.c and add MSR 0xc0011029 to the list of MSR-based features. If LFENCE is serializing, then the feature is supported, allowing the hypervisor to set the value of the MSR that guest will see. Support is also added to write (hypervisor only) and read the MSR value for the guest. A write by the guest will result in a #GP. A read by the guest will return the value as set by the host. In this way, the support to expose the feature to the guest is controlled by the hypervisor. Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Tom Lendacky 提交于
Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Tweaked documentation. - Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 24 2月, 2018 2 次提交
-
-
由 Eric Biggers 提交于
On x86, special KVM memslots such as the TSS region have anonymous memory mappings created on behalf of userspace, and these mappings are removed when the VM is destroyed. It is however possible for removing these mappings via vm_munmap() to fail. This can most easily happen if the thread receives SIGKILL while it's waiting to acquire ->mmap_sem. This triggers the 'WARN_ON(r < 0)' in __x86_set_memory_region(). syzkaller was able to hit this, using 'exit()' to send the SIGKILL. Note that while the vm_munmap() failure results in the mapping not being removed immediately, it is not leaked forever but rather will be freed when the process exits. It's not really possible to handle this failure properly, so almost every other caller of vm_munmap() doesn't check the return value. It's a limitation of having the kernel manage these mappings rather than userspace. So just remove the WARN_ON() so that users can't spam the kernel log with this warning. Fixes: f0d648bd ("KVM: x86: map/unmap private slots in __x86_set_memory_region") Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
The initial reset of the local APIC is performed before the VMCS has been created, but it tries to do a vmwrite: vmwrite error: reg 810 value 4a00 (err 18944) CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G W I 4.16.0-0.rc2.git0.1.fc28.x86_64 #1 Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014 Call Trace: vmx_set_rvi [kvm_intel] vmx_hwapic_irr_update [kvm_intel] kvm_lapic_reset [kvm] kvm_create_lapic [kvm] kvm_arch_vcpu_init [kvm] kvm_vcpu_init [kvm] vmx_create_vcpu [kvm_intel] kvm_vm_ioctl [kvm] Move it later, after the VMCS has been created. Fixes: 4191db26 ("KVM: x86: Update APICv on APIC reset") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 04 2月, 2018 2 次提交
-
-
由 KarimAllah Ahmed 提交于
[ Based on a patch from Ashok Raj <ashok.raj@intel.com> ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
-
由 KarimAllah Ahmed 提交于
Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: kvm@vger.kernel.org Cc: Dave Hansen <dave.hansen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
-
- 03 2月, 2018 1 次提交
-
-
由 Eric Biggers 提交于
Due to a bad merge resolution between commit f2981033 ("KVM/x86: Check input paging mode when cs.l is set") and commit b4ef9d4e ("KVM: Move vcpu_load to arch-specific kvm_arch_vcpu_ioctl_set_sregs"), there is a case in kvm_arch_vcpu_ioctl_set_sregs() where vcpu_put() is not called after vcpu_get(). Fix it. Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 01 2月, 2018 2 次提交
-
-
由 Longpeng(Mike) 提交于
The efer_reload is never used since commit 26bb0981 ("KVM: VMX: Use shared msr infrastructure"), so remove it. Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Vitaly Kuznetsov 提交于
I was investigating an issue with seabios >= 1.10 which stopped working for nested KVM on Hyper-V. The problem appears to be in handle_ept_violation() function: when we do fast mmio we need to skip the instruction so we do kvm_skip_emulated_instruction(). This, however, depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS. However, this is not the case. Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when EPT MISCONFIG occurs. While on real hardware it was observed to be set, some hypervisors follow the spec and don't set it; we end up advancing IP with some random value. I checked with Microsoft and they confirmed they don't fill VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG. Fix the issue by doing instruction skip through emulator when running nested. Fixes: 68c3b4d1Suggested-by: NRadim Krčmář <rkrcmar@redhat.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 31 1月, 2018 3 次提交
-
-
由 Thomas Gleixner 提交于
The reenlightment support for hyperv slapped a direct reference to x86_hyper_type into the kvm code which results in the following build failure when CONFIG_HYPERVISOR_GUEST=n: arch/x86/kvm/x86.c:6259:6: error: ‘x86_hyper_type’ undeclared (first use in this function) arch/x86/kvm/x86.c:6259:6: note: each undeclared identifier is reported only once for each function it appears in Use the proper helper function to cure that. The 32bit compile fails because of: arch/x86/kvm/x86.c:5936:13: warning: ‘kvm_hyperv_tsc_notifier’ defined but not used [-Wunused-function] which is a real trainwreck engineering artwork. The callsite is wrapped into #ifdef CONFIG_X86_64, but the function itself has the #ifdef inside the function body. Make the function itself wrapped into the ifdef to cure that. Qualiteee.... Fixes: 0092e434 ("x86/kvm: Support Hyper-V reenlightenment") Reported-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com>
-
由 Vitaly Kuznetsov 提交于
When running nested KVM on Hyper-V guests its required to update masterclocks for all guests when L1 migrates to a host with different TSC frequency. Implement the procedure in the following way: - Pause all guests. - Tell the host (Hyper-V) to stop emulating TSC accesses. - Update the gtod copy, recompute clocks. - Unpause all guests. This is somewhat similar to cpufreq but there are two important differences: - TSC emulation can only be disabled globally (on all CPUs) - The new TSC frequency is not known until emulation is turned off so there is no way to 'prepare' for the event upfront. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-8-vkuznets@redhat.com
-
由 Vitaly Kuznetsov 提交于
Currently, KVM is able to work in 'masterclock' mode passing PVCLOCK_TSC_STABLE_BIT to guests when the clocksource which is used on the host is TSC. When running nested on Hyper-V the guest normally uses a different one: TSC page which is resistant to TSC frequency changes on events like L1 migration. Add support for it in KVM. The only non-trivial change is in vgettsc(): when updating the gtod copy both the clock readout and tsc value have to be updated now. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Mohammed Gamal <mmorsy@redhat.com> Link: https://lkml.kernel.org/r/20180124132337.30138-7-vkuznets@redhat.com
-
- 17 1月, 2018 1 次提交
-
-
由 Tianyu Lan 提交于
kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is to fix it. Fixes: f2981033(KVM/x86: Check input paging mode when cs.l is set) Reported-by: NJeremi Piotrowski <jeremi.piotrowski@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 16 1月, 2018 7 次提交
-
-
由 Liran Alon 提交于
sync_pir_to_irr() is only called if vcpu->arch.apicv_active()==true. In case it is false, VMX code make sure to set sync_pir_to_irr to NULL. Therefore, having SVM stubs allows to remove check for if sync_pir_to_irr != NULL from all calling sites. Signed-off-by: NLiran Alon <liran.alon@oracle.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NLiam Merwick <liam.merwick@oracle.com> [Return highest IRR in the SVM case. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
xsetbv can be expensive when running on nested virtualization, try to avoid it. Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: NQuan Xu <quan.xu0@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
syzkaller reported: WARNING: CPU: 0 PID: 12927 at arch/x86/kernel/traps.c:780 do_debug+0x222/0x250 CPU: 0 PID: 12927 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #16 RIP: 0010:do_debug+0x222/0x250 Call Trace: <#DB> debug+0x3e/0x70 RIP: 0010:copy_user_enhanced_fast_string+0x10/0x20 </#DB> _copy_from_user+0x5b/0x90 SyS_timer_create+0x33/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The testcase sets a watchpoint (with perf_event_open) on a buffer that is passed to timer_create() as the struct sigevent argument. In timer_create(), copy_from_user()'s rep movsb triggers the BP. The testcase also sets the debug registers for the guest. However, KVM only restores host debug registers when the host has active watchpoints, which triggers a race condition when running the testcase with multiple threads. The guest's DR6.BS bit can escape to the host before another thread invokes timer_create(), and do_debug() complains. The fix is to respect do_debug()'s dr6 invariant when leaving KVM. Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
When running on a virtual machine, IPIs are expensive when the target CPU is sleeping. Thus, it is nice to be able to avoid them for TLB shootdowns. KVM can just do the flush via INVVPID on the guest's behalf the next time the CPU is scheduled. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> [Use "&" to test the bit instead of "==". - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
Introduce a new bool invalidate_gpa argument to kvm_x86_ops->tlb_flush, it will be used by later patches to just flush guest tlb. For VMX, this will use INVVPID instead of INVEPT, which will invalidate combined mappings while keeping guest-physical mappings. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Wanpeng Li 提交于
The next patch will add another bit to the preempted field in kvm_steal_time. Define a constant for bit 0 (the only one that is currently used). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
This ioctl is obsolete (it was used by Xenner as far as I know) but still let's not break it gratuitously... Its handler is copying directly into struct kvm. Go through a bounce buffer instead, with the added benefit that we can actually do something useful with the flags argument---the previous code was exiting with -EINVAL but still doing the copy. This technically is a userspace ABI breakage, but since no one should be using the ioctl, it's a good occasion to see if someone actually complains. Cc: kernel-hardening@lists.openwall.com Cc: Kees Cook <keescook@chromium.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 18 12月, 2017 1 次提交
-
-
由 Wanpeng Li 提交于
Reported by syzkaller: BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm] Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298 CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016 Call Trace: dump_stack+0xab/0xe1 print_address_description+0x6b/0x290 kasan_report+0x28a/0x370 write_mmio+0x11e/0x270 [kvm] emulator_read_write_onepage+0x311/0x600 [kvm] emulator_read_write+0xef/0x240 [kvm] emulator_fix_hypercall+0x105/0x150 [kvm] em_hypercall+0x2b/0x80 [kvm] x86_emulate_insn+0x2b1/0x1640 [kvm] x86_emulate_instruction+0x39a/0xb90 [kvm] handle_exception+0x1b4/0x4d0 [kvm_intel] vcpu_enter_guest+0x15a0/0x2640 [kvm] kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm] kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 entry_SYSCALL_64_fastpath+0x23/0x9a The path of patched vmmcall will patch 3 bytes opcode 0F 01 C1(vmcall) to the guest memory, however, write_mmio tracepoint always prints 8 bytes through *(u64 *)val since kvm splits the mmio access into 8 bytes. This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch fixes it by just accessing the bytes which we operate on. Before patch: syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f After patch: syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f Reported-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Tested-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 12月, 2017 1 次提交
-
-
由 Lan Tianyu 提交于
Reported by syzkaller: WARNING: CPU: 0 PID: 27962 at arch/x86/kvm/emulate.c:5631 x86_emulate_insn+0x557/0x15f0 [kvm] Modules linked in: kvm_intel kvm [last unloaded: kvm] CPU: 0 PID: 27962 Comm: syz-executor Tainted: G B W 4.15.0-rc2-next-20171208+ #32 Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.01.03.0006.040720161253 04/07/2016 RIP: 0010:x86_emulate_insn+0x557/0x15f0 [kvm] RSP: 0018:ffff8807234476d0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff88072d0237a0 RCX: ffffffffa0065c4d RDX: 1ffff100e5a046f9 RSI: 0000000000000003 RDI: ffff88072d0237c8 RBP: ffff880723447728 R08: ffff88072d020000 R09: ffffffffa008d240 R10: 0000000000000002 R11: ffffed00e7d87db3 R12: ffff88072d0237c8 R13: ffff88072d023870 R14: ffff88072d0238c2 R15: ffffffffa008d080 FS: 00007f8a68666700(0000) GS:ffff880802200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002009506c CR3: 000000071fec4005 CR4: 00000000003626f0 Call Trace: x86_emulate_instruction+0x3bc/0xb70 [kvm] ? reexecute_instruction.part.162+0x130/0x130 [kvm] vmx_handle_exit+0x46d/0x14f0 [kvm_intel] ? trace_event_raw_event_kvm_entry+0xe7/0x150 [kvm] ? handle_vmfunc+0x2f0/0x2f0 [kvm_intel] ? wait_lapic_expire+0x25/0x270 [kvm] vcpu_enter_guest+0x720/0x1ef0 [kvm] ... When CS.L is set, vcpu should run in the 64 bit paging mode. Current kvm set_sregs function doesn't have such check when userspace inputs sreg values. This will lead unexpected behavior. This patch is to add checks for CS.L, EFER.LME, EFER.LMA and CR4.PAE when get SREG inputs from userspace in order to avoid unexpected behavior. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: NTianyu Lan <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 12月, 2017 18 次提交
-
-
由 Christoffer Dall 提交于
Move the calls to vcpu_load() and vcpu_put() in to the architecture specific implementations of kvm_arch_vcpu_ioctl() which dispatches further architecture-specific ioctls on to other functions. Some architectures support asynchronous vcpu ioctls which cannot call vcpu_load() or take the vcpu->mutex, because that would prevent concurrent execution with a running VCPU, which is the intended purpose of these ioctls, for example because they inject interrupts. We repeat the separate checks for these specifics in the architecture code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and calling vcpu_load for these ioctls. Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_fpu(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_fpu(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_guest_debug(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_translate(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_mpstate(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_mpstate(). Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_sregs(). Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_sregs(). Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_set_regs(). Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_get_regs(). Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
Move vcpu_load() and vcpu_put() into the architecture specific implementations of kvm_arch_vcpu_ioctl_run(). Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts Reviewed-by: NCornelia Huck <cohuck@redhat.com> [Rebased. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Christoffer Dall 提交于
As we're about to call vcpu_load() from architecture-specific implementations of the KVM vcpu ioctls, but yet we access data structures protected by the vcpu->mutex in the generic code, factor this logic out from vcpu_load(). x86 is the only architecture which calls vcpu_load() outside of the main vcpu ioctl function, and these calls will no longer take the vcpu mutex following this patch. However, with the exception of kvm_arch_vcpu_postcreate (see below), the callers are either in the creation or destruction path of the VCPU, which means there cannot be any concurrent access to the data structure, because the file descriptor is not yet accessible, or is already gone. kvm_arch_vcpu_postcreate makes the newly created vcpu potentially accessible by other in-kernel threads through the kvm->vcpus array, and we therefore take the vcpu mutex in this case directly. Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
When I run ebizzy in a 32 vCPUs guest on a 32 pCPUs Xeon box, I can observe ~8000 kvm_wait_lapic_expire CurAvg/s through kvm_stat tool even if the advance tscdeadline hrtimer expiration is disabled. Each call to wait_lapic_expire() will consume ~70 cycles when a timer fires since apic_timer_expire() will set expired_tscdeadline and then wait_lapic_expire() will do some caculation before bailing out. So total ~175us per second is lost on this 3.2Ghz machine. This patch reduces the overhead by skipping the function wait_lapic_expire() when lapic_timer_advance is disabled. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
This MSR returns the number of #SMIs that occurred on CPU since boot. It was seen to be used frequently by ESXi guest. Patch adds a new vcpu-arch specific var called smi_count to save the number of #SMIs which occurred on CPU since boot. It is exposed as a read-only MSR to guest (causing #GP on wrmsr) in RDMSR/WRMSR emulation code. MSR_SMI_COUNT is also added to emulated_msrs[] to make sure user-space can save/restore it for migration purposes. Signed-off-by: NLiran Alon <liran.alon@oracle.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NNikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: NBhavesh Davda <bhavesh.davda@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and add UMIP support for instructions that are already emulated by KVM. Reviewed-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
------------[ cut here ]------------ Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers. WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90 CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10 RIP: 0010:ex_handler_fprestore+0x88/0x90 Call Trace: fixup_exception+0x4e/0x60 do_general_protection+0xff/0x270 general_protection+0x22/0x30 RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm] RSP: 0018:ffff8803d5627810 EFLAGS: 00010246 kvm_vcpu_reset+0x3b4/0x3c0 [kvm] kvm_apic_accept_events+0x1c0/0x240 [kvm] kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm] kvm_vcpu_ioctl+0x479/0x880 [kvm] do_vfs_ioctl+0x142/0x9a0 SyS_ioctl+0x74/0x80 do_syscall_64+0x15f/0x600 where kvm_put_guest_fpu is called without a prior kvm_load_guest_fpu. To fix it, move kvm_load_guest_fpu to the very beginning of kvm_arch_vcpu_ioctl_run. Cc: stable@vger.kernel.org Fixes: f775b13eSigned-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
*** Guest State *** CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7 CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871 CR3 = 0x00000000fffbc000 RSP = 0x0000000000000000 RIP = 0x0000000000000000 RFLAGS=0x00000000 DR7 = 0x0000000000000400 ^^^^^^^^^^ The failed vmentry is triggered by the following testcase when ept=Y: #include <unistd.h> #include <sys/syscall.h> #include <string.h> #include <stdint.h> #include <linux/kvm.h> #include <fcntl.h> #include <sys/ioctl.h> long r[5]; int main() { r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); struct kvm_regs regs = { .rflags = 0, }; ioctl(r[4], KVM_SET_REGS, ®s); ioctl(r[4], KVM_RUN, 0); } X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1 of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails. This patch fixes it by oring X86_EFLAGS_FIXED during ioctl. Cc: stable@vger.kernel.org Suggested-by: NJim Mattson <jmattson@google.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NQuan Xu <quan.xu0@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-