- 14 December 2017, 18 commits
-
-
Committed by Wanpeng Li

MSR_IA32_DEBUGCTLMSR is zeroed on VMEXIT, so it was saved/restored on every world switch. This patch caches the host IA32_DEBUGCTL MSR and restores the cached value on guest/host switches, avoiding a fresh save/restore on every world switch. This saves about 100 clock cycles per vmexit.

Suggested-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
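For illustration, a sketch of the caching pattern described above (close to the shape of the upstream change, but treat field and helper names as illustrative):

    /* Read the host MSR once per vcpu_load instead of on every world switch. */
    static void vmx_vcpu_load_sketch(struct vcpu_vmx *vmx)
    {
            rdmsrl(MSR_IA32_DEBUGCTLMSR, vmx->host_debugctlmsr);
    }

    /* After a vmexit the MSR has been zeroed by hardware; restore the
     * cached host value instead of re-reading the MSR. */
    static void vmx_vcpu_run_tail_sketch(struct vcpu_vmx *vmx)
    {
            if (vmx->host_debugctlmsr)
                    update_debugctlmsr(vmx->host_debugctlmsr);
    }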
-
Committed by Mark Kanda
When attempting to free a loaded VMCS02, add a WARN and avoid freeing it (to avoid use-after-free situations).

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Ameya More <ameya.more@oracle.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
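The defensive check is small; a hedged sketch (the exact condition and the surrounding free routine are simplified here):

    static void free_loaded_vmcs_sketch(struct loaded_vmcs *loaded_vmcs)
    {
            /* A VMCS still loaded on some CPU must not be freed;
             * warn and bail out instead of risking a use-after-free. */
            if (WARN_ON(loaded_vmcs->cpu != -1))
                    return;

            free_vmcs(loaded_vmcs->vmcs);
            loaded_vmcs->vmcs = NULL;
    }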
-
Committed by Jim Mattson
The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Ameya More <ameya.more@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Committed by Yang Zhong

Intel Ice Lake CPUs add new CPU features: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI and AVX512_BITALG. These new CPU features need to be exposed to guest VMs.

The bit definitions:

    CPUID.(EAX=7,ECX=0):ECX[bit 06] AVX512_VBMI2
    CPUID.(EAX=7,ECX=0):ECX[bit 08] GFNI
    CPUID.(EAX=7,ECX=0):ECX[bit 09] VAES
    CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ
    CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI
    CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG

The release document is available at:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

The kernel dependency commit in kvm.git: (c128dbfa)

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
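For reference, the matching feature-word entries follow the existing cpufeatures.h pattern, where word 16 holds CPUID.(EAX=7,ECX=0):ECX; a sketch:

    /* Word 16: CPUID.(EAX=7,ECX=0):ECX */
    #define X86_FEATURE_AVX512_VBMI2   (16*32 +  6) /* additional AVX512 vector bit manipulation */
    #define X86_FEATURE_GFNI           (16*32 +  8) /* Galois Field New Instructions */
    #define X86_FEATURE_VAES           (16*32 +  9) /* vector AES */
    #define X86_FEATURE_VPCLMULQDQ     (16*32 + 10) /* carry-less multiplication double quadword */
    #define X86_FEATURE_AVX512_VNNI    (16*32 + 11) /* vector neural network instructions */
    #define X86_FEATURE_AVX512_BITALG  (16*32 + 12) /* VPOPCNT[B,W] and VPSHUFBITQMB */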
-
Committed by Liran Alon

This MSR returns the number of #SMIs that occurred on the CPU since boot. It was seen to be read frequently by ESXi guests.

The patch adds a new vCPU-arch-specific variable called smi_count to save the number of #SMIs that occurred on the CPU since boot. It is exposed as a read-only MSR to the guest (a wrmsr causes a #GP) in the RDMSR/WRMSR emulation code. MSR_SMI_COUNT is also added to emulated_msrs[] to make sure user-space can save/restore it for migration purposes.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
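A sketch of how such a read-only MSR typically plugs into KVM's MSR emulation, simplified from the shape of kvm_get_msr_common()/kvm_set_msr_common() (treat it as illustrative, including the host_initiated escape hatch for migration restore):

    static int get_msr_sketch(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
    {
            switch (msr_info->index) {
            case MSR_SMI_COUNT:
                    msr_info->data = vcpu->arch.smi_count;  /* read-only counter */
                    return 0;
            default:
                    return 1;                               /* unhandled -> #GP */
            }
    }

    static int set_msr_sketch(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
    {
            switch (msr_info->index) {
            case MSR_SMI_COUNT:
                    if (!msr_info->host_initiated)
                            return 1;                       /* guest wrmsr -> #GP */
                    vcpu->arch.smi_count = msr_info->data;  /* migration restore */
                    return 0;
            default:
                    return 1;
            }
    }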
-
Committed by Radim Krčmář

If Intel/AMD implement MWAIT, we expect that it works well, so we only reject known bugs; there is no reason to do it the other way around for minor vendors. (Not that they are relevant ATM.) This allows further simplification of kvm_mwait_in_guest(). Also, use boot_cpu_has() instead of "cpu_has(&boot_cpu_data," while at it.

Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
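Taken together with the two adjacent MWAIT commits below, the helper can collapse to something like this sketch (the exact set of bug checks is illustrative):

    /* Sketch: trust MWAIT when present, reject only known-broken cores. */
    static bool kvm_mwait_in_guest(void)
    {
            if (!boot_cpu_has(X86_FEATURE_MWAIT))
                    return false;

            /* e.g. cores where MWAIT may miss wakeups (X86_BUG_MONITOR) */
            if (boot_cpu_has_bug(X86_BUG_MONITOR))
                    return false;

            return true;
    }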
-
Committed by Radim Krčmář

The check was added in some iteration while trying to fix a reported OS X on Core 2 bug, but that bug is elsewhere. The comment is misleading because the guest can call MWAIT with ECX = 0 even if we enforce CPUID5_ECX_INTERRUPT_BREAK; the call would have exactly the same effect as if the host didn't have the feature. A problem is that a QEMU feature exposes CPUID5_ECX_INTERRUPT_BREAK on CPUs that do not support it.

Removing the check changes behavior on the last Pentium 4 lines (Presler, Dempsey, and Tulsa, which had VMX and MONITOR while missing INTERRUPT_BREAK) when running a guest OS that uses MWAIT without checking for its presence (QEMU doesn't expose MONITOR).

The only known OS that ignores the MONITOR flag is old Mac OS X, and we already allowed it to bug on Core 2 (MWAIT used to throw #UD and only that OS noticed), so we can save another 20 lines by letting it bug on even older CPUs. Alternatively, we can return MWAIT exiting by default and let userspace toggle it.

Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Radim Krčmář

The bug prevents MWAIT from waking up after a write to the monitored cache line. KVM might emulate a CPU model that shouldn't have the bug, in which case the guest would not employ a workaround and could miss wakeups. Better to avoid the situation.

Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Colin Ian King

The array audit_point_name is local to the source file and does not need to be in global scope, so make it static.

This cleans up the sparse warning:

    arch/x86/kvm/mmu_audit.c:22:12: warning: symbol 'audit_point_name' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Gimcuan Hui

The kvm_mmu_clear_all_pte_masks interface is only used locally by kvm_mmu_module_init and does not need to be called by other modules, so make it static.

This cleans up the sparse warning:

    symbol 'kvm_mmu_clear_all_pte_masks' was not declared. Should it be static?

Signed-off-by: Gimcuan Hui <gimcuan@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
This is encoded as F3 0F C7 /7 with a register argument. The register argument is the second array in the group9 GroupDual, while F3 is the fourth element of a Prefix.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

UMIP can be emulated almost perfectly on Intel processors by enabling descriptor-table exits. SMSW does not cause a vmexit and hence cannot be changed into a #GP fault, but all in all it's the most "innocuous" of the unprivileged instructions that UMIP blocks. In fact, Linux is _also_ emulating SMSW instructions on behalf of the programs that execute them, because some 16-bit programs expect to use SMSW to detect vm86 mode, so this is an even smaller issue.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

The User-Mode Instruction Prevention feature present in recent Intel processors prevents a group of instructions (sgdt, sidt, sldt, smsw, and str) from being executed with CPL > 0; executing them raises a general protection fault instead. These instructions are in general also able to trigger vmexits, so we can actually emulate UMIP on older processors. This commit sets up the infrastructure so that kvm-intel.ko and kvm-amd.ko can set the UMIP feature bit for CPUID even if the feature is not actually available in hardware.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
These are needed to handle the descriptor table vmexits when emulating UMIP.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Add the CPUID bits, make the CR4.UMIP bit not reserved anymore, and add UMIP support for instructions that are already emulated by KVM.

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Peter Xu

    ------------[ cut here ]------------
    Bad FPU state detected at kvm_put_guest_fpu+0xd8/0x2d0 [kvm], reinitializing FPU registers.
    WARNING: CPU: 1 PID: 4594 at arch/x86/mm/extable.c:103 ex_handler_fprestore+0x88/0x90
    CPU: 1 PID: 4594 Comm: qemu-system-x86 Tainted: G B OE 4.15.0-rc2+ #10
    RIP: 0010:ex_handler_fprestore+0x88/0x90
    Call Trace:
     fixup_exception+0x4e/0x60
     do_general_protection+0xff/0x270
     general_protection+0x22/0x30
    RIP: 0010:kvm_put_guest_fpu+0xd8/0x2d0 [kvm]
    RSP: 0018:ffff8803d5627810 EFLAGS: 00010246
     kvm_vcpu_reset+0x3b4/0x3c0 [kvm]
     kvm_apic_accept_events+0x1c0/0x240 [kvm]
     kvm_arch_vcpu_ioctl_run+0x1658/0x2fb0 [kvm]
     kvm_vcpu_ioctl+0x479/0x880 [kvm]
     do_vfs_ioctl+0x142/0x9a0
     SyS_ioctl+0x74/0x80
     do_syscall_64+0x15f/0x600

Here kvm_put_guest_fpu() is called without a preceding kvm_load_guest_fpu(). To fix it, move kvm_load_guest_fpu() to the very beginning of kvm_arch_vcpu_ioctl_run().

Cc: stable@vger.kernel.org
Fixes: f775b13e
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
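A schematic of the fix, heavily simplified (the real kvm_arch_vcpu_ioctl_run() contains much more logic):

    int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
    {
            int r;

            kvm_load_guest_fpu(vcpu);       /* moved to the very top */

            /* kvm_apic_accept_events() further down may reset the vCPU,
             * and the reset path calls kvm_put_guest_fpu(); thanks to
             * the load above, that call is now always paired. */
            r = vcpu_run(vcpu);

            kvm_put_guest_fpu(vcpu);
            return r;
    }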
-
Committed by Wanpeng Li

    *** Guest State ***
    CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
    CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
    CR3 = 0x00000000fffbc000
    RSP = 0x0000000000000000  RIP = 0x0000000000000000
    RFLAGS=0x00000000         DR7 = 0x0000000000000400
           ^^^^^^^^^^

The failed vmentry is triggered by the following testcase when ept=Y:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];

    int main()
    {
            r[2] = open("/dev/kvm", O_RDONLY);
            r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
            r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
            struct kvm_regs regs = {
                    .rflags = 0,
            };
            ioctl(r[4], KVM_SET_REGS, &regs);
            ioctl(r[4], KVM_RUN, 0);
    }

X86 RFLAGS bit 1 is fixed set; userspace can simply clear bit 1 of RFLAGS with the KVM_SET_REGS ioctl, which makes vmentry fail. This patch fixes it by ORing in X86_EFLAGS_FIXED during the ioctl.

Cc: stable@vger.kernel.org
Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Quan Xu <quan.xu0@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
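The fix itself is a one-liner in the KVM_SET_REGS path; a sketch (assuming the usual kvm_set_rflags() helper):

    /* Bit 1 of RFLAGS is architecturally fixed to 1, so force it on
     * regardless of what userspace passed in. */
    kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED);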
-
Committed by Wanpeng Li

The below testcase can cause an infinite loop in kvm when ept=0:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];

    int main()
    {
            r[2] = open("/dev/kvm", O_RDONLY);
            r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
            r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
            ioctl(r[4], KVM_RUN, 0);
    }

It doesn't set up any memory regions, so mmu_alloc_shadow/direct_roots() in kvm returns 1 when kvm fails to allocate a root page table, which can result in the infinite loop below:

    vcpu_run() {
            for (;;) {
                    r = vcpu_enter_guest();  /* kvm_mmu_reload() returns 1 */
                    if (r <= 0)
                            break;
                    if (need_resched())
                            cond_resched();
            }
    }

This patch fixes it by returning -ENOSPC when there is no available kvm mmu page for the root page table.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 26eeb53c (KVM: MMU: Bail out immediately if there is no available mmu page)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
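The essence of the fix, as a hedged sketch (the real change touches the root-allocation paths in mmu.c; helper names here are illustrative):

    static int mmu_alloc_direct_roots_sketch(struct kvm_vcpu *vcpu)
    {
            struct kvm_mmu_page *sp;

            sp = mmu_alloc_root_page_sketch(vcpu);  /* illustrative helper */
            if (!sp)
                    return -ENOSPC;  /* previously 1, which vcpu_run() retried forever */

            vcpu->arch.mmu.root_hpa = __pa(sp->spt);
            return 0;
    }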
-
- 09 December 2017, 1 commit
-
-
Committed by Michal Hocko

Commit 4675ff05 ("kmemcheck: rip it out") removed the code, but for some reason the SPDX header stayed in place. This looks like a rebase mistake in the mmotm tree or a merge mistake. Let's drop those leftovers as well.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 07 December 2017, 2 commits
-
-
Committed by Arnd Bergmann

gcc-8 warns that time() is an alias for __vdso_time(), but the two have different prototypes:

    arch/x86/entry/vdso/vclock_gettime.c:327:5: error: 'time' alias between functions of incompatible types 'int(time_t *)' {aka 'int(long int *)'} and 'time_t(time_t *)' {aka 'long int(long int *)'} [-Werror=attribute-alias]
     int time(time_t *t)
         ^~~~
    arch/x86/entry/vdso/vclock_gettime.c:318:16: note: aliased declaration here

I could not figure out whether this is intentional, but I see that changing it to return time_t avoids the warning.

Returning 'int' from time() is also a bit questionable, as it causes an overflow in y2038 even on 64-bit architectures that use a 64-bit time_t type. On 32-bit architectures with a 64-bit time_t, time() should always be implemented by the C library by calling a (to be added) clock_gettime() variant that takes a sufficiently wide argument.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: http://lkml.kernel.org/r/20171204150203.852959-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
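A sketch of the fix in the vDSO source: the alias and its target now agree on the return type (the body is simplified; the real __vdso_time reads the vsyscall data page):

    notrace time_t __vdso_time(time_t *t)
    {
            /* Simplified: the real implementation reads the wall-clock
             * seconds from the vsyscall data page. */
            time_t result = vdso_read_wall_time_sec();  /* illustrative */

            if (t)
                    *t = result;
            return result;
    }

    /* The alias now matches the target's return type; it was previously
     * declared as "int time(time_t *t)". */
    time_t time(time_t *t) __attribute__((weak, alias("__vdso_time")));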
-
Committed by Colin Ian King

Functions x86_vector_debug_show(), uv_handle_nmi() and uv_nmi_setup_common() are local to their source files and do not need to be in global scope, so make them static. This fixes up various sparse warnings.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Mike Travis <mike.travis@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Cc: travis@sgi.com
Link: http://lkml.kernel.org/r/20171206173358.24388-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 06 December 2017, 11 commits
-
-
Committed by Radim Krčmář

The implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start.

Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fixes: 38b99173 ("kvm: vmx: Implement set_apic_access_page_addr")
Fixes: 369ea824 ("mm/rmap: update to new mmu_notifier semantic v2")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>
Tested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
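The re-introduced helper, close to its upstream shape (still a sketch): if the invalidated range covers the APIC access page, request that all vCPUs reload it.

    void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
                                                unsigned long start,
                                                unsigned long end)
    {
            unsigned long apic_address;

            /* The physical address of the APIC access page is cached in
             * the VMCS; force a reload if its host mapping goes away. */
            apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
            if (start <= apic_address && apic_address < end)
                    kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
    }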
-
Committed by Andy Lutomirski

__restore_processor_context() had a couple of ordering bugs. It restored GSBASE after calling load_gs_index(), and the latter can call into tracing code. It also tried to restore segment registers before restoring the LDT, which is straight-up wrong.

Reorder the code so that we restore GSBASE, then the descriptor tables, then the segments. This fixes two bugs. First, it fixes a regression that broke resume under certain configurations due to irqflag tracing in native_load_gs_index(). Second, it fixes resume when the userspace process that initiated suspend had funny segments. The latter can be reproduced by compiling this (headers added so it builds standalone):

    // SPDX-License-Identifier: GPL-2.0
    /*
     * ldt_echo.c - Echo argv[1] while using an LDT segment
     */
    #define _GNU_SOURCE   /* for asprintf() */
    #include <err.h>
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/ldt.h>

    int main(int argc, char **argv)
    {
            int ret;
            size_t len;
            char *buf;

            const struct user_desc desc = {
                    .entry_number    = 0,
                    .base_addr       = 0,
                    .limit           = 0xfffff,
                    .seg_32bit       = 1,
                    .contents        = 0,  /* Data, grow-up */
                    .read_exec_only  = 0,
                    .limit_in_pages  = 1,
                    .seg_not_present = 0,
                    .useable         = 0
            };

            if (argc != 2)
                    errx(1, "Usage: %s STRING", argv[0]);

            len = asprintf(&buf, "%s\n", argv[1]);
            if (len < 0)
                    errx(1, "Out of memory");

            ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
            if (ret < -1)
                    errno = -ret;
            if (ret)
                    err(1, "modify_ldt");

            asm volatile ("movw %0, %%es" :: "rm" ((unsigned short)7));
            write(1, buf, len);
            return 0;
    }

and running ldt_echo >/sys/power/mem. Without the fix, the latter causes a triple fault on resume.

Fixes: ca37e57b ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()")
Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/6b31721ea92f51ea839e79bd97ade4a75b1eeea2.1512057304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Rafael J. Wysocki

acpi_os_get_root_pointer() may return a valid address even if acpi_disabled is set, but the host bridge information from the ACPI tables is not going to be used in that case, and the Broadcom host bridge initialization should not be skipped then. So make broadcom_postcore_init() check acpi_disabled too, to avoid this issue.

Fixes: 6361d72b (x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan)
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linux PCI <linux-pci@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/3186627.pxZj1QbYNg@aspire.rjw.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
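A sketch of the guard (the legacy scan logic itself is elided): only trust the ACPI root pointer when ACPI is actually enabled.

    static int __init broadcom_postcore_init(void)
    {
            /* Skip the legacy CNB20LE probe only when ACPI is really in
             * use; a root pointer alone is not enough if acpi_disabled
             * is set. */
            if (!acpi_disabled && acpi_os_get_root_pointer())
                    return 0;

            /* ... otherwise fall through to the legacy CNB20LE bus scan ... */
            return 0;
    }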
-
Committed by Tom Lendacky

The size of the Microcode Patch Block (MPB) for an AMD family 17h processor is 3200 bytes. Add a #define for fam17h so that it does not default to 2048 bytes and fail a microcode load/update.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
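Roughly what the change looks like in the AMD microcode loader (other family cases are elided; the function shape is a simplified sketch, though the define follows the existing F1?H_MPB_MAX_SIZE naming pattern):

    #define F17H_MPB_MAX_SIZE 3200   /* family 17h Microcode Patch Block */

    static unsigned int verify_patch_size_sketch(u8 family, u32 patch_size)
    {
            u32 max_size;

            switch (family) {
            case 0x17:
                    max_size = F17H_MPB_MAX_SIZE;
                    break;
            default:
                    max_size = 2048;  /* the old default, too small for 17h */
                    break;
            }

            return patch_size <= max_size ? patch_size : 0;
    }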
-
Committed by Rudolf Marek
The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]). If this feature is set, FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.

Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Linus Torvalds

Things like this will probably keep showing up for other architectures and other special cases. I actually thought we already used %lx for this, and that is indeed _historically_ the case, but we moved to %p when merging the 32-bit and 64-bit cases as a convenient way to get the formatting right (i.e. automatically picking "%08lx" vs "%016lx" based on register size). So just turn this %p into %px.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kees Cook
The refcount_t protection on x86 was not intended to use the stricter GPL export. This adjusts the linkage again to avoid a regression in the availability of the refcount API.

Reported-by: Dave Airlie <airlied@gmail.com>
Fixes: 7a46ec0e ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Jim Mattson
vmx_io_bitmap_b should not be allocated twice.

Fixes: 23611332 ("KVM: VMX: refactor setup of global page-sized bitmaps")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Committed by Andrew Honig

This fixes CVE-2017-1000407.

KVM allows guests to directly access I/O port 0x80 on Intel hosts. If the guest floods this port with writes, it generates exceptions and instability in the host kernel, leading to a crash. With this change, guest writes to port 0x80 on Intel will behave the same as they currently behave on AMD systems.

Prevent the flooding by removing the code that sets port 0x80 as a passthrough port. This is essentially the same as upstream patch 99f85a28, except that patch was for AMD chipsets and this patch is for Intel.

Signed-off-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Fixes: fdef3ad1 ("KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs")
Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Committed by Rik van Riel
Now that get_fpu and put_fpu do nothing, because the scheduler will automatically load and restore the guest FPU context for us while we are in this code (deep inside the vcpu_run main loop), we can get rid of the get_fpu and put_fpu hooks.

Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Rik van Riel

Currently, every time a VCPU is scheduled out, the host kernel will first save the guest FPU/xstate context, then load the qemu userspace FPU context, only to then immediately save the qemu userspace FPU context back to memory. When scheduling in a VCPU, the same extraneous FPU loads and saves are done.

This could be avoided by moving from a model where the guest FPU is loaded and stored with preemption disabled, to a model where the qemu userspace FPU is swapped out for the guest FPU context for the duration of the KVM_RUN ioctl. This is done under the VCPU mutex, which is also taken when other tasks inspect the VCPU FPU context, so the code should already be safe for this change. That should come as no surprise, given that s390 already has this optimization.

This can fix a bug where KVM calls get_user_pages while owning the FPU, and the file system ends up requesting the FPU again:

    [258270.527947]  __warn+0xcb/0xf0
    [258270.527948]  warn_slowpath_null+0x1d/0x20
    [258270.527951]  kernel_fpu_disable+0x3f/0x50
    [258270.527953]  __kernel_fpu_begin+0x49/0x100
    [258270.527955]  kernel_fpu_begin+0xe/0x10
    [258270.527958]  crc32c_pcl_intel_update+0x84/0xb0
    [258270.527961]  crypto_shash_update+0x3f/0x110
    [258270.527968]  crc32c+0x63/0x8a [libcrc32c]
    [258270.527975]  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
    [258270.527978]  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
    [258270.527985]  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
    [258270.527988]  submit_io+0x170/0x1b0 [dm_bufio]
    [258270.527992]  __write_dirty_buffer+0x89/0x90 [dm_bufio]
    [258270.527994]  __make_buffer_clean+0x4f/0x80 [dm_bufio]
    [258270.527996]  __try_evict_buffer+0x42/0x60 [dm_bufio]
    [258270.527998]  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
    [258270.528002]  shrink_slab.part.40+0x1f5/0x420
    [258270.528004]  shrink_node+0x22c/0x320
    [258270.528006]  do_try_to_free_pages+0xf5/0x330
    [258270.528008]  try_to_free_pages+0xe9/0x190
    [258270.528009]  __alloc_pages_slowpath+0x40f/0xba0
    [258270.528011]  __alloc_pages_nodemask+0x209/0x260
    [258270.528014]  alloc_pages_vma+0x1f1/0x250
    [258270.528017]  do_huge_pmd_anonymous_page+0x123/0x660
    [258270.528021]  handle_mm_fault+0xfd3/0x1330
    [258270.528025]  __get_user_pages+0x113/0x640
    [258270.528027]  get_user_pages+0x4f/0x60
    [258270.528063]  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
    [258270.528108]  try_async_pf+0x66/0x230 [kvm]
    [258270.528135]  tdp_page_fault+0x130/0x280 [kvm]
    [258270.528149]  kvm_mmu_page_fault+0x60/0x120 [kvm]
    [258270.528158]  handle_ept_violation+0x91/0x170 [kvm_intel]
    [258270.528162]  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

No performance changes were detected in quick ping-pong tests on my 4-socket system, which is expected since an FPU+xstate load is on the order of 0.1us, while ping-ponging between CPUs is on the order of 20us, and somewhat noisy.

Cc: stable@vger.kernel.org
Signed-off-by: Rik van Riel <riel@redhat.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without a preceding load_fpu, which happened inside the KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
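The model change, as a hedged sketch (helper names follow the description above; bodies are simplified):

    /* New model: swap the FPU state once per KVM_RUN ioctl, under
     * vcpu->mutex, instead of once per preemption. */
    int kvm_run_ioctl_sketch(struct kvm_vcpu *vcpu)
    {
            int r;

            /* Save the qemu userspace FPU, load the guest FPU ... */
            kvm_load_guest_fpu(vcpu);

            r = vcpu_run(vcpu);  /* scheduler switches in here no longer
                                  * cost KVM any extra FPU save/load */

            /* ... and swap back when returning to userspace. */
            kvm_put_guest_fpu(vcpu);
            return r;
    }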
-
- 05 December 2017, 1 commit
-
-
Committed by Hendrik Brueckner

Commit 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure, which exports the pt_regs structure. This is OK for multiple architectures but fails for s390 and arm64, which do not export pt_regs. Programs using it (for example, the bpf selftests) fail to compile on these architectures.

For s390, exporting pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers the parts of the pt_regs structure intended for use by user space.

To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that makes the type concrete. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits.

Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 0515e599 ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type")
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
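The layering described above, as a sketch (paths follow the commit description; the struct layout is the one introduced by commit 0515e599):

    /* include/uapi/asm-generic/bpf_perf_event.h:
     * default for architectures that do export pt_regs. */
    #include <linux/ptrace.h>
    typedef struct pt_regs bpf_user_pt_regs_t;

    /* include/uapi/linux/bpf_perf_event.h now builds on the abstract
     * type; s390 and arm64 provide their own asm/bpf_perf_event.h. */
    #include <asm/bpf_perf_event.h>

    struct bpf_perf_event_data {
            bpf_user_pt_regs_t regs;
            __u64 sample_period;
    };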
-
- 30 November 2017, 2 commits
-
-
Committed by Dan Williams

In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes:

    did you consider using the other paradigm:

    In arch include files:
    #define pud_write       pud_write
    static inline int pud_write(pud_t pud)
            .....

    Then in include/asm-generic/pgtable.h:

    #ifndef pud_write
    static inline int pud_write(pud_t pud)
    {
            ....
    }
    #endif

    If you had, then the powerpc code would have worked ... ;-) and many
    of the other interfaces in include/asm-generic/pgtable.h are
    protected that way ...

Given that some architectures already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.

Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Dan Williams

Currently only get_user_pages_fast() can safely handle the writable gup case, due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted(), and to date it has been unimplemented: it just calls BUG_ON().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

For now this just implements a simple check for the _PAGE_RW bit, similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it.

Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s).

[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a00cc7d9 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
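The x86 implementation described above is a one-liner mirroring pmd_write; a sketch:

    /* A pud is writable iff the _PAGE_RW bit is set. */
    #define pud_write pud_write
    static inline int pud_write(pud_t pud)
    {
            return pud_flags(pud) & _PAGE_RW;
    }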
-
- 28 November 2017, 5 commits
-
-
Committed by Chunyu Hu

On a secondary CPU, the IDT is first loaded in cpu_init() with load_current_idt(), i.e. no exceptions can be handled before that point. The conversion of WARN() to use UD requires the IDT to be loaded earlier, as any warning between start_secondary() and load_current_idt() in cpu_init() will result in an unhandled #UD exception and therefore fail the bringup of the CPU.

Install the IDT handlers right in start_secondary() before calling cpu_init().

[ tglx: Massaged changelog ]

Fixes: 9a93848f ("x86/debug: Implement __WARN() using UD0")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com
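Schematically, the fix is a one-line reorder in the secondary-CPU startup path (heavily simplified sketch):

    static void notrace start_secondary_sketch(void *unused)
    {
            /* Load the per-CPU IDT first, so that a WARN() (implemented
             * as a #UD) anywhere below is handled instead of killing the
             * CPU bringup. */
            load_current_idt();

            cpu_init();
            /* ... rest of the bringup: smp_callin(), idle loop, etc. ... */
    }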
-
Committed by Juergen Gross

Add the early interrupt handlers activated by idt_setup_early_handler() to the handlers supported by Xen pv guests. This allows early WARN() calls to not crash the guest.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Link: https://lkml.kernel.org/r/20171124084221.30172-1-jgross@suse.com
-
Committed by Jan H. Schönherr

The KVM API says about the signal mask set via KVM_SET_SIGNAL_MASK that "any unblocked signal received [...] will cause KVM_RUN to return with -EINTR" and that "the signal will only be delivered if not blocked by the original signal mask".

This, however, is only true when the calling task has a signal handler registered for the signal. If not, signal evaluation is short-circuited for SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN returning, or the whole process is terminated.

Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar to that in do_sigtimedwait() to avoid the short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Wanpeng Li

Reported by syzkaller:

    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel]
    CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26
    RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel]
    Call Trace:
     vmx_free_vcpu+0xda/0x130 [kvm_intel]
     kvm_arch_destroy_vm+0x192/0x290 [kvm]
     kvm_put_kvm+0x262/0x560 [kvm]
     kvm_vm_release+0x2c/0x30 [kvm]
     __fput+0x190/0x370
     task_work_run+0xa1/0xd0
     do_exit+0x4d2/0x13e0
     do_group_exit+0x89/0x140
     get_signal+0x318/0xb80
     do_signal+0x8c/0xb40
     exit_to_usermode_loop+0xe4/0x140
     syscall_return_slowpath+0x206/0x230
     entry_SYSCALL_64_fastpath+0x98/0x9a

The syzkaller testcase executes VMXON/VMLAUNCH instructions, so the vmx->nested state is populated; it also issues a KVM_SMI ioctl. However, the testcase is just a simple C program and is not launched by something like seabios, which implements an SMI handler.

Commit 05cade71 (KVM: nSVM: fix SMI injection in guest mode) gets out of guest mode and sets nested.vmxon to false for the duration of SMM, according to SDM 34.14.1 "leave VMX operation", upon entering SMM. We can't alloc/free the vmx->nested state each time we enter/exit SMM, since that would add too much overhead, so vmx_pre_enter_smm() marks nested.vmxon false even though the vmx->nested state is still populated. The expectation is that em_rsm() will mark nested.vmxon true again. However, the SMI handler/RSM never executes, because there is nothing like seabios in this scenario. free_nested() then fails to free the vmx->nested state, since vmx->nested.vmxon is false, which results in the above warning.

This patch fixes it by also considering the "no SMI handler" case. Luckily, vmx->nested.smm.vmxon is set according to the value of vmx->nested.vmxon in vmx_pre_enter_smm(), so we can take advantage of it and free the vmx->nested state when L1 goes down.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: 05cade71 (KVM: nSVM: fix SMI injection in guest mode)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
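A hedged sketch of the adjusted bail-out condition in free_nested(): a vCPU whose VMX-operation flag was stashed away for SMM is treated as still being in VMX operation.

    static void free_nested_sketch(struct vcpu_vmx *vmx)
    {
            /* Free the nested state if the vCPU is in VMX operation,
             * either directly or via the SMM-stashed flag. */
            if (!vmx->nested.vmxon && !vmx->nested.smm.vmxon)
                    return;

            vmx->nested.vmxon = false;
            vmx->nested.smm.vmxon = false;
            /* ... release vmcs02, cached vmcs12, shadow vmcs, etc. ... */
    }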
-
Committed by Wanpeng Li

Reported by syzkaller:

    *** Guest State ***
    CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
    CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
    CR3 = 0x000000002081e000
    RSP = 0x000000000000fffa  RIP = 0x0000000000000000
    RFLAGS=0x00023000         DR7 = 0x00000000000000
           ^^^^^^^^^^
    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
    CPU: 6 PID: 24431 Comm: reprotest Tainted: G W OE 4.14.0+ #26
    RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
    RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
    Call Trace:
     kvm_vcpu_ioctl+0x479/0x880 [kvm]
     do_vfs_ioctl+0x142/0x9a0
     SyS_ioctl+0x74/0x80
     entry_SYSCALL_64_fastpath+0x23/0x9a

The failed vmentry is triggered by the following beautified testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];

    int main()
    {
            struct kvm_debugregs dr = { 0 };

            r[2] = open("/dev/kvm", O_RDONLY);
            r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
            r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
            struct kvm_guest_debug debug = {
                    .control = 0xf0403,
                    .arch = {
                            .debugreg[6] = 0x2,
                            .debugreg[7] = 0x2
                    }
            };
            ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
            ioctl(r[4], KVM_RUN, 0);
    }

The testcase tries to set up the processor-specific debug registers and configure the vCPU for handling guest debug events through KVM_SET_GUEST_DEBUG. The KVM_SET_GUEST_DEBUG ioctl gets and sets rflags in order to set the TF bit if single-stepping is needed. All register caches are reset to "available" and the GUEST_RFLAGS vmcs field is reset to 0x2 during vCPU reset; however, the rflags cache itself is not reset during vCPU reset. vmx_get_rflags() therefore returns a stale cached rflags value, since the cache is marked available; it is 0 after boot. Vmentry fails if the RFLAGS reserved bit 1 is 0.

This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and its cache to 0x2 during vCPU reset.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-