- 24 7月, 2020 11 次提交
-
-
由 Thomas Gleixner 提交于
Use the generic infrastructure to check for and handle pending work before transitioning into guest mode. This now handles TIF_NOTIFY_RESUME as well which was ignored so far. Handling it is important as this covers task work and task work will be used to offload the heavy lifting of POSIX CPU timers to thread context. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.979724969@linutronix.de
-
由 Thomas Gleixner 提交于
Remove the temporary defines and fixup all references. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.855839271@linutronix.de
-
由 Thomas Gleixner 提交于
Replace the x86 code with the generic variant. Use temporary defines for idtentry_* which will be cleaned up in the next step. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.711492752@linutronix.de
-
由 Thomas Gleixner 提交于
Cleanup the temporary defines and use irqentry_ instead of idtentry_. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.602603691@linutronix.de
-
由 Thomas Gleixner 提交于
Replace the x86 variant with the generic version. Provide the relevant architecture specific helper functions and defines. Use a temporary define for idtentry_exit_user which will be cleaned up seperately. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.494648601@linutronix.de
-
由 Thomas Gleixner 提交于
Replace the syscall entry work handling with the generic version. Provide the necessary helper inlines to handle the real architecture specific parts, e.g. ptrace. Use a temporary define for idtentry_enter_user which will be cleaned up seperately. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.376213694@linutronix.de
-
由 Thomas Gleixner 提交于
As a preparatory step for moving the syscall and interrupt entry/exit handling into generic code, provide a pt_regs helper which retrieves the interrupt state from pt_regs. This is required to check whether interrupts are reenabled by return from interrupt/exception. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220520.258511584@linutronix.de
-
由 Thomas Gleixner 提交于
Guests and user space share certain MSRs. KVM sets these MSRs to guest values once and does not set them back to user space values on every VM exit to spare the costly MSR operations. User return notifiers ensure that these MSRs are set back to the correct values before returning to user space in exit_to_usermode_loop(). There is no reason to evaluate the TIF flag indicating that user return notifiers need to be invoked in the loop. The important point is that they are invoked before returning to user space. Move the invocation out of the loop into the section which does the last preperatory steps before returning to user space. That section is not preemptible and runs with interrupts disabled until the actual return. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.159112003@linutronix.de
-
由 Thomas Gleixner 提交于
64bit and 32bit entry code have the same open coded syscall entry handling after the bitwidth specific bits. Move it to a helper function and share the code. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200722220520.051234096@linutronix.de
-
由 Thomas Gleixner 提交于
The user register sanity check is sprinkled all over the place. Move it into enter_from_user_mode(). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200722220519.943016204@linutronix.de
-
由 Ira Weiny 提交于
The noinstr qualifier is to be specified before the return type in the same way inline is used. These 2 cases were missed by previous patches. Signed-off-by: NIra Weiny <ira.weiny@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200723161405.852613-1-ira.weiny@intel.com
-
- 19 7月, 2020 2 次提交
-
-
由 Arvind Sankar 提交于
vmlinux-objs-y is added to targets, which currently means that the EFI stub gets added to the targets as well. It shouldn't be added since it is built elsewhere. This confuses Makefile.build which interprets the EFI stub as a target $(obj)/$(objtree)/drivers/firmware/efi/libstub/lib.a and will create drivers/firmware/efi/libstub/ underneath arch/x86/boot/compressed, to hold this supposed target, if building out-of-tree. [0] Fix this by pulling the stub out of vmlinux-objs-y into efi-obj-y. [0] See scripts/Makefile.build near the end: # Create directories for object files if they do not exist Signed-off-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NMasahiro Yamada <masahiroy@kernel.org> Acked-by: NArd Biesheuvel <ardb@kernel.org> Link: https://lkml.kernel.org/r/20200715032631.1562882-1-nivedita@alum.mit.edu
-
由 Kees Cook 提交于
Some builds of GCC enable stack protector by default. Simply removing the arguments is not sufficient to disable stack protector, as the stack protector for those GCC builds must be explicitly disabled. Remove the argument removals and add -fno-stack-protector. Additionally include missed x32 argument updates, and adjust whitespace for readability. Fixes: 20355e5f ("x86/entry: Exclude low level entry code from sanitizing") Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/202006261333.585319CA6B@keescook
-
- 18 7月, 2020 2 次提交
-
-
由 Andy Lutomirski 提交于
tss_invalidate_io_bitmap() wasn't wired up properly through the pvop machinery, so the TSS and Xen's io bitmap would get out of sync whenever disabling a valid io bitmap. Add a new pvop for tss_invalidate_io_bitmap() to fix it. This is XSA-329. Fixes: 22fe5b04 ("x86/ioperm: Move TSS bitmap update to exit to user work") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/d53075590e1f91c19f8af705059d3ff99424c020.1595030016.git.luto@kernel.org
-
由 Thomas Gleixner 提交于
Setting interrupt affinity on inactive interrupts is inconsistent when hierarchical irq domains are enabled. The core code should just store the affinity and not call into the irq chip driver for inactive interrupts because the chip drivers may not be in a state to handle such requests. X86 has a hacky workaround for that but all other irq chips have not which causes problems e.g. on GIC V3 ITS. Instead of adding more ugly hacks all over the place, solve the problem in the core code. If the affinity is set on an inactive interrupt then: - Store it in the irq descriptors affinity mask - Update the effective affinity to reflect that so user space has a consistent view - Don't call into the irq chip driver This is the core equivalent of the X86 workaround and works correctly because the affinity setting is established in the irq chip when the interrupt is activated later on. Note, that this is only effective when hierarchical irq domains are enabled by the architecture. Doing it unconditionally would break legacy irq chip implementations. For hierarchial irq domains this works correctly as none of the drivers can have a dependency on affinity setting in inactive state by design. Remove the X86 workaround as it is not longer required. Fixes: 02edee15 ("x86/apic/vector: Ignore set_affinity call for inactive interrupts") Reported-by: NAli Saidi <alisaidi@amazon.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NAli Saidi <alisaidi@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200529015501.15771-1-alisaidi@amazon.com Link: https://lkml.kernel.org/r/877dv2rv25.fsf@nanos.tec.linutronix.de
-
- 16 7月, 2020 4 次提交
-
-
由 Arnd Bergmann 提交于
The clang integrated assembler requires the 'cmp' instruction to have a length prefix here: arch/x86/math-emu/wm_sqrt.S:212:2: error: ambiguous instructions require an explicit suffix (could be 'cmpb', 'cmpw', or 'cmpl') cmp $0xffffffff,-24(%ebp) ^ Make this a 32-bit comparison, which it was clearly meant to be. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Link: https://lkml.kernel.org/r/20200527135352.1198078-1-arnd@arndb.de
-
由 Sedat Dilek 提交于
When assembling with Clang via `make LLVM_IAS=1` and CONFIG_HYPERV enabled, we observe the following error: <instantiation>:9:6: error: expected absolute expression .if HYPERVISOR_REENLIGHTENMENT_VECTOR == 3 ^ <instantiation>:1:1: note: while in macro instantiation idtentry HYPERVISOR_REENLIGHTENMENT_VECTOR asm_sysvec_hyperv_reenlightenment sysvec_hyperv_reenlightenment has_error_code=0 ^ ./arch/x86/include/asm/idtentry.h:627:1: note: while in macro instantiation idtentry_sysvec HYPERVISOR_REENLIGHTENMENT_VECTOR sysvec_hyperv_reenlightenment; ^ <instantiation>:9:6: error: expected absolute expression .if HYPERVISOR_STIMER0_VECTOR == 3 ^ <instantiation>:1:1: note: while in macro instantiation idtentry HYPERVISOR_STIMER0_VECTOR asm_sysvec_hyperv_stimer0 sysvec_hyperv_stimer0 has_error_code=0 ^ ./arch/x86/include/asm/idtentry.h:628:1: note: while in macro instantiation idtentry_sysvec HYPERVISOR_STIMER0_VECTOR sysvec_hyperv_stimer0; This is caused by typos in arch/x86/include/asm/idtentry.h: HYPERVISOR_REENLIGHTENMENT_VECTOR -> HYPERV_REENLIGHTENMENT_VECTOR HYPERVISOR_STIMER0_VECTOR -> HYPERV_STIMER0_VECTOR For more details see ClangBuiltLinux issue #1088. Fixes: a16be368 ("x86/entry: Convert various hypervisor vectors to IDTENTRY_SYSVEC") Suggested-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NWei Liu <wei.liu@kernel.org> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1088 Link: https://github.com/ClangBuiltLinux/linux/issues/1043 Link: https://lore.kernel.org/patchwork/patch/1272115/ Link: https://lkml.kernel.org/r/20200714194740.4548-1-sedat.dilek@gmail.com
-
由 Jian Cai 提交于
Clang's integrated assembler does not allow symbols with non-absolute values to be reassigned. Modify the interrupt entry loop macro to be compatible with IAS by using a label and an offset. Reported-by: NNick Desaulniers <ndesaulniers@google.com> Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Suggested-by: NNick Desaulniers <ndesaulniers@google.com> Suggested-by: NBrian Gerst <brgerst@gmail.com> Suggested-by: NArvind Sankar <nivedita@alum.mit.edu> Signed-off-by: NJian Cai <caij2003@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Link: https://github.com/ClangBuiltLinux/linux/issues/1043 Link: https://lkml.kernel.org/r/20200714233024.1789985-1-caij2003@gmail.com
-
由 Thomas Gleixner 提交于
Stack switching for interrupt handlers happens in C now for both 64 and 32bit. Remove the stale comment which claims the contrary. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 14 7月, 2020 1 次提交
-
-
由 Thomas Gleixner 提交于
Quite some non OF/ACPI users of irqdomains allocate firmware nodes of type IRQCHIP_FWNODE_NAMED or IRQCHIP_FWNODE_NAMED_ID and free them right after creating the irqdomain. The only purpose of these FW nodes is to convey name information. When this was introduced the core code did not store the pointer to the node in the irqdomain. A recent change stored the firmware node pointer in irqdomain for other reasons and missed to notice that the usage sites which do the alloc_fwnode/create_domain/free_fwnode sequence are broken by this. Storing a dangling pointer is dangerous itself, but in case that the domain is destroyed later on this leads to a double free. Remove the freeing of the firmware node after creating the irqdomain from all affected call sites to cure this. Fixes: 711419e5 ("irqdomain: Add the missing assignment of domain->fwnode for named fwnode") Reported-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NMarc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/873661qakd.fsf@nanos.tec.linutronix.de
-
- 10 7月, 2020 1 次提交
-
-
由 Paolo Bonzini 提交于
Commit 850448f3 ("KVM: nVMX: Fix VMX preemption timer migration", 2020-06-01) accidentally broke nVMX live migration from older version by changing the userspace ABI. Restore it and, while at it, ensure that vmx->nested.has_preemption_timer_deadline is always initialized according to the KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE flag. Cc: Makarand Sonare <makarandsonare@google.com> Fixes: 850448f3 ("KVM: nVMX: Fix VMX preemption timer migration") Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 7月, 2020 3 次提交
-
-
由 Thomas Gleixner 提交于
No users outside this file anymore. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.301116609@linutronix.de
-
由 Thomas Gleixner 提交于
It's called from the non-instrumentable section. Fixes: c9c26150 ("x86/entry: Assert that syscalls are on the right stack") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.191497962@linutronix.de
-
由 Thomas Gleixner 提交于
exc_alignment_check() fails to disable interrupts before returning to the entry code. Fixes: ca4c6a98 ("x86/traps: Make interrupt enable/disable symmetric in C code") Reported-by: syzbot+0889df9502bc0f112b31@syzkaller.appspotmail.com Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200708192934.076519438@linutronix.de
-
- 07 7月, 2020 1 次提交
-
-
由 Andy Lutomirski 提交于
They were originally called _cond_rcu because they were special versions with conditional RCU handling. Now they're the standard entry and exit path, so the _cond_rcu part is just confusing. Drop it. Also change the signature to make them more extensible and more foolproof. No functional change -- it's pure refactoring. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/247fc67685263e0b673e1d7f808182d28ff80359.1593795633.git.luto@kernel.org
-
- 06 7月, 2020 2 次提交
-
-
由 Linus Torvalds 提交于
Using a mutex for "print this warning only once" is so overdesigned as to be actively offensive to my sensitive stomach. Just use "pr_info_once()" that already does this, although in a (harmlessly) racy manner that can in theory cause the message to be printed twice if more than one CPU races on that "is this the first time" test. [ If somebody really cares about that harmless data race (which sounds very unlikely indeed), that person can trivially fix printk_once() by using a simple atomic access, preferably with an optimistic non-atomic test first before even bothering to treat the pointless "make sure it is _really_ just once" case. A mutex is most definitely never the right primitive to use for something like this. ] Yes, this is a small and meaningless detail in a code path that hardly matters. But let's keep some code quality standards here, and not accept outrageously bad code. Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/ Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ingo Molnar 提交于
xenpv_exc_nmi() and xenpv_exc_debug() are only defined on 64-bit kernels, but they snuck into the 32-bit build via <asm/identry.h>, causing the link to fail: ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_nmi': (.entry.text+0x817): undefined reference to `xenpv_exc_nmi' ld: arch/x86/entry/entry_32.o: in function `asm_xenpv_exc_debug': (.entry.text+0x827): undefined reference to `xenpv_exc_debug' Only use them on 64-bit kernels. Fixes: f41f0824: ("x86/entry/xen: Route #DB correctly on Xen PV") Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 7月, 2020 5 次提交
-
-
由 Andy Lutomirski 提交于
Xen PV doesn't implement ESPFIX64, so they don't work right. Disable them. Also print a warning the first time anyone tries to use a 16-bit segment on a Xen PV guest that would otherwise allow it to help people diagnose this change in behavior. This gets us closer to having all x86 selftests pass on Xen PV. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/92b2975459dfe5929ecf34c3896ad920bd9e3f2d.1593795633.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW on x86_32, but the code expected them to be RAW. Get rid of all the macro indirection for them on 32-bit and just use DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly. Also add a warning to make sure that we only hit the _kernel paths in kernel mode. Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
On Xen PV, #DB doesn't use IST. It still needs to be correctly routed depending on whether it came from user or kernel mode. Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the logic. Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on Xen, and do the right thing for #DB. Also add more warnings to the exc_debug* handlers to make this type of failure more obvious. This fixes various forms of corruption that happen when usermode triggers #DB on Xen PV. Fixes: 4c0dcd83 ("x86/entry: Implement user mode C entry points for #DB and #MCE") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Chasing down a Xen bug caused me to realize that the new entry sanity checks are still fairly weak. Add some more checks. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER path so that Xen PV doesn't skip it. Arguably this code should be deleted instead, but that would belong in the merge window. Fixes: ffae641f ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org
-
- 04 7月, 2020 4 次提交
-
-
由 Christoph Hellwig 提交于
Fix the recently added new __vmalloc_node_range callers to pass the correct values as the owner for display in /proc/vmallocinfo. Fixes: 800e26b8 ("x86/hyperv: allocate the hypercall page with only read and execute bits") Fixes: 10d5e97c ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page") Fixes: 7a0e27b2 ("mm: remove vmalloc_exec") Reported-by: NArd Biesheuvel <ardb@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sean Christopherson 提交于
Use the "common" KVM_POSSIBLE_CR*_GUEST_BITS defines to initialize the CR0/CR4 guest host masks instead of duplicating most of the CR4 mask and open coding the CR0 mask. SVM doesn't utilize the masks, i.e. the masks are effectively VMX specific even if they're not named as such. This avoids duplicate code, better documents the guest owned CR0 bit, and eliminates the need for a build-time assertion to keep VMX and x86 synchronized. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703040422.31536-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Mark CR4.TSD as being possibly owned by the guest as that is indeed the case on VMX. Without TSD being tagged as possibly owned by the guest, a targeted read of CR4 to get TSD could observe a stale value. This bug is benign in the current code base as the sole consumer of TSD is the emulator (for RDTSC) and the emulator always "reads" the entirety of CR4 when grabbing bits. Add a build-time assertion in to ensure VMX doesn't hand over more CR4 bits without also updating x86. Fixes: 52ce3c21 ("x86,kvm,vmx: Don't trap writes to CR4.TSD") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703040422.31536-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Inject a #GP on MOV CR4 if CR4.LA57 is toggled in 64-bit mode, which is illegal per Intel's SDM: CR4.LA57 57-bit linear addresses (bit 12 of CR4) ... blah blah blah ... This bit cannot be modified in IA-32e mode. Note, the pseudocode for MOV CR doesn't call out the fault condition, which is likely why the check was missed during initial development. This is arguably an SDM bug and will hopefully be fixed in future release of the SDM. Fixes: fd8cb433 ("KVM: MMU: Expose the LA57 feature to VM.") Cc: stable@vger.kernel.org Reported-by: NSebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703021714.5549-1-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 7月, 2020 3 次提交
-
-
由 Andy Lutomirski 提交于
The SYSENTER frame setup was nonsense. It worked by accident because the normal code into which the Xen asm jumped (entry_SYSENTER_32/compat) threw away SP without touching the stack. entry_SYSENTER_compat was recently modified such that it relied on having a valid stack pointer, so now the Xen asm needs to invoke it with a valid stack. Fix it up like SYSCALL: use the Xen-provided frame and skip the bare metal prologue. Fixes: 1c3e5d3f ("x86/entry: Make entry_64_compat.S objtool clean") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lkml.kernel.org/r/947880c41ade688ff4836f665d0c9fcaa9bd1201.1593191971.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
The SYSENTER asm (32-bit and compat) contains fixups for regs->sp and regs->flags. Move the fixups into C and fix some comments while at it. This is a valid cleanup all by itself, and it also simplifies the subsequent patch that will fix Xen PV SYSENTER. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/fe62bef67eda7fac75b8f3dbafccf571dc4ece6b.1593191971.git.luto@kernel.org
-
由 Andy Lutomirski 提交于
Now that the entry stack is a full page, it's too easy to regress the system call entry code and end up on the wrong stack without noticing. Assert that all system calls (SYSCALL64, SYSCALL32, SYSENTER, and INT80) are on the right stack and have pt_regs in the right place. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/52059e42bb0ab8551153d012d68f7be18d72ff8e.1593191971.git.luto@kernel.org
-
- 30 6月, 2020 1 次提交
-
-
由 Sean Christopherson 提交于
Choo! Choo! All aboard the Split Lock Express, with direct service to Wreckage! Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible SLD-enabled CPU model to avoid writing MSR_TEST_CTRL. MSR_TEST_CTRL exists, and is writable, on many generations of CPUs. Writing the MSR, even with '0', can result in bizarre, undocumented behavior. This fixes a crash on Haswell when resuming from suspend with a live KVM guest. Because APs use the standard SMP boot flow for resume, they will go through split_lock_init() and the subsequent RDMSR/WRMSR sequence, which runs even when sld_state==sld_off to ensure SLD is disabled. On Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will succeed and _may_ take the SMT _sibling_ out of VMX root mode. When KVM has an active guest, KVM performs VMXON as part of CPU onlining (see kvm_starting_cpu()). Because SMP boot is serialized, the resulting flow is effectively: on_each_ap_cpu() { WRMSR(MSR_TEST_CTRL, 0) VMXON } As a result, the WRMSR can disable VMX on a different CPU that has already done VMXON. This ultimately results in a #UD on VMPTRLD when KVM regains control and attempt run its vCPUs. The above voodoo was confirmed by reworking KVM's VMXON flow to write MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above. Further verification of the insanity was done by redoing VMXON on all APs after the initial WRMSR->VMXON sequence. The additional VMXON, which should VM-Fail, occasionally succeeded, and also eliminated the unexpected #UD on VMPTRLD. The damage done by writing MSR_TEST_CTRL doesn't appear to be limited to VMX, e.g. after suspend with an active KVM guest, subsequent reboots almost always hang (even when fudging VMXON), a #UD on a random Jcc was observed, suspend/resume stability is qualitatively poor, and so on and so forth. kernel BUG at arch/x86/kvm/x86.c:386! CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G D Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:kvm_spurious_fault+0xf/0x20 Call Trace: vmx_vcpu_load_vmcs+0x1fb/0x2b0 vmx_vcpu_load+0x3e/0x160 kvm_arch_vcpu_load+0x48/0x260 finish_task_switch+0x140/0x260 __schedule+0x460/0x720 _cond_resched+0x2d/0x40 kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0 kvm_vcpu_ioctl+0x363/0x5c0 ksys_ioctl+0x88/0xa0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: dbaba470 ("x86/split_lock: Rework the initialization flow of split lock detection") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.com
-