- 28 February 2018, 1 commit
-
By Jan Beulich

Omitting suffixes from instructions in AT&T mode is bad practice when the operand size cannot be determined by the assembler from register operands, and is likely going to be warned about by upstream gas in the future (mine does already). Add the single missing suffix here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/5A93F96902000078001ABAC8@prv-mh.provo.novell.com
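A minimal sketch of the ambiguity gas complains about, using a hypothetical instruction (not the one this commit fixes): a memory destination plus an immediate source leaves no register operand to imply a size, so the suffix must be explicit.

    /* hypothetical illustration of the AT&T suffix rule */
    void suffix_demo(unsigned long *p)
    {
        /* "add $1, %0" would be ambiguous here: addb? addw? addl? addq?
         * The destination is memory, so no register implies a size. */
        asm volatile("addq $1, %0"   /* the 'q' suffix pins it to 64 bits */
                     : "+m" (*p));
    }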
-
- 21 February 2018, 7 commits
-
By Josh Poimboeuf

On 64-bit, the stack pointer is always aligned on interrupt, so instead of setting the LSB of the pt_regs address, we can just add 1 to it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180221024214.lhl5jfgw33c4vz3m@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
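The equivalence holds for any aligned address; a tiny self-contained check (the address value below is made up):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        /* hypothetical pt_regs address; bit 0 is clear due to alignment */
        uint64_t regs = 0xffffc90000100ff0ULL;

        /* with the LSB clear, OR-ing in 1 and adding 1 are identical,
         * and the +1 can be folded into existing address arithmetic */
        assert((regs | 1) == (regs + 1));
        return 0;
    }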
-
By Dominik Brodowski

Open-code the two instances which called switch_to_thread_stack(). This allows us to remove the wrapper around DO_SWITCH_TO_THREAD_STACK. While at it, update the UNWIND hint to reflect where the IRET frame is, and update the commentary to reflect what we are actually doing here.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

Moving ASM_CLAC to interrupt_entry means two instructions (addq / pushq and call interrupt_entry) are not covered by it. However, it offers a noticeable size reduction (-0.2k):

     text    data     bss     dec     hex filename
    16882       0       0   16882    41f2 entry_64.o-orig
    16623       0       0   16623    40ef entry_64.o

Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

It is now trivial to call interrupt_entry() and then the actual worker. Therefore, remove the interrupt macro and open code it all.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

We can also move the CLD, SWAPGS, and the switch_to_thread_stack() call to the interrupt_entry() helper function. As we do not want call depths of two, convert switch_to_thread_stack() to a macro. However, switch_to_thread_stack() has another user in entry_64_compat.S, which currently expects it to be a function. To keep the code changes in this patch minimal, create a wrapper function. The switch to a macro means that there is some binary code duplication if CONFIG_IA32_EMULATION=y is enabled. Therefore, the size reduction differs depending on whether CONFIG_IA32_EMULATION is enabled:

CONFIG_IA32_EMULATION=y (-0.13k):

     text    data     bss     dec     hex filename
    17158       0       0   17158    4306 entry_64.o-orig
    17028       0       0   17028    4284 entry_64.o

CONFIG_IA32_EMULATION=n (-0.27k):

     text    data     bss     dec     hex filename
    17158       0       0   17158    4306 entry_64.o-orig
    16882       0       0   16882    41f2 entry_64.o

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

Moving the switch to the IRQ stack from the interrupt macro to the helper function requires some trickery: all ENTER_IRQ_STACK really cares about is where the "original" stack -- meaning the GP registers etc. -- is stored. Therefore, we need to offset the stored RSP value by 8 whenever ENTER_IRQ_STACK is called from within a function. In such cases, and after switching to the IRQ stack, we need to push the "original" return address (i.e. the return address from the call to the interrupt entry function) to the IRQ stack. This trickery allows us to carve another 0.85k from the text size (it would be more except for the additional unwind hints):

     text    data     bss     dec     hex filename
    18006       0       0   18006    4656 entry_64.o-orig
    17158       0       0   17158    4306 entry_64.o

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

The PUSH_AND_CLEAR_REGS macro is able to insert the GP registers "above" the original return address. This allows us to move a sizeable part of the interrupt entry macro to an interrupt entry helper function:

     text    data     bss     dec     hex filename
    21088       0       0   21088    5260 entry_64.o-orig
    18006       0       0   18006    4656 entry_64.o

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180220210113.6725-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 20 February 2018, 1 commit
-
By David Woodhouse

This reverts commit 1dde7415. By putting the RSB filling out of line and calling it, we waste one RSB slot for returning from the function itself, which means one fewer actual function call we can make if we're doing the Skylake abomination of call-depth counting. It also changed the number of RSB stuffings we do on vmexit from 32, which was correct, to 16. Let's just stop with the bikeshedding; it didn't actually *fix* anything anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: arjan.van.de.ven@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@intel.com
Cc: jmattson@google.com
Cc: karahmed@amazon.de
Cc: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Cc: rkrcmar@redhat.com
Link: http://lkml.kernel.org/r/1519037457-7643-4-git-send-email-dwmw@amazon.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 17 February 2018, 2 commits
-
By Dominik Brodowski

On some x86 CPU microarchitectures using 'xorq' to clear general-purpose registers is slower than 'xorl'. As 'xorl' is sufficient to clear all 64 bits of these registers due to zero-extension [*], switch the x86 64-bit entry code to use 'xorl'. No change in functionality and no change in code size.

[*] According to the Intel 64 and IA-32 Architectures Software Developer's Manual, section 3.4.1.1, the result of a 32-bit operand is "zero-extended to a 64-bit result in the destination general-purpose register." The AMD64 Architecture Programmer's Manual Volume 3, Appendix B.1, describes the same behaviour.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180214175924.23065-3-linux@dominikbrodowski.net
[ Improved on the changelog a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
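The zero-extension can be checked directly; a small sketch (the '%k0' operand modifier selects the 32-bit name of the register GCC picks):

    unsigned long xorl_clears_all_64_bits(void)
    {
        unsigned long v;

        asm("xorl %k0, %k0" : "=r" (v));  /* 32-bit xor of the register */
        return v;                         /* upper 32 bits are zeroed too */
    }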
-
By Dominik Brodowski

Play a little trick in the generic PUSH_AND_CLEAR_REGS macro to insert the GP registers "above" the original return address. This allows us to (re-)insert the macro in error_entry() and paranoid_entry() and to remove it from the idtentry macro. This reduces the static footprint significantly:

     text    data     bss     dec     hex filename
    24307       0       0   24307    5ef3 entry_64.o-orig
    20987       0       0   20987    51fb entry_64.o

Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180214175924.23065-2-linux@dominikbrodowski.net
[ Small tweaks to comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 15 February 2018, 1 commit
-
By Ingo Molnar

Josh Poimboeuf noticed the following bug: "The paranoid exit code only restores the saved CR3 when it switches back to the user GS. However, even in the kernel GS case, it's possible that it needs to restore a user CR3, if for example, the paranoid exception occurred in the syscall exit path between SWITCH_TO_USER_CR3_STACK and SWAPGS." Josh also confirmed via targeted testing that it's possible to hit this bug. Fix the bug by also restoring CR3 in the paranoid_exit_no_swapgs branch. The reason we haven't seen this bug reported by users yet is probably because "paranoid" entry points are limited to the following cases:

    idtentry double_fault   do_double_fault  has_error_code=1  paranoid=2
    idtentry debug          do_debug         has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
    idtentry int3           do_int3          has_error_code=0  paranoid=1 shift_ist=DEBUG_STACK
    idtentry machine_check  do_mce           has_error_code=0  paranoid=1

Amongst those entry points only machine_check is one that will interrupt an IRQS-off critical section asynchronously - and machine check events are rare. The other main asynchronous entries are NMI entries, which can be very high-freq with perf profiling, but they are special: they don't use the 'idtentry' macro but are open coded and restore user CR3 unconditionally so don't have this bug.

Reported-and-tested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180214073910.boevmg65upbk3vqb@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 13 February 2018, 9 commits
-
By Borislav Petkov

That macro was touched around 2.5.8 times, judging by the full-history Linux repo, but it was unused even then. Get rid of it already.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux@dominikbrodowski.net
Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Josh Poimboeuf

With the following commit:

    f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")

... one of my suggested improvements triggered a frame pointer warning:

    arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup

The warning is correct for the build-time code, but it's actually not relevant at runtime because of paravirt patching. The paravirt swapgs call gets replaced with either a SWAPGS instruction or NOPs at runtime. Go back to the previous behavior by removing the ELF function annotation for paranoid_entry() and adding an unwind hint, which effectively silences the warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

... same as the other macros in arch/x86/entry/calling.h

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

Previously, error_entry() and paranoid_entry() saved the GP registers onto stack space previously allocated by their callers. Combine these two steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro for that. This adds a significant amount of text size. However, Ingo Molnar points out that:

"these numbers also _very_ significantly over-represent the extra footprint. The assumptions that resulted in us compressing the IRQ entry code have changed very significantly with the new x86 IRQ allocation code we introduced in the last year:

 - IRQ vectors are usually populated in tightly clustered groups. With our new vector allocator code the typical per CPU allocation percentage on x86 systems is ~3 device vectors and ~10 fixed vectors out of ~220 vectors - i.e. a very low ~6% utilization (!). [...] The days where we allocated a lot of vectors on every CPU and the compression of the IRQ entry code text mattered are over.

 - Another issue is that only a small minority of vectors is frequent enough to actually matter to cache utilization in practice: 3-4 key IPIs and 1-2 device IRQs at most - and those vectors tend to be tightly clustered as well into about two groups, and are probably already on 2-3 cache lines in practice. For the common case of 'cache cold' IRQs it's the depth of the call chain and the fragmentation of the resulting I$ that should be the main performance limit - not the overall size of it.

 - The CPU side cost of IRQ delivery is still very expensive even in the best, most cached case, as in 'over a thousand cycles'. So much stuff is done that maybe contemporary x86 IRQ entry microcode already prefetches the IDT entry and its expected call target address." [*]

[*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com

The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need modification. Previously, %rsp was manually decreased by 15*8; with this patch, %rsp is decreased by 15 pushq instructions.

[jpoimboe@redhat.com: unwind hint improvements]
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use PUSH_AND_CLEAR_REGS instead of open-coded variants thereof. Due to the interleaving, the additional XOR-based clearing of R8 and R9 in entry_SYSCALL_64_after_hwframe() should not have any noticeable negative implications.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS. This macro uses PUSH instead of MOV and should therefore be faster, at least on newer CPUs.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
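A sketch of the two equivalent ways to build the GP-register area of pt_regs, abbreviated to two registers (the real macros cover all fifteen); both demo functions undo their stack changes before returning:

    asm(".text\n"
        "save_regs_mov_style:\n\t"
        "subq  $2*8, %rsp\n\t"       /* ALLOC: carve out the slots first... */
        "movq  %rdi, 1*8(%rsp)\n\t"  /* ...then MOV each register into place */
        "movq  %rsi, 0*8(%rsp)\n\t"
        "addq  $2*8, %rsp\n\t"
        "ret\n"
        "save_regs_push_style:\n\t"
        "pushq %rdi\n\t"             /* PUSH allocates and stores in one insn */
        "pushq %rsi\n\t"
        "addq  $2*8, %rsp\n\t"
        "ret\n");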
-
By Dominik Brodowski

Same as is done for syscalls, interleave XOR with PUSH instructions for exceptions/interrupts, in order to minimize the cost of the additional instructions required for register clearing.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
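A compressed sketch of the interleaving pattern (two registers shown; the kernel's PUSH_AND_CLEAR_REGS does this for the whole set): each XOR is slotted between PUSHes, so it fills otherwise-idle issue slots while breaking any speculative use of the stale value. The demo drops its stack slots again before returning.

    asm(".text\n"
        "push_and_clear_demo:\n\t"
        "pushq %r8\n\t"           /* save the incoming r8 into its slot... */
        "xorl  %r8d, %r8d\n\t"    /* ...and clear it right away (zero-extends) */
        "pushq %r9\n\t"
        "xorl  %r9d, %r9d\n\t"
        "addq  $2*8, %rsp\n\t"    /* demo only: discard the two slots */
        "ret\n");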
-
By Dominik Brodowski

The two special, open-coded cases for POP_C_REGS can be handled by ASM macros.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dominik Brodowski

All current code paths call SAVE_C_REGS and then immediately SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV sequences properly. While at it, remove the macros to save all except specific registers, as these macros have been unused for a long time.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 10 February 2018, 1 commit
-
By Nadav Amit

The comment is confusing since the path is taken when CONFIG_PAGE_TABLE_ISOLATION is disabled (while the comment says it is not taken).

Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: nadav.amit@gmail.com
Link: http://lkml.kernel.org/r/20180209170638.15161-1-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 06 February 2018, 4 commits
-
By Dan Williams

At entry userspace may have populated registers with values that could otherwise be useful in a speculative execution attack. Clear them to minimize the kernel's attack surface.

Originally-From: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/151787989697.7847.4083702787288600552.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dan Williams

Clear the 'extra' registers on entering the 64-bit kernel for exceptions and interrupts. The common registers are not cleared since they are likely clobbered well before they can be exploited in a speculative execution attack.

Originally-From: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/151787989146.7847.15749181712358213254.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Dan Williams

At entry userspace may have (maliciously) populated the extra registers outside the syscall calling convention with arbitrary values that could be useful in a speculative execution (Spectre style) attack. Clear these registers to minimize the kernel's attack surface. Note, this only clears the extra registers and not the unused registers for syscalls with fewer than six arguments, since those registers are likely to be clobbered well before their values could be put to use under speculation. Note, Linus found that the XOR instructions can be executed with minimized cost if interleaved with the PUSH instructions, and Ingo's analysis found that R10 and R11 should be included in the register clearing beyond the typical 'extra' syscall calling convention registers.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/151787988577.7847.16733592218894189003.stgit@dwillia2-desk3.amr.corp.intel.com
[ Made small improvements to the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Mathieu Desnoyers

There are two places where core serialization is needed by membarrier:

1) When returning from the membarrier IPI,
2) After the scheduler updates curr to a thread with a different mm, before going back to user-space, since curr->mm is used by membarrier to check whether it needs to send an IPI to that CPU.

x86-32 uses IRET as return from interrupt, and both IRET and SYSEXIT to go back to user-space. The IRET instruction is core serializing, but not SYSEXIT. x86-64 uses IRET as return from interrupt, which takes care of the IPI. However, it can return to user-space through either SYSRETL (compat code), SYSRETQ, or IRET. Given that SYSRET{L,Q} is not core serializing, we rely instead on write_cr3() performed by switch_mm() to provide core serialization after changing the current mm, and deal with the special case of kthread -> uthread (temporarily keeping current mm into active_mm) by adding a sync_core() in that specific case. Use the new sync_core_before_usermode() to guarantee this.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Andrew Hunter <ahh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Avi Kivity <avi@scylladb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: David Sehr <sehr@google.com>
Cc: Greg Hackmann <ghackmann@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maged Michael <maged.michael@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/20180129202020.8515-10-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 31 January 2018, 2 commits
-
By Vitaly Kuznetsov

Hyper-V supports Live Migration notification. This is supposed to be used in conjunction with TSC emulation: when a VM is migrated to a host with a different TSC frequency, for some short period the host emulates accesses to the TSC and sends an interrupt to notify about the event. When the guest is done updating everything it can disable TSC emulation and everything will start working fast again. These notifications weren't required until now, as Hyper-V guests are not supposed to use the TSC as a clocksource: in Linux the TSC is even marked as unstable on boot. Guests normally use the 'tsc page' clocksource and the host updates its values on migrations automatically. Things change with nested virtualization: even when the PV clocksources (kvm-clock or tsc page) are passed through to the nested guests, the TSC frequency and frequency changes need to be known. The Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies EAX:BIT(12) of CPUID:0x40000009 as the feature identification bit. The right one to check is EAX:BIT(13) of CPUID:0x40000003. I was assured that the fix is on the way.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: kvm@vger.kernel.org
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: devel@linuxdriverproject.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Link: https://lkml.kernel.org/r/20180124132337.30138-4-vkuznets@redhat.com
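A user-space sketch of the corrected check, per the commit's description (the helper name is made up; bit 13 of CPUID leaf 0x40000003 EAX, rather than the bit the spec documented):

    #include <cpuid.h>
    #include <stdbool.h>

    static bool hv_tsc_emulation_available(void)   /* hypothetical helper */
    {
        unsigned int eax, ebx, ecx, edx;

        __cpuid(0x40000003, eax, ebx, ecx, edx);   /* Hyper-V feature leaf */
        return eax & (1u << 13);                   /* EAX:BIT(13), per the fix */
    }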
-
By Dan Williams

The syscall table base is a user controlled function pointer in kernel space. Use array_index_nospec() to prevent any out of bounds speculation. While retpoline prevents speculating into a userspace directed target, it does not stop the pointer de-reference; the concern is leaking memory relative to the syscall table base, by observing instruction cache behavior.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
Cc: gregkh@linuxfoundation.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: alan@linux.intel.com
Link: https://lkml.kernel.org/r/151727417984.33451.1216731042505722161.stgit@dwillia2-desk3.amr.corp.intel.com
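The shape of the fix, as a kernel-context sketch with simplified signatures (the real change lives in the syscall dispatch path): array_index_nospec() clamps the index to zero under mis-speculation, so the table load cannot be steered out of bounds.

    /* sketch: nr is fully user-controlled at this point */
    if (likely(nr < NR_syscalls)) {
        nr = array_index_nospec(nr, NR_syscalls);  /* safe even speculatively */
        regs->ax = sys_call_table[nr](regs->di, regs->si, regs->dx,
                                      regs->r10, regs->r8, regs->r9);
    }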
-
- 30 January 2018, 3 commits
-
By Andy Lutomirski

The TS_COMPAT bit is very hot and is accessed from code paths that mostly also touch thread_info::flags. Move it into struct thread_info to improve cache locality. The only reason it was in thread_struct is that there was a brief period during which arch-specific fields were not allowed in struct thread_info. Linus suggested further changing:

    ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

to:

    if (unlikely(ti->status & (TS_COMPAT|TS_I386_REGS_POKED)))
        ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);

on the theory that frequently dirtying the cacheline even in pure 64-bit code that never needs to modify status hurts performance. That could be a reasonable followup patch, but I suspect it matters less on top of this patch.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Link: https://lkml.kernel.org/r/03148bcc1b217100e6e8ecf6a5468c45cf4304b6.1517164461.git.luto@kernel.org
-
By Andy Lutomirski

With the fast path removed there is no point in splitting the push of the normal and the extra register set. Just push the extra regs right away.

[ tglx: Split out from 'x86/entry/64: Remove the SYSCALL64 fast path' ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
-
By Andy Lutomirski

The SYSCALL64 fast path was a nice, if small, optimization back in the good old days when syscalls were actually reasonably fast. Now there is PTI to slow everything down, and indirect branches are verboten, making everything messier. The retpoline code in the fast path is particularly nasty. Just get rid of the fast path. The slow path is barely slower.

[ tglx: Split out the 'push all extra regs' part ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Link: https://lkml.kernel.org/r/462dff8d4d64dfbfc851fbf3130641809d980ecd.1517164461.git.luto@kernel.org
-
- 28 January 2018, 1 commit
-
By Borislav Petkov

Simplify it to call an asm-function instead of pasting 41 insn bytes at every call site. Also, add alignment to the macro as suggested here: https://support.google.com/faqs/answer/7625886

[dwmw2: Clean up comments, let it clobber %ebx and just tell the compiler]

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: dave.hansen@intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: pbonzini@redhat.com
Cc: tim.c.chen@linux.intel.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517070274-12128-3-git-send-email-dwmw@amazon.co.uk
-
- 19 January 2018, 1 commit
-
By Thomas Gleixner

The machine check idtentry uses an indirect branch directly from the low level code. This evades the speculation protection. Replace it by a direct call into C code and issue the indirect call there so the compiler can apply the proper speculation protection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Niced-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801181626290.1847@nanos
-
- 15 January 2018, 1 commit
-
By David Woodhouse

On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an *underflow* of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as *filling* the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch.

[ tglx: Added missing vendor check and slightly massaged comments and changelog ]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515779365-9032-1-git-send-email-dwmw@amazon.co.uk
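One unit of the stuffing sequence, as a standalone sketch (the kernel's macro unrolls this to fill the whole buffer): the call plants an RSB entry whose predicted target is a pause/lfence capture loop that is never reached architecturally.

    asm(".text\n"
        "rsb_stuff_one_demo:\n\t"
        "call  1f\n"             /* plants an RSB entry predicting label 2: */
        "2:\tpause\n\t"
        "lfence\n\t"
        "jmp   2b\n"             /* speculation trap; never executed for real */
        "1:\taddq  $8, %rsp\n\t" /* drop the planted return address */
        "ret\n");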
-
- 14 January 2018, 1 commit
-
By Thomas Gleixner

The switch to the user space page tables in the low level ASM code sets bit 12 and bit 11 of CR3 unconditionally. Bit 12 switches the base address of the page directory to the user part, bit 11 switches the PCID to the PCID associated with the user page tables. This fails on a machine which lacks PCID support, because bit 11 is set in CR3 while bit 11 is reserved when PCID is inactive. While the Intel SDM claims that the reserved bits are ignored when PCID is disabled, the AMD APM states that they should be cleared. This went unnoticed as the AMD APM was not checked when the code was developed and reviewed, and test systems with Intel CPUs never failed to boot. The report is against a CentOS 6 host where the guest fails to boot, so it's not yet clear whether this is a virt issue or can happen on real hardware too, but that's irrelevant as the AMD APM clearly asks for clearing the reserved bits. Make sure that on non-PCID machines bit 11 is not set by the page table switching code. Andy suggested renaming the related bits and masks so they clearly describe what they should be used for, which is done as well for clarity. That split could have been done with alternatives but the macro hell is horrible and ugly. This can be done on top if someone cares to remove the extra orq. For now it's a straightforward fix.

Fixes: 6fd166aa ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
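A C sketch of the CR3 layout in question (the first constant name is illustrative; X86_CR3_PTI_PCID_USER_BIT is the kernel's name for bit 11): bit 12 may be set unconditionally, bit 11 only when the CPU actually has PCID.

    #define PTI_USER_PGTABLE_BIT      12  /* selects the user half of the PGD pair */
    #define X86_CR3_PTI_PCID_USER_BIT 11  /* user PCID; reserved if PCID is off */

    /* illustrative: derive the user CR3 from the kernel CR3 */
    static unsigned long user_cr3(unsigned long kernel_cr3, int cpu_has_pcid)
    {
        unsigned long cr3 = kernel_cr3 | (1UL << PTI_USER_PGTABLE_BIT);

        if (cpu_has_pcid)     /* bit 11 must stay clear on non-PCID CPUs */
            cr3 |= 1UL << X86_CR3_PTI_PCID_USER_BIT;
        return cr3;
    }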
-
- 12 January 2018, 1 commit
-
By David Woodhouse

Convert indirect jumps in core 32/64-bit entry assembler code to use non-speculative sequences when CONFIG_RETPOLINE is enabled. Don't use CALL_NOSPEC in entry_SYSCALL_64_fastpath because the return address after the 'call' instruction must be *precisely* at the .Lentry_SYSCALL_64_after_fastpath label for stub_ptregs_64 to work, and the use of alternatives will mess that up unless we play horrid games to prepend with NOPs and make the variants the same length. It's not worth it; in the case where we ALTERNATIVE out the retpoline, the first instruction at __x86.indirect_thunk.rax is going to be a bare jmp *%rax anyway.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kees Cook <keescook@google.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1515707194-20531-7-git-send-email-dwmw@amazon.co.uk
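For reference, a minimal sketch of the retpoline sequence such a thunk uses for an indirect jump through %rax (a standalone illustration, not the kernel's exact macro):

    asm(".text\n"
        "retpoline_rax_demo:\n\t"
        "call  1f\n"                 /* RSB now predicts a return to the trap */
        "2:\tpause\n\t"
        "lfence\n\t"
        "jmp   2b\n"                 /* captures any speculative 'ret' */
        "1:\tmovq  %rax, (%rsp)\n\t" /* replace return address with target */
        "ret\n");                    /* architecturally jumps to *%rax */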
-
- 04 January 2018, 1 commit
-
By Thomas Gleixner

The preparation for PTI which added CR3 switching to the entry code misplaced the CR3 switch in entry_SYSCALL_compat(). With PTI enabled the entry code tries to access a per cpu variable after switching to kernel GS. This fails because that variable is not mapped to user space. This results in a double fault and in the worst case a kernel crash. Move the switch ahead of the access and clobber RSP which has been saved already.

Fixes: 8a09317b ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching")
Reported-by: Lars Wendler <wendler.lars@web.de>
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos
-
- 24 December 2017, 3 commits
-
By Peter Zijlstra

Most NMI/paranoid exceptions will not in fact change pagetables and would thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3 writes. Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the kernel mappings, and now that we track which user PCIDs need flushing we can avoid those too when possible. This does mean RESTORE_CR3 needs an additional scratch_reg; luckily both sites have plenty available.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
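The mechanism relied on here, as a small sketch: bit 63 of the value written to CR3 requests that TLB entries for the target PCID be kept (the bit is architectural; the helper below is illustrative).

    #define X86_CR3_PCID_NOFLUSH (1UL << 63)  /* keep TLB entries on switch */

    /* illustrative: compose a CR3 value for a no-flush switch */
    static unsigned long cr3_noflush(unsigned long pgd_pa, unsigned long pcid)
    {
        return pgd_pa | pcid | X86_CR3_PCID_NOFLUSH;
    }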
-
By Peter Zijlstra

We can use PCID to retain the TLBs across CR3 switches, including those now part of the user/kernel switch. This increases performance of kernel entry/exit at the cost of more expensive/complicated TLB flushing. Now that we have two address spaces, one for kernel and one for user space, we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID (just like we use the PFN LSB for the PGD). Since we do TLB invalidation from kernel space, the existing code will only invalidate the kernel PCID; we augment that by marking the corresponding user PCID invalid, and upon switching back to userspace, use a flushing CR3 write for the switch. In order to access the user_pcid_flush_mask we use PER_CPU storage, which means the previously established SWAPGS vs CR3 ordering is now mandatory and required. Having to do this memory access does require additional registers; most sites have a functioning stack and we can spill one (RAX), while sites without a functional stack need to otherwise provide the second scratch register.

Note: PCID is generally available on Intel Sandybridge and later CPUs.
Note: Up until this point TLB flushing was broken in this series.

Based-on-code-from: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Andy Lutomirski

Make VSYSCALLs work fully in PTI mode by mapping them properly to the user space visible page tables.

[ tglx: Hide unused functions (Patch by Arnd Bergmann) ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-