1. 24 12月, 2017 1 次提交
    • T
      x86/cpufeatures: Add X86_BUG_CPU_INSECURE · a89f040f
      Thomas Gleixner 提交于
      Many x86 CPUs leak information to user space due to missing isolation of
      user space and kernel space page tables. There are many well documented
      ways to exploit that.
      
      The upcoming software migitation of isolating the user and kernel space
      page tables needs a misfeature flag so code can be made runtime
      conditional.
      
      Add the BUG bits which indicates that the CPU is affected and add a feature
      bit which indicates that the software migitation is enabled.
      
      Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be
      made later.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a89f040f
  2. 23 12月, 2017 7 次提交
    • T
      init: Invoke init_espfix_bsp() from mm_init() · 613e396b
      Thomas Gleixner 提交于
      init_espfix_bsp() needs to be invoked before the page table isolation
      initialization. Move it into mm_init() which is the place where pti_init()
      will be added.
      
      While at it get rid of the #ifdeffery and provide proper stub functions.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      613e396b
    • T
      x86/cpu_entry_area: Move it out of the fixmap · 92a0f81d
      Thomas Gleixner 提交于
      Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
      and 0-day already hit a case where the fixmap PTEs were cleared by
      cleanup_highmap().
      
      Aside of that the fixmap API is a pain as it's all backwards.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      92a0f81d
    • T
      x86/cpu_entry_area: Move it to a separate unit · ed1bbc40
      Thomas Gleixner 提交于
      Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed1bbc40
    • P
      x86/microcode: Dont abuse the TLB-flush interface · 23cb7d46
      Peter Zijlstra 提交于
      Commit:
      
        ec400dde ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
      
      ... grubbed into tlbflush internals without coherent explanation.
      
      Since it says its a precaution and the SDM doesn't mention anything like
      this, take it out back.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: fenghua.yu@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      23cb7d46
    • D
      x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack · 4fe2d8b1
      Dave Hansen 提交于
      If the kernel oopses while on the trampoline stack, it will print
      "<SYSENTER>" even if SYSENTER is not involved.  That is rather confusing.
      
      The "SYSENTER" stack is used for a lot more than SYSENTER now.  Give it a
      better string to display in stack dumps, and rename the kernel code to
      match.
      
      Also move the 32-bit code over to the new naming even though it still uses
      the entry stack only for SYSENTER.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4fe2d8b1
    • T
      x86/ldt: Prevent LDT inheritance on exec · a4828f81
      Thomas Gleixner 提交于
      The LDT is inherited across fork() or exec(), but that makes no sense
      at all because exec() is supposed to start the process clean.
      
      The reason why this happens is that init_new_context_ldt() is called from
      init_new_context() which obviously needs to be called for both fork() and
      exec().
      
      It would be surprising if anything relies on that behaviour, so it seems to
      be safe to remove that misfeature.
      
      Split the context initialization into two parts. Clear the LDT pointer and
      initialize the mutex from the general context init and move the LDT
      duplication to arch_dup_mmap() which is only called on fork().
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a4828f81
    • P
      x86/ldt: Rework locking · c2b3496b
      Peter Zijlstra 提交于
      The LDT is duplicated on fork() and on exec(), which is wrong as exec()
      should start from a clean state, i.e. without LDT. To fix this the LDT
      duplication code will be moved into arch_dup_mmap() which is only called
      for fork().
      
      This introduces a locking problem. arch_dup_mmap() holds mmap_sem of the
      parent process, but the LDT duplication code needs to acquire
      mm->context.lock to access the LDT data safely, which is the reverse lock
      order of write_ldt() where mmap_sem nests into context.lock.
      
      Solve this by introducing a new rw semaphore which serializes the
      read/write_ldt() syscall operations and use context.lock to protect the
      actual installment of the LDT descriptor.
      
      So context.lock stabilizes mm->context.ldt and can nest inside of the new
      semaphore or mmap_sem.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c2b3496b
  3. 17 12月, 2017 24 次提交
    • T
      x86/cpufeatures: Make CPU bugs sticky · 6cbd2171
      Thomas Gleixner 提交于
      There is currently no way to force CPU bug bits like CPU feature bits. That
      makes it impossible to set a bug bit once at boot and have it stick for all
      upcoming CPUs.
      
      Extend the force set/clear arrays to handle bug bits as well.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.992156574@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6cbd2171
    • T
      x86/paravirt: Dont patch flush_tlb_single · a0357954
      Thomas Gleixner 提交于
      native_flush_tlb_single() will be changed with the upcoming
      PAGE_TABLE_ISOLATION feature. This requires to have more code in
      there than INVLPG.
      
      Remove the paravirt patching for it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Cc: michael.schwarz@iaik.tugraz.at
      Cc: moritz.lipp@iaik.tugraz.at
      Cc: richard.fellner@student.tugraz.at
      Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a0357954
    • A
      x86/entry/64: Make cpu_entry_area.tss read-only · c482feef
      Andy Lutomirski 提交于
      The TSS is a fairly juicy target for exploits, and, now that the TSS
      is in the cpu_entry_area, it's no longer protected by kASLR.  Make it
      read-only on x86_64.
      
      On x86_32, it can't be RO because it's written by the CPU during task
      switches, and we use a task gate for double faults.  I'd also be
      nervous about errata if we tried to make it RO even on configurations
      without double fault handling.
      
      [ tglx: AMD confirmed that there is no problem on 64-bit with TSS RO.  So
        	it's probably safe to assume that it's a non issue, though Intel
        	might have been creative in that area. Still waiting for
        	confirmation. ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bpetkov@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.733700132@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c482feef
    • A
      x86/entry: Clean up the SYSENTER_stack code · 0f9a4810
      Andy Lutomirski 提交于
      The existing code was a mess, mainly because C arrays are nasty.  Turn
      SYSENTER_stack into a struct, add a helper to find it, and do all the
      obvious cleanups this enables.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bpetkov@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.653244723@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0f9a4810
    • A
      x86/entry/64: Remove the SYSENTER stack canary · 7fbbd5cb
      Andy Lutomirski 提交于
      Now that the SYSENTER stack has a guard page, there's no need for a canary
      to detect overflow after the fact.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.572577316@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fbbd5cb
    • A
      x86/entry/64: Move the IST stacks into struct cpu_entry_area · 40e7f949
      Andy Lutomirski 提交于
      The IST stacks are needed when an IST exception occurs and are accessed
      before any kernel code at all runs.  Move them into struct cpu_entry_area.
      
      The IST stacks are unlike the rest of cpu_entry_area: they're used even for
      entries from kernel mode.  This means that they should be set up before we
      load the final IDT.  Move cpu_entry_area setup to trap_init() for the boot
      CPU and set it up for all possible CPUs at once in native_smp_prepare_cpus().
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.480598743@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      40e7f949
    • A
      x86/entry/64: Create a per-CPU SYSCALL entry trampoline · 3386bc8a
      Andy Lutomirski 提交于
      Handling SYSCALL is tricky: the SYSCALL handler is entered with every
      single register (except FLAGS), including RSP, live.  It somehow needs
      to set RSP to point to a valid stack, which means it needs to save the
      user RSP somewhere and find its own stack pointer.  The canonical way
      to do this is with SWAPGS, which lets us access percpu data using the
      %gs prefix.
      
      With PAGE_TABLE_ISOLATION-like pagetable switching, this is
      problematic.  Without a scratch register, switching CR3 is impossible, so
      %gs-based percpu memory would need to be mapped in the user pagetables.
      Doing that without information leaks is difficult or impossible.
      
      Instead, use a different sneaky trick.  Map a copy of the first part
      of the SYSCALL asm at a different address for each CPU.  Now RIP
      varies depending on the CPU, so we can use RIP-relative memory access
      to access percpu memory.  By putting the relevant information (one
      scratch slot and the stack address) at a constant offset relative to
      RIP, we can make SYSCALL work without relying on %gs.
      
      A nice thing about this approach is that we can easily switch it on
      and off if we want pagetable switching to be configurable.
      
      The compat variant of SYSCALL doesn't have this problem in the first
      place -- there are plenty of scratch registers, since we don't care
      about preserving r8-r15.  This patch therefore doesn't touch SYSCALL32
      at all.
      
      This patch actually seems to be a small speedup.  With this patch,
      SYSCALL touches an extra cache line and an extra virtual page, but
      the pipeline no longer stalls waiting for SWAPGS.  It seems that, at
      least in a tight loop, the latter outweights the former.
      
      Thanks to David Laight for an optimization tip.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bpetkov@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3386bc8a
    • A
      x86/entry/64: Use a per-CPU trampoline stack for IDT entries · 7f2590a1
      Andy Lutomirski 提交于
      Historically, IDT entries from usermode have always gone directly
      to the running task's kernel stack.  Rearrange it so that we enter on
      a per-CPU trampoline stack and then manually switch to the task's stack.
      This touches a couple of extra cachelines, but it gives us a chance
      to run some code before we touch the kernel stack.
      
      The asm isn't exactly beautiful, but I think that fully refactoring
      it can wait.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f2590a1
    • A
      x86/espfix/64: Stop assuming that pt_regs is on the entry stack · 6d9256f0
      Andy Lutomirski 提交于
      When we start using an entry trampoline, a #GP from userspace will
      be delivered on the entry stack, not on the task stack.  Fix the
      espfix64 #DF fixup to set up #GP according to TSS.SP0, rather than
      assuming that pt_regs + 1 == SP0.  This won't change anything
      without an entry stack, but it will make the code continue to work
      when an entry stack is added.
      
      While we're at it, improve the comments to explain what's actually
      going on.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.130778051@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6d9256f0
    • A
      x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0 · 9aaefe7b
      Andy Lutomirski 提交于
      On 64-bit kernels, we used to assume that TSS.sp0 was the current
      top of stack.  With the addition of an entry trampoline, this will
      no longer be the case.  Store the current top of stack in TSS.sp1,
      which is otherwise unused but shares the same cacheline.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150606.050864668@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9aaefe7b
    • A
      x86/entry: Remap the TSS into the CPU entry area · 72f5e08d
      Andy Lutomirski 提交于
      This has a secondary purpose: it puts the entry stack into a region
      with a well-controlled layout.  A subsequent patch will take
      advantage of this to streamline the SYSCALL entry code to be able to
      find it more easily.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bpetkov@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.962042855@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      72f5e08d
    • A
      x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct · 1a935bc3
      Andy Lutomirski 提交于
      SYSENTER_stack should have reliable overflow detection, which
      means that it needs to be at the bottom of a page, not the top.
      Move it to the beginning of struct tss_struct and page-align it.
      
      Also add an assertion to make sure that the fixed hardware TSS
      doesn't cross a page boundary.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.881827433@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1a935bc3
    • A
      x86/dumpstack: Handle stack overflow on all stacks · 6e60e583
      Andy Lutomirski 提交于
      We currently special-case stack overflow on the task stack.  We're
      going to start putting special stacks in the fixmap with a custom
      layout, so they'll have guard pages, too.  Teach the unwinder to be
      able to unwind an overflow of any of the stacks.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.802057305@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6e60e583
    • A
      x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss · 7fb983b4
      Andy Lutomirski 提交于
      A future patch will move SYSENTER_stack to the beginning of cpu_tss
      to help detect overflow.  Before this can happen, fix several code
      paths that hardcode assumptions about the old layout.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.722425540@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fb983b4
    • A
      x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area · ef8813ab
      Andy Lutomirski 提交于
      Currently, the GDT is an ad-hoc array of pages, one per CPU, in the
      fixmap.  Generalize it to be an array of a new 'struct cpu_entry_area'
      so that we can cleanly add new things to it.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.563271721@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef8813ab
    • A
      x86/dumpstack: Add get_stack_info() support for the SYSENTER stack · 33a2f1a6
      Andy Lutomirski 提交于
      get_stack_info() doesn't currently know about the SYSENTER stack, so
      unwinding will fail if we entered the kernel on the SYSENTER stack
      and haven't fully switched off.  Teach get_stack_info() about the
      SYSENTER stack.
      
      With future patches applied that run part of the entry code on the
      SYSENTER stack and introduce an intentional BUG(), I would get:
      
        PANIC: double fault, error_code: 0x0
        ...
        RIP: 0010:do_error_trap+0x33/0x1c0
        ...
        Call Trace:
        Code: ...
      
      With this patch, I get:
      
        PANIC: double fault, error_code: 0x0
        ...
        Call Trace:
         <SYSENTER>
         ? async_page_fault+0x36/0x60
         ? invalid_op+0x22/0x40
         ? async_page_fault+0x36/0x60
         ? sync_regs+0x3c/0x40
         ? sync_regs+0x2e/0x40
         ? error_entry+0x6c/0xd0
         ? async_page_fault+0x36/0x60
         </SYSENTER>
        Code: ...
      
      which is a lot more informative.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.392711508@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      33a2f1a6
    • A
      x86/entry/64: Allocate and enable the SYSENTER stack · 1a79797b
      Andy Lutomirski 提交于
      This will simplify future changes that want scratch variables early in
      the SYSENTER handler -- they'll be able to spill registers to the
      stack.  It also lets us get rid of a SWAPGS_UNSAFE_STACK user.
      
      This does not depend on CONFIG_IA32_EMULATION=y because we'll want the
      stack space even without IA32 emulation.
      
      As far as I can tell, the reason that this wasn't done from day 1 is
      that we use IST for #DB and #BP, which is IMO rather nasty and causes
      a lot more problems than it solves.  But, since #DB uses IST, we don't
      actually need a real stack for SYSENTER (because SYSENTER with TF set
      will invoke #DB on the IST stack rather than the SYSENTER stack).
      
      I want to remove IST usage from these vectors some day, and this patch
      is a prerequisite for that as well.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.312726423@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1a79797b
    • A
      x86/irq/64: Print the offending IP in the stack overflow warning · 4f3789e7
      Andy Lutomirski 提交于
      In case something goes wrong with unwind (not unlikely in case of
      overflow), print the offending IP where we detected the overflow.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.231677119@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4f3789e7
    • A
      x86/irq: Remove an old outdated comment about context tracking races · 6669a692
      Andy Lutomirski 提交于
      That race has been fixed and code cleaned up for a while now.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.150551639@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6669a692
    • J
      x86/unwinder: Handle stack overflows more gracefully · b02fcf9b
      Josh Poimboeuf 提交于
      There are at least two unwinder bugs hindering the debugging of
      stack-overflow crashes:
      
      - It doesn't deal gracefully with the case where the stack overflows and
        the stack pointer itself isn't on a valid stack but the
        to-be-dereferenced data *is*.
      
      - The ORC oops dump code doesn't know how to print partial pt_regs, for the
        case where if we get an interrupt/exception in *early* entry code
        before the full pt_regs have been saved.
      
      Fix both issues.
      
      http://lkml.kernel.org/r/20171126024031.uxi4numpbjm5rlbr@trebleSigned-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bpetkov@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150605.071425003@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b02fcf9b
    • A
      x86/unwinder/orc: Dont bail on stack overflow · d3a09104
      Andy Lutomirski 提交于
      If the stack overflows into a guard page and the ORC unwinder should work
      well: by construction, there can't be any meaningful data in the guard page
      because no writes to the guard page will have succeeded.
      
      But there is a bug that prevents unwinding from working correctly: if the
      starting register state has RSP pointing into a stack guard page, the ORC
      unwinder bails out immediately.
      
      Instead of bailing out immediately check whether the next page up is a
      valid check page and if so analyze that. As a result the ORC unwinder will
      start the unwind.
      
      Tested by intentionally overflowing the task stack.  The result is an
      accurate call trace instead of a trace consisting purely of '?' entries.
      
      There are a few other bugs that are triggered if the unwinder encounters a
      stack overflow after the first step, but they are outside the scope of this
      fix.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Link: https://lkml.kernel.org/r/20171204150604.991389777@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d3a09104
    • B
      x86/entry/64/paravirt: Use paravirt-safe macro to access eflags · e17f8234
      Boris Ostrovsky 提交于
      Commit 1d3e53e8 ("x86/entry/64: Refactor IRQ stacks and make them
      NMI-safe") added DEBUG_ENTRY_ASSERT_IRQS_OFF macro that acceses eflags
      using 'pushfq' instruction when testing for IF bit. On PV Xen guests
      looking at IF flag directly will always see it set, resulting in 'ud2'.
      
      Introduce SAVE_FLAGS() macro that will use appropriate save_fl pv op when
      running paravirt.
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: xen-devel@lists.xenproject.org
      Link: https://lkml.kernel.org/r/20171204150604.899457242@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e17f8234
    • W
      locking/barriers: Convert users of lockless_dereference() to READ_ONCE() · 3382290e
      Will Deacon 提交于
      [ Note, this is a Git cherry-pick of the following commit:
      
          506458ef ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it
      can be used instead of lockless_dereference() without any change in
      semantics.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3382290e
    • R
      x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD · f2dbad36
      Rudolf Marek 提交于
      [ Note, this is a Git cherry-pick of the following commit:
      
          2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD")
      
        ... for easier x86 PTI code testing and back-porting. ]
      
      The latest AMD AMD64 Architecture Programmer's Manual
      adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]).
      
      If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES
      / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers,
      thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs.
      Signed-Off-By: NRudolf Marek <r.marek@assembler.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f2dbad36
  4. 11 11月, 2017 1 次提交
  5. 10 11月, 2017 3 次提交
  6. 09 11月, 2017 1 次提交
    • Y
      x86/idt: Remove X86_TRAP_BP initialization in idt_setup_traps() · d0cd64b0
      Yonghong Song 提交于
      Commit b70543a0("x86/idt: Move regular trap init to tables") moves
      regular trap init for each trap vector into a table based
      initialization. It introduced the initialization for vector X86_TRAP_BP
      which was not in the code which it replaced. This breaks uprobe
      functionality for x86_32; the probed program segfaults instead of handling
      the probe proper.
      
      The reason for this is that TRAP_BP is set up as system interrupt gate
      (DPL3) in the early IDT and then replaced by a regular interrupt gate
      (DPL0) in idt_setup_traps(). The DPL0 restriction causes the int3 trap
      to fail with a #GP resulting in a SIGSEGV of the probed program.
      
      On 64bit this does not cause a problem because the IDT entry is replaced
      with a system interrupt gate (DPL3) with interrupt stack afterwards.
      
      Remove X86_TRAP_BP from the def_idts table which is used in
      idt_setup_traps(). Remove a redundant entry for X86_TRAP_NMI in def_idts
      while at it. Tested on both x86_64 and x86_32.
      
      [ tglx: Amended changelog with a description of the root cause ]
      
      Fixes: b70543a0("x86/idt: Move regular trap init to tables")
      Reported-and-tested-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: a.p.zijlstra@chello.nl
      Cc: ast@fb.com
      Cc: oleg@redhat.com
      Cc: luto@kernel.org
      Cc: kernel-team@fb.com
      Link: https://lkml.kernel.org/r/20171108192845.552709-1-yhs@fb.com
      d0cd64b0
  7. 08 11月, 2017 1 次提交
    • J
      x86/unwind: Disable KASAN checking in the ORC unwinder · 881125bf
      Josh Poimboeuf 提交于
      Fengguang reported a KASAN warning:
      
        Kprobe smoke test: started
        ==================================================================
        BUG: KASAN: stack-out-of-bounds in deref_stack_reg+0xb5/0x11a
        Read of size 8 at addr ffff8800001c7cd8 by task swapper/1
      
        CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.0-rc8 #26
        Call Trace:
         <#DB>
         ...
         save_trace+0xd9/0x1d3
         mark_lock+0x5f7/0xdc3
         __lock_acquire+0x6b4/0x38ef
         lock_acquire+0x1a1/0x2aa
         _raw_spin_lock_irqsave+0x46/0x55
         kretprobe_table_lock+0x1a/0x42
         pre_handler_kretprobe+0x3f5/0x521
         kprobe_int3_handler+0x19c/0x25f
         do_int3+0x61/0x142
         int3+0x30/0x60
        [...]
      
      The ORC unwinder got confused by some kprobes changes, which isn't
      surprising since the runtime code no longer matches vmlinux and the
      stack was modified for kretprobes.
      
      Until we have a way for generated code to register changes with the
      unwinder, these types of warnings are inevitable.  So just disable KASAN
      checks for stack accesses in the ORC unwinder.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171108021934.zbl6unh5hpugybc5@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      881125bf
  8. 07 11月, 2017 1 次提交
  9. 05 11月, 2017 1 次提交