1. 13 4月, 2016 1 次提交
  2. 10 3月, 2016 1 次提交
  3. 01 2月, 2016 2 次提交
  4. 29 1月, 2016 4 次提交
  5. 24 11月, 2015 2 次提交
    • A
      x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots · 478dc89c
      Andy Lutomirski 提交于
      On CONFIG_CONTEXT_TRACKING kernels that have context tracking
      disabled at runtime (which includes most distro kernels), we
      still have the overhead of a call to enter_from_user_mode in
      interrupt and exception entries.
      
      If jump labels are available, this uses the jump label
      infrastructure to skip the call.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/73ee804fff48cd8c66b65b724f9f728a11a8c686.1447361906.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      478dc89c
    • A
      x86/entry/64: Fix irqflag tracing wrt context tracking · f1075053
      Andy Lutomirski 提交于
      Paolo pointed out that enter_from_user_mode could be called
      while irqflags were traced as though IRQs were on.
      
      In principle, this could confuse lockdep.  It doesn't cause any
      problems that I've seen in any configuration, but if I build
      with CONFIG_DEBUG_LOCKDEP=y, enable a nohz_full CPU, and add
      code like:
      
      	if (irqs_disabled()) {
      		spin_lock(&something);
      		spin_unlock(&something);
      	}
      
      to the top of enter_from_user_mode, then lockdep will complain
      without this fix.  It seems that lockdep's irqflags sanity
      checks are too weak to detect this bug without forcing the
      issue.
      
      This patch adds one byte to normal kernels, and it's IMO a bit
      ugly. I haven't spotted a better way to do this yet, though.
      The issue is that we can't do TRACE_IRQS_OFF until after SWAPGS
      (if needed), but we're also supposed to do it before calling C
      code.
      
      An alternative approach would be to call trace_hardirqs_off in
      enter_from_user_mode.  That would be less code and would not
      bloat normal kernels at all, but it would be harder to see how
      the code worked.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/86237e362390dfa6fec12de4d75a238acb0ae787.1447361906.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f1075053
  6. 09 10月, 2015 2 次提交
  7. 07 10月, 2015 1 次提交
  8. 23 9月, 2015 2 次提交
  9. 17 7月, 2015 8 次提交
    • A
      x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code · a97439aa
      Andy Lutomirski 提交于
      It turns out to be rather tedious to test the NMI nesting code.
      Make it easier: add a new CONFIG_DEBUG_ENTRY option that causes
      the NMI handler to pre-emptively unmask NMIs.
      
      With this option set, errors in the repeat_nmi logic or failures
      to detect that we're in a nested NMI will result in quick panics
      under perf (especially if multiple counters are running at high
      frequency) instead of requiring an unusual workload that
      generates page faults or breakpoints inside NMIs.
      
      I called it CONFIG_DEBUG_ENTRY instead of CONFIG_DEBUG_NMI_ENTRY
      because I want to add new non-NMI checks elsewhere in the entry
      code in the future, and I'd rather not add too many new config
      options or add this option and then immediately rename it.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a97439aa
    • A
      x86/nmi/64: Make the "NMI executing" variable more consistent · 36f1a77b
      Andy Lutomirski 提交于
      Currently, "NMI executing" is one the first time an outermost
      NMI hits repeat_nmi and zero thereafter.  Change it to be zero
      each time for consistency.
      
      This is intended to help NMI handling fail harder if it's buggy.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      36f1a77b
    • A
      x86/nmi/64: Minor asm simplification · 23a781e9
      Andy Lutomirski 提交于
      Replace LEA; MOV with an equivalent SUB.  This saves one
      instruction.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      23a781e9
    • A
      x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · 810bc075
      Andy Lutomirski 提交于
      We have a tricky bug in the nested NMI code: if we see RSP
      pointing to the NMI stack on NMI entry from kernel mode, we
      assume that we are executing a nested NMI.
      
      This isn't quite true.  A malicious userspace program can point
      RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
      happen while RSP is still pointing at the NMI stack.
      
      Fix it with a sneaky trick.  Set DF in the region of code that
      the RSP check is intended to detect.  IRET will clear DF
      atomically.
      
      ( Note: other than paravirt, there's little need for all this
        complexity. We could check RIP instead of RSP. )
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      810bc075
    • A
      x86/nmi/64: Reorder nested NMI checks · a27507ca
      Andy Lutomirski 提交于
      Check the repeat_nmi .. end_repeat_nmi special case first.  The
      next patch will rework the RSP check and, as a side effect, the
      RSP check will no longer detect repeat_nmi .. end_repeat_nmi, so
      we'll need this ordering of the checks.
      
      Note: this is more subtle than it appears.  The check for
      repeat_nmi .. end_repeat_nmi jumps straight out of the NMI code
      instead of adjusting the "iret" frame to force a repeat.  This
      is necessary, because the code between repeat_nmi and
      end_repeat_nmi sets "NMI executing" and then writes to the
      "iret" frame itself.  If a nested NMI comes in and modifies the
      "iret" frame while repeat_nmi is also modifying it, we'll end up
      with garbage.  The old code got this right, as does the new
      code, but the new code is a bit more explicit.
      
      If we were to move the check right after the "NMI executing"
      check, then we'd get it wrong and have random crashes.
      
      ( Because the "NMI executing" check would jump to the code that would
        modify the "iret" frame without checking if the interrupted NMI was
        currently modifying it. )
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a27507ca
    • A
      x86/nmi/64: Improve nested NMI comments · 0b22930e
      Andy Lutomirski 提交于
      I found the nested NMI documentation to be difficult to follow.
      Improve the comments.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0b22930e
    • A
      x86/nmi/64: Switch stacks on userspace NMI entry · 9b6e6a83
      Andy Lutomirski 提交于
      Returning to userspace is tricky: IRET can fail, and ESPFIX can
      rearrange the stack prior to IRET.
      
      The NMI nesting fixup relies on a precise stack layout and
      atomic IRET.  Rather than trying to teach the NMI nesting fixup
      to handle ESPFIX and failed IRET, punt: run NMIs that came from
      user mode on the normal kernel stack.
      
      This will make some nested NMIs visible to C code, but the C
      code is okay with that.
      
      As a side effect, this should speed up perf: it eliminates an
      RDMSR when NMIs come from user mode.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9b6e6a83
    • A
      x86/nmi/64: Remove asm code that saves CR2 · 0e181bb5
      Andy Lutomirski 提交于
      Now that do_nmi saves CR2, we don't need to save it in asm.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0e181bb5
  10. 07 7月, 2015 7 次提交
  11. 10 6月, 2015 1 次提交
    • A
      x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code · 539f5113
      Andy Lutomirski 提交于
      The error_entry/error_exit code to handle gsbase and whether we
      return to user mdoe was a mess:
      
       - error_sti was misnamed.  In particular, it did not enable interrupts.
      
       - Error handling for gs_change was hopelessly tangled the normal
         usermode path.  Separate it out.  This saves a branch in normal
         entries from kernel mode.
      
       - The comments were bad.
      
      Fix it up.  As a nice side effect, there's now a code path that
      happens on error entries from user mode.  We'll use it soon.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/f1be898ab93360169fb845ab85185948832209ee.1433878454.git.luto@kernel.org
      [ Prettified it, clarified comments some more. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      539f5113
  12. 09 6月, 2015 1 次提交
    • I
      x86/asm/entry/64: Clean up entry_64.S · 4d732138
      Ingo Molnar 提交于
      Make the 64-bit syscall entry code a bit more readable:
      
       - use consistent assembly coding style similar to the other entry_*.S files
      
       - remove old comments that are not true anymore
      
       - eliminate whitespace noise
      
       - use consistent vertical spacing
      
       - fix various comments
      
       - reorganize entry point generation tables to be more readable
      
      No code changed:
      
        # arch/x86/entry/entry_64.o:
      
         text    data     bss     dec     hex filename
        12282       0       0   12282    2ffa entry_64.o.before
        12282       0       0   12282    2ffa entry_64.o.after
      
      md5:
         cbab1f2d727a2a8a87618eeb79f391b7  entry_64.o.before.asm
         cbab1f2d727a2a8a87618eeb79f391b7  entry_64.o.after.asm
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4d732138
  13. 08 6月, 2015 1 次提交
    • I
      x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32 · b2502b41
      Ingo Molnar 提交于
      The 'system_call' entry points differ starkly between native 32-bit and 64-bit
      kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on
      64-bit it's the SYSCALL entry point.
      
      This is pretty confusing when looking at generic code, and it also obscures
      the nature of the entry point at the assembly level.
      
      So unangle this by splitting the name into its two uses:
      
      	system_call (32) -> entry_INT80_32
      	system_call (64) -> entry_SYSCALL_64
      
      As per the generic naming scheme for x86 system call entry points:
      
      	entry_MNEMONIC_qualifier
      
      where 'qualifier' is one of _32, _64 or _compat.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b2502b41
  14. 07 6月, 2015 1 次提交
    • I
      x86/asm/entry/64/compat: Rename ia32entry.S -> entry_64_compat.S · 138bd56a
      Ingo Molnar 提交于
      So we now have the following system entry code related
      files, which define the following system call instruction
      and other entry paths:
      
         entry_32.S            # 32-bit binaries on 32-bit kernels
         entry_64.S            # 64-bit binaries on 64-bit kernels
         entry_64_compat.S	 # 32-bit binaries on 64-bit kernels
      
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      138bd56a
  15. 05 6月, 2015 1 次提交
  16. 04 6月, 2015 2 次提交
    • I
      x86/asm/entry: Move arch/x86/include/asm/calling.h to arch/x86/entry/ · d36f9479
      Ingo Molnar 提交于
      asm/calling.h is private to the entry code, make this more apparent
      by moving it to the new arch/x86/entry/ directory.
      
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d36f9479
    • I
      x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/ · 905a36a2
      Ingo Molnar 提交于
      Create a new directory hierarchy for the low level x86 entry code:
      
          arch/x86/entry/*
      
      This will host all the low level glue that is currently scattered
      all across arch/x86/.
      
      Start with entry_64.S and entry_32.S.
      
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      905a36a2
  17. 02 6月, 2015 2 次提交
    • J
      x86/asm/entry/64: Fold identical code paths · 2f63b9db
      Jan Beulich 提交于
      retint_kernel doesn't require %rcx to be pointing to thread info
      (anymore?), and the code on the two alternative paths is - not
      really surprisingly - identical.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/556C664F020000780007FB64@mail.emea.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2f63b9db
    • I
      x86/debug: Remove perpetually broken, unmaintainable dwarf annotations · 131484c8
      Ingo Molnar 提交于
      So the dwarf2 annotations in low level assembly code have
      become an increasing hindrance: unreadable, messy macros
      mixed into some of the most security sensitive code paths
      of the Linux kernel.
      
      These debug info annotations don't even buy the upstream
      kernel anything: dwarf driven stack unwinding has caused
      problems in the past so it's out of tree, and the upstream
      kernel only uses the much more robust framepointers based
      stack unwinding method.
      
      In addition to that there's a steady, slow bitrot going
      on with these annotations, requiring frequent fixups.
      There's no tooling and no functionality upstream that
      keeps it correct.
      
      So burn down the sick forest, allowing new, healthier growth:
      
         27 files changed, 350 insertions(+), 1101 deletions(-)
      
      Someone who has the willingness and time to do this
      properly can attempt to reintroduce dwarf debuginfo in x86
      assembly code plus dwarf unwinding from first principles,
      with the following conditions:
      
       - it should be maximally readable, and maximally low-key to
         'ordinary' code reading and maintenance.
      
       - find a build time method to insert dwarf annotations
         automatically in the most common cases, for pop/push
         instructions that manipulate the stack pointer. This could
         be done for example via a preprocessing step that just
         looks for common patterns - plus special annotations for
         the few cases where we want to depart from the default.
         We have hundreds of CFI annotations, so automating most of
         that makes sense.
      
       - it should come with build tooling checks that ensure that
         CFI annotations are sensible. We've seen such efforts from
         the framepointer side, and there's no reason it couldn't be
         done on the dwarf side.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      131484c8
  18. 19 5月, 2015 1 次提交