1. 24 May, 2017 1 commit
    • Revert "x86/entry: Fix the end of the stack for newly forked tasks" · ebd57499
      Josh Poimboeuf committed
      Petr Mladek reported the following warning when loading the livepatch
      sample module:
      
        WARNING: CPU: 1 PID: 3699 at arch/x86/kernel/stacktrace.c:132 save_stack_trace_tsk_reliable+0x133/0x1a0
        ...
        Call Trace:
         __schedule+0x273/0x820
         schedule+0x36/0x80
         kthreadd+0x305/0x310
         ? kthread_create_on_cpu+0x80/0x80
         ? icmp_echo.part.32+0x50/0x50
         ret_from_fork+0x2c/0x40
      
      That warning means the end of the stack is no longer recognized as such
      for newly forked tasks.  The problem was introduced with the following
      commit:
      
        ff3f7e24 ("x86/entry: Fix the end of the stack for newly forked tasks")
      
      ... which was completely misguided.  It only partially fixed the
      reported issue, and it introduced another bug in the process.  None of
      the other entry code saves the frame pointer before calling into C code,
      so it doesn't make sense for ret_from_fork to do so either.
      
      Contrary to what I originally thought, the original issue wasn't related
      to newly forked tasks.  It was actually related to ftrace.  When entry
      code calls into a function which then calls into an ftrace handler, the
      stack frame looks different than normal.
      
      The original issue will be fixed in the unwinder, in a subsequent patch.
      Reported-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: live-patching@vger.kernel.org
      Fixes: ff3f7e24 ("x86/entry: Fix the end of the stack for newly forked tasks")
      Link: http://lkml.kernel.org/r/f350760f7e82f0750c8d1dd093456eb212751caa.1495553739.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 04 April, 2017 1 commit
  3. 01 March, 2017 1 commit
    • x86/entry/64: Relax pvops stub clobber specifications · 2140a994
      Jan Beulich committed
      Except for the error_exit case, none of the code paths following the
      {DIS,EN}ABLE_INTERRUPTS() invocations being modified here make any
      assumptions on register values, so all registers can be clobbered
      there. In the error_exit case a minor adjustment to register usage
      (at once eliminating an instruction) also allows for this to be true.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5894556D02000078001366D3@prv-mh.provo.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 12 January, 2017 1 commit
    • x86/entry: Fix the end of the stack for newly forked tasks · ff3f7e24
      Josh Poimboeuf committed
      When unwinding a task, the end of the stack is always at the same offset
      right below the saved pt_regs, regardless of which syscall was used to
      enter the kernel.  That convention allows the unwinder to verify that a
      stack is sane.
      
      However, newly forked tasks don't always follow that convention, as
      reported by the following unwinder warning seen by Dave Jones:
      
        WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value           (null)
      
      The warning was due to the following call chain:
      
        (ftrace handler)
        call_usermodehelper_exec_async+0x5/0x140
        ret_from_fork+0x22/0x30
      
      The problem is that ret_from_fork() doesn't create a stack frame before
      calling other functions.  Fix that by carefully using the frame pointer
      macros.
      
      In addition to conforming to the end of stack convention, this also
      makes related stack traces more sensible by making it clear to the user
      that ret_from_fork() was involved.
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 25 October, 2016 1 commit
  6. 21 October, 2016 1 commit
    • x86/entry/unwind: Create stack frames for saved interrupt registers · 946c1911
      Josh Poimboeuf committed
      With frame pointers, when a task is interrupted, its stack is no longer
      completely reliable because the function could have been interrupted
      before it had a chance to save the previous frame pointer on the stack.
      So the caller of the interrupted function could get skipped by a stack
      trace.
      
      This is problematic for live patching, which needs to know whether a
      stack trace of a sleeping task can be relied upon.  There's currently no
      way to detect if a sleeping task was interrupted by a page fault
      exception or preemption before it went to sleep.
      
      Another issue is that when dumping the stack of an interrupted task, the
      unwinder has no way of knowing where the saved pt_regs registers are, so
      it can't print them.
      
      This solves those issues by encoding the pt_regs pointer in the frame
      pointer on entry from an interrupt or an exception.
      
      This patch also updates the unwinder to be able to decode it, because
      otherwise the unwinder would be broken by this change.
      
      Note that this causes a change in the behavior of the unwinder: each
      instance of a pt_regs on the stack is now considered a "frame".  So
      callers of unwind_get_return_address() will now get an occasional
      'regs->ip' address that would have previously been skipped over.
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8b9f84a21e39d249049e0547b559ff8da0df0988.1476973742.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
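The idea of encoding the pt_regs pointer in the frame pointer can be illustrated with a small user-space sketch. It assumes the encoding is a low-bit tag on an aligned pointer, which the commit message itself does not spell out; `struct fake_pt_regs`, `FP_REGS_TAG`, `encode_regs_fp()` and `decode_regs_fp()` are hypothetical names for illustration only, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_pt_regs {
	unsigned long ip;
	unsigned long sp;
};

/* Assumption: the low bit marks "this value is an encoded regs pointer". */
#define FP_REGS_TAG ((uintptr_t)1)

/* Entry side: turn a regs pointer into a tagged frame pointer value. */
static uintptr_t encode_regs_fp(const struct fake_pt_regs *regs)
{
	uintptr_t p = (uintptr_t)regs;

	/* An aligned pointer has a clear low bit, so the tag is free to use. */
	assert((p & FP_REGS_TAG) == 0);
	return p | FP_REGS_TAG;
}

/*
 * Unwinder side: return the regs pointer if the frame pointer value is
 * tagged, or NULL if it is an ordinary frame pointer.
 */
static struct fake_pt_regs *decode_regs_fp(uintptr_t fp)
{
	if (!(fp & FP_REGS_TAG))
		return NULL;
	return (struct fake_pt_regs *)(fp & ~FP_REGS_TAG);
}
```

An unwinder built on this scheme would call `decode_regs_fp()` on each frame pointer value it reads: a non-NULL result means the "frame" is a saved register set, whose `ip` can be reported, as the commit message describes for `unwind_get_return_address()`.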
  7. 30 September, 2016 1 commit
    • x86/entry/64: Fix context tracking state warning when load_gs_index fails · 2fa5f04f
      Wanpeng Li committed
      This warning:
      
       WARNING: CPU: 0 PID: 3331 at arch/x86/entry/common.c:45 enter_from_user_mode+0x32/0x50
       CPU: 0 PID: 3331 Comm: ldt_gdt_64 Not tainted 4.8.0-rc7+ #13
       Call Trace:
        dump_stack+0x99/0xd0
        __warn+0xd1/0xf0
        warn_slowpath_null+0x1d/0x20
        enter_from_user_mode+0x32/0x50
        error_entry+0x6d/0xc0
        ? general_protection+0x12/0x30
        ? native_load_gs_index+0xd/0x20
        ? do_set_thread_area+0x19c/0x1f0
        SyS_set_thread_area+0x24/0x30
        do_int80_syscall_32+0x7c/0x220
        entry_INT80_compat+0x38/0x50
      
      ... can be reproduced by running the GS testcase of the ldt_gdt test unit in
      the x86 selftests.
      
      do_int80_syscall_32() calls enter_from_user_mode() to switch the context
      tracking state from user state to kernel state. The load_gs_index() call
      can fail with a user gsbase; if that happens, gsbase is fixed up and
      execution proceeds.
      
      However, enter_from_user_mode() is then called again in the fixup path,
      even though context tracking is already in kernel state.
      
      This patch fixes it by only fixing up gsbase and telling lockdep that IRQs
      are off once load_gs_index() has failed with a user gsbase.
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1475197266-3440-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 29 September, 2016 1 commit
  9. 16 September, 2016 1 commit
  10. 15 September, 2016 1 commit
  11. 14 September, 2016 1 commit
  12. 24 August, 2016 2 commits
  13. 10 August, 2016 2 commits
  14. 08 August, 2016 1 commit
  15. 01 August, 2016 1 commit
  16. 15 July, 2016 1 commit
  17. 05 May, 2016 1 commit
  18. 29 April, 2016 1 commit
  19. 13 April, 2016 2 commits
  20. 10 March, 2016 1 commit
  21. 01 February, 2016 2 commits
  22. 29 January, 2016 4 commits
  23. 24 November, 2015 2 commits
    • x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots · 478dc89c
      Andy Lutomirski committed
      On CONFIG_CONTEXT_TRACKING kernels that have context tracking
      disabled at runtime (which includes most distro kernels), we
      still have the overhead of a call to enter_from_user_mode in
      interrupt and exception entries.
      
      If jump labels are available, this uses the jump label
      infrastructure to skip the call.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/73ee804fff48cd8c66b65b724f9f728a11a8c686.1447361906.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry/64: Fix irqflag tracing wrt context tracking · f1075053
      Andy Lutomirski committed
      Paolo pointed out that enter_from_user_mode could be called
      while irqflags were traced as though IRQs were on.
      
      In principle, this could confuse lockdep.  It doesn't cause any
      problems that I've seen in any configuration, but if I build
      with CONFIG_DEBUG_LOCKDEP=y, enable a nohz_full CPU, and add
      code like:
      
      	if (irqs_disabled()) {
      		spin_lock(&something);
      		spin_unlock(&something);
      	}
      
      to the top of enter_from_user_mode, then lockdep will complain
      without this fix.  It seems that lockdep's irqflags sanity
      checks are too weak to detect this bug without forcing the
      issue.
      
      This patch adds one byte to normal kernels, and it's IMO a bit
      ugly. I haven't spotted a better way to do this yet, though.
      The issue is that we can't do TRACE_IRQS_OFF until after SWAPGS
      (if needed), but we're also supposed to do it before calling C
      code.
      
      An alternative approach would be to call trace_hardirqs_off in
      enter_from_user_mode.  That would be less code and would not
      bloat normal kernels at all, but it would be harder to see how
      the code worked.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/86237e362390dfa6fec12de4d75a238acb0ae787.1447361906.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  24. 09 October, 2015 2 commits
  25. 07 October, 2015 1 commit
  26. 23 September, 2015 2 commits
  27. 17 July, 2015 4 commits
    • x86/entry/64, x86/nmi/64: Add CONFIG_DEBUG_ENTRY NMI testing code · a97439aa
      Andy Lutomirski committed
      It turns out to be rather tedious to test the NMI nesting code.
      Make it easier: add a new CONFIG_DEBUG_ENTRY option that causes
      the NMI handler to pre-emptively unmask NMIs.
      
      With this option set, errors in the repeat_nmi logic or failures
      to detect that we're in a nested NMI will result in quick panics
      under perf (especially if multiple counters are running at high
      frequency) instead of requiring an unusual workload that
      generates page faults or breakpoints inside NMIs.
      
      I called it CONFIG_DEBUG_ENTRY instead of CONFIG_DEBUG_NMI_ENTRY
      because I want to add new non-NMI checks elsewhere in the entry
      code in the future, and I'd rather not add too many new config
      options or add this option and then immediately rename it.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/nmi/64: Make the "NMI executing" variable more consistent · 36f1a77b
      Andy Lutomirski committed
      Currently, "NMI executing" is one the first time an outermost
      NMI hits repeat_nmi and zero thereafter.  Change it to be zero
      each time for consistency.
      
      This is intended to help NMI handling fail harder if it's buggy.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/nmi/64: Minor asm simplification · 23a781e9
      Andy Lutomirski committed
      Replace LEA; MOV with an equivalent SUB.  This saves one
      instruction.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · 810bc075
      Andy Lutomirski committed
      We have a tricky bug in the nested NMI code: if we see RSP
      pointing to the NMI stack on NMI entry from kernel mode, we
      assume that we are executing a nested NMI.
      
      This isn't quite true.  A malicious userspace program can point
      RSP at the NMI stack, issue SYSCALL, and arrange for an NMI to
      happen while RSP is still pointing at the NMI stack.
      
      Fix it with a sneaky trick.  Set DF in the region of code that
      the RSP check is intended to detect.  IRET will clear DF
      atomically.
      
      ( Note: other than paravirt, there's little need for all this
        complexity. We could check RIP instead of RSP. )
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
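The combined RSP and DF check described in the last commit can be modelled in plain C. This is only a sketch of the decision logic, not the kernel's assembly: the struct names, the stack bounds representation and `looks_like_nested_nmi()` are hypothetical, and the real entry code performs further checks that are not modelled here. The only hardware fact relied on is that DF is bit 10 of RFLAGS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direction flag: bit 10 of RFLAGS on x86. */
#define X86_EFLAGS_DF (UINT64_C(1) << 10)

/* Hypothetical snapshot of the interrupted context at NMI entry. */
struct nmi_entry_snapshot {
	uint64_t rsp;
	uint64_t rflags;
};

/* Hypothetical NMI stack bounds; valid RSP values lie in (bottom, top]. */
struct nmi_stack_bounds {
	uint64_t bottom;
	uint64_t top;
};

static bool rsp_on_nmi_stack(uint64_t rsp, const struct nmi_stack_bounds *b)
{
	assert(b->bottom < b->top);
	return rsp > b->bottom && rsp <= b->top;
}

/*
 * RSP alone is not trustworthy: user space can point RSP at the NMI
 * stack and issue SYSCALL. Requiring DF as well, which the protected
 * code region sets on purpose, rules out that spoofed case.
 */
static bool looks_like_nested_nmi(const struct nmi_entry_snapshot *s,
				  const struct nmi_stack_bounds *b)
{
	return rsp_on_nmi_stack(s->rsp, b) && (s->rflags & X86_EFLAGS_DF) != 0;
}
```

With this model, a context whose RSP lies on the NMI stack but whose DF is clear is treated as a first-level NMI rather than a nested one, which is the case the commit is meant to handle.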