1. 14 1月, 2017 1 次提交
  2. 12 1月, 2017 4 次提交
    • J
      x86/entry: Fix the end of the stack for newly forked tasks · ff3f7e24
      Josh Poimboeuf 提交于
      When unwinding a task, the end of the stack is always at the same offset
      right below the saved pt_regs, regardless of which syscall was used to
      enter the kernel.  That convention allows the unwinder to verify that a
      stack is sane.
      
      However, newly forked tasks don't always follow that convention, as
      reported by the following unwinder warning seen by Dave Jones:
      
        WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value           (null)
      
      The warning was due to the following call chain:
      
        (ftrace handler)
        call_usermodehelper_exec_async+0x5/0x140
        ret_from_fork+0x22/0x30
      
      The problem is that ret_from_fork() doesn't create a stack frame before
      calling other functions.  Fix that by carefully using the frame pointer
      macros.
      
      In addition to conforming to the end of stack convention, this also
      makes related stack traces more sensible by making it clear to the user
      that ret_from_fork() was involved.
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff3f7e24
    • J
      x86/unwind: Include __schedule() in stack traces · 2c96b2fe
      Josh Poimboeuf 提交于
      In the following commit:
      
        0100301b ("sched/x86: Rewrite the switch_to() code")
      
      ... the layout of the 'inactive_task_frame' struct was designed to have
      a frame pointer header embedded in it, so that the unwinder could use
      the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or
      ret_from_fork() for newly forked tasks which haven't actually run yet).
      
      Finish the job by changing get_frame_pointer() to return a pointer to
      inactive_task_frame's 'bp' field rather than 'bp' itself.  This allows
      the unwinder to start one frame higher on the stack, so that it properly
      reports __schedule().
      Reported-by: NMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2c96b2fe
    • J
      x86/unwind: Disable KASAN checks for non-current tasks · 84936118
      Josh Poimboeuf 提交于
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      In such cases, it's possible that the unwinder may read a KASAN-poisoned
      region of the stack.  Account for that by using READ_ONCE_NOCHECK() when
      reading the stack of another task.
      
      Use READ_ONCE() when reading the stack of the current task, since KASAN
      warnings can still be useful for finding bugs in that case.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      84936118
    • J
      x86/unwind: Silence warnings for non-current tasks · 900742d8
      Josh Poimboeuf 提交于
      There are a handful of callers to save_stack_trace_tsk() and
      show_stack() which try to unwind the stack of a task other than current.
      In such cases, it's remotely possible that the task is running on one
      CPU while the unwinder is reading its stack from another CPU, causing
      the unwinder to see stack corruption.
      
      These cases seem to be mostly harmless.  The unwinder has checks which
      prevent it from following bad pointers beyond the bounds of the stack.
      So it's not really a bug as long as the caller understands that
      unwinding another task will not always succeed.
      
      Since stack "corruption" on another task's stack isn't necessarily a
      bug, silence the warnings when unwinding tasks other than current.
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      900742d8
  3. 10 1月, 2017 5 次提交
  4. 09 1月, 2017 1 次提交
  5. 06 1月, 2017 1 次提交
  6. 05 1月, 2017 3 次提交
  7. 03 1月, 2017 1 次提交
  8. 02 1月, 2017 1 次提交
  9. 30 12月, 2016 3 次提交
    • H
      parisc: Drop TIF_RESTORE_SIGMASK and switch to generic code · 1fe0a7e0
      Helge Deller 提交于
      Commit 7e781418 ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code")
      introduced code with which the "restore sigmask" flag lives in task_struct
      instead of ti->flags. Let's use this optimization on parisc too.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      1fe0a7e0
    • H
      parisc: Mark cr16 clocksource unstable on SMP systems · 41744213
      Helge Deller 提交于
      The cr16 interval timer of each CPU is not syncronized to other cr16
      timers in other CPUs in a SMP system. So, delay the registration of the
      cr16 clocksource until all CPUs have been detected and then - if we are
      on a SMP machine - mark the cr16 clocksource as unstable and lower it's
      rating before registering it at the clocksource framework.
      
      This patch fixes the stalled CPU warnings which we have seen since
      introduction of the cr16 clocksource.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: <stable@vger.kernel.org> # v4.8+
      41744213
    • L
      mm: optimize PageWaiters bit use for unlock_page() · b91e1302
      Linus Torvalds 提交于
      In commit 62906027 ("mm: add PageWaiters indicating tasks are
      waiting for a page bit") Nick Piggin made our page locking no longer
      unconditionally touch the hashed page waitqueue, which not only helps
      performance in general, but is particularly helpful on NUMA machines
      where the hashed wait queues can bounce around a lot.
      
      However, the "clear lock bit atomically and then test the waiters bit"
      sequence turns out to be much more expensive than it needs to be,
      because you get a nasty stall when trying to access the same word that
      just got updated atomically.
      
      On architectures where locking is done with LL/SC, this would be trivial
      to fix with a new primitive that clears one bit and tests another
      atomically, but that ends up not working on x86, where the only atomic
      operations that return the result end up being cmpxchg and xadd.  The
      atomic bit operations return the old value of the same bit we changed,
      not the value of an unrelated bit.
      
      On x86, we could put the lock bit in the high bit of the byte, and use
      "xadd" with that bit (where the overflow ends up not touching other
      bits), and look at the other bits of the result.  However, an even
      simpler model is to just use a regular atomic "and" to clear the lock
      bit, and then the sign bit in eflags will indicate the resulting state
      of the unrelated bit #7.
      
      So by moving the PageWaiters bit up to bit #7, we can atomically clear
      the lock bit and test the waiters bit on x86 too.  And architectures
      with LL/SC (which is all the usual RISC suspects), the particular bit
      doesn't matter, so they are fine with this approach too.
      
      This avoids the extra access to the same atomic word, and thus avoids
      the costly stall at page unlock time.
      
      The only downside is that the interface ends up being a bit odd and
      specialized: clear a bit in a byte, and test the sign bit.  Nick doesn't
      love the resulting name of the new primitive, but I'd rather make the
      name be descriptive and very clear about the limitation imposed by
      trying to work across all relevant architectures than make it be some
      generic thing that doesn't make the odd semantics explicit.
      
      So this introduces the new architecture primitive
      
          clear_bit_unlock_is_negative_byte();
      
      and adds the trivial implementation for x86.  We have a generic
      non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
      combination) which can be overridden by any architecture that can do
      better.  According to Nick, Power has the same hickup x86 has, for
      example, but some other architectures may not even care.
      
      All these optimizations mean that my page locking stress-test (which is
      just executing a lot of small short-lived shell scripts: "make test" in
      the git source tree) no longer makes our page locking look horribly bad.
      Before all these optimizations, just the unlock_page() costs were just
      over 3% of all CPU overhead on "make test".  After this, it's down to
      0.66%, so just a quarter of the cost it used to be.
      
      (The difference on NUMA is bigger, but there this micro-optimization is
      likely less noticeable, since the big issue on NUMA was not the accesses
      to 'struct page', but the waitqueue accesses that were already removed
      by Nick's earlier commit).
      Acked-by: NNick Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b91e1302
  10. 27 12月, 2016 2 次提交
  11. 26 12月, 2016 2 次提交
    • L
      powerpc: Fix build warning on 32-bit PPC · 8ae679c4
      Larry Finger 提交于
      I am getting the following warning when I build kernel 4.9-git on my
      PowerBook G4 with a 32-bit PPC processor:
      
          AS      arch/powerpc/kernel/misc_32.o
        arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
      
      This problem is evident after commit 989cea5c ("kbuild: prevent
      lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
      error that has been in the code since 2005 when this source file was
      created.  That was with commit 9994a338 ("powerpc: Introduce
      entry_{32,64}.S, misc_{32,64}.S, systbl.S").
      
      The offending line does not make a lot of sense.  This error does not
      seem to cause any errors in the executable, thus I am not recommending
      that it be applied to any stable versions.
      
      Thanks to Nicholas Piggin for suggesting this solution.
      
      Fixes: 9994a338 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
      Signed-off-by: NLarry Finger <Larry.Finger@lwfinger.net>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ae679c4
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  12. 25 12月, 2016 6 次提交
  13. 24 12月, 2016 1 次提交
    • J
      Revert "x86/unwind: Detect bad stack return address" · c280f773
      Josh Poimboeuf 提交于
      Revert the following commit:
      
        b6959a36 ("x86/unwind: Detect bad stack return address")
      
      ... because Andrey Konovalov reported an unwinder warning:
      
        WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467
      
      The unwind was initiated from an interrupt which occurred while running in the
      generated code for a kprobe.  The unwinder printed the warning because it
      expected regs->ip to point to a valid text address, but instead it pointed to
      the generated code.
      
      Eventually we may want come up with a way to identify generated kprobe
      code so the unwinder can know that it's a valid return address.  Until
      then, just remove the warning.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c280f773
  14. 23 12月, 2016 3 次提交
    • P
      perf/x86: Fix overlap counter scheduling bug · 1134c2b5
      Peter Zijlstra 提交于
      Jiri reported the overlap scheduling exceeding its max stack.
      
      Looking at the constraint that triggered this, it turns out the
      overlap marker isn't needed.
      
      The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
      the counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight".
      
      Esp. that latter part is of interest here I think, our overlapping mask
      is 0x0e, that has 3 bits set and is the highest weight mask in on the
      PMU, therefore it will be placed last. Can we still create a scenario
      where we would need to rewind that?
      
      The scenario for AMD Fam15h is we're having masks like:
      
      	0x3F -- 111111
      	0x38 -- 111000
      	0x07 -- 000111
      
      	0x09 -- 001001
      
      And we mark 0x09 as overlapping, because it is not a direct subset of
      0x38 or 0x07 and has less weight than either of those. This means we'll
      first try and place the 0x09 event, then try and place 0x38/0x07 events.
      Now imagine we have:
      
      	3 * 0x07 + 0x09
      
      and the initial pick for the 0x09 event is counter 0, then we'll fail to
      place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
      event, and then re-try all 0x07 events, which will now work.
      
      The masks on the PMU in question are:
      
        0x01 - 0001
        0x03 - 0011
        0x0e - 1110
        0x0c - 1100
      
      But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
      0x1) are of heavier weight, it should all work out.
      Reported-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Liang Kan <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1134c2b5
    • S
      perf/x86/pebs: Fix handling of PEBS buffer overflows · daa864b8
      Stephane Eranian 提交于
      This patch solves a race condition between PEBS and the PMU handler.
      
      In case multiple PEBS events are sampled at the same time,
      it is possible to have GLOBAL_STATUS bit 62 set indicating
      PEBS buffer overflow and also seeing at most 3 PEBS counters
      having their bits set in the status register. This is a sign
      that there was at least one PEBS record pending at the time
      of the PMU interrupt. PEBS counters must only be processed
      via the drain_pebs() calls, and not via the regular sample
      processing loop coming after that the function, otherwise
      phony regular samples may be generated in the sampling buffer
      not marked with the EXACT tag.
      
      Another possibility is to have one PEBS event and at least
      one non-PEBS event whic hoverflows while PEBS has armed. In this
      case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
      status bit for the PEBS counter will be on Skylake.
      
      To avoid this problem, we systematically ignore the PEBS-enabled
      counters from the GLOBAL_STATUS mask and we always process PEBS
      events via drain_pebs().
      
      The problem manifested itself by having non-exact samples when
      sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
      not have the EXACT flag set.
      
      Note that this problem is only present on Skylake processor.
      This fix is harmless on older processors.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      daa864b8
    • P
      x86/paravirt: Mark unused patch_default label · cef4402d
      Peter Zijlstra 提交于
      A bugfix commit:
      
        45dbea5f ("x86/paravirt: Fix native_patch()")
      
      ... introduced a harmless warning:
      
        arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
        arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]
      
      Fix it by annotating the label as __maybe_unused.
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Reported-by: NPiotr Gregor <piotrgregor@rsyncme.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 45dbea5f ("x86/paravirt: Fix native_patch()")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      cef4402d
  15. 21 12月, 2016 6 次提交