1. 25 May, 2010 1 commit
    • P
      sh: handle early calls to return_address() when using dwarf unwinder. · 8a37f520
      Paul Mundt committed
      The dwarf unwinder ties in to an early initcall, but it's possible that
      return_address() calls will be made prior to that. This implements some
      additional error handling in to the dwarf unwinder as well as an exit
      path in the return_address() case to bail out if the unwinder hasn't come
      up yet.
      
      This fixes a NULL pointer deref in early boot when mempool_alloc() blows
      up on the not-yet-ready mempool via dwarf_unwind_stack().
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
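The exit path described in this commit can be pictured with a small user-space sketch. It is only an illustration under stated assumptions: `unwinder_ready`, `unwinder_mark_ready()`, `walk_stack()` and `return_address_sketch()` are hypothetical names, and the real kernel code is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, set once the unwinder's pools and tables exist. */
static bool unwinder_ready;

/* Hypothetical init hook, standing in for the early initcall. */
void unwinder_mark_ready(void)
{
	unwinder_ready = true;
}

/* Stand-in for the real stack walk: returns a dummy non-NULL address. */
static void *walk_stack(int depth)
{
	static char fake_text[8];

	assert(depth >= 0 && depth < 8);
	return &fake_text[depth];
}

/* Bail out with NULL instead of touching state that is not set up yet. */
void *return_address_sketch(int depth)
{
	if (!unwinder_ready)
		return NULL;
	return walk_stack(depth);
}
```

A caller that runs before initialisation simply sees NULL and can carry on without a backtrace.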
  2. 22 May, 2010 1 commit
  3. 20 April, 2010 1 commit
  4. 30 March, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  5. 23 March, 2010 1 commit
  6. 08 February, 2010 2 commits
    • M
      sh: Optimise FDE/CIE lookup by using red-black trees · 858918b7
      Matt Fleming committed
      Now that the DWARF unwinder is being used to provide perf callstacks,
      unwinding speed is an issue. It is no longer used only in exceptional
      circumstances where we don't care about runtime performance, e.g. when
      panicking, so it makes sense to improve performance if possible.
      
      With this patch I saw a 42% improvement in unwind time when calling
      return_address(1). Greater improvements will be seen as the number of
      levels unwound increases as each unwind is now cheaper.
      
      Note that insertion time has doubled but that's just the price we pay
      for keeping the trees balanced. However, this is a one-time cost for
      kernel boot/module load and so the improvements in lookup time dominate
      the extra time we spend keeping the trees balanced.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
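As a rough picture of why an ordered tree helps here, the sketch below looks up the entry covering a given PC in a plain binary search tree keyed by start address. It is an assumption-laden stand-in: the commit uses the kernel's red-black trees, whereas this version does no rebalancing at all, and `struct fde_node`, `fde_insert()` and `fde_lookup()` are hypothetical names. Ranges are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one entry covering the half-open range [start, end). */
struct fde_node {
	unsigned long start;
	unsigned long end;
	struct fde_node *left, *right;
};

/* Insert by start address. No rebalancing is done in this sketch. */
void fde_insert(struct fde_node **root, struct fde_node *node)
{
	assert(node->start < node->end);
	node->left = node->right = NULL;
	while (*root)
		root = node->start < (*root)->start ? &(*root)->left
						    : &(*root)->right;
	*root = node;
}

/* Descend from the root, going left or right until a range covers pc. */
const struct fde_node *fde_lookup(const struct fde_node *root, unsigned long pc)
{
	while (root) {
		if (pc < root->start)
			root = root->left;
		else if (pc >= root->end)
			root = root->right;
		else
			return root;
	}
	return NULL;
}
```

Each lookup visits one node per tree level, so a balanced tree keeps the cost logarithmic in the number of entries; that balancing is what the extra insertion time mentioned above pays for.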
    • M
      sh: Don't continue unwinding across interrupts · 944a3438
      Matt Fleming committed
      Unfortunately, due to poor DWARF info in current toolchains, unwinding
      through interrupts cannot be done reliably. The problem is that the
      DWARF info for function epilogues is wrong.
      
      Take this standard epilogue sequence,
      
      80003cc4:       e3 6f           mov     r14,r15
      80003cc6:       26 4f           lds.l   @r15+,pr
      80003cc8:       f6 6e           mov.l   @r15+,r14
      						<---- interrupt here
      80003cca:       f6 6b           mov.l   @r15+,r11
      80003ccc:       f6 6a           mov.l   @r15+,r10
      80003cce:       f6 69           mov.l   @r15+,r9
      80003cd0:       0b 00           rts
      
      If we take an interrupt at the highlighted point, the DWARF info will
      bogusly claim that the return address can be found at some offset from
      the frame pointer, even though the frame pointer was just restored. The
      worst part is if the unwinder finds a text address at the bogus stack
      address - unwinding will continue, for a bit, until it finally comes
      across an unexpected address on the stack and blows up.
      
      The only solution is to stop unwinding once we've calculated the
      function that was executing when the interrupt occurred. This PC can be
      easily calculated from pt_regs->pc.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  7. 02 February, 2010 1 commit
  8. 06 November, 2009 1 commit
  9. 26 October, 2009 1 commit
    • M
      sh: Check for return_to_handler when unwinding the stack · 60339fad
      Matt Fleming committed
      When CONFIG_FUNCTION_GRAPH_TRACER is enabled the function graph tracer
      may patch return addresses on the stack with the address of
      return_to_handler(). This really confuses the DWARF unwinder because it
      will try to find the caller of return_to_handler(), not the caller of the
      real return address.
      
      So teach the DWARF unwinder how to find the real return address whenever
      it encounters return_to_handler().
      
      This patch does not cope very well when multiple return addresses on the
      stack have been patched. To make it work properly it would require state
      to track how many return_to_handler()'s have been seen so that we'd know
      where to look in current->curr_ret_stack[]. So for now, instead of
      trying to handle this, just moan if more than one return address on the
      stack has been patched.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
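The substitution described in this commit might look roughly like the sketch below. Everything in it is hypothetical: `RETURN_TO_HANDLER_ADDR` is a made-up constant standing in for the address of return_to_handler(), and `struct ret_stack_sketch` is a simplified stand-in for the tracer's per-task return stack, not its real layout.

```c
#include <assert.h>

/* Hypothetical address of the tracer trampoline. */
#define RETURN_TO_HANDLER_ADDR 0x80001000UL

/* Simplified stand-in for the saved original return addresses. */
struct ret_stack_sketch {
	const unsigned long *saved;	/* oldest entry first */
	int top;			/* index of newest entry, -1 if empty */
};

/*
 * Return the address the unwinder should continue from. A patched slot is
 * replaced by the newest saved address; *patched_seen counts how many
 * patched slots were met so the caller can warn when it exceeds one.
 */
unsigned long real_return_address(unsigned long ra,
				  const struct ret_stack_sketch *rs,
				  int *patched_seen)
{
	if (ra != RETURN_TO_HANDLER_ADDR)
		return ra;

	assert(rs->top >= 0);
	(*patched_seen)++;
	return rs->saved[rs->top];
}
```

Because the sketch always reads the newest saved entry, it shares the limitation the commit message states: a second patched address on the same stack would resolve to the wrong caller, which is why only a count is kept for the warning.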
  10. 19 October, 2009 1 commit
  11. 13 October, 2009 1 commit
  12. 12 October, 2009 2 commits
  13. 11 October, 2009 1 commit
  14. 24 September, 2009 1 commit
  15. 31 August, 2009 1 commit
  16. 22 August, 2009 1 commit
  17. 21 August, 2009 5 commits
    • M
      sh: Handle the DWARF op, DW_CFA_undefined · 5580e904
      Matt Fleming committed
      Allow a DWARF register to have an undefined value. When applied to the
      DWARF return address register this lets us label a function as
      having no direct caller, e.g. kernel_thread_helper().
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
    • M
      sh: Fix bug calculating the end of the FDE instructions · 5480675d
      Matt Fleming committed
      The 'end' member of struct dwarf_fde denotes one byte past the end of
      the CFA instruction stream for an FDE. The value of 'end' was being
      calculated incorrectly; it was being set too high. This resulted in
      dwarf_cfa_execute_insns() interpreting data past the end of valid
      instructions, thus causing all sorts of weird crashes.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
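For orientation, in the 32-bit DWARF format each CIE/FDE starts with a 4-byte length that counts the bytes following the length field itself, so the one-past-the-end pointer is the entry start plus 4 plus that length. The sketch below shows only that arithmetic; `entry_end()` is a hypothetical helper and says nothing about what the original miscalculation was.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Given a pointer to the start of a 32-bit-format entry, return a pointer
 * one byte past its end. The length field is read with memcpy() so that an
 * unaligned entry is handled safely; host byte order is assumed.
 */
const uint8_t *entry_end(const uint8_t *entry)
{
	uint32_t length;

	assert(entry != NULL);
	memcpy(&length, entry, sizeof(length));

	/* The length excludes the 4-byte length field itself. */
	return entry + sizeof(length) + length;
}
```

An instruction interpreter bounded by this pointer stops at the entry boundary instead of running into whatever follows.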
    • M
      sh: unwinder: Introduce UNWINDER_BUG() and UNWINDER_BUG_ON() · b344e24a
      Matt Fleming committed
      We can't assume that if we execute the unwinder code and the unwinder
      was already running that it has faulted. Clearly two kernel threads can
      invoke the unwinder at the same time and may be running simultaneously.
      
      The previous approach used BUG() and BUG_ON() in the unwinder code to
      detect whether the unwinder was incapable of unwinding the stack, and
      that the next available unwinder should be used instead. A better
      approach is to explicitly invoke a trap handler to switch unwinders when
      the current unwinder cannot continue.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
    • M
      sh: unwinder: Set the flags for DW_CFA_val_offset ops as DWARF_VAL_OFFSET · 97efbbd5
      Matt Fleming committed
      The handling of DW_CFA_val_offset ops was incorrectly using the
      DWARF_REG_OFFSET flag but the register's value cannot be calculated
      using the DWARF_REG_OFFSET method. Create a new flag to indicate that a
      different method must be used to calculate the register's value even
      though there is no implementation for DWARF_VAL_OFFSET yet; it's mainly
      just a place holder.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
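The difference between the two rules, as the DWARF specification defines them, is that offset(N) means the register's previous value is stored in memory at CFA+N, while val_offset(N) means the previous value is CFA+N itself. The sketch below illustrates that distinction on a toy stack model. All names are hypothetical, and since the commit notes that DWARF_VAL_OFFSET has no implementation yet, this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule tags mirroring the two flags named in the commit. */
enum sketch_rule { SKETCH_REG_OFFSET, SKETCH_VAL_OFFSET };

/* Toy model of stack memory: nwords words starting at address base. */
struct stack_model {
	uintptr_t base;
	const uintptr_t *words;
	size_t nwords;
};

static uintptr_t stack_read(const struct stack_model *s, uintptr_t addr)
{
	size_t idx;

	assert(addr >= s->base);
	idx = (addr - s->base) / sizeof(uintptr_t);
	assert(idx < s->nwords);
	return s->words[idx];
}

/* Recover a register's previous value under either rule. */
uintptr_t sketch_reg_value(enum sketch_rule rule, uintptr_t cfa, long offset,
			   const struct stack_model *s)
{
	uintptr_t addr = cfa + (uintptr_t)offset;

	if (rule == SKETCH_REG_OFFSET)
		return stack_read(s, addr);	/* value is stored at CFA + offset */
	return addr;				/* value is CFA + offset itself */
}
```

Treating a val_offset rule as an offset rule would dereference an address that was meant to be the answer, which is why a separate flag is needed even as a placeholder.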
    • M
      sh: unwinder: Fix memory leak and create our own kmem cache · fb3f3e7f
      Matt Fleming committed
      Plug a memory leak in dwarf_unwinder_dump() where we didn't free the
      memory that we had previously allocated for the DWARF frames and DWARF
      registers.
      
      Now is also an opportune time to implement our own mempool and kmem
      cache. It's a good idea to have a certain number of frame and register
      objects in reserve at all times, so that we are guaranteed to have our
      allocation satisfied even when memory is scarce. Since we have pools to
      allocate from we can implement the registers for each frame as a linked
      list as opposed to a sparsely populated array. Whilst it's true that the
      lookup time for a linked list is larger than for arrays, there's only
      usually a maximum of 8 registers per frame. So the overhead isn't that
      much of a concern.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
  18. 17 August, 2009 1 commit
  19. 16 August, 2009 2 commits
    • M
      sh: Add support for DWARF GNU extensions · cd7246f0
      Matt Fleming committed
      Also, remove the "fix" to DW_CFA_def_cfa_register where we reset the
      frame's cfa_offset to 0. This action is incorrect when handling
      DW_CFA_def_cfa_register as the DWARF spec specifically states that the
      previous contents of cfa_offset should be used with the new
      register. The reason that I thought cfa_offset should be reset to 0 was
      because it was being assigned a bogus value prior to executing the
      DW_CFA_def_cfa_register op. It turns out that the bogus cfa_offset value
      came from interpreting .cfi_escape pseudo-ops (those used by the GNU
      extensions) as DW_CFA_def_cfa ops.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
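The rule being restored here can be shown in a few lines: DW_CFA_def_cfa sets both the CFA register and offset, while DW_CFA_def_cfa_register replaces only the register and keeps the offset already in effect. The struct and function names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CFA rule: CFA = value of register 'reg' + 'offset'. */
struct cfa_rule {
	unsigned int reg;
	long offset;
};

/* DW_CFA_def_cfa: take both a new register and a new offset. */
void op_def_cfa(struct cfa_rule *cfa, unsigned int reg, long offset)
{
	assert(cfa != NULL);
	cfa->reg = reg;
	cfa->offset = offset;
}

/* DW_CFA_def_cfa_register: switch register, keep the previous offset. */
void op_def_cfa_register(struct cfa_rule *cfa, unsigned int reg)
{
	assert(cfa != NULL);
	cfa->reg = reg;
	/* cfa->offset is deliberately left untouched. */
}
```

Resetting the offset to 0 in the second operation would silently move the CFA, and every register rule expressed relative to it.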
    • M
      sh: Try again at getting the initial return address for an unwind · b955873b
      Matt Fleming committed
      The previous hack for calculating the return address for the first frame
      we unwind (dwarf_unwinder_dump) didn't always work. The problem was that
      it assumed once it read the rule for calculating the return address,
      there would be no new rules for calculating it. This isn't true because
      the way in which the CFA is calculated can change as you progress
      through a function and the return address is figured out using the
      CFA. Therefore, the way to calculate the return address can change.
      
      So, instead of using some offset from the beginning of
      dwarf_unwind_stack which is just a flakey approach, and instead of
      executing instructions from the FDE until the return address is setup,
      we now figure out the pc in dwarf_unwind_stack() just before we call
      dwarf_cfa_execute_insns().
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
  20. 15 August, 2009 1 commit
  21. 14 August, 2009 5 commits
    • P
      sh: unwinder: Convert frame allocations to GFP_ATOMIC. · 0fc11e36
      Paul Mundt committed
      save_stack_trace_tsk() and friends can be called from atomic context (as
      triggered by latencytop), and subsequently hit two problematic allocation
      points that were using GFP_KERNEL (these were dwarf_unwind_stack() and
      dwarf_frame_alloc_regs()). Convert these over to GFP_ATOMIC and get
      latencytop working with the DWARF unwinder.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • M
      sh: Delete DWARF_ARCH_UNWIND_OFFSET · f8264667
      Matt Fleming committed
      Trying to figure out the best value for DWARF_ARCH_UNWIND_OFFSET is
      tricky at best. Various things can change the size (and offset from the
      beginning of the function) of the prologue. Notably, turning on ftrace
      adds calls to mcount at the beginning of functions, thereby pushing the
      prologue further into the function.
      
      So replace DWARF_ARCH_UNWIND_OFFSET with some code that continues to
      execute CFA instructions until the value of return address register is
      defined. This is safe to do because we know that the return address must
      have been pushed onto the frame before our first function call; we just
      can't figure out where at compile-time.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • P
      sh: unwinder: Restore put_unaligned() for an unaligned destination. · bf43a160
      Paul Mundt committed
      The destination address might be unaligned, so set it with
      put_unaligned() for safety. This restores the previous behaviour, albeit
      through the proper API.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • P
      sh: unwinder: Fix up usage of unaligned accessors. · 3497447f
      Paul Mundt committed
      This was using internal symbols for unaligned accesses, bypassing the
      exposed interface for variable sized safe accesses. This converts all of
      the __get_unaligned_cpuXX() users over to get_unaligned() directly,
      relying on the cast to select the proper internal routine.
      
      Additionally, the __put_unaligned_cpuXX() case is superfluous given that
      the destination address is aligned in all of the current cases, so just
      drop that outright.
      
      Furthermore, this switches to the asm/unaligned.h header instead of the
      asm-generic version, which was silently bypassing the SH-4A optimized
      unaligned ops.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
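Outside the kernel, the idea behind unaligned accessors can be sketched with memcpy(), which lets the compiler choose a safe access sequence for the target. This is a portable user-space analogue with hypothetical helper names; it is not the kernel's get_unaligned()/put_unaligned() implementation, nor the SH-4A optimised variant mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee. */
uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
void store_u32_unaligned(void *p, uint32_t v)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}
```

Dereferencing a cast pointer instead can fault on architectures that require natural alignment, which is the hazard these accessors exist to avoid.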
    • M
      sh: dwarf unwinder support. · bd353861
      Matt Fleming committed
      This is a first cut at a generic DWARF unwinder for the kernel. It's
      still lacking DWARF64 support and the DWARF expression support hasn't
      been tested very well but it is generating proper stacktraces on SH for
      WARN_ON() and NULL dereferences.
      Signed-off-by: Matt Fleming <matt@console-pimps.org>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>