1. 30 4月, 2008 1 次提交
    • T
      infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner 提交于
      We can see an ever repeating problem pattern with objects of any kind in the
      kernel:
      
      1) freeing of active objects
      2) reinitialization of active objects
      
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot are
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose it usually in a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy to enable debug facility for users in case of trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      freed.
      
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      
      Each operation is sanity checked before the operation is executed and the
      subsystem specific code can provide a fixup function which allows to prevent
      the damage of the operation.  When the sanity check triggers a warning message
      and a stack trace is printed.
      
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
      global lock.
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ac7fe5a
  2. 26 4月, 2008 1 次提交
    • A
      Add option to enable -Wframe-larger-than= on gcc 4.4 · 35bb5b1e
      Andi Kleen 提交于
      Add option to enable -Wframe-larger-than= on gcc 4.4
      
      gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
      option to warn at build time about too large stack frames. Add a config
      option to enable this warning, since this very useful for the kernel.
      
      I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
      and 1024 as default for 32bit architectures.  With some research and
      fixing all the code for smaller values these defaults should be probably
      lowered.
      
      With the default allyesconfigs have some new warnings, but I think
      that is all code that should be just fixed.
      
      At some point (when gcc 4.4 is released and widely used) this should
      obsolete make checkstack
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      35bb5b1e
  3. 19 4月, 2008 1 次提交
  4. 18 4月, 2008 2 次提交
  5. 17 4月, 2008 1 次提交
  6. 14 4月, 2008 1 次提交
    • C
      slub: Deal with config variable dependencies · 5b06c853
      Christoph Lameter 提交于
      count_partial() is used by both slabinfo and the sysfs proc support. Move
      the function directly before the beginning of the sysfs code so that it can
      be easily found. Rework the preprocessor conditional to take into account
      that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.
      
      Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
      is no point of keeping statistics if no one can restrive them.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      5b06c853
  7. 24 2月, 2008 1 次提交
  8. 15 2月, 2008 1 次提交
  9. 10 2月, 2008 1 次提交
  10. 09 2月, 2008 1 次提交
  11. 08 2月, 2008 1 次提交
    • C
      SLUB: Support for performance statistics · 8ff12cfc
      Christoph Lameter 提交于
      The statistics provided here allow the monitoring of allocator behavior but
      at the cost of some (minimal) loss of performance. Counters are placed in
      SLUB's per cpu data structure. The per cpu structure may be extended by the
      statistics to grow larger than one cacheline which will increase the cache
      footprint of SLUB.
      
      There is a compile option to enable/disable the inclusion of the runtime
      statistics and its off by default.
      
      The slabinfo tool is enhanced to support these statistics via two options:
      
      -D 	Switches the line of information displayed for a slab from size
      	mode to activity mode.
      
      -A	Sorts the slabs displayed by activity. This allows the display of
      	the slabs most important to the performance of a certain load.
      
      -r	Report option will report detailed statistics on
      
      Example (tbench load):
      
      slabinfo -AD		->Shows the most active slabs
      
      Name                   Objects    Alloc     Free   %Fast
      skbuff_fclone_cache         33 111953835 111953835  99  99
      :0000192                  2666  5283688  5281047  99  99
      :0001024                   849  5247230  5246389  83  83
      vm_area_struct            1349   119642   118355  91  22
      :0004096                    15    66753    66751  98  98
      :0000064                  2067    25297    23383  98  78
      dentry                   10259    28635    18464  91  45
      :0000080                 11004    18950     8089  98  98
      :0000096                  1703    12358    10784  99  98
      :0000128                   762    10582     9875  94  18
      :0000512                   184     9807     9647  95  81
      :0002048                   479     9669     9195  83  65
      anon_vma                   777     9461     9002  99  71
      kmalloc-8                 6492     9981     5624  99  97
      :0000768                   258     7174     6931  58  15
      
      So the skbuff_fclone_cache is of highest importance for the tbench load.
      Pretty high load on the 192 sized slab. Look for the aliases
      
      slabinfo -a | grep 000192
      :0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
      	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
      
      Likely skbuff_head_cache.
      
      
      Looking into the statistics of the skbuff_fclone_cache is possible through
      
      slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned
      
      
      .... Usual output ...
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath             111953360 111946981  99  99
      Slowpath                 1044     7423   0   0
      Page Alloc                272      264   0   0
      Add partial                25      325   0   0
      Remove partial             86      264   0   0
      RemoteObj/SlabFrozen      350     4832   0   0
      Total                111954404 111954404
      
      Flushes       49 Refill        0
      Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
      
      Looks good because the fastpath is overwhelmingly taken.
      
      
      skbuff_head_cache:
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath              5297262  5259882  99  99
      Slowpath                 4477    39586   0   0
      Page Alloc                937      824   0   0
      Add partial                 0     2515   0   0
      Remove partial           1691      824   0   0
      RemoteObj/SlabFrozen     2621     9684   0   0
      Total                 5301739  5299468
      
      Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
      
      
      Descriptions of the output:
      
      Total:		The total number of allocation and frees that occurred for a
      		slab
      
      Fastpath:	The number of allocations/frees that used the fastpath.
      
      Slowpath:	Other allocations
      
      Page Alloc:	Number of calls to the page allocator as a result of slowpath
      		processing
      
      Add Partial:	Number of slabs added to the partial list through free or
      		alloc (occurs during cpuslab flushes)
      
      Remove Partial:	Number of slabs removed from the partial list as a result of
      		allocations retrieving a partial slab or by a free freeing
      		the last object of a slab.
      
      RemoteObj/Froz:	How many times were remotely freed object encountered when a
      		slab was about to be deactivated. Frozen: How many times was
      		free able to skip list processing because the slab was in use
      		as the cpuslab of another processor.
      
      Flushes:	Number of times the cpuslab was flushed on request
      		(kmem_cache_shrink, may result from races in __slab_alloc)
      
      Refill:		Number of times we were able to refill the cpuslab from
      		remotely freed objects for the same slab.
      
      Deactivate:	Statistics how slabs were deactivated. Shows how they were
      		put onto the partial list.
      
      In general fastpath is very good. Slowpath without partial list processing is
      also desirable. Any touching of partial list uses node specific locks which
      may potentially cause list lock contention.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      8ff12cfc
  12. 03 2月, 2008 2 次提交
  13. 02 2月, 2008 1 次提交
  14. 30 1月, 2008 3 次提交
    • B
      x86: early boot debugging via FireWire (ohci1394_dma=early) · f212ec4b
      Bernhard Kaindl 提交于
      This patch adds a new configuration option, which adds support for a new
      early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
      to decide wether OHCI-1394 FireWire controllers should be initialized and
      enabled for physical DMA access to allow remote debugging of early problems
      like issues ACPI or other subsystems which are executed very early.
      
      If the config option is not enabled, no code is changed, and if the boot
      paramenter is not given, no new code is executed, and independent of that,
      all new code is freed after boot, so the config option can be even enabled
      in standard, non-debug kernels.
      
      With specialized tools, it is then possible to get debugging information
      from machines which have no serial ports (notebooks) such as the printk
      buffer contents, or any data which can be referenced from global pointers,
      if it is stored below the 4GB limit and even memory dumps of of the physical
      RAM region below the 4GB limit can be taken without any cooperation from the
      CPU of the host, so the machine can be crashed early, it does not matter.
      
      In the extreme, even kernel debuggers can be accessed in this way. I wrote
      a small kgdb module and an accompanying gdb stub for FireWire which allows
      to gdb to talk to kgdb using remote remory reads and writes over FireWire.
      
      An version of the gdb stub fore FireWire is able to read all global data
      from a system which is running a a normal kernel without any kernel debugger,
      without any interruption or support of the system's CPU. That way, e.g. the
      task struct and so on can be read and even manipulated when the physical DMA
      access is granted.
      
      A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
      and I've put a copy online at
      ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
      
      It also has links to all the tools which are available to make use of it
      another copy of it is online at:
      ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diffSigned-Off-By: NBernhard Kaindl <bk@suse.de>
      Tested-By: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      f212ec4b
    • A
      x86: add a simple backtrace test module · 6dab2778
      Arjan van de Ven 提交于
      During the work on the x86 32 and 64 bit backtrace code I found it useful
      to have a simple test module to test a process and irq context backtrace.
      Since the existing backtrace code was buggy, I figure it might be useful
      to have such a test module in the kernel so that maybe we can even
      detect such bugs earlier..
      
      [ mingo@elte.hu: build fix ]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6dab2778
    • A
      x86: kprobes: add kprobes smoke tests that run on boot · 8c1c9356
      Ananth N Mavinakayanahalli 提交于
      Here is a quick and naive smoke test for kprobes. This is intended to
      just verify if some unrelated change broke the *probes subsystem. It is
      self contained, architecture agnostic and isn't of any great use by itself.
      
      This needs to be built in the kernel and runs a basic set of tests to
      verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
      of an error, it'll print out a message with a "BUG" prefix.
      
      This is a start; we intend to add more tests to this bucket over time.
      
      Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.
      
      Tested on x86 (32/64) and powerpc.
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8c1c9356
  15. 29 1月, 2008 2 次提交
    • S
      kbuild: add verbose option to Section mismatch reporting in modpost · 588ccd73
      Sam Ravnborg 提交于
      If the config option CONFIG_SECTION_MISMATCH is not set and
      we see a Section mismatch present the following to the user:
      
      modpost: Found 1 section mismatch(es).
      To see additional details select "Enable full Section mismatch analysis"
      in the Kernel Hacking menu (CONFIG_SECTION_MISMATCH).
      
      If the option CONFIG_SECTION_MISMATCH is selected
      then be verbose in the Section mismatch reporting from mdopost.
      Sample outputs:
      
      WARNING: o-x86_64/vmlinux.o(.text+0x7396): Section mismatch in reference from the function discover_ebda() to the variable .init.data:ebda_addr
      The function  discover_ebda() references
      the variable __initdata ebda_addr.
      This is often because discover_ebda lacks a __initdata
      annotation or the annotation of ebda_addr is wrong.
      
      WARNING: o-x86_64/vmlinux.o(.data+0x74d58): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit()
      The variable pci_serial_quirks references
      the function __devexit pci_plx9050_exit()
      If the reference is valid then annotate the
      variable with __exit* (see linux/init.h) or name the variable:
      *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
      
      WARNING: o-x86_64/vmlinux.o(__ksymtab+0x630): Section mismatch in reference from the variable __ksymtab_arch_register_cpu to the function .cpuinit.text:arch_register_cpu()
      The symbol arch_register_cpu is exported and annotated __cpuinit
      Fix this by removing the __cpuinit annotation of arch_register_cpu or drop the export.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      588ccd73
    • S
      kbuild: introduce new option to enhance section mismatch analysis · 91341d4b
      Sam Ravnborg 提交于
      Setting the option DEBUG_SECTION_MISMATCH will
      report additional section mismatch'es but this
      should in the end makes it possible to get rid of
      all of them.
      
      See help text in lib/Kconfig.debug for details.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      91341d4b
  16. 26 1月, 2008 1 次提交
  17. 23 11月, 2007 1 次提交
  18. 26 10月, 2007 1 次提交
    • J
      Permit silencing of __deprecated warnings. · de488443
      Jeff Garzik 提交于
      The __deprecated marker is quite useful in highlighting the remnants of
      old APIs that want removing.
      
      However, it is quite normal for one or more years to pass, before the
      (usually ancient, bitrotten) code in question is either updated or
      deleted.
      
      Thus, like __must_check, add a Kconfig option that permits the silencing
      of this compiler warning.
      
      This change mimics the ifdef-ery and Kconfig defaults of MUST_CHECK as
      closely as possible.
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de488443
  19. 23 10月, 2007 1 次提交
  20. 20 10月, 2007 1 次提交
  21. 17 10月, 2007 1 次提交
  22. 08 10月, 2007 1 次提交
  23. 03 10月, 2007 1 次提交
    • H
      [POWERPC] ppc64: support CONFIG_DEBUG_PREEMPT · 048c8bc9
      Hugh Dickins 提交于
      Add CONFIG_DEBUG_PREEMPT support to ppc64: it was useful for testing
      get_paca() preemption.  Cheat a little, just use debug_smp_processor_id()
      in the debug version of get_paca(): it contains all the right checks and
      reporting, though get_paca() doesn't really use smp_processor_id().
      
      Use local_paca for what might have been called __raw_get_paca().
      Silence harmless warnings from io.h and lparcfg.c with local_paca -
      it is okay for iseries_lparcfg_data to be referencing shared_proc
      with preemption enabled: all cpus should show the same value for
      shared_proc.
      
      Why do other architectures need TRACE_IRQFLAGS_SUPPORT for DEBUG_PREEMPT?
      I don't know, ppc64 appears to get along fine without it.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      048c8bc9
  24. 25 9月, 2007 1 次提交
  25. 01 8月, 2007 1 次提交
  26. 20 7月, 2007 1 次提交
    • P
      lockstat: core infrastructure · f20786ff
      Peter Zijlstra 提交于
      Introduce the core lock statistics code.
      
      Lock statistics provides lock wait-time and hold-time (as well as the count
      of corresponding contention and acquisitions events). Also, the first few
      call-sites that encounter contention are tracked.
      
      Lock wait-time is the time spent waiting on the lock. This provides insight
      into the locking scheme, that is, a heavily contended lock is indicative of
      a too coarse locking scheme.
      
      Lock hold-time is the duration the lock was held, this provides a reference for
      the wait-time numbers, so they can be put into perspective.
      
        1)
          lock
        2)
          ... do stuff ..
          unlock
        3)
      
      The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
      hold-time.
      
      The lockdep held-lock tracking code is reused, because it already collects locks
      into meaningful groups (classes), and because it is an existing infrastructure
      for lock instrumentation.
      
      Currently lockdep tracks lock acquisition with two hooks:
      
        lock()
          lock_acquire()
          _lock()
      
       ... code protected by lock ...
      
        unlock()
          lock_release()
          _unlock()
      
      We need to extend this with two more hooks, in order to measure contention.
      
        lock_contended() - used to measure contention events
        lock_acquired()  - completion of the contention
      
      These are then placed the following way:
      
        lock()
          lock_acquire()
          if (!_try_lock())
            lock_contended()
            _lock()
            lock_acquired()
      
       ... do locked stuff ...
      
        unlock()
          lock_release()
          _unlock()
      
      (Note: the try_lock() 'trick' is used to avoid instrumenting all platform
             dependent lock primitive implementations.)
      
      It is also possible to toggle the two lockdep features at runtime using:
      
        /proc/sys/kernel/prove_locking
        /proc/sys/kernel/lock_stat
      
      (esp. turning off the O(n^2) prove_locking functionaliy can help)
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: nuke unneeded ifdefs]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f20786ff
  27. 17 7月, 2007 1 次提交
  28. 10 7月, 2007 1 次提交
  29. 01 6月, 2007 1 次提交
  30. 24 5月, 2007 1 次提交
  31. 13 5月, 2007 1 次提交
  32. 09 5月, 2007 2 次提交
  33. 08 5月, 2007 1 次提交
  34. 03 5月, 2007 1 次提交