1. 29 12月, 2008 1 次提交
  2. 19 12月, 2008 1 次提交
    • P
      "Tree RCU": scalable classic RCU implementation · 64db4cff
      Paul E. McKenney 提交于
      This patch fixes a long-standing performance bug in classic RCU that
      results in massive internal-to-RCU lock contention on systems with
      more than a few hundred CPUs.  Although this patch creates a separate
      flavor of RCU for ease of review and patch maintenance, it is intended
      to replace classic RCU.
      
      This patch still handles stress better than does mainline, so I am still
      calling it ready for inclusion.  This patch is against the -tip tree.
      Nevertheless, experience on an actual 1000+ CPU machine would still be
      most welcome.
      
      Most of the changes noted below were found while creating an rcutiny
      (which should permit ejecting the current rcuclassic) and while doing
      detailed line-by-line documentation.
      
      Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
      
      o	Fixes from remainder of line-by-line code walkthrough,
      	including comment spelling, initialization, undesirable
      	narrowing due to type conversion, removing redundant memory
      	barriers, removing redundant local-variable initialization,
      	and removing redundant local variables.
      
      	I do not believe that any of these fixes address the CPU-hotplug
      	issues that Andi Kleen was seeing, but please do give it a whirl
      	in case the machine is smarter than I am.
      
      	A writeup from the walkthrough may be found at the following
      	URL, in case you are suffering from terminal insomnia or
      	masochism:
      
      	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
      
      o	Made rcutree tracing use seq_file, as suggested some time
      	ago by Lai Jiangshan.
      
      o	Added a .csv variant of the rcudata debugfs trace file, to allow
      	people having thousands of CPUs to drop the data into
      	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
      	documentation to suit.
      
      Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
      
      o	Fix a theoretical race between grace-period initialization and
      	force_quiescent_state() that could occur if more than three
      	jiffies were required to carry out the grace-period
      	initialization.  Which it might, if you had enough CPUs.
      
      o	Apply Ingo's printk-standardization patch.
      
      o	Substitute local variables for repeated accesses to global
      	variables.
      
      o	Fix comment misspellings and redundant (but harmless) increments
      	of ->n_rcu_pending (this latter after having explicitly added it).
      
      o	Apply checkpatch fixes.
      
      Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
      
      o	Fixed a number of problems noted by Gautham Shenoy, including
      	the cpu-stall-detection bug that he was having difficulty
      	convincing me was real.  ;-)
      
      o	Changed cpu-stall detection to wait for ten seconds rather than
      	three in order to reduce false positive, as suggested by Ingo
      	Molnar.
      
      o	Produced a design document (http://lwn.net/Articles/305782/).
      	The act of writing this document uncovered a number of both
      	theoretical and "here and now" bugs as noted below.
      
      o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
      	condition, fix kerneldoc comments, and add memory barriers
      	in dynticks interface functions.
      
      o	Add more data to tracing.
      
      o	Remove unused "rcu_barrier" field from rcu_data structure.
      
      o	Count calls to rcu_pending() from scheduling-clock interrupt
      	to use as a surrogate timebase should jiffies stop counting.
      
      o	Fix a theoretical race between force_quiescent_state() and
      	grace-period initialization.  Yes, initialization does have to
      	go on for some jiffies for this race to occur, but given enough
      	CPUs...
      
      Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
      
      o	Fix a number of checkpatch.pl complaints.
      
      o	Apply review comments from Ingo Molnar and Lai Jiangshan
      	on the stall-detection code.
      
      o	Fix several bugs in !CONFIG_SMP builds.
      
      o	Fix a misspelled config-parameter name so that RCU now announces
      	at boot time if stall detection is configured.
      
      o	Run tests on numerous combinations of configurations parameters,
      	which after the fixes above, now build and run correctly.
      
      Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
      
      o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
      	changeset some time ago, and finally got around to retesting
      	this option).
      
      o	Fix some tracing bugs in rcupreempt that caused incorrect
      	totals to be printed.
      
      o	I now test with a more brutal random-selection online/offline
      	script (attached).  Probably more brutal than it needs to be
      	on the people reading it as well, but so it goes.
      
      o	A number of optimizations and usability improvements:
      
      	o	Make rcu_pending() ignore the grace-period timeout when
      		there is no grace period in progress.
      
      	o	Make force_quiescent_state() avoid going for a global
      		lock in the case where there is no grace period in
      		progress.
      
      	o	Rearrange struct fields to improve struct layout.
      
      	o	Make call_rcu() initiate a grace period if RCU was
      		idle, rather than waiting for the next scheduling
      		clock interrupt.
      
      	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
      		idle, as suggested by Andi Kleen.  I still don't
      		completely trust this change, and might back it out.
      
      	o	Make CONFIG_RCU_TRACE be the single config variable
      		manipulated for all forms of RCU, instead of the prior
      		confusion.
      
      	o	Document tracing files and formats for both rcupreempt
      		and rcutree.
      
      Updates from v4 for those missing v5 given its bad subject line:
      
      o	Separated dynticks interface so that NMIs and irqs call separate
      	functions, greatly simplifying it.  In particular, this code
      	no longer requires a proof of correctness.  ;-)
      
      o	Separated dynticks state out into its own per-CPU structure,
      	avoiding the duplicated accounting.
      
      o	The case where a dynticks-idle CPU runs an irq handler that
      	invokes call_rcu() is now correctly handled, forcing that CPU
      	out of dynticks-idle mode.
      
      o	Review comments have been applied (thank you all!!!).
      	For but one example, fixed the dynticks-ordering issue that
      	Manfred pointed out, saving me much debugging.  ;-)
      
      o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.
      
      Attached is an updated patch to Classic RCU that applies a hierarchy,
      greatly reducing the contention on the top-level lock for large machines.
      This passes 10-hour concurrent rcutorture and online-offline testing on
      128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
      bugs in presence of dynticks (exciting working on a system where
      "sleep 1" hangs until interrupted...), which were fixed in the
      2.6.27 kernel.  It is getting more reliable than mainline by some
      measures, so the next version will be against -tip for inclusion.
      See also Manfred Spraul's recent patches (or his earlier work from
      2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
      We will converge onto a common patch in the fullness of time, but are
      currently exploring different regions of the design space.  That said,
      I have already gratefully stolen quite a few of Manfred's ideas.
      
      This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
      of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
      64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
      there is no hierarchy.  By default, the RCU initialization code will
      adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
      architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
      this balancing, allowing the hierarchy to be exactly aligned to the
      underlying hardware.  Up to two levels of hierarchy are permitted
      (in addition to the root node), allowing up to 16,384 CPUs on 32-bit
      systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
      am going to regret saying this, but this seems more than sufficient
      for the foreseeable future.  (Some architectures might wish to set
      CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
      If this becomes a real problem, additional levels can be added, but I
      doubt that it will make a significant difference on real hardware.)
      
      In the common case, a given CPU will manipulate its private rcu_data
      structure and the rcu_node structure that it shares with its immediate
      neighbors.  This can reduce both lock and memory contention by multiple
      orders of magnitude, which should eliminate the need for the strange
      manipulations that are reported to be required when running Linux on
      very large systems.
      
      Some shortcomings:
      
      o	More bugs will probably surface as a result of an ongoing
      	line-by-line code inspection.
      
      	Patches will be provided as required.
      
      o	There are probably hangs, rcutorture failures, &c.  Seems
      	quite stable on a 128-CPU machine, but that is kind of small
      	compared to 4096 CPUs.  However, seems to do better than
      	mainline.
      
      	Patches will be provided as required.
      
      o	The memory footprint of this version is several KB larger
      	than rcuclassic.
      
      	A separate UP-only rcutiny patch will be provided, which will
      	reduce the memory footprint significantly, even compared
      	to the old rcuclassic.  One such patch passes light testing,
      	and has a memory footprint smaller even than rcuclassic.
      	Initial reaction from various embedded guys was "it is not
      	worth it", so am putting it aside.
      
      Credits:
      
      o	Manfred Spraul for ideas, review comments, and bugs spotted,
      	as well as some good friendly competition.  ;-)
      
      o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
      	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
      	for reviews and comments.
      
      o	Thomas Gleixner for much-needed help with some timer issues
      	(see patches below).
      
      o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
      	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
      	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
      	alive despite my heavy abuse^Wtesting.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      64db4cff
  3. 26 11月, 2008 1 次提交
    • I
      debugobjects: add boot parameter default value · 3ae70205
      Ingo Molnar 提交于
      Impact: add .config driven boot parameter default value
      
      Right now debugobjects can only be activated if the debug_objects
      boot parameter is passed in via the boot command line.
      
      Make this more convenient (and randomizable) by also providing
      a .config method. Enable it by default. (DEBUG_OBJECTS itself
      is default-off)
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3ae70205
  4. 17 10月, 2008 2 次提交
    • T
      block: add BIG FAT WARNING to CONFIG_DEBUG_BLOCK_EXT_DEVT · 0e11e342
      Tejun Heo 提交于
      CONFIG_DEBUG_BLOCK_EXT_DEVT can break booting even on some modern
      distros.  Add BIG FAT WARNING to keep people at a safe distance.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0e11e342
    • J
      driver core: basic infrastructure for per-module dynamic debug messages · 346e15be
      Jason Baron 提交于
      Base infrastructure to enable per-module debug messages.
      
      I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
      control of debugging statements on a per-module basis in one /proc file,
      currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
      is not set, debugging statements can still be enabled as before, often by
      defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
      affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
      
      The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
      is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
      can be dynamically enabled/disabled on a per-module basis.
      
      Future plans include extending this functionality to subsystems, that define 
      their own debug levels and flags.
      
      Usage:
      
      Dynamic debugging is controlled by the debugfs file, 
      <debugfs>/dynamic_printk/modules. This file contains a list of the modules that
      can be enabled. The format of the file is as follows:
      
      	<module_name> <enabled=0/1>
      		.
      		.
      		.
      
      	<module_name> : Name of the module in which the debug call resides
      	<enabled=0/1> : whether the messages are enabled or not
      
      For example:
      
      	snd_hda_intel enabled=0
      	fixup enabled=1
      	driver enabled=0
      
      Enable a module:
      
      	$echo "set enabled=1 <module_name>" > dynamic_printk/modules
      
      Disable a module:
      
      	$echo "set enabled=0 <module_name>" > dynamic_printk/modules
      
      Enable all modules:
      
      	$echo "set enabled=1 all" > dynamic_printk/modules
      
      Disable all modules:
      
      	$echo "set enabled=0 all" > dynamic_printk/modules
      
      Finally, passing "dynamic_printk" at the command line enables
      debugging for all modules. This mode can be turned off via the above
      disable command.
      
      [gkh: minor cleanups and tweaks to make the build work quietly]
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      346e15be
  5. 09 10月, 2008 4 次提交
  6. 03 10月, 2008 1 次提交
    • P
      rcu: RCU-based detection of stalled CPUs for Classic RCU · 2133b5d7
      Paul E. McKenney 提交于
      This patch adds stalled-CPU detection to Classic RCU.  This capability
      is enabled by a new config variable CONFIG_RCU_CPU_STALL_DETECTOR, which
      defaults disabled.
      
      This is a debugging feature to detect infinite loops in kernel code, not
      something that non-kernel-hackers would be expected to care about.
      
      This feature can detect looping CPUs in !PREEMPT builds and looping CPUs
      with preemption disabled in PREEMPT builds.  This is essentially a port of
      this functionality from the treercu patch, replacing the stall debug patch
      that is already in tip/core/rcu (commit 67182ae1).
      
      The changes from the patch in tip/core/rcu include making the config
      variable name match that in treercu, changing from seconds to jiffies to
      avoid spurious warnings, and printing a boot message when this feature
      is enabled.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2133b5d7
  7. 10 9月, 2008 1 次提交
    • A
      debug: add notifier chain debugging · 1b2439db
      Arjan van de Ven 提交于
      during some development we suspected a case where we left something
      in a notifier chain that was from a module that was unloaded already...
      and that sort of thing is rather hard to track down.
      
      This patch adds a very simple sanity check (which isn't all that
      expensive) to make sure the notifier we're about to call is
      actually from either the kernel itself of from a still-loaded
      module, avoiding a hard-to-chase-down crash.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1b2439db
  8. 03 9月, 2008 1 次提交
    • T
      powerpc: Work around gcc's -fno-omit-frame-pointer bug · 7563dc64
      Tony Breeds 提交于
      This bug is causing random crashes
      (http://bugzilla.kernel.org/show_bug.cgi?id=11414).
      
      -fno-omit-frame-pointer is only needed on powerpc when -pg is also
      supplied, and there is a gcc bug that causes incorrect code generation
      on 32-bit powerpc when -fno-omit-frame-pointer is used---it uses stack
      locations below the stack pointer, which is not allowed by the ABI
      because those locations can and sometimes do get corrupted by an
      interrupt.
      
      This ensures that CONFIG_FRAME_POINTER is only selected by ftrace.
      When CONFIG_FTRACE is enabled we also pass -mno-sched-epilog to work
      around the gcc codegen bug.
      
      Patch based on work by:
      	Andreas Schwab <schwab@suse.de>
      	Segher Boessenkool <segher@kernel.crashing.org>
      Signed-off-by: NTony Breeds <tony@bakeyournoodle.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7563dc64
  9. 17 8月, 2008 1 次提交
  10. 13 8月, 2008 1 次提交
    • R
      docsrc: build Documentation/ sources · 3794f3e8
      Randy Dunlap 提交于
      Currently source files in the Documentation/ sub-dir can easily bit-rot
      since they are not generally buildable, either because they are hidden in
      text files or because there are no Makefile rules for them.  This needs to
      be fixed so that the source files remain usable and good examples of code
      instead of bad examples.
      
      Add the ability to build source files that are in the Documentation/ dir.
      Add to Kconfig as "BUILD_DOCSRC" config symbol.
      
      Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the
      Documentation/ sources.  Or enable BUILD_DOCSRC in the *config system.
      However, this symbol depends on HEADERS_CHECK since the header files need
      to be installed (for userspace builds).
      
      Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32,
      sparc64, powerpc, sh, m68k, & mips.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Reviewed-by: NSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3794f3e8
  11. 11 8月, 2008 1 次提交
    • P
      rcu, debug: detect stalled grace periods · 67182ae1
      Paul E. McKenney 提交于
      this is a diagnostic patch for Classic RCU.
      
      The approach is to record a timestamp at the beginning
      of the grace period (in rcu_start_batch()), then have
      rcu_check_callbacks() complain if:
      
       1.	it is running on a CPU that has holding up grace periods for
       	a long time (say one second).  This will identify the culprit
       	assuming that the culprit has not disabled hardware irqs,
       	instruction execution, or some such.
      
       2.	it is running on a CPU that is not holding up grace periods,
       	but grace periods have been held up for an even longer time
       	(say two seconds).
      
      It is enabled via the default-off CONFIG_DEBUG_RCU_STALL kernel parameter.
      
      Rather than exponential backoff, it backs off to once per 30 seconds.
      My feeling upon thinking on it was that if you have stalled RCU grace
      periods for that long, a few extra printk() messages are probably the
      least of your worries...
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Witbrodt <dawitbro@sbcglobal.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      67182ae1
  12. 25 7月, 2008 1 次提交
    • M
      mm: add a basic debugging framework for memory initialisation · 6b74ab97
      Mel Gorman 提交于
      Boot initialisation is very complex, with significant numbers of
      architecture-specific routines, hooks and code ordering.  While significant
      amounts of the initialisation is architecture-independent, it trusts the data
      received from the architecture layer.  This is a mistake, and has resulted in
      a number of difficult-to-diagnose bugs.
      
      This patchset adds some validation and tracing to memory initialisation.  It
      also introduces a few basic defensive measures.  The validation code can be
      explicitly disabled for embedded systems.
      
      This patch:
      
      Add additional debugging and verification code for memory initialisation.
      
      Once enabled, the verification checks are always run and when required
      additional debugging information may be outputted via a mminit_loglevel=
      command-line parameter.
      
      The verification code is placed in a new file mm/mm_init.c.  Ideally other mm
      initialisation code will be moved here over time.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b74ab97
  13. 22 7月, 2008 1 次提交
  14. 28 6月, 2008 1 次提交
  15. 19 6月, 2008 2 次提交
    • J
      MM: virtual address debug · 59ea7463
      Jiri Slaby 提交于
      Add some (configurable) expensive sanity checking to catch wrong address
      translations on x86.
      
      - create linux/mmdebug.h file to be able include this file in
        asm headers to not get unsolvable loops in header files
      - __phys_addr on x86_32 became a function in ioremap.c since
        PAGE_OFFSET, is_vmalloc_addr and VMALLOC_* non-constasts are undefined
        if declared in page_32.h
      - add __phys_addr_const for initializing doublefault_tss.__cr3
      
      Tested on 386, 386pae, x86_64 and x86_64 numa=fake=2.
      
      Contains Andi's enable numa virtual address debug patch.
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      59ea7463
    • P
      rcu: make rcutorture more vicious: reinstate boot-time testing · 31a72bce
      Paul E. McKenney 提交于
      This patch re-institutes the ability to build rcutorture directly into
      the Linux kernel.  The reason that this capability was removed was that
      this could result in your kernel being pretty much useless, as rcutorture
      would be running starting from early boot.  This problem has been avoided
      by (1) making rcutorture run only three seconds of every six by default,
      (2) adding a CONFIG_RCU_TORTURE_TEST_RUNNABLE that permits rcutorture
      to be quiesced at boot time, and (3) adding a sysctl in /proc named
      /proc/sys/kernel/rcutorture_runnable that permits rcutorture to be
      quiesced and unquiesced when built into the kernel.
      
      Please note that this /proc file is -not- available when rcutorture
      is built as a module.  Please also note that to get the earlier
      take-no-prisoners behavior, you must use the boot command line to set
      rcutorture's "stutter" parameter to zero.
      
      The rcutorture quiescing mechanism is currently quite crude: loops
      in each rcutorture process that poll a global variable once per tick.
      Suggestions for improvement are welcome.  The default action will
      be to reduce the polling rate to a few times per second.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Suggested-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      31a72bce
  16. 16 6月, 2008 1 次提交
    • I
      Revert "prohibit rcutorture from being compiled into the kernel" · 1462a200
      Ingo Molnar 提交于
      This reverts commit 9aaffc89.
      
      That commit was a very bad idea. RCU_TORTURE found many boot timing
      bugs and other sorts of bugs in the past, so excluding it from
      boot images is very silly.
      
      The option already depends on DEBUG_KERNEL and is disabled by default.
      Even when it runs, the test threads are reniced. If it annoys people
      we could add a runtime sysctl.
      1462a200
  17. 25 5月, 2008 2 次提交
  18. 24 5月, 2008 1 次提交
  19. 30 4月, 2008 2 次提交
    • T
      debugobjects: add timer specific object debugging code · c6f3a97f
      Thomas Gleixner 提交于
      Add calls to the generic object debugging infrastructure and provide fixup
      functions which allow to keep the system alive when recoverable problems have
      been detected by the object debugging core code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c6f3a97f
    • T
      infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner 提交于
      We can see an ever repeating problem pattern with objects of any kind in the
      kernel:
      
      1) freeing of active objects
      2) reinitialization of active objects
      
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot are
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose it usually in a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy to enable debug facility for users in case of trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      freed.
      
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      
      Each operation is sanity checked before the operation is executed and the
      subsystem specific code can provide a fixup function which allows to prevent
      the damage of the operation.  When the sanity check triggers a warning message
      and a stack trace is printed.
      
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
      global lock.
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ac7fe5a
  20. 26 4月, 2008 1 次提交
    • A
      Add option to enable -Wframe-larger-than= on gcc 4.4 · 35bb5b1e
      Andi Kleen 提交于
      Add option to enable -Wframe-larger-than= on gcc 4.4
      
      gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
      option to warn at build time about too large stack frames. Add a config
      option to enable this warning, since this very useful for the kernel.
      
      I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
      and 1024 as default for 32bit architectures.  With some research and
      fixing all the code for smaller values these defaults should be probably
      lowered.
      
      With the default allyesconfigs have some new warnings, but I think
      that is all code that should be just fixed.
      
      At some point (when gcc 4.4 is released and widely used) this should
      obsolete make checkstack
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      35bb5b1e
  21. 19 4月, 2008 1 次提交
  22. 18 4月, 2008 2 次提交
  23. 17 4月, 2008 1 次提交
  24. 14 4月, 2008 1 次提交
    • C
      slub: Deal with config variable dependencies · 5b06c853
      Christoph Lameter 提交于
      count_partial() is used by both slabinfo and the sysfs proc support. Move
      the function directly before the beginning of the sysfs code so that it can
      be easily found. Rework the preprocessor conditional to take into account
      that slub sysfs support depends on CONFIG_SYSFS *and* CONFIG_SLUB_DEBUG.
      
      Make CONFIG_SLUB_STATS depend on CONFIG_SLUB_DEBUG and CONFIG_SYSFS. There
      is no point of keeping statistics if no one can restrive them.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      5b06c853
  25. 24 2月, 2008 1 次提交
  26. 15 2月, 2008 1 次提交
  27. 10 2月, 2008 1 次提交
  28. 09 2月, 2008 1 次提交
  29. 08 2月, 2008 1 次提交
    • C
      SLUB: Support for performance statistics · 8ff12cfc
      Christoph Lameter 提交于
      The statistics provided here allow the monitoring of allocator behavior but
      at the cost of some (minimal) loss of performance. Counters are placed in
      SLUB's per cpu data structure. The per cpu structure may be extended by the
      statistics to grow larger than one cacheline which will increase the cache
      footprint of SLUB.
      
      There is a compile option to enable/disable the inclusion of the runtime
      statistics and its off by default.
      
      The slabinfo tool is enhanced to support these statistics via two options:
      
      -D 	Switches the line of information displayed for a slab from size
      	mode to activity mode.
      
      -A	Sorts the slabs displayed by activity. This allows the display of
      	the slabs most important to the performance of a certain load.
      
      -r	Report option will report detailed statistics on
      
      Example (tbench load):
      
      slabinfo -AD		->Shows the most active slabs
      
      Name                   Objects    Alloc     Free   %Fast
      skbuff_fclone_cache         33 111953835 111953835  99  99
      :0000192                  2666  5283688  5281047  99  99
      :0001024                   849  5247230  5246389  83  83
      vm_area_struct            1349   119642   118355  91  22
      :0004096                    15    66753    66751  98  98
      :0000064                  2067    25297    23383  98  78
      dentry                   10259    28635    18464  91  45
      :0000080                 11004    18950     8089  98  98
      :0000096                  1703    12358    10784  99  98
      :0000128                   762    10582     9875  94  18
      :0000512                   184     9807     9647  95  81
      :0002048                   479     9669     9195  83  65
      anon_vma                   777     9461     9002  99  71
      kmalloc-8                 6492     9981     5624  99  97
      :0000768                   258     7174     6931  58  15
      
      So the skbuff_fclone_cache is of highest importance for the tbench load.
      Pretty high load on the 192 sized slab. Look for the aliases
      
      slabinfo -a | grep 000192
      :0000192     <- xfs_btree_cur filp kmalloc-192 uid_cache tw_sock_TCP
      	request_sock_TCPv6 tw_sock_TCPv6 skbuff_head_cache xfs_ili
      
      Likely skbuff_head_cache.
      
      
      Looking into the statistics of the skbuff_fclone_cache is possible through
      
      slabinfo skbuff_fclone_cache	->-r option implied if cache name is mentioned
      
      
      .... Usual output ...
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath             111953360 111946981  99  99
      Slowpath                 1044     7423   0   0
      Page Alloc                272      264   0   0
      Add partial                25      325   0   0
      Remove partial             86      264   0   0
      RemoteObj/SlabFrozen      350     4832   0   0
      Total                111954404 111954404
      
      Flushes       49 Refill        0
      Deactivate Full=325(92%) Empty=0(0%) ToHead=24(6%) ToTail=1(0%)
      
      Looks good because the fastpath is overwhelmingly taken.
      
      
      skbuff_head_cache:
      
      Slab Perf Counter       Alloc     Free %Al %Fr
      --------------------------------------------------
      Fastpath              5297262  5259882  99  99
      Slowpath                 4477    39586   0   0
      Page Alloc                937      824   0   0
      Add partial                 0     2515   0   0
      Remove partial           1691      824   0   0
      RemoteObj/SlabFrozen     2621     9684   0   0
      Total                 5301739  5299468
      
      Deactivate Full=2620(100%) Empty=0(0%) ToHead=0(0%) ToTail=0(0%)
      
      
      Descriptions of the output:
      
      Total:		The total number of allocation and frees that occurred for a
      		slab
      
      Fastpath:	The number of allocations/frees that used the fastpath.
      
      Slowpath:	Other allocations
      
      Page Alloc:	Number of calls to the page allocator as a result of slowpath
      		processing
      
      Add Partial:	Number of slabs added to the partial list through free or
      		alloc (occurs during cpuslab flushes)
      
      Remove Partial:	Number of slabs removed from the partial list as a result of
      		allocations retrieving a partial slab or by a free freeing
      		the last object of a slab.
      
      RemoteObj/Froz:	How many times were remotely freed object encountered when a
      		slab was about to be deactivated. Frozen: How many times was
      		free able to skip list processing because the slab was in use
      		as the cpuslab of another processor.
      
      Flushes:	Number of times the cpuslab was flushed on request
      		(kmem_cache_shrink, may result from races in __slab_alloc)
      
      Refill:		Number of times we were able to refill the cpuslab from
      		remotely freed objects for the same slab.
      
      Deactivate:	Statistics how slabs were deactivated. Shows how they were
      		put onto the partial list.
      
      In general fastpath is very good. Slowpath without partial list processing is
      also desirable. Any touching of partial list uses node specific locks which
      may potentially cause list lock contention.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      8ff12cfc
  30. 03 2月, 2008 2 次提交
  31. 02 2月, 2008 1 次提交