  1. 18 Sep 2009, 3 commits
    • rcu: Add debug checks to TREE_PREEMPT_RCU for premature grace periods · b0e165c0
      Committed by Paul E. McKenney
      Check that there are no blocked tasks left over from the previous
      grace period while initializing for the next grace period, and verify
      that rcu_preempt_qs() is given the correct CPU number and is never
      called for an offline CPU.
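
      A minimal sketch of the kind of checks described, written as
      illustrative WARN_ON_ONCE() assertions rather than the actual hunks
      from this patch (rnp, idx, and cpu are assumed local names):

      	/* At grace-period initialization: no tasks left over from the last GP. */
      	WARN_ON_ONCE(!list_empty(&rnp->blocked_tasks[idx]));

      	/* In rcu_preempt_qs(): right CPU, and never an offline one. */
      	WARN_ON_ONCE(cpu != smp_processor_id());
      	WARN_ON_ONCE(!cpu_online(cpu));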
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      LKML-Reference: <12528585111986-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Initialize multi-level RCU grace periods holding locks · b835db1f
      Committed by Paul E. McKenney
      Prior implementations initialized the root and any internal
      nodes without holding locks, then initialized the leaves
      holding locks.
      
      This is a false economy, as the leaf nodes will usually greatly
      outnumber the root and internal nodes.  Acquiring locks on all
      nodes is conceptually much simpler as well.
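
      A rough sketch of what grace-period initialization looks like when every
      level is walked under its node lock; the field and macro names follow
      the rcutree code of this era, but this is an illustration, not the
      literal patch:

      	/* Visit every rcu_node, internal nodes and leaves alike, under rnp->lock. */
      	for (rnp = &rsp->node[0]; rnp < &rsp->node[NUM_RCU_NODES]; rnp++) {
      		spin_lock_irqsave(&rnp->lock, flags);
      		rnp->qsmask = rnp->qsmaskinit;
      		rnp->gpnum = rsp->gpnum;
      		spin_unlock_irqrestore(&rnp->lock, flags);
      	}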
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      LKML-Reference: <12524504773190-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Need to update rnp->gpnum if preemptable RCU is to be reliable · de078d87
      Committed by Paul E. McKenney
      Without this patch, tasks preempted in RCU read-side critical
      sections can fail to block the grace period, given that
      rnp->gpnum is used to determine which rnp->blocked_tasks[]
      element the preempted task is enqueued on.
      
      Before the patch, rnp->gpnum is always zero, so preempted tasks
      are always enqueued on rnp->blocked_tasks[0], which is correct
      only when the current CPU has not checked into the current
      grace period and the grace-period number is even, or,
      similarly, if the current CPU -has- checked into the current
      grace period and the grace-period number is odd.
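
      A minimal sketch of the enqueue decision being described, with the index
      expression derived from the wording above; "checked_in" is an assumed
      placeholder for whether this CPU has already reported a quiescent state
      for rnp->gpnum:

      	/* Illustration only: choose the blocked_tasks[] list by GP parity. */
      	int idx = (rnp->gpnum + checked_in) & 0x1;
      	list_add(&t->rcu_node_entry, &rnp->blocked_tasks[idx]);

      With rnp->gpnum stuck at zero, idx no longer tracks the actual grace
      period, which is how preempted readers can land on the wrong list.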
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      LKML-Reference: <12524504771622-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 29 Aug 2009, 2 commits
  3. 26 Aug 2009, 1 commit
  4. 25 Aug 2009, 1 commit
    • rcu: Add CPU-offline processing for single-node configurations · 33f76148
      Committed by Paul E. McKenney
      Add a preemptable-RCU plugin to handle CPU-offline processing.
      
      An additional plugin is forthcoming to handle multinode RCU
      trees, but this current plugin works for configurations up to
      32 CPUs (64 CPUs for 64-bit kernels).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12511321213336-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 23 Aug 2009, 5 commits
    • rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Committed by Paul E. McKenney
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in the absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more importantly, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
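
      As a feel for the read-side cost claim, a hedged sketch of what the
      read-side entry can look like when the only state is a per-task nesting
      counter (illustrative, not necessarily the exact code in this patch):

      	void __rcu_read_lock(void)
      	{
      		/* Bump the per-task nesting count; no locks, no atomics. */
      		ACCESS_ONCE(current->rcu_read_lock_nesting)++;
      		barrier();
      	}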
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Simplify rcu_pending()/rcu_check_callbacks() API · a157229c
      Committed by Paul E. McKenney
      All calls from outside RCU are of the form:
      
      	if (rcu_pending(cpu))
      		rcu_check_callbacks(cpu, user);
      
      This is silly; instead, we put a call to rcu_pending() inside
      rcu_check_callbacks() and make the outside callers invoke
      rcu_check_callbacks() directly.  This cuts down on the code a bit
      and also gives the compiler a better chance of optimizing.
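
      A sketch of the resulting shape, assuming the tree-RCU convention of
      raising RCU_SOFTIRQ when there is work to do (illustrative, not a
      verbatim excerpt):

      	void rcu_check_callbacks(int cpu, int user)
      	{
      		/* ... note quiescent states as before ... */
      		if (rcu_pending(cpu))
      			raise_softirq(RCU_SOFTIRQ);
      	}

      	/* Callers, e.g. the scheduling-clock path, now simply do: */
      	rcu_check_callbacks(cpu, user);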
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <125097461311-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Merge per-RCU-flavor initialization into pre-existing macro · 65cf8f86
      Committed by Paul E. McKenney
      Rename the RCU_DATA_PTR_INIT() macro to RCU_INIT_FLAVOR() and
      make it do the rcu_init_one() and rcu_boot_init_percpu_data()
      calls.  Merge the loop that was in the original macro with the
      loops that were in __rcu_init().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746133916-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Renamings to increase RCU clarity · d6714c22
      Committed by Paul E. McKenney
      Make RCU-sched, RCU-bh, and RCU-preempt be underlying
      implementations, with "RCU" defined in terms of one of the
      three.  Update the outdated rcu_qsctr_inc() names, as these
      functions no longer increment anything.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746132696-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h · 9f77da9f
      Committed by Paul E. McKenney
      Some information hiding that makes it easier to merge
      preemptability into rcutree without descending into #include
      hell.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <1250974613373-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 16 Aug 2009, 2 commits
    • rcu: Simplify RCU CPU-hotplug notification · 2e597558
      Committed by Paul E. McKenney
      Use the new cpu_notifier() API to simplify RCU's CPU-hotplug
      notifiers, collapsing down to a single such notifier.
      
      This makes it trivial to provide the notifier-ordering
      guarantee that rcu_barrier() depends on.
      
      Also remove redundant open_softirq() calls from the Hierarchical
      RCU notifier.
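
      A hedged sketch of what the single consolidated notifier can look like
      when registered through cpu_notifier(); the handler body and the
      rcu_online_cpu()/rcu_offline_cpu() helper names are assumptions, not a
      verbatim excerpt:

      	static int __cpuinit rcu_cpu_notify(struct notifier_block *self,
      					    unsigned long action, void *hcpu)
      	{
      		long cpu = (long)hcpu;

      		switch (action) {
      		case CPU_UP_PREPARE:
      		case CPU_UP_PREPARE_FROZEN:
      			rcu_online_cpu(cpu);	/* assumed per-CPU setup helper */
      			break;
      		case CPU_DEAD:
      		case CPU_DEAD_FROZEN:
      			rcu_offline_cpu(cpu);	/* assumed per-CPU teardown helper */
      			break;
      		default:
      			break;
      		}
      		return NOTIFY_OK;
      	}

      	/* Registered exactly once, e.g. during RCU initialization: */
      	cpu_notifier(rcu_cpu_notify, 0);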
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: josht@linux.vnet.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: hugh.dickins@tiscali.co.uk
      Cc: benh@kernel.crashing.org
      LKML-Reference: <12503552312510-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Split hierarchical RCU initialization into boot-time and CPU-online pieces · 27569620
      Committed by Paul E. McKenney
      This patch divides the rcutree initialization into boot-time
      and hotplug-time components, so that the tree data structures
      are guaranteed to be fully linked at boot time regardless of
      what might happen in CPU hotplug operations.
      
      This makes RCU more resilient against CPU hotplug misbehavior
      (and vice versa), but more importantly, does a better job of
      compartmentalizing the code.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: josht@linux.vnet.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: hugh.dickins@tiscali.co.uk
      Cc: benh@kernel.crashing.org
      LKML-Reference: <1250355231152-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 02 Aug 2009, 1 commit
    • debug lockups: Improve lockup detection · c1dc0b9c
      Committed by Ingo Molnar
      When debugging a recent lockup bug I found various deficiencies
      in how our current lockup detection helpers work:
      
       - SysRq-L is not very efficient as it uses a workqueue, hence
         it cannot punch through hard lockups and cannot see through
         most soft lockups either.
      
       - The SysRq-L code depends on the NMI watchdog - which is off
         by default.
      
       - We don't print backtraces from the RCU code's built-in
         'RCU state machine is stuck' debug code. This debug
         code tends to be one of the first (and only) mechanisms
         that show that a lockup has occurred.
      
      This patch changes the code so that we:
      
       - Trigger the NMI backtrace code from SysRq-L instead of using
         a workqueue (which cannot punch through hard lockups)
      
       - Trigger print-all-CPU-backtraces from the RCU lockup detection
         code (see the sketch after this list)
      
      Also decouple the backtrace printing code from the NMI watchdog:
      
       - Don't use variable-size cpumasks (they might not be initialized
         and are a bit more fragile anyway)
      
       - Trigger an NMI immediately via an IPI, instead of waiting
         for the NMI tick to occur. This is a lot faster and can
         produce more relevant backtraces. It will also work if the
         NMI watchdog is disabled.
      
       - Don't print the 'dazed and confused' message when we print
         a backtrace from the NMI
      
       - Do a show_regs() plus a dump_stack() to get maximum info
         out of the dump. Worst-case we get two stacktraces - which
         is not a big deal. Sometimes, if register content is
         corrupted, the precise stack walker in show_regs() won't
         give us a full backtrace - in this case dump_stack() will
         do it.
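
      A minimal sketch of the two trigger points described above, assuming
      trigger_all_cpu_backtrace() as the NMI-IPI entry point (illustrative
      placement, not a verbatim diff):

      	/* SysRq-L handler: NMI IPIs punch through hard lockups, no workqueue. */
      	static void sysrq_handle_showallcpus(int key, struct tty_struct *tty)
      	{
      		trigger_all_cpu_backtrace();
      	}

      	/* RCU stall-detection path: also dump every CPU's stack. */
      	trigger_all_cpu_backtrace();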
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 24 Jun 2009, 1 commit
    • rcu: Mark Hierarchical RCU no longer experimental · f6faac71
      Committed by Paul E. McKenney
      Removes the warnings about Hierarchical RCU being experimental,
      given that it has gone through almost six months of being the
      default RCU in mainline for the x86 with very little trouble.
      
      This makes hierarchical-RCU bootup look less scary.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: akpm@linux-foundation.org
      Cc: niv@us.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: dipankar@in.ibm.com
      Cc: dhowells@redhat.com
      Cc: lethal@linux-sh.org
      Cc: kernel@wantstofly.org
      Cc: cl@linux-foundation.org
      Cc: schamp@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 14 Apr 2009, 2 commits
    • rcu: Add __rcu_pending tracing to hierarchical RCU · 7ba5c840
      Committed by Paul E. McKenney
      Add tracing to __rcu_pending() to provide information on why RCU
      processing was kicked off.  This is helpful for debugging hierarchical
      RCU, and might also be helpful in learning how hierarchical RCU operates.
      Located-by: Anton Blanchard <anton@au1.ibm.com>
      Tested-by: Anton Blanchard <anton@au1.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: anton@samba.org
      Cc: akpm@linux-foundation.org
      Cc: dipankar@in.ibm.com
      Cc: manfred@colorfullife.com
      Cc: cl@linux-foundation.org
      Cc: josht@linux.vnet.ibm.com
      Cc: schamp@sgi.com
      Cc: niv@us.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: ego@in.ibm.com
      Cc: laijs@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: peterz@infradead.org
      Cc: penberg@cs.helsinki.fi
      Cc: andi@firstfloor.org
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <1239683479943-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Make hierarchical RCU less IPI-happy · ef631b0c
      Committed by Paul E. McKenney
      This patch fixes a hierarchical-RCU performance bug located by Anton
      Blanchard.  The problem stems from a misguided attempt to provide a
      work-around for jiffies-counter failure.  This work-around uses a per-CPU
      n_rcu_pending counter, which is incremented on each call to rcu_pending(),
      which in turn is called from each scheduling-clock interrupt.  Each CPU
      then treats this counter as a surrogate for the jiffies counter, so
      that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
      counter will cause RCU to invoke force_quiescent_state(), which in turn
      will (among other things) send resched IPIs to CPUs that have thus far
      failed to pass through an RCU quiescent state.
      
      Unfortunately, each CPU resets only its own counter after sending a
      batch of IPIs.  This means that the other CPUs will also (needlessly)
      send -another- round of IPIs, for a full N-squared set of IPIs in the
      worst case every three scheduler-clock ticks until the grace period
      finally ends.  It is not reasonable for a given CPU to reset each and
      every n_rcu_pending for all the other CPUs, so this patch instead simply
      disables the jiffies-counter "training wheels", thus eliminating the
      excessive IPIs.
      
      Note that the jiffies-counter IPIs do not have this problem due to
      the fact that the jiffies counter is global, so that the CPU sending
      the IPIs can easily reset things, thus preventing the other CPUs from
      sending redundant IPIs.
      
      Note also that the n_rcu_pending counter remains, as it will continue to
      be used for tracing.  It may also be used to update the jiffies counter,
      should an appropriate kick-the-jiffies-counter API appear.
      Located-by: Anton Blanchard <anton@au1.ibm.com>
      Tested-by: Anton Blanchard <anton@au1.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: anton@samba.org
      Cc: akpm@linux-foundation.org
      Cc: dipankar@in.ibm.com
      Cc: manfred@colorfullife.com
      Cc: cl@linux-foundation.org
      Cc: josht@linux.vnet.ibm.com
      Cc: schamp@sgi.com
      Cc: niv@us.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: ego@in.ibm.com
      Cc: laijs@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: peterz@infradead.org
      Cc: penberg@cs.helsinki.fi
      Cc: andi@firstfloor.org
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <12396834793575-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 03 Apr 2009, 2 commits
    • kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies · 6258c4fb
      Committed by Ingo Molnar
      Impact: cleanup
      
      We want to remove rcutree internals from the public rcutree.h file for
      upcoming kmemtrace changes - but kernel/rcutree_trace.c depends on them.
      
      Introduce kernel/rcutree.h for internal definitions. (Probably all
      the other data types from include/linux/rcutree.h could be
      moved here too - except rcu_data.)
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: paulmck@linux.vnet.ibm.com
      LKML-Reference: <1237898630.25315.83.camel@penberg-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies · b1f77b05
      Committed by Ingo Molnar
      Impact: build fix for all non-x86 architectures
      
      We want to remove percpu.h from rcuclassic.h/rcutree.h (for upcoming
      kmemtrace changes) but that would break the DECLARE_PER_CPU based
      declarations in these files.
      
      Move the quiescent counter management functions to their respective
      RCU implementation .c files - they were slightly above the inlining
      limit anyway.
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: paulmck@linux.vnet.ibm.com
      LKML-Reference: <1237898630.25315.83.camel@penberg-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 26 Feb 2009, 1 commit
    • rcu: Teach RCU that idle task is not quiescent state at boot · a6826048
      Committed by Paul E. McKenney
      This patch fixes a bug located by Vegard Nossum with the aid of
      kmemcheck, updated based on review comments from Nick Piggin,
      Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
      and function-name language.  ;-)
      
      The boot CPU runs in the context of its idle thread during boot-up.
      During this time, idle_cpu(0) will always return nonzero, which will
      fool Classic and Hierarchical RCU into deciding that a large chunk of
      the boot-up sequence is a big long quiescent state.  This in turn causes
      RCU to prematurely end grace periods during this time.
      
      This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
      function to ignore the idle task as a quiescent state until the
      system has started up the scheduler in rest_init(), introducing a
      new non-API function rcu_idle_now_means_idle() to inform RCU of this
      transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
      to track this state, which is then used by rcu_check_callback() to
      determine if it should believe idle_cpu().
      
      Because this patch has the effect of disallowing RCU grace periods
      during long stretches of the boot-up sequence, this patch also introduces
      Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
      no-op if num_online_cpus() returns 1.  This allows boot-time code that
      calls synchronize_rcu() to proceed normally.  Note, however, that RCU
      callbacks registered by call_rcu() will likely queue up until later in
      the boot sequence.  Although rcuclassic and rcutree can also use this
      same optimization after boot completes, rcupreempt must restrict its
      use of this optimization to the portion of the boot sequence before the
      scheduler starts up, given that an rcupreempt RCU read-side critical
      section may be preempted.
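
      A hedged sketch of the UP-only shortcut described above, using the
      rcu_blocking_is_gp() helper named later in this changelog (illustrative,
      not a verbatim excerpt):

      	static inline int rcu_blocking_is_gp(void)
      	{
      		/* With but a single online CPU, blocking is itself a GP. */
      		return num_online_cpus() == 1;
      	}

      	void synchronize_rcu(void)
      	{
      		if (rcu_blocking_is_gp())
      			return;	/* single CPU: classic/tree readers cannot be running */
      		/* ... otherwise wait for a real grace period ... */
      	}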
      
      In addition, this patch takes Nick Piggin's suggestion to make the
      system_state global variable be __read_mostly.
      
      Changes since v4:
      
      o	Changes the name of the introduced function and variable to
      	be less emotional.  ;-)
      
      Changes since v3:
      
      o	WARN_ON(nr_context_switches() > 0) to verify that RCU
      	switches out of boot-time mode before the first context
      	switch, as suggested by Nick Piggin.
      
      Changes since v2:
      
      o	Created rcu_blocking_is_gp() internal-to-RCU API that
      	determines whether a call to synchronize_rcu() is itself
      	a grace period.
      
      o	The definition of rcu_blocking_is_gp() for rcuclassic and
      	rcutree checks to see if but a single CPU is online.
      
      o	The definition of rcu_blocking_is_gp() for rcupreempt
      	checks to see both if but a single CPU is online and if
      	the system is still in early boot.
      
      	This allows rcupreempt to again work correctly if running
      	on a single CPU after booting is complete.
      
      o	Added check to rcupreempt's synchronize_sched() for there
      	being but one online CPU.
      
      Tested all three variants both SMP and !SMP, booted fine, passed a short
      rcutorture test on both x86 and Power.
      Located-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 14 Jan 2009, 1 commit
  13. 05 Jan 2009, 2 commits
  14. 19 Dec 2008, 1 commit
    • "Tree RCU": scalable classic RCU implementation · 64db4cff
      Committed by Paul E. McKenney
      This patch fixes a long-standing performance bug in classic RCU that
      results in massive internal-to-RCU lock contention on systems with
      more than a few hundred CPUs.  Although this patch creates a separate
      flavor of RCU for ease of review and patch maintenance, it is intended
      to replace classic RCU.
      
      This patch still handles stress better than does mainline, so I am still
      calling it ready for inclusion.  This patch is against the -tip tree.
      Nevertheless, experience on an actual 1000+ CPU machine would still be
      most welcome.
      
      Most of the changes noted below were found while creating an rcutiny
      (which should permit ejecting the current rcuclassic) and while doing
      detailed line-by-line documentation.
      
      Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
      
      o	Fixes from remainder of line-by-line code walkthrough,
      	including comment spelling, initialization, undesirable
      	narrowing due to type conversion, removing redundant memory
      	barriers, removing redundant local-variable initialization,
      	and removing redundant local variables.
      
      	I do not believe that any of these fixes address the CPU-hotplug
      	issues that Andi Kleen was seeing, but please do give it a whirl
      	in case the machine is smarter than I am.
      
      	A writeup from the walkthrough may be found at the following
      	URL, in case you are suffering from terminal insomnia or
      	masochism:
      
      	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
      
      o	Made rcutree tracing use seq_file, as suggested some time
      	ago by Lai Jiangshan.
      
      o	Added a .csv variant of the rcudata debugfs trace file, to allow
      	people having thousands of CPUs to drop the data into
      	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
      	documentation to suit.
      
      Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
      
      o	Fix a theoretical race between grace-period initialization and
      	force_quiescent_state() that could occur if more than three
      	jiffies were required to carry out the grace-period
      	initialization.  Which it might, if you had enough CPUs.
      
      o	Apply Ingo's printk-standardization patch.
      
      o	Substitute local variables for repeated accesses to global
      	variables.
      
      o	Fix comment misspellings and redundant (but harmless) increments
      	of ->n_rcu_pending (this latter after having explicitly added it).
      
      o	Apply checkpatch fixes.
      
      Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
      
      o	Fixed a number of problems noted by Gautham Shenoy, including
      	the cpu-stall-detection bug that he was having difficulty
      	convincing me was real.  ;-)
      
      o	Changed cpu-stall detection to wait for ten seconds rather than
      	three in order to reduce false positives, as suggested by Ingo
      	Molnar.
      
      o	Produced a design document (http://lwn.net/Articles/305782/).
      	The act of writing this document uncovered a number of both
      	theoretical and "here and now" bugs as noted below.
      
      o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
      	condition, fix kerneldoc comments, and add memory barriers
      	in dynticks interface functions.
      
      o	Add more data to tracing.
      
      o	Remove unused "rcu_barrier" field from rcu_data structure.
      
      o	Count calls to rcu_pending() from scheduling-clock interrupt
      	to use as a surrogate timebase should jiffies stop counting.
      
      o	Fix a theoretical race between force_quiescent_state() and
      	grace-period initialization.  Yes, initialization does have to
      	go on for some jiffies for this race to occur, but given enough
      	CPUs...
      
      Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
      
      o	Fix a number of checkpatch.pl complaints.
      
      o	Apply review comments from Ingo Molnar and Lai Jiangshan
      	on the stall-detection code.
      
      o	Fix several bugs in !CONFIG_SMP builds.
      
      o	Fix a misspelled config-parameter name so that RCU now announces
      	at boot time if stall detection is configured.
      
      o	Run tests on numerous combinations of configuration parameters,
      	which, after the fixes above, now build and run correctly.
      
      Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
      
      o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
      	changeset some time ago, and finally got around to retesting
      	this option).
      
      o	Fix some tracing bugs in rcupreempt that caused incorrect
      	totals to be printed.
      
      o	I now test with a more brutal random-selection online/offline
      	script (attached).  Probably more brutal than it needs to be
      	on the people reading it as well, but so it goes.
      
      o	A number of optimizations and usability improvements:
      
      	o	Make rcu_pending() ignore the grace-period timeout when
      		there is no grace period in progress.
      
      	o	Make force_quiescent_state() avoid going for a global
      		lock in the case where there is no grace period in
      		progress.
      
      	o	Rearrange struct fields to improve struct layout.
      
      	o	Make call_rcu() initiate a grace period if RCU was
      		idle, rather than waiting for the next scheduling
      		clock interrupt.
      
      	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
      		idle, as suggested by Andi Kleen.  I still don't
      		completely trust this change, and might back it out.
      
      	o	Make CONFIG_RCU_TRACE be the single config variable
      		manipulated for all forms of RCU, instead of the prior
      		confusion.
      
      	o	Document tracing files and formats for both rcupreempt
      		and rcutree.
      
      Updates from v4 for those missing v5 given its bad subject line:
      
      o	Separated dynticks interface so that NMIs and irqs call separate
      	functions, greatly simplifying it.  In particular, this code
      	no longer requires a proof of correctness.  ;-)
      
      o	Separated dynticks state out into its own per-CPU structure,
      	avoiding the duplicated accounting.
      
      o	The case where a dynticks-idle CPU runs an irq handler that
      	invokes call_rcu() is now correctly handled, forcing that CPU
      	out of dynticks-idle mode.
      
      o	Review comments have been applied (thank you all!!!).
      	For but one example, fixed the dynticks-ordering issue that
      	Manfred pointed out, saving me much debugging.  ;-)
      
      o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.
      
      Attached is an updated patch to Classic RCU that applies a hierarchy,
      greatly reducing the contention on the top-level lock for large machines.
      This passes 10-hour concurrent rcutorture and online-offline testing on
      128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
      bugs in the presence of dynticks (exciting to work on a system where
      "sleep 1" hangs until interrupted...), which were fixed in the
      2.6.27 kernel.  It is getting more reliable than mainline by some
      measures, so the next version will be against -tip for inclusion.
      See also Manfred Spraul's recent patches (or his earlier work from
      2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
      We will converge onto a common patch in the fullness of time, but are
      currently exploring different regions of the design space.  That said,
      I have already gratefully stolen quite a few of Manfred's ideas.
      
      This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
      of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
      64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
      there is no hierarchy.  By default, the RCU initialization code will
      adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
      architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
      this balancing, allowing the hierarchy to be exactly aligned to the
      underlying hardware.  Up to two levels of hierarchy are permitted
      (in addition to the root node), allowing up to 16,384 CPUs on 32-bit
      systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
      am going to regret saying this, but this seems more than sufficient
      for the foreseeable future.  (Some architectures might wish to set
      CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
      If this becomes a real problem, additional levels can be added, but I
      doubt that it will make a significant difference on real hardware.)
      
      In the common case, a given CPU will manipulate its private rcu_data
      structure and the rcu_node structure that it shares with its immediate
      neighbors.  This can reduce both lock and memory contention by multiple
      orders of magnitude, which should eliminate the need for the strange
      manipulations that are reported to be required when running Linux on
      very large systems.
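
      A hedged structural sketch of that relationship; these field names follow
      the rcutree code of this era and are illustrative rather than a verbatim
      excerpt:

      	struct rcu_node {
      		spinlock_t	lock;
      		unsigned long	qsmask;		/* CPUs/groups still owing a quiescent state */
      		unsigned long	qsmaskinit;	/* value loaded at grace-period start */
      		struct rcu_node	*parent;	/* NULL at the root */
      	};

      	struct rcu_data {
      		struct rcu_node	*mynode;	/* leaf rcu_node this CPU reports to */
      		unsigned long	grpmask;	/* this CPU's bit in mynode->qsmask */
      	};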
      
      Some shortcomings:
      
      o	More bugs will probably surface as a result of an ongoing
      	line-by-line code inspection.
      
      	Patches will be provided as required.
      
      o	There are probably hangs, rcutorture failures, &c.  Seems
      	quite stable on a 128-CPU machine, but that is kind of small
      	compared to 4096 CPUs.  However, seems to do better than
      	mainline.
      
      	Patches will be provided as required.
      
      o	The memory footprint of this version is several KB larger
      	than rcuclassic.
      
      	A separate UP-only rcutiny patch will be provided, which will
      	reduce the memory footprint significantly, even compared
      	to the old rcuclassic.  One such patch passes light testing,
      	and has a memory footprint smaller even than rcuclassic.
      	Initial reaction from various embedded guys was "it is not
      	worth it", so am putting it aside.
      
      Credits:
      
      o	Manfred Spraul for ideas, review comments, and bugs spotted,
      	as well as some good friendly competition.  ;-)
      
      o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
      	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
      	for reviews and comments.
      
      o	Thomas Gleixner for much-needed help with some timer issues
      	(see patches below).
      
      o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
      	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
      	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
      	alive despite my heavy abuse^Wtesting.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>