1. 25 2月, 2010 2 次提交
    • P
      rcu: Convert to raw_spinlocks · 1304afb2
      Paul E. McKenney 提交于
      The spinlocks in rcutree need to be real spinlocks in
      preempt-rt. Convert them to raw_spinlocks.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-18-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1304afb2
    • P
      rcu: Stop overflowing signed integers · 20133cfc
      Paul E. McKenney 提交于
      The C standard does not specify the result of an operation that
      overflows a signed integer, so such operations need to be
      avoided.  This patch changes the type of several fields from
      "long" to "unsigned long" and adjusts operations as needed.
      ULONG_CMP_GE() and ULONG_CMP_LT() macros are introduced to do
      the modular comparisons that are appropriate given that overflow
      is an expected event.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-17-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      20133cfc
  2. 16 1月, 2010 1 次提交
    • P
      rcu: Fix sparse warnings · 017c4261
      Paul E. McKenney 提交于
      Rename local variable "i" in rcu_init() to avoid conflict with
      RCU_INIT_FLAVOR(), restrict the scope of RCU_TREE_NONCORE, and
      make __synchronize_srcu() static.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12635142581560-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      017c4261
  3. 13 1月, 2010 4 次提交
    • P
      rcu: Make force_quiescent_state() start grace period if needed · 46a1e34e
      Paul E. McKenney 提交于
      Grace periods cannot be started while force_quiescent_state() is
      active.  This is OK in that the affected CPUs will try again
      later, but it does induce needless grace-period delays.  This
      patch causes rcu_start_gp() to record a failed attempt to start
      a grace period. When force_quiescent_state() prepares to return,
      it then starts the grace period if there was such a failed
      attempt.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501854-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      46a1e34e
    • P
      rcu: Remove leg of force_quiescent_state() switch statement · ee47eb9f
      Paul E. McKenney 提交于
      The comparisons of rsp->gpnum nad rsp->completed in
      rcu_process_dyntick() and force_quiescent_state() can be
      replaced by the much more clear rcu_gp_in_progress() predicate
      function.  After doing this, it becomes clear that the
      RCU_SAVE_COMPLETED leg of the force_quiescent_state() function's
      switch statement is almost completely a no-op.  A small change
      to the RCU_SAVE_DYNTICK leg renders it a complete no-op, after
      which it can be removed.  Doing so also eliminates the forcenow
      local variable from force_quiescent_state().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501781-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ee47eb9f
    • P
      rcu: Eliminate local variable lastcomp from force_quiescent_state() · 39c0bbfc
      Paul E. McKenney 提交于
      Because rsp->fqs_active is set to 1 across
      force_quiescent_state()'s switch statement, rcu_start_gp() will
      refrain from starting a new grace period during this time.
      Therefore, rsp->gpnum is constant, and can be propagated to all
      uses of lastcomp, eliminating this local variable.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465502985-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      39c0bbfc
    • P
      rcu: Prohibit starting new grace periods while forcing quiescent states · 07079d53
      Paul E. McKenney 提交于
      Reduce the number and variety of race conditions by prohibiting
      the start of a new grace period while force_quiescent_state() is
      active. A new fqs_active flag in the rcu_state structure is used
      to trace whether or not force_quiescent_state() is active, and
      this new flag is tested by rcu_start_gp().  If the CPU that
      closed out the last grace period needs another grace period,
      this new grace period may be delayed up to one scheduling-clock
      tick, but it will eventually get started.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <126264655052-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      07079d53
  4. 03 12月, 2009 3 次提交
    • P
      rcu: Add expedited grace-period support for preemptible RCU · d9a3da06
      Paul E. McKenney 提交于
      Implement an synchronize_rcu_expedited() for preemptible RCU
      that actually is expedited.  This uses
      synchronize_sched_expedited() to force all threads currently
      running in a preemptible-RCU read-side critical section onto the
      appropriate ->blocked_tasks[] list, then takes a snapshot of all
      of these lists and waits for them to drain.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1259784616158-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9a3da06
    • P
      rcu: Enable fourth level of TREE_RCU hierarchy · cf244dc0
      Paul E. McKenney 提交于
      Enable a fourth level of rcu_node hierarchy for TREE_RCU and
      TREE_PREEMPT_RCU.  This is for stress-testing and experiemental
      purposes only, although in theory this would enable 16,777,216
      CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
      systems. Normal experimental use of this fourth level will
      normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
      though the more adventurous (and more fortunate) experimenters
      may wish to chose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
      CONFIG_RCU_FANOUT=4 for 256-CPU systems.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846161257-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cf244dc0
    • P
      rcu: Rename "quiet" functions · d3f6bad3
      Paul E. McKenney 提交于
      The number of "quiet" functions has grown recently, and the
      names are no longer very descriptive.  The point of all of these
      functions is to do some portion of the task of reporting a
      quiescent state, so rename them accordingly:
      
      o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
      	quiescent state to the per-CPU rcu_data structure.  If this
      	turns out to be a new quiescent state for this grace period,
      	then rcu_report_qs_rnp() will be invoked to propagate the
      	quiescent state up the rcu_node hierarchy.
      
      o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
      	a quiescent state for a given CPU (or possibly a set of CPUs)
      	up the rcu_node hierarchy.
      
      o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
      	reports a full set of quiescent states to the global rcu_state
      	structure.
      
      o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
      	a quiescent state due to a task exiting an RCU read-side critical
      	section that had previously blocked in that same critical section.
      	As indicated by the new name, this type of quiescent state is
      	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
      	to do so).
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846163698-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d3f6bad3
  5. 23 11月, 2009 1 次提交
    • P
      rcu: Fix grace-period-stall bug on large systems with CPU hotplug · b668c9cf
      Paul E. McKenney 提交于
      When the last CPU of a given leaf rcu_node structure goes
      offline, all of the tasks queued on that leaf rcu_node structure
      (due to having blocked in their current RCU read-side critical
      sections) are requeued onto the root rcu_node structure.  This
      requeuing is carried out by rcu_preempt_offline_tasks().
      However, it is possible that these queued tasks are the only
      thing preventing the leaf rcu_node structure from reporting a
      quiescent state up the rcu_node hierarchy.  Unfortunately, the
      old code would fail to do this reporting, resulting in a
      grace-period stall given the following sequence of events:
      
      1.	Kernel built for more than 32 CPUs on 32-bit systems or for more
      	than 64 CPUs on 64-bit systems, so that there is more than one
      	rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
      	to a number smaller than CONFIG_NR_CPUS.)
      
      2.	The kernel is built with CONFIG_TREE_PREEMPT_RCU.
      
      3.	A task running on a CPU associated with a given leaf rcu_node
      	structure blocks while in an RCU read-side critical section
      	-and- that CPU has not yet passed through a quiescent state
      	for the current RCU grace period.  This will cause the task
      	to be queued on the leaf rcu_node's blocked_tasks[] array, in
      	particular, on the element of this array corresponding to the
      	current grace period.
      
      4.	Each of the remaining CPUs corresponding to this same leaf rcu_node
      	structure pass through a quiescent state.  However, the task is
      	still in its RCU read-side critical section, so these quiescent
      	states cannot be reported further up the rcu_node hierarchy.
      	Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
      	field are now zero.
      
      5.	Each of the remaining CPUs go offline.  (The events in step
      	#4 and #5 can happen in any order as long as each CPU passes
      	through a quiescent state before going offline.)
      
      6.	When the last CPU goes offline, __rcu_offline_cpu() will invoke
      	rcu_preempt_offline_tasks(), which will move the task to the
      	root rcu_node structure, but without reporting a quiescent state
      	up the rcu_node hierarchy (and this failure to report a quiescent
      	state is the bug).
      
      	But because this leaf rcu_node structure's ->qsmask field is
      	already zero and its ->block_tasks[] entries are all empty,
      	force_quiescent_state() will skip this rcu_node structure.
      
      	Therefore, grace periods are now hung.
      
      This patch abstracts some code out of rcu_read_unlock_special(),
      calling the result task_quiet() by analogy with cpu_quiet(), and
      invokes task_quiet() from both rcu_read_lock_special() and
      __rcu_offline_cpu().  Invoking task_quiet() from
      __rcu_offline_cpu() reports the quiescent state up the rcu_node
      hierarchy, fixing the bug.  This ends up requiring a separate
      lock_class_key per level of the rcu_node hierarchy, which this
      patch also provides.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088301770-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b668c9cf
  6. 11 11月, 2009 2 次提交
    • P
      rcu: Rename dynticks_completed to completed_fqs · 4bcfe055
      Paul E. McKenney 提交于
      This field is used whether or not CONFIG_NO_HZ is set, so the
      old name of ->dynticks_completed is quite misleading.
      
      Change to ->completed_fqs, given that it the value that
      force_quiescent_state() is trying to drive the ->completed field
      away from.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890423298-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4bcfe055
    • P
      rcu: Remove inline from forward-referenced functions · dbe01350
      Paul E. McKenney 提交于
      Some variants of gcc are reputed to dislike forward references
      to functions declared "inline".  Remove the "inline" keyword
      from such functions.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890422402-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dbe01350
  7. 10 11月, 2009 2 次提交
    • P
      rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter · d09b62df
      Paul E. McKenney 提交于
      Impose a clear locking design on the rcu_process_gp_end()
      function's use of the ->completed counter.  This is done by
      creating a ->completed field in the rcu_node structure, which
      can safely be accessed under the protection of that structure's
      lock.  Performance and scalability are maintained by using a
      form of double-checked locking, so that rcu_process_gp_end()
      only acquires the leaf rcu_node structure's ->lock if a grace
      period has recently ended.
      
      This fix reduces rcutorture failure rate by at least two orders
      of magnitude under heavy stress with force_quiescent_state()
      being invoked artificially often.  Without this fix,
      unsynchronized access to the ->completed field can cause
      rcu_process_gp_end() to advance callbacks whose grace period has
      not yet expired.  (Bad idea!)
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987494069-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d09b62df
    • P
      rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter · 281d150c
      Paul E. McKenney 提交于
      Impose a clear locking design on non-NO_HZ handling of the
      ->completed counter.  This increases the distance between the
      RCU and the CPU-hotplug mechanisms.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987491353-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      281d150c
  8. 02 11月, 2009 1 次提交
    • P
      rcu: Fix long-grace-period race between forcing and initialization · 83f5b01f
      Paul E. McKenney 提交于
      Very long RCU read-side critical sections (50 milliseconds or
      so) can cause a race between force_quiescent_state() and
      rcu_start_gp() as follows on kernel builds with multi-level
      rcu_node hierarchies:
      
      1.	CPU 0 calls force_quiescent_state(), sees that there is a
      	grace period in progress, and acquires ->fsqlock.
      
      2.	CPU 1 detects the end of the grace period, and so
      	cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
      	This operation is carried out under the root rnp->lock,
      	but CPU 0 has not yet acquired that lock.  Note that
      	rsp->signaled is still RCU_SAVE_DYNTICK from the last
      	grace period.
      
      3.	CPU 1 calls rcu_start_gp(), but no one wants a new grace
      	period, so it drops the root rnp->lock and returns.
      
      4.	CPU 0 acquires the root rnp->lock and picks up rsp->completed
      	and rsp->signaled, then drops rnp->lock.  It then enters the
      	RCU_SAVE_DYNTICK leg of the switch statement.
      
      5.	CPU 2 invokes call_rcu(), and now needs a new grace period.
      	It calls rcu_start_gp(), which acquires the root rnp->lock, sets
      	rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
      	the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
      	initializing the rcu_node hierarchy.  If there are multiple
      	levels to the hierarchy, it will drop the root rnp->lock and
      	initialize the lower levels of the hierarchy.
      
      6.	CPU 0 notes that rsp->completed has not changed, which permits
              both CPU 2 and CPU 0 to try updating it concurrently.  If CPU 0's
      	update prevails, later calls to force_quiescent_state() can
      	count old quiescent states against the new grace period, which
      	can in turn result in premature ending of grace periods.
      
      	Not good.
      
      This patch adds an RCU_GP_IDLE state for rsp->signaled that is
      set initially at boot time and any time a grace period ends.
      This prevents CPU 0 from getting into the workings of
      force_quiescent_state() in step 4.  Additional locking and
      checks prevent the concurrent update of rsp->signaled in step 6.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1256742889199-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83f5b01f
  9. 16 10月, 2009 1 次提交
    • P
      rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5
      Paul E. McKenney 提交于
      If the following sequence of events occurs, then
      TREE_PREEMPT_RCU will hang waiting for a grace period to
      complete, eventually OOMing the system:
      
      o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
      	with more than 64 physical CPUs present (32 on a 32-bit system).
      	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
      	with RCU_FANOUT set to a sufficiently small value that the
      	physical CPUs populate two or more leaf rcu_node structures.
      
      o	A task is preempted in an RCU read-side critical section
      	while running on a CPU corresponding to a given leaf rcu_node
      	structure.
      
      o	All CPUs corresponding to this same leaf rcu_node structure
      	record quiescent states for the current grace period.
      
      o	All of these same CPUs go offline (hence the need for enough
      	physical CPUs to populate more than one leaf rcu_node structure).
      	This causes the preempted task to be moved to the root rcu_node
      	structure.
      
      At this point, there is nothing left to cause the quiescent
      state to be propagated up the rcu_node tree, so the current
      grace period never completes.
      
      The simplest fix, especially after considering the deadlock
      possibilities, is to detect this situation when the last CPU is
      offlined, and to set that CPU's ->qsmask bit in its leaf
      rcu_node structure.  This will cause the next invocation of
      force_quiescent_state() to end the grace period.
      
      Without this fix, this hang can be triggered in an hour or so on
      some machines with rcutorture and random CPU onlining/offlining.
      With this fix, these same machines pass a full 10 hours of this
      sort of abuse.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      237c80c5
  10. 15 10月, 2009 1 次提交
    • P
      rcu: Prevent RCU IPI storms in presence of high call_rcu() load · 37c72e56
      Paul E. McKenney 提交于
      As the number of callbacks on a given CPU rises, invoke
      force_quiescent_state() only every blimit number of callbacks
      (defaults to 10,000), and even then only if no other CPU has
      invoked force_quiescent_state() in the meantime.
      
      This should fix the performance regression reported by Nick.
      Reported-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592133-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      37c72e56
  11. 07 10月, 2009 1 次提交
    • P
      rcu: Make hot-unplugged CPU relinquish its own RCU callbacks · e74f4c45
      Paul E. McKenney 提交于
      The current interaction between RCU and CPU hotplug requires that
      RCU block in CPU notifiers waiting for callbacks to drain.
      
      This can be greatly simplified by having each CPU relinquish its
      own callbacks, and for both _rcu_barrier() and CPU_DEAD notifiers
      to adopt all callbacks that were previously relinquished.
      
      This change also eliminates the possibility of certain types of
      hangs due to the previous practice of waiting for callbacks to be
      invoked from within CPU notifiers.  If you don't every wait, you
      cannot hang.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1254890898456-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e74f4c45
  12. 06 10月, 2009 1 次提交
    • P
      rcu: Clean up code based on review feedback from Josh Triplett, part 4 · a0b6c9a7
      Paul E. McKenney 提交于
      These issues identified during an old-fashioned face-to-face code
      review extending over many hours.  This group improves an existing
      abstraction and introduces two new ones.  It also fixes an RCU
      stall-warning bug found while making the other changes.
      
      o	Make RCU_INIT_FLAVOR() declare its own variables, removing
      	the need to declare them at each call site.
      
      o	Create an rcu_for_each_leaf() macro that scans the leaf
      	nodes of the rcu_node tree.
      
      o	Create an rcu_for_each_node_breadth_first() macro that does
      	a breadth-first traversal of the rcu_node tree, AKA
      	stepping through the array in index-number order.
      
      o	If all CPUs corresponding to a given leaf rcu_node
      	structure go offline, then any tasks queued on that leaf
      	will be moved to the root rcu_node structure.  Therefore,
      	the stall-warning code must dump out tasks queued on the
      	root rcu_node structure as well as those queued on the leaf
      	rcu_node structures.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12541491934126-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a0b6c9a7
  13. 24 9月, 2009 3 次提交
    • P
      rcu: Clean up code to address Ingo's checkpatch feedback · 9b2619af
      Paul E. McKenney 提交于
      Move declarations and update storage classes to make checkpatch happy.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12537246441701-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9b2619af
    • P
      rcu: Clean up code based on review feedback from Josh Triplett, part 2 · 1eba8f84
      Paul E. McKenney 提交于
      These issues identified during an old-fashioned face-to-face code
      review extending over many hours.
      
      o	Add comments for tricky parts of code, and correct comments
      	that have passed their sell-by date.
      
      o	Get rid of the vestiges of rcu_init_sched(), which is no
      	longer needed now that PREEMPT_RCU is gone.
      
      o	Move the #include of rcutree_plugin.h to the end of
      	rcutree.c, which means that, rather than having a random
      	collection of forward declarations, the new set of forward
      	declarations document the set of plugins.  The new home for
      	this #include also allows __rcu_init_preempt() to move into
      	rcutree_plugin.h.
      
      o	Fix rcu_preempt_check_callbacks() to be static.
      Suggested-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12537246443924-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Peter Zijlstra <peterz@infradead.org>
      1eba8f84
    • P
      rcu: Clean up code based on review feedback from Josh Triplett · fc2219d4
      Paul E. McKenney 提交于
      These issues identified during an old-fashioned face-to-face code
      review extended over many hours.
      
      o	Bury various forms of the "rsp->completed == rsp->gpnum"
      	comparison into an rcu_gp_in_progress() function, which has
      	the beneficial side-effect of forcing consistent use of
      	ACCESS_ONCE().
      
      o	Replace hand-coded arithmetic with DIV_ROUND_UP().
      
      o	Bury several "!list_empty(&rnp->blocked_tasks[rnp->gpnum & 0x01])"
      	instances into an rcu_preempted_readers() function, as this
      	expression indicates that there are no readers blocked
      	within RCU read-side critical sections blocking the current
      	grace period.  (Though there might well be similar readers
      	blocking the next grace period.)
      
      o	Remove a dangling rcu_restart_cpu() declaration that has
      	been dangling for almost 20 minor releases of the kernel.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12537246442687-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fc2219d4
  14. 19 9月, 2009 1 次提交
    • P
      rcu: Fix whitespace inconsistencies · a71fca58
      Paul E. McKenney 提交于
      Fix a number of whitespace ^Ierrors in the include/linux/rcu*
      and the kernel/rcu* files.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      LKML-Reference: <20090918172819.GA24405@linux.vnet.ibm.com>
      [ did more checkpatch fixlets ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a71fca58
  15. 29 8月, 2009 1 次提交
  16. 23 8月, 2009 3 次提交
    • P
      rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Paul E. McKenney 提交于
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f41d911f
    • P
      rcu: Renamings to increase RCU clarity · d6714c22
      Paul E. McKenney 提交于
      Make RCU-sched, RCU-bh, and RCU-preempt be underlying
      implementations, with "RCU" defined in terms of one of the
      three.  Update the outdated rcu_qsctr_inc() names, as these
      functions no longer increment anything.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746132696-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6714c22
    • P
      rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h · 9f77da9f
      Paul E. McKenney 提交于
      Some information hiding that makes it easier to merge
      preemptability into rcutree without descending into #include
      hell.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <1250974613373-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f77da9f
  17. 03 4月, 2009 1 次提交
    • I
      kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies · 6258c4fb
      Ingo Molnar 提交于
      Impact: cleanup
      
      We want to remove rcutree internals from the public rcutree.h file for
      upcoming kmemtrace changes - but kernel/rcutree_trace.c depends on them.
      
      Introduce kernel/rcutree.h for internal definitions. (Probably all
      the other data types from include/linux/rcutree.h could be
      moved here too - except rcu_data.)
      
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: paulmck@linux.vnet.ibm.com
      LKML-Reference: <1237898630.25315.83.camel@penberg-laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6258c4fb