1. 06 May, 2011 4 commits
    • rcu: move TREE_RCU from softirq to kthread · a26ac245
      Paul E. McKenney committed
      If RCU priority boosting is to be meaningful, callback invocation must
      be boosted in addition to preempted RCU readers.  Otherwise, in the presence
      of CPU real-time threads, the grace period ends, but the callbacks don't
      get invoked.  If the callbacks don't get invoked, the associated memory
      doesn't get freed, so the system is still subject to OOM.
      
      But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit
      moves the callback invocations to a kthread, which can be boosted easily.
      
      Also add comments and properly synchronize all accesses to
      rcu_cpu_kthread_task, as suggested by Lai Jiangshan.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: merge TREE_PREEPT_RCU blocked_tasks[] lists · 12f5f524
      Paul E. McKenney committed
      Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the
      rcu_node structure into a single ->blkd_tasks list with ->gp_tasks
      and ->exp_tasks tail pointers.  This is in preparation for RCU priority
      boosting, which will add a third dimension to the combinatorial explosion
      in the ->blocked_tasks[] case, but simply a third pointer in the new
      ->blkd_tasks case.
      
      Also update documentation to reflect blocked_tasks[] merge.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
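The single-list layout described in this commit message can be sketched in userspace C. This is an illustrative model, not the kernel code: `struct node_model`, `blkd_enqueue()`, `blkd_dequeue()`, and `exp_snapshot()` are hypothetical names, and the convention that every entry from a pointer to the end of the list blocks the corresponding grace period is an assumption about how one list can be partitioned by pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; names are not taken from the kernel. */
struct blkd_task {
	struct blkd_task *prev, *next;
};

struct node_model {
	struct blkd_task *head;      /* the single list of blocked tasks */
	struct blkd_task *gp_tasks;  /* first entry blocking the normal GP */
	struct blkd_task *exp_tasks; /* first entry blocking the expedited GP */
};

/* Insert t at the head of the list. */
static void push_head(struct node_model *np, struct blkd_task *t)
{
	t->prev = NULL;
	t->next = np->head;
	if (np->head)
		np->head->prev = t;
	np->head = t;
}

/* Insert t immediately before pos, which must already be on the list. */
static void insert_before(struct node_model *np, struct blkd_task *pos,
			  struct blkd_task *t)
{
	t->next = pos;
	t->prev = pos->prev;
	if (pos->prev)
		pos->prev->next = t;
	else
		np->head = t;
	pos->prev = t;
}

/* Queue a newly blocked task; blocks_gp says whether it blocks the current GP. */
void blkd_enqueue(struct node_model *np, struct blkd_task *t, int blocks_gp)
{
	if (blocks_gp && np->gp_tasks) {
		insert_before(np, np->gp_tasks, t);
		np->gp_tasks = t;
	} else {
		push_head(np, t);
		/* Treating older entries behind t as blocking is conservative. */
		if (blocks_gp)
			np->gp_tasks = t;
	}
}

/* Remove a task, advancing any pointer that referenced it. */
void blkd_dequeue(struct node_model *np, struct blkd_task *t)
{
	if (np->gp_tasks == t)
		np->gp_tasks = t->next;  /* NULL when t was the last entry */
	if (np->exp_tasks == t)
		np->exp_tasks = t->next;
	if (t->prev)
		t->prev->next = t->next;
	else
		np->head = t->next;
	if (t->next)
		t->next->prev = t->prev;
	t->prev = t->next = NULL;
}

/* An expedited GP must wait for every task that is blocked right now. */
void exp_snapshot(struct node_model *np)
{
	np->exp_tasks = np->head;
}

int gp_blocked(const struct node_model *np)
{
	return np->gp_tasks != NULL;
}
```

In this model, adding a further class of waiter (such as the boosting mentioned above) would cost one more pointer in `struct node_model` plus one more comparison in `blkd_dequeue()`, rather than another set of lists.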
    • rcu: Decrease memory-barrier usage based on semi-formal proof · e59fb312
      Paul E. McKenney committed
      Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
      invocations in rcu_process_callbacks() that are no longer needed; only
      sheer paranoia prevented them from being removed.  This commit removes
      them and provides a proof of correctness in their absence.  It also adds
      a memory barrier to rcu_report_qs_rsp() immediately before the update to
      rsp->completed in order to handle the theoretical possibility that the
      compiler or CPU might move massive quantities of code into a lock-based
      critical section.  This also proves that the sheer paranoia was not
      entirely unjustified, at least from a theoretical point of view.
      
      In addition, the old dyntick-idle synchronization depended on the fact
      that grace periods were many milliseconds in duration, so that it could
      be assumed that no dyntick-idle CPU could reorder a memory reference
      across an entire grace period.  Unfortunately for this design, the
      addition of expedited grace periods breaks this assumption, which has
      the unfortunate side-effect of requiring atomic operations in the
      functions that track dyntick-idle state for RCU.  (There is some hope
      that the algorithms used in user-level RCU might be applied here, but
      some work is required to handle the NMIs that user-space applications
      can happily ignore.  For the short term, better safe than sorry.)
      
      This proof assumes that neither compiler nor CPU will allow a lock
      acquisition and release to be reordered, as doing so can result in
      deadlock.  The proof is as follows:
      
      1.	A given CPU declares a quiescent state under the protection of
      	its leaf rcu_node's lock.
      
      2.	If there is more than one level of rcu_node hierarchy, the
      	last CPU to declare a quiescent state will also acquire the
      	->lock of the next rcu_node up in the hierarchy, but only
      	after releasing the lower level's lock.  The acquisition of this
      	lock clearly cannot occur prior to the acquisition of the leaf
      	node's lock.
      
      3.	Step 2 repeats until we reach the root rcu_node structure.
      	Please note again that only one lock is held at a time through
      	this process.  The acquisition of the root rcu_node's ->lock
      	must occur after the release of that of the leaf rcu_node.
      
      4.	At this point, we set the ->completed field in the rcu_state
      	structure in rcu_report_qs_rsp().  However, if the rcu_node
      	hierarchy contains only one rcu_node, then in theory the code
      	preceding the quiescent state could leak into the critical
      	section.  We therefore precede the update of ->completed with a
      	memory barrier.  All CPUs will therefore agree that any updates
      	preceding any report of a quiescent state will have happened
      	before the update of ->completed.
      
      5.	Regardless of whether a new grace period is needed, rcu_start_gp()
      	will propagate the new value of ->completed to all of the leaf
      	rcu_node structures, under the protection of each rcu_node's ->lock.
      	If a new grace period is needed immediately, this propagation
      	will occur in the same critical section that ->completed was
      	set in, but courtesy of the memory barrier in #4 above, is still
      	seen to follow any pre-quiescent-state activity.
      
      6.	When a given CPU invokes __rcu_process_gp_end(), it becomes
      	aware of the end of the old grace period and therefore makes
      	any RCU callbacks that were waiting on that grace period eligible
      	for invocation.
      
      	If this CPU is the same one that detected the end of the grace
      	period, and if there is but a single rcu_node in the hierarchy,
      	we will still be in the single critical section.  In this case,
      	the memory barrier in step #4 guarantees that all callbacks will
      	be seen to execute after each CPU's quiescent state.
      
      	On the other hand, if this is a different CPU, it will acquire
      	the leaf rcu_node's ->lock, and will again be serialized after
      	each CPU's quiescent state for the old grace period.
      
      On the strength of this proof, this commit therefore removes the memory
      barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
      The effect is to reduce the number of memory barriers by one and to
      reduce the frequency of execution from about once per scheduling tick
      per CPU to once per grace period.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
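Steps 1 through 4 of the proof above can be sketched as a small C11 model. This is a hedged illustration rather than the kernel implementation: `struct qs_node`, `struct qs_state`, and `report_qs()` are hypothetical names, a spinning `atomic_flag` stands in for each node's ->lock, and `atomic_thread_fence()` stands in for the memory barrier that precedes the update of ->completed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of an rcu_node-like tree; names are illustrative. */
struct qs_node {
	atomic_flag lock;        /* stand-in for the node's ->lock */
	unsigned long qsmask;    /* CPUs or children still owing a quiescent state */
	unsigned long grpmask;   /* this node's bit in its parent's qsmask */
	struct qs_node *parent;  /* NULL at the root */
};

struct qs_state {
	unsigned long gpnum;
	unsigned long completed;
};

static void node_lock(struct qs_node *np)
{
	while (atomic_flag_test_and_set_explicit(&np->lock, memory_order_acquire))
		;  /* spin until the flag is clear */
}

static void node_unlock(struct qs_node *np)
{
	atomic_flag_clear_explicit(&np->lock, memory_order_release);
}

/*
 * Report quiescent states for the bits in mask, starting at leaf np.
 * Only one lock is held at any time.  Returns 1 if this report ended
 * the grace period, 0 otherwise.
 */
int report_qs(struct qs_state *sp, struct qs_node *np, unsigned long mask)
{
	for (;;) {
		node_lock(np);
		np->qsmask &= ~mask;
		if (np->qsmask != 0) {
			node_unlock(np);  /* others are still outstanding */
			return 0;
		}
		if (np->parent == NULL)
			break;            /* root reached, its lock stays held */
		mask = np->grpmask;
		struct qs_node *up = np->parent;
		node_unlock(np);          /* release before taking the parent */
		np = up;
	}
	/* Step 4: order all prior activity before the update of completed. */
	atomic_thread_fence(memory_order_seq_cst);
	sp->completed = sp->gpnum;
	node_unlock(np);
	return 1;
}
```

With a single node in the tree the loop breaks on its first pass, so the fence is the only thing separating pre-quiescent-state code from the update, which is the case step 4 is concerned with.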
    • rcu: Remove conditional compilation for RCU CPU stall warnings · a00e0d71
      Paul E. McKenney committed
      The RCU CPU stall warnings can now be controlled using the
      rcu_cpu_stall_suppress boot-time parameter or via the same parameter
      from sysfs.  There is therefore no longer any reason to have
      kernel config parameters for this feature.  This commit therefore
      removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
      kernel config parameters.  The RCU_CPU_STALL_TIMEOUT parameter remains
      to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
      remains to allow task-stall information to be suppressed if desired.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  2. 18 December, 2010 1 commit
    • rcu: increase synchronize_sched_expedited() batching · e27fc964
      Tejun Heo committed
      The fix in commit #6a0cc49 requires more than three concurrent instances
      of synchronize_sched_expedited() before batching is possible.  This
      patch uses a ticket-counter-like approach that is also not unrelated to
      Lai Jiangshan's Ring RCU to allow sharing of expedited grace periods even
      when there are only two concurrent instances of synchronize_sched_expedited().
      
      This commit builds on Tejun's original posting, which may be found at
      http://lkml.org/lkml/2010/11/9/204, adding memory barriers, avoiding
      overflow of signed integers (other than via atomic_t), and fixing the
      detection of batching.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
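The ticket-counter idea mentioned in this commit message can be illustrated with a minimal C11 sketch. It is not the code from the commit: `exp_started`, `exp_done`, `ticket_take()`, `ticket_covered()`, and `ticket_publish()` are hypothetical names, and the sketch assumes that whoever performs a grace period snapshots `exp_started` just before starting it and publishes that snapshot afterwards.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical counters; the names are illustrative, not kernel symbols. */
static atomic_uint exp_started;  /* tickets handed out so far */
static atomic_uint exp_done;     /* highest ticket known to be covered */

/* Wraparound-safe "a >= b" for unsigned tickets. */
static int ticket_ge(unsigned int a, unsigned int b)
{
	return a - b <= UINT_MAX / 2;
}

/* Take a ticket on entry; returns the new value of exp_started. */
unsigned int ticket_take(void)
{
	return atomic_fetch_add(&exp_started, 1) + 1;
}

/* Has a grace period that began after our ticket was taken completed? */
int ticket_covered(unsigned int my_ticket)
{
	return ticket_ge(atomic_load(&exp_done), my_ticket);
}

/* After finishing a grace period, advance exp_done to snap, never backwards. */
void ticket_publish(unsigned int snap)
{
	unsigned int s = atomic_load(&exp_done);

	while (!ticket_ge(s, snap)) {
		if (atomic_compare_exchange_weak(&exp_done, &s, snap))
			break;
		/* On failure s holds the current value, so the test repeats. */
	}
}
```

Under these assumptions a second concurrent caller can return as soon as `ticket_covered()` is true for its own ticket, so two callers are enough for sharing to take place.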
  3. 30 November, 2010 4 commits
    • rcu: fix race condition in synchronize_sched_expedited() · db3a8920
      Paul E. McKenney committed
      The new (early 2010) implementation of synchronize_sched_expedited() uses
      try_stop_cpus() to force a context switch on every CPU.  It also permits
      concurrent calls to synchronize_sched_expedited() to share a single call
      to try_stop_cpus() through use of an atomically incremented
      synchronize_sched_expedited_count variable.  Unfortunately, this is
      subject to failure as follows:
      
      o	Task A invokes synchronize_sched_expedited(), try_stop_cpus()
      	succeeds, but Task A is preempted before getting to the atomic
      	increment of synchronize_sched_expedited_count.
      
      o	Task B also invokes synchronize_sched_expedited(), with exactly
      	the same outcome as Task A.
      
      o	Task C also invokes synchronize_sched_expedited(), again with
      	exactly the same outcome as Tasks A and B.
      
      o	Task D also invokes synchronize_sched_expedited(), but only
      	gets as far as acquiring the mutex within try_stop_cpus()
      	before being preempted, interrupted, or otherwise delayed.
      
      o	Task E also invokes synchronize_sched_expedited(), but only
      	gets to the snapshotting of synchronize_sched_expedited_count.
      
      o	Tasks A, B, and C all increment synchronize_sched_expedited_count.
      
      o	Task E fails to get the mutex, so checks the new value
      	of synchronize_sched_expedited_count.  It finds that the
      	value has increased, so (wrongly) assumes that its work
      	has been done, returning despite there having been no
      	expedited grace period since it began.
      
      The solution is to have the lowest-numbered CPU atomically increment
      the synchronize_sched_expedited_count variable within the
      synchronize_sched_expedited_cpu_stop() function, which is under
      the protection of the mutex acquired by try_stop_cpus().  However, this
      also requires that piggybacking tasks wait for three rather than two
      instances of try_stop_cpus(), because we cannot control the order in
      which the per-CPU callback functions occur.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: update documentation/comments for Lai's adoption patch · 2d999e03
      Paul E. McKenney committed
      Lai's RCU-callback immediate-adoption patch changes the RCU tracing
      output, so update tracing.txt.  Also update a few comments to clarify
      the synchronization design.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu,cleanup: simplify the code when cpu is dying · 29494be7
      Lai Jiangshan committed
      When we handle the CPU_DYING notifier, the whole system is stopped except
      for the current CPU.  We therefore need no synchronization with the other
      CPUs.  This allows us to move any orphaned RCU callbacks directly to the
      list of any online CPU without needing to run them through the global
      orphan lists.  These global orphan lists can therefore be dispensed with.
      This commit makes these changes, though it currently victimizes CPU 0.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu,cleanup: move synchronize_sched_expedited() out of sched.c · 7b27d547
      Lai Jiangshan committed
      The first version of synchronize_sched_expedited() used the migration
      code in the scheduler, and was therefore implemented in kernel/sched.c.
      However, the more recent version of this code no longer uses the
      migration code, so this commit moves it to the main RCU source files.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  4. 03 September, 2010 1 commit
  5. 21 August, 2010 3 commits
  6. 20 August, 2010 2 commits
  7. 12 May, 2010 1 commit
  8. 11 May, 2010 5 commits
    • d822ed10
    • rcu: RCU_FAST_NO_HZ must check RCU dyntick state · 77e38ed3
      Paul E. McKenney committed
      The current version of RCU_FAST_NO_HZ reproduces the old CLASSIC_RCU
      dyntick-idle bug, as it fails to detect CPUs that have interrupted
      or NMIed out of dyntick-idle mode.  Fix this by making rcu_needs_cpu()
      check the state in the per-CPU rcu_dynticks variables, thus correctly
      detecting the dyntick-idle state from an RCU perspective.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: print boot-time console messages if RCU configs out of ordinary · 26845c28
      Paul E. McKenney committed
      Print boot-time messages if tracing is enabled, if fanout is set
      to non-default values, if exact fanout is specified, if accelerated
      dyntick-idle grace periods have been enabled, if RCU-lockdep is enabled,
      if rcutorture has been boot-time enabled, if the CPU stall detector has
      been disabled, or if four-level hierarchy has been enabled.
      
      This is all for TREE_RCU and TREE_PREEMPT_RCU.  TINY_RCU will be handled
      separately, if at all.
      Suggested-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: refactor RCU's context-switch handling · 25502a6c
      Paul E. McKenney committed
      The addition of preemptible RCU to treercu resulted in a bit of
      confusion and inefficiency surrounding the handling of context switches
      for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
      is a quiescent state, pure and simple, just like it always has been.
      For RCU-preempt, a context switch is in no way a quiescent state, but
      special handling is required when a task blocks in an RCU read-side
      critical section.
      
      However, the callout from the scheduler and the outer loop in ksoftirqd
      still call something named rcu_sched_qs(), whose name is no longer
      accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
      quiescent state, it ends up unnecessarily (though harmlessly, aside
      from the performance hit) enqueuing the current task if it happens to
      be running in an RCU-preempt read-side critical section.  This not only
      increases the maximum latency of scheduler_tick(), it also needlessly
      increases the overhead of the next outermost rcu_read_unlock() invocation.
      
      This patch addresses this situation by separating the notion of RCU's
      context-switch handling from that of RCU-sched's quiescent states.
      The context-switch handling is covered by rcu_note_context_switch() in
      general and by rcu_preempt_note_context_switch() for preemptible RCU.
      This permits rcu_sched_qs() to handle quiescent states and only quiescent
      states.  It also reduces the maximum latency of scheduler_tick(), though
      probably by much less than a microsecond.  Finally, it means that tasks
      within preemptible-RCU read-side critical sections avoid incurring the
      overhead of queuing unless there really is a context switch.
      Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
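The separation this commit message describes can be pictured with a toy C model. Everything here is hypothetical (the `model_` functions and `struct ctx_model` are not kernel symbols); the point is only that the tick path records an RCU-sched quiescent state without touching the blocked-reader queue, while a real context switch does both.

```c
#include <assert.h>

/* Hypothetical bookkeeping for one CPU; names are illustrative. */
struct ctx_model {
	int sched_qs_count;    /* RCU-sched quiescent states recorded */
	int enqueue_count;     /* tasks queued as blocked RCU-preempt readers */
	int in_preempt_reader; /* current task is inside an RCU-preempt reader */
};

/* Record an RCU-sched quiescent state, and nothing else. */
void model_sched_qs(struct ctx_model *m)
{
	m->sched_qs_count++;
}

/* RCU-preempt side of a context switch: queue the task only if it is in a reader. */
void model_preempt_note_context_switch(struct ctx_model *m)
{
	if (m->in_preempt_reader)
		m->enqueue_count++;
}

/* A real context switch: both flavors get their due. */
void model_note_context_switch(struct ctx_model *m)
{
	model_sched_qs(m);
	model_preempt_note_context_switch(m);
}

/* Scheduling-clock tick that observes a quiescent state without a context switch. */
void model_tick_qs(struct ctx_model *m)
{
	model_sched_qs(m);
}
```

In this model a tick that lands inside an RCU-preempt read-side critical section leaves `enqueue_count` untouched, which corresponds to the queuing overhead the commit message says is avoided.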
    • rcu: ignore offline CPUs in last non-dyntick-idle CPU check · 5db35673
      Lai Jiangshan committed
      Offline CPUs are not in nohz_cpu_mask, but can be ignored when checking
      for the last non-dyntick-idle CPU.  This patch therefore only checks
      online CPUs for not being dyntick idle, allowing fast entry into
      full-system dyntick-idle state even when there are some offline CPUs.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  9. 28 February, 2010 1 commit
  10. 27 February, 2010 2 commits
    • rcu: Fix accelerated GPs for last non-dynticked CPU · 71da8132
      Paul E. McKenney committed
      This patch disables irqs across the call to rcu_needs_cpu().  It
      also enforces a hold-off period so that the idle loop doesn't
      softirq itself to death when there are lots of RCU callbacks in
      flight on the last non-dynticked CPU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267231138-27856-3-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix accelerated grace periods for last non-dynticked CPU · a47cd880
      Paul E. McKenney committed
      It is invalid to invoke __rcu_process_callbacks() with irqs
      disabled, so do it indirectly via raise_softirq().  This
      requires a state-machine implementation to cycle through the
      grace-period machinery the required number of times.
      Located-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267231138-27856-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 25 February, 2010 4 commits
    • rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information · 1ed509a2
      Paul E. McKenney committed
      When RCU detects a grace-period stall, it currently just prints
      out the PID of any tasks doing the stalling.  This patch adds
      RCU_CPU_STALL_VERBOSE, which enables the more-verbose reporting
      from sched_show_task().
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-21-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection · 3acd9eb3
      Paul E. McKenney committed
      Under TREE_PREEMPT_RCU, print_other_cpu_stall() invokes
      rcu_print_task_stall() with the root rcu_node structure's ->lock
      held, and rcu_print_task_stall() then acquires that same lock,
      resulting in self-deadlock. Fix this by removing the lock acquisition from
      rcu_print_task_stall(), and making all callers acquire the lock
      instead.
      Tested-by: John Kacur <jkacur@redhat.com>
      Tested-by: Thomas Gleixner <tglx@linutronix.de>
      Located-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-19-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Convert to raw_spinlocks · 1304afb2
      Paul E. McKenney committed
      The spinlocks in rcutree need to be real spinlocks in
      preempt-rt. Convert them to raw_spinlocks.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-18-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Accelerate grace period if last non-dynticked CPU · 8bd93a2c
      Paul E. McKenney committed
      Currently, rcu_needs_cpu() simply checks whether the current CPU
      has an outstanding RCU callback, which means that the last CPU
      to go into dyntick-idle mode might wait a few ticks for the
      relevant grace periods to complete.  However, if all the other
      CPUs are in dyntick-idle mode, and if this CPU is in a quiescent
      state (which it is for RCU-bh and RCU-sched any time that we are
      considering going into dyntick-idle mode), then the grace period
      is instantly complete.
      
      This patch therefore repeatedly invokes the RCU grace-period
      machinery in order to force any needed grace periods to complete
      quickly.  It does so a limited number of times in order to
      prevent starvation by an RCU callback function that might pass
      itself to call_rcu().
      
      However, if any CPU other than the current one is not in
      dyntick-idle mode, fall back to simply checking (with a fix for
      a bug noted by Lai Jiangshan).  Also, take advantage of the last
      grace-period forcing, an opportunity noted by Steve Rostedt,
      and apply the simplified #ifdef condition suggested by
      Frederic Weisbecker.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
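The bounded forcing loop described in this commit message can be sketched as follows. This is a toy model under stated assumptions: `MODEL_MAX_FLUSHES`, `struct cpu_model`, `model_force_gp()`, and `model_needs_cpu()` are hypothetical, the bound of 5 is arbitrary, and one pass of the grace-period machinery is assumed to retire every pending callback unless a callback keeps re-posting itself.

```c
#include <assert.h>

#define MODEL_MAX_FLUSHES 5  /* hypothetical bound on forcing attempts */

struct cpu_model {
	int pending_callbacks;  /* callbacks still waiting for a grace period */
	int others_idle;        /* nonzero if every other CPU is dyntick-idle */
	int requeue_forever;    /* models a callback that re-posts itself */
};

/* One pass of the grace-period machinery: retire what is pending. */
static void model_force_gp(struct cpu_model *c)
{
	c->pending_callbacks = c->requeue_forever ? 1 : 0;
}

/* Returns nonzero if this CPU must keep its scheduling-clock tick. */
int model_needs_cpu(struct cpu_model *c)
{
	int i;

	/* Some other CPU is busy: fall back to the plain check. */
	if (!c->others_idle)
		return c->pending_callbacks != 0;

	/* Last non-idle CPU: push grace periods through, but only so often. */
	for (i = 0; i < MODEL_MAX_FLUSHES && c->pending_callbacks; i++)
		model_force_gp(c);

	return c->pending_callbacks != 0;
}
```

The loop bound is what keeps a self-reposting callback from holding the CPU out of dyntick-idle mode indefinitely: after the limit is reached the function simply reports that the tick is still needed.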
  12. 13 January, 2010 2 commits
    • rcu: Add debug check for too many rcu_read_unlock() · cba8244a
      Paul E. McKenney committed
      TREE_PREEMPT_RCU maintains an rcu_read_lock_nesting counter in
      the task structure, which happens to be a signed int.  So this
      patch adds a check for this counter being negative at the end of
      __rcu_read_unlock(). This check is under CONFIG_PROVE_LOCKING,
      so can be thought of as being part of lockdep.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626498423064-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
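The kind of check this commit message describes can be shown with a small userspace model. It is a sketch only: `struct task_model`, the `model_` functions, and the `MODEL_PROVE_LOCKING` switch are hypothetical stand-ins, and a warning counter replaces whatever diagnostic the kernel emits.

```c
#include <assert.h>

#define MODEL_PROVE_LOCKING 1  /* hypothetical stand-in for the debug option */

/* Hypothetical stand-in for the per-task counter. */
struct task_model {
	int rcu_read_lock_nesting;  /* signed, so an underflow shows up as < 0 */
	int underflow_warnings;     /* counts what a debug build would report */
};

void model_rcu_read_lock(struct task_model *t)
{
	t->rcu_read_lock_nesting++;
}

void model_rcu_read_unlock(struct task_model *t)
{
	t->rcu_read_lock_nesting--;
#if MODEL_PROVE_LOCKING
	/* More unlocks than locks leave the counter negative. */
	if (t->rcu_read_lock_nesting < 0)
		t->underflow_warnings++;
#endif
}
```

Because the counter is signed, one unmatched unlock is enough to make the imbalance visible at the point where it happens.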
    • rcu: Add force_quiescent_state() testing to rcutorture · bf66f18e
      Paul E. McKenney committed
      Add force_quiescent_state() testing to rcutorture, with a
      separate thread that repeatedly invokes force_quiescent_state()
      in bursts. This can greatly increase the probability of
      encountering certain types of race conditions.
      Suggested-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1262646551116-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 03 December, 2009 2 commits
    • rcu: Add expedited grace-period support for preemptible RCU · d9a3da06
      Paul E. McKenney committed
      Implement a synchronize_rcu_expedited() for preemptible RCU
      that actually is expedited.  This uses
      synchronize_sched_expedited() to force all threads currently
      running in a preemptible-RCU read-side critical section onto the
      appropriate ->blocked_tasks[] list, then takes a snapshot of all
      of these lists and waits for them to drain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1259784616158-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Rename "quiet" functions · d3f6bad3
      Paul E. McKenney committed
      The number of "quiet" functions has grown recently, and the
      names are no longer very descriptive.  The point of all of these
      functions is to do some portion of the task of reporting a
      quiescent state, so rename them accordingly:
      
      o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
      	quiescent state to the per-CPU rcu_data structure.  If this
      	turns out to be a new quiescent state for this grace period,
      	then rcu_report_qs_rnp() will be invoked to propagate the
      	quiescent state up the rcu_node hierarchy.
      
      o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
      	a quiescent state for a given CPU (or possibly a set of CPUs)
      	up the rcu_node hierarchy.
      
      o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
      	reports a full set of quiescent states to the global rcu_state
      	structure.
      
      o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
      	a quiescent state due to a task exiting an RCU read-side critical
      	section that had previously blocked in that same critical section.
      	As indicated by the new name, this type of quiescent state is
      	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
      	to do so).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846163698-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 23 November, 2009 2 commits
    • rcu: Re-arrange code to reduce #ifdef pain · 6ebb237b
      Paul E. McKenney committed
      Remove #ifdefs from kernel/rcupdate.c and
      include/linux/rcupdate.h by moving code to
      include/linux/rcutiny.h, include/linux/rcutree.h, and
      kernel/rcutree.c.
      
      Also remove some definitions that are no longer used.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258908830885-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix grace-period-stall bug on large systems with CPU hotplug · b668c9cf
      Paul E. McKenney committed
      When the last CPU of a given leaf rcu_node structure goes
      offline, all of the tasks queued on that leaf rcu_node structure
      (due to having blocked in their current RCU read-side critical
      sections) are requeued onto the root rcu_node structure.  This
      requeuing is carried out by rcu_preempt_offline_tasks().
      However, it is possible that these queued tasks are the only
      thing preventing the leaf rcu_node structure from reporting a
      quiescent state up the rcu_node hierarchy.  Unfortunately, the
      old code would fail to do this reporting, resulting in a
      grace-period stall given the following sequence of events:
      
      1.	Kernel built for more than 32 CPUs on 32-bit systems or for more
      	than 64 CPUs on 64-bit systems, so that there is more than one
      	rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
      	to a number smaller than CONFIG_NR_CPUS.)
      
      2.	The kernel is built with CONFIG_TREE_PREEMPT_RCU.
      
      3.	A task running on a CPU associated with a given leaf rcu_node
      	structure blocks while in an RCU read-side critical section
      	-and- that CPU has not yet passed through a quiescent state
      	for the current RCU grace period.  This will cause the task
      	to be queued on the leaf rcu_node's blocked_tasks[] array, in
      	particular, on the element of this array corresponding to the
      	current grace period.
      
      4.	Each of the remaining CPUs corresponding to this same leaf rcu_node
      	structure passes through a quiescent state.  However, the task is
      	still in its RCU read-side critical section, so these quiescent
      	states cannot be reported further up the rcu_node hierarchy.
      	Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
      	field are now zero.
      
      5.	Each of the remaining CPUs goes offline.  (The events in steps
      	#4 and #5 can happen in any order as long as each CPU passes
      	through a quiescent state before going offline.)
      
      6.	When the last CPU goes offline, __rcu_offline_cpu() will invoke
      	rcu_preempt_offline_tasks(), which will move the task to the
      	root rcu_node structure, but without reporting a quiescent state
      	up the rcu_node hierarchy (and this failure to report a quiescent
      	state is the bug).
      
      	But because this leaf rcu_node structure's ->qsmask field is
      	already zero and its ->blocked_tasks[] entries are all empty,
      	force_quiescent_state() will skip this rcu_node structure.
      
      	Therefore, grace periods are now hung.
      
      This patch abstracts some code out of rcu_read_unlock_special(),
      calling the result task_quiet() by analogy with cpu_quiet(), and
      invokes task_quiet() from both rcu_read_unlock_special() and
      __rcu_offline_cpu().  Invoking task_quiet() from
      __rcu_offline_cpu() reports the quiescent state up the rcu_node
      hierarchy, fixing the bug.  This ends up requiring a separate
      lock_class_key per level of the rcu_node hierarchy, which this
      patch also provides.
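
A minimal sketch of the idea, not the kernel code: one shared helper
reports upward whenever a node has neither pending CPUs nor blocked tasks,
and both the reader-exit path and the CPU-offline path call it.  The
names node_sketch, report_if_quiet(), task_exits_reader() and
last_cpu_offline() are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the rcu_node structure. */
struct node_sketch {
	unsigned long qsmask;		/* CPUs or children still owing a quiescent state */
	int nblocked;			/* tasks blocking the current grace period here */
	unsigned long grpmask;		/* this node's bit in parent->qsmask */
	struct node_sketch *parent;	/* NULL for the root */
};

/*
 * Walk up the tree, clearing this node's bit in each parent for as long
 * as the node has no pending CPUs and no blocked tasks.  Returns true
 * when the root itself is quiet, meaning the grace period may end.
 */
static bool report_if_quiet(struct node_sketch *np)
{
	while (np->qsmask == 0 && np->nblocked == 0) {
		if (np->parent == NULL)
			return true;
		np->parent->qsmask &= ~np->grpmask;
		np = np->parent;
	}
	return false;
}

/* Reader-exit path: a blocked task leaves its read-side critical section. */
static bool task_exits_reader(struct node_sketch *np)
{
	np->nblocked--;
	return report_if_quiet(np);
}

/* Offline path: move blocked tasks to the root, then report the leaf. */
static bool last_cpu_offline(struct node_sketch *leaf, struct node_sketch *root)
{
	root->nblocked += leaf->nblocked;
	leaf->nblocked = 0;
	return report_if_quiet(leaf);
}
```

In this model, omitting the report_if_quiet() call from
last_cpu_offline() leaves the leaf's bit set in the root forever, which
corresponds to the hang described in step 6.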
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088301770-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b668c9cf
  15. 12 Nov, 2009 1 commit
  16. 11 Nov, 2009 2 commits
    • P
      rcu: Simplify association of quiescent states with grace periods · c64ac3ce
      Paul E. McKenney authored
      The rdp->passed_quiesc_completed fields are used to properly
      associate the recorded quiescent state with a grace period.  It
      is OK to wrongly associate a given quiescent state with a
      preceding grace period, but it is fatal to associate a given
      quiescent state with a grace period that begins after the
      quiescent state occurred.  Grace periods are numbered, and the
      following fields track them:
      
      o	->gpnum is the number of the grace period currently in
      	progress, or the number of the last grace period to
      	complete if no grace period is currently in progress.
      
      o	->completed is the number of the last grace period to
      	have completed.
      
      These two fields are equal if there is no grace period in
      progress, otherwise ->gpnum is one greater than ->completed.
      But the rdp->passed_quiesc_completed field is compared against
      ->completed, and if equal, the quiescent state is presumed to
      count against the current grace period.
      
      The earlier code copied rdp->completed to
      rdp->passed_quiesc_completed, which has been made to work, but
      is error-prone.  In contrast, copying one less than rdp->gpnum
      is guaranteed safe, because rdp->gpnum is not incremented until
      after the start of the corresponding grace period.  At the end of
      the grace period, when ->completed has incremented, any
      quiescent states recorded previously will be discarded.
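
The comparison described above can be sketched as follows.  This is an
illustration under simplifying assumptions, not the kernel code: the
rdp_sketch structure and the record_qs() and qs_counts() helpers are
hypothetical names, and only the fields named in this message are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU rcu_data structure. */
struct rdp_sketch {
	long gpnum;			/* grace period in progress, or last completed if idle */
	long completed;			/* last grace period to have completed */
	long passed_quiesc_completed;	/* snapshot taken when a quiescent state is recorded */
	bool passed_quiesc;		/* has a quiescent state been recorded? */
};

/* Record a quiescent state, snapshotting gpnum - 1 rather than completed. */
static void record_qs(struct rdp_sketch *rdp)
{
	rdp->passed_quiesc = true;
	rdp->passed_quiesc_completed = rdp->gpnum - 1;
}

/* A recorded quiescent state counts only if its snapshot equals ->completed. */
static bool qs_counts(const struct rdp_sketch *rdp)
{
	return rdp->passed_quiesc &&
	       rdp->passed_quiesc_completed == rdp->completed;
}
```

In this model, a quiescent state recorded while no grace period is in
progress (gpnum equal to completed) takes a snapshot one less than
completed, so it can never match and cannot be credited to a grace period
that starts later.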
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890421011-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • P
      rcu: Remove inline from forward-referenced functions · dbe01350
      Paul E. McKenney authored
      Some variants of gcc are reputed to dislike forward references
      to functions declared "inline".  Remove the "inline" keyword
      from such functions.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890422402-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 16 Oct, 2009 1 commit
    • P
      rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang · 237c80c5
      Paul E. McKenney authored
      If the following sequence of events occurs, then
      TREE_PREEMPT_RCU will hang waiting for a grace period to
      complete, eventually OOMing the system:
      
      o	A TREE_PREEMPT_RCU build of the kernel is booted on a system
      	with more than 64 physical CPUs present (32 on a 32-bit system).
      	Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
      	with RCU_FANOUT set to a sufficiently small value that the
      	physical CPUs populate two or more leaf rcu_node structures.
      
      o	A task is preempted in an RCU read-side critical section
      	while running on a CPU corresponding to a given leaf rcu_node
      	structure.
      
      o	All CPUs corresponding to this same leaf rcu_node structure
      	record quiescent states for the current grace period.
      
      o	All of these same CPUs go offline (hence the need for enough
      	physical CPUs to populate more than one leaf rcu_node structure).
      	This causes the preempted task to be moved to the root rcu_node
      	structure.
      
      At this point, there is nothing left to cause the quiescent
      state to be propagated up the rcu_node tree, so the current
      grace period never completes.
      
      The simplest fix, especially after considering the deadlock
      possibilities, is to detect this situation when the last CPU is
      offlined, and to set that CPU's ->qsmask bit in its leaf
      rcu_node structure.  This will cause the next invocation of
      force_quiescent_state() to end the grace period.
      
      Without this fix, this hang can be triggered in an hour or so on
      some machines with rcutorture and random CPU onlining/offlining.
      With this fix, these same machines pass a full 10 hours of this
      sort of abuse.
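
A rough model of why re-setting the ->qsmask bit helps, assuming that the
scan skips leaves whose ->qsmask is zero and treats offline CPUs as
quiescent.  The leaf_sketch structure and the offline_cpu() and
force_qs_scan() helpers are hypothetical names, not the kernel's, and
locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a leaf rcu_node structure. */
struct leaf_sketch {
	unsigned long qsmask;	/* CPUs still owing a quiescent state */
	unsigned long online;	/* CPUs currently online */
	unsigned long grpmask;	/* this leaf's bit in the root's qsmask */
};

/*
 * Take a CPU offline.  If it was the last online CPU on this leaf and
 * blocked tasks had to be moved to the root, re-arm its qsmask bit so
 * that a later scan does not skip this leaf.
 */
static void offline_cpu(struct leaf_sketch *leaf, int cpu, bool moved_tasks)
{
	unsigned long bit = 1UL << cpu;

	leaf->online &= ~bit;
	if (leaf->online == 0 && moved_tasks)
		leaf->qsmask |= bit;
}

/* Stand-in for the scan: offline CPUs are reported on their behalf. */
static void force_qs_scan(struct leaf_sketch *leaf, unsigned long *root_qsmask)
{
	if (leaf->qsmask == 0)
		return;				/* nothing pending: leaf is skipped */
	leaf->qsmask &= leaf->online;		/* offline CPUs count as quiescent */
	if (leaf->qsmask == 0)
		*root_qsmask &= ~leaf->grpmask;	/* propagate up to the root */
}
```

With moved_tasks false the leaf's bit in the root stays set after the
scan, which models the hang; with it true the scan clears that bit.  The
tasks moved to the root still have to leave their read-side critical
sections before the grace period can end.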
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 15 Oct, 2009 1 commit
    • P
      rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5
      Paul E. McKenney authored
      For the short term, map synchronize_rcu_expedited() to
      synchronize_rcu() for TREE_PREEMPT_RCU and to
      synchronize_sched_expedited() for TREE_RCU.
      
      Longer term, there needs to be a real expedited grace period for
      TREE_PREEMPT_RCU, but candidate patches to date are considerably
      more complex and intrusive.
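
The mapping amounts to a compile-time choice, sketched below.  The
SKETCH_TREE_PREEMPT_RCU macro and the *_sketch functions are hypothetical
stand-ins that only count calls; they are not the kernel's configuration
symbol or primitives.

```c
#include <assert.h>

static int normal_calls;		/* calls routed to the normal grace period */
static int sched_expedited_calls;	/* calls routed to the expedited sched variant */

/* Stand-in for synchronize_rcu(). */
static void synchronize_rcu_sketch(void)
{
	normal_calls++;
}

/* Stand-in for synchronize_sched_expedited(). */
static void synchronize_sched_expedited_sketch(void)
{
	sched_expedited_calls++;
}

/* Stopgap mapping: the preemptible flavor falls back to a normal grace period. */
static void synchronize_rcu_expedited_sketch(void)
{
#ifdef SKETCH_TREE_PREEMPT_RCU
	synchronize_rcu_sketch();
#else
	synchronize_sched_expedited_sketch();
#endif
}
```

Building the sketch with -DSKETCH_TREE_PREEMPT_RCU selects the fallback
path; callers are unaffected either way, which is what makes this usable
as a short-term fix.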
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592331-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 07 Oct, 2009 1 commit
    • P
      rcu: Make hot-unplugged CPU relinquish its own RCU callbacks · e74f4c45
      Paul E. McKenney authored
      The current interaction between RCU and CPU hotplug requires that
      RCU block in CPU notifiers waiting for callbacks to drain.
      
      This can be greatly simplified by having each CPU relinquish its
      own callbacks, and by having both _rcu_barrier() and CPU_DEAD
      notifiers adopt all callbacks that were previously relinquished.
      
      This change also eliminates the possibility of certain types of
      hangs due to the previous practice of waiting for callbacks to be
      invoked from within CPU notifiers.  If you don't ever wait, you
      cannot hang.
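
The relinquish-then-adopt scheme can be pictured as two list splices, as
in the sketch below.  It assumes a singly linked callback list with a
tail pointer; the cb_sketch and cblist_sketch types and the cblist_*()
helpers are hypothetical, and the locking the real code would need is
omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued RCU callback. */
struct cb_sketch {
	struct cb_sketch *next;
};

/* Singly linked callback list with a tail pointer for O(1) append. */
struct cblist_sketch {
	struct cb_sketch *head;
	struct cb_sketch **tail;	/* points at the final ->next, or at head */
	long len;
};

static void cblist_init(struct cblist_sketch *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->len = 0;
}

static void cblist_enqueue(struct cblist_sketch *l, struct cb_sketch *cb)
{
	cb->next = NULL;
	*l->tail = cb;
	l->tail = &cb->next;
	l->len++;
}

/* Move every callback from src to the end of dst, without waiting. */
static void cblist_splice(struct cblist_sketch *dst, struct cblist_sketch *src)
{
	if (src->head == NULL)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	dst->len += src->len;
	cblist_init(src);
}
```

An outgoing CPU would relinquish with cblist_splice(&orphans, &its_list),
and a surviving CPU would later adopt with
cblist_splice(&own_list, &orphans).  Neither step blocks, so neither
can hang.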
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1254890898456-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>