1. 03 Jul, 2012: 1 commit
  2. 01 May, 2012: 1 commit
  3. 22 Feb, 2012: 7 commits
  4. 12 Dec, 2011: 6 commits
    • docs: Additional LWN links to RCU API · d493011a
      Committed by Kees Cook
      Tyler Hicks pointed me at an additional article on RCU and I figured
      it should probably be mentioned with the others.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add rcutorture CPU-hotplug capability · b58bdcca
      Committed by Paul E. McKenney
      Running CPU-hotplug operations concurrently with rcutorture has
      historically been a good way to find bugs in both RCU and CPU hotplug.
      This commit therefore adds an rcutorture module parameter called
      "onoff_interval" that causes a randomly selected CPU-hotplug operation to
      be executed at the specified interval, in seconds.  The default value of
      "onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
      operations.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add rcutorture system-shutdown capability · d5f546d8
      Committed by Paul E. McKenney
      Although it is easy to run rcutorture tests under KVM, there is currently
      no nice way to run such a test for a fixed time period, collect all of
      the rcutorture data, and then shut the system down cleanly.  This commit
      therefore adds an rcutorture module parameter named "shutdown_secs" that
      specifies the run duration in seconds, after which rcutorture terminates
      the test and powers the system down.  The default value for "shutdown_secs"
      is zero, which disables shutdown.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
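The shutdown_secs rule reduces to a deadline check. The sketch below is hypothetical: struct shutdown_model and shutdown_due() are illustrative names, and the model only decides *when* the run should end; what happens afterwards (terminating the test and powering the system down) is taken from the commit description and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the shutdown_secs semantics. */
struct shutdown_model {
	unsigned long shutdown_secs;	/* run duration in seconds; 0 disables shutdown */
	unsigned long start;		/* time the test started, in seconds */
};

/* True once the configured run duration has elapsed. */
static bool shutdown_due(const struct shutdown_model *m, unsigned long now)
{
	assert(now >= m->start);	/* model assumption: time never goes backwards */
	if (m->shutdown_secs == 0)
		return false;		/* default: run until stopped by other means */
	return now - m->start >= m->shutdown_secs;
}
```

When rcutorture is built as a module, parameters like this one are passed as name=value pairs at load time, for example `modprobe rcutorture shutdown_secs=600`.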
    • rcu: Add documentation for raw SRCU read-side primitives · 9ceae0e2
      Committed by Paul E. McKenney
      Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
      and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
      use of the existing _raw suffix instead of the earlier bulkref names.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Document failing tick as cause of RCU CPU stall warning · 2c01531f
      Committed by Paul E. McKenney
      One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
      These turned out to be caused by a bug that stopped scheduling-clock
      tick interrupts from being sent to a given CPU for several hundred seconds.
      This commit therefore updates the documentation to call this out as a
      possible cause for RCU CPU stall warnings.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Track idleness independent of idle tasks · 9b2e4f18
      Committed by Paul E. McKenney
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      prevents interrupt entry/exit errors from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dynticks_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (these checks will need some adjustment before applying
      Frederic's OS-jitter patches, http://lkml.org/lkml/2011/10/7/246).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
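The split-counter idea can be illustrated with a userspace model. Everything here is an assumption for illustration: struct dynticks_model and the model_* functions are hypothetical, the commit says only that two bitfields share a 64-bit counter, and the choice to place one unit of task-level nesting at bit 32 is this sketch's own. What the model does reproduce from the commit text is the hard reset on every process/idle transition, which discards any interrupt miscount caused by a "half interrupt".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: high bits count process-level (non-idle) nesting,
 * low bits count interrupt nesting.  The split point is an assumption.
 */
#define MODEL_TASK_SHIFT	32
#define MODEL_TASK_NEST		((int64_t)1 << MODEL_TASK_SHIFT)

struct dynticks_model {
	int64_t nesting;	/* stands in for ->dynticks_nesting */
};

/* Start of an idle-loop iteration: hard reset, dropping any irq miscount. */
static void model_idle_enter(struct dynticks_model *d)
{
	d->nesting = 0;
}

/* End of an idle-loop iteration: hard reset to one level of task nesting. */
static void model_idle_exit(struct dynticks_model *d)
{
	d->nesting = MODEL_TASK_NEST;
}

/* Interrupt entry/exit only adjust the low-order interrupt count. */
static void model_irq_enter(struct dynticks_model *d)
{
	d->nesting++;
}

static void model_irq_exit(struct dynticks_model *d)
{
	assert(d->nesting > INT64_MIN);	/* 64 bits leave ample headroom */
	d->nesting--;
}

/* In this model, RCU may ignore the CPU only when both fields are zero. */
static bool model_is_idle(const struct dynticks_model *d)
{
	return d->nesting == 0;
}
```

In the usage below, a stray model_irq_enter() without a matching exit would leave the CPU looking busy forever, but the next model_idle_enter() wipes it out.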
  5. 29 Sep, 2011: 8 commits
  6. 13 Jun, 2011: 1 commit
  7. 27 May, 2011: 1 commit
    • rcu: Decrease memory-barrier usage based on semi-formal proof · 23b5c8fa
      Committed by Paul E. McKenney
      (Note: this was reverted, and is now being re-applied in pieces, with
      this being the fifth and final piece.  See below for the reason that
      it is now felt to be safe to re-apply this.)
      
      Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
      invocations in rcu_process_callbacks() that are no longer needed but
      that sheer paranoia prevented from being removed.  This commit removes
      them and provides a proof of correctness in their absence.  It also adds
      a memory barrier to rcu_report_qs_rsp() immediately before the update to
      rsp->completed in order to handle the theoretical possibility that the
      compiler or CPU might move massive quantities of code into a lock-based
      critical section.  This also proves that the sheer paranoia was not
      entirely unjustified, at least from a theoretical point of view.
      
      In addition, the old dyntick-idle synchronization depended on the fact
      that grace periods were many milliseconds in duration, so that it could
      be assumed that no dyntick-idle CPU could reorder a memory reference
      across an entire grace period.  Unfortunately for this design, the
      addition of expedited grace periods breaks this assumption, which has
      the unfortunate side-effect of requiring atomic operations in the
      functions that track dyntick-idle state for RCU.  (There is some hope
      that the algorithms used in user-level RCU might be applied here, but
      some work is required to handle the NMIs that user-space applications
      can happily ignore.  For the short term, better safe than sorry.)
      
      This proof assumes that neither compiler nor CPU will allow a lock
      acquisition and release to be reordered, as doing so can result in
      deadlock.  The proof is as follows:
      
      1.	A given CPU declares a quiescent state under the protection of
      	its leaf rcu_node's lock.
      
      2.	If there is more than one level of rcu_node hierarchy, the
      	last CPU to declare a quiescent state will also acquire the
      	->lock of the next rcu_node up in the hierarchy,  but only
      	after releasing the lower level's lock.  The acquisition of this
      	lock clearly cannot occur prior to the acquisition of the leaf
      	node's lock.
      
      3.	Step 2 repeats until we reach the root rcu_node structure.
      	Please note again that only one lock is held at a time through
      	this process.  The acquisition of the root rcu_node's ->lock
      	must occur after the release of that of the leaf rcu_node.
      
      4.	At this point, we set the ->completed field in the rcu_state
      	structure in rcu_report_qs_rsp().  However, if the rcu_node
      	hierarchy contains only one rcu_node, then in theory the code
      	preceding the quiescent state could leak into the critical
      	section.  We therefore precede the update of ->completed with a
      	memory barrier.  All CPUs will therefore agree that any updates
      	preceding any report of a quiescent state will have happened
      	before the update of ->completed.
      
      5.	Regardless of whether a new grace period is needed, rcu_start_gp()
      	will propagate the new value of ->completed to all of the leaf
      	rcu_node structures, under the protection of each rcu_node's ->lock.
      	If a new grace period is needed immediately, this propagation
      	will occur in the same critical section that ->completed was
      	set in, but courtesy of the memory barrier in #4 above, is still
      	seen to follow any pre-quiescent-state activity.
      
      6.	When a given CPU invokes __rcu_process_gp_end(), it becomes
      	aware of the end of the old grace period and therefore makes
      	any RCU callbacks that were waiting on that grace period eligible
      	for invocation.
      
      	If this CPU is the same one that detected the end of the grace
      	period, and if there is but a single rcu_node in the hierarchy,
      	we will still be in the single critical section.  In this case,
      	the memory barrier in step #4 guarantees that all callbacks will
      	be seen to execute after each CPU's quiescent state.
      
      	On the other hand, if this is a different CPU, it will acquire
      	the leaf rcu_node's ->lock, and will again be serialized after
      	each CPU's quiescent state for the old grace period.
      
      On the strength of this proof, this commit therefore removes the memory
      barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
      The effect is to reduce the number of memory barriers by one and to
      reduce the frequency of execution from about once per scheduling tick
      per CPU to once per grace period.
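Steps 1 through 4 of the proof can be sketched as a userspace model, under clearly stated assumptions: struct model_node, model_node_init(), report_qs() and the global completed are hypothetical stand-ins for rcu_node, the quiescent-state reporting path, and rsp->completed; pthread mutexes stand in for each rcu_node's ->lock; and a C11 sequentially consistent fence stands in for smp_mb(). The model illustrates the locking discipline the proof relies on: at most one node lock is held at any time while walking toward the root, and the full fence is issued immediately before the end of the grace period is published. Steps 5 and 6 (propagation by rcu_start_gp() and __rcu_process_gp_end()) are not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of one node in the rcu_node hierarchy. */
struct model_node {
	pthread_mutex_t lock;		/* stands in for rcu_node ->lock */
	unsigned long qsmask;		/* children still owing a quiescent state */
	unsigned long grpmask;		/* this node's bit in its parent's qsmask */
	struct model_node *parent;	/* NULL for the root */
};

static atomic_long completed;		/* stands in for rsp->completed */

static void model_node_init(struct model_node *np, unsigned long qsmask,
			    unsigned long grpmask, struct model_node *parent)
{
	int ret = pthread_mutex_init(&np->lock, NULL);

	assert(ret == 0);
	(void)ret;
	np->qsmask = qsmask;
	np->grpmask = grpmask;
	np->parent = parent;
}

/* Report a quiescent state for bit 'mask' of node 'np' (steps 1-4). */
static void report_qs(struct model_node *np, unsigned long mask, long gpnum)
{
	for (;;) {
		pthread_mutex_lock(&np->lock);
		np->qsmask &= ~mask;
		if (np->qsmask != 0 || np->parent == NULL)
			break;		/* others still pending, or this is the root */
		mask = np->grpmask;
		pthread_mutex_unlock(&np->lock);	/* release before going up */
		np = np->parent;
	}
	if (np->parent == NULL && np->qsmask == 0) {
		/* Step 4: full fence, then publish the end of the grace period. */
		atomic_thread_fence(memory_order_seq_cst);
		atomic_store_explicit(&completed, gpnum, memory_order_relaxed);
	}
	pthread_mutex_unlock(&np->lock);
}
```

The single-node case (a root with no children) is exactly the one step 4 worries about: the reporting CPU never releases one lock and acquires another, so only the fence orders its earlier accesses before the update of completed. Build with `-pthread`.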
      
      This was reverted due to hangs found during testing by Yinghai Lu and
      Ingo Molnar.  Frederic Weisbecker supplied Yinghai with tracing that
      located the underlying problem, and Frederic also provided the fix.
      
      The underlying problem was that the HARDIRQ_ENTER() macro from
      lib/locking-selftest.c invoked irq_enter(), which in turn invokes
      rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
      does not invoke rcu_irq_exit().  This situation resulted in calls
      to rcu_irq_enter() that were not balanced by the required calls to
      rcu_irq_exit().  Therefore, after these locking selftests completed,
      RCU's dyntick-idle nesting count was a large number (for example,
      72), which caused RCU to conclude that the affected CPU was not in
      dyntick-idle mode when in fact it was.
      
      RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
      in hangs.
      
      In contrast, with Frederic's patch, which replaces the irq_enter()
      in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
      either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
      running the test is already marked as not being in dyntick-idle mode.
      This means that the rcu_irq_enter() and rcu_irq_exit() calls stay
      balanced, so RCU has no problem working out which CPUs are in
      dyntick-idle mode and which are not.
      
      The reason that the imbalance was not noticed before the barrier patch
      was applied is that the old implementation of rcu_enter_nohz() ignored
      the nesting depth.  This could still result in delays, but much shorter
      ones.  Whenever there was a delay, RCU would IPI the CPU with the
      unbalanced nesting level, which would eventually result in rcu_enter_nohz()
      being called, which in turn would force RCU to see that the CPU was in
      dyntick-idle mode.
      
      The reason that very few people noticed the problem is that the mismatched
      irq_enter() vs. __irq_exit() occurred only when the kernel was built with
      CONFIG_DEBUG_LOCKING_API_SELFTESTS.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
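The HARDIRQ_ENTER()/HARDIRQ_EXIT() imbalance described above can be reproduced with a counter model. The functions below are hypothetical stand-ins whose names mirror the commit text: model_irq_enter() plays irq_enter() (which calls the RCU hook), while model_bare_irq_enter() and model_bare_irq_exit() play __irq_enter() and __irq_exit() (which do not). The bodies are reduced to the RCU nesting count and are not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 0 means dyntick-idle from RCU's point of view. */
static long model_rcu_nesting;

static void model_rcu_irq_enter(void)
{
	model_rcu_nesting++;
}

/* irq_enter(): includes the RCU hook. */
static void model_irq_enter(void)
{
	model_rcu_irq_enter();
}

/* __irq_enter() and __irq_exit(): no RCU hook in either. */
static void model_bare_irq_enter(void)
{
}

static void model_bare_irq_exit(void)
{
}

/* Before the fix: enter with the RCU hook, exit without it. */
static void selftest_before_fix(int rounds)
{
	assert(rounds >= 0);
	for (int i = 0; i < rounds; i++) {
		model_irq_enter();
		model_bare_irq_exit();
	}
}

/* With Frederic's fix: both sides bypass the RCU hooks. */
static void selftest_after_fix(int rounds)
{
	assert(rounds >= 0);
	for (int i = 0; i < rounds; i++) {
		model_bare_irq_enter();
		model_bare_irq_exit();
	}
}

static bool model_cpu_looks_idle(void)
{
	return model_rcu_nesting == 0;
}
```

Each unbalanced round leaves one extra unit in the count, so after the selftests an idle CPU still looks busy to RCU; with the fix, the count is untouched.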
  8. 20 May, 2011: 1 commit
  9. 06 May, 2011: 8 commits
    • rcu: Add forward-progress diagnostic for per-CPU kthreads · 5ece5bab
      Committed by Paul E. McKenney
      Increment a per-CPU counter on each pass through rcu_cpu_kthread()'s
      service loop, and add it to the rcudata trace output.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: add grace-period age and more kthread state to tracing · 15ba0ba8
      Committed by Paul E. McKenney
      This commit adds the age in jiffies of the current grace period along
      with the duration in jiffies of the longest grace period since boot
      to the rcu/rcugp debugfs file.  It also adds an additional "O" state
      to kthread tracing to differentiate between the kthread waiting due to
      having nothing to do on the one hand and waiting due to being on the
      wrong CPU on the other hand.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
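The two new rcu/rcugp quantities amount to simple jiffies arithmetic. This is a sketch only: struct gp_age_model, gp_begin(), gp_finish() and gp_age() are hypothetical names, and reporting an age of 0 while no grace period is in progress is this model's assumption, not necessarily what the debugfs file prints.

```c
#include <assert.h>

/* Hypothetical model of the grace-period age and longest-duration fields. */
struct gp_age_model {
	unsigned long gp_start;		/* jiffies when the current GP began */
	unsigned long gp_max;		/* longest completed GP since boot, in jiffies */
	int gp_in_progress;
};

static void gp_begin(struct gp_age_model *m, unsigned long jiffies)
{
	m->gp_start = jiffies;
	m->gp_in_progress = 1;
}

static void gp_finish(struct gp_age_model *m, unsigned long jiffies)
{
	/* Unsigned subtraction stays correct across a jiffies wrap. */
	unsigned long duration = jiffies - m->gp_start;

	assert(m->gp_in_progress);
	if (duration > m->gp_max)
		m->gp_max = duration;
	m->gp_in_progress = 0;
}

/* Age of the current grace period; 0 when none is running (assumption). */
static unsigned long gp_age(const struct gp_age_model *m, unsigned long jiffies)
{
	return m->gp_in_progress ? jiffies - m->gp_start : 0;
}
```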
    • rcu: update tracing documentation for new rcutorture and rcuboost · 90e6ac36
      Committed by Paul E. McKenney
      This commit documents the new debugfs rcu/rcutorture and rcu/rcuboost
      trace files.  The description has been updated as suggested by Josh
      Triplett.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: add callback-queue information to rcudata output · 0ac3d136
      Committed by Paul E. McKenney
      This commit adds an indication of the state of the callback queue using
      a string of four characters following the "ql=" integer queue length.
      The first character is "N" if there are callbacks that have been
      queued that are not yet ready to be handled by the next grace period, or
      "." otherwise.  The second character is "R" if there are callbacks queued
      that are ready to be handled by the next grace period, or "." otherwise.
      The third character is "W" if there are callbacks waiting for the current
      grace period, or "." otherwise.  Finally, the fourth character is "D"
      if there are callbacks that have been handled by a prior grace period
      and are waiting to be invoked, or "." otherwise.
      
      Note that callbacks that are in the process of being invoked are
      not shown.  These callbacks would have been removed from the rcu_data
      structure's list by rcu_do_batch() prior to being executed.  (These
      callbacks are also not reflected in the "ql=" total, FWIW.)
      
      Also, document the new callback-queue trace information.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
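A hypothetical sketch of how the four-character indicator maps onto queue state. The per-segment counters in struct cbq_model, and the names cbq_flags() and cbq_len(), are assumptions for illustration; the real rcu_data structure tracks these segments differently. Only the character mapping ("N", "R", "W", "D", with "." for an empty segment) and the fact that callbacks already handed to rcu_do_batch() are excluded come from the commit text.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-CPU summary of the callback queue segments. */
struct cbq_model {
	long next;	/* queued, not yet ready for the next grace period */
	long ready;	/* ready to be handled by the next grace period */
	long wait;	/* waiting for the current grace period */
	long done;	/* handled by a prior grace period, awaiting invocation */
};

/* Fill buf with the four flag characters that follow the "ql=" count. */
static void cbq_flags(const struct cbq_model *q, char buf[5])
{
	assert(q != NULL);
	buf[0] = q->next  ? 'N' : '.';
	buf[1] = q->ready ? 'R' : '.';
	buf[2] = q->wait  ? 'W' : '.';
	buf[3] = q->done  ? 'D' : '.';
	buf[4] = '\0';
}

/* The "ql=" total: callbacks already being invoked are not counted. */
static long cbq_len(const struct cbq_model *q)
{
	return q->next + q->ready + q->wait + q->done;
}
```

For instance, a CPU with two newly queued callbacks and three waiting on the current grace period would, in this model, show a length of 5 with flags "N.W.".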
    • rcu: Update RCU's trace.txt documentation for new format · 2fa218d8
      Committed by Paul E. McKenney
      The trace.txt file had obsolete output for the debugfs rcu/rcudata
      file, so update it.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: merge TREE_PREEMPT_RCU blocked_tasks[] lists · 12f5f524
      Committed by Paul E. McKenney
      Combine the current TREE_PREEMPT_RCU ->blocked_tasks[] lists in the
      rcu_node structure into a single ->blkd_tasks list with ->gp_tasks
      and ->exp_tasks tail pointers.  This is in preparation for RCU priority
      boosting, which will add a third dimension to the combinatorial explosion
      in the ->blocked_tasks[] case, but simply a third pointer in the new
      ->blkd_tasks case.
      
      Also update documentation to reflect the blocked_tasks[] merge.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
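The "single list plus tail pointers" layout can be sketched as follows. This is a hypothetical model, not the kernel's rcu_node code: struct model_task, struct model_rnp and the model_* functions are illustrative, and the rule that a newly blocked task is always added at the head (and so never blocks a grace period already in flight) is a simplification. The invariant it demonstrates is the one the pointers encode: every task from ->gp_tasks to the end of ->blkd_tasks blocks the current normal grace period, and likewise for ->exp_tasks and the expedited one. Priority boosting would then need just one more pointer of the same kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one rcu_node's ->blkd_tasks list. */
struct model_task {
	int pid;
	struct model_task *prev, *next;
};

struct model_rnp {
	struct model_task *blkd_tasks;	/* head: most recently blocked task */
	struct model_task *gp_tasks;	/* first task blocking the normal GP, or NULL */
	struct model_task *exp_tasks;	/* first task blocking the expedited GP, or NULL */
};

/* Simplification: a newly blocked task goes at the head of the list. */
static void model_block(struct model_rnp *rnp, struct model_task *t)
{
	t->prev = NULL;
	t->next = rnp->blkd_tasks;
	if (t->next)
		t->next->prev = t;
	rnp->blkd_tasks = t;
}

/* Every task already queued when a grace period starts blocks it. */
static void model_start_gp(struct model_rnp *rnp)
{
	rnp->gp_tasks = rnp->blkd_tasks;
}

static void model_start_exp_gp(struct model_rnp *rnp)
{
	rnp->exp_tasks = rnp->blkd_tasks;
}

/* Reader done: unlink it, advancing any tail pointer that referenced it. */
static void model_unblock(struct model_rnp *rnp, struct model_task *t)
{
	assert(t != NULL);
	if (rnp->gp_tasks == t)
		rnp->gp_tasks = t->next;
	if (rnp->exp_tasks == t)
		rnp->exp_tasks = t->next;
	if (t->prev)
		t->prev->next = t->next;
	else
		rnp->blkd_tasks = t->next;
	if (t->next)
		t->next->prev = t->prev;
	t->prev = t->next = NULL;
}
```

A grace period in this model is blocked exactly while its tail pointer is non-NULL.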
    • rcu: Decrease memory-barrier usage based on semi-formal proof · e59fb312
      Committed by Paul E. McKenney
      Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
      invocations in rcu_process_callbacks() that are no longer needed but
      that sheer paranoia prevented from being removed.  This commit removes
      them and provides a proof of correctness in their absence.  It also adds
      a memory barrier to rcu_report_qs_rsp() immediately before the update to
      rsp->completed in order to handle the theoretical possibility that the
      compiler or CPU might move massive quantities of code into a lock-based
      critical section.  This also proves that the sheer paranoia was not
      entirely unjustified, at least from a theoretical point of view.
      
      In addition, the old dyntick-idle synchronization depended on the fact
      that grace periods were many milliseconds in duration, so that it could
      be assumed that no dyntick-idle CPU could reorder a memory reference
      across an entire grace period.  Unfortunately for this design, the
      addition of expedited grace periods breaks this assumption, which has
      the unfortunate side-effect of requiring atomic operations in the
      functions that track dyntick-idle state for RCU.  (There is some hope
      that the algorithms used in user-level RCU might be applied here, but
      some work is required to handle the NMIs that user-space applications
      can happily ignore.  For the short term, better safe than sorry.)
      
      This proof assumes that neither compiler nor CPU will allow a lock
      acquisition and release to be reordered, as doing so can result in
      deadlock.  The proof is as follows:
      
      1.	A given CPU declares a quiescent state under the protection of
      	its leaf rcu_node's lock.
      
      2.	If there is more than one level of rcu_node hierarchy, the
      	last CPU to declare a quiescent state will also acquire the
      	->lock of the next rcu_node up in the hierarchy,  but only
      	after releasing the lower level's lock.  The acquisition of this
      	lock clearly cannot occur prior to the acquisition of the leaf
      	node's lock.
      
      3.	Step 2 repeats until we reach the root rcu_node structure.
      	Please note again that only one lock is held at a time through
      	this process.  The acquisition of the root rcu_node's ->lock
      	must occur after the release of that of the leaf rcu_node.
      
      4.	At this point, we set the ->completed field in the rcu_state
      	structure in rcu_report_qs_rsp().  However, if the rcu_node
      	hierarchy contains only one rcu_node, then in theory the code
      	preceding the quiescent state could leak into the critical
      	section.  We therefore precede the update of ->completed with a
      	memory barrier.  All CPUs will therefore agree that any updates
      	preceding any report of a quiescent state will have happened
      	before the update of ->completed.
      
      5.	Regardless of whether a new grace period is needed, rcu_start_gp()
      	will propagate the new value of ->completed to all of the leaf
      	rcu_node structures, under the protection of each rcu_node's ->lock.
      	If a new grace period is needed immediately, this propagation
      	will occur in the same critical section that ->completed was
      	set in, but courtesy of the memory barrier in #4 above, is still
      	seen to follow any pre-quiescent-state activity.
      
      6.	When a given CPU invokes __rcu_process_gp_end(), it becomes
      	aware of the end of the old grace period and therefore makes
      	any RCU callbacks that were waiting on that grace period eligible
      	for invocation.
      
      	If this CPU is the same one that detected the end of the grace
      	period, and if there is but a single rcu_node in the hierarchy,
      	we will still be in the single critical section.  In this case,
      	the memory barrier in step #4 guarantees that all callbacks will
      	be seen to execute after each CPU's quiescent state.
      
      	On the other hand, if this is a different CPU, it will acquire
      	the leaf rcu_node's ->lock, and will again be serialized after
      	each CPU's quiescent state for the old grace period.
      
      On the strength of this proof, this commit therefore removes the memory
      barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
      The effect is to reduce the number of memory barriers by one and to
      reduce the frequency of execution from about once per scheduling tick
      per CPU to once per grace period.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Remove conditional compilation for RCU CPU stall warnings · a00e0d71
      Committed by Paul E. McKenney
      The RCU CPU stall warnings can now be controlled using the
      rcu_cpu_stall_suppress boot-time parameter or via the same parameter
      from sysfs.  There is therefore no longer any reason to have
      kernel config parameters for this feature.  This commit therefore
      removes the RCU_CPU_STALL_DETECTOR and RCU_CPU_STALL_DETECTOR_RUNNABLE
      kernel config parameters.  The RCU_CPU_STALL_TIMEOUT parameter remains
      to allow the timeout to be tuned and the RCU_CPU_STALL_VERBOSE parameter
      remains to allow task-stall information to be suppressed if desired.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  10. 05 Mar, 2011: 1 commit
  11. 30 Nov, 2010: 2 commits
  12. 24 Sep, 2010: 1 commit
    • rcu: Add tracing data to support queueing models · 269dcc1c
      Committed by Paul E. McKenney
      The current tracing data is not sufficient to deduce the average time
      that a callback spends waiting for a grace period to end.  Add three
      per-CPU counters recording the number of callbacks invoked (ci), the
      number of callbacks orphaned (co), and the number of callbacks adopted
      (ca).  Given the existing callback queue length (ql), the average wait
      time in absence of CPU hotplug operations is ql/ci.  The units of wait
      time will be in terms of the duration over which ci was measured.
      
      In the presence of CPU hotplug operations, there is room for argument,
      but ql/(ci-co+ca) won't steer you too far wrong.
      
      Also fixes a typo called out by Lucas De Marchi <lucas.de.marchi@gmail.com>.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
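The ql/ci estimate is an instance of Little's law (average queue length equals arrival rate times average wait, so wait equals length divided by throughput). As a worked example: if a CPU holds ql=100 callbacks on average and invokes ci=400 callbacks during a one-second measurement window, the average wait is 100/400 = 0.25 s. The helper below is a hypothetical sketch of the commit's formula; rcu_avg_wait() is an illustrative name, and ci, co and ca are assumed to be the counts accumulated over the same window.

```c
#include <assert.h>

/*
 * Average grace-period wait per callback, in units of the window over
 * which ci, co and ca were counted: ql / (ci - co + ca).  With no CPU
 * hotplug, co == ca == 0 and this reduces to ql / ci.  Returns a negative
 * value when the effective throughput is not positive.
 */
static double rcu_avg_wait(long ql, long ci, long co, long ca)
{
	long throughput = ci - co + ca;

	assert(ql >= 0);
	if (throughput <= 0)
		return -1.0;
	return (double)ql / (double)throughput;
}
```

Subtracting orphaned callbacks and adding adopted ones corrects the throughput for callbacks that migrated between CPUs during hotplug, which matches the commit's "won't steer you too far wrong" hedge rather than an exact accounting.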
  13. 24 Aug, 2010: 1 commit
  14. 21 Aug, 2010: 1 commit