1. 17 November 2012 (1 commit)
    • rcu: Add callback-free CPUs · 3fbfbf7a
      Committed by Paul E. McKenney
      RCU callback execution can add significant OS jitter and also can
      degrade both scheduling latency and, in asymmetric multiprocessors,
      energy efficiency.  This commit therefore adds the ability for selected
      CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
      to kthreads.  If the "rcu_nocb_poll" boot parameter is also specified,
      these kthreads will do polling, removing the need for the offloaded
      CPUs to do wakeups.  At least one CPU must be doing normal callback
      processing: currently CPU 0 cannot be selected as a no-CBs CPU.
      In addition, attempts to offline the last normal-CBs CPU will fail.
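
      For example, to offload callbacks from CPUs 1-3 and have the offload
      kthreads poll rather than wait for wakeups, one could boot with
      something like the following (the CPU range here is illustrative):

          rcu_nocbs=1-3 rcu_nocb_poll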
      
      This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
      this commit includes fixes to problems located by Fengguang Wu's
      kbuild test robot.
      
      [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  2. 14 November 2012 (1 commit)
  3. 09 November 2012 (1 commit)
  4. 24 October 2012 (1 commit)
  5. 26 September 2012 (1 commit)
  6. 23 September 2012 (11 commits)
    • rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message · 86f343b5
      Committed by Paul E. McKenney
      The print_cpu_stall_fast_no_hz() function attempts to print -1 when
      the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
      to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
      18446744073709551615 on 64-bit systems.  Neither of these is a
      reader-friendly value, so this commit instead causes "timer not pending"
      to be printed when ->idle_gp_timer is not pending.
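
      A minimal standalone C illustration of the unsigned wraparound (not the
      kernel code itself; the variable name is made up):

          #include <stdio.h>

          int main(void)
          {
                  unsigned long timer_expiry = 0;

                  /* The intent is to print -1, but the operand is unsigned,
                   * so the subtraction wraps around to ULONG_MAX. */
                  printf("%lu\n", timer_expiry - 1);
                  return 0;
          }
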
      Reported-by: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Avoid rcu_print_detail_task_stall_rnp() segfault · 5fd4dc06
      Committed by Paul E. McKenney
      The rcu_print_detail_task_stall_rnp() function invokes
      rcu_preempt_blocked_readers_cgp() to verify that there are some preempted
      RCU readers blocking the current grace period, but it does so outside
      of the protection of the rcu_node structure's ->lock.  This means that the last blocked
      reader might exit its RCU read-side critical section and remove itself
      from the ->blkd_tasks list before the ->lock is acquired, resulting in
      a segmentation fault when the subsequent code attempts to dereference
      the now-NULL gp_tasks pointer.
      
      This commit therefore moves the test under the lock.  This will not
      have measurable effect on lock contention because this code is invoked
      only when printing RCU CPU stall warnings, in other words, in the common
      case, never.
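
      The shape of both the race and the fix, sketched as a runnable userspace
      analog (a pthread mutex stands in for the rcu_node ->lock and an int
      pointer for gp_tasks; all names here are illustrative, not the kernel's):

          #include <pthread.h>
          #include <stdio.h>

          static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
          static int *gp_tasks;  /* NULLed when the last reader removes itself */

          /* Racy: gp_tasks can become NULL between the unlocked check
           * and the later dereference. */
          static void print_stall_racy(void)
          {
                  if (gp_tasks == NULL)
                          return;
                  pthread_mutex_lock(&lock);
                  printf("blocked task: %d\n", *gp_tasks);  /* can segfault */
                  pthread_mutex_unlock(&lock);
          }

          /* Fixed: the check is made under the same lock that writers
           * hold while NULLing the pointer. */
          static void print_stall_safe(void)
          {
                  pthread_mutex_lock(&lock);
                  if (gp_tasks != NULL)
                          printf("blocked task: %d\n", *gp_tasks);
                  pthread_mutex_unlock(&lock);
          }

          int main(void)
          {
                  int t = 42;

                  gp_tasks = &t;
                  print_stall_racy();
                  print_stall_safe();
                  return 0;
          }
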
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Apply for_each_rcu_flavor() to increment_cpu_stall_ticks() · 115f7a7c
      Committed by Paul E. McKenney
      The increment_cpu_stall_ticks() function listed each RCU flavor
      explicitly, with an ifdef to handle preemptible RCU.  This commit
      therefore applies for_each_rcu_flavor() to save a line of code.
      
      Because this commit switches from a code-based enumeration of the
      flavors of RCU to an rcu_state-list-based enumeration, it is no longer
      possible to apply __get_cpu_var() to the per-CPU rcu_data structures.
      We instead use __this_cpu_ptr() on the rcu_state structure's ->rda field
      that references the corresponding rcu_data structures.
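
      A runnable userspace model of the list-based enumeration (the linkage
      and initialization here are illustrative; the kernel similarly chains
      its rcu_state structures and walks them with a macro):

          #include <stdio.h>

          struct rcu_state {
                  const char *name;
                  struct rcu_state *next;  /* flavor-list linkage */
          };

          static struct rcu_state rcu_bh_state = { "rcu_bh", NULL };
          static struct rcu_state rcu_sched_state = { "rcu_sched", &rcu_bh_state };
          static struct rcu_state *rcu_flavor_list = &rcu_sched_state;

          /* One loop covers every flavor, with no per-flavor #ifdefs. */
          #define for_each_rcu_flavor(rsp) \
                  for ((rsp) = rcu_flavor_list; (rsp); (rsp) = (rsp)->next)

          int main(void)
          {
                  struct rcu_state *rsp;

                  for_each_rcu_flavor(rsp)
                          printf("flavor: %s\n", rsp->name);
                  return 0;
          }
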
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix obsolete rcu_initiate_boost() header comment · b065a853
      Committed by Paul E. McKenney
      Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
      runqueue locks) made rcu_initiate_boost() restore irq state when releasing
      the rcu_node structure's ->lock, but failed to update the header comment
      accordingly.  This commit therefore brings the header comment up to date.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Improve boost selection when moving tasks to root rcu_node · 5cc900cf
      Committed by Paul E. McKenney
      The rcu_preempt_offline_tasks() function moves all tasks queued on a
      given leaf rcu_node structure to the root rcu_node, which is done when
      the last CPU corresponding to the leaf rcu_node structure goes offline.  Now that
      RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
      operations during the initialization of each rcu_node structure's
      ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
      of setting the root rcu_node's ->boost_tasks pointer.
      
      The key point is that rcu_preempt_offline_tasks() runs as part of the
      CPU-hotplug process, so that a concurrent synchronize_rcu_expedited()
      is guaranteed to either have not started on the one hand (in which case
      there is no boosting on behalf of the expedited grace period) or to be
      completely initialized on the other (in which case, in the absence of
      other priority boosting, all ->boost_tasks pointers will be initialized).
      Therefore, if rcu_preempt_offline_tasks() finds that the ->boost_tasks
      pointer is equal to the ->exp_tasks pointer, it can be sure that it is
      correctly placed.
      
      In the case where there was boosting ongoing at the time that the
      synchronize_rcu_expedited() function started, different nodes might start
      boosting the tasks blocking the expedited grace period at different times.
      In this mixed case, the root node will either be boosting tasks for
      the expedited grace period already, or it will start as soon as it gets
      done boosting for the normal grace period -- but in this latter case,
      the root node's tasks needed to be boosted in any case.
      
      This commit therefore adds a comparison of the ->boost_tasks pointer
      against the ->exp_tasks pointer to the list of conditions that prevent
      updating ->boost_tasks.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Properly initialize ->boost_tasks on CPU offline · 1e3fd2b3
      Committed by Paul E. McKenney
      When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
      structure, it does not NULL out the structure's ->boost_tasks field,
      leaving a stale pointer behind.  This commit therefore NULLs out
      ->boost_tasks once the tasks have been moved.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Simplify quiescent-state detection · d7d6a11e
      Committed by Paul E. McKenney
      The current quiescent-state detection algorithm is needlessly
      complex.  It records the grace-period number corresponding to
      the quiescent state at the time of the quiescent state, which
      works, but it seems better to simply erase any record of previous
      quiescent states at the time that the CPU notices the new grace
      period.  This has the further advantage of removing another piece
      of RCU for which lockless reasoning is required.
      
      Therefore, this commit makes this change.
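
      A runnable userspace model of the simplified approach (the field names
      echo RCU's per-CPU state, but this is a sketch, not the kernel code):

          #include <stdio.h>

          struct cpu_state {
                  unsigned long gpnum;  /* newest grace period this CPU has seen */
                  int passed_quiesce;   /* quiescent state seen in this GP? */
          };

          /* On noticing a new grace period, erase any record of earlier
           * quiescent states so that a stale one cannot be counted
           * against the new grace period. */
          static void note_new_gpnum(struct cpu_state *cs, unsigned long gpnum)
          {
                  if (cs->gpnum != gpnum) {
                          cs->gpnum = gpnum;
                          cs->passed_quiesce = 0;
                  }
          }

          int main(void)
          {
                  struct cpu_state cs = { .gpnum = 41, .passed_quiesce = 1 };

                  note_new_gpnum(&cs, 42);
                  printf("gp %lu, passed_quiesce %d\n", cs.gpnum, cs.passed_quiesce);
                  return 0;
          }
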
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Reduce synchronize_rcu_expedited() latency · 1943c89d
      Committed by Paul E. McKenney
      The synchronize_rcu_expedited() function disables interrupts across a
      scan of all leaf rcu_node structures, which is not good for real-time
      scheduling latency on large systems (hundreds or especially thousands
      of CPUs).  This commit therefore holds off CPU-hotplug operations using
      get_online_cpus(), and removes the prior acquisition of the ->onofflock
      (which required disabling interrupts).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Eliminate signed overflow in synchronize_rcu_expedited() · bcfa57ce
      Committed by Paul E. McKenney
      In the C language, signed overflow is undefined.  It is true that
      twos-complement arithmetic normally comes to the rescue, but the compiler
      can subvert this any time it has any information about the values
      being compared.  For example, given "if (a - b > 0)", if the compiler
      has enough information to realize that (for example) the value of "a"
      is positive and that of "b" is negative, the compiler is within its
      rights to optimize this to a simple "if (1)", which might not be what you want.
      
      This commit therefore converts synchronize_rcu_expedited()'s work-done
      detection counter from signed to unsigned.
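
      A runnable illustration of why the unsigned form is safe, using a
      comparison macro modeled on the kernel's ULONG_CMP_GE():

          #include <limits.h>
          #include <stdio.h>

          /* Wraparound-safe "a is at or after b" for unsigned counters. */
          #define ULONG_CMP_GE(a, b)  (ULONG_MAX / 2 >= (a) - (b))

          int main(void)
          {
                  unsigned long snap = ULONG_MAX - 1;  /* just before wrapping */
                  unsigned long now = snap + 3;        /* wrapped past zero */

                  /* Unsigned subtraction is defined modulo 2^N, so the
                   * comparison still answers correctly across the wrap.
                   * Prints 1. */
                  printf("%d\n", ULONG_CMP_GE(now, snap));
                  return 0;
          }
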
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
    • rcu: Move quiescent-state forcing into kthread · 4cdfc175
      Committed by Paul E. McKenney
      As the first step towards allowing quiescent-state forcing to be
      preemptible, this commit moves RCU quiescent-state forcing into the
      same kthread that is now used to initialize and clean up after grace
      periods.  This is yet another step towards keeping scheduling
      latency down to a dull roar.
      
      Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq()
      and to remove the now-unused rcu_state structure fields as suggested by
      Peter Zijlstra.
      Reported-by: Mike Galbraith <mgalbraith@suse.de>
      Reported-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Provide OOM handler to motivate lazy RCU callbacks · b626c1b6
      Committed by Paul E. McKenney
      In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
      large number of lazy callbacks, which as the name implies will be slow
      to be invoked.  This can be a problem on small-memory systems, where the
      default 6-second sleep for CPUs having only lazy RCU callbacks could well
      be fatal.  This commit therefore installs an OOM handler that ensures that
      every CPU with lazy callbacks has at least one non-lazy callback, in turn
      ensuring timely advancement for these callbacks.
      
      Updated to fix bug that disabled OOM killing, noted by Lai Jiangshan.
      
      Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(),
      thus reducing the number of IPIs, as suggested by Steven Rostedt.  Also
      to make the for_each_online_cpu() loop be preemptible.  (Later, it might
      be good to use smp_call_function(), as suggested by Peter Zijlstra.)
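
      A kernel-style sketch of the registration pattern (register_oom_notifier()
      is the stock OOM notifier hook; the callback body is elided and the
      details here are illustrative rather than a copy of the commit):

          static int rcu_oom_notify(struct notifier_block *self,
                                    unsigned long notused, void *nfreed)
          {
                  /* Post a non-lazy callback on each CPU that has only
                   * lazy callbacks, then let OOM handling proceed. */
                  return NOTIFY_OK;
          }

          static struct notifier_block rcu_oom_nb = {
                  .notifier_call = rcu_oom_notify
          };

          /* Typically done from an initcall. */
          register_oom_notifier(&rcu_oom_nb);
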
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Sasha Levin <levinsasha928@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  7. 13 August 2012 (2 commits)
    • rcu: Use smp_hotplug_thread facility for RCUs per-CPU kthread · 62ab7072
      Committed by Paul E. McKenney
      Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
      kthread code to use the new smp_hotplug_thread facility.
      
      [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
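
      A kernel-style sketch of registering with the facility (the callback
      names are illustrative placeholders, not a copy of the commit):

          static struct smp_hotplug_thread rcu_cpu_thread_spec = {
                  .store             = &rcu_cpu_kthread_task,
                  .thread_should_run = rcu_cpu_kthread_should_run,
                  .thread_fn         = rcu_cpu_kthread,
                  .thread_comm       = "rcuc/%u",
          };

          /* One call replaces RCU's open-coded per-CPU kthread hotplug
           * handling; the facility parks and unparks the threads across
           * CPU-hotplug operations. */
          smpboot_register_percpu_thread(&rcu_cpu_thread_spec);
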
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • rcu: Yield simpler · 5d01bbd1
      Committed by Thomas Gleixner
      The rcu_yield() code is amazing. It's there to avoid starvation of the
      system when lots of (boosting) work is to be done.
      
      Now looking at the code, its functionality is:
      
       Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
       Arm a timer with 2 ticks
       schedule()
      
      Now if the system goes idle the rcu task returns, regains SCHED_FIFO
      and plugs on. If the system stays busy the timer fires and wakes a
      per node kthread which in turn makes the per cpu thread SCHED_FIFO and
      brings it back on the cpu. For the boosting thread the "make it FIFO"
      bit is missing and it just runs some magic boost checks. Now this is a
      lot of code with extra threads and complexity.
      
      It's way simpler to let the tasks schedule away for 2 ticks when they
      detect overload, and to defer the normal wakeup as long as they are in
      yielded state and the cpu is not idle.
      
      That solves the same problem and the only difference is that when the
      cpu goes idle it's not guaranteed that the thread returns right away,
      but it won't be out longer than two ticks, so no harm is done. If
      that's an issue then it is way simpler just to wake the task from
      idle as RCU has callbacks there anyway.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  8. 06 July 2012 (1 commit)
  9. 03 July 2012 (11 commits)
  10. 07 June 2012 (3 commits)
  11. 10 May 2012 (2 commits)
    • rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables · 98248a0e
      Committed by Paul E. McKenney
      The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
      needless and fragile assumptions about the initial value of things like
      the jiffies counter.  This commit therefore explicitly initializes all of
      them that are better started with a non-zero value.  It also adds some
      comments describing the per-CPU state variables.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make RCU_FAST_NO_HZ handle timer migration · 21e52e15
      Committed by Paul E. McKenney
      The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
      CPU goes offline, in which case it assumes that the CPU will have to come
      out of dyntick-idle mode (cancelling the timer) in order to go offline.
      This is important because when RCU_FAST_NO_HZ permits a CPU to enter
      dyntick-idle mode despite having RCU callbacks pending, it posts a timer
      on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
      CPU will eventually handle the end of the grace period, including invoking
      its RCU callbacks.
      
      However, Pascal Chapperon's test setup shows that the timer handler
      rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
      problematic because this can cause the CPU that entered dyntick-idle
      mode despite still having RCU callbacks pending to remain in
      dyntick-idle mode indefinitely, which means that its RCU callbacks might
      never be invoked.  This situation can result in grace-period delays or
      even system hangs, which matches Pascal's observations of slow boot-up
      and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=806548
      
      This commit therefore causes the "should never be invoked" timer handler
      rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
      the CPU for which the timer was intended, allowing that CPU to invoke
      its RCU callbacks in a timely manner.
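
      The resulting handler shape, as a kernel-style sketch (the wakeup
      callback rcu_do_wakeup() is a hypothetical stand-in for the real
      function; smp_call_function_single() is the stock cross-CPU call):

          /* The timer migrated and fired on the wrong CPU: IPI the CPU
           * the timer was posted for so that it wakes up and processes
           * its own RCU callbacks. */
          static void rcu_idle_gp_timer_func(unsigned long cpu_in)
          {
                  int cpu = (int)cpu_in;

                  if (cpu != smp_processor_id())
                          smp_call_function_single(cpu, rcu_do_wakeup, NULL, 0);
          }
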
      Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  12. 03 May 2012 (2 commits)
  13. 01 May 2012 (1 commit)
  14. 26 April 2012 (1 commit)
    • rcu: Add warning for RCU_FAST_NO_HZ timer firing · 79b9a75f
      Committed by Paul E. McKenney
      RCU_FAST_NO_HZ uses a timer to limit the time that a CPU with callbacks
      can remain in dyntick-idle mode.  This timer is cancelled when the CPU
      exits idle, and therefore should never fire.  However, if the timer
      were migrated to some other CPU for whatever reason (1) the timer could
      actually fire and (2) firing on some other CPU would fail to wake up the
      CPU with callbacks, possibly resulting in sluggishness or a system hang.
      
      This commit therefore adds a WARN_ON_ONCE() to the timer handler in order
      to detect this condition.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  15. 25 April 2012 (1 commit)
    • rcu: Make RCU_FAST_NO_HZ account for pauses out of idle · c57afe80
      Committed by Paul E. McKenney
      Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
      macro can cause RCU to momentarily pause out of idle without the rest
      of the system being involved.  This can cause rcu_prepare_for_idle()
      to run through its state machine too quickly, which can in turn result
      in needless scheduling-clock interrupts.
      
      This commit therefore adds code to enable rcu_prepare_for_idle() to
      distinguish between an initial entry to idle on the one hand (which needs
      to advance the rcu_prepare_for_idle() state machine) and an idle reentry
      due to idle-capable trace macros and RCU_NONIDLE() on the other hand
      (which should avoid advancing the rcu_prepare_for_idle() state machine).
      Additional state is maintained to allow the timer to be correctly reposted
      when returning after a momentary pause out of idle, and even more state
      is maintained to detect when new non-lazy callbacks have been enqueued
      (which may require re-evaluation of the approach to idleness).
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>