1. 19 4月, 2013 1 次提交
    • F
      nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Frederic Weisbecker 提交于
      We need full dynticks CPU to also be RCU nocb so
      that we don't have to keep the tick to handle RCU
      callbacks.
      
      Make sure the range passed to nohz_full= boot
      parameter is a subset of rcu_nocbs=
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d1e43fa5
  2. 16 4月, 2013 1 次提交
    • P
      rcu: Kick adaptive-ticks CPUs that are holding up RCU grace periods · 65d798f0
      Paul E. McKenney 提交于
      Adaptive-ticks CPUs inform RCU when they enter kernel mode, but they do
      not necessarily turn the scheduler-clock tick back on.  This state of
      affairs could result in RCU waiting on an adaptive-ticks CPU running
      for an extended period in kernel mode.  Such a CPU will never run the
      RCU state machine, and could therefore indefinitely extend the RCU state
      machine, sooner or later resulting in an OOM condition.
      
      This patch, inspired by an earlier patch by Frederic Weisbecker, therefore
      causes RCU's force-quiescent-state processing to check for this condition
      and to send an IPI to CPUs that remain in that state for too long.
      "Too long" currently means about three jiffies by default, which is
      quite some time for a CPU to remain in the kernel without blocking.
      The rcu_tree.jiffies_till_first_fqs and rcutree.jiffies_till_next_fqs
      sysfs variables may be used to tune "too long" if needed.
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      65d798f0
  3. 09 1月, 2013 2 次提交
    • P
      rcu: Make rcu_nocb_poll an early_param instead of module_param · 1b0048a4
      Paul Gortmaker 提交于
      The as-documented rcu_nocb_poll will fail to enable this feature
      for two reasons.  (1) there is an extra "s" in the documented
      name which is not in the code, and (2) since it uses module_param,
      it really is expecting a prefix, akin to "rcutree.fanout_leaf"
      and the prefix isn't documented.
      
      However, there are several reasons why we might not want to
      simply fix the typo and add the prefix:
      
      1) we'd end up with rcutree.rcu_nocb_poll, and rather probably make
      a change to rcutree.nocb_poll
      
      2) if we did #1, then the prefix wouldn't be consistent with the
      rcu_nocbs=<cpumap> parameter (i.e. one with, one without prefix)
      
      3) the use of module_param in a header file is less than desired,
      since it isn't immediately obvious that it will get processed
      via rcutree.c and get the prefix from that (although use of
      module_param_named() could clarify that.)
      
      4) the implied export of /sys/module/rcutree/parameters/rcu_nocb_poll
      data to userspace via module_param() doesn't really buy us anything,
      as it is read-only and we can tell if it is enabled already without
      it, since there is a printk at early boot telling us so.
      
      In light of all that, just change it from a module_param() to an
      early_setup() call, and worry about adding it to /sys later on if
      we decide to allow a dynamic setting of it.
      
      Also change the variable to be tagged as read_mostly, since it
      will only ever be fiddled with at most, once at boot.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1b0048a4
    • P
      rcu: Prevent soft-lockup complaints about no-CBs CPUs · 353af9c9
      Paul Gortmaker 提交于
      The wait_event() at the head of the rcu_nocb_kthread() can result in
      soft-lockup complaints if the CPU in question does not register RCU
      callbacks for an extended period.  This commit therefore changes
      the wait_event() to a wait_event_interruptible().
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      353af9c9
  4. 17 11月, 2012 2 次提交
    • P
      rcu: Separate accounting of callbacks from callback-free CPUs · c635a4e1
      Paul E. McKenney 提交于
      Currently, callback invocations from callback-free CPUs are accounted to
      the CPU that registered the callback, but using the same field that is
      used for normal callbacks.  This makes it impossible to determine from
      debugfs output whether callbacks are in fact being diverted.  This commit
      therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
      in which diverted callback invocations are counted.  RCU's debugfs tracing
      still displays normal callback invocations using ci=, but displayed
      diverted callbacks with nci=.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c635a4e1
    • P
      rcu: Add callback-free CPUs · 3fbfbf7a
      Paul E. McKenney 提交于
      RCU callback execution can add significant OS jitter and also can
      degrade both scheduling latency and, in asymmetric multiprocessors,
      energy efficiency.  This commit therefore adds the ability for selected
      CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
      to kthreads.  If the "rcu_nocb_poll" boot parameter is also specified,
      these kthreads will do polling, removing the need for the offloaded
      CPUs to do wakeups.  At least one CPU must be doing normal callback
      processing: currently CPU 0 cannot be selected as a no-CBs CPU.
      In addition, attempts to offline the last normal-CBs CPU will fail.
      
      This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
      this commit includes fixes to problems located by Fengguang Wu's
      kbuild test robot.
      
      [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3fbfbf7a
  5. 14 11月, 2012 1 次提交
  6. 09 11月, 2012 1 次提交
  7. 24 10月, 2012 1 次提交
  8. 26 9月, 2012 1 次提交
  9. 23 9月, 2012 11 次提交
    • P
      rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message · 86f343b5
      Paul E. McKenney 提交于
      The print_cpu_stall_fast_no_hz() function attempts to print -1 when
      the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
      to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
      18446744073709551615 on 64-bit systems.  Neither of these are the most
      reader-friendly values, so this commit instead causes "timer not pending"
      to be printed when ->idle_gp_timer is not pending.
      Reported-by: NPaul Walmsley <paul@pwsan.com>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      86f343b5
    • P
      rcu: Avoid rcu_print_detail_task_stall_rnp() segfault · 5fd4dc06
      Paul E. McKenney 提交于
      The rcu_print_detail_task_stall_rnp() function invokes
      rcu_preempt_blocked_readers_cgp() to verify that there are some preempted
      RCU readers blocking the current grace period outside of the protection
      of the rcu_node structure's ->lock.  This means that the last blocked
      reader might exit its RCU read-side critical section and remove itself
      from the ->blkd_tasks list before the ->lock is acquired, resulting in
      a segmentation fault when the subsequent code attempts to dereference
      the now-NULL gp_tasks pointer.
      
      This commit therefore moves the test under the lock.  This will not
      have measurable effect on lock contention because this code is invoked
      only when printing RCU CPU stall warnings, in other words, in the common
      case, never.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      5fd4dc06
    • P
      rcu: Apply for_each_rcu_flavor() to increment_cpu_stall_ticks() · 115f7a7c
      Paul E. McKenney 提交于
      The increment_cpu_stall_ticks() function listed each RCU flavor
      explicitly, with an ifdef to handle preemptible RCU.  This commit
      therefore applies for_each_rcu_flavor() to save a line of code.
      
      Because this commit switches from a code-based enumeration of the
      flavors of RCU to an rcu_state-list-based enumeration, it is no longer
      possible to apply __get_cpu_var() to the per-CPU rcu_data structures.
      We instead use __this_cpu_var() on the rcu_state structure's ->rda field
      that references the corresponding rcu_data structures.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      115f7a7c
    • P
      rcu: Fix obsolete rcu_initiate_boost() header comment · b065a853
      Paul E. McKenney 提交于
      Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
      runqueue locks) made rcu_initiate_boost() restore irq state when releasing
      the rcu_node structure's ->lock, but failed to update the header comment
      accordingly.  This commit therefore brings the header comment up to date.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      b065a853
    • P
      rcu: Improve boost selection when moving tasks to root rcu_node · 5cc900cf
      Paul E. McKenney 提交于
      The rcu_preempt_offline_tasks() moves all tasks queued on a given leaf
      rcu_node structure to the root rcu_node, which is done when the last CPU
      corresponding the the leaf rcu_node structure goes offline.  Now that
      RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
      operations during the initialization of each rcu_node structure's
      ->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
      of setting the root rcu_node's ->boost_tasks pointer.
      
      The key point is that rcu_preempt_offline_tasks() runs as part of the
      CPU-hotplug process, so that a concurrent synchronize_rcu_expedited()
      is guaranteed to either have not started on the one hand (in which case
      there is no boosting on behalf of the expedited grace period) or to be
      completely initialized on the other (in which case, in the absence of
      other priority boosting, all ->boost_tasks pointers will be initialized).
      Therefore, if rcu_preempt_offline_tasks() finds that the ->boost_tasks
      pointer is equal to the ->exp_tasks pointer, it can be sure that it is
      correctly placed.
      
      In the case where there was boosting ongoing at the time that the
      synchronize_rcu_expedited() function started, different nodes might start
      boosting the tasks blocking the expedited grace period at different times.
      In this mixed case, the root node will either be boosting tasks for
      the expedited grace period already, or it will start as soon as it gets
      done boosting for the normal grace period -- but in this latter case,
      the root node's tasks needed to be boosted in any case.
      
      This commit therefore adds a check of the ->boost_tasks pointer against
      the ->exp_tasks pointer to the list that prevents updating ->boost_tasks.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      5cc900cf
    • P
      rcu: Properly initialize ->boost_tasks on CPU offline · 1e3fd2b3
      Paul E. McKenney 提交于
      When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
      structure, it does not NULL out the structure's ->boost_tasks field.
      This commit therefore fixes this issue.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      1e3fd2b3
    • P
      rcu: Simplify quiescent-state detection · d7d6a11e
      Paul E. McKenney 提交于
      The current quiescent-state detection algorithm is needlessly
      complex.  It records the grace-period number corresponding to
      the quiescent state at the time of the quiescent state, which
      works, but it seems better to simply erase any record of previous
      quiescent states at the time that the CPU notices the new grace
      period.  This has the further advantage of removing another piece
      of RCU for which lockless reasoning is required.
      
      Therefore, this commit makes this change.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      d7d6a11e
    • P
      rcu: Reduce synchronize_rcu_expedited() latency · 1943c89d
      Paul E. McKenney 提交于
      The synchronize_rcu_expedited() function disables interrupts across a
      scan of all leaf rcu_node structures, which is not good for real-time
      scheduling latency on large systems (hundreds or especially thousands
      of CPUs).  This commit therefore holds off CPU-hotplug operations using
      get_online_cpus(), and removes the prior acquisiion of the ->onofflock
      (which required disabling interrupts).
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      1943c89d
    • P
      rcu: Eliminate signed overflow in synchronize_rcu_expedited() · bcfa57ce
      Paul E. McKenney 提交于
      In the C language, signed overflow is undefined.  It is true that
      twos-complement arithmetic normally comes to the rescue, but if the
      compiler can subvert this any time it has any information about the values
      being compared.  For example, given "if (a - b > 0)", if the compiler
      has enough information to realize that (for example) the value of "a"
      is positive and that of "b" is negative, the compiler is within its
      rights to optimize to a simple "if (1)", which might not be what you want.
      
      This commit therefore converts synchronize_rcu_expedited()'s work-done
      detection counter from signed to unsigned.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      bcfa57ce
    • P
      rcu: Move quiescent-state forcing into kthread · 4cdfc175
      Paul E. McKenney 提交于
      As the first step towards allowing quiescent-state forcing to be
      preemptible, this commit moves RCU quiescent-state forcing into the
      same kthread that is now used to initialize and clean up after grace
      periods.  This is yet another step towards keeping scheduling
      latency down to a dull roar.
      
      Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq()
      and to remove the now-unused rcu_state structure fields as suggested by
      Peter Zijlstra.
      Reported-by: NMike Galbraith <mgalbraith@suse.de>
      Reported-by: NDimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4cdfc175
    • P
      rcu: Provide OOM handler to motivate lazy RCU callbacks · b626c1b6
      Paul E. McKenney 提交于
      In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
      large number of lazy callbacks, which as the name implies will be slow
      to be invoked.  This can be a problem on small-memory systems, where the
      default 6-second sleep for CPUs having only lazy RCU callbacks could well
      be fatal.  This commit therefore installs an OOM hander that ensures that
      every CPU with lazy callbacks has at least one non-lazy callback, in turn
      ensuring timely advancement for these callbacks.
      
      Updated to fix bug that disabled OOM killing, noted by Lai Jiangshan.
      
      Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(),
      thus reducing the number of IPIs, as suggested by Steven Rostedt.  Also
      to make the for_each_online_cpu() loop be preemptible.  (Later, it might
      be good to use smp_call_function(), as suggested by Peter Zijlstra.)
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NSasha Levin <levinsasha928@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      b626c1b6
  10. 13 8月, 2012 2 次提交
    • P
      rcu: Use smp_hotplug_thread facility for RCUs per-CPU kthread · 62ab7072
      Paul E. McKenney 提交于
      Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
      kthread code to use the new smp_hotplug_thread facility.
      
      [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      62ab7072
    • T
      rcu: Yield simpler · 5d01bbd1
      Thomas Gleixner 提交于
      The rcu_yield() code is amazing. It's there to avoid starvation of the
      system when lots of (boosting) work is to be done.
      
      Now looking at the code it's functionality is:
      
       Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
       Arm a timer with 2 ticks
       schedule()
      
      Now if the system goes idle the rcu task returns, regains SCHED_FIFO
      and plugs on. If the systems stays busy the timer fires and wakes a
      per node kthread which in turn makes the per cpu thread SCHED_FIFO and
      brings it back on the cpu. For the boosting thread the "make it FIFO"
      bit is missing and it just runs some magic boost checks. Now this is a
      lot of code with extra threads and complexity.
      
      It's way simpler to let the tasks when they detect overload schedule
      away for 2 ticks and defer the normal wakeup as long as they are in
      yielded state and the cpu is not idle.
      
      That solves the same problem and the only difference is that when the
      cpu goes idle it's not guaranteed that the thread returns right away,
      but it won't be longer out than two ticks, so no harm is done. If
      that's an issue than it is way simpler just to wake the task from
      idle as RCU has callbacks there anyway.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5d01bbd1
  11. 06 7月, 2012 1 次提交
  12. 03 7月, 2012 11 次提交
  13. 07 6月, 2012 3 次提交
  14. 10 5月, 2012 2 次提交
    • P
      rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables · 98248a0e
      Paul E. McKenney 提交于
      The current initialization of the RCU_FAST_NO_HZ per-CPU variables makes
      needless and fragile assumptions about the initial value of things like
      the jiffies counter.  This commit therefore explicitly initializes all of
      them that are better started with a non-zero value.  It also adds some
      comments describing the per-CPU state variables.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      98248a0e
    • P
      rcu: Make RCU_FAST_NO_HZ handle timer migration · 21e52e15
      Paul E. McKenney 提交于
      The current RCU_FAST_NO_HZ assumes that timers do not migrate unless a
      CPU goes offline, in which case it assumes that the CPU will have to come
      out of dyntick-idle mode (cancelling the timer) in order to go offline.
      This is important because when RCU_FAST_NO_HZ permits a CPU to enter
      dyntick-idle mode despite having RCU callbacks pending, it posts a timer
      on that CPU to force a wakeup on that CPU.  This wakeup ensures that the
      CPU will eventually handle the end of the grace period, including invoking
      its RCU callbacks.
      
      However, Pascal Chapperon's test setup shows that the timer handler
      rcu_idle_gp_timer_func() really does get invoked in some cases.  This is
      problematic because this can cause the CPU that entered dyntick-idle
      mode despite still having RCU callbacks pending to remain in
      dyntick-idle mode indefinitely, which means that its RCU callbacks might
      never be invoked.  This situation can result in grace-period delays or
      even system hangs, which matches Pascal's observations of slow boot-up
      and shutdown (https://lkml.org/lkml/2012/4/5/142).  See also the bugzilla:
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=806548
      
      This commit therefore causes the "should never be invoked" timer handler
      rcu_idle_gp_timer_func() to use smp_call_function_single() to wake up
      the CPU for which the timer was intended, allowing that CPU to invoke
      its RCU callbacks in a timely manner.
      Reported-by: NPascal Chapperon <pascal.chapperon@wanadoo.fr>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      21e52e15