1. 13 3月, 2013 1 次提交
    • P
      rcu: Remove restrictions on no-CBs CPUs · 34ed6246
      Paul E. McKenney 提交于
      Currently, CPU 0 is constrained to not be a no-CBs CPU, and furthermore
      at least one no-CBs CPU must remain online at any given time.  These
      restrictions are problematic in some situations, such as cases where
      all CPUs must run a real-time workload that needs to be insulated from
      OS jitter and latencies due to RCU callback invocation.  This commit
      therefore provides no-CBs CPUs a (very crude and energy-inefficient)
      way to start and to wait for grace periods independently of the normal
      RCU callback mechanisms.  This approach allows any or all of the CPUs to
      be designated as no-CBs CPUs, and allows any proper subset of the CPUs
      (whether no-CBs CPUs or not) to be offlined.
      
      This commit also provides a fix for a locking bug spotted by Xie
      ChanglongX <changlongx.xie@intel.com>.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      34ed6246
  2. 29 1月, 2013 1 次提交
  3. 27 1月, 2013 1 次提交
  4. 09 1月, 2013 1 次提交
    • P
      rcu: Tag callback lists with corresponding grace-period number · dc35c893
      Paul E. McKenney 提交于
      Currently, callbacks are advanced each time the corresponding CPU
      notices a change in its leaf rcu_node structure's ->completed value
      (this value counts grace-period completions).  This approach has worked
      quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on
      a given CPU seeing all the grace-period completions.  When a CPU misses
      a grace-period completion that occurs while it is in dyntick-idle mode,
      this will delay invocation of its callbacks.
      
      In addition, acceleration of callbacks (when RCU realizes that a given
      callback need only wait until the end of the next grace period, rather
      than having to wait for a partial grace period followed by a full
      grace period) must be carried out extremely carefully.  Insufficient
      acceleration will result in unnecessarily long grace-period latencies,
      while excessive acceleration will result in premature callback invocation.
      Changes that involve this tradeoff are therefore among the most
      nerve-wracking changes to RCU.
      
      This commit therefore explicitly tags groups of callbacks with the
      number of the grace period that they are waiting for.  This means that
      callback-advancement and callback-acceleration functions are idempotent,
      so that excessive acceleration will merely waste a few CPU cycles.  This
      also allows a CPU to take full advantage of any grace periods that have
      elapsed while it has been in dyntick-idle mode.  It should also enable
      simulataneous simplifications to and optimizations of RCU_FAST_NO_HZ.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      dc35c893
  5. 17 11月, 2012 2 次提交
    • P
      rcu: Separate accounting of callbacks from callback-free CPUs · c635a4e1
      Paul E. McKenney 提交于
      Currently, callback invocations from callback-free CPUs are accounted to
      the CPU that registered the callback, but using the same field that is
      used for normal callbacks.  This makes it impossible to determine from
      debugfs output whether callbacks are in fact being diverted.  This commit
      therefore adds a separate ->n_nocbs_invoked field in the rcu_data structure
      in which diverted callback invocations are counted.  RCU's debugfs tracing
      still displays normal callback invocations using ci=, but displayed
      diverted callbacks with nci=.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c635a4e1
    • P
      rcu: Add callback-free CPUs · 3fbfbf7a
      Paul E. McKenney 提交于
      RCU callback execution can add significant OS jitter and also can
      degrade both scheduling latency and, in asymmetric multiprocessors,
      energy efficiency.  This commit therefore adds the ability for selected
      CPUs ("rcu_nocbs=" boot parameter) to have their callbacks offloaded
      to kthreads.  If the "rcu_nocb_poll" boot parameter is also specified,
      these kthreads will do polling, removing the need for the offloaded
      CPUs to do wakeups.  At least one CPU must be doing normal callback
      processing: currently CPU 0 cannot be selected as a no-CBs CPU.
      In addition, attempts to offline the last normal-CBs CPU will fail.
      
      This feature was inspired by Jim Houston's and Joe Korty's JRCU, and
      this commit includes fixes to problems located by Fengguang Wu's
      kbuild test robot.
      
      [ paulmck: Added gfp.h include file as suggested by Fengguang Wu. ]
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3fbfbf7a
  6. 09 11月, 2012 3 次提交
  7. 09 10月, 2012 1 次提交
    • P
      rcu: Grace-period initialization excludes only RCU notifier · a4fbe35a
      Paul E. McKenney 提交于
      Kirill noted the following deadlock cycle on shutdown involving padata:
      
      > With commit 755609a9 I've got deadlock on
      > poweroff.
      >
      > It guess it happens because of race for cpu_hotplug.lock:
      >
      >       CPU A                                   CPU B
      > disable_nonboot_cpus()
      > _cpu_down()
      > cpu_hotplug_begin()
      >  mutex_lock(&cpu_hotplug.lock);
      > __cpu_notify()
      > padata_cpu_callback()
      > __padata_remove_cpu()
      > padata_replace()
      > synchronize_rcu()
      >                                       rcu_gp_kthread()
      >                                       get_online_cpus();
      >                                       mutex_lock(&cpu_hotplug.lock);
      
      It would of course be good to eliminate grace-period delays from
      CPU-hotplug notifiers, but that is a separate issue.  Deadlock is
      not an appropriate diagnostic for excessive CPU-hotplug latency.
      
      Fortunately, grace-period initialization does not actually need to
      exclude all of the CPU-hotplug operation, but rather only RCU's own
      CPU_UP_PREPARE and CPU_DEAD CPU-hotplug notifiers.  This commit therefore
      introduces a new per-rcu_state onoff_mutex that provides the required
      concurrency control in place of the get_online_cpus() that was previously
      in rcu_gp_init().
      Reported-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NKirill A. Shutemov <kirill@shutemov.name>
      a4fbe35a
  8. 26 9月, 2012 2 次提交
    • F
      rcu: Ignore userspace extended quiescent state by default · 1e1a689f
      Frederic Weisbecker 提交于
      By default we don't want to enter into RCU extended quiescent
      state while in userspace because doing this produces some overhead
      (eg: use of syscall slowpath). Set it off by default and ready to
      run when some feature like adaptive tickless need it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      1e1a689f
    • F
      rcu: Allow rcu_user_enter()/exit() to nest · c5d900bf
      Frederic Weisbecker 提交于
      Allow calls to rcu_user_enter() even if we are already
      in userspace (as seen by RCU) and allow calls to rcu_user_exit()
      even if we are already in the kernel.
      
      This makes the APIs more flexible to be called from architectures.
      Exception entries for example won't need to know if they come from
      userspace before calling rcu_user_exit().
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      c5d900bf
  9. 23 9月, 2012 8 次提交
  10. 13 8月, 2012 2 次提交
    • P
      rcu: Use smp_hotplug_thread facility for RCUs per-CPU kthread · 62ab7072
      Paul E. McKenney 提交于
      Bring RCU into the new-age CPU-hotplug fold by modifying RCU's per-CPU
      kthread code to use the new smp_hotplug_thread facility.
      
      [ tglx: Adapted it to use callbacks and to the simplified rcu yield ]
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/20120716103948.673354828@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      62ab7072
    • T
      rcu: Yield simpler · 5d01bbd1
      Thomas Gleixner 提交于
      The rcu_yield() code is amazing. It's there to avoid starvation of the
      system when lots of (boosting) work is to be done.
      
      Now looking at the code it's functionality is:
      
       Make the thread SCHED_OTHER and very nice, i.e. get it out of the way
       Arm a timer with 2 ticks
       schedule()
      
      Now if the system goes idle the rcu task returns, regains SCHED_FIFO
      and plugs on. If the systems stays busy the timer fires and wakes a
      per node kthread which in turn makes the per cpu thread SCHED_FIFO and
      brings it back on the cpu. For the boosting thread the "make it FIFO"
      bit is missing and it just runs some magic boost checks. Now this is a
      lot of code with extra threads and complexity.
      
      It's way simpler to let the tasks when they detect overload schedule
      away for 2 ticks and defer the normal wakeup as long as they are in
      yielded state and the cpu is not idle.
      
      That solves the same problem and the only difference is that when the
      cpu goes idle it's not guaranteed that the thread returns right away,
      but it won't be longer out than two ticks, so no harm is done. If
      that's an issue than it is way simpler just to wake the task from
      idle as RCU has callbacks there anyway.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20120716103948.131256723@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5d01bbd1
  11. 03 7月, 2012 10 次提交
  12. 07 6月, 2012 1 次提交
  13. 10 5月, 2012 1 次提交
    • P
      rcu: Make rcu_barrier() less disruptive · b1420f1c
      Paul E. McKenney 提交于
      The rcu_barrier() primitive interrupts each and every CPU, registering
      a callback on every CPU.  Once all of these callbacks have been invoked,
      rcu_barrier() knows that every callback that was registered before
      the call to rcu_barrier() has also been invoked.
      
      However, there is no point in registering a callback on a CPU that
      currently has no callbacks, most especially if that CPU is in a
      deep idle state.  This commit therefore makes rcu_barrier() avoid
      interrupting CPUs that have no callbacks.  Doing this requires reworking
      the handling of orphaned callbacks, otherwise callbacks could slip through
      rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
      not yet interrupted to a CPU that rcu_barrier() had already interrupted.
      This reworking was needed anyway to take a first step towards weaning
      RCU from the CPU_DYING notifier's use of stop_cpu().
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      b1420f1c
  14. 03 5月, 2012 1 次提交
  15. 25 4月, 2012 2 次提交
    • P
      rcu: Make RCU_FAST_NO_HZ account for pauses out of idle · c57afe80
      Paul E. McKenney 提交于
      Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
      macro can cause RCU to momentarily pause out of idle without the rest
      of the system being involved.  This can cause rcu_prepare_for_idle()
      to run through its state machine too quickly, which can in turn result
      in needless scheduling-clock interrupts.
      
      This commit therefore adds code to enable rcu_prepare_for_idle() to
      distinguish between an initial entry to idle on the one hand (which needs
      to advance the rcu_prepare_for_idle() state machine) and an idle reentry
      due to idle-capable trace macros and RCU_NONIDLE() on the other hand
      (which should avoid advancing the rcu_prepare_for_idle() state machine).
      Additional state is maintained to allow the timer to be correctly reposted
      when returning after a momentary pause out of idle, and even more state
      is maintained to detect when new non-lazy callbacks have been enqueued
      (which may require re-evaluation of the approach to idleness).
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c57afe80
    • P
      rcu: Reduce cache-miss initialization latencies for large systems · 8932a63d
      Paul E. McKenney 提交于
      Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
      limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
      needed to reduce lock contention that was induced by the synchronization
      of scheduling-clock interrupts, which was in turn needed to improve
      energy efficiency for moderate-sized lightly loaded servers.
      
      However, reducing the leaf-level fanout means that there are more
      leaf-level rcu_node structures in the tree, which in turn means that
      RCU's grace-period initialization incurs more cache misses.  This is
      not a problem on moderate-sized servers with only a few tens of CPUs,
      but becomes a major source of real-time latency spikes on systems with
      many hundreds of CPUs.  In addition, the workloads running on these large
      systems tend to be CPU-bound, which eliminates the energy-efficiency
      advantages of synchronizing scheduling-clock interrupts.  Therefore,
      these systems need maximal values for the rcu_node leaf-level fanout.
      
      This commit addresses this problem by introducing a new kernel parameter
      named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
      This parameter defaults to 16 to handle the common case of a moderate
      sized lightly loaded servers, but may be set higher on larger systems.
      Reported-by: NMike Galbraith <efault@gmx.de>
      Reported-by: NDimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8932a63d
  16. 22 2月, 2012 3 次提交
    • P
      rcu: Rework detection of use of RCU by offline CPUs · 2036d94a
      Paul E. McKenney 提交于
      Because newly offlined CPUs continue executing after completing the
      CPU_DYING notifiers, they legitimately enter the scheduler and use
      RCU while appearing to be offline.  This calls for a more sophisticated
      approach as follows:
      
      1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.
      
      2.	RCU marks the CPU offline during the CPU_DEAD phase.
      
      3.	Diagnostics regarding use of read-side RCU by offline CPUs use
      	RCU's accounting rather than the cpu_online_map.  (Note that
      	__call_rcu() still uses cpu_online_map to detect illegal
      	invocations within CPU_DYING notifiers.)
      
      4.	Offline CPUs are prevented from hanging the system by
      	force_quiescent_state(), which pays attention to cpu_online_map.
      	Some additional work (in a later commit) will be needed to
      	guarantee that force_quiescent_state() waits a full jiffy before
      	assuming that a CPU is offline, for example, when called from
      	idle entry.  (This commit also makes the one-jiffy wait
      	explicit, since the old-style implicit wait can now be defeated
      	by RCU_FAST_NO_HZ and by rcutorture.)
      
      This approach avoids the false positives encountered when attempting to
      use more exact classification of CPU online/offline state.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      2036d94a
    • P
      rcu: Print scheduling-clock information on RCU CPU stall-warning messages · a858af28
      Paul E. McKenney 提交于
      There have been situations where RCU CPU stall warnings were caused by
      issues in scheduling-clock timer initialization.  To make it easier to
      track these down, this commit causes the RCU CPU stall-warning messages
      to print out the number of scheduling-clock interrupts taken in the
      current grace period for each stalled CPU.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a858af28
    • P
      rcu: Set RCU CPU stall times via sysfs · 13cfcca0
      Paul E. McKenney 提交于
      The default CONFIG_RCU_CPU_STALL_TIMEOUT value of 60 seconds has served
      Linux users well for production use for quite some time.  However, for
      debugging, there will be more than three minutes between subsequent
      stall-warning messages.  This can be an annoyingly long wait if you
      are trying to work out where the offending infinite loop is hiding.
      
      Therefore, this commit provides a rcu_cpu_stall_timeout sysfs
      parameter that may be adjusted at boot time and at runtime to speed
      up debugging.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      13cfcca0