1. 23 Sep, 2012 4 commits
  2. 03 Jul, 2012 10 commits
  3. 07 Jun, 2012 1 commit
  4. 10 May, 2012 1 commit
    • rcu: Make rcu_barrier() less disruptive · b1420f1c
      Paul E. McKenney committed
      The rcu_barrier() primitive interrupts each and every CPU, registering
      a callback on every CPU.  Once all of these callbacks have been invoked,
      rcu_barrier() knows that every callback that was registered before
      the call to rcu_barrier() has also been invoked.
      
      However, there is no point in registering a callback on a CPU that
      currently has no callbacks, most especially if that CPU is in a
      deep idle state.  This commit therefore makes rcu_barrier() avoid
      interrupting CPUs that have no callbacks.  Doing this requires reworking
      the handling of orphaned callbacks, otherwise callbacks could slip through
      rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
      not yet interrupted to a CPU that rcu_barrier() had already interrupted.
      This reworking was needed anyway to take a first step towards weaning
      RCU from the CPU_DYING notifier's use of stop_cpu().
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b1420f1c
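
The selection rule described in this commit can be sketched in user-space C. This is only a toy model, not the kernel's implementation: `struct toy_cpu` and `toy_barrier_cpus_to_interrupt()` are hypothetical names introduced for illustration, and the handling of orphaned callbacks is left out entirely.

```c
#include <stddef.h>

/* Hypothetical toy model: one pending-callback count per CPU. */
struct toy_cpu {
    unsigned long n_callbacks; /* callbacks currently queued on this CPU */
};

/*
 * Count the CPUs a barrier would have to interrupt if it skips every
 * CPU that has no callbacks queued. A barrier that interrupts "each
 * and every CPU" would instead always return ncpus.
 */
size_t toy_barrier_cpus_to_interrupt(const struct toy_cpu *cpus, size_t ncpus)
{
    size_t i, n = 0;

    for (i = 0; i < ncpus; i++)
        if (cpus[i].n_callbacks != 0)
            n++;
    return n;
}
```

With four CPUs of which two have empty callback lists, the sketch interrupts only the other two, so an idle CPU with nothing queued is left undisturbed.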
  5. 03 May, 2012 1 commit
  6. 25 Apr, 2012 2 commits
    • rcu: Make RCU_FAST_NO_HZ account for pauses out of idle · c57afe80
      Paul E. McKenney committed
      Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
      macro can cause RCU to momentarily pause out of idle without the rest
      of the system being involved.  This can cause rcu_prepare_for_idle()
      to run through its state machine too quickly, which can in turn result
      in needless scheduling-clock interrupts.
      
      This commit therefore adds code to enable rcu_prepare_for_idle() to
      distinguish between an initial entry to idle on the one hand (which needs
      to advance the rcu_prepare_for_idle() state machine) and an idle reentry
      due to idle-capable trace macros and RCU_NONIDLE() on the other hand
      (which should avoid advancing the rcu_prepare_for_idle() state machine).
      Additional state is maintained to allow the timer to be correctly reposted
      when returning after a momentary pause out of idle, and even more state
      is maintained to detect when new non-lazy callbacks have been enqueued
      (which may require re-evaluation of the approach to idleness).
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c57afe80
    • rcu: Reduce cache-miss initialization latencies for large systems · 8932a63d
      Paul E. McKenney committed
      Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
      limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
      needed to reduce lock contention that was induced by the synchronization
      of scheduling-clock interrupts, which was in turn needed to improve
      energy efficiency for moderate-sized lightly loaded servers.
      
      However, reducing the leaf-level fanout means that there are more
      leaf-level rcu_node structures in the tree, which in turn means that
      RCU's grace-period initialization incurs more cache misses.  This is
      not a problem on moderate-sized servers with only a few tens of CPUs,
      but becomes a major source of real-time latency spikes on systems with
      many hundreds of CPUs.  In addition, the workloads running on these large
      systems tend to be CPU-bound, which eliminates the energy-efficiency
      advantages of synchronizing scheduling-clock interrupts.  Therefore,
      these systems need maximal values for the rcu_node leaf-level fanout.
      
      This commit addresses this problem by introducing a new kernel parameter
      named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
      This parameter defaults to 16 to handle the common case of moderate-sized
      lightly loaded servers, but may be set higher on larger systems.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8932a63d
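
The trade-off in this commit comes down to simple arithmetic: the number of leaf-level rcu_node structures is the CPU count divided by the leaf fanout, rounded up. A minimal sketch follows, assuming a hypothetical helper `toy_leaf_nodes()` that is not part of the kernel. The CPU count and the larger fanout used in the note below are illustrative values, not figures from the commit.

```c
/*
 * Number of leaf-level nodes needed to cover ncpus CPUs when each leaf
 * handles at most leaf_fanout CPUs (ceiling division).
 * leaf_fanout must be nonzero.
 */
unsigned int toy_leaf_nodes(unsigned int ncpus, unsigned int leaf_fanout)
{
    return (ncpus + leaf_fanout - 1) / leaf_fanout;
}
```

For an illustrative 1024-CPU system, a leaf fanout of 16 gives 64 leaf nodes to touch during grace-period initialization, while a fanout of 64 gives 16, which is why large systems benefit from a higher setting.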
  7. 22 Feb, 2012 6 commits
    • rcu: Rework detection of use of RCU by offline CPUs · 2036d94a
      Paul E. McKenney committed
      Because newly offlined CPUs continue executing after completing the
      CPU_DYING notifiers, they legitimately enter the scheduler and use
      RCU while appearing to be offline.  This calls for a more sophisticated
      approach as follows:
      
      1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.
      
      2.	RCU marks the CPU offline during the CPU_DEAD phase.
      
      3.	Diagnostics regarding use of read-side RCU by offline CPUs use
      	RCU's accounting rather than the cpu_online_map.  (Note that
      	__call_rcu() still uses cpu_online_map to detect illegal
      	invocations within CPU_DYING notifiers.)
      
      4.	Offline CPUs are prevented from hanging the system by
      	force_quiescent_state(), which pays attention to cpu_online_map.
      	Some additional work (in a later commit) will be needed to
      	guarantee that force_quiescent_state() waits a full jiffy before
      	assuming that a CPU is offline, for example, when called from
      	idle entry.  (This commit also makes the one-jiffy wait
      	explicit, since the old-style implicit wait can now be defeated
      	by RCU_FAST_NO_HZ and by rcutorture.)
      
      This approach avoids the false positives encountered when attempting to
      use more exact classification of CPU online/offline state.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      2036d94a
    • rcu: Print scheduling-clock information on RCU CPU stall-warning messages · a858af28
      Paul E. McKenney committed
      There have been situations where RCU CPU stall warnings were caused by
      issues in scheduling-clock timer initialization.  To make it easier to
      track these down, this commit causes the RCU CPU stall-warning messages
      to print out the number of scheduling-clock interrupts taken in the
      current grace period for each stalled CPU.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a858af28
    • rcu: Set RCU CPU stall times via sysfs · 13cfcca0
      Paul E. McKenney committed
      The default CONFIG_RCU_CPU_STALL_TIMEOUT value of 60 seconds has served
      Linux users well for production use for quite some time.  However, for
      debugging, there will be more than three minutes between subsequent
      stall-warning messages.  This can be an annoyingly long wait if you
      are trying to work out where the offending infinite loop is hiding.
      
      Therefore, this commit provides a rcu_cpu_stall_timeout sysfs
      parameter that may be adjusted at boot time and at runtime to speed
      up debugging.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      13cfcca0
    • rcu: Clean up straggling rcu_preempt_needs_cpu() name · 30fbcc90
      Paul E. McKenney committed
      The recent updates to RCU_CPU_FAST_NO_HZ have an rcu_needs_cpu() that
      does more than just check for callbacks, so get the name for
      rcu_preempt_needs_cpu() consistent with that change, now calling it
      rcu_preempt_cpu_has_callbacks().
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      30fbcc90
    • rcu: Simplify offline processing · e5601400
      Paul E. McKenney committed
      Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING
      notifier.  This simplifies the code by eliminating a potential
      deadlock and by reducing the responsibilities of force_quiescent_state().
      Also rename functions to make their connection to the CPU-hotplug
      stages explicit.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e5601400
    • rcu: Avoid waking up CPUs having only kfree_rcu() callbacks · 486e2593
      Paul E. McKenney committed
      When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to
      enter dyntick-idle mode even if it still has RCU callbacks queued.
      RCU avoids system hangs in this case by scheduling a timer for several
      jiffies in the future.  However, if all of the callbacks on that CPU
      are from kfree_rcu(), there is no reason to wake the CPU up, as it is
      not a problem to defer freeing of memory.
      
      This commit therefore tracks the number of callbacks on a given CPU
      that are from kfree_rcu(), and avoids scheduling the timer if all of
      a given CPU's callbacks are from kfree_rcu().
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      486e2593
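
The bookkeeping this commit describes can be modelled with two counters per CPU. The sketch below is a user-space illustration only: `struct toy_cblist`, `toy_enqueue()`, and `toy_needs_wakeup_timer()` are hypothetical names, and "lazy" here simply stands for a callback that came from kfree_rcu().

```c
#include <stdbool.h>

/* Hypothetical per-CPU callback accounting. */
struct toy_cblist {
    unsigned long qlen;      /* all callbacks queued on this CPU */
    unsigned long qlen_lazy; /* subset that only frees memory */
};

/* Queue one callback, noting whether it merely frees memory. */
void toy_enqueue(struct toy_cblist *l, bool lazy)
{
    l->qlen++;
    if (lazy)
        l->qlen_lazy++;
}

/*
 * The wakeup timer is needed only if at least one queued callback
 * does something other than free memory.
 */
bool toy_needs_wakeup_timer(const struct toy_cblist *l)
{
    return l->qlen != l->qlen_lazy;
}
```

A CPU holding only memory-freeing callbacks can therefore sleep without a timer, and the first non-lazy callback flips the decision.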
  8. 12 Dec, 2011 7 commits
    • rcu: Keep invoking callbacks if CPU otherwise idle · dff1672d
      Paul E. McKenney committed
      The rcu_do_batch() function that invokes callbacks for TREE_RCU and
      TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
      scheduling latency.  However, as long as the CPU would otherwise be idle,
      there is no downside to continuing to invoke any callbacks that have passed
      through their grace periods.  In fact, processing such callbacks in a
      timely manner has the benefit of increasing the probability that the
      CPU can enter the power-saving dyntick-idle mode.
      
      Therefore, this commit allows callback invocation to continue beyond the
      preset limit as long as the scheduler does not have some other task to
      run and as long as the context is that of the idle task or the relevant
      RCU kthread.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dff1672d
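
The throttling rule in this commit can be sketched as a loop with a conditional exit. This is a simplified user-space model under stated assumptions: `toy_do_batch()` and its parameters are hypothetical, and the two boolean inputs are held constant for the whole batch, whereas the real conditions can change while callbacks are being invoked.

```c
#include <stdbool.h>

/*
 * Invoke up to `ready` callbacks and return how many were invoked.
 * Once `blimit` callbacks have run, stop unless the CPU has nothing
 * else to run and the caller is the idle task or an RCU kthread.
 */
unsigned long toy_do_batch(unsigned long ready, unsigned long blimit,
                           bool other_task_runnable,
                           bool idle_or_rcu_kthread)
{
    unsigned long invoked = 0;

    while (invoked < ready) {
        invoked++; /* stands in for invoking one callback */
        if (invoked >= blimit &&
            (other_task_runnable || !idle_or_rcu_kthread))
            break;
    }
    return invoked;
}
```

With 100 ready callbacks and a limit of 10, the sketch stops at 10 when another task is runnable but drains all 100 when the CPU would otherwise be idle.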
    • rcu: Permit dyntick-idle with callbacks pending · 7cb92499
      Paul E. McKenney committed
      The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
      dyntick-idle state if they have RCU callbacks pending.  Unfortunately,
      this has the side-effect of often preventing them from entering this
      state, especially if at least one other CPU is not in dyntick-idle state.
      However, the resulting per-tick wakeup is wasteful in many cases: if the
      CPU has already fully responded to the current RCU grace period, there
      will be nothing for it to do until this grace period ends, which will
      frequently take several jiffies.
      
      This commit therefore permits a CPU that has done everything that the
      current grace period has asked of it (rcu_pending() == 0) to enter
      dyntick-idle state even if it still has RCU callbacks pending.  However,
      such a CPU posts a timer to
      wake it up several jiffies later (6 jiffies, based on experience with
      grace-period lengths).  This wakeup is required to handle situations
      that can result in all CPUs being in dyntick-idle mode, thus failing
      to ever complete the current grace period.  If a CPU wakes up before
      the timer goes off, then it cancels that timer, thus avoiding spurious
      wakeups.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7cb92499
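
The idle-entry decision described in this commit can be summarized as a small function. The sketch below is a user-space model, not kernel code: `struct toy_idle_decision`, `toy_prepare_for_idle()`, and `TOY_IDLE_GP_DELAY` are hypothetical names, and only the 6-jiffy delay is taken from the commit message.

```c
#include <stdbool.h>

#define TOY_IDLE_GP_DELAY 6UL /* jiffies, per the commit message */

struct toy_idle_decision {
    bool may_enter;              /* may the CPU go dyntick-idle? */
    bool post_timer;             /* must a wakeup timer be posted? */
    unsigned long timer_expires; /* valid only if post_timer is set */
};

/*
 * rcu_core_needs_cpu models "rcu_pending() != 0": the current grace
 * period still wants something from this CPU.
 */
struct toy_idle_decision toy_prepare_for_idle(bool rcu_core_needs_cpu,
                                              bool has_callbacks,
                                              unsigned long now)
{
    struct toy_idle_decision d = { false, false, 0 };

    if (rcu_core_needs_cpu)
        return d; /* stay out of dyntick-idle for now */
    d.may_enter = true;
    if (has_callbacks) {
        /* Guard against every CPU sleeping through the grace period. */
        d.post_timer = true;
        d.timer_expires = now + TOY_IDLE_GP_DELAY;
    }
    return d;
}
```

A CPU that wakes up for another reason before `timer_expires` would cancel the timer, which this model leaves to the caller.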
    • rcu: Eliminate RCU_FAST_NO_HZ grace-period hang · f535a607
      Paul E. McKenney committed
      With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
      RCU grace periods as follows:
      
      o	CPU 0 attempts to go idle, cycles several times through the
      	rcu_prepare_for_idle() loop, then goes dyntick-idle when
      	RCU needs nothing more from it, while still having at least
      	one RCU callback pending.
      
      o	CPU 1 goes idle with no callbacks.
      
      Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
      the RCU grace period from ever completing, possibly hanging the system.
      
      This commit therefore prevents CPUs that have RCU callbacks from entering
      dyntick-idle mode.  This approach also eliminates the need for the
      end-of-grace-period IPIs used previously.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f535a607
    • rcu: Allow dyntick-idle mode for CPUs with callbacks · aea1b35e
      Paul E. McKenney committed
      Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
      CPU has any RCU callbacks queued.  This means that workloads for which
      each CPU wakes up and does some RCU updates every few ticks will never
      enter dyntick-idle mode.  This can result in significant unnecessary power
      consumption, so this patch permits a given CPU to enter dyntick-idle mode
      if it has callbacks, but only if that same CPU has completed all current
      work for the RCU core.  We use rcu_pending() to determine
      whether a given CPU has completed all current work for the RCU core.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      aea1b35e
    • rcu: Omit self-awaken when setting up expedited grace period · b40d293e
      Thomas Gleixner committed
      When setting up an expedited grace period, if there were no readers, the
      task will awaken itself.  This commit removes this useless self-awakening.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b40d293e
    • rcu: Track idleness independent of idle tasks · 9b2e4f18
      Paul E. McKenney committed
      Earlier versions of RCU used the scheduling-clock tick to detect idleness
      by checking for the idle task, but handled idleness differently for
      CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
      critical sections in the idle task, for example, for tracing.  A more
      fine-grained detection of idleness is therefore required.
      
      This commit presses the old dyntick-idle code into full-time service,
      so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
      always invoked at the beginning of an idle loop iteration.  Similarly,
      rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
      at the end of an idle-loop iteration.  This allows the idle task to
      use RCU everywhere except between consecutive rcu_idle_enter() and
      rcu_idle_exit() calls, in turn allowing architecture maintainers to
      specify exactly where in the idle loop that RCU may be used.
      
      Because some of the userspace upcall uses can result in what looks
      to RCU like half of an interrupt, it is not possible to expect that
      the irq_enter() and irq_exit() hooks will give exact counts.  This
      patch therefore expands the ->dynticks_nesting counter to 64 bits
      and uses two separate bitfields to count process/idle transitions
      and interrupt entry/exit transitions.  It is presumed that userspace
      upcalls do not happen in the idle loop or from usermode execution
      (though usermode might do a system call that results in an upcall).
      The counter is hard-reset on each process/idle transition, which
      prevents the interrupt entry/exit error from accumulating.  Overflow
      is avoided by the 64-bitness of the ->dynticks_nesting counter.
      
      This commit also adds warnings if a non-idle task asks RCU to enter
      idle state (and these checks will need some adjustment before applying
      Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246)).
      In addition, validation of ->dynticks and ->dynticks_nesting is added.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      9b2e4f18
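
The two-field counter with a hard reset can be illustrated with a toy model. The layout below is an assumption made for the sketch, not the layout the commit uses: the low 32 bits count interrupt nesting and bit 32 marks process-level (non-idle) execution. All `toy_` names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed layout: low bits = irq nesting, bit 32 = task-level marker. */
#define TOY_TASK_ONE ((uint64_t)1 << 32)

struct toy_dynticks {
    uint64_t nesting;
};

/* Process/idle transitions hard-reset the counter, discarding any
 * miscount left behind by unbalanced interrupt entry/exit. */
void toy_idle_enter(struct toy_dynticks *d) { d->nesting = 0; }
void toy_idle_exit(struct toy_dynticks *d)  { d->nesting = TOY_TASK_ONE; }

/* Interrupt transitions only adjust the low-order field. */
void toy_irq_enter(struct toy_dynticks *d)  { d->nesting++; }
void toy_irq_exit(struct toy_dynticks *d)   { d->nesting--; }

/* RCU treats the CPU as idle only when the whole counter is zero. */
bool toy_rcu_sees_idle(const struct toy_dynticks *d)
{
    return d->nesting == 0;
}
```

If an upcall produces an interrupt entry with no matching exit, the low field stays off by one until the next process/idle transition, at which point the reset clears it, so the error cannot accumulate.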
    • rcu: ->signaled better named ->fqs_state · af446b70
      Paul E. McKenney committed
      The ->signaled field was named before complications in the form of
      dyntick-idle mode and offlined CPUs.  These complications have required
      that force_quiescent_state() be implemented as a state machine, instead
      of simply unconditionally sending reschedule IPIs.  Therefore, this
      commit renames ->signaled to ->fqs_state to catch up with the new
      force_quiescent_state() reality.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      af446b70
  9. 29 Sep, 2011 5 commits
    • rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states · e90c53d3
      Paul E. McKenney committed
      The purpose of rcu_needs_cpu_flush() was to iterate on pushing the
      current grace period in order to help the current CPU enter dyntick-idle
      mode.  However, this can result in failures if the CPU starts entering
      dyntick-idle mode, but then backs out.  In this case, the call to
      rcu_pending() from rcu_needs_cpu_flush() might end up announcing a
      non-existing quiescent state.
      
      This commit therefore removes rcu_needs_cpu_flush() in favor of letting
      the dyntick-idle machinery at the end of the softirq handler push the
      loop along via its call to rcu_pending().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e90c53d3
    • rcu: Suppress NMI backtraces when stall ends before dump · 9bc8b558
      Paul E. McKenney committed
      It is possible for an RCU CPU stall to end just as it is detected, in
      which case the current code will uselessly dump all CPUs' stacks.
      This commit therefore checks for this condition and refrains from
      sending needless NMIs.
      
      And yes, the stall might also end just after we checked all CPUs and
      tasks, but in that case we would at least have given some clue as
      to which CPU/task was at fault.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9bc8b558
    • rcu: Simplify quiescent-state accounting · e4cc1f22
      Paul E. McKenney committed
      There is often a delay between the time that a CPU passes through a
      quiescent state and the time that this quiescent state is reported to the
      RCU core.  It is quite possible that the grace period ended before the
      quiescent state could be reported, for example, some other CPU might have
      deduced that this CPU passed through dyntick-idle mode.  It is critically
      important that quiescent state be counted only against the grace period
      that was in effect at the time that the quiescent state was detected.
      
      Previously, this was handled by recording the number of the last grace
      period to complete when passing through a quiescent state.  The RCU
      core then checks this number against the current value, and rejects
      the quiescent state if there is a mismatch.  However, one additional
      possibility must be accounted for, namely that the quiescent state was
      recorded after the prior grace period completed but before the current
      grace period started.  In this case, the RCU core must reject the
      quiescent state, but the recorded number will match.  This is handled
      when the CPU becomes aware of a new grace period -- at that point,
      it invalidates any prior quiescent state.
      
      This works, but is a bit indirect.  The new approach records the current
      grace period, and the RCU core checks to see (1) that this is still the
      current grace period and (2) that this grace period has not yet ended.
      This approach simplifies reasoning about correctness, and this commit
      changes over to this new approach.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e4cc1f22
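
The new check described in this commit can be sketched as follows. This is a user-space model with hypothetical names (`struct toy_rcu_state`, `struct toy_rcu_data`, `toy_note_qs()`, `toy_qs_counts()`). It assumes that a grace period is in progress exactly when the number of the most recently started grace period differs from the number of the most recently completed one.

```c
#include <stdbool.h>

/* Hypothetical global grace-period state. */
struct toy_rcu_state {
    unsigned long gpnum;     /* most recently started grace period */
    unsigned long completed; /* most recently completed grace period */
};

/* Hypothetical per-CPU quiescent-state record. */
struct toy_rcu_data {
    bool passed_quiesce;
    unsigned long passed_quiesce_gpnum; /* grace period current at that time */
};

/* Record a quiescent state against the grace period current right now. */
void toy_note_qs(struct toy_rcu_data *rdp, const struct toy_rcu_state *rsp)
{
    rdp->passed_quiesce = true;
    rdp->passed_quiesce_gpnum = rsp->gpnum;
}

/*
 * Accept the record only if (1) it was taken during the grace period
 * that is still current and (2) that grace period has not yet ended.
 */
bool toy_qs_counts(const struct toy_rcu_data *rdp,
                   const struct toy_rcu_state *rsp)
{
    return rdp->passed_quiesce &&
           rdp->passed_quiesce_gpnum == rsp->gpnum &&
           rsp->completed != rsp->gpnum;
}
```

A quiescent state recorded between grace periods carries the old number, so it fails check (1) once the next grace period starts, and no separate invalidation step is needed in this model.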
    • rcu: Add grace-period, quiescent-state, and call_rcu trace events · d4c08f2a
      Paul E. McKenney committed
      Add trace events to record grace-period start and end, quiescent states,
      CPUs noticing grace-period start and end, grace-period initialization,
      call_rcu() invocation, tasks blocking in RCU read-side critical sections,
      tasks exiting those same critical sections, force_quiescent_state()
      detection of dyntick-idle and offline CPUs, CPUs entering and leaving
      dyntick-idle mode (except from NMIs), CPUs coming online and going
      offline, and CPUs being kicked for staying in dyntick-idle mode for too
      long (as in many weeks, even on 32-bit systems).
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      rcu: Add the rcu flavor to callback trace events
      
      The earlier trace events for registering RCU callbacks and for invoking
      them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
      This commit adds the RCU flavor to those trace events.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d4c08f2a
    • rcu: Move RCU_BOOST declarations to allow compiler checking · eab0993c
      Paul E. McKenney committed
      Andi Kleen noticed that one of the RCU_BOOST data declarations was
      out of sync with the definition.  Move the declarations so that the
      compiler can do the checking in the future.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      eab0993c
  10. 17 Jun, 2011 1 commit
  11. 16 Jun, 2011 1 commit
  12. 15 Jun, 2011 1 commit
    • rcu: Use softirq to address performance regression · 09223371
      Shaohua Li committed
      Commit a26ac245 (rcu: move TREE_RCU from softirq to kthread)
      introduced a performance regression.  In an AIM7 test, this commit
      degraded performance by about 40%.
      
      The commit runs RCU callbacks in a kthread instead of softirq.  We observed
      a high rate of context switches caused by this.  Our test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switches per second
      caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Tested-by: "Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      09223371