  1. 23 Sep 2012, 1 commit
    • rcu: Fix day-one dyntick-idle stall-warning bug · a10d206e
      By Paul E. McKenney
      Each grace period is supposed to have at least one callback waiting
      for that grace period to complete.  However, if CONFIG_NO_HZ=n, an
      extra callback-free grace period is no big problem -- it will chew up
      a tiny bit of CPU time, but it will complete normally.  In contrast,
      CONFIG_NO_HZ=y kernels have the potential for all the CPUs to go to
      sleep indefinitely, in turn indefinitely delaying completion of the
      callback-free grace period.  Given that nothing is waiting on this grace
      period, this is also not a problem.
      
      That is, unless RCU CPU stall warnings are also enabled, as they are
      in recent kernels.  In this case, if a CPU wakes up after at least one
      minute of inactivity, an RCU CPU stall warning will result.  The reason
      that no one noticed until quite recently is that most systems have enough
      OS noise that they will never remain absolutely idle for a full minute.
      But there are some embedded systems with cut-down userspace configurations
      that consistently get into this situation.
      
      All this raises the question of exactly how a callback-free grace
      period gets started in the first place.  This can happen because
      CPUs do not necessarily agree on which grace period is in progress.
      If a CPU still believes that the just-completed grace period is
      ongoing, it will conclude that it has callbacks that need to wait
      for another grace period, never mind that the grace period they
      were waiting for has just completed.  This CPU can therefore
      erroneously decide to start a new grace period.  Note that this can
      happen in TREE_RCU and TREE_PREEMPT_RCU even on a single-CPU
      system: deadlock considerations mean that the CPU that detected the
      end of the grace period is not necessarily officially informed of
      this fact for some time.
      
      Once this CPU notices that the earlier grace period completed, it will
      invoke its callbacks.  It then won't have any callbacks left.  If no
      other CPU has any callbacks, we now have a callback-free grace period.
      
      This commit therefore makes CPUs check more carefully before starting a
      new grace period.  This new check relies on an array of tail pointers
      into each CPU's list of callbacks.  If the CPU is up to date on which
      grace periods have completed, it checks to see if any callbacks follow
      the RCU_DONE_TAIL segment, otherwise it checks to see if any callbacks
      follow the RCU_WAIT_TAIL segment.  The reason that this works is that
      the RCU_WAIT_TAIL segment will be promoted to the RCU_DONE_TAIL segment
      as soon as the CPU is officially notified that the old grace period
      has ended.
      
      This change is to cpu_needs_another_gp(), which is called in a number
      of places.  The only one that really matters is in rcu_start_gp(), where
      the root rcu_node structure's ->lock is held, which prevents any
      other CPU from starting or completing a grace period, so that the
      comparison that determines whether the CPU is missing the completion
      of a grace period is stable.
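
      In rough outline, the strengthened test looks like this -- a
      simplified sketch of cpu_needs_another_gp(), not the exact
      kernel/rcutree.c code:

        static int cpu_needs_another_gp(struct rcu_state *rsp,
                                        struct rcu_data *rdp)
        {
                /* If this CPU has seen the end of the last grace period
                 * (completed numbers match), only callbacks beyond the
                 * RCU_DONE_TAIL segment justify a new grace period;
                 * otherwise look beyond RCU_WAIT_TAIL, whose callbacks
                 * will be promoted to done once this CPU learns that
                 * the old grace period has ended. */
                int seg = (ACCESS_ONCE(rsp->completed) != rdp->completed)
                          ? RCU_WAIT_TAIL : RCU_DONE_TAIL;

                return *rdp->nxttail[seg] && !rcu_gp_in_progress(rsp);
        }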
      Reported-by: Becky Bruce <bgillbruce@gmail.com>
      Reported-by: Subodh Nijsure <snijsure@grid-net.com>
      Reported-by: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Paul Walmsley <paul@pwsan.com>  # OMAP3730, OMAP4430
      Cc: stable@vger.kernel.org
  2. 06 Jul 2012, 2 commits
  3. 03 Jul 2012, 24 commits
  4. 26 Jun 2012, 1 commit
  5. 07 Jun 2012, 1 commit
  6. 10 May 2012, 1 commit
    • rcu: Make rcu_barrier() less disruptive · b1420f1c
      By Paul E. McKenney
      The rcu_barrier() primitive interrupts each and every CPU, registering
      a callback on every CPU.  Once all of these callbacks have been invoked,
      rcu_barrier() knows that every callback that was registered before
      the call to rcu_barrier() has also been invoked.
      
      However, there is no point in registering a callback on a CPU that
      currently has no callbacks, most especially if that CPU is in a
      deep idle state.  This commit therefore makes rcu_barrier() avoid
      interrupting CPUs that have no callbacks.  Doing this requires reworking
      the handling of orphaned callbacks, otherwise callbacks could slip through
      rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had
      not yet interrupted to a CPU that rcu_barrier() had already interrupted.
      This reworking was needed anyway to take a first step towards weaning
      RCU from the CPU_DYING notifier's use of stop_cpu().
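
      The core idea can be sketched as follows (an illustrative sketch
      only: rcu_barrier_sketch() is a made-up name, and the real
      _rcu_barrier() must also close the orphaned-callback race described
      above):

        static void rcu_barrier_sketch(struct rcu_state *rsp)
        {
                int cpu;

                for_each_online_cpu(cpu) {
                        struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);

                        if (ACCESS_ONCE(rdp->qlen))
                                /* Interrupt this CPU so that it posts a
                                 * barrier callback on itself. */
                                smp_call_function_single(cpu,
                                                         rcu_barrier_func,
                                                         (void *)rsp, 1);
                        /* Callback-free CPUs are left undisturbed. */
                }
        }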
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  7. 03 May 2012, 1 commit
  8. 25 Apr 2012, 3 commits
    • rcu: Make RCU_FAST_NO_HZ account for pauses out of idle · c57afe80
      By Paul E. McKenney
      Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE()
      macro can cause RCU to momentarily pause out of idle without the rest
      of the system being involved.  This can cause rcu_prepare_for_idle()
      to run through its state machine too quickly, which can in turn result
      in needless scheduling-clock interrupts.
      
      This commit therefore adds code to enable rcu_prepare_for_idle() to
      distinguish between an initial entry to idle on the one hand (which needs
      to advance the rcu_prepare_for_idle() state machine) and an idle reentry
      due to idle-capable trace macros and RCU_NONIDLE() on the other hand
      (which should avoid advancing the rcu_prepare_for_idle() state machine).
      Additional state is maintained to allow the timer to be correctly reposted
      when returning after a momentary pause out of idle, and even more state
      is maintained to detect when new non-lazy callbacks have been enqueued
      (which may require re-evaluation of the approach to idleness).
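
      One way to picture the added state is a per-CPU first-pass flag (an
      illustrative sketch; the names below are made up, and the real
      state lives in kernel/rcutree_plugin.h):

        static DEFINE_PER_CPU(bool, rcu_idle_first_pass);

        static void rcu_prepare_for_idle_sketch(int cpu)
        {
                if (per_cpu(rcu_idle_first_pass, cpu)) {
                        /* Initial idle entry: advance the state machine
                         * and possibly post the idle timer. */
                        per_cpu(rcu_idle_first_pass, cpu) = false;
                } else {
                        /* Momentary reentry via RCU_NONIDLE() or an
                         * idle-capable trace macro: just repost the
                         * previously computed timer. */
                }
        }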
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Document why rcu_blocking_is_gp() is safe · 6d813391
      By Paul E. McKenney
      The rcu_blocking_is_gp() function tests to see if there is only one
      online CPU, and if so, synchronize_sched() and friends become no-ops.
      However, for larger systems, num_online_cpus() scans a large vector,
      and might be preempted while doing so.  While preempted, any number
      of CPUs might come online and go offline, potentially resulting in
      num_online_cpus() returning 1 even though there was never a time
      when only one CPU was online.  This could result in a too-short
      RCU grace period, which
      could in turn result in total failure, except that the only way that
      the grace period is too short is if there is an RCU read-side critical
      section spanning it.  For RCU-sched and RCU-bh (which are the only
      cases using rcu_blocking_is_gp()), RCU read-side critical sections
      have either preemption or bh disabled, which prevents CPUs from going
      offline.  This in turn prevents actual failures from occurring.
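
      For reference, the function in question is tiny, roughly as it
      appeared in kernels of this era:

        static int rcu_blocking_is_gp(void)
        {
                might_sleep();  /* Check for RCU read-side critical section. */
                return num_online_cpus() <= 1;
        }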
      
      This commit therefore adds a large block comment to rcu_blocking_is_gp()
      documenting why it is safe.  This commit also moves rcu_blocking_is_gp()
      into kernel/rcutree.c, which should help prevent unwary developers from
      mistaking it for a generally useful function.
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Reduce cache-miss initialization latencies for large systems · 8932a63d
      By Paul E. McKenney
      Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper
      limit of 16 on the leaf-level fanout for the rcu_node tree.  This was
      needed to reduce lock contention that was induced by the synchronization
      of scheduling-clock interrupts, which was in turn needed to improve
      energy efficiency for moderate-sized lightly loaded servers.
      
      However, reducing the leaf-level fanout means that there are more
      leaf-level rcu_node structures in the tree, which in turn means that
      RCU's grace-period initialization incurs more cache misses.  This is
      not a problem on moderate-sized servers with only a few tens of CPUs,
      but becomes a major source of real-time latency spikes on systems with
      many hundreds of CPUs.  In addition, the workloads running on these large
      systems tend to be CPU-bound, which eliminates the energy-efficiency
      advantages of synchronizing scheduling-clock interrupts.  Therefore,
      these systems need maximal values for the rcu_node leaf-level fanout.
      
      This commit addresses this problem by introducing a new kernel parameter
      named RCU_FANOUT_LEAF that directly controls the leaf-level fanout.
      This parameter defaults to 16 to handle the common case of
      moderately sized, lightly loaded servers, but may be set higher on
      larger systems.
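
      The new knob is a Kconfig entry along these lines (a sketch; see
      the corresponding release's Kconfig for the exact ranges and help
      text):

        config RCU_FANOUT_LEAF
                int "Tree-based hierarchical RCU leaf-level fanout value"
                range 2 RCU_FANOUT
                depends on TREE_RCU || TREE_PREEMPT_RCU
                default 16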
      Reported-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  9. 17 Apr 2012, 1 commit
    • rcu: Permit call_rcu() from CPU_DYING notifiers · 92c38702
      By Paul E. McKenney
      As of:
      
        29494be7 ("rcu,cleanup: simplify the code when cpu is dying")
      
      RCU adopts callbacks from the dying CPU in its CPU_DYING notifier,
      which means that any callbacks posted by later CPU_DYING notifiers
      are ignored until the CPU comes back online.
      
      A WARN_ON_ONCE() was added to __call_rcu() by:
      
        e5601400 ("rcu: Simplify offline processing")
      
      to check for this condition.  Although this condition did not trigger
      (at least as far as I know) during -next testing, it did recently
      trigger in mainline:
      
        https://lkml.org/lkml/2012/4/2/34
      
      What is needed longer term is for RCU's CPU_DEAD notifier to adopt
      any callbacks that were posted by CPU_DYING notifiers.  However,
      the Linux kernel has been running with this sort of thing happening
      for quite some time, so the only thing that qualifies as a
      regression is the WARN_ON_ONCE(), which this commit removes.
      
      Making RCU's CPU_DEAD notifier adopt callbacks posted by CPU_DYING
      notifiers is a topic for the 3.5 release of the Linux kernel.
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  10. 22 Feb 2012, 5 commits
    • rcu: Stop spurious warnings from synchronize_sched_expedited · 1cc85961
      By Hugh Dickins
      synchronize_sched_expedited() is spamming CONFIG_DEBUG_PREEMPT=y
      users with an unintended warning from the cpu_is_offline() check: use
      raw_smp_processor_id() instead of smp_processor_id() there.
      
      Because the check runs under get_online_cpus(), it is not possible
      for any CPU to go offline, though it is quite possible that the
      task might migrate between the raw_smp_processor_id() call and the
      cpu_is_offline() check.  This is not a problem, because the task
      cannot migrate from an offline CPU to an online one or vice versa.
      The point
      of the check is to verify that synchronize_sched_expedited() is not
      called from an offline CPU, for example, from a CPU_DYING notifier, or,
      more important, from an outgoing CPU making its way from its CPU_DYING
      notifiers to the idle loop.
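
      The fix amounts to a one-line change along these lines (sketched in
      diff form):

        -       WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
        +       WARN_ON_ONCE(cpu_is_offline(raw_smp_processor_id()));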
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections · 8a2ecf47
      By Paul E. McKenney
      RCU, RCU-bh, and RCU-sched read-side critical sections are forbidden
      in the inner idle loop, that is, between rcu_idle_enter() and
      rcu_idle_exit() -- RCU will happily ignore any such read-side critical
      sections.  However, things like powertop need tracepoints in the inner
      idle loop.
      
      This commit therefore provides an RCU_NONIDLE() macro that can be used to
      wrap code in the idle loop that requires RCU read-side critical sections.
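
      The macro simply brackets the wrapped code with an idle exit and
      re-entry, roughly as follows (it relies on the rcu_idle_enter() /
      rcu_idle_exit() nesting added by the next commit in this list):

        #define RCU_NONIDLE(a) \
                do { \
                        rcu_idle_exit(); \
                        do { a; } while (0); \
                        rcu_idle_enter(); \
                } while (0)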
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
    • rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit() · 29e37d81
      By Paul E. McKenney
      Use of RCU in the idle loop is incorrect, yet quite a few instances
      of just that have made their way into mainline, primarily event
      tracing.  The problem with RCU read-side critical sections on CPUs
      that RCU believes to be idle is that RCU is completely ignoring the
      CPU, along with any attempted RCU read-side critical sections.
      
      The approaches of eliminating the offending uses and of pushing the
      definition of idle down beyond the offending uses have both proved
      impractical.  The new approach is to encapsulate offending uses of
      RCU with rcu_idle_exit() and rcu_idle_enter(), but this requires
      nesting for code that is invoked both during idle and during normal
      execution.
      Therefore, this commit modifies rcu_idle_enter() and rcu_idle_exit() to
      permit nesting.
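
      A minimal nesting sketch (all names below are made up; the real
      implementation folds the count into the per-CPU dynticks_nesting
      value): only the outermost enter actually tells RCU to ignore the
      CPU, and only the matching outermost exit makes RCU watch it again.

        /* Starts at 1: the CPU is initially running normally. */
        static DEFINE_PER_CPU(long, rcu_nonidle_nesting) = 1;

        void rcu_idle_enter_sketch(void)
        {
                if (__this_cpu_dec_return(rcu_nonidle_nesting) == 0)
                        rcu_idle_really_enter();  /* hypothetical helper */
        }

        void rcu_idle_exit_sketch(void)
        {
                if (__this_cpu_inc_return(rcu_nonidle_nesting) == 1)
                        rcu_idle_really_exit();   /* hypothetical helper */
        }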
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
    • rcu: Call out dangers of expedited RCU primitives · 236fefaf
      By Paul E. McKenney
      The expedited RCU primitives can be quite useful, but they have some
      high costs as well.  This commit updates and creates docbook comments
      calling out the costs, and updates the RCU documentation as well.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Rework detection of use of RCU by offline CPUs · 2036d94a
      By Paul E. McKenney
      Because newly offlined CPUs continue executing after completing the
      CPU_DYING notifiers, they legitimately enter the scheduler and use
      RCU while appearing to be offline.  This calls for a more sophisticated
      approach as follows:
      
      1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.
      
      2.	RCU marks the CPU offline during the CPU_DEAD phase.
      
      3.	Diagnostics regarding use of read-side RCU by offline CPUs use
      	RCU's accounting rather than the cpu_online_map.  (Note that
      	__call_rcu() still uses cpu_online_map to detect illegal
      	invocations within CPU_DYING notifiers.)
      
      4.	Offline CPUs are prevented from hanging the system by
      	force_quiescent_state(), which pays attention to cpu_online_map.
      	Some additional work (in a later commit) will be needed to
      	guarantee that force_quiescent_state() waits a full jiffy before
      	assuming that a CPU is offline, for example, when called from
      	idle entry.  (This commit also makes the one-jiffy wait
      	explicit, since the old-style implicit wait can now be defeated
      	by RCU_FAST_NO_HZ and by rcutorture.)
      
      This approach avoids the false positives encountered when attempting to
      use more exact classification of CPU online/offline state.
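
      In notifier form, the new bookkeeping looks roughly like this (an
      illustrative sketch; the real work is spread across RCU's
      CPU-hotplug handling in kernel/rcutree.c):

        static int rcu_cpu_notify_sketch(struct notifier_block *self,
                                         unsigned long action, void *hcpu)
        {
                switch (action) {
                case CPU_UP_PREPARE:
                        /* Step 1: mark the CPU online in RCU's
                         * accounting. */
                        break;
                case CPU_DEAD:
                        /* Step 2: mark it offline only now, after its
                         * CPU_DYING notifiers have finished running. */
                        break;
                }
                return NOTIFY_OK;
        }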
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>