1. 11 May, 2010: 7 commits
    • rcu: print boot-time console messages if RCU configs out of ordinary · 26845c28
      Paul E. McKenney committed
      Print boot-time messages if tracing is enabled, if fanout is set
      to non-default values, if exact fanout is specified, if accelerated
      dyntick-idle grace periods have been enabled, if RCU-lockdep is enabled,
      if rcutorture has been boot-time enabled, if the CPU stall detector has
      been disabled, or if four-level hierarchy has been enabled.
      
      This is all for TREE_RCU and TREE_PREEMPT_RCU.  TINY_RCU will be handled
      separately, if at all.
      Suggested-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: disable CPU stall warnings upon panic · c68de209
      Paul E. McKenney committed
      The current RCU CPU stall warnings remain enabled even after a panic
      occurs, which some people have found to be a bit counterproductive.
      This patch therefore uses a notifier to disable stall warnings once a
      panic occurs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
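      A minimal user-space sketch of the idea, assuming a hypothetical singly
      linked callback chain: struct notifier, notify_panic() and
      should_report_stall() are invented for illustration and are not the
      kernel's notifier API. RCU registers one callback at init time; when the
      panic path walks the chain, that callback sets a flag which the stall
      detector checks before printing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier block on a panic chain. */
struct notifier {
	int (*call)(struct notifier *nb, unsigned long event);
	struct notifier *next;
};

static struct notifier *panic_chain;	/* hypothetical panic chain head */
static int stall_warnings_suppressed;	/* set once a panic has occurred */

static void notifier_register(struct notifier **head, struct notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = *head;
	*head = nb;
}

/* Invoked on panic: from now on the stall detector stays quiet. */
static int rcu_panic_cb(struct notifier *nb, unsigned long event)
{
	(void)nb;
	(void)event;
	stall_warnings_suppressed = 1;
	return 0;
}

static struct notifier rcu_panic_nb = { .call = rcu_panic_cb };

/* Called once during init to hook RCU into the panic chain. */
static void rcu_stall_init(void)
{
	notifier_register(&panic_chain, &rcu_panic_nb);
}

/* Called by the (simulated) panic path. */
static void notify_panic(void)
{
	for (struct notifier *nb = panic_chain; nb != NULL; nb = nb->next)
		nb->call(nb, 0);
}

/* Consulted by the stall detector before printing a warning. */
static int should_report_stall(void)
{
	return !stall_warnings_suppressed;
}
```

      Using a callback keeps the panic code ignorant of RCU: it only walks
      the chain, and RCU decides what "panicking" means for its warnings.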
    • rcu: slim down rcutiny by removing rcu_scheduler_active and friends · bbad9379
      Paul E. McKenney committed
      TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC.
      So conditionally compile rcu_scheduler_active in order to slim down
      rcutiny a bit more.  Also gets rid of an EXPORT_SYMBOL_GPL, which is
      responsible for most of the slimming.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: refactor RCU's context-switch handling · 25502a6c
      Paul E. McKenney committed
      The addition of preemptible RCU to treercu resulted in a bit of
      confusion and inefficiency surrounding the handling of context switches
      for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
      is a quiescent state, pure and simple, just like it always has been.
      For RCU-preempt, a context switch is in no way a quiescent state, but
      special handling is required when a task blocks in an RCU read-side
      critical section.
      
      However, the callout from the scheduler and the outer loop in ksoftirqd
      still call something named rcu_sched_qs(), whose name is no longer
      accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
      quiescent state, it ends up unnecessarily (though harmlessly, aside
      from the performance hit) enqueuing the current task if it happens to
      be running in an RCU-preempt read-side critical section.  This not only
      increases the maximum latency of scheduler_tick(), it also needlessly
      increases the overhead of the next outermost rcu_read_unlock() invocation.
      
      This patch addresses this situation by separating the notion of RCU's
      context-switch handling from that of RCU-sched's quiescent states.
      The context-switch handling is covered by rcu_note_context_switch() in
      general and by rcu_preempt_note_context_switch() for preemptible RCU.
      This permits rcu_sched_qs() to handle quiescent states and only quiescent
      states.  It also reduces the maximum latency of scheduler_tick(), though
      probably by much less than a microsecond.  Finally, it means that tasks
      within preemptible-RCU read-side critical sections avoid incurring the
      overhead of queuing unless there really is a context switch.
      Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
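      A rough user-space model of the split described in this entry. The
      names (struct task, sched_qs(), note_context_switch(), tick_sched_qs())
      are hypothetical and only echo rcu_sched_qs(),
      rcu_note_context_switch() and rcu_preempt_note_context_switch(). It
      shows the intended behavior: the tick path records an RCU-sched
      quiescent state without queuing anything, and only a real context
      switch queues a task that is inside a preemptible-RCU read-side
      critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task state: rcu_read_lock() nesting depth, and whether
 * the task is queued as blocking the current grace period. */
struct task {
	int rcu_read_lock_nesting;
	bool queued;
};

static bool sched_qs_noted;	/* an RCU-sched quiescent state was seen */

/* Quiescent states, and only quiescent states. */
static void sched_qs(void)
{
	sched_qs_noted = true;
}

/* Preemptible RCU: a reader that is switched out must be queued. */
static void preempt_note_context_switch(struct task *t)
{
	assert(t->rcu_read_lock_nesting >= 0);
	if (t->rcu_read_lock_nesting > 0 && !t->queued)
		t->queued = true;
}

/* Scheduler hook: runs on every context switch. */
static void note_context_switch(struct task *t)
{
	sched_qs();
	preempt_note_context_switch(t);
}

/* Tick path that has detected an RCU-sched quiescent state: it reports
 * the QS and never touches the preemptible-RCU queuing path. */
static void tick_sched_qs(void)
{
	sched_qs();
}
```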
    • rcu: move some code from macro to function · 0c34029a
      Lai Jiangshan committed
      Shrink the RCU_INIT_FLAVOR() macro by moving all but the initialization
      of the ->rda[] array to rcu_init_one().  The call to rcu_init_one()
      can then be moved to the end of the RCU_INIT_FLAVOR() macro, which is
      required because rcu_boot_init_percpu_data(), which is now called from
      rcu_init_one(), depends on the initialization of the ->rda[] array.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: make dead code really dead · f261414f
      Lai Jiangshan committed
      cleanup: make dead code really dead
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: substitute set_need_resched for sending resched IPIs · d25eb944
      Paul E. McKenney committed
      This patch adds a check to __rcu_pending() that does a local
      set_need_resched() if the current CPU is holding up the current grace
      period and if force_quiescent_state() will be called soon.  The goal is
      to reduce the probability that force_quiescent_state() will need to do
      smp_send_reschedule(), which sends an IPI and is therefore more expensive
      on most architectures.
      Signed-off-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
  2. 27 February, 2010: 1 commit
  3. 26 February, 2010: 1 commit
    • rcu: Make rcu_read_lock_sched_held() take boot time into account · d9f1bb6a
      Paul E. McKenney committed
      Before the scheduler starts, all tasks are non-preemptible by
      definition. So, during that time, rcu_read_lock_sched_held()
      needs to always return "true".  This patch makes that be so.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267135607-7056-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
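      A minimal sketch of the boot-time rule, with hypothetical globals
      standing in for kernel state. The real rcu_read_lock_sched_held() has
      additional debug-related checks that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical globals standing in for kernel state. */
static bool scheduler_active;	/* false until the scheduler starts */
static int preempt_count;	/* > 0 means preemption is disabled */
static bool irqs_off;		/* true while interrupts are disabled */

/*
 * Boot-time rule from the commit: before the scheduler starts, every
 * task is non-preemptible, so the answer is always "true".
 */
static bool read_lock_sched_held(void)
{
	assert(preempt_count >= 0);
	if (!scheduler_active)
		return true;
	return preempt_count > 0 || irqs_off;
}
```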
  4. 25 February, 2010: 5 commits
    • rcu: Add RCU_CPU_STALL_VERBOSE to dump detailed per-task information · 1ed509a2
      Paul E. McKenney committed
      When RCU detects a grace-period stall, it currently just prints
      out the PID of any tasks doing the stalling.  This patch adds
      RCU_CPU_STALL_VERBOSE, which enables the more-verbose reporting
      from sched_show_task().
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-21-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix deadlock in TREE_PREEMPT_RCU CPU stall detection · 3acd9eb3
      Paul E. McKenney committed
      Under TREE_PREEMPT_RCU, print_other_cpu_stall() invokes
      rcu_print_task_stall() with the root rcu_node structure's ->lock
      held, and rcu_print_task_stall() acquires that same lock for
      self-deadlock. Fix this by removing the lock acquisition from
      rcu_print_task_stall(), and making all callers acquire the lock
      instead.
      Tested-by: John Kacur <jkacur@redhat.com>
      Tested-by: Thomas Gleixner <tglx@linutronix.de>
      Located-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-19-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
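      A toy model of the locking convention after the fix, assuming a
      hypothetical non-recursive struct toy_lock whose double acquisition
      trips an assertion instead of hanging. The callee only asserts that
      the root lock is held, and every caller takes the lock around the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock. Taking it twice from the same
 * context would self-deadlock, so the sketch asserts instead. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held && "self-deadlock: lock already held");
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

static struct toy_lock root_lock;	/* plays the root rcu_node ->lock */
static int nr_blocked_tasks = 3;	/* hypothetical state it guards */

/* After the fix: the callee requires the lock but does not take it. */
static int print_task_stall(void)
{
	assert(root_lock.held);
	return nr_blocked_tasks;
}

/* Every caller acquires the lock around the call. */
static int print_other_cpu_stall(void)
{
	int n;

	toy_lock_acquire(&root_lock);
	n = print_task_stall();
	toy_lock_release(&root_lock);
	return n;
}
```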
    • rcu: Convert to raw_spinlocks · 1304afb2
      Paul E. McKenney committed
      The spinlocks in rcutree need to be real spinlocks in
      preempt-rt. Convert them to raw_spinlocks.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-18-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Stop overflowing signed integers · 20133cfc
      Paul E. McKenney committed
      The C standard does not specify the result of an operation that
      overflows a signed integer, so such operations need to be
      avoided.  This patch changes the type of several fields from
      "long" to "unsigned long" and adjusts operations as needed.
      ULONG_CMP_GE() and ULONG_CMP_LT() macros are introduced to do
      the modular comparisons that are appropriate given that overflow
      is an expected event.
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-17-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
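      The two comparison macros can be written as below, the form commonly
      used in the kernel's RCU code; treat it as a sketch. Unsigned
      subtraction wraps modulo 2^N by definition, so (a) - (b) is always
      well defined, and a difference that lands in the lower half of the
      range means a is at or ahead of b. counter_reached() is a hypothetical
      helper added for illustration.

```c
#include <assert.h>
#include <limits.h>

/*
 * Modular "a >= b" and "a < b" for free-running unsigned counters.
 * Valid as long as the two values are less than ULONG_MAX / 2 apart.
 */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

/* Hypothetical helper: has counter 'now' reached 'target', even if the
 * counter wrapped past ULONG_MAX in between? */
static int counter_reached(unsigned long now, unsigned long target)
{
	return ULONG_CMP_GE(now, target);
}
```

      For example, 1 compares as two steps past ULONG_MAX, which a plain
      1 >= ULONG_MAX would get wrong.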
    • rcu: Accelerate grace period if last non-dynticked CPU · 8bd93a2c
      Paul E. McKenney committed
      Currently, rcu_needs_cpu() simply checks whether the current CPU
      has an outstanding RCU callback, which means that the last CPU
      to go into dyntick-idle mode might wait a few ticks for the
      relevant grace periods to complete.  However, if all the other
      CPUs are in dyntick-idle mode, and if this CPU is in a quiescent
      state (which it is for RCU-bh and RCU-sched any time that we are
      considering going into dyntick-idle mode), then the grace period
      is instantly complete.
      
      This patch therefore repeatedly invokes the RCU grace-period
      machinery in order to force any needed grace periods to complete
      quickly.  It does so a limited number of times in order to
      prevent starvation by an RCU callback function that might pass
      itself to call_rcu().
      
      However, if any CPU other than the current one is not in
      dyntick-idle mode, fall back to simply checking (with a fix for a bug
      noted by Lai Jiangshan).  Also, take advantage of last-grace-period
      forcing, an opportunity noted by Steve Rostedt, and apply the
      simplified #ifdef condition suggested by Frederic Weisbecker.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 16 January, 2010: 1 commit
    • rcu: Fix sparse warnings · 017c4261
      Paul E. McKenney committed
      Rename local variable "i" in rcu_init() to avoid conflict with
      RCU_INIT_FLAVOR(), restrict the scope of RCU_TREE_NONCORE, and
      make __synchronize_srcu() static.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12635142581560-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 13 January, 2010: 12 commits
    • rcu: Give different levels of the rcu_node hierarchy distinct lockdep names · b6407e86
      Paul E. McKenney committed
      Previously, each level of the rcu_node hierarchy had the same
      rather unimaginative name: "&rcu_node_class[i]".  This makes
      lockdep diagnostics involving these lockdep classes less helpful
      than would be nice. This patch fixes this by giving each level
      of the rcu_node hierarchy a distinct name: "rcu_node_level_0",
      "rcu_node_level_1", and so on. This version of the patch
      includes improved diagnostics suggested by Josh Triplett and
      Peter Zijlstra.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626498421830-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
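      A sketch of generating one lock-class name per level. MAX_LEVELS and
      level_names[] are hypothetical; the names themselves match the ones
      given in the commit, and an implementation could just as well use a
      static table of string literals.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LEVELS 4	/* hypothetical: deepest hierarchy supported */

static char level_names[MAX_LEVELS][24];

/* Give each level its own lock-class name ("rcu_node_level_0", ...)
 * rather than one shared name for every level. */
static void init_level_names(int levels)
{
	assert(levels > 0 && levels <= MAX_LEVELS);
	for (int i = 0; i < levels; i++)
		snprintf(level_names[i], sizeof(level_names[i]),
			 "rcu_node_level_%d", i);
}
```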
    • rcu: Add force_quiescent_state() testing to rcutorture · bf66f18e
      Paul E. McKenney committed
      Add force_quiescent_state() testing to rcutorture, with a
      separate thread that repeatedly invokes force_quiescent_state()
      in bursts. This can greatly increase the probability of
      encountering certain types of race conditions.
      Suggested-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1262646551116-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Make force_quiescent_state() start grace period if needed · 46a1e34e
      Paul E. McKenney committed
      Grace periods cannot be started while force_quiescent_state() is
      active.  This is OK in that the affected CPUs will try again
      later, but it does induce needless grace-period delays.  This
      patch causes rcu_start_gp() to record a failed attempt to start
      a grace period. When force_quiescent_state() prepares to return,
      it then starts the grace period if there was such a failed
      attempt.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501854-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
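      A user-space model of the fqs_active / fqs_need_gp handshake described
      in this entry and in "Prohibit starting new grace periods while
      forcing quiescent states". The struct is a hypothetical subset of
      rcu_state, and gp_in_progress() compares gpnum and completed in the
      spirit of the rcu_gp_in_progress() predicate. The switch statement is
      replaced by a callback so the sketch can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct rcu_state, named after the commit text. */
struct rcu_state_model {
	bool fqs_active;		/* force_quiescent_state() is running */
	bool fqs_need_gp;		/* a GP start was refused meanwhile */
	unsigned long gpnum;		/* most recently started GP */
	unsigned long completed;	/* most recently completed GP */
};

static bool gp_in_progress(const struct rcu_state_model *rsp)
{
	return rsp->completed != rsp->gpnum;
}

/* Refuse to start a GP while fqs is active, but record the attempt. */
static void start_gp(struct rcu_state_model *rsp)
{
	if (rsp->fqs_active) {
		rsp->fqs_need_gp = true;
		return;
	}
	if (!gp_in_progress(rsp))
		rsp->gpnum++;
}

/* Example switch-statement body: the current GP ends, and a CPU
 * immediately asks for another one. */
static void finish_gp_and_request_next(struct rcu_state_model *rsp)
{
	rsp->completed = rsp->gpnum;
	start_gp(rsp);
}

static void force_qs(struct rcu_state_model *rsp,
		     void (*body)(struct rcu_state_model *))
{
	assert(!rsp->fqs_active);
	rsp->fqs_active = true;
	body(rsp);			/* gpnum cannot advance in here */
	rsp->fqs_active = false;
	if (rsp->fqs_need_gp) {		/* replay the refused start */
		rsp->fqs_need_gp = false;
		start_gp(rsp);
	}
}
```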
    • rcu: Remove redundant grace-period check · 45f014c5
      Paul E. McKenney committed
      The rcu_process_dyntick() function checks twice for the end of
      the current grace period.  However, it holds the current
      rcu_node structure's ->lock field throughout, and doesn't get to
      the second call to rcu_gp_in_progress() unless there is at least
      one CPU corresponding to this rcu_node structure that has not
      yet checked in for the current grace period, which would prevent
      the current grace period from ending. So the current grace
      period cannot have ended, and the second check is redundant, so
      remove it.
      
      Also, given that this function is used even with !CONFIG_NO_HZ,
      its name is quite misleading.  Change from rcu_process_dyntick()
      to force_qs_rnp().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1262646550562-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Remove leg of force_quiescent_state() switch statement · ee47eb9f
      Paul E. McKenney committed
      The comparisons of rsp->gpnum and rsp->completed in
      rcu_process_dyntick() and force_quiescent_state() can be
      replaced by the much more clear rcu_gp_in_progress() predicate
      function.  After doing this, it becomes clear that the
      RCU_SAVE_COMPLETED leg of the force_quiescent_state() function's
      switch statement is almost completely a no-op.  A small change
      to the RCU_SAVE_DYNTICK leg renders it a complete no-op, after
      which it can be removed.  Doing so also eliminates the forcenow
      local variable from force_quiescent_state().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501781-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Eliminate rcu_process_dyntick() return value · 0f10dc82
      Paul E. McKenney committed
      Because a new grace period cannot start while we are executing
      within the force_quiescent_state() function's switch statement,
      if any test within that switch statement or within any function
      called from that switch statement shows that the current grace
      period has ended, we can safely re-do that test any time before
      we leave the switch statement.  This means that we no longer
      need a return value from rcu_process_dyntick(), as we can simply
      invoke rcu_gp_in_progress() to check whether the old grace
      period has finished -- there is no longer any need to worry
      about whether or not a new grace period has been started.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501857-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Eliminate second argument of rcu_process_dyntick() · eb1ba45f
      Paul E. McKenney committed
      At this point, the second argument to all calls to
      rcu_process_dyntick() is a function of the same field of the
      structure passed in as the first argument, namely, rsp->gpnum-1.
      So propagate rsp->gpnum-1 to all uses of the second argument
      within rcu_process_dyntick() and then eliminate the second
      argument.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465503786-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Eliminate local variable lastcomp from force_quiescent_state() · 39c0bbfc
      Paul E. McKenney committed
      Because rsp->fqs_active is set to 1 across
      force_quiescent_state()'s switch statement, rcu_start_gp() will
      refrain from starting a new grace period during this time.
      Therefore, rsp->gpnum is constant, and can be propagated to all
      uses of lastcomp, eliminating this local variable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465502985-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Eliminate local variable signaled from force_quiescent_state() · f3a8b5c6
      Paul E. McKenney committed
      Because the root rcu_node lock is held across entry to the
      switch statement in force_quiescent_state(), it is no longer
      necessary to snapshot rsp->signaled to a local variable.
      Eliminate both the snapshotting and the local variable.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1262646550602-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Prohibit starting new grace periods while forcing quiescent states · 07079d53
      Paul E. McKenney committed
      Reduce the number and variety of race conditions by prohibiting
      the start of a new grace period while force_quiescent_state() is
      active. A new fqs_active flag in the rcu_state structure is used
      to track whether or not force_quiescent_state() is active, and
      this new flag is tested by rcu_start_gp().  If the CPU that
      closed out the last grace period needs another grace period,
      this new grace period may be delayed up to one scheduling-clock
      tick, but it will eventually get started.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <126264655052-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Adjust force_quiescent_state() locking, step 2 · 559569ac
      Paul E. McKenney committed
      This patch releases rnp->lock after the end of
      force_quiescent_state()'s switch statement.  This is a second
      step towards prohibiting starting grace periods while
      force_quiescent_state() is executing, which will reduce the
      number and complexity of races that force_quiescent_state() is
      involved in.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501994-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Adjust force_quiescent_state() locking, step 1 · f96e9232
      Paul E. McKenney committed
      This causes rnp->lock to be held on entry to
      force_quiescent_state()'s switch statement.  This is a first
      step towards prohibiting starting grace periods while
      force_quiescent_state() is executing, which will reduce the
      number and complexity of races that force_quiescent_state() is
      involved in.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12626465501455-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 03 December, 2009: 3 commits
    • rcu: Add expedited grace-period support for preemptible RCU · d9a3da06
      Paul E. McKenney committed
      Implement a synchronize_rcu_expedited() for preemptible RCU
      that actually is expedited.  This uses
      synchronize_sched_expedited() to force all threads currently
      running in a preemptible-RCU read-side critical section onto the
      appropriate ->blocked_tasks[] list, then takes a snapshot of all
      of these lists and waits for them to drain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1259784616158-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Enable fourth level of TREE_RCU hierarchy · cf244dc0
      Paul E. McKenney committed
      Enable a fourth level of rcu_node hierarchy for TREE_RCU and
      TREE_PREEMPT_RCU.  This is for stress-testing and experimental
      purposes only, although in theory this would enable 16,777,216
      CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
      systems. Normal experimental use of this fourth level will
      normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
      though the more adventurous (and more fortunate) experimenters
      may wish to choose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
      CONFIG_RCU_FANOUT=4 for 256-CPU systems.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846161257-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
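      Every CPU count in this entry is the fanout raised to the number of
      levels. The 16,777,216 and 1,048,576 figures correspond to a fanout of
      64 on 64-bit and 32 on 32-bit systems; with four levels, a small
      helper (hypothetical, for checking the arithmetic) reproduces all of
      the numbers in the message.

```c
#include <assert.h>

/* Maximum CPUs covered by 'levels' levels of rcu_node structures when
 * every node has 'fanout' children: fanout raised to the power levels. */
static unsigned long long max_cpus(unsigned int fanout, unsigned int levels)
{
	unsigned long long n = 1;

	assert(fanout >= 2);
	for (unsigned int i = 0; i < levels; i++)
		n *= fanout;
	return n;
}
```

      So max_cpus(64, 4) is 16,777,216, max_cpus(32, 4) is 1,048,576, and
      fanouts of 2, 3 and 4 give the 16-, 81- and 256-CPU test systems.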
    • rcu: Rename "quiet" functions · d3f6bad3
      Paul E. McKenney committed
      The number of "quiet" functions has grown recently, and the
      names are no longer very descriptive.  The point of all of these
      functions is to do some portion of the task of reporting a
      quiescent state, so rename them accordingly:
      
      o	cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
      	quiescent state to the per-CPU rcu_data structure.  If this
      	turns out to be a new quiescent state for this grace period,
      	then rcu_report_qs_rnp() will be invoked to propagate the
      	quiescent state up the rcu_node hierarchy.
      
      o	cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
      	a quiescent state for a given CPU (or possibly a set of CPUs)
      	up the rcu_node hierarchy.
      
      o	cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
      	reports a full set of quiescent states to the global rcu_state
      	structure.
      
      o	task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
      	a quiescent state due to a task exiting an RCU read-side critical
      	section that had previously blocked in that same critical section.
      	As indicated by the new name, this type of quiescent state is
      	reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
      	to do so).
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846163698-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
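      A sketch of how a quiescent state climbs the hierarchy in the style of
      rcu_report_qs_rnp(). struct node, gp_done and report_qs_rnp() are
      hypothetical simplifications: blocked-task lists and locking are left
      out, and setting gp_done stands in for rcu_report_qs_rsp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down rcu_node: bits in qsmask mark children
 * (CPUs at a leaf, child nodes above) still owing a quiescent state. */
struct node {
	unsigned long qsmask;
	unsigned long grpmask;	/* this node's bit in its parent's qsmask */
	struct node *parent;
};

static bool gp_done;	/* set when the root becomes fully quiescent */

/* Report that the children in 'mask' of 'rnp' have passed a QS, and
 * walk upward while each level becomes fully quiescent. */
static void report_qs_rnp(struct node *rnp, unsigned long mask)
{
	for (;;) {
		assert(rnp != NULL);
		if ((rnp->qsmask & mask) == 0)
			return;		/* already reported */
		rnp->qsmask &= ~mask;
		if (rnp->qsmask != 0)
			return;		/* siblings still pending here */
		if (rnp->parent == NULL) {
			gp_done = true;	/* role of rcu_report_qs_rsp() */
			return;
		}
		mask = rnp->grpmask;
		rnp = rnp->parent;
	}
}
```

      With two leaves of two CPUs each, the root is only cleared once both
      CPUs of a leaf have reported, and the grace period ends when the last
      leaf empties.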
  8. 23 November, 2009: 3 commits
    • rcu: Re-arrange code to reduce #ifdef pain · 6ebb237b
      Paul E. McKenney committed
      Remove #ifdefs from kernel/rcupdate.c and
      include/linux/rcupdate.h by moving code to
      include/linux/rcutiny.h, include/linux/rcutree.h, and
      kernel/rcutree.c.
      
      Also remove some definitions that are no longer used.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258908830885-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Eliminate unneeded function wrapping · 9f680ab4
      Paul E. McKenney committed
      The function rcu_init() is a wrapper for __rcu_init(), and also
      sets up the CPU-hotplug notifier for rcu_barrier_cpu_hotplug().
      But TINY_RCU doesn't need CPU-hotplug notification, and
      rcu_barrier_cpu_hotplug() is a simple wrapper for
      rcu_cpu_notify().
      
      So push rcu_init() out to kernel/rcutree.c and kernel/rcutiny.c
      and get rid of the wrapper function rcu_barrier_cpu_hotplug().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088302320-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Fix grace-period-stall bug on large systems with CPU hotplug · b668c9cf
      Paul E. McKenney committed
      When the last CPU of a given leaf rcu_node structure goes
      offline, all of the tasks queued on that leaf rcu_node structure
      (due to having blocked in their current RCU read-side critical
      sections) are requeued onto the root rcu_node structure.  This
      requeuing is carried out by rcu_preempt_offline_tasks().
      However, it is possible that these queued tasks are the only
      thing preventing the leaf rcu_node structure from reporting a
      quiescent state up the rcu_node hierarchy.  Unfortunately, the
      old code would fail to do this reporting, resulting in a
      grace-period stall given the following sequence of events:
      
      1.	Kernel built for more than 32 CPUs on 32-bit systems or for more
      	than 64 CPUs on 64-bit systems, so that there is more than one
      	rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
      	to a number smaller than CONFIG_NR_CPUS.)
      
      2.	The kernel is built with CONFIG_TREE_PREEMPT_RCU.
      
      3.	A task running on a CPU associated with a given leaf rcu_node
      	structure blocks while in an RCU read-side critical section
      	-and- that CPU has not yet passed through a quiescent state
      	for the current RCU grace period.  This will cause the task
      	to be queued on the leaf rcu_node's blocked_tasks[] array, in
      	particular, on the element of this array corresponding to the
      	current grace period.
      
      4.	Each of the remaining CPUs corresponding to this same leaf rcu_node
      	structure passes through a quiescent state.  However, the task is
      	still in its RCU read-side critical section, so these quiescent
      	states cannot be reported further up the rcu_node hierarchy.
      	Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
      	field are now zero.
      
      5.	Each of the remaining CPUs goes offline.  (The events in steps
      	#4 and #5 can happen in any order as long as each CPU passes
      	through a quiescent state before going offline.)
      
      6.	When the last CPU goes offline, __rcu_offline_cpu() will invoke
      	rcu_preempt_offline_tasks(), which will move the task to the
      	root rcu_node structure, but without reporting a quiescent state
      	up the rcu_node hierarchy (and this failure to report a quiescent
      	state is the bug).
      
      	But because this leaf rcu_node structure's ->qsmask field is
      	already zero and its ->blocked_tasks[] entries are all empty,
      	force_quiescent_state() will skip this rcu_node structure.
      
      	Therefore, grace periods are now hung.
      
      This patch abstracts some code out of rcu_read_unlock_special(),
      calling the result task_quiet() by analogy with cpu_quiet(), and
      invokes task_quiet() from both rcu_read_unlock_special() and
      __rcu_offline_cpu().  Invoking task_quiet() from
      __rcu_offline_cpu() reports the quiescent state up the rcu_node
      hierarchy, fixing the bug.  This ends up requiring a separate
      lock_class_key per level of the rcu_node hierarchy, which this
      patch also provides.
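      The stall described above can be modelled in a few lines.  This is
      a minimal user-space sketch under stated assumptions, not the
      kernel's code: struct node, report_up(), offline_last_cpu() and
      gp_can_end() are hypothetical, the tree has one root and two
      leaves, and locking is omitted.  The "fixed" path reports a
      quiescent state upward after moving the tasks, in the spirit of
      calling task_quiet() from __rcu_offline_cpu().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified rcu_node-like structure. */
struct node {
    unsigned long qsmask;   /* children/CPUs still owing a quiescent state */
    int nblocked;           /* tasks blocked in RCU read-side sections */
    struct node *parent;    /* NULL for the root */
    unsigned long grpmask;  /* this node's bit in parent->qsmask */
};

/* Propagate a quiescent state upward while a node is fully quiet. */
static void report_up(struct node *np)
{
    while (np->parent != NULL && np->qsmask == 0 && np->nblocked == 0) {
        np->parent->qsmask &= ~np->grpmask;
        np = np->parent;
    }
}

/* The last CPU of @leaf goes offline: requeue its blocked tasks on the
 * root.  With @fixed false this mirrors the old behaviour, which never
 * clears @leaf's bit in the root's ->qsmask. */
static void offline_last_cpu(struct node *leaf, struct node *root, bool fixed)
{
    root->nblocked += leaf->nblocked;
    leaf->nblocked = 0;
    if (fixed)
        report_up(leaf);
}

/* The grace period can end only when the root is fully quiet. */
static bool gp_can_end(const struct node *root)
{
    return root->qsmask == 0 && root->nblocked == 0;
}
```

      With the old path, even after the other leaf reports and the
      requeued task leaves its critical section, the root still holds
      the offlined leaf's bit, so gp_can_end() never becomes true.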
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12589088301770-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b668c9cf
  9. 14 Nov 2009, 2 commits
    • P
      rcu: Eliminate __rcu_pending() false positives · 2f51f988
      Paul E. McKenney committed
      Now that there are both ->gpnum and ->completed fields in the
      rcu_node structure, __rcu_pending() should check rdp->gpnum and
      rdp->completed against rnp->gpnum and rnp->completed, respectively,
      instead of the prior comparison against the rcu_state fields
      rsp->gpnum and rsp->completed.
      
      Given the old comparison, __rcu_pending() could return 1, resulting
      in a needless raise_softirq(RCU_SOFTIRQ).  This useless work would
      happen if RCU responded to a scheduling-clock interrupt after the
      rcu_state fields had been updated, but before the rcu_node fields
      had been updated.
      
      Changing the comparison from the rcu_state fields to the rcu_node
      fields prevents this useless work from happening.
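      The two checks can be compared directly.  This is a minimal
      sketch, assuming a hypothetical struct gp_counters and a helper
      gp_changed() that covers only the ->gpnum/->completed tests
      discussed here (the real __rcu_pending() performs other checks
      too).  Passing the rcu_state copy gives the old behaviour;
      passing the leaf rcu_node copy gives the new one.

```c
#include <stdbool.h>

/* Hypothetical pair of grace-period counters, as kept in rsp, rnp, rdp. */
struct gp_counters {
    long gpnum;      /* most recent grace period started */
    long completed;  /* most recent grace period completed */
};

/* True if @ref shows a grace period starting or ending that the CPU
 * (@rdp) has not yet noticed, i.e. RCU_SOFTIRQ would be raised. */
static bool gp_changed(const struct gp_counters *rdp,
                       const struct gp_counters *ref)
{
    return rdp->completed != ref->completed || rdp->gpnum != ref->gpnum;
}
```

      In the window where rsp has moved to {gpnum 6, completed 5} but the
      leaf is still at {5, 5}, comparing against rsp fires even though
      the CPU cannot yet pick up the new grace period from its leaf;
      comparing against rnp stays quiet until the leaf is updated.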
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12581706991966-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2f51f988
    • P
      rcu: Further cleanups of use of lastcomp · 560d4bc0
      Paul E. McKenney committed
      Now that a copy of the rsp->completed field is available in all
      rcu_node structures, make full use of it.  It is still
      legitimate to access rsp->completed while holding the root
      rcu_node structure's lock, however.
      
      Also, tighten up force_quiescent_state()'s checks for end of
      current grace period.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258170699933-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      560d4bc0
  10. 13 Nov 2009, 2 commits
    • P
      rcu: Simplify association of forced quiescent states with grace periods · 8e9aa8f0
      Paul E. McKenney committed
      The force_quiescent_state() function also took a snapshot
      of the ->completed field, which was as obnoxious as it was in
      rcu_sched_qs() and friends.  So snapshot ->gpnum-1 instead.
      
      Also, since the dyntick_record_completed() and
      dyntick_recall_completed() functions are now simple assignments
      that are independent of CONFIG_NO_HZ, and since their names are
      now misleading, get rid of them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12580941042308-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8e9aa8f0
    • P
      rcu: Accelerate callback processing on CPUs not detecting GP end · b32e9eb6
      Paul E. McKenney committed
      An earlier fix for a race resulted in a situation where the CPUs
      other than the CPU that detected the end of the grace period would
      not process their callbacks until the next grace period started.
      
      This means that these other CPUs would unnecessarily demand that an
      extra grace period be started.
      
      This patch eliminates this extra grace period and speeds callback
      processing by propagating rsp->completed to the rcu_node structures
      in the case where the CPU detecting the end of the grace period
      sees no reason to start a new grace period.
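      The propagation step can be sketched as follows.  This assumes a
      hypothetical flat array of rcu_node-like structures, omits the
      per-node locking the kernel needs, and ignores counter wraparound;
      propagate_completed() and cbs_ready() are illustrative names only.

```c
#include <stdbool.h>

/* Hypothetical per-node copies of the grace-period counters. */
struct rnode {
    long gpnum;
    long completed;
};

/* The CPU that detected the end of the grace period, having decided not
 * to start another, copies the new ->completed into every node. */
static void propagate_completed(struct rnode *nodes, int n, long completed)
{
    for (int i = 0; i < n; i++)
        nodes[i].completed = completed;
}

/* A CPU may invoke callbacks waiting on grace period @cb_gp once its
 * leaf node shows that grace period as completed. */
static bool cbs_ready(const struct rnode *leaf, long cb_gp)
{
    return leaf->completed >= cb_gp;
}
```

      Before propagation, a CPU whose leaf still reads completed == 5
      cannot invoke callbacks waiting on grace period 6; afterwards it
      can, without an extra grace period being requested.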
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1258094104417-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b32e9eb6
  11. 11 Nov 2009, 2 commits
    • P
      rcu: Simplify association of quiescent states with grace periods · c64ac3ce
      Paul E. McKenney committed
      The rdp->passed_quiesc_completed fields are used to properly
      associate the recorded quiescent state with a grace period.  It
      is OK to wrongly associate a given quiescent state with a
      preceding grace period, but it is fatal to associate a given
      quiescent state with a grace period that begins after the
      quiescent state occurred.  Grace periods are numbered, and the
      following fields track them:
      
      o	->gpnum is the number of the grace period currently in
      	progress, or the number of the last grace period to
      	complete if no grace period is currently in progress.
      
      o	->completed is the number of the last grace period to
      	have completed.
      
      These two fields are equal if there is no grace period in
      progress, otherwise ->gpnum is one greater than ->completed.
      But the rdp->passed_quiesc_completed field is compared against
      ->completed, and if they are equal, the quiescent state is presumed
      to count against the current grace period.
      
      The earlier code copied rdp->completed to
      rdp->passed_quiesc_completed, which has been made to work, but
      is error-prone.  In contrast, copying one less than rdp->gpnum
      is guaranteed safe, because rdp->gpnum is not incremented until
      after the start of the corresponding grace period.  At the end of
      the grace period, when ->completed has been incremented, any
      quiescent states recorded previously will be discarded.
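      The difference between the two snapshots shows up in a short
      scenario.  This is a minimal sketch, assuming hypothetical helpers
      snap_completed() (old scheme), snap_gpnum_minus_one() (new scheme)
      and qs_counts(), and considering each snapshot on its own, without
      the extra safeguards that made the old scheme work.  A quiescent
      state is recorded while no grace period is in progress
      (gpnum == completed == 5), and then grace period 6 starts.

```c
#include <stdbool.h>

/* Hypothetical per-CPU view of the grace-period counters. */
struct gp_view {
    long gpnum;      /* current GP, or last completed GP if none running */
    long completed;  /* last completed GP */
};

/* Old scheme: snapshot ->completed when the quiescent state occurs. */
static long snap_completed(const struct gp_view *rdp)
{
    return rdp->completed;
}

/* New scheme: snapshot ->gpnum - 1 when the quiescent state occurs. */
static long snap_gpnum_minus_one(const struct gp_view *rdp)
{
    return rdp->gpnum - 1;
}

/* At reporting time, the quiescent state counts toward the current
 * grace period only if its snapshot equals ->completed. */
static bool qs_counts(long snap, const struct gp_view *now)
{
    return snap == now->completed;
}
```

      The old snapshot (5) matches completed == 5 once grace period 6
      is running, so a quiescent state that predates grace period 6 is
      credited to it: the fatal case.  The new snapshot (4) does not
      match and is discarded.  Once the CPU has seen rdp->gpnum == 6, a
      freshly recorded quiescent state snapshots 5 and counts.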
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890421011-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c64ac3ce
    • P
      rcu: Rename dynticks_completed to completed_fqs · 4bcfe055
      Paul E. McKenney committed
      This field is used whether or not CONFIG_NO_HZ is set, so the
      old name of ->dynticks_completed is quite misleading.
      
      Change to ->completed_fqs, given that it is the value that
      force_quiescent_state() is trying to drive the ->completed field
      away from.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12578890423298-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4bcfe055
  12. 10 Nov 2009, 1 commit
    • P
      rcu: Fix note_new_gpnum() uses of ->gpnum · 9160306e
      Paul E. McKenney committed
      Impose a clear locking design on the note_new_gpnum()
      function's use of the ->gpnum counter.  This is done by updating
      rdp->gpnum only from the corresponding leaf rcu_node structure's
      rnp->gpnum field, and even then only under the protection of
      that same rcu_node structure's ->lock field.  Performance and
      scalability are maintained using a form of double-checked
      locking, and excessive spinning is avoided by use of the
      spin_trylock() function.  The use of spin_trylock() is safe
      because CPUs that fail to acquire this lock will try again
      later.  The hierarchical nature of the rcu_node data
      structure limits contention (which could be limited further if
      need be using the RCU_FANOUT kernel parameter).
      
      Without this patch, obscure but quite possible races could
      result in a quiescent state that occurred during one grace
      period being accounted to the following grace period, causing
      this following grace period to end prematurely.  Not good!
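      The double-checked, non-spinning update can be sketched in user
      space.  This is an illustration under stated assumptions, not the
      kernel's note_new_gpnum(): struct leaf, struct cpu_data and
      check_new_gpnum() are hypothetical, pthread_mutex_trylock() stands
      in for spin_trylock(), and a relaxed C11 atomic load stands in for
      the kernel's lockless read of rnp->gpnum.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical leaf node: ->gpnum is written only under ->lock but may
 * be read without it for the first, cheap check. */
struct leaf {
    pthread_mutex_t lock;
    atomic_long gpnum;
};

/* Hypothetical per-CPU data, touched only by its owning thread. */
struct cpu_data {
    long gpnum;
};

/* Update rdp->gpnum from its leaf only, and only under the leaf's lock.
 * Returns true if rdp->gpnum changed. */
static bool check_new_gpnum(struct cpu_data *rdp, struct leaf *rnp)
{
    /* First check without the lock: the common case is "no change". */
    if (rdp->gpnum == atomic_load_explicit(&rnp->gpnum, memory_order_relaxed))
        return false;

    /* Never spin: if the lock is busy, simply try again on a later call. */
    if (pthread_mutex_trylock(&rnp->lock) != 0)
        return false;

    /* Second check under the lock, then copy from the leaf. */
    long g = atomic_load_explicit(&rnp->gpnum, memory_order_relaxed);
    bool changed = (rdp->gpnum != g);
    if (changed)
        rdp->gpnum = g;
    pthread_mutex_unlock(&rnp->lock);
    return changed;
}
```

      A caller that finds the lock held gets false and leaves its
      ->gpnum untouched, which is safe for the same reason given above:
      it will simply retry later.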
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: <stable@kernel.org> # .32.x
      LKML-Reference: <12571987492350-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9160306e