1. 28 5月, 2015 1 次提交
  2. 13 3月, 2015 1 次提交
    • P
      rcu: Handle outgoing CPUs on exit from idle loop · 88428cc5
      Paul E. McKenney 提交于
      This commit informs RCU of an outgoing CPU just before that CPU invokes
      arch_cpu_idle_dead() during its last pass through the idle loop (via a
      new CPU_DYING_IDLE notifier value).  This change means that RCU need not
      deal with outgoing CPUs passing through the scheduler after informing
      RCU that they are no longer online.  Note that removing the CPU from
      the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time,
      and orphaning callbacks is still done at CPU_DEAD time, the reason being
      that at CPU_DEAD time we have another CPU that can adopt them.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      88428cc5
  3. 04 3月, 2015 2 次提交
    • P
      rcu: Reverse rcu_dereference_check() conditions · b826565a
      Paul E. McKenney 提交于
      The rcu_dereference_check() family of primitives evaluates the RCU
      lockdep expression first, and only then evaluates the expression passed
      in.  This works fine normally, but can potentially fail in environments
      (such as NMI handlers) where lockdep cannot be invoked.  The problem is
      that even if the expression passed in is "1", the compiler would need to
      prove that the RCU lockdep expression (rcu_read_lock_held(), for example)
      is free of side effects in order to be able to elide it.  Given that
      rcu_read_lock_held() is sometimes separately compiled, the compiler cannot
      always use this optimization.
      
      This commit therefore reverse the order of evaluation, so that the
      expression passed in is evaluated first, and the RCU lockdep expression is
      evaluated only if the passed-in expression evaluated to false, courtesy
      of the C-language short-circuit boolean evaluation rules.  This compells
      the compiler to forego executing the RCU lockdep expression in cases
      where the passed-in expression evaluates to "1" at compile time, so that
      (for example) rcu_dereference_raw() can be guaranteed to execute safely
      within an NMI handler.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      b826565a
    • P
      rcu: Improve diagnostics for blocked critical sections in irq · d24209bb
      Paul E. McKenney 提交于
      If an RCU read-side critical section occurs within an interrupt handler
      or a softirq handler, it cannot have been preempted.  Therefore, there is
      a check in rcu_read_unlock_special() checking for this error.  However,
      when this check triggers, it lacks diagnostic information.  This commit
      therefore moves rcu_read_unlock()'s lockdep annotation to follow the
      call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s
      WARN_ON_ONCE() to an lockdep_rcu_suspicious() in order to locate where
      the offending RCU read-side critical section began.  In addition, the
      value of the ->rcu_read_unlock_special field is printed.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d24209bb
  4. 27 2月, 2015 2 次提交
    • P
      rcu: Add Kconfig option to expedite grace periods during boot · ee42571f
      Paul E. McKenney 提交于
      This commit adds a CONFIG_RCU_EXPEDITE_BOOT Kconfig parameter
      that emulates a very early boot rcu_expedite_gp().  A late-boot
      call to rcu_end_inkernel_boot() will provide the corresponding
      rcu_unexpedite_gp().  The late-boot call to rcu_end_inkernel_boot()
      should be made just before init is spawned.
      
      According to Arjan:
      
      > To show the boot time, I'm using the timestamp of the "Write protecting"
      > line, that's pretty much the last thing we print prior to ring 3 execution.
      >
      > A kernel with default RCU behavior (inside KVM, only virtual devices)
      > looks like this:
      >
      > [    0.038724] Write protecting the kernel read-only data: 10240k
      >
      > a kernel with expedited RCU (using the command line option, so that I
      > don't have to recompile between measurements and thus am completely
      > oranges-to-oranges)
      >
      > [    0.031768] Write protecting the kernel read-only data: 10240k
      >
      > which, in percentage, is an 18% improvement.
      Reported-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NArjan van de Ven <arjan@linux.intel.com>
      ee42571f
    • P
      rcu: Provide rcu_expedite_gp() and rcu_unexpedite_gp() · 0d39482c
      Paul E. McKenney 提交于
      Currently, expediting of normal synchronous grace-period primitives
      (synchronize_rcu() and friends) is controlled by the rcu_expedited()
      boot/sysfs parameter.  This works well, but does not handle nesting.
      This commit therefore provides rcu_expedite_gp() to enable expediting
      and rcu_unexpedite_gp() to cancel a prior rcu_expedite_gp(), both of
      which support nesting.
      Reported-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0d39482c
  5. 26 2月, 2015 1 次提交
  6. 16 1月, 2015 1 次提交
    • P
      rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Paul E. McKenney 提交于
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
      5cd37193
  7. 07 1月, 2015 1 次提交
  8. 14 11月, 2014 3 次提交
  9. 04 11月, 2014 2 次提交
  10. 30 10月, 2014 1 次提交
  11. 29 10月, 2014 1 次提交
  12. 17 9月, 2014 2 次提交
  13. 08 9月, 2014 9 次提交
    • P
      rcu: Per-CPU operation cleanups to rcu_*_qs() functions · 284a8c93
      Paul E. McKenney 提交于
      The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
      old-style per-CPU variable access and write to ->passed_quiesce even
      if it is already set.  This commit therefore updates to use the new-style
      per-CPU variable access functions and avoids the spurious writes.
      This commit also eliminates the "cpu" argument to these functions because
      they are always invoked on the indicated CPU.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      284a8c93
    • P
      rcu: Remove redundant preempt_disable() from rcu_note_voluntary_context_switch() · 01a81330
      Paul E. McKenney 提交于
      In theory, synchronize_sched() requires a read-side critical section
      to order against.  In practice, preemption can be thought of as
      being disabled across every machine instruction, at least for those
      machine instructions that are not in the idle loop and not on offline
      CPUs.  So this commit removes the redundant preempt_disable() from
      rcu_note_voluntary_context_switch().
      
      Please note that the single instruction in question is the store of
      zero to ->rcu_tasks_holdout.  The "if" is simply a performance optimization
      that avoids unnecessary stores.  To see this, keep in mind that both
      the "if" condition and the store are in a quiescent state.  Therefore,
      even if the task is preempted for a full grace period (presumably due
      to its having done a context switch beforehand), the store will be
      recording a legitimate quiescent state.
      Reported-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      Conflicts:
      	include/linux/rcupdate.h
      01a81330
    • P
      rcutorture: Add torture tests for RCU-tasks · 69c60455
      Paul E. McKenney 提交于
      This commit adds torture tests for RCU-tasks.  It also fixes a bug that
      would segfault for an RCU flavor lacking a callback-barrier function.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      69c60455
    • P
      rcu: Make TASKS_RCU handle tasks that are almost done exiting · 3f95aa81
      Paul E. McKenney 提交于
      Once a task has passed exit_notify() in the do_exit() code path, it
      is no longer on the task lists, and is therefore no longer visible
      to rcu_tasks_kthread().  This means that an almost-exited task might
      be preempted while within a trampoline, and this task won't be waited
      on by rcu_tasks_kthread().  This commit fixes this bug by adding an
      srcu_struct.  An exiting task does srcu_read_lock() just before calling
      exit_notify(), and does the corresponding srcu_read_unlock() after
      doing the final preempt_disable().  This means that rcu_tasks_kthread()
      can do synchronize_srcu() to wait for all mostly-exited tasks to reach
      their final preempt_disable() region, and then use synchronize_sched()
      to wait for those tasks to finish exiting.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3f95aa81
    • P
      rcu: Add synchronous grace-period waiting for RCU-tasks · 53c6d4ed
      Paul E. McKenney 提交于
      It turns out to be easier to add the synchronous grace-period waiting
      functions to RCU-tasks than to work around their absense in rcutorture,
      so this commit adds them.  The key point is that the existence of
      call_rcu_tasks() means that rcutorture needs an rcu_barrier_tasks().
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      53c6d4ed
    • P
      rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops · bde6c3aa
      Paul E. McKenney 提交于
      RCU-tasks requires the occasional voluntary context switch
      from CPU-bound in-kernel tasks.  In some cases, this requires
      instrumenting cond_resched().  However, there is some reluctance
      to countenance unconditionally instrumenting cond_resched() (see
      http://lwn.net/Articles/603252/), so this commit creates a separate
      cond_resched_rcu_qs() that may be used in place of cond_resched() in
      locations prone to long-duration in-kernel looping.
      
      This commit currently instruments only RCU-tasks.  Future possibilities
      include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
      IPI usage.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      bde6c3aa
    • P
      rcu: Add call_rcu_tasks() · 8315f422
      Paul E. McKenney 提交于
      This commit adds a new RCU-tasks flavor of RCU, which provides
      call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
      context switch (not preemption!) and userspace execution (not the idle
      loop -- use some sort of schedule_on_each_cpu() if you need to handle the
      idle tasks.  Note that unlike other RCU flavors, these quiescent states
      occur in tasks, not necessarily CPUs.  Includes fixes from Steven Rostedt.
      
      This RCU flavor is assumed to have very infrequent latency-tolerant
      updaters.  This assumption permits significant simplifications, including
      a single global callback list protected by a single global lock, along
      with a single task-private linked list containing all tasks that have not
      yet passed through a quiescent state.  If experience shows this assumption
      to be incorrect, the required additional complexity will be added.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8315f422
    • O
      rcu: Uninline rcu_read_lock_held() · 85b39d30
      Oleg Nesterov 提交于
      This commit uninlines rcu_read_lock_held(). According to "size vmlinux"
      this saves 28549 in .text:
      
      	- 5541731 3014560 14757888 23314179
      	+ 5513182 3026848 14757888 23297918
      
      Note: it looks as if the data grows by 12288 bytes but this is not true,
      it does not actually grow. But .data starts with ALIGN(THREAD_SIZE) and
      since .text shrinks the padding grows, and thus .data grows too as it
      seen by /bin/size. diff System.map:
      
      	- ffffffff81510000 D _sdata
      	- ffffffff81510000 D init_thread_union
      	+ ffffffff81509000 D _sdata
      	+ ffffffff8150c000 D init_thread_union
      
      Perhaps we can change vmlinux.lds.S to .data itself, so that /bin/size
      can't "wrongly" report that .data grows if .text shinks.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      85b39d30
    • P
      rcu: Return bool type in rcu_lockdep_current_cpu_online() · 521d24ee
      Pranith Kumar 提交于
      Return true instead of 1 in rcu_lockdep_current_cpu_online() as this
      has bool as return type.
      Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      521d24ee
  14. 10 7月, 2014 2 次提交
  15. 24 6月, 2014 2 次提交
    • P
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney 提交于
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
    • P
      rcu: Export debug_init_rcu_head() and and debug_init_rcu_head() · 546a9d85
      Paul E. McKenney 提交于
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to allocated
      and pre-initialize the debug-objects information, so that there no longer
      any need for call_rcu() to do that initialization, which in turn prevents
      the recursion into the memory allocators.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
      546a9d85
  16. 20 5月, 2014 1 次提交
  17. 15 5月, 2014 1 次提交
    • P
      sched,rcu: Make cond_resched() report RCU quiescent states · ac1bea85
      Paul E. McKenney 提交于
      Given a CPU running a loop containing cond_resched(), with no
      other tasks runnable on that CPU, RCU will eventually report RCU
      CPU stall warnings due to lack of quiescent states.  Fortunately,
      every call to cond_resched() is a perfectly good quiescent state.
      Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
      for cond_resched(), especially given the need to disable preemption,
      and, for RCU-preempt, interrupts as well.
      
      This commit therefore maintains a per-CPU counter that causes
      cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
      rcu_note_context_switch(), but only about once per 256 invocations.
      This ratio was chosen in keeping with the relative time constants of
      RCU grace periods.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      ac1bea85
  18. 14 5月, 2014 1 次提交
  19. 29 4月, 2014 2 次提交
  20. 26 2月, 2014 1 次提交
  21. 18 2月, 2014 3 次提交