1. 08 9月, 2014 1 次提交
    • P
      rcu: Add call_rcu_tasks() · 8315f422
      Paul E. McKenney 提交于
      This commit adds a new RCU-tasks flavor of RCU, which provides
      call_rcu_tasks().  This RCU flavor's quiescent states are voluntary
      context switch (not preemption!) and userspace execution (not the idle
      loop -- use some sort of schedule_on_each_cpu() if you need to handle the
      idle tasks.  Note that unlike other RCU flavors, these quiescent states
      occur in tasks, not necessarily CPUs.  Includes fixes from Steven Rostedt.
      
      This RCU flavor is assumed to have very infrequent latency-tolerant
      updaters.  This assumption permits significant simplifications, including
      a single global callback list protected by a single global lock, along
      with a single task-private linked list containing all tasks that have not
      yet passed through a quiescent state.  If experience shows this assumption
      to be incorrect, the required additional complexity will be added.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8315f422
  2. 10 7月, 2014 2 次提交
  3. 24 6月, 2014 2 次提交
    • P
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney 提交于
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
    • P
      rcu: Export debug_init_rcu_head() and and debug_init_rcu_head() · 546a9d85
      Paul E. McKenney 提交于
      Currently, call_rcu() relies on implicit allocation and initialization
      for the debug-objects handling of RCU callbacks.  If you hammer the
      kernel hard enough with Sasha's modified version of trinity, you can end
      up with the sl*b allocators recursing into themselves via this implicit
      call_rcu() allocation.
      
      This commit therefore exports the debug_init_rcu_head() and
      debug_rcu_head_free() functions, which permits the allocators to allocated
      and pre-initialize the debug-objects information, so that there no longer
      any need for call_rcu() to do that initialization, which in turn prevents
      the recursion into the memory allocators.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Looks-good-to: Christoph Lameter <cl@linux.com>
      546a9d85
  4. 20 5月, 2014 1 次提交
  5. 15 5月, 2014 1 次提交
    • P
      sched,rcu: Make cond_resched() report RCU quiescent states · ac1bea85
      Paul E. McKenney 提交于
      Given a CPU running a loop containing cond_resched(), with no
      other tasks runnable on that CPU, RCU will eventually report RCU
      CPU stall warnings due to lack of quiescent states.  Fortunately,
      every call to cond_resched() is a perfectly good quiescent state.
      Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
      for cond_resched(), especially given the need to disable preemption,
      and, for RCU-preempt, interrupts as well.
      
      This commit therefore maintains a per-CPU counter that causes
      cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
      rcu_note_context_switch(), but only about once per 256 invocations.
      This ratio was chosen in keeping with the relative time constants of
      RCU grace periods.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      ac1bea85
  6. 14 5月, 2014 1 次提交
  7. 29 4月, 2014 2 次提交
  8. 26 2月, 2014 1 次提交
  9. 18 2月, 2014 6 次提交
  10. 10 2月, 2014 1 次提交
    • O
      lockdep: Make held_lock->check and "int check" argument bool · fb9edbe9
      Oleg Nesterov 提交于
      The "int check" argument of lock_acquire() and held_lock->check are
      misleading. This is actually a boolean: 2 means "true", everything
      else is "false".
      
      And there is no need to pass 1 or 0 to lock_acquire() depending on
      CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
      start and clears "check" if !CONFIG_PROVE_LOCKING.
      
      Note: probably we can simply kill this member/arg. The only explicit
      user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
      use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
      check => 0 implicitly, but we can change validate_chain() to check
      hlock->instance->key instead. Not to mention it would be nice to get
      rid of lockdep_set_novalidate_class().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fb9edbe9
  11. 25 1月, 2014 1 次提交
    • O
      introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty() · a8d4b834
      Oleg Nesterov 提交于
      rcu_dereference_check_fdtable() looks very wrong,
      
      1. rcu_my_thread_group_empty() was added by 844b9a87 "vfs: fix
         RCU-lockdep false positive due to /proc" but it doesn't really
         fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
         hit the same race with get_files_struct().
      
         And otoh rcu_my_thread_group_empty() can suppress the correct
         warning if the caller is the CLONE_FILES (without CLONE_THREAD)
         task.
      
      2. files->count == 1 check is not really right too. Even if this
         files_struct is not shared it is not safe to access it lockless
         unless the caller is the owner.
      
         Otoh, this check is sub-optimal. files->count == 0 always means
         it is safe to use it lockless even if files != current->files,
         but put_files_struct() has to take rcu_read_lock(). See the next
         patch.
      
      This patch removes the buggy checks and turns fcheck_files() into
      __fcheck_files() which uses rcu_dereference_raw(), the "unshared"
      callers, fget_light() and fget_raw_light(), can use it to avoid
      the warning from RCU-lockdep.
      
      fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
      plus __fcheck_files().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a8d4b834
  12. 13 12月, 2013 4 次提交
  13. 10 12月, 2013 1 次提交
  14. 25 9月, 2013 2 次提交
  15. 01 9月, 2013 1 次提交
    • P
      nohz_full: Add full-system-idle state machine · 0edd1b17
      Paul E. McKenney 提交于
      This commit adds the state machine that takes the per-CPU idle data
      as input and produces a full-system-idle indication as output.  This
      state machine is driven out of RCU's quiescent-state-forcing
      mechanism, which invokes rcu_sysidle_check_cpu() to collect per-CPU
      idle state and then rcu_sysidle_report() to drive the state machine.
      
      The full-system-idle state is sampled using rcu_sys_is_idle(), which
      also drives the state machine if RCU is idle (and does so by forcing
      RCU to become non-idle).  This function returns true if all but the
      timekeeping CPU (tick_do_timer_cpu) are idle and have been idle long
      enough to avoid memory contention on the full_sysidle_state state
      variable.  The rcu_sysidle_force_exit() may be called externally
      to reset the state machine back into non-idle state.
      
      For large systems the state machine is driven out of RCU's
      force-quiescent-state logic, which provides good scalability at the price
      of millisecond-scale latencies on the transition to full-system-idle
      state.  This is not so good for battery-powered systems, which are usually
      small enough that they don't need to care about scalability, but which
      do care deeply about energy efficiency.  Small systems therefore drive
      the state machine directly out of the idle-entry code.  The number of
      CPUs in a "small" system is defined by a new NO_HZ_FULL_SYSIDLE_SMALL
      Kconfig parameter, which defaults to 8.  Note that this is a build-time
      definition.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ paulmck: Use true and false for boolean constants per Lai Jiangshan. ]
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Simplify logic and provide better comments for memory barriers,
        based on review comments and questions by Lai Jiangshan. ]
      0edd1b17
  16. 19 8月, 2013 1 次提交
  17. 30 7月, 2013 1 次提交
    • S
      rcu: Add const annotation to char * for RCU tracepoints and functions · e66c33d5
      Steven Rostedt (Red Hat) 提交于
      All the RCU tracepoints and functions that reference char pointers do
      so with just 'char *' even though they do not modify the contents of
      the string itself. This will cause warnings if a const char * is used
      in one of these functions.
      
      The RCU tracepoints store the pointer to the string to refer back to them
      when the trace output is displayed. As this can be minutes, hours or
      even days later, those strings had better be constant.
      
      This change also opens the door to allow the RCU tracepoint strings and
      their addresses to be exported so that userspace tracing tools can
      translate the contents of the pointers of the RCU tracepoints.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e66c33d5
  18. 11 6月, 2013 3 次提交
  19. 29 5月, 2013 1 次提交
  20. 19 4月, 2013 1 次提交
    • F
      nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Frederic Weisbecker 提交于
      We need full dynticks CPU to also be RCU nocb so
      that we don't have to keep the tick to handle RCU
      callbacks.
      
      Make sure the range passed to nohz_full= boot
      parameter is a subset of rcu_nocbs=
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d1e43fa5
  21. 26 3月, 2013 1 次提交
  22. 27 1月, 2013 1 次提交
  23. 09 1月, 2013 1 次提交
    • P
      rcu: Reduce rcutorture tracing · 52494535
      Paul E. McKenney 提交于
      Currently, rcutorture traces every read-side access.  This can be
      problematic because even a two-minute rcutorture run on a two-CPU system
      can generate 28,853,363 reads.  Normally, only a failing read is of
      interest, so this commit traces adjusts rcutorture's tracing to only
      trace failing reads.  The resulting event tracing records the time
      and the ->completed value captured at the beginning of the RCU read-side
      critical section, allowing correlation with other event-tracing messages.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Add fix to build problem located by Randy Dunlap based on
        diagnosis by Steven Rostedt. ]
      52494535
  24. 01 12月, 2012 1 次提交
    • F
      context_tracking: New context tracking susbsystem · 91d1aa43
      Frederic Weisbecker 提交于
      Create a new subsystem that probes on kernel boundaries
      to keep track of the transitions between level contexts
      with two basic initial contexts: user or kernel.
      
      This is an abstraction of some RCU code that use such tracking
      to implement its userspace extended quiescent state.
      
      We need to pull this up from RCU into this new level of indirection
      because this tracking is also going to be used to implement an "on
      demand" generic virtual cputime accounting. A necessary step to
      shutdown the tick while still accounting the cputime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      [ paulmck: fix whitespace error and email address. ]
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      91d1aa43
  25. 14 11月, 2012 1 次提交
  26. 24 10月, 2012 1 次提交