1. 26 Jan 2017, 2 commits
    • srcu: Force full grace-period ordering · d85b62f1
      Committed by Paul E. McKenney
      If a process invokes synchronize_srcu(), is delayed just the right amount
      of time, and thus does not sleep when waiting for the grace period to
      complete, there is no ordering between the end of the grace period and
      the code following the synchronize_srcu().  Similarly, there can be a
      lack of ordering between the end of the SRCU grace period and callback
      invocation.
      
      This commit adds the necessary ordering.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
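      A hedged sketch of the kind of ordering the commit adds: an explicit
      smp_mb() after the grace-period wait, so that the end of the grace period
      is ordered before the code following synchronize_srcu() even when the
      waiter never slept. The helper names (srcu_wait_for_gp(), srcu_gp_completed())
      are illustrative, not the actual upstream functions.

      	/* Illustrative sketch; srcu_gp_completed() is a hypothetical helper. */
      	static void srcu_wait_for_gp(struct srcu_struct *sp, unsigned long snap)
      	{
      		while (!srcu_gp_completed(sp, snap))
      			schedule_timeout_uninterruptible(1);
      		smp_mb(); /* Order grace-period end before subsequent accesses,
      			   * even if the loop above never slept. */
      	}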
    • srcu: Implement more-efficient reader counts · f2c46896
      Committed by Lance Roy
      SRCU uses two per-cpu counters: a nesting counter to count the number of
      active critical sections, and a sequence counter to ensure that the nesting
      counters don't change while they are being added together in
      srcu_readers_active_idx_check().
      
      This patch instead uses per-cpu lock and unlock counters. Because both
      counters only increase and srcu_readers_active_idx_check() reads the unlock
      counter before the lock counter, this achieves the same end without having
      to increment two different counters in srcu_read_lock(). This also saves a
      smp_mb() in srcu_readers_active_idx_check().
      
      Possible bug: There is no guarantee that the lock counter won't overflow
      during srcu_readers_active_idx_check(), as there are no memory barriers
      around srcu_flip() (see comment in srcu_readers_active_idx_check() for
      details). However, this problem was already present before this patch.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Lance Roy <ldr709@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
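      A minimal sketch of the lock/unlock-counter scheme described above; the
      structure and helper names are illustrative rather than the exact upstream
      code, but the ordering (sum the unlock counters, smp_mb(), then sum the
      lock counters) follows the commit message.

      	struct srcu_cpu_counts {
      		unsigned long lock_count[2];	/* bumped by srcu_read_lock()   */
      		unsigned long unlock_count[2];	/* bumped by srcu_read_unlock() */
      	};

      	static bool srcu_readers_drained(struct srcu_cpu_counts __percpu *counts, int idx)
      	{
      		unsigned long unlocks = 0, locks = 0;
      		int cpu;

      		for_each_possible_cpu(cpu)
      			unlocks += per_cpu_ptr(counts, cpu)->unlock_count[idx];
      		smp_mb(); /* Read unlocks before locks so no reader is missed. */
      		for_each_possible_cpu(cpu)
      			locks += per_cpu_ptr(counts, cpu)->lock_count[idx];

      		return locks == unlocks;
      	}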
  2. 05 Dec 2015, 1 commit
    • rcu: Add rcu_normal kernel parameter to suppress expediting · 5a9be7c6
      Committed by Paul E. McKenney
      Although expedited grace periods can be quite useful, and although their
      OS jitter has been greatly reduced, they can still pose problems for
      extreme real-time workloads.  This commit therefore adds a rcu_normal
      kernel boot parameter (which can also be manipulated via sysfs)
      to suppress expedited grace periods, that is, to treat requests for
      expedited grace periods as if they were requests for normal grace periods.
      If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
      This means that if you are relying on expedited grace periods to speed up
      boot, you will want to specify rcu_expedited on the kernel command line,
      and then specify rcu_normal via sysfs once boot completes.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
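      A hedged sketch of how the rcu_normal override could gate expedited
      requests; the variable and helper names are illustrative, but the
      precedence (rcu_normal wins over rcu_expedited) follows the commit
      message.

      	static int rcu_normal;		/* set via boot parameter or sysfs */
      	static int rcu_expedited;

      	static bool use_expedited_gp(void)
      	{
      		if (READ_ONCE(rcu_normal))	/* rcu_normal wins */
      			return false;
      		return READ_ONCE(rcu_expedited) != 0;
      	}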
  3. 07 Oct 2015, 2 commits
  4. 23 Jul 2015, 1 commit
  5. 16 Jul 2015, 1 commit
    • rcu: Change return type to bool · f765d113
      Committed by Nicholas Mc Guire
      Type-checking coccinelle spatches are being used to locate type mismatches
      between function signatures and return values; in this case they produced:
      ./kernel/rcu/srcu.c:271 WARNING: return of wrong type
              int != unsigned long,
      
      srcu_readers_active() returns an int that is the sum of per_cpu unsigned
      longs, but its only user is cleanup_srcu_struct(), which uses it as a
      boolean (condition) to see whether there are any readers rather than
      actually using the approximate number of readers. The theoretically
      possible unsigned long overflow case does not need to be handled
      explicitly - if we had 4G++ readers then something else went wrong a long
      time ago.
      
      proposal: change the return type to boolean. The function name is left
                unchanged as it fits the naming expectation for a boolean.
      
      patch was compile tested for x86_64_defconfig (implies CONFIG_SRCU=y)
      
      patch is against 4.1-rc5 (localversion-next is -next-20150525)
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
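      A simplified sketch of the resulting function; the per-CPU field names
      follow the SRCU implementation of that era but are shown here only for
      illustration.

      	static bool srcu_readers_active(struct srcu_struct *sp)
      	{
      		unsigned long sum = 0;
      		int cpu;

      		for_each_possible_cpu(cpu) {
      			struct srcu_struct_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);

      			sum += ACCESS_ONCE(cpuc->c[0]);
      			sum += ACCESS_ONCE(cpuc->c[1]);
      		}
      		return sum != 0;	/* any readers at all, not an exact count */
      	}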
  6. 28 May 2015, 1 commit
  7. 27 Feb 2015, 1 commit
  8. 26 Feb 2015, 1 commit
  9. 07 Jan 2015, 1 commit
  10. 10 Jul 2014, 1 commit
    • rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b
      Committed by Paul E. McKenney
      RCU contains code of the following forms:
      
      	ACCESS_ONCE(x)++;
      	ACCESS_ONCE(x) += y;
      	ACCESS_ONCE(x) -= y;
      
      Now these constructs do operate correctly, but they really result in a
      pair of volatile accesses, one to do the load and another to do the store.
      This can be confusing, as the casual reader might well assume that (for
      example) gcc might generate a memory-to-memory add instruction for each
      of these three cases.  In fact, gcc will do no such thing.  Also, there
      is a good chance that the kernel will move to separate load and store
      variants of ACCESS_ONCE(), and constructs like the above could easily
      confuse both people and scripts attempting to make that sort of change.
      Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
      only need the store to be volatile, so that the read-modify-write form
      might be misleading.
      
      This commit therefore changes the above forms in RCU so that each instance
      of ACCESS_ONCE() either does a load or a store, but not both.  In a few
      cases, ACCESS_ONCE() was not critical, for example, for maintaining
      statistics.  In these cases, ACCESS_ONCE() has been dispensed with
      entirely.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
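      An illustrative before/after of the transformation (the variable is made
      up; the point is that the new form does a plain load and a single
      volatile store instead of two volatile accesses).

      	static unsigned long x;			/* illustrative counter */

      	static void old_style(void)
      	{
      		ACCESS_ONCE(x)++;		/* volatile load and volatile store */
      	}

      	static void new_style(void)
      	{
      		ACCESS_ONCE(x) = x + 1;		/* plain load, one volatile store */
      	}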
  11. 26 Feb 2014, 1 commit
    • rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone · 5cb5c6e1
      Committed by Paul Gortmaker
      The kbuild test bot uncovered an implicit dependence on the
      trace header being present before rcu.h in ia64 allmodconfig
      that looks like this:
      
      In file included from kernel/ksysfs.c:22:0:
      kernel/rcu/rcu.h: In function '__rcu_reclaim':
      kernel/rcu/rcu.h:107:3: error: implicit declaration of function 'trace_rcu_invoke_kfree_callback' [-Werror=implicit-function-declaration]
      kernel/rcu/rcu.h:112:3: error: implicit declaration of function 'trace_rcu_invoke_callback' [-Werror=implicit-function-declaration]
      cc1: some warnings being treated as errors
      
      Looking at other rcu.h users, we can find that they all
      were sourcing the trace header in advance of rcu.h itself,
      as seen in the context of this diff.  There were also some
      inconsistencies as to whether it was or wasn't sourced based
      on the parent tracing Kconfig.
      
      Rather than "fix" it at each use site, and have inconsistent
      use based on whether "#ifdef CONFIG_RCU_TRACE" was used or not,
      let's just source the trace header once, in the actual consumer
      of it, which is rcu.h itself.  We include it unconditionally, as
      build testing shows us that is a hard requirement for some files.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
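      A sketch of the shape of the fix: kernel/rcu/rcu.h includes the trace
      header itself, unconditionally, so every includer sees the tracepoint
      declarations (the guard macro name shown here is illustrative).

      	#ifndef __LINUX_RCU_H
      	#define __LINUX_RCU_H

      	#include <trace/events/rcu.h>	/* included unconditionally */

      	/* ... remainder of kernel/rcu/rcu.h ... */

      	#endif /* __LINUX_RCU_H */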
  12. 18 Feb 2014, 2 commits
  13. 10 Dec 2013, 1 commit
  14. 04 Dec 2013, 1 commit
  15. 16 Oct 2013, 1 commit
  16. 08 Feb 2013, 6 commits
  17. 24 Oct 2012, 3 commits
  18. 21 Aug 2012, 1 commit
    • workqueue: deprecate system_nrt[_freezable]_wq · 3b07e9ca
      Committed by Tejun Heo
      system_nrt[_freezable]_wq are now spurious.  Mark them deprecated and
      convert all users to system[_freezable]_wq.
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
      Please use system[_freezable]_wq instead.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
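      An illustrative conversion (the work item is made up): callers of the
      deprecated alias simply switch to the plain system-wide workqueue.

      	#include <linux/workqueue.h>

      	static struct work_struct my_work;	/* illustrative work item */

      	static void queue_old_way(void)
      	{
      		queue_work(system_nrt_wq, &my_work);	/* deprecated alias */
      	}

      	static void queue_new_way(void)
      	{
      		queue_work(system_wq, &my_work);	/* equivalent, now that all
      							 * workqueues are non-reentrant */
      	}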
  19. 01 May 2012, 9 commits
    • rcu: Implement per-domain single-threaded call_srcu() state machine · 931ea9d1
      Committed by Lai Jiangshan
      This commit implements an SRCU state machine in support of call_srcu().
      The state machine is preemptible, light-weight, and single-threaded,
      minimizing synchronization overhead.  In particular, there is no longer
      any need for synchronize_srcu() to be guarded by a mutex.
      
      Expedited processing is handled, at least in the absence of concurrent
      grace-period operations on that same srcu_struct structure, by having
      the synchronize_srcu_expedited() thread take on the role of the
      workqueue thread for one iteration.
      
      There is a reasonable probability that a given SRCU callback will
      be invoked on the same CPU that registered it; however, there is no
      guarantee.  Concurrent SRCU grace-period primitives can cause callbacks
      to be executed elsewhere, even in the absence of CPU-hotplug operations.
      
      Callbacks execute in process context, but under the influence of
      local_bh_disable(), so it is illegal to sleep in an SRCU callback
      function.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
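      A minimal usage sketch for call_srcu(); the srcu_struct and payload type
      are made up, and the callback avoids sleeping because it runs under
      local_bh_disable().

      	#include <linux/srcu.h>
      	#include <linux/slab.h>

      	struct foo {
      		struct rcu_head rh;
      		/* payload ... */
      	};

      	static struct srcu_struct foo_srcu;	/* init_srcu_struct() at setup */

      	static void foo_reclaim(struct rcu_head *rhp)
      	{
      		kfree(container_of(rhp, struct foo, rh));	/* must not sleep */
      	}

      	static void foo_retire(struct foo *p)
      	{
      		call_srcu(&foo_srcu, &p->rh, foo_reclaim);
      	}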
    • rcu: Use single value to handle expedited SRCU grace periods · d9792edd
      Committed by Lai Jiangshan
      The earlier algorithm used an "expedited" flag combined with a "trycount"
      counter to differentiate between normal and expedited SRCU grace periods.
      However, the difference can be encoded into a single counter with a cutoff
      value and different initial values for expedited and normal SRCU grace
      periods.  This commit makes that change.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      
      Conflicts:
      
      	kernel/srcu.c
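      An abstract sketch of the encoding idea: instead of carrying a separate
      "expedited" flag next to a retry counter, a single counter starts from
      different initial values, and a cutoff comparison recovers whether the
      grace period was requested as expedited. All names and values here are
      illustrative, not the upstream constants.

      	enum {
      		NORMAL_TRYCOUNT    = 2,		/* illustrative */
      		EXPEDITED_TRYCOUNT = 12,	/* illustrative */
      		EXPEDITED_CUTOFF   = 10,	/* illustrative */
      	};

      	static bool was_expedited(int trycount)
      	{
      		/* A trycount above the cutoff can only have come from the
      		 * expedited path, so no separate flag is needed. */
      		return trycount > EXPEDITED_CUTOFF;
      	}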
    • rcu: Improve srcu_readers_active_idx()'s cache locality · dc879175
      Committed by Lai Jiangshan
      Inline the calls to srcu_readers_active_idx() into srcu_readers_active().
      This change improves cache locality by iterating over the CPUs
      once rather than twice.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Implement a variant of Peter's SRCU algorithm · b52ce066
      Committed by Lai Jiangshan
      This commit implements a variant of Peter's algorithm, which may be found
      at https://lkml.org/lkml/2012/2/1/119.
      
      o	Make the checking lock-free to enable parallel checking.
      	Parallel checking is required when (1) the original checking
      	task is preempted for a long time, (2) synchronize_srcu_expedited()
      	starts during an ongoing SRCU grace period, or (3) we wish to
      	avoid acquiring a lock.
      
      o	Since the checking is lock-free, we avoid a mutex in the state
      	machine for call_srcu().
      
      o	Remove the SRCU_REF_MASK and remove the coupling with the flipping.
      	This might allow us to remove the preempt_disable() in future
      	versions, though such removal will need great care because it
      	rescinds the one-old-reader-per-CPU guarantee.
      
      o	Remove a smp_mb(), simplify the comments and make the smp_mb() pairs
      	more intuitive.
      Inspired-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve SRCU's wait_idx() comments · 18108ebf
      Committed by Lai Jiangshan
      The safety of SRCU is provided by wait_idx() rather than by flipping.
      The flipping actually prevents starvation.
      
      This commit therefore updates the comments to more accurately and
      precisely describe what is going on.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Flip ->completed only once per SRCU grace period · 944ce9af
      Committed by Lai Jiangshan
      This is an optimization of the SRCU grace period.  To guard against
      preempted readers with old values of the counter, it suffices to scan the
      old counters once more, then flip ->completed only one time.  The reason
      this works is that the old readers must have incremented the old set of
      counters (if they have not yet incremented, then their critical section
      starts after this grace period, so they may be safely ignored).
      
      This commit therefore optimizes the second flip out in favor of a simple
      rescan.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Increment upper bit only for srcu_read_lock() · 440253c1
      Committed by Lai Jiangshan
      The purpose of the upper bit of SRCU's per-CPU counters is to guarantee
      that no reasonable series of srcu_read_lock() and srcu_read_unlock()
      operations can return the value of the counter to its original value.
      This guarantee is required only after the index has been switched to
      the other set of counters, so at most one srcu_read_lock() can affect
      a given CPU's counter.  The number of srcu_read_unlock() operations
      on a given counter is limited to the number of tasks in the system,
      which given the Linux kernel's current structure is limited to far less
      than 2^30 on 32-bit systems and far less than 2^62 on 64-bit systems.
      (Something about a limited number of bytes in the kernel's address space.)
      
      Therefore, if srcu_read_lock() increments the upper bits, then
      srcu_read_unlock() need not do so.  In this case, an srcu_read_lock() and
      an srcu_read_unlock() will flip the lower bit of the upper field of the
      counter.  An unreasonably large additional number of srcu_read_unlock()
      operations would be required to return the counter to its initial value,
      thus preserving the guarantee.
      
      This commit takes this approach, which further allows it to shrink
      the size of the upper field to one bit, making the number of
      srcu_read_unlock() operations required to return the counter to its
      initial value even more unreasonable than before.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
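      A hedged sketch of the counter update described above (the macro and
      function names are illustrative): srcu_read_lock() bumps both the
      nesting count and the single upper bit in one addition, while
      srcu_read_unlock() only drops the nesting count.

      	#define SRCU_USAGE_BIT	(1UL << (BITS_PER_LONG - 1))	/* illustrative */

      	static void srcu_lock_count(unsigned long *ctr)
      	{
      		/* Toggle the upper bit and bump the nesting count together. */
      		ACCESS_ONCE(*ctr) = *ctr + SRCU_USAGE_BIT + 1;
      	}

      	static void srcu_unlock_count(unsigned long *ctr)
      	{
      		/* Only drop the nesting count; leave the upper bit alone. */
      		ACCESS_ONCE(*ctr) = *ctr - 1;
      	}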
    • rcu: Remove fast check path from __synchronize_srcu() · 4b7a3e9e
      Committed by Lai Jiangshan
      The fastpath in __synchronize_srcu() is designed to handle cases where
      there are a large number of concurrent calls for the same srcu_struct
      structure.  However, the Linux kernel currently does not use SRCU in
      this manner, so remove the fastpath checks for simplicity.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Direct algorithmic SRCU implementation · cef50120
      Committed by Paul E. McKenney
      The current implementation of synchronize_srcu_expedited() can cause
      severe OS jitter due to its use of synchronize_sched(), which in turn
      invokes try_stop_cpus(), which causes each CPU to be sent an IPI.
      This can result in severe performance degradation for real-time workloads
      and especially for short-iteration-length HPC workloads.  Furthermore,
      because only one instance of try_stop_cpus() can be making forward progress
      at a given time, only one instance of synchronize_srcu_expedited() can
      make forward progress at a time, even if they are all operating on
      distinct srcu_struct structures.
      
      This commit, inspired by an earlier implementation by Peter Zijlstra
      (https://lkml.org/lkml/2012/1/31/211) and by further offline discussions,
      takes a strictly algorithmic bits-in-memory approach.  This has the
      disadvantage of requiring one explicit memory-barrier instruction in
      each of srcu_read_lock() and srcu_read_unlock(), but on the other hand
      completely dispenses with OS jitter and furthermore allows SRCU to be
      used freely by CPUs that RCU believes to be idle or offline.
      
      The update-side implementation handles the single read-side memory
      barrier by rechecking the per-CPU counters after summing them and
      by running through the update-side state machine twice.
      
      This implementation has passed moderate rcutorture testing on both
      x86 and Power.  Also updated to use this_cpu_ptr() instead of per_cpu_ptr(),
      as suggested by Peter Zijlstra.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
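      A hedged sketch of the read-side fast path this commit describes, with
      the one explicit memory barrier per primitive; the field and helper names
      are illustrative and details such as the sequence counter are omitted.

      	static int srcu_read_lock_sketch(struct srcu_struct *sp)
      	{
      		int idx;

      		preempt_disable();
      		idx = ACCESS_ONCE(sp->completed) & 0x1;
      		this_cpu_inc(sp->per_cpu_ref->c[idx]);
      		smp_mb();	/* Keep the critical section after the increment. */
      		preempt_enable();
      		return idx;
      	}

      	static void srcu_read_unlock_sketch(struct srcu_struct *sp, int idx)
      	{
      		preempt_disable();
      		smp_mb();	/* Keep the critical section before the decrement. */
      		this_cpu_dec(sp->per_cpu_ref->c[idx]);
      		preempt_enable();
      	}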
  20. 22 Feb 2012, 2 commits
  21. 31 Oct 2011, 1 commit