1. 21 April 2017 (2 commits)
    • P
      srcu: Expedite srcu_schedule_cbs_snp() callback invocation · 0497b489
      Authored by Paul E. McKenney
      Although Tree SRCU does reduce delays when there is at least one
      synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp()
      still waits for SRCU_INTERVAL before invoking callbacks.  Since
      synchronize_srcu_expedited() now posts a callback and waits for
      that callback to do a wakeup, this destroys the expedited nature of
      synchronize_srcu_expedited().  This destruction became apparent to
      Marc Zyngier in the guise of a guest-OS bootup slowdown from five
      seconds to no fewer than forty seconds.
      
      This commit therefore invokes callbacks immediately at the end of the
      grace period when there is at least one synchronize_srcu_expedited()
      invocation pending.  This brought Marc's guest-OS bootup times back
      into the realm of reason.
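      As a rough sketch (not the actual kernel code), the end-of-grace-period path might pick
      its delay along these lines; the ->srcu_exp_cnt counter and the snp->grplo/grphi and
      sp->sda fields are illustrative assumptions:
      
              /*
               * Hedged sketch only: invoke callbacks immediately when an expedited
               * request is pending, otherwise keep the usual SRCU_INTERVAL delay.
               */
              static void srcu_schedule_cbs_snp(struct srcu_struct *sp, struct srcu_node *snp)
              {
                      unsigned long delay = READ_ONCE(sp->srcu_exp_cnt) ? 0 : SRCU_INTERVAL;
                      int cpu;
      
                      for (cpu = snp->grplo; cpu <= snp->grphi; cpu++)
                              queue_delayed_work(system_power_efficient_wq,
                                                 &per_cpu_ptr(sp->sda, cpu)->work, delay);
              }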
      Reported-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      0497b489
    • P
      srcu: Parallelize callback handling · da915ad5
      Authored by Paul E. McKenney
      Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
      however, there are workloads that could result in a high volume of
      concurrent invocations of call_srcu(), which with current SRCU would
      result in excessive lock contention on the srcu_struct structure's
      ->queue_lock, which protects SRCU's callback lists.  This commit therefore
      moves SRCU to per-CPU callback lists, thus greatly reducing contention.
      
      Because a given SRCU instance no longer has a single centralized callback
      list, starting grace periods and invoking callbacks are both more complex
      than in the single-list Classic SRCU implementation.  Starting grace
      periods and handling callbacks are now handled using an srcu_node tree
      that is in some ways similar to the rcu_node trees used by RCU-bh,
      RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
      controlled by exactly the same Kconfig options and boot parameters that
      control the shape of the rcu_node tree).
      
      In addition, the old per-CPU srcu_array structure is now named srcu_data
      and contains an rcu_segcblist structure named ->srcu_cblist for its
      callbacks (and a spinlock to protect this).  The srcu_struct gets
      an srcu_gp_seq that is used to associate callback segments with the
      corresponding completion-time grace-period number.  These completion-time
      grace-period numbers are propagated up the srcu_node tree so that the
      grace-period workqueue handler can determine both whether additional
      grace periods are needed and where to look for callbacks that are
      ready to be invoked.
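      Sketched with only the fields named above (the lock, tree-linkage, and per-CPU-pointer
      field names are assumptions for illustration):
      
              /* Hedged sketch of the layout described above, not verbatim kernel code. */
              struct srcu_data {
                      spinlock_t lock;                        /* Protects ->srcu_cblist. */
                      struct rcu_segcblist srcu_cblist;       /* Per-CPU segmented callback list. */
                      struct srcu_node *mynode;               /* Leaf of the srcu_node tree (assumed name). */
              };
      
              struct srcu_struct {
                      struct srcu_node *node;                 /* Combining tree, shaped like rcu_node. */
                      struct srcu_data __percpu *sda;         /* Per-CPU callback state (assumed name). */
                      unsigned long srcu_gp_seq;              /* Grace-period sequence number. */
                      /* ... */
              };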
      
      The srcu_barrier() function must now wait on all instances of the per-CPU
      ->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
      srcu_barrier() can remotely add the needed callbacks.  In theory,
      it could also remotely start grace periods, but in practice doing so
      is complex and racy.  And interestingly enough, it is never necessary
      for srcu_barrier() to start a grace period because srcu_barrier() only
      enqueues a callback when a callback is already present--and it turns out
      that a grace period has to have already been started for this pre-existing
      callback.  Furthermore, it is only the callback that srcu_barrier()
      needs to wait on, not any particular grace period.  Therefore, a new
      rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
      callback into the same segment occupied by the last pre-existing callback
      in the list.  The special case where all the pre-existing callbacks are
      on a different list (because they are in the process of being invoked)
      is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
      segment, relying on the done-callbacks check that takes place after all
      callbacks are invoked.
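      A sketch of the per-CPU step of srcu_barrier() under these rules; the barrier-head,
      callback-function, and counter names are assumed, and the rcu_segcblist_entrain() call
      is shown in simplified form:
      
              /*
               * Hedged sketch: entrain the barrier callback behind the last
               * pre-existing callback on this CPU; if the list turns out to be
               * empty, there is nothing to wait for on this CPU.
               */
              static void srcu_barrier_one_cpu(struct srcu_struct *sp, struct srcu_data *sdp)
              {
                      spin_lock_irq(&sdp->lock);
                      atomic_inc(&sp->srcu_barrier_cpu_cnt);
                      sdp->srcu_barrier_head.func = srcu_barrier_cb;
                      if (!rcu_segcblist_entrain(&sdp->srcu_cblist, &sdp->srcu_barrier_head))
                              atomic_dec(&sp->srcu_barrier_cpu_cnt);  /* Empty list: nothing to wait for. */
                      spin_unlock_irq(&sdp->lock);
              }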
      
      The readers use the same algorithm as before.  Note that there is a
      separate srcu_idx that tells the readers which counter to increment;
      it unfortunately cannot be combined with srcu_gp_seq because the two
      need to be incremented at different times.
      
      This commit introduces some ugly #ifdefs in rcutorture.  These will go
      away when I feel good enough about Tree SRCU to ditch Classic SRCU.
      
      Some crude performance comparisons, courtesy of a quickly hacked rcuperf
      asynchronous-grace-period capability:
      
      			Callback Queuing Overhead
      			-------------------------
      	# CPUS		Classic SRCU	Tree SRCU
      	------          ------------    ---------
      	     2              0.349 us     0.342 us
      	    16             31.66  us     0.4   us
      	    41             ---------     0.417 us
      
      The times are the 90th percentiles, a statistic that was chosen to reject
      the overheads of the occasional srcu_barrier() call needed to avoid OOMing
      the test machine.  The rcuperf test hangs when running Classic SRCU at 41
      CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
      and that statistic, this is a convincing demonstration of Tree SRCU's
      performance and scalability advantages.
      
      [1] https://lwn.net/Articles/309030/
      [2] https://patchwork.kernel.org/patch/5108281/
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
      da915ad5
  2. 19 April 2017 (12 commits)
    • P
      srcu: Introduce CLASSIC_SRCU Kconfig option · dad81a20
      Authored by Paul E. McKenney
      The TREE_SRCU rewrite is large and a bit on the non-simple side, so
      this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
      to be selected using a new CLASSIC_SRCU Kconfig option that depends
      on RCU_EXPERT.  The default is to use the new TREE_SRCU and TINY_SRCU
      algorithms, in order to help get these the testing that they need.
      However, if your users do not require the update-side scalability that
      is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
      to revert to the old Classic SRCU algorithm.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dad81a20
    • P
      srcu: Crude control of expedited grace periods · f60d231a
      Authored by Paul E. McKenney
      SRCU's implementation of expedited grace periods has always assumed
      that the SRCU instance is idle when the expedited request arrives.
      This commit improves this a bit by maintaining a count of the number
      of outstanding expedited requests, thus allowing prior non-expedited
      grace periods to accommodate these requests by shifting to expedited mode.
      However, any non-expedited wait already in progress will still wait for
      the full duration.
      
      Improved control of expedited grace periods is planned, but one step
      at a time.
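      Very roughly, and with assumed names for the counter and the common grace-period
      wait helper:
      
              /* Hedged sketch: track outstanding expedited requests so that grace
               * periods already in progress can drop their processing delays. */
              void synchronize_srcu_expedited(struct srcu_struct *sp)
              {
                      atomic_inc(&sp->srcu_exp_cnt);  /* Assumed: count of pending expedited requests. */
                      __synchronize_srcu(sp);         /* Assumed common grace-period wait helper. */
                      atomic_dec(&sp->srcu_exp_cnt);
              }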
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f60d231a
    • P
      srcu: Merge ->srcu_state into ->srcu_gp_seq · 80a7956f
      Authored by Paul E. McKenney
      Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
      race conditions given multiple callback queues, so this commit takes
      advantage of the two-bit state now available in rcu_seq counters to
      store the state in the bottom two bits of ->srcu_gp_seq.
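      For reference, an rcu_seq value packs a grace-period counter above two state bits,
      roughly as follows (a sketch, not the exact kernel header):
      
              /* Hedged sketch of the rcu_seq encoding: two low-order state bits,
               * grace-period counter in the remaining bits. */
              #define RCU_SEQ_CTR_SHIFT       2
              #define RCU_SEQ_STATE_MASK      ((1 << RCU_SEQ_CTR_SHIFT) - 1)
      
              static inline unsigned long rcu_seq_ctr(unsigned long s)
              {
                      return s >> RCU_SEQ_CTR_SHIFT;          /* Grace-period number. */
              }
      
              static inline int rcu_seq_state(unsigned long s)
              {
                      return s & RCU_SEQ_STATE_MASK;          /* Current grace-period phase. */
              }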
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      80a7956f
    • P
      91e27c35
    • P
      srcu: Move rcu_init_levelspread() to rcu_tree_node.h · 2b34c43c
      Authored by Paul E. McKenney
      This commit moves the rcu_init_levelspread() function from
      kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.  This is
      another step towards enabling SRCU to create its own combining tree.
      This commit is code-movement only, give or take knock-on adjustments.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      2b34c43c
    • P
      srcu: Use rcu_segcblist to track SRCU callbacks · 8660b7d8
      Authored by Paul E. McKenney
      This commit switches SRCU from custom-built callback queues to the new
      rcu_segcblist structure.  This change associates grace-period sequence
      numbers with groups of callbacks, which will be needed for efficient
      processing of per-CPU callbacks.
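      For context, a segmented callback list keeps a single linked list but marks segment
      boundaries by grace-period number; sketched roughly below (the real structure carries
      additional bookkeeping fields):
      
              /* Hedged sketch of an rcu_segcblist: one list, four segments. */
              #define RCU_DONE_TAIL           0       /* Callbacks whose grace period has ended. */
              #define RCU_WAIT_TAIL           1       /* Waiting for the current grace period. */
              #define RCU_NEXT_READY_TAIL     2       /* Waiting for the next grace period. */
              #define RCU_NEXT_TAIL           3       /* Not yet associated with a grace period. */
              #define RCU_CBLIST_NSEGS        4
      
              struct rcu_segcblist {
                      struct rcu_head *head;
                      struct rcu_head **tails[RCU_CBLIST_NSEGS];      /* Segment boundaries. */
                      unsigned long gp_seq[RCU_CBLIST_NSEGS];         /* GP number per waiting segment. */
              };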
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8660b7d8
    • P
      srcu: Add grace-period sequence numbers · ac367c1c
      Authored by Paul E. McKenney
      This commit adds grace-period sequence numbers, which will be used to
      handle mid-boot grace periods and per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      ac367c1c
    • P
      srcu: Move to state-based grace-period sequencing · c2a8ec07
      Authored by Paul E. McKenney
      The current SRCU grace-period processing might never reach the last
      portion of srcu_advance_batches().  This is OK given the current
      implementation, as the first portion, up to the try_check_zero()
      following the srcu_flip() is sufficient to drive grace periods forward.
      However, it has the unfortunate side-effect of making it impossible to
      determine when a given grace period has ended, and grace-period endings
      will need to be tracked efficiently in order to handle per-CPU SRCU
      callback lists.
      
      This commit therefore adds states to the SRCU grace-period processing,
      so that the end of a given SRCU grace period is marked by the transition
      to the SRCU_STATE_DONE state.
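      A hedged sketch of the resulting states; aside from SRCU_STATE_DONE, which is named
      above, the identifiers are illustrative:
      
              /* Hedged sketch: phases of one SRCU grace period. */
              enum {
                      SRCU_STATE_IDLE,        /* No grace period in progress. */
                      SRCU_STATE_SCAN1,       /* Waiting out readers on the first index. */
                      SRCU_STATE_SCAN2,       /* Waiting out readers after srcu_flip(). */
                      SRCU_STATE_DONE,        /* Reaching here marks the end of the grace period. */
              };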
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c2a8ec07
    • P
      srcu: Push srcu_advance_batches() fastpath into common case · c6e56f59
      Authored by Paul E. McKenney
      This commit simplifies the SRCU state machine by pushing the
      srcu_advance_batches() idle-SRCU fastpath into the common case.  This is
      done by giving srcu_reschedule() a delay parameter, which is zero in
      the call from srcu_advance_batches().
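      In other words, sketched (not the exact call sites):
      
              /* Hedged sketch of the two call sites after this change. */
              srcu_reschedule(sp, SRCU_INTERVAL);     /* Normal path: keep the usual processing delay. */
              srcu_reschedule(sp, 0);                 /* From srcu_advance_batches(): the old idle fastpath. */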
      
      This commit is a step towards numbering callbacks in order to
      efficiently handle per-CPU callback lists.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c6e56f59
    • P
      srcu: Allow early boot use of synchronize_srcu() · b5eaeaa5
      Authored by Paul E. McKenney
      This commit checks for pre-scheduler state; if we are that early in the
      boot process, synchronize_srcu() and friends are no-ops.
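      Conceptually the check looks like this (a sketch; the exact guard may differ):
      
              void synchronize_srcu(struct srcu_struct *sp)
              {
                      /*
                       * Hedged sketch: before the scheduler starts, no other task can
                       * be in an SRCU read-side critical section, so there is nothing
                       * to wait for and the grace period is a no-op.
                       */
                      if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
                              return;
                      /* ... normal grace-period wait ... */
              }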
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b5eaeaa5
    • P
      srcu: Check for tardy grace-period activity in cleanup_srcu_struct() · 15c68f7f
      Authored by Paul E. McKenney
      Users of SRCU are obliged to complete all grace-period activity before
      invoking cleanup_srcu_struct().  This means that all calls to either
      synchronize_srcu() or synchronize_srcu_expedited() must have returned,
      and all calls to call_srcu() must have returned, and the last call to
      call_srcu() must have been followed by a call to srcu_barrier().
      Furthermore, the caller must have done something to prevent any
      further calls to synchronize_srcu(), synchronize_srcu_expedited(),
      and call_srcu().
      
      Therefore, if there has ever been an invocation of call_srcu() on
      the srcu_struct in question, the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to call_srcu().
      2.  Wait for any pre-existing call_srcu() invocations to return.
      3.  Invoke srcu_barrier().
      4.  It is now safe to invoke cleanup_srcu_struct().
      
      On the other hand, if there has ever been a call to synchronize_srcu()
      or synchronize_srcu_expedited(), the sequence of events must be as
      follows:
      
      1.  Prevent any further calls to synchronize_srcu() or
          synchronize_srcu_expedited().
      2.  Wait for any pre-existing synchronize_srcu() or
          synchronize_srcu_expedited() invocations to return.
      3.  It is now safe to invoke cleanup_srcu_struct().
      
      If there have been calls to both types of functions (call_srcu()
      and either of synchronize_srcu() and synchronize_srcu_expedited()), then
      the caller must do the first three steps of the call_srcu() procedure
      above and the first two steps of the synchronize_s*() procedure above,
      and only then invoke cleanup_srcu_struct().
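      Put together as a usage sketch, with a hypothetical srcu_struct my_srcu and hypothetical
      helpers standing in for whatever the caller uses to quiesce its updaters:
      
              /* Hedged usage sketch of the teardown ordering described above. */
              stop_new_srcu_updates();                /* Hypothetical: prevent further call_srcu()/synchronize_srcu(). */
              wait_for_outstanding_updates();         /* Hypothetical: wait for in-flight calls to return. */
              srcu_barrier(&my_srcu);                 /* Needed only if call_srcu() was ever used. */
              cleanup_srcu_struct(&my_srcu);          /* Now safe. */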
      
      Note that cleanup_srcu_struct() does some probabilistic checks
      for the caller failing to follow these procedures, in which case
      cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU
      structures associated with the specified srcu_struct structure.
      Reported-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      15c68f7f
    • P
      srcu: Consolidate batch checking into rcu_all_batches_empty() · cc985822
      Authored by Paul E. McKenney
      The srcu_reschedule() function invokes rcu_batch_empty() on each of
      the four rcu_batch structures in the srcu_struct in question twice.
      Given that this check will also be needed in cleanup_srcu_struct(), this
      commit consolidates these four checks into a new rcu_all_batches_empty()
      function.
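      A sketch of the consolidated helper, assuming Classic SRCU's four batch fields
      (->batch_queue, ->batch_check0, ->batch_check1, and ->batch_done):
      
              /* Hedged sketch: true if no SRCU callbacks are queued in any batch. */
              static bool rcu_all_batches_empty(struct srcu_struct *sp)
              {
                      return rcu_batch_empty(&sp->batch_done) &&
                             rcu_batch_empty(&sp->batch_check1) &&
                             rcu_batch_empty(&sp->batch_check0) &&
                             rcu_batch_empty(&sp->batch_queue);
              }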
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      cc985822
  3. 02 March 2017 (1 commit)
    • I
      rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> · f9411ebe
      Authored by Ingo Molnar
      So rcupdate.h is a pretty complex header, in particular it includes
      <linux/completion.h> which includes <linux/wait.h> - creating a
      dependency that includes <linux/wait.h> in <linux/sched.h>,
      which prevents the isolation of <linux/sched.h> from the derived
      <linux/wait.h> header.
      
      Solve part of the problem by decoupling rcupdate.h from completions:
      this can be done by separating out the rcu_synchronize types and APIs,
      and updating their usage sites.
      
      Since these are mostly RCU-internal types, this will not just simplify
      <linux/sched.h>'s dependencies, but will make all the hundreds of
      .c files that include rcupdate.h but not completions or wait.h build
      faster.
      
      ( For rcutiny this means that two dependent APIs have to be uninlined,
        but that shouldn't be much of a problem as they are rare variants. )
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f9411ebe
  4. 26 January 2017 (3 commits)
    • P
      srcu: Reduce probability of SRCU ->unlock_count[] counter overflow · 7f554a3d
      Authored by Paul E. McKenney
      Because there are no memory barriers between the srcu_flip() ->completed
      increment and the summation of the read-side ->unlock_count[] counters,
      both the compiler and the CPU can reorder the summation with the
      ->completed increment.  If the updater is preempted long enough during
      this process, the read-side counters could overflow, resulting in a
      too-short grace period.
      
      This commit therefore adds a memory barrier just after the ->completed
      increment, ensuring that if the summation misses an increment of
      ->unlock_count[] from __srcu_read_unlock(), the next __srcu_read_lock()
      will see the new value of ->completed, thus bounding the number of
      ->unlock_count[] increments that can be missed to NR_CPUS.  The actual
      overflow computation is more complex due to the possibility of nesting
      of __srcu_read_lock().
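      The added ordering, sketched (the surrounding srcu_flip() details are simplified):
      
              /* Hedged sketch: the smp_mb() after the increment is the new ordering. */
              static void srcu_flip(struct srcu_struct *sp)
              {
                      WRITE_ONCE(sp->completed, sp->completed + 1);
                      smp_mb();       /* If the summation missed an ->unlock_count[] increment,
                                       * the next __srcu_read_lock() must see the new ->completed. */
              }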
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7f554a3d
    • P
      srcu: Force full grace-period ordering · d85b62f1
      Authored by Paul E. McKenney
      If a process invokes synchronize_srcu(), is delayed just the right amount
      of time, and thus does not sleep when waiting for the grace period to
      complete, there is no ordering between the end of the grace period and
      the code following the synchronize_srcu().  Similarly, there can be a
      lack of ordering between the end of the SRCU grace period and callback
      invocation.
      
      This commit adds the necessary ordering.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Further smp_mb() adjustment per email with Lance Roy. ]
      d85b62f1
    • L
      srcu: Implement more-efficient reader counts · f2c46896
      Authored by Lance Roy
      SRCU uses two per-cpu counters: a nesting counter to count the number of
      active critical sections, and a sequence counter to ensure that the nesting
      counters don't change while they are being added together in
      srcu_readers_active_idx_check().
      
      This patch instead uses per-cpu lock and unlock counters. Because both
      counters only increase and srcu_readers_active_idx_check() reads the unlock
      counter before the lock counter, this achieves the same end without having
      to increment two different counters in srcu_read_lock(). This also saves a
      smp_mb() in srcu_readers_active_idx_check().
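      The key check, sketched; the summing helpers and exact field names are shown as
      assumptions:
      
              /*
               * Hedged sketch: with monotonically increasing lock and unlock
               * counters, reading the unlock sums first means a matching lock sum
               * cannot hide a still-active reader.
               */
              static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
              {
                      unsigned long unlocks = srcu_readers_unlock_idx(sp, idx);  /* Sum per-CPU ->unlock_count[idx]. */
      
                      smp_mb();       /* Unlock counts must be read before lock counts. */
                      return srcu_readers_lock_idx(sp, idx) == unlocks;
              }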
      
      Possible bug: There is no guarantee that the lock counter won't overflow
      during srcu_readers_active_idx_check(), as there are no memory barriers
      around srcu_flip() (see comment in srcu_readers_active_idx_check() for
      details). However, this problem was already present before this patch.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Lance Roy <ldr709@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f2c46896
  5. 05 December 2015 (1 commit)
    • P
      rcu: Add rcu_normal kernel parameter to suppress expediting · 5a9be7c6
      Authored by Paul E. McKenney
      Although expedited grace periods can be quite useful, and although their
      OS jitter has been greatly reduced, they can still pose problems for
      extreme real-time workloads.  This commit therefore adds a rcu_normal
      kernel boot parameter (which can also be manipulated via sysfs)
      to suppress expedited grace periods, that is, to treat requests for
      expedited grace periods as if they were requests for normal grace periods.
      If both rcu_expedited and rcu_normal are specified, rcu_normal wins.
      This means that if you are relying on expedited grace periods to speed up
      boot, you will want to specify rcu_expedited on the kernel command line,
      and then specify rcu_normal via sysfs once boot completes.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5a9be7c6
  6. 07 October 2015 (2 commits)
  7. 23 July 2015 (1 commit)
  8. 16 July 2015 (1 commit)
    • N
      rcu: Change return type to bool · f765d113
      Authored by Nicholas Mc Guire
      Type-checking coccinelle spatches are being used to locate type mismatches
      between function signatures and return values; in this case they produced:
      ./kernel/rcu/srcu.c:271 WARNING: return of wrong type
              int != unsigned long,
      
      srcu_readers_active() returns an int that is the sum of per_cpu unsigned
      long but the only user is cleanup_srcu_struct() which is using it as a
      boolean (condition) to see if there are any readers rather than actually
      using the approximate number of readers. The theoretically possible
      unsigned long overflow case does not need to be handled explicitly - if
      we had 4G++ readers then something else went wrong a long time ago.
      
      proposal: change the return type to boolean. The function name is left
                unchanged as it fits the naming expectation for a boolean.
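      The proposed change, roughly (field names follow the Classic SRCU layout of that era
      and are shown as assumptions):
      
              /* Hedged sketch: still sums the per-CPU counters, but the caller only
               * learns whether any readers remain. */
              static bool srcu_readers_active(struct srcu_struct *sp)
              {
                      int cpu;
                      unsigned long sum = 0;
      
                      for_each_possible_cpu(cpu) {
                              struct srcu_struct_array *cpuc = per_cpu_ptr(sp->per_cpu_ref, cpu);
      
                              sum += READ_ONCE(cpuc->c[0]);
                              sum += READ_ONCE(cpuc->c[1]);
                      }
                      return sum;
              }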
      
      patch was compile tested for x86_64_defconfig (implies CONFIG_SRCU=y)
      
      patch is against 4.1-rc5 (localversion-next is -next-20150525)
      Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f765d113
  9. 28 May 2015 (1 commit)
  10. 27 February 2015 (1 commit)
  11. 26 February 2015 (1 commit)
  12. 07 January 2015 (1 commit)
  13. 10 July 2014 (1 commit)
    • P
      rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b
      Authored by Paul E. McKenney
      RCU contains code of the following forms:
      
      	ACCESS_ONCE(x)++;
      	ACCESS_ONCE(x) += y;
      	ACCESS_ONCE(x) -= y;
      
      Now these constructs do operate correctly, but they really result in a
      pair of volatile accesses, one to do the load and another to do the store.
      This can be confusing, as the casual reader might well assume that (for
      example) gcc might generate a memory-to-memory add instruction for each
      of these three cases.  In fact, gcc will do no such thing.  Also, there
      is a good chance that the kernel will move to separate load and store
      variants of ACCESS_ONCE(), and constructs like the above could easily
      confuse both people and scripts attempting to make that sort of change.
      Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
      only need the store to be volatile, so that the read-modify-write form
      might be misleading.
      
      This commit therefore changes the above forms in RCU so that each instance
      of ACCESS_ONCE() either does a load or a store, but not both.  In a few
      cases, ACCESS_ONCE() was not critical, for example, for maintaining
      statistics.  In these cases, ACCESS_ONCE() has been dispensed with
      entirely.
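      For example, the first form below becomes the second, making only the store volatile:
      
              /* Hedged sketch of the kind of transformation described above. */
              ACCESS_ONCE(x)++;               /* Before: a volatile load plus a volatile store. */
              ACCESS_ONCE(x) = x + 1;         /* After: plain load, volatile store only. */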
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a792563b
  14. 26 February 2014 (1 commit)
    • P
      rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone · 5cb5c6e1
      Authored by Paul Gortmaker
      The kbuild test bot uncovered an implicit dependence on the
      trace header being present before rcu.h in ia64 allmodconfig
      that looks like this:
      
      In file included from kernel/ksysfs.c:22:0:
      kernel/rcu/rcu.h: In function '__rcu_reclaim':
      kernel/rcu/rcu.h:107:3: error: implicit declaration of function 'trace_rcu_invoke_kfree_callback' [-Werror=implicit-function-declaration]
      kernel/rcu/rcu.h:112:3: error: implicit declaration of function 'trace_rcu_invoke_callback' [-Werror=implicit-function-declaration]
      cc1: some warnings being treated as errors
      
      Looking at other rcu.h users, we can find that they all
      were sourcing the trace header in advance of rcu.h itself,
      as seen in the context of this diff.  There were also some
      inconsistencies as to whether it was or wasn't sourced based
      on the parent tracing Kconfig.
      
      Rather than "fix" it at each use site, and have inconsistent
      use based on whether "#ifdef CONFIG_RCU_TRACE" was used or not,
      let's just source the trace header once, in the actual consumer
      of it, which is rcu.h itself.  We include it unconditionally, as
      build testing shows us that is a hard requirement for some files.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5cb5c6e1
  15. 18 February 2014 (2 commits)
  16. 10 December 2013 (1 commit)
  17. 04 December 2013 (1 commit)
  18. 16 October 2013 (1 commit)
  19. 08 February 2013 (6 commits)