1. 26 July 2017, 10 commits
    • rcu: Move callback-list warning to irq-disable region · 09efeeee
      Paul E. McKenney authored
      After adopting callbacks from a newly offlined CPU, the adopting CPU
      checks to make sure that its callback list's count is zero only if the
      list has no callbacks and vice versa.  Unfortunately, it does so after
      enabling interrupts, which means that false positives are possible due to
      interrupt handlers invoking call_rcu().  Although these false positives
      are improbable, rcutorture did make it happen once.
      
      This commit therefore moves this check to an irq-disabled region of code,
      thus suppressing the false positive.
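
      As a rough illustration of the intent (the toy_ structure and helpers below
      are hypothetical stand-ins, not the kernel's rcu_segcblist API), the
      cross-check of emptiness against the count runs before interrupts are
      re-enabled, so an interrupt handler's call_rcu() cannot slip in between
      the two reads:

        #include <linux/irqflags.h>
        #include <linux/bug.h>
        #include <linux/types.h>

        /* Hypothetical simplified callback list with a length field. */
        struct toy_cblist {
                struct rcu_head *head;
                long len;
        };

        static void toy_adopt_and_check(struct toy_cblist *dst, struct toy_cblist *src)
        {
                unsigned long flags;

                local_irq_save(flags);
                /* ... splice src's callbacks onto dst and zero out src ... */
                WARN_ON_ONCE(!dst->head != (dst->len == 0));
                local_irq_restore(flags);  /* only now can call_rcu() interleave */
        }
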
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Localize rcu_state ->orphan_pend and ->orphan_done · f2dbe4a5
      Paul E. McKenney authored
      Given that the rcu_state structure's ->orphan_pend and ->orphan_done
      fields are used only during migration of callbacks from the recently
      offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
      rcu_adopt_orphan_cbs() are combined, these fields can become local
      variables in the combined function.  This commit therefore combines
      rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
      rcu_segcblist_merge() function and removes the ->orphan_pend and
      ->orphan_done fields.
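
      A hedged sketch of the shape of that change (the toy_ names are
      illustrative stand-ins, not the real rcu_segcblist interface): the staging
      lists live on the stack of the single merge function, so no rcu_state-wide
      fields are needed:

        #include <linux/types.h>

        /* Hypothetical simplified callback list. */
        struct toy_cblist {
                struct rcu_head *head;
                struct rcu_head **tail;   /* points at the last ->next pointer */
                long len;
        };

        static void toy_cblist_splice(struct toy_cblist *dst, struct toy_cblist *src)
        {
                if (!src->head)
                        return;
                *dst->tail = src->head;
                dst->tail = src->tail;
                dst->len += src->len;
                src->head = NULL;
                src->tail = &src->head;
                src->len = 0;
        }

        /* Formerly two steps staging through ->orphan_pend/->orphan_done;
         * now one function with an on-stack staging list. */
        static void toy_merge(struct toy_cblist *dst, struct toy_cblist *src)
        {
                struct toy_cblist staged = { .head = NULL, .tail = &staged.head, .len = 0 };

                toy_cblist_splice(&staged, src);  /* was rcu_send_cbs_to_orphanage() */
                toy_cblist_splice(dst, &staged);  /* was rcu_adopt_orphan_cbs()      */
        }
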
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Advance callbacks after migration · 21cc2483
      Paul E. McKenney authored
      When migrating callbacks from a newly offlined CPU, we are already
      holding the root rcu_node structure's lock, so it costs almost nothing
      to advance and accelerate the newly migrated callbacks.  This patch
      therefore makes this advancing and acceleration happen.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Eliminate rcu_state ->orphan_lock · 537b85c8
      Paul E. McKenney authored
      The ->orphan_lock is acquired and released only within the
      rcu_migrate_callbacks() function, which now acquires the root rcu_node
      structure's ->lock.  This commit therefore eliminates the ->orphan_lock
      in favor of the root rcu_node structure's ->lock.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Advance outgoing CPU's callbacks before migrating them · 9fa46fb8
      Paul E. McKenney authored
      It is possible that the outgoing CPU is unaware of recent grace periods,
      and so it is also possible that some of its pending callbacks are actually
      ready to be invoked.  The current callback-migration code would needlessly
      force these callbacks to pass through another grace period.  This commit
      therefore invokes rcu_advance_cbs() on the outgoing CPU's callbacks in
      order to give them full credit for having passed through any recent
      grace periods.
      
      This also fixes an odd theoretical bug where there are no callbacks in
      the system except for those on the outgoing CPU, none of those callbacks
      have yet been associated with a grace-period number, there is never again
      another callback registered, and the surviving CPU never again takes a
      scheduling-clock interrupt, never goes idle, and never enters nohz_full
      userspace execution.  Yes, this is (just barely) possible.  It requires
      that the surviving CPU be a nohz_full CPU, that its scheduler-clock
      interrupt be shut off, and that it loop forever in the kernel.  You get
      bonus points if you can make this one happen!  ;-)
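
      Taken together with the "Advance callbacks after migration" change listed
      above, the resulting ordering might be pictured as follows (a sketch only;
      the toy_ helpers stand in for rcu_advance_cbs() and friends, and the real
      code runs with the root rcu_node structure's ->lock held):

        /* Hypothetical per-CPU RCU data and helpers, declared for the sketch. */
        struct toy_rdp;
        void toy_advance_cbs(struct toy_rdp *rdp);     /* credit completed grace periods */
        void toy_accelerate_cbs(struct toy_rdp *rdp);  /* assign a future GP number      */
        void toy_merge_cblists(struct toy_rdp *dst, struct toy_rdp *src);

        static void toy_migrate_callbacks(struct toy_rdp *my_rdp, struct toy_rdp *outgoing_rdp)
        {
                toy_advance_cbs(outgoing_rdp);           /* give credit for elapsed GPs     */
                toy_merge_cblists(my_rdp, outgoing_rdp); /* move everything to the survivor */
                toy_advance_cbs(my_rdp);                 /* re-sort the merged callbacks    */
                toy_accelerate_cbs(my_rdp);              /* tie new arrivals to a future GP */
        }
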
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU · b1a2d79f
      Paul E. McKenney authored
      RCU's CPU-hotplug callback-migration code first moves the outgoing
      CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
      moves them to the NOCB callback list.  This commit avoids the
      extra step (and simplifies the code) by moving the callbacks directly
      from the outgoing CPU's callback list to the NOCB callback list.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Check for NOCB CPUs and empty lists earlier in CB migration · 95335c03
      Paul E. McKenney authored
      The current CPU-hotplug RCU-callback-migration code checks
      for the source (newly offlined) CPU being a NOCBs CPU down in
      rcu_send_cbs_to_orphanage().  This commit simplifies callback migration a
      bit by moving this check up to rcu_migrate_callbacks().  This commit also
      adds a check for the source CPU having no callbacks, which eases analysis
      of the rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() functions.
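
      The shape of those early checks is roughly as follows (a sketch;
      rcu_is_nocb_cpu() is the kernel's existing predicate, while the toy_
      helpers are stand-ins for the remaining migration work):

        #include <linux/types.h>

        bool rcu_is_nocb_cpu(int cpu);        /* existing kernel predicate       */
        bool toy_cpu_has_cbs(int cpu);        /* stand-in: any callbacks queued? */
        void toy_do_migration(int cpu);       /* stand-in for the heavy lifting  */

        static void toy_migrate_callbacks(int cpu)
        {
                if (rcu_is_nocb_cpu(cpu))     /* NOCB kthreads already own these callbacks */
                        return;
                if (!toy_cpu_has_cbs(cpu))    /* nothing to migrate */
                        return;
                toy_do_migration(cpu);
        }
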
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove orphan/adopt event-tracing fields · c47e067a
      Paul E. McKenney authored
      The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
      are updated, but never read.  This commit therefore removes them.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make expedited GPs correctly handle hardware CPU insertion · 313517fc
      Paul E. McKenney authored
      The updates of ->expmaskinitnext and ->ncpus are unsynchronized,
      with the value of ->ncpus being incremented long before the corresponding
      ->expmaskinitnext mask is updated.  If an RCU expedited grace period
      sees ->ncpus change, it will update the ->expmaskinit masks from the new
      ->expmaskinitnext masks.  But it is possible that ->ncpus has already
      been updated, but the ->expmaskinitnext masks still have their old values.
      For the current expedited grace period, no harm done.  The CPU could not
      have been online before the grace period started, so there is no need to
      wait for its non-existent pre-existing readers.
      
      But the next RCU expedited grace period is in a world of hurt.  The value
      of ->ncpus has already been updated, so this grace period will assume
      that the ->expmaskinitnext masks have not changed.  But they have, and
      they won't be taken into account until the next never-been-online CPU
      comes online.  This means that RCU will be ignoring some CPUs that it
      should be paying attention to.
      
      The solution is to update ->ncpus and ->expmaskinitnext while holding
      the ->lock for the rcu_node structure containing the ->expmaskinitnext
      mask.  Because smp_store_release() is now used to update ->ncpus and
      smp_load_acquire() is now used to locklessly read it, if the expedited
      grace period sees ->ncpus change, then the updating CPU has to
      already be holding the corresponding ->lock.  Therefore, when the
      expedited grace period later acquires that ->lock, it is guaranteed
      to see the new value of ->expmaskinitnext.
      
      On the other hand, if the expedited grace period loads ->ncpus just
      before an update, earlier full memory barriers guarantee that
      the incoming CPU isn't far enough along to be running any RCU readers.
      
      This commit therefore makes the required change.
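
      The ordering argument is the usual publish-with-release, read-with-acquire
      idiom, sketched here with simplified fields (an illustration only, not the
      actual expedited-grace-period code):

        #include <linux/spinlock.h>
        #include <linux/atomic.h>

        static DEFINE_RAW_SPINLOCK(toy_node_lock);
        static unsigned long toy_expmaskinitnext;  /* CPUs that have ever been online */
        static int toy_ncpus;                      /* published count of such CPUs    */

        /* CPU-hotplug path: update the mask, then publish the new count. */
        static void toy_cpu_starting(int cpu)
        {
                unsigned long flags;

                raw_spin_lock_irqsave(&toy_node_lock, flags);
                toy_expmaskinitnext |= 1UL << cpu;
                smp_store_release(&toy_ncpus, toy_ncpus + 1);  /* mask update ordered first */
                raw_spin_unlock_irqrestore(&toy_node_lock, flags);
        }

        /* Expedited-GP path: if the acquire load sees the new count, the
         * updater must have held the lock, so acquiring it below guarantees
         * that the new mask bits are visible as well. */
        static unsigned long toy_exp_snapshot(int *seen_ncpus)
        {
                unsigned long flags, mask;

                *seen_ncpus = smp_load_acquire(&toy_ncpus);
                raw_spin_lock_irqsave(&toy_node_lock, flags);
                mask = toy_expmaskinitnext;
                raw_spin_unlock_irqrestore(&toy_node_lock, flags);
                return mask;
        }
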
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Migrate callbacks earlier in the CPU-offline timeline · a58163d8
      Paul E. McKenney authored
      RCU callbacks must be migrated away from an outgoing CPU, and this is
      done near the end of the CPU-hotplug operation, after the outgoing CPU is
      long gone.  Unfortunately, this means that other CPU-hotplug callbacks
      can execute while the outgoing CPU's callbacks are still immobilized
      on the long-gone CPU's callback lists.  If any of these CPU-hotplug
      callbacks must wait, either directly or indirectly, for the invocation
      of any of the immobilized RCU callbacks, the system will hang.
      
      This commit avoids such hangs by migrating the callbacks away from the
      outgoing CPU immediately upon its departure, shortly after the return
      from __cpu_die() in takedown_cpu().  Thus, RCU is able to advance these
      callbacks and invoke them, which allows all the after-the-fact CPU-hotplug
      callbacks to wait on these RCU callbacks without risk of a hang.
      
      While in the neighborhood, this commit also moves rcu_send_cbs_to_orphanage()
      and rcu_adopt_orphan_cbs() under a pre-existing #ifdef to avoid including
      dead code on the one hand and to avoid define-without-use warnings on the
      other hand.
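
      In outline, the CPU-offline sequencing then looks roughly like the
      following (a hedged sketch of takedown_cpu(); the other steps and all
      error handling are elided, and rcutree_migrate_callbacks() names the new
      migration entry point added by this change):

        static int toy_takedown_cpu(unsigned int cpu)
        {
                /* ... park the hotplug thread, run take_cpu_down() via stop_machine ... */

                __cpu_die(cpu);                  /* wait until the outgoing CPU is gone */
                rcutree_migrate_callbacks(cpu);  /* adopt its callbacks right away, so  */
                                                 /* later hotplug callbacks can wait on */
                                                 /* them without risk of a hang         */
                return 0;
        }
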
      Reported-by: Jeffrey Hugo <jhugo@codeaurora.org>
      Link: http://lkml.kernel.org/r/db9c91f6-1b17-6136-84f0-03c3c2581ab4@codeaurora.org
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Richard Weinberger <richard@nod.at>
  2. 09 June 2017, 6 commits
  3. 08 June 2017, 6 commits
  4. 03 May 2017, 2 commits
  5. 02 May 2017, 1 commit
  6. 27 April 2017, 1 commit
  7. 21 April 2017, 2 commits
    • rcu: Make non-preemptive schedule be Tasks RCU quiescent state · bcbfdd01
      Paul E. McKenney authored
      Currently, a call to schedule() acts as a Tasks RCU quiescent state
      only if a context switch actually takes place.  However, just the
      call to schedule() guarantees that the calling task has moved off of
      whatever tracing trampoline it might previously have been on.
      This commit therefore plumbs schedule()'s "preempt" parameter into
      rcu_note_context_switch(), which then records the Tasks RCU quiescent
      state, but only if this call to schedule() was -not- due to a preemption.
      
      To avoid adding overhead to the common-case context-switch path,
      this commit hides the rcu_note_context_switch() check under an existing
      non-common-case check.
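
      In outline (a sketch of the idea, not the kernel's actual
      rcu_note_context_switch()), the scheduler hands its preempt flag down, and
      only a voluntary schedule() is reported as a Tasks RCU quiescent state:

        #include <linux/types.h>

        void toy_tasks_rcu_qs(void);   /* stand-in: record a Tasks RCU quiescent state */

        static void toy_note_context_switch(bool preempt)
        {
                /* ... the usual RCU-sched/RCU-preempt bookkeeping ... */

                if (!preempt)          /* voluntary: the task is off any trampoline */
                        toy_tasks_rcu_qs();
        }

        static void toy_schedule(bool preempt)
        {
                toy_note_context_switch(preempt);
                /* ... pick the next task and context-switch to it ... */
        }
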
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • srcu: Parallelize callback handling · da915ad5
      Paul E. McKenney authored
      Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
      however, there are workloads that could result in a high volume of
      concurrent invocations of call_srcu(), which with current SRCU would
      result in excessive lock contention on the srcu_struct structure's
      ->queue_lock, which protects SRCU's callback lists.  This commit therefore
      moves SRCU to per-CPU callback lists, thus greatly reducing contention.
      
      Because a given SRCU instance no longer has a single centralized callback
      list, starting grace periods and invoking callbacks are both more complex
      than in the single-list Classic SRCU implementation.  Starting grace
      periods and handling callbacks are now handled using an srcu_node tree
      that is in some ways similar to the rcu_node trees used by RCU-bh,
      RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
      controlled by exactly the same Kconfig options and boot parameters that
      control the shape of the rcu_node tree).
      
      In addition, the old per-CPU srcu_array structure is now named srcu_data
      and contains an rcu_segcblist structure named ->srcu_cblist for its
      callbacks (and a spinlock to protect this).  The srcu_struct gets
      an srcu_gp_seq that is used to associate callback segments with the
      corresponding completion-time grace-period number.  These completion-time
      grace-period numbers are propagated up the srcu_node tree so that the
      grace-period workqueue handler can determine whether additional grace
      periods are needed on the one hand and where to look for callbacks that
      are ready to be invoked.
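
      A skeletal view of the resulting data-structure split (field names are
      abbreviated and simplified; this is not the real srcu_struct layout):

        #include <linux/spinlock.h>
        #include <linux/types.h>

        /* Simplified per-CPU callback list. */
        struct toy_cblist {
                struct rcu_head *head;
                struct rcu_head **tail;
        };

        /* Per-CPU state: a simplified view of the new struct srcu_data. */
        struct toy_srcu_data {
                spinlock_t lock;              /* protects ->cblist                    */
                struct toy_cblist cblist;     /* this CPU's callbacks (->srcu_cblist) */
                unsigned long gp_seq_needed;  /* furthest-future GP requested here    */
        };

        /* One combining-tree node: a simplified view of struct srcu_node. */
        struct toy_srcu_node {
                spinlock_t lock;
                unsigned long have_cbs;       /* latest GP for which children hold callbacks */
                struct toy_srcu_node *parent;
        };

        /* Top level: a simplified view of struct srcu_struct after the change. */
        struct toy_srcu_struct {
                struct toy_srcu_node *node;          /* tree shaped like the rcu_node tree    */
                struct toy_srcu_data __percpu *sda;  /* per-CPU lists replace the single list */
                unsigned long srcu_gp_seq;           /* completion-time GP sequence number    */
                unsigned long srcu_idx;              /* tells readers which counter to use    */
        };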
      
      The srcu_barrier() function must now wait on all instances of the per-CPU
      ->srcu_cblist.  Because each ->srcu_cblist is protected by ->lock,
      srcu_barrier() can remotely add the needed callbacks.  In theory,
      it could also remotely start grace periods, but in practice doing so
      is complex and racy.  And interestingly enough, it is never necessary
      for srcu_barrier() to start a grace period because srcu_barrier() only
      enqueues a callback when a callback is already present--and it turns out
      that a grace period has to have already been started for this pre-existing
      callback.  Furthermore, it is only the callback that srcu_barrier()
      needs to wait on, not any particular grace period.  Therefore, a new
      rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
      callback into the same segment occupied by the last pre-existing callback
      in the list.  The special case where all the pre-existing callbacks are
      on a different list (because they are in the process of being invoked)
      is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
      segment, relying on the done-callbacks check that takes place after all
      callbacks are invoked.
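
      The entrain operation can be pictured on a simplified segmented list
      (illustration only; the real rcu_segcblist_entrain() also maintains length
      counts and more segments):

        #include <linux/types.h>

        #define TOY_DONE   0   /* callbacks ready to invoke           */
        #define TOY_WAIT   1   /* callbacks waiting on the current GP */
        #define TOY_NEXT   2   /* callbacks needing a future GP       */
        #define TOY_NSEGS  3

        /* Simplified segmented list: tails[i] points at the ->next pointer of
         * the last callback in segment i (or equals tails[i - 1] if that
         * segment is empty). */
        struct toy_segcblist {
                struct rcu_head *head;
                struct rcu_head **tails[TOY_NSEGS];
        };

        /* Append rhp into the same segment as the last callback already
         * present, so it needs no new grace period of its own; if the list is
         * entirely empty, it lands in the done segment. */
        static void toy_segcblist_entrain(struct toy_segcblist *sclp, struct rcu_head *rhp)
        {
                int seg = TOY_DONE;
                int i;

                rhp->next = NULL;
                for (i = TOY_NSEGS - 1; i > TOY_DONE; i--) {
                        if (sclp->tails[i] != sclp->tails[i - 1]) {  /* segment i non-empty */
                                seg = i;
                                break;
                        }
                }
                *sclp->tails[seg] = rhp;
                for (i = seg; i < TOY_NSEGS; i++)
                        sclp->tails[i] = &rhp->next;  /* later (empty) segments end here too */
        }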
      
      Note that the readers use the same algorithm as before.  Note that there
      is a separate srcu_idx that tells the readers what counter to increment.
      This unfortunately cannot be combined with srcu_gp_seq because they
      need to be incremented at different times.
      
      This commit introduces some ugly #ifdefs in rcutorture.  These will go
      away when I feel good enough about Tree SRCU to ditch Classic SRCU.
      
      Some crude performance comparisons, courtesy of a quickly hacked rcuperf
      asynchronous-grace-period capability:
      
      			Callback Queuing Overhead
      			-------------------------
      	# CPUS		Classic SRCU	Tree SRCU
      	------          ------------    ---------
      	     2              0.349 us     0.342 us
      	    16             31.66  us     0.4   us
      	    41             ---------     0.417 us
      
      The times are the 90th percentiles, a statistic that was chosen to reject
      the overheads of the occasional srcu_barrier() call needed to avoid OOMing
      the test machine.  The rcuperf test hangs when running Classic SRCU at 41
      CPUs, hence the line of dashes.  Despite the hacks to both the rcuperf code
      and the statistics, this is a convincing demonstration of Tree SRCU's
      performance and scalability advantages.
      
      [1] https://lwn.net/Articles/309030/
      [2] https://patchwork.kernel.org/patch/5108281/
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
  8. 20 April 2017, 4 commits
  9. 19 April 2017, 8 commits