1. 16 January 2015, 3 commits
    • rcu: Add GP-kthread-starvation checks to CPU stall warnings · fb81a44b
      Committed by Paul E. McKenney
      This commit adds a message that is printed if the relevant grace-period
      kthread has not been able to run for the two seconds preceding the
      stall warning.  (The two seconds is double the maximum interval between
      successive bouts of quiescent-state forcing.)
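
      As a rough userspace sketch of the idea (not the kernel code; the names
      and the use of wall-clock time are illustrative only), the check amounts
      to recording a timestamp whenever the grace-period kthread runs and
      comparing it against the current time when a stall warning is printed:

      #include <stdio.h>
      #include <time.h>

      static time_t gp_kthread_last_ran;	/* updated each time the kthread runs */

      static void gp_kthread_ran(void)
      {
      	gp_kthread_last_ran = time(NULL);
      }

      static void check_gp_kthread_starvation(void)
      {
      	time_t now = time(NULL);

      	/* Two seconds: double the maximum interval between successive
      	 * bouts of quiescent-state forcing, as described above. */
      	if (now - gp_kthread_last_ran >= 2)
      		printf("grace-period kthread starved for %ld seconds\n",
      		       (long)(now - gp_kthread_last_ran));
      }

      int main(void)
      {
      	gp_kthread_ran();
      	check_gp_kthread_starvation();	/* silent: the kthread just ran */
      	return 0;
      }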
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Committed by Paul E. McKenney
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
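
      A toy model of the counter-snapshot mechanism (userspace sketch with a
      single "CPU"; the kernel uses per-CPU variables, and only the
      ->rcu_qs_ctr_snap name above is real): cond_resched_rcu_qs() bumps a
      counter, the grace-period machinery snapshots it, and any later
      difference is treated as a quiescent state for the normal flavors:

      #include <stdbool.h>
      #include <stdio.h>

      static unsigned long rcu_qs_ctr;	/* bumped when a quiescent state is reported */
      static unsigned long rcu_qs_ctr_snap;	/* snapshot taken by the grace-period code */

      static void cond_resched_rcu_qs_sketch(void)
      {
      	rcu_qs_ctr++;			/* note a quiescent state */
      }

      static void gp_take_snapshot(void)
      {
      	rcu_qs_ctr_snap = rcu_qs_ctr;
      }

      static bool gp_saw_quiescent_state(void)
      {
      	return rcu_qs_ctr != rcu_qs_ctr_snap;
      }

      int main(void)
      {
      	gp_take_snapshot();
      	printf("QS seen: %d\n", gp_saw_quiescent_state());	/* 0 */
      	cond_resched_rcu_qs_sketch();
      	printf("QS seen: %d\n", gp_saw_quiescent_state());	/* 1 */
      	return 0;
      }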
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
    • rcu: Optionally run grace-period kthreads at real-time priority · a94844b2
      Committed by Paul E. McKenney
      Recent testing has shown that under heavy load, running RCU's grace-period
      kthreads at real-time priority can improve performance (according to 0day
      test robot) and reduce the incidence of RCU CPU stall warnings.  However,
      most systems do just fine with the default non-realtime priorities for
      these kthreads, and it does not make sense to expose the entire user
      base to any risk stemming from this change, given that this change is
      of use only to a few users running extremely heavy workloads.
      
      Therefore, this commit allows users to specify realtime priorities
      for the grace-period kthreads, but leaves them running SCHED_OTHER
      by default.  The realtime priority may be specified at build time
      via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
      rcutree.kthread_prio parameter.  Either way, 0 says to continue the
      default SCHED_OTHER behavior, and values from 1 to 99 specify the
      corresponding SCHED_FIFO priority.  Note that a value of 0 is not permitted when
      the RCU_BOOST Kconfig parameter is specified.
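
      For example, booting with rcutree.kthread_prio=1 (or building with
      RCU_KTHREAD_PRIO=1) requests SCHED_FIFO priority 1 for these kthreads.
      The effect is roughly what the following userspace sketch does with the
      POSIX scheduling API; the kernel of course uses its internal equivalent
      rather than this code:

      #include <sched.h>
      #include <stdio.h>
      #include <string.h>

      /* kthread_prio == 0: keep SCHED_OTHER; 1-99: switch to SCHED_FIFO. */
      static int apply_kthread_prio(int kthread_prio)
      {
      	struct sched_param sp;

      	if (kthread_prio == 0)
      		return 0;			/* default behavior, nothing to do */
      	memset(&sp, 0, sizeof(sp));
      	sp.sched_priority = kthread_prio;	/* 1-99 */
      	return sched_setscheduler(0, SCHED_FIFO, &sp);	/* 0 == this process */
      }

      int main(void)
      {
      	if (apply_kthread_prio(1) != 0)
      		perror("sched_setscheduler");	/* typically requires privileges */
      	return 0;
      }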
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  2. 11 January 2015, 2 commits
    • rcutorture: Check from beginning to end of grace period · 917963d0
      Committed by Paul E. McKenney
      Currently, rcutorture's Reader Batch checks measure from the end of
      the previous grace period to the end of the current one.  This commit
      tightens up these checks by measuring from the start and end of the same
      grace period.  This involves adding rcu_batches_started() and friends
      corresponding to the existing rcu_batches_completed() and friends.
      
      We leave SRCU alone for the moment, as it does not yet have a way of
      tracking both ends of its grace periods.
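
      Conceptually, the tightened check brackets the measurement with both
      counters, as in this simplified sketch (toy counters standing in for the
      kernel's; only the rcu_batches_started()/rcu_batches_completed() names
      come from the text above):

      #include <stdio.h>

      static unsigned long gp_started;	/* number of grace periods started   */
      static unsigned long gp_completed;	/* number of grace periods completed */

      static unsigned long rcu_batches_started(void)   { return gp_started; }
      static unsigned long rcu_batches_completed(void) { return gp_completed; }

      int main(void)
      {
      	/* Snapshot at the *start* of the interval being checked. */
      	unsigned long snap = rcu_batches_started();

      	gp_started++;		/* a grace period starts ...    */
      	gp_completed++;		/* ... and later completes      */

      	/* End-of-interval check against the start-time snapshot. */
      	if (rcu_batches_completed() - snap >= 1)
      		printf("a full grace period elapsed during the interval\n");
      	return 0;
      }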
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make _batches_completed() functions return unsigned long · 9733e4f0
      Committed by Paul E. McKenney
      Long ago, the various ->completed fields were of type long, but now are
      unsigned long due to signed-integer-overflow concerns.  However, the
      various _batches_completed() functions remained of type long, even though
      their only purpose in life is to return the corresponding ->completed
      field.  This patch cleans this up by changing these functions' return
      types to unsigned long.
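
      The practical payoff of unsigned long here is that wraparound is well
      defined, so callers can compare counters in the wrap-tolerant style used
      elsewhere in RCU (a sketch; the ULONG_CMP_GE()/ULONG_CMP_LT() helpers
      are redefined here so the example stands alone):

      #include <limits.h>
      #include <stdio.h>

      /* Wrap-tolerant comparisons on unsigned long counters. */
      #define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
      #define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

      int main(void)
      {
      	unsigned long completed = ULONG_MAX;	/* about to wrap */
      	unsigned long later = completed + 2;	/* wraps to 1, which is well defined */

      	printf("later >= completed: %d\n", ULONG_CMP_GE(later, completed));	/* 1 */
      	printf("completed < later:  %d\n", ULONG_CMP_LT(completed, later));	/* 1 */
      	return 0;
      }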
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 07 January 2015, 13 commits
    • rcu: Handle gpnum/completed wrap while dyntick idle · e3663b10
      Committed by Paul E. McKenney
      Subtle race conditions can result if a CPU stays in dyntick-idle mode
      long enough for the ->gpnum and ->completed fields to wrap.  For
      example, consider the following sequence of events:
      
      o	CPU 1 encounters a quiescent state while waiting for grace period
      	5 to complete, but then enters dyntick-idle mode.
      
      o	While CPU 1 is in dyntick-idle mode, the grace-period counters
      	wrap around so that the grace period number is now 4.
      
      o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
      	and grace period 5 begins.
      
      o	The quiescent state that CPU 1 passed through during the old
      	grace period 5 looks like it applies to the new grace period
      	5.  Therefore, the new grace period 5 completes without CPU 1
      	having passed through a quiescent state.
      
      This could clearly be a fatal surprise to any long-running RCU read-side
      critical section that happened to be running on CPU 1 at the time.  At one
      time, this was not a problem, given that it takes significant time for
      the grace-period counters to overflow even on 32-bit systems.  However,
      with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
      idle periods are now becoming quite feasible.  It is therefore time to
      close this race.
      
      This commit therefore avoids this race condition by having the
      quiescent-state forcing code detect when a CPU is falling too far
      behind, and setting a new rcu_data field ->gpwrap when this happens.
      Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
      fields are known to be untrustworthy, and can be ignored, along with
      any associated quiescent states.
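
      A sketch of the detection step (simplified structures and an arbitrary
      threshold chosen for illustration; only the ->gpnum and ->gpwrap names
      come from the text above): if a CPU's recorded grace-period number has
      fallen too far behind the global one, its counters are flagged as
      untrustworthy:

      #include <stdbool.h>
      #include <stdio.h>

      /* Wrap-tolerant "a < b" on unsigned long counters. */
      #define ULONG_CMP_LT(a, b)	(~0UL / 2 < (a) - (b))
      /* Threshold for "too far behind"; a quarter of the counter space here. */
      #define GP_BEHIND_THRESHOLD	(~0UL / 4)

      struct cpu_gp_state {		/* stand-in for a few rcu_data fields */
      	unsigned long gpnum;	/* this CPU's idea of the current grace period */
      	bool gpwrap;		/* set when gpnum can no longer be trusted */
      };

      static void check_gp_wrap(struct cpu_gp_state *cs, unsigned long global_gpnum)
      {
      	if (ULONG_CMP_LT(cs->gpnum + GP_BEHIND_THRESHOLD, global_gpnum))
      		cs->gpwrap = true;	/* CPU has slept through too many GPs */
      }

      int main(void)
      {
      	struct cpu_gp_state cs = { .gpnum = 5, .gpwrap = false };

      	check_gp_wrap(&cs, 6);				/* slightly behind: fine */
      	printf("gpwrap = %d\n", cs.gpwrap);		/* 0 */
      	check_gp_wrap(&cs, 5 + GP_BEHIND_THRESHOLD + 1);	/* far behind */
      	printf("gpwrap = %d\n", cs.gpwrap);		/* 1 */
      	return 0;
      }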
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve diagnostics for spurious RCU CPU stall warnings · 6ccd2ecd
      Committed by Paul E. McKenney
      The current RCU CPU stall warning code will print "Stall ended before
      state dump start" any time that the stall-warning code is triggered on
      a CPU that has already reported a quiescent state for the current grace
      period and if all quiescent states have been reported for the current
      grace period.  However, a true stall can result in these symptoms, for
      example, by preventing RCU's grace-period kthreads from ever running.
      
      This commit therefore checks for this condition, reporting the end of
      the stall only if one of the grace-period counters has actually advanced.
      Otherwise, it reports the last time that the grace-period kthread made
      meaningful progress.  (In normal situations, the grace-period kthread
      should make meaningful progress at least every jiffies_till_next_fqs
      jiffies.)
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
    • rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts · fc908ed3
      Committed by Paul E. McKenney
      One way that an RCU CPU stall warning can happen is if the grace-period
      kthread is not allowed to execute.  One proxy for this kthread's
      forward progress is the number of force-quiescent-state (fqs) scans.
      This commit therefore adds the number of fqs scans to the RCU CPU stall
      warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove redundant callback-list initialization · ab954c16
      Committed by Paul E. McKenney
      The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
      and rcu_init_percpu_data().  The former is intended for initializing
      immutable data, so this commit removes the initialization from
      rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
      This change prepares for permitting callbacks to be queued very early
      in boot.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't scan root rcu_node structure for stalled tasks · 6cd534ef
      Committed by Paul E. McKenney
      Now that blocked tasks are no longer migrated to the root rcu_node
      structure, there is no need to scan the root rcu_node structure for
      blocked tasks stalling the current grace period.  This commit therefore
      removes this scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Note quiescent state when CPU goes offline · 3ba4d0e0
      Committed by Paul E. McKenney
      The rcu_cleanup_dead_cpu() function (called after a CPU has gone
      completely offline) has not reported a quiescent state because there
      was probably at least one synchronize_rcu() between the time the CPU
      went offline and the CPU_DEAD notifier, and this would have detected
      the CPU's offline state via quiescent-state forcing.  However, the plan
      is for CPUs to take themselves offline, at which point it makes sense
      for them to report their own quiescent state.  This commit makes this
      change in preparation for the new CPU-hotplug setup.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't initiate RCU priority boosting on root rcu_node · 1be0085b
      Committed by Paul E. McKenney
      Because there are no longer any preempted tasks on the root rcu_node, and
      because there is no longer ever an rcub kthread for the root rcu_node,
      this commit drops the code in force_qs_rnp() that attempts to awaken
      the non-existent root rcub kthread.  This is strictly a performance
      enhancement, removing a root rcu_node ->lock acquisition and release
      along with some tests in rcu_initiate_boost(), ending with the test that
      notes that there is no rcub kthread.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() · a8f4cbad
      Committed by Paul E. McKenney
      Now that we are not migrating callbacks, there is no need to hold the
      ->orphan_lock across the ->qsmaskinit bit-clearing process.
      This commit therefore releases ->orphan_lock immediately after adopting
      the orphaned RCU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Committed by Paul E. McKenney
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing · b6a932d1
      Committed by Paul E. McKenney
      This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
      bit clearing up the rcu_node tree once a given rcu_node structure's
      blkd_tasks list becomes empty.  This is the final commit in preparation
      for the rework of RCU priority boosting:  It enables preempted tasks to
      remain queued on their rcu_node structure even after all of that rcu_node
      structure's CPUs have gone offline.
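
      The propagation itself is an ordinary walk toward the root of the tree,
      sketched below with made-up structures (the real code manipulates
      rcu_node ->qsmaskinit under the appropriate locks): clear this node's
      bit in its parent's mask, and continue upward only while parents keep
      becoming empty:

      #include <stdio.h>

      struct node {				/* minimal stand-in for an rcu_node */
      	unsigned long qsmaskinit;	/* one bit per still-relevant child */
      	unsigned long grpmask;		/* this node's bit in its parent's mask */
      	struct node *parent;		/* NULL at the root */
      };

      /* Clear our bit upward for as long as nodes keep becoming empty. */
      static void propagate_empty(struct node *np)
      {
      	while (np->parent != NULL) {
      		struct node *parent = np->parent;

      		parent->qsmaskinit &= ~np->grpmask;
      		if (parent->qsmaskinit != 0)
      			break;		/* parent still has other children */
      		np = parent;
      	}
      }

      int main(void)
      {
      	struct node root = { .qsmaskinit = 0x3, .grpmask = 0, .parent = NULL };
      	struct node leaf = { .qsmaskinit = 0, .grpmask = 0x1, .parent = &root };

      	propagate_empty(&leaf);		/* the leaf's blkd_tasks list just emptied */
      	printf("root mask = %#lx\n", root.qsmaskinit);	/* 0x2 */
      	return 0;
      }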
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() · 8af3a5e7
      Committed by Paul E. McKenney
      This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
      in preparation for the rework of RCU priority boosting.  This new function
      will be invoked from rcu_read_unlock_special() in the reworked scheme,
      which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
      structure's ->qsmaskinit field has already been updated.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix rcu_barrier() race that could result in too-short wait · 41050a00
      Committed by Paul E. McKenney
      The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions.
      It checks a given CPU's lists of callbacks, and if all three no-CBs lists
      are empty, ignores that CPU.  However, these three lists could potentially
      be empty even when callbacks are present if the check executed just as
      the callbacks were being moved from one list to another.  It turns out
      that recent versions of rcutorture can spot this race.
      
      This commit plugs this hole by consolidating the per-list counts of
      no-CBs callbacks into a single count, which is incremented before
      the corresponding callback is posted and decremented after it is invoked.  Then
      rcu_barrier() checks this single count to reliably determine whether
      the corresponding CPU has no-CBs callbacks.
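
      The fix boils down to a single counter with a simple invariant, sketched
      here with C11 atomics (invented names; the kernel's implementation
      differs in detail): the count goes up before a callback is enqueued and
      comes down only after the callback has been invoked, so a nonzero count
      reliably means that callbacks are pending:

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      static atomic_long nocb_count;		/* single count of no-CBs callbacks */

      static void post_callback(void)
      {
      	atomic_fetch_add(&nocb_count, 1);	/* before the callback is queued */
      	/* ... enqueue the callback here ... */
      }

      static void invoke_callback(void)
      {
      	/* ... run the callback here ... */
      	atomic_fetch_sub(&nocb_count, 1);	/* only after it has run */
      }

      static bool cpu_has_nocb_callbacks(void)
      {
      	return atomic_load(&nocb_count) != 0;	/* what the barrier check consults */
      }

      int main(void)
      {
      	post_callback();
      	printf("pending: %d\n", cpu_has_nocb_callbacks());	/* 1 */
      	invoke_callback();
      	printf("pending: %d\n", cpu_has_nocb_callbacks());	/* 0 */
      	return 0;
      }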
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task · 5f6130fa
      Committed by Lai Jiangshan
      For RCU in UP, context-switch = QS = GP, thus we can force a
      context switch whenever call_rcu_[bh|sched]() happens on the idle task.
      After doing so, rcu_idle/irq_enter/exit() are useless, so we can simply
      make these functions empty.
      
      More importantly, this change does not change the functionality logically.
      Note: raise_softirq(RCU_SOFTIRQ)/rcu_sched_qs() in rcu_idle_enter() and the
      outermost rcu_irq_exit() will have to wake up the ksoftirqd
      (due to in_interrupt() == 0).
      
      Before this patch		After this patch:
      call_rcu_sched() in idle;	call_rcu_sched() in idle
      				  set resched
      do other stuffs;		do other stuffs
      outmost rcu_irq_exit()		outmost rcu_irq_exit() (empty function)
        (or rcu_idle_enter())		  (or rcu_idle_enter(), also empty function)
      				start to resched. (see above)
        rcu_sched_qs()		rcu_sched_qs()
          QS,and GP and advance cb	  QS,and GP and advance cb
          wake up the ksoftirqd	    wake up the ksoftirqd
            set resched
      resched to ksoftirqd (or other)	resched to ksoftirqd (or other)
      
      These two code paths are almost the same.
      
      Size change after the patch:
      
      size kernel/rcu/tiny-old.o kernel/rcu/tiny-patched.o
         text	   data	    bss	    dec	    hex	filename
         3449	    206	      8	   3663	    e4f	kernel/rcu/tiny-old.o
         2406	    144	      8	   2558	    9fe	kernel/rcu/tiny-patched.o
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  4. 31 December 2014, 2 commits
    • rcu: Fix invoke_rcu_callbacks() comment · 924df8a0
      Committed by Paul E. McKenney
      Despite what the comment says, it is only softirqs that are disabled,
      not interrupts.  This commit therefore fixes the comment.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_nmi_enter() handle nesting · 734d1680
      Committed by Paul E. McKenney
      The x86 architecture has multiple types of NMI-like interrupts: real
      NMIs, machine checks, and, for some values of NMI-like, debugging
      and breakpoint interrupts.  These interrupts can nest inside each
      other.  Andy Lutomirski is adding RCU support to these interrupts,
      so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.
      
      This commit therefore introduces nesting, using a clever NMI-coordination
      algorithm suggested by Andy.  The trick is to atomically increment
      ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry
      (and, accordingly, after on exit).  In addition, ->dynticks_nmi_nesting
      is incremented by one if ->dynticks was incremented and by two otherwise.
      This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal
      to one, it knows that ->dynticks must be atomically incremented.
      
      This NMI-coordination algorithm has been validated by the following
      Promela model:
      
      ------------------------------------------------------------------------
      
      /*
       * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
       * that allows nesting.
       *
       * This program is free software; you can redistribute it and/or modify
       * it under the terms of the GNU General Public License as published by
       * the Free Software Foundation; either version 2 of the License, or
       * (at your option) any later version.
       *
       * This program is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       * GNU General Public License for more details.
       *
       * You should have received a copy of the GNU General Public License
       * along with this program; if not, you can access it online at
       * http://www.gnu.org/licenses/gpl-2.0.html.
       *
       * Copyright IBM Corporation, 2014
       *
       * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
       */
      
      /* dynticks_nmi_nesting is incremented by one if dynticks was atomically
       * incremented on entry and by BUSY_INCBY (two) otherwise, as described
       * above. */
      #define BUSY_INCBY 2

      byte dynticks_nmi_nesting = 0;
      byte dynticks = 0;
      
      /*
       * Promela version of rcu_nmi_enter().
       */
      inline rcu_nmi_enter()
      {
      	byte incby;
      	byte tmp;
      
      	incby = BUSY_INCBY;
      	assert(dynticks_nmi_nesting >= 0);
      	if
      	:: (dynticks & 1) == 0 ->
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 1);
      		incby = 1;
      	:: else ->
      		skip;
      	fi;
      	tmp = dynticks_nmi_nesting;
      	tmp = tmp + incby;
      	dynticks_nmi_nesting = tmp;
      	assert(dynticks_nmi_nesting >= 1);
      }
      
      /*
       * Promela version of rcu_nmi_exit().
       */
      inline rcu_nmi_exit()
      {
      	byte tmp;
      
      	assert(dynticks_nmi_nesting > 0);
      	assert((dynticks & 1) != 0);
      	if
      	:: dynticks_nmi_nesting != 1 ->
      		tmp = dynticks_nmi_nesting;
      		tmp = tmp - BUSY_INCBY;
      		dynticks_nmi_nesting = tmp;
      	:: else ->
      		dynticks_nmi_nesting = 0;
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 0);
      	fi;
      }
      
      /*
       * Base-level NMI runs non-atomically.  Crudely emulates process-level
       * dynticks-idle entry/exit.
       */
      proctype base_NMI()
      {
      	byte busy;
      
      	busy = 0;
      	do
      	::	/* Emulate base-level dynticks and not. */
      		if
      		:: 1 ->	atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 1;
      		:: 1 ->	skip;
      		fi;
      
      		/* Verify that we only sometimes have base-level dynticks. */
      		if
      		:: busy == 0 -> skip;
      		:: busy == 1 -> skip;
      		fi;
      
      		/* Model RCU's NMI entry and exit actions. */
      		rcu_nmi_enter();
      		assert((dynticks & 1) == 1);
      		rcu_nmi_exit();
      
      		/* Emulate re-entering base-level dynticks and not. */
      		if
      		:: !busy -> skip;
      		:: busy ->
      			atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 0;
      		fi;
      
      		/* We had better now be in dyntick-idle mode. */
      		assert((dynticks & 1) == 0);
      	od;
      }
      
      /*
       * Nested NMI runs atomically to emulate interrupting base_NMI().
       */
      proctype nested_NMI()
      {
      	do
      	::	/*
      		 * Use an atomic section to model a nested NMI.  This is
      		 * guaranteed to interleave into base_NMI() between a pair
      		 * of base_NMI() statements, just as a nested NMI would.
      		 */
      		atomic {
      			/* Verify that we only sometimes are in dynticks. */
      			if
      			:: (dynticks & 1) == 0 -> skip;
      			:: (dynticks & 1) == 1 -> skip;
      			fi;
      
      			/* Model RCU's NMI entry and exit actions. */
      			rcu_nmi_enter();
      			assert((dynticks & 1) == 1);
      			rcu_nmi_exit();
      		}
      	od;
      }
      
      init {
      	run base_NMI();
      	run nested_NMI();
      }
      
      ------------------------------------------------------------------------
      
      The following script can be used to run this model, assuming that the
      model above is placed in a file named rcu_nmi.spin:
      
      ------------------------------------------------------------------------
      
      if ! spin -a rcu_nmi.spin
      then
      	echo Spin errors!!!
      	exit 1
      fi
      if ! cc -DSAFETY -o pan pan.c
      then
      	echo Compilation errors!!!
      	exit 1
      fi
      ./pan -m100000
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
  5. 04 November 2014, 10 commits
  6. 30 October 2014, 1 commit
  7. 29 October 2014, 3 commits
    • rcu: Avoid IPIing idle CPUs from synchronize_sched_expedited() · e0775cef
      Committed by Paul E. McKenney
      Currently, synchronize_sched_expedited() sends IPIs to all online CPUs,
      even those that are idle or executing in nohz_full= userspace.  Because
      idle CPUs and nohz_full= userspace CPUs are in extended quiescent states,
      there is no need to IPI them in the first place.  This commit therefore
      avoids IPIing CPUs that are already in extended quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Move RCU_BOOST variable declarations, eliminating #ifdef · 61cfd097
      Committed by Paul E. McKenney
      There are some RCU_BOOST-specific per-CPU variable declarations that
      are needlessly defined under #ifdef in kernel/rcu/tree.c.  This commit
      therefore moves these declarations into a pre-existing #ifdef in
      kernel/rcu/tree_plugin.h.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933
      Committed by Paul E. McKenney
      Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
      avoids creating rcuo kthreads for CPUs that never come online.  This
      fixes a bug in many instances of firmware: Instead of lying about their
      age, these systems instead lie about the number of CPUs that they have.
      Before commit 35ce7f29, this could result in huge numbers of useless
      rcuo kthreads being created.
      
      It appears that experience indicates that I should have told the
      people suffering from this problem to fix their broken firmware, but
      I instead produced what turned out to be a partial fix.   The missing
      piece supplied by this commit makes sure that rcu_barrier() knows not to
      post callbacks for no-CBs CPUs that have not yet come online, because
      otherwise rcu_barrier() will hang on systems having firmware that lies
      about the number of CPUs.
      
      It is tempting to simply have rcu_barrier() refuse to post a callback on
      any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
      does not work because rcu_barrier() is required to wait for all pending
      callbacks.  It is therefore required to wait even for those callbacks
      that cannot possibly be invoked.  Even if doing so hangs the system.
      
      Given that posting a callback to a no-CBs CPU that does not yet have an
      rcuo kthread can hang rcu_barrier(), it is tempting to report an error
      in this case.  Unfortunately, this will result in false positives at
      boot time, when it is perfectly legal to post callbacks to the boot CPU
      before the scheduler has started, in other words, before it is legal
      to invoke rcu_barrier().
      
      So this commit instead has rcu_barrier() avoid posting callbacks to
      CPUs having neither rcuo kthread nor pending callbacks, and has it
      complain bitterly if it finds CPUs having no rcuo kthread but some
      pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
      kthread but pending callbacks, as noted earlier, it has no choice but
      to hang indefinitely.
      Reported-by: Yanko Kaneti <yaneti@declera.com>
      Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Eric B Munson <emunson@akamai.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Eric B Munson <emunson@akamai.com>
      Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Tested-by: Yanko Kaneti <yaneti@declera.com>
      Tested-by: Kevin Fenzi <kevin@scrye.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
  8. 19 September 2014, 1 commit
    • rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Committed by Paul E. McKenney
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
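
      The resulting call-site logic is an ordinary try-acquire with a
      fallback, roughly as in the sketch below (stub functions standing in for
      the real primitives; only the try_get_online_cpus()/get_online_cpus()
      names come from the text above):

      #include <stdbool.h>
      #include <stdio.h>

      /* Stubs standing in for the CPU-hotplug refcount and grace periods. */
      static bool try_get_online_cpus(void)	{ return true;	/* false while hotplug is busy */ }
      static void put_online_cpus(void)	{ }
      static void normal_grace_period(void)	{ printf("fall back to a normal grace period\n"); }
      static void expedited_grace_period(void)	{ printf("run the expedited grace period\n"); }

      static void synchronize_expedited_sketch(void)
      {
      	if (!try_get_online_cpus()) {
      		/* CPU-hotplug operation in flight: fall back rather than deadlock. */
      		normal_grace_period();
      		return;
      	}
      	expedited_grace_period();
      	put_online_cpus();
      }

      int main(void)
      {
      	synchronize_expedited_sketch();
      	return 0;
      }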
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: Lan Tianyu <tianyu.lan@intel.com>
  9. 17 September 2014, 2 commits
  10. 08 September 2014, 3 commits
    • rcu: Per-CPU operation cleanups to rcu_*_qs() functions · 284a8c93
      Committed by Paul E. McKenney
      The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
      old-style per-CPU variable access and write to ->passed_quiesce even
      if it is already set.  This commit therefore updates to use the new-style
      per-CPU variable access functions and avoids the spurious writes.
      This commit also eliminates the "cpu" argument to these functions because
      they are always invoked on the indicated CPU.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make TASKS_RCU handle nohz_full= CPUs · 176f8f7a
      Committed by Paul E. McKenney
      Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
      usermode execution.  There would be neither a context switch nor a
      scheduling-clock interrupt to tell TASKS_RCU that the task in question
      had passed through a quiescent state.  The grace period would therefore
      extend indefinitely.  This commit therefore makes RCU's dyntick-idle
      subsystem record the task_struct structure of the task that is running
      in dyntick-idle mode on each CPU.  The TASKS_RCU grace period can
      then access this information and record a quiescent state on
      behalf of any CPU running in dyntick-idle usermode.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops · bde6c3aa
      Committed by Paul E. McKenney
      RCU-tasks requires the occasional voluntary context switch
      from CPU-bound in-kernel tasks.  In some cases, this requires
      instrumenting cond_resched().  However, there is some reluctance
      to countenance unconditionally instrumenting cond_resched() (see
      http://lwn.net/Articles/603252/), so this commit creates a separate
      cond_resched_rcu_qs() that may be used in place of cond_resched() in
      locations prone to long-duration in-kernel looping.
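
      Typical intended use is in long CPU-bound kernel loops, as in this
      schematic example (userspace stub; process_one_item() is made up, and
      the real cond_resched_rcu_qs() reports a quiescent state and may
      reschedule):

      #include <stdio.h>

      static void cond_resched_rcu_qs_stub(void)	/* stands in for the kernel API */
      {
      }

      static void process_one_item(int i)		/* made-up unit of work */
      {
      	(void)i;
      }

      int main(void)
      {
      	/* A long CPU-bound loop: report quiescent states as it goes. */
      	for (int i = 0; i < 1000000; i++) {
      		process_one_item(i);
      		cond_resched_rcu_qs_stub();	/* where cond_resched() might otherwise go */
      	}
      	printf("done\n");
      	return 0;
      }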
      
      This commit currently instruments only RCU-tasks.  Future possibilities
      include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
      IPI usage.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>