1. 07 Jan, 2015 14 commits
    • P
      rcu: Remove redundant callback-list initialization · ab954c16
      Committed by Paul E. McKenney
      The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
      and rcu_init_percpu_data().  The former is intended for initializing
      immutable data, so this commit removes the initialization from
      rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
      This change prepares for permitting callbacks to be queued very early
      in boot.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      ab954c16
    • P
      rcu: Don't scan root rcu_node structure for stalled tasks · 6cd534ef
      Committed by Paul E. McKenney
      Now that blocked tasks are no longer migrated to the root rcu_node
      structure, there is no need to scan the root rcu_node structure for
      blocked tasks stalling the current grace period.  This commit therefore
      removes this scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      6cd534ef
    • L
      rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion · abaf3f9d
      Committed by Lai Jiangshan
      The patch dfeb9765 ("Allow post-unlock reference for rt_mutex")
      kept RCU boosting safe even when the rt_mutex is referenced after
      being unlocked.
      
      But an rt_mutex that allows post-unlock references is definitely a bug,
      and it was fixed by commit 27e35715 ("rtmutex: Plug slow unlock race").
      That fix made the previous patch (dfeb9765) unnecessary.
      
      Even worse, the priority inversion introduced by the previous
      patch still exists.
      
      rcu_read_unlock_special() {
      	rt_mutex_unlock(&rnp->boost_mtx);
      	/* Priority inversion:
      	 * the current task has been deboosted and may be preempted
      	 * immediately as a low-priority task, so it could wait a long time
      	 * before being scheduled again.  Meanwhile the rcu-booster sleeps
      	 * waiting on this low-priority task, so RCU boosting cannot work
      	 * as expected.
      	 */
      	complete(&rnp->boost_completion);
      }
      
      Just revert the patch to avoid it.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      abaf3f9d
    • P
      rcu: Note quiescent state when CPU goes offline · 3ba4d0e0
      Committed by Paul E. McKenney
      The rcu_cleanup_dead_cpu() function (called after a CPU has gone
      completely offline) has not reported a quiescent state because there
      was probably at least one synchronize_rcu() between the time the CPU
      went offline and the CPU_DEAD notifier, and this would have detected
      the CPU's offline state via quiescent-state forcing.  However, the plan
      is for CPUs to take themselves offline, at which point it makes sense
      for them to report their own quiescent state.  This commit makes this
      change in preparation for the new CPU-hotplug setup.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3ba4d0e0
    • P
      rcu: Don't bother affinitying rcub kthreads away from offline CPUs · 5d0b0249
      Committed by Paul E. McKenney
      When rcu_boost_kthread_setaffinity() sees that all CPUs for a given
      rcu_node structure are now offline, it affinities the corresponding
      RCU-boost ("rcub") kthread away from those CPUs.  This is pointless
      because the kthread cannot run on those offline CPUs in any case.
      This commit therefore removes this unneeded code.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5d0b0249
    • P
      rcu: Don't initiate RCU priority boosting on root rcu_node · 1be0085b
      Committed by Paul E. McKenney
      Because there are no longer any preempted tasks on the root rcu_node, and
      because there is no longer ever an rcub kthread for the root rcu_node,
      this commit drops the code in force_qs_rnp() that attempts to awaken
      the non-existent root rcub kthread.  This is strictly a performance
      enhancement, removing a root rcu_node ->lock acquisition and release
      along with some tests in rcu_initiate_boost(), ending with the test that
      notes that there is no rcub kthread.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1be0085b
    • P
      rcu: Don't spawn rcub kthreads on root rcu_node structure · 3e9f5c70
      Committed by Paul E. McKenney
      Now that offlining CPUs no longer moves leaf rcu_node structures'
      ->blkd_tasks lists to the root, there is no way for the root rcu_node
      structure's ->blkd_tasks list to be nonempty, unless the root node is also
      the sole leaf node.  This commit therefore refrains from creating an rcub
      kthread for the root rcu_node structure unless it is also the sole leaf.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3e9f5c70
    • P
      rcu: Make use of rcu_preempt_has_tasks() · 96e92021
      Committed by Paul E. McKenney
      Given that there is now an rcu_preempt_has_tasks() function that checks
      to see if the ->blkd_tasks list is non-empty, this commit makes use of it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      96e92021
    • P
      rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() · a8f4cbad
      Committed by Paul E. McKenney
      Now that we are not migrating callbacks, there is no need to hold the
      ->orphan_lock across the ->qsmaskinit bit-clearing process.
      This commit therefore releases ->orphan_lock immediately after adopting
      the orphaned RCU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a8f4cbad
    • P
      rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Committed by Paul E. McKenney
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d19fb8d1
    • P
      rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing · b6a932d1
      Committed by Paul E. McKenney
      This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
      bit clearing up the rcu_node tree once a given rcu_node structure's
      ->blkd_tasks list becomes empty.  This is the final commit in preparation
      for the rework of RCU priority boosting:  It enables preempted tasks to
      remain queued on their rcu_node structure even after all of that rcu_node
      structure's CPUs have gone offline.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b6a932d1
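The upward propagation described in this commit can be sketched in user-space C. This is only an illustration under stated assumptions: `struct toy_rcu_node`, its `nr_blkd_tasks` counter (standing in for the `->blkd_tasks` list), and `toy_propagate_qsmaskinit_clear()` are hypothetical names, locking is omitted entirely, and the kernel's actual code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's rcu_node; not the real layout. */
struct toy_rcu_node {
	unsigned long qsmaskinit;	/* One bit per online child (CPU or node). */
	unsigned long grpmask;		/* This node's bit in parent->qsmaskinit. */
	int nr_blkd_tasks;		/* Stand-in for the ->blkd_tasks list. */
	struct toy_rcu_node *parent;	/* NULL at the root. */
};

/*
 * Intended to be called when rnp's last blocked task has been dequeued.
 * If rnp also has no online children, clear its bit in its parent and
 * keep walking up while each ancestor becomes empty as well.
 * Returns the number of levels at which a bit was cleared.
 */
static int toy_propagate_qsmaskinit_clear(struct toy_rcu_node *rnp)
{
	int levels = 0;

	assert(rnp != NULL);
	if (rnp->qsmaskinit != 0 || rnp->nr_blkd_tasks != 0)
		return 0;	/* Still has online CPUs or queued tasks. */
	while (rnp->parent != NULL) {
		unsigned long mask = rnp->grpmask;

		rnp = rnp->parent;
		rnp->qsmaskinit &= ~mask;
		levels++;
		if (rnp->qsmaskinit != 0)
			break;	/* Ancestor still has other children. */
	}
	return levels;
}
```

In this sketch a leaf whose CPUs are all offline keeps its bit set in the parent for as long as `nr_blkd_tasks` is nonzero, which mirrors the stated goal of letting preempted tasks stay queued on their original rcu_node structure.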
    • P
      rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() · 8af3a5e7
      Committed by Paul E. McKenney
      This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
      in preparation for the rework of RCU priority boosting.  This new function
      will be invoked from rcu_read_unlock_special() in the reworked scheme,
      which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
      structure's ->qsmaskinit field has already been updated.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      8af3a5e7
    • P
      rcu: Rename "empty" to "empty_norm" in preparation for boost rework · 74e871ac
      Committed by Paul E. McKenney
      This commit undertakes a simple variable renaming to make way for
      some rework of RCU priority boosting.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      74e871ac
    • P
      rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE() · b08ea27d
      Committed by Paul E. McKenney
      This commit prevents random compiler optimizations by applying
      ACCESS_ONCE() to lockless accesses.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b08ea27d
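For readers unfamiliar with the idiom, the following is a minimal sketch of how a volatile-cast macro such as ACCESS_ONCE() constrains the compiler on a lockless read. The macro body shown is the commonly used volatile-cast form and relies on the GCC/Clang `__typeof__` extension; `struct toy_node` and `toy_nothing_to_boost()` are hypothetical and are not taken from the commit itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Volatile-cast idiom: the compiler must emit exactly one access to x
 * here and may not cache, refetch, or tear it across this expression.
 */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical shared state that another thread may update. */
struct toy_node {
	void *boost_tasks;
	void *exp_tasks;
};

/* Lockless pre-check: returns 1 if there appears to be nothing to boost. */
static int toy_nothing_to_boost(struct toy_node *rnp)
{
	assert(rnp != NULL);
	return ACCESS_ONCE(rnp->exp_tasks) == NULL &&
	       ACCESS_ONCE(rnp->boost_tasks) == NULL;
}
```

A check like this only gives a snapshot. Code that acts on the result would normally recheck under the appropriate lock.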
  2. 31 Dec, 2014 1 commit
    • P
      rcu: Make rcu_nmi_enter() handle nesting · 734d1680
      Committed by Paul E. McKenney
      The x86 architecture has multiple types of NMI-like interrupts: real
      NMIs, machine checks, and, for some values of NMI-like, debugging
      and breakpoint interrupts.  These interrupts can nest inside each
      other.  Andy Lutomirski is adding RCU support to these interrupts,
      so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.
      
      This commit therefore introduces nesting, using a clever NMI-coordination
      algorithm suggested by Andy.  The trick is to atomically increment
      ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry
      (and, accordingly, after on exit).  In addition, ->dynticks_nmi_nesting
      is incremented by one if ->dynticks was incremented and by two otherwise.
      This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal
      to one, it knows that ->dynticks must be atomically incremented.
      
      This NMI-coordination algorithm has been validated by the following
      Promela model:
      
      ------------------------------------------------------------------------
      
      /*
       * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
       * that allows nesting.
       *
       * This program is free software; you can redistribute it and/or modify
       * it under the terms of the GNU General Public License as published by
       * the Free Software Foundation; either version 2 of the License, or
       * (at your option) any later version.
       *
       * This program is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       * GNU General Public License for more details.
       *
       * You should have received a copy of the GNU General Public License
       * along with this program; if not, you can access it online at
       * http://www.gnu.org/licenses/gpl-2.0.html.
       *
       * Copyright IBM Corporation, 2014
       *
       * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
       */
      
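      byte dynticks_nmi_nesting = 0;
      byte dynticks = 0;
      /* Not in this listing as extracted; 2 is inferred from the "by two otherwise" rule above. */
      #define BUSY_INCBY 2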
      
      /*
       * Promela version of rcu_nmi_enter().
       */
      inline rcu_nmi_enter()
      {
      	byte incby;
      	byte tmp;
      
      	incby = BUSY_INCBY;
      	assert(dynticks_nmi_nesting >= 0);
      	if
      	:: (dynticks & 1) == 0 ->
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 1);
      		incby = 1;
      	:: else ->
      		skip;
      	fi;
      	tmp = dynticks_nmi_nesting;
      	tmp = tmp + incby;
      	dynticks_nmi_nesting = tmp;
      	assert(dynticks_nmi_nesting >= 1);
      }
      
      /*
       * Promela version of rcu_nmi_exit().
       */
      inline rcu_nmi_exit()
      {
      	byte tmp;
      
      	assert(dynticks_nmi_nesting > 0);
      	assert((dynticks & 1) != 0);
      	if
      	:: dynticks_nmi_nesting != 1 ->
      		tmp = dynticks_nmi_nesting;
      		tmp = tmp - BUSY_INCBY;
      		dynticks_nmi_nesting = tmp;
      	:: else ->
      		dynticks_nmi_nesting = 0;
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 0);
      	fi;
      }
      
      /*
       * Base-level NMI runs non-atomically.  Crudely emulates process-level
       * dynticks-idle entry/exit.
       */
      proctype base_NMI()
      {
      	byte busy;
      
      	busy = 0;
      	do
      	::	/* Emulate base-level dynticks and not. */
      		if
      		:: 1 ->	atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 1;
      		:: 1 ->	skip;
      		fi;
      
      		/* Verify that we only sometimes have base-level dynticks. */
      		if
      		:: busy == 0 -> skip;
      		:: busy == 1 -> skip;
      		fi;
      
      		/* Model RCU's NMI entry and exit actions. */
      		rcu_nmi_enter();
      		assert((dynticks & 1) == 1);
      		rcu_nmi_exit();
      
      		/* Emulate re-entering base-level dynticks and not. */
      		if
      		:: !busy -> skip;
      		:: busy ->
      			atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 0;
      		fi;
      
      		/* We had better now be in dyntick-idle mode. */
      		assert((dynticks & 1) == 0);
      	od;
      }
      
      /*
       * Nested NMI runs atomically to emulate interrupting base_NMI().
       */
      proctype nested_NMI()
      {
      	do
      	::	/*
      		 * Use an atomic section to model a nested NMI.  This is
      		 * guaranteed to interleave into base_NMI() between a pair
      		 * of base_NMI() statements, just as a nested NMI would.
      		 */
      		atomic {
      			/* Verify that we only sometimes are in dynticks. */
      			if
      			:: (dynticks & 1) == 0 -> skip;
      			:: (dynticks & 1) == 1 -> skip;
      			fi;
      
      			/* Model RCU's NMI entry and exit actions. */
      			rcu_nmi_enter();
      			assert((dynticks & 1) == 1);
      			rcu_nmi_exit();
      		}
      	od;
      }
      
      init {
      	run base_NMI();
      	run nested_NMI();
      }
      
      ------------------------------------------------------------------------
      
      The following script can be used to run this model if placed in
      rcu_nmi.spin:
      
      ------------------------------------------------------------------------
      
      if ! spin -a rcu_nmi.spin
      then
      	echo Spin errors!!!
      	exit 1
      fi
      if ! cc -DSAFETY -o pan pan.c
      then
      	echo Compilation errors!!!
      	exit 1
      fi
      ./pan -m100000
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      734d1680
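The nesting rule from this commit message (increment `->dynticks_nmi_nesting` by one if `->dynticks` was incremented, by two otherwise) can also be traced in plain C. The sketch below is a single-threaded transliteration of the Promela model, assuming non-atomic increments are acceptable for illustration; `struct toy_dynticks`, `toy_nmi_enter()`, and `toy_nmi_exit()` are hypothetical names, not kernel functions.

```c
#include <assert.h>

/* Toy per-CPU state; field names follow the commit message. */
struct toy_dynticks {
	unsigned int dynticks;			/* Odd: non-idle.  Even: dyntick-idle. */
	unsigned int dynticks_nmi_nesting;
};

static void toy_nmi_enter(struct toy_dynticks *d)
{
	unsigned int incby = 2;	/* Default: ->dynticks was already odd. */

	if ((d->dynticks & 1) == 0) {
		d->dynticks++;	/* Atomic in the model; plain here. */
		incby = 1;	/* Record that this entry did the increment. */
	}
	d->dynticks_nmi_nesting += incby;
	assert((d->dynticks & 1) == 1);
}

static void toy_nmi_exit(struct toy_dynticks *d)
{
	assert(d->dynticks_nmi_nesting > 0);
	assert((d->dynticks & 1) == 1);
	if (d->dynticks_nmi_nesting != 1) {
		d->dynticks_nmi_nesting -= 2;	/* Not the outermost exit from idle. */
		return;
	}
	d->dynticks_nmi_nesting = 0;	/* Outermost exit from idle. */
	d->dynticks++;			/* Back to even: idle again. */
}
```

Starting from idle (`dynticks == 0`), two nested entries leave the nesting count at 3. The first exit brings it to 1, and only the second exit sees the value 1 and restores `dynticks` to an even value.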
  3. 14 Nov, 2014 1 commit
  4. 04 Nov, 2014 11 commits
  5. 30 Oct, 2014 4 commits
  6. 29 Oct, 2014 4 commits
    • P
      rcu: Avoid IPIing idle CPUs from synchronize_sched_expedited() · e0775cef
      Committed by Paul E. McKenney
      Currently, synchronize_sched_expedited() sends IPIs to all online CPUs,
      even those that are idle or executing in nohz_full= userspace.  Because
      idle CPUs and nohz_full= userspace CPUs are in extended quiescent states,
      there is no need to IPI them in the first place.  This commit therefore
      avoids IPIing CPUs that are already in extended quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e0775cef
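One way to picture the filtering is as a mask computed from per-CPU dynticks counters. The sketch below assumes the even/odd convention used by the Promela model elsewhere in this log, where `(dynticks & 1) == 0` denotes dyntick-idle; whether the commit tests for extended quiescent states in exactly this way is an assumption, and `toy_cpus_needing_ipi()` is a hypothetical helper limited to 64 CPUs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns a bitmask of CPUs that would still need an IPI, given a
 * snapshot of each CPU's dynticks counter.  An even counter value is
 * treated as "in an extended quiescent state", so that CPU is skipped.
 */
static unsigned long long
toy_cpus_needing_ipi(const unsigned int *dynticks, size_t ncpus)
{
	unsigned long long mask = 0;
	size_t cpu;

	assert(dynticks != NULL && ncpus <= 64);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if ((dynticks[cpu] & 1) == 0)
			continue;	/* Idle or nohz_full userspace: no IPI. */
		mask |= 1ULL << cpu;
	}
	return mask;
}
```

With counters `{2, 5, 7, 0}`, only CPUs 1 and 2 have odd values, so only those two bits are set in the returned mask.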
    • P
      rcu: Move RCU_BOOST variable declarations, eliminating #ifdef · 61cfd097
      Committed by Paul E. McKenney
      There are some RCU_BOOST-specific per-CPU variable declarations that
      are needlessly defined under #ifdef in kernel/rcu/tree.c.  This commit
      therefore moves these declarations into a pre-existing #ifdef in
      kernel/rcu/tree_plugin.h.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      61cfd097
    • P
      rcu: Remove CONFIG_RCU_CPU_STALL_VERBOSE · 0eafa468
      Committed by Paul E. McKenney
      The CONFIG_RCU_CPU_STALL_VERBOSE Kconfig parameter causes preemptible
      RCU's CPU stall warnings to dump out any preempted tasks that are blocking
      the current RCU grace period.  This information is useful, and the default
      has been CONFIG_RCU_CPU_STALL_VERBOSE=y for some years.  It is therefore
      time for this commit to remove this Kconfig parameter, so that future
      kernel builds will always act as if CONFIG_RCU_CPU_STALL_VERBOSE=y.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      0eafa468
    • P
      rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933
      Committed by Paul E. McKenney
      Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
      avoids creating rcuo kthreads for CPUs that never come online.  This
      fixes a bug in many instances of firmware: Instead of lying about their
      age, these systems instead lie about the number of CPUs that they have.
      Before commit 35ce7f29, this could result in huge numbers of useless
      rcuo kthreads being created.
      
      It appears that experience indicates that I should have told the
      people suffering from this problem to fix their broken firmware, but
      I instead produced what turned out to be a partial fix.   The missing
      piece supplied by this commit makes sure that rcu_barrier() knows not to
      post callbacks for no-CBs CPUs that have not yet come online, because
      otherwise rcu_barrier() will hang on systems having firmware that lies
      about the number of CPUs.
      
      It is tempting to simply have rcu_barrier() refuse to post a callback on
      any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
      does not work because rcu_barrier() is required to wait for all pending
      callbacks.  It is therefore required to wait even for those callbacks
      that cannot possibly be invoked.  Even if doing so hangs the system.
      
      Given that posting a callback to a no-CBs CPU that does not yet have an
      rcuo kthread can hang rcu_barrier(), it is tempting to report an error
      in this case.  Unfortunately, this will result in false positives at
      boot time, when it is perfectly legal to post callbacks to the boot CPU
      before the scheduler has started, in other words, before it is legal
      to invoke rcu_barrier().
      
      So this commit instead has rcu_barrier() avoid posting callbacks to
      CPUs having neither rcuo kthread nor pending callbacks, and has it
      complain bitterly if it finds CPUs having no rcuo kthread but some
      pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
      kthread but pending callbacks, as noted earlier, it has no choice but
      to hang indefinitely.
      Reported-by: Yanko Kaneti <yaneti@declera.com>
      Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Eric B Munson <emunson@akamai.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Eric B Munson <emunson@akamai.com>
      Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Tested-by: Yanko Kaneti <yaneti@declera.com>
      Tested-by: Kevin Fenzi <kevin@scrye.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
      d7e29933
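The three cases in the final paragraph of this commit message can be written as a small decision function. This is only a sketch of the described policy for a no-CBs CPU: `struct toy_nocb_cpu`, `enum toy_barrier_action`, and `toy_barrier_decide()` are hypothetical names, and the kernel's actual data structures are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one no-CBs CPU's state. */
struct toy_nocb_cpu {
	bool has_rcuo_kthread;
	int nr_pending_cbs;
};

enum toy_barrier_action {
	TOY_SKIP_CPU,		/* No kthread, no callbacks: nothing to wait for. */
	TOY_POST_CALLBACK,	/* Normal case: the rcuo kthread will invoke it. */
	TOY_WARN_AND_POST,	/* Callbacks pending but no kthread: complain. */
};

static enum toy_barrier_action
toy_barrier_decide(const struct toy_nocb_cpu *c)
{
	assert(c != NULL && c->nr_pending_cbs >= 0);
	if (c->has_rcuo_kthread)
		return TOY_POST_CALLBACK;
	if (c->nr_pending_cbs == 0)
		return TOY_SKIP_CPU;
	return TOY_WARN_AND_POST;	/* The barrier may then wait indefinitely. */
}
```

The last case corresponds to the situation the message describes as leaving rcu_barrier() no choice but to hang, since it must wait for every pending callback.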
  7. 19 Sep, 2014 1 commit
    • P
      rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Committed by Paul E. McKenney
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: Lan Tianyu <tianyu.lan@intel.com>
      dd56af42
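The try-then-fall-back pattern described here can be sketched as follows. The code is a single-threaded illustration with no real synchronization: `struct toy_hotplug` and all `toy_*` functions are hypothetical, and the kernel's actual reference counting for get_online_cpus() is more involved than a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy hotplug state; the kernel's real bookkeeping differs. */
struct toy_hotplug {
	bool hotplug_in_flight;	/* A CPU-hotplug operation is running. */
	int refcount;		/* Readers currently holding off hotplug. */
};

/* Non-blocking: take a reference only if it can be taken immediately. */
static bool toy_try_get_online_cpus(struct toy_hotplug *hp)
{
	if (hp->hotplug_in_flight)
		return false;
	hp->refcount++;
	return true;
}

static void toy_put_online_cpus(struct toy_hotplug *hp)
{
	assert(hp->refcount > 0);
	hp->refcount--;
}

enum toy_gp_kind { TOY_GP_EXPEDITED, TOY_GP_NORMAL };

/* Reports which kind of grace period the caller ended up using. */
static enum toy_gp_kind toy_synchronize_expedited(struct toy_hotplug *hp)
{
	if (!toy_try_get_online_cpus(hp))
		return TOY_GP_NORMAL;	/* Fall back: slower, but never blocks. */
	/* The expedited grace-period work would go here. */
	toy_put_online_cpus(hp);
	return TOY_GP_EXPEDITED;
}
```

Because the caller never blocks waiting for the hotplug operation, it can hold a mutex that a CPU-hotplug notifier also acquires without creating the circular wait described above.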
  8. 17 Sep, 2014 4 commits