1. 07 Jan, 2015 8 commits
    • rcu: Remove redundant callback-list initialization · ab954c16
      Paul E. McKenney committed
      The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
      and rcu_init_percpu_data().  The former is intended for initializing
      immutable data, so this commit removes the initialization from
      rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
      This change prepares for permitting callbacks to be queued very early
      in boot.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't scan root rcu_node structure for stalled tasks · 6cd534ef
      Paul E. McKenney committed
      Now that blocked tasks are no longer migrated to the root rcu_node
      structure, there is no need to scan the root rcu_node structure for
      blocked tasks stalling the current grace period.  This commit therefore
      removes this scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Note quiescent state when CPU goes offline · 3ba4d0e0
      Paul E. McKenney committed
      The rcu_cleanup_dead_cpu() function (called after a CPU has gone
      completely offline) has not reported a quiescent state because there
      was probably at least one synchronize_rcu() between the time the CPU
      went offline and the CPU_DEAD notifier, and this would have detected
      the CPU's offline state via quiescent-state forcing.  However, the plan
      is for CPUs to take themselves offline, at which point it makes sense
      for them to report their own quiescent state.  This commit makes this
      change in preparation for the new CPU-hotplug setup.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't initiate RCU priority boosting on root rcu_node · 1be0085b
      Paul E. McKenney committed
      Because there are no longer any preempted tasks on the root rcu_node, and
      because there is no longer ever an rcub kthread for the root rcu_node,
      this commit drops the code in force_qs_rnp() that attempts to awaken
      the non-existent root rcub kthread.  This is strictly a performance
      enhancement, removing a root rcu_node ->lock acquisition and release
      along with some tests in rcu_initiate_boost(), ending with the test that
      notes that there is no rcub kthread.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() · a8f4cbad
      Paul E. McKenney committed
      Now that we are not migrating callbacks, there is no need to hold the
      ->orphan_lock across the ->qsmaskinit bit-clearing process.
      This commit therefore releases ->orphan_lock immediately after adopting
      the orphaned RCU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Paul E. McKenney committed
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing · b6a932d1
      Paul E. McKenney committed
      This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
      bit clearing up the rcu_node tree once a given rcu_node structure's
      blkd_tasks list becomes empty.  This is the final commit in preparation
      for the rework of RCU priority boosting:  It enables preempted tasks to
      remain queued on their rcu_node structure even after all of that rcu_node
      structure's CPUs have gone offline.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() · 8af3a5e7
      Paul E. McKenney committed
      This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
      in preparation for the rework of RCU priority boosting.  This new function
      will be invoked from rcu_read_unlock_special() in the reworked scheme,
      which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
      structure's ->qsmaskinit field has already been updated.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
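
The deferred ->qsmaskinit clearing described in the three commits above (d19fb8d1, b6a932d1, 8af3a5e7) can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct sketch_rnp`, its fields, and every `sketch_` function are hypothetical stand-ins, the blkd_tasks list is reduced to a counter, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's rcu_node structure. */
struct sketch_rnp {
	unsigned long qsmaskinit;	/* One bit per online child (CPU or node). */
	unsigned long grpmask;		/* This node's bit in its parent's mask. */
	int nr_blkd_tasks;		/* Stand-in for the blkd_tasks list length. */
	struct sketch_rnp *parent;	/* NULL for the root. */
};

/*
 * Clear bits up the tree, starting from a leaf whose own qsmaskinit has
 * already been updated.  The walk stops at the first ancestor that still
 * has other children online.  Locking is omitted in this sketch.
 */
static void sketch_cleanup_dead_rnp(struct sketch_rnp *leaf)
{
	struct sketch_rnp *rnp = leaf;
	unsigned long mask;

	/* Defer while CPUs remain online or preempted tasks remain queued. */
	if (rnp->qsmaskinit || rnp->nr_blkd_tasks)
		return;
	for (;;) {
		mask = rnp->grpmask;
		rnp = rnp->parent;
		if (!rnp)
			break;
		rnp->qsmaskinit &= ~mask;
		if (rnp->qsmaskinit)
			break;
	}
}

/* Stand-in for the dequeue step performed by rcu_read_unlock_special(). */
static void sketch_dequeue_task(struct sketch_rnp *leaf)
{
	assert(leaf->nr_blkd_tasks > 0);
	if (--leaf->nr_blkd_tasks == 0)
		sketch_cleanup_dead_rnp(leaf);
}
```

In this sketch a leaf whose CPUs are all offline keeps its bit set in the parent until the last queued task is dequeued, which is the property the commit messages rely on so that later grace periods still wait on those tasks.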
  2. 31 Dec, 2014 1 commit
    • rcu: Make rcu_nmi_enter() handle nesting · 734d1680
      Paul E. McKenney committed
      The x86 architecture has multiple types of NMI-like interrupts: real
      NMIs, machine checks, and, for some values of NMI-like, debugging
      and breakpoint interrupts.  These interrupts can nest inside each
      other.  Andy Lutomirski is adding RCU support to these interrupts,
      so rcu_nmi_enter() and rcu_nmi_exit() must now correctly handle nesting.
      
      This commit therefore introduces nesting, using a clever NMI-coordination
      algorithm suggested by Andy.  The trick is to atomically increment
      ->dynticks (if needed) before manipulating ->dynticks_nmi_nesting on entry
      (and, accordingly, after on exit).  In addition, ->dynticks_nmi_nesting
      is incremented by one if ->dynticks was incremented and by two otherwise.
      This means that when rcu_nmi_exit() sees ->dynticks_nmi_nesting equal
      to one, it knows that ->dynticks must be atomically incremented.
      
      This NMI-coordination algorithm has been validated by the following
      Promela model:
      
      ------------------------------------------------------------------------
      
      /*
       * Promela model for Andy Lutomirski's suggested change to rcu_nmi_enter()
       * that allows nesting.
       *
       * This program is free software; you can redistribute it and/or modify
       * it under the terms of the GNU General Public License as published by
       * the Free Software Foundation; either version 2 of the License, or
       * (at your option) any later version.
       *
       * This program is distributed in the hope that it will be useful,
       * but WITHOUT ANY WARRANTY; without even the implied warranty of
       * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       * GNU General Public License for more details.
       *
       * You should have received a copy of the GNU General Public License
       * along with this program; if not, you can access it online at
       * http://www.gnu.org/licenses/gpl-2.0.html.
       *
       * Copyright IBM Corporation, 2014
       *
       * Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
       */
      
      byte dynticks_nmi_nesting = 0;
      byte dynticks = 0;

      /* Nesting increment used when ->dynticks did not need incrementing. */
      #define BUSY_INCBY 2
      
      /*
       * Promela version of rcu_nmi_enter().
       */
      inline rcu_nmi_enter()
      {
      	byte incby;
      	byte tmp;
      
      	incby = BUSY_INCBY;
      	assert(dynticks_nmi_nesting >= 0);
      	if
      	:: (dynticks & 1) == 0 ->
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 1);
      		incby = 1;
      	:: else ->
      		skip;
      	fi;
      	tmp = dynticks_nmi_nesting;
      	tmp = tmp + incby;
      	dynticks_nmi_nesting = tmp;
      	assert(dynticks_nmi_nesting >= 1);
      }
      
      /*
       * Promela version of rcu_nmi_exit().
       */
      inline rcu_nmi_exit()
      {
      	byte tmp;
      
      	assert(dynticks_nmi_nesting > 0);
      	assert((dynticks & 1) != 0);
      	if
      	:: dynticks_nmi_nesting != 1 ->
      		tmp = dynticks_nmi_nesting;
      		tmp = tmp - BUSY_INCBY;
      		dynticks_nmi_nesting = tmp;
      	:: else ->
      		dynticks_nmi_nesting = 0;
      		atomic {
      			dynticks = dynticks + 1;
      		}
      		assert((dynticks & 1) == 0);
      	fi;
      }
      
      /*
       * Base-level NMI runs non-atomically.  Crudely emulates process-level
       * dynticks-idle entry/exit.
       */
      proctype base_NMI()
      {
      	byte busy;
      
      	busy = 0;
      	do
      	::	/* Emulate base-level dynticks and not. */
      		if
      		:: 1 ->	atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 1;
      		:: 1 ->	skip;
      		fi;
      
      		/* Verify that we only sometimes have base-level dynticks. */
      		if
      		:: busy == 0 -> skip;
      		:: busy == 1 -> skip;
      		fi;
      
      		/* Model RCU's NMI entry and exit actions. */
      		rcu_nmi_enter();
      		assert((dynticks & 1) == 1);
      		rcu_nmi_exit();
      
      		/* Emulate re-entering base-level dynticks and not. */
      		if
      		:: !busy -> skip;
      		:: busy ->
      			atomic {
      				dynticks = dynticks + 1;
      			}
      			busy = 0;
      		fi;
      
      		/* We had better now be in dyntick-idle mode. */
      		assert((dynticks & 1) == 0);
      	od;
      }
      
      /*
       * Nested NMI runs atomically to emulate interrupting base_NMI().
       */
      proctype nested_NMI()
      {
      	do
      	::	/*
      		 * Use an atomic section to model a nested NMI.  This is
      		 * guaranteed to interleave into base_NMI() between a pair
      		 * of base_NMI() statements, just as a nested NMI would.
      		 */
      		atomic {
      			/* Verify that we only sometimes are in dynticks. */
      			if
      			:: (dynticks & 1) == 0 -> skip;
      			:: (dynticks & 1) == 1 -> skip;
      			fi;
      
      			/* Model RCU's NMI entry and exit actions. */
      			rcu_nmi_enter();
      			assert((dynticks & 1) == 1);
      			rcu_nmi_exit();
      		}
      	od;
      }
      
      init {
      	run base_NMI();
      	run nested_NMI();
      }
      
      ------------------------------------------------------------------------
      
      The following script can be used to run this model if placed in
      rcu_nmi.spin:
      
      ------------------------------------------------------------------------
      
      if ! spin -a rcu_nmi.spin
      then
      	echo Spin errors!!!
      	exit 1
      fi
      if ! cc -DSAFETY -o pan pan.c
      then
      	echo Compilation errors!!!
      	exit 1
      fi
      ./pan -m100000
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
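
The nesting rule from commit 734d1680 can also be restated as a small userspace C sketch, mirroring the Promela model rather than the kernel source. The `sketch_` names are hypothetical, plain increments stand in for the kernel's atomic operations, and the even-means-idle convention for the counter is taken from the model's assertions.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the per-CPU dynticks fields. */
static unsigned int sketch_dynticks;	/* Even: dyntick-idle; odd: not idle. */
static int sketch_nmi_nesting;		/* Stand-in for ->dynticks_nmi_nesting. */

static void sketch_nmi_enter(void)
{
	int incby = 2;

	if (!(sketch_dynticks & 1)) {
		sketch_dynticks++;	/* Atomic in the kernel; plain here. */
		incby = 1;		/* Remember that this entry left idle. */
	}
	sketch_nmi_nesting += incby;
}

static void sketch_nmi_exit(void)
{
	assert(sketch_nmi_nesting > 0);
	assert(sketch_dynticks & 1);
	if (sketch_nmi_nesting != 1) {
		sketch_nmi_nesting -= 2;	/* Inner exit: stay non-idle. */
		return;
	}
	/* Outermost exit of an NMI that interrupted idle: return to idle. */
	sketch_nmi_nesting = 0;
	sketch_dynticks++;
}
```

Because only the entry that found the counter even adds one, the nesting count is odd exactly when the outermost handler interrupted idle, so the exit path that sees a count of one is the only one that increments the counter back to even.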
  3. 04 Nov, 2014 10 commits
  4. 30 Oct, 2014 1 commit
  5. 29 Oct, 2014 3 commits
    • rcu: Avoid IPIing idle CPUs from synchronize_sched_expedited() · e0775cef
      Paul E. McKenney committed
      Currently, synchronize_sched_expedited() sends IPIs to all online CPUs,
      even those that are idle or executing in nohz_full= userspace.  Because
      idle CPUs and nohz_full= userspace CPUs are in extended quiescent states,
      there is no need to IPI them in the first place.  This commit therefore
      avoids IPIing CPUs that are already in extended quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
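
As a rough illustration of the filtering that commit e0775cef describes, the userspace C sketch below picks which CPUs would still receive an IPI. It is not the kernel implementation: `SKETCH_NR_CPUS` and `sketch_ipi_mask()` are hypothetical, and the sketch assumes the convention used by the Promela model elsewhere in this log, namely that an even dynticks counter marks an extended quiescent state.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4	/* Hypothetical CPU count for the sketch. */

/*
 * Return a bitmask of the CPUs that would still need an IPI.  A CPU whose
 * counter is even is assumed to be in an extended quiescent state (idle
 * or nohz_full userspace) and is skipped.
 */
static unsigned long sketch_ipi_mask(const unsigned int *dynticks,
				     unsigned long online)
{
	unsigned long mask = 0;
	int cpu;

	for (cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!(online & (1UL << cpu)))
			continue;	/* Offline CPUs are never IPIed. */
		if (!(dynticks[cpu] & 1))
			continue;	/* Already quiescent: skip the IPI. */
		mask |= 1UL << cpu;
	}
	return mask;
}
```

With counters of 1, 2, 3, and 5 and only CPUs 0 through 2 online, the sketch selects CPUs 0 and 2: CPU 1 is skipped because its counter is even, and CPU 3 because it is offline.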
    • rcu: Move RCU_BOOST variable declarations, eliminating #ifdef · 61cfd097
      Paul E. McKenney committed
      There are some RCU_BOOST-specific per-CPU variable declarations that
      are needlessly defined under #ifdef in kernel/rcu/tree.c.  This commit
      therefore moves these declarations into a pre-existing #ifdef in
      kernel/rcu/tree_plugin.h.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933
      Paul E. McKenney committed
      Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
      avoids creating rcuo kthreads for CPUs that never come online.  This
      works around a bug in many instances of firmware: Rather than lying about
      their age, these systems lie about the number of CPUs that they have.
      Before commit 35ce7f29, this could result in huge numbers of useless
      rcuo kthreads being created.
      
      It appears that experience indicates that I should have told the
      people suffering from this problem to fix their broken firmware, but
      I instead produced what turned out to be a partial fix.  The missing
      piece supplied by this commit makes sure that rcu_barrier() knows not to
      post callbacks for no-CBs CPUs that have not yet come online, because
      otherwise rcu_barrier() will hang on systems having firmware that lies
      about the number of CPUs.
      
      It is tempting to simply have rcu_barrier() refuse to post a callback on
      any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
      does not work because rcu_barrier() is required to wait for all pending
      callbacks.  It is therefore required to wait even for those callbacks
      that cannot possibly be invoked.  Even if doing so hangs the system.
      
      Given that posting a callback to a no-CBs CPU that does not yet have an
      rcuo kthread can hang rcu_barrier(), it is tempting to report an error
      in this case.  Unfortunately, this will result in false positives at
      boot time, when it is perfectly legal to post callbacks to the boot CPU
      before the scheduler has started, in other words, before it is legal
      to invoke rcu_barrier().
      
      So this commit instead has rcu_barrier() avoid posting callbacks to
      CPUs having neither rcuo kthread nor pending callbacks, and has it
      complain bitterly if it finds CPUs having no rcuo kthread but some
      pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
      kthread but pending callbacks, as noted earlier, it has no choice but
      to hang indefinitely.
      Reported-by: Yanko Kaneti <yaneti@declera.com>
      Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Reported-by: Meelis Roos <mroos@linux.ee>
      Reported-by: Eric B Munson <emunson@akamai.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Eric B Munson <emunson@akamai.com>
      Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Tested-by: Yanko Kaneti <yaneti@declera.com>
      Tested-by: Kevin Fenzi <kevin@scrye.com>
      Tested-by: Meelis Roos <mroos@linux.ee>
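
The three cases that commit d7e29933 distinguishes for a no-CBs CPU can be laid out as a small decision function. This is a hedged userspace C sketch rather than the kernel's rcu_barrier() code: the enum, its values, and `sketch_barrier_decide()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what rcu_barrier() does with one no-CBs CPU. */
enum sketch_barrier_action {
	SKETCH_POST,		/* rcuo kthread exists: post the callback. */
	SKETCH_SKIP,		/* No kthread, nothing pending: skip the CPU. */
	SKETCH_WARN_AND_POST,	/* No kthread but callbacks pending: complain,
				 * then wait, possibly forever. */
};

static enum sketch_barrier_action
sketch_barrier_decide(bool has_rcuo_kthread, bool has_pending_cbs)
{
	if (has_rcuo_kthread)
		return SKETCH_POST;
	if (!has_pending_cbs)
		return SKETCH_SKIP;
	return SKETCH_WARN_AND_POST;
}
```

The middle case is the one that keeps rcu_barrier() from hanging on firmware that reports CPUs which never come online, while the last case preserves the requirement to wait for every pending callback.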
  6. 19 Sep, 2014 1 commit
    • rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42
      Paul E. McKenney committed
      Currently, the expedited grace-period primitives do get_online_cpus().
      This greatly simplifies their implementation, but means that calls
      to them holding locks that are acquired by CPU-hotplug notifiers (to
      say nothing of calls to these primitives from CPU-hotplug notifiers)
      can deadlock.  But this is starting to become inconvenient, as can be
      seen here: https://lkml.org/lkml/2014/8/5/754.  The problem in this
      case is that some developers need to acquire a mutex from a CPU-hotplug
      notifier, but also need to hold it across a synchronize_rcu_expedited().
      As noted above, this currently results in deadlock.
      
      This commit avoids the deadlock and retains the simplicity by creating
      a try_get_online_cpus(), which returns false if the get_online_cpus()
      reference count could not immediately be incremented.  If a call to
      try_get_online_cpus() returns true, the expedited primitives operate as
      before.  If a call returns false, the expedited primitives fall back to
      normal grace-period operations.  This falling back of course results in
      increased grace-period latency, but only during times when CPU hotplug
      operations are actually in flight.  The effect should therefore be
      negligible during normal operation.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Tested-by: Lan Tianyu <tianyu.lan@intel.com>
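
The try-then-fall-back pattern from commit dd56af42 can be sketched in userspace C as follows. Only try_get_online_cpus() is named in the commit message; every `sketch_` identifier here is hypothetical, the hotplug state is reduced to a single flag, and the real reference counting and grace-period work are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_hotplug_in_flight;	/* Hypothetical hotplug state. */
static int sketch_online_refcount;	/* Stand-in for the reader refcount. */

/* Succeeds only if the reference can be taken without waiting. */
static bool sketch_try_get_online_cpus(void)
{
	if (sketch_hotplug_in_flight)
		return false;
	sketch_online_refcount++;
	return true;
}

static void sketch_put_online_cpus(void)
{
	assert(sketch_online_refcount > 0);
	sketch_online_refcount--;
}

enum sketch_gp_path { SKETCH_GP_EXPEDITED, SKETCH_GP_NORMAL };

/* Reports which path an expedited grace-period request would take. */
static enum sketch_gp_path sketch_synchronize_expedited(void)
{
	if (!sketch_try_get_online_cpus())
		return SKETCH_GP_NORMAL;	/* Fall back instead of blocking. */
	/* The expedited work would run here, with hotplug held off. */
	sketch_put_online_cpus();
	return SKETCH_GP_EXPEDITED;
}
```

Because the caller never blocks waiting for a hotplug operation to finish, a lock held across the call cannot deadlock against a CPU-hotplug notifier that needs the same lock; the cost is a slower grace period while hotplug is in flight.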
  7. 17 Sep, 2014 2 commits
  8. 08 Sep, 2014 11 commits
  9. 10 Jul, 2014 3 commits
    • rcu: Remove CONFIG_PROVE_RCU_DELAY · 11992c70
      Paul E. McKenney committed
      The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
      effective at finding race conditions, so this commit removes it.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      [ paulmck: Remove definition and uses as noted by Paul Bolle. ]
    • rcu: Use __this_cpu_read() instead of per_cpu_ptr() · d860d403
      Shan Wei committed
      The __this_cpu_read() function produces better code than does
      per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
      4.7.3-12ubuntu1) 4.7.3 produces the following:
      
      ARMv7 per_cpu_ptr():
      
      force_quiescent_state:
          mov    r3, sp    @,
          bic    r1, r3, #8128    @ tmp171,,
          ldr    r2, .L98    @ tmp169,
          bic    r1, r1, #63    @ tmp170, tmp171,
          ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
          ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
          mov    r6, r0    @ rsp, rsp
          ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
          add    r3, r3, r2    @ tmp175, __ptr, tmp173
          ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode
      
      ARMv7 __this_cpu_read():
      
      force_quiescent_state:
          ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
          mov    r6, r0    @ rsp, rsp
          add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
          ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13
      
      Using gcc 4.8.2:
      
      x86_64 per_cpu_ptr():
      
          movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
          movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
          movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
          movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old
      
      x86_64 __this_cpu_read():
      
          movq    %rdi, %r13    # rsp, rsp
          movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
          movq %gs:24(%rax),%r12    # _10->mynode, rnp_old
      
      Because this change produces significant benefits for these two very
      diverse architectures, this commit makes this change.
      Signed-off-by: Shan Wei <davidshan@tencent.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
    • rcu: Don't use NMIs to dump other CPUs' stacks · bc1dce51
      Paul E. McKenney committed
      Although NMI-based stack dumps are in principle more accurate, they are
      also more likely to trigger deadlocks.  This commit therefore replaces
      all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
      that the CPU detecting an RCU CPU stall does the stack dumping.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>