1. 04 March 2015: 1 commit
  2. 27 February 2015: 5 commits
  3. 14 February 2015: 1 commit
  4. 12 February 2015: 1 commit
    • rcu: Clear need_qs flag to prevent splat · c0135d07
      Paul E. McKenney authored
      If the scheduling-clock interrupt sets the current task's need_qs flag,
      but if the current CPU passes through a quiescent state in the meantime,
      then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool
      RCU into thinking that additional rcu_read_unlock_special() processing
      is needed.  This commit therefore clears the need_qs flag before checking
      for additional processing.
      
      For this problem to occur, we need rcu_preempt_data.passed_quiesce equal
      to true and current->rcu_read_unlock_special.b.need_qs also equal to true.
      This condition can occur as follows:
      
      1.	CPU 0 is aware of the current preemptible RCU grace period,
      	but has not yet passed through a quiescent state.  Among other
      	things, this means that rcu_preempt_data.passed_quiesce is false.
      
      2.	Task A running on CPU 0 enters a preemptible RCU read-side
      	critical section.
      
      3.	CPU 0 takes a scheduling-clock interrupt, which notices the
      	RCU read-side critical section and the need for a quiescent state,
      	and thus sets current->rcu_read_unlock_special.b.need_qs to true.
      
      4.	Task A is preempted, enters the scheduler, eventually invoking
      	rcu_preempt_note_context_switch() which in turn invokes
      	rcu_preempt_qs().
      
      	Because rcu_preempt_data.passed_quiesce is false,
      	control enters the body of the "if" statement, which sets
      	rcu_preempt_data.passed_quiesce to true.
      
      5.	At this point, CPU 0 takes an interrupt.  The interrupt
      	handler contains an RCU read-side critical section, and
      	the rcu_read_unlock() notes that current->rcu_read_unlock_special
      	is nonzero, and thus invokes rcu_read_unlock_special().
      
      6.	Once in rcu_read_unlock_special(), the fact that
      	current->rcu_read_unlock_special.b.need_qs is true becomes
      	apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs(),
      	recursively, given that we interrupted out of that same
      	function in the preceding step.
      
      7.	Because rcu_preempt_data.passed_quiesce is now true,
      	rcu_preempt_qs() does nothing, and simply returns.
      
      8.	Upon return to rcu_read_unlock_special(), it is noted that
      	current->rcu_read_unlock_special is still nonzero (because
      	the interrupted rcu_preempt_qs() had not yet gotten around
      	to clearing current->rcu_read_unlock_special.b.need_qs).
      
      9.	Execution proceeds to the WARN_ON_ONCE(), which notes that
      	we are in an interrupt handler and thus duly splats.
      
      The solution, as noted above, is to make rcu_read_unlock_special()
      clear out current->rcu_read_unlock_special.b.need_qs after calling
      rcu_preempt_qs().  The interrupted rcu_preempt_qs() will clear it again,
      but this is harmless.  The worst that happens is that we clobber another
      attempt to set this field, but this is not a problem because we just
      got done reporting a quiescent state.
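      
      A minimal sketch of the shape of the fix, with names following the
      commit text ("t" is the current task, "special" a local snapshot of
      its unlock-special state); this is not the exact kernel code:
      
          if (special.b.need_qs) {
              rcu_preempt_qs();  /* Report the quiescent state. */
              /* Clear the flag even if the interrupted rcu_preempt_qs()
               * had already set ->passed_quiesce, so that a stale
               * ->need_qs cannot trigger further processing below. */
              t->rcu_read_unlock_special.b.need_qs = false;
              if (!t->rcu_read_unlock_special.s) {
                  local_irq_restore(flags);
                  return;
              }
          }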
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Fix embarrassing build bug noted by Sasha Levin. ]
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
  5. 16 January 2015: 5 commits
    • rcu: Initialize tiny RCU stall-warning timeouts at boot · 630181c4
      Paul E. McKenney authored
      The current tiny RCU stall-warning code assumes that the jiffies counter
      starts at zero; however, it is sometimes initialized to other values,
      for example, -30,000.  This commit therefore changes rcu_init() to
      invoke reset_cpu_stall_ticks() for both flavors of RCU to initialize
      the stall-warning times properly at boot.
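      
      A sketch of the shape of the change (the control-block names are
      assumptions for illustration, not the exact patch):
      
          void __init rcu_init(void)
          {
              open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
              /* jiffies may start at a nonzero value such as -30,000,
               * so compute the stall-warning deadlines relative to the
               * current jiffies value rather than assuming zero. */
              reset_cpu_stall_ticks(&rcu_sched_ctrlblk);
              reset_cpu_stall_ticks(&rcu_bh_ctrlblk);
          }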
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix RCU CPU stall detection in tiny implementation · ec1fe396
      Miroslav Benes authored
      The tiny RCU CPU stall detection depends on *rcp->curtail not being
      NULL. It is however a tail pointer and thus NULL by definition. Instead we
      should check rcp->rcucblist for the presence of pending callbacks which
      need to be processed. With this fix, INFO about the stall is printed and
      jiffies_stall (jiffies at the next stall) is correctly updated.
      
      Note that the check for pending callbacks is necessary to avoid spurious
      warnings if there are no pending callbacks.
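      
      A sketch of the corrected check (rcucblist and jiffies_stall follow
      the commit text; the rest is illustrative):
      
          static void check_cpu_stall(struct rcu_ctrlblk *rcp)
          {
              /* *rcp->curtail is a tail pointer, hence always NULL,
               * so test the list head for pending callbacks instead. */
              if (rcp->rcucblist != NULL &&
                  ULONG_CMP_GE(jiffies, rcp->jiffies_stall)) {
                  pr_err("INFO: %s stall on CPU\n", rcp->name);
                  rcp->jiffies_stall = jiffies +
                          rcu_jiffies_till_stall_check();
              }
          }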
      Signed-off-by: Miroslav Benes <mbenes@suse.cz>
      [ paulmck: Fused identical "if" statements, ported to -rcu. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Add GP-kthread-starvation checks to CPU stall warnings · fb81a44b
      Paul E. McKenney authored
      This commit adds a message that is printed if the relevant grace-period
      kthread has not been able to run for the two seconds preceding the
      stall warning.  (The two seconds is double the maximum interval between
      successive bouts of quiescent-state forcing.)
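      
      A sketch of the starvation check (the ->gp_activity timestamp is an
      assumed field name for "time of last grace-period-kthread activity"):
      
          static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
          {
              unsigned long gpa = ACCESS_ONCE(rsp->gp_activity);
              unsigned long j = jiffies;
      
              /* Two seconds: double the maximum interval between
               * successive bouts of quiescent-state forcing. */
              if (j - gpa > 2 * HZ)
                  pr_err("%s kthread starved for %lu jiffies!\n",
                         rsp->name, j - gpa);
          }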
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Paul E. McKenney authored
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
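      
      A sketch of the mechanism (rcu_qs_ctr, ->rcu_qs_ctr_snap, and
      rcu_momentary_dyntick_idle() follow the commit text; the
      rcu_sched_qs_mask stall flag is an assumption, and the exact
      plumbing is elided):
      
          static void rcu_all_qs_sketch(void)
          {
              /* If some flavor's grace period is stalled on this CPU,
               * emit a heavy-weight quiescent state that is visible
               * from other CPUs. */
              if (unlikely(raw_cpu_read(rcu_sched_qs_mask)))
                  rcu_momentary_dyntick_idle();
              /* Otherwise a light-weight quiescent state: bump the
               * per-CPU counter that the grace-period machinery
               * snapshots into ->rcu_qs_ctr_snap and compares. */
              this_cpu_inc(rcu_qs_ctr);
          }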
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
    • rcu: Optionally run grace-period kthreads at real-time priority · a94844b2
      Paul E. McKenney authored
      Recent testing has shown that under heavy load, running RCU's grace-period
      kthreads at real-time priority can improve performance (according to 0day
      test robot) and reduce the incidence of RCU CPU stall warnings.  However,
      most systems do just fine with the default non-realtime priorities for
      these kthreads, and it does not make sense to expose the entire user
      base to any risk stemming from this change, given that this change is
      of use only to a few users running extremely heavy workloads.
      
      Therefore, this commit allows users to specify realtime priorities
      for the grace-period kthreads, but leaves them running SCHED_OTHER
      by default.  The realtime priority may be specified at build time
      via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
      rcutree.kthread_prio boot parameter.  Either way, 0 says to retain the
      default SCHED_OTHER behavior, and values from 1 to 99 specify the
      SCHED_FIFO priority to use.  Note that a value of 0 is not permitted
      when the RCU_BOOST Kconfig parameter is specified.
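      
      A sketch of how the priority is applied (kthread_prio follows the
      parameter name above; the spawning function is a simplified
      stand-in, not the actual kernel function):
      
          static int kthread_prio;  /* rcutree.kthread_prio=N */
          module_param(kthread_prio, int, 0444);
      
          static void rcu_spawn_gp_kthread_sketch(struct rcu_state *rsp)
          {
              struct sched_param sp;
              struct task_struct *t;
      
              t = kthread_run(rcu_gp_kthread, rsp, "%s", rsp->name);
              if (kthread_prio) {  /* 0 keeps default SCHED_OTHER. */
                  sp.sched_priority = kthread_prio;
                  sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
              }
          }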
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  6. 11 January 2015: 6 commits
  7. 07 January 2015: 21 commits
    • rcu: Handle gpnum/completed wrap while dyntick idle · e3663b10
      Paul E. McKenney authored
      Subtle race conditions can result if a CPU stays in dyntick-idle mode
      long enough for the ->gpnum and ->completed fields to wrap.  For
      example, consider the following sequence of events:
      
      o	CPU 1 encounters a quiescent state while waiting for grace period
      	5 to complete, but then enters dyntick-idle mode.
      
      o	While CPU 1 is in dyntick-idle mode, the grace-period counters
      	wrap around so that the grace period number is now 4.
      
      o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
      	and grace period 5 begins.
      
      o	The quiescent state that CPU 1 passed through during the old
      	grace period 5 looks like it applies to the new grace period
      	5.  Therefore, the new grace period 5 completes without CPU 1
      	having passed through a quiescent state.
      
      This could clearly be a fatal surprise to any long-running RCU read-side
      critical section that happened to be running on CPU 1 at the time.  At one
      time, this was not a problem, given that it takes significant time for
      the grace-period counters to overflow even on 32-bit systems.  However,
      with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
      idle periods are now becoming quite feasible.  It is therefore time to
      close this race.
      
      This commit therefore avoids this race condition by having the
      quiescent-state forcing code detect when a CPU is falling too far
      behind, and setting a new rcu_data field ->gpwrap when this happens.
      Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
      fields are known to be untrustworthy, and can be ignored, along with
      any associated quiescent states.
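      
      A sketch of the wrap detection in the quiescent-state forcing path
      (the ULONG_MAX / 4 threshold is illustrative, and
      note_gp_changes_sketch() stands in for the resynchronization step;
      the point is to flag a CPU whose snapshot has fallen implausibly
      far behind):
      
          if (ULONG_CMP_LT(ACCESS_ONCE(rdp->gpnum) + ULONG_MAX / 4,
                           rnp->gpnum))
              ACCESS_ONCE(rdp->gpwrap) = true;  /* Snapshot unusable. */
      
          /* Consumers then distrust ->gpnum/->completed when set: */
          if (rdp->gpnum != rnp->gpnum || unlikely(rdp->gpwrap))
              note_gp_changes_sketch(rsp, rnp, rdp);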
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Improve diagnostics for spurious RCU CPU stall warnings · 6ccd2ecd
      Paul E. McKenney authored
      The current RCU CPU stall warning code will print "Stall ended before
      state dump start" any time that the stall-warning code is triggered on
      a CPU that has already reported a quiescent state for the current grace
      period and if all quiescent states have been reported for the current
      grace period.  However, a true stall can result in these symptoms, for
      example, by preventing RCU's grace-period kthreads from ever running.
      
      This commit therefore checks for this condition, reporting the end of
      the stall only if one of the grace-period counters has actually advanced.
      Otherwise, it reports the last time that the grace-period kthread made
      meaningful progress.  (In normal situations, the grace-period kthread
      should make meaningful progress at least every jiffies_till_next_fqs
      jiffies.)
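      
      A sketch of the added logic (a fragment; declarations are elided
      and field names assumed from the neighboring commits):
      
          if (rsp->gpnum != gpnum || ACCESS_ONCE(rsp->completed) == gpnum) {
              /* A grace-period counter advanced: the stall really ended. */
              pr_err("INFO: Stall ended before state dump start\n");
          } else {
              /* No progress: report the GP kthread's last activity. */
              gpa = ACCESS_ONCE(rsp->gp_activity);
              pr_err("All QSes seen, last %s kthread activity %lu\n",
                     rsp->name, jiffies - gpa);
          }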
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
    • rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts · fc908ed3
      Paul E. McKenney authored
      One way that an RCU CPU stall warning can happen is if the grace-period
      kthread is not allowed to execute.  One proxy for this kthread's
      forward progress is the number of force-quiescent-state (fqs) scans.
      This commit therefore adds the number of fqs scans to the RCU CPU stall
      warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make SRCU optional by using CONFIG_SRCU · 83fe27ea
      Pranith Kumar authored
      SRCU does not need to be compiled in by default in all cases.  For
      tinification efforts, it is desirable not to compile SRCU unless it
      is actually needed.
      
      This patch makes compiling SRCU optional by introducing a new Kconfig
      option, CONFIG_SRCU, which is selected when any of the components
      making use of SRCU are selected.
      
      If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
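      
      A hypothetical Kconfig sketch of the pattern (MY_DRIVER is made up
      for illustration; real SRCU users just add the "select SRCU" line):
      
          config SRCU
                  bool
      
          config MY_DRIVER
                  tristate "Example driver that uses SRCU"
                  select SRCU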
      
         text    data     bss     dec     hex filename
         2007       0       0    2007     7d7 kernel/rcu/srcu.o
      
      Size of arch/powerpc/boot/zImage changes from
      
         text    data     bss     dec     hex filename
       831552   64180   23944  919676   e087c arch/powerpc/boot/zImage : before
       829504   64180   23952  917636   e0084 arch/powerpc/boot/zImage : after
      
      so the savings are roughly 2000 bytes.
      Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Josh Triplett <josh@joshtriplett.org>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
    • rcu: Expand SRCU ->completed to 64 bits · a5c198f4
      Paul E. McKenney authored
      When rcutorture used only the low-order 32 bits of the grace-period
      number, it was not a problem for SRCU to use a 32-bit completed field.
      However, rcutorture now uses the full 64 bits on 64-bit systems, so
      this commit converts SRCU's ->completed field to unsigned long so as to
      provide 64 bits on 64-bit systems.
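      
      The shape of the change, as a sketch (remaining fields elided):
      
          struct srcu_struct {
              unsigned long completed;  /* Was a 32-bit integer; unsigned
                                         * long provides 64 bits on
                                         * 64-bit systems. */
              /* ... remaining fields unchanged ... */
          };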
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove redundant callback-list initialization · ab954c16
      Paul E. McKenney authored
      The RCU callback lists are initialized in both rcu_boot_init_percpu_data()
      and rcu_init_percpu_data().  The former is intended for initializing
      immutable data, so this commit removes the initialization from
      rcu_boot_init_percpu_data() and leaves it in rcu_init_percpu_data().
      This change prepares for permitting callbacks to be queued very early
      in boot.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't scan root rcu_node structure for stalled tasks · 6cd534ef
      Paul E. McKenney authored
      Now that blocked tasks are no longer migrated to the root rcu_node
      structure, there is no need to scan the root rcu_node structure for
      blocked tasks stalling the current grace period.  This commit therefore
      removes this scan.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion · abaf3f9d
      Lai Jiangshan authored
      Commit dfeb9765 ("Allow post-unlock reference for rt_mutex") ensured
      that RCU priority boosting remained safe even when the rt_mutex has a
      post-unlock reference.
      
      But an rt_mutex allowing post-unlock references is definitely a bug,
      and one that was fixed by commit 27e35715 ("rtmutex: Plug slow unlock
      race").  This fix made the earlier patch (dfeb9765) unnecessary.
      
      Worse yet, the priority inversion introduced by that earlier patch
      still exists:
      
      rcu_read_unlock_special() {
      	rt_mutex_unlock(&rnp->boost_mtx);
      	/* Priority inversion:
      	 * The current task was just deboosted, so it can immediately
      	 * be preempted as a low-priority task and may then wait a
      	 * long time before being rescheduled to run the complete()
      	 * below.  Meanwhile, the rcu-booster sleeps waiting on this
      	 * low-priority task.  This priority inversion keeps the
      	 * rcu-booster from working as expected.
      	 */
      	complete(&rnp->boost_completion);
      }
      
      Just revert the patch to avoid it.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Note quiescent state when CPU goes offline · 3ba4d0e0
      Paul E. McKenney authored
      The rcu_cleanup_dead_cpu() function (called after a CPU has gone
      completely offline) has not reported a quiescent state because there
      was probably at least one synchronize_rcu() between the time the CPU
      went offline and the CPU_DEAD notifier, and this would have detected
      the CPU's offline state via quiescent-state forcing.  However, the plan
      is for CPUs to take themselves offline, at which point it makes sense
      for them to report their own quiescent state.  This commit makes this
      change in preparation for the new CPU-hotplug setup.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't bother affinitying rcub kthreads away from offline CPUs · 5d0b0249
      Paul E. McKenney authored
      When rcu_boost_kthread_setaffinity() sees that all CPUs for a given
      rcu_node structure are now offline, it affinitizes the corresponding
      RCU-boost ("rcub") kthread away from those CPUs.  This is pointless
      because the kthread cannot run on those offline CPUs in any case.
      This commit therefore removes this unneeded code.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't initiate RCU priority boosting on root rcu_node · 1be0085b
      Paul E. McKenney authored
      Because there are no longer any preempted tasks on the root rcu_node, and
      because there is no longer ever an rcub kthread for the root rcu_node,
      this commit drops the code in force_qs_rnp() that attempts to awaken
      the non-existent root rcub kthread.  This is strictly a performance
      enhancement, removing a root rcu_node ->lock acquisition and release
      along with some tests in rcu_initiate_boost(), ending with the test that
      notes that there is no rcub kthread.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't spawn rcub kthreads on root rcu_node structure · 3e9f5c70
      Paul E. McKenney authored
      Now that offlining CPUs no longer moves leaf rcu_node structures'
      ->blkd_tasks lists to the root, there is no way for the root rcu_node
      structure's ->blkd_tasks list to be nonempty, unless the root node is also
      the sole leaf node.  This commit therefore refrains from creating an rcub
      kthread for the root rcu_node structure unless it is also the sole leaf.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make use of rcu_preempt_has_tasks() · 96e92021
      Paul E. McKenney authored
      Given that there is now an rcu_preempt_has_tasks() function that checks
      to see if the ->blkd_tasks list is non-empty, this commit makes use of it.
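      
      The helper is presumably just a thin wrapper over the list check,
      along these lines (a sketch):
      
          /* Are any blocked readers queued on this rcu_node? */
          static bool rcu_preempt_has_tasks(struct rcu_node *rnp)
          {
              return !list_empty(&rnp->blkd_tasks);
          }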
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Shorten irq-disable region in rcu_cleanup_dead_cpu() · a8f4cbad
      Paul E. McKenney authored
      Now that we are not migrating callbacks, there is no need to hold the
      ->orphan_lock across the ->qsmaskinit bit-clearing process.
      This commit therefore releases ->orphan_lock immediately after adopting
      the orphaned RCU callbacks.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Paul E. McKenney authored
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Make rcu_read_unlock_special() propagate ->qsmaskinit bit clearing · b6a932d1
      Paul E. McKenney authored
      This commit causes rcu_read_unlock_special() to propagate ->qsmaskinit
      bit clearing up the rcu_node tree once a given rcu_node structure's
      blkd_tasks list becomes empty.  This is the final commit in preparation
      for the rework of RCU priority boosting:  It enables preempted tasks to
      remain queued on their rcu_node structure even after all of that rcu_node
      structure's CPUs have gone offline.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Abstract rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu() · 8af3a5e7
      Paul E. McKenney authored
      This commit abstracts rcu_cleanup_dead_rnp() from rcu_cleanup_dead_cpu()
      in preparation for the rework of RCU priority boosting.  This new function
      will be invoked from rcu_read_unlock_special() in the reworked scheme,
      which is why rcu_cleanup_dead_rnp() assumes that the leaf rcu_node
      structure's ->qsmaskinit field has already been updated.
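      
      A sketch of what rcu_cleanup_dead_rnp() does (locking elided): walk
      upward from the leaf, clearing this node's bit in each ancestor's
      ->qsmaskinit, stopping as soon as an ancestor still has other
      reasons to stay in the grace-period computation:
      
          static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf)
          {
              long mask;
              struct rcu_node *rnp = rnp_leaf;
      
              if (rnp->qsmaskinit || rcu_preempt_has_tasks(rnp))
                  return;  /* Leaf still busy: nothing to propagate. */
              for (;;) {
                  mask = rnp->grpmask;
                  rnp = rnp->parent;
                  if (!rnp)
                      break;  /* Cleared all the way to the root. */
                  rnp->qsmaskinit &= ~mask;  /* Locking elided. */
                  if (rnp->qsmaskinit)
                      return;  /* Other children still present. */
              }
          }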
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Rename "empty" to "empty_norm" in preparation for boost rework · 74e871ac
      Paul E. McKenney authored
      This commit undertakes a simple variable renaming to make way for
      some rework of RCU priority boosting.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Protect rcu_boost() lockless accesses with ACCESS_ONCE() · b08ea27d
      Paul E. McKenney authored
      This commit prevents random compiler optimizations by applying
      ACCESS_ONCE() to lockless accesses.
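      
      For example (illustrative, modeled on rcu_boost()'s lockless
      checks, not the exact diff):
      
          /* ACCESS_ONCE() forces single, untorn loads, keeping the
           * compiler from refetching or caching these lockless reads. */
          if (ACCESS_ONCE(rnp->exp_tasks) == NULL &&
              ACCESS_ONCE(rnp->boost_tasks) == NULL)
              return 0;  /* Nothing left to boost. */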
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Remove "select IRQ_WORK" from config TREE_RCU · 5a43b88e
      Lai Jiangshan authored
      Commit 48a7639c ("rcu: Make callers awaken grace-period kthread")
      removed the irq_work_queue() call, so TREE_RCU no longer needs irq
      work.  This commit therefore updates RCU's Kconfig accordingly.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Fix rcu_barrier() race that could result in too-short wait · 41050a00
      Paul E. McKenney authored
      The rcu_barrier() no-callbacks check for no-CBs CPUs has race conditions.
      It checks a given CPU's lists of callbacks, and if all three no-CBs lists
      are empty, ignores that CPU.  However, these three lists could potentially
      be empty even when callbacks are present if the check executed just as
      the callbacks were being moved from one list to another.  It turns out
      that recent versions of rcutorture can spot this race.
      
      This commit plugs this hole by consolidating the per-list counts of
      no-CBs callbacks into a single count, which is incremented before the
      corresponding callback is posted and decremented after it is invoked.  Then
      rcu_barrier() checks this single count to reliably determine whether
      the corresponding CPU has no-CBs callbacks.
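      
      A sketch of the consolidated check (the counter name follows the
      commit's description; treat the details as assumptions):
      
          /* Nonzero means a no-CBs callback is posted or still being
           * invoked, so rcu_barrier() must wait on this CPU. */
          static bool rcu_nocb_cpu_needs_barrier(struct rcu_data *rdp)
          {
              return atomic_long_read(&rdp->nocb_q_count) != 0;
          }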
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>