1. 08 October 2015 (2 commits)
    • rcu: Stop silencing lockdep false positive for expedited grace periods · 83c2c735
      Committed by Paul E. McKenney
      This reverts commit af859bea (rcu: Silence lockdep false positive
      for expedited grace periods).  Because synchronize_rcu_expedited()
      no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
      acquisition is no longer nested, so the false positive no longer happens.
      This commit therefore removes the extra lockdep data structures, as they
      are no longer needed.
    • rcu: Switch synchronize_sched_expedited() to IPI · 6587a23b
      Committed by Paul E. McKenney
      This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
      to smp_call_function_single(), thus moving from an IPI and a pair of
      context switches to an IPI and a single pass through the scheduler.
      Of course, if the scheduler actually does decide to switch to a different
      task, there will still be a pair of context switches, but there would
      likely have been a pair of context switches anyway, just a bit later.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
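      As a rough sketch of the new path (names here are illustrative rather
      than the exact kernel code), the expedited grace period now pokes each
      CPU with an IPI whose handler merely asks for a reschedule:

      /* Runs in IPI context on the target CPU: request a trip through
       * the scheduler, which constitutes the needed quiescent state. */
      static void sync_sched_exp_handler(void *info)
      {
              resched_cpu(smp_processor_id());
      }

      /* One IPI and at most one scheduler pass, instead of the pair of
       * context switches that stop_one_cpu_nowait() always forced: */
      smp_call_function_single(cpu, sync_sched_exp_handler, NULL, 0);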
  2. 21 September 2015 (5 commits)
    • rcu: Make ->cpu_no_qs be a union for aggregate OR · 5b74c458
      Committed by Paul E. McKenney
      This commit converts the rcu_data structure's ->cpu_no_qs field
      to a union.  The bytewise side of this union allows individual access
      to indications as to whether this CPU needs to find a quiescent state
      for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
      side of the union allows testing whether or not a quiescent state is
      needed at all, for either type of grace period.
      
      For now, only .norm is used.  A later commit will introduce the expedited
      usage.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
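      As a sketch of the union described above (the field names .norm and
      .exp follow the commit text; the exact types are illustrative):

      union rcu_noqs {
              struct {
                      u8 norm;  /* Normal grace period needs a QS from this CPU. */
                      u8 exp;   /* Expedited grace period needs a QS. */
              } b;              /* Bytewise access to the individual indications. */
              u16 s;            /* Setwise access: the aggregate OR of both bytes. */
      };

      Testing rdp->cpu_no_qs.s then answers "is any quiescent state needed
      at all?" in a single load, for either type of grace period.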
    • rcu: Invert passed_quiesce and rename to cpu_no_qs · 0d43eb34
      Committed by Paul E. McKenney
      This commit inverts the sense of the rcu_data structure's ->passed_quiesce
      field and renames it to ->cpu_no_qs.  This will allow a later commit to
      use an "aggregate OR" operation to test expedited as well as normal grace
      periods without added overhead.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Rename qs_pending to core_needs_qs · 97c668b8
      Committed by Paul E. McKenney
      An upcoming commit needs to invert the sense of the ->passed_quiesce
      rcu_data structure field, so this commit is taking this opportunity
      to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
      
      So if !rdp->core_needs_qs, then this CPU need not concern itself with
      quiescent states, in particular, it need not acquire its leaf rcu_node
      structure's ->lock to check.  Otherwise, it needs to report the next
      quiescent state.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
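      The intended reading of the renamed field, as a small sketch
      (the surrounding code is hypothetical):

      if (!rdp->core_needs_qs)
              return;  /* RCU core needs no quiescent state from this CPU,
                        * so do not even touch the leaf rcu_node ->lock. */
      /* Otherwise, find and report the next quiescent state. */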
    • rcu: Move synchronize_sched_expedited() to combining tree · bce5fa12
      Committed by Paul E. McKenney
      Currently, synchronize_sched_expedited() uses a single global counter
      to track the number of remaining context switches that the current
      expedited grace period must wait on.  This is problematic on large
      systems, where the resulting memory contention can be pathological.
      This commit therefore makes synchronize_sched_expedited() instead use
      the combining tree in the same manner as synchronize_rcu_expedited(),
      keeping memory contention down to a dull roar.
      
      This commit creates a temporary function sync_sched_exp_select_cpus()
      that is very similar to sync_rcu_exp_select_cpus().  A later commit
      will consolidate these two functions, which becomes possible when
      synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
      smp_call_function_single().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
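      The shape of combining-tree reporting, in an illustrative sketch (the
      helper name and the final wakeup are hypothetical; the real code also
      deals with locking and wakeup subtleties):

      static void report_exp_cpu_qs(struct rcu_node *rnp, unsigned long mask)
      {
              /* Clear our bit in the leaf, and walk upward only if we were
               * the last CPU pending at each level.  Most CPUs therefore
               * never touch the root, let alone a single global counter. */
              for (; rnp != NULL; rnp = rnp->parent) {
                      raw_spin_lock(&rnp->lock);
                      rnp->expmask &= ~mask;
                      if (rnp->expmask) {  /* Others still pending here. */
                              raw_spin_unlock(&rnp->lock);
                              return;
                      }
                      mask = rnp->grpmask;  /* Our bit at the parent level. */
                      raw_spin_unlock(&rnp->lock);
              }
              wake_up(&exp_wait_wq);  /* Root cleared: all CPUs accounted for. */
      }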
    • rcu: Consolidate tree setup for synchronize_rcu_expedited() · b9585e94
      Committed by Paul E. McKenney
      This commit replaces sync_rcu_preempt_exp_init1() and
      sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
      and sync_exp_reset_tree(), which will also be used by
      synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
      contains code specific to synchronize_rcu_expedited().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 04 August 2015 (2 commits)
    • rcu,locking: Privatize smp_mb__after_unlock_lock() · 12d560f4
      Committed by Paul E. McKenney
      RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
      likely the only thing that ever will use it, so this commit makes this
      macro private to RCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
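      The macro itself is tiny; after this commit it lives next to RCU's
      other internals rather than in the generic locking headers.  Roughly
      (the #ifdef placement is illustrative):

      /* Provide full ordering after an unlock+lock pair, on architectures
       * (PowerPC) where that pair is not already a full barrier: */
      #ifdef CONFIG_PPC
      #define smp_mb__after_unlock_lock()     smp_mb()
      #else
      #define smp_mb__after_unlock_lock()     do { } while (0)
      #endif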
    • rcu: Silence lockdep false positive for expedited grace periods · af859bea
      Committed by Paul E. McKenney
      In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited()
      acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes
      synchronize_sched_expedited(), which acquires the ->exp_funnel_mutex in
      rcu_sched_state.  There can be no deadlock because rcu_preempt_state
      ->exp_funnel_mutex acquisition always precedes that of rcu_sched_state.
      But lockdep does not know that, so it gives false-positive splats.
      
      This commit therefore associates a separate lock_class_key structure
      with the rcu_sched_state structure's ->exp_funnel_mutex, allowing
      lockdep to see the lock ordering, avoiding the false positives.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
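      One way to express this to lockdep, as a hedged sketch (the key and
      the class-name string here are hypothetical):

      static struct lock_class_key rcu_exp_sched_class;

      mutex_init(&rcu_sched_state.exp_funnel_mutex);
      lockdep_set_class_and_name(&rcu_sched_state.exp_funnel_mutex,
                                 &rcu_exp_sched_class, "rcu_sched_exp_funnel");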
  4. 18 July 2015 (11 commits)
  5. 16 July 2015 (6 commits)
  6. 28 May 2015 (4 commits)
  7. 13 March 2015 (2 commits)
    • rcu: Eliminate ->onoff_mutex from rcu_node structure · c1990689
      Committed by Paul E. McKenney
      Because RCU grace-period initialization need no longer exclude
      CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
      its uses.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Process offlining and onlining only at grace-period start · 0aa04b05
      Committed by Paul E. McKenney
      Races between CPU hotplug and grace periods can be difficult to resolve,
      so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
      this means that it is impossible for an outgoing CPU to perform the
      last bits of its offlining from its last pass through the idle loop,
      because sleeplocks cannot be acquired in that context.
      
      This commit avoids these problems by buffering online and offline events
      in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
      grace period starts, the events accumulated in this mask are applied to
      the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
      case of all CPUs corresponding to a given leaf rcu_node structure being
      offline while there are still elements in that structure's ->blkd_tasks
      list is handled using a new ->wait_blkd_tasks field.  In this case,
      propagating the offline bits up the tree is deferred until the beginning
      of the grace period after all of the tasks have exited their RCU read-side
      critical sections and removed themselves from the list, at which point
      the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
      structure's CPUs comes back online before the list empties, then the
      ->wait_blkd_tasks flag is simply cleared.
      
      This of course means that RCU's notion of which CPUs are offline can be
      out of date.  This is OK because RCU need only wait on CPUs that were
      online at the time that the grace period started.  In addition, RCU's
      force-quiescent-state actions will handle the case where a CPU goes
      offline after the grace period starts.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
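      At grace-period start, the buffered events are folded in roughly as
      in this sketch (illustrative; the real code also propagates leaf
      transitions up the tree and handles ->wait_blkd_tasks):

      rcu_for_each_leaf_node(rsp, rnp) {
              raw_spin_lock_irq(&rnp->lock);
              if (rnp->qsmaskinit != rnp->qsmaskinitnext) {
                      /* CPUs came or went since the last grace period:
                       * adopt the buffered mask as the new initial mask. */
                      rnp->qsmaskinit = rnp->qsmaskinitnext;
              }
              raw_spin_unlock_irq(&rnp->lock);
      }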
  8. 16 January 2015 (1 commit)
    • rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Committed by Paul E. McKenney
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
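      The counter scheme, sketched (the names rcu_qs_ctr and
      ->rcu_qs_ctr_snap follow the commit text; the surrounding code is
      illustrative):

      /* In cond_resched_rcu_qs(): record a quiescent state cheaply. */
      this_cpu_inc(rcu_qs_ctr);

      /* In the grace-period machinery: a snapshot taken at grace-period
       * start that now differs from the live counter proves this CPU
       * passed through a quiescent state in the meantime. */
      if (rdp->rcu_qs_ctr_snap != per_cpu(rcu_qs_ctr, rdp->cpu))
              /* ...report a quiescent state on behalf of this CPU... */;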
  9. 11 January 2015 (2 commits)
  10. 07 January 2015 (5 commits)
    • rcu: Handle gpnum/completed wrap while dyntick idle · e3663b10
      Committed by Paul E. McKenney
      Subtle race conditions can result if a CPU stays in dyntick-idle mode
      long enough for the ->gpnum and ->completed fields to wrap.  For
      example, consider the following sequence of events:
      
      o	CPU 1 encounters a quiescent state while waiting for grace period
      	5 to complete, but then enters dyntick-idle mode.
      
      o	While CPU 1 is in dyntick-idle mode, the grace-period counters
      	wrap around so that the grace period number is now 4.
      
      o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
      	and grace period 5 begins.
      
      o	The quiescent state that CPU 1 passed through during the old
      	grace period 5 looks like it applies to the new grace period
      	5.  Therefore, the new grace period 5 completes without CPU 1
      	having passed through a quiescent state.
      
      This could clearly be a fatal surprise to any long-running RCU read-side
      critical section that happened to be running on CPU 1 at the time.  At one
      time, this was not a problem, given that it takes significant time for
      the grace-period counters to overflow even on 32-bit systems.  However,
      with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
      idle periods are now becoming quite feasible.  It is therefore time to
      close this race.
      
      This commit therefore avoids this race condition by having the
      quiescent-state forcing code detect when a CPU is falling too far
      behind, and setting a new rcu_data field ->gpwrap when this happens.
      Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
      fields are known to be untrustworthy, and can be ignored, along with
      any associated quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
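      The detection relies on wrap-tolerant unsigned comparison; a sketch
      consistent with the description above (the quarter-range threshold
      illustrates the "falling too far behind" test; the call site is
      hypothetical):

      /* True iff a < b in modular (wraparound) arithmetic: */
      #define ULONG_CMP_LT(a, b)      (ULONG_MAX / 2 < (a) - (b))

      /* If this CPU's snapshot has fallen a quarter of the counter space
       * behind the current grace-period number, its ->gpnum and
       * ->completed values are stale and must be ignored: */
      if (ULONG_CMP_LT(rdp->gpnum + ULONG_MAX / 4, rnp->gpnum))
              WRITE_ONCE(rdp->gpwrap, true);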
    • rcu: Improve diagnostics for spurious RCU CPU stall warnings · 6ccd2ecd
      Committed by Paul E. McKenney
      The current RCU CPU stall warning code will print "Stall ended before
      state dump start" any time that the stall-warning code is triggered on
      a CPU that has already reported a quiescent state for the current grace
      period and if all quiescent states have been reported for the current
      grace period.  However, a true stall can result in these symptoms, for
      example, by preventing RCU's grace-period kthreads from ever running.
      
      This commit therefore checks for this condition, reporting the end of
      the stall only if one of the grace-period counters has actually advanced.
      Otherwise, it reports the last time that the grace-period kthread made
      meaningful progress.  (In normal situations, the grace-period kthread
      should make meaningful progress at least every jiffies_till_next_fqs
      jiffies.)
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
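      A sketch of the added decision (illustrative; the field names and
      message formats are abridged approximations, not the exact code):

      if (READ_ONCE(rsp->gpnum) != gpnum ||
          READ_ONCE(rsp->completed) == gpnum) {
              /* A counter really advanced: the stall genuinely ended. */
              pr_err("INFO: Stall ended before state dump start\n");
      } else {
              /* No progress: say when the kthread last did anything. */
              j = jiffies;
              gpa = READ_ONCE(rsp->gp_activity);
              pr_err("All QSes seen, last %s kthread activity %lu (%lu-%lu)\n",
                     rsp->name, j - gpa, j, gpa);
      }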
    • rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts · fc908ed3
      Committed by Paul E. McKenney
      One way that an RCU CPU stall warning can happen is if the grace-period
      kthread is not allowed to execute.  One proxy for this kthread's
      forward progress is the number of force-quiescent-state (fqs) scans.
      This commit therefore adds the number of fqs scans to the RCU CPU stall
      warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion · abaf3f9d
      Committed by Lai Jiangshan
      The patch dfeb9765 ("Allow post-unlock reference for rt_mutex")
      ensured that RCU priority boosting remained safe even when the
      rt_mutex had a post-unlock reference.
      
      But allowing a post-unlock reference on an rt_mutex is definitely a
      bug, and it was fixed by commit 27e35715 ("rtmutex: Plug slow unlock
      race").  That fix made the previous patch (dfeb9765) unnecessary.
      
      Worse yet, the priority inversion introduced by the previous patch
      still exists:
      
      rcu_read_unlock_special() {
      	rt_mutex_unlock(&rnp->boost_mtx);
      	/* Priority inversion:
      	 * The current task is deboosted here and can immediately be
      	 * preempted as a low-priority task.  It may then wait a long
      	 * time to be rescheduled, while the rcu-boost kthread sleeps
      	 * waiting on this same low-priority task.  This priority
      	 * inversion keeps the booster from working as expected.
      	 */
      	complete(&rnp->boost_completion);
      }
      
      Just revert the patch to avoid it.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Committed by Paul E. McKenney
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
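      The resulting dequeue path, sketched (the helper name
      rcu_cleanup_dead_rnp() comes from the related patch series; the
      surrounding code is illustrative):

      /* In rcu_read_unlock_special(), after removing the task: */
      list_del_init(&t->rcu_node_entry);
      if (!rnp->qsmaskinit && list_empty(&rnp->blkd_tasks))
              rcu_cleanup_dead_rnp(rnp);  /* Last task gone from an all-
                                           * offline leaf: only now clear
                                           * its bits in the upper levels. */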