1. 08 October 2015 (2 commits)
    • rcu: Stop silencing lockdep false positive for expedited grace periods · 83c2c735
      Committed by Paul E. McKenney
      This reverts commit af859bea (rcu: Silence lockdep false positive
      for expedited grace periods).  Because synchronize_rcu_expedited()
      no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
      acquisition is no longer nested, so the false positive no longer happens.
      This commit therefore removes the extra lockdep data structures, as they
      are no longer needed.
    • rcu: Switch synchronize_sched_expedited() to IPI · 6587a23b
      Committed by Paul E. McKenney
      This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
      to smp_call_function_single(), thus moving from an IPI and a pair of
      context switches to an IPI and a single pass through the scheduler.
      Of course, if the scheduler actually does decide to switch to a different
      task, there will still be a pair of context switches, but there would
      likely have been a pair of context switches anyway, just a bit later.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
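      As a rough sketch of the new path (names here are illustrative rather
      than the exact kernel code), the expedited grace period now pokes each
      CPU with an IPI whose handler merely asks for a reschedule:

      /* Runs in IPI context on the target CPU: request a trip through
       * the scheduler, which constitutes the needed quiescent state. */
      static void sync_sched_exp_handler(void *info)
      {
              resched_cpu(smp_processor_id());
      }

      /* One IPI and at most one scheduler pass, instead of the pair of
       * context switches that stop_one_cpu_nowait() always forced: */
      smp_call_function_single(cpu, sync_sched_exp_handler, NULL, 0);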
  2. 21 September 2015 (5 commits)
    • rcu: Make ->cpu_no_qs be a union for aggregate OR · 5b74c458
      Committed by Paul E. McKenney
      This commit converts the rcu_data structure's ->cpu_no_qs field
      to a union.  The bytewise side of this union allows individual access
      to indications as to whether this CPU needs to find a quiescent state
      for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
      side of the union allows testing whether or not a quiescent state is
      needed at all, for either type of grace period.
      
      For now, only .norm is used.  A later commit will introduce the expedited
      usage.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
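      As a sketch of the union described above (the field names .norm and
      .exp follow the commit text; the exact types are illustrative):

      union rcu_noqs {
              struct {
                      u8 norm;  /* Normal grace period needs a QS from this CPU. */
                      u8 exp;   /* Expedited grace period needs a QS. */
              } b;              /* Bytewise access to the individual indications. */
              u16 s;            /* Setwise access: the aggregate OR of both bytes. */
      };

      Testing rdp->cpu_no_qs.s then answers "is any quiescent state needed
      at all?" in a single load, for either type of grace period.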
    • rcu: Invert passed_quiesce and rename to cpu_no_qs · 0d43eb34
      Committed by Paul E. McKenney
      This commit inverts the sense of the rcu_data structure's ->passed_quiesce
      field and renames it to ->cpu_no_qs.  This will allow a later commit to
      use an "aggregate OR" operation to test expedited as well as normal grace
      periods without added overhead.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Rename qs_pending to core_needs_qs · 97c668b8
      Committed by Paul E. McKenney
      An upcoming commit needs to invert the sense of the ->passed_quiesce
      rcu_data structure field, so this commit is taking this opportunity
      to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.
      
      So if !rdp->core_needs_qs, then this CPU need not concern itself with
      quiescent states, in particular, it need not acquire its leaf rcu_node
      structure's ->lock to check.  Otherwise, it needs to report the next
      quiescent state.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
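      The intended reading of the renamed field, as a small sketch
      (the surrounding code is hypothetical):

      if (!rdp->core_needs_qs)
              return;  /* RCU core needs no quiescent state from this CPU,
                        * so do not even touch the leaf rcu_node ->lock. */
      /* Otherwise, find and report the next quiescent state. */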
    • rcu: Move synchronize_sched_expedited() to combining tree · bce5fa12
      Committed by Paul E. McKenney
      Currently, synchronize_sched_expedited() uses a single global counter
      to track the number of remaining context switches that the current
      expedited grace period must wait on.  This is problematic on large
      systems, where the resulting memory contention can be pathological.
      This commit therefore makes synchronize_sched_expedited() instead use
      the combining tree in the same manner as synchronize_rcu_expedited(),
      keeping memory contention down to a dull roar.
      
      This commit creates a temporary function sync_sched_exp_select_cpus()
      that is very similar to sync_rcu_exp_select_cpus().  A later commit
      will consolidate these two functions, which becomes possible when
      synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
      smp_call_function_single().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
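      The shape of combining-tree reporting, in an illustrative sketch (the
      helper name and the final wakeup are hypothetical; the real code also
      deals with locking and wakeup subtleties):

      static void report_exp_cpu_qs(struct rcu_node *rnp, unsigned long mask)
      {
              /* Clear our bit in the leaf, and walk upward only if we were
               * the last CPU pending at each level.  Most CPUs therefore
               * never touch the root, let alone a single global counter. */
              for (; rnp != NULL; rnp = rnp->parent) {
                      raw_spin_lock(&rnp->lock);
                      rnp->expmask &= ~mask;
                      if (rnp->expmask) {  /* Others still pending here. */
                              raw_spin_unlock(&rnp->lock);
                              return;
                      }
                      mask = rnp->grpmask;  /* Our bit at the parent level. */
                      raw_spin_unlock(&rnp->lock);
              }
              wake_up(&exp_wait_wq);  /* Root cleared: all CPUs accounted for. */
      }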
    • rcu: Consolidate tree setup for synchronize_rcu_expedited() · b9585e94
      Committed by Paul E. McKenney
      This commit replaces sync_rcu_preempt_exp_init1() and
      sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
      and sync_exp_reset_tree(), which will also be used by
      synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
      contains code specific to synchronize_rcu_expedited().
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 04 August 2015 (2 commits)
    • rcu,locking: Privatize smp_mb__after_unlock_lock() · 12d560f4
      Committed by Paul E. McKenney
      RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
      likely the only thing that ever will use it, so this commit makes this
      macro private to RCU.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
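      The macro itself is tiny; after this commit it lives next to RCU's
      other internals rather than in the generic locking headers.  Roughly
      (the #ifdef placement is illustrative):

      /* Provide full ordering after an unlock+lock pair, on architectures
       * (PowerPC) where that pair is not already a full barrier: */
      #ifdef CONFIG_PPC
      #define smp_mb__after_unlock_lock()     smp_mb()
      #else
      #define smp_mb__after_unlock_lock()     do { } while (0)
      #endif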
    • rcu: Silence lockdep false positive for expedited grace periods · af859bea
      Committed by Paul E. McKenney
      In a CONFIG_PREEMPT=y kernel, synchronize_rcu_expedited()
      acquires the ->exp_funnel_mutex in rcu_preempt_state, then invokes
      synchronize_sched_expedited(), which acquires the ->exp_funnel_mutex in
      rcu_sched_state.  There can be no deadlock because rcu_preempt_state
      ->exp_funnel_mutex acquisition always precedes that of rcu_sched_state.
      But lockdep does not know that, so it gives false-positive splats.
      
      This commit therefore associates a separate lock_class_key structure
      with the rcu_sched_state structure's ->exp_funnel_mutex, allowing
      lockdep to see the lock ordering, avoiding the false positives.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
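      One way to express this to lockdep, as a hedged sketch (the key and
      the class-name string here are hypothetical):

      static struct lock_class_key rcu_exp_sched_class;

      mutex_init(&rcu_sched_state.exp_funnel_mutex);
      lockdep_set_class_and_name(&rcu_sched_state.exp_funnel_mutex,
                                 &rcu_exp_sched_class, "rcu_sched_exp_funnel");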
  4. 18 July 2015 (11 commits)
  5. 16 July 2015 (6 commits)
  6. 28 May 2015 (4 commits)
  7. 13 March 2015 (2 commits)
    • rcu: Eliminate ->onoff_mutex from rcu_node structure · c1990689
      Committed by Paul E. McKenney
      Because RCU grace-period initialization need no longer exclude
      CPU-hotplug operations, this commit eliminates the ->onoff_mutex and
      its uses.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Process offlining and onlining only at grace-period start · 0aa04b05
      Committed by Paul E. McKenney
      Races between CPU hotplug and grace periods can be difficult to resolve,
      so the ->onoff_mutex is used to exclude the two events.  Unfortunately,
      this means that it is impossible for an outgoing CPU to perform the
      last bits of its offlining from its last pass through the idle loop,
      because sleeplocks cannot be acquired in that context.
      
      This commit avoids these problems by buffering online and offline events
      in a new ->qsmaskinitnext field in the leaf rcu_node structures.  When a
      grace period starts, the events accumulated in this mask are applied to
      the ->qsmaskinit field, and, if needed, up the rcu_node tree.  The special
      case of all CPUs corresponding to a given leaf rcu_node structure being
      offline while there are still elements in that structure's ->blkd_tasks
      list is handled using a new ->wait_blkd_tasks field.  In this case,
      propagating the offline bits up the tree is deferred until the beginning
      of the grace period after all of the tasks have exited their RCU read-side
      critical sections and removed themselves from the list, at which point
      the ->wait_blkd_tasks flag is cleared.  If one of that leaf rcu_node
      structure's CPUs comes back online before the list empties, then the
      ->wait_blkd_tasks flag is simply cleared.
      
      This of course means that RCU's notion of which CPUs are offline can be
      out of date.  This is OK because RCU need only wait on CPUs that were
      online at the time that the grace period started.  In addition, RCU's
      force-quiescent-state actions will handle the case where a CPU goes
      offline after the grace period starts.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
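      At grace-period start, the buffered events are folded in roughly as
      in this sketch (illustrative; the real code also propagates leaf
      transitions up the tree and handles ->wait_blkd_tasks):

      rcu_for_each_leaf_node(rsp, rnp) {
              raw_spin_lock_irq(&rnp->lock);
              if (rnp->qsmaskinit != rnp->qsmaskinitnext) {
                      /* CPUs came or went since the last grace period:
                       * adopt the buffered mask as the new initial mask. */
                      rnp->qsmaskinit = rnp->qsmaskinitnext;
              }
              raw_spin_unlock_irq(&rnp->lock);
      }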
  8. 16 January 2015 (1 commit)
    • rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors · 5cd37193
      Committed by Paul E. McKenney
      Although cond_resched_rcu_qs() only applies to TASKS_RCU, it is used
      in places where it would be useful for it to apply to the normal RCU
      flavors, rcu_preempt, rcu_sched, and rcu_bh.  This is especially the
      case for workloads that aggressively overload the system, particularly
      those that generate large numbers of RCU updates on systems running
      NO_HZ_FULL CPUs.  This commit therefore communicates quiescent states
      from cond_resched_rcu_qs() to the normal RCU flavors.
      
      Note that it is unfortunately necessary to leave the old ->passed_quiesce
      mechanism in place to allow quiescent states that apply to only one
      flavor to be recorded.  (Yes, we could decrement ->rcu_qs_ctr_snap in
      that case, but that is not so good for debugging of RCU internals.)
      In addition, if one of the RCU flavor's grace period has stalled, this
      will invoke rcu_momentary_dyntick_idle(), resulting in a heavy-weight
      quiescent state visible from other CPUs.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Merge commit from Sasha Levin fixing a bug where __this_cpu()
        was used in preemptible code. ]
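      The counter scheme, sketched (the names rcu_qs_ctr and
      ->rcu_qs_ctr_snap follow the commit text; the surrounding code is
      illustrative):

      /* In cond_resched_rcu_qs(): record a quiescent state cheaply. */
      this_cpu_inc(rcu_qs_ctr);

      /* In the grace-period machinery: a snapshot taken at grace-period
       * start that now differs from the live counter proves this CPU
       * passed through a quiescent state in the meantime. */
      if (rdp->rcu_qs_ctr_snap != per_cpu(rcu_qs_ctr, rdp->cpu))
              /* ...report a quiescent state on behalf of this CPU... */;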
  9. 11 January 2015 (2 commits)
  10. 07 January 2015 (5 commits)
    • rcu: Handle gpnum/completed wrap while dyntick idle · e3663b10
      Committed by Paul E. McKenney
      Subtle race conditions can result if a CPU stays in dyntick-idle mode
      long enough for the ->gpnum and ->completed fields to wrap.  For
      example, consider the following sequence of events:
      
      o	CPU 1 encounters a quiescent state while waiting for grace period
      	5 to complete, but then enters dyntick-idle mode.
      
      o	While CPU 1 is in dyntick-idle mode, the grace-period counters
      	wrap around so that the grace period number is now 4.
      
      o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
      	and grace period 5 begins.
      
      o	The quiescent state that CPU 1 passed through during the old
      	grace period 5 looks like it applies to the new grace period
      	5.  Therefore, the new grace period 5 completes without CPU 1
      	having passed through a quiescent state.
      
      This could clearly be a fatal surprise to any long-running RCU read-side
      critical section that happened to be running on CPU 1 at the time.  At one
      time, this was not a problem, given that it takes significant time for
      the grace-period counters to overflow even on 32-bit systems.  However,
      with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
      idle periods are now becoming quite feasible.  It is therefore time to
      close this race.
      
      This commit therefore avoids this race condition by having the
      quiescent-state forcing code detect when a CPU is falling too far
      behind, and setting a new rcu_data field ->gpwrap when this happens.
      Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
      fields are known to be untrustworthy, and can be ignored, along with
      any associated quiescent states.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
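      The detection relies on wrap-tolerant unsigned comparison; a sketch
      consistent with the description above (the quarter-range threshold
      illustrates the "falling too far behind" test; the call site is
      hypothetical):

      /* True iff a < b in modular (wraparound) arithmetic: */
      #define ULONG_CMP_LT(a, b)      (ULONG_MAX / 2 < (a) - (b))

      /* If this CPU's snapshot has fallen a quarter of the counter space
       * behind the current grace-period number, its ->gpnum and
       * ->completed values are stale and must be ignored: */
      if (ULONG_CMP_LT(rdp->gpnum + ULONG_MAX / 4, rnp->gpnum))
              WRITE_ONCE(rdp->gpwrap, true);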
    • rcu: Improve diagnostics for spurious RCU CPU stall warnings · 6ccd2ecd
      Committed by Paul E. McKenney
      The current RCU CPU stall warning code will print "Stall ended before
      state dump start" any time that the stall-warning code is triggered on
      a CPU that has already reported a quiescent state for the current grace
      period and if all quiescent states have been reported for the current
      grace period.  However, a true stall can result in these symptoms, for
      example, by preventing RCU's grace-period kthreads from ever running.
      
      This commit therefore checks for this condition, reporting the end of
      the stall only if one of the grace-period counters has actually advanced.
      Otherwise, it reports the last time that the grace-period kthread made
      meaningful progress.  (In normal situations, the grace-period kthread
      should make meaningful progress at least every jiffies_till_next_fqs
      jiffies.)
      Reported-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Miroslav Benes <mbenes@suse.cz>
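      A sketch of the added decision (illustrative; the field names and
      message formats are abridged approximations, not the exact code):

      if (READ_ONCE(rsp->gpnum) != gpnum ||
          READ_ONCE(rsp->completed) == gpnum) {
              /* A counter really advanced: the stall genuinely ended. */
              pr_err("INFO: Stall ended before state dump start\n");
      } else {
              /* No progress: say when the kthread last did anything. */
              j = jiffies;
              gpa = READ_ONCE(rsp->gp_activity);
              pr_err("All QSes seen, last %s kthread activity %lu (%lu-%lu)\n",
                     rsp->name, j - gpa, j, gpa);
      }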
    • rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts · fc908ed3
      Committed by Paul E. McKenney
      One way that an RCU CPU stall warning can happen is if the grace-period
      kthread is not allowed to execute.  One proxy for this kthread's
      forward progress is the number of force-quiescent-state (fqs) scans.
      This commit therefore adds the number of fqs scans to the RCU CPU stall
      warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion · abaf3f9d
      Committed by Lai Jiangshan
      The patch dfeb9765 ("Allow post-unlock reference for rt_mutex")
      ensured that RCU priority boosting remained safe even when the
      rt_mutex had a post-unlock reference.
      
      But allowing a post-unlock reference on an rt_mutex is definitely a
      bug, and it was fixed by commit 27e35715 ("rtmutex: Plug slow unlock
      race").  That fix made the previous patch (dfeb9765) unnecessary.
      
      Worse yet, the priority inversion introduced by the previous patch
      still exists:
      
      rcu_read_unlock_special() {
      	rt_mutex_unlock(&rnp->boost_mtx);
      	/* Priority inversion:
      	 * The current task is deboosted here and can immediately be
      	 * preempted as a low-priority task.  It may then wait a long
      	 * time to be rescheduled, while the rcu-boost kthread sleeps
      	 * waiting on this same low-priority task.  This priority
      	 * inversion keeps the booster from working as expected.
      	 */
      	complete(&rnp->boost_completion);
      }
      
      Just revert the patch to avoid it.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1
      Committed by Paul E. McKenney
      When the last CPU associated with a given leaf rcu_node structure
      goes offline, something must be done about the tasks queued on that
      rcu_node structure.  Each of these tasks has been preempted on one of
      the leaf rcu_node structure's CPUs while in an RCU read-side critical
      section that it has not yet exited.  Handling these tasks is the job of
      rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
      structure to the root rcu_node structure.
      
      Unfortunately, this migration has to be done one task at a time because
      each task's allegiance must be shifted from the original leaf rcu_node to
      the root, so that future attempts to deal with these tasks will acquire
      the root rcu_node structure's ->lock rather than that of the leaf.
      Worse yet, this migration must be done with interrupts disabled, which
      is not so good for realtime response, especially given that there is
      no bound on the number of tasks on a given rcu_node structure's list.
      (OK, OK, there is a bound, it is just that it is unreasonably large,
      especially on 64-bit systems.)  This was not considered a problem back
      when rcu_preempt_offline_tasks() was first written because realtime
      systems were assumed not to do CPU-hotplug operations while real-time
      applications were running.  This assumption has proved of dubious validity
      given that people are starting to run multiple realtime applications
      on a single SMP system and that it is common practice to offline then
      online a CPU before starting its real-time application in order to clear
      extraneous processing off of that CPU.  So we now need CPU hotplug
      operations to avoid undue latencies.
      
      This commit therefore avoids migrating these tasks, instead letting
      them be dequeued one by one from the original leaf rcu_node structure
      by rcu_read_unlock_special().  This means that the clearing of bits
      from the upper-level rcu_node structures must be deferred until the
      last such task has been dequeued, because otherwise subsequent grace
      periods won't wait on them.  This commit has the beneficial side effect
      of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
      CONFIG_RCU_BOOST builds.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
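      The resulting dequeue path, sketched (the helper name
      rcu_cleanup_dead_rnp() comes from the related patch series; the
      surrounding code is illustrative):

      /* In rcu_read_unlock_special(), after removing the task: */
      list_del_init(&t->rcu_node_entry);
      if (!rnp->qsmaskinit && list_empty(&rnp->blkd_tasks))
              rcu_cleanup_dead_rnp(rnp);  /* Last task gone from an all-
                                           * offline leaf: only now clear
                                           * its bits in the upper levels. */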