提交 · e0775cefb5ede661dbdc0611d7bf3fcd4640005c · openanolis / cloud-kernel

29 10月, 2014 3 次提交

rcu: Avoid IPIing idle CPUs from synchronize_sched_expedited() · e0775cef

由 Paul E. McKenney 提交于 9月 03, 2014

Currently, synchronize_sched_expedited() sends IPIs to all online CPUs,
even those that are idle or executing in nohz_full= userspace. Because
idle CPUs and nohz_full= userspace CPUs are in extended quiescent states,
there is no need to IPI them in the first place. This commit therefore
avoids IPIing CPUs that are already in extended quiescent states.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e0775cef

rcu: Move RCU_BOOST variable declarations, eliminating #ifdef · 61cfd097

由 Paul E. McKenney 提交于 9月 02, 2014

There are some RCU_BOOST-specific per-CPU variable declarations that
are needlessly defined under #ifdef in kernel/rcu/tree.c.  This commit
therefore moves these declarations into a pre-existing #ifdef in
kernel/rcu/tree_plugin.h.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

61cfd097

rcu: Make rcu_barrier() understand about missing rcuo kthreads · d7e29933

由 Paul E. McKenney 提交于 10月 27, 2014

Commit 35ce7f29 (rcu: Create rcuo kthreads only for onlined CPUs)
avoids creating rcuo kthreads for CPUs that never come online.  This
fixes a bug in many instances of firmware: Instead of lying about their
age, these systems instead lie about the number of CPUs that they have.
Before commit 35ce7f29, this could result in huge numbers of useless
rcuo kthreads being created.

It appears that experience indicates that I should have told the
people suffering from this problem to fix their broken firmware, but
I instead produced what turned out to be a partial fix.   The missing
piece supplied by this commit makes sure that rcu_barrier() knows not to
post callbacks for no-CBs CPUs that have not yet come online, because
otherwise rcu_barrier() will hang on systems having firmware that lies
about the number of CPUs.

It is tempting to simply have rcu_barrier() refuse to post a callback on
any no-CBs CPU that does not have an rcuo kthread.  This unfortunately
does not work because rcu_barrier() is required to wait for all pending
callbacks.  It is therefore required to wait even for those callbacks
that cannot possibly be invoked.  Even if doing so hangs the system.

Given that posting a callback to a no-CBs CPU that does not yet have an
rcuo kthread can hang rcu_barrier(), It is tempting to report an error
in this case.  Unfortunately, this will result in false positives at
boot time, when it is perfectly legal to post callbacks to the boot CPU
before the scheduler has started, in other words, before it is legal
to invoke rcu_barrier().

So this commit instead has rcu_barrier() avoid posting callbacks to
CPUs having neither rcuo kthread nor pending callbacks, and has it
complain bitterly if it finds CPUs having no rcuo kthread but some
pending callbacks.  And when rcu_barrier() does find CPUs having no rcuo
kthread but pending callbacks, as noted earlier, it has no choice but
to hang indefinitely.
Reported-by: NYanko Kaneti <yaneti@declera.com>
Reported-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Reported-by: NMeelis Roos <mroos@linux.ee>
Reported-by: NEric B Munson <emunson@akamai.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NEric B Munson <emunson@akamai.com>
Tested-by: NJay Vosburgh <jay.vosburgh@canonical.com>
Tested-by: NYanko Kaneti <yaneti@declera.com>
Tested-by: NKevin Fenzi <kevin@scrye.com>
Tested-by: NMeelis Roos <mroos@linux.ee>

d7e29933

19 9月, 2014 1 次提交

rcu: Eliminate deadlock between CPU hotplug and expedited grace periods · dd56af42

由 Paul E. McKenney 提交于 8月 25, 2014

Currently, the expedited grace-period primitives do get_online_cpus().
This greatly simplifies their implementation, but means that calls
to them holding locks that are acquired by CPU-hotplug notifiers (to
say nothing of calls to these primitives from CPU-hotplug notifiers)
can deadlock. But this is starting to become inconvenient, as can be
seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this
case is that some developers need to acquire a mutex from a CPU-hotplug
notifier, but also need to hold it across a synchronize_rcu_expedited().
As noted above, this currently results in deadlock.

This commit avoids the deadlock and retains the simplicity by creating
a try_get_online_cpus(), which returns false if the get_online_cpus()
reference count could not immediately be incremented. If a call to
try_get_online_cpus() returns true, the expedited primitives operate as
before. If a call returns false, the expedited primitives fall back to
normal grace-period operations. This falling back of course results in
increased grace-period latency, but only during times when CPU hotplug
operations are actually in flight. The effect should therefore be
negligible during normal operation.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Tested-by: NLan Tianyu <tianyu.lan@intel.com>

dd56af42

17 9月, 2014 2 次提交

rcu: Create rcuo kthreads only for onlined CPUs · 35ce7f29

由 Paul E. McKenney 提交于 7月 11, 2014

RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads,
which can result in more rcuo kthreads than one would expect, for
example, derRichard reported 64 CPUs worth of rcuo kthreads on an
8-CPU image.  This commit therefore creates rcuo kthreads only for
those CPUs that actually come online.

This was reported by derRichard on the OFTC IRC network.
Reported-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Tested-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

35ce7f29

rcu: Rationalize kthread spawning · 9386c0b7

由 Paul E. McKenney 提交于 7月 13, 2014

Currently, RCU spawns kthreads from several different early_initcall()
functions. Although this has served RCU well for quite some time,
as more kthreads are added a more deterministic approach is required.
This commit therefore causes all of RCU's early-boot kthreads to be
spawned from a single early_initcall() function.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Tested-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

9386c0b7

08 9月, 2014 11 次提交

rcu: Per-CPU operation cleanups to rcu_*_qs() functions · 284a8c93

由 Paul E. McKenney 提交于 8月 14, 2014

The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use
old-style per-CPU variable access and write to ->passed_quiesce even
if it is already set. This commit therefore updates to use the new-style
per-CPU variable access functions and avoids the spurious writes.
This commit also eliminates the "cpu" argument to these functions because
they are always invoked on the indicated CPU.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

284a8c93

rcu: Make TASKS_RCU handle nohz_full= CPUs · 176f8f7a

由 Paul E. McKenney 提交于 8月 04, 2014

Currently TASKS_RCU would ignore a CPU running a task in nohz_full=
usermode execution. There would be neither a context switch nor a
scheduling-clock interrupt to tell TASKS_RCU that the task in question
had passed through a quiescent state. The grace period would therefore
extend indefinitely. This commit therefore makes RCU's dyntick-idle
subsystem record the task_struct structure of the task that is running
in dyntick-idle mode on each CPU. The TASKS_RCU grace period can
then access this information and record a quiescent state on
behalf of any CPU running in dyntick-idle usermode.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

176f8f7a

rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops · bde6c3aa

由 Paul E. McKenney 提交于 7月 01, 2014

RCU-tasks requires the occasional voluntary context switch
from CPU-bound in-kernel tasks.  In some cases, this requires
instrumenting cond_resched().  However, there is some reluctance
to countenance unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.

This commit currently instruments only RCU-tasks.  Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bde6c3aa

rcu: Add call_rcu_tasks() · 8315f422

由 Paul E. McKenney 提交于 6月 27, 2014

This commit adds a new RCU-tasks flavor of RCU, which provides
call_rcu_tasks(). This RCU flavor's quiescent states are voluntary
context switch (not preemption!) and userspace execution (not the idle
loop -- use some sort of schedule_on_each_cpu() if you need to handle the
idle tasks. Note that unlike other RCU flavors, these quiescent states
occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt.

This RCU flavor is assumed to have very infrequent latency-tolerant
updaters. This assumption permits significant simplifications, including
a single global callback list protected by a single global lock, along
with a single task-private linked list containing all tasks that have not
yet passed through a quiescent state. If experience shows this assumption
to be incorrect, the required additional complexity will be added.
Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

8315f422

rcu: Replace flush_signals() with WARN_ON(signal_pending()) · 73a860cd

由 Paul E. McKenney 提交于 8月 14, 2014

Currently, when RCU awakens from a wait_event_interruptible() that
might have awakened prematurely, it does a flush_signals(). This is
done on the off-chance that someone figured out how to deliver a signal
to a kthread, which is supposed to be impossible.  Given that this
is supposed to be impossible, this commit changes the flush_signals()
calls into WARN_ON(signal_pending()).
Reported-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

73a860cd

rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads · 2aa792e6

由 Pranith Kumar 提交于 8月 12, 2014

The rcu_gp_kthread_wake() function checks for three conditions before
waking up grace period kthreads:

*  Is the thread we are trying to wake up the current thread?
*  Are the gp_flags zero? (all threads wait on non-zero gp_flags condition)
*  Is there no thread created for this flavour, hence nothing to wake up?

If any one of these condition is true, we do not call wake_up().
It was found that there are quite a few avoidable wake ups both during
idle time and under stress induced by rcutorture.

Idle:

Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0
Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0

rcutorture:

Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0
Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0

Here case{1-3} are the cases listed above. We can avoid these wake
ups by using rcu_gp_kthread_wake() to conditionally wake up the grace
period kthreads.

There is a comment about an implied barrier supplied by the wake_up()
logic.  This barrier is necessary for the awakened thread to see the
updated ->gp_flags.  This flag is always being updated with the root node
lock held. Also, the awakened thread tries to acquire the root node lock
before reading ->gp_flags because of which there is proper ordering.

Hence this commit tries to avoid calling wake_up() whenever we can by
using rcu_gp_kthread_wake() function.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

2aa792e6

rcu: Remove stale comment in tree.c · 66d701ea

由 Pranith Kumar 提交于 7月 16, 2014

This commit removes a stale comment in rcu/tree.c which was left
out when some code was moved around previously in commit 2036d94a
("rcu:  Rework detection of use of RCU by offline CPUs") For reference,
the following updated comment exists a few lines below this which means
the same:

/* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

66d701ea

rcu: Define tracepoint strings only if CONFIG_TRACING is set · a8a29b3b

由 Ard Biesheuvel 提交于 7月 12, 2014

Commit f7f7bac9 ("rcu: Have the RCU tracepoints use the tracepoint_string
infrastructure") unconditionally populates the __tracepoint_str input section,
but this section is not assigned an output section if CONFIG_TRACING is not set.
This results in the __tracepoint_str turning up in unexpected places, i.e.,
after _edata.
Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a8a29b3b

rcu: Use true/false instead of 1/0 for a bool type · e02b2edf

由 Pranith Kumar 提交于 7月 09, 2014

This commit uses true/false instead of 1/0 for bool types in rcu_gp_fqs()
and force_qs_rnp().
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e02b2edf

rcu: Use bool type for return value in rcu_is_watching() · f534ed1f

由 Pranith Kumar 提交于 7月 08, 2014

Use a bool type for return in rcu_is_watching().
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f534ed1f

rcu: Remove remaining read-modify-write ACCESS_ONCE() calls · 4de376a1

由 Pranith Kumar 提交于 7月 08, 2014

Change the remaining uses of ACCESS_ONCE() so that each ACCESS_ONCE() either does a load or a store, but not both.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

4de376a1

10 7月, 2014 7 次提交

rcu: Remove CONFIG_PROVE_RCU_DELAY · 11992c70

由 Paul E. McKenney 提交于 6月 23, 2014

The CONFIG_PROVE_RCU_DELAY Kconfig parameter doesn't appear to be very
effective at finding race conditions, so this commit removes it.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
[ paulmck: Remove definition and uses as noted by Paul Bolle. ]

11992c70

rcu: Use __this_cpu_read() instead of per_cpu_ptr() · d860d403

由 Shan Wei 提交于 6月 19, 2014

The __this_cpu_read() function produces better code than does
per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
4.7.3-12ubuntu1) 4.7.3 produces the following:

ARMv7 per_cpu_ptr():

force_quiescent_state:
    mov    r3, sp    @,
    bic    r1, r3, #8128    @ tmp171,,
    ldr    r2, .L98    @ tmp169,
    bic    r1, r1, #63    @ tmp170, tmp171,
    ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
    ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
    mov    r6, r0    @ rsp, rsp
    ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
    add    r3, r3, r2    @ tmp175, __ptr, tmp173
    ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode

ARMv7 __this_cpu_read():

force_quiescent_state:
    ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
    mov    r6, r0    @ rsp, rsp
    add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
    ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13

Using gcc 4.8.2:

x86_64 per_cpu_ptr():

    movl %gs:cpu_number,%edx    # cpu_number, pscr_ret__
    movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
    movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
    movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old

x86_64 __this_cpu_read():

    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
    movq %gs:24(%rax),%r12    # _10->mynode, rnp_old

Because this change produces significant benefits for these two very
diverse architectures, this commit makes this change.
Signed-off-by: NShan Wei <davidshan@tencent.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>

d860d403

rcu: Don't use NMIs to dump other CPUs' stacks · bc1dce51

由 Paul E. McKenney 提交于 6月 18, 2014

Although NMI-based stack dumps are in principle more accurate, they are
also more likely to trigger deadlocks. This commit therefore replaces
all uses of trigger_all_cpu_backtrace() with rcu_dump_cpu_stacks(), so
that the CPU detecting an RCU CPU stall does the stack dumping.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>

bc1dce51

rcu: Check both root and current rcu_node when setting up future grace period · 48bd8e9b

由 Pranith Kumar 提交于 6月 11, 2014

The rcu_start_future_gp() function checks the current rcu_node's ->gpnum
and ->completed twice, once without ACCESS_ONCE() and once with it.
Which is pointless because we hold that rcu_node's ->lock at that point.
The intent was to check the current rcu_node structure and the root
rcu_node structure, the latter locklessly with ACCESS_ONCE().  This
commit therefore makes that change.

The reason that it is safe to locklessly check the root rcu_nodes's
->gpnum and ->completed fields is that we hold the current rcu_node's
->lock, which constrains the root rcu_node's ability to change its
->gpnum and ->completed fields.  Of course, if there is a single rcu_node
structure, then rnp_root==rnp, and holding the lock prevents all changes.
If there is more than one rcu_node structure, then the code updates the
fields in the following order:

1.	Increment rnp_root->gpnum to start new grace period.
2.	Increment rnp->gpnum to initialize the current rcu_node,
	continuing initialization for the new grace period.
3.	Increment rnp_root->completed to end the current grace period.
4.	Increment rnp->completed to continue cleaning up after the
	old grace period.

So there are four possible combinations of relative values of these
four fields:

N   N   N   N:  RCU idle, new grace period must be initiated.
		Although rnp_root->gpnum might be incremented immediately
		after we check, that will just result in unnecessary work.
		The grace period already started, and we try to start it.

N+1 N   N   N:  RCU grace period just started.  No further change is
		possible because we hold rnp->lock, so the checks of
		rnp_root->gpnum and rnp_root->completed are stable.
		We know that our request for a future grace period will
		be seen during grace-period cleanup.

N+1 N   N+1 N:  RCU grace period is ongoing.  Because rnp->gpnum is
		different than rnp->completed, we won't even look at
		rnp_root->gpnum and rnp_root->completed, so the possible
		concurrent change to rnp_root->completed does not matter.
		We know that our request for a future grace period will
		be seen during grace-period cleanup, which cannot pass
		this rcu_node because we hold its ->lock.

N+1 N+1 N+1 N:  RCU grace period has ended, but not yet been cleaned up.
		Because rnp->gpnum is different than rnp->completed, we
		won't look at rnp_root->gpnum and rnp_root->completed, so
		the possible concurrent change to rnp_root->completed does
		not matter.  We know that our request for a future grace
		period will be seen during grace-period cleanup, which
		cannot pass this rcu_node because we hold its ->lock.

Therefore, despite initial appearances, the lockless check is safe.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
[ paulmck: Update comment to say why the lockless check is safe. ]
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

48bd8e9b

rcu: Loosen __call_rcu()'s rcu_head alignment constraint · 1146edcb

由 Paul E. McKenney 提交于 6月 09, 2014

The m68k architecture aligns only to 16-bit boundaries, which can cause
the align-to-32-bits check in __call_rcu() to trigger.  Because there is
currently no known potential need for more than one low-order bit, this
commit loosens the check to 16-bit boundaries.
Reported-by: NGreg Ungerer <gerg@uclinux.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>

1146edcb

rcu: Eliminate read-modify-write ACCESS_ONCE() calls · a792563b

由 Paul E. McKenney 提交于 6月 02, 2014

RCU contains code of the following forms:

	ACCESS_ONCE(x)++;
	ACCESS_ONCE(x) += y;
	ACCESS_ONCE(x) -= y;

Now these constructs do operate correctly, but they really result in a
pair of volatile accesses, one to do the load and another to do the store.
This can be confusing, as the casual reader might well assume that (for
example) gcc might generate a memory-to-memory add instruction for each
of these three cases.  In fact, gcc will do no such thing.  Also, there
is a good chance that the kernel will move to separate load and store
variants of ACCESS_ONCE(), and constructs like the above could easily
confuse both people and scripts attempting to make that sort of change.
Finally, most of RCU's read-modify-write uses of ACCESS_ONCE() really
only need the store to be volatile, so that the read-modify-write form
might be misleading.

This commit therefore changes the above forms in RCU so that each instance
of ACCESS_ONCE() either does a load or a store, but not both.  In a few
cases, ACCESS_ONCE() was not critical, for example, for maintaining
statisitics.  In these cases, ACCESS_ONCE() has been dispensed with
entirely.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a792563b

rcu: Make rcu node arrays static const char * const · b4426b49

由 Fabian Frederick 提交于 5月 06, 2014

Those two arrays are being passed to lockdep_init_map(), which expects
const char *, and are stored in lockdep_map the same way.

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b4426b49

24 6月, 2014 1 次提交

rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832

由 Paul E. McKenney 提交于 6月 20, 2014

Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
fixed a problem where a CPU looping in the kernel with but one runnable
task would give RCU CPU stall warnings, even if the in-kernel loop
contained cond_resched() calls.  Unfortunately, in so doing, it introduced
performance regressions in Anton Blanchard's will-it-scale "open1" test.
The problem appears to be not so much the increased cond_resched() path
length as an increase in the rate at which grace periods complete, which
increased per-update grace-period overhead.

This commit takes a different approach to fixing this bug, mainly by
moving the RCU-visible quiescent state from cond_resched() to
rcu_note_context_switch(), and by further reducing the check to a
simple non-zero test of a single per-CPU variable.  However, this
approach requires that the force-quiescent-state processing send
resched IPIs to the offending CPUs.  These will be sent only once
the grace period has reached an age specified by the boot/sysfs
parameter rcutree.jiffies_till_sched_qs, or once the grace period
reaches an age halfway to the point at which RCU CPU stall warnings
will be emitted, whichever comes first.
Reported-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
[ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
  ktest build robot.  Also fixed smp_mb() comment as noted by
  Oleg Nesterov. ]

Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

4a81e832

15 5月, 2014 2 次提交

rcu: Variable name changed in tree_plugin.h and used in tree.c · e534165b

由 Uma Sharma 提交于 3月 23, 2014

The variable and struct both having the name "rcu_state" confuses
sparse in some situations, so this commit changes the variable to
"rcu_state_p" in order to avoid this confusion.  This also makes
things easier for human readers.
Signed-off-by: NUma Sharma <uma.sharma523@gmail.com>
[ paulmck: Changed the declaration and several additional uses. ]
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e534165b

rcutorture: Export RCU grace-period kthread wait state to rcutorture · afea227f

由 Paul E. McKenney 提交于 3月 12, 2014

This commit allows rcutorture to print additional state for the
RCU grace-period kthreads in cases where RCU seems reluctant to
start a new grace period.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

afea227f

14 5月, 2014 1 次提交

rcutorture: Add forward-progress checking for writer · ad0dc7f9

由 Paul E. McKenney 提交于 2月 19, 2014

The rcutorture output currently does not distinguish between stalls in
the RCU implementation and stalls in the rcu_torture_writer() kthreads.
This commit therefore adds some diagnostics to help distinguish between
these two conditions, at least for the non-SRCU implementations. (SRCU
does not provide evidence of update-side forward progress by design.)
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ad0dc7f9

29 4月, 2014 12 次提交

rcu: Replace __this_cpu_ptr() uses with raw_cpu_ptr() · fa07a58f

由 Christoph Lameter 提交于 4月 15, 2014

__this_cpu_ptr is being phased out.

One special case is increment_cpu_stall_ticks().
A per cpu variable is incremented so use raw_cpu_inc().

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

fa07a58f

rcu: Remove duplicate resched_cpu() declaration · 8c96ae1d

由 Pranith Kumar 提交于 4月 14, 2014

Signed-off-by: NPranith Kumar <pranith@gatech.edu>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

8c96ae1d

rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state() · a381d757

由 Andreea-Cristina Bernat 提交于 3月 19, 2014

This patch merges the function rcu_force_quiescent_state() with
rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly,
the rcu_sched_force_quiescent_state() function is deleted from the file
kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was
calling force_quiescent_state with the argument rcu_preempt_state pointer
was deleted as well. The new function that combines the old ones uses
the rcu_state pointer and is located after rcu_batches_completed_bh()
in kernel/rcu/tree.c.
Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

a381d757

rcu: Consolidate kfree_call_rcu() to use rcu_state pointer · 495aa969

由 Andreea-Cristina Bernat 提交于 3月 18, 2014

kfree_call_rcu is defined two times. When defined under CONFIG_TREE_PREEMPT_RCU,
it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state.
This patch uses the rcu_state_pointer to combine the two definitions into one.
The resulting function is placed after the closing of the preprocessor
conditional CONFIG_TREE_PREEMPT_RCU.
Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

495aa969

rcu: Replace NR_CPUS with nr_cpu_ids · 595f3900

由 Himangi Saraogi 提交于 3月 18, 2014

This patch replaces NR_CPUS with nr_cpu_ids as NR_CPUS should
consider cpumask_var_t.
Signed-off-by: NHimangi Saraogi <himangi774@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

595f3900

rcu: Add event tracing to dyntick_save_progress_counter(). · 7941dbde

由 Andreea-Cristina Bernat 提交于 3月 17, 2014

This patch adds event tracing to dyntick_save_progress_counter() in the case
where it returns 1. I used the tracepoint string "dti" because this function
returns 1 in case the CPU is in dynticks idle mode.
Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

7941dbde

rcu: Make callers awaken grace-period kthread · 48a7639c

由 Paul E. McKenney 提交于 3月 11, 2014

The rcu_start_gp_advanced() function currently uses irq_work_queue()
to defer wakeups of the RCU grace-period kthread. This deferring
is necessary to avoid RCU-scheduler deadlocks involving the rcu_node
structure's lock, meaning that RCU cannot call any of the scheduler's
wake-up functions while holding one of these locks.

Unfortunately, the second and subsequent calls to irq_work_queue() are
ignored, and the first call will be ignored (aside from queuing the work
item) if the scheduler-clock tick is turned off. This is OK for many
uses, especially those where irq_work_queue() is called from an interrupt
or softirq handler, because in those cases the scheduler-clock-tick state
will be re-evaluated, which will turn the scheduler-clock tick back on.
On the next tick, any deferred work will then be processed.

However, this strategy does not always work for RCU, which can be invoked
at process level from idle CPUs. In this case, the tick might never
be turned back on, indefinitely defering a grace-period start request.
Note that the RCU CPU stall detector cannot see this condition, because
there is no RCU grace period in progress. Therefore, we can (and do!)
see long tens-of-seconds stalls in grace-period handling. In theory,
we could see a full grace-period hang, but rcutorture testing to date
has seen only the tens-of-seconds stalls. Event tracing demonstrates
that irq_work_queue() is being called repeatedly to no effect during
these stalls: The "newreq" event appears repeatedly from a task that is
not one of the grace-period kthreads.

In theory, irq_work_queue() might be fixed to avoid this sort of issue,
but RCU's requirements are unusual and it is quite straightforward to pass
wake-up responsibility up through RCU's call chain, so that the wakeup
happens when the offending locks are released.

This commit therefore makes this change. The rcu_start_gp_advanced(),
rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(),
__note_gp_changes(), and rcu_start_gp() functions now return a boolean
which indicates when a wake-up is needed. A new rcu_gp_kthread_wake()
does the wakeup when it is necessary and safe to do so: No self-wakes,
no wake-ups if the ->gp_flags field indicates there is no need (as in
someone else did the wake-up before we got around to it), and no wake-ups
before the grace-period kthread has been created.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

48a7639c

rcu: Protect uses of jiffies_stall field with ACCESS_ONCE() · 4fc5b755

由 Iulia Manda 提交于 3月 11, 2014

Some of the uses of the rcu_state structure's ->jiffies_stall field
do not use ACCESS_ONCE(), despite there being unprotected accesses.
This commit therefore uses the ACCESS_ONCE() macro to protect this field.
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

4fc5b755

rcu: Remove unused rcu_data structure field · 9b67122a

由 Iulia Manda 提交于 3月 11, 2014

The ->preemptible field in rcu_data is only initialized in the function
rcu_init_percpu_data(), and never used.  This commit therefore removes
this field.
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

9b67122a

rcu: Update cpu_needs_another_gp() for futures from non-NOCB CPUs · 365187fb

由 Paul E. McKenney 提交于 3月 10, 2014

In the old days, the only source of requests for future grace periods
was NOCB CPUs.  This has changed: CPUs routinely post requests for
future grace periods in order to promote power efficiency and reduce
OS jitter with minimal impact on grace-period latency.  This commit
therefore updates cpu_needs_another_gp() to invoke rcu_future_needs_gp()
instead of rcu_nocb_needs_gp().  The latter is no longer used, so is
now removed.  This commit also adds tracing for the irq_work_queue()
wakeup case.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

365187fb

rcu: Print negatives for stall-warning counter wraparound · 83ebe63e

由 Paul E. McKenney 提交于 3月 06, 2014

The print_other_cpu_stall() and print_cpu_stall() functions print
grace-period numbers using an unsigned format, which means that the number
one less than zero is a very large number. This commit therefore causes
these numbers to be printed with a signed format in order to improve
readability of the RCU CPU stall-warning output.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

83ebe63e

rcu: Protect ->gp_flags accesses with ACCESS_ONCE() · 91dc9542

由 Paul E. McKenney 提交于 2月 18, 2014

A number of ->gp_flags accesses don't have ACCESS_ONCE(), but all of
the can race against other loads or stores.  This commit therefore
applies ACCESS_ONCE() to the unprotected ->gp_flags accesses.
Reported-by: NAlexey Roytman <alexey.roytman@oracle.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

91dc9542

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功