提交 · 3451d0243c3cdfd729b36f9684a14659d4895ca3 · openeuler / raspberrypi-kernel

03 4月, 2013 1 次提交

nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMON · 3451d024

由 Frederic Weisbecker 提交于 8月 10, 2011

We are planning to convert the dynticks Kconfig options layout
into a choice menu. The user must be able to easily pick
any of the following implementations: constant periodic tick,
idle dynticks, full dynticks.

As this implies a mutual exclusion, the two dynticks implementions
need to converge on the selection of a common Kconfig option in order
to ease the sharing of a common infrastructure.

It would thus seem pretty natural to reuse CONFIG_NO_HZ to
that end. It already implements all the idle dynticks code
and the full dynticks depends on all that code for now.
So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and
CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ.

On the other hand we want to stay backward compatible: if
CONFIG_NO_HZ is set in an older config file, we want to
enable CONFIG_NO_HZ_IDLE by default.

But we can't afford both at the same time or we run into
a circular dependency:

1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select
   CONFIG_NO_HZ
2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE

We might be able to support that from Kconfig/Kbuild but it
may not be wise to introduce such a confusing behaviour.

So to solve this, create a new CONFIG_NO_HZ_COMMON option
which gathers the common code between idle and full dynticks
(that common code for now is simply the idle dynticks code)
and select it from their referring Kconfig.

Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ
to it for backward compatibility.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>

3451d024

17 11月, 2012 2 次提交

rcu: Add documentation for the new rcuexp debugfs trace file · d484a215

由 Paul E. McKenney 提交于 10月 31, 2012

This commit adds the documentation of the rcuexp debugfs trace file
that records statistics for expedited grace periods.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d484a215

rcu: Update documentation for TREE_RCU debugfs tracing · 40e80c46

由 Paul E. McKenney 提交于 10月 31, 2012

This commit updates the tracing documentation to reflect the new
format that has per-RCU-flavor directories.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

40e80c46

14 11月, 2012 1 次提交

rcu: Remove list_for_each_continue_rcu() · bb08f76d

由 Paul E. McKenney 提交于 10月 20, 2012

The list_for_each_continue_rcu() macro is no longer used, so this commit
removes it.  The list_for_each_entry_continue_rcu() macro should be
used instead.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bb08f76d

09 11月, 2012 2 次提交

rcu: Document alternative RCU/reference-count algorithms · a4d611fd

由 Paul E. McKenney 提交于 10月 27, 2012

The approach for mixing RCU and reference counting listed in the RCU
documentation only describes one possible approach. This approach can
result in failure on the read side, which is nice if you want fresh data,
but not so good if you want simple code. This commit therefore adds
two additional approaches that feature unconditional reference-count
acquisition by RCU readers. These approaches are very similar to that
used in the security code.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a4d611fd

rcu: Update docs to include kfree_rcu() · 57d34a6c

由 Kees Cook 提交于 10月 19, 2012

Mention kfree_rcu() in the call_rcu() section. Additionally fix the
example code for list replacement that used the wrong structure element.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

57d34a6c

24 10月, 2012 1 次提交

rcu: Correct the name of a reference in list of RCU papers · 0f9574d8

由 Dhaval Giani 提交于 10月 17, 2012

Trying to go through the history of RCU (not for the weak
minded) led me to search for a non-existent paper.

Correct it to the actual reference
Signed-off-by: NDhaval Giani <dhaval.giani@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0f9574d8

23 9月, 2012 3 次提交

rcu: Fix CONFIG_RCU_FAST_NO_HZ stall warning message · 86f343b5

由 Paul E. McKenney 提交于 9月 21, 2012

The print_cpu_stall_fast_no_hz() function attempts to print -1 when
the ->idle_gp_timer is not pending, but unsigned arithmetic causes it
to instead print ULONG_MAX, which is 4294967295 on 32-bit systems and
18446744073709551615 on 64-bit systems. Neither of these are the most
reader-friendly values, so this commit instead causes "timer not pending"
to be printed when ->idle_gp_timer is not pending.
Reported-by: NPaul Walmsley <paul@pwsan.com>
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

86f343b5

rcu: Document SRCU dead-CPU capabilities, emphasize read-side limits · 2aef619c

由 Paul E. McKenney 提交于 8月 03, 2012

The current documentation did not help someone grepping for SRCU to
learn that disabling preemption is not a replacement for srcu_read_lock(),
so upgrade the documentation to bring this out, not just for SRCU,
but also for RCU-bh.  Also document the fact that SRCU readers are
respected on CPUs executing in user mode, idle CPUs, and even on
offline CPUs.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>

2aef619c

rcu: Adjust debugfs tracing for kthread-based quiescent-state forcing · 4605c014

由 Paul E. McKenney 提交于 6月 26, 2012

Moving quiescent-state forcing into a kthread dispenses with the need
for the ->n_rp_need_fqs field, so this commit removes it.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

4605c014

03 7月, 2012 1 次提交

rcu: Update documentation to cover call_srcu() and srcu_barrier(). · 74d874e7

由 Paul E. McKenney 提交于 5月 07, 2012

The advent of call_srcu() and srcu_barrier() obsoleted some of the
documentation, so this commit brings that up to date.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

74d874e7

01 5月, 2012 1 次提交

rcu: Introduce rcutorture testing for rcu_barrier() · fae4b54f

由 Paul E. McKenney 提交于 2月 20, 2012

Although rcutorture does invoke rcu_barrier() and friends, it cannot
really be called a torture test given that it invokes them only once
at the end of the test. This commit therefore introduces heavy-duty
rcutorture testing for rcu_barrier(), which may be carried out
concurrently with normal rcutorture testing.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

fae4b54f

22 2月, 2012 7 次提交

rcu: Call out dangers of expedited RCU primitives · 236fefaf

由 Paul E. McKenney 提交于 1月 31, 2012

The expedited RCU primitives can be quite useful, but they have some
high costs as well. This commit updates and creates docbook comments
calling out the costs, and updates the RCU documentation as well.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

236fefaf

rcu: Rework detection of use of RCU by offline CPUs · 2036d94a

由 Paul E. McKenney 提交于 1月 30, 2012

Because newly offlined CPUs continue executing after completing the
CPU_DYING notifiers, they legitimately enter the scheduler and use
RCU while appearing to be offline.  This calls for a more sophisticated
approach as follows:

1.	RCU marks the CPU online during the CPU_UP_PREPARE phase.

2.	RCU marks the CPU offline during the CPU_DEAD phase.

3.	Diagnostics regarding use of read-side RCU by offline CPUs use
	RCU's accounting rather than the cpu_online_map.  (Note that
	__call_rcu() still uses cpu_online_map to detect illegal
	invocations within CPU_DYING notifiers.)

4.	Offline CPUs are prevented from hanging the system by
	force_quiescent_state(), which pays attention to cpu_online_map.
	Some additional work (in a later commit) will be needed to
	guarantee that force_quiescent_state() waits a full jiffy before
	assuming that a CPU is offline, for example, when called from
	idle entry.  (This commit also makes the one-jiffy wait
	explicit, since the old-style implicit wait can now be defeated
	by RCU_FAST_NO_HZ and by rcutorture.)

This approach avoids the false positives encountered when attempting to
use more exact classification of CPU online/offline state.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

2036d94a

rcu: Update stall-warning documentation · 24cd7fd0

由 Paul E. McKenney 提交于 1月 20, 2012

Add documentation of CONFIG_RCU_CPU_STALL_VERBOSE, CONFIG_RCU_CPU_STALL_INFO,
and RCU_STALL_DELAY_DELTA. Describe multiple stall-warning messages from
a single stall, and the timing of the subsequent messages. Add headings.
Remove RCU_SECONDS_TILL_STALL_RECHECK because this value is now computed
at runtime from RCU_CPU_STALL_TIMEOUT, so that sysfs changes to the timeout
value now directly affect the RCU_SECONDS_TILL_STALL_RECHECK value.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

24cd7fd0

rcu: Add CPU-stall capability to rcutorture · c13f3757

由 Paul E. McKenney 提交于 1月 20, 2012

Add module parameters to rcutorture that induce a CPU stall.
The stall_cpu parameter specifies how long to stall in seconds,
defaulting to zero, which indicates no stalling is to be undertaken.
The stall_cpu_holdoff parameter specifies how many seconds after
insmod (or boot, if rcutorture is built into the kernel) that this
stall is to start.  The default value for stall_cpu_holdoff is ten
seconds.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c13f3757

rcu: Make documentation give more realistic rcutorture duration · 105617da

由 Paul E. McKenney 提交于 2月 02, 2012

The torture.txt documentation gives an example rcutorture run with a
100-second duration.  This is ridiculously short, unless maybe testing
a fix for a egregious bug.  Use a more-realistic one-hour duration for
the example.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

105617da

rcutorture: Permit holding off CPU-hotplug operations during boot · 9b9ec9b9

由 Paul E. McKenney 提交于 1月 17, 2012

When rcutorture is started automatically at boot time, it might well
also start CPU-hotplug operations at that time, which might not be
desirable.  This commit therefore adds an rcutorture parameter that
allows CPU-hotplug operations to be held off for the specified number
of seconds after the start of boot.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9b9ec9b9

rcu: Bring RTFP.txt up to date. · 4c62abc9

由 Paul E. McKenney 提交于 12月 29, 2011

Add publications from 2010 and 2011 to RTFP.txt.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

4c62abc9

12 12月, 2011 6 次提交

docs: Additional LWN links to RCU API · d493011a

由 Kees Cook 提交于 12月 07, 2011

Tyler Hicks pointed me at an additional article on RCU and I figured
it should probably be mentioned with the others.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d493011a

rcu: Add rcutorture CPU-hotplug capability · b58bdcca

由 Paul E. McKenney 提交于 11月 16, 2011

Running CPU-hotplug operations concurrently with rcutorture has
historically been a good way to find bugs in both RCU and CPU hotplug.
This commit therefore adds an rcutorture module parameter called
"onoff_interval" that causes a randomly selected CPU-hotplug operation to
be executed at the specified interval, in seconds.  The default value of
"onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
operations.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b58bdcca

rcu: Add rcutorture system-shutdown capability · d5f546d8

由 Paul E. McKenney 提交于 11月 04, 2011

Although it is easy to run rcutorture tests under KVM, there is currently
no nice way to run such a test for a fixed time period, collect all of
the rcutorture data, and then shut the system down cleanly. This commit
therefore adds an rcutorture module parameter named "shutdown_secs" that
specified the run duration in seconds, after which rcutorture terminates
the test and powers the system down. The default value for "shutdown_secs"
is zero, which disables shutdown.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d5f546d8

rcu: Add documentation for raw SRCU read-side primitives · 9ceae0e2

由 Paul E. McKenney 提交于 11月 03, 2011

Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
and srcu_read_unlock_raw(). Credit to Peter Zijlstra for suggesting
use of the existing _raw suffix instead of the earlier bulkref names.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9ceae0e2

rcu: Document failing tick as cause of RCU CPU stall warning · 2c01531f

由 Paul E. McKenney 提交于 10月 02, 2011

One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

2c01531f

rcu: Track idleness independent of idle tasks · 9b2e4f18

由 Paul E. McKenney 提交于 9月 30, 2011

Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing. A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration. Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration. This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts. This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions. It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating. Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

9b2e4f18

29 9月, 2011 8 次提交

rcu: Document interpretation of RCU-lockdep splats · d7bd2d68

由 Paul E. McKenney 提交于 7月 21, 2011

There has been quite a bit of confusion about what RCU-lockdep splats
mean, so this commit adds some documentation describing how to
interpret them.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d7bd2d68

rcu: Update documentation for additional RCU lockdep functions · 8cd889cb

由 Paul E. McKenney 提交于 7月 08, 2011

Add documentation for rcu_dereference_bh_check(),
rcu_dereference_sched_check(), srcu_dereference_check(), and
rcu_dereference_index_check().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

8cd889cb

rcu: Not necessary to pass rcu_read_lock_held() to rcu_dereference_protected() · e5177ec7

由 Michal Hocko 提交于 7月 08, 2011

Since ca5ecddf (rcu: define __rcu address space modifier for sparse)
rcu_dereference_check() use rcu_read_lock_held() as a part of condition
automatically. Therefore, callers of rcu_dereference_check() no longer
need to pass rcu_read_lock_held() to rcu_dereference_check().
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e5177ec7

rcu: Simplify quiescent-state accounting · e4cc1f22

由 Paul E. McKenney 提交于 6月 27, 2011

There is often a delay between the time that a CPU passes through a
quiescent state and the time that this quiescent state is reported to the
RCU core. It is quite possible that the grace period ended before the
quiescent state could be reported, for example, some other CPU might have
deduced that this CPU passed through dyntick-idle mode. It is critically
important that quiescent state be counted only against the grace period
that was in effect at the time that the quiescent state was detected.

Previously, this was handled by recording the number of the last grace
period to complete when passing through a quiescent state. The RCU
core then checks this number against the current value, and rejects
the quiescent state if there is a mismatch. However, one additional
possibility must be accounted for, namely that the quiescent state was
recorded after the prior grace period completed but before the current
grace period started. In this case, the RCU core must reject the
quiescent state, but the recorded number will match. This is handled
when the CPU becomes aware of a new grace period -- at that point,
it invalidates any prior quiescent state.

This works, but is a bit indirect. The new approach records the current
grace period, and the RCU core checks to see (1) that this is still the
current grace period and (2) that this grace period has not yet ended.
This approach simplifies reasoning about correctness, and this commit
changes over to this new approach.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e4cc1f22

rcu: Fix RCU's NMI documentation · b15a2e7d

由 Paul E. McKenney 提交于 6月 07, 2011

It has long been the case that the architecture must call nmi_enter()
and nmi_exit() rather than irq_enter() and irq_exit() in order to
permit RCU read-side critical sections in NMIs.  Catch the documentation
up with reality.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>

b15a2e7d

rcu: Catch rcutorture up to new RCU API additions · bdf2a436

由 Paul E. McKenney 提交于 6月 07, 2011

Now that the RCU API contains synchronize_rcu_bh(), synchronize_sched(),
call_rcu_sched(), and rcu_bh_expedited()...

Make rcutorture test synchronize_rcu_bh(), getting rid of the old
rcu_bh_torture_synchronize() workaround. Similarly, make rcutorture test
synchronize_sched(), getting rid of the old sched_torture_synchronize()
workaround. Make rcutorture test call_rcu_sched() instead of wrappering
synchronize_sched(). Also add testing of rcu_bh_expedited().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bdf2a436

rcu: Update rcutorture documentation · 63cd758e

由 Paul E. McKenney 提交于 6月 05, 2011

Update rcutorture documentation to account for boosting, new types of
RCU torture testing that have been added over the past few years, and
the memory-barrier testing that was added an embarrassingly long time
ago.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

63cd758e

rcu: Update documentation to flag RCU_BOOST trace information · d5988af5

由 Paul E. McKenney 提交于 6月 15, 2011

Call out the RCU_TRACE information that is provided only in kernels
built with RCU_BOOST.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d5988af5

13 6月, 2011 1 次提交

doc: fix wrong arch/i386 references · 25eb650a

由 Wanlong Gao 提交于 6月 13, 2011

Change all "arch/i386" to "arch/x86" in Documentaion/,
since the directory has changed.

Also update the files which have changed their filename
in the meantime accordingly.
Signed-off-by: NWanlong Gao <wanlong.gao@gmail.com>
[jkosina@suse.cz: reword changelog]
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

25eb650a

27 5月, 2011 1 次提交

rcu: Decrease memory-barrier usage based on semi-formal proof · 23b5c8fa

由 Paul E. McKenney 提交于 9月 07, 2010

(Note: this was reverted, and is now being re-applied in pieces, with
this being the fifth and final piece. See below for the reason that
it is now felt to be safe to re-apply this.)

Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
invocations in rcu_process_callbacks() that are no longer needed, but
sheer paranoia prevented them from being removed. This commit removes
them and provides a proof of correctness in their absence. It also adds
a memory barrier to rcu_report_qs_rsp() immediately before the update to
rsp->completed in order to handle the theoretical possibility that the
compiler or CPU might move massive quantities of code into a lock-based
critical section. This also proves that the sheer paranoia was not
entirely unjustified, at least from a theoretical point of view.

In addition, the old dyntick-idle synchronization depended on the fact
that grace periods were many milliseconds in duration, so that it could
be assumed that no dyntick-idle CPU could reorder a memory reference
across an entire grace period. Unfortunately for this design, the
addition of expedited grace periods breaks this assumption, which has
the unfortunate side-effect of requiring atomic operations in the
functions that track dyntick-idle state for RCU. (There is some hope
that the algorithms used in user-level RCU might be applied here, but
some work is required to handle the NMIs that user-space applications
can happily ignore. For the short term, better safe than sorry.)

This proof assumes that neither compiler nor CPU will allow a lock
acquisition and release to be reordered, as doing so can result in
deadlock. The proof is as follows:

1. A given CPU declares a quiescent state under the protection of
its leaf rcu_node's lock.

2. If there is more than one level of rcu_node hierarchy, the
last CPU to declare a quiescent state will also acquire the
->lock of the next rcu_node up in the hierarchy, but only
after releasing the lower level's lock. The acquisition of this
lock clearly cannot occur prior to the acquisition of the leaf
node's lock.

3. Step 2 repeats until we reach the root rcu_node structure.
Please note again that only one lock is held at a time through
this process. The acquisition of the root rcu_node's ->lock
must occur after the release of that of the leaf rcu_node.

4. At this point, we set the ->completed field in the rcu_state
structure in rcu_report_qs_rsp(). However, if the rcu_node
hierarchy contains only one rcu_node, then in theory the code
preceding the quiescent state could leak into the critical
section. We therefore precede the update of ->completed with a
memory barrier. All CPUs will therefore agree that any updates
preceding any report of a quiescent state will have happened
before the update of ->completed.

5. Regardless of whether a new grace period is needed, rcu_start_gp()
will propagate the new value of ->completed to all of the leaf
rcu_node structures, under the protection of each rcu_node's ->lock.
If a new grace period is needed immediately, this propagation
will occur in the same critical section that ->completed was
set in, but courtesy of the memory barrier in #4 above, is still
seen to follow any pre-quiescent-state activity.

6. When a given CPU invokes __rcu_process_gp_end(), it becomes
aware of the end of the old grace period and therefore makes
any RCU callbacks that were waiting on that grace period eligible
for invocation.

If this CPU is the same one that detected the end of the grace
period, and if there is but a single rcu_node in the hierarchy,
we will still be in the single critical section. In this case,
the memory barrier in step #4 guarantees that all callbacks will
be seen to execute after each CPU's quiescent state.

On the other hand, if this is a different CPU, it will acquire
the leaf rcu_node's ->lock, and will again be serialized after
each CPU's quiescent state for the old grace period.

On the strength of this proof, this commit therefore removes the memory
barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
The effect is to reduce the number of memory barriers by one and to
reduce the frequency of execution from about once per scheduling tick
per CPU to once per grace period.

This was reverted do to hangs found during testing by Yinghai Lu and
Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that
located the underlying problem, and Frederic also provided the fix.

The underlying problem was that the HARDIRQ_ENTER() macro from
lib/locking-selftest.c invoked irq_enter(), which in turn invokes
rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
does not invoke rcu_irq_exit(). This situation resulted in calls
to rcu_irq_enter() that were not balanced by the required calls to
rcu_irq_exit(). Therefore, after these locking selftests completed,
RCU's dyntick-idle nesting count was a large number (for example,
72), which caused RCU to to conclude that the affected CPU was not in
dyntick-idle mode when in fact it was.

RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
in hangs.

In contrast, with Frederic's patch, which replaces the irq_enter()
in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
running the test is already marked as not being in dyntick-idle mode.
This means that the rcu_irq_enter() and rcu_irq_exit() calls and RCU
then has no problem working out which CPUs are in dyntick-idle mode and
which are not.

The reason that the imbalance was not noticed before the barrier patch
was applied is that the old implementation of rcu_enter_nohz() ignored
the nesting depth. This could still result in delays, but much shorter
ones. Whenever there was a delay, RCU would IPI the CPU with the
unbalanced nesting level, which would eventually result in rcu_enter_nohz()
being called, which in turn would force RCU to see that the CPU was in
dyntick-idle mode.

The reason that very few people noticed the problem is that the mismatched
irq_enter() vs. __irq_exit() occured only when the kernel was built with
CONFIG_DEBUG_LOCKING_API_SELFTESTS.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

23b5c8fa

20 5月, 2011 1 次提交

Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" · 80d02085

由 Paul E. McKenney 提交于 5月 12, 2011

This reverts commit e59fb312.

This reversion was due to (extreme) boot-time slowdowns on SPARC seen by
Yinghai Lu and on x86 by Ingo
.
This is a non-trivial reversion due to intervening commits.

Conflicts:

	Documentation/RCU/trace.txt
	kernel/rcutree.c
Signed-off-by: NIngo Molnar <mingo@elte.hu>

80d02085

06 5月, 2011 4 次提交

rcu: Add forward-progress diagnostic for per-CPU kthreads · 5ece5bab

由 Paul E. McKenney 提交于 4月 22, 2011

Increment a per-CPU counter on each pass through rcu_cpu_kthread()'s
service loop, and add it to the rcudata trace output.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

5ece5bab

rcu: add grace-period age and more kthread state to tracing · 15ba0ba8

由 Paul E. McKenney 提交于 4月 06, 2011

This commit adds the age in jiffies of the current grace period along
with the duration in jiffies of the longest grace period since boot
to the rcu/rcugp debugfs file. It also adds an additional "O" state
to kthread tracing to differentiate between the kthread waiting due to
having nothing to do on the one hand and waiting due to being on the
wrong CPU on the other hand.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

15ba0ba8

rcu: update tracing documentation for new rcutorture and rcuboost · 90e6ac36

由 Paul E. McKenney 提交于 4月 06, 2011

This commit documents the new debugfs rcu/rcutorture and rcu/rcuboost
trace files.  The description has been updated as suggested by Josh
Triplett.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

90e6ac36

rcu: add callback-queue information to rcudata output · 0ac3d136

由 Paul E. McKenney 提交于 3月 28, 2011

This commit adds an indication of the state of the callback queue using
a string of four characters following the "ql=" integer queue length.
The first character is "N" if there are callbacks that have been
queued that are not yet ready to be handled by the next grace period, or
"." otherwise. The second character is "R" if there are callbacks queued
that are ready to be handled by the next grace period, or "." otherwise.
The third character is "W" if there are callbacks waiting for the current
grace period, or "." otherwise. Finally, the fourth character is "D"
if there are callbacks that have been handled by a prior grace period
and are waiting to be invoked, or ".".

Note that callbacks that are in the process of being invoked are
not shown. These callbacks would have been removed from the rcu_data
structure's list by rcu_do_batch() prior to being executed. (These
callbacks are also not reflected in the "ql=" total, FWIW.)

Also, document the new callback-queue trace information.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

0ac3d136