提交 · f78f5b90c4ffa559e400c3919a02236101f29f3f · openanolis / cloud-kernel

23 7月, 2015 2 次提交

rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN() · f78f5b90

由 Paul E. McKenney 提交于 6月 18, 2015

This commit renames rcu_lockdep_assert() to RCU_LOCKDEP_WARN() for
consistency with the WARN() series of macros.  This also requires
inverting the sense of the conditional, which this commit also does.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NIngo Molnar <mingo@kernel.org>

f78f5b90

rcu: Fix obsolete priority-boosting comment · bc17ea10

由 Paul E. McKenney 提交于 6月 06, 2015

Tasks are no longer migrated to the root rcu_node, so there is no
longer any need for a boost kthread for the root rcu_node, and there no
longer is such a kthread. This commit therefore fixes the comment in
rcu_boost_kthread()'s header to reflect this new reality.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bc17ea10

19 6月, 2015 1 次提交

timer: Reduce timer migration overhead if disabled · bc7a34b8

由 Thomas Gleixner 提交于 5月 26, 2015

Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.

We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.

The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.

With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.

Before:
  47.76%  hog       [.] main
  14.84%  [kernel]  [k] _raw_spin_lock_irqsave
   9.55%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.71%  [kernel]  [k] mod_timer
   6.24%  [kernel]  [k] lock_timer_base.isra.38
   3.76%  [kernel]  [k] detach_if_pending
   3.71%  [kernel]  [k] del_timer
   2.50%  [kernel]  [k] internal_add_timer
   1.51%  [kernel]  [k] get_nohz_timer_target
   1.28%  [kernel]  [k] __internal_add_timer
   0.78%  [kernel]  [k] timerfn
   0.48%  [kernel]  [k] wake_up_nohz_cpu

After:
  48.10%  hog       [.] main
  15.25%  [kernel]  [k] _raw_spin_lock_irqsave
   9.76%  [kernel]  [k] _raw_spin_unlock_irqrestore
   6.50%  [kernel]  [k] mod_timer
   6.44%  [kernel]  [k] lock_timer_base.isra.38
   3.87%  [kernel]  [k] detach_if_pending
   3.80%  [kernel]  [k] del_timer
   2.67%  [kernel]  [k] internal_add_timer
   1.33%  [kernel]  [k] __internal_add_timer
   0.73%  [kernel]  [k] timerfn
   0.54%  [kernel]  [k] wake_up_nohz_cpu
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

bc7a34b8

28 5月, 2015 12 次提交

rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT_LEAF · 47d631af

由 Paul E. McKenney 提交于 4月 21, 2015

This commit introduces an RCU_FANOUT_LEAF C-preprocessor macro so
that RCU will build even when CONFIG_RCU_FANOUT_LEAF is undefined.
The RCU_FANOUT_LEAF macro is set to the value of CONFIG_RCU_FANOUT_LEAF
when defined, otherwise it is set to 32 for 32-bit systems and 64 for
64-bit systems.  This commit then makes CONFIG_RCU_FANOUT_LEAF depend
on CONFIG_RCU_EXPERT, so that Kconfig users won't be asked about
CONFIG_RCU_FANOUT_LEAF unless they want to be.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

47d631af

rcu: Make RCU able to tolerate undefined CONFIG_RCU_FANOUT · 05c5df31

由 Paul E. McKenney 提交于 4月 20, 2015

This commit introduces an RCU_FANOUT C-preprocessor macro so that RCU will
build even when CONFIG_RCU_FANOUT is undefined. The RCU_FANOUT macro is
set to the value of CONFIG_RCU_FANOUT when defined, otherwise it is set
to 32 for 32-bit systems and 64 for 64-bit systems. This commit then
makes CONFIG_RCU_FANOUT depend on CONFIG_RCU_EXPERT, so that Kconfig
users won't be asked about CONFIG_RCU_FANOUT unless they want to be.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

05c5df31

rcu: Convert CONFIG_RCU_FANOUT_EXACT to boot parameter · 7fa27001

由 Paul E. McKenney 提交于 4月 20, 2015

The CONFIG_RCU_FANOUT_EXACT Kconfig parameter is used primarily (and
perhaps only) by rcutorture to verify that RCU works correctly in specific
rcu_node combining-tree configurations.  It therefore does not make
much sense have this as a question to people attempting to configure
their kernels.  So this commit creates an rcutree.rcu_fanout_exact=
boot parameter that rcutorture can use, and eliminates the original
CONFIG_RCU_FANOUT_EXACT Kconfig parameter.
Reported-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>

7fa27001

rcu: Adjust ->lock acquisition for tasks no longer migrating · 0a0ba1c9

由 Paul E. McKenney 提交于 3月 08, 2015

Tasks are no longer migrated away from a given rcu_node structure
when all CPUs corresponding to that rcu_node structure have gone offline.
This means that rcu_read_unlock_special() no longer needs to loop
retrying rcu_node ->lock acquisition because the current task is
guaranteed to stay put.

This commit takes a small and paranoid step towards relying on this
guarantee by placing a WARN_ON_ONCE() just after the early exit from
the lock-acquisition loop.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0a0ba1c9

rcu: Fix missing task information during rcu-preempt stall · 82efed06

由 Patrick Daly 提交于 4月 07, 2015

The first item list_for_each_entry_continue(alist) iterates over is
alist->next, rather than alist itself. Consequently,
rcu_print_detail_task_stall_rnp() skips the task referenced by gp_tasks.

Use gp_tasks->prev as the argument to list_for_each_entry_continue()
instead.
Signed-off-by: NPatrick Daly <pdaly@codeaurora.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

82efed06

rcu: tree_plugin: Use bool function return values of true/false not 1/0 · 5ce035fb

由 Joe Perches 提交于 3月 30, 2015

Use the normal return values for bool functions
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5ce035fb

rcu: Eliminate a few CONFIG_RCU_NOCB_CPU_ALL #ifdefs · 3382adbc

由 Paul E. McKenney 提交于 3月 04, 2015

This commit converts several CONFIG_RCU_NOCB_CPU_ALL #ifdefs to
instead use IS_ENABLED().  This change should help avoid hiding
code from compiler diagnostics.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

3382adbc

rcu: Create an immutable rcu_data_p pointer to default rcu_data structure · 2927a689

由 Paul E. McKenney 提交于 3月 04, 2015

This commit creates an immutable rcu_data_p pointer that references
rcu_preempt_data for TREE_PREEMPT_RCU builds and that references
rcu_sched_data for TREE_RCU builds. This rcu_data_p pointer will enable
more code to move from #ifdef to IS_ENABLED().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

2927a689

rcu: Tell the compiler that rcu_state_p is immutable · b28a7c01

由 Paul E. McKenney 提交于 3月 04, 2015

This commit adds a "const" tag to the declarations of rcu_state_p,
which should allow the compiler to generate better code and also to
catch erroneous assignments to this variable.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b28a7c01

rcu: Eliminate a few RCU_BOOST #ifdefs in favor of IS_ENABLED() · 727b705b

由 Paul E. McKenney 提交于 3月 03, 2015

This commit removes a few RCU_BOOST #ifdefs, replacing them with
IS_ENABLED()-protected return statements. This relies on the
optimizer to remove any resulting dead code. There are several other
RCU_BOOST #ifdefs, however these rely on some per-CPU variables that
are available only under RCU_BOOST. These might be converted later,
if the simplification proves to outweigh the increase in memory footprint.
One hoped-for advantage is more easily locating compiler errors in
obscure combinations of Kconfig parameters.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-rt-users@vger.kernel.org>

727b705b

rcu: Convert from rcu_preempt_state to *rcu_state_p · e63c887c

由 Paul E. McKenney 提交于 3月 03, 2015

It would be good to move more code from #ifdef to IS_ENABLED(), but
that does not work if the body of the IS_ENABLED() "if" statement
references a variable (such as rcu_preempt_state) that does not
exist if the IS_ENABLED() Kconfig variable is not set.  This commit
therefore substitutes *rcu_state_p for all uses of rcu_preempt_state
in kernel/rcu/tree_preempt.h, which should enable elimination of
a few #ifdefs.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e63c887c

rcu: Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() · 7d0ae808

由 Paul E. McKenney 提交于 3月 03, 2015

This commit moves from the old ACCESS_ONCE() API to the new READ_ONCE()
and WRITE_ONCE() APIs.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck:  Updated to include kernel/torture.c as suggested by Jason Low. ]

7d0ae808

22 4月, 2015 1 次提交

tick: Nohz: Rework next timer evaluation · c1ad348b

由 Thomas Gleixner 提交于 4月 14, 2015

The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.

Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.

Makes the code more readable and gets rid of the jiffies magic in the
nohz code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

c1ad348b

13 3月, 2015 3 次提交

rcu: Process offlining and onlining only at grace-period start · 0aa04b05

由 Paul E. McKenney 提交于 1月 23, 2015

Races between CPU hotplug and grace periods can be difficult to resolve,
so the ->onoff_mutex is used to exclude the two events. Unfortunately,
this means that it is impossible for an outgoing CPU to perform the
last bits of its offlining from its last pass through the idle loop,
because sleeplocks cannot be acquired in that context.

This commit avoids these problems by buffering online and offline events
in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a
grace period starts, the events accumulated in this mask are applied to
the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special
case of all CPUs corresponding to a given leaf rcu_node structure being
offline while there are still elements in that structure's ->blkd_tasks
list is handled using a new ->wait_blkd_tasks field. In this case,
propagating the offline bits up the tree is deferred until the beginning
of the grace period after all of the tasks have exited their RCU read-side
critical sections and removed themselves from the list, at which point
the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node
structure's CPUs comes back online before the list empties, then the
->wait_blkd_tasks flag is simply cleared.

This of course means that RCU's notion of which CPUs are offline can be
out of date. This is OK because RCU need only wait on CPUs that were
online at the time that the grace period started. In addition, RCU's
force-quiescent-state actions will handle the case where a CPU goes
offline after the grace period starts.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0aa04b05

rcu: Move rcu_report_unblock_qs_rnp() to common code · cc99a310

由 Paul E. McKenney 提交于 2月 23, 2015

The rcu_report_unblock_qs_rnp() function is invoked when the
last task blocking the current grace period exits its outermost
RCU read-side critical section. Previously, this was called only
from rcu_read_unlock_special(), and was therefore defined only when
CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when
CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at
the beginnings of RCU grace periods. The reason for this change is that
the last task on a given leaf rcu_node structure's ->blkd_tasks list
might well exit its RCU read-side critical section between the time that
recent CPU-hotplug operations were applied and when the new grace period
was initialized. This situation could result in RCU waiting forever on
that leaf rcu_node structure, because if all that structure's CPUs were
already offline, there would be no quiescent-state events to drive that
structure's part of the grace period.

This commit therefore moves rcu_report_unblock_qs_rnp() to common code
that is built unconditionally so that the quiescent-state-forcing code
can clean up after this situation, avoiding the grace-period stall.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

cc99a310

rcu: Rework preemptible expedited bitmask handling · 8eb74b2b

由 Paul E. McKenney 提交于 2月 13, 2015

Currently, the rcu_node tree ->expmask bitmasks are initially set to
reflect the online CPUs.  This is pointless, because only the CPUs
preempted within RCU read-side critical sections by the preceding
synchronize_sched_expedited() need to be tracked.  This commit therefore
instead sets up these bitmasks based on the state of the ->blkd_tasks
lists.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

8eb74b2b

12 3月, 2015 2 次提交

P
rcu: Eliminate empty HOTPLUG_CPU ifdef · 18c629ea
由 Paul E. McKenney 提交于 1月 19, 2015
```
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
```
18c629ea

rcu: Simplify sync_rcu_preempt_exp_init() · c8aead6a

由 Paul E. McKenney 提交于 1月 19, 2015

This commit eliminates a boolean and associated "if" statement by
rearranging the code.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c8aead6a

04 3月, 2015 4 次提交

P
rcu: Add boot-up check for non-default CONFIG_RCU_FANOUT_LEAF values · a3bd2c09
由 Paul E. McKenney 提交于 1月 21, 2015
```
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
```
a3bd2c09

rcu: Use IS_ENABLED() to simplify rcu_bootup_announce_oddness() · ab6f5bd6

由 Paul E. McKenney 提交于 1月 21, 2015

This commit gets rid of some inline #ifdefs by replacing them with
IS_ENABLED.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ab6f5bd6

rcu: Improve diagnostics for blocked critical sections in irq · d24209bb

由 Paul E. McKenney 提交于 1月 21, 2015

If an RCU read-side critical section occurs within an interrupt handler
or a softirq handler, it cannot have been preempted. Therefore, there is
a check in rcu_read_unlock_special() checking for this error. However,
when this check triggers, it lacks diagnostic information. This commit
therefore moves rcu_read_unlock()'s lockdep annotation to follow the
call to __rcu_read_unlock() and changes rcu_read_unlock_special()'s
WARN_ON_ONCE() to an lockdep_rcu_suspicious() in order to locate where
the offending RCU read-side critical section began. In addition, the
value of the ->rcu_read_unlock_special field is printed.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d24209bb

rcu: Move early-boot callbacks to no-CBs lists for no-CBs CPUs · 34404ca8

由 Paul E. McKenney 提交于 1月 19, 2015

When a CPU is first determined to be a no-CBs CPUs, this commit causes
any early boot callbacks to be moved to the no-CBs callback list,
allowing them to be invoked.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

34404ca8

27 2月, 2015 3 次提交

rcu: Tighten up affinity and check for sysidle · 5871968d

由 Paul E. McKenney 提交于 2月 24, 2015

If the RCU grace-period kthread invoking rcu_sysidle_check_cpu()
happens to be running on the tick_do_timer_cpu initially,
then rcu_bind_gp_kthread() won't bind it. This kthread might
then migrate before invoking rcu_gp_fqs(), which will trigger the
WARN_ON_ONCE() in rcu_sysidle_check_cpu(). This commit therefore makes
rcu_bind_gp_kthread() do the binding even if the kthread is currently
on the same CPU. Because this incurs added overhead, this commit also
causes each RCU grace-period kthread to invoke rcu_bind_gp_kthread()
once at boot rather than at the beginning of each grace period.
And as long as rcu_bind_gp_kthread() is being modified, this commit
eliminates its #ifdef.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5871968d

rcu: Update from rcu_expedited variable to rcu_gp_is_expedited() · 5afff48b

由 Paul E. McKenney 提交于 2月 18, 2015

This commit updates open-coded tests of the rcu_expedited variable
to instead use rcu_gp_is_expedited().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5afff48b

rcu: Refine diagnostics for lacking kthread for no-CBs callbacks · 59f792d1

由 Paul E. McKenney 提交于 1月 19, 2015

Some diagnostics under CONFIG_PROVE_RCU in rcu_nocb_cpu_needs_barrier()
assume that there can be no early-boot callbacks. This commit therefore
qualifies the diagnostic with rcu_scheduler_fully_active to permit
early boot callbacks to avoid this splat.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

59f792d1

14 2月, 2015 1 次提交

rcu: use %*pb[l] to print bitmaps including cpumasks and nodemasks · ad853b48

由 Tejun Heo 提交于 2月 13, 2015

printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ad853b48

12 2月, 2015 1 次提交

rcu: Clear need_qs flag to prevent splat · c0135d07

由 Paul E. McKenney 提交于 1月 22, 2015

If the scheduling-clock interrupt sets the current tasks need_qs flag,
but if the current CPU passes through a quiescent state in the meantime,
then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool
RCU into thinking that additional rcu_read_unlock_special() processing
is needed.  This commit therefore clears the need_qs flag before checking
for additional processing.

For this problem to occur, we need rcu_preempt_data.passed_quiesce equal
to true and current->rcu_read_unlock_special.b.need_qs also equal to true.
This condition can occur as follows:

1.	CPU 0 is aware of the current preemptible RCU grace period,
	but has not yet passed through a quiescent state.  Among other
	things, this means that rcu_preempt_data.passed_quiesce is false.

2.	Task A running on CPU 0 enters a preemptible RCU read-side
	critical section.

3.	CPU 0 takes a scheduling-clock interrupt, which notices the
	RCU read-side critical section and the need for a quiescent state,
	and thus sets current->rcu_read_unlock_special.b.need_qs to true.

4.	Task A is preempted, enters the scheduler, eventually invoking
	rcu_preempt_note_context_switch() which in turn invokes
	rcu_preempt_qs().

	Because rcu_preempt_data.passed_quiesce is false,
	control enters the body of the "if" statement, which sets
	rcu_preempt_data.passed_quiesce to true.

5.	At this point, CPU 0 takes an interrupt.  The interrupt
	handler contains an RCU read-side critical section, and
	the rcu_read_unlock() notes that current->rcu_read_unlock_special
	is nonzero, and thus invokes rcu_read_unlock_special().

6.	Once in rcu_read_unlock_special(), the fact that
	current->rcu_read_unlock_special.b.need_qs is true becomes
	apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs().
	Recursively, given that we interrupted out of that same
	function in the preceding step.

7.	Because rcu_preempt_data.passed_quiesce is now true,
	rcu_preempt_qs() does nothing, and simply returns.

8.	Upon return to rcu_read_unlock_special(), it is noted that
	current->rcu_read_unlock_special is still nonzero (because
	the interrupted rcu_preempt_qs() had not yet gotten around
	to clearing current->rcu_read_unlock_special.b.need_qs).

9.	Execution proceeds to the WARN_ON_ONCE(), which notes that
	we are in an interrupt handler and thus duly splats.

The solution, as noted above, is to make rcu_read_unlock_special()
clear out current->rcu_read_unlock_special.b.need_qs after calling
rcu_preempt_qs().  The interrupted rcu_preempt_qs() will clear it again,
but this is harmless.  The worst that happens is that we clobber another
attempt to set this field, but this is not a problem because we just
got done reporting a quiescent state.
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix embarrassing build bug noted by Sasha Levin. ]
Tested-by: NSasha Levin <sasha.levin@oracle.com>

c0135d07

16 1月, 2015 1 次提交

rcu: Optionally run grace-period kthreads at real-time priority · a94844b2

由 Paul E. McKenney 提交于 12月 12, 2014

Recent testing has shown that under heavy load, running RCU's grace-period
kthreads at real-time priority can improve performance (according to 0day
test robot) and reduce the incidence of RCU CPU stall warnings. However,
most systems do just fine with the default non-realtime priorities for
these kthreads, and it does not make sense to expose the entire user
base to any risk stemming from this change, given that this change is
of use only to a few users running extremely heavy workloads.

Therefore, this commit allows users to specify realtime priorities
for the grace-period kthreads, but leaves them running SCHED_OTHER
by default. The realtime priority may be specified at build time
via the RCU_KTHREAD_PRIO Kconfig parameter, or at boot time via the
rcutree.kthread_prio parameter. Either way, 0 says to continue the
default SCHED_OTHER behavior and values from 1-99 specify that priority
of SCHED_FIFO behavior. Note that a value of 0 is not permitted when
the RCU_BOOST Kconfig parameter is specified.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a94844b2

11 1月, 2015 2 次提交

rcutorture: Check from beginning to end of grace period · 917963d0

由 Paul E. McKenney 提交于 11月 21, 2014

Currently, rcutorture's Reader Batch checks measure from the end of
the previous grace period to the end of the current one. This commit
tightens up these checks by measuring from the start and end of the same
grace period. This involves adding rcu_batches_started() and friends
corresponding to the existing rcu_batches_completed() and friends.

We leave SRCU alone for the moment, as it does not yet have a way of
tracking both ends of its grace periods.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

917963d0

rcu: Make _batches_completed() functions return unsigned long · 9733e4f0

由 Paul E. McKenney 提交于 11月 21, 2014

Long ago, the various ->completed fields were of type long, but now are
unsigned long due to signed-integer-overflow concerns. However, the
various _batches_completed() functions remained of type long, even though
their only purpose in life is to return the corresponding ->completed
field. This patch cleans this up by changing these functions' return
types to unsigned long.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9733e4f0

07 1月, 2015 7 次提交

rcu: Handle gpnum/completed wrap while dyntick idle · e3663b10

由 Paul E. McKenney 提交于 12月 08, 2014

Subtle race conditions can result if a CPU stays in dyntick-idle mode
long enough for the ->gpnum and ->completed fields to wrap.  For
example, consider the following sequence of events:

o	CPU 1 encounters a quiescent state while waiting for grace period
	5 to complete, but then enters dyntick-idle mode.

o	While CPU 1 is in dyntick-idle mode, the grace-period counters
	wrap around so that the grace period number is now 4.

o	Just as CPU 1 exits dyntick-idle mode, grace period 4 completes
	and grace period 5 begins.

o	The quiescent state that CPU 1 passed through during the old
	grace period 5 looks like it applies to the new grace period
	5.  Therefore, the new grace period 5 completes without CPU 1
	having passed through a quiescent state.

This could clearly be a fatal surprise to any long-running RCU read-side
critical section that happened to be running on CPU 1 at the time.  At one
time, this was not a problem, given that it takes significant time for
the grace-period counters to overflow even on 32-bit systems.  However,
with the advent of NO_HZ_FULL and SMP embedded systems, arbitrarily long
idle periods are now becoming quite feasible.  It is therefore time to
close this race.

This commit therefore avoids this race condition by having the
quiescent-state forcing code detect when a CPU is falling too far
behind, and setting a new rcu_data field ->gpwrap when this happens.
Whenever this new ->gpwrap field is set, the CPU's ->gpnum and ->completed
fields are known to be untrustworthy, and can be ignored, along with
any associated quiescent states.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

e3663b10

rcu: Make RCU_CPU_STALL_INFO include number of fqs attempts · fc908ed3

由 Paul E. McKenney 提交于 12月 08, 2014

One way that an RCU CPU stall warning can happen is if the grace-period
kthread is not allowed to execute. One proxy for this kthread's
forward progress is the number of force-quiescent-state (fqs) scans.
This commit therefore adds the number of fqs scans to the RCU CPU stall
warning printouts when CONFIG_RCU_CPU_STALL_INFO=y.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

fc908ed3

rcu: Revert "Allow post-unlock reference for rt_mutex" to avoid priority-inversion · abaf3f9d

由 Lai Jiangshan 提交于 11月 18, 2014

The patch dfeb9765 ("Allow post-unlock reference for rt_mutex")
ensured rcu-boost safe even the rt_mutex has post-unlock reference.

But rt_mutex allowing post-unlock reference is definitely a bug and it was
fixed by the commit 27e35715 ("rtmutex: Plug slow unlock race").
This fix made the previous patch (dfeb9765) useless.

And even worse, the priority-inversion introduced by the the previous
patch still exists.

rcu_read_unlock_special() {
	rt_mutex_unlock(&rnp->boost_mtx);
	/* Priority-Inversion:
	 * the current task had been deboosted and preempted as a low
	 * priority task immediately, it could wait long before reschedule in,
	 * and the rcu-booster also waits on this low priority task and sleeps.
	 * This priority-inversion makes rcu-booster can't work
	 * as expected.
	 */
	complete(&rnp->boost_completion);
}

Just revert the patch to avoid it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

abaf3f9d

rcu: Don't bother affinitying rcub kthreads away from offline CPUs · 5d0b0249

由 Paul E. McKenney 提交于 11月 10, 2014

When rcu_boost_kthread_setaffinity() sees that all CPUs for a given
rcu_node structure are now offline, it affinities the corresponding
RCU-boost ("rcub") kthread away from those CPUs. This is pointless
because the kthread cannot run on those offline CPUs in any case.
This commit therefore removes this unneeded code.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5d0b0249

rcu: Don't spawn rcub kthreads on root rcu_node structure · 3e9f5c70

由 Paul E. McKenney 提交于 11月 03, 2014

Now that offlining CPUs no longer moves leaf rcu_node structures'
->blkd_tasks lists to the root, there is no way for the root rcu_node
structure's ->blkd_task list to be nonempty, unless the root node is also
the sole leaf node.  This commit therefore refrains from creating an rcub
kthread for the root rcu_node structure unless it is also the sole leaf.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

3e9f5c70

rcu: Make use of rcu_preempt_has_tasks() · 96e92021

由 Paul E. McKenney 提交于 10月 31, 2014

Given that there is now arcu_preempt_has_tasks() function that checks
to see if the ->blkd_tasks list is non-empty, this commit makes use of it.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

96e92021

rcu: Don't migrate blocked tasks even if all corresponding CPUs offline · d19fb8d1

由 Paul E. McKenney 提交于 10月 31, 2014

When the last CPU associated with a given leaf rcu_node structure
goes offline, something must be done about the tasks queued on that
rcu_node structure. Each of these tasks has been preempted on one of
the leaf rcu_node structure's CPUs while in an RCU read-side critical
section that it have not yet exited. Handling these tasks is the job of
rcu_preempt_offline_tasks(), which migrates them from the leaf rcu_node
structure to the root rcu_node structure.

Unfortunately, this migration has to be done one task at a time because
each tasks allegiance must be shifted from the original leaf rcu_node to
the root, so that future attempts to deal with these tasks will acquire
the root rcu_node structure's ->lock rather than that of the leaf.
Worse yet, this migration must be done with interrupts disabled, which
is not so good for realtime response, especially given that there is
no bound on the number of tasks on a given rcu_node structure's list.
(OK, OK, there is a bound, it is just that it is unreasonably large,
especially on 64-bit systems.) This was not considered a problem back
when rcu_preempt_offline_tasks() was first written because realtime
systems were assumed not to do CPU-hotplug operations while real-time
applications were running. This assumption has proved of dubious validity
given that people are starting to run multiple realtime applications
on a single SMP system and that it is common practice to offline then
online a CPU before starting its real-time application in order to clear
extraneous processing off of that CPU. So we now need CPU hotplug
operations to avoid undue latencies.

This commit therefore avoids migrating these tasks, instead letting
them be dequeued one by one from the original leaf rcu_node structure
by rcu_read_unlock_special(). This means that the clearing of bits
from the upper-level rcu_node structures must be deferred until the
last such task has been dequeued, because otherwise subsequent grace
periods won't wait on them. This commit has the beneficial side effect
of simplifying the CPU-hotplug code for TREE_PREEMPT_RCU, especially in
CONFIG_RCU_BOOST builds.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d19fb8d1

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功