提交 330e9e46 编写于 作者: L Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:
 "The sole purpose of these changes is to shrink and simplify the RCU
  code base, which has suffered from creeping bloat over the past couple
  of years. The end result is a net removal of ~2700 lines of code:

     79 files changed, 1496 insertions(+), 4211 deletions(-)

  Plus there's a marked reduction in the Kconfig space complexity as
  well, here's the number of matches on 'grep RCU' in the .config:

                               before       after

     x86-defconfig                 17          15
     x86-allmodconfig              33          20"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (86 commits)
  rcu: Remove RCU CPU stall warnings from Tiny RCU
  rcu: Remove event tracing from Tiny RCU
  rcu: Move RCU debug Kconfig options to kernel/rcu
  rcu: Move RCU non-debug Kconfig options to kernel/rcu
  rcu: Eliminate NOCBs CPU-state Kconfig options
  rcu: Remove debugfs tracing
  srcu: Remove Classic SRCU
  srcu: Fix rcutorture-statistics typo
  rcu: Remove SPARSE_RCU_POINTER Kconfig option
  rcu: Remove the now-obsolete PROVE_RCU_REPEATEDLY Kconfig option
  rcu: Remove typecheck() from RCU locking wrapper functions
  rcu: Remove #ifdef moving rcu_end_inkernel_boot from rcupdate.h
  rcu: Remove nohz_full full-system-idle state machine
  rcu: Remove the RCU_KTHREAD_PRIO Kconfig option
  rcu: Remove *_SLOW_* Kconfig options
  srcu: Use rnp->lock wrappers to replace explicit memory barriers
  rcu: Move rnp->lock wrappers for SRCU use
  rcu: Convert rnp->lock wrappers to macros for SRCU use
  rcu: Refactor #includes from include/linux/rcupdate.h
  bcm47xx: Fix build regression
  ...
......@@ -28,8 +28,6 @@ stallwarn.txt
- RCU CPU stall warnings (module parameter rcu_cpu_stall_suppress)
torture.txt
- RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
trace.txt
- CONFIG_RCU_TRACE debugfs files and formats
UP.txt
- RCU on Uniprocessor Systems
whatisRCU.txt
......
......@@ -559,9 +559,7 @@ The <tt>rcu_access_pointer()</tt> on line&nbsp;6 is similar to
For <tt>remove_gp_synchronous()</tt>, as long as all modifications
to <tt>gp</tt> are carried out while holding <tt>gp_lock</tt>,
the above optimizations are harmless.
However,
with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt>,
<tt>sparse</tt> will complain if you
However, <tt>sparse</tt> will complain if you
define <tt>gp</tt> with <tt>__rcu</tt> and then
access it without using
either <tt>rcu_access_pointer()</tt> or <tt>rcu_dereference()</tt>.
......@@ -1849,7 +1847,8 @@ mass storage, or user patience, whichever comes first.
If the nesting is not visible to the compiler, as is the case with
mutually recursive functions each in its own translation unit,
stack overflow will result.
If the nesting takes the form of loops, either the control variable
If the nesting takes the form of loops, perhaps in the guise of tail
recursion, either the control variable
will overflow or (in the Linux kernel) you will get an RCU CPU stall warning.
Nevertheless, this class of RCU implementations is one
of the most composable constructs in existence.
......@@ -1977,9 +1976,8 @@ guard against mishaps and misuse:
and <tt>rcu_dereference()</tt>, perhaps (incorrectly)
substituting a simple assignment.
To catch this sort of error, a given RCU-protected pointer may be
tagged with <tt>__rcu</tt>, after which running sparse
with <tt>CONFIG_SPARSE_RCU_POINTER=y</tt> will complain
about simple-assignment accesses to that pointer.
tagged with <tt>__rcu</tt>, after which sparse
will complain about simple-assignment accesses to that pointer.
Arnd Bergmann made me aware of this requirement, and also
supplied the needed
<a href="https://lwn.net/Articles/376011/">patch series</a>.
......@@ -2036,7 +2034,7 @@ guard against mishaps and misuse:
some other synchronization mechanism, for example, reference
counting.
<li> In kernels built with <tt>CONFIG_RCU_TRACE=y</tt>, RCU-related
information is provided via both debugfs and event tracing.
information is provided via event tracing.
<li> Open-coded use of <tt>rcu_assign_pointer()</tt> and
<tt>rcu_dereference()</tt> to create typical linked
data structures can be surprisingly error-prone.
......@@ -2519,11 +2517,7 @@ It is similarly socially unacceptable to interrupt an
<tt>nohz_full</tt> CPU running in userspace.
RCU must therefore track <tt>nohz_full</tt> userspace
execution.
And in
<a href="https://lwn.net/Articles/558284/"><tt>CONFIG_NO_HZ_FULL_SYSIDLE=y</tt></a>
kernels, RCU must separately track idle CPUs on the one hand and
CPUs that are either idle or executing in userspace on the other.
In both cases, RCU must be able to sample state at two points in
RCU must therefore be able to sample state at two points in
time, and be able to determine whether or not some other CPU spent
any time idle and/or executing in userspace.
......@@ -2935,6 +2929,20 @@ The reason that this is possible is that SRCU is insensitive
to whether or not a CPU is online, which means that <tt>srcu_barrier()</tt>
need not exclude CPU-hotplug operations.
<p>
SRCU also differs from other RCU flavors in that SRCU's expedited and
non-expedited grace periods are implemented by the same mechanism.
This means that in the current SRCU implementation, expediting a
future grace period has the side effect of expediting all prior
grace periods that have not yet completed.
(But please note that this is a property of the current implementation,
not necessarily of future implementations.)
In addition, if SRCU has been idle for longer than the interval
specified by the <tt>srcutree.exp_holdoff</tt> kernel boot parameter
(25&nbsp;microseconds by default),
and if a <tt>synchronize_srcu()</tt> invocation ends this idle period,
that invocation will be automatically expedited.
<p>
As of v4.12, SRCU's callbacks are maintained per-CPU, eliminating
a locking bottleneck present in prior kernel versions.
......
......@@ -413,11 +413,11 @@ over a rather long period of time, but improvements are always welcome!
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
__rcu sparse checks (enabled by CONFIG_SPARSE_RCU_POINTER) to
validate your RCU code. These can help find problems as follows:
17. Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:
CONFIG_PROVE_RCU: check that accesses to RCU-protected data
CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
......
此差异已折叠。
......@@ -3238,21 +3238,17 @@
rcutree.gp_cleanup_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period cleanup. This only has effect
when CONFIG_RCU_TORTURE_TEST_SLOW_CLEANUP is set.
RCU grace-period cleanup.
rcutree.gp_init_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period initialization. This only has
effect when CONFIG_RCU_TORTURE_TEST_SLOW_INIT
is set.
RCU grace-period initialization.
rcutree.gp_preinit_delay= [KNL]
Set the number of jiffies to delay each step of
RCU grace-period pre-initialization, that is,
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree. This only has effect
when CONFIG_RCU_TORTURE_TEST_SLOW_PREINIT is set.
the rcu_node combining tree.
rcutree.rcu_fanout_exact= [KNL]
Disable autobalancing of the rcu_node combining
......@@ -3328,6 +3324,17 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
rcuperf.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
rcuperf.gp_async_max= [KNL]
Specify the maximum number of outstanding
callbacks per writer thread. When a writer
thread exceeds this limit, it invokes the
corresponding flavor of rcu_barrier() to allow
previously posted callbacks to drain.
rcuperf.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
......@@ -3355,17 +3362,22 @@
rcuperf.perf_runnable= [BOOT]
Start rcuperf running at boot time.
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
rcuperf.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
rcuperf.perf_type= [KNL]
Specify the RCU implementation to test.
rcuperf.verbose= [KNL]
Enable additional printk() statements.
rcuperf.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
no holdoff.
rcutorture.cbflood_inter_holdoff= [KNL]
Set holdoff time (jiffies) between successive
callback-flood tests.
......@@ -3803,6 +3815,15 @@
spia_pedr=
spia_peddr=
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
srcu_data structure's ->srcu_gp_seq_needed field.
The greater the number of bits set in this kernel
parameter, the less frequently counter wrap will
be checked for. Note that the bottom two bits
are ignored.
srcutree.exp_holdoff [KNL]
Specifies how many nanoseconds must elapse
since the end of the last SRCU grace period for
......
......@@ -303,6 +303,11 @@ defined which accomplish this::
void smp_mb__before_atomic(void);
void smp_mb__after_atomic(void);
Preceding a non-value-returning read-modify-write atomic operation with
smp_mb__before_atomic() and following it with smp_mb__after_atomic()
provides the same full ordering that is provided by value-returning
read-modify-write atomic operations.
For example, smp_mb__before_atomic() can be used like so::
obj->dead = 1;
......
......@@ -103,9 +103,3 @@ have already built it.
The optional make variable CF can be used to pass arguments to sparse. The
build system passes -Wbitwise to sparse automatically.
Checking RCU annotations
~~~~~~~~~~~~~~~~~~~~~~~~
RCU annotations are not checked by default. To enable RCU annotation
checks, include -DCONFIG_SPARSE_RCU_POINTER in your CF flags.
......@@ -109,13 +109,12 @@ SCHED_SOFTIRQ: Do all of the following:
on that CPU. If a thread that expects to run on the de-jittered
CPU awakens, the scheduler will send an IPI that can result in
a subsequent SCHED_SOFTIRQ.
2. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
CONFIG_NO_HZ_FULL=y, and, in addition, ensure that the CPU
to be de-jittered is marked as an adaptive-ticks CPU using the
"nohz_full=" boot parameter. This reduces the number of
scheduler-clock interrupts that the de-jittered CPU receives,
minimizing its chances of being selected to do the load balancing
work that runs in SCHED_SOFTIRQ context.
2. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be de-jittered
is marked as an adaptive-ticks CPU using the "nohz_full="
boot parameter. This reduces the number of scheduler-clock
interrupts that the de-jittered CPU receives, minimizing its
chances of being selected to do the load balancing work that
runs in SCHED_SOFTIRQ context.
3. To the extent possible, keep the CPU out of the kernel when it
is non-idle, for example, by avoiding system calls and by
forcing both kernel threads and interrupts to execute elsewhere.
......@@ -135,11 +134,10 @@ HRTIMER_SOFTIRQ: Do all of the following:
RCU_SOFTIRQ: Do at least one of the following:
1. Offload callbacks and keep the CPU in either dyntick-idle or
adaptive-ticks state by doing all of the following:
a. Build with CONFIG_RCU_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_ALL=y,
CONFIG_NO_HZ_FULL=y, and, in addition ensure that the CPU
to be de-jittered is marked as an adaptive-ticks CPU using
the "nohz_full=" boot parameter. Bind the rcuo kthreads
to housekeeping CPUs, which can tolerate OS jitter.
a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
de-jittered is marked as an adaptive-ticks CPU using the
"nohz_full=" boot parameter. Bind the rcuo kthreads to
housekeeping CPUs, which can tolerate OS jitter.
b. To the extent possible, keep the CPU out of the kernel
when it is non-idle, for example, by avoiding system
calls and by forcing both kernel threads and interrupts
......@@ -236,11 +234,10 @@ To reduce its OS jitter, do at least one of the following:
is feasible only if your workload never requires RCU priority
boosting, for example, if you ensure frequent idle time on all
CPUs that might execute within the kernel.
3. Build with CONFIG_RCU_NOCB_CPU=y and CONFIG_RCU_NOCB_CPU_ALL=y,
which offloads all RCU callbacks to kthreads that can be moved
off of CPUs susceptible to OS jitter. This approach prevents the
rcuc/%u kthreads from having any work to do, so that they are
never awakened.
3. Build with CONFIG_RCU_NOCB_CPU=y and boot with the rcu_nocbs=
boot parameter offloading RCU callbacks from all CPUs susceptible
to OS jitter. This approach prevents the rcuc/%u kthreads from
having any work to do, so that they are never awakened.
4. Ensure that the CPU never enters the kernel, and, in particular,
avoid initiating any CPU hotplug operations on this CPU. This is
another way of preventing any callbacks from being queued on the
......
......@@ -27,7 +27,7 @@ The purpose of this document is twofold:
(2) to provide a guide as to how to use the barriers that are available.
Note that an architecture can provide more than the minimum requirement
for any particular barrier, but if the architecure provides less than
for any particular barrier, but if the architecture provides less than
that, that architecture is incorrect.
Note also that it is possible that a barrier may be a no-op for an
......
......@@ -194,32 +194,9 @@ that the RCU callbacks are processed in a timely fashion.
Another approach is to offload RCU callback processing to "rcuo" kthreads
using the CONFIG_RCU_NOCB_CPU=y Kconfig option. The specific CPUs to
offload may be selected via several methods:
1. One of three mutually exclusive Kconfig options specify a
build-time default for the CPUs to offload:
a. The CONFIG_RCU_NOCB_CPU_NONE=y Kconfig option results in
no CPUs being offloaded.
b. The CONFIG_RCU_NOCB_CPU_ZERO=y Kconfig option causes
CPU 0 to be offloaded.
c. The CONFIG_RCU_NOCB_CPU_ALL=y Kconfig option causes all
CPUs to be offloaded. Note that the callbacks will be
offloaded to "rcuo" kthreads, and that those kthreads
will in fact run on some CPU. However, this approach
gives fine-grained control on exactly which CPUs the
callbacks run on, along with their scheduling priority
(including the default of SCHED_OTHER), and it further
allows this control to be varied dynamically at runtime.
2. The "rcu_nocbs=" kernel boot parameter, which takes a comma-separated
list of CPUs and CPU ranges, for example, "1,3-5" selects CPUs 1,
3, 4, and 5. The specified CPUs will be offloaded in addition to
any CPUs specified as offloaded by CONFIG_RCU_NOCB_CPU_ZERO=y or
CONFIG_RCU_NOCB_CPU_ALL=y. This means that the "rcu_nocbs=" boot
parameter has no effect for kernels built with RCU_NOCB_CPU_ALL=y.
offload may be selected using The "rcu_nocbs=" kernel boot parameter,
which takes a comma-separated list of CPUs and CPU ranges, for example,
"1,3-5" selects CPUs 1, 3, 4, and 5.
The offloaded CPUs will never queue RCU callbacks, and therefore RCU
never prevents offloaded CPUs from entering either dyntick-idle mode
......
......@@ -8,6 +8,7 @@
#ifndef __BCM47XX_NVRAM_H
#define __BCM47XX_NVRAM_H
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/vmalloc.h>
......
......@@ -17,11 +17,7 @@
# define __release(x) __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
#ifdef CONFIG_SPARSE_RCU_POINTER
# define __rcu __attribute__((noderef, address_space(4)))
#else /* CONFIG_SPARSE_RCU_POINTER */
# define __rcu
#endif /* CONFIG_SPARSE_RCU_POINTER */
# define __private __attribute__((noderef))
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
......
......@@ -7,6 +7,10 @@
* unlimited scalability while maintaining a constant level of contention
* on the root node.
*
* This seemingly RCU-private file must be available to SRCU users
* because the size of the TREE SRCU srcu_struct structure depends
* on these definitions.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
......
/*
* RCU segmented callback lists
*
* This seemingly RCU-private file must be available to SRCU users
* because the size of the TREE SRCU srcu_struct structure depends
* on these definitions.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
......
......@@ -34,104 +34,15 @@
#define __LINUX_RCUPDATE_H
#include <linux/types.h>
#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/threads.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
#include <linux/lockdep.h>
#include <linux/debugobjects.h>
#include <linux/bug.h>
#include <linux/compiler.h>
#include <linux/ktime.h>
#include <linux/atomic.h>
#include <linux/irqflags.h>
#include <linux/preempt.h>
#include <linux/bottom_half.h>
#include <linux/lockdep.h>
#include <asm/processor.h>
#include <linux/cpumask.h>
#include <asm/barrier.h>
#ifndef CONFIG_TINY_RCU
extern int rcu_expedited; /* for sysctl */
extern int rcu_normal; /* also for sysctl */
#endif /* #ifndef CONFIG_TINY_RCU */
#ifdef CONFIG_TINY_RCU
/* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
static inline bool rcu_gp_is_normal(void) /* Internal RCU use. */
{
return true;
}
static inline bool rcu_gp_is_expedited(void) /* Internal RCU use. */
{
return false;
}
static inline void rcu_expedite_gp(void)
{
}
static inline void rcu_unexpedite_gp(void)
{
}
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_gp_is_normal(void); /* Internal RCU use. */
bool rcu_gp_is_expedited(void); /* Internal RCU use. */
void rcu_expedite_gp(void);
void rcu_unexpedite_gp(void);
#endif /* #else #ifdef CONFIG_TINY_RCU */
enum rcutorture_type {
RCU_FLAVOR,
RCU_BH_FLAVOR,
RCU_SCHED_FLAVOR,
RCU_TASKS_FLAVOR,
SRCU_FLAVOR,
INVALID_RCU_FLAVOR
};
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
unsigned long *gpnum, unsigned long *completed);
void rcutorture_record_test_transition(void);
void rcutorture_record_progress(unsigned long vernum);
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
bool rcu_irq_enter_disabled(void);
#else
static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
*flags = 0;
*gpnum = 0;
*completed = 0;
}
static inline void rcutorture_record_test_transition(void)
{
}
static inline void rcutorture_record_progress(unsigned long vernum)
{
}
static inline bool rcu_irq_enter_disabled(void)
{
return false;
}
#ifdef CONFIG_RCU_TRACE
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
#define do_trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#endif
#endif
#define UINT_CMP_GE(a, b) (UINT_MAX / 2 >= (a) - (b))
#define UINT_CMP_LT(a, b) (UINT_MAX / 2 < (a) - (b))
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
#define ulong2long(a) (*(long *)(&(a)))
......@@ -139,115 +50,14 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
/* Exported common interfaces */
#ifdef CONFIG_PREEMPT_RCU
/**
* call_rcu() - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual callback function to be invoked after the grace period
*
* The callback function will be invoked some time after a full grace
* period elapses, in other words after all pre-existing RCU read-side
* critical sections have completed. However, the callback function
* might well execute concurrently with RCU read-side critical sections
* that started after call_rcu() was invoked. RCU read-side critical
* sections are delimited by rcu_read_lock() and rcu_read_unlock(),
* and may be nested.
*
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing RCU read-side critical section. On systems with more
* than one CPU, this means that when "func()" is invoked, each CPU is
* guaranteed to have executed a full memory barrier since the end of its
* last RCU read-side critical section whose beginning preceded the call
* to call_rcu(). It also means that each CPU executing an RCU read-side
* critical section that continues beyond the start of "func()" must have
* executed a memory barrier after the call_rcu() but before the beginning
* of that RCU read-side critical section. Note that these guarantees
* include CPUs that are offline, idle, or executing in user mode, as
* well as CPUs that are executing in the kernel.
*
* Furthermore, if CPU A invoked call_rcu() and CPU B invoked the
* resulting RCU callback function "func()", then both CPU A and CPU B are
* guaranteed to execute a full memory barrier during the time interval
* between the call to call_rcu() and the invocation of "func()" -- even
* if CPU A and CPU B are the same CPU (but again only if the system has
* more than one CPU).
*/
void call_rcu(struct rcu_head *head,
rcu_callback_t func);
void call_rcu(struct rcu_head *head, rcu_callback_t func);
#else /* #ifdef CONFIG_PREEMPT_RCU */
/* In classic RCU, call_rcu() is just call_rcu_sched(). */
#define call_rcu call_rcu_sched
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
/**
* call_rcu_bh() - Queue an RCU for invocation after a quicker grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual callback function to be invoked after the grace period
*
* The callback function will be invoked some time after a full grace
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_bh() assumes
* that the read-side critical sections end on completion of a softirq
* handler. This means that read-side critical sections in process
* context must not be interrupted by softirqs. This interface is to be
* used when most of the read-side critical sections are in softirq context.
* RCU read-side critical sections are delimited by :
* - rcu_read_lock() and rcu_read_unlock(), if in interrupt context.
* OR
* - rcu_read_lock_bh() and rcu_read_unlock_bh(), if in process context.
* These may be nested.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
*/
void call_rcu_bh(struct rcu_head *head,
rcu_callback_t func);
/**
* call_rcu_sched() - Queue an RCU for invocation after sched grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual callback function to be invoked after the grace period
*
* The callback function will be invoked some time after a full grace
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_sched() assumes
* that the read-side critical sections end on enabling of preemption
* or on voluntary preemption.
* RCU read-side critical sections are delimited by :
* - rcu_read_lock_sched() and rcu_read_unlock_sched(),
* OR
* anything that disables preemption.
* These may be nested.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
*/
void call_rcu_sched(struct rcu_head *head,
rcu_callback_t func);
void call_rcu_bh(struct rcu_head *head, rcu_callback_t func);
void call_rcu_sched(struct rcu_head *head, rcu_callback_t func);
void synchronize_sched(void);
/**
* call_rcu_tasks() - Queue an RCU for invocation task-based grace period
* @head: structure to be used for queueing the RCU updates.
* @func: actual callback function to be invoked after the grace period
*
* The callback function will be invoked some time after a full grace
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_tasks() assumes
* that the read-side critical sections end at a voluntary context
* switch (not a preemption!), entry into idle, or transition to usermode
* execution. As such, there are no read-side primitives analogous to
* rcu_read_lock() and rcu_read_unlock() because this primitive is intended
* to determine that all tasks have passed through a safe state, not so
* much for data-strcuture synchronization.
*
* See the description of call_rcu() for more detailed information on
* memory ordering guarantees.
*/
void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
void synchronize_rcu_tasks(void);
void rcu_barrier_tasks(void);
......@@ -301,22 +111,12 @@ void rcu_check_callbacks(int user);
void rcu_report_dead(unsigned int cpu);
void rcu_cpu_starting(unsigned int cpu);
#ifndef CONFIG_TINY_RCU
void rcu_end_inkernel_boot(void);
#else /* #ifndef CONFIG_TINY_RCU */
static inline void rcu_end_inkernel_boot(void) { }
#endif /* #ifndef CONFIG_TINY_RCU */
#ifdef CONFIG_RCU_STALL_COMMON
void rcu_sysrq_start(void);
void rcu_sysrq_end(void);
#else /* #ifdef CONFIG_RCU_STALL_COMMON */
static inline void rcu_sysrq_start(void)
{
}
static inline void rcu_sysrq_end(void)
{
}
static inline void rcu_sysrq_start(void) { }
static inline void rcu_sysrq_end(void) { }
#endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */
#ifdef CONFIG_NO_HZ_FULL
......@@ -330,9 +130,7 @@ static inline void rcu_user_exit(void) { }
#ifdef CONFIG_RCU_NOCB_CPU
void rcu_init_nohz(void);
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
static inline void rcu_init_nohz(void)
{
}
static inline void rcu_init_nohz(void) { }
#endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
/**
......@@ -397,10 +195,6 @@ do { \
rcu_note_voluntary_context_switch(current); \
} while (0)
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP)
bool __rcu_is_watching(void);
#endif /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) || defined(CONFIG_SMP) */
/*
* Infrastructure to implement the synchronize_() primitives in
* TREE_RCU and rcu_barrier_() primitives in TINY_RCU.
......@@ -414,10 +208,6 @@ bool __rcu_is_watching(void);
#error "Unknown RCU implementation specified to kernel configuration"
#endif
#define RCU_SCHEDULER_INACTIVE 0
#define RCU_SCHEDULER_INIT 1
#define RCU_SCHEDULER_RUNNING 2
/*
* init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
* initialization and destruction of rcu_head on the stack. rcu_head structures
......@@ -430,30 +220,16 @@ void destroy_rcu_head(struct rcu_head *head);
void init_rcu_head_on_stack(struct rcu_head *head);
void destroy_rcu_head_on_stack(struct rcu_head *head);
#else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
static inline void init_rcu_head(struct rcu_head *head)
{
}
static inline void destroy_rcu_head(struct rcu_head *head)
{
}
static inline void init_rcu_head_on_stack(struct rcu_head *head)
{
}
static inline void destroy_rcu_head_on_stack(struct rcu_head *head)
{
}
static inline void init_rcu_head(struct rcu_head *head) { }
static inline void destroy_rcu_head(struct rcu_head *head) { }
static inline void init_rcu_head_on_stack(struct rcu_head *head) { }
static inline void destroy_rcu_head_on_stack(struct rcu_head *head) { }
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU)
bool rcu_lockdep_current_cpu_online(void);
#else /* #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */
static inline bool rcu_lockdep_current_cpu_online(void)
{
return true;
}
static inline bool rcu_lockdep_current_cpu_online(void) { return true; }
#endif /* #else #if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PROVE_RCU) */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......@@ -473,18 +249,8 @@ extern struct lockdep_map rcu_bh_lock_map;
extern struct lockdep_map rcu_sched_lock_map;
extern struct lockdep_map rcu_callback_map;
int debug_lockdep_rcu_enabled(void);
int rcu_read_lock_held(void);
int rcu_read_lock_bh_held(void);
/**
* rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
* RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise.
*/
int rcu_read_lock_sched_held(void);
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
......@@ -531,9 +297,7 @@ static inline void rcu_preempt_sleep_check(void)
"Illegal context switch in RCU read-side critical section");
}
#else /* #ifdef CONFIG_PROVE_RCU */
static inline void rcu_preempt_sleep_check(void)
{
}
static inline void rcu_preempt_sleep_check(void) { }
#endif /* #else #ifdef CONFIG_PROVE_RCU */
#define rcu_sleep_check() \
......@@ -1084,52 +848,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define kfree_rcu(ptr, rcu_head) \
__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
#ifdef CONFIG_TINY_RCU
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
{
*nextevt = KTIME_MAX;
return 0;
}
#endif /* #ifdef CONFIG_TINY_RCU */
#if defined(CONFIG_RCU_NOCB_CPU_ALL)
static inline bool rcu_is_nocb_cpu(int cpu) { return true; }
#elif defined(CONFIG_RCU_NOCB_CPU)
bool rcu_is_nocb_cpu(int cpu);
#else
static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
#endif
/* Only for use by adaptive-ticks code. */
#ifdef CONFIG_NO_HZ_FULL_SYSIDLE
bool rcu_sys_is_idle(void);
void rcu_sysidle_force_exit(void);
#else /* #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
static inline bool rcu_sys_is_idle(void)
{
return false;
}
static inline void rcu_sysidle_force_exit(void)
{
}
#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
/*
* Dump the ftrace buffer, but only one time per callsite per boot.
*/
#define rcu_ftrace_dump(oops_dump_mode) \
do { \
static atomic_t ___rfd_beenhere = ATOMIC_INIT(0); \
\
if (!atomic_read(&___rfd_beenhere) && \
!atomic_xchg(&___rfd_beenhere, 1)) \
ftrace_dump(oops_dump_mode); \
} while (0)
/*
* Place this after a lock-acquisition primitive to guarantee that
......
......@@ -25,7 +25,7 @@
#ifndef __LINUX_TINY_H
#define __LINUX_TINY_H
#include <linux/cache.h>
#include <linux/ktime.h>
struct rcu_dynticks;
static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
......@@ -33,10 +33,8 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
return 0;
}
static inline bool rcu_eqs_special_set(int cpu)
{
return false; /* Never flag non-existent other CPUs! */
}
/* Never flag non-existent other CPUs! */
static inline bool rcu_eqs_special_set(int cpu) { return false; }
static inline unsigned long get_state_synchronize_rcu(void)
{
......@@ -98,159 +96,38 @@ static inline void kfree_call_rcu(struct rcu_head *head,
rcu_note_voluntary_context_switch_lite(current); \
} while (0)
/*
* Take advantage of the fact that there is only one CPU, which
* allows us to ignore virtualization-based context switches.
*/
static inline void rcu_virt_note_context_switch(int cpu)
{
}
/*
* Return the number of grace periods started.
*/
static inline unsigned long rcu_batches_started(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods started.
*/
static inline unsigned long rcu_batches_started_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods started.
*/
static inline unsigned long rcu_batches_started_sched(void)
{
return 0;
}
/*
* Return the number of grace periods completed.
*/
static inline unsigned long rcu_batches_completed(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods completed.
*/
static inline unsigned long rcu_batches_completed_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods completed.
*/
static inline unsigned long rcu_batches_completed_sched(void)
static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
{
*nextevt = KTIME_MAX;
return 0;
}
/*
* Return the number of expedited grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed(void)
{
return 0;
}
/*
* Return the number of expedited sched grace periods completed.
* Take advantage of the fact that there is only one CPU, which
* allows us to ignore virtualization-based context switches.
*/
static inline unsigned long rcu_exp_batches_completed_sched(void)
{
return 0;
}
static inline void rcu_force_quiescent_state(void)
{
}
static inline void rcu_bh_force_quiescent_state(void)
{
}
static inline void rcu_sched_force_quiescent_state(void)
{
}
static inline void show_rcu_gp_kthreads(void)
{
}
static inline void rcu_cpu_stall_reset(void)
{
}
static inline void rcu_idle_enter(void)
{
}
static inline void rcu_idle_exit(void)
{
}
static inline void rcu_irq_enter(void)
{
}
static inline void rcu_irq_exit_irqson(void)
{
}
static inline void rcu_irq_enter_irqson(void)
{
}
static inline void rcu_irq_exit(void)
{
}
static inline void exit_rcu(void)
{
}
static inline void rcu_virt_note_context_switch(int cpu) { }
static inline void rcu_cpu_stall_reset(void) { }
static inline void rcu_idle_enter(void) { }
static inline void rcu_idle_exit(void) { }
static inline void rcu_irq_enter(void) { }
static inline bool rcu_irq_enter_disabled(void) { return false; }
static inline void rcu_irq_exit_irqson(void) { }
static inline void rcu_irq_enter_irqson(void) { }
static inline void rcu_irq_exit(void) { }
static inline void exit_rcu(void) { }
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU)
extern int rcu_scheduler_active __read_mostly;
void rcu_scheduler_starting(void);
#else /* #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
static inline void rcu_scheduler_starting(void)
{
}
static inline void rcu_scheduler_starting(void) { }
#endif /* #else #if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_SRCU) */
static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_is_watching(void) { return true; }
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE)
static inline bool rcu_is_watching(void)
{
return __rcu_is_watching();
}
#else /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
static inline bool rcu_is_watching(void)
{
return true;
}
#endif /* #else defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
static inline void rcu_request_urgent_qs_task(struct task_struct *t)
{
}
static inline void rcu_all_qs(void)
{
barrier(); /* Avoid RCU read-side critical sections leaking across. */
}
/* Avoid RCU read-side critical sections leaking across. */
static inline void rcu_all_qs(void) { barrier(); }
/* RCUtree hotplug events */
#define rcutree_prepare_cpu NULL
......
......@@ -79,37 +79,20 @@ void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
void cond_synchronize_sched(unsigned long oldstate);
extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum;
unsigned long rcu_batches_started(void);
unsigned long rcu_batches_started_bh(void);
unsigned long rcu_batches_started_sched(void);
unsigned long rcu_batches_completed(void);
unsigned long rcu_batches_completed_bh(void);
unsigned long rcu_batches_completed_sched(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
void show_rcu_gp_kthreads(void);
void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
void rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
void rcu_irq_enter_irqson(void);
void rcu_irq_exit_irqson(void);
bool rcu_irq_enter_disabled(void);
void exit_rcu(void);
void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly;
void rcu_end_inkernel_boot(void);
bool rcu_is_watching(void);
void rcu_request_urgent_qs_task(struct task_struct *t);
void rcu_all_qs(void);
/* RCUtree hotplug events */
......
......@@ -369,6 +369,26 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
raw_spin_trylock_irqsave(spinlock_check(lock), flags); \
})
/**
* spin_unlock_wait - Interpose between successive critical sections
* @lock: the spinlock whose critical sections are to be interposed.
*
* Semantically this is equivalent to a spin_lock() immediately
* followed by a spin_unlock(). However, most architectures have
* more efficient implementations in which the spin_unlock_wait()
* cannot block concurrent lock acquisition, and in some cases
* where spin_unlock_wait() does not write to the lock variable.
* Nevertheless, spin_unlock_wait() can have high overhead, so if
* you feel the need to use it, please check to see if there is
* a better way to get your job done.
*
* The ordering guarantees provided by spin_unlock_wait() are:
*
* 1. All accesses preceding the spin_unlock_wait() happen before
* any accesses in later critical sections for this same lock.
* 2. All accesses following the spin_unlock_wait() happen after
* any accesses in earlier critical sections for this same lock.
*/
static __always_inline void spin_unlock_wait(spinlock_t *lock)
{
raw_spin_unlock_wait(&lock->rlock);
......
......@@ -60,32 +60,15 @@ int init_srcu_struct(struct srcu_struct *sp);
#include <linux/srcutiny.h>
#elif defined(CONFIG_TREE_SRCU)
#include <linux/srcutree.h>
#elif defined(CONFIG_CLASSIC_SRCU)
#include <linux/srcuclassic.h>
#else
#elif defined(CONFIG_SRCU)
#error "Unknown SRCU implementation specified to kernel configuration"
#else
/* Dummy definition for things like notifiers. Actual use gets link error. */
struct srcu_struct { };
#endif
/**
* call_srcu() - Queue a callback for invocation after an SRCU grace period
* @sp: srcu_struct in queue the callback
* @head: structure to be used for queueing the SRCU callback.
* @func: function to be invoked after the SRCU grace period
*
* The callback function will be invoked some time after a full SRCU
* grace period elapses, in other words after all pre-existing SRCU
* read-side critical sections have completed. However, the callback
* function might well execute concurrently with other SRCU read-side
* critical sections that started after call_srcu() was invoked. SRCU
* read-side critical sections are delimited by srcu_read_lock() and
* srcu_read_unlock(), and may be nested.
*
* The callback will be invoked from process context, but must nevertheless
* be fast and must not block.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
void (*func)(struct rcu_head *head));
void cleanup_srcu_struct(struct srcu_struct *sp);
int __srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void __srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
......
/*
* Sleepable Read-Copy Update mechanism for mutual exclusion,
* classic v4.11 variant.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, you can access it online at
* http://www.gnu.org/licenses/gpl-2.0.html.
*
* Copyright (C) IBM Corporation, 2017
*
* Author: Paul McKenney <paulmck@us.ibm.com>
*/
#ifndef _LINUX_SRCU_CLASSIC_H
#define _LINUX_SRCU_CLASSIC_H
struct srcu_array {
unsigned long lock_count[2];
unsigned long unlock_count[2];
};
struct rcu_batch {
struct rcu_head *head, **tail;
};
#define RCU_BATCH_INIT(name) { NULL, &(name.head) }
struct srcu_struct {
unsigned long completed;
struct srcu_array __percpu *per_cpu_ref;
spinlock_t queue_lock; /* protect ->batch_queue, ->running */
bool running;
/* callbacks just queued */
struct rcu_batch batch_queue;
/* callbacks try to do the first check_zero */
struct rcu_batch batch_check0;
/* callbacks done with the first check_zero and the flip */
struct rcu_batch batch_check1;
struct rcu_batch batch_done;
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.completed = -300, \
.per_cpu_ref = &name##_srcu_array, \
.queue_lock = __SPIN_LOCK_UNLOCKED(name.queue_lock), \
.running = false, \
.batch_queue = RCU_BATCH_INIT(name.batch_queue), \
.batch_check0 = RCU_BATCH_INIT(name.batch_check0), \
.batch_check1 = RCU_BATCH_INIT(name.batch_check1), \
.batch_done = RCU_BATCH_INIT(name.batch_done), \
.work = __DELAYED_WORK_INITIALIZER(name.work, process_srcu, 0),\
__SRCU_DEP_MAP_INIT(name) \
}
/*
* Define and initialize a srcu struct at build time.
* Do -not- call init_srcu_struct() nor cleanup_srcu_struct() on it.
*
* Note that although DEFINE_STATIC_SRCU() hides the name from other
* files, the per-CPU variable rules nevertheless require that the
* chosen name be globally unique. These rules also prohibit use of
* DEFINE_STATIC_SRCU() within a function. If these rules are too
* restrictive, declare the srcu_struct manually. For example, in
* each file:
*
* static struct srcu_struct my_srcu;
*
* Then, before the first use of each my_srcu, manually initialize it:
*
* init_srcu_struct(&my_srcu);
*
* See include/linux/percpu-defs.h for the rules on per-CPU variables.
*/
#define __DEFINE_SRCU(name, is_static) \
static DEFINE_PER_CPU(struct srcu_array, name##_srcu_array);\
is_static struct srcu_struct name = __SRCU_STRUCT_INIT(name)
#define DEFINE_SRCU(name) __DEFINE_SRCU(name, /* not static */)
#define DEFINE_STATIC_SRCU(name) __DEFINE_SRCU(name, static)
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->completed;
*gpnum = *completed;
if (sp->batch_queue.head || sp->batch_check0.head || sp->batch_check0.head)
(*gpnum)++;
}
#endif
......@@ -27,15 +27,14 @@
#include <linux/swait.h>
struct srcu_struct {
int srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */
short srcu_idx; /* Current reader array element. */
u8 srcu_gp_running; /* GP workqueue running? */
u8 srcu_gp_waiting; /* GP waiting for readers? */
struct swait_queue_head srcu_wq;
/* Last srcu_read_unlock() wakes GP. */
unsigned long srcu_gp_seq; /* GP seq # for callback tagging. */
struct rcu_segcblist srcu_cblist;
/* Pending SRCU callbacks. */
int srcu_idx; /* Current reader array element. */
bool srcu_gp_running; /* GP workqueue running? */
bool srcu_gp_waiting; /* GP waiting for readers? */
struct rcu_head *srcu_cb_head; /* Pending callbacks: Head. */
struct rcu_head **srcu_cb_tail; /* Pending callbacks: Tail. */
struct work_struct srcu_work; /* For driving grace periods. */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
......@@ -47,7 +46,7 @@ void srcu_drive_gp(struct work_struct *wp);
#define __SRCU_STRUCT_INIT(name) \
{ \
.srcu_wq = __SWAIT_QUEUE_HEAD_INITIALIZER(name.srcu_wq), \
.srcu_cblist = RCU_SEGCBLIST_INITIALIZER(name.srcu_cblist), \
.srcu_cb_tail = &name.srcu_cb_head, \
.srcu_work = __WORK_INITIALIZER(name.srcu_work, srcu_drive_gp), \
__SRCU_DEP_MAP_INIT(name) \
}
......@@ -63,31 +62,29 @@ void srcu_drive_gp(struct work_struct *wp);
void synchronize_srcu(struct srcu_struct *sp);
static inline void synchronize_srcu_expedited(struct srcu_struct *sp)
/*
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Can be invoked from irq/bh handlers, but the matching
* __srcu_read_unlock() must be in the same handler instance. Returns an
* index that must be passed to the matching srcu_read_unlock().
*/
static inline int __srcu_read_lock(struct srcu_struct *sp)
{
synchronize_srcu(sp);
}
int idx;
static inline void srcu_barrier(struct srcu_struct *sp)
{
synchronize_srcu(sp);
idx = READ_ONCE(sp->srcu_idx);
WRITE_ONCE(sp->srcu_lock_nesting[idx], sp->srcu_lock_nesting[idx] + 1);
return idx;
}
static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
static inline void synchronize_srcu_expedited(struct srcu_struct *sp)
{
return 0;
synchronize_srcu(sp);
}
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
static inline void srcu_barrier(struct srcu_struct *sp)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->srcu_gp_seq;
*gpnum = *completed;
synchronize_srcu(sp);
}
#endif
......@@ -40,7 +40,7 @@ struct srcu_data {
unsigned long srcu_unlock_count[2]; /* Unlocks per CPU. */
/* Update-side state. */
spinlock_t lock ____cacheline_internodealigned_in_smp;
raw_spinlock_t __private lock ____cacheline_internodealigned_in_smp;
struct rcu_segcblist srcu_cblist; /* List of callbacks.*/
unsigned long srcu_gp_seq_needed; /* Furthest future GP needed. */
unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */
......@@ -58,7 +58,7 @@ struct srcu_data {
* Node in SRCU combining tree, similar in function to rcu_data.
*/
struct srcu_node {
spinlock_t lock;
raw_spinlock_t __private lock;
unsigned long srcu_have_cbs[4]; /* GP seq for children */
/* having CBs, but only */
/* is > ->srcu_gq_seq. */
......@@ -78,7 +78,7 @@ struct srcu_struct {
struct srcu_node *level[RCU_NUM_LVLS + 1];
/* First node at each level. */
struct mutex srcu_cb_mutex; /* Serialize CB preparation. */
spinlock_t gp_lock; /* protect ->srcu_cblist */
raw_spinlock_t __private lock; /* Protect counters */
struct mutex srcu_gp_mutex; /* Serialize GP work. */
unsigned int srcu_idx; /* Current rdr array element. */
unsigned long srcu_gp_seq; /* Grace-period seq #. */
......@@ -109,7 +109,7 @@ void process_srcu(struct work_struct *work);
#define __SRCU_STRUCT_INIT(name) \
{ \
.sda = &name##_srcu_data, \
.gp_lock = __SPIN_LOCK_UNLOCKED(name.gp_lock), \
.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
.srcu_gp_seq_needed = 0 - 1, \
__SRCU_DEP_MAP_INIT(name) \
}
......@@ -141,10 +141,5 @@ void process_srcu(struct work_struct *work);
void synchronize_srcu_expedited(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum, unsigned long *completed);
#endif
......@@ -742,6 +742,7 @@ TRACE_EVENT(rcu_torture_read,
* "OnlineQ": _rcu_barrier() found online CPU with callbacks.
* "OnlineNQ": _rcu_barrier() found online CPU, no callbacks.
* "IRQ": An rcu_barrier_callback() callback posted on remote CPU.
* "IRQNQ": An rcu_barrier_callback() callback found no callbacks.
* "CB": An rcu_barrier_callback() invoked a callback, not the last.
* "LastCB": An rcu_barrier_callback() invoked the last callback.
* "Inc2": _rcu_barrier() piggyback check counter incremented.
......
......@@ -472,354 +472,7 @@ config TASK_IO_ACCOUNTING
endmenu # "CPU/Task time and stats accounting"
menu "RCU Subsystem"
config TREE_RCU
bool
default y if !PREEMPT && SMP
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
thousands of CPUs. It also scales down nicely to
smaller systems.
config PREEMPT_RCU
bool
default y if PREEMPT
help
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
is also required. It also scales down nicely to
smaller systems.
Select this option if you are unsure.
config TINY_RCU
bool
default y if !PREEMPT && !SMP
help
This option selects the RCU implementation that is
designed for UP systems from which real-time response
is not required. This option greatly reduces the
memory footprint of RCU.
config RCU_EXPERT
bool "Make expert-level adjustments to RCU configuration"
default n
help
This option needs to be enabled if you wish to make
expert-level adjustments to RCU configuration. By default,
no such adjustments can be made, which has the often-beneficial
side-effect of preventing "make oldconfig" from asking you all
sorts of detailed questions about how you would like numerous
obscure RCU options to be set up.
Say Y if you need to make expert-level adjustments to RCU.
Say N if you are unsure.
config SRCU
bool
default y
help
This option selects the sleepable version of RCU. This version
permits arbitrary sleeping or blocking within RCU read-side critical
sections.
config CLASSIC_SRCU
bool "Use v4.11 classic SRCU implementation"
default n
depends on RCU_EXPERT && SRCU
help
This option selects the traditional well-tested classic SRCU
implementation from v4.11, as might be desired for enterprise
Linux distributions. Without this option, the shiny new
Tiny SRCU and Tree SRCU implementations are used instead.
At some point, it is hoped that Tiny SRCU and Tree SRCU
will accumulate enough test time and confidence to allow
Classic SRCU to be dropped entirely.
Say Y if you need a rock-solid SRCU.
Say N if you would like help test Tree SRCU.
config TINY_SRCU
bool
default y if SRCU && TINY_RCU && !CLASSIC_SRCU
help
This option selects the single-CPU non-preemptible version of SRCU.
config TREE_SRCU
bool
default y if SRCU && !TINY_RCU && !CLASSIC_SRCU
help
This option selects the full-fledged version of SRCU.
config TASKS_RCU
bool
default n
select SRCU
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states.
config RCU_STALL_COMMON
def_bool ( TREE_RCU || PREEMPT_RCU || RCU_TRACE )
help
This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow
the tiny variants to disable RCU CPU stall warnings, while
making these warnings mandatory for the tree variants.
config RCU_NEED_SEGCBLIST
def_bool ( TREE_RCU || PREEMPT_RCU || TINY_SRCU || TREE_SRCU )
config CONTEXT_TRACKING
bool
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
default y if !NO_HZ_FULL
help
The major pre-requirement for full dynticks to work is to
support the context tracking subsystem. But there are also
other dependencies to provide in order to make the full
dynticks working.
This option stands for testing when an arch implements the
context tracking backend but doesn't yet fullfill all the
requirements to make the full dynticks feature working.
Without the full dynticks, there is no way to test the support
for context tracking and the subsystems that rely on it: RCU
userspace extended quiescent state and tickless cputime
accounting. This option copes with the absence of the full
dynticks subsystem by forcing the context tracking on all
CPUs in the system.
Say Y only if you're working on the development of an
architecture backend for the context tracking.
Say N otherwise, this option brings an overhead that you
don't want in production.
config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 64 if 64BIT
default 32 if !64BIT
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
large numbers of CPUs. This value must be at least the fourth
root of NR_CPUS, which allows NR_CPUS to be insanely large.
The default value of RCU_FANOUT should be used for production
systems, but if you are stress-testing the RCU implementation
itself, small RCU_FANOUT values allow you to test large-system
code paths on small(er) systems.
Select a specific number if testing RCU itself.
Take the default if unsure.
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 16
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
against lock contention. Systems that synchronize their
scheduling-clock interrupts for energy-efficiency reasons will
want the default because the smaller leaf-level fanout keeps
lock contention levels acceptably low. Very large systems
(hundreds or thousands of CPUs) will instead want to set this
value to the maximum value possible in order to reduce the
number of cache misses incurred during RCU's grace-period
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well. That said, setting leaf-level
fanout to a large number will likely cause problematic
lock contention on the leaf-level rcu_node structures unless
you boot with the skew_tick kernel parameter.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems, but
please understand that you may also need to set the skew_tick
kernel boot parameter to avoid contention on the rcu_node
structure's locks.
Take the default if unsure.
config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
depends on NO_HZ_COMMON && SMP && RCU_EXPERT
default n
help
This option permits CPUs to enter dynticks-idle state even if
they have RCU callbacks queued, and prevents RCU from waking
these CPUs up more than roughly once every four jiffies (by
default, you can adjust this using the rcutree.rcu_idle_gp_delay
parameter), thus improving energy efficiency. On the other
hand, this option increases the duration of RCU grace periods,
for example, slowing down synchronize_rcu().
Say Y if energy efficiency is critically important, and you
don't care about increased grace-period durations.
Say N if you are unsure.
config TREE_RCU_TRACE
def_bool RCU_TRACE && ( TREE_RCU || PREEMPT_RCU )
select DEBUG_FS
help
This option provides tracing for the TREE_RCU and
PREEMPT_RCU implementations, permitting Makefile to
trivially select kernel/rcutree_trace.c.
config RCU_BOOST
bool "Enable RCU priority boosting"
depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
default n
help
This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long.
This option also prevents heavy loads from blocking RCU
callback invocation for all flavors of RCU.
Say Y here if you are working with real-time apps or heavy loads
Say N here if you are unsure.
config RCU_KTHREAD_PRIO
int "Real-time priority to use for RCU worker threads"
range 1 99 if RCU_BOOST
range 0 99 if !RCU_BOOST
default 1 if RCU_BOOST
default 0 if !RCU_BOOST
depends on RCU_EXPERT
help
This option specifies the SCHED_FIFO priority value that will be
assigned to the rcuc/n and rcub/n threads and is also the value
used for RCU_BOOST (if enabled). If you are working with a
real-time application that has one or more CPU-bound threads
running at a real-time priority level, you should set
RCU_KTHREAD_PRIO to a priority higher than the highest-priority
real-time CPU-bound application thread. The default RCU_KTHREAD_PRIO
value of 1 is appropriate in the common case, which is real-time
applications that do not have any CPU-bound threads.
Some real-time applications might not have a single real-time
thread that saturates a given CPU, but instead might have
multiple real-time threads that, taken together, fully utilize
that CPU. In this case, you should set RCU_KTHREAD_PRIO to
a priority higher than the lowest-priority thread that is
conspiring to prevent the CPU from running any non-real-time
tasks. For example, if one thread at priority 10 and another
thread at priority 5 are between themselves fully consuming
the CPU time on a given CPU, then RCU_KTHREAD_PRIO should be
set to priority 6 or higher.
Specify the real-time priority, or take the default if unsure.
config RCU_BOOST_DELAY
int "Milliseconds to delay boosting after RCU grace-period start"
range 0 3000
depends on RCU_BOOST
default 500
help
This option specifies the time to wait after the beginning of
a given grace period before priority-boosting preempted RCU
readers blocking that grace period. Note that any RCU reader
blocking an expedited RCU grace period is boosted immediately.
Accept the default if unsure.
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU
depends on RCU_EXPERT || NO_HZ_FULL
default n
help
Use this option to reduce OS jitter for aggressive HPC or
real-time workloads. It can also be used to offload RCU
callback invocation to energy-efficient CPUs in battery-powered
asymmetric multiprocessors.
This option offloads callback invocation from the set of
CPUs specified at boot time by the rcu_nocbs parameter.
For each such CPU, a kthread ("rcuox/N") will be created to
invoke callbacks, where the "N" is the CPU being offloaded,
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
"s" for RCU-sched. Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired.
Say Y here if you want to help to debug reduced OS jitter.
Say N here if you are unsure.
choice
prompt "Build-forced no-CBs CPUs"
default RCU_NOCB_CPU_NONE
depends on RCU_NOCB_CPU
help
This option allows no-CBs CPUs (whose RCU callbacks are invoked
from kthreads rather than from softirq context) to be specified
at build time. Additional no-CBs CPUs may be specified by
the rcu_nocbs= boot parameter.
config RCU_NOCB_CPU_NONE
bool "No build_forced no-CBs CPUs"
help
This option does not force any of the CPUs to be no-CBs CPUs.
Only CPUs designated by the rcu_nocbs= boot parameter will be
no-CBs CPUs, whose RCU callbacks will be invoked by per-CPU
kthreads whose names begin with "rcuo". All other CPUs will
invoke their own RCU callbacks in softirq context.
Select this option if you want to choose no-CBs CPUs at
boot time, for example, to allow testing of different no-CBs
configurations without having to rebuild the kernel each time.
config RCU_NOCB_CPU_ZERO
bool "CPU 0 is a build_forced no-CBs CPU"
help
This option forces CPU 0 to be a no-CBs CPU, so that its RCU
callbacks are invoked by a per-CPU kthread whose name begins
with "rcuo". Additional CPUs may be designated as no-CBs
CPUs using the rcu_nocbs= boot parameter will be no-CBs CPUs.
All other CPUs will invoke their own RCU callbacks in softirq
context.
Select this if CPU 0 needs to be a no-CBs CPU for real-time
or energy-efficiency reasons, but the real reason it exists
is to ensure that randconfig testing covers mixed systems.
config RCU_NOCB_CPU_ALL
bool "All CPUs are build_forced no-CBs CPUs"
help
This option forces all CPUs to be no-CBs CPUs. The rcu_nocbs=
boot parameter will be ignored. All CPUs' RCU callbacks will
be executed in the context of per-CPU rcuo kthreads created for
this purpose. Assuming that the kthreads whose names start with
"rcuo" are bound to "housekeeping" CPUs, this reduces OS jitter
on the remaining CPUs, but might decrease memory locality during
RCU-callback invocation, thus potentially degrading throughput.
Select this if all CPUs need to be no-CBs CPUs for real-time
or energy-efficiency reasons.
endchoice
endmenu # "RCU Subsystem"
source "kernel/rcu/Kconfig"
config BUILD_BIN2C
bool
......
......@@ -1157,18 +1157,18 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
if (debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("======================================================\n");
pr_warn("WARNING: possible circular locking dependency detected\n");
print_kernel_ident();
pr_warn("------------------------------------------------------\n");
printk("%s/%d is trying to acquire lock:\n",
pr_warn("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(check_src);
printk("\nbut task is already holding lock:\n");
pr_warn("\nbut task is already holding lock:\n");
print_lock(check_tgt);
printk("\nwhich lock already depends on the new lock.\n\n");
printk("\nthe existing dependency chain (in reverse order) is:\n");
pr_warn("\nwhich lock already depends on the new lock.\n\n");
pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
print_circular_bug_entry(entry, depth);
......@@ -1495,13 +1495,13 @@ print_bad_irq_dependency(struct task_struct *curr,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("=====================================================\n");
pr_warn("WARNING: %s-safe -> %s-unsafe lock order detected\n",
irqclass, irqclass);
print_kernel_ident();
pr_warn("-----------------------------------------------------\n");
printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
curr->comm, task_pid_nr(curr),
curr->hardirq_context, hardirq_count() >> HARDIRQ_SHIFT,
curr->softirq_context, softirq_count() >> SOFTIRQ_SHIFT,
......@@ -1509,46 +1509,46 @@ print_bad_irq_dependency(struct task_struct *curr,
curr->softirqs_enabled);
print_lock(next);
printk("\nand this task is already holding:\n");
pr_warn("\nand this task is already holding:\n");
print_lock(prev);
printk("which would create a new lock dependency:\n");
pr_warn("which would create a new lock dependency:\n");
print_lock_name(hlock_class(prev));
printk(KERN_CONT " ->");
pr_cont(" ->");
print_lock_name(hlock_class(next));
printk(KERN_CONT "\n");
pr_cont("\n");
printk("\nbut this new dependency connects a %s-irq-safe lock:\n",
pr_warn("\nbut this new dependency connects a %s-irq-safe lock:\n",
irqclass);
print_lock_name(backwards_entry->class);
printk("\n... which became %s-irq-safe at:\n", irqclass);
pr_warn("\n... which became %s-irq-safe at:\n", irqclass);
print_stack_trace(backwards_entry->class->usage_traces + bit1, 1);
printk("\nto a %s-irq-unsafe lock:\n", irqclass);
pr_warn("\nto a %s-irq-unsafe lock:\n", irqclass);
print_lock_name(forwards_entry->class);
printk("\n... which became %s-irq-unsafe at:\n", irqclass);
printk("...");
pr_warn("\n... which became %s-irq-unsafe at:\n", irqclass);
pr_warn("...");
print_stack_trace(forwards_entry->class->usage_traces + bit2, 1);
printk("\nother info that might help us debug this:\n\n");
pr_warn("\nother info that might help us debug this:\n\n");
print_irq_lock_scenario(backwards_entry, forwards_entry,
hlock_class(prev), hlock_class(next));
lockdep_print_held_locks(curr);
printk("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
pr_warn("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
if (!save_trace(&prev_root->trace))
return 0;
print_shortest_lock_dependencies(backwards_entry, prev_root);
printk("\nthe dependencies between the lock to be acquired");
printk(" and %s-irq-unsafe lock:\n", irqclass);
pr_warn("\nthe dependencies between the lock to be acquired");
pr_warn(" and %s-irq-unsafe lock:\n", irqclass);
if (!save_trace(&next_root->trace))
return 0;
print_shortest_lock_dependencies(forwards_entry, next_root);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -1724,22 +1724,22 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("============================================\n");
pr_warn("WARNING: possible recursive locking detected\n");
print_kernel_ident();
pr_warn("--------------------------------------------\n");
printk("%s/%d is trying to acquire lock:\n",
pr_warn("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(next);
printk("\nbut task is already holding lock:\n");
pr_warn("\nbut task is already holding lock:\n");
print_lock(prev);
printk("\nother info that might help us debug this:\n");
pr_warn("\nother info that might help us debug this:\n");
print_deadlock_scenario(next, prev);
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -2074,21 +2074,21 @@ static void print_collision(struct task_struct *curr,
struct held_lock *hlock_next,
struct lock_chain *chain)
{
printk("\n");
pr_warn("\n");
pr_warn("============================\n");
pr_warn("WARNING: chain_key collision\n");
print_kernel_ident();
pr_warn("----------------------------\n");
printk("%s/%d: ", current->comm, task_pid_nr(current));
printk("Hash chain already cached but the contents don't match!\n");
pr_warn("%s/%d: ", current->comm, task_pid_nr(current));
pr_warn("Hash chain already cached but the contents don't match!\n");
printk("Held locks:");
pr_warn("Held locks:");
print_chain_keys_held_locks(curr, hlock_next);
printk("Locks in cached chain:");
pr_warn("Locks in cached chain:");
print_chain_keys_chain(chain);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
}
#endif
......@@ -2373,16 +2373,16 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("================================\n");
pr_warn("WARNING: inconsistent lock state\n");
print_kernel_ident();
pr_warn("--------------------------------\n");
printk("inconsistent {%s} -> {%s} usage.\n",
pr_warn("inconsistent {%s} -> {%s} usage.\n",
usage_str[prev_bit], usage_str[new_bit]);
printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
pr_warn("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] takes:\n",
curr->comm, task_pid_nr(curr),
trace_hardirq_context(curr), hardirq_count() >> HARDIRQ_SHIFT,
trace_softirq_context(curr), softirq_count() >> SOFTIRQ_SHIFT,
......@@ -2390,16 +2390,16 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
trace_softirqs_enabled(curr));
print_lock(this);
printk("{%s} state was registered at:\n", usage_str[prev_bit]);
pr_warn("{%s} state was registered at:\n", usage_str[prev_bit]);
print_stack_trace(hlock_class(this)->usage_traces + prev_bit, 1);
print_irqtrace_events(curr);
printk("\nother info that might help us debug this:\n");
pr_warn("\nother info that might help us debug this:\n");
print_usage_bug_scenario(this);
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -2438,28 +2438,28 @@ print_irq_inversion_bug(struct task_struct *curr,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("========================================================\n");
pr_warn("WARNING: possible irq lock inversion dependency detected\n");
print_kernel_ident();
pr_warn("--------------------------------------------------------\n");
printk("%s/%d just changed the state of lock:\n",
pr_warn("%s/%d just changed the state of lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(this);
if (forwards)
printk("but this lock took another, %s-unsafe lock in the past:\n", irqclass);
pr_warn("but this lock took another, %s-unsafe lock in the past:\n", irqclass);
else
printk("but this lock was taken by another, %s-safe lock in the past:\n", irqclass);
pr_warn("but this lock was taken by another, %s-safe lock in the past:\n", irqclass);
print_lock_name(other->class);
printk("\n\nand interrupts could create inverse lock ordering between them.\n\n");
pr_warn("\n\nand interrupts could create inverse lock ordering between them.\n\n");
printk("\nother info that might help us debug this:\n");
pr_warn("\nother info that might help us debug this:\n");
/* Find a middle lock (if one exists) */
depth = get_lock_depth(other);
do {
if (depth == 0 && (entry != root)) {
printk("lockdep:%s bad path found in chain graph\n", __func__);
pr_warn("lockdep:%s bad path found in chain graph\n", __func__);
break;
}
middle = entry;
......@@ -2475,12 +2475,12 @@ print_irq_inversion_bug(struct task_struct *curr,
lockdep_print_held_locks(curr);
printk("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
pr_warn("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
if (!save_trace(&root->trace))
return 0;
print_shortest_lock_dependencies(other, root);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -3189,25 +3189,25 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
if (debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("==================================\n");
pr_warn("WARNING: Nested lock was not taken\n");
print_kernel_ident();
pr_warn("----------------------------------\n");
printk("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
pr_warn("%s/%d is trying to lock:\n", curr->comm, task_pid_nr(curr));
print_lock(hlock);
printk("\nbut this task is not holding:\n");
printk("%s\n", hlock->nest_lock->name);
pr_warn("\nbut this task is not holding:\n");
pr_warn("%s\n", hlock->nest_lock->name);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
printk("\nother info that might help us debug this:\n");
pr_warn("\nother info that might help us debug this:\n");
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -3402,21 +3402,21 @@ print_unlock_imbalance_bug(struct task_struct *curr, struct lockdep_map *lock,
if (debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("=====================================\n");
pr_warn("WARNING: bad unlock balance detected!\n");
print_kernel_ident();
pr_warn("-------------------------------------\n");
printk("%s/%d is trying to release lock (",
pr_warn("%s/%d is trying to release lock (",
curr->comm, task_pid_nr(curr));
print_lockdep_cache(lock);
printk(KERN_CONT ") at:\n");
pr_cont(") at:\n");
print_ip_sym(ip);
printk("but there are no more locks to release!\n");
printk("\nother info that might help us debug this:\n");
pr_warn("but there are no more locks to release!\n");
pr_warn("\nother info that might help us debug this:\n");
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -3974,21 +3974,21 @@ print_lock_contention_bug(struct task_struct *curr, struct lockdep_map *lock,
if (debug_locks_silent)
return 0;
printk("\n");
pr_warn("\n");
pr_warn("=================================\n");
pr_warn("WARNING: bad contention detected!\n");
print_kernel_ident();
pr_warn("---------------------------------\n");
printk("%s/%d is trying to contend lock (",
pr_warn("%s/%d is trying to contend lock (",
curr->comm, task_pid_nr(curr));
print_lockdep_cache(lock);
printk(KERN_CONT ") at:\n");
pr_cont(") at:\n");
print_ip_sym(ip);
printk("but there are no locks held!\n");
printk("\nother info that might help us debug this:\n");
pr_warn("but there are no locks held!\n");
pr_warn("\nother info that might help us debug this:\n");
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
return 0;
......@@ -4318,17 +4318,17 @@ print_freed_lock_bug(struct task_struct *curr, const void *mem_from,
if (debug_locks_silent)
return;
printk("\n");
pr_warn("\n");
pr_warn("=========================\n");
pr_warn("WARNING: held lock freed!\n");
print_kernel_ident();
pr_warn("-------------------------\n");
printk("%s/%d is freeing memory %p-%p, with a lock still held there!\n",
pr_warn("%s/%d is freeing memory %p-%p, with a lock still held there!\n",
curr->comm, task_pid_nr(curr), mem_from, mem_to-1);
print_lock(hlock);
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
}
......@@ -4376,14 +4376,14 @@ static void print_held_locks_bug(void)
if (debug_locks_silent)
return;
printk("\n");
pr_warn("\n");
pr_warn("====================================\n");
pr_warn("WARNING: %s/%d still has locks held!\n",
current->comm, task_pid_nr(current));
print_kernel_ident();
pr_warn("------------------------------------\n");
lockdep_print_held_locks(current);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
}
......@@ -4402,10 +4402,10 @@ void debug_show_all_locks(void)
int unlock = 1;
if (unlikely(!debug_locks)) {
printk("INFO: lockdep is turned off.\n");
pr_warn("INFO: lockdep is turned off.\n");
return;
}
printk("\nShowing all locks held in the system:\n");
pr_warn("\nShowing all locks held in the system:\n");
/*
* Here we try to get the tasklist_lock as hard as possible,
......@@ -4416,18 +4416,18 @@ void debug_show_all_locks(void)
retry:
if (!read_trylock(&tasklist_lock)) {
if (count == 10)
printk("hm, tasklist_lock locked, retrying... ");
pr_warn("hm, tasklist_lock locked, retrying... ");
if (count) {
count--;
printk(" #%d", 10-count);
pr_cont(" #%d", 10-count);
mdelay(200);
goto retry;
}
printk(" ignoring it.\n");
pr_cont(" ignoring it.\n");
unlock = 0;
} else {
if (count != 10)
printk(KERN_CONT " locked it.\n");
pr_cont(" locked it.\n");
}
do_each_thread(g, p) {
......@@ -4445,7 +4445,7 @@ void debug_show_all_locks(void)
unlock = 1;
} while_each_thread(g, p);
printk("\n");
pr_warn("\n");
pr_warn("=============================================\n\n");
if (unlock)
......@@ -4475,12 +4475,12 @@ asmlinkage __visible void lockdep_sys_exit(void)
if (unlikely(curr->lockdep_depth)) {
if (!debug_locks_off())
return;
printk("\n");
pr_warn("\n");
pr_warn("================================================\n");
pr_warn("WARNING: lock held when returning to user space!\n");
print_kernel_ident();
pr_warn("------------------------------------------------\n");
printk("%s/%d is leaving the kernel with locks still held!\n",
pr_warn("%s/%d is leaving the kernel with locks still held!\n",
curr->comm, curr->pid);
lockdep_print_held_locks(curr);
}
......@@ -4490,19 +4490,15 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
{
struct task_struct *curr = current;
#ifndef CONFIG_PROVE_RCU_REPEATEDLY
if (!debug_locks_off())
return;
#endif /* #ifdef CONFIG_PROVE_RCU_REPEATEDLY */
/* Note: the following can be executed concurrently, so be careful. */
printk("\n");
pr_warn("\n");
pr_warn("=============================\n");
pr_warn("WARNING: suspicious RCU usage\n");
print_kernel_ident();
pr_warn("-----------------------------\n");
printk("%s:%d %s!\n", file, line, s);
printk("\nother info that might help us debug this:\n\n");
printk("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
pr_warn("%s:%d %s!\n", file, line, s);
pr_warn("\nother info that might help us debug this:\n\n");
pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
!rcu_lockdep_current_cpu_online()
? "RCU used illegally from offline CPU!\n"
: !rcu_is_watching()
......@@ -4529,10 +4525,10 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
* rcu_read_lock_bh() and so on from extended quiescent states.
*/
if (!rcu_is_watching())
printk("RCU used illegally from extended quiescent state!\n");
pr_warn("RCU used illegally from extended quiescent state!\n");
lockdep_print_held_locks(curr);
printk("\nstack backtrace:\n");
pr_warn("\nstack backtrace:\n");
dump_stack();
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
#
# RCU-related configuration options
#
menu "RCU Subsystem"
config TREE_RCU
bool
default y if !PREEMPT && SMP
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
thousands of CPUs. It also scales down nicely to
smaller systems.
config PREEMPT_RCU
bool
default y if PREEMPT
help
This option selects the RCU implementation that is
designed for very large SMP systems with hundreds or
thousands of CPUs, but for which real-time response
is also required. It also scales down nicely to
smaller systems.
Select this option if you are unsure.
config TINY_RCU
bool
default y if !PREEMPT && !SMP
help
This option selects the RCU implementation that is
designed for UP systems from which real-time response
is not required. This option greatly reduces the
memory footprint of RCU.
config RCU_EXPERT
bool "Make expert-level adjustments to RCU configuration"
default n
help
This option needs to be enabled if you wish to make
expert-level adjustments to RCU configuration. By default,
no such adjustments can be made, which has the often-beneficial
side-effect of preventing "make oldconfig" from asking you all
sorts of detailed questions about how you would like numerous
obscure RCU options to be set up.
Say Y if you need to make expert-level adjustments to RCU.
Say N if you are unsure.
config SRCU
bool
help
This option selects the sleepable version of RCU. This version
permits arbitrary sleeping or blocking within RCU read-side critical
sections.
config TINY_SRCU
bool
default y if SRCU && TINY_RCU
help
This option selects the single-CPU non-preemptible version of SRCU.
config TREE_SRCU
bool
default y if SRCU && !TINY_RCU
help
This option selects the full-fledged version of SRCU.
config TASKS_RCU
bool
default n
select SRCU
help
This option enables a task-based RCU implementation that uses
only voluntary context switch (not preemption!), idle, and
user-mode execution as quiescent states.
config RCU_STALL_COMMON
def_bool ( TREE_RCU || PREEMPT_RCU )
help
This option enables RCU CPU stall code that is common between
the TINY and TREE variants of RCU. The purpose is to allow
the tiny variants to disable RCU CPU stall warnings, while
making these warnings mandatory for the tree variants.
config RCU_NEED_SEGCBLIST
def_bool ( TREE_RCU || PREEMPT_RCU || TREE_SRCU )
config CONTEXT_TRACKING
bool
config CONTEXT_TRACKING_FORCE
bool "Force context tracking"
depends on CONTEXT_TRACKING
default y if !NO_HZ_FULL
help
The major pre-requirement for full dynticks to work is to
support the context tracking subsystem. But there are also
other dependencies to provide in order to make the full
dynticks working.
This option stands for testing when an arch implements the
context tracking backend but doesn't yet fullfill all the
requirements to make the full dynticks feature working.
Without the full dynticks, there is no way to test the support
for context tracking and the subsystems that rely on it: RCU
userspace extended quiescent state and tickless cputime
accounting. This option copes with the absence of the full
dynticks subsystem by forcing the context tracking on all
CPUs in the system.
Say Y only if you're working on the development of an
architecture backend for the context tracking.
Say N otherwise, this option brings an overhead that you
don't want in production.
config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 64 if 64BIT
default 32 if !64BIT
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
large numbers of CPUs. This value must be at least the fourth
root of NR_CPUS, which allows NR_CPUS to be insanely large.
The default value of RCU_FANOUT should be used for production
systems, but if you are stress-testing the RCU implementation
itself, small RCU_FANOUT values allow you to test large-system
code paths on small(er) systems.
Select a specific number if testing RCU itself.
Take the default if unsure.
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 64 if 64BIT
range 2 32 if !64BIT
depends on (TREE_RCU || PREEMPT_RCU) && RCU_EXPERT
default 16
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
against lock contention. Systems that synchronize their
scheduling-clock interrupts for energy-efficiency reasons will
want the default because the smaller leaf-level fanout keeps
lock contention levels acceptably low. Very large systems
(hundreds or thousands of CPUs) will instead want to set this
value to the maximum value possible in order to reduce the
number of cache misses incurred during RCU's grace-period
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well. That said, setting leaf-level
fanout to a large number will likely cause problematic
lock contention on the leaf-level rcu_node structures unless
you boot with the skew_tick kernel parameter.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems, but
please understand that you may also need to set the skew_tick
kernel boot parameter to avoid contention on the rcu_node
structure's locks.
Take the default if unsure.
config RCU_FAST_NO_HZ
bool "Accelerate last non-dyntick-idle CPU's grace periods"
depends on NO_HZ_COMMON && SMP && RCU_EXPERT
default n
help
This option permits CPUs to enter dynticks-idle state even if
they have RCU callbacks queued, and prevents RCU from waking
these CPUs up more than roughly once every four jiffies (by
default, you can adjust this using the rcutree.rcu_idle_gp_delay
parameter), thus improving energy efficiency. On the other
hand, this option increases the duration of RCU grace periods,
for example, slowing down synchronize_rcu().
Say Y if energy efficiency is critically important, and you
don't care about increased grace-period durations.
Say N if you are unsure.
config RCU_BOOST
bool "Enable RCU priority boosting"
depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT
default n
help
This option boosts the priority of preempted RCU readers that
block the current preemptible RCU grace period for too long.
This option also prevents heavy loads from blocking RCU
callback invocation for all flavors of RCU.
Say Y here if you are working with real-time apps or heavy loads
Say N here if you are unsure.
config RCU_BOOST_DELAY
int "Milliseconds to delay boosting after RCU grace-period start"
range 0 3000
depends on RCU_BOOST
default 500
help
This option specifies the time to wait after the beginning of
a given grace period before priority-boosting preempted RCU
readers blocking that grace period. Note that any RCU reader
blocking an expedited RCU grace period is boosted immediately.
Accept the default if unsure.
config RCU_NOCB_CPU
bool "Offload RCU callback processing from boot-selected CPUs"
depends on TREE_RCU || PREEMPT_RCU
depends on RCU_EXPERT || NO_HZ_FULL
default n
help
Use this option to reduce OS jitter for aggressive HPC or
real-time workloads. It can also be used to offload RCU
callback invocation to energy-efficient CPUs in battery-powered
asymmetric multiprocessors.
This option offloads callback invocation from the set of
CPUs specified at boot time by the rcu_nocbs parameter.
For each such CPU, a kthread ("rcuox/N") will be created to
invoke callbacks, where the "N" is the CPU being offloaded,
and where the "x" is "b" for RCU-bh, "p" for RCU-preempt, and
"s" for RCU-sched. Nothing prevents this kthread from running
on the specified CPUs, but (1) the kthreads may be preempted
between each callback, and (2) affinity or cgroups can be used
to force the kthreads to run on whatever set of CPUs is desired.
Say Y here if you want to help to debug reduced OS jitter.
Say N here if you are unsure.
endmenu # "RCU Subsystem"
#
# RCU-related debugging configuration options
#
menu "RCU Debugging"
config PROVE_RCU
def_bool PROVE_LOCKING
config TORTURE_TEST
tristate
default n
config RCU_PERF_TEST
tristate "performance tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
default n
help
This option provides a kernel module that runs performance
tests on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU performance tests to be built into
the kernel.
Say M if you want the RCU performance tests to build as a module.
Say N if you are unsure.
config RCU_TORTURE_TEST
tristate "torture tests for RCU"
depends on DEBUG_KERNEL
select TORTURE_TEST
select SRCU
select TASKS_RCU
default n
help
This option provides a kernel module that runs torture tests
on the RCU infrastructure. The kernel module may be built
after the fact on the running kernel to be tested, if desired.
Say Y here if you want RCU torture tests to be built into
the kernel.
Say M if you want the RCU torture tests to build as a module.
Say N if you are unsure.
config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON
range 3 300
default 21
help
If a given RCU grace period extends more than the specified
number of seconds, a CPU stall warning is printed. If the
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.
config RCU_TRACE
bool "Enable tracing for RCU"
depends on DEBUG_KERNEL
default y if TREE_RCU
select TRACE_CLOCK
help
This option enables additional tracepoints for ftrace-style
event tracing.
Say Y here if you want to enable RCU tracing
Say N if you are unsure.
config RCU_EQS_DEBUG
bool "Provide debugging asserts for adding NO_HZ support to an arch"
depends on DEBUG_KERNEL
help
This option provides consistency checks in RCU's handling of
NO_HZ. These checks have proven quite helpful in detecting
bugs in arch-specific NO_HZ code.
Say N here if you need ultimate kernel/user switch latencies
Say Y if you are unsure
endmenu # "RCU Debugging"
......@@ -3,13 +3,11 @@
KCOV_INSTRUMENT := n
obj-y += update.o sync.o
obj-$(CONFIG_CLASSIC_SRCU) += srcu.o
obj-$(CONFIG_TREE_SRCU) += srcutree.o
obj-$(CONFIG_TINY_SRCU) += srcutiny.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
obj-$(CONFIG_RCU_PERF_TEST) += rcuperf.o
obj-$(CONFIG_TREE_RCU) += tree.o
obj-$(CONFIG_PREEMPT_RCU) += tree.o
obj-$(CONFIG_TREE_RCU_TRACE) += tree_trace.o
obj-$(CONFIG_TINY_RCU) += tiny.o
obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o
......@@ -212,6 +212,18 @@ int rcu_jiffies_till_stall_check(void);
*/
#define TPS(x) tracepoint_string(x)
/*
* Dump the ftrace buffer, but only one time per callsite per boot.
*/
#define rcu_ftrace_dump(oops_dump_mode) \
do { \
static atomic_t ___rfd_beenhere = ATOMIC_INIT(0); \
\
if (!atomic_read(&___rfd_beenhere) && \
!atomic_xchg(&___rfd_beenhere, 1)) \
ftrace_dump(oops_dump_mode); \
} while (0)
void rcu_early_boot_tests(void);
void rcu_test_sync_prims(void);
......@@ -291,6 +303,271 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))
/*
* Wrappers for the rcu_node::lock acquire and release.
*
* Because the rcu_nodes form a tree, the tree traversal locking will observe
* different lock values, this in turn means that an UNLOCK of one level
* followed by a LOCK of another level does not imply a full memory barrier;
* and most importantly transitivity is lost.
*
* In order to restore full ordering between tree levels, augment the regular
* lock acquire functions with smp_mb__after_unlock_lock().
*
* As ->lock of struct rcu_node is a __private field, therefore one should use
* these wrappers rather than directly call raw_spin_{lock,unlock}* on ->lock.
*/
#define raw_spin_lock_rcu_node(p) \
do { \
raw_spin_lock(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock))
#define raw_spin_lock_irq_rcu_node(p) \
do { \
raw_spin_lock_irq(&ACCESS_PRIVATE(p, lock)); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_irq_rcu_node(p) \
raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
#define raw_spin_lock_irqsave_rcu_node(p, flags) \
do { \
raw_spin_lock_irqsave(&ACCESS_PRIVATE(p, lock), flags); \
smp_mb__after_unlock_lock(); \
} while (0)
#define raw_spin_unlock_irqrestore_rcu_node(p, flags) \
raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) \
#define raw_spin_trylock_rcu_node(p) \
({ \
bool ___locked = raw_spin_trylock(&ACCESS_PRIVATE(p, lock)); \
\
if (___locked) \
smp_mb__after_unlock_lock(); \
___locked; \
})
#endif /* #if defined(SRCU) || !defined(TINY_RCU) */
#ifdef CONFIG_TINY_RCU
/* Tiny RCU doesn't expedite, as its purpose in life is instead to be tiny. */
static inline bool rcu_gp_is_normal(void) /* Internal RCU use. */
{
return true;
}
static inline bool rcu_gp_is_expedited(void) /* Internal RCU use. */
{
return false;
}
static inline void rcu_expedite_gp(void)
{
}
static inline void rcu_unexpedite_gp(void)
{
}
#else /* #ifdef CONFIG_TINY_RCU */
bool rcu_gp_is_normal(void); /* Internal RCU use. */
bool rcu_gp_is_expedited(void); /* Internal RCU use. */
void rcu_expedite_gp(void);
void rcu_unexpedite_gp(void);
void rcupdate_announce_bootup_oddness(void);
#endif /* #else #ifdef CONFIG_TINY_RCU */
#define RCU_SCHEDULER_INACTIVE 0
#define RCU_SCHEDULER_INIT 1
#define RCU_SCHEDULER_RUNNING 2
#ifdef CONFIG_TINY_RCU
static inline void rcu_request_urgent_qs_task(struct task_struct *t) { }
#else /* #ifdef CONFIG_TINY_RCU */
void rcu_request_urgent_qs_task(struct task_struct *t);
#endif /* #else #ifdef CONFIG_TINY_RCU */
enum rcutorture_type {
RCU_FLAVOR,
RCU_BH_FLAVOR,
RCU_SCHED_FLAVOR,
RCU_TASKS_FLAVOR,
SRCU_FLAVOR,
INVALID_RCU_FLAVOR
};
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_PREEMPT_RCU)
void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
unsigned long *gpnum, unsigned long *completed);
void rcutorture_record_test_transition(void);
void rcutorture_record_progress(unsigned long vernum);
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
*flags = 0;
*gpnum = 0;
*completed = 0;
}
static inline void rcutorture_record_test_transition(void)
{
}
static inline void rcutorture_record_progress(unsigned long vernum)
{
}
#ifdef CONFIG_RCU_TRACE
void do_trace_rcu_torture_read(const char *rcutorturename,
struct rcu_head *rhp,
unsigned long secs,
unsigned long c_old,
unsigned long c);
#else
#define do_trace_rcu_torture_read(rcutorturename, rhp, secs, c_old, c) \
do { } while (0)
#endif
#endif
#ifdef CONFIG_TINY_SRCU
static inline void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum,
unsigned long *completed)
{
if (test_type != SRCU_FLAVOR)
return;
*flags = 0;
*completed = sp->srcu_idx;
*gpnum = *completed;
}
#elif defined(CONFIG_TREE_SRCU)
void srcutorture_get_gp_data(enum rcutorture_type test_type,
struct srcu_struct *sp, int *flags,
unsigned long *gpnum, unsigned long *completed);
#endif
#ifdef CONFIG_TINY_RCU
/*
* Return the number of grace periods started.
*/
static inline unsigned long rcu_batches_started(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods started.
*/
static inline unsigned long rcu_batches_started_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods started.
*/
static inline unsigned long rcu_batches_started_sched(void)
{
return 0;
}
/*
* Return the number of grace periods completed.
*/
static inline unsigned long rcu_batches_completed(void)
{
return 0;
}
/*
* Return the number of bottom-half grace periods completed.
*/
static inline unsigned long rcu_batches_completed_bh(void)
{
return 0;
}
/*
* Return the number of sched grace periods completed.
*/
static inline unsigned long rcu_batches_completed_sched(void)
{
return 0;
}
/*
* Return the number of expedited grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed(void)
{
return 0;
}
/*
* Return the number of expedited sched grace periods completed.
*/
static inline unsigned long rcu_exp_batches_completed_sched(void)
{
return 0;
}
static inline unsigned long srcu_batches_completed(struct srcu_struct *sp)
{
return 0;
}
static inline void rcu_force_quiescent_state(void)
{
}
static inline void rcu_bh_force_quiescent_state(void)
{
}
static inline void rcu_sched_force_quiescent_state(void)
{
}
static inline void show_rcu_gp_kthreads(void)
{
}
#else /* #ifdef CONFIG_TINY_RCU */
extern unsigned long rcutorture_testseq;
extern unsigned long rcutorture_vernum;
unsigned long rcu_batches_started(void);
unsigned long rcu_batches_started_bh(void);
unsigned long rcu_batches_started_sched(void);
unsigned long rcu_batches_completed(void);
unsigned long rcu_batches_completed_bh(void);
unsigned long rcu_batches_completed_sched(void);
unsigned long rcu_exp_batches_completed(void);
unsigned long rcu_exp_batches_completed_sched(void);
unsigned long srcu_batches_completed(struct srcu_struct *sp);
void show_rcu_gp_kthreads(void);
void rcu_force_quiescent_state(void);
void rcu_bh_force_quiescent_state(void);
void rcu_sched_force_quiescent_state(void);
#endif /* #else #ifdef CONFIG_TINY_RCU */
#ifdef CONFIG_RCU_NOCB_CPU
bool rcu_is_nocb_cpu(int cpu);
#else
static inline bool rcu_is_nocb_cpu(int cpu) { return false; }
#endif
#endif /* __LINUX_RCU_H */
......@@ -48,6 +48,8 @@
#include <linux/torture.h>
#include <linux/vmalloc.h>
#include "rcu.h"
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
......@@ -59,12 +61,16 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.vnet.ibm.com>");
#define VERBOSE_PERFOUT_ERRSTRING(s) \
do { if (verbose) pr_alert("%s" PERF_FLAG "!!! %s\n", perf_type, s); } while (0)
torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives");
torture_param(int, gp_async_max, 1000, "Max # outstanding waits per reader");
torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
torture_param(int, nreaders, -1, "Number of RCU reader threads");
torture_param(int, nreaders, 0, "Number of RCU reader threads");
torture_param(int, nwriters, -1, "Number of RCU updater threads");
torture_param(bool, shutdown, false, "Shutdown at end of performance tests.");
torture_param(bool, shutdown, !IS_ENABLED(MODULE),
"Shutdown at end of performance tests.");
torture_param(bool, verbose, true, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
static char *perf_type = "rcu";
module_param(perf_type, charp, 0444);
......@@ -86,13 +92,16 @@ static u64 t_rcu_perf_writer_started;
static u64 t_rcu_perf_writer_finished;
static unsigned long b_rcu_perf_writer_started;
static unsigned long b_rcu_perf_writer_finished;
static DEFINE_PER_CPU(atomic_t, n_async_inflight);
static int rcu_perf_writer_state;
#define RTWS_INIT 0
#define RTWS_EXP_SYNC 1
#define RTWS_SYNC 2
#define RTWS_IDLE 2
#define RTWS_STOPPING 3
#define RTWS_ASYNC 1
#define RTWS_BARRIER 2
#define RTWS_EXP_SYNC 3
#define RTWS_SYNC 4
#define RTWS_IDLE 5
#define RTWS_STOPPING 6
#define MAX_MEAS 10000
#define MIN_MEAS 100
......@@ -114,6 +123,8 @@ struct rcu_perf_ops {
unsigned long (*started)(void);
unsigned long (*completed)(void);
unsigned long (*exp_completed)(void);
void (*async)(struct rcu_head *head, rcu_callback_t func);
void (*gp_barrier)(void);
void (*sync)(void);
void (*exp_sync)(void);
const char *name;
......@@ -153,6 +164,8 @@ static struct rcu_perf_ops rcu_ops = {
.started = rcu_batches_started,
.completed = rcu_batches_completed,
.exp_completed = rcu_exp_batches_completed,
.async = call_rcu,
.gp_barrier = rcu_barrier,
.sync = synchronize_rcu,
.exp_sync = synchronize_rcu_expedited,
.name = "rcu"
......@@ -181,6 +194,8 @@ static struct rcu_perf_ops rcu_bh_ops = {
.started = rcu_batches_started_bh,
.completed = rcu_batches_completed_bh,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_bh,
.gp_barrier = rcu_barrier_bh,
.sync = synchronize_rcu_bh,
.exp_sync = synchronize_rcu_bh_expedited,
.name = "rcu_bh"
......@@ -208,6 +223,16 @@ static unsigned long srcu_perf_completed(void)
return srcu_batches_completed(srcu_ctlp);
}
static void srcu_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
call_srcu(srcu_ctlp, head, func);
}
static void srcu_rcu_barrier(void)
{
srcu_barrier(srcu_ctlp);
}
static void srcu_perf_synchronize(void)
{
synchronize_srcu(srcu_ctlp);
......@@ -226,11 +251,42 @@ static struct rcu_perf_ops srcu_ops = {
.started = NULL,
.completed = srcu_perf_completed,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.name = "srcu"
};
static struct srcu_struct srcud;
static void srcu_sync_perf_init(void)
{
srcu_ctlp = &srcud;
init_srcu_struct(srcu_ctlp);
}
static void srcu_sync_perf_cleanup(void)
{
cleanup_srcu_struct(srcu_ctlp);
}
static struct rcu_perf_ops srcud_ops = {
.ptype = SRCU_FLAVOR,
.init = srcu_sync_perf_init,
.cleanup = srcu_sync_perf_cleanup,
.readlock = srcu_perf_read_lock,
.readunlock = srcu_perf_read_unlock,
.started = NULL,
.completed = srcu_perf_completed,
.exp_completed = srcu_perf_completed,
.async = srcu_call_rcu,
.gp_barrier = srcu_rcu_barrier,
.sync = srcu_perf_synchronize,
.exp_sync = srcu_perf_synchronize_expedited,
.name = "srcud"
};
/*
* Definitions for sched perf testing.
*/
......@@ -254,6 +310,8 @@ static struct rcu_perf_ops sched_ops = {
.started = rcu_batches_started_sched,
.completed = rcu_batches_completed_sched,
.exp_completed = rcu_exp_batches_completed_sched,
.async = call_rcu_sched,
.gp_barrier = rcu_barrier_sched,
.sync = synchronize_sched,
.exp_sync = synchronize_sched_expedited,
.name = "sched"
......@@ -281,6 +339,8 @@ static struct rcu_perf_ops tasks_ops = {
.readunlock = tasks_perf_read_unlock,
.started = rcu_no_completed,
.completed = rcu_no_completed,
.async = call_rcu_tasks,
.gp_barrier = rcu_barrier_tasks,
.sync = synchronize_rcu_tasks,
.exp_sync = synchronize_rcu_tasks,
.name = "tasks"
......@@ -343,6 +403,15 @@ rcu_perf_reader(void *arg)
return 0;
}
/*
* Callback function for asynchronous grace periods from rcu_perf_writer().
*/
static void rcu_perf_async_cb(struct rcu_head *rhp)
{
atomic_dec(this_cpu_ptr(&n_async_inflight));
kfree(rhp);
}
/*
* RCU perf writer kthread. Repeatedly does a grace period.
*/
......@@ -352,6 +421,7 @@ rcu_perf_writer(void *arg)
int i = 0;
int i_max;
long me = (long)arg;
struct rcu_head *rhp = NULL;
struct sched_param sp;
bool started = false, done = false, alldone = false;
u64 t;
......@@ -380,9 +450,27 @@ rcu_perf_writer(void *arg)
}
do {
if (writer_holdoff)
udelay(writer_holdoff);
wdp = &wdpp[i];
*wdp = ktime_get_mono_fast_ns();
if (gp_exp) {
if (gp_async) {
retry:
if (!rhp)
rhp = kmalloc(sizeof(*rhp), GFP_KERNEL);
if (rhp && atomic_read(this_cpu_ptr(&n_async_inflight)) < gp_async_max) {
rcu_perf_writer_state = RTWS_ASYNC;
atomic_inc(this_cpu_ptr(&n_async_inflight));
cur_ops->async(rhp, rcu_perf_async_cb);
rhp = NULL;
} else if (!kthread_should_stop()) {
rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
goto retry;
} else {
kfree(rhp); /* Because we are stopping. */
}
} else if (gp_exp) {
rcu_perf_writer_state = RTWS_EXP_SYNC;
cur_ops->exp_sync();
} else {
......@@ -429,6 +517,10 @@ rcu_perf_writer(void *arg)
i++;
rcu_perf_wait_shutdown();
} while (!torture_must_stop());
if (gp_async) {
rcu_perf_writer_state = RTWS_BARRIER;
cur_ops->gp_barrier();
}
rcu_perf_writer_state = RTWS_STOPPING;
writer_n_durations[me] = i_max;
torture_kthread_stopping("rcu_perf_writer");
......@@ -452,6 +544,17 @@ rcu_perf_cleanup(void)
u64 *wdp;
u64 *wdpp;
/*
* Would like warning at start, but everything is expedited
* during the mid-boot phase, so have to wait till the end.
*/
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
if (rcu_gp_is_normal() && gp_exp)
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
if (gp_exp && gp_async)
VERBOSE_PERFOUT_ERRSTRING("No expedited async GPs, so went with async!");
if (torture_cleanup_begin())
return;
......@@ -554,7 +657,7 @@ rcu_perf_init(void)
long i;
int firsterr = 0;
static struct rcu_perf_ops *perf_ops[] = {
&rcu_ops, &rcu_bh_ops, &srcu_ops, &sched_ops,
&rcu_ops, &rcu_bh_ops, &srcu_ops, &srcud_ops, &sched_ops,
RCUPERF_TASKS_OPS
};
......@@ -624,16 +727,6 @@ rcu_perf_init(void)
firsterr = -ENOMEM;
goto unwind;
}
if (rcu_gp_is_expedited() && !rcu_gp_is_normal() && !gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods expedited, no normal ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
if (rcu_gp_is_normal() && gp_exp) {
VERBOSE_PERFOUT_ERRSTRING("All grace periods normal, no expedited ones to measure!");
firsterr = -EINVAL;
goto unwind;
}
for (i = 0; i < nrealwriters; i++) {
writer_durations[i] =
kcalloc(MAX_MEAS, sizeof(*writer_durations[i]),
......
......@@ -52,6 +52,8 @@
#include <linux/torture.h>
#include <linux/vmalloc.h>
#include "rcu.h"
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Paul E. McKenney <paulmck@us.ibm.com> and Josh Triplett <josh@joshtriplett.org>");
......@@ -562,31 +564,19 @@ static void srcu_torture_stats(void)
int __maybe_unused cpu;
int idx;
#if defined(CONFIG_TREE_SRCU) || defined(CONFIG_CLASSIC_SRCU)
#ifdef CONFIG_TREE_SRCU
idx = srcu_ctlp->srcu_idx & 0x1;
#else /* #ifdef CONFIG_TREE_SRCU */
idx = srcu_ctlp->completed & 0x1;
#endif /* #else #ifdef CONFIG_TREE_SRCU */
pr_alert("%s%s Tree SRCU per-CPU(idx=%d):",
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
unsigned long l0, l1;
unsigned long u0, u1;
long c0, c1;
#ifdef CONFIG_TREE_SRCU
struct srcu_data *counts;
counts = per_cpu_ptr(srcu_ctlp->sda, cpu);
u0 = counts->srcu_unlock_count[!idx];
u1 = counts->srcu_unlock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
struct srcu_array *counts;
counts = per_cpu_ptr(srcu_ctlp->per_cpu_ref, cpu);
u0 = counts->unlock_count[!idx];
u1 = counts->unlock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
/*
* Make sure that a lock is always counted if the corresponding
......@@ -594,13 +584,8 @@ static void srcu_torture_stats(void)
*/
smp_rmb();
#ifdef CONFIG_TREE_SRCU
l0 = counts->srcu_lock_count[!idx];
l1 = counts->srcu_lock_count[idx];
#else /* #ifdef CONFIG_TREE_SRCU */
l0 = counts->lock_count[!idx];
l1 = counts->lock_count[idx];
#endif /* #else #ifdef CONFIG_TREE_SRCU */
c0 = l0 - u0;
c1 = l1 - u1;
......@@ -609,7 +594,7 @@ static void srcu_torture_stats(void)
pr_cont("\n");
#elif defined(CONFIG_TINY_SRCU)
idx = READ_ONCE(srcu_ctlp->srcu_idx) & 0x1;
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%d,%d)\n",
pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n",
torture_type, TORTURE_FLAG, idx,
READ_ONCE(srcu_ctlp->srcu_lock_nesting[!idx]),
READ_ONCE(srcu_ctlp->srcu_lock_nesting[idx]));
......
此差异已折叠。
......@@ -38,8 +38,8 @@ static int init_srcu_struct_fields(struct srcu_struct *sp)
sp->srcu_lock_nesting[0] = 0;
sp->srcu_lock_nesting[1] = 0;
init_swait_queue_head(&sp->srcu_wq);
sp->srcu_gp_seq = 0;
rcu_segcblist_init(&sp->srcu_cblist);
sp->srcu_cb_head = NULL;
sp->srcu_cb_tail = &sp->srcu_cb_head;
sp->srcu_gp_running = false;
sp->srcu_gp_waiting = false;
sp->srcu_idx = 0;
......@@ -88,29 +88,13 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
{
WARN_ON(sp->srcu_lock_nesting[0] || sp->srcu_lock_nesting[1]);
flush_work(&sp->srcu_work);
WARN_ON(rcu_seq_state(sp->srcu_gp_seq));
WARN_ON(sp->srcu_gp_running);
WARN_ON(sp->srcu_gp_waiting);
WARN_ON(!rcu_segcblist_empty(&sp->srcu_cblist));
WARN_ON(sp->srcu_cb_head);
WARN_ON(&sp->srcu_cb_head != sp->srcu_cb_tail);
}
EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
/*
* Counts the new reader in the appropriate per-CPU element of the
* srcu_struct. Can be invoked from irq/bh handlers, but the matching
* __srcu_read_unlock() must be in the same handler instance. Returns an
* index that must be passed to the matching srcu_read_unlock().
*/
int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;
idx = READ_ONCE(sp->srcu_idx);
WRITE_ONCE(sp->srcu_lock_nesting[idx], sp->srcu_lock_nesting[idx] + 1);
return idx;
}
EXPORT_SYMBOL_GPL(__srcu_read_lock);
/*
* Removes the count for the old reader from the appropriate element of
* the srcu_struct.
......@@ -133,52 +117,44 @@ EXPORT_SYMBOL_GPL(__srcu_read_unlock);
void srcu_drive_gp(struct work_struct *wp)
{
int idx;
struct rcu_cblist ready_cbs;
struct srcu_struct *sp;
struct rcu_head *lh;
struct rcu_head *rhp;
struct srcu_struct *sp;
sp = container_of(wp, struct srcu_struct, srcu_work);
if (sp->srcu_gp_running || rcu_segcblist_empty(&sp->srcu_cblist))
if (sp->srcu_gp_running || !READ_ONCE(sp->srcu_cb_head))
return; /* Already running or nothing to do. */
/* Tag recently arrived callbacks and wait for readers. */
/* Remove recently arrived callbacks and wait for readers. */
WRITE_ONCE(sp->srcu_gp_running, true);
rcu_segcblist_accelerate(&sp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
rcu_seq_start(&sp->srcu_gp_seq);
local_irq_disable();
lh = sp->srcu_cb_head;
sp->srcu_cb_head = NULL;
sp->srcu_cb_tail = &sp->srcu_cb_head;
local_irq_enable();
idx = sp->srcu_idx;
WRITE_ONCE(sp->srcu_idx, !sp->srcu_idx);
WRITE_ONCE(sp->srcu_gp_waiting, true); /* srcu_read_unlock() wakes! */
swait_event(sp->srcu_wq, !READ_ONCE(sp->srcu_lock_nesting[idx]));
WRITE_ONCE(sp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
rcu_seq_end(&sp->srcu_gp_seq);
/* Update callback list based on GP, and invoke ready callbacks. */
rcu_segcblist_advance(&sp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
if (rcu_segcblist_ready_cbs(&sp->srcu_cblist)) {
rcu_cblist_init(&ready_cbs);
local_irq_disable();
rcu_segcblist_extract_done_cbs(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
local_irq_disable();
rcu_segcblist_insert_count(&sp->srcu_cblist, &ready_cbs);
local_irq_enable();
/* Invoke the callbacks we removed above. */
while (lh) {
rhp = lh;
lh = lh->next;
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
}
WRITE_ONCE(sp->srcu_gp_running, false);
/*
* If more callbacks, reschedule ourselves. This can race with
* a call_srcu() at interrupt level, but the ->srcu_gp_running
* checks will straighten that out.
* Enable rescheduling, and if there are more callbacks,
* reschedule ourselves. This can race with a call_srcu()
* at interrupt level, but the ->srcu_gp_running checks will
* straighten that out.
*/
if (!rcu_segcblist_empty(&sp->srcu_cblist))
WRITE_ONCE(sp->srcu_gp_running, false);
if (READ_ONCE(sp->srcu_cb_head))
schedule_work(&sp->srcu_work);
}
EXPORT_SYMBOL_GPL(srcu_drive_gp);
......@@ -187,14 +163,16 @@ EXPORT_SYMBOL_GPL(srcu_drive_gp);
* Enqueue an SRCU callback on the specified srcu_struct structure,
* initiating grace-period processing if it is not already running.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
rcu_callback_t func)
{
unsigned long flags;
head->func = func;
rhp->func = func;
rhp->next = NULL;
local_irq_save(flags);
rcu_segcblist_enqueue(&sp->srcu_cblist, head, false);
*sp->srcu_cb_tail = rhp;
sp->srcu_cb_tail = &rhp->next;
local_irq_restore(flags);
if (!READ_ONCE(sp->srcu_gp_running))
schedule_work(&sp->srcu_work);
......
......@@ -40,9 +40,15 @@
#include "rcu.h"
#include "rcu_segcblist.h"
ulong exp_holdoff = 25 * 1000; /* Holdoff (ns) for auto-expediting. */
/* Holdoff in nanoseconds for auto-expediting. */
#define DEFAULT_SRCU_EXP_HOLDOFF (25 * 1000)
static ulong exp_holdoff = DEFAULT_SRCU_EXP_HOLDOFF;
module_param(exp_holdoff, ulong, 0444);
/* Overflow-check frequency. N bits roughly says every 2**N grace periods. */
static ulong counter_wrap_check = (ULONG_MAX >> 2);
module_param(counter_wrap_check, ulong, 0444);
static void srcu_invoke_callbacks(struct work_struct *work);
static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay);
......@@ -70,7 +76,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static)
/* Each pass through this loop initializes one srcu_node structure. */
rcu_for_each_node_breadth_first(sp, snp) {
spin_lock_init(&snp->lock);
raw_spin_lock_init(&ACCESS_PRIVATE(snp, lock));
WARN_ON_ONCE(ARRAY_SIZE(snp->srcu_have_cbs) !=
ARRAY_SIZE(snp->srcu_data_have_cbs));
for (i = 0; i < ARRAY_SIZE(snp->srcu_have_cbs); i++) {
......@@ -104,7 +110,7 @@ static void init_srcu_struct_nodes(struct srcu_struct *sp, bool is_static)
snp_first = sp->level[level];
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(sp->sda, cpu);
spin_lock_init(&sdp->lock);
raw_spin_lock_init(&ACCESS_PRIVATE(sdp, lock));
rcu_segcblist_init(&sdp->srcu_cblist);
sdp->srcu_cblist_invoking = false;
sdp->srcu_gp_seq_needed = sp->srcu_gp_seq;
......@@ -163,7 +169,7 @@ int __init_srcu_struct(struct srcu_struct *sp, const char *name,
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)sp, sizeof(*sp));
lockdep_init_map(&sp->dep_map, name, key, 0);
spin_lock_init(&sp->gp_lock);
raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock));
return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
......@@ -180,7 +186,7 @@ EXPORT_SYMBOL_GPL(__init_srcu_struct);
*/
int init_srcu_struct(struct srcu_struct *sp)
{
spin_lock_init(&sp->gp_lock);
raw_spin_lock_init(&ACCESS_PRIVATE(sp, lock));
return init_srcu_struct_fields(sp, false);
}
EXPORT_SYMBOL_GPL(init_srcu_struct);
......@@ -191,7 +197,7 @@ EXPORT_SYMBOL_GPL(init_srcu_struct);
* First-use initialization of statically allocated srcu_struct
* structure. Wiring up the combining tree is more than can be
* done with compile-time initialization, so this check is added
* to each update-side SRCU primitive. Use ->gp_lock, which -is-
* to each update-side SRCU primitive. Use sp->lock, which -is-
* compile-time initialized, to resolve races involving multiple
* CPUs trying to garner first-use privileges.
*/
......@@ -203,13 +209,13 @@ static void check_init_srcu_struct(struct srcu_struct *sp)
/* The smp_load_acquire() pairs with the smp_store_release(). */
if (!rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq_needed))) /*^^^*/
return; /* Already initialized. */
spin_lock_irqsave(&sp->gp_lock, flags);
raw_spin_lock_irqsave_rcu_node(sp, flags);
if (!rcu_seq_state(sp->srcu_gp_seq_needed)) {
spin_unlock_irqrestore(&sp->gp_lock, flags);
raw_spin_unlock_irqrestore_rcu_node(sp, flags);
return;
}
init_srcu_struct_fields(sp, true);
spin_unlock_irqrestore(&sp->gp_lock, flags);
raw_spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
......@@ -275,15 +281,20 @@ static bool srcu_readers_active_idx_check(struct srcu_struct *sp, int idx)
* not mean that there are no more readers, as one could have read
* the current index but not have incremented the lock counter yet.
*
* Possible bug: There is no guarantee that there haven't been
* ULONG_MAX increments of ->srcu_lock_count[] since the unlocks were
* counted, meaning that this could return true even if there are
* still active readers. Since there are no memory barriers around
* srcu_flip(), the CPU is not required to increment ->srcu_idx
* before running srcu_readers_unlock_idx(), which means that there
* could be an arbitrarily large number of critical sections that
* execute after srcu_readers_unlock_idx() but use the old value
* of ->srcu_idx.
* So suppose that the updater is preempted here for so long
* that more than ULONG_MAX non-nested readers come and go in
* the meantime. It turns out that this cannot result in overflow
* because if a reader modifies its unlock count after we read it
* above, then that reader's next load of ->srcu_idx is guaranteed
* to get the new value, which will cause it to operate on the
* other bank of counters, where it cannot contribute to the
* overflow of these counters. This means that there is a maximum
* of 2*NR_CPUS increments, which cannot overflow given current
* systems, especially not on 64-bit systems.
*
* OK, how about nesting? This does impose a limit on nesting
* of floor(ULONG_MAX/NR_CPUS/2), which should be sufficient,
* especially on 64-bit systems.
*/
return srcu_readers_lock_idx(sp, idx) == unlocks;
}
......@@ -400,8 +411,7 @@ static void srcu_gp_start(struct srcu_struct *sp)
struct srcu_data *sdp = this_cpu_ptr(sp->sda);
int state;
RCU_LOCKDEP_WARN(!lockdep_is_held(&sp->gp_lock),
"Invoked srcu_gp_start() without ->gp_lock!");
lockdep_assert_held(&sp->lock);
WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
......@@ -489,17 +499,20 @@ static void srcu_gp_end(struct srcu_struct *sp)
{
unsigned long cbdelay;
bool cbs;
int cpu;
unsigned long flags;
unsigned long gpseq;
int idx;
int idxnext;
unsigned long mask;
struct srcu_data *sdp;
struct srcu_node *snp;
/* Prevent more than one additional grace period. */
mutex_lock(&sp->srcu_cb_mutex);
/* End the current grace period. */
spin_lock_irq(&sp->gp_lock);
raw_spin_lock_irq_rcu_node(sp);
idx = rcu_seq_state(sp->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
cbdelay = srcu_get_delay(sp);
......@@ -508,7 +521,7 @@ static void srcu_gp_end(struct srcu_struct *sp)
gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, gpseq))
sp->srcu_gp_seq_needed_exp = gpseq;
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
mutex_unlock(&sp->srcu_gp_mutex);
/* A new grace period can start at this point. But only one. */
......@@ -516,7 +529,7 @@ static void srcu_gp_end(struct srcu_struct *sp)
idx = rcu_seq_ctr(gpseq) % ARRAY_SIZE(snp->srcu_have_cbs);
idxnext = (idx + 1) % ARRAY_SIZE(snp->srcu_have_cbs);
rcu_for_each_node_breadth_first(sp, snp) {
spin_lock_irq(&snp->lock);
raw_spin_lock_irq_rcu_node(snp);
cbs = false;
if (snp >= sp->level[rcu_num_lvls - 1])
cbs = snp->srcu_have_cbs[idx] == gpseq;
......@@ -526,28 +539,37 @@ static void srcu_gp_end(struct srcu_struct *sp)
snp->srcu_gp_seq_needed_exp = gpseq;
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
spin_unlock_irq(&snp->lock);
if (cbs) {
smp_mb(); /* GP end before CB invocation. */
raw_spin_unlock_irq_rcu_node(snp);
if (cbs)
srcu_schedule_cbs_snp(sp, snp, mask, cbdelay);
}
/* Occasionally prevent srcu_data counter wrap. */
if (!(gpseq & counter_wrap_check))
for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
sdp = per_cpu_ptr(sp->sda, cpu);
raw_spin_lock_irqsave_rcu_node(sdp, flags);
if (ULONG_CMP_GE(gpseq,
sdp->srcu_gp_seq_needed + 100))
sdp->srcu_gp_seq_needed = gpseq;
raw_spin_unlock_irqrestore_rcu_node(sdp, flags);
}
}
/* Callback initiation done, allow grace periods after next. */
mutex_unlock(&sp->srcu_cb_mutex);
/* Start a new grace period if needed. */
spin_lock_irq(&sp->gp_lock);
raw_spin_lock_irq_rcu_node(sp);
gpseq = rcu_seq_current(&sp->srcu_gp_seq);
if (!rcu_seq_state(gpseq) &&
ULONG_CMP_LT(gpseq, sp->srcu_gp_seq_needed)) {
srcu_gp_start(sp);
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
/* Throttle expedited grace periods: Should be rare! */
srcu_reschedule(sp, rcu_seq_ctr(gpseq) & 0x3ff
? 0 : SRCU_INTERVAL);
} else {
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
}
}
......@@ -567,18 +589,18 @@ static void srcu_funnel_exp_start(struct srcu_struct *sp, struct srcu_node *snp,
if (rcu_seq_done(&sp->srcu_gp_seq, s) ||
ULONG_CMP_GE(READ_ONCE(snp->srcu_gp_seq_needed_exp), s))
return;
spin_lock_irqsave(&snp->lock, flags);
raw_spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_gp_seq_needed_exp, s)) {
spin_unlock_irqrestore(&snp->lock, flags);
raw_spin_unlock_irqrestore_rcu_node(snp, flags);
return;
}
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, s);
spin_unlock_irqrestore(&snp->lock, flags);
raw_spin_unlock_irqrestore_rcu_node(snp, flags);
}
spin_lock_irqsave(&sp->gp_lock, flags);
raw_spin_lock_irqsave_rcu_node(sp, flags);
if (!ULONG_CMP_LT(sp->srcu_gp_seq_needed_exp, s))
sp->srcu_gp_seq_needed_exp = s;
spin_unlock_irqrestore(&sp->gp_lock, flags);
raw_spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
......@@ -600,14 +622,13 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp,
for (; snp != NULL; snp = snp->srcu_parent) {
if (rcu_seq_done(&sp->srcu_gp_seq, s) && snp != sdp->mynode)
return; /* GP already done and CBs recorded. */
spin_lock_irqsave(&snp->lock, flags);
raw_spin_lock_irqsave_rcu_node(snp, flags);
if (ULONG_CMP_GE(snp->srcu_have_cbs[idx], s)) {
snp_seq = snp->srcu_have_cbs[idx];
if (snp == sdp->mynode && snp_seq == s)
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
spin_unlock_irqrestore(&snp->lock, flags);
raw_spin_unlock_irqrestore_rcu_node(snp, flags);
if (snp == sdp->mynode && snp_seq != s) {
smp_mb(); /* CBs after GP! */
srcu_schedule_cbs_sdp(sdp, do_norm
? SRCU_INTERVAL
: 0);
......@@ -622,11 +643,11 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp,
snp->srcu_data_have_cbs[idx] |= sdp->grpmask;
if (!do_norm && ULONG_CMP_LT(snp->srcu_gp_seq_needed_exp, s))
snp->srcu_gp_seq_needed_exp = s;
spin_unlock_irqrestore(&snp->lock, flags);
raw_spin_unlock_irqrestore_rcu_node(snp, flags);
}
/* Top of tree, must ensure the grace period will be started. */
spin_lock_irqsave(&sp->gp_lock, flags);
raw_spin_lock_irqsave_rcu_node(sp, flags);
if (ULONG_CMP_LT(sp->srcu_gp_seq_needed, s)) {
/*
* Record need for grace period s. Pair with load
......@@ -645,7 +666,7 @@ static void srcu_funnel_gp_start(struct srcu_struct *sp, struct srcu_data *sdp,
queue_delayed_work(system_power_efficient_wq, &sp->work,
srcu_get_delay(sp));
}
spin_unlock_irqrestore(&sp->gp_lock, flags);
raw_spin_unlock_irqrestore_rcu_node(sp, flags);
}
/*
......@@ -671,6 +692,16 @@ static bool try_check_zero(struct srcu_struct *sp, int idx, int trycount)
*/
static void srcu_flip(struct srcu_struct *sp)
{
/*
* Ensure that if this updater saw a given reader's increment
* from __srcu_read_lock(), that reader was using an old value
* of ->srcu_idx. Also ensure that if a given reader sees the
* new value of ->srcu_idx, this updater's earlier scans cannot
* have seen that reader's increments (which is OK, because this
* grace period need not wait on that reader).
*/
smp_mb(); /* E */ /* Pairs with B and C. */
WRITE_ONCE(sp->srcu_idx, sp->srcu_idx + 1);
/*
......@@ -744,6 +775,13 @@ static bool srcu_might_be_idle(struct srcu_struct *sp)
return true; /* With reasonable probability, idle! */
}
/*
* SRCU callback function to leak a callback.
*/
static void srcu_leak_callback(struct rcu_head *rhp)
{
}
/*
* Enqueue an SRCU callback on the srcu_data structure associated with
* the current CPU and the specified srcu_struct structure, initiating
......@@ -782,10 +820,16 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
struct srcu_data *sdp;
check_init_srcu_struct(sp);
if (debug_rcu_head_queue(rhp)) {
/* Probable double call_srcu(), so leak the callback. */
WRITE_ONCE(rhp->func, srcu_leak_callback);
WARN_ONCE(1, "call_srcu(): Leaked duplicate callback\n");
return;
}
rhp->func = func;
local_irq_save(flags);
sdp = this_cpu_ptr(sp->sda);
spin_lock(&sdp->lock);
raw_spin_lock_rcu_node(sdp);
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp, false);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
......@@ -799,13 +843,30 @@ void __call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
sdp->srcu_gp_seq_needed_exp = s;
needexp = true;
}
spin_unlock_irqrestore(&sdp->lock, flags);
raw_spin_unlock_irqrestore_rcu_node(sdp, flags);
if (needgp)
srcu_funnel_gp_start(sp, sdp, s, do_norm);
else if (needexp)
srcu_funnel_exp_start(sp, sdp->mynode, s);
}
/**
* call_srcu() - Queue a callback for invocation after an SRCU grace period
* @sp: srcu_struct in queue the callback
* @head: structure to be used for queueing the SRCU callback.
* @func: function to be invoked after the SRCU grace period
*
* The callback function will be invoked some time after a full SRCU
* grace period elapses, in other words after all pre-existing SRCU
* read-side critical sections have completed. However, the callback
* function might well execute concurrently with other SRCU read-side
* critical sections that started after call_srcu() was invoked. SRCU
* read-side critical sections are delimited by srcu_read_lock() and
* srcu_read_unlock(), and may be nested.
*
* The callback will be invoked from process context, but must nevertheless
* be fast and must not block.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *rhp,
rcu_callback_t func)
{
......@@ -953,13 +1014,16 @@ void srcu_barrier(struct srcu_struct *sp)
*/
for_each_possible_cpu(cpu) {
sdp = per_cpu_ptr(sp->sda, cpu);
spin_lock_irq(&sdp->lock);
raw_spin_lock_irq_rcu_node(sdp);
atomic_inc(&sp->srcu_barrier_cpu_cnt);
sdp->srcu_barrier_head.func = srcu_barrier_cb;
debug_rcu_head_queue(&sdp->srcu_barrier_head);
if (!rcu_segcblist_entrain(&sdp->srcu_cblist,
&sdp->srcu_barrier_head, 0))
&sdp->srcu_barrier_head, 0)) {
debug_rcu_head_unqueue(&sdp->srcu_barrier_head);
atomic_dec(&sp->srcu_barrier_cpu_cnt);
spin_unlock_irq(&sdp->lock);
}
raw_spin_unlock_irq_rcu_node(sdp);
}
/* Remove the initial count, at which point reaching zero can happen. */
......@@ -1008,17 +1072,17 @@ static void srcu_advance_state(struct srcu_struct *sp)
*/
idx = rcu_seq_state(smp_load_acquire(&sp->srcu_gp_seq)); /* ^^^ */
if (idx == SRCU_STATE_IDLE) {
spin_lock_irq(&sp->gp_lock);
raw_spin_lock_irq_rcu_node(sp);
if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq));
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
mutex_unlock(&sp->srcu_gp_mutex);
return;
}
idx = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
if (idx == SRCU_STATE_IDLE)
srcu_gp_start(sp);
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
if (idx != SRCU_STATE_IDLE) {
mutex_unlock(&sp->srcu_gp_mutex);
return; /* Someone else started the grace period. */
......@@ -1067,22 +1131,22 @@ static void srcu_invoke_callbacks(struct work_struct *work)
sdp = container_of(work, struct srcu_data, work.work);
sp = sdp->sp;
rcu_cblist_init(&ready_cbs);
spin_lock_irq(&sdp->lock);
smp_mb(); /* Old grace periods before callback invocation! */
raw_spin_lock_irq_rcu_node(sdp);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&sp->srcu_gp_seq));
if (sdp->srcu_cblist_invoking ||
!rcu_segcblist_ready_cbs(&sdp->srcu_cblist)) {
spin_unlock_irq(&sdp->lock);
raw_spin_unlock_irq_rcu_node(sdp);
return; /* Someone else on the job or nothing to do. */
}
/* We are on the job! Extract and invoke ready callbacks. */
sdp->srcu_cblist_invoking = true;
rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
spin_unlock_irq(&sdp->lock);
raw_spin_unlock_irq_rcu_node(sdp);
rhp = rcu_cblist_dequeue(&ready_cbs);
for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
debug_rcu_head_unqueue(rhp);
local_bh_disable();
rhp->func(rhp);
local_bh_enable();
......@@ -1092,13 +1156,13 @@ static void srcu_invoke_callbacks(struct work_struct *work)
* Update counts, accelerate new callbacks, and if needed,
* schedule another round of callback invocation.
*/
spin_lock_irq(&sdp->lock);
raw_spin_lock_irq_rcu_node(sdp);
rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
rcu_seq_snap(&sp->srcu_gp_seq));
sdp->srcu_cblist_invoking = false;
more = rcu_segcblist_ready_cbs(&sdp->srcu_cblist);
spin_unlock_irq(&sdp->lock);
raw_spin_unlock_irq_rcu_node(sdp);
if (more)
srcu_schedule_cbs_sdp(sdp, 0);
}
......@@ -1111,7 +1175,7 @@ static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay)
{
bool pushgp = true;
spin_lock_irq(&sp->gp_lock);
raw_spin_lock_irq_rcu_node(sp);
if (ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed)) {
if (!WARN_ON_ONCE(rcu_seq_state(sp->srcu_gp_seq))) {
/* All requests fulfilled, time to go idle. */
......@@ -1121,7 +1185,7 @@ static void srcu_reschedule(struct srcu_struct *sp, unsigned long delay)
/* Outstanding request and no GP. Start one. */
srcu_gp_start(sp);
}
spin_unlock_irq(&sp->gp_lock);
raw_spin_unlock_irq_rcu_node(sp);
if (pushgp)
queue_delayed_work(system_power_efficient_wq, &sp->work, delay);
......@@ -1152,3 +1216,12 @@ void srcutorture_get_gp_data(enum rcutorture_type test_type,
*gpnum = rcu_seq_ctr(sp->srcu_gp_seq_needed);
}
EXPORT_SYMBOL_GPL(srcutorture_get_gp_data);
static int __init srcu_bootup_announce(void)
{
pr_info("Hierarchical SRCU implementation.\n");
if (exp_holdoff != DEFAULT_SRCU_EXP_HOLDOFF)
pr_info("\tNon-default auto-expedite holdoff of %lu ns.\n", exp_holdoff);
return 0;
}
early_initcall(srcu_bootup_announce);
......@@ -35,15 +35,26 @@
#include <linux/time.h>
#include <linux/cpu.h>
#include <linux/prefetch.h>
#include <linux/trace_events.h>
#include "rcu.h"
/* Forward declarations for tiny_plugin.h. */
struct rcu_ctrlblk;
static void __call_rcu(struct rcu_head *head,
rcu_callback_t func,
struct rcu_ctrlblk *rcp);
/* Global control variables for rcupdate callback mechanism. */
struct rcu_ctrlblk {
struct rcu_head *rcucblist; /* List of pending callbacks (CBs). */
struct rcu_head **donetail; /* ->next pointer of last "done" CB. */
struct rcu_head **curtail; /* ->next pointer of last CB. */
};
/* Definition for rcupdate control block. */
static struct rcu_ctrlblk rcu_sched_ctrlblk = {
.donetail = &rcu_sched_ctrlblk.rcucblist,
.curtail = &rcu_sched_ctrlblk.rcucblist,
};
static struct rcu_ctrlblk rcu_bh_ctrlblk = {
.donetail = &rcu_bh_ctrlblk.rcucblist,
.curtail = &rcu_bh_ctrlblk.rcucblist,
};
#include "tiny_plugin.h"
......@@ -59,19 +70,6 @@ void rcu_barrier_sched(void)
}
EXPORT_SYMBOL(rcu_barrier_sched);
#if defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE)
/*
* Test whether RCU thinks that the current CPU is idle.
*/
bool notrace __rcu_is_watching(void)
{
return true;
}
EXPORT_SYMBOL(__rcu_is_watching);
#endif /* defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_RCU_TRACE) */
/*
* Helper function for rcu_sched_qs() and rcu_bh_qs().
* Also irqs are disabled to avoid confusion due to interrupt handlers
......@@ -79,7 +77,6 @@ EXPORT_SYMBOL(__rcu_is_watching);
*/
static int rcu_qsctr_help(struct rcu_ctrlblk *rcp)
{
RCU_TRACE(reset_cpu_stall_ticks(rcp);)
if (rcp->donetail != rcp->curtail) {
rcp->donetail = rcp->curtail;
return 1;
......@@ -125,7 +122,6 @@ void rcu_bh_qs(void)
*/
void rcu_check_callbacks(int user)
{
RCU_TRACE(check_cpu_stalls();)
if (user)
rcu_sched_qs();
else if (!in_softirq())
......@@ -140,10 +136,8 @@ void rcu_check_callbacks(int user)
*/
static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
{
const char *rn = NULL;
struct rcu_head *next, *list;
unsigned long flags;
RCU_TRACE(int cb_count = 0;)
/* Move the ready-to-invoke callbacks to a local list. */
local_irq_save(flags);
......@@ -152,7 +146,6 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
return;
}
RCU_TRACE(trace_rcu_batch_start(rcp->name, 0, rcp->qlen, -1);)
list = rcp->rcucblist;
rcp->rcucblist = *rcp->donetail;
*rcp->donetail = NULL;
......@@ -162,22 +155,15 @@ static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp)
local_irq_restore(flags);
/* Invoke the callbacks on the local list. */
RCU_TRACE(rn = rcp->name;)
while (list) {
next = list->next;
prefetch(next);
debug_rcu_head_unqueue(list);
local_bh_disable();
__rcu_reclaim(rn, list);
__rcu_reclaim("", list);
local_bh_enable();
list = next;
RCU_TRACE(cb_count++;)
}
RCU_TRACE(rcu_trace_sub_qlen(rcp, cb_count);)
RCU_TRACE(trace_rcu_batch_end(rcp->name,
cb_count, 0, need_resched(),
is_idle_task(current),
false));
}
static __latent_entropy void rcu_process_callbacks(struct softirq_action *unused)
......@@ -221,7 +207,6 @@ static void __call_rcu(struct rcu_head *head,
local_irq_save(flags);
*rcp->curtail = head;
rcp->curtail = &head->next;
RCU_TRACE(rcp->qlen++;)
local_irq_restore(flags);
if (unlikely(is_idle_task(current))) {
......@@ -254,8 +239,5 @@ EXPORT_SYMBOL_GPL(call_rcu_bh);
void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
RCU_TRACE(reset_cpu_stall_ticks(&rcu_sched_ctrlblk);)
RCU_TRACE(reset_cpu_stall_ticks(&rcu_bh_ctrlblk);)
rcu_early_boot_tests();
}
此差异已折叠。
此差异已折叠。
此差异已折叠。
......@@ -147,7 +147,7 @@ static void __maybe_unused sync_exp_reset_tree(struct rcu_state *rsp)
*
* Caller must hold the rcu_state's exp_mutex.
*/
static int sync_rcu_preempt_exp_done(struct rcu_node *rnp)
static bool sync_rcu_preempt_exp_done(struct rcu_node *rnp)
{
return rnp->exp_tasks == NULL &&
READ_ONCE(rnp->expmask) == 0;
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
......@@ -5874,15 +5874,9 @@ int sched_cpu_deactivate(unsigned int cpu)
* users of this state to go away such that all new such users will
* observe it.
*
* For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
* not imply sync_sched(), so wait for both.
*
* Do sync before park smpboot threads to take care the rcu boost case.
*/
if (IS_ENABLED(CONFIG_PREEMPT))
synchronize_rcu_mult(call_rcu, call_rcu_sched);
else
synchronize_rcu();
synchronize_rcu_mult(call_rcu, call_rcu_sched);
if (!sched_smp_initialized)
return 0;
......
......@@ -126,56 +126,6 @@ config NO_HZ_FULL_ALL
Note the boot CPU will still be kept outside the range to
handle the timekeeping duty.
config NO_HZ_FULL_SYSIDLE
bool "Detect full-system idle state for full dynticks system"
depends on NO_HZ_FULL
default n
help
At least one CPU must keep the scheduling-clock tick running for
timekeeping purposes whenever there is a non-idle CPU, where
"non-idle" also includes dynticks CPUs as long as they are
running non-idle tasks. Because the underlying adaptive-tick
support cannot distinguish between all CPUs being idle and
all CPUs each running a single task in dynticks mode, the
underlying support simply ensures that there is always a CPU
handling the scheduling-clock tick, whether or not all CPUs
are idle. This Kconfig option enables scalable detection of
the all-CPUs-idle state, thus allowing the scheduling-clock
tick to be disabled when all CPUs are idle. Note that scalable
detection of the all-CPUs-idle state means that larger systems
will be slower to declare the all-CPUs-idle state.
Say Y if you would like to help debug all-CPUs-idle detection.
Say N if you are unsure.
config NO_HZ_FULL_SYSIDLE_SMALL
int "Number of CPUs above which large-system approach is used"
depends on NO_HZ_FULL_SYSIDLE
range 1 NR_CPUS
default 8
help
The full-system idle detection mechanism takes a lazy approach
on large systems, as is required to attain decent scalability.
However, on smaller systems, scalability is not anywhere near as
large a concern as is energy efficiency. The sysidle subsystem
therefore uses a fast but non-scalable algorithm for small
systems and a lazier but scalable algorithm for large systems.
This Kconfig parameter defines the number of CPUs in the largest
system that will be considered to be "small".
The default value will be fine in most cases. Battery-powered
systems that (1) enable NO_HZ_FULL_SYSIDLE, (2) have larger
numbers of CPUs, and (3) are suffering from battery-lifetime
problems due to long sysidle latencies might wish to experiment
with larger values for this Kconfig parameter. On the other
hand, they might be even better served by disabling NO_HZ_FULL
entirely, given that NO_HZ_FULL is intended for HPC and
real-time workloads that at present do not tend to be run on
battery-powered systems.
Take the default if you are unsure.
config NO_HZ
bool "Old Idle dynticks config"
depends on !ARCH_USES_GETTIMEOFFSET && GENERIC_CLOCKEVENTS
......
此差异已折叠。
......@@ -25,9 +25,6 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
earlycpio.o seq_buf.o siphash.o \
nmi_backtrace.o nodemask.o win_minmax.o
CFLAGS_radix-tree.o += -DCONFIG_SPARSE_RCU_POINTER
CFLAGS_idr.o += -DCONFIG_SPARSE_RCU_POINTER
lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
lib-$(CONFIG_DMA_NOOP_OPS) += dma-noop.o
......
......@@ -5533,23 +5533,6 @@ sub process {
}
}
# Check for expedited grace periods that interrupt non-idle non-nohz
# online CPUs. These expedited can therefore degrade real-time response
# if used carelessly, and should be avoided where not absolutely
# needed. It is always OK to use synchronize_rcu_expedited() and
# synchronize_sched_expedited() at boot time (before real-time applications
# start) and in error situations where real-time response is compromised in
# any case. Note that synchronize_srcu_expedited() does -not- interrupt
# other CPUs, so don't warn on uses of synchronize_srcu_expedited().
# Of course, nothing comes for free, and srcu_read_lock() and
# srcu_read_unlock() do contain full memory barriers in payment for
# synchronize_srcu_expedited() non-interruption properties.
if ($line =~ /\b(synchronize_rcu_expedited|synchronize_sched_expedited)\(/) {
WARN("EXPEDITED_RCU_GRACE_PERIOD",
"expedited RCU grace periods should be avoided where they can degrade real-time response\n" . $herecurr);
}
# check of hardware specific defines
if ($line =~ m@^.\s*\#\s*if.*\b(__i386__|__powerpc64__|__sun__|__s390x__)\b@ && $realfile !~ m@include/asm-@) {
CHK("ARCH_DEFINES",
......
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册