提交 226da0db 编写于 作者: L Linus Torvalds

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU changes from Ingo Molnar:
 "This is the v3.5 RCU tree from Paul E.  McKenney:

 1) A set of improvements and fixes to the RCU_FAST_NO_HZ feature (with
    more on the way for 3.6).  Posted to LKML:
       https://lkml.org/lkml/2012/4/23/324 (commits 1-3 and 5),
       https://lkml.org/lkml/2012/4/16/611 (commit 4),
       https://lkml.org/lkml/2012/4/30/390 (commit 6), and
       https://lkml.org/lkml/2012/5/4/410 (commit 7, combined with
       the other commits for the convenience of the tester).

 2) Changes to make rcu_barrier() avoid disrupting execution of CPUs
    that have no RCU callbacks.  Posted to LKML:
       https://lkml.org/lkml/2012/4/23/322.

 3) A couple of commits that improve the efficiency of the interaction
    between preemptible RCU and the scheduler, these two being all that
    survived an abortive attempt to allow preemptible RCU's
    __rcu_read_lock() to be inlined.  The full set was posted to LKML at
    https://lkml.org/lkml/2012/4/14/143, and the first and third patches
    of that set remain.

 4) Lai Jiangshan's algorithmic implementation of SRCU, which includes
    call_srcu() and srcu_barrier().  A major feature of this new
    implementation is that synchronize_srcu() no longer disturbs the
    execution of other CPUs.  This work is based on earlier
    implementations by Peter Zijlstra and Paul E.  McKenney.  Posted to
    LKML: https://lkml.org/lkml/2012/2/22/82.

 5) A number of miscellaneous bug fixes and improvements which were
    posted to LKML at: https://lkml.org/lkml/2012/4/23/353 with
    subsequent updates posted to LKML."

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
  rcu: Make rcu_barrier() less disruptive
  rcu: Explicitly initialize RCU_FAST_NO_HZ per-CPU variables
  rcu: Make RCU_FAST_NO_HZ handle timer migration
  rcu: Update RCU maintainership
  rcu: Make exit_rcu() more precise and consolidate
  rcu: Move PREEMPT_RCU preemption to switch_to() invocation
  rcu: Ensure that RCU_FAST_NO_HZ timers expire on correct CPU
  rcu: Add rcutorture test for call_srcu()
  rcu: Implement per-domain single-threaded call_srcu() state machine
  rcu: Use single value to handle expedited SRCU grace periods
  rcu: Improve srcu_readers_active_idx()'s cache locality
  rcu: Remove unused srcu_barrier()
  rcu: Implement a variant of Peter's SRCU algorithm
  rcu: Improve SRCU's wait_idx() comments
  rcu: Flip ->completed only once per SRCU grace period
  rcu: Increment upper bit only for srcu_read_lock()
  rcu: Remove fast check path from __synchronize_srcu()
  rcu: Direct algorithmic SRCU implementation
  rcu: Introduce rcutorture testing for rcu_barrier()
  timer: Fix mod_timer_pinned() header comment
  ...
......@@ -47,6 +47,16 @@ irqreader Says to invoke RCU readers from irq level. This is currently
permit this. (Or, more accurately, variants of RCU that do
-not- permit this know to ignore this variable.)
n_barrier_cbs If this is nonzero, RCU barrier testing will be conducted,
in which case n_barrier_cbs specifies the number of
RCU callbacks (and corresponding kthreads) to use for
this testing. The value cannot be negative. If you
specify this to be non-zero when torture_type indicates a
synchronous RCU implementation (one for which a member of
the synchronize_rcu() rather than the call_rcu() family is
used -- see the documentation for torture_type below), an
error will be reported and no testing will be carried out.
nfakewriters This is the number of RCU fake writer threads to run. Fake
writer threads repeatedly use the synchronous "wait for
current readers" function of the interface selected by
......@@ -188,7 +198,7 @@ OUTPUT
The statistics output is as follows:
rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
rcu-torture: Reader Pipe: 727860534 34213 0 0 0 0 0 0 0 0 0
rcu-torture: Reader Batch: 727877838 17003 0 0 0 0 0 0 0 0 0
rcu-torture: Free-Block Circulation: 155440 155440 155440 155440 155440 155440 155440 155440 155440 155440 0
......@@ -230,6 +240,9 @@ o "rtmbe": A non-zero value indicates that rcutorture believes that
rcu_assign_pointer() and rcu_dereference() are not working
correctly. This value should be zero.
o "rtbe": A non-zero value indicates that one of the rcu_barrier()
family of functions is not working correctly.
o "rtbke": rcutorture was unable to create the real-time kthreads
used to force RCU priority inversion. This value should be zero.
......
......@@ -2333,18 +2333,100 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/blockdev/ramdisk.txt.
rcupdate.blimit= [KNL,BOOT]
rcutree.blimit= [KNL,BOOT]
Set maximum number of finished RCU callbacks to process
in one batch.
rcupdate.qhimark= [KNL,BOOT]
rcutree.qhimark= [KNL,BOOT]
Set threshold of queued
RCU callbacks over which batch limiting is disabled.
rcupdate.qlowmark= [KNL,BOOT]
rcutree.qlowmark= [KNL,BOOT]
Set threshold of queued RCU callbacks below which
batch limiting is re-enabled.
rcutree.rcu_cpu_stall_suppress= [KNL,BOOT]
Suppress RCU CPU stall warning messages.
rcutree.rcu_cpu_stall_timeout= [KNL,BOOT]
Set timeout for RCU CPU stall warning messages.
rcutorture.fqs_duration= [KNL,BOOT]
Set duration of force_quiescent_state bursts.
rcutorture.fqs_holdoff= [KNL,BOOT]
Set holdoff time within force_quiescent_state bursts.
rcutorture.fqs_stutter= [KNL,BOOT]
Set wait time between force_quiescent_state bursts.
rcutorture.irqreader= [KNL,BOOT]
Test RCU readers from irq handlers.
rcutorture.n_barrier_cbs= [KNL,BOOT]
Set callbacks/threads for rcu_barrier() testing.
rcutorture.nfakewriters= [KNL,BOOT]
Set number of concurrent RCU writers. These just
stress RCU, they don't participate in the actual
test, hence the "fake".
rcutorture.nreaders= [KNL,BOOT]
Set number of RCU readers.
rcutorture.onoff_holdoff= [KNL,BOOT]
Set time (s) after boot for CPU-hotplug testing.
rcutorture.onoff_interval= [KNL,BOOT]
Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing.
rcutorture.shuffle_interval= [KNL,BOOT]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
during the rcutorture test.
rcutorture.shutdown_secs= [KNL,BOOT]
Set time (s) after boot system shutdown. This
is useful for hands-off automated testing.
rcutorture.stall_cpu= [KNL,BOOT]
Duration of CPU stall (s) to test RCU CPU stall
warnings, zero to disable.
rcutorture.stall_cpu_holdoff= [KNL,BOOT]
Time to wait (s) after boot before inducing stall.
rcutorture.stat_interval= [KNL,BOOT]
Time (s) between statistics printk()s.
rcutorture.stutter= [KNL,BOOT]
Time (s) to stutter testing, for example, specifying
five seconds causes the test to run for five seconds,
wait for five seconds, and so on. This tests RCU's
ability to transition abruptly to and from idle.
rcutorture.test_boost= [KNL,BOOT]
Test RCU priority boosting? 0=no, 1=maybe, 2=yes.
"Maybe" means test if the RCU implementation
under test support RCU priority boosting.
rcutorture.test_boost_duration= [KNL,BOOT]
Duration (s) of each individual boost test.
rcutorture.test_boost_interval= [KNL,BOOT]
Interval (s) between each boost test.
rcutorture.test_no_idle_hz= [KNL,BOOT]
Test RCU's dyntick-idle handling. See also the
rcutorture.shuffle_interval parameter.
rcutorture.torture_type= [KNL,BOOT]
Specify the RCU implementation to test.
rcutorture.verbose= [KNL,BOOT]
Enable additional printk() statements.
rdinit= [KNL]
Format: <full_path>
Run specified binary instead of /init from the ramdisk,
......
......@@ -5598,14 +5598,13 @@ F: net/rds/
READ-COPY UPDATE (RCU)
M: Dipankar Sarma <dipankar@in.ibm.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
W: http://www.rdrop.com/users/paulmck/rclock/
W: http://www.rdrop.com/users/paulmck/RCU/
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
F: Documentation/RCU/
X: Documentation/RCU/torture.txt
F: include/linux/rcu*
F: include/linux/srcu*
F: kernel/rcu*
F: kernel/srcu*
X: kernel/rcutorture.c
REAL TIME CLOCK (RTC) SUBSYSTEM
......@@ -6122,6 +6121,15 @@ S: Maintained
F: include/linux/sl?b*.h
F: mm/sl?b.c
SLEEPABLE READ-COPY UPDATE (SRCU)
M: Lai Jiangshan <laijs@cn.fujitsu.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
W: http://www.rdrop.com/users/paulmck/RCU/
S: Supported
T: git git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
F: include/linux/srcu*
F: kernel/srcu*
SMC91x ETHERNET DRIVER
M: Nicolas Pitre <nico@fluxnic.net>
S: Odd Fixes
......
......@@ -705,6 +705,7 @@ static void stack_proc(void *arg)
struct task_struct *from = current, *to = arg;
to->thread.saved_task = from;
rcu_switch_from(from);
switch_to(from, to, from);
}
......
......@@ -30,6 +30,7 @@
* This is only for internal list manipulation where we know
* the prev/next entries already!
*/
#ifndef CONFIG_DEBUG_LIST
static inline void __list_add_rcu(struct list_head *new,
struct list_head *prev, struct list_head *next)
{
......@@ -38,6 +39,10 @@ static inline void __list_add_rcu(struct list_head *new,
rcu_assign_pointer(list_next_rcu(prev), new);
next->prev = new;
}
#else
extern void __list_add_rcu(struct list_head *new,
struct list_head *prev, struct list_head *next);
#endif
/**
* list_add_rcu - add a new entry to rcu-protected list
......@@ -108,7 +113,7 @@ static inline void list_add_tail_rcu(struct list_head *new,
*/
static inline void list_del_rcu(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
__list_del_entry(entry);
entry->prev = LIST_POISON2;
}
......@@ -228,18 +233,43 @@ static inline void list_splice_init_rcu(struct list_head *list,
})
/**
* list_first_entry_rcu - get the first element from a list
* Where are list_empty_rcu() and list_first_entry_rcu()?
*
* Implementing those functions following their counterparts list_empty() and
* list_first_entry() is not advisable because they lead to subtle race
* conditions as the following snippet shows:
*
* if (!list_empty_rcu(mylist)) {
* struct foo *bar = list_first_entry_rcu(mylist, struct foo, list_member);
* do_something(bar);
* }
*
* The list may not be empty when list_empty_rcu checks it, but it may be when
* list_first_entry_rcu rereads the ->next pointer.
*
* Rereading the ->next pointer is not a problem for list_empty() and
* list_first_entry() because they would be protected by a lock that blocks
* writers.
*
* See list_first_or_null_rcu for an alternative.
*/
/**
* list_first_or_null_rcu - get the first element from a list
* @ptr: the list head to take the element from.
* @type: the type of the struct this is embedded in.
* @member: the name of the list_struct within the struct.
*
* Note, that list is expected to be not empty.
* Note that if the list is empty, it returns NULL.
*
* This primitive may safely run concurrently with the _rcu list-mutation
* primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
*/
#define list_first_entry_rcu(ptr, type, member) \
list_entry_rcu((ptr)->next, type, member)
#define list_first_or_null_rcu(ptr, type, member) \
({struct list_head *__ptr = (ptr); \
struct list_head __rcu *__next = list_next_rcu(__ptr); \
likely(__ptr != __next) ? container_of(__next, type, member) : NULL; \
})
/**
* list_for_each_entry_rcu - iterate over rcu list of given type
......
......@@ -184,12 +184,14 @@ static inline int rcu_preempt_depth(void)
/* Internal to kernel */
extern void rcu_sched_qs(int cpu);
extern void rcu_bh_qs(int cpu);
extern void rcu_preempt_note_context_switch(void);
extern void rcu_check_callbacks(int cpu, int user);
struct notifier_block;
extern void rcu_idle_enter(void);
extern void rcu_idle_exit(void);
extern void rcu_irq_enter(void);
extern void rcu_irq_exit(void);
extern void exit_rcu(void);
/**
* RCU_NONIDLE - Indicate idle-loop code that needs RCU readers
......@@ -922,6 +924,21 @@ void __kfree_rcu(struct rcu_head *head, unsigned long offset)
kfree_call_rcu(head, (rcu_callback)offset);
}
/*
* Does the specified offset indicate that the corresponding rcu_head
* structure can be handled by kfree_rcu()?
*/
#define __is_kfree_rcu_offset(offset) ((offset) < 4096)
/*
* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
*/
#define __kfree_rcu(head, offset) \
do { \
BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
call_rcu(head, (void (*)(struct rcu_head *))(unsigned long)(offset)); \
} while (0)
/**
* kfree_rcu() - kfree an object after a grace period.
* @ptr: pointer to kfree
......@@ -944,6 +961,9 @@ void __kfree_rcu(struct rcu_head *head, unsigned long offset)
*
* Note that the allowable offset might decrease in the future, for example,
* to allow something like kmem_cache_free_rcu().
*
* The BUILD_BUG_ON check must not involve any function calls, hence the
* checks are done in macros here.
*/
#define kfree_rcu(ptr, rcu_head) \
__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
......
......@@ -87,14 +87,6 @@ static inline void kfree_call_rcu(struct rcu_head *head,
#ifdef CONFIG_TINY_RCU
static inline void rcu_preempt_note_context_switch(void)
{
}
static inline void exit_rcu(void)
{
}
static inline int rcu_needs_cpu(int cpu)
{
return 0;
......@@ -102,8 +94,6 @@ static inline int rcu_needs_cpu(int cpu)
#else /* #ifdef CONFIG_TINY_RCU */
void rcu_preempt_note_context_switch(void);
extern void exit_rcu(void);
int rcu_preempt_needs_cpu(void);
static inline int rcu_needs_cpu(int cpu)
......@@ -116,7 +106,6 @@ static inline int rcu_needs_cpu(int cpu)
static inline void rcu_note_context_switch(int cpu)
{
rcu_sched_qs(cpu);
rcu_preempt_note_context_switch();
}
/*
......
......@@ -45,18 +45,6 @@ static inline void rcu_virt_note_context_switch(int cpu)
rcu_note_context_switch(cpu);
}
#ifdef CONFIG_TREE_PREEMPT_RCU
extern void exit_rcu(void);
#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
static inline void exit_rcu(void)
{
}
#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
extern void synchronize_rcu_bh(void);
extern void synchronize_sched_expedited(void);
extern void synchronize_rcu_expedited(void);
......@@ -98,13 +86,6 @@ extern void rcu_force_quiescent_state(void);
extern void rcu_bh_force_quiescent_state(void);
extern void rcu_sched_force_quiescent_state(void);
/* A context switch is a grace period for RCU-sched and RCU-bh. */
static inline int rcu_blocking_is_gp(void)
{
might_sleep(); /* Check for RCU read-side critical section. */
return num_online_cpus() == 1;
}
extern void rcu_scheduler_starting(void);
extern int rcu_scheduler_active __read_mostly;
......
......@@ -1905,12 +1905,22 @@ static inline void rcu_copy_process(struct task_struct *p)
INIT_LIST_HEAD(&p->rcu_node_entry);
}
static inline void rcu_switch_from(struct task_struct *prev)
{
if (prev->rcu_read_lock_nesting != 0)
rcu_preempt_note_context_switch();
}
#else
static inline void rcu_copy_process(struct task_struct *p)
{
}
static inline void rcu_switch_from(struct task_struct *prev)
{
}
#endif
#ifdef CONFIG_SMP
......
......@@ -29,26 +29,35 @@
#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/workqueue.h>
struct srcu_struct_array {
int c[2];
unsigned long c[2];
unsigned long seq[2];
};
struct rcu_batch {
struct rcu_head *head, **tail;
};
struct srcu_struct {
int completed;
unsigned completed;
struct srcu_struct_array __percpu *per_cpu_ref;
struct mutex mutex;
spinlock_t queue_lock; /* protect ->batch_queue, ->running */
bool running;
/* callbacks just queued */
struct rcu_batch batch_queue;
/* callbacks try to do the first check_zero */
struct rcu_batch batch_check0;
/* callbacks done with the first check_zero and the flip */
struct rcu_batch batch_check1;
struct rcu_batch batch_done;
struct delayed_work work;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
};
#ifndef CONFIG_PREEMPT
#define srcu_barrier() barrier()
#else /* #ifndef CONFIG_PREEMPT */
#define srcu_barrier()
#endif /* #else #ifndef CONFIG_PREEMPT */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
int __init_srcu_struct(struct srcu_struct *sp, const char *name,
......@@ -67,12 +76,33 @@ int init_srcu_struct(struct srcu_struct *sp);
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/**
* call_srcu() - Queue a callback for invocation after an SRCU grace period
* @sp: srcu_struct in queue the callback
* @head: structure to be used for queueing the SRCU callback.
* @func: function to be invoked after the SRCU grace period
*
* The callback function will be invoked some time after a full SRCU
* grace period elapses, in other words after all pre-existing SRCU
* read-side critical sections have completed. However, the callback
* function might well execute concurrently with other SRCU read-side
* critical sections that started after call_srcu() was invoked. SRCU
* read-side critical sections are delimited by srcu_read_lock() and
* srcu_read_unlock(), and may be nested.
*
* The callback will be invoked from process context, but must nevertheless
* be fast and must not block.
*/
void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
void (*func)(struct rcu_head *head));
void cleanup_srcu_struct(struct srcu_struct *sp);
int __srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
void __srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
void synchronize_srcu(struct srcu_struct *sp);
void synchronize_srcu_expedited(struct srcu_struct *sp);
long srcu_batches_completed(struct srcu_struct *sp);
void srcu_barrier(struct srcu_struct *sp);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
......
......@@ -292,6 +292,8 @@ TRACE_EVENT(rcu_dyntick,
* "More callbacks": Still more callbacks, try again to clear them out.
* "Callbacks drained": All callbacks processed, off to dyntick idle!
* "Timer": Timer fired to cause CPU to continue processing callbacks.
* "Demigrate": Timer fired on wrong CPU, woke up correct CPU.
* "Cleanup after idle": Idle exited, timer canceled.
*/
TRACE_EVENT(rcu_prep_idle,
......
......@@ -458,6 +458,33 @@ config RCU_FANOUT
Select a specific number if testing RCU itself.
Take the default if unsure.
config RCU_FANOUT_LEAF
int "Tree-based hierarchical RCU leaf-level fanout value"
range 2 RCU_FANOUT if 64BIT
range 2 RCU_FANOUT if !64BIT
depends on TREE_RCU || TREE_PREEMPT_RCU
default 16
help
This option controls the leaf-level fanout of hierarchical
implementations of RCU, and allows trading off cache misses
against lock contention. Systems that synchronize their
scheduling-clock interrupts for energy-efficiency reasons will
want the default because the smaller leaf-level fanout keeps
lock contention levels acceptably low. Very large systems
(hundreds or thousands of CPUs) will instead want to set this
value to the maximum value possible in order to reduce the
number of cache misses incurred during RCU's grace-period
initialization. These systems tend to run CPU-bound, and thus
are not helped by synchronized interrupts, and thus tend to
skew them, which reduces lock contention enough that large
leaf-level fanouts work well.
Select a specific number if testing RCU itself.
Select the maximum permissible value for large systems.
Take the default if unsure.
config RCU_FANOUT_EXACT
bool "Disable tree-based hierarchical RCU auto-balancing"
depends on TREE_RCU || TREE_PREEMPT_RCU
......@@ -515,10 +542,25 @@ config RCU_BOOST_PRIO
depends on RCU_BOOST
default 1
help
This option specifies the real-time priority to which preempted
RCU readers are to be boosted. If you are working with CPU-bound
real-time applications, you should specify a priority higher then
the highest-priority CPU-bound application.
This option specifies the real-time priority to which long-term
preempted RCU readers are to be boosted. If you are working
with a real-time application that has one or more CPU-bound
threads running at a real-time priority level, you should set
RCU_BOOST_PRIO to a priority higher then the highest-priority
real-time CPU-bound thread. The default RCU_BOOST_PRIO value
of 1 is appropriate in the common case, which is real-time
applications that do not have any CPU-bound threads.
Some real-time applications might not have a single real-time
thread that saturates a given CPU, but instead might have
multiple real-time threads that, taken together, fully utilize
that CPU. In this case, you should set RCU_BOOST_PRIO to
a priority higher than the lowest-priority thread that is
conspiring to prevent the CPU from running any non-real-time
tasks. For example, if one thread at priority 10 and another
thread at priority 5 are between themselves fully consuming
the CPU time on a given CPU, then RCU_BOOST_PRIO should be
set to priority 6 or higher.
Specify the real-time priority, or take the default if unsure.
......
......@@ -51,6 +51,34 @@
#include "rcu.h"
#ifdef CONFIG_PREEMPT_RCU
/*
* Check for a task exiting while in a preemptible-RCU read-side
* critical section, clean up if so. No need to issue warnings,
* as debug_check_no_locks_held() already does this if lockdep
* is enabled.
*/
void exit_rcu(void)
{
struct task_struct *t = current;
if (likely(list_empty(&current->rcu_node_entry)))
return;
t->rcu_read_lock_nesting = 1;
barrier();
t->rcu_read_unlock_special = RCU_READ_UNLOCK_BLOCKED;
__rcu_read_unlock();
}
#else /* #ifdef CONFIG_PREEMPT_RCU */
void exit_rcu(void)
{
}
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
#ifdef CONFIG_DEBUG_LOCK_ALLOC
static struct lock_class_key rcu_lock_key;
struct lockdep_map rcu_lock_map =
......
......@@ -851,22 +851,6 @@ int rcu_preempt_needs_cpu(void)
return rcu_preempt_ctrlblk.rcb.rcucblist != NULL;
}
/*
* Check for a task exiting while in a preemptible -RCU read-side
* critical section, clean up if so. No need to issue warnings,
* as debug_check_no_locks_held() already does this if lockdep
* is enabled.
*/
void exit_rcu(void)
{
struct task_struct *t = current;
if (t->rcu_read_lock_nesting == 0)
return;
t->rcu_read_lock_nesting = 1;
__rcu_read_unlock();
}
#else /* #ifdef CONFIG_TINY_PREEMPT_RCU */
#ifdef CONFIG_RCU_TRACE
......
......@@ -64,6 +64,7 @@ static int irqreader = 1; /* RCU readers from irq (timers). */
static int fqs_duration; /* Duration of bursts (us), 0 to disable. */
static int fqs_holdoff; /* Hold time within burst (us). */
static int fqs_stutter = 3; /* Wait time between bursts (s). */
static int n_barrier_cbs; /* Number of callbacks to test RCU barriers. */
static int onoff_interval; /* Wait time between CPU hotplugs, 0=disable. */
static int onoff_holdoff; /* Seconds after boot before CPU hotplugs. */
static int shutdown_secs; /* Shutdown time (s). <=0 for no shutdown. */
......@@ -96,6 +97,8 @@ module_param(fqs_holdoff, int, 0444);
MODULE_PARM_DESC(fqs_holdoff, "Holdoff time within fqs bursts (us)");
module_param(fqs_stutter, int, 0444);
MODULE_PARM_DESC(fqs_stutter, "Wait time between fqs bursts (s)");
module_param(n_barrier_cbs, int, 0444);
MODULE_PARM_DESC(n_barrier_cbs, "# of callbacks/kthreads for barrier testing");
module_param(onoff_interval, int, 0444);
MODULE_PARM_DESC(onoff_interval, "Time between CPU hotplugs (s), 0=disable");
module_param(onoff_holdoff, int, 0444);
......@@ -139,6 +142,8 @@ static struct task_struct *shutdown_task;
static struct task_struct *onoff_task;
#endif /* #ifdef CONFIG_HOTPLUG_CPU */
static struct task_struct *stall_task;
static struct task_struct **barrier_cbs_tasks;
static struct task_struct *barrier_task;
#define RCU_TORTURE_PIPE_LEN 10
......@@ -164,6 +169,7 @@ static atomic_t n_rcu_torture_alloc_fail;
static atomic_t n_rcu_torture_free;
static atomic_t n_rcu_torture_mberror;
static atomic_t n_rcu_torture_error;
static long n_rcu_torture_barrier_error;
static long n_rcu_torture_boost_ktrerror;
static long n_rcu_torture_boost_rterror;
static long n_rcu_torture_boost_failure;
......@@ -173,6 +179,8 @@ static long n_offline_attempts;
static long n_offline_successes;
static long n_online_attempts;
static long n_online_successes;
static long n_barrier_attempts;
static long n_barrier_successes;
static struct list_head rcu_torture_removed;
static cpumask_var_t shuffle_tmp_mask;
......@@ -197,6 +205,10 @@ static unsigned long shutdown_time; /* jiffies to system shutdown. */
static unsigned long boost_starttime; /* jiffies of next boost test start. */
DEFINE_MUTEX(boost_mutex); /* protect setting boost_starttime */
/* and boost task create/destroy. */
static atomic_t barrier_cbs_count; /* Barrier callbacks registered. */
static atomic_t barrier_cbs_invoked; /* Barrier callbacks invoked. */
static wait_queue_head_t *barrier_cbs_wq; /* Coordinate barrier testing. */
static DECLARE_WAIT_QUEUE_HEAD(barrier_wq);
/* Mediate rmmod and system shutdown. Concurrent rmmod & shutdown illegal! */
......@@ -327,6 +339,7 @@ struct rcu_torture_ops {
int (*completed)(void);
void (*deferred_free)(struct rcu_torture *p);
void (*sync)(void);
void (*call)(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
void (*cb_barrier)(void);
void (*fqs)(void);
int (*stats)(char *page);
......@@ -417,6 +430,7 @@ static struct rcu_torture_ops rcu_ops = {
.completed = rcu_torture_completed,
.deferred_free = rcu_torture_deferred_free,
.sync = synchronize_rcu,
.call = call_rcu,
.cb_barrier = rcu_barrier,
.fqs = rcu_force_quiescent_state,
.stats = NULL,
......@@ -460,6 +474,7 @@ static struct rcu_torture_ops rcu_sync_ops = {
.completed = rcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = synchronize_rcu,
.call = NULL,
.cb_barrier = NULL,
.fqs = rcu_force_quiescent_state,
.stats = NULL,
......@@ -477,6 +492,7 @@ static struct rcu_torture_ops rcu_expedited_ops = {
.completed = rcu_no_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = synchronize_rcu_expedited,
.call = NULL,
.cb_barrier = NULL,
.fqs = rcu_force_quiescent_state,
.stats = NULL,
......@@ -519,6 +535,7 @@ static struct rcu_torture_ops rcu_bh_ops = {
.completed = rcu_bh_torture_completed,
.deferred_free = rcu_bh_torture_deferred_free,
.sync = synchronize_rcu_bh,
.call = call_rcu_bh,
.cb_barrier = rcu_barrier_bh,
.fqs = rcu_bh_force_quiescent_state,
.stats = NULL,
......@@ -535,6 +552,7 @@ static struct rcu_torture_ops rcu_bh_sync_ops = {
.completed = rcu_bh_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = synchronize_rcu_bh,
.call = NULL,
.cb_barrier = NULL,
.fqs = rcu_bh_force_quiescent_state,
.stats = NULL,
......@@ -551,6 +569,7 @@ static struct rcu_torture_ops rcu_bh_expedited_ops = {
.completed = rcu_bh_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = synchronize_rcu_bh_expedited,
.call = NULL,
.cb_barrier = NULL,
.fqs = rcu_bh_force_quiescent_state,
.stats = NULL,
......@@ -606,6 +625,11 @@ static int srcu_torture_completed(void)
return srcu_batches_completed(&srcu_ctl);
}
static void srcu_torture_deferred_free(struct rcu_torture *rp)
{
call_srcu(&srcu_ctl, &rp->rtort_rcu, rcu_torture_cb);
}
static void srcu_torture_synchronize(void)
{
synchronize_srcu(&srcu_ctl);
......@@ -620,7 +644,7 @@ static int srcu_torture_stats(char *page)
cnt += sprintf(&page[cnt], "%s%s per-CPU(idx=%d):",
torture_type, TORTURE_FLAG, idx);
for_each_possible_cpu(cpu) {
cnt += sprintf(&page[cnt], " %d(%d,%d)", cpu,
cnt += sprintf(&page[cnt], " %d(%lu,%lu)", cpu,
per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[!idx],
per_cpu_ptr(srcu_ctl.per_cpu_ref, cpu)->c[idx]);
}
......@@ -635,13 +659,29 @@ static struct rcu_torture_ops srcu_ops = {
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.deferred_free = srcu_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu"
};
static struct rcu_torture_ops srcu_sync_ops = {
.init = srcu_torture_init,
.cleanup = srcu_torture_cleanup,
.readlock = srcu_torture_read_lock,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock,
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_sync"
};
static int srcu_torture_read_lock_raw(void) __acquires(&srcu_ctl)
{
return srcu_read_lock_raw(&srcu_ctl);
......@@ -659,13 +699,29 @@ static struct rcu_torture_ops srcu_raw_ops = {
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock_raw,
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.deferred_free = srcu_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_raw"
};
static struct rcu_torture_ops srcu_raw_sync_ops = {
.init = srcu_torture_init,
.cleanup = srcu_torture_cleanup,
.readlock = srcu_torture_read_lock_raw,
.read_delay = srcu_read_delay,
.readunlock = srcu_torture_read_unlock_raw,
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = srcu_torture_synchronize,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_raw_sync"
};
static void srcu_torture_synchronize_expedited(void)
{
synchronize_srcu_expedited(&srcu_ctl);
......@@ -680,6 +736,7 @@ static struct rcu_torture_ops srcu_expedited_ops = {
.completed = srcu_torture_completed,
.deferred_free = rcu_sync_torture_deferred_free,
.sync = srcu_torture_synchronize_expedited,
.call = NULL,
.cb_barrier = NULL,
.stats = srcu_torture_stats,
.name = "srcu_expedited"
......@@ -1129,7 +1186,8 @@ rcu_torture_printk(char *page)
"rtc: %p ver: %lu tfle: %d rta: %d rtaf: %d rtf: %d "
"rtmbe: %d rtbke: %ld rtbre: %ld "
"rtbf: %ld rtb: %ld nt: %ld "
"onoff: %ld/%ld:%ld/%ld",
"onoff: %ld/%ld:%ld/%ld "
"barrier: %ld/%ld:%ld",
rcu_torture_current,
rcu_torture_current_version,
list_empty(&rcu_torture_freelist),
......@@ -1145,14 +1203,17 @@ rcu_torture_printk(char *page)
n_online_successes,
n_online_attempts,
n_offline_successes,
n_offline_attempts);
n_offline_attempts,
n_barrier_successes,
n_barrier_attempts,
n_rcu_torture_barrier_error);
cnt += sprintf(&page[cnt], "\n%s%s ", torture_type, TORTURE_FLAG);
if (atomic_read(&n_rcu_torture_mberror) != 0 ||
n_rcu_torture_barrier_error != 0 ||
n_rcu_torture_boost_ktrerror != 0 ||
n_rcu_torture_boost_rterror != 0 ||
n_rcu_torture_boost_failure != 0)
cnt += sprintf(&page[cnt], " !!!");
cnt += sprintf(&page[cnt], "\n%s%s ", torture_type, TORTURE_FLAG);
if (i > 1) {
n_rcu_torture_boost_failure != 0 ||
i > 1) {
cnt += sprintf(&page[cnt], "!!! ");
atomic_inc(&n_rcu_torture_error);
WARN_ON_ONCE(1);
......@@ -1337,6 +1398,7 @@ static void rcutorture_booster_cleanup(int cpu)
/* This must be outside of the mutex, otherwise deadlock! */
kthread_stop(t);
boost_tasks[cpu] = NULL;
}
static int rcutorture_booster_init(int cpu)
......@@ -1484,13 +1546,15 @@ static void rcu_torture_onoff_cleanup(void)
return;
VERBOSE_PRINTK_STRING("Stopping rcu_torture_onoff task");
kthread_stop(onoff_task);
onoff_task = NULL;
}
#else /* #ifdef CONFIG_HOTPLUG_CPU */
static void
static int
rcu_torture_onoff_init(void)
{
return 0;
}
static void rcu_torture_onoff_cleanup(void)
......@@ -1554,6 +1618,152 @@ static void rcu_torture_stall_cleanup(void)
return;
VERBOSE_PRINTK_STRING("Stopping rcu_torture_stall_task.");
kthread_stop(stall_task);
stall_task = NULL;
}
/* Callback function for RCU barrier testing. */
void rcu_torture_barrier_cbf(struct rcu_head *rcu)
{
atomic_inc(&barrier_cbs_invoked);
}
/* kthread function to register callbacks used to test RCU barriers. */
static int rcu_torture_barrier_cbs(void *arg)
{
long myid = (long)arg;
struct rcu_head rcu;
init_rcu_head_on_stack(&rcu);
VERBOSE_PRINTK_STRING("rcu_torture_barrier_cbs task started");
set_user_nice(current, 19);
do {
wait_event(barrier_cbs_wq[myid],
atomic_read(&barrier_cbs_count) == n_barrier_cbs ||
kthread_should_stop() ||
fullstop != FULLSTOP_DONTSTOP);
if (kthread_should_stop() || fullstop != FULLSTOP_DONTSTOP)
break;
cur_ops->call(&rcu, rcu_torture_barrier_cbf);
if (atomic_dec_and_test(&barrier_cbs_count))
wake_up(&barrier_wq);
} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
VERBOSE_PRINTK_STRING("rcu_torture_barrier_cbs task stopping");
rcutorture_shutdown_absorb("rcu_torture_barrier_cbs");
while (!kthread_should_stop())
schedule_timeout_interruptible(1);
cur_ops->cb_barrier();
destroy_rcu_head_on_stack(&rcu);
return 0;
}
/* kthread function to drive and coordinate RCU barrier testing. */
static int rcu_torture_barrier(void *arg)
{
int i;
VERBOSE_PRINTK_STRING("rcu_torture_barrier task starting");
do {
atomic_set(&barrier_cbs_invoked, 0);
atomic_set(&barrier_cbs_count, n_barrier_cbs);
/* wake_up() path contains the required barriers. */
for (i = 0; i < n_barrier_cbs; i++)
wake_up(&barrier_cbs_wq[i]);
wait_event(barrier_wq,
atomic_read(&barrier_cbs_count) == 0 ||
kthread_should_stop() ||
fullstop != FULLSTOP_DONTSTOP);
if (kthread_should_stop() || fullstop != FULLSTOP_DONTSTOP)
break;
n_barrier_attempts++;
cur_ops->cb_barrier();
if (atomic_read(&barrier_cbs_invoked) != n_barrier_cbs) {
n_rcu_torture_barrier_error++;
WARN_ON_ONCE(1);
}
n_barrier_successes++;
schedule_timeout_interruptible(HZ / 10);
} while (!kthread_should_stop() && fullstop == FULLSTOP_DONTSTOP);
VERBOSE_PRINTK_STRING("rcu_torture_barrier task stopping");
rcutorture_shutdown_absorb("rcu_torture_barrier_cbs");
while (!kthread_should_stop())
schedule_timeout_interruptible(1);
return 0;
}
/* Initialize RCU barrier testing. */
static int rcu_torture_barrier_init(void)
{
int i;
int ret;
if (n_barrier_cbs == 0)
return 0;
if (cur_ops->call == NULL || cur_ops->cb_barrier == NULL) {
printk(KERN_ALERT "%s" TORTURE_FLAG
" Call or barrier ops missing for %s,\n",
torture_type, cur_ops->name);
printk(KERN_ALERT "%s" TORTURE_FLAG
" RCU barrier testing omitted from run.\n",
torture_type);
return 0;
}
atomic_set(&barrier_cbs_count, 0);
atomic_set(&barrier_cbs_invoked, 0);
barrier_cbs_tasks =
kzalloc(n_barrier_cbs * sizeof(barrier_cbs_tasks[0]),
GFP_KERNEL);
barrier_cbs_wq =
kzalloc(n_barrier_cbs * sizeof(barrier_cbs_wq[0]),
GFP_KERNEL);
if (barrier_cbs_tasks == NULL || barrier_cbs_wq == 0)
return -ENOMEM;
for (i = 0; i < n_barrier_cbs; i++) {
init_waitqueue_head(&barrier_cbs_wq[i]);
barrier_cbs_tasks[i] = kthread_run(rcu_torture_barrier_cbs,
(void *)(long)i,
"rcu_torture_barrier_cbs");
if (IS_ERR(barrier_cbs_tasks[i])) {
ret = PTR_ERR(barrier_cbs_tasks[i]);
VERBOSE_PRINTK_ERRSTRING("Failed to create rcu_torture_barrier_cbs");
barrier_cbs_tasks[i] = NULL;
return ret;
}
}
barrier_task = kthread_run(rcu_torture_barrier, NULL,
"rcu_torture_barrier");
if (IS_ERR(barrier_task)) {
ret = PTR_ERR(barrier_task);
VERBOSE_PRINTK_ERRSTRING("Failed to create rcu_torture_barrier");
barrier_task = NULL;
}
return 0;
}
/* Clean up after RCU barrier testing. */
static void rcu_torture_barrier_cleanup(void)
{
int i;
if (barrier_task != NULL) {
VERBOSE_PRINTK_STRING("Stopping rcu_torture_barrier task");
kthread_stop(barrier_task);
barrier_task = NULL;
}
if (barrier_cbs_tasks != NULL) {
for (i = 0; i < n_barrier_cbs; i++) {
if (barrier_cbs_tasks[i] != NULL) {
VERBOSE_PRINTK_STRING("Stopping rcu_torture_barrier_cbs task");
kthread_stop(barrier_cbs_tasks[i]);
barrier_cbs_tasks[i] = NULL;
}
}
kfree(barrier_cbs_tasks);
barrier_cbs_tasks = NULL;
}
if (barrier_cbs_wq != NULL) {
kfree(barrier_cbs_wq);
barrier_cbs_wq = NULL;
}
}
static int rcutorture_cpu_notify(struct notifier_block *self,
......@@ -1598,6 +1808,7 @@ rcu_torture_cleanup(void)
fullstop = FULLSTOP_RMMOD;
mutex_unlock(&fullstop_mutex);
unregister_reboot_notifier(&rcutorture_shutdown_nb);
rcu_torture_barrier_cleanup();
rcu_torture_stall_cleanup();
if (stutter_task) {
VERBOSE_PRINTK_STRING("Stopping rcu_torture_stutter task");
......@@ -1665,6 +1876,7 @@ rcu_torture_cleanup(void)
VERBOSE_PRINTK_STRING("Stopping rcu_torture_shutdown task");
kthread_stop(shutdown_task);
}
shutdown_task = NULL;
rcu_torture_onoff_cleanup();
/* Wait for all RCU callbacks to fire. */
......@@ -1676,7 +1888,7 @@ rcu_torture_cleanup(void)
if (cur_ops->cleanup)
cur_ops->cleanup();
if (atomic_read(&n_rcu_torture_error))
if (atomic_read(&n_rcu_torture_error) || n_rcu_torture_barrier_error)
rcu_torture_print_module_parms(cur_ops, "End of test: FAILURE");
else if (n_online_successes != n_online_attempts ||
n_offline_successes != n_offline_attempts)
......@@ -1692,10 +1904,12 @@ rcu_torture_init(void)
int i;
int cpu;
int firsterr = 0;
int retval;
static struct rcu_torture_ops *torture_ops[] =
{ &rcu_ops, &rcu_sync_ops, &rcu_expedited_ops,
&rcu_bh_ops, &rcu_bh_sync_ops, &rcu_bh_expedited_ops,
&srcu_ops, &srcu_raw_ops, &srcu_expedited_ops,
&srcu_ops, &srcu_sync_ops, &srcu_raw_ops,
&srcu_raw_sync_ops, &srcu_expedited_ops,
&sched_ops, &sched_sync_ops, &sched_expedited_ops, };
mutex_lock(&fullstop_mutex);
......@@ -1749,6 +1963,7 @@ rcu_torture_init(void)
atomic_set(&n_rcu_torture_free, 0);
atomic_set(&n_rcu_torture_mberror, 0);
atomic_set(&n_rcu_torture_error, 0);
n_rcu_torture_barrier_error = 0;
n_rcu_torture_boost_ktrerror = 0;
n_rcu_torture_boost_rterror = 0;
n_rcu_torture_boost_failure = 0;
......@@ -1872,7 +2087,6 @@ rcu_torture_init(void)
test_boost_duration = 2;
if ((test_boost == 1 && cur_ops->can_boost) ||
test_boost == 2) {
int retval;
boost_starttime = jiffies + test_boost_interval * HZ;
register_cpu_notifier(&rcutorture_cpu_nb);
......@@ -1897,9 +2111,22 @@ rcu_torture_init(void)
goto unwind;
}
}
rcu_torture_onoff_init();
i = rcu_torture_onoff_init();
if (i != 0) {
firsterr = i;
goto unwind;
}
register_reboot_notifier(&rcutorture_shutdown_nb);
rcu_torture_stall_init();
i = rcu_torture_stall_init();
if (i != 0) {
firsterr = i;
goto unwind;
}
retval = rcu_torture_barrier_init();
if (retval != 0) {
firsterr = retval;
goto unwind;
}
rcutorture_record_test_transition();
mutex_unlock(&fullstop_mutex);
return 0;
......
......@@ -75,6 +75,8 @@ static struct lock_class_key rcu_node_class[NUM_RCU_LVLS];
.gpnum = -300, \
.completed = -300, \
.onofflock = __RAW_SPIN_LOCK_UNLOCKED(&structname##_state.onofflock), \
.orphan_nxttail = &structname##_state.orphan_nxtlist, \
.orphan_donetail = &structname##_state.orphan_donelist, \
.fqslock = __RAW_SPIN_LOCK_UNLOCKED(&structname##_state.fqslock), \
.n_force_qs = 0, \
.n_force_qs_ngp = 0, \
......@@ -145,6 +147,13 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
unsigned long rcutorture_testseq;
unsigned long rcutorture_vernum;
/* State information for rcu_barrier() and friends. */
static DEFINE_PER_CPU(struct rcu_head, rcu_barrier_head) = {NULL};
static atomic_t rcu_barrier_cpu_count;
static DEFINE_MUTEX(rcu_barrier_mutex);
static struct completion rcu_barrier_completion;
/*
* Return true if an RCU grace period is in progress. The ACCESS_ONCE()s
* permit this function to be invoked without holding the root rcu_node
......@@ -192,7 +201,6 @@ void rcu_note_context_switch(int cpu)
{
trace_rcu_utilization("Start context switch");
rcu_sched_qs(cpu);
rcu_preempt_note_context_switch(cpu);
trace_rcu_utilization("End context switch");
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
......@@ -1311,95 +1319,133 @@ rcu_check_quiescent_state(struct rcu_state *rsp, struct rcu_data *rdp)
#ifdef CONFIG_HOTPLUG_CPU
/*
* Move a dying CPU's RCU callbacks to online CPU's callback list.
* Also record a quiescent state for this CPU for the current grace period.
* Synchronization and interrupt disabling are not required because
* this function executes in stop_machine() context. Therefore, cleanup
* operations that might block must be done later from the CPU_DEAD
* notifier.
*
* Note that the outgoing CPU's bit has already been cleared in the
* cpu_online_mask. This allows us to randomly pick a callback
* destination from the bits set in that mask.
* Send the specified CPU's RCU callbacks to the orphanage. The
* specified CPU must be offline, and the caller must hold the
* ->onofflock.
*/
static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
static void
rcu_send_cbs_to_orphanage(int cpu, struct rcu_state *rsp,
struct rcu_node *rnp, struct rcu_data *rdp)
{
int i;
unsigned long mask;
int receive_cpu = cpumask_any(cpu_online_mask);
struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
struct rcu_data *receive_rdp = per_cpu_ptr(rsp->rda, receive_cpu);
RCU_TRACE(struct rcu_node *rnp = rdp->mynode); /* For dying CPU. */
/* First, adjust the counts. */
/*
* Orphan the callbacks. First adjust the counts. This is safe
* because ->onofflock excludes _rcu_barrier()'s adoption of
* the callbacks, thus no memory barrier is required.
*/
if (rdp->nxtlist != NULL) {
receive_rdp->qlen_lazy += rdp->qlen_lazy;
receive_rdp->qlen += rdp->qlen;
rsp->qlen_lazy += rdp->qlen_lazy;
rsp->qlen += rdp->qlen;
rdp->n_cbs_orphaned += rdp->qlen;
rdp->qlen_lazy = 0;
rdp->qlen = 0;
}
/*
* Next, move ready-to-invoke callbacks to be invoked on some
* other CPU. These will not be required to pass through another
* grace period: They are done, regardless of CPU.
* Next, move those callbacks still needing a grace period to
* the orphanage, where some other CPU will pick them up.
* Some of the callbacks might have gone partway through a grace
* period, but that is too bad. They get to start over because we
* cannot assume that grace periods are synchronized across CPUs.
* We don't bother updating the ->nxttail[] array yet, instead
* we just reset the whole thing later on.
*/
if (rdp->nxtlist != NULL &&
rdp->nxttail[RCU_DONE_TAIL] != &rdp->nxtlist) {
struct rcu_head *oldhead;
struct rcu_head **oldtail;
struct rcu_head **newtail;
oldhead = rdp->nxtlist;
oldtail = receive_rdp->nxttail[RCU_DONE_TAIL];
rdp->nxtlist = *rdp->nxttail[RCU_DONE_TAIL];
*rdp->nxttail[RCU_DONE_TAIL] = *oldtail;
*receive_rdp->nxttail[RCU_DONE_TAIL] = oldhead;
newtail = rdp->nxttail[RCU_DONE_TAIL];
for (i = RCU_DONE_TAIL; i < RCU_NEXT_SIZE; i++) {
if (receive_rdp->nxttail[i] == oldtail)
receive_rdp->nxttail[i] = newtail;
if (rdp->nxttail[i] == newtail)
rdp->nxttail[i] = &rdp->nxtlist;
}
if (*rdp->nxttail[RCU_DONE_TAIL] != NULL) {
*rsp->orphan_nxttail = *rdp->nxttail[RCU_DONE_TAIL];
rsp->orphan_nxttail = rdp->nxttail[RCU_NEXT_TAIL];
*rdp->nxttail[RCU_DONE_TAIL] = NULL;
}
/*
* Finally, put the rest of the callbacks at the end of the list.
* The ones that made it partway through get to start over: We
* cannot assume that grace periods are synchronized across CPUs.
* (We could splice RCU_WAIT_TAIL into RCU_NEXT_READY_TAIL, but
* this does not seem compelling. Not yet, anyway.)
* Then move the ready-to-invoke callbacks to the orphanage,
* where some other CPU will pick them up. These will not be
* required to pass though another grace period: They are done.
*/
if (rdp->nxtlist != NULL) {
*receive_rdp->nxttail[RCU_NEXT_TAIL] = rdp->nxtlist;
receive_rdp->nxttail[RCU_NEXT_TAIL] =
rdp->nxttail[RCU_NEXT_TAIL];
receive_rdp->n_cbs_adopted += rdp->qlen;
rdp->n_cbs_orphaned += rdp->qlen;
rdp->nxtlist = NULL;
for (i = 0; i < RCU_NEXT_SIZE; i++)
rdp->nxttail[i] = &rdp->nxtlist;
*rsp->orphan_donetail = rdp->nxtlist;
rsp->orphan_donetail = rdp->nxttail[RCU_DONE_TAIL];
}
/* Finally, initialize the rcu_data structure's list to empty. */
rdp->nxtlist = NULL;
for (i = 0; i < RCU_NEXT_SIZE; i++)
rdp->nxttail[i] = &rdp->nxtlist;
}
/*
* Adopt the RCU callbacks from the specified rcu_state structure's
* orphanage. The caller must hold the ->onofflock.
*/
static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
{
int i;
struct rcu_data *rdp = __this_cpu_ptr(rsp->rda);
/*
* Record a quiescent state for the dying CPU. This is safe
* only because we have already cleared out the callbacks.
* (Otherwise, the RCU core might try to schedule the invocation
* of callbacks on this now-offline CPU, which would be bad.)
* If there is an rcu_barrier() operation in progress, then
* only the task doing that operation is permitted to adopt
* callbacks. To do otherwise breaks rcu_barrier() and friends
* by causing them to fail to wait for the callbacks in the
* orphanage.
*/
mask = rdp->grpmask; /* rnp->grplo is constant. */
if (rsp->rcu_barrier_in_progress &&
rsp->rcu_barrier_in_progress != current)
return;
/* Do the accounting first. */
rdp->qlen_lazy += rsp->qlen_lazy;
rdp->qlen += rsp->qlen;
rdp->n_cbs_adopted += rsp->qlen;
rsp->qlen_lazy = 0;
rsp->qlen = 0;
/*
* We do not need a memory barrier here because the only way we
* can get here if there is an rcu_barrier() in flight is if
* we are the task doing the rcu_barrier().
*/
/* First adopt the ready-to-invoke callbacks. */
if (rsp->orphan_donelist != NULL) {
*rsp->orphan_donetail = *rdp->nxttail[RCU_DONE_TAIL];
*rdp->nxttail[RCU_DONE_TAIL] = rsp->orphan_donelist;
for (i = RCU_NEXT_SIZE - 1; i >= RCU_DONE_TAIL; i--)
if (rdp->nxttail[i] == rdp->nxttail[RCU_DONE_TAIL])
rdp->nxttail[i] = rsp->orphan_donetail;
rsp->orphan_donelist = NULL;
rsp->orphan_donetail = &rsp->orphan_donelist;
}
/* And then adopt the callbacks that still need a grace period. */
if (rsp->orphan_nxtlist != NULL) {
*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxtlist;
rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_nxttail;
rsp->orphan_nxtlist = NULL;
rsp->orphan_nxttail = &rsp->orphan_nxtlist;
}
}
/*
* Trace the fact that this CPU is going offline.
*/
static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
{
RCU_TRACE(unsigned long mask);
RCU_TRACE(struct rcu_data *rdp = this_cpu_ptr(rsp->rda));
RCU_TRACE(struct rcu_node *rnp = rdp->mynode);
RCU_TRACE(mask = rdp->grpmask);
trace_rcu_grace_period(rsp->name,
rnp->gpnum + 1 - !!(rnp->qsmask & mask),
"cpuofl");
rcu_report_qs_rdp(smp_processor_id(), rsp, rdp, rsp->gpnum);
/* Note that rcu_report_qs_rdp() might call trace_rcu_grace_period(). */
}
/*
* The CPU has been completely removed, and some other CPU is reporting
* this fact from process context. Do the remainder of the cleanup.
* this fact from process context. Do the remainder of the cleanup,
* including orphaning the outgoing CPU's RCU callbacks, and also
* adopting them, if there is no _rcu_barrier() instance running.
* There can only be one CPU hotplug operation at a time, so no other
* CPU can be attempting to update rcu_cpu_kthread_task.
*/
......@@ -1409,17 +1455,21 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
unsigned long mask;
int need_report = 0;
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rnp. */
struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
/* Adjust any no-longer-needed kthreads. */
rcu_stop_cpu_kthread(cpu);
rcu_node_kthread_setaffinity(rnp, -1);
/* Remove the dying CPU from the bitmasks in the rcu_node hierarchy. */
/* Remove the dead CPU from the bitmasks in the rcu_node hierarchy. */
/* Exclude any attempts to start a new grace period. */
raw_spin_lock_irqsave(&rsp->onofflock, flags);
/* Orphan the dead CPU's callbacks, and adopt them if appropriate. */
rcu_send_cbs_to_orphanage(cpu, rsp, rnp, rdp);
rcu_adopt_orphan_cbs(rsp);
/* Remove the outgoing CPU from the masks in the rcu_node hierarchy. */
mask = rdp->grpmask; /* rnp->grplo is constant. */
do {
......@@ -1456,6 +1506,10 @@ static void rcu_cleanup_dead_cpu(int cpu, struct rcu_state *rsp)
#else /* #ifdef CONFIG_HOTPLUG_CPU */
static void rcu_adopt_orphan_cbs(struct rcu_state *rsp)
{
}
static void rcu_cleanup_dying_cpu(struct rcu_state *rsp)
{
}
......@@ -1524,9 +1578,6 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
rcu_is_callbacks_kthread());
/* Update count, and requeue any remaining callbacks. */
rdp->qlen_lazy -= count_lazy;
rdp->qlen -= count;
rdp->n_cbs_invoked += count;
if (list != NULL) {
*tail = rdp->nxtlist;
rdp->nxtlist = list;
......@@ -1536,6 +1587,10 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
else
break;
}
smp_mb(); /* List handling before counting for rcu_barrier(). */
rdp->qlen_lazy -= count_lazy;
rdp->qlen -= count;
rdp->n_cbs_invoked += count;
/* Reinstate batch limit if we have worked down the excess. */
if (rdp->blimit == LONG_MAX && rdp->qlen <= qlowmark)
......@@ -1823,11 +1878,14 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
rdp = this_cpu_ptr(rsp->rda);
/* Add the callback to our list. */
*rdp->nxttail[RCU_NEXT_TAIL] = head;
rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
rdp->qlen++;
if (lazy)
rdp->qlen_lazy++;
else
rcu_idle_count_callbacks_posted();
smp_mb(); /* Count before adding callback for rcu_barrier(). */
*rdp->nxttail[RCU_NEXT_TAIL] = head;
rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
if (__is_kfree_rcu_offset((unsigned long)func))
trace_rcu_kfree_callback(rsp->name, head, (unsigned long)func,
......@@ -1893,6 +1951,38 @@ void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
}
EXPORT_SYMBOL_GPL(call_rcu_bh);
/*
* Because a context switch is a grace period for RCU-sched and RCU-bh,
* any blocking grace-period wait automatically implies a grace period
* if there is only one CPU online at any point time during execution
* of either synchronize_sched() or synchronize_rcu_bh(). It is OK to
* occasionally incorrectly indicate that there are multiple CPUs online
* when there was in fact only one the whole time, as this just adds
* some overhead: RCU still operates correctly.
*
* Of course, sampling num_online_cpus() with preemption enabled can
* give erroneous results if there are concurrent CPU-hotplug operations.
* For example, given a demonic sequence of preemptions in num_online_cpus()
* and CPU-hotplug operations, there could be two or more CPUs online at
* all times, but num_online_cpus() might well return one (or even zero).
*
* However, all such demonic sequences require at least one CPU-offline
* operation. Furthermore, rcu_blocking_is_gp() giving the wrong answer
* is only a problem if there is an RCU read-side critical section executing
* throughout. But RCU-sched and RCU-bh read-side critical sections
* disable either preemption or bh, which prevents a CPU from going offline.
* Therefore, the only way that rcu_blocking_is_gp() can incorrectly return
* that there is only one CPU when in fact there was more than one throughout
* is when there were no RCU readers in the system. If there are no
* RCU readers, the grace period by definition can be of zero length,
* regardless of the number of online CPUs.
*/
static inline int rcu_blocking_is_gp(void)
{
might_sleep(); /* Check for RCU read-side critical section. */
return num_online_cpus() <= 1;
}
/**
* synchronize_sched - wait until an rcu-sched grace period has elapsed.
*
......@@ -2166,11 +2256,10 @@ static int rcu_cpu_has_callbacks(int cpu)
rcu_preempt_cpu_has_callbacks(cpu);
}
static DEFINE_PER_CPU(struct rcu_head, rcu_barrier_head) = {NULL};
static atomic_t rcu_barrier_cpu_count;
static DEFINE_MUTEX(rcu_barrier_mutex);
static struct completion rcu_barrier_completion;
/*
* RCU callback function for _rcu_barrier(). If we are last, wake
* up the task executing _rcu_barrier().
*/
static void rcu_barrier_callback(struct rcu_head *notused)
{
if (atomic_dec_and_test(&rcu_barrier_cpu_count))
......@@ -2200,27 +2289,94 @@ static void _rcu_barrier(struct rcu_state *rsp,
void (*call_rcu_func)(struct rcu_head *head,
void (*func)(struct rcu_head *head)))
{
BUG_ON(in_interrupt());
int cpu;
unsigned long flags;
struct rcu_data *rdp;
struct rcu_head rh;
init_rcu_head_on_stack(&rh);
/* Take mutex to serialize concurrent rcu_barrier() requests. */
mutex_lock(&rcu_barrier_mutex);
init_completion(&rcu_barrier_completion);
smp_mb(); /* Prevent any prior operations from leaking in. */
/*
* Initialize rcu_barrier_cpu_count to 1, then invoke
* rcu_barrier_func() on each CPU, so that each CPU also has
* incremented rcu_barrier_cpu_count. Only then is it safe to
* decrement rcu_barrier_cpu_count -- otherwise the first CPU
* might complete its grace period before all of the other CPUs
* did their increment, causing this function to return too
* early. Note that on_each_cpu() disables irqs, which prevents
* any CPUs from coming online or going offline until each online
* CPU has queued its RCU-barrier callback.
* Initialize the count to one rather than to zero in order to
* avoid a too-soon return to zero in case of a short grace period
* (or preemption of this task). Also flag this task as doing
* an rcu_barrier(). This will prevent anyone else from adopting
* orphaned callbacks, which could cause otherwise failure if a
* CPU went offline and quickly came back online. To see this,
* consider the following sequence of events:
*
* 1. We cause CPU 0 to post an rcu_barrier_callback() callback.
* 2. CPU 1 goes offline, orphaning its callbacks.
* 3. CPU 0 adopts CPU 1's orphaned callbacks.
* 4. CPU 1 comes back online.
* 5. We cause CPU 1 to post an rcu_barrier_callback() callback.
* 6. Both rcu_barrier_callback() callbacks are invoked, awakening
* us -- but before CPU 1's orphaned callbacks are invoked!!!
*/
init_completion(&rcu_barrier_completion);
atomic_set(&rcu_barrier_cpu_count, 1);
on_each_cpu(rcu_barrier_func, (void *)call_rcu_func, 1);
raw_spin_lock_irqsave(&rsp->onofflock, flags);
rsp->rcu_barrier_in_progress = current;
raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
/*
* Force every CPU with callbacks to register a new callback
* that will tell us when all the preceding callbacks have
* been invoked. If an offline CPU has callbacks, wait for
* it to either come back online or to finish orphaning those
* callbacks.
*/
for_each_possible_cpu(cpu) {
preempt_disable();
rdp = per_cpu_ptr(rsp->rda, cpu);
if (cpu_is_offline(cpu)) {
preempt_enable();
while (cpu_is_offline(cpu) && ACCESS_ONCE(rdp->qlen))
schedule_timeout_interruptible(1);
} else if (ACCESS_ONCE(rdp->qlen)) {
smp_call_function_single(cpu, rcu_barrier_func,
(void *)call_rcu_func, 1);
preempt_enable();
} else {
preempt_enable();
}
}
/*
* Now that all online CPUs have rcu_barrier_callback() callbacks
* posted, we can adopt all of the orphaned callbacks and place
* an rcu_barrier_callback() callback after them. When that is done,
* we are guaranteed to have an rcu_barrier_callback() callback
* following every callback that could possibly have been
* registered before _rcu_barrier() was called.
*/
raw_spin_lock_irqsave(&rsp->onofflock, flags);
rcu_adopt_orphan_cbs(rsp);
rsp->rcu_barrier_in_progress = NULL;
raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
atomic_inc(&rcu_barrier_cpu_count);
smp_mb__after_atomic_inc(); /* Ensure atomic_inc() before callback. */
call_rcu_func(&rh, rcu_barrier_callback);
/*
* Now that we have an rcu_barrier_callback() callback on each
* CPU, and thus each counted, remove the initial count.
*/
if (atomic_dec_and_test(&rcu_barrier_cpu_count))
complete(&rcu_barrier_completion);
/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
wait_for_completion(&rcu_barrier_completion);
/* Other rcu_barrier() invocations can now safely proceed. */
mutex_unlock(&rcu_barrier_mutex);
destroy_rcu_head_on_stack(&rh);
}
/**
......@@ -2417,7 +2573,7 @@ static void __init rcu_init_levelspread(struct rcu_state *rsp)
for (i = NUM_RCU_LVLS - 1; i > 0; i--)
rsp->levelspread[i] = CONFIG_RCU_FANOUT;
rsp->levelspread[0] = RCU_FANOUT_LEAF;
rsp->levelspread[0] = CONFIG_RCU_FANOUT_LEAF;
}
#else /* #ifdef CONFIG_RCU_FANOUT_EXACT */
static void __init rcu_init_levelspread(struct rcu_state *rsp)
......
......@@ -29,18 +29,14 @@
#include <linux/seqlock.h>
/*
* Define shape of hierarchy based on NR_CPUS and CONFIG_RCU_FANOUT.
* Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and
* CONFIG_RCU_FANOUT_LEAF.
* In theory, it should be possible to add more levels straightforwardly.
* In practice, this did work well going from three levels to four.
* Of course, your mileage may vary.
*/
#define MAX_RCU_LVLS 4
#if CONFIG_RCU_FANOUT > 16
#define RCU_FANOUT_LEAF 16
#else /* #if CONFIG_RCU_FANOUT > 16 */
#define RCU_FANOUT_LEAF (CONFIG_RCU_FANOUT)
#endif /* #else #if CONFIG_RCU_FANOUT > 16 */
#define RCU_FANOUT_1 (RCU_FANOUT_LEAF)
#define RCU_FANOUT_1 (CONFIG_RCU_FANOUT_LEAF)
#define RCU_FANOUT_2 (RCU_FANOUT_1 * CONFIG_RCU_FANOUT)
#define RCU_FANOUT_3 (RCU_FANOUT_2 * CONFIG_RCU_FANOUT)
#define RCU_FANOUT_4 (RCU_FANOUT_3 * CONFIG_RCU_FANOUT)
......@@ -371,6 +367,17 @@ struct rcu_state {
raw_spinlock_t onofflock; /* exclude on/offline and */
/* starting new GP. */
struct rcu_head *orphan_nxtlist; /* Orphaned callbacks that */
/* need a grace period. */
struct rcu_head **orphan_nxttail; /* Tail of above. */
struct rcu_head *orphan_donelist; /* Orphaned callbacks that */
/* are ready to invoke. */
struct rcu_head **orphan_donetail; /* Tail of above. */
long qlen_lazy; /* Number of lazy callbacks. */
long qlen; /* Total number of callbacks. */
struct task_struct *rcu_barrier_in_progress;
/* Task doing rcu_barrier(), */
/* or NULL if no barrier. */
raw_spinlock_t fqslock; /* Only one task forcing */
/* quiescent states. */
unsigned long jiffies_force_qs; /* Time at which to invoke */
......@@ -423,7 +430,6 @@ DECLARE_PER_CPU(char, rcu_cpu_has_work);
/* Forward declarations for rcutree_plugin.h */
static void rcu_bootup_announce(void);
long rcu_batches_completed(void);
static void rcu_preempt_note_context_switch(int cpu);
static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp);
#ifdef CONFIG_HOTPLUG_CPU
static void rcu_report_unblock_qs_rnp(struct rcu_node *rnp,
......@@ -471,6 +477,7 @@ static void __cpuinit rcu_prepare_kthreads(int cpu);
static void rcu_prepare_for_idle_init(int cpu);
static void rcu_cleanup_after_idle(int cpu);
static void rcu_prepare_for_idle(int cpu);
static void rcu_idle_count_callbacks_posted(void);
static void print_cpu_stall_info_begin(void);
static void print_cpu_stall_info(struct rcu_state *rsp, int cpu);
static void print_cpu_stall_info_end(void);
......
......@@ -153,7 +153,7 @@ static void rcu_preempt_qs(int cpu)
*
* Caller must disable preemption.
*/
static void rcu_preempt_note_context_switch(int cpu)
void rcu_preempt_note_context_switch(void)
{
struct task_struct *t = current;
unsigned long flags;
......@@ -164,7 +164,7 @@ static void rcu_preempt_note_context_switch(int cpu)
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
/* Possibly blocking in an RCU read-side critical section. */
rdp = per_cpu_ptr(rcu_preempt_state.rda, cpu);
rdp = __this_cpu_ptr(rcu_preempt_state.rda);
rnp = rdp->mynode;
raw_spin_lock_irqsave(&rnp->lock, flags);
t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
......@@ -228,7 +228,7 @@ static void rcu_preempt_note_context_switch(int cpu)
* means that we continue to block the current grace period.
*/
local_irq_save(flags);
rcu_preempt_qs(cpu);
rcu_preempt_qs(smp_processor_id());
local_irq_restore(flags);
}
......@@ -969,22 +969,6 @@ static void __init __rcu_init_preempt(void)
rcu_init_one(&rcu_preempt_state, &rcu_preempt_data);
}
/*
* Check for a task exiting while in a preemptible-RCU read-side
* critical section, clean up if so. No need to issue warnings,
* as debug_check_no_locks_held() already does this if lockdep
* is enabled.
*/
void exit_rcu(void)
{
struct task_struct *t = current;
if (t->rcu_read_lock_nesting == 0)
return;
t->rcu_read_lock_nesting = 1;
__rcu_read_unlock();
}
#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
static struct rcu_state *rcu_state = &rcu_sched_state;
......@@ -1017,14 +1001,6 @@ void rcu_force_quiescent_state(void)
}
EXPORT_SYMBOL_GPL(rcu_force_quiescent_state);
/*
* Because preemptible RCU does not exist, we never have to check for
* CPUs being in quiescent states.
*/
static void rcu_preempt_note_context_switch(int cpu)
{
}
/*
* Because preemptible RCU does not exist, there are never any preempted
* RCU readers.
......@@ -1938,6 +1914,14 @@ static void rcu_prepare_for_idle(int cpu)
{
}
/*
* Don't bother keeping a running count of the number of RCU callbacks
* posted because CONFIG_RCU_FAST_NO_HZ=n.
*/
static void rcu_idle_count_callbacks_posted(void)
{
}
#else /* #if !defined(CONFIG_RCU_FAST_NO_HZ) */
/*
......@@ -1978,11 +1962,20 @@ static void rcu_prepare_for_idle(int cpu)
#define RCU_IDLE_GP_DELAY 6 /* Roughly one grace period. */
#define RCU_IDLE_LAZY_GP_DELAY (6 * HZ) /* Roughly six seconds. */
/* Loop counter for rcu_prepare_for_idle(). */
static DEFINE_PER_CPU(int, rcu_dyntick_drain);
/* If rcu_dyntick_holdoff==jiffies, don't try to enter dyntick-idle mode. */
static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);
static ktime_t rcu_idle_gp_wait; /* If some non-lazy callbacks. */
static ktime_t rcu_idle_lazy_gp_wait; /* If only lazy callbacks. */
/* Timer to awaken the CPU if it enters dyntick-idle mode with callbacks. */
static DEFINE_PER_CPU(struct timer_list, rcu_idle_gp_timer);
/* Scheduled expiry time for rcu_idle_gp_timer to allow reposting. */
static DEFINE_PER_CPU(unsigned long, rcu_idle_gp_timer_expires);
/* Enable special processing on first attempt to enter dyntick-idle mode. */
static DEFINE_PER_CPU(bool, rcu_idle_first_pass);
/* Running count of non-lazy callbacks posted, never decremented. */
static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted);
/* Snapshot of rcu_nonlazy_posted to detect meaningful exits from idle. */
static DEFINE_PER_CPU(unsigned long, rcu_nonlazy_posted_snap);
/*
* Allow the CPU to enter dyntick-idle mode if either: (1) There are no
......@@ -1995,6 +1988,8 @@ static ktime_t rcu_idle_lazy_gp_wait; /* If only lazy callbacks. */
*/
int rcu_needs_cpu(int cpu)
{
/* Flag a new idle sojourn to the idle-entry state machine. */
per_cpu(rcu_idle_first_pass, cpu) = 1;
/* If no callbacks, RCU doesn't need the CPU. */
if (!rcu_cpu_has_callbacks(cpu))
return 0;
......@@ -2044,17 +2039,35 @@ static bool rcu_cpu_has_nonlazy_callbacks(int cpu)
rcu_preempt_cpu_has_nonlazy_callbacks(cpu);
}
/*
* Handler for smp_call_function_single(). The only point of this
* handler is to wake the CPU up, so the handler does only tracing.
*/
void rcu_idle_demigrate(void *unused)
{
trace_rcu_prep_idle("Demigrate");
}
/*
* Timer handler used to force CPU to start pushing its remaining RCU
* callbacks in the case where it entered dyntick-idle mode with callbacks
* pending. The hander doesn't really need to do anything because the
* real work is done upon re-entry to idle, or by the next scheduling-clock
* interrupt should idle not be re-entered.
*
* One special case: the timer gets migrated without awakening the CPU
* on which the timer was scheduled on. In this case, we must wake up
* that CPU. We do so with smp_call_function_single().
*/
static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
static void rcu_idle_gp_timer_func(unsigned long cpu_in)
{
int cpu = (int)cpu_in;
trace_rcu_prep_idle("Timer");
return HRTIMER_NORESTART;
if (cpu != smp_processor_id())
smp_call_function_single(cpu, rcu_idle_demigrate, NULL, 0);
else
WARN_ON_ONCE(1); /* Getting here can hang the system... */
}
/*
......@@ -2062,19 +2075,11 @@ static enum hrtimer_restart rcu_idle_gp_timer_func(struct hrtimer *hrtp)
*/
static void rcu_prepare_for_idle_init(int cpu)
{
static int firsttime = 1;
struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
hrtimer_init(hrtp, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hrtp->function = rcu_idle_gp_timer_func;
if (firsttime) {
unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);
rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);
upj = jiffies_to_usecs(RCU_IDLE_LAZY_GP_DELAY);
rcu_idle_lazy_gp_wait = ns_to_ktime(upj * (u64)1000);
firsttime = 0;
}
per_cpu(rcu_dyntick_holdoff, cpu) = jiffies - 1;
setup_timer(&per_cpu(rcu_idle_gp_timer, cpu),
rcu_idle_gp_timer_func, cpu);
per_cpu(rcu_idle_gp_timer_expires, cpu) = jiffies - 1;
per_cpu(rcu_idle_first_pass, cpu) = 1;
}
/*
......@@ -2084,7 +2089,8 @@ static void rcu_prepare_for_idle_init(int cpu)
*/
static void rcu_cleanup_after_idle(int cpu)
{
hrtimer_cancel(&per_cpu(rcu_idle_gp_timer, cpu));
del_timer(&per_cpu(rcu_idle_gp_timer, cpu));
trace_rcu_prep_idle("Cleanup after idle");
}
/*
......@@ -2108,6 +2114,29 @@ static void rcu_cleanup_after_idle(int cpu)
*/
static void rcu_prepare_for_idle(int cpu)
{
struct timer_list *tp;
/*
* If this is an idle re-entry, for example, due to use of
* RCU_NONIDLE() or the new idle-loop tracing API within the idle
* loop, then don't take any state-machine actions, unless the
* momentary exit from idle queued additional non-lazy callbacks.
* Instead, repost the rcu_idle_gp_timer if this CPU has callbacks
* pending.
*/
if (!per_cpu(rcu_idle_first_pass, cpu) &&
(per_cpu(rcu_nonlazy_posted, cpu) ==
per_cpu(rcu_nonlazy_posted_snap, cpu))) {
if (rcu_cpu_has_callbacks(cpu)) {
tp = &per_cpu(rcu_idle_gp_timer, cpu);
mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
}
return;
}
per_cpu(rcu_idle_first_pass, cpu) = 0;
per_cpu(rcu_nonlazy_posted_snap, cpu) =
per_cpu(rcu_nonlazy_posted, cpu) - 1;
/*
* If there are no callbacks on this CPU, enter dyntick-idle mode.
* Also reset state to avoid prejudicing later attempts.
......@@ -2140,11 +2169,15 @@ static void rcu_prepare_for_idle(int cpu)
per_cpu(rcu_dyntick_drain, cpu) = 0;
per_cpu(rcu_dyntick_holdoff, cpu) = jiffies;
if (rcu_cpu_has_nonlazy_callbacks(cpu))
hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
rcu_idle_gp_wait, HRTIMER_MODE_REL);
per_cpu(rcu_idle_gp_timer_expires, cpu) =
jiffies + RCU_IDLE_GP_DELAY;
else
hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
rcu_idle_lazy_gp_wait, HRTIMER_MODE_REL);
per_cpu(rcu_idle_gp_timer_expires, cpu) =
jiffies + RCU_IDLE_LAZY_GP_DELAY;
tp = &per_cpu(rcu_idle_gp_timer, cpu);
mod_timer_pinned(tp, per_cpu(rcu_idle_gp_timer_expires, cpu));
per_cpu(rcu_nonlazy_posted_snap, cpu) =
per_cpu(rcu_nonlazy_posted, cpu);
return; /* Nothing more to do immediately. */
} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
/* We have hit the limit, so time to give up. */
......@@ -2184,6 +2217,19 @@ static void rcu_prepare_for_idle(int cpu)
trace_rcu_prep_idle("Callbacks drained");
}
/*
* Keep a running count of the number of non-lazy callbacks posted
* on this CPU. This running counter (which is never decremented) allows
* rcu_prepare_for_idle() to detect when something out of the idle loop
* posts a callback, even if an equal number of callbacks are invoked.
* Of course, callbacks should only be posted from within a trace event
* designed to be called from idle or from within RCU_NONIDLE().
*/
static void rcu_idle_count_callbacks_posted(void)
{
__this_cpu_add(rcu_nonlazy_posted, 1);
}
#endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
#ifdef CONFIG_RCU_CPU_STALL_INFO
......@@ -2192,14 +2238,12 @@ static void rcu_prepare_for_idle(int cpu)
static void print_cpu_stall_fast_no_hz(char *cp, int cpu)
{
struct hrtimer *hrtp = &per_cpu(rcu_idle_gp_timer, cpu);
struct timer_list *tltp = &per_cpu(rcu_idle_gp_timer, cpu);
sprintf(cp, "drain=%d %c timer=%lld",
sprintf(cp, "drain=%d %c timer=%lu",
per_cpu(rcu_dyntick_drain, cpu),
per_cpu(rcu_dyntick_holdoff, cpu) == jiffies ? 'H' : '.',
hrtimer_active(hrtp)
? ktime_to_us(hrtimer_get_remaining(hrtp))
: -1);
timer_pending(tltp) ? tltp->expires - jiffies : -1);
}
#else /* #ifdef CONFIG_RCU_FAST_NO_HZ */
......
......@@ -271,13 +271,13 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp)
gpnum = rsp->gpnum;
seq_printf(m, "c=%lu g=%lu s=%d jfq=%ld j=%x "
"nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu\n",
"nfqs=%lu/nfqsng=%lu(%lu) fqlh=%lu oqlen=%ld/%ld\n",
rsp->completed, gpnum, rsp->fqs_state,
(long)(rsp->jiffies_force_qs - jiffies),
(int)(jiffies & 0xffff),
rsp->n_force_qs, rsp->n_force_qs_ngp,
rsp->n_force_qs - rsp->n_force_qs_ngp,
rsp->n_force_qs_lh);
rsp->n_force_qs_lh, rsp->qlen_lazy, rsp->qlen);
for (rnp = &rsp->node[0]; rnp - &rsp->node[0] < NUM_RCU_NODES; rnp++) {
if (rnp->level != level) {
seq_puts(m, "\n");
......
......@@ -2083,6 +2083,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
#endif
/* Here we just switch the register state and the stack. */
rcu_switch_from(prev);
switch_to(prev, next, prev);
barrier();
......
此差异已折叠。
......@@ -861,7 +861,13 @@ EXPORT_SYMBOL(mod_timer);
*
* mod_timer_pinned() is a way to update the expire field of an
* active timer (if the timer is inactive it will be activated)
* and not allow the timer to be migrated to a different CPU.
* and to ensure that the timer is scheduled on the current CPU.
*
* Note that this does not prevent the timer from being migrated
* when the current CPU goes offline. If this is a problem for
* you, use CPU-hotplug notifiers to handle it correctly, for
* example, cancelling the timer when the corresponding CPU goes
* offline.
*
* mod_timer_pinned(timer, expires) is equivalent to:
*
......
......@@ -10,6 +10,7 @@
#include <linux/list.h>
#include <linux/bug.h>
#include <linux/kernel.h>
#include <linux/rculist.h>
/*
* Insert a new entry between two known consecutive entries.
......@@ -75,3 +76,24 @@ void list_del(struct list_head *entry)
entry->prev = LIST_POISON2;
}
EXPORT_SYMBOL(list_del);
/*
* RCU variants.
*/
void __list_add_rcu(struct list_head *new,
struct list_head *prev, struct list_head *next)
{
WARN(next->prev != prev,
"list_add_rcu corruption. next->prev should be "
"prev (%p), but was %p. (next=%p).\n",
prev, next->prev, next);
WARN(prev->next != next,
"list_add_rcu corruption. prev->next should be "
"next (%p), but was %p. (prev=%p).\n",
next, prev->next, prev);
new->next = next;
new->prev = prev;
rcu_assign_pointer(list_next_rcu(prev), new);
next->prev = new;
}
EXPORT_SYMBOL(__list_add_rcu);
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册