提交 · a32e01ee689794a26bdfdbaa7e8c334576cee36c · openanolis / cloud-kernel

21 2月, 2018 1 次提交

rcu: Use wrapper for lockdep asserts · a32e01ee

由 Matthew Wilcox 提交于 1月 17, 2018

Commits c0b334c5 and ea9b0c8a introduced new sparse warnings
by accessing rcu_node->lock directly and ignoring the __private
marker.  Introduce a new wrapper and use it.  Also fix a similar problem
in srcutree.c introduced by a3883df3.
Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a32e01ee

29 11月, 2017 1 次提交

srcu: Prohibit call_srcu() use under raw spinlocks · d6331980

由 Paul E. McKenney 提交于 10月 10, 2017

Invoking queue_delayed_work() while holding a raw spinlock is forbidden
in -rt kernels, which is exactly what __call_srcu() does, indirectly via
srcu_funnel_gp_start(). This commit therefore downgrades Tree SRCU's
locking from raw to non-raw spinlocks, which works because call_srcu()
is not ever called while holding a raw spinlock.
Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d6331980

21 10月, 2017 1 次提交
- P
  srcu: Add parameters to SRCU docbook comments · e4d0b679
  由 Paul E. McKenney 提交于 9月 17, 2017
```
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
```
  e4d0b679
20 10月, 2017 1 次提交

doc: Fix various RCU docbook comment-header problems · 27fdb35f

由 Paul E. McKenney 提交于 10月 19, 2017

Because many of RCU's files have not been included into docbook, a
number of errors have accumulated. This commit fixes them.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

27fdb35f

28 7月, 2017 1 次提交

srcu: Provide ordering for CPU not involved in grace period · 35732cf9

由 Paul E. McKenney 提交于 7月 05, 2017

Tree RCU guarantees that every online CPU has a memory barrier between
any given grace period and any of that CPU's RCU read-side sections that
must be ordered against that grace period. Since RCU doesn't always
know where read-side critical sections are, the actual implementation
guarantees order against prior and subsequent non-idle non-offline code,
whether in an RCU read-side critical section or not. As a result, there
does not need to be a memory barrier at the end of synchronize_rcu()
and friends because the ordering internal to the grace period has
ordered every CPU's post-grace-period execution against each CPU's
pre-grace-period execution, again for all non-idle online CPUs.

In contrast, SRCU can have non-idle online CPUs that are completely
uninvolved in a given SRCU grace period, for example, a CPU that
never runs any SRCU read-side critical sections and took no part in
the grace-period processing. It is in theory possible for a given
synchronize_srcu()'s wakeup to be delivered to a CPU that was completely
uninvolved in the prior SRCU grace period, which could mean that the
code following that synchronize_srcu() would end up being unordered with
respect to both the grace period and any pre-existing SRCU read-side
critical sections.

This commit therefore adds an smp_mb() to the end of __synchronize_srcu(),
which prevents this scenario from occurring.
Reported-by: NLance Roy <ldr709@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NLance Roy <ldr709@gmail.com>
Cc: <stable@vger.kernel.org> # 4.12.x

35732cf9

25 7月, 2017 3 次提交

rcutorture: Print SRCU lock/unlock totals · ac3748c6

由 Paul E. McKenney 提交于 5月 22, 2017

This commit adds printing of SRCU lock/unlock totals, which are just
the sums of the per-CPU counts. Saves a bit of mental arithmetic.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ac3748c6

rcutorture: Move SRCU status printing to SRCU implementations · 115a1a52

由 Paul E. McKenney 提交于 5月 22, 2017

This commit gets rid of some ugly #ifdefs in rcutorture.c by moving
the SRCU status printing to the SRCU implementations.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

115a1a52

srcu: Make process_srcu() be static · 0d8a1e83

由 Paul E. McKenney 提交于 6月 15, 2017

The function process_srcu() is not invoked outside of srcutree.c, so
this commit makes it static and drops the EXPORT_SYMBOL_GPL().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0d8a1e83

09 6月, 2017 3 次提交

srcu: Use rnp->lock wrappers to replace explicit memory barriers · a3883df3

由 Paul E. McKenney 提交于 5月 09, 2017

This commit uses TREE RCU's rnp->lock wrappers to replace a few explicit
memory barriers.  This change also has the advantage of making SRCU's
memory-ordering properties be implemented in roughly the same way as they
are in Tree RCU.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a3883df3

srcu: Shrink srcu.h by moving docbook and private function · 5a0465e1

由 Paul E. McKenney 提交于 5月 04, 2017

The call_srcu() docbook entry is currently in include/linux/srcu.h,
which causes needless processing for each include point. This commit
therefore moves this entry to kernel/rcu/srcutree.c, which the compiler
reads only once. In addition, the srcu_batches_completed() function is
used only within RCU and its torture-test suites. This commit therefore
also moves this function's declaration from include/linux/srcutiny.h,
include/linux/srcutree.h, and include/linux/srcuclassic.h to
kernel/rcu/rcu.h.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5a0465e1

srcu: Prevent sdp->srcu_gp_seq_needed counter wrap · c350c008

由 Paul E. McKenney 提交于 5月 03, 2017

If a given CPU never happens to ever start an SRCU grace period, the
grace-period sequence counter might wrap. If this CPU were to decide to
finally start a grace period, the state of its sdp->srcu_gp_seq_needed
might make it appear that it has already requested this grace period,
which would prevent starting the grace period. If no other CPU ever started
a grace period again, this would look like a grace-period hang. Even
if some other CPU took pity and started the needed grace period, the
leaf rcu_node structure's ->srcu_data_have_cbs field won't have record
of the fact that this CPU has a callback pending, which would look like
a very localized grace-period hang.

This might seem very unlikely, but SRCU grace periods can take less than
a microsecond on small systems, which means that overflow can happen
in much less than an hour on a 32-bit embedded system. And embedded
systems are especially likely to have long-term idle CPUs. Therefore,
it makes sense to prevent this scenario from happening.

This commit therefore scans each srcu_data structure occasionally,
with frequency controlled by the srcutree.counter_wrap_check kernel
boot parameter. This parameter can be set to something like 255
in order to exercise the counter-wrap-prevention code.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c350c008

08 6月, 2017 6 次提交

srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality · a602538e

由 Paul E. McKenney 提交于 4月 28, 2017

This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu()
counterparts to double-free bugs.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a602538e

srcu: Print non-default exp_holdoff values at boot time · 0c8e0e3c

由 Paul E. McKenney 提交于 4月 28, 2017

This commit makes srcu_bootup_announce() check for non-default values
of the auto-expedite holdoff time exp_holdoff and print a message if so.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0c8e0e3c

srcu: Make exp_holdoff module parameter be static · b5815e6c

由 Paul E. McKenney 提交于 4月 28, 2017

Because exp_holdoff is not used outside of srcutree.c, it can be static.
This commit therefore makes this change.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b5815e6c

srcu: Make Classic and Tree SRCU announce themselves at bootup · 1f4f6da1

由 Paul E. McKenney 提交于 4月 21, 2017

Currently, the only way to tell whether a given kernel is running
Classic, Tiny, or Tree SRCU is to look at the .config file, which
can easily be lost or associated with the wrong kernel. This commit
therefore has Classic and Tree SRCU identify themselves at boot time.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

1f4f6da1

srcu: Eliminate possibility of destructive counter overflow · 881ec9d2

由 Paul E. McKenney 提交于 4月 12, 2017

Earlier versions of Tree SRCU were subject to a counter overflow bug that
could theoretically result in too-short grace periods. This commit
eliminates this problem by adding an update-side memory barrier.
The short explanation is that if the updater sums the unlock counts
too late to see a given __srcu_read_unlock() increment, that CPU's
next __srcu_read_lock() must see the new value of ->srcu_idx, thus
incrementing the other bank of counters. This eliminates the possibility
of destructive counter overflow as long as the srcu_read_lock() nesting
level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an
eminently reasonable nesting limit, especially on 64-bit systems.
Reported-by: NLance Roy <ldr709@gmail.com>
Suggested-by: NLance Roy <ldr709@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

881ec9d2

srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context · cdf7abc4

由 Paolo Bonzini 提交于 5月 31, 2017

Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
implementation already supports mixed-context use of srcu_read_lock()
and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
and srcu_read_unlock() in each handler are nested and paired properly.
In other words, it is still illegal to (say) invoke srcu_read_lock()
in an interrupt handler and to invoke the matching srcu_read_unlock()
in a softirq handler.  Therefore, the only change required for Tiny SRCU
is to its comments.

Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: NLinu Cherian <linuc.decode@gmail.com>
Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NPaolo Bonzini <pbonzini@redhat.com>

cdf7abc4

02 5月, 2017 1 次提交

srcu: Debloat the <linux/rcu_segcblist.h> header · 45753c5f

由 Ingo Molnar 提交于 5月 02, 2017

Linus noticed that the <linux/rcu_segcblist.h> has huge inline functions
which should not be inline at all.

As a first step in cleaning this up, move them all to kernel/rcu/ and
only keep an absolute minimum of data type defines in the header:

  before:   -rw-r--r-- 1 mingo mingo 22284 May  2 10:25 include/linux/rcu_segcblist.h
   after:   -rw-r--r-- 1 mingo mingo  3180 May  2 10:22 include/linux/rcu_segcblist.h

More can be done, such as uninlining the large functions, which inlining
is unjustified even if it's an RCU internal matter.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

45753c5f

27 4月, 2017 6 次提交

srcu: Adjust default auto-expediting holdoff · b5fe223a

由 Paul E. McKenney 提交于 4月 27, 2017

The default value for the kernel boot parameter srcutree.exp_holdoff
is 50 microseconds, which is too long for good Tree SRCU performance
(compared to Classic SRCU) on the workloads tested by Mike Galbraith.
This commit therefore sets the default value to 25 microseconds, which
shows excellent results in Mike's testing.
Reported-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

b5fe223a

srcu: Specify auto-expedite holdoff time · 22607d66

由 Paul E. McKenney 提交于 4月 25, 2017

On small systems, in the absence of readers, expedited SRCU grace
periods can complete in less than a microsecond. This means that an
eight-CPU system can have all CPUs doing synchronize_srcu() in a tight
loop and almost always expedite. This might actually be desirable in
some situations, but in general it is a good way to needlessly burn
CPU cycles. And in those situations where it is desirable, your friend
is the function synchronize_srcu_expedited().

For other situations, this commit adds a kernel parameter that specifies
a holdoff between completing the last SRCU grace period and auto-expediting
the next. If the next grace period starts before the holdoff expires,
auto-expediting is disabled. The holdoff is 50 microseconds by default,
and can be tuned to the desired number of nanoseconds. A value of zero
disables auto-expediting.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

22607d66

srcu: Expedite first synchronize_srcu() when idle · 2da4b2a7

由 Paul E. McKenney 提交于 4月 25, 2017

Classic SRCU in effect expedites the first synchronize_srcu() when SRCU
is idle, and Mike Galbraith demonstrated that some use cases do in fact
rely on this behavior. In particular, Mike showed that Steven Rostedt's
hotplug stress script takes 55 seconds with Classic SRCU and more than
16 -minutes- when running Tree SRCU. Assuming that each Tree SRCU's call
to synchronize_srcu() takes four milliseconds, this implies that Steven's
test invokes synchronize_srcu() in isolation, but more than once per
200 microseconds. Mike used ftrace to demonstrate that the time between
successive calls to synchronize_srcu() ranged from 118 to 342 microseconds,
with one outlier at 80 milliseconds. This data clearly indicates that
Tree SRCU needs to expedite the first invocation of synchronize_srcu()
during an SRCU idle period.

This commit therefor introduces a srcu_might_be_idle() function that
probabilistically checks whether or not SRCU is idle. This function is
used by synchronize_rcu() as an additional criterion in deciding whether
or not to expedite.

(Hat trick to Peter Zijlstra for his earlier suggestion that this might
in fact be a problem. Which for all I know might have motivated Mike to
look into it.)
Reported-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

2da4b2a7

srcu: Expedited grace periods with reduced memory contention · 1e9a038b

由 Paul E. McKenney 提交于 4月 24, 2017

Commit f60d231a ("srcu: Crude control of expedited grace periods")
introduced a per-srcu_struct atomic counter to track outstanding
requests for grace periods. This works, but represents a memory-contention
bottleneck. This commit therefore uses the srcu_node combining tree
to remove this bottleneck.

This commit adds new ->srcu_gp_seq_needed_exp fields to the
srcu_data, srcu_node, and srcu_struct structures, which track the
farthest-in-the-future grace period that must be expedited, which in
turn requires that all nearer-term grace periods also be expedited.
Requests for expediting start with the srcu_data structure, run up
through the srcu_node tree, and end at the srcu_struct structure.
Note that it may be necessary to expedite a grace period that just
now started, and this is handled by a new srcu_funnel_exp_start()
function, which is invoked when the grace period itself is already
in its way, but when that grace period was not marked as expedited.

A new srcu_get_delay() function returns zero if there is at least one
expedited SRCU grace period in flight, or SRCU_INTERVAL otherwise.
This function is used to calculate delays: Normal grace periods
are allowed to extend in order to cover more requests with a given
grace-period computation, which decreases per-request overhead.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

1e9a038b

srcu: Make rcutorture writer stalls print SRCU GP state · 7f6733c3

由 Paul E. McKenney 提交于 4月 18, 2017

In the past, SRCU was simple enough that there was little point in
making the rcutorture writer stall messages print the SRCU grace-period
number state.  With the advent of Tree SRCU, this has changed.  This
commit therefore makes Classic, Tiny, and Tree SRCU report this state
to rcutorture as needed.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

7f6733c3

srcu: Exact tracking of srcu_data structures containing callbacks · c7e88067

由 Paul E. McKenney 提交于 4月 18, 2017

The current Tree SRCU implementation schedules a workqueue for every
srcu_data covered by a given leaf srcu_node structure having callbacks,
even if only one of those srcu_data structures actually contains
callbacks. This is clearly inefficient for workloads that don't feature
callbacks everywhere all the time. This commit therefore adds an array
of masks that are used by the leaf srcu_node structures to track exactly
which srcu_data structures contain callbacks.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMike Galbraith <efault@gmx.de>

c7e88067

21 4月, 2017 2 次提交

srcu: Expedite srcu_schedule_cbs_snp() callback invocation · 0497b489

由 Paul E. McKenney 提交于 4月 18, 2017

Although Tree SRCU does reduce delays when there is at least one
synchronize_srcu_expedited() invocation pending, srcu_schedule_cbs_snp()
still waits for SRCU_INTERVAL before invoking callbacks.  Since
synchronize_srcu_expedited() now posts a callback and waits for
that callback to do a wakeup, this destroys the expedited nature of
synchronize_srcu_expedited().  This destruction became apparent to
Marc Zyngier in the guise of a guest-OS bootup slowdown from five
seconds to no fewer than forty seconds.

This commit therefore invokes callbacks immediately at the end of the
grace period when there is at least one synchronize_srcu_expedited()
invocation pending.  This brought Marc's guest-OS bootup times back
into the realm of reason.
Reported-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NMarc Zyngier <marc.zyngier@arm.com>

0497b489

srcu: Parallelize callback handling · da915ad5

由 Paul E. McKenney 提交于 4月 05, 2017

Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists. This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks are both more complex
than in the single-list Classic SRCU implementation. Starting grace
periods and handling callbacks are now handled using an srcu_node tree
that is in some ways similar to the rcu_node trees used by RCU-bh,
RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
controlled by exactly the same Kconfig options and boot parameters that
control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this). The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number. These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU
->srcu_cblist. Because each ->srcu_cblist is protected by ->lock,
srcu_barrier() can remotely add the needed callbacks. In theory,
it could also remotely start grace periods, but in practice doing so
is complex and racy. And interestingly enough, it is never necessary
for srcu_barrier() to start a grace period because srcu_barrier() only
enqueues a callback when a callback is already present--and it turns out
that a grace period has to have already been started for this pre-existing
callback. Furthermore, it is only the callback that srcu_barrier()
needs to wait on, not any particular grace period. Therefore, a new
rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
callback into the same segment occupied by the last pre-existing callback
in the list. The special case where all the pre-existing callbacks are
on a different list (because they are in the process of being invoked)
is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
segment, relying on the done-callbacks check that takes place after all
callbacks are inovked.

Note that the readers use the same algorithm as before. Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture. These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf
asynchronous-grace-period capability:

Callback Queuing Overhead
-------------------------
# CPUS Classic SRCU Tree SRCU
------ ------------ ---------
2 0.349 us 0.342 us
16 31.66 us 0.4 us
41 --------- 0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject
the overheads of the occasional srcu_barrier() call needed to avoid OOMing
the test machine. The rcuperf test hangs when running Classic SRCU at 41
CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code
and that statistics, this is a convincing demonstration of Tree SRCU's
performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]

da915ad5

19 4月, 2017 12 次提交

srcu: Introduce CLASSIC_SRCU Kconfig option · dad81a20

由 Paul E. McKenney 提交于 3月 25, 2017

The TREE_SRCU rewrite is large and a bit on the non-simple side, so
this commit helps reduce risk by allowing the old v4.11 SRCU algorithm
to be selected using a new CLASSIC_SRCU Kconfig option that depends
on RCU_EXPERT. The default is to use the new TREE_SRCU and TINY_SRCU
algorithms, in order to help get these the testing that they need.
However, if your users do not require the update-side scalability that
is to be provided by TREE_SRCU, select RCU_EXPERT and then CLASSIC_SRCU
to revert back to the old classic SRCU algorithm.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

dad81a20

srcu: Crude control of expedited grace periods · f60d231a

由 Paul E. McKenney 提交于 3月 24, 2017

SRCU's implementation of expedited grace periods has always assumed
that the SRCU instance is idle when the expedited request arrives.
This commit improves this a bit by maintaining a count of the number
of outstanding expedited requests, thus allowing prior non-expedited
grace periods accommodate these requests by shifting to expedited mode.
However, any non-expedited wait already in progress will still wait for
the full duration.

Improved control of expedited grace periods is planned, but one step
at a time.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f60d231a

srcu: Merge ->srcu_state into ->srcu_gp_seq · 80a7956f

由 Paul E. McKenney 提交于 3月 22, 2017

Updating ->srcu_state and ->srcu_gp_seq will lead to extremely complex
race conditions given multiple callback queues, so this commit takes
advantage of the two-bit state now available in rcu_seq counters to
store the state in the bottom two bits of ->srcu_gp_seq.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

80a7956f

P
srcu: Fix bogus try_check_zero() comment · 91e27c35
由 Paul E. McKenney 提交于 3月 15, 2017
```
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
```
91e27c35

srcu: Move rcu_init_levelspread() to rcu_tree_node.h · 2b34c43c

由 Paul E. McKenney 提交于 3月 14, 2017

This commit moves the rcu_init_levelspread() function from
kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is
another step towards enabling SRCU to create its own combining tree.
This commit is code-movement only, give or take knock-on adjustments.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

2b34c43c

srcu: Use rcu_segcblist to track SRCU callbacks · 8660b7d8

由 Paul E. McKenney 提交于 3月 13, 2017

This commit switches SRCU from custom-built callback queues to the new
rcu_segcblist structure.  This change associates grace-period sequence
numbers with groups of callbacks, which will be needed for efficient
processing of per-CPU callbacks.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

8660b7d8

srcu: Add grace-period sequence numbers · ac367c1c

由 Paul E. McKenney 提交于 3月 11, 2017

This commit adds grace-period sequence numbers, which will be used to
handle mid-boot grace periods and per-CPU callback lists.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ac367c1c

srcu: Move to state-based grace-period sequencing · c2a8ec07

由 Paul E. McKenney 提交于 3月 10, 2017

The current SRCU grace-period processing might never reach the last
portion of srcu_advance_batches(). This is OK given the current
implementation, as the first portion, up to the try_check_zero()
following the srcu_flip() is sufficient to drive grace periods forward.
However, it has the unfortunate side-effect of making it impossible to
determine when a given grace period has ended, and it will be necessary
to efficiently trace ends of grace periods in order to efficiently handle
per-CPU SRCU callback lists.

This commit therefore adds states to the SRCU grace-period processing,
so that the end of a given SRCU grace period is marked by the transition
to the SRCU_STATE_DONE state.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c2a8ec07

srcu: Push srcu_advance_batches() fastpath into common case · c6e56f59

由 Paul E. McKenney 提交于 3月 09, 2017

This commit simplifies the SRCU state machine by pushing the
srcu_advance_batches() idle-SRCU fastpath into the common case.  This is
done by giving srcu_reschedule() a delay parameter, which is zero in
the call from srcu_advance_batches().

This commit is a step towards numbering callbacks in order to
efficiently handle per-CPU callback lists.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c6e56f59

srcu: Allow early boot use of synchronize_srcu() · b5eaeaa5

由 Paul E. McKenney 提交于 2月 10, 2017

This commit checks for pre-scheduler state, and if that early in the
boot process, synchronize_srcu() and friends are no-ops.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b5eaeaa5

srcu: Check for tardy grace-period activity in cleanup_srcu_struct() · 15c68f7f

由 Paul E. McKenney 提交于 1月 19, 2017

Users of SRCU are obliged to complete all grace-period activity before
invoking cleanup_srcu_struct().  This means that all calls to either
synchronize_srcu() or synchronize_srcu_expedited() must have returned,
and all calls to call_srcu() must have returned, and the last call to
call_srcu() must have been followed by a call to srcu_barrier().
Furthermore, the caller must have done something to prevent any
further calls to synchronize_srcu(), synchronize_srcu_expedited(),
and call_srcu().

Therefore, if there has ever been an invocation of call_srcu() on
the srcu_struct in question, the sequence of events must be as
follows:

1.  Prevent any further calls to call_srcu().
2.  Wait for any pre-existing call_srcu() invocations to return.
3.  Invoke srcu_barrier().
4.  It is now safe to invoke cleanup_srcu_struct().

On the other hand, if there has ever been a call to synchronize_srcu()
or synchronize_srcu_expedited(), the sequence of events must be as
follows:

1.  Prevent any further calls to synchronize_srcu() or
    synchronize_srcu_expedited().
2.  Wait for any pre-existing synchronize_srcu() or
    synchronize_srcu_expedited() invocations to return.
3.  It is now safe to invoke cleanup_srcu_struct().

If there have been calls to all both types of functions (call_srcu()
and either of synchronize_srcu() and synchronize_srcu_expedited()), then
the caller must do the first three steps of the call_srcu() procedure
above and the first two steps of the synchronize_s*() procedure above,
and only then invoke cleanup_srcu_struct().

Note that cleanup_srcu_struct() does some probabilistic checks
for the caller failing to follow these procedures, in which case
cleanup_srcu_struct() does WARN_ON() and avoids freeing the per-CPU
structures associated with the specified srcu_struct structure.
Reported-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

15c68f7f

srcu: Consolidate batch checking into rcu_all_batches_empty() · cc985822

由 Paul E. McKenney 提交于 1月 19, 2017

The srcu_reschedule() function invokes rcu_batch_empty() on each of
the four rcu_batch structures in the srcu_struct in question twice.
Given that this check will also be needed in cleanup_srcu_struct(), this
commit consolidates these four checks into a new rcu_all_batches_empty()
function.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

cc985822

02 3月, 2017 1 次提交

rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h> · f9411ebe

由 Ingo Molnar 提交于 2月 06, 2017

So rcupdate.h is a pretty complex header, in particular it includes
<linux/completion.h> which includes <linux/wait.h> - creating a
dependency that includes <linux/wait.h> in <linux/sched.h>,
which prevents the isolation of <linux/sched.h> from the derived
<linux/wait.h> header.

Solve part of the problem by decoupling rcupdate.h from completions:
this can be done by separating out the rcu_synchronize types and APIs,
and updating their usage sites.

Since this is a mostly RCU-internal types this will not just simplify
<linux/sched.h>'s dependencies, but will make all the hundreds of
.c files that include rcupdate.h but not completions or wait.h build
faster.

( For rcutiny this means that two dependent APIs have to be uninlined,
  but that shouldn't be much of a problem as they are rare variants. )
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

f9411ebe

26 1月, 2017 1 次提交

srcu: Reduce probability of SRCU ->unlock_count[] counter overflow · 7f554a3d

由 Paul E. McKenney 提交于 1月 24, 2017

Because there are no memory barriers between the srcu_flip() ->completed
increment and the summation of the read-side ->unlock_count[] counters,
both the compiler and the CPU can reorder the summation with the
->completed increment.  If the updater is preempted long enough during
this process, the read-side counters could overflow, resulting in a
too-short grace period.

This commit therefore adds a memory barrier just after the ->completed
increment, ensuring that if the summation misses an increment of
->unlock_count[] from __srcu_read_unlock(), the next __srcu_read_lock()
will see the new value of ->completed, thus bounding the number of
->unlock_count[] increments that can be missed to NR_CPUS.  The actual
overflow computation is more complex due to the possibility of nesting
of __srcu_read_lock().
Reported-by: NLance Roy <ldr709@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

7f554a3d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功