1. 09 Jun 2017, 14 commits
  2. 08 Jun 2017, 26 commits
    • rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE · 511324e4
      Committed by Paul E. McKenney
      The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
      are used to mediate wakeups for the no-CBs CPU kthreads.  The "NOGP"
      really doesn't make any sense, so this commit does s/NOGP/NOCB/.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      511324e4
    • sched: Rely on synchronize_rcu_mult() de-duplication · d7d34d5e
      Committed by Paul E. McKenney
      The synchronize_rcu_mult() function now detects duplicate requests
      for the same grace-period flavor and waits only once for each flavor.
      This commit therefore removes the ugly #ifdef from sched_cpu_deactivate()
      because synchronize_rcu_mult(call_rcu, call_rcu_sched) now does what
      the #ifdef used to be needed for.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d7d34d5e
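      As a rough sketch of the simplification (not the exact diff; the
      pre-existing #else branch shown here is an assumption), the relevant
      part of sched_cpu_deactivate() goes from:

          /* Before: the caller had to special-case non-preemptible kernels. */
          #ifdef CONFIG_PREEMPT
                  synchronize_rcu_mult(call_rcu, call_rcu_sched);
          #else
                  synchronize_rcu();      /* assumed pre-existing fallback */
          #endif

          /* After: duplicate flavors are waited for only once, so one call suffices. */
          synchronize_rcu_mult(call_rcu, call_rcu_sched);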
    • rcu: Make synchronize_rcu_mult() check for duplicates · 68ab0b42
      Committed by Paul E. McKenney
      Currently, doing synchronize_rcu_mult(call_rcu, call_rcu) might
      (or might not) wait for two RCU grace periods.  One approach is
      of course "don't do that!", but in CONFIG_PREEMPT=n kernels,
      synchronize_rcu_mult(call_rcu, call_rcu_sched) does exactly that.
      This results in an ugly #ifdef in sched_cpu_deactivate().
      
      This commit therefore makes __wait_rcu_gp() check for duplicates,
      which in turn allows duplicates to be passed to synchronize_rcu_mult()
      without risk of waiting twice on the same type of grace period.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      68ab0b42
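      The idea behind the duplicate check can be sketched as follows (a
      simplified illustration of __wait_rcu_gp()'s approach; wait_for_flavor()
      is a hypothetical stand-in for the real wait logic):

          int i, j;

          for (i = 0; i < n; i++) {
                  /* Skip any entry that repeats an earlier flavor. */
                  for (j = 0; j < i; j++)
                          if (crcu_array[j] == crcu_array[i])
                                  break;
                  if (j == i)
                          wait_for_flavor(crcu_array[i]);  /* hypothetical helper */
          }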
    • srcu: Add DEBUG_OBJECTS_RCU_HEAD functionality · a602538e
      Committed by Paul E. McKenney
      This commit adds DEBUG_OBJECTS_RCU_HEAD checking to detect call_srcu()
      counterparts to double-free bugs.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      a602538e
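      Conceptually, the checking brackets each SRCU callback with the
      debug-objects queue/unqueue hooks, roughly as below (a sketch; SRCU's
      actual enqueueing and locking are omitted, and call_srcu_sketch() is a
      made-up name):

          void call_srcu_sketch(struct srcu_struct *sp, struct rcu_head *rhp,
                                rcu_callback_t func)
          {
                  if (debug_rcu_head_queue(rhp)) {
                          /* Probable double call_srcu() on the same rcu_head. */
                          WARN_ON_ONCE(1);
                          return;
                  }
                  rhp->func = func;
                  /* ... enqueue rhp on the srcu_struct's callback list ... */
          }

          /* And when the callback is eventually invoked: */
          debug_rcu_head_unqueue(rhp);
          rhp->func(rhp);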
    • srcu: Shrink Tiny SRCU a bit · d4efe6c5
      Committed by Paul E. McKenney
      In Tiny SRCU, __srcu_read_lock() is a trivial function, outweighed by
      its EXPORT_SYMBOL_GPL() and, on many architectures, by its call sequence.
      This commit therefore moves it to srcutiny.h so that it can be inlined.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      d4efe6c5
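      After the move, the reader entry point can be a static inline in
      srcutiny.h, along these lines (a sketch assuming Tiny SRCU's per-index
      nesting counters; treat field names as an approximation):

          static inline int __srcu_read_lock(struct srcu_struct *sp)
          {
                  int idx;

                  idx = READ_ONCE(sp->srcu_idx);
                  WRITE_ONCE(sp->srcu_lock_nesting[idx],
                             sp->srcu_lock_nesting[idx] + 1);
                  return idx;
          }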
    • rcu: Add lockdep_assert_held() teeth to tree_plugin.h · ea9b0c8a
      Committed by Paul E. McKenney
      Comments can be helpful, but assertions carry more force.  This commit
      therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN() calls to
      enforce lock-held and interrupt-disabled preconditions.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      ea9b0c8a
    • rcu: Add lockdep_assert_held() teeth to tree.c · c0b334c5
      Committed by Paul E. McKenney
      Comments can be helpful, but assertions carry more force.  This
      commit therefore adds lockdep_assert_held() and RCU_LOCKDEP_WARN()
      calls to enforce lock-held and interrupt-disabled preconditions.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      c0b334c5
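      The assertions added by the two commits above take this general form (a
      sketch; example_rnp_operation() is a made-up name, and the actual call
      sites, locks, and messages vary):

          static void example_rnp_operation(struct rcu_node *rnp)
          {
                  /* Caller must hold this rcu_node structure's ->lock. */
                  lockdep_assert_held(&rnp->lock);

                  /* Caller must have interrupts disabled. */
                  RCU_LOCKDEP_WARN(!irqs_disabled(),
                                   "example_rnp_operation() with irqs enabled");
                  /* ... */
          }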
    • srcu: Print non-default exp_holdoff values at boot time · 0c8e0e3c
      Committed by Paul E. McKenney
      This commit makes srcu_bootup_announce() check for non-default values
      of the auto-expedite holdoff time exp_holdoff and print a message if so.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      0c8e0e3c
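      The announcement amounts to a conditional pr_info() at boot, roughly as
      sketched below (DEFAULT_SRCU_EXP_HOLDOFF stands in for whatever the
      built-in default is named, and the initcall wiring is an assumption):

          static int __init srcu_bootup_announce(void)
          {
                  pr_info("Hierarchical SRCU implementation.\n");
                  if (exp_holdoff != DEFAULT_SRCU_EXP_HOLDOFF)
                          pr_info("\tNon-default auto-expedite holdoff of %lu ns.\n",
                                  exp_holdoff);
                  return 0;
          }
          early_initcall(srcu_bootup_announce);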
    • srcu: Make exp_holdoff module parameter be static · b5815e6c
      Committed by Paul E. McKenney
      Because exp_holdoff is not used outside of srcutree.c, it can be static.
      This commit therefore makes this change.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      b5815e6c
    • rcu: Update rcu_bootup_announce_oddness() · 17c7798b
      Committed by Paul E. McKenney
      This commit updates rcu_bootup_announce_oddness() to check additional
      Kconfig options and module/boot parameters.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      17c7798b
    • rcu: Print out rcupdate.c non-default boot-time settings · 59d80fd8
      Committed by Paul E. McKenney
      This commit adds a rcupdate_announce_bootup_oddness() function to
      print out non-default values of significant kernel boot parameter
      settings to aid in debugging.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      59d80fd8
    • rcu: Add preemptibility checks in rcu_sched_qs() and rcu_bh_qs() · f4687d26
      Committed by Paul E. McKenney
      This commit adds WARN_ON_ONCE() calls that trigger if either
      rcu_sched_qs() or rcu_bh_qs() are invoked with preemption enabled.
      In the immortal words of Peter Zijlstra: "these are much harder to ignore
      than comments".
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f4687d26
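      In sketch form, the check is a one-liner at the top of each function;
      one plausible spelling that follows the wording above (the exact
      predicate used upstream may differ):

          void rcu_sched_qs(void)
          {
                  /* This is expected to run with preemption disabled. */
                  WARN_ON_ONCE(preemptible());
                  /* ... record the quiescent state as before ... */
          }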
    • rcuperf: Add writer_holdoff boot parameter · 820687a7
      Committed by Paul E. McKenney
      This commit adds a writer_holdoff boot parameter to rcuperf, which is
      intended to be used to test Tree SRCU's auto-expediting.  This
      boot parameter is in microseconds, and defaults to zero (that is,
      disabled).  Set it a bit larger than srcutree.exp_holdoff (keeping the
      nanosecond-to-microsecond conversion in mind) to force Tree SRCU
      to auto-expedite more aggressively.
      
      This commit also adds documentation for this parameter, and fixes some
      alphabetization while in the neighborhood.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      820687a7
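      The parameter boils down to a per-iteration delay in the writer kthread,
      roughly as below (a sketch; torture_param() is how rcuperf's other
      module parameters are declared):

          torture_param(int, writer_holdoff, 0,
                        "Holdoff (us) between GPs, zero to disable");

          /* In the writer kthread's main loop: */
          if (writer_holdoff)
                  udelay(writer_holdoff);
          /* ... then start timing the next grace period ... */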
    • rcuperf: Set more user-friendly defaults · 492b95e5
      Committed by Paul E. McKenney
      Common-case use of rcuperf must set rcuperf.nreaders=0 and, if not built
      as a module, rcuperf.shutdown=1.  This commit therefore sets the default
      for rcuperf.nreaders to zero and sets the default for rcuperf.shutdown
      to zero if rcuperf is built as a module and to one otherwise.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      492b95e5
    • srcu: Shrink Tiny SRCU a bit more · 3ddf20c9
      Committed by Paul E. McKenney
      This commit rearranges Tiny SRCU's srcu_struct structure, substitutes
      u8 for bool, and shrinks counters down to short.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      3ddf20c9
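      The shrunken structure looks roughly like this (a sketch of the layout
      implied above; exact field names and ordering are an approximation):

          struct srcu_struct {
                  short srcu_lock_nesting[2];      /* srcu_read_lock() nesting depth. */
                  short srcu_idx;                  /* Current reader array element. */
                  u8 srcu_gp_running;              /* Grace-period workqueue running? */
                  u8 srcu_gp_waiting;              /* Grace period waiting on a reader? */
                  struct swait_queue_head srcu_wq; /* Last reader wakes the grace period. */
                  struct rcu_head *srcu_cb_head;   /* Callback list: head. */
                  struct rcu_head **srcu_cb_tail;  /* Callback list: tail. */
                  struct work_struct srcu_work;    /* Drives grace periods. */
          };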
    • srcu: Make Classic and Tree SRCU announce themselves at bootup · 1f4f6da1
      Committed by Paul E. McKenney
      Currently, the only way to tell whether a given kernel is running
      Classic, Tiny, or Tree SRCU is to look at the .config file, which
      can easily be lost or associated with the wrong kernel.  This commit
      therefore has Classic and Tree SRCU identify themselves at boot time.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1f4f6da1
    • rcuperf: Add test for dynamically initialized srcu_struct · f60cb4d4
      Committed by Paul E. McKenney
      This commit adds a perf_type of "srcud", which specifies that rcuperf
      test SRCU on a dynamically initialized srcu_struct.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f60cb4d4
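      The difference from the pre-existing "srcu" perf_type is that the
      srcu_struct is initialized and cleaned up at test time rather than
      defined statically, in the spirit of the sketch below (variable and
      function names here are illustrative, not rcuperf's actual ones):

          /* Static variant, as exercised by the existing "srcu" perf_type: */
          DEFINE_STATIC_SRCU(perf_srcu);

          /* Dynamic variant exercised by "srcud": */
          static struct srcu_struct perf_srcud;

          static void perf_srcud_init(void)
          {
                  WARN_ON_ONCE(init_srcu_struct(&perf_srcud));
          }

          static void perf_srcud_cleanup(void)
          {
                  cleanup_srcu_struct(&perf_srcud);
          }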
    • rcu: Make sync_rcu_preempt_exp_done() return bool · dcfc315b
      Committed by Paul E. McKenney
      The sync_rcu_preempt_exp_done() function returns a logical expression,
      but its return type is nevertheless int.  This commit therefore changes
      the return type to bool.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      dcfc315b
    • rcuperf: Add ability to performance-test call_rcu() and friends · 881ed593
      Committed by Paul E. McKenney
      This commit upgrades rcuperf so that it can do performance testing on
      asynchronous grace-period primitives such as call_srcu().  There is
      a new rcuperf.gp_async module parameter that specifies this new behavior,
      with the pre-existing rcuperf.gp_exp testing expedited grace periods such as
      synchronize_rcu_expedited(), and with the default being to test synchronous
      non-expedited grace periods such as synchronize_rcu().
      
      There is also a new rcuperf.gp_async_max module parameter that specifies
      the maximum number of outstanding callbacks per writer kthread, defaulting
      to 1,000.  When this limit is exceeded, the writer thread invokes the
      appropriate flavor of rcu_barrier() to wait for callbacks to drain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Removed the redundant initialization noted by Arnd Bergmann. ]
      881ed593
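      The asynchronous mode can be pictured as the writer posting callbacks
      and periodically draining them, something like the fragment below
      (heavily simplified; cur_ops->async, cur_ops->gp_barrier,
      rcu_perf_async_cb, n_outstanding, and cb (a per-callback wrapper around
      an rcu_head) are illustrative names, not rcuperf's exact ones):

          /* In the writer kthread, when rcuperf.gp_async is set: */
          cur_ops->async(&cb->rh, rcu_perf_async_cb);   /* call_rcu(), call_srcu(), ... */
          if (++n_outstanding >= gp_async_max) {
                  /* Too many callbacks in flight: wait for them to drain. */
                  cur_ops->gp_barrier();                /* rcu_barrier(), srcu_barrier(), ... */
                  n_outstanding = 0;
          }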
    • rcu: Remove obsolete reference to synchronize_kernel() · e28371c8
      Committed by Paul E. McKenney
      The synchronize_kernel() primitive was removed in favor of
      synchronize_sched() more than a decade ago, and it seems likely that
      rather few kernel hackers are familiar with it.  Its continued presence
      is therefore providing more confusion than enlightenment.  This commit
      therefore removes the reference from the synchronize_sched() header
      comment, and adds the corresponding information to the synchronize_rcu()
      header comment.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      e28371c8
    • rcuperf: Defer expedited/normal check to end of test · 9683937d
      Committed by Paul E. McKenney
      At startup, rcuperf currently checks whether the user asked to measure
      only expedited grace periods, yet constrained all grace periods to be
      normal, or if the user asked to measure only normal grace periods, yet
      constrained all grace periods to be expedited.  Useless tests of this
      sort are aborted.
      
      Unfortunately, making RCU work through the mid-boot dead zone [1] puts
      RCU into expedited-only mode during that zone, which also happens to be
      the exact time that rcuperf carries out the aforementioned check.
      So if the user asks rcuperf to measure only normal grace periods (the
      default), rcuperf will now always complain and terminate the test.
      
      This commit therefore moves the checks to rcu_perf_cleanup().  This has
      the disadvantage of failing to abort useless tests, but avoids the need to
      create yet another kthread and the need to do fiddly checks involving the
      holdoff time.  (Yes, another approach is to do the checks in a late-stage
      init function, but that would require some way to communicate badness
      to rcuperf's kthreads, and seems not worth the bother.)
      
      [1] https://lwn.net/Articles/716148/
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9683937d
    • rcu: Complain if blocking in preemptible RCU read-side critical section · 5b72f964
      Committed by Paul E. McKenney
      Although preemptible RCU allows its read-side critical sections to be
      preempted, general blocking is forbidden.  The reason for this is that
      excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
      voluntarily blocked task doesn't care how high you boost its priority.
      Because preemptible RCU is a global mechanism, one ill-behaved reader
      hurts everyone.  Hence the prohibition against general blocking in
      RCU-preempt read-side critical sections.  Preemption yes, blocking no.
      
      This commit enforces this prohibition.
      
      There is a special exception for the -rt patchset (which they kindly
      volunteered to implement):  It is OK to block (as opposed to merely being
      preempted) within an RCU-preempt read-side critical section, but only if
      the blocking is subject to priority inheritance.  This exception permits
      CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.
      
      Why doesn't this exception also apply to mainline's rt_mutex?  Because
      of the possibility that someone does general blocking while holding
      an rt_mutex.  Yes, the priority boosting will affect the rt_mutex,
      but it won't help with the task doing general blocking while holding
      that rt_mutex.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      5b72f964
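      The enforcement reduces to a check when a task inside an RCU-preempt
      read-side critical section passes through the scheduler, in the spirit
      of the sketch below (the real check lives in RCU's context-switch path
      and also accommodates the -rt priority-inheritance exception; the
      function name here is made up):

          void rcu_note_context_switch_sketch(bool preempt)
          {
                  struct task_struct *t = current;

                  /* Voluntary blocking inside an RCU-preempt reader is a bug. */
                  WARN_ON_ONCE(!preempt && t->rcu_read_lock_nesting > 0);
                  /* ... existing context-switch processing ... */
          }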
    • srcu: Eliminate possibility of destructive counter overflow · 881ec9d2
      Committed by Paul E. McKenney
      Earlier versions of Tree SRCU were subject to a counter overflow bug that
      could theoretically result in too-short grace periods.  This commit
      eliminates this problem by adding an update-side memory barrier.
      The short explanation is that if the updater sums the unlock counts
      too late to see a given __srcu_read_unlock() increment, that CPU's
      next __srcu_read_lock() must see the new value of ->srcu_idx, thus
      incrementing the other bank of counters.  This eliminates the possibility
      of destructive counter overflow as long as the srcu_read_lock() nesting
      level does not exceed floor(ULONG_MAX/NR_CPUS/2), which should be an
      eminently reasonable nesting limit, especially on 64-bit systems.
      Reported-by: Lance Roy <ldr709@gmail.com>
      Suggested-by: Lance Roy <ldr709@gmail.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      881ec9d2
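      The fix reduces to a full memory barrier on the update side around the
      ->srcu_idx flip, roughly as sketched below (the comment restates the
      argument above; exact barrier placement within srcu_flip() is an
      approximation, not a verbatim copy):

          static void srcu_flip(struct srcu_struct *sp)
          {
                  /*
                   * If this updater saw a given reader's __srcu_read_unlock()
                   * increment, that reader's next __srcu_read_lock() must see
                   * the new ->srcu_idx and so increment the other counter bank.
                   */
                  smp_mb();
                  WRITE_ONCE(sp->srcu_idx, sp->srcu_idx + 1);
                  smp_mb();  /* Order the flip before later counter scans. */
          }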
    • rcu: Prevent rcu_barrier() from starting needless grace periods · f92c734f
      Committed by Paul E. McKenney
      Currently rcu_barrier() uses call_rcu() to enqueue new callbacks
      on each CPU with a non-empty callback list.  This works, but means
      that rcu_barrier() forces grace periods that are not otherwise needed.
      The key point is that rcu_barrier() never needs to wait for a grace
      period, but instead only for all pre-existing callbacks to be invoked.
      This means that rcu_barrier()'s new callbacks should be placed in
      the callback-list segment containing the last pre-existing callback.
      
      This commit makes this change using the new rcu_segcblist_entrain()
      function.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      f92c734f
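      On each CPU with pending callbacks, entraining rather than enqueueing a
      fresh callback looks roughly like this (a sketch; locking, tracing, and
      the no-CBs path are omitted):

          rdp->barrier_head.func = rcu_barrier_callback;
          debug_rcu_head_queue(&rdp->barrier_head);
          if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head, 0)) {
                  /* Landed after the last pre-existing callback: count it. */
                  atomic_inc(&rsp->barrier_cpu_count);
          } else {
                  /* Callback list was empty: nothing to wait for on this CPU. */
                  debug_rcu_head_unqueue(&rdp->barrier_head);
          }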
    • srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604
      Committed by Paolo Bonzini
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      that only has users in irq context, and you can mix process and irq
      context as long as process-context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: Linu Cherian <linuc.decode@gmail.com>
      Suggested-by: Linu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      1123a604
    • srcu: Allow use of Tiny/Tree SRCU from both process and interrupt context · cdf7abc4
      Committed by Paolo Bonzini
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      that only has users in irq context, and you can mix process and irq
      context as long as process-context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Unlike Classic and Tree SRCU, Tiny SRCU does increments and decrements on
      a single variable.  Therefore, as Peter Zijlstra pointed out, Tiny SRCU's
      implementation already supports mixed-context use of srcu_read_lock()
      and srcu_read_unlock(), at least as long as uses of srcu_read_lock()
      and srcu_read_unlock() in each handler are nested and paired properly.
      In other words, it is still illegal to (say) invoke srcu_read_lock()
      in an interrupt handler and to invoke the matching srcu_read_unlock()
      in a softirq handler.  Therefore, the only change required for Tiny SRCU
      is to its comments.
      
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: Linu Cherian <linuc.decode@gmail.com>
      Suggested-by: Linu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Paolo Bonzini <pbonzini@redhat.com>
      cdf7abc4
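      For Tree SRCU, the change boils down to the reader using an irq-safe
      this_cpu_inc() so that no preempt_disable() is needed in the caller,
      roughly as sketched below (field names follow the Tree SRCU of that era
      and should be treated as an approximation):

          int __srcu_read_lock(struct srcu_struct *sp)
          {
                  int idx;

                  idx = READ_ONCE(sp->srcu_idx) & 0x1;
                  this_cpu_inc(sp->sda->srcu_lock_count[idx]);  /* was __this_cpu_inc() */
                  smp_mb(); /* B */  /* Keep the critical section after the increment. */
                  return idx;
          }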