提交 · 4057adafb395204af4ff93f3669ecb49eb45b3cf · OpenHarmony / kernel_linux

16 5月, 2018 9 次提交

rcu: Convert ->need_future_gp[] array to boolean · 6f576e28

由 Paul E. McKenney 提交于 4月 18, 2018

There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

6f576e28

rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements · 0ae94e00

由 Paul E. McKenney 提交于 4月 18, 2018

Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gps[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period.  This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

0ae94e00

rcu: Avoid losing ->need_future_gp[] values due to GP start/end races · 51af970d

由 Paul E. McKenney 提交于 4月 14, 2018

The rcu_cbs_completed() function provides the value of ->completed
at which new callbacks can safely be invoked.  This is recorded in
two-element ->need_future_gp[] arrays in the rcu_node structure, and
the elements of these arrays corresponding to the just-completed grace
period are zeroed at the end of that grace period.  However, the
rcu_cbs_completed() function can return the current ->completed value
plus either one or two, so it is possible for the corresponding
->need_future_gp[] entry to be cleared just after it was set, thus
losing a request for a future grace period.

This commit avoids this race by expanding ->need_future_gp[] to four
elements.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

51af970d

rcu: Make rcu_gp_cleanup() more accurately predict need for new GP · fb31340f

由 Paul E. McKenney 提交于 4月 12, 2018

Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period. It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it. This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.

This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.

It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.
Reported-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

fb31340f

rcu: Add accessor macros for the ->need_future_gp[] array · c91a8675

由 Paul E. McKenney 提交于 4月 18, 2018

Accessors for the ->need_future_gp[] array are currently open-coded,
which makes them difficult to change. To improve maintainability, this
commit adds need_future_gp_mask() to compute the indexing mask from the
array size, need_future_gp_element() to access the element corresponding
to the specified grace-period number, and need_any_future_gp() to
determine if any future grace period is needed. This commit also applies
need_future_gp_element() to existing open-coded single-element accesses.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

c91a8675

rcu: Declare rcu_eqs_special_set() in public header · 17672480

由 Yury Norov 提交于 3月 25, 2018

Because rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
inaccessible outside of the RCU implementation.  This patch therefore
moves the  rcu_eqs_special_set() declaration to include/linux/rcutree.h,
which allows it to be used in non-rcu kernel code.
Signed-off-by: NYury Norov <ynorov@caviumnetworks.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

17672480

rcu: Remove deprecated RCU debugfs tracing code · 6fba2b37

由 Byungchul Park 提交于 3月 02, 2018

Commit ae91aa0a ("rcu: Remove debugfs tracing") removed the
RCU debugfs tracing code, but did not remove the no-longer used
->exp_workdone{0,1,2,3} fields in the srcu_data structure.  This commit
therefore removes these fields along with the code that uselessly
updates them.
Signed-off-by: NByungchul Park <byungchul.park@lge.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

6fba2b37

rcu: Inline rcu_preempt_do_callback() into its sole caller · be01b4ca

由 Byungchul Park 提交于 2月 26, 2018

The rcu_preempt_do_callbacks() function was introduced in commit
09223371(rcu: Use softirq to address performance regression), where it
was necessary to handle kernel builds both containing and not containing
RCU-preempt.  Since then, various changes (most notably f8b7fc6b
("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have
resulted in this function being invoked only from rcu_kthread_do_work(),
which is present only in kernels containing RCU-preempt, which in turn
means that the rcu_preempt_do_callbacks() function is no longer needed.

This commit therefore inlines rcu_preempt_do_callbacks() into its
sole remaining caller and also removes the rcu_state_p and rcu_data_p
indirection for added clarity.
Signed-off-by: NByungchul Park <byungchul.park@lge.com>
Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
[ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

be01b4ca

rcu: Parallelize expedited grace-period initialization · 25f3d7ef

由 Paul E. McKenney 提交于 2月 01, 2018

The latency of RCU expedited grace periods grows with increasing numbers
of CPUs, eventually failing to be all that expedited.  Much of the growth
in latency is in the initialization phase, so this commit uses workqueues
to carry out this initialization concurrently on a rcu_node-by-rcu_node
basis.

This change makes use of a new rcu_par_gp_wq because flushing a work
item from another work item running from the same workqueue can result
in deadlock.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NNicholas Piggin <npiggin@gmail.com>

25f3d7ef

21 2月, 2018 5 次提交

rcu: Remove redundant nxttail index macro define · 65518db8

由 Liu, Changcheng 提交于 1月 16, 2018

RCU's nxttail has been optimized to be a rcu_segcblist, which is
a multi-tailed linked list with macros defined for the indexes for
each tail. The indexes have been defined in linux/rcu_segcblist.h,
so this commit removes the redundant definitions in kernel/rcu/tree.h.
Signed-off-by: NLiu Changcheng <changcheng.liu@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

65518db8

rcu: Remove obsolete force-quiescent-state statistics for debugfs · d62df573

由 Paul E. McKenney 提交于 1月 10, 2018

The debugfs interface displayed statistics on RCU-pending checks but
this interface has since been removed. This commit therefore removes the
no-longer-used rcu_state structure's ->n_force_qs_lh and ->n_force_qs_ngp
fields along with their updates. (Though the ->n_force_qs_ngp field
was actually not used at all, embarrassingly enough.)

If this information proves necessary in the future, the corresponding
event traces will be added.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

d62df573

rcu: Remove obsolete __rcu_pending() statistics for debugfs · 01c495f7

由 Paul E. McKenney 提交于 1月 10, 2018

The debugfs interface displayed statistics on RCU-pending checks
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_rcu_pending,
->n_rp_core_needs_qs, ->n_rp_report_qs, ->n_rp_cb_ready,
->n_rp_cpu_needs_gp, ->n_rp_gp_completed, ->n_rp_gp_started,
->n_rp_nocb_defer_wakeup, and ->n_rp_need_nothing fields along with
their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

01c495f7

rcu: Remove obsolete callback-invocation statistics for debugfs · 62df63e0

由 Paul E. McKenney 提交于 1月 10, 2018

The debugfs interface displayed statistics on RCU callback invocation but
this interface has since been removed.  This commit therefore removes the
no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked
fields along with their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

62df63e0

rcu: Remove obsolete boost statistics for debugfs · bec06785

由 Paul E. McKenney 提交于 1月 10, 2018

The debugfs interface displayed statistics on RCU priority boosting,
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_tasks_boosted,
->n_exp_boosts, and ->n_exp_boosts and their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bec06785

29 11月, 2017 1 次提交

rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long · 84585aa8

由 Paul E. McKenney 提交于 10月 04, 2017

Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems. This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

84585aa8

28 11月, 2017 1 次提交

rcu: Make ->dynticks_nesting be a simple counter · 51a1fd30

由 Paul E. McKenney 提交于 10月 03, 2017

Now that ->dynticks_nesting counts only process-level dyntick-idle
entry and exit, there is no need for the elaborate segmented counter
with its guard fields and overflow checking. This commit therefore
makes ->dynticks_nesting be a simple counter.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

51a1fd30

10 10月, 2017 1 次提交

rcu: Make RCU CPU stall warnings check for irq-disabled CPUs · 9b9500da

由 Paul E. McKenney 提交于 8月 17, 2017

One common question upon seeing an RCU CPU stall warning is "did
the stalled CPUs have interrupts disabled?" However, the current
stall warnings are silent on this point. This commit therefore
uses irq_work to check whether stalled CPUs still respond to IPIs,
and flags this state in the RCU CPU stall warning console messages.
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9b9500da

26 7月, 2017 5 次提交

rcu: Localize rcu_state ->orphan_pend and ->orphan_done · f2dbe4a5

由 Paul E. McKenney 提交于 6月 27, 2017

Given that the rcu_state structure's >orphan_pend and ->orphan_done
fields are used only during migration of callbacks from the recently
offlined CPU to a surviving CPU, if rcu_send_cbs_to_orphanage() and
rcu_adopt_orphan_cbs() are combined, these fields can become local
variables in the combined function.  This commit therefore combines
rcu_send_cbs_to_orphanage() and rcu_adopt_orphan_cbs() into a new
rcu_segcblist_merge() function and removes the ->orphan_pend and
->orphan_done fields.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f2dbe4a5

rcu: Eliminate rcu_state ->orphan_lock · 537b85c8

由 Paul E. McKenney 提交于 6月 26, 2017

The ->orphan_lock is acquired and released only within the
rcu_migrate_callbacks() function, which now acquires the root rcu_node
structure's ->lock. This commit therefore eliminates the ->orphan_lock
in favor of the root rcu_node structure's ->lock.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

537b85c8

rcu: Make NOCB CPUs migrate CBs directly from outgoing CPU · b1a2d79f

由 Paul E. McKenney 提交于 6月 26, 2017

RCU's CPU-hotplug callback-migration code first moves the outgoing
CPU's callbacks to ->orphan_done and ->orphan_pend, and only then
moves them to the NOCB callback list. This commit avoids the
extra step (and simplifies the code) by moving the callbacks directly
from the outgoing CPU's callback list to the NOCB callback list.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b1a2d79f

rcu: Remove orphan/adopt event-tracing fields · c47e067a

由 Paul E. McKenney 提交于 6月 23, 2017

The rcu_node structure's ->n_cbs_orphaned and ->n_cbs_adopted fields
are updated, but never read. This commit therefore removes them.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

c47e067a

rcu: Use timer as backstop for NOCB deferred wakeups · 8be6e1b1

由 Paul E. McKenney 提交于 4月 29, 2017

The handling of RCU's no-CBs CPUs has a maintenance headache, namely
that if call_rcu() is invoked with interrupts disabled, the rcuo kthread
wakeup must be defered to a point where we can be sure that scheduler
locks are not held.  Of course, there are a lot of code paths leading
from an interrupts-disabled invocation of call_rcu(), and missing any
one of these can result in excessive callback-invocation latency, and
potentially even system hangs.

This commit therefore uses a timer to guarantee that the wakeup will
eventually occur.  If one of the deferred-wakeup points kicks in, then
the timer is simply cancelled.

This commit also fixes up an incomplete removal of commits that were
intended to plug remaining exit paths, which should have the added
benefit of reducing the overhead of RCU's context-switch hooks.  In
addition, it simplifies leader-to-follower callback-list handoff by
introducing locking.  The call_rcu()-to-leader handoff continues to
use atomic operations in order to maintain good real-time latency for
common-case use of call_rcu().
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Dan Carpenter fix for mod_timer() usage bug found by smatch. ]

8be6e1b1

09 6月, 2017 4 次提交

rcu: Remove debugfs tracing · ae91aa0a

由 Paul E. McKenney 提交于 5月 15, 2017

RCU's debugfs tracing used to be the only reasonable low-level debug
information available, but ftrace and event tracing has since surpassed
the RCU debugfs level of usefulness.  This commit therefore removes
RCU's debugfs tracing.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

ae91aa0a

rcu: Remove nohz_full full-system-idle state machine · fe5ac724

由 Paul E. McKenney 提交于 5月 11, 2017

The NO_HZ_FULL_SYSIDLE full-system-idle capability was added in 2013
by commit 0edd1b17 ("nohz_full: Add full-system-idle state machine"),
but has not been used.  This commit therefore removes it.

If it turns out to be needed later, this commit can always be reverted.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe5ac724

rcu: Move rnp->lock wrappers for SRCU use · 83d40bd3

由 Paul E. McKenney 提交于 5月 09, 2017

This commit moves the now-generic rnp->lock wrapper macros from
kernel/rcu/tree.h to kernel/rcu/rcu.h, thus allowing SRCU to use them.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

83d40bd3

rcu: Convert rnp->lock wrappers to macros for SRCU use · bf32c765

由 Paul E. McKenney 提交于 5月 09, 2017

Use of smp_mb__after_unlock_lock() would allow SRCU to omit a full
memory barrier during callback execution, so this commit converts
raw_spin_lock_rcu_node() from inline functions to type-generic macros
to allow them to handle locks in srcu_node structures as well as
rcu_node structures.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bf32c765

08 6月, 2017 2 次提交

rcu: Use RCU_NOCB_WAKE rather than RCU_NOGP_WAKE · 511324e4

由 Paul E. McKenney 提交于 4月 28, 2017

The RCU_NOGP_WAKE_NOT, RCU_NOGP_WAKE, and RCU_NOGP_WAKE_FORCE flags
are used to mediate wakeups for the no-CBs CPU kthreads. The "NOGP"
really doesn't make any sense, so this commit does s/NOGP/NOCB/.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

511324e4

rcu: Complain if blocking in preemptible RCU read-side critical section · 5b72f964

由 Paul E. McKenney 提交于 4月 12, 2017

Although preemptible RCU allows its read-side critical sections to be
preempted, general blocking is forbidden. The reason for this is that
excessive preemption times can be handled by CONFIG_RCU_BOOST=y, but a
voluntarily blocked task doesn't care how high you boost its priority.
Because preemptible RCU is a global mechanism, one ill-behaved reader
hurts everyone. Hence the prohibition against general blocking in
RCU-preempt read-side critical sections. Preemption yes, blocking no.

This commit enforces this prohibition.

There is a special exception for the -rt patchset (which they kindly
volunteered to implement): It is OK to block (as opposed to merely being
preempted) within an RCU-preempt read-side critical section, but only if
the blocking is subject to priority inheritance. This exception permits
CONFIG_RCU_BOOST=y to get -rt RCU readers out of trouble.

Why doesn't this exception also apply to mainline's rt_mutex? Because
of the possibility that someone does general blocking while holding
an rt_mutex. Yes, the priority boosting will affect the rt_mutex,
but it won't help with the task doing general blocking while holding
that rt_mutex.
Reported-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5b72f964

02 5月, 2017 1 次提交

srcu: Debloat the <linux/rcu_segcblist.h> header · 45753c5f

由 Ingo Molnar 提交于 5月 02, 2017

Linus noticed that the <linux/rcu_segcblist.h> has huge inline functions
which should not be inline at all.

As a first step in cleaning this up, move them all to kernel/rcu/ and
only keep an absolute minimum of data type defines in the header:

  before:   -rw-r--r-- 1 mingo mingo 22284 May  2 10:25 include/linux/rcu_segcblist.h
   after:   -rw-r--r-- 1 mingo mingo  3180 May  2 10:22 include/linux/rcu_segcblist.h

More can be done, such as uninlining the large functions, which inlining
is unjustified even if it's an RCU internal matter.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

45753c5f

21 4月, 2017 1 次提交

srcu: Parallelize callback handling · da915ad5

由 Paul E. McKenney 提交于 4月 05, 2017

Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2],
however, there are workloads that could result in a high volume of
concurrent invocations of call_srcu(), which with current SRCU would
result in excessive lock contention on the srcu_struct structure's
->queue_lock, which protects SRCU's callback lists. This commit therefore
moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback
list, starting grace periods and invoking callbacks are both more complex
than in the single-list Classic SRCU implementation. Starting grace
periods and handling callbacks are now handled using an srcu_node tree
that is in some ways similar to the rcu_node trees used by RCU-bh,
RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is
controlled by exactly the same Kconfig options and boot parameters that
control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data
and contains an rcu_segcblist structure named ->srcu_cblist for its
callbacks (and a spinlock to protect this). The srcu_struct gets
an srcu_gp_seq that is used to associate callback segments with the
corresponding completion-time grace-period number. These completion-time
grace-period numbers are propagated up the srcu_node tree so that the
grace-period workqueue handler can determine whether additional grace
periods are needed on the one hand and where to look for callbacks that
are ready to be invoked.

The srcu_barrier() function must now wait on all instances of the per-CPU
->srcu_cblist. Because each ->srcu_cblist is protected by ->lock,
srcu_barrier() can remotely add the needed callbacks. In theory,
it could also remotely start grace periods, but in practice doing so
is complex and racy. And interestingly enough, it is never necessary
for srcu_barrier() to start a grace period because srcu_barrier() only
enqueues a callback when a callback is already present--and it turns out
that a grace period has to have already been started for this pre-existing
callback. Furthermore, it is only the callback that srcu_barrier()
needs to wait on, not any particular grace period. Therefore, a new
rcu_segcblist_entrain() function enqueues the srcu_barrier() function's
callback into the same segment occupied by the last pre-existing callback
in the list. The special case where all the pre-existing callbacks are
on a different list (because they are in the process of being invoked)
is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL
segment, relying on the done-callbacks check that takes place after all
callbacks are inovked.

Note that the readers use the same algorithm as before. Note that there
is a separate srcu_idx that tells the readers what counter to increment.
This unfortunately cannot be combined with srcu_gp_seq because they
need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture. These will go
away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf
asynchronous-grace-period capability:

Callback Queuing Overhead
-------------------------
# CPUS Classic SRCU Tree SRCU
------ ------------ ---------
2 0.349 us 0.342 us
16 31.66 us 0.4 us
41 --------- 0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject
the overheads of the occasional srcu_barrier() call needed to avoid OOMing
the test machine. The rcuperf test hangs when running Classic SRCU at 41
CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code
and that statistics, this is a convincing demonstration of Tree SRCU's
performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]

da915ad5

19 4月, 2017 10 次提交

srcu: Move rcu_node traversal macros to rcu.h · efbe451d

由 Paul E. McKenney 提交于 3月 15, 2017

This commit moves rcu_for_each_node_breadth_first(),
rcu_for_each_nonleaf_node_breadth_first(), and
rcu_for_each_leaf_node() from kernel/rcu/tree.h to
kernel/rcu/rcu.h so that SRCU can access them.
This commit is code-movement only.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

efbe451d

srcu: Move combining-tree definitions for SRCU's benefit · f2425b4e

由 Paul E. McKenney 提交于 3月 14, 2017

This commit moves the C preprocessor code that defines the default shape
of the rcu_node combining tree to a new include/linux/rcu_node_tree.h
file as a first step towards enabling SRCU to create its own combining
tree, which in turn enables SRCU to implement per-CPU callback handling,
thus avoiding contention on the lock currently guarding the single list
of callbacks. Note that users of SRCU still need to know the size of
the srcu_struct structure, hence include/linux rather than kernel/rcu.

This commit is code-movement only.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f2425b4e

srcu: Use rcu_segcblist to track SRCU callbacks · 8660b7d8

由 Paul E. McKenney 提交于 3月 13, 2017

This commit switches SRCU from custom-built callback queues to the new
rcu_segcblist structure.  This change associates grace-period sequence
numbers with groups of callbacks, which will be needed for efficient
processing of per-CPU callbacks.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

8660b7d8

srcu: Abstract multi-tail callback list handling · 15fecf89

由 Paul E. McKenney 提交于 2月 08, 2017

RCU has only one multi-tail callback list, which is implemented via
the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the
rcu_data structure, and whose operations are open-code throughout the
Tree RCU implementation. This has been more or less OK in the past,
but upcoming callback-list optimizations in SRCU could really use
a multi-tail callback list there as well.

This commit therefore abstracts the multi-tail callback list handling
into a new kernel/rcu/rcu_segcblist.h file, and uses this new API.
The simple head-and-tail pointer callback list is also abstracted and
applied everywhere except for the NOCB callback-offload lists. (Yes,
the plan is to apply them there as well, but this commit is already
bigger than would be good.)
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

15fecf89

rcu: Default RCU_FANOUT_LEAF to 16 unless explicitly changed · b8c78d3a

由 Paul E. McKenney 提交于 2月 03, 2017

If the RCU_EXPERT Kconfig option is not set (the default), then the
RCU_FANOUT_LEAF Kconfig option will not be defined, which will cause
the leaf-level rcu_node tree fanout to default to 32 on 32-bit systems
and 64 on 64-bit systems. This can result in excessive lock contention.
This commit therefore changes the computation of the leaf-level rcu_node
tree fanout so that the result will be 16 unless an explicit Kconfig or
kernel-boot setting says otherwise.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b8c78d3a

rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions · 9226b10d

由 Paul E. McKenney 提交于 1月 27, 2017

The rcu_all_qs() and rcu_note_context_switch() do a series of checks,
taking various actions to supply RCU with quiescent states, depending
on the outcomes of the various checks. This is a bit much for scheduling
fastpaths, so this commit creates a separate ->rcu_urgent_qs field in
the rcu_dynticks structure that acts as a global guard for these checks.
Thus, in the common case, rcu_all_qs() and rcu_note_context_switch()
check the ->rcu_urgent_qs field, find it false, and simply return.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>

9226b10d

rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle() · 0f9be8ca

由 Paul E. McKenney 提交于 1月 27, 2017

The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking
that one of them still needs a quiescent state before doing an expensive
atomic operation on the ->dynticks counter. However, this check reduces
overhead only after a rare race condition, and increases complexity. This
commit therefore removes the scan and the mechanism enabling the scan.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0f9be8ca

rcu: Pull rcu_qs_ctr into rcu_dynticks structure · 9577df9a

由 Paul E. McKenney 提交于 1月 26, 2017

The rcu_qs_ctr variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9577df9a

rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure · abb06b99

由 Paul E. McKenney 提交于 1月 26, 2017

The rcu_sched_qs_mask variable is yet another isolated per-CPU variable,
so this commit pulls it into the pre-existing rcu_dynticks per-CPU
structure.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

abb06b99

rcu: Maintain special bits at bottom of ->dynticks counter · b8c17e66

由 Paul E. McKenney 提交于 11月 08, 2016

Currently, IPIs are used to force other CPUs to invalidate their TLBs
in response to a kernel virtual-memory mapping change. This works, but
degrades both battery lifetime (for idle CPUs) and real-time response
(for nohz_full CPUs), and in addition results in unnecessary IPIs due to
the fact that CPUs executing in usermode are unaffected by stale kernel
mappings. It would be better to cause a CPU executing in usermode to
wait until it is entering kernel mode to do the flush, first to avoid
interrupting usemode tasks and second to handle multiple flush requests
with a single flush in the case of a long-running user task.

This commit therefore reserves a bit at the bottom of the ->dynticks
counter, which is checked upon exit from extended quiescent states.
If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is
invoked, which, if not supplied, is an empty single-pass do-while loop.
If this bottom bit is set on -entry- to an extended quiescent state,
then a WARN_ON_ONCE() triggers.

This bottom bit may be set using a new rcu_eqs_special_set() function,
which returns true if the bit was set, or false if the CPU turned
out to not be in an extended quiescent state. Please note that this
function refuses to set the bit for a non-nohz_full CPU when that CPU
is executing in usermode because usermode execution is tracked by RCU
as a dyntick-idle extended quiescent state only for nohz_full CPUs.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

b8c17e66

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多