- 03 May, 2017 1 commit
-
-
Committed by Paul E. McKenney
Because rcu_cblist_n_cbs() just samples the ->len counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 May, 2017 1 commit
-
-
Committed by Paul E. McKenney
Because rcu_cblist_empty() just samples the ->head pointer, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>
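To make the open-coding concrete, here is a minimal user-space sketch of a head-and-tail callback list with a length counter. Every `sketch_` name is hypothetical, and the struct is only assumed to resemble the kernel's rcu_cblist in having ->head and ->len fields. It shows the one-line wrappers next to a caller that reads the fields directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct rcu_head. */
struct sketch_head {
	struct sketch_head *next;
};

/* Simple head-and-tail callback list with a length counter. */
struct sketch_cblist {
	struct sketch_head *head;
	struct sketch_head **tail;
	long len;
};

static void sketch_cblist_init(struct sketch_cblist *p)
{
	p->head = NULL;
	p->tail = &p->head;
	p->len = 0;
}

static void sketch_cblist_enqueue(struct sketch_cblist *p,
				  struct sketch_head *h)
{
	assert(p->tail != NULL);	/* list must have been initialized */
	h->next = NULL;
	*p->tail = h;
	p->tail = &h->next;
	p->len++;
}

/* Before: one-line wrappers that add a level of indirection. */
static bool sketch_cblist_empty(struct sketch_cblist *p)
{
	return !p->head;
}

static long sketch_cblist_n_cbs(struct sketch_cblist *p)
{
	return p->len;
}

/* After: the caller tests the fields directly. */
static bool sketch_has_work(struct sketch_cblist *p)
{
	return p->head && p->len > 0;	/* was: !empty(p) && n_cbs(p) > 0 */
}
```

Because each wrapper is a single field access, removing it changes no behavior; the reader simply sees `p->head` or `p->len` at the call site.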
-
- 27 April, 2017 1 commit
-
-
Committed by Paul E. McKenney
In the past, SRCU was simple enough that there was little point in making the rcutorture writer stall messages print the SRCU grace-period number state. With the advent of Tree SRCU, this has changed. This commit therefore makes Classic, Tiny, and Tree SRCU report this state to rcutorture as needed. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <efault@gmx.de>
-
- 21 April, 2017 2 commits
-
-
Committed by Paul E. McKenney
Currently, a call to schedule() acts as a Tasks RCU quiescent state only if a context switch actually takes place. However, just the call to schedule() guarantees that the calling task has moved off of whatever tracing trampoline that it might have been on previously. This commit therefore plumbs schedule()'s "preempt" parameter into rcu_note_context_switch(), which then records the Tasks RCU quiescent state, but only if this call to schedule() was -not- due to a preemption. To avoid adding overhead to the common-case context-switch path, this commit hides the rcu_note_context_switch() check under an existing non-common-case check. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
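The decision described above can be sketched in a few lines of user-space C. The `sketch_` names and the single holdout flag are assumptions made for illustration. The sketch also leaves out the non-common-case guard that the commit uses to keep the fast path cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state; the kernel keeps this in task_struct. */
struct sketch_task {
	bool tasks_rcu_holdout;	/* Tasks RCU is still waiting on this task */
};

/* Record a Tasks RCU quiescent state for the given task. */
static void sketch_tasks_qs(struct sketch_task *t)
{
	assert(t != NULL);
	t->tasks_rcu_holdout = false;
}

/*
 * Model of the hook called from schedule().  A voluntary call
 * (preempt == false) shows that the task has left any tracing
 * trampoline, so it counts as a quiescent state even if no context
 * switch follows.  A preemption shows nothing of the kind, because
 * the task may have been interrupted while still on a trampoline.
 */
static void sketch_note_context_switch(struct sketch_task *t, bool preempt)
{
	if (!preempt)
		sketch_tasks_qs(t);
}
```

Passing the flag down, rather than inferring it from whether a switch occurred, is what lets a voluntary schedule() call that picks the same task again still count.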
-
Committed by Paul E. McKenney
Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1,2]. However, there are workloads that could result in a high volume of concurrent invocations of call_srcu(), which with current SRCU would result in excessive lock contention on the srcu_struct structure's ->queue_lock, which protects SRCU's callback lists. This commit therefore moves SRCU to per-CPU callback lists, thus greatly reducing contention.

Because a given SRCU instance no longer has a single centralized callback list, starting grace periods and invoking callbacks are both more complex than in the single-list Classic SRCU implementation. Starting grace periods and handling callbacks are now handled using an srcu_node tree that is in some ways similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched (for example, the srcu_node tree shape is controlled by exactly the same Kconfig options and boot parameters that control the shape of the rcu_node tree).

In addition, the old per-CPU srcu_array structure is now named srcu_data and contains an rcu_segcblist structure named ->srcu_cblist for its callbacks (and a spinlock to protect this). The srcu_struct gets an srcu_gp_seq that is used to associate callback segments with the corresponding completion-time grace-period number. These completion-time grace-period numbers are propagated up the srcu_node tree so that the grace-period workqueue handler can determine whether additional grace periods are needed on the one hand and where to look for callbacks that are ready to be invoked on the other.

The srcu_barrier() function must now wait on all instances of the per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected by ->lock, srcu_barrier() can remotely add the needed callbacks. In theory, it could also remotely start grace periods, but in practice doing so is complex and racy. And interestingly enough, it is never necessary for srcu_barrier() to start a grace period because srcu_barrier() only enqueues a callback when a callback is already present--and it turns out that a grace period has to have already been started for this pre-existing callback. Furthermore, it is only the callback that srcu_barrier() needs to wait on, not any particular grace period. Therefore, a new rcu_segcblist_entrain() function enqueues the srcu_barrier() function's callback into the same segment occupied by the last pre-existing callback in the list. The special case where all the pre-existing callbacks are on a different list (because they are in the process of being invoked) is handled by enqueuing srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on the done-callbacks check that takes place after all callbacks are invoked.

Note that the readers use the same algorithm as before. Note that there is a separate srcu_idx that tells the readers what counter to increment. This unfortunately cannot be combined with srcu_gp_seq because they need to be incremented at different times.

This commit introduces some ugly #ifdefs in rcutorture. These will go away when I feel good enough about Tree SRCU to ditch Classic SRCU.

Some crude performance comparisons, courtesy of a quickly hacked rcuperf asynchronous-grace-period capability:

    Callback Queuing Overhead
    -------------------------
    # CPUS    Classic SRCU    Tree SRCU
    ------    ------------    ---------
         2      0.349 us      0.342 us
        16      31.66 us      0.4 us
        41      ---------     0.417 us

The times are the 90th percentiles, a statistic that was chosen to reject the overheads of the occasional srcu_barrier() call needed to avoid OOMing the test machine. The rcuperf test hangs when running Classic SRCU at 41 CPUs, hence the line of dashes. Despite the hacks to both the rcuperf code and the statistics, this is a convincing demonstration of Tree SRCU's performance and scalability advantages.

[1] https://lwn.net/Articles/309030/
[2] https://patchwork.kernel.org/patch/5108281/

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
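The entrain operation described in this commit message can be illustrated with a simplified, single-threaded model of a segmented callback list. This is a sketch under stated assumptions rather than the kernel's code. The `sketch_` names are hypothetical, the list is assumed to have four segments delimited by tail pointers, and locking and memory barriers are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_cb {
	struct sketch_cb *next;
};

/* Segment indexes, assumed to mirror the kernel's RCU_*_TAIL names. */
enum {
	SKETCH_DONE_TAIL,	/* ready to invoke */
	SKETCH_WAIT_TAIL,	/* waiting for the current grace period */
	SKETCH_NEXT_READY_TAIL,	/* waiting for the next grace period */
	SKETCH_NEXT_TAIL,	/* not yet assigned a grace period */
	SKETCH_NSEGS,
};

struct sketch_segcblist {
	struct sketch_cb *head;
	struct sketch_cb **tails[SKETCH_NSEGS];
	long len;	/* also counts callbacks extracted for invocation */
};

static void sketch_seg_init(struct sketch_segcblist *l)
{
	int i;

	l->head = NULL;
	for (i = 0; i < SKETCH_NSEGS; i++)
		l->tails[i] = &l->head;
	l->len = 0;
}

/* Ordinary enqueue: append to the last segment. */
static void sketch_seg_enqueue(struct sketch_segcblist *l, struct sketch_cb *c)
{
	c->next = NULL;
	*l->tails[SKETCH_NEXT_TAIL] = c;
	l->tails[SKETCH_NEXT_TAIL] = &c->next;
	l->len++;
}

/*
 * Entrain: place c in the segment holding the last pre-existing
 * callback, so that it is invoked no earlier than that callback.
 * Returns false, queuing nothing, if there is nothing to wait for.
 */
static bool sketch_seg_entrain(struct sketch_segcblist *l, struct sketch_cb *c)
{
	int i;

	assert(c != NULL);
	if (l->len == 0)
		return false;
	l->len++;
	c->next = NULL;
	/* Find the last non-empty segment; fall back to the done segment. */
	for (i = SKETCH_NEXT_TAIL; i > SKETCH_DONE_TAIL; i--)
		if (l->tails[i] != l->tails[i - 1])
			break;
	*l->tails[i] = c;
	/* The new callback ends that segment and every later (empty) one. */
	for (; i <= SKETCH_NEXT_TAIL; i++)
		l->tails[i] = &c->next;
	return true;
}
```

When every segment is empty but `len` is nonzero (callbacks have been pulled off for invocation), the search falls through to the done segment. That corresponds to the special case the commit message describes.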
-
- 20 April, 2017 4 commits
-
-
Committed by Paul E. McKenney
This commit just changes a "the the" to "the" to reduce repetition. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Nicholas Mc Guire
The beenonline variable is declared bool so there is no need for an explicit comparison, especially not against the constant zero. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this commit drags this comment into the year 2017. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 19 April, 2017 12 commits
-
-
Committed by Paul E. McKenney
This commit makes the num_rcu_lvl[] array external so that SRCU can make use of it for initializing its upcoming srcu_node tree. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The levelcnt[] array is identical to num_rcu_lvl[], so this commit removes levelcnt[]. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
This commit moves the rcu_init_levelspread() function from kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it. This is another step towards enabling SRCU to create its own combining tree. This commit is code-movement only, give or take knock-on adjustments. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(), and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h. This will allow SRCU to use these functions, which in turn will allow SRCU to move from a single global callback queue to a per-CPU callback queue. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
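For readers unfamiliar with the four sequence-counter helpers moved by this commit, the following single-threaded sketch shows the idea as I understand it. It assumes that an odd counter value means a grace period is in progress and an even value means idle. The `sketch_` names are hypothetical, and the memory barriers and READ_ONCE()/WRITE_ONCE() accesses that real concurrent code needs are omitted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Begin a grace period: the counter becomes odd. */
static void sketch_seq_start(unsigned long *sp)
{
	*sp += 1;
	assert(*sp & 0x1);
}

/* End a grace period: the counter becomes even again. */
static void sketch_seq_end(unsigned long *sp)
{
	*sp += 1;
	assert(!(*sp & 0x1));
}

/*
 * Return the counter value at which a full grace period that starts
 * after this call is guaranteed to have ended.  A grace period that
 * is already in progress does not count, hence the "+ 3" rounded
 * down to an even value.
 */
static unsigned long sketch_seq_snap(const unsigned long *sp)
{
	return (*sp + 3) & ~0x1UL;
}

/* Has the counter reached the snapshot?  Tolerates counter wrap. */
static bool sketch_seq_done(const unsigned long *sp, unsigned long s)
{
	return *sp - s <= ULONG_MAX / 2;
}
```

A caller takes a snapshot when it queues work, then polls `sketch_seq_done()` against that snapshot. Starting from an idle counter of 0 the snapshot is 2; starting mid-grace-period at 1 the snapshot is 4.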
Committed by Paul E. McKenney
This is primarily a code-movement commit in preparation for allowing SRCU to handle early-boot SRCU grace periods. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
RCU has only one multi-tail callback list, which is implemented via the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in the rcu_data structure, and whose operations are open-coded throughout the Tree RCU implementation. This has been more or less OK in the past, but upcoming callback-list optimizations in SRCU could really use a multi-tail callback list there as well. This commit therefore abstracts the multi-tail callback list handling into a new kernel/rcu/rcu_segcblist.h file, and uses this new API. The simple head-and-tail pointer callback list is also abstracted and applied everywhere except for the NOCB callback-offload lists. (Yes, the plan is to apply them there as well, but this commit is already bigger than would be good.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The rcu_all_qs() and rcu_note_context_switch() functions do a series of checks, taking various actions to supply RCU with quiescent states, depending on the outcomes of the various checks. This is a bit much for scheduling fastpaths, so this commit creates a separate ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a global guard for these checks. Thus, in the common case, rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs field, find it false, and simply return. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org>
-
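The guard-flag pattern behind ->rcu_urgent_qs can be sketched as follows. The structure layout and the `sketch_` names are assumptions made for illustration. The real fast paths also need per-CPU access and memory ordering, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dynticks {
	bool rcu_urgent_qs;	/* set when a grace period needs help */
	int slow_path_runs;	/* counts how often the detailed checks ran */
};

/* Stands in for the series of detailed checks and actions. */
static void sketch_slow_checks(struct sketch_dynticks *d)
{
	d->slow_path_runs++;
}

/* Fast-path hook: one load and one branch in the common case. */
static void sketch_all_qs(struct sketch_dynticks *d)
{
	assert(d != NULL);
	if (!d->rcu_urgent_qs)
		return;
	d->rcu_urgent_qs = false;
	sketch_slow_checks(d);
}
```

The design choice is to pay for a single flag test on every call so that the multi-step checks run only when the grace-period machinery has asked for them.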
Committed by Paul E. McKenney
The rcu_momentary_dyntick_idle() function scans the RCU flavors, checking that one of them still needs a quiescent state before doing an expensive atomic operation on the ->dynticks counter. However, this check reduces overhead only after a rare race condition, and increases complexity. This commit therefore removes the scan and the mechanism enabling the scan. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The rcu_qs_ctr variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The rcu_sched_qs_mask variable is yet another isolated per-CPU variable, so this commit pulls it into the pre-existing rcu_dynticks per-CPU structure. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Committed by Paul E. McKenney
The current use of "RCU_TRACE(statement);" can cause odd bugs, especially where "statement" is a local-variable declaration, as it can leave a misplaced ";" in the source code. This commit therefore converts these to "RCU_TRACE(statement;)", which avoids the misplaced ";". Reported-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
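A small example may help show why the position of the semicolon matters. The macro below is a sketch that assumes RCU_TRACE() expands to its argument when tracing is configured and to nothing otherwise. `SKETCH_RCU_TRACE` is a hypothetical stand-in for the Kconfig option, and `sketch_count_cbs()` is an invented caller.

```c
#include <assert.h>

/* Assumed shape of the macro; SKETCH_RCU_TRACE is a hypothetical switch. */
#ifdef SKETCH_RCU_TRACE
#define RCU_TRACE(stmt) stmt
#else
#define RCU_TRACE(stmt)
#endif

static int sketch_count_cbs(int n)
{
	/*
	 * With the ";" inside the parentheses, this line vanishes
	 * entirely when tracing is off.  Written as
	 * "RCU_TRACE(int traced = 0);" it would instead leave a lone
	 * ";" here, which is an empty statement sitting in front of
	 * the declarations that follow.
	 */
	RCU_TRACE(int traced = 0;)
	int total = 0;
	int i;

	for (i = 0; i < n; i++) {
		total++;
		RCU_TRACE(traced++;)
	}
	RCU_TRACE(assert(traced == total);)
	return total;
}
```

The function behaves identically with or without `SKETCH_RCU_TRACE` defined; only the bookkeeping lines come and go.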
-
Committed by Paul E. McKenney
Currently, IPIs are used to force other CPUs to invalidate their TLBs in response to a kernel virtual-memory mapping change. This works, but degrades both battery lifetime (for idle CPUs) and real-time response (for nohz_full CPUs), and in addition results in unnecessary IPIs due to the fact that CPUs executing in usermode are unaffected by stale kernel mappings. It would be better to cause a CPU executing in usermode to wait until it is entering kernel mode to do the flush, first to avoid interrupting usermode tasks and second to handle multiple flush requests with a single flush in the case of a long-running user task. This commit therefore reserves a bit at the bottom of the ->dynticks counter, which is checked upon exit from extended quiescent states. If it is set, it is cleared and then a new rcu_eqs_special_exit() macro is invoked, which, if not supplied, is an empty single-pass do-while loop. If this bottom bit is set on -entry- to an extended quiescent state, then a WARN_ON_ONCE() triggers. This bottom bit may be set using a new rcu_eqs_special_set() function, which returns true if the bit was set, or false if the CPU turned out to not be in an extended quiescent state. Please note that this function refuses to set the bit for a non-nohz_full CPU when that CPU is executing in usermode because usermode execution is tracked by RCU as a dyntick-idle extended quiescent state only for nohz_full CPUs. Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
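The reserved-bit protocol described above can be modeled in user space with C11 atomics. This is a sketch, not the kernel's implementation. It assumes that the counter advances by 2 so that bit 0 stays free and that bit 1 set means "not in an extended quiescent state" (EQS). It models one CPU with a single global counter and uses hypothetical `sketch_` names throughout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_CTRL_MASK 0x1	/* reserved bottom bit: action pending */
#define SKETCH_CTRL_CTR  0x2	/* increment; bit set means "not in EQS" */

static atomic_int sketch_dynticks = SKETCH_CTRL_CTR;	/* starts outside EQS */
static int sketch_flushes;	/* counts deferred actions run on EQS exit */

/* Stand-in for the rcu_eqs_special_exit() hook, e.g. a TLB flush. */
static void sketch_special_exit(void)
{
	sketch_flushes++;
}

static void sketch_eqs_enter(void)
{
	int seq = atomic_fetch_add(&sketch_dynticks, SKETCH_CTRL_CTR) +
		  SKETCH_CTRL_CTR;

	assert(!(seq & SKETCH_CTRL_CTR));	/* must now be in EQS */
	assert(!(seq & SKETCH_CTRL_MASK));	/* nothing pending on entry */
}

static void sketch_eqs_exit(void)
{
	int seq = atomic_fetch_add(&sketch_dynticks, SKETCH_CTRL_CTR) +
		  SKETCH_CTRL_CTR;

	assert(seq & SKETCH_CTRL_CTR);		/* must now be out of EQS */
	if (seq & SKETCH_CTRL_MASK) {
		atomic_fetch_and(&sketch_dynticks, ~SKETCH_CTRL_MASK);
		sketch_special_exit();
	}
}

/* Request the deferred action; refuses if the CPU is not in EQS. */
static bool sketch_special_set(void)
{
	int old = atomic_load(&sketch_dynticks);

	do {
		if (old & SKETCH_CTRL_CTR)
			return false;	/* caller needs another mechanism */
	} while (!atomic_compare_exchange_weak(&sketch_dynticks, &old,
					       old | SKETCH_CTRL_MASK));
	return true;
}
```

The compare-and-swap loop is what makes the request safe against a concurrent EQS exit. If the counter moves between the load and the update, the exchange fails, `old` is refreshed, and the EQS test is repeated.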
-
- 02 March, 2017 3 commits
-
-
Committed by Ingo Molnar
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/debug.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Ingo Molnar
We are going to move scheduler ABI details to <uapi/linux/sched/types.h>, which will be used from a number of .c files. Create an empty placeholder header that maps to <linux/types.h>. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Ingo Molnar
rcupdate.h is a pretty complex header; in particular, it includes <linux/completion.h>, which includes <linux/wait.h>, creating a dependency that includes <linux/wait.h> in <linux/sched.h>, which prevents the isolation of <linux/sched.h> from the derived <linux/wait.h> header. Solve part of the problem by decoupling rcupdate.h from completions: this can be done by separating out the rcu_synchronize types and APIs, and updating their usage sites. Since these are mostly RCU-internal types, this will not just simplify <linux/sched.h>'s dependencies, but will make all the hundreds of .c files that include rcupdate.h but not completions or wait.h build faster. (For rcutiny this means that two dependent APIs have to be uninlined, but that shouldn't be much of a problem as they are rare variants.) Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 24 January, 2017 11 commits
-
-
Committed by Paul E. McKenney
Commit 7ec99de3 ("rcu: Provide exact CPU-online tracking for RCU"), as its title suggests, got rid of RCU's remaining CPU-hotplug timing guesswork. This commit therefore removes the one-jiffy kludge that was used to paper over this guesswork. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
Commit 4a81e832 ("rcu: Reduce overhead of cond_resched() checks for RCU") moved quiescent-state generation out of cond_resched(), commit bde6c3aa ("rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops") introduced cond_resched_rcu_qs(), and commit 5cd37193 ("rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors") introduced the per-CPU rcu_qs_ctr variable, which is frequently polled by the RCU core state machine. This frequent polling can increase grace-period rate, which in turn increases grace-period overhead, which is visible in some benchmarks (for example, the "open1" benchmark in Anton Blanchard's "will it scale" suite). This commit therefore reduces the rate at which rcu_qs_ctr is polled by moving that polling into the force-quiescent-state (FQS) machinery, and by further polling it only after the grace period has been in effect for at least jiffies_till_sched_qs jiffies. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
This commit is the fourth step towards full abstraction of all accesses to the ->dynticks counter, implementing previously open-coded checks and comparisons in new rcu_dynticks_in_eqs() and rcu_dynticks_in_eqs_since() functions. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
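As an illustration of what such accessors might look like, here is a user-space sketch. It assumes the convention that an even ->dynticks value means the CPU is in an extended quiescent state (EQS) and an odd value means it is not. That convention and all `sketch_` names are assumptions, not taken from the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_dt {
	atomic_int dynticks;	/* assumed: even = in EQS, odd = not in EQS */
};

/* Snapshot the counter, modeled on the "atomic add of zero". */
static int sketch_dynticks_snap(struct sketch_dt *d)
{
	assert(d != NULL);
	return atomic_fetch_add(&d->dynticks, 0);
}

/* Was the CPU in an EQS when the snapshot was taken? */
static bool sketch_in_eqs(int snap)
{
	return !(snap & 0x1);
}

/*
 * Has the counter moved since the snapshot?  If so, the CPU either
 * was in an EQS at snapshot time or has entered one since then.
 */
static bool sketch_in_eqs_since(struct sketch_dt *d, int snap)
{
	return snap != sketch_dynticks_snap(d);
}
```

With the tests hidden behind functions, a later change to the counter's encoding only has to touch these helpers rather than every caller.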
-
Committed by Paul E. McKenney
This commit is the third step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of 1 and entry checks in a new rcu_dynticks_eqs_enter() function, and the same but with exit checks in a new rcu_dynticks_eqs_exit() function. This abstraction will ease changes to the ->dynticks counter operation. Note that this commit gets rid of the smp_mb__before_atomic() and the smp_mb__after_atomic() calls that were previously present. The reason that this is OK from a memory-ordering perspective is that the atomic operation is now atomic_add_return(), which, as a value-returning atomic, guarantees full ordering. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Fixed RCU_TRACE() statements added by this commit. ] Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
The rcu_cpu_starting() function uses this_cpu_ptr() to locate the incoming CPU's rcu_data structure. This works for the boot CPU and for all CPUs onlined after rcu_init() executes (during very early boot). Currently, this is the full set of CPUs, so all is well. But if anyone ever parallelizes boot before rcu_init() time, it will fail. This commit therefore replaces the rcu_cpu_starting() function's this_cpu_ptr() with per_cpu_ptr(), future-proofing the code and (arguably) improving readability. This commit inadvertently fixes a latent bug: If there ever had been more than just the boot CPU online at rcu_init() time, the old code would not initialize the non-boot CPUs, but rather would repeatedly initialize the boot CPU. Reported-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
Chris Friesen noticed that rcuc/X kthreads were consuming CPU even on NOCB CPUs. This makes no sense because the only purpose of these kthreads is to invoke normal (non-offloaded) callbacks, of which there will never be any on NOCB CPUs. This problem was due to a bug in cpu_has_callbacks_ready_to_invoke(), which should have been checking ->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead (incorrectly) checking ->nxttail[RCU_DONE_TAIL]. Because ->nxttail[RCU_DONE_TAIL] is never NULL, the only effect is to cause the rcuc/X kthread to execute when it should not do so. This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL for NOCB CPUs. Reported-by: Chris Friesen <chris.friesen@windriver.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
This commit is for all intents and purposes a revert of bc1dce51 ("rcu: Don't use NMIs to dump other CPUs' stacks"). The reason to suppose that this can now safely be reverted is the presence of 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI"), which is said to have made NMI-based stack dumps safe. However, this reversion keeps one nice property of bc1dce51 ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that only those CPUs blocking the grace period are dumped. The new trigger_single_cpu_backtrace() is used to make this happen, as suggested by Josh Poimboeuf. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
Commit 4914950a ("rcu: Stop treating in-kernel CPU-bound workloads as errors") added a (relatively) short-timeout call to resched_cpu(). This was inspired by an issue that was fixed by b7e7ade3 ("sched/core: Fix remote wakeups"). But given that this issue was fixed, it is time for the current commit to remove this call to resched_cpu(). Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
This commit prepares for the removal of short-term CPU kicking (in a subsequent commit). It does so by starting to invoke resched_cpu() for each holdout at each force-quiescent-state interval that is more than halfway through the stall-warning interval. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Tobias Klauser
Since commit 7ec99de3 ("rcu: Provide exact CPU-online tracking for RCU"), the variable mask in rcu_init_percpu_data is set but no longer used. Remove it to fix the following warning when building with 'W=1': kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’: kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Byungchul Park
The print_other_cpu_stall() function currently unconditionally invokes rcu_print_detail_task_stall(). This is OK because if there was a stall sufficient to cause print_other_cpu_stall() to be invoked, that stall is very likely to persist through the entire print_other_cpu_stall() execution. However, if the stall did not persist, the variable ndetected will be zero, and that variable is already tested in an "if" statement. Therefore, this commit moves the call to rcu_print_detail_task_stall() under that pre-existing "if" to improve readability, with a very rare reduction in overhead. Signed-off-by: Byungchul Park <byungchul.park@lge.com> [ paulmck: Reworked commit log. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
- 17 January, 2017 2 commits
-
-
Committed by Paul E. McKenney
This commit is the second step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of zero in a new rcu_dynticks_snap() function. This abstraction will ease changes to the ->dynticks counter operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
This commit is the first step towards full abstraction of all accesses to the ->dynticks counter, implementing the previously open-coded atomic add of two in a new rcu_dynticks_momentary_idle() function. This abstraction will ease changes to the ->dynticks counter operation. Note that this commit gets rid of the smp_mb__before_atomic() and the smp_mb__after_atomic() calls that were previously present. The reason that this is OK from a memory-ordering perspective is that the atomic operation is now atomic_add_return(), which, as a value-returning atomic, guarantees full ordering. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
- 15 January, 2017 1 commit
-
-
Committed by Paul E. McKenney
The current preemptible RCU implementation goes through three phases during bootup. In the first phase, there is only one CPU that is running with preemption disabled, so that a no-op is a synchronous grace period. In the second mid-boot phase, the scheduler is running, but RCU has not yet gotten its kthreads spawned (and, for expedited grace periods, workqueues are not yet running). During this time, any attempt to do a synchronous grace period will hang the system (or complain bitterly, depending). In the third and final phase, RCU is fully operational and everything works normally.

This has been OK for some time, but there have recently been some synchronous grace periods showing up during the second mid-boot phase. This code worked "by accident" for a while, but started failing as soon as expedited RCU grace periods switched over to workqueues in commit 8b355e3b ("rcu: Drive expedited grace periods from workqueue"). Note that the code was buggy even before this commit, as it was subject to failure on real-time systems that forced all expedited grace periods to run as normal grace periods (for example, using the rcu_normal ksysfs parameter). The callchain from the failure case is as follows:

    early_amd_iommu_init()
    |-> acpi_put_table(ivrs_base);
    |-> acpi_tb_put_table(table_desc);
    |-> acpi_tb_invalidate_table(table_desc);
    |-> acpi_tb_release_table(...)
    |-> acpi_os_unmap_memory
    |-> acpi_os_unmap_iomem
    |-> acpi_os_map_cleanup
    |-> synchronize_rcu_expedited

The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues before they were initialized, which did not go well.

This commit therefore reworks RCU to permit synchronous grace periods to proceed during this mid-boot phase. This commit is therefore a fix to a regression introduced in v4.9, and is therefore being put forward post-merge-window in v4.10.

This commit sets a flag from the existing rcu_scheduler_starting() function which causes all synchronous grace periods to take the expedited path. The expedited path now checks this flag, using the requesting task to drive the expedited grace period forward during the mid-boot phase. Finally, this flag is updated by a core_initcall() function named rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.

Note that this arrangement assumes that tasks are not sent POSIX signals (or anything similar) from the time that the first task is spawned through core_initcall() time.

Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue") Reported-by: "Zheng, Lv" <lv.zheng@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stan Kain <stan.kain@gmail.com> Tested-by: Ivan <waffolz@hotmail.com> Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com> Tested-by: Bruno Pesavento <bpesavento@infinito.it> Tested-by: Borislav Petkov <bp@suse.de> Tested-by: Frederic Bezies <fredbezies@gmail.com> Cc: <stable@vger.kernel.org> # 4.9.0-
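The three-phase arrangement can be pictured as a small state machine. The sketch below is a user-space model under stated assumptions. The enum values and every `sketch_` name are hypothetical, and the real code chooses between actual grace-period mechanisms rather than returning a label.

```c
#include <assert.h>

/* Hypothetical names for the three boot phases. */
enum sketch_phase {
	SKETCH_EARLY,	/* one CPU, preemption disabled */
	SKETCH_MIDBOOT,	/* scheduler up, kthreads and workqueues not ready */
	SKETCH_RUNTIME,	/* RCU fully operational */
};

/* Hypothetical labels for how a synchronous request is served. */
enum sketch_path {
	SKETCH_NOOP,		/* a no-op is a grace period */
	SKETCH_DIRECT,		/* requesting task drives the expedited GP */
	SKETCH_WORKQUEUE,	/* runtime path, e.g. expedited GP via workqueue */
};

static enum sketch_phase sketch_flag = SKETCH_EARLY;

/* Modeled on rcu_scheduler_starting(): enter the mid-boot phase. */
static void sketch_scheduler_starting(void)
{
	assert(sketch_flag == SKETCH_EARLY);
	sketch_flag = SKETCH_MIDBOOT;
}

/* Modeled on the core_initcall() function rcu_exp_runtime_mode(). */
static void sketch_exp_runtime_mode(void)
{
	assert(sketch_flag == SKETCH_MIDBOOT);
	sketch_flag = SKETCH_RUNTIME;
}

/* Pick the path that a synchronous grace-period request would take. */
static enum sketch_path sketch_sync_path(void)
{
	switch (sketch_flag) {
	case SKETCH_EARLY:
		return SKETCH_NOOP;
	case SKETCH_MIDBOOT:
		return SKETCH_DIRECT;
	default:
		return SKETCH_WORKQUEUE;
	}
}
```

The key point is the middle state: while workqueues are unavailable, the caller does the work itself instead of queuing it and blocking.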
-
- 15 November, 2016 2 commits
-
-
Committed by Paul E. McKenney
The current code can result in spurious kicks when there are no grace periods in progress and no grace-period-related requests. This is sort of OK for a diagnostic aid, but the resulting ftrace-dump messages in dmesg are annoying. This commit therefore avoids spurious kicks in the common case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-
Committed by Paul E. McKenney
The __call_rcu() comment about opportunistically noting grace period beginnings and endings is obsolete. RCU still does such opportunistic noting, but in __call_rcu_core() rather than __call_rcu(), and there already is an appropriate comment in __call_rcu_core(). This commit therefore removes the obsolete comment. Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
-