- 15 1月, 2017 2 次提交
-
-
由 Paul E. McKenney 提交于
The current preemptible RCU implementation goes through three phases during bootup. In the first phase, there is only one CPU that is running with preemption disabled, so that a no-op is a synchronous grace period. In the second mid-boot phase, the scheduler is running, but RCU has not yet gotten its kthreads spawned (and, for expedited grace periods, workqueues are not yet running. During this time, any attempt to do a synchronous grace period will hang the system (or complain bitterly, depending). In the third and final phase, RCU is fully operational and everything works normally. This has been OK for some time, but there has recently been some synchronous grace periods showing up during the second mid-boot phase. This code worked "by accident" for awhile, but started failing as soon as expedited RCU grace periods switched over to workqueues in commit 8b355e3b ("rcu: Drive expedited grace periods from workqueue"). Note that the code was buggy even before this commit, as it was subject to failure on real-time systems that forced all expedited grace periods to run as normal grace periods (for example, using the rcu_normal ksysfs parameter). The callchain from the failure case is as follows: early_amd_iommu_init() |-> acpi_put_table(ivrs_base); |-> acpi_tb_put_table(table_desc); |-> acpi_tb_invalidate_table(table_desc); |-> acpi_tb_release_table(...) |-> acpi_os_unmap_memory |-> acpi_os_unmap_iomem |-> acpi_os_map_cleanup |-> synchronize_rcu_expedited The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues before they were initialized, which did not go well. This commit therefore reworks RCU to permit synchronous grace periods to proceed during this mid-boot phase. This commit is therefore a fix to a regression introduced in v4.9, and is therefore being put forward post-merge-window in v4.10. This commit sets a flag from the existing rcu_scheduler_starting() function which causes all synchronous grace periods to take the expedited path. The expedited path now checks this flag, using the requesting task to drive the expedited grace period forward during the mid-boot phase. Finally, this flag is updated by a core_initcall() function named rcu_exp_runtime_mode(), which causes the runtime codepaths to be used. Note that this arrangement assumes that tasks are not sent POSIX signals (or anything similar) from the time that the first task is spawned through core_initcall() time. Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue") Reported-by: N"Zheng, Lv" <lv.zheng@intel.com> Reported-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: NStan Kain <stan.kain@gmail.com> Tested-by: NIvan <waffolz@hotmail.com> Tested-by: NEmanuel Castelo <emanuel.castelo@gmail.com> Tested-by: NBruno Pesavento <bpesavento@infinito.it> Tested-by: NBorislav Petkov <bp@suse.de> Tested-by: NFrederic Bezies <fredbezies@gmail.com> Cc: <stable@vger.kernel.org> # 4.9.0-
-
由 Paul E. McKenney 提交于
It is now legal to invoke synchronize_sched() at early boot, which causes Tiny RCU's synchronize_sched() to emit spurious splats. This commit therefore removes the cond_resched() from Tiny RCU's synchronize_sched(). Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue") Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.9.0-
-
- 15 11月, 2016 6 次提交
-
-
由 Paul E. McKenney 提交于
The current code can result in spurious kicks when there are no grace periods in progress and no grace-period-related requests. This is sort of OK for a diagnostic aid, but the resulting ftrace-dump messages in dmesg are annoying. This commit therefore avoids spurious kicks in the common case. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
由 Paul E. McKenney 提交于
Expedited grace periods check dyntick-idle state, and avoid sending IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL kernels, nohz_full CPUs. However, the kernel has been observed checking a CPU while it was non-idle, but sending the IPI after it has gone idle. This commit therefore rechecks idle state immediately before sending the IPI, refraining from IPIing CPUs that have since gone idle. Reported-by: NRik van Riel <riel@redhat.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Although rcutorture will occasionally do a 50-millisecond grace-period delay, these delays are quite rare. And rightly so, because otherwise the read rate would be quite low. Thie means that it can be important to identify whether or not a given run contained a long-delay read. This commit therefore inserts a trace_rcu_torture_read() event to flag runs containing long delays. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The __call_rcu() comment about opportunistically noting grace period beginnings and endings is obsolete. RCU still does such opportunistic noting, but in __call_rcu_core() rather than __call_rcu(), and there already is an appropriate comment in __call_rcu_core(). This commit therefore removes the obsolete comment. Reported-by: NMichalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
由 Paul E. McKenney 提交于
In the deep past, rcu_check_callbacks() was only invoked if rcu_pending() returned true. Which was fine, but these days rcu_check_callbacks() is invoked unconditionally. This commit therefore removes the obsolete sentence from the header comment. Reported-by: NMichalis Kokologiannakis <mixaskok@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
由 Paul E. McKenney 提交于
Commit 720abae3 ("rcu: force alignment on struct callback_head/rcu_head") forced the rcu_head (AKA callback_head) structure's alignment to pointer size, that is, to 4-byte boundaries on 32-bit systems and to 8-byte boundaries on 64-bit systems. This commit therefore checks for this same alignment in __call_rcu(), which used to contain a looser check for two-byte alignment. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
-
- 11 10月, 2016 1 次提交
-
-
由 Emese Revfy 提交于
The __latent_entropy gcc attribute can be used only on functions and variables. If it is on a function then the plugin will instrument it for gathering control-flow entropy. If the attribute is on a variable then the plugin will initialize it with random contents. The variable must be an integer, an integer array type or a structure with integer fields. These specific functions have been selected because they are init functions (to help gather boot-time entropy), are called at unpredictable times, or they have variable loops, each of which provide some level of latent entropy. Signed-off-by: NEmese Revfy <re.emese@gmail.com> [kees: expanded commit message] Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 23 8月, 2016 14 次提交
-
-
由 SeongJae Park 提交于
A few rcuperf dmesg output messages have no space between the flag and the start of the message. In contrast, every other messages consistently supplies a single space. This difference makes rcuperf dmesg output hard to read and to mechanically parse. This commit therefore fixes this problem by modifying a pr_alert() call and PERFOUT_STRING() macro function to provide that single space. Signed-off-by: NSeongJae Park <sj38.park@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 SeongJae Park 提交于
Tests for rcu_barrier() were introduced by commit fae4b54f ("rcu: Introduce rcutorture testing for rcu_barrier()"). This commit updated the documentation to say that the "rtbe" field in rcutorture's dmesg output indicates test failure. However, the code was not updated, only the documentation. This commit therefore updates the code to match the updated documentation. Signed-off-by: NSeongJae Park <sj38.park@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit adds a dump of the scheduler state for stalled rcutorture writer tasks. This addition provides yet more debug for the intermittent "failures to proceed", where grace periods move ahead but the rcutorture writer tasks fail to do so. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Up to now, RCU has assumed that the CPU-online process makes it from CPU_UP_PREPARE to set_cpu_online() within one jiffy. Given the recent rise of virtualized environments, this assumption is very clearly obsolete. Failing to meet this deadline can result in RCU paying attention to an incoming CPU for one jiffy, then ignoring it until the grace period following the one in which that CPU sets itself online. This situation might prove to be fatally disappointing to any RCU read-side critical sections that had the misfortune to execute during the time in which RCU was ignoring the slow-to-come-online CPU. This commit therefore updates RCU's internal CPU state-tracking information at notify_cpu_starting() time, thus providing RCU with an exact transition of the CPU's state from offline to online. Note that this means that incoming CPUs must not use RCU read-side critical section (other than those of SRCU) until notify_cpu_starting() time. Note also that the CPU_STARTING notifiers -are- allowed to use RCU read-side critical sections. (Of course, CPU-hotplug notifiers are rapidly becoming obsolete, so you need to act fast!) If a given architecture or CPU family needs to use RCU read-side critical sections earlier, the call to rcu_cpu_starting() from notify_cpu_starting() will need to be architecture-specific, with architectures that need early use being required to hand-place the call to rcu_cpu_starting() at some point preceding the call to notify_cpu_starting(). Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Currently, __note_gp_changes() checks to see if the CPU has slept through multiple grace periods. If it has, it resynchronizes that CPU's view of the grace-period state, which includes whether or not the current grace period needs a quiescent state from this CPU. The fact of this need (or lack thereof) needs to be in two places, rdp->cpu_no_qs.b.norm and rdp->core_needs_qs. The former tells RCU's context-switch code to go get a quiescent state and the latter says that it needs to be reported. The current code unconditionally sets the former to true, but correctly sets the latter. This does not result in failures, but it does unnecessarily increase the amount of work done on average at context-switch time. This commit therefore correctly sets both fields. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul Gortmaker 提交于
The Kconfig currently controlling compilation of tree.c is: init/Kconfig:config TREE_RCU init/Kconfig: bool ...and update.c and sync.c are "obj-y" meaning that none are ever built as a module by anyone. Since MODULE_ALIAS is a no-op for non-modular code, we can remove them from these files. We leave moduleparam.h behind since the files instantiate some boot time configuration parameters with module_param() still. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Jisheng Zhang 提交于
Commit abedf8e2 ("rcu: Use simple wait queues where possible in rcutree") converts Tree RCU's wait queues to simple wait queues, but it incorrectly reverts the commit 2aa792e6 ("rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads"). This can result in redundant self-wakeups. This commit therefore replaces the simple wait-queue wakeups with rcu_gp_kthread_wake(), thus avoiding the redundant wakeups. Signed-off-by: NJisheng Zhang <jszhang@marvell.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit improves the accuracy of the interaction between CPU hotplug operations and RCU's expedited grace periods by using RCU's online-CPU state to determine when failed IPIs should be retried. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The expedited RCU grace periods currently rely on a failure indication from smp_call_function_single() to determine that a given CPU is offline. This works after a fashion, but is more contorted and less precise than relying on RCU's internal state. This commit therefore takes a first step towards relying on internal state. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The expedited RCU CPU stall warnings currently responds to neither the panic_on_rcu_stall sysctl setting nor the rcupdate.rcu_cpu_stall_suppress kernel boot parameter. This commit therefore updates the expedited code to respond to these two controls. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Now that RCU expedited grace periods are always driven by a workqueue, there is no need to account for signal reception, and thus no need to disable expedited RCU CPU stall warnings due to signal reception. This commit therefore removes the signal-reception checks, leaving a WARN_ON() to catch possible future bugs. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The current implementation of expedited grace periods has the user task drive the grace period. This works, but has downsides: (1) The user task must awaken tasks piggybacking on this grace period, which can result in latencies rivaling that of the grace period itself, and (2) User tasks can receive signals, which interfere with RCU CPU stall warnings. This commit therefore uses workqueues to drive the grace periods, so that the user task need not do the awakening. A subsequent commit will remove the now-unnecessary code allowing for signals. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The functions synchronize_rcu_expedited() and synchronize_sched_expedited() have nearly identical code. This commit therefore consolidates this code into a new _synchronize_rcu_expedited() function. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 22 8月, 2016 1 次提交
-
-
由 Ding Tianhong 提交于
Carrying out the following steps results in a softlockup in the RCU callback-offload (rcuo) kthreads: 1. Connect to ixgbevf, and set the speed to 10Gb/s. 2. Use ifconfig to bring the nic up and down repeatedly. [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15] [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000 [ 368.106005] RIP: 0010:[<ffffffff81579e04>] [<ffffffff81579e04>] fib_table_lookup+0x14/0x390 [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286 [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001 [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00 [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000 [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58 [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0 [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000 [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0 [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 368.106005] Stack: [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00 [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146 [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0 [ 368.106005] Call Trace: [ 368.106005] <IRQ> [ 368.106005] [ 368.106005] [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0 [ 368.106005] [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110 [ 368.106005] [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0 [ 368.106005] [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350 [ 368.106005] [<ffffffff81537034>] ip_rcv+0x234/0x380 [ 368.106005] [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870 [ 368.106005] [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60 [ 368.106005] [<ffffffff814fe4de>] process_backlog+0xae/0x180 [ 368.106005] [<ffffffff814fdcb2>] net_rx_action+0x152/0x240 [ 368.106005] [<ffffffff81077b3f>] __do_softirq+0xef/0x280 [ 368.106005] [<ffffffff8161619c>] call_softirq+0x1c/0x30 [ 368.106005] <EOI> [ 368.106005] [ 368.106005] [<ffffffff81015d95>] do_softirq+0x65/0xa0 [ 368.106005] [<ffffffff81077174>] local_bh_enable+0x94/0xa0 [ 368.106005] [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370 [ 368.106005] [<ffffffff81098250>] ? wake_up_bit+0x30/0x30 [ 368.106005] [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40 [ 368.106005] [<ffffffff8109728f>] kthread+0xcf/0xe0 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 [ 368.106005] [<ffffffff816147d8>] ret_from_fork+0x58/0x90 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 ==================================cut here============================== It turns out that the rcuos callback-offload kthread is busy processing a very large quantity of RCU callbacks, and it is not reliquishing the CPU while doing so. This commit therefore adds an cond_resched_rcu_qs() within the loop to allow other tasks to run. Signed-off-by: NDing Tianhong <dingtianhong@huawei.com> [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ] Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 18 8月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
The current percpu-rwsem read side is entirely free of serializing insns at the cost of having a synchronize_sched() in the write path. The latency of the synchronize_sched() is too high for cgroups. The commit 1ed13287 talks about the write path being a fairly cold path but this is not the case for Android which moves task to the foreground cgroup and back around binder IPC calls from foreground processes to background processes, so it is significantly hotter than human initiated operations. Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the problem, hopefully it should not be that slow after another commit: 80127a39 ("locking/percpu-rwsem: Optimize readers and reduce global impact"). We could just add rcu_sync_enter() into cgroup_init() but we do not want another synchronize_sched() at boot time, so this patch adds the new helper which doesn't block but currently can only be called before the first use. Reported-by: NJohn Stultz <john.stultz@linaro.org> Reported-by: NDmitry Shmidt <dimitrysh@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Colin Cross <ccross@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rom Lemarchand <romlem@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@google.com> Link: http://lkml.kernel.org/r/20160811165413.GA22807@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 8月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
Currently the percpu-rwsem switches to (global) atomic ops while a writer is waiting; which could be quite a while and slows down releasing the readers. This patch cures this problem by ordering the reader-state vs reader-count (see the comments in __percpu_down_read() and percpu_down_write()). This changes a global atomic op into a full memory barrier, which doesn't have the global cacheline contention. This also enables using the percpu-rwsem with rcu_sync disabled in order to bias the implementation differently, reducing the writer latency by adding some cost to readers. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org [ Fixed modular build. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 7月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
Straight forward conversion to the state machine. Though the question arises whether this needs really all these state transitions to work. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de> Reviewed-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160713153337.982013161@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 6月, 2016 4 次提交
-
-
由 Mark Rutland 提交于
In many cases in the RCU tree code, we iterate over the set of cpus for a leaf node described by rcu_node::grplo and rcu_node::grphi, checking per-cpu data for each cpu in this range. However, if the set of possible cpus is sparse, some cpus described in this range are not possible, and thus no per-cpu region will have been allocated (or initialised) for them by the generic percpu code. Erroneous accesses to a per-cpu area for these !possible cpus may fault or may hit other data depending on the addressed generated when the erroneous per cpu offset is applied. In practice, both cases have been observed on arm64 hardware (the former being silent, but detectable with additional patches). To avoid issues resulting from this, we must iterate over the set of *possible* cpus for a given leaf node. This patch add a new helper, for_each_leaf_node_possible_cpu, to enable this. As iteration is often intertwined with rcu_node local bitmask manipulation, a new leaf_node_cpu_bit helper is added to make this simpler and more consistent. The RCU tree code is made to use both of these where appropriate. Without this patch, running reboot at a shell can result in an oops like: [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c [ 3369.083881] pgd = ffffffc3ecdda000 [ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000 [ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP [ 3369.101781] Modules linked in: [ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G W 4.6.0+ #3 [ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000 [ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510 [ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5 [ 3369.146860] sp : ffffffc3eb9435a0 [ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88 [ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600 [ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88 [ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80 [ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40 [ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000 [ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0 [ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000 [ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000 [ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78 [ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000 [ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003 [ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280 [ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001 [ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140 ... [ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510 [ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8 [ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30 [ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298 [ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0 [ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40 [ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158 [ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60 [ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918 [ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8 [ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220 [ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8 [ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40 [ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238 [ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358 [ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30 [ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280 [ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88 [ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20 [ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28 Signed-off-by: NMark Rutland <mark.rutland@arm.com> Reported-by: NDennis Chen <dennis.chen@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
It is not always easy to determine the cause of an RCU stall just by analysing the RCU stall messages, mainly when the problem is caused by the indirect starvation of rcu threads. For example, when preempt_rcu is not awakened due to the starvation of a timer softirq. We have been hard coding panic() in the RCU stall functions for some time while testing the kernel-rt. But this is not possible in some scenarios, like when supporting customers. This patch implements the sysctl kernel.panic_on_rcu_stall. If set to 1, the system will panic() when an RCU stall takes place, enabling the capture of a vmcore. The vmcore provides a way to analyze all kernel/tasks states, helping out to point to the culprit and the solution for the stall. The kernel.panic_on_rcu_stall sysctl is disabled by default. Changes from v1: - Fixed a typo in the git log - The if(sysctl_panic_on_rcu_stall) panic() is in a static function - Fixed the CONFIG_TINY_RCU compilation issue - The var sysctl_panic_on_rcu_stall is now __read_mostly Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Reviewed-by: NArnaldo Carvalho de Melo <acme@kernel.org> Tested-by: N"Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
In the area in hot pursuit of a bug, so might as well clean it up. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Currently, if the very first call to call_rcu_tasks() has irqs disabled, it will create the rcu_tasks_kthread with irqs disabled, which will result in a splat in the memory allocator, which kthread_run() invokes with the expectation that irqs are enabled. This commit fixes this problem by deferring kthread creation if called with irqs disabled. The first call to call_rcu_tasks() that has irqs enabled will create the kthread. This bug was detected by rcutorture changes that were motivated by Iftekhar Ahmed's mutation-testing efforts. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 15 6月, 2016 9 次提交
-
-
由 Wei Yongjun 提交于
Fix to return a negative error code -ENOMEM from kcalloc() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Boqun Feng 提交于
0day found a boot warning triggered in rcu_perf_writer() on !SMP kernel: WARN_ON(rcu_gp_is_normal() && gp_exp); , the root cause of which is trying to measure expedited grace periods(by setting gp_exp to true by default) when all the grace periods are normal(TINY RCU only has normal grace periods). However, such a mis-setting would only result in failing to measure the performance for a specific kind of grace periods, therefore using a WARN_ON to check this is a little overkilling. We could handle this inside rcuperf module via some error messages to tell users about the mis-settings. Therefore this patch removes the WARN_ON in rcu_perf_writer() and handles those checkings in rcu_perf_init() with plain if() code. Moreover, this patch changes the default value of gp_exp to 1) align with rcutorture tests and 2) make the default setting work for all RCU implementations by default. Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NBoqun Feng <boqun.feng@gmail.com> Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.comSigned-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit removes CONFIG_RCU_TORTURE_TEST_RUNNABLE in favor of the already-existing rcutorture.torture_runnable kernel boot parameter. It also converts an #ifdef into IS_ENABLED(), saving a few lines of code. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit applies the infamous IS_ENABLED() macro to eliminate a #ifdef. It also eliminates the RCU_PERF_TEST_RUNNABLE Kconfig option in favor of the already-existing rcuperf.perf_runnable kernel boot parameter. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree_plugin.h to a new tree_exp.h file. This commit is strictly code movement. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
People have been having some difficulty finding their way around the RCU code. This commit therefore pulls some of the expedited grace-period code from tree.c to a new tree_exp.h file. This commit is strictly code movement, with the exception of a forward declaration that was added for the sync_sched_exp_online_cleanup() function. A subsequent commit will move the remaining expedited grace-period code from tree_plugin.h to tree_exp.h. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Peter Zijlstra 提交于
I think you'll find this condition is superfluous, as the whole function is under #ifdef of that same. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
In the past, RCU grace-period initialization excluded CPU-hotplug operations, but this is no longer the case. This commit therefore removed an outdated comment in rcu_gp_init() claiming that these are excluded. Reported-by: NLihao Liang <lihao.liang@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
The comment header for rcu_scheduler_active states that it is used to optimize synchronize_sched() at early boot. This is incorrect. The synchronize_sched() function instead checks the number of online CPUs. This commit therefore replaces the comment's synchronize_sched() with synchronize_rcu(), which really does use rcu_scheduler_active for this purpose. Reported-by: NLihao Liang <lihao.liang@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-