- 06 8月, 2015 2 次提交
-
-
由 Vitaly Kuznetsov 提交于
Hyper-V module needs to disable cpu hotplug (offlining) as there is no support from hypervisor side to reassign already opened event channels to a different CPU. Currently it is been done by altering smp_ops.cpu_disable but it is hackish. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Vitaly Kuznetsov 提交于
As a prerequisite to exporting cpu_hotplug_enable/cpu_hotplug_disable functions to modules we need to convert cpu_hotplug_disabled to a counter to properly support disable -> disable -> enable call sequences. E.g. after Hyper-V vmbus module (which is supposed to be the first user of exported cpu_hotplug_enable/cpu_hotplug_disable) did cpu_hotplug_disable() hibernate path calls disable_nonboot_cpus() and if we hit an error in _cpu_down() enable_nonboot_cpus() will be called on the failure path (thus making cpu_hotplug_disabled = 0 and leaving cpu hotplug in 'enabled' state). Same problem is possible if more than 1 module use cpu_hotplug_disable/cpu_hotplug_enable on their load/unload paths. When one of these modules is been unloaded it is logical to leave cpu hotplug in 'disabled' state. To support the change we need to increse cpu_hotplug_disabled counter in disable_nonboot_cpus() unconditionally as all users of disable_nonboot_cpus() are supposed to do enable_nonboot_cpus() in case an error was returned. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 15 7月, 2015 1 次提交
-
-
由 Thomas Gleixner 提交于
Boris reported that the sparse_irq protection around __cpu_up() in the generic code causes a regression on Xen. Xen allocates interrupts and some more in the xen_cpu_up() function, so it deadlocks on the sparse_irq_lock. There is no simple fix for this and we really should have the protection for all architectures, but for now the only solution is to move it to x86 where actual wreckage due to the lack of protection has been observed. Reported-and-tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Fixes: a8994181 'hotplug: Prevent alloc/free of irq descriptors during cpu up/down' Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: xen-devel <xen-devel@lists.xenproject.org>
-
- 08 7月, 2015 1 次提交
-
-
由 Thomas Gleixner 提交于
When a cpu goes up some architectures (e.g. x86) have to walk the irq space to set up the vector space for the cpu. While this needs extra protection at the architecture level we can avoid a few race conditions by preventing the concurrent allocation/free of irq descriptors and the associated data. When a cpu goes down it moves the interrupts which are targeted to this cpu away by reassigning the affinities. While this happens interrupts can be allocated and freed, which opens a can of race conditions in the code which reassignes the affinities because interrupt descriptors might be freed underneath. Example: CPU1 CPU2 cpu_up/down irq_desc = irq_to_desc(irq); remove_from_radix_tree(desc); raw_spin_lock(&desc->lock); free(desc); We could protect the irq descriptors with RCU, but that would require a full tree change of all accesses to interrupt descriptors. But fortunately these kind of race conditions are rather limited to a few things like cpu hotplug. The normal setup/teardown is very well serialized. So the simpler and obvious solution is: Prevent allocation and freeing of interrupt descriptors accross cpu hotplug. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
-
- 17 6月, 2015 1 次提交
-
-
由 Paul Gortmaker 提交于
We removed __cpuinit support (leaving no-op stubs) quite some time ago. However a new instance was added in commit 00df35f9 ("cpu: Defer smpboot kthread unparking until CPU known to scheduler") Since we want to clobber the stubs soon, get this removed now. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 28 5月, 2015 2 次提交
-
-
由 Paul Gortmaker 提交于
We removed __cpuinit support (leaving no-op stubs) quite some time ago. However a new instance was added in commit 00df35f9 ("cpu: Defer smpboot kthread unparking until CPU known to scheduler") Since we want to clobber the stubs soon, get this removed now. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Commit 00df35f9 (cpu: Defer smpboot kthread unparking until CPU known to scheduler) put the online path's call to smpboot_unpark_threads() into a CPU-hotplug notifier. This commit places the offline-failure paths call into the same notifier for the sake of uniformity. Note that it is not currently possible to place the offline path's call to smpboot_park_threads() into an existing notifier because the CPU_DYING notifiers run in a restricted environment, and the CPU_UP_PREPARE notifiers run too soon. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 13 4月, 2015 1 次提交
-
-
由 Paul E. McKenney 提交于
Currently, smpboot_unpark_threads() is invoked before the incoming CPU has been added to the scheduler's runqueue structures. This might potentially cause the unparked kthread to run on the wrong CPU, since the correct CPU isn't fully set up yet. That causes a sporadic, hard to debug boot crash triggering on some systems, reported by Borislav Petkov, and bisected down to: 2a442c9c ("x86: Use common outgoing-CPU-notification code") This patch places smpboot_unpark_threads() in a CPU hotplug notifier with priority set so that these kthreads are unparked just after the CPU has been added to the runqueues. Reported-and-tested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 4月, 2015 2 次提交
-
-
由 Thomas Gleixner 提交于
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the cleanup function for a dead cpu and invoke it directly from the cpu down code. Make it conditional on CPU_HOTPLUG as well. Temporary change, will be refined in the future. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> [ Rebased, added clockevents_notify() removal ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
clockevents_notify() is a leftover from the early design of the clockevents facility. It's really not a notification mechanism, it's a multiplex call. We are way better off to have explicit calls instead of this monstrosity. Split out the tick_handover call and invoke it explicitely from the hotplug code. Temporary solution will be cleaned up in later patches. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> [ Rebase ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 4月, 2015 1 次提交
-
-
由 Preeti U Murthy 提交于
It was found when doing a hotplug stress test on POWER, that the machine either hit softlockups or rcu_sched stall warnings. The issue was traced to commit: 7cba160a ("powernv/cpuidle: Redesign idle states management") which exposed the cpu_down() race with hrtimer based broadcast mode: 5d1638ac ("tick: Introduce hrtimer based broadcast") The race is the following: Assume CPU1 is the CPU which holds the hrtimer broadcasting duty before it is taken down. CPU0 CPU1 cpu_down() take_cpu_down() disable_interrupts() cpu_die() while (CPU1 != CPU_DEAD) { msleep(100); switch_to_idle(); stop_cpu_timer(); schedule_broadcast(); } tick_cleanup_cpu_dead() take_over_broadcast() So after CPU1 disabled interrupts it cannot handle the broadcast hrtimer anymore, so CPU0 will be stuck forever. Fix this by explicitly taking over broadcast duty before cpu_die(). This is a temporary workaround. What we really want is a callback in the clockevent device which allows us to do that from the dying CPU by pushing the hrtimer onto a different cpu. That might involve an IPI and is definitely more complex than this immediate fix. Changelog was picked up from: https://lkml.org/lkml/2015/2/16/213Suggested-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NNicolas Pitre <nico@linaro.org> Signed-off-by: NPreeti U. Murthy <preeti@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: nicolas.pitre@linaro.org Cc: peterz@infradead.org Cc: rjw@rjwysocki.net Fixes: http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html Link: http://lkml.kernel.org/r/20150330092410.24979.59887.stgit@preeti.in.ibm.com [ Merged it to the latest timer tree, renamed the callback, tidied up the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 3月, 2015 1 次提交
-
-
由 Paul E. McKenney 提交于
This commit uses a per-CPU variable to make the CPU-offline code path through the idle loop more precise, so that the outgoing CPU is guaranteed to make it into the idle loop before it is powered off. This commit is in preparation for putting the RCU offline-handling code on this code path, which will eliminate the magic one-jiffy wait that RCU uses as the maximum time for an outgoing CPU to get all the way through the scheduler. The magic one-jiffy wait for incoming CPUs remains a separate issue. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 07 1月, 2015 1 次提交
-
-
由 David Hildenbrand 提交于
Commit b2c4623d ("rcu: More on deadlock between CPU hotplug and expedited grace periods") introduced another problem that can easily be reproduced by starting/stopping cpus in a loop. E.g.: for i in `seq 5000`; do echo 1 > /sys/devices/system/cpu/cpu1/online echo 0 > /sys/devices/system/cpu/cpu1/online done Will result in: INFO: task /cpu_start_stop:1 blocked for more than 120 seconds. Call Trace: ([<00000000006a028e>] __schedule+0x406/0x91c) [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4 [<0000000000130ff6>] _cpu_up+0x3e/0x1c4 [<0000000000131232>] cpu_up+0xb6/0xd4 [<00000000004a5720>] device_online+0x80/0xc0 [<00000000004a57f0>] online_store+0x90/0xb0 ... And a deadlock. Problem is that if the last ref in put_online_cpus() can't get the cpu_hotplug.lock the puts_pending count is incremented, but a sleeping active_writer might never be woken up, therefore never exiting the loop in cpu_hotplug_begin(). This fix removes puts_pending and turns refcount into an atomic variable. We also introduce a wait queue for the active_writer, to avoid possible races and use-after-free. There is no need to take the lock in put_online_cpus() anymore. Can't reproduce it with this fix. Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 04 11月, 2014 1 次提交
-
-
由 Paul E. McKenney 提交于
A long string of get_online_cpus() with each followed by a put_online_cpu() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbably, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NPranith Kumar <bobby.prani@gmail.com>
-
- 23 10月, 2014 1 次提交
-
-
由 Paul E. McKenney 提交于
Commit dd56af42 (rcu: Eliminate deadlock between CPU hotplug and expedited grace periods) was incomplete. Although it did eliminate deadlocks involving synchronize_sched_expedited()'s acquisition of cpu_hotplug.lock via get_online_cpus(), it did nothing about the similar deadlock involving acquisition of this same lock via put_online_cpus(). This deadlock became apparent with testing involving hibernation. This commit therefore changes put_online_cpus() acquisition of this lock to be conditional, and increments a new cpu_hotplug.puts_pending field in case of acquisition failure. Then cpu_hotplug_begin() checks for this new field being non-zero, and applies any changes to cpu_hotplug.refcount. Reported-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: NJiri Kosina <jkosina@suse.cz> Tested-by: NBorislav Petkov <bp@suse.de>
-
- 19 9月, 2014 1 次提交
-
-
由 Paul E. McKenney 提交于
Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: NLan Tianyu <tianyu.lan@intel.com>
-
- 05 7月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
1) Iterate thru all of threads in the system. Check for all threads, not only for group leaders. 2) Check for p->on_rq instead of p->state and cputime. Preempted task in !TASK_RUNNING state OR just created task may be queued, that we want to be reported too. 3) Use read_lock() instead of write_lock(). This function does not change any structures, and read_lock() is enough. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Segall <bsegall@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Konstantin Khorenko <khorenko@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Todd E Brandt <todd.e.brandt@linux.intel.com> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1403684395.3462.44.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 6月, 2014 1 次提交
-
-
由 Todd E Brandt 提交于
Adds trace events that give finer resolution into suspend/resume. These events are graphed in the timelines generated by the analyze_suspend.py script. They represent large areas of time consumed that are typical to suspend and resume. The event is triggered by calling the function "trace_suspend_resume" with three arguments: a string (the name of the event to be displayed in the timeline), an integer (case specific number, such as the power state or cpu number), and a boolean (where true is used to denote the start of the timeline event, and false to denote the end). The suspend_resume trace event reproduces the data that the machine_suspend trace event did, so the latter has been removed. Signed-off-by: NTodd Brandt <todd.e.brandt@intel.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 05 6月, 2014 1 次提交
-
-
由 Fabian Frederick 提交于
no level printk converted to pr_warn (if err) no level printk converted to pr_info (disabling non-boot cpus) Other printk converted to respective level. Signed-off-by: NFabian Frederick <fabf@skynet.be> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 5月, 2014 1 次提交
-
-
由 Lai Jiangshan 提交于
Lai found that: WARNING: CPU: 1 PID: 13 at arch/x86/kernel/smp.c:124 native_smp_send_reschedule+0x2d/0x4b() ... migration_cpu_stop+0x1d/0x22 was caused by set_cpus_allowed_ptr() assuming that cpu_active_mask is always a sub-set of cpu_online_mask. This isn't true since 5fbd036b ("sched: Cleanup cpu_active madness"). So set active and online at the same time to avoid this particular problem. Fixes: 5fbd036b ("sched: Cleanup cpu_active madness") Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/53758B12.8060609@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 3月, 2014 2 次提交
-
-
由 Srivatsa S. Bhat 提交于
The following method of CPU hotplug callback registration is not safe due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock and the cpu_hotplug.lock. get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); The deadlock is shown below: CPU 0 CPU 1 ----- ----- Acquire cpu_hotplug.lock [via get_online_cpus()] CPU online/offline operation takes cpu_add_remove_lock [via cpu_maps_update_begin()] Try to acquire cpu_add_remove_lock [via register_cpu_notifier()] CPU online/offline operation tries to acquire cpu_hotplug.lock [via cpu_hotplug_begin()] *** DEADLOCK! *** The problem here is that callback registration takes the locks in one order whereas the CPU hotplug operations take the same locks in the opposite order. To avoid this issue and to provide a race-free method to register CPU hotplug callbacks (along with initialization of already online CPUs), introduce new variants of the callback registration APIs that simply register the callbacks without holding the cpu_add_remove_lock during the registration. That way, we can avoid the ABBA scenario. However, we will need to hold the cpu_add_remove_lock throughout the entire critical section, to protect updates to the callback/notifier chain. This can be achieved by writing the callback registration code as follows: cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ] for_each_online_cpu(cpu) init_cpu(cpu); /* This doesn't take the cpu_add_remove_lock */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ] Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin() because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD notifiers, and hence get_online_cpus() cannot provide the necessary synchronization to protect the callback/notifier chains against concurrent reads and writes. On the other hand, since the cpu_add_remove_lock protects the entire hotplug operation (including CPU_POST_DEAD), we can use cpu_maps_update_begin/done() to guarantee proper synchronization. Also, since cpu_maps_update_begin/done() is like a super-set of get/put_online_cpus(), the former naturally protects the critical sections from concurrent hotplug operations. Since the names cpu_maps_update_begin/done() don't make much sense in CPU hotplug callback registration scenarios, we'll introduce new APIs named cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done(). In summary, introduce the lockless variants of un/register_cpu_notifier() and also export the cpu_notifier_register_begin/done() APIs for use by modules. This way, we provide a race-free way to register hotplug callbacks as well as perform initialization for the CPUs that are already online. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Gautham R. Shenoy 提交于
Add lockdep annotations for get/put_online_cpus() and cpu_hotplug_begin()/cpu_hotplug_end(). Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 13 11月, 2013 2 次提交
-
-
由 Michael wang 提交于
Commit 6acce3ef: sched: Remove get_online_cpus() usage tries to do sync_sched/rcu() inside _cpu_down() but triggers: INFO: task swapper/0:1 blocked for more than 120 seconds. ... [<ffffffff811263dc>] synchronize_rcu+0x2c/0x30 [<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340 ... It was caused by that in the rcu boost case we rely on smpboot thread to finish the rcu callback, which has already been parked before sync in here and leads to the endless sync_sched/rcu(). This patch exchanges the sequence of smpboot_park_threads() and sync_sched/rcu() to fix the bug. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Tested-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NMichael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5282EDC0.6060003@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call mem_online_node() to put its node online if offlined and then call build_all_zonelists() to initialize the zone list. These steps are specific to memory hotplug, and should be managed in mm/memory_hotplug.c. lock_memory_hotplug() should also be held for the whole steps. For this reason, this patch replaces mem_online_node() with try_online_node(), which performs the whole steps with lock_memory_hotplug() held. try_online_node() is named after try_offline_node() as they have similar purpose. There is no functional change in this patch. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 10月, 2013 1 次提交
-
-
由 Peter Zijlstra 提交于
Remove get_online_cpus() usage from the scheduler; there's 4 sites that use it: - sched_init_smp(); where its completely superfluous since we're in 'early' boot and there simply cannot be any hotplugging. - sched_getaffinity(); we already take a raw spinlock to protect the task cpus_allowed mask, this disables preemption and therefore also stabilizes cpu_online_mask as that's modified using stop_machine. However switch to active mask for symmetry with sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active mask stability by inserting sync_rcu/sched() into _cpu_down. - sched_setaffinity(); we don't appear to need get_online_cpus() either, there's two sites where hotplug appears relevant: * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask, for the cpuset case we hold task_lock, which is a spinlock and thus for mainline disables preemption (might cause pain on RT). * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has preemption properly disabled; also it already deals with hotplug races explicitly where it releases them. - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for us with a little trickery. By adding a sync_sched/rcu() after the CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for cpu_active_mask. Use these to validate that both our cpus are active when queueing the stop work before we queue the stop_machine works for take_cpu_down(). Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 8月, 2013 1 次提交
-
-
由 Toshi Kani 提交于
CPU system maps are protected with reader/writer locks. The reader lock, get_online_cpus(), assures that the maps are not updated while holding the lock. The writer lock, cpu_hotplug_begin(), is used to udpate the cpu maps along with cpu_maps_update_begin(). However, the ACPI processor handler updates the cpu maps without holding the the writer lock. acpi_map_lsapic() is called from acpi_processor_hotadd_init() to update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic() is called from acpi_processor_remove() to update cpu_possible_mask. Currently, they are either unprotected or protected with the reader lock, which is not correct. For example, the get_online_cpus() below is supposed to assure that cpu_possible_mask is not changed while the code is iterating with for_each_possible_cpu(). get_online_cpus(); for_each_possible_cpu(cpu) { : } put_online_cpus(); However, this lock has no protection with CPU hotplug since the ACPI processor handler does not use the writer lock when it updates cpu_possible_mask. The reader lock does not serialize within the readers. This patch protects them with the writer lock with cpu_hotplug_begin() along with cpu_maps_update_begin(), which must be held before calling cpu_hotplug_begin(). It also protects arch_register_cpu() / arch_unregister_cpu(), which creates / deletes a sysfs cpu device interface. For this purpose it changes cpu_hotplug_begin() and cpu_hotplug_done() to global and exports them in cpu.h. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 15 7月, 2013 1 次提交
-
-
由 Paul Gortmaker 提交于
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the uses of the __cpuinit macros from C files in the core kernel directories (kernel, init, lib, mm, and include) that don't really have a specific maintainer. [1] https://lkml.org/lkml/2013/5/20/589Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 13 6月, 2013 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
There are instances in the kernel where we would like to disable CPU hotplug (from sysfs) during some important operation. Today the freezer code depends on this and the code to do it was kinda tailor-made for that. Restructure the code and make it generic enough to be useful for other usecases too. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NRobin Holt <holt@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russ Anderson <rja@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 2月, 2013 1 次提交
-
-
由 Thomas Gleixner 提交于
Use the smpboot thread infrastructure. Mark the stopper thread selfparking and park it after it has finished the take_cpu_down() work. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Richard Weinberger <rw@linutronix.de> Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 28 1月, 2013 1 次提交
-
-
由 Frederic Weisbecker 提交于
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 15 11月, 2012 2 次提交
-
-
由 Yasuaki Ishimatsu 提交于
Even if acpi_processor_handle_eject() offlines cpu, there is a chance to online the cpu after that. So the patch closes the window by using get/put_online_cpus(). Why does the patch change _cpu_up() logic? The patch cares the race of hot-remove cpu and _cpu_up(). If the patch does not change it, there is the following race. hot-remove cpu | _cpu_up() ------------------------------------- ------------------------------------ call acpi_processor_handle_eject() | call cpu_down() | call get_online_cpus() | | call cpu_hotplug_begin() and stop here call arch_unregister_cpu() | call acpi_unmap_lsapic() | call put_online_cpus() | | start and continue _cpu_up() return acpi_processor_remove() | continue hot-remove the cpu | So _cpu_up() can continue to itself. And hot-remove cpu can also continue itself. If the patch changes _cpu_up() logic, the race disappears as below: hot-remove cpu | _cpu_up() ----------------------------------------------------------------------- call acpi_processor_handle_eject() | call cpu_down() | call get_online_cpus() | | call cpu_hotplug_begin() and stop here call arch_unregister_cpu() | call acpi_unmap_lsapic() | cpu's cpu_present is set | to false by set_cpu_present()| call put_online_cpus() | | start _cpu_up() | check cpu_present() and return -EINVAL return acpi_processor_remove() | continue hot-remove the cpu | Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Fenghua Yu 提交于
cpu_hotplug_pm_callback should have higher priority than bsp_pm_callback which depends on cpu_hotplug_pm_callback to disable cpu hotplug to avoid race during bsp online checking. This is to hightlight the priorities between the two callbacks in case people may overlook the order. Ideally the priorities should be defined in macro/enum instead of fixed values. To do that, a seperate patchset may be pushed which will touch serveral other generic files and is out of scope of this patchset. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1352835171-3958-7-git-send-email-fenghua.yu@intel.comReviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 09 10月, 2012 1 次提交
-
-
由 Srivatsa S. Bhat 提交于
The synchronization between CPU hotplug readers and writers is achieved by means of refcounting, safeguarded by the cpu_hotplug.lock. get_online_cpus() increments the refcount, whereas put_online_cpus() decrements it. If we ever hit an imbalance between the two, we end up compromising the guarantees of the hotplug synchronization i.e, for example, an extra call to put_online_cpus() can end up allowing a hotplug reader to execute concurrently with a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases where the refcount can go negative, and also attempt to fix it up, so that we can continue to run. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 8月, 2012 1 次提交
-
-
由 Rusty Russell 提交于
We still patch SMP instructions to UP variants if we boot with a single CPU, but not at any other time. In particular, not if we unplug CPUs to return to a single cpu. Paul McKenney points out: mean offline overhead is 6251/48=130.2 milliseconds. If I remove the alternatives_smp_switch() from the offline path [...] the mean offline overhead is 550/42=13.1 milliseconds Basically, we're never going to get those 120ms back, and the code is pretty messy. We get rid of: 1) The "smp-alt-once" boot option. It's actually "smp-alt-boot", the documentation is wrong. It's now the default. 2) The skip_smp_alternatives flag used by suspend. 3) arch_disable_nonboot_cpus_begin() and arch_disable_nonboot_cpus_end() which were only used to set this one flag. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Paul McKenney <paul.mckenney@us.ibm.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/87vcgwwive.fsf@rustcorp.com.auSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 8月, 2012 1 次提交
-
-
由 Thomas Gleixner 提交于
Provide a generic interface for setting up and tearing down percpu threads. On registration the threads for already online cpus are created and started. On deregistration (modules) the threads are stoppped. During hotplug operations the threads are created, started, parked and unparked. The datastructure for registration provides a pointer to percpu storage space and optional setup, cleanup, park, unpark functions. These functions are called when the thread state changes. Each implementation has to provide a function which is queried and returns whether the thread should run and the thread function itself. The core code handles all state transitions and avoids duplicated code in the call sites. [ paulmck: Preemption leak fix ] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20120716103948.352501068@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 01 8月, 2012 1 次提交
-
-
由 Jiang Liu 提交于
When hotadd_new_pgdat() is called to create new pgdat for a new node, a fallback zonelist should be created for the new node. There's code to try to achieve that in hotadd_new_pgdat() as below: /* * The node we allocated has no zone fallback lists. For avoiding * to access not-initialized zonelist, build here. */ mutex_lock(&zonelists_mutex); build_all_zonelists(pgdat, NULL); mutex_unlock(&zonelists_mutex); But it doesn't work as expected. When hotadd_new_pgdat() is called, the new node is still in offline state because node_set_online(nid) hasn't been called yet. And build_all_zonelists() only builds zonelists for online nodes as: for_each_online_node(nid) { pg_data_t *pgdat = NODE_DATA(nid); build_zonelists(pgdat); build_zonelist_cache(pgdat); } Though we hope to create zonelist for the new pgdat, but it doesn't. So add a new parameter "pgdat" the build_all_zonelists() to build pgdat for the new pgdat too. Signed-off-by: NJiang Liu <liuj97@gmail.com> Signed-off-by: NXishi Qiu <qiuxishi@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Keping Chen <chenkeping@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 6月, 2012 2 次提交
-
-
由 Anton Vorontsov 提交于
Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check: the function is only suitable for offlined CPUs, and if called inappropriately, the kernel should scream aloud. [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols] Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Vorontsov 提交于
Many architectures clear tasks' mm_cpumask like this: read_lock(&tasklist_lock); for_each_process(p) { if (p->mm) cpumask_clear_cpu(cpu, mm_cpumask(p->mm)); } read_unlock(&tasklist_lock); Depending on the context, the code above may have several problems, such as: 1. Working with task->mm w/o getting mm or grabing the task lock is dangerous as ->mm might disappear (exit_mm() assigns NULL under task_lock(), so tasklist lock is not enough). 2. Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm. This patch implements a small helper function that does things correctly, i.e.: 1. We take the task's lock while whe handle its mm (we can't use get_task_mm()/mmput() pair as mmput() might sleep); 2. To catch exited main thread case, we use find_lock_task_mm(), which walks up all threads and returns an appropriate task (with task lock held). Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in the new helper, instead we take the rcu read lock. We can do this because the function is called after the cpu is taken down and marked offline, so no new tasks will get this cpu set in their mm mask. Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 5月, 2012 1 次提交
-
-
由 Suresh Siddha 提交于
percpu areas are already allocated during boot for each possible cpu. percpu idle threads can be considered as an extension of the percpu areas, and allocate them for each possible cpu during boot. This will eliminate the need for workqueue based idle thread allocation. In future we can move the idle thread area into the percpu area too. [ tglx: Moved the loop into smpboot.c and added an error check when the init code failed to allocate an idle thread for a cpu which should be onlined ] Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: venki@google.com Link: http://lkml.kernel.org/r/1334966930.28674.245.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 26 4月, 2012 1 次提交
-
-
由 Thomas Gleixner 提交于
All SMP architectures have magic to fork the idle task and to store it for reusage when cpu hotplug is enabled. Provide a generic infrastructure for it. Create/reinit the idle thread for the cpu which is brought up in the generic code and hand the thread pointer to the architecture code via __cpu_up(). Note, that fork_idle() is called via a workqueue, because this guarantees that the idle thread does not get a reference to a user space VM. This can happen when the boot process did not bring up all possible cpus and a later cpu_up() is initiated via the sysfs interface. In that case fork_idle() would be called in the context of the user space task and take a reference on the user space VM. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Cc: x86@kernel.org Acked-by: NVenkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de
-