- 20 3月, 2015 1 次提交
-
-
由 Paul E. McKenney 提交于
As noted earlier, the following sequence of events can occur when running PREEMPT_RCU and HOTPLUG_CPU on a system with a multi-level rcu_node combining tree: 1. A group of tasks block on CPUs corresponding to a given leaf rcu_node structure while within RCU read-side critical sections. 2. All CPUs corrsponding to that rcu_node structure go offline. 3. The next grace period starts, but because there are still tasks blocked, the upper-level bits corresponding to this leaf rcu_node structure remain set. 4. All the tasks exit their RCU read-side critical sections and remove themselves from the leaf rcu_node structure's list, leaving it empty. 5. But because there now is code to check for this condition at force-quiescent-state time, the upper bits are cleared and the grace period completes. However, there is another complication that can occur following step 4 above: 4a. The grace period starts, and the leaf rcu_node structure's gp_tasks pointer is set to NULL because there are no tasks blocked on this structure. 4b. One of the CPUs corresponding to the leaf rcu_node structure comes back online. 4b. An endless stream of tasks are preempted within RCU read-side critical sections on this CPU, such that the ->blkd_tasks list is always non-empty. The grace period will never end. This commit therefore makes the force-quiescent-state processing check only for absence of tasks blocking the current grace period rather than absence of tasks altogether. This will cause a quiescent state to be reported if the current leaf rcu_node structure is not blocking the current grace period and its parent thinks that it is, regardless of how RCU managed to get itself into this state. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # 4.0.x Tested-by: NSasha Levin <sasha.levin@oracle.com>
-
- 13 3月, 2015 7 次提交
-
-
由 Paul E. McKenney 提交于
At grace-period initialization time, RCU checks that all quiescent states were really reported for the previous grace period. Now that grace-period cleanup has been split out of grace-period initialization, this commit also performs those checks at grace-period cleanup time. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
This commit informs RCU of an outgoing CPU just before that CPU invokes arch_cpu_idle_dead() during its last pass through the idle loop (via a new CPU_DYING_IDLE notifier value). This change means that RCU need not deal with outgoing CPUs passing through the scheduler after informing RCU that they are no longer online. Note that removing the CPU from the rcu_node ->qsmaskinit bit masks is done at CPU_DYING_IDLE time, and orphaning callbacks is still done at CPU_DEAD time, the reason being that at CPU_DEAD time we have another CPU that can adopt them. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
This commit uses a per-CPU variable to make the CPU-offline code path through the idle loop more precise, so that the outgoing CPU is guaranteed to make it into the idle loop before it is powered off. This commit is in preparation for putting the RCU offline-handling code on this code path, which will eliminate the magic one-jiffy wait that RCU uses as the maximum time for an outgoing CPU to get all the way through the scheduler. The magic one-jiffy wait for incoming CPUs remains a separate issue. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Because that RCU grace-period initialization need no longer exclude CPU-hotplug operations, this commit eliminates the ->onoff_mutex and its uses. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Races between CPU hotplug and grace periods can be difficult to resolve, so the ->onoff_mutex is used to exclude the two events. Unfortunately, this means that it is impossible for an outgoing CPU to perform the last bits of its offlining from its last pass through the idle loop, because sleeplocks cannot be acquired in that context. This commit avoids these problems by buffering online and offline events in a new ->qsmaskinitnext field in the leaf rcu_node structures. When a grace period starts, the events accumulated in this mask are applied to the ->qsmaskinit field, and, if needed, up the rcu_node tree. The special case of all CPUs corresponding to a given leaf rcu_node structure being offline while there are still elements in that structure's ->blkd_tasks list is handled using a new ->wait_blkd_tasks field. In this case, propagating the offline bits up the tree is deferred until the beginning of the grace period after all of the tasks have exited their RCU read-side critical sections and removed themselves from the list, at which point the ->wait_blkd_tasks flag is cleared. If one of that leaf rcu_node structure's CPUs comes back online before the list empties, then the ->wait_blkd_tasks flag is simply cleared. This of course means that RCU's notion of which CPUs are offline can be out of date. This is OK because RCU need only wait on CPUs that were online at the time that the grace period started. In addition, RCU's force-quiescent-state actions will handle the case where a CPU goes offline after the grace period starts. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
The rcu_report_unblock_qs_rnp() function is invoked when the last task blocking the current grace period exits its outermost RCU read-side critical section. Previously, this was called only from rcu_read_unlock_special(), and was therefore defined only when CONFIG_RCU_PREEMPT=y. However, this function will be invoked even when CONFIG_RCU_PREEMPT=n once CPU-hotplug operations are processed only at the beginnings of RCU grace periods. The reason for this change is that the last task on a given leaf rcu_node structure's ->blkd_tasks list might well exit its RCU read-side critical section between the time that recent CPU-hotplug operations were applied and when the new grace period was initialized. This situation could result in RCU waiting forever on that leaf rcu_node structure, because if all that structure's CPUs were already offline, there would be no quiescent-state events to drive that structure's part of the grace period. This commit therefore moves rcu_report_unblock_qs_rnp() to common code that is built unconditionally so that the quiescent-state-forcing code can clean up after this situation, avoiding the grace-period stall. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Currently, the rcu_node tree ->expmask bitmasks are initially set to reflect the online CPUs. This is pointless, because only the CPUs preempted within RCU read-side critical sections by the preceding synchronize_sched_expedited() need to be tracked. This commit therefore instead sets up these bitmasks based on the state of the ->blkd_tasks lists. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 12 3月, 2015 8 次提交
-
-
由 Paul E. McKenney 提交于
Offline CPUs cannot safely invoke trace events, but such CPUs do execute within rcu_cpu_notify(). Therefore, this commit removes the trace events from rcu_cpu_notify(). These trace events are for utilization, against which rcu_cpu_notify() execution time should be negligible. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Grace-period initialization normally proceeds quite quickly, so that it is very difficult to reproduce races against grace-period initialization. This commit therefore allows grace-period initialization to be artificially slowed down, increasing race-reproduction probability. A pair of new Kconfig parameters are provided, CONFIG_RCU_TORTURE_TEST_SLOW_INIT to enable the slowdowns, and CONFIG_RCU_TORTURE_TEST_SLOW_INIT_DELAY to specify the number of jiffies of slowdown to apply. A boot-time parameter named rcutree.gp_init_delay allows boot-time delay to be specified. By default, no delay will be applied even if CONFIG_RCU_TORTURE_TEST_SLOW_INIT is set. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
If all CPUs have passed through quiescent states, then stalls might be due to starvation of the grace-period kthread or to failure to propagate the quiescent states up the rcu_node combining tree. The current stall warning messages do not differentiate, so this commit adds a printout of the root rcu_node structure's ->qsmask field. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
This commit eliminates a boolean and associated "if" statement by rearranging the code. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
Currently, both rcu_cleanup_dead_cpu() and rcu_send_cbs_to_orphanage() initialize the outgoing CPU's callback list. However, only rcu_cleanup_dead_cpu() invokes rcu_send_cbs_to_orphanage(), and it does so unconditionally, which means that only one of these initializations is required. This commit therefore consolidates the callback-list initialization with the rest of the callback handling in rcu_send_cbs_to_orphanage(). Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> -
由 Paul E. McKenney 提交于
RCU ignores offlined CPUs, so they cannot safely run RCU read-side code. (They -can- use SRCU, but not RCU.) This means that any use of RCU during or after the call to arch_cpu_idle_dead(). Unfortunately, commit 2ed53c0d added a complete() call, which will contain RCU read-side critical sections if there is a task waiting to be awakened. Which, as it turns out, there almost never is. In my qemu/KVM testing, the to-be-awakened task is not yet asleep more than 99.5% of the time. In current mainline, failure is even harder to reproduce, requiring a virtualized environment that delays the outgoing CPU by at least three jiffies between the time it exits its stop_machine() task at CPU_DYING time and the time it calls arch_cpu_idle_dead() from the idle loop. However, this problem really can occur, especially in virtualized environments, and therefore really does need to be fixed This suggests moving back to the polling loop, but using a much shorter wait, with gentle exponential backoff instead of the old 100-millisecond wait. Most of the time, the loop will exit without waiting at all, and almost all of the remaining uses will wait only five microseconds. If the outgoing CPU is preempted, a loop will wait one jiffy, then increase the wait by a factor of 11/10ths, rounding up. As before, there is a five-second timeout. This commit therefore provides common-code infrastructure to do the dying-to-surviving CPU handoff in a safe manner. This code also provides an indication at CPU-online of whether the CPU to be onlined previously timed out on offline. The new cpu_check_up_prepare() function returns -EBUSY if this CPU previously took more than five seconds to go offline, or -EAGAIN if it has not yet managed to go offline. The rationale for -EAGAIN is that it might still be preempted, so an additional wait might well find it correctly offlined. Architecture-specific code can decide how to handle these conditions. Systems in which CPUs take themselves completely offline might respond to an -EBUSY return as if it was a zero (success) return. Systems in which the surviving CPU must take some action might take it at this time, or might simply mark the other CPU as unusable. Note that architectures that take the easy way out and simply pass the -EBUSY and -EAGAIN upwards will change the sysfs API. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> [ paulmck: Fixed state machine for architectures that don't check earlier CPU-hotplug results as suggested by James Hogan. ]
-
- 20 2月, 2015 8 次提交
-
-
由 Colin Cross 提交于
On non-developer devices, kgdb prevents the device from rebooting after a panic. Incase of panics and exceptions, to allow the device to reboot, prevent entering debug mode to avoid getting stuck waiting for the user to interact with debugger. To avoid entering the debugger on panic/exception without any extra configuration, panic_timeout is being used which can be set via /proc/sys/kernel/panic at run time and CONFIG_PANIC_TIMEOUT sets the default value. Setting panic_timeout indicates that the user requested machine to perform unattended reboot after panic. We dont want to get stuck waiting for the user input incase of panic. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kgdb-bugreport@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Cc: Android Kernel Team <kernel-team@android.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: NColin Cross <ccross@android.com> [Kiran: Added context to commit message. panic_timeout is used instead of break_on_panic and break_on_exception to honor CONFIG_PANIC_TIMEOUT Modified the commit as per community feedback] Signed-off-by: NKiran Raparthy <kiran.kumar@linaro.org> Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Daniel Thompson 提交于
All current callers of kdb_getstr() can pass constant pointers via the prompt argument. This patch adds a const qualification to make explicit the fact that this is safe. Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Daniel Thompson 提交于
Currently kdb allows the output of comamnds to be filtered using the | grep feature. This is useful but does not permit the output emitted shortly after a string match to be examined without wading through the entire unfiltered output of the command. Such a feature is particularly useful to navigate function traces because these traces often have a useful trigger string *before* the point of interest. This patch reuses the existing filtering logic to introduce a simple forward search to kdb that can be triggered from the more prompt. Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Daniel Thompson 提交于
Currently when the "| grep" feature is used to filter the output of a command then the prompt is not displayed for the subsequent command. Likewise any characters typed by the user are also not echoed to the display. This rather disconcerting problem eventually corrects itself when the user presses Enter and the kdb_grepping_flag is cleared as kdb_parse() tries to make sense of whatever they typed. This patch resolves the problem by moving the clearing of this flag from the middle of command processing to the beginning. Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Daniel Thompson 提交于
Issuing a stack dump feels ergonomically wrong when entering due to NMI. Entering due to NMI is normally a reaction to a user request, either the NMI button on a server or a "magic knock" on a UART. Therefore the backtrace behaviour on entry due to NMI should be like SysRq-g (no stack dump) rather than like oops. Note also that the stack dump does not offer any information that cannot be trivial retrieved using the 'bt' command. Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Daniel Thompson 提交于
Currently when kdb traps printk messages then the raw log level prefix (consisting of '\001' followed by a numeral) does not get stripped off before the message is issued to the various I/O handlers supported by kdb. This causes annoying visual noise as well as causing problems grepping for ^. It is also a change of behaviour compared to normal usage of printk() usage. For example <SysRq>-h ends up with different output to that of kdb's "sr h". This patch addresses the problem by stripping log levels from messages before they are issued to the I/O handlers. printk() which can also act as an i/o handler in some cases is special cased; if the caller provided a log level then the prefix will be preserved when sent to printk(). The addition of non-printable characters to the output of kdb commands is a regression, albeit and extremely elderly one, introduced by commit 04d2c8c8 ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"). Note also that this patch does *not* restore the original behaviour from v3.5. Instead it makes printk() from within a kdb command display the message without any prefix (i.e. like printk() normally does). Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: stable@vger.kernel.org Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
There was a follow on replacement patch against the prior "kgdb: Timeout if secondary CPUs ignore the roundup". See: https://lkml.org/lkml/2015/1/7/442 This patch is the delta vs the patch that was committed upstream: * Fix an off-by-one error in kdb_cpu(). * Replace NR_CPUS with CONFIG_NR_CPUS to tell checkpatch that we really want a static limit. * Removed the "KGDB: " prefix from the pr_crit() in debug_core.c (kgdb-next contains a patch which introduced pr_fmt() to this file to the tag will now be applied automatically). Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jay Lan 提交于
The output of KDB 'summary' command should report MemTotal, MemFree and Buffers output in kB. Current codes report in unit of pages. A define of K(x) as is defined in the code, but not used. This patch would apply the define to convert the values to kB. Please include me on Cc on replies. I do not subscribe to linux-kernel. Signed-off-by: NJay Lan <jlan@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
- 18 2月, 2015 16 次提交
-
-
由 Peter Zijlstra 提交于
Setting the root group's cpu.rt_runtime_us to 0 is a bad thing; it would disallow the kernel creating RT tasks. One can of course still set it to 1, which will (likely) still wreck your kernel, but at least make it clear that setting it to 0 is not good. Collect both sanity checks into the one place while we're there. Suggested-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150209112715.GO24151@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Because task_group() uses a cache of autogroup_task_group(), whose output depends on sched_class, switching classes can generate problems. In particular, when started as fair, the cache points to the autogroup, so when switching to RT the tg_rt_schedulable() test fails for every cpu.rt_{runtime,period}_us change because now the autogroup has tasks and no runtime. Furthermore, going back to the previous semantics of varying task_group() with sched_class has the down-side that the sched_debug output varies as well, even though the task really is in the autogroup. Therefore add an autogroup exception to tg_has_rt_tasks() -- such that both (all) task_group() usages in sched/core now have one. And remove all the remnants of the variable task_group() output. Reported-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Stefan Bader <stefan.bader@canonical.com> Fixes: 8323f26c ("sched: Fix race in task_group()") Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org> -
由 Kirill Tkhai 提交于
update_curr_dl() needs actual rq clock. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1423040972.18770.10.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 John Stultz 提交于
Additional validation of adjtimex freq values to avoid potential multiplication overflows were added in commit 5e5aeb43 (time: adjtimex: Validate the ADJ_FREQUENCY values) Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more effected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger then the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: NJosh Boyer <jwboyer@fedoraproject.org> Reported-by: NGeorge Joseph <george.joseph@fairview5.com> Tested-by: NGeorge Joseph <george.joseph@fairview5.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v3.19+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org [ Prettified the changelog and the comments a bit. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 NeilBrown 提交于
io_schedule() calls blk_flush_plug() which, depending on the contents of current->plug, can initiate arbitrary blk-io requests. Note that this contrasts with blk_schedule_flush_plug() which requires all non-trivial work to be handed off to a separate thread. This makes it possible for io_schedule() to recurse, and initiating block requests could possibly call mempool_alloc() which, in times of memory pressure, uses io_schedule(). Apart from any stack usage issues, io_schedule() will not behave correctly when called recursively as delayacct_blkio_start() does not allow for repeated calls. So: - use ->in_iowait to detect recursion. Set it earlier, and restore it to the old value. - move the call to "raw_rq" after the call to blk_flush_plug(). As this is some sort of per-cpu thing, we want some chance that we are on the right CPU - When io_schedule() is called recurively, use blk_schedule_flush_plug() which cannot further recurse. - as this makes io_schedule() a lot more complex and as io_schedule() must match io_schedule_timeout(), but all the changes in io_schedule_timeout() and make io_schedule a simple wrapper for that. Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> [ Moved the now rudimentary io_schedule() into sched.h. ] Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Tony Battersby <tonyb@cybernetics.com> Link: http://lkml.kernel.org/r/20150213162600.059fffb2@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Commit de30ec47 "Remove unnecessary ->wait.lock serialization when reading completion state" was not correct, without lock/unlock the code like stop_machine_from_inactive_cpu() while (!completion_done()) cpu_relax(); can return before complete() finishes its spin_unlock() which writes to this memory. And spin_unlock_wait(). While at it, change try_wait_for_completion() to use READ_ONCE(). Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: NDavidlohr Bueso <dave@stgolabs.net> Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> [ Added a comment with the barrier. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: raghavendra.kt@linux.vnet.ibm.com Cc: waiman.long@hp.com Fixes: de30ec47 ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state") Link: http://lkml.kernel.org/r/20150212195913.GA30430@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
Since the function graph tracer needs to disable preemption, it might call preempt_schedule() after reenabling it if something triggered the need for rescheduling in between. Therefore we can't trace preempt_schedule() itself because we would face a function tracing recursion otherwise as the tracer is always called before PREEMPT_ACTIVE gets set to prevent that recursion. This is why preempt_schedule() is tagged as "notrace". But the same issue applies to every function called by preempt_schedule() before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is one such example. Unfortunately we forgot to tag it as notrace as well and as a result we are encountering tracing recursion since it got introduced by: a18b5d01 ("sched: Fix missing preemption opportunity") Let's fix that by applying the appropriate function tag to preempt_schedule_common(). Reported-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1424110807-15057-1-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
A deadline task may be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; Later the timer fires, but the task is still dequeued: dl_task_timer() enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */ Someone wakes it up: try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) Patch fixes this problem, it prevents queueing !on_rq tasks on dl_rq. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> [ Wrote comment. ] Cc: Juri Lelli <juri.lelli@arm.com> Fixes: 1019a359 ("sched/deadline: Fix stale yield state") Link: http://lkml.kernel.org/r/1374601424090314@web4j.yandex.ruSigned-off-by: NIngo Molnar <mingo@kernel.org> -
由 Peter Zijlstra 提交于
Kirill reported that a dl task can be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; This invalidates the assumption from commit 0f397f2c ("sched/dl: Fix race in dl_task_timer()"): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". And therefore we have to use the full task_rq_lock() here. This further amends the fact that we forgot to update the rq lock loop for TASK_ON_RQ_MIGRATE, from commit cca26e80 ("sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING state"). Reported-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Link: http://lkml.kernel.org/r/20150217123139.GN5029@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org> -
由 Peter Zijlstra 提交于
There was a wee bit of confusion around the exact ordering here; clarify things. Reported-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20150217121258.GM5029@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
With task_blocks_on_rt_mutex() returning early -EDEADLK we never add the waiter to the waitqueue. Later, we try to remove it via remove_waiter() and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL pointer. ( Tested on v3.18-RT where rtmutex is used for regular mutex and I tried to get one twice in a row. ) Not sure when this started but I guess 397335f0 ("rtmutex: Fix deadlock detector for real") or commit 3d5c9340 ("rtmutex: Handle deadlock detection smarter"). Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # for v3.16 and later kernels Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kees Cook 提交于
The value resulting from the SECCOMP_RET_DATA mask could exceed MAX_ERRNO when setting errno during a SECCOMP_RET_ERRNO filter action. This makes sure we have a reliable value being set, so that an invalid errno will not be ignored by userspace. Signed-off-by: NKees Cook <keescook@chromium.org> Reported-by: NDmitry V. Levin <ldv@altlinux.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kiszka 提交于
This provides a reliable breakpoint target, required for automatic symbol loading via the gdb helper command 'lx-symbols'. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Geoff Levand 提交于
Simplify the code around one of the conditionals in the kexec_load syscall routine. The original code was confusing with a redundant check on KEXEC_ON_CRASH and comments outside of the conditional block. This change switches the order of the conditional check, and cleans up the comments for the conditional. There is no functional change to the code. Signed-off-by: NGeoff Levand <geoff@infradead.org> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Maximilian Attems <max@stro.at> Cc: Michal Marek <mmarek@suse.cz> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baoquan He 提交于
struct kimage has a member destination which is used to store the real destination address of each page when load segment from user space buffer to kernel. But we never retrieve the value stored in kimage->destination, so this member variable in kimage and its assignment operation are redundent code. I guess for_each_kimage_entry just does the work that kimage->destination is expected to do. So in this patch just make a cleanup to remove it. Signed-off-by: NBaoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-