- 28 10月, 2014 7 次提交
-
-
由 Wanpeng Li 提交于
Use nr_cpus_allowed to bail from select_task_rq() when only one cpu can be used, and saves some cycles for pinned tasks. Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413253360-5318-2-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
There is no need to do balance during fork since SCHED_DEADLINE tasks can't fork. This patch avoid the SD_BALANCE_FORK check. Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1413253360-5318-1-git-send-email-wanpeng.li@linux.intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
Exclusive cpusets are the only way users can restrict SCHED_DEADLINE tasks affinity (performing what is commonly called clustered scheduling). Unfortunately, such thing is currently broken for two reasons: - No check is performed when the user tries to attach a task to an exlusive cpuset (recall that exclusive cpusets have an associated maximum allowed bandwidth). - Bandwidths of source and destination cpusets are not correctly updated after a task is migrated between them. This patch fixes both things at once, as they are opposite faces of the same coin. The check is performed in cpuset_can_attach(), as there aren't any points of failure after that function. The updated is split in two halves. We first reserve bandwidth in the destination cpuset, after we pass the check in cpuset_can_attach(). And we then release bandwidth from the source cpuset when the task's affinity is actually changed. Even if there can be time windows when sched_setattr() may erroneously fail in the source cpuset, we are fine with it, as we can't perfom an atomic update of both cpusets at once. Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Reported-by: NVincent Legout <vincent@legout.info> Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Fabio Checconi <fchecconi@gmail.com> Cc: michael@amarulasolutions.com Cc: luca.abeni@unitn.it Cc: Li Zefan <lizefan@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: cgroups@vger.kernel.org Link: http://lkml.kernel.org/r/1411118561-26323-3-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
As Kirill mentioned (https://lkml.org/lkml/2013/1/29/118): | If rq has already had 2 or more pushable tasks and we try to add a | pinned task then call of push_rt_task will just waste a time. Just switched pinned task is not able to be pushed. If the rq has had several dl tasks before they have already been considered as candidates to be pushed (or pulled). This patch implements the same behavior as rt class which introduced by commit 10447917 ("sched/rt: Do not try to push tasks if pinned task switches to RT"). Suggested-by: NKirill V Tkhai <tkhai@yandex.ru> Acked-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413938203-224610-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
1) switched_to_dl() check is wrong. We reschedule only if rq->curr is deadline task, and we do not reschedule if it's a lower priority task. But we must always preempt a task of other classes. 2) dl_task_timer(): Policy does not change in case of priority inheritance. rt_mutex_setprio() changes prio, while policy remains old. So we lose some balancing logic in dl_task_timer() and switched_to_dl() when we check policy instead of priority. Boosted task may be rq->curr. (I didn't change switched_from_dl() because no check is necessary there at all). I've looked at this place(switched_to_dl) several times and even fixed this function, but found just now... I suppose some performance tests may work better after this. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1413909356.19914.128.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
dl_task_timer() is racy against several paths. Daniel noticed that the replenishment timer may experience a race condition against an enqueue_dl_entity() called from rt_mutex_setprio(). With his own words: rt_mutex_setprio() resets p->dl.dl_throttled. So the pattern is: start_dl_timer() throttled = 1, rt_mutex_setprio() throlled = 0, sched_switch() -> enqueue_task(), dl_task_timer-> enqueue_task() throttled is 0 => BUG_ON(on_dl_rq(dl_se)) fires as the scheduling entity is already enqueued on the -deadline runqueue. As we do for the other races, we just bail out in the replenishment timer code. Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: vincent@legout.info Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Fabio Checconi <fchecconi@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414142198-18552-5-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
In the deboost path, right after the dl_boosted flag has been reset, we can currently end up replenishing using -deadline parameters of a !SCHED_DEADLINE entity. This of course causes a bug, as those parameters are empty. In the case depicted above it is safe to simply bail out, as the deboosted task is going to be back to its original scheduling class anyway. Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: vincent@legout.info Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Fabio Checconi <fchecconi@gmail.com> Link: http://lkml.kernel.org/r/1414142198-18552-4-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 9月, 2014 2 次提交
-
-
由 Juri Lelli 提交于
Users can perform clustered scheduling using the cpuset facility. After an exclusive cpuset is created, task migrations happen only between CPUs belonging to the same cpuset. Inter- cpuset migrations can only happen when the user requires so, moving a task between different cpusets. This behaviour is broken in SCHED_DEADLINE, as currently spurious inter- cpuset migration may happen without user intervention. This patch fix the problem (and shuffles the code a bit to improve clarity). Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: raistlin@linux.it Cc: michael@amarulasolutions.com Cc: fchecconi@gmail.com Cc: daniel.wagner@bmw-carit.de Cc: vincent@legout.info Cc: luca.abeni@unitn.it Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1411118561-26323-4-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
When a task is using SCHED_DEADLINE and the user setschedules it to a different class its sched_dl_entity static parameters are not cleaned up. This causes a bug if the user sets it back to SCHED_DEADLINE with the same parameters again. The problem resides in the check we perform at the very beginning of dl_overflow(): if (new_bw == p->dl.dl_bw) return 0; This condition is met in the case depicted above, so the function returns and dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this, admission control is broken. This patch fixes the thing, properly clearing static parameters for a task that ceases to use SCHED_DEADLINE. Reported-by: NDaniele Alessandrelli <daniele.alessandrelli@gmail.com> Reported-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Reported-by: NVincent Legout <vincent@legout.info> Tested-by: NLuca Abeni <luca.abeni@unitn.it> Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Tested-by: NVincent Legout <vincent@legout.info> Signed-off-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Fabio Checconi <fchecconi@gmail.com> Cc: Dario Faggioli <raistlin@linux.it> Cc: Michael Trimarchi <michael@amarulasolutions.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1411118561-26323-2-git-send-email-juri.lelli@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 9月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
1) Nobody calls pick_dl_task() with negative cpu, it's old RT leftover. 2) If p->nr_cpus_allowed is 1, than the affinity has just been changed in set_cpus_allowed_ptr(); we'll pick it just earlier than migration thread. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1410529340.3569.27.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 9月, 2014 1 次提交
-
-
由 xiaofeng.yan 提交于
An overrun could happen in function start_hrtick_dl() when a task with SCHED_DEADLINE runs in the microseconds range. For example, if a task with SCHED_DEADLINE has the following parameters: Task runtime deadline period P1 200us 500us 500us The deadline and period from task P1 are less than 1ms. In order to achieve microsecond precision, we need to enable HRTICK feature by the next command: PC#echo "HRTICK" > /sys/kernel/debug/sched_features PC#trace-cmd record -e sched_switch & PC#./schedtool -E -t 200000:500000:500000 -e ./test The binary test is in an endless while(1) loop here. Some pieces of trace.dat are as follows: <idle>-0 157.603157: sched_switch: :R ==> 2481:4294967295: test test-2481 157.603203: sched_switch: 2481:R ==> 0:120: swapper/2 <idle>-0 157.605657: sched_switch: :R ==> 2481:4294967295: test test-2481 157.608183: sched_switch: 2481:R ==> 2483:120: trace-cmd trace-cmd-2483 157.609656: sched_switch:2483:R==>2481:4294967295: test We can get the runtime of P1 from the information above: runtime = 157.608183 - 157.605657 runtime = 0.002526(2.526ms) The correct runtime should be less than or equal to 200us at some point. The problem is caused by a conditional judgment "delta > 10000" in function start_hrtick_dl(). Because no hrtimer start up to control the rest of runtime when the reset of runtime is less than 10us. So the process will continue to run until tick-period is coming. Move the code with the limit of the least time slice from hrtick_start_fair() to hrtick_start() because the EDF schedule class also needs this function in start_hrtick_dl(). To fix this problem, we call hrtimer_start() unconditionally in start_hrtick_dl(), and make sure the scheduling slice won't be smaller than 10us in hrtimer_start(). Signed-off-by: NXiaofeng Yan <xiaofeng.yan@huawei.com> Reviewed-by: NLi Zefan <lizefan@huawei.com> Acked-by: NJuri Lelli <juri.lelli@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409022941-5880-1-git-send-email-xiaofeng.yan@huawei.com [ Massaged the changelog and the code. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 8月, 2014 1 次提交
-
-
由 Christoph Lameter 提交于
__get_cpu_var can paper over differences in the definitions of cpumask_var_t and either use the address of the cpumask variable directly or perform a fetch of the address of the struct cpumask allocated elsewhere. This is important particularly when using per cpu cpumask_var_t declarations because in one case we have an offset into a per cpu area to handle and in the other case we need to fetch a pointer from the offset. This patch introduces a new macro this_cpu_cpumask_var_ptr() that is defined where cpumask_var_t is defined and performs the proper actions. All use cases where __get_cpu_var is used with cpumask_var_t are converted to the use of this_cpu_cpumask_var_ptr(). Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 20 8月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
Implement task_on_rq_queued() and use it everywhere instead of on_rq check. No functional changes. The only exception is we do not use the wrapper in check_for_tasks(), because it requires to export task_on_rq_queued() in global header files. Next patch in series would return it back, so we do not twist it from here to there. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 7月, 2014 2 次提交
-
-
由 xiaofeng.yan 提交于
Signed-off-by: Nxiaofeng.yan <xiaofeng.yan@huawei.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1404712744-16986-1-git-send-email-xiaofeng.yan@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
We always use resched_task() with rq->curr argument. It's not possible to reschedule any task but rq's current. The patch introduces resched_curr(struct rq *) to replace all of the repeating patterns. The main aim is cleanup, but there is a little size profit too: (before) $ size kernel/sched/built-in.o text data bss dec hex filename 155274 16445 7042 178761 2ba49 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411490 1178376 991232 9581098 92322a vmlinux (after) $ size kernel/sched/built-in.o text data bss dec hex filename 155130 16445 7042 178617 2b9b9 kernel/sched/built-in.o $ size vmlinux text data bss dec hex filename 7411362 1178376 991232 9580970 9231aa vmlinux I was choosing between resched_curr() and resched_rq(), and the first name looks better for me. A little lie in Documentation/trace/ftrace.txt. I have not actually collected the tracing again. With a hope the patch won't make execution times much worse :) Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140628200219.1778.18735.stgit@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 6月, 2014 4 次提交
-
-
由 Paul Gortmaker 提交于
There was a prototype for it added to kernel/sched/sched.h at the same time the extern was added, so the extern in the C file was never really ever needed. See commit 332ac17e ("sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks") for details. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Link: http://lkml.kernel.org/r/1400013605-18717-1-git-send-email-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
Throttled task is still on rq, and it may be moved to other cpu if user is playing with sched_setaffinity(). Therefore, unlocked task_rq() access makes the race. Juri Lelli reports he got this race when dl_bandwidth_enabled() was not set. Other thing, pointed by Peter Zijlstra: "Now I suppose the problem can still actually happen when you change the root domain and trigger a effective affinity change that way". To fix that we do the same as made in __task_rq_lock(). We do not use __task_rq_lock() itself, because it has a useful lockdep check, which is not correct in case of dl_task_timer(). We do not need pi_lock locked here. This case is an exception (PeterZ): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v3.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3056991400578422@web14g.yandex.ruSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 John Stultz 提交于
Two of the three prink_deferred uses are really printk_once style uses, so add a printk_deferred_once macro to simplify those call sites. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 John Stultz 提交于
After learning we'll need some sort of deferred printk functionality in the timekeeping core, Peter suggested we rename the printk_sched function so it can be reused by needed subsystems. This only changes the function name. No logic changes. Signed-off-by: NJohn Stultz <john.stultz@linaro.org> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 5月, 2014 2 次提交
-
-
由 xiaofeng.yan 提交于
sched/rt: Fix 'struct sched_dl_entity' and dl_task_time() comments, to match the current upstream code Signed-off-by: Nxiaofeng.yan <xiaofeng.yan@huawei.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1399605687-18094-1-git-send-email-xiaofeng.yan@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
Sometimes ->nr_running may cross 2 but interrupt is not being sent to rq's cpu. In this case we don't reenable the timer. Looks like this may be the reason for rare unexpected effects, if nohz is enabled. Patch replaces all places of direct changing of nr_running and makes add_nr_running() caring about crossing border. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Acked-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140508225830.2469.97461.stgit@localhostSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 5月, 2014 1 次提交
-
-
由 Juri Lelli 提交于
yield_task_dl() is broken: o it forces current to be throttled setting its runtime to zero; o it sets current's dl_se->dl_new to one, expecting that dl_task_timer() will queue it back with proper parameters at replenish time. Unfortunately, dl_task_timer() has this check at the very beginning: if (!dl_task(p) || dl_se->dl_new) goto unlock; So, it just bails out and the task is never replenished. It actually yielded forever. To fix this, introduce a new flag indicating that the task properly yielded the CPU before its current runtime expired. While this is a little overdoing at the moment, the flag would be useful in the future to discriminate between "good" jobs (of which remaining runtime could be reclaimed, i.e. recycled) and "bad" jobs (for which dl_throttled task has been set) that needed to be stopped. Reported-by: Nyjay.kim <yjay.kim@lge.com> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140429103953.e68eba1b2ac3309214e3dc5a@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 4月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
We need to do it like we do for the other higher priority classes.. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/336561397137116@web27h.yandex.ruSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 3月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
The problems: 1) We check for rt_nr_running before call of put_prev_task(). If previous task is RT, its rt_rq may become throttled and dequeued after this call. In case of p is from rt->rq this just causes picking a task from throttled queue, but in case of its rt_rq is child we are guaranteed catch BUG_ON. 2) The same with deadline class. The only difference we operate on only dl_rq. This patch fixes all the above problems and it adds a small skip in the DL update like we've already done for RT class: if (unlikely((s64)delta_exec <= 0)) return; This will optimize sequential update_curr_dl() calls a little. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Link: http://lkml.kernel.org/r/1393946746.3643.3.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 2月, 2014 2 次提交
-
-
由 Juri Lelli 提交于
Kirill Tkhai noted: Since deadline tasks share rt bandwidth, we must care about bandwidth timer set. Otherwise rt_time may grow up to infinity in update_curr_dl(), if there are no other available RT tasks on top level bandwidth. RT task were in fact throttled right after they got enqueued, and never executed again (rt_time never again went below rt_runtime). Peter then proposed to accrue DL execution on rt_time only when rt timer is active, and proposed a patch (this patch is a slight modification of that) to implement that behavior. While this solves Kirill problem, it has a drawback. Indeed, Kirill noted again: It looks we may get into a situation, when all CPU time is shared between RT and DL tasks: rt_runtime = n rt_period = 2n | RT working, DL sleeping | DL working, RT sleeping | ----------------------------------------------------------- | (1) duration = n | (2) duration = n | (repeat) |--------------------------|------------------------------| | (rt_bw timer is running) | (rt_bw timer is not running) | No time for fair tasks at all. While this can happen during the first period, if rq is always backlogged, RT tasks won't have the opportunity to execute anymore: rt_time reached rt_runtime during (1), suppose after (2) RT is enqueued back, it gets throttled since rt timer didn't fire, replenishment is from now on eaten up by DL tasks that accrue their execution on rt_time (while rt timer is active - we have an RT task waiting for replenishment). FAIR tasks are not touched after this first period. Ok, this is not ideal, and the situation is even worse! What above (the nice case), practically never happens in reality, where your rt timer is not aligned to tasks periods, tasks are in general not periodic, etc.. Long story short, you always risk to overload your system. This patch is based on Peter's idea, but exploits an additional fact: if you don't have RT tasks enqueued, it makes little sense to continue incrementing rt_time once you reached the upper limit (DL tasks have their own mechanism for throttling). This cures both problems: - no matter how many DL instances in the past, you'll have an rt_time slightly above rt_runtime when an RT task is enqueued, and from that point on (after the first replenishment), the task will normally execute; - you can still eat up all bandwidth during the first period, but not anymore after that, remember that DL execution will increment rt_time till the upper limit is reached. The situation is still not perfect! But, we have a simple solution for now, that limits how much you can jeopardize your system, as we keep working towards the right answer: RT groups scheduled using deadline servers. Reported-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140225151515.617714e2f2cd6c558531ba61@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
In deadline class we do not have group scheduling. So, let's remove unnecessary X = X; equations. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Link: http://lkml.kernel.org/r/1393343543.4089.5.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 2月, 2014 4 次提交
-
-
由 Peter Zijlstra 提交于
Remove a few gratuitous #ifdefs in pick_next_task*(). Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-nnzddp5c4fijyzzxxrwlxghf@git.kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
Dan Carpenter reported: > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338) > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005) Kirill also spotted that migrate_tasks() will have an instant NULL deref because pick_next_task() will immediately deref prev. Instead of fixing all the corner cases because migrate_tasks() can pass in a NULL prev task in the unlikely case of hot-un-plug, provide a fake task such that we can remove all the NULL checks from the far more common paths. A further problem; not previously spotted; is that because we pushed pre_schedule() and idle_balance() into pick_next_task() we now need to avoid those getting called and pulling more tasks on our dying CPU. We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1. We also note that since we call pick_next_task() exactly the amount of times we have runnable tasks present, we should never land in idle_balance(). Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()") Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reported-by: NKirill Tkhai <tkhai@yandex.ru> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Kirill Tkhai 提交于
In deadline class we do not have group scheduling like in RT. dl_nr_total is the same as dl_nr_running. So, one of them should be removed. Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/368631392675853@web20h.yandex.ruSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Juri Lelli 提交于
Rostedt writes: My test suite was locking up hard when enabling mmiotracer. This was due to the mmiotracer placing all but one CPU offline. I found this out when I was able to reproduce the bug with just my stress-cpu-hotplug test. This bug baffled me because it would not always trigger, and would only trigger on the first run after boot up. The stress-cpu-hotplug test would crash hard the first run, or never crash at all. But a new reboot may cause it to crash on the first run again. I spent all week bisecting this, as I couldn't find a consistent reproducer. I finally narrowed it down to the sched deadline patches, and even more peculiar, to the commit that added the sched deadline boot up self test to the latency tracer. Then it dawned on me to what the bug was. All it took was to run a task under sched deadline to screw up the CPU hot plugging. This explained why it would lock up only on the first run of the stress-cpu-hotplug test. The bug happened when the boot up self test of the schedule latency tracer would test a deadline task. The deadline task would corrupt something that would cause CPU hotplug to fail. If it didn't corrupt it, the stress test would always work (there's no other sched deadline tasks that would run to cause problems). If it did corrupt on boot up, the first test would lockup hard. I proved this theory by running my deadline test program on another box, and then run the stress-cpu-hotplug test, and it would now consistently lock up. I could run stress-cpu-hotplug over and over with no problem, but once I ran the deadline test, the next run of the stress-cpu-hotplug would lock hard. After adding lots of tracing to the code, I found the cause. The function tracer showed that migrate_tasks() was stuck in an infinite loop, where rq->nr_running never equaled 1 to break out of it. When I added a trace_printk() to see what that number was, it was 335 and never decrementing! Looking at the deadline code I found: static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) { dequeue_dl_entity(&p->dl); dequeue_pushable_dl_task(rq, p); } static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags) { update_curr_dl(rq); __dequeue_task_dl(rq, p, flags); dec_nr_running(rq); } And this: if (dl_runtime_exceeded(rq, dl_se)) { __dequeue_task_dl(rq, curr, 0); if (likely(start_dl_timer(dl_se, curr->dl.dl_boosted))) dl_se->dl_throttled = 1; else enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH); if (!is_leftmost(curr, &rq->dl)) resched_task(curr); } Notice how we call __dequeue_task_dl() and in the else case we call enqueue_task_dl()? Also notice that dequeue_task_dl() has underscores where enqueue_task_dl() does not. The enqueue_task_dl() calls inc_nr_running(rq), but __dequeue_task_dl() does not. This is where we get nr_running out of sync. [snip] Another point where nr_running can get out of sync is when the dl_timer fires: dl_se->dl_throttled = 0; if (p->on_rq) { enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); if (task_has_dl_policy(rq->curr)) check_preempt_curr_dl(rq, p, 0); else resched_task(rq->curr); This patch does two things: - correctly accounts for throttled tasks (that are now considered !running); - fixes the bug, updating nr_running from {inc,dec}_dl_tasks(), since we risk to update it twice in some situations (e.g., a task is dequeued while it has exceeded its budget). Cc: mingo@redhat.com Cc: torvalds@linux-foundation.org Cc: akpm@linux-foundation.org Reported-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392884379-13744-1-git-send-email-juri.lelli@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 11 2月, 2014 1 次提交
-
-
由 Peter Zijlstra 提交于
This patch both merged idle_balance() and pre_schedule() and pushes both of them into pick_next_task(). Conceptually pre_schedule() and idle_balance() are rather similar, both are used to pull more work onto the current CPU. We cannot however first move idle_balance() into pre_schedule_fair() since there is no guarantee the last runnable task is a fair task, and thus we would miss newidle balances. Similarly, the dl and rt pre_schedule calls must be ran before idle_balance() since their respective tasks have higher priority and it would not do to delay their execution searching for less important tasks first. However, by noticing that pick_next_tasks() already traverses the sched_class hierarchy in the right order, we can get the right behaviour and do away with both calls. We must however change the special case optimization to also require that prev is of sched_class_fair, otherwise we can miss doing a dl or rt pull where we needed one. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 2月, 2014 1 次提交
-
-
由 Peter Zijlstra 提交于
In order to avoid having to do put/set on a whole cgroup hierarchy when we context switch, push the put into pick_next_task() so that both operations are in the same function. Further changes then allow us to possibly optimize away redundant work. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptopSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 2月, 2014 1 次提交
-
-
由 Kirill Tkhai 提交于
When p is current and it's not of dl class, then there are no other dl taks in the rq. If we had had pushable tasks in some other rq, they would have been pushed earlier. So, skip "p == rq->curr" case. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Acked-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140128072421.32315.25300.stgit@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 1月, 2014 1 次提交
-
-
由 Dario Faggioli 提交于
Add in Documentation/scheduler/ some hints about the design choices, the usage and the future possible developments of the sched_dl scheduling class and of the SCHED_DEADLINE policy. Reviewed-by: NHenrik Austad <henrik@austad.us> Signed-off-by: NDario Faggioli <raistlin@linux.it> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> [ Re-wrote sections 2 and 3. ] Signed-off-by: NLuca Abeni <luca.abeni@unitn.it> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1390821615-23247-1-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 1月, 2014 1 次提交
-
-
由 Juri Lelli 提交于
Dan Carpenter reported new 'Smatch' warnings: > tree: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/core > head: 130816ce > commit: 1baca4ce [17/50] sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic > > kernel/sched/deadline.c:937 pick_next_task_dl() warn: variable dereferenced before check 'p' (see line 934) BUG_ON() already fires if pick_next_dl_entity() doesn't return a valid dl_se. No need to check if p is valid afterward. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Fixes: 1baca4ce ("sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic") Link: http://lkml.kernel.org/r/52D54E25.6060100@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 1月, 2014 5 次提交
-
-
由 Peter Zijlstra 提交于
Remove the deadline specific sysctls for now. The problem with them is that the interaction with the exisiting rt knobs is nearly impossible to get right. The current (as per before this patch) situation is that the rt and dl bandwidth is completely separate and we enforce rt+dl < 100%. This is undesirable because this means that the rt default of 95% leaves us hardly any room, even though dl tasks are saver than rt tasks. Another proposed solution was (a discarted patch) to have the dl bandwidth be a fraction of the rt bandwidth. This is highly confusing imo. Furthermore neither proposal is consistent with the situation we actually want; which is rt tasks ran from a dl server. In which case the rt bandwidth is a direct subset of dl. So whichever way we go, the introduction of dl controls at this point is painful. Therefore remove them and instead share the rt budget. This means that for now the rt knobs are used for dl admission control and the dl runtime is accounted against the rt runtime. I realise that this isn't entirely desirable either; but whatever we do we appear to need to change the interface later, so better have a small interface for now. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
Data from tests confirmed that the original active load balancing logic didn't scale neither in the number of CPU nor in the number of tasks (as sched_rt does). Here we provide a global data structure to keep track of deadlines of the running tasks in the system. The structure is composed by a bitmask showing the free CPUs and a max-heap, needed when the system is heavily loaded. The implementation and concurrent access scheme are kept simple by design. However, our measurements show that we can compete with sched_rt on large multi-CPUs machines [1]. Only the push path is addressed, the extension to use this structure also for pull decisions is straightforward. However, we are currently evaluating different (in order to decrease/avoid contention) data structures to solve possibly both problems. We are also going to re-run tests considering recent changes inside cpupri [2]. [1] http://retis.sssup.it/~jlelli/papers/Ospert11Lelli.pdf [2] http://www.spinics.net/lists/linux-rt-users/msg06778.htmlSigned-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-14-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dario Faggioli 提交于
In order of deadline scheduling to be effective and useful, it is important that some method of having the allocation of the available CPU bandwidth to tasks and task groups under control. This is usually called "admission control" and if it is not performed at all, no guarantee can be given on the actual scheduling of the -deadline tasks. Since when RT-throttling has been introduced each task group have a bandwidth associated to itself, calculated as a certain amount of runtime over a period. Moreover, to make it possible to manipulate such bandwidth, readable/writable controls have been added to both procfs (for system wide settings) and cgroupfs (for per-group settings). Therefore, the same interface is being used for controlling the bandwidth distrubution to -deadline tasks and task groups, i.e., new controls but with similar names, equivalent meaning and with the same usage paradigm are added. However, more discussion is needed in order to figure out how we want to manage SCHED_DEADLINE bandwidth at the task group level. Therefore, this patch adds a less sophisticated, but actually very sensible, mechanism to ensure that a certain utilization cap is not overcome per each root_domain (the single rq for !SMP configurations). Another main difference between deadline bandwidth management and RT-throttling is that -deadline tasks have bandwidth on their own (while -rt ones doesn't!), and thus we don't need an higher level throttling mechanism to enforce the desired bandwidth. This patch, therefore: - adds system wide deadline bandwidth management by means of: * /proc/sys/kernel/sched_dl_runtime_us, * /proc/sys/kernel/sched_dl_period_us, that determine (i.e., runtime / period) the total bandwidth available on each CPU of each root_domain for -deadline tasks; - couples the RT and deadline bandwidth management, i.e., enforces that the sum of how much bandwidth is being devoted to -rt -deadline tasks to stay below 100%. This means that, for a root_domain comprising M CPUs, -deadline tasks can be created until the sum of their bandwidths stay below: M * (sched_dl_runtime_us / sched_dl_period_us) It is also possible to disable this bandwidth management logic, and be thus free of oversubscribing the system up to any arbitrary level. Signed-off-by: NDario Faggioli <raistlin@linux.it> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dario Faggioli 提交于
Some method to deal with rt-mutexes and make sched_dl interact with the current PI-coded is needed, raising all but trivial issues, that needs (according to us) to be solved with some restructuring of the pi-code (i.e., going toward a proxy execution-ish implementation). This is under development, in the meanwhile, as a temporary solution, what this commits does is: - ensure a pi-lock owner with waiters is never throttled down. Instead, when it runs out of runtime, it immediately gets replenished and it's deadline is postponed; - the scheduling parameters (relative deadline and default runtime) used for that replenishments --during the whole period it holds the pi-lock-- are the ones of the waiting task with earliest deadline. Acting this way, we provide some kind of boosting to the lock-owner, still by using the existing (actually, slightly modified by the previous commit) pi-architecture. We would stress the fact that this is only a surely needed, all but clean solution to the problem. In the end it's only a way to re-start discussion within the community. So, as always, comments, ideas, rants, etc.. are welcome! :-) Signed-off-by: NDario Faggioli <raistlin@linux.it> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> [ Added !RT_MUTEXES build fix. ] Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Harald Gustafsson 提交于
Make it possible to specify a period (different or equal than deadline) for -deadline tasks. Relative deadlines (D_i) are used on task arrivals to generate new scheduling (absolute) deadlines as "d = t + D_i", and periods (P_i) to postpone the scheduling deadlines as "d = d + P_i" when the budget is zero. This is in general useful to model (and schedule) tasks that have slow activation rates (long periods), but have to be scheduled soon once activated (short deadlines). Signed-off-by: NHarald Gustafsson <harald.gustafsson@ericsson.com> Signed-off-by: NDario Faggioli <raistlin@linux.it> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-7-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-