- 11 3月, 2014 3 次提交
-
-
由 Kirill Tkhai 提交于
Check for fair tasks number to decide, that we've pulled a task. rq's nr_running may contain throttled RT tasks. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
1) Single cpu machine case. When rq has only RT tasks, but no one of them can be picked because of throttling, we enter in endless loop. pick_next_task_{dl,rt} return NULL. In pick_next_task_fair() we permanently go to retry if (rq->nr_running != rq->cfs.h_nr_running) return RETRY_TASK; (rq->nr_running is not being decremented when rt_rq becomes throttled). No chances to unthrottle any rt_rq or to wake fair here, because of rq is locked permanently and interrupts are disabled. 2) In case of SMP this can cause a hang too. Although we unlock rq in idle_balance(), interrupts are still disabled. The solution is to check for available tasks in DL and RT classes instead of checking for sum. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
We close idle_exit_fair() bracket in case of we've pulled something or we've received task of high priority class. Signed-off-by: NKirill Tkhai <ktkhai@parallels.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/1394098315.19290.10.camel@tkhaiSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 2月, 2014 3 次提交
-
-
由 Peter Zijlstra 提交于
Michael spotted that the idle_balance() push down created a task priority problem. Previously, when we called idle_balance() before pick_next_task() it wasn't a problem when -- because of the rq->lock droppage -- an rt/dl task slipped in. Similarly for pre_schedule(), rt pre-schedule could have a dl task slip in. But by pulling it into the pick_next_task() loop, we'll not try a higher task priority again. Cure this by creating a re-start condition in pick_next_task(); and triggering this from pick_next_task_{rt,fair}(). It also fixes a live-lock where we get stuck in pick_next_task_fair() due to idle_balance() seeing !0 nr_running but there not actually being any fair tasks about. Reported-by: NMichael Wang <wangyun@linux.vnet.ibm.com> Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()") Tested-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140224121218.GR15586@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dietmar Eggemann 提交于
The struct sched_avg of struct rq is only used in case group scheduling is enabled inside __update_tg_runnable_avg() to update per-cpu representation of a task group. I.e. that there is no need to maintain the runnable avg of a rq in the !CONFIG_FAIR_GROUP_SCHED case. This patch guards struct sched_avg of struct rq and update_rq_runnable_avg() with CONFIG_FAIR_GROUP_SCHED. There is an extra empty definition for update_rq_runnable_avg() necessary for the !CONFIG_FAIR_GROUP_SCHED && CONFIG_SMP case. The function print_cfs_group_stats() which prints out struct sched_avg of struct rq is already guarded with CONFIG_FAIR_GROUP_SCHED. Reviewed-by: NBen Segall <bsegall@google.com> Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/530DCDC5.1060406@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 George McCollister 提交于
dequeue_entity() is called when p->on_rq and sets se->on_rq = 0 which appears to guarentee that the !se->on_rq condition is met. If the task has done set_current_state(TASK_INTERRUPTIBLE) without schedule() the second condition will be met and vruntime will be incorrectly adjusted twice. In certain cases this can result in the task's vruntime never increasing past the vruntime of other tasks on the CFS' run queue, starving them of CPU time. This patch changes switched_from_fair() to use !p->on_rq instead of !se->on_rq. I'm able to cause a task with a priority of 120 to starve all other tasks with the same priority on an ARM platform running 3.2.51-rt72 PREEMPT RT by writing one character at time to a serial tty (16550 UART) in a tight loop. I'm also able to verify making this change corrects the problem on that platform and kernel version. Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1392767811-28916-1-git-send-email-george.mccollister@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 2月, 2014 2 次提交
-
-
由 Mike Galbraith 提交于
The user explicitly disabled load balancing, else this core would not be disconnected. Don't add these to nohz.idle_cpus_mask. Signed-off-by: NMike Galbraith <mgalbraith@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Lei Wen <leiwen@marvell.com> Link: http://lkml.kernel.org/n/tip-vmme4f49psirp966pklm5l9j@git.kernel.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Morten Rasmussen 提交于
Brings select_task_rq_fair() description comments up-to-date. Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392732864-10927-1-git-send-email-morten.rasmussen@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 2月, 2014 4 次提交
-
-
由 Peter Zijlstra 提交于
Dan Carpenter reported: > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338) > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005) Kirill also spotted that migrate_tasks() will have an instant NULL deref because pick_next_task() will immediately deref prev. Instead of fixing all the corner cases because migrate_tasks() can pass in a NULL prev task in the unlikely case of hot-un-plug, provide a fake task such that we can remove all the NULL checks from the far more common paths. A further problem; not previously spotted; is that because we pushed pre_schedule() and idle_balance() into pick_next_task() we now need to avoid those getting called and pulling more tasks on our dying CPU. We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1. We also note that since we call pick_next_task() exactly the amount of times we have runnable tasks present, we should never land in idle_balance(). Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()") Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reported-by: NKirill Tkhai <tkhai@yandex.ru> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
Remove idle_balance() from the public life; also reduce some #ifdef clutter by folding the pick_next_task_fair() idle path into idle_balance(). Cc: mingo@kernel.org Reported-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140211151148.GP27965@twins.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Michael wang 提交于
Sasha reported: [ 522.645288] BUG: unable to handle kernel NULL pointer dereference at ... [ 522.646271] IP: [<ffffffff81186c6f>] check_preempt_wakeup+0x11f/0x210 ... [ 522.650021] Call Trace: [ 522.650021] <IRQ> [ 522.650021] [<ffffffff8117361d>] check_preempt_curr+0x3d/0xb0 [ 522.650021] [<ffffffff81175d88>] ttwu_do_wakeup+0x18/0x130 ... which was caused by the se-depth changed during the time when task is not FAIR, and we will use the wrong depth value after it switched back to FAIR. This patch reset the depth at the time when task switched to FAIR, make sure that we always have the correct value when task is FAIR. Cc: Ingo Molnar <mingo@kernel.org> Reported-by: NSasha Levin <sasha.levin@oracle.com> Tested-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NMichael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5305732D.70001@linux.vnet.ibm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Rik van Riel 提交于
Normally task_numa_work scans over a fairly small amount of memory, but it is possible to run into a large unpopulated part of virtual memory, with no pages mapped. In that case, task_numa_work can run for a while, and it may make sense to reschedule as required. Cc: akpm@linux-foundation.org Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NRik van Riel <riel@redhat.com> Reported-by: NXing Gang <gang.xing@hp.com> Tested-by: NChegu Vinod <chegu_vinod@hp.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392761566-24834-2-git-send-email-riel@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 11 2月, 2014 2 次提交
-
-
由 Dietmar Eggemann 提交于
Since is_same_group() is only used in the group scheduling code, there is no need to define it outside CONFIG_FAIR_GROUP_SCHED. Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391005773-29493-1-git-send-email-dietmar.eggemann@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
This patch both merged idle_balance() and pre_schedule() and pushes both of them into pick_next_task(). Conceptually pre_schedule() and idle_balance() are rather similar, both are used to pull more work onto the current CPU. We cannot however first move idle_balance() into pre_schedule_fair() since there is no guarantee the last runnable task is a fair task, and thus we would miss newidle balances. Similarly, the dl and rt pre_schedule calls must be ran before idle_balance() since their respective tasks have higher priority and it would not do to delay their execution searching for less important tasks first. However, by noticing that pick_next_tasks() already traverses the sched_class hierarchy in the right order, we can get the right behaviour and do away with both calls. We must however change the special case optimization to also require that prev is of sched_class_fair, otherwise we can miss doing a dl or rt pull where we needed one. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 2月, 2014 7 次提交
-
-
由 Peter Zijlstra 提交于
Since commit 2f36825b ("sched: Next buddy hint on sleep and preempt path") it is likely we pick a new task from the same cgroup, doing a put and then set on all intermediate entities is a waste of time, so try to avoid this. Measured using: mount nodev /cgroup -t cgroup -o cpu cd /cgroup mkdir a; cd a mkdir b; cd b mkdir c; cd c echo $$ > tasks perf stat --repeat 10 -- taskset 1 perf bench sched pipe PRE : 4.542422684 seconds time elapsed ( +- 0.33% ) POST: 4.389409991 seconds time elapsed ( +- 0.32% ) Which shows a significant improvement of ~3.5% Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptopSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Slightly easier code flow, no functional changes. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptopSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In order to avoid having to do put/set on a whole cgroup hierarchy when we context switch, push the put into pick_next_task() so that both operations are in the same function. Further changes then allow us to possibly optimize away redundant work. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptopSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Track depth in cgroup tree, this is useful for things like find_matching_se() where you need to get to a common parent of two sched entities. Keeping the depth avoids having to calculate it on the spot, which saves a number of possible cache-misses. Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1328936700.2476.17.camel@laptopSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
idle_balance() modifies the rq->idle_stamp field, making this information shared across core.c and fair.c. As we know if the cpu is going to idle or not with the previous patch, let's encapsulate the rq->idle_stamp information in core.c by moving it up to the caller. The idle_balance() function returns true in case a balancing occured and the cpu won't be idle, false if no balance happened and the cpu is going idle. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Cc: alex.shi@linaro.org Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389949444-14821-3-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The scheduler main function 'schedule()' checks if there are no more tasks on the runqueue. Then it checks if a task should be pulled in the current runqueue in idle_balance() assuming it will go to idle otherwise. But idle_balance() releases the rq->lock in order to look up the sched domains and takes the lock again right after. That opens a window where another cpu may put a task in our runqueue, so we won't go to idle but we have filled the idle_stamp, thinking we will. This patch closes the window by checking if the runqueue has been modified but without pulling a task after taking the lock again, so we won't go to idle right after in the __schedule() function. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Cc: alex.shi@linaro.org Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389949444-14821-2-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu parameter passed to idle_balance() is not needed as it could be retrieved from 'struct rq.' Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Cc: alex.shi@linaro.org Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389949444-14821-1-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 1月, 2014 9 次提交
-
-
由 Rik van Riel 提交于
Cleanup suggested by Mel Gorman. Now the code contains some more hints on what statistics go where. Suggested-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-10-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
We track both the node of the memory after a NUMA fault, and the node of the CPU on which the fault happened. Rename the local variables in task_numa_fault to make things more explicit. Suggested-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-9-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The current code in task_numa_placement calculates the difference between the old and the new value, but also temporarily stores half of the old value in the per-process variables. The NUMA balancing code looks at those per-process variables, and having other tasks temporarily see halved statistics could lead to unwanted numa migrations. This can be avoided by doing all the math in local variables. This change also simplifies the code a little. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-8-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Tracing the code that decides the active nodes has made it abundantly clear that the naive implementation of the faults_from code has issues. Specifically, the garbage collector in some workloads will access orders of magnitudes more memory than the threads that do all the active work. This resulted in the node with the garbage collector being marked the only active node in the group. This issue is avoided if we weigh the statistics by CPU use of each task in the numa group, instead of by how many faults each thread has occurred. To achieve this, we normalize the number of faults to the fraction of faults that occurred on each node, and then multiply that fraction by the fraction of CPU time the task has used since the last time task_numa_placement was invoked. This way the nodes in the active node mask will be the ones where the tasks from the numa group are most actively running, and the influence of eg. the garbage collector and other do-little threads is properly minimized. On a 4 node system, using CPU use statistics calculated over a longer interval results in about 1% fewer page migrations with two 32-warehouse specjbb runs on a 4 node system, and about 5% fewer page migrations, as well as 1% better throughput, with two 8-warehouse specjbb runs, as compared with the shorter term statistics kept by the scheduler. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-7-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Use the active_nodes nodemask to make smarter decisions on NUMA migrations. In order to maximize performance of workloads that do not fit in one NUMA node, we want to satisfy the following criteria: 1) keep private memory local to each thread 2) avoid excessive NUMA migration of pages 3) distribute shared memory across the active nodes, to maximize memory bandwidth available to the workload This patch accomplishes that by implementing the following policy for NUMA migrations: 1) always migrate on a private fault 2) never migrate to a node that is not in the set of active nodes for the numa_group 3) always migrate from a node outside of the set of active nodes, to a node that is in that set 4) within the set of active nodes in the numa_group, only migrate from a node with more NUMA page faults, to a node with fewer NUMA page faults, with a 25% margin to avoid ping-ponging This results in most pages of a workload ending up on the actively used nodes, with reduced ping-ponging of pages between those nodes. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-6-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The numa_faults_cpu statistics are used to maintain an active_nodes nodemask per numa_group. This allows us to be smarter about when to do numa migrations. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-5-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Track which nodes NUMA faults are triggered from, in other words the CPUs on which the NUMA faults happened. This uses a similar mechanism to what is used to track the memory involved in numa faults. The next patches use this to build up a bitmap of which nodes a workload is actively running on. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-4-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
In order to get a more consistent naming scheme, making it clear which fault statistics track memory locality, and which track CPU locality, rename the memory fault statistics. Suggested-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-3-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Excessive migration of pages can hurt the performance of workloads that span multiple NUMA nodes. However, it turns out that the p->numa_migrate_deferred knob is a really big hammer, which does reduce migration rates, but does not actually help performance. Now that the second stage of the automatic numa balancing code has stabilized, it is time to replace the simplistic migration deferral code with something smarter. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Link: http://lkml.kernel.org/r/1390860228-21539-2-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 1月, 2014 1 次提交
-
-
由 Vincent Guittot 提交于
This reverts commit 282cf499. With the current implementation, the load average statistics of a sched entity change according to other activity on the CPU even if this activity is done between the running window of the sched entity and have no influence on the running duration of the task. When a task wakes up on the same CPU, we currently update last_runnable_update with the return of __synchronize_entity_decay without updating the runnable_avg_sum and runnable_avg_period accordingly. In fact, we have to sync the load_contrib of the se with the rq's blocked_load_contrib before removing it from the latter (with __synchronize_entity_decay) but we must keep last_runnable_update unchanged for updating runnable_avg_sum/period during the next update_entity_load_avg. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Reviewed-by: NBen Segall <bsegall@google.com> Cc: pjt@google.com Cc: alex.shi@linaro.org Link: http://lkml.kernel.org/r/1390376734-6800-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 1月, 2014 1 次提交
-
-
由 Mel Gorman 提交于
This patch adds three tracepoints o trace_sched_move_numa when a task is moved to a node o trace_sched_swap_numa when a task is swapped with another task o trace_sched_stick_numa when a numa-related migration fails The tracepoints allow the NUMA scheduler activity to be monitored and the following high-level metrics can be calculated o NUMA migrated stuck nr trace_sched_stick_numa o NUMA migrated idle nr trace_sched_move_numa o NUMA migrated swapped nr trace_sched_swap_numa o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen) o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped) o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid Maybe a small number of these are acceptable but a high number would be a major surprise. It would be even worse if bounces are frequent. o NUMA avg task migs. Average number of migrations for tasks o NUMA stddev task mig Self-explanatory o NUMA max task migs. Maximum number of migrations for a single task In general the intent of the tracepoints is to help diagnose problems where automatic NUMA balancing appears to be doing an excessive amount of useless work. [akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style] Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 1月, 2014 8 次提交
-
-
由 Daniel Lezcano 提交于
The test on_null_domain is done twice in the trigger_load_balance function. Move the test at the begin of the function, so there is only one check. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-9-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu information is stored in the struct rq. Pass the struct rq to nohz_idle_balance, so all the functions called in run_rebalance_domains have the same parameters and the 'this_cpu' variable becomes pointless. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> [ Added !SMP build fix. ] Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-8-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu information is stored in the struct rq and the caller of the rebalance_domains function pass the cpu to retrieve the struct rq but it already has the struct rq info. Replace the cpu parameter with the struct rq. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-7-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu parameter is no longer needed in nohz_balancer_kick, let's remove the parameter. Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-6-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The 'call_cpu' is never used in the function. Remove it. Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-5-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The on_null_domain() function is getting the cpu to retrieve the struct rq associated with it. Pass 'struct rq' directly to the function as the caller already has the info. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-4-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu information is already stored in the struct rq, so no need to pass it as parameter to the nohz_kick_needed function. The caller of this function just called idle_cpu() before to fill the rq->idle_balance field. Use rq->cpu and rq->idle_balance. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-3-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Lezcano 提交于
The cpu information is already stored in the struct rq, so no need to pass it as parameter to the trigger_load_balance function. Cc: linaro-kernel@lists.linaro.org Cc: preeti.lkml@gmail.com Cc: mingo@redhat.com Cc: peterz@infradead.org Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1389008085-9069-2-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-