- 10 9月, 2010 1 次提交
-
-
由 Suresh Siddha 提交于
Currently sched_avg_update() (which updates rt_avg stats in the rq) is getting called from scale_rt_power() (in the load balance context) which doesn't take rq->lock. Fix it by moving the sched_avg_update() to more appropriate update_cpu_load() where the CFS load gets updated as well. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 05 9月, 2010 1 次提交
-
-
由 Andi Kleen 提交于
No real bugs I believe, just some dead code. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: andi@firstfloor.org Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 8月, 2010 1 次提交
-
-
由 Peter Zijlstra 提交于
sched_fork() -- we do task placement in ->task_fork_fair() ensure we update_rq_clock() so we work with current time. We leave the vruntime in relative state, so the time delay until wake_up_new_task() doesn't matter. wake_up_new_task() -- Since task_fork_fair() left p->vruntime in relative state we can safely migrate, the activate_task() on the remote rq will call update_rq_clock() and causes the clock to be synced (enough). Tested-by: NJack Daniel <wanders.thirst@gmail.com> Tested-by: NPhilby John <pjohn@mvista.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1281002322.1923.1708.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 7月, 2010 2 次提交
-
-
由 Peter Zijlstra 提交于
Currently we update cpu_power() too often, update_group_power() only updates the local group's cpu_power but it gets called for all groups. Furthermore, CPU_NEWLY_IDLE invocations will result in all cpus calling it, even though a slow update of cpu_power is sufficient. Therefore move the update under 'idle != CPU_NEWLY_IDLE && local_group' to reduce superfluous invocations. Reported-by: NVenkatesh Pallipadi <venki@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1278612989.1900.176.camel@laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Suresh Siddha 提交于
Suresh spotted that we don't update the rq->clock in the nohz load-balancer path. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1278626014.2834.74.camel@sbs-t61.sc.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 29 6月, 2010 1 次提交
-
-
由 Michael Neuling 提交于
No logic changes, only spelling. Signed-off-by: NMichael Neuling <mikey@neuling.org> Cc: linuxppc-dev@ozlabs.org Cc: David Howells <dhowells@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <15249.1277776921@neuling.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 6月, 2010 1 次提交
-
-
由 Daniel J Blueman 提交于
The task_group() function returns a pointer that must be protected by either RCU, the ->alloc_lock, or the cgroup lock (see the rcu_dereference_check() in task_subsys_state(), which is invoked by task_group()). The wake_affine() function currently does none of these, which means that a concurrent update would be within its rights to free the structure returned by task_group(). Because wake_affine() uses this structure only to compute load-balancing heuristics, there is no reason to acquire either of the two locks. Therefore, this commit introduces an RCU read-side critical section that starts before the first call to task_group() and ends after the last use of the "tg" pointer returned from task_group(). Thanks to Li Zefan for pointing out the need to extend the RCU read-side critical section from that proposed by the original patch. Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 18 6月, 2010 2 次提交
-
-
由 Michael Neuling 提交于
Docbook fails in sched_fair.c due to comments added in the asymmetric packing patch series. This fixes these errors. No code changes. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <24737.1276135581@neuling.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Michael Neuling 提交于
The CPU power test is the wrong way around in fix_small_capacity. This was due to a small changes made in the posted patch on lkml to what was was taken upstream. This patch fixes asymmetric packing for POWER7. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <12629.1276124617@neuling.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 6月, 2010 4 次提交
-
-
由 Michael Neuling 提交于
Check to see if the group is packed in a sched doman. This is primarily intended to used at the sibling level. Some cores like POWER7 prefer to use lower numbered SMT threads. In the case of POWER7, it can move to lower SMT modes only when higher threads are idle. When in lower SMT modes, the threads will perform better since they share less core resources. Hence when we have idle threads, we want them to be the higher ones. This adds a hook into f_b_g() called check_asym_packing() to check the packing. This packing function is run on idle threads. It checks to see if the busiest CPU in this domain (core in the P7 case) has a higher CPU number than what where the packing function is being run on. If it is, calculate the imbalance and return the higher busier thread as the busiest group to f_b_g(). Here we are assuming a lower CPU number will be equivalent to a lower SMT thread number. It also creates a new SD_ASYM_PACKING flag to enable this feature at any scheduler domain level. It also creates an arch hook to enable this feature at the sibling level. The default function doesn't enable this feature. Based heavily on patch from Peter Zijlstra. Fixes from Srivatsa Vaddagiri. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20100608045702.2936CCC897@localhost.localdomain> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Srivatsa Vaddagiri 提交于
Handle cpu capacity being reported as 0 on cores with more number of hardware threads. For example on a Power7 core with 4 hardware threads, core power is 1177 and thus power of each hardware thread is 1177/4 = 294. This low power can lead to capacity for each hardware thread being calculated as 0, which leads to tasks bouncing within the core madly! Fix this by reporting capacity for hardware threads as 1, provided their power is not scaled down significantly because of frequency scaling or real-time tasks usage of cpu. Signed-off-by: NSrivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <20100608045702.21D03CC895@localhost.localdomain> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Venkatesh Pallipadi 提交于
In the new push model, all idle CPUs indeed go into nohz mode. There is still the concept of idle load balancer (performing the load balancing on behalf of all the idle cpu's in the system). Busy CPU kicks the nohz balancer when any of the nohz CPUs need idle load balancing. The kickee CPU does the idle load balancing on behalf of all idle CPUs instead of the normal idle balance. This addresses the below two problems with the current nohz ilb logic: * the idle load balancer continued to have periodic ticks during idle and wokeup frequently, even though it did not have any rebalancing to do on behalf of any of the idle CPUs. * On x86 and CPUs that have APIC timer stoppage on idle CPUs, this periodic wakeup can result in a periodic additional interrupt on a CPU doing the timer broadcast. Also currently we are migrating the unpinned timers from an idle to the cpu doing idle load balancing (when all the cpus in the system are idle, there is no idle load balancing cpu and timers get added to the same idle cpu where the request was made. So the existing optimization works only on semi idle system). And In semi idle system, we no longer have periodic ticks on the idle load balancer CPU. Using that cpu will add more delays to the timers than intended (as that cpu's timer base may not be uptodate wrt jiffies etc). This was causing mysterious slowdowns during boot etc. For now, in the semi idle case, use the nearest busy cpu for migrating timers from an idle cpu. This is good for power-savings anyway. Signed-off-by: NVenkatesh Pallipadi <venki@google.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1274486981.2840.46.camel@sbs-t61.sc.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Venkatesh Pallipadi 提交于
tickless idle has a negative side effect on update_cpu_load(), which in turn can affect load balancing behavior. update_cpu_load() is supposed to be called every tick, to keep track of various load indicies. With tickless idle, there are no scheduler ticks called on the idle CPUs. Idle CPUs may still do load balancing (with idle_load_balance CPU) using the stale cpu_load. It will also cause problems when all CPUs go idle for a while and become active again. In this case loads would not degrade as expected. This is how rq->nr_load_updates change looks like under different conditions: <cpu_num> <nr_load_updates change> All CPUS idle for 10 seconds (HZ=1000) 0 1621 10 496 11 139 12 875 13 1672 14 12 15 21 1 1472 2 2426 3 1161 4 2108 5 1525 6 701 7 249 8 766 9 1967 One CPU busy rest idle for 10 seconds 0 10003 10 601 11 95 12 966 13 1597 14 114 15 98 1 3457 2 93 3 6679 4 1425 5 1479 6 595 7 193 8 633 9 1687 All CPUs busy for 10 seconds 0 10026 10 10026 11 10026 12 10026 13 10025 14 10025 15 10025 1 10026 2 10026 3 10026 4 10026 5 10026 6 10026 7 10026 8 10026 9 10026 That is update_cpu_load works properly only when all CPUs are busy. If all are idle, all the CPUs get way lower updates. And when few CPUs are busy and rest are idle, only busy and ilb CPU does proper updates and rest of the idle CPUs will do lower updates. The patch keeps track of when a last update was done and fixes up the load avg based on current time. On one of my test system SPECjbb with warehouse 1..numcpus, patch improves throughput numbers by ~1% (average of 6 runs). On another test system (with different domain hierarchy) there is no noticable change in perf. Signed-off-by: NVenkatesh Pallipadi <venki@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <AANLkTilLtDWQsAUrIxJ6s04WTgmw9GuOODc5AOrYsaR5@mail.gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 6月, 2010 1 次提交
-
-
由 Peter Zijlstra 提交于
Mike reports that since e9e9250b (sched: Scale down cpu_power due to RT tasks), wake_affine() goes funny on RT tasks due to them still having a !0 weight and wake_affine() still subtracts that from the rq weight. Since nobody should be using se->weight for RT tasks, set the value to zero. Also, since we now use ->cpu_power to normalize rq weights to account for RT cpu usage, add that factor into the imbalance computation. Reported-by: NMike Galbraith <efault@gmx.de> Tested-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1275316109.27810.22969.camel@twins> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 5月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
Currently migration_thread is serving three purposes - migration pusher, context to execute active_load_balance() and forced context switcher for expedited RCU synchronize_sched. All three roles are hardcoded into migration_thread() and determining which job is scheduled is slightly messy. This patch kills migration_thread and replaces all three uses with cpu_stop. The three different roles of migration_thread() are splitted into three separate cpu_stop callbacks - migration_cpu_stop(), active_load_balance_cpu_stop() and synchronize_sched_expedited_cpu_stop() - and each use case now simply asks cpu_stop to execute the callback as necessary. synchronize_sched_expedited() was implemented with private preallocated resources and custom multi-cpu queueing and waiting logic, both of which are provided by cpu_stop. synchronize_sched_expedited_count is made atomic and all other shared resources along with the mutex are dropped. synchronize_sched_expedited() also implemented a check to detect cases where not all the callback got executed on their assigned cpus and fall back to synchronize_sched(). If called with cpu hotplug blocked, cpu_stop already guarantees that and the condition cannot happen; otherwise, stop_machine() would break. However, this patch preserves the paranoid check using a cpumask to record on which cpus the stopper ran so that it can serve as a bisection point if something actually goes wrong theree. Because the internal execution state is no longer visible, rcu_expedited_torture_stats() is removed. This patch also renames cpu_stop threads to from "stopper/%d" to "migration/%d". The names of these threads ultimately don't matter and there's no reason to make unnecessary userland visible changes. With this patch applied, stop_machine() and sched now share the same resources. stop_machine() is faster without wasting any resources and sched migration users are much cleaner. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Dimitri Sivanich <sivanich@sgi.com>
-
- 23 4月, 2010 2 次提交
-
-
由 Suresh Siddha 提交于
Issues in the current select_idle_sibling() logic in select_task_rq_fair() in the context of a task wake-up: a) Once we select the idle sibling, we use that domain (spanning the cpu that the task is currently woken-up and the idle sibling that we found) in our wake_affine() decisions. This domain is completely different from the domain(we are supposed to use) that spans the cpu that the task currently woken-up and the cpu where the task previously ran. b) We do select_idle_sibling() check only for the cpu that the task is currently woken-up on. If select_task_rq_fair() selects the previously run cpu for waking the task, doing a select_idle_sibling() check for that cpu also helps and we don't do this currently. c) In the scenarios where the cpu that the task is woken-up is busy but with its HT siblings are idle, we are selecting the task be woken-up on the idle HT sibling instead of a core that it previously ran and currently completely idle. i.e., we are not taking decisions based on wake_affine() but directly selecting an idle sibling that can cause an imbalance at the SMT/MC level which will be later corrected by the periodic load balancer. Fix this by first going through the load imbalance calculations using wake_affine() and once we make a decision of woken-up cpu vs previously-ran cpu, then choose a possible idle sibling for waking up the task on. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1270079265.7835.8.camel@sbs-t61.sc.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Dave reported that his large SPARC machines spend lots of time in hweight64(), try and optimize some of those needless cpumask_weight() invocations (esp. with the large offstack cpumasks these are very expensive indeed). Reported-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 4月, 2010 2 次提交
-
-
由 Peter Zijlstra 提交于
In order to reduce the dependency on TASK_WAKING rework the enqueue interface to support a proper flags field. Replace the int wakeup, bool head arguments with an int flags argument and create the following flags: ENQUEUE_WAKEUP - the enqueue is a wakeup of a sleeping task, ENQUEUE_WAKING - the enqueue has relative vruntime due to having sched_class::task_waking() called, ENQUEUE_HEAD - the waking task should be places on the head of the priority queue (where appropriate). For symmetry also convert sched_class::dequeue() to a flags scheme. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Oleg noticed a few races with the TASK_WAKING usage on fork. - since TASK_WAKING is basically a spinlock, it should be IRQ safe - since we set TASK_WAKING (*) without holding rq->lock it could be there still is a rq->lock holder, thereby not actually providing full serialization. (*) in fact we clear PF_STARTING, which in effect enables TASK_WAKING. Cure the second issue by not setting TASK_WAKING in sched_fork(), but only temporarily in wake_up_new_task() while calling select_task_rq(). Cure the first by holding rq->lock around the select_task_rq() call, this will disable IRQs, this however requires that we push down the rq->lock release into select_task_rq_fair()'s cgroup stuff. Because select_task_rq_fair() still needs to drop the rq->lock we cannot fully get rid of TASK_WAKING. Reported-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 3月, 2010 10 次提交
-
-
由 Mike Galbraith 提交于
Disabling affine wakeups is too horrible to contemplate. Remove the feature flag. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301890.6785.50.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
This features has been enabled for quite a while, after testing showed that easing preemption for light tasks was harmful to high priority threads. Remove the feature flag. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301675.6785.44.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
This feature never earned its keep, remove it. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301591.6785.42.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Our preemption model relies too heavily on sleeper fairness to disable it without dire consequences. Remove the feature, and save a branch or two. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301520.6785.40.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
This feature hasn't been enabled in a long time, remove effectively dead code. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301447.6785.38.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Don't bother with selection when the current cpu is idle. Recent load balancing changes also make it no longer necessary to check wake_affine() success before returning the selected sibling, so we now always use it. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301369.6785.36.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Allow LAST_BUDDY to kick in sooner, improving cache utilization as soon as a second buddy pair arrives on scene. The cost is latency starting to climb sooner, the tbenefit for tbench 8 on my Q6600 box is ~2%. No detrimental effects noted in normal idesktop usage. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301285.6785.34.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Now that we no longer depend on the clock being updated prior to enqueueing on migratory wakeup, we can clean up a bit, placing calls to update_rq_clock() exactly where they are needed, ie on enqueue, dequeue and schedule events. In the case of a freshly enqueued task immediately preempting, we can skip the update during preemption, as the clock was just updated by the enqueue event. We also save an unneeded call during a migratory wakeup by not updating the previous runqueue, where update_curr() won't be invoked. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301199.6785.32.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Both avg_overlap and avg_wakeup had an inherent problem in that their accuracy was detrimentally affected by cross-cpu wakeups, this because we are missing the necessary call to update_curr(). This can't be fixed without increasing overhead in our already too fat fastpath. Additionally, with recent load balancing changes making us prefer to place tasks in an idle cache domain (which is good for compute bound loads), communicating tasks suffer when a sync wakeup, which would enable affine placement, is turned into a non-sync wakeup by SYNC_LESS. With one task on the runqueue, wake_affine() rejects the affine wakeup request, leaving the unfortunate where placed, taking frequent cache misses. Remove it, and recover some fastpath cycles. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301121.6785.30.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Mike Galbraith 提交于
Testing the load which led to this heuristic (nfs4 kbuild) shows that it has outlived it's usefullness. With intervening load balancing changes, I cannot see any difference with/without, so recover there fastpath cycles. Signed-off-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268301062.6785.29.camel@marge.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 11 3月, 2010 1 次提交
-
-
由 Lucas De Marchi 提交于
Put all statistic fields of sched_entity in one struct, sched_statistics, and embed it into sched_entity. This change allows to memset the sched_statistics to 0 when needed (for instance when forking), avoiding bugs of non initialized fields. Signed-off-by: NLucas De Marchi <lucas.de.marchi@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1268275065-18542-1-git-send-email-lucas.de.marchi@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 3月, 2010 1 次提交
-
-
由 Paul E. McKenney 提交于
Make rcu_dereference() of runqueue data structures be rcu_dereference_sched(). Located-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <20100228163218.GD6846@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 2月, 2010 1 次提交
-
-
由 Suresh Siddha 提交于
On platforms like dual socket quad-core platform, the scheduler load balancer is not detecting the load imbalances in certain scenarios. This is leading to scenarios like where one socket is completely busy (with all the 4 cores running with 4 tasks) and leaving another socket completely idle. This causes performance issues as those 4 tasks share the memory controller, last-level cache bandwidth etc. Also we won't be taking advantage of turbo-mode as much as we would like, etc. Some of the comparisons in the scheduler load balancing code are comparing the "weighted cpu load that is scaled wrt sched_group's cpu_power" with the "weighted average load per task that is not scaled wrt sched_group's cpu_power". While this has probably been broken for a longer time (for multi socket numa nodes etc), the problem got aggrevated via this recent change: | | commit f93e65c1 | Author: Peter Zijlstra <a.p.zijlstra@chello.nl> | Date: Tue Sep 1 10:34:32 2009 +0200 | | sched: Restore __cpu_power to a straight sum of power | Also with this change, the sched group cpu power alone no longer reflects the group capacity that is needed to implement MC, MT performance (default) and power-savings (user-selectable) policies. We need to use the computed group capacity (sgs.group_capacity, that is computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to find out if the group with the max load is above its capacity and how much load to move etc. Reported-by: NMa Ling <ling.ma@intel.com> Initial-Analysis-by: NZhang, Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> [ -v2: build fix ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x] LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 1月, 2010 1 次提交
-
-
由 Thomas Gleixner 提交于
The ability of enqueueing a task to the head of a SCHED_FIFO priority list is required to fix some violations of POSIX scheduling policy. Extend the related functions with a "head" argument. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Tested-by: NCarsten Emde <cbe@osadl.org> Tested-by: NMathias Weber <mathias.weber.mw1@roche.com> LKML-Reference: <20100120171629.734886007@linutronix.de>
-
- 21 1月, 2010 7 次提交
-
-
由 Gautham R Shenoy 提交于
We want to update the sched_group_powers when balance_cpu == this_cpu. Currently the group powers are updated only if the balance_cpu is the first CPU in the local group. But balance_cpu = this_cpu could also be the first idle cpu in the group. Hence fix the place where the group powers are updated. Signed-off-by: NGautham R Shenoy <ego@in.ibm.com> Signed-off-by: NJoel Schopp <jschopp@austin.ibm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1264017764.5717.127.camel@jschopp-laptop> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Since all load_balance() callers will have !NULL balance parameters we can now assume so and remove a few checks. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
The two functions: load_balance{,_newidle}() are very similar, with the following differences: - rq->lock usage - sb->balance_interval updates - *balance check So remove the load_balance_newidle() call with load_balance(.idle = CPU_NEWLY_IDLE), explicitly unlock the rq->lock before calling (would be done by double_lock_balance() anyway), and ignore the other differences for now. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
load_balance() and load_balance_newidle() look remarkably similar, one key point they differ in is the condition on when to active balance. So split out that logic into a separate function. One side effect is that previously load_balance_newidle() used to fail and return -1 under these conditions, whereas now it doesn't. I've not yet fully figured out the whole -1 return case for either load_balance{,_newidle}(). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Since load-balancing can hold rq->locks for quite a long while, allow breaking out early when there is lock contention. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Move code around to get rid of fwd declarations. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Again, since we only iterate the fair class, remove the abstraction. Since this is the last user of the rq_iterator, remove all that too. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-