提交 · b0432d8f162c7d5d9537b4cb749d44076b76a783 · OpenHarmony / kernel_linux

11 4月, 2011 1 次提交

sched: Fix sched-domain avg_load calculation · b0432d8f

由 Ken Chen 提交于 4月 07, 2011

In function find_busiest_group(), the sched-domain avg_load isn't
calculated at all if there is a group imbalance within the domain. This
will cause erroneous imbalance calculation.

The reason is that calculate_imbalance() sees sds->avg_load = 0 and it
will dump entire sds->max_load into imbalance variable, which is used
later on to migrate entire load from busiest CPU to the puller CPU.

This has two really bad effect:

1. stampede of task migration, and they won't be able to break out
   of the bad state because of positive feedback loop: large load
   delta -> heavier load migration -> larger imbalance and the cycle
   goes on.

2. severe imbalance in CPU queue depth.  This causes really long
   scheduling latency blip which affects badly on application that
   has tight latency requirement.

The fix is to have kernel calculate domain avg_load in both cases. This
will ensure that imbalance calculation is always sensible and the target
is usually half way between busiest and puller CPU.
Signed-off-by: NKen Chen <kenchen@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/20110408002322.3A0D812217F@elm.corp.google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

b0432d8f

05 4月, 2011 1 次提交

sched: Clean up rebalance_domains() load-balance interval calculation · 49c022e6

由 Peter Zijlstra 提交于 4月 05, 2011

Instead of the possible multiple-evaluation of num_online_cpus()
in rebalance_domains() that Linus reported, avoid it altogether
in the normal case since it's implemented with a Hamming weight
function over a cpu bitmask which can be darn expensive for those
with big iron.

This also makes it cleaner, smaller and documents the code.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1301991265.2225.12.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

49c022e6

31 3月, 2011 2 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

sched: Fix rebalance interval calculation · 3436ae12

由 Sisir Koppaka 提交于 3月 26, 2011

The interval for checking scheduling domains if they are due to be
balanced currently depends on boot state NR_CPUS, which may not
accurately reflect the number of online CPUs at the time of check.

Thus replace NR_CPUS with num_online_cpus().

 (ed: Should only affect those who set NR_CPUS really high, such as 4096
      or so :-)
Signed-off-by: NSisir Koppaka <sisir.koppaka@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <AANLkTikqHWid2Q93F5U5Qw5snJH8C5PXoa7J6=6hYO94@mail.gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3436ae12

04 3月, 2011 2 次提交

sched: Resched proper CPU on yield_to() · 6d1cafd8

由 Venkatesh Pallipadi 提交于 3月 01, 2011

yield_to_task_fair() has code to resched the CPU of yielding task when the
intention is to resched the CPU of the task that is being yielded to.

Change here fixes the problem and also makes the resched conditional on
rq != p_rq.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299025701-22168-1-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6d1cafd8

sched: Allow SCHED_BATCH to preempt SCHED_IDLE tasks · a2f5c9ab

由 Darren Hart 提交于 2月 22, 2011

Perform the test for SCHED_IDLE before testing for SCHED_BATCH (and
ensure idle tasks don't preempt idle tasks) so the non-interactive,
but still important, SCHED_BATCH tasks will run in favor of the very
low priority SCHED_IDLE tasks.
Signed-off-by: NDarren Hart <dvhart@linux.intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
LKML-Reference: <1298408674-3130-2-git-send-email-dvhart@linux.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a2f5c9ab

23 2月, 2011 3 次提交

sched: Fix the group_imb logic · 866ab43e

由 Peter Zijlstra 提交于 2月 21, 2011

On a 2*6*2 machine something like:

 taskset -c 3-11 bash -c 'for ((i=0;i<9;i++)) do while :; do :; done & done'

_should_ result in 9 busy CPUs, each running 1 task.

However it didn't quite work reliably, most of the time one cpu of the
second socket (6-11) would be idle and one cpu of the first socket
(0-5) would have two tasks on it.

The group_imb logic is supposed to deal with this and detect when a
particular group is imbalanced (like in our case, 0-2 are idle but 3-5
will have 4 tasks on it).

The detection phase needed a bit of a tweak as it was too weak and
required more than 2 avg weight tasks difference between idle and busy
cpus in the group which won't trigger for our test-case. So cure that
to be one or more avg task weight difference between cpus.

Once the detection phase worked, it was then defeated by the f_b_g()
tests trying to avoid ping-pongs. In particular, this_load >= max_load
triggered because the pulling cpu (the (first) idle cpu in on the
second socket, say 6) would find this_load to be 5 and max_load to be
4 (there'd be 5 tasks running on our socket and only 4 on the other
socket).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikhil Rao <ncrao@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

866ab43e

sched: Clean up some f_b_g() comments · cc57aa8f

由 Peter Zijlstra 提交于 2月 21, 2011

The existing comment tends to grow state (as it already has), split it
up and place it near the actual tests.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikhil Rao <ncrao@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cc57aa8f

sched: Clean up remnants of sd_idle · c186fafe

由 Peter Zijlstra 提交于 2月 21, 2011

With the wholesale removal of the sd_idle SMT logic we can clean up
some more.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nikhil Rao <ncrao@google.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c186fafe

16 2月, 2011 1 次提交

sched: Wholesale removal of sd_idle logic · 46e49b38

由 Venkatesh Pallipadi 提交于 2月 14, 2011

sd_idle logic was introduced way back in 2005 (commit 5969fe06),
as an HT optimization.

As per the discussion in the thread here:

  lkml - sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1
  https://patchwork.kernel.org/patch/532501/

The capacity based logic in the load balancer right now handles this
in a much cleaner way, handling more than 2 SMT siblings etc, and sd_idle
does not seem to bring any additional benefits. sd_idle logic also has
some bugs that has performance impact. Here is the patch that removes
the sd_idle logic altogether.

Also, there was a dependency of sched_mc_power_savings == 2, with sd_idle
logic.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Acked-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1297723130-693-1-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

46e49b38

03 2月, 2011 4 次提交

sched: Add yield_to(task, preempt) functionality · d95f4122

由 Mike Galbraith 提交于 2月 01, 2011

Currently only implemented for fair class tasks.

Add a yield_to_task method() to the fair scheduling class. allowing the
caller of yield_to() to accelerate another thread in it's thread group,
task group.

Implemented via a scheduler hint, using cfs_rq->next to encourage the
target being selected.  We can rely on pick_next_entity to keep things
fair, so noone can accelerate a thread that has already used its fair
share of CPU time.

This also means callers should only call yield_to when they really
mean it.  Calling it too often can result in the scheduler just
ignoring the hint.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d95f4122

sched: Use a buddy to implement yield_task_fair() · ac53db59

由 Rik van Riel 提交于 2月 01, 2011

Use the buddy mechanism to implement yield_task_fair.  This
allows us to skip onto the next highest priority se at every
level in the CFS tree, unless doing so would introduce gross
unfairness in CPU time distribution.

We order the buddy selection in pick_next_entity to check
yield first, then last, then next.  We need next to be able
to override yield, because it is possible for the "next" and
"yield" task to be different processen in the same sub-tree
of the CFS tree.  When they are, we need to go into that
sub-tree regardless of the "yield" hint, and pick the correct
entity once we get to the right level.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ac53db59

sched: Limit the scope of clear_buddies · 2c13c919

由 Rik van Riel 提交于 2月 01, 2011

The clear_buddies function does not seem to play well with the concept
of hierarchical runqueues.  In the following tree, task groups are
represented by 'G', tasks by 'T', next by 'n' and last by 'l'.

     (nl)
    /    \
   G(nl)  G
   / \     \
 T(l) T(n)  T

This situation can arise when a task is woken up T(n), and the previously
running task T(l) is marked last.

When clear_buddies is called from either T(l) or T(n), the next and last
buddies of the group G(nl) will be cleared.  This is not the desired
result, since we would like to be able to find the other type of buddy
in many cases.

This especially a worry when implementing yield_task_fair through the
buddy system.

The fix is simple: only clear the buddy type that the task itself
is indicated to be.  As an added bonus, we stop walking up the tree
when the buddy has already been cleared or pointed elsewhere.
Signed-off-by: NRik van Riel <riel@redhat.coM>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110201094837.6b0962a9@annuminas.surriel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2c13c919

sched: Check the right ->nr_running in yield_task_fair() · 725e7580

由 Rik van Riel 提交于 2月 01, 2011

With CONFIG_FAIR_GROUP_SCHED, each task_group has its own cfs_rq.
Yielding to a task from another cfs_rq may be worthwhile, since
a process calling yield typically cannot use the CPU right now.

Therefor, we want to check the per-cpu nr_running, not the
cgroup local one.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110201094715.798c4f86@annuminas.surriel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

725e7580

26 1月, 2011 6 次提交

sched: Fix switch_from_fair() · da7a735e

由 Peter Zijlstra 提交于 1月 17, 2011

When a task is taken out of the fair class we must ensure the vruntime
is properly normalized because when we put it back in it will assume
to be normalized.

The case that goes wrong is when changing away from the fair class
while sleeping. Sleeping tasks have non-normalized vruntime in order
to make sleeper-fairness work. So treat the switch away from fair as a
wakeup and preserve the relative vruntime.

Also update sysrq-n to call the ->switch_{to,from} methods.
Reported-by: NOnkalo Samu <samu.p.onkalo@nokia.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

da7a735e

sched: Avoid expensive initial update_cfs_load() · f07333bf

由 Paul Turner 提交于 1月 21, 2011

Since cfs->{load_stamp,load_last} are zero-initalized the initial load update
will consider the delta to be 'since the beginning of time'.

This results in a lot of pointless divisions to bring this large period to be
within the sysctl_sched_shares_window.

Fix this by initializing load_stamp to be 1 at cfs_rq initialization, this
allows for an initial load_stamp > load_last which then lets standard idle
truncation proceed.

We avoid spinning (and slightly improve consistency) by fixing delta to be
[period - 1] in this path resulting in a slightly more predictable shares ramp.
(Previously the amount of idle time preserved by the overflow would range between
[period/2,period-1].)
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044852.102126037@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f07333bf

sched: Simplify update_cfs_shares parameters · 6d5ab293

由 Paul Turner 提交于 1月 21, 2011

Re-visiting this: Since update_cfs_shares will now only ever re-weight an
entity that is a relative parent of the current entity in enqueue_entity; we
can safely issue the account_entity_enqueue relative to that cfs_rq and avoid
the requirement for special handling of the enqueue case in update_cfs_shares.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.915214637@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6d5ab293

sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages · 05ca62c6

由 Paul Turner 提交于 1月 21, 2011

The delta in clock_task is a more fair attribution of how much time a tg has
been contributing load to the current cpu.

While not really important it also means we're more in sync (by magnitude)
with respect to periodic updates (since __update_curr deltas are clock_task
based).
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044852.007092349@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

05ca62c6

sched: Fix/remove redundant cfs_rq checks · b815f196

由 Paul Turner 提交于 1月 21, 2011

Since updates are against an entity's queuing cfs_rq it's not possible to
enter update_cfs_{shares,load} with a NULL cfs_rq.  (Indeed, update_cfs_load
would crash prior to the check if we did anyway since we load is examined
during the initializers).

Also, in the update_cfs_load case there's no point
in maintaining averages for rq->cfs_rq since we don't perform shares
distribution at that level -- NULL check is replaced accordingly.

Thanks to Dan Carpenter for pointing out the deference before NULL check.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.825284940@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b815f196

sched: Fix sign under-flows in wake_affine · e37b6a7b

由 Paul Turner 提交于 1月 21, 2011

While care is taken around the zero-point in effective_load to not exceed
the instantaneous rq->weight, it's still possible (e.g. using wake_idx != 0)
for (load + effective_load) to underflow.

In this case the comparing the unsigned values can result in incorrect balanced
decisions.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110122044851.734245014@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e37b6a7b

24 1月, 2011 1 次提交

sched: Fix poor interactivity on UP systems due to group scheduler nice tune bug · 3ff6dcac

由 Yong Zhang 提交于 1月 24, 2011

Michael Witten and Christian Kujau reported that the autogroup
scheduling feature hurts interactivity on their UP systems.

It turns out that this is an older bug in the group scheduling code,
and the wider appeal provided by the autogroup feature exposed it
more prominently.

When on UP with FAIR_GROUP_SCHED enabled, tune shares
only affect tg->shares, but is not reflected in
tg->se->load. The reason is that update_cfs_shares()
does nothing on UP.

So introduce update_cfs_shares() for UP && FAIR_GROUP_SCHED.

This issue was found when enable autogroup scheduling was enabled,
but it is an older bug that also exists on cgroup.cpu on UP.
Reported-and-Tested-by: NMichael Witten <mfwitten@gmail.com>
Reported-and-Tested-by: NChristian Kujau <christian@nerdbynature.de>
Signed-off-by: NYong Zhang <yong.zhang0@gmail.com>
Acked-by: NPekka Enberg <penberg@kernel.org>
Acked-by: NMike Galbraith <efault@gmx.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110124073352.GA24186@windriver.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3ff6dcac

18 1月, 2011 2 次提交

sched: Fix signed unsigned comparison in check_preempt_tick() · d7d82944

由 Mike Galbraith 提交于 1月 05, 2011

Signed unsigned comparison may lead to superfluous resched if leftmost
is right of the current task, wasting a few cycles, and inadvertently
_lengthening_ the current task's slice.
Reported-by: NVenkatesh Pallipadi <venki@google.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294202477.9384.5.camel@marge.simson.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d7d82944

sched: Update effective_load() to use global share weights · 977dda7c

由 Paul Turner 提交于 1月 14, 2011

Previously effective_load would approximate the global load weight present on
a group taking advantage of:

entity_weight = tg->shares ( lw / global_lw ), where entity_weight was provided
by tg_shares_up.

This worked (approximately) for an 'empty' (at tg level) cpu since we would
place boost load representative of what a newly woken task would receive.

However, now that load is instantaneously updated this assumption is no longer
true and the load calculation is rather incorrect in this case.

Fix this (and improve the general case) by re-writing effective_load to take
advantage of the new shares distribution code.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110115015817.069769529@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

977dda7c

19 12月, 2010 2 次提交

sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight · 19e5eebb

由 Paul Turner 提交于 12月 15, 2010

Mike Galbraith reported poor interactivity[*] when the new shares distribution
code was combined with autogroups.

The root cause turns out to be a mis-ordering of accounting accrued execution
time and shares updates.  Since update_curr() is issued hierarchically,
updating the parent entity weights to reflect child enqueue/dequeue results in
the parent's unaccounted execution time then being accrued (vs vruntime) at the
new weight as opposed to the weight present at accumulation.

While this doesn't have much effect on processes with timeslices that cross a
tick, it is particularly problematic for an interactive process (e.g. Xorg)
which incurs many (tiny) timeslices.  In this scenario almost all updates are
at dequeue which can result in significant fairness perturbation (especially if
it is the only thread, resulting in potential {tg->shares, MIN_SHARES}
transitions).

Correct this by ensuring unaccounted time is accumulated prior to manipulating
an entity's weight.

[*] http://xkcd.com/619/ is perversely Nostradamian here.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20101216031038.159704378@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

19e5eebb

sched: Move periodic share updates to entity_tick() · 43365bd7

由 Paul Turner 提交于 12月 15, 2010

Long running entities that do not block (dequeue) require periodic updates to
maintain accurate share values. (Note: group entities with several threads are
quite likely to be non-blocking in many circumstances).

By virtue of being long-running however, we will see entity ticks (otherwise
the required update occurs in dequeue/put and we are done). Thus we can move
the detection (and associated work) for these updates into the periodic path.

This restores the 'atomicity' of update_curr() with respect to accounting.
Signed-off-by: NPaul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101216031038.067028969@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

43365bd7

23 11月, 2010 1 次提交

sched: Fix UP build breakage · 70caf8a6

由 Peter Zijlstra 提交于 11月 20, 2010

The recent cgroup-scheduling rework caused a UP build problem.

Cc: Paul Turner <pjt@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

70caf8a6

18 11月, 2010 12 次提交

sched: Allow update_cfs_load() to update global load · d6b55918