提交 · 2bba22c50b06abe9fd0d23933b1e64d35b419262 · OpenHarmony / kernel_linux

09 9月, 2009 1 次提交

sched: Turn off child_runs_first · 2bba22c5

由 Mike Galbraith 提交于 9月 09, 2009

Set child_runs_first default to off.

It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs
get preempted by child tasks, reducing parallelism.

Note, this patch might make existing races in user
applications more prominent than before - so breakages
might be bisected to this commit.

Child-runs-first is broken on SMP to begin with, and we
already had it off briefly in v2.6.23 so most of the
offenders ought to be fixed. Would be nice not to revert
this commit but fix those apps finally ...
Signed-off-by: NMike Galbraith <efault@gmx.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1252486344.28645.18.camel@marge.simson.net>
[ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
  people want to work around broken apps. ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2bba22c5

08 9月, 2009 3 次提交

sched: Ensure that a child can't gain time over it's parent after fork() · b5d9d734

由 Mike Galbraith 提交于 9月 08, 2009

A fork/exec load is usually "pass the baton", so the child
should never be placed behind the parent.  With START_DEBIT we
make room for the new task, but with child_runs_first, that
room comes out of the _parent's_ hide. There's nothing to say
that the parent wasn't ahead of min_vruntime at fork() time,
which means that the "baton carrier", who is essentially the
parent in drag, can gain time and increase scheduling latencies
for waiters.

With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
enabled, we essentially pass the sleeper fairness off to the
child, which is fine, but if we don't base placement on the
parent's updated vruntime, we can end up compounding latency
woes if the child itself then does fork/exec.  The debit
incurred at fork doesn't hurt the parent who is then going to
sleep and maybe exit, but the child who acquires the error
harms all comers.

This improves latencies of make -j<n> kernel build workloads.
Reported-by: NJens Axboe <jens.axboe@oracle.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b5d9d734

sched: Deal with low-load in wake_affine() · 71a29aa7

由 Peter Zijlstra 提交于 9月 07, 2009

wake_affine() would always fail under low-load situations where
both prev and this were idle, because adding a single task will
always be a significant imbalance, even if there's nothing
around that could balance it.

Deal with this by allowing imbalance when there's nothing you
can do about it.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

71a29aa7

sched: Remove short cut from select_task_rq_fair() · cdd2ab3d

由 Peter Zijlstra 提交于 9月 07, 2009

select_task_rq_fair() incorrectly skips the wake_affine()
logic, remove this.

When prev_cpu == this_cpu, the code jumps straight to the
wake_idle() logic, this doesn't give the wake_affine() logic
the chance to pin the task to this cpu.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cdd2ab3d

02 9月, 2009 2 次提交

sched: Add wait, sleep and iowait accounting tracepoints · 768d0c27

由 Peter Zijlstra 提交于 7月 23, 2009

Add 3 schedstat tracepoints to help account for wait-time,
sleep-time and iowait-time.

They can also be used as a perf-counter source to profile tasks
on these clocks.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <new-submission>
[ build fix for the !CONFIG_SCHEDSTATS case ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

768d0c27

sched: Provide iowait counters · 8f0dfc34

由 Arjan van de Ven 提交于 7月 20, 2009

For counting how long an application has been waiting for
(disk) IO, there currently is only the HZ sample driven
information available, while for all other counters in this
class, a high resolution version is available via
CONFIG_SCHEDSTATS.

In order to make an improved bootchart tool possible, we also
need a higher resolution version of the iowait time.

This patch below adds this scheduler statistic to the kernel.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A64B813.1080506@linux.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8f0dfc34

02 8月, 2009 3 次提交

sched: Add debug check to task_of() · 8f48894f

由 Peter Zijlstra 提交于 7月 24, 2009

A frequent mistake appears to be to call task_of() on a
scheduler entity that is not actually a task, which can result
in a wild pointer.

Add a check to catch these mistakes.
Suggested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8f48894f

sched: Fully integrate cpus_active_map and root-domain code · 00aec93d

由 Gregory Haskins 提交于 7月 30, 2009

Reflect "active" cpus in the rq->rd->online field, instead of
the online_map.

The motivation is that things that use the root-domain code
(such as cpupri) only care about cpus classified as "active"
anyway. By synchronizing the root-domain state with the active
map, we allow several optimizations.

For instance, we can remove an extra cpumask_and from the
scheduler hotpath by utilizing rq->rd->online (since it is now
a cached version of cpu_active_map & rq->rd->span).
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NMax Krasnyansky <maxk@qualcomm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

00aec93d

sched: Fix latencytop and sleep profiling vs group scheduling · e414314c

由 Peter Zijlstra 提交于 7月 23, 2009

The latencytop and sleep accounting code assumes that any
scheduler entity represents a task, this is not so.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e414314c

18 7月, 2009 1 次提交

sched: Account for vruntime wrapping · 54fdc581

由 Fabio Checconi 提交于 7月 16, 2009

I spotted two sites that didn't take vruntime wrap-around into
account. Fix these by creating a comparison helper that does do
so.
Signed-off-by: NFabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

54fdc581

11 7月, 2009 1 次提交

sched: Fix bug in SCHED_IDLE interaction with group scheduling · d07387b4

由 Paul Turner 提交于 7月 10, 2009

One of the isolation modifications for SCHED_IDLE is the
unitization of sleeper credit.  However the check for this
assumes that the sched_entity we're placing always belongs to a
task.

This is potentially not true with group scheduling and leaves
us rummaging randomly when we try to pull the policy.
Signed-off-by: NPaul Turner <pjt@google.com>
Cc: peterz@infradead.org
LKML-Reference: <alpine.DEB.1.00.0907101649570.29914@kitami.corp.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d07387b4

18 6月, 2009 1 次提交

sched: Fix out of scope variable access in sched_slice() · 3104bf03

由 Christian Engelmayer 提交于 6月 16, 2009

Access to local variable lw is aliased by usage of pointer load.
Access to pointer load in calc_delta_mine() happens when lw is
already out of scope.

[ Reported by static code analysis. ]
Signed-off-by: NChristian Engelmayer <christian.engelmayer@frequentis.com>
LKML-Reference: <20090616103512.0c846e51@frequentis.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3104bf03

09 4月, 2009 1 次提交

sched: remove redundant hierarchy walk in check_preempt_wakeup · 002f128b

由 Paul Turner 提交于 4月 08, 2009

Impact: micro-optimization

Under group scheduling we traverse up until we are at common siblings
to make the wakeup comparison on.

At this point however, they should have the same parent so continuing
to check up the tree is redundant.
Signed-off-by: NPaul Turner <pjt@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <alpine.DEB.1.00.0904081520320.30317@kitami.corp.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

002f128b

11 2月, 2009 1 次提交

sched: revert recent sync wakeup changes · fc631c82

由 Peter Zijlstra 提交于 2月 11, 2009

Intel reported a 10% regression (mysql+sysbench) on a 16-way machine
with these patches:

  1596e297: sched: symmetric sync vs avg_overlap
  d942fb6c: sched: fix sync wakeups

Revert them.
Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Bisected-by: NLin Ming <ming.m.lin@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fc631c82

01 2月, 2009 3 次提交

sched: fix buddie group latency · a571bbea

由 Peter Zijlstra 提交于 1月 28, 2009

Similar to the previous patch, by not clearing buddies we can select entities
past their run quota, which can increase latency. This means we have to clear
group buddies as well.

Do not use the group clear for pick_next_task(), otherwise that'll get O(n^2).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a571bbea

sched: clear buddies more aggressively · a9f3e2b5

由 Mike Galbraith 提交于 1月 28, 2009

It was noticed that a task could get re-elected past its run quota due to buddy
affinities. This could increase latency a little. Cure it by more aggresively
clearing buddy state.

We do so in two situations:
 - when we force preempt
 - when we select a buddy to run
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a9f3e2b5

sched: fix sync wakeups · d942fb6c

由 Peter Zijlstra 提交于 1月 26, 2009

Pawel Dziekonski reported that the openssl benchmark and his
quantum chemistry application both show slowdowns due to the
scheduler under-parallelizing execution.

The reason are pipe wakeups still doing 'sync' wakeups which
overrides the normal buddy wakeup logic - even if waker and
wakee are loosely coupled.

Fix an inversion of logic in the buddy wakeup code.
Reported-by: NPawel Dziekonski <dzieko@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d942fb6c

16 1月, 2009 1 次提交

sched: sched_slice() fixlet · 6272d68c

由 Lin Ming 提交于 1月 15, 2009

Mike's change: 0a582440 "sched: fix sched_slice())" broke group
scheduling by forgetting to reload cfs_rq on each loop.

This patch fixes aim7 regression and specjbb2005 regression becomes
less than 1.5% on 8-core stokley.
Signed-off-by: NLin Ming <ming.m.lin@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: NJayson King <dev@jaysonking.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6272d68c

15 1月, 2009 3 次提交

sched: fix update_min_vruntime · e17036da

由 Peter Zijlstra 提交于 1月 15, 2009

Impact: fix SCHED_IDLE latency problems

OK, so we have 1 running task A (which is obviously curr and the tree is
equally obviously empty).

'A' nicely chugs along, doing its thing, carrying min_vruntime along as it
goes.

Then some whacko speed freak SCHED_IDLE task gets inserted due to SMP
balancing, which is very likely far right, in that case

update_curr
  update_min_vruntime
    cfs_rq->rb_leftmost := true (the crazy task sitting in a tree)
      vruntime = se->vruntime

and voila, min_vruntime is waaay right of where it ought to be.

OK, so why did I write it like that to begin with...

Aah, yes.

Say we've just dequeued current

schedule
  deactivate_task(prev)
    dequeue_entity
      update_min_vruntime

Then we'll set

  vruntime = cfs_rq->min_vruntime;

we find !cfs_rq->curr, but do find someone in the tree. Then we _must_
do vruntime = se->vruntime, because

 vruntime = min_vruntime(vruntime := cfs_rq->min_vruntime, se->vruntime)

will not advance vruntime, and cause lags the other way around (which we
fixed with that initial patch: 1af5f730
(sched: more accurate min_vruntime accounting).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: NMike Galbraith <efault@gmx.de>
Acked-by: NMike Galbraith <efault@gmx.de>
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e17036da

sched: SCHED_OTHER vs SCHED_IDLE isolation · 6bc912b7

由 Peter Zijlstra 提交于 1月 15, 2009

Stronger SCHED_IDLE isolation:

 - no SCHED_IDLE buddies
 - never let SCHED_IDLE preempt on wakeup
 - always preempt SCHED_IDLE on wakeup
 - limit SLEEPER fairness for SCHED_IDLE.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6bc912b7

sched: prefer wakers · e52fb7c0

由 Peter Zijlstra 提交于 1月 14, 2009

Prefer tasks that wake other tasks to preempt quickly. This improves
performance because more work is available sooner.

The workload that prompted this patch was a kernel build over NFS4 (for some
curious and not understood reason we had to revert commit:
18de9735 to make any progress at all)

Without this patch a make -j8 bzImage (of x86-64 defconfig) would take
3m30-ish, with this patch we're down to 2m50-ish.

psql-sysbench/mysql-sysbench show a slight improvement in peak performance as
well, tbench and vmark seemed to not care.

It is possible to improve upon the build time (to 2m20-ish) but that seriously
destroys other benchmarks (just shows that there's more room for tinkering).

Much thanks to Mike who put in a lot of effort to benchmark things and proved
a worthy opponent with a competing patch.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e52fb7c0

09 1月, 2009 1 次提交

generic swap(): sched: remove local swap() macro · df4927bf

由 Wu Fengguang 提交于 1月 07, 2009

Use the new generic implementation.
Signed-off-by: NWu Fengguang <wfg@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df4927bf

03 1月, 2009 1 次提交

sched: fix sched_slice() · 0a582440

由 Mike Galbraith 提交于 1月 02, 2009

Impact: fix bad-interactivity buglet

Fix sched_slice() to emit a sane result whether a task is currently
enqueued or not.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Tested-by: NJayson King <dev@jaysonking.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

 kernel/sched_fair.c |   30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

0a582440

19 12月, 2008 1 次提交

sched: bias task wakeups to preferred semi-idle packages · 7eb52dfa

由 Vaidyanathan Srinivasan 提交于 12月 18, 2008

Impact: tweak task wakeup to save power more agressively

Preferred wakeup cpu (from a semi idle package) has been
nominated in find_busiest_group() in the previous patch.  Use
this information in sched_mc_preferred_wakeup_cpu in function
wake_idle() to bias task wakeups if the following conditions
are satisfied:

        - The present cpu that is trying to wakeup the process is
          idle and waking the target process on this cpu will
          potentially wakeup a completely idle package
        - The previous cpu on which the target process ran is
          also idle and hence selecting the previous cpu may
          wakeup a semi idle cpu package
        - The task being woken up is allowed to run in the
          nominated cpu (cpu affinity and restrictions)

Basically if both the current cpu and the previous cpu on
which the task ran is idle, select the nominated cpu from semi
idle cpu package for running the new task that is waking up.

Cache hotness is considered since the actual biasing happens
in wake_idle() only if the application is cache cold.

This technique will effectively move short running bursty jobs in
a mostly idle system.

Wakeup biasing for power savings gets automatically disabled if
system utilisation increases due to the fact that the probability
of finding both this_cpu and prev_cpu idle decreases.
Signed-off-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7eb52dfa

16 12月, 2008 2 次提交

sched: optimize update_curr() · 34f28ecd

由 Peter Zijlstra 提交于 12月 16, 2008

Impact: micro-optimization

Skip the hard work when there is none.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

34f28ecd

sched: fix wakeup preemption clock · 03e89e45

由 Mike Galbraith 提交于 12月 16, 2008

Impact: sharpen the wakeup-granularity to always be against current scheduler time

It was possible to do the preemption check against an old time stamp.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

03e89e45

25 11月, 2008 2 次提交

sched: convert remaining old-style cpumask operators · 96f874e2

由 Rusty Russell 提交于 11月 25, 2008

Impact: Trivial API conversion

  NR_CPUS -> nr_cpu_ids
  cpumask_t -> struct cpumask
  sizeof(cpumask_t) -> cpumask_size()
  cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b)

  cpu_set() -> cpumask_set_cpu()
  first_cpu() -> cpumask_first()
  cpumask_of_cpu() -> cpumask_of()
  cpus_* -> cpumask_*

There are some FIXMEs where we all archs to complete infrastructure
(patches have been sent):

  cpu_coregroup_map -> cpu_coregroup_mask
  node_to_cpumask* -> cpumask_of_node

There is also one FIXME where we pass an array of cpumasks to
partition_sched_domains(): this implies knowing the definition of
'struct cpumask' and the size of a cpumask.  This will be fixed in a
future patch.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

96f874e2

sched: wrap sched_group and sched_domain cpumask accesses. · 758b2cdc

由 Rusty Russell 提交于 11月 25, 2008

Impact: trivial wrap of member accesses

This eases the transition in the next patch.

We also get rid of a temporary cpumask in find_idlest_cpu() thanks to
for_each_cpu_and, and sched_balance_self() due to getting weight before
setting sd to NULL.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

758b2cdc

11 11月, 2008 1 次提交

sched: release buddies on yield · 2002c695

由 Peter Zijlstra 提交于 11月 11, 2008

Clear buddies on yield, so that the buddy rules don't schedule them
despite them being placed right-most.

This fixed a performance regression with yield-happy binary JVMs.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Tested-by: NLin Ming <ming.m.lin@intel.com>

2002c695

05 11月, 2008 4 次提交

sched: fix buddies for group scheduling · 02479099

由 Peter Zijlstra 提交于 11月 04, 2008

Impact: scheduling order fix for group scheduling

For each level in the hierarchy, set the buddy to point to the right entity.
Therefore, when we do the hierarchical schedule, we have a fair chance of
ending up where we meant to.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

02479099

sched: backward looking buddy · 4793241b

由 Peter Zijlstra 提交于 11月 04, 2008

Impact: improve/change/fix wakeup-buddy scheduling

Currently we only have a forward looking buddy, that is, we prefer to
schedule to the task we last woke up, under the presumption that its
going to consume the data we just produced, and therefore will have
cache hot benefits.

This allows co-waking producer/consumer task pairs to run ahead of the
pack for a little while, keeping their cache warm. Without this, we
would interleave all pairs, utterly trashing the cache.

This patch introduces a backward looking buddy, that is, suppose that
in the above scenario, the consumer preempts the producer before it
can go to sleep, we will therefore miss the wakeup from consumer to
producer (its already running, after all), breaking the cycle and
reverting to the cache-trashing interleaved schedule pattern.

The backward buddy will try to schedule back to the task that woke us
up in case the forward buddy is not available, under the assumption
that the last task will be the one with the most cache hot task around
barring current.

This will basically allow a task to continue after it got preempted.

In order to avoid starvation, we allow either buddy to get wakeup_gran
ahead of the pack.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4793241b

sched: fix fair preempt check · d95f98d0

由 Peter Zijlstra 提交于 11月 04, 2008

Impact: fix cross-class preemption

Inter-class wakeup preemptions should go on class order.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d95f98d0

sched: cleanup fair task selection · f4b6755f

由 Peter Zijlstra 提交于 11月 04, 2008

Impact: cleanup

Clean up task selection
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f4b6755f

24 10月, 2008 4 次提交

sched: virtual time buddy preemption · 3f3a4904

由 Peter Zijlstra 提交于 10月 24, 2008

Since we moved wakeup preemption back to virtual time, it makes sense to move
the buddy stuff back as well. The purpose of the buddy scheduling is to allow
a quickly scheduling pair of tasks to run away from the group as far as a
regular busy task would be allowed under wakeup preemption.

This has the advantage that the pair can ping-pong for a while, enjoying
cache-hotness. Without buddy scheduling other tasks would interleave destroying
the cache.

Also, it saves a word in cfs_rq.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3f3a4904

sched: re-instate vruntime based wakeup preemption · 464b7527

由 Peter Zijlstra 提交于 10月 24, 2008

The advantage is that vruntime based wakeup preemption has a better
conceptual model. Here wakeup_gran = 0 means: preempt when 'fair'.
Therefore wakeup_gran is the granularity of unfairness we allow in order
to make progress.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

464b7527

sched: weaken sync hint · 0d13033b

由 Mike Galbraith 提交于 10月 24, 2008

Mysql+oltp and pgsql+oltp peaks are still shifted right. The below puts
the peaks back to 1 client/server pair per core.

Use the avg_overlap information to weaken the sync hint.
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0d13033b

sched: more accurate min_vruntime accounting · 1af5f730

由 Peter Zijlstra 提交于 10月 24, 2008

Mike noticed the current min_vruntime tracking can go wrong and skip the
current task. If the only remaining task in the tree is a nice 19 task
with huge vruntime, new tasks will be inserted too far to the right too,
causing some interactibity issues.

min_vruntime can only change due to the leftmost entry disappearing
(dequeue_entity()), or by the leftmost entry being incremented past the
next entry, which elects a new leftmost (__update_curr())

Due to the current entry not being part of the actual tree, we have to
compare the leftmost tree entry with the current entry, and take the
leftmost of these two.

So create a update_min_vruntime() function that takes computes the
leftmost vruntime in the system (either tree of current) and increases
the cfs_rq->min_vruntime if the computed value is larger than the
previously found min_vruntime. And call this from the two sites we've
identified that can change min_vruntime.
Reported-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1af5f730

22 10月, 2008 1 次提交

sched: add CONFIG_SMP consistency · 4ce72a2c

由 Li Zefan 提交于 10月 22, 2008

a patch from Henrik Austad did this:

>> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is
>> not set.

Peter observed:

> While a proper cleanup, could you do it by re-arranging the methods so
> as to not create an additional ifdef?

Do not declare select_task_rq and some other methods as part of sched_class
when CONFIG_SMP is not set.

Also gather those methods to avoid CONFIG_SMP mess.
Idea-by: NHenrik Austad <henrik.austad@gmail.com>
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NHenrik Austad <henrik@austad.us>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4ce72a2c

20 10月, 2008 2 次提交

sched: revert back to per-rq vruntime · f9c0b095

由 Peter Zijlstra 提交于 10月 17, 2008

Vatsa rightly points out that having the runqueue weight in the vruntime
calculations can cause unfairness in the face of task joins/leaves.

Suppose: dv = dt * rw / w

Then take 10 tasks t_n, each of similar weight. If the first will run 1
then its vruntime will increase by 10. Now, if the next 8 tasks leave after
having run their 1, then the last task will get a vruntime increase of 2
after having run 1.

Which will leave us with 2 tasks of equal weight and equal runtime, of which
one will not be scheduled for 8/2=4 units of time.

Ergo, we cannot do that and must use: dv = dt / w.

This means we cannot have a global vruntime based on effective priority, but
must instead go back to the vruntime per rq model we started out with.

This patch was lightly tested by doing starting while loops on each nice level
and observing their execution time, and a simple group scenario of 1:2:3 pinned
to a single cpu.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f9c0b095

sched: fair scheduler should not resched rt tasks · a4c2f00f

由 Peter Zijlstra 提交于 10月 17, 2008

With use of ftrace Steven noticed that some RT tasks got rescheduled due
to sched_fair interaction.

What happens is that we reprogram the hrtick from enqueue/dequeue_fair_task()
because that can change nr_running, and thus a current tasks ideal runtime.
However, its possible the current task isn't a fair_sched_class task, and thus
doesn't have a hrtick set to change.

Fix this by wrapping those hrtick_start_fair() calls in a hrtick_update()
function, which will check for the right conditions.
Reported-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a4c2f00f

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年