Commits · 6e40f5bbbc734231bc5809d3eb785e3c21f275d7 · openanolis / cloud-kernel

23 Jan, 2010 · 1 commit

sched: Extend enqueue_task to allow head queueing · ea87bb78

Authored by Thomas Gleixner on Jan 20, 2010

The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.

Extend the related functions with a "head" argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.734886007@linutronix.de>

ea87bb78
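A minimal sketch of what a "head" argument on an enqueue path could look like, assuming a circular doubly linked list per priority level. The types and function names here (`struct task`, `struct prio_list`, `enqueue_task_sketch`) are hypothetical stand-ins for illustration and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a task and a per-priority list. */
struct task {
    int id;
    struct task *prev, *next;
};

struct prio_list {
    struct task head;   /* sentinel node of a circular doubly linked list */
};

static void prio_list_init(struct prio_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

/* Link t between two neighbours that are currently adjacent. */
static void link_between(struct task *t, struct task *prev, struct task *next)
{
    t->prev = prev;
    t->next = next;
    prev->next = t;
    next->prev = t;
}

/* head != 0 queues at the front of the list; head == 0 queues at the back. */
static void enqueue_task_sketch(struct prio_list *l, struct task *t, int head)
{
    assert(l != NULL && t != NULL);
    if (head)
        link_between(t, &l->head, l->head.next);
    else
        link_between(t, l->head.prev, &l->head);
}

/* The task that would be picked next at this priority, or NULL if empty. */
static struct task *first_task(struct prio_list *l)
{
    return l->head.next == &l->head ? NULL : l->head.next;
}
```

With tail queueing a task waits behind every other task of equal priority; with head queueing it is the next one picked, which is the behaviour the commit message says is needed for some POSIX cases.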

21 Jan, 2010 · 11 commits

sched: Fix the place where group powers are updated · 871e35bc

Authored by Gautham R Shenoy on Jan 20, 2010

We want to update the sched_group_powers when balance_cpu == this_cpu.

Currently the group powers are updated only if the balance_cpu is the
first CPU in the local group. But balance_cpu = this_cpu could also be
the first idle cpu in the group. Hence fix the place where the group
powers are updated.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264017764.5717.127.camel@jschopp-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

871e35bc

sched: Assume *balance is valid · 8f190fb3

Authored by Peter Zijlstra on Dec 24, 2009

Since all load_balance() callers will have !NULL balance parameters we
can now assume so and remove a few checks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8f190fb3

sched: Remove load_balance_newidle() · f492e12e

Authored by Peter Zijlstra on Dec 23, 2009

The two functions: load_balance{,_newidle}() are very similar, with the
following differences:

 - rq->lock usage
 - sb->balance_interval updates
 - *balance check

So replace the load_balance_newidle() call with load_balance(.idle =
CPU_NEWLY_IDLE), explicitly unlock the rq->lock before calling (would be
done by double_lock_balance() anyway), and ignore the other differences
for now.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f492e12e

sched: Unify load_balance{,_newidle}() · 1af3ed3d

Authored by Peter Zijlstra on Dec 23, 2009

load_balance() and load_balance_newidle() look remarkably similar, one
key point they differ in is the condition on when to active balance.

So split out that logic into a separate function.

One side effect is that previously load_balance_newidle() used to fail
and return -1 under these conditions, whereas now it doesn't. I've not
yet fully figured out the whole -1 return case for either
load_balance{,_newidle}().
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1af3ed3d

sched: Add a lock break for PREEMPT=y · baa8c110

Authored by Peter Zijlstra on Dec 17, 2009

Since load-balancing can hold rq->locks for quite a long while, allow
breaking out early when there is lock contention.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

baa8c110

sched: Remove from fwd decls · 230059de

Authored by Peter Zijlstra on Dec 17, 2009

Move code around to get rid of fwd declarations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

230059de

sched: Remove rq_iterator from move_one_task · 897c395f

Authored by Peter Zijlstra on Dec 17, 2009

Again, since we only iterate the fair class, remove the abstraction.

Since this is the last user of the rq_iterator, remove all that too.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

897c395f

sched: Remove rq_iterator usage from load_balance_fair · ee00e66f

Authored by Peter Zijlstra on Dec 17, 2009

Since we only ever iterate the fair class, do away with this abstraction.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ee00e66f

sched: Remove the sched_class load_balance methods · 3d45fd80

Authored by Peter Zijlstra on Dec 17, 2009

Take out the sched_class methods for load-balancing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3d45fd80

sched: Move load balance code into sched_fair.c · 1e3c88bd

Authored by Peter Zijlstra on Dec 17, 2009

Straight fwd code movement.

Since none of the load-balance abstractions are used anymore, do away with
them and simplify the code some. In preparation move the code around.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1e3c88bd

sched: Fix vmark regression on big machines · 50b926e4

Authored by Mike Galbraith on Jan 04, 2010

SD_PREFER_SIBLING is set at the CPU domain level if power saving isn't
enabled, leading to many cache misses on large machines as we traverse
looking for an idle shared cache to wake to.  Change the enabler of
select_idle_sibling() to SD_SHARE_PKG_RESOURCES, and enable same at the
sibling domain level.
Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1262612696.15495.15.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

50b926e4

17 Jan, 2010 · 1 commit

sched: Don't expose local functions · 6d686f45

Authored by H Hartley Sweeten on Jan 13, 2010

kernel/sched: don't expose local functions

The get_rr_interval_* functions are all class methods of
struct sched_class. They are not exported so make them
static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <201001132021.53253.hartleys@visionengravers.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6d686f45

17 Dec, 2009 · 2 commits

sched: Remove the cfs_rq dependency from set_task_cpu() · 88ec22d3

Authored by Peter Zijlstra on Dec 16, 2009

In order to remove the cfs_rq dependency from set_task_cpu() we
need to ensure the task is cfs_rq invariant for all callsites.

The simple approach is to subtract cfs_rq->min_vruntime from
se->vruntime on dequeue, and add cfs_rq->min_vruntime on
enqueue.

However, this has the downside of breaking FAIR_SLEEPERS since
we lose the old vruntime as we only maintain the relative
position.

To solve this, we observe that we only migrate runnable tasks,
we do this using deactivate_task(.sleep=0) and
activate_task(.wakeup=0), therefore we can restrain the
min_vruntime invariance to that state.

The only other case is wakeup balancing, since we want to
maintain the old vruntime we cannot make it relative on dequeue,
but since we don't migrate inactive tasks, we can do so right
before we activate it again.

This is where we need the new pre-wakeup hook, we need to call
this while still holding the old rq->lock. We could fold it into
->select_task_rq(), but since that has multiple callsites and
would obfuscate the locking requirements, that seems like a
fudge.

This leaves the fork() case, simply make sure that ->task_fork()
leaves the ->vruntime in a relative state.

This covers all cases where set_task_cpu() gets called, and
ensures it sees a relative vruntime.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170518.191697025@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

88ec22d3
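The subtract-on-dequeue, add-on-enqueue idea described in this commit message can be sketched as follows. This is an illustration under simplifying assumptions, not the kernel code: `struct cfs_rq_sketch`, `struct entity_sketch`, and both function names are hypothetical, and only the two fields mentioned in the message are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins carrying only the fields named in the message. */
struct cfs_rq_sketch { uint64_t min_vruntime; };
struct entity_sketch { uint64_t vruntime; };

/* On dequeue for migration: make vruntime relative to the old queue. */
static void dequeue_for_migration(const struct cfs_rq_sketch *rq,
                                  struct entity_sketch *se)
{
    assert(rq != NULL && se != NULL);
    /* Unsigned arithmetic wraps modulo 2^64, so the later add undoes this. */
    se->vruntime -= rq->min_vruntime;
}

/* On enqueue after migration: make vruntime absolute on the new queue. */
static void enqueue_after_migration(const struct cfs_rq_sketch *rq,
                                    struct entity_sketch *se)
{
    assert(rq != NULL && se != NULL);
    se->vruntime += rq->min_vruntime;
}
```

While the entity is between queues its vruntime holds only an offset from min_vruntime, so code that moves it does not need to know which queue it came from. The task keeps the same distance from min_vruntime on the destination queue that it had on the source queue.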

sched: Select_task_rq_fair() must honour SD_LOAD_BALANCE · e4f42888

Authored by Peter Zijlstra on Dec 16, 2009

We should skip !SD_LOAD_BALANCE domains.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091216170517.653578430@chello.nl>
CC: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e4f42888

15 Dec, 2009 · 1 commit

sched: Convert rq->lock to raw_spinlock · 05fa785c

Authored by Thomas Gleixner on Nov 17, 2009

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>

05fa785c

09 Dec, 2009 · 9 commits

sched: Update normalized values on user updates via proc · acb4a848

Authored by Christian Ehrhardt on Nov 30, 2009

The normalized values are also recalculated in case the scaling factor
changes.

This patch updates the internally used scheduler tuning values that are
normalized to one cpu in case a user sets new values via sysfs.

Together with patch 2 of this series, this lets user-configured
values scale (or not) with cpu add/remove events taking place later.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-4-git-send-email-ehrhardt@linux.vnet.ibm.com>
[ v2: fix warning ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

acb4a848

sched: Make tunable scaling style configurable · 1983a922

Authored by Christian Ehrhardt on Nov 30, 2009

As scaling now takes place on all kinds of cpu add/remove events, a user
who configures values via proc should be able to choose whether the set
values are still rescaled or kept whatever happens.

As the comments state that log2 was just a second guess that worked, the
interface is not just designed for on/off, but to choose a scaling type.
Currently this allows none, log and linear, but more importantly it allows
us to keep the interface even if someone has an even better idea how to
scale the values.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-3-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1983a922

sched: Fix missing sched tunable recalculation on cpu add/remove · 0bcdcf28

Authored by Christian Ehrhardt on Nov 30, 2009

Based on Peter Zijlstra's patch suggestion this enables recalculation of
the scheduler tunables in response to a change in the number of cpus. It
also adds a max of eight cpus that are considered in that scaling.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1259579808-11357-2-git-send-email-ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0bcdcf28
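The tunable-scaling commits above name three scaling types (none, log, linear) and a cap of eight cpus. A rough sketch of how such a factor might be computed follows. The enum, the function names, and in particular the exact log form `1 + floor(log2(cpus))` are assumptions made for illustration; the commit messages do not spell out the formulas.

```c
#include <assert.h>

/* Hypothetical names for the three scaling types the message lists. */
enum tunable_scaling { SCALING_NONE, SCALING_LOG, SCALING_LINEAR };

/* floor(log2(n)) for n >= 1. */
static unsigned int ilog2_sketch(unsigned int n)
{
    unsigned int r = 0;

    assert(n > 0);
    while (n >>= 1)
        r++;
    return r;
}

/* Factor by which per-cpu-normalized tunables would be multiplied. */
static unsigned int scaling_factor(unsigned int online_cpus,
                                   enum tunable_scaling mode)
{
    /* At most eight cpus are considered, per the commit message. */
    unsigned int cpus = online_cpus < 8 ? online_cpus : 8;

    switch (mode) {
    case SCALING_NONE:
        return 1;                        /* values are kept as configured */
    case SCALING_LINEAR:
        return cpus;                     /* grows with the cpu count */
    case SCALING_LOG:
    default:
        return 1 + ilog2_sketch(cpus);   /* assumed log form */
    }
}
```

Under these assumptions a 64-cpu machine is treated like an 8-cpu one: the linear factor stops at 8 and the log factor at 4.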

sched: Remove unnecessary RCU exclusion · fb58bac5

Authored by Peter Zijlstra on Dec 01, 2009

As Nick pointed out, and realized by myself when doing:
   sched: Fix balance vs hotplug race
the patch:
   sched: for_each_domain() vs RCU

is wrong, sched_domains are freed after synchronize_sched(), which
means disabling preemption is enough.
Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fb58bac5

sched: Discard some old bits · 6cecd084

Authored by Peter Zijlstra on Nov 30, 2009

WAKEUP_RUNNING was an experiment, not sure why that ever ended up being
merged...
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6cecd084

sched: Clean up check_preempt_wakeup() · 3a7e73a2

Authored by Peter Zijlstra on Nov 28, 2009

Streamline the wakeup preemption code a bit, unifying the preempt path
so that they all do the same.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3a7e73a2

sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call · a65ac745

Authored by Jupyung Lee on Nov 17, 2009

If a RT task is woken up while a non-RT task is running,
check_preempt_wakeup() is called to check whether the new task can
preempt the old task. The function returns quickly without going deeper
because it is apparent that a RT task can always preempt a non-RT task.

In this situation, check_preempt_wakeup() always calls update_curr() to
update vruntime value of the currently running task. However, the
function call is unnecessary and redundant at that moment because (1) a
non-RT task can always be preempted by a RT task regardless of its
vruntime value, and (2) update_curr() will be called shortly when the
context switch between the two occurs.

By moving update_curr() in check_preempt_wakeup(), we can avoid
redundant call to update_curr(), slightly reducing the time taken to
wake up RT tasks.
Signed-off-by: Jupyung Lee <jupyung@gmail.com>
[ Place update_curr() right before the wake_preempt_entity() call, which
  is the only thing that relies on the updated vruntime ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1258451500-6714-1-git-send-email-jupyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a65ac745

sched: Sanitize fork() handling · cd29fe6f

Authored by Peter Zijlstra on Nov 27, 2009

Currently we try to do task placement in wake_up_new_task() after we do
the load-balance pass in sched_fork(). This yields complicated semantics
in that we have to deal with tasks on different RQs and the
set_task_cpu() calls in copy_process() and sched_fork()

Rename ->task_new() to ->task_fork() and call it from sched_fork()
before the balancing, this gives the policy a clear point to place the
task.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cd29fe6f

sched: Protect sched_rr_get_param() access to task->sched_class · dba091b9

Authored by Thomas Gleixner on Dec 09, 2009

sched_rr_get_param calls
task->sched_class->get_rr_interval(task) without protection
against a concurrent sched_setscheduler() call which modifies
task->sched_class.

Serialize the access with task_rq_lock(task) and hand the rq
pointer into get_rr_interval() as it's needed at least in the
sched_fair implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dba091b9

24 Nov, 2009 · 1 commit

sched: Optimize branch hint in pick_next_task_fair() · 36ace27e

Authored by Tim Blechmann on Nov 24, 2009

Branch hint profiling on my nehalem machine showed 90%
incorrect branch hints:

  15728471 158903754  90 pick_next_task_fair
  sched_fair.c    1555
Signed-off-by: Tim Blechmann <tim@klingt.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0BBBB1.2050100@klingt.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

36ace27e
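To make the profiler line in this commit concrete, here is a small sketch assuming the first two columns are the counts of correct and incorrect predictions and the third is the miss percentage. That column reading is an assumption, as are the function names. The `likely`/`unlikely` macros follow the common `__builtin_expect` idiom (a GCC/Clang builtin), and `pick_sketch` is purely illustrative, not the code the commit touched.

```c
#include <assert.h>
#include <stdint.h>

/* Common branch-hint idiom built on a GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Integer percentage of mispredicted hints, rounded down. */
static unsigned int miss_percent(uint64_t correct, uint64_t incorrect)
{
    uint64_t total = correct + incorrect;

    return total ? (unsigned int)(incorrect * 100 / total) : 0;
}

/*
 * Illustrative only: a hint is worthwhile when the marked outcome really
 * is the rare one. If profiling shows the "unlikely" path taken most of
 * the time, the hint should be flipped or dropped.
 */
static int pick_sketch(int nr_running)
{
    assert(nr_running >= 0);
    if (unlikely(nr_running == 0))
        return -1;      /* nothing to run */
    return 0;
}
```

Feeding the two counts from the message into `miss_percent` gives 158903754 * 100 / (15728471 + 158903754), which truncates to 90 and matches the reported figure.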

13 Nov, 2009 · 2 commits

sched: More generic WAKE_AFFINE vs select_idle_sibling() · fe3bcfe1

Authored by Peter Zijlstra on Nov 12, 2009

Instead of only considering SD_WAKE_AFFINE | SD_PREFER_SIBLING
domains also allow all SD_PREFER_SIBLING domains below a
SD_WAKE_AFFINE domain to change the affinity target.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091112145610.909723612@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fe3bcfe1

sched: Cleanup select_task_rq_fair() · a50bde51

Authored by Peter Zijlstra on Nov 12, 2009

Clean up the new affine to idle sibling bits while trying to
grok them. Should not have any function differences.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091112145610.832503781@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a50bde51

05 Nov, 2009 · 2 commits

sched: Fix affinity logic in select_task_rq_fair() · fd210738

Authored by Mike Galbraith on Nov 05, 2009

Ingo Molnar reported:

[   26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
[   26.808000] caller is vmstat_update+0x26/0x70
[   26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887
[   26.816000] Call Trace:
[   26.820000]  [<c1924a24>] ? printk+0x28/0x3c
[   26.824000]  [<c13258a0>] debug_smp_processor_id+0xf0/0x110
[   26.824000] mount used greatest stack depth: 1464 bytes left
[   26.828000]  [<c111d086>] vmstat_update+0x26/0x70
[   26.832000]  [<c1086418>] worker_thread+0x188/0x310
[   26.836000]  [<c10863b7>] ? worker_thread+0x127/0x310
[   26.840000]  [<c108d310>] ? autoremove_wake_function+0x0/0x60
[   26.844000]  [<c1086290>] ? worker_thread+0x0/0x310
[   26.848000]  [<c108cf0c>] kthread+0x7c/0x90
[   26.852000]  [<c108ce90>] ? kthread+0x0/0x90
[   26.856000]  [<c100c0a7>] kernel_thread_helper+0x7/0x10
[   26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10
[   26.864000] caller is vmstat_update+0x3c/0x70

Because this commit:

  a1f84a3a: sched: Check for an idle shared cache in select_task_rq_fair()

broke ->cpus_allowed.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: arjan@infradead.org
Cc: <stable@kernel.org>
LKML-Reference: <1257415066.12867.1.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fd210738

sched: Check for an idle shared cache in select_task_rq_fair() · a1f84a3a

Authored by Mike Galbraith on Oct 27, 2009

When waking affine, check for an idle shared cache, and if
found, wake to that CPU/sibling instead of the waker's CPU.

This improves pgsql+oltp ramp up by roughly 8%. Possibly more
for other loads, depending on overlap. The trade-off is a
roughly 1% peak downturn if tasks are truly synchronous.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@kernel.org>
LKML-Reference: <1256654138.17752.7.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a1f84a3a

24 Oct, 2009 · 1 commit

sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac

Authored by Mike Galbraith on Oct 23, 2009

This patch restores the effectiveness of LAST_BUDDY in preventing
pgsql+oltp from collapsing due to wakeup preemption. It also
switches LAST_BUDDY to exclusively do what it does best, namely
mitigate the effects of aggressive wakeup preemption, which
improves vmark throughput markedly, and restores mysql+oltp
scalability.

Since buddies are about scalability, enable them beginning at the
point where we begin expanding sched_latency, namely
sched_nr_latency. Previously, buddies were cleared aggressively,
which seriously reduced their effectiveness. Not clearing
aggressively however, produces a small drop in mysql+oltp
throughput immediately after peak, indicating that LAST_BUDDY is
actually doing some harm. This is right at the point where X on the
desktop in competition with another load wants low latency service.
Ergo, do not enable until we need to scale.

To mitigate latency induced by buddies, or by a task just missing
wakeup preemption, check latency at tick time.

Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
CACHE_HOT_BUDDY.

Supporting performance tests:

 tip   = v2.6.32-rc5-1497-ga525b32
 tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
 tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs

(Three run averages except where noted.)

 vmark:
 ------
 tip           108466 messages per second
 tip+          125307 messages per second
 tip+x         125335 messages per second
 tipx          117781 messages per second
 2.6.31.3      122729 messages per second

 mysql+oltp:
 -----------
 clients          1        2        4        8       16       32       64      128      256
 ..........................................................................................
 tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
 tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
 tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
 2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86

 pgsql+oltp:
 -----------
 clients          1        2        4        8       16       32       64      128      256
 ..........................................................................................
 tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
 tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
 tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
 tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
 2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f685ceac

14 Oct, 2009 · 1 commit

sched: Do less agressive buddy clearing · 92f6a5e3

Authored by Peter Zijlstra on Oct 09, 2009

Yanmin reported a hackbench regression due to:

 > commit de69a80b
 > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
 > Date:   Thu Sep 17 09:01:20 2009 +0200
 >
 >     sched: Stop buddies from hogging the system

I really liked de69a80b, and it affecting hackbench shows I wasn't
crazy ;-)

So hackbench is a multi-cast, with one sender spraying multiple
receivers, who in their turn don't spray back.

This would be exactly the scenario that patch 'cures'. Previously
we would not clear the last buddy after running the next task,
allowing the sender to get back to work sooner than it otherwise
ought to have been, increasing latencies for other tasks.

Now, since those receivers don't poke back, they don't enforce the
buddy relation, which means there's nothing to re-elect the sender.

Cure this by less aggressively clearing the buddy stats. Only clear
buddies when they were not chosen. It should still avoid a buddy
sticking around long after it's served its time.
Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1255084986.8802.46.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

92f6a5e3

24 Sep, 2009 · 1 commit

sysctl: remove "struct file *" argument of ->proc_handler · 8d65af78

Authored by Alexey Dobriyan on Sep 23, 2009

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d65af78

21 Sep, 2009 · 1 commit

sched: Simplify sys_sched_rr_get_interval() system call · 0d721cea

Authored by Peter Williams on Sep 21, 2009

By removing the need for it to know details of scheduling classes.

This allows PlugSched to define orthogonal scheduling classes.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0d721cea

19 Sep, 2009 · 1 commit

sched: Re-add lost cpu_allowed check to sched_fair.c::select_task_rq_fair() · 3f04e8cd

Authored by Mike Galbraith on Sep 19, 2009

While doing some testing, I pinned mplayer, only to find it
following X around like a puppy. Looking at commit c88d5910, I found
a cpu_allowed check that went AWOL.  I plugged it back in where it
looks like it needs to go, and now when I say "sit, stay!", mplayer
obeys again.

'c88d5910 sched: Merge select_task_rq_fair() and
sched_balance_self()' accidentally dropped the check, causing
wake_affine() to pull pinned tasks - put it back.

[ v2: use a cheaper version from Peter ]
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3f04e8cd
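The kind of check this commit restores can be illustrated with a plain bitmask, assuming at most 64 cpus. The struct, the `uint64_t` mask, and the function names are hypothetical simplifications; the kernel uses its own cpumask type and helpers, and the real selection logic is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical task with an affinity mask: bit n set = may run on cpu n. */
struct task_sketch { uint64_t cpus_allowed; };

static int cpu_allowed(const struct task_sketch *p, unsigned int cpu)
{
    assert(cpu < 64);
    return (int)((p->cpus_allowed >> cpu) & 1);
}

/*
 * Affine wakeup sketch: pull the task to the waker's cpu only if its
 * affinity mask permits that cpu; otherwise leave it where it last ran.
 */
static unsigned int wake_target(const struct task_sketch *p,
                                unsigned int prev_cpu,
                                unsigned int waker_cpu)
{
    return cpu_allowed(p, waker_cpu) ? waker_cpu : prev_cpu;
}
```

Without the `cpu_allowed()` test, a task pinned to one cpu would follow its waker to another, which is the symptom the message describes with the pinned mplayer.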

18 Sep, 2009 · 1 commit

sched: Remove unneeded indentation in sched_fair.c::place_entity() · a2e7a7eb

Authored by Mike Galbraith on Sep 18, 2009

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253258365.22787.33.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a2e7a7eb

17 Sep, 2009 · 3 commits

sched: Fix SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL vs SD_WAKE_AFFINE · 29cd8bae

Authored by Peter Zijlstra on Sep 17, 2009

The SD_POWERSAVING_BALANCE|SD_PREFER_LOCAL code can break out of
the domain iteration early, making us miss the SD_WAKE_AFFINE bits.

Fix this by continuing iteration until there is no need for a
larger domain.

This also cleans up the cgroup stuff a bit, by not having two
update_shares() invocations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

29cd8bae

sched: Stop buddies from hogging the system · de69a80b

Authored by Peter Zijlstra on Sep 17, 2009

Clear buddies more aggressively.

The (theoretical, haven't actually observed any of this) problem is
that when we do not select either buddy in pick_next_entity()
because they are too far ahead of the left-most task, we do not
clear the buddies.

This means that as soon as we service the left-most task, these
same buddies will be tried again on the next schedule. Now if the
left-most task was a pure hog, it wouldn't have done any wakeups
and it wouldn't have set buddies of its own. That leads to the old
buddies dominating, which would lead to bad latencies.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

de69a80b

sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb

Authored by Peter Zijlstra on Sep 16, 2009

Create a new wakeup preemption mode, preempt towards tasks that run
shorter on avg. It sets next buddy to be sure we actually run the task
we preempted for.

Test results:

 root@twins:~# while :; do :; done &
 [1] 6537
 root@twins:~# while :; do :; done &
 [2] 6538
 root@twins:~# while :; do :; done &
 [3] 6539
 root@twins:~# while :; do :; done &
 [4] 6540

 root@twins:/home/peter# ./latt -c4 sleep 4
 Entries: 48 (clients=4)

 Averages:
 ------------------------------
        Max          4750 usec
        Avg           497 usec
        Stdev         737 usec

 root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features

 root@twins:/home/peter# ./latt -c4 sleep 4
 Entries: 48 (clients=4)

 Averages:
 ------------------------------
        Max            14 usec
        Avg             5 usec
        Stdev           3 usec

Disabled by default - needs more testing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <new-submission>

ad4b78bb

16 Sep, 2009 · 1 commit

sched: Rename flags to wake_flags · 5a9b86f6

Authored by Peter Zijlstra on Sep 16, 2009

For consistency's sake, rename the argument (again).
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5a9b86f6

openanolis / cloud-kernel · last synced successfully more than 1 year ago