Commits · 702a7c7609bec3a940b6a46b0d6ab9ce45274580 · xiphi1978 / linux

09 Dec, 2009 1 commit

sched: Protect sched_rr_get_param() access to task->sched_class · dba091b9

Authored by Thomas Gleixner on Dec 09, 2009

sched_rr_get_param calls
task->sched_class->get_rr_interval(task) without protection
against a concurrent sched_setscheduler() call which modifies
task->sched_class.

Serialize the access with task_rq_lock(task) and hand the rq
pointer into get_rr_interval() as it's needed at least in the
sched_fair implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
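
The race described in this commit can be illustrated with a small userspace analogue. This is only a sketch: the struct names, the pthread mutex standing in for the run queue lock, and the returned interval values are all hypothetical and are not the kernel's code. The point is that the reader dereferences the class pointer only while holding the same lock the writer takes when changing it.

```c
#include <pthread.h>

struct task;

/* Hypothetical, reduced stand-in for the kernel's struct sched_class. */
struct sched_class_sketch {
    unsigned int (*get_rr_interval)(struct task *t);
};

struct task {
    pthread_mutex_t lock;                        /* stands in for the rq lock */
    const struct sched_class_sketch *sched_class;
};

/* Made-up interval values, chosen only so the two classes differ. */
static unsigned int rr_interval_rt(struct task *t)   { (void)t; return 100; }
static unsigned int rr_interval_fair(struct task *t) { (void)t; return 0; }

static const struct sched_class_sketch rt_class   = { rr_interval_rt };
static const struct sched_class_sketch fair_class = { rr_interval_fair };

/* Writer: changes the class under the lock (analogue of sched_setscheduler()). */
static void set_class(struct task *t, const struct sched_class_sketch *c)
{
    pthread_mutex_lock(&t->lock);
    t->sched_class = c;
    pthread_mutex_unlock(&t->lock);
}

/* Reader: calls through the class pointer only while holding the same lock. */
static unsigned int get_interval(struct task *t)
{
    unsigned int v;

    pthread_mutex_lock(&t->lock);
    v = t->sched_class->get_rr_interval(t);
    pthread_mutex_unlock(&t->lock);
    return v;
}
```

Without the lock in get_interval(), a concurrent set_class() could swap the pointer between the load of t->sched_class and the call through it.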

21 Sep, 2009 1 commit

sched: Simplify sys_sched_rr_get_interval() system call · 0d721cea

Authored by Peter Williams on Sep 21, 2009

By removing the need for it to know details of scheduling classes.

This allows PlugSched to define orthogonal scheduling classes.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

15 Sep, 2009 3 commits

sched: Rename sync arguments · 7d478721

Authored by Peter Zijlstra on Sep 14, 2009

In order to extend the functions to have more than 1 flag (sync),
rename the argument to flags, and explicitly define a WF_ space for
individual flags.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
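
A dedicated flag space of this kind might look roughly like the sketch below. The WF_ prefix comes from the commit message; the numeric values, the second flag, and the helper function are assumptions made for illustration only.

```c
/* Illustrative values only; the commit message names the WF_ prefix,
 * not the numbers used here. */
#define WF_SYNC 0x01u   /* the former boolean "sync" argument */
#define WF_FORK 0x02u   /* hypothetical second flag, showing why a bitmask helps */

/* Hypothetical helper: callers test individual bits instead of a lone int. */
static int is_sync_wakeup(unsigned int wake_flags)
{
    return (wake_flags & WF_SYNC) != 0;
}
```

Passing a bitmask lets later patches add flags without changing every function signature again.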

sched: Rename select_task_rq() argument · 0763a660

Authored by Peter Zijlstra on Sep 14, 2009

In order to be able to rename the sync argument, we need to rename
the current flag argument.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Hook sched_balance_self() into sched_class::select_task_rq() · 5f3edc1b

Authored by Peter Zijlstra on Sep 10, 2009

Rather ugly patch to fully place the sched_balance_self() code
inside the fair class.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

15 May, 2009 1 commit

sched, timers: move calc_load() to scheduler · dce48a84

Authored by Thomas Gleixner on Apr 11, 2009

Dimitri Sivanich noticed that xtime_lock is held write locked across
calc_load() which iterates over all online CPUs. That can cause long
latencies for xtime_lock readers on large SMP systems. 

The load average calculation is a rough estimate anyway so there is
no real need to protect the readers vs. the update. It's not a problem
when the avenrun array is updated while a reader copies the values.

Instead of iterating over all online CPUs let the scheduler_tick code
update the number of active tasks shortly before the avenrun update
happens. The avenrun update itself is handled by the CPU which calls
do_timer().

[ Impact: reduce xtime_lock write locked section ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
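
For context, the avenrun update is an exponentially decaying average kept in fixed point. The sketch below shows that arithmetic in isolation; the constants are the values I recall from the kernel headers of that era and should be treated as assumptions, and the function name is made up for this example.

```c
#define FSHIFT   11                  /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 in fixed point */
#define EXP_1    1884UL              /* assumed decay factor for the 1-minute average */

/* One update step: new = old * exp + active * (1 - exp), all in fixed point.
 * "active" is the number of active tasks scaled by FIXED_1. */
static unsigned long calc_load_sketch(unsigned long load, unsigned long exp,
                                      unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}
```

Because each step only blends the previous value with the current count, a reader that sees a slightly stale avenrun entry still gets a usable estimate, which is why the commit can drop the reader protection.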

22 Oct, 2008 1 commit

sched: add CONFIG_SMP consistency · 4ce72a2c

Authored by Li Zefan on Oct 22, 2008

a patch from Henrik Austad did this:

>> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is
>> not set.

Peter observed:

> While a proper cleanup, could you do it by re-arranging the methods so
> as to not create an additional ifdef?

Do not declare select_task_rq and some other methods as part of sched_class
when CONFIG_SMP is not set.

Also gather those methods to avoid CONFIG_SMP mess.

Idea-by: Henrik Austad <henrik.austad@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

22 Sep, 2008 1 commit

sched: wakeup preempt when small overlap · 15afe09b

Authored by Peter Zijlstra on Sep 20, 2008

Lin Ming reported a 10% OLTP regression against 2.6.27-rc4.

The difference seems to come from different preemption aggressiveness,
which affects the cache footprint of the workload and its effective
cache thrashing.

Aggressively preempt a task if its avg overlap is very small; this should
avoid the task going to sleep and finding it still running when we schedule
back to it, saving a wakeup.

Reported-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

06 May, 2008 1 commit

sched: make rt_sched_class, idle_sched_class static · 2abdad0a

Authored by Harvey Harrison on Apr 25, 2008

The C files are included directly in sched.c, so they are
effectively static.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

26 Jan, 2008 3 commits

sched: high-res preemption tick · 8f4d37ec

Authored by Peter Zijlstra on Jan 25, 2008

Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick that runs at 1/HZ can be too coarse when nice
levels are used. The fairness system will still keep the CPU utilisation 'fair'
by then delaying the task that got an excessive amount of CPU time, but try to
minimize this by delivering preemption points spot-on.

The average frequency of this extra interrupt is sched_latency / nr_latency,
which need not be higher than 1/HZ; it's just that the distribution within the
sched_latency period is important.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: RT-balance, add new methods to sched_class · cb469845

Authored by Steven Rostedt on Jan 25, 2008

Dmitry Adamushko found that the current implementation of the RT
balancing code left out changes to the sched_setscheduler and
rt_mutex_setprio.

This patch addresses this issue by adding methods to the scheduling classes
to handle being switched out of (switched_from) and being switched into
(switched_to) a sched_class. A method for handling priority changes
is also added (prio_changed).

This patch also removes some duplicate logic between rt_mutex_setprio and
sched_setscheduler.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
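
One way to picture how the three hooks fit together is the userspace sketch below. The hook names (switched_from, switched_to, prio_changed) come from the commit message; the reduced structs, the event bookkeeping, and the dispatch function are hypothetical and only approximate what shared logic between sched_setscheduler() and rt_mutex_setprio() could look like.

```c
#include <stddef.h>

struct task_sketch;

/* Hypothetical, reduced stand-in for the kernel's struct sched_class. */
struct class_sketch {
    void (*switched_from)(struct task_sketch *p);   /* optional, may be NULL */
    void (*switched_to)(struct task_sketch *p);
    void (*prio_changed)(struct task_sketch *p, int oldprio);
};

struct task_sketch {
    const struct class_sketch *sched_class;
    int prio;
    int events;   /* demo bookkeeping: records which hooks ran */
};

enum { EV_FROM = 1, EV_TO = 2, EV_PRIO = 4 };

static void note_from(struct task_sketch *p) { p->events |= EV_FROM; }
static void note_to(struct task_sketch *p)   { p->events |= EV_TO; }
static void note_prio(struct task_sketch *p, int oldprio)
{
    (void)oldprio;
    p->events |= EV_PRIO;
}

static const struct class_sketch fair_sketch = { NULL, note_to, note_prio };
static const struct class_sketch rt_sketch   = { note_from, note_to, note_prio };

/* Apply a new class and priority, then notify the classes involved. */
static void set_class_and_prio(struct task_sketch *p,
                               const struct class_sketch *next, int prio)
{
    const struct class_sketch *prev = p->sched_class;
    int oldprio = p->prio;

    p->sched_class = next;
    p->prio = prio;

    if (prev != next) {
        if (prev->switched_from)
            prev->switched_from(p);      /* leaving the old class */
        next->switched_to(p);            /* entering the new class */
    } else {
        next->prio_changed(p, oldprio);  /* same class, new priority */
    }
}
```

Keeping this dispatch in one place is what allows both callers to share it instead of duplicating the class-specific checks.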

sched: de-SCHED_OTHER-ize the RT path · e7693a36

Authored by Gregory Haskins on Jan 25, 2008

The current wake-up code path tries to determine if it can optimize the
wake-up to "this_cpu" by computing load calculations.  The problem is that
these calculations are only relevant to SCHED_OTHER tasks where load is king.
For RT tasks, priority is king.  So the load calculation is completely wasted
bandwidth.

Therefore, we create a new sched_class interface to help with
pre-wakeup routing decisions and move the load calculation as a function
of CFS task's class.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

25 Oct, 2007 2 commits

sched: isolate SMP balancing code a bit more · 681f3e68

Authored by Peter Williams on Oct 24, 2007

At the moment, a lot of load balancing code that is irrelevant to non
SMP systems gets included during non SMP builds.

This patch addresses this issue and reduces the binary size on non
SMP systems:

   text    data     bss     dec     hex filename
  10983      28    1192   12203    2fab sched.o.before
  10739      28    1192   11959    2eb7 sched.o.after

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: reduce balance-tasks overhead · e1d1484f

Authored by Peter Williams on Oct 24, 2007

At the moment, balance_tasks() provides low level functionality for both
move_tasks() and move_one_task() (indirectly) via the load_balance()
function (in the sched_class interface) which also provides dual
functionality.  This dual functionality complicates the interfaces and
internal mechanisms and increases the run time overhead of operations that
are called with two run queue locks held.

This patch addresses this issue and reduces the overhead of these
operations.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

15 Oct, 2007 4 commits

sched: mark scheduling classes as const · 5522d5d5

Authored by Ingo Molnar on Oct 15, 2007

mark scheduling classes as const. This speeds up the code
a bit and shrinks it:

   text    data     bss     dec     hex filename
  40027    4018     292   44337    ad31 sched.o.before
  40190    3842     292   44324    ad24 sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

sched: revert recent removal of set_curr_task() · 83b699ed

Authored by Srivatsa Vaddagiri on Oct 15, 2007

Revert removal of set_curr_task.
Use put_prev_task/set_curr_task when changing groups/policies.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

sched: rework enqueue/dequeue_entity() to get rid of set_curr_task() · f6b53205

Authored by Dmitry Adamushko on Oct 15, 2007

rework enqueue/dequeue_entity() to get rid of 
sched_class::set_curr_task(). This simplifies sched_setscheduler(), 
rt_mutex_setprio() and sched_move_tasks().

   text    data     bss     dec     hex filename
  24330    2734      20   27084    69cc sched.o.before
  24233    2730      20   26983    6967 sched.o.after

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

sched: group-scheduler core · 29f59db3

Authored by Srivatsa Vaddagiri on Oct 15, 2007

Add interface to control cpu bandwidth allocation to task-groups.

(not yet configurable, due to missing CONFIG_CONTAINERS)

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

09 Aug, 2007 5 commits

sched: remove the 'u64 now' parameter from ->put_prev_task() · 31ee529c

Authored by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->put_prev_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the 'u64 now' parameter from ->pick_next_task() · fb8d4724

Authored by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->pick_next_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove the 'u64 now' parameter from ->dequeue_task() · f02231e5

Authored by Ingo Molnar on Aug 09, 2007

remove the 'u64 now' parameter from ->dequeue_task().

( identity transformation that causes no change in functionality. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix bug in balance_tasks() · a4ac01c3

Authored by Peter Williams on Aug 09, 2007

There are two problems with balance_tasks() and how it is used:

1. The variables best_prio and best_prio_seen (inherited from the old
move_tasks()) were only required to handle problems caused by the
active/expired arrays, the order in which they were processed and the
possibility that the task with the highest priority could be on either.
These issues are no longer present and the extra overhead associated
with their use is unnecessary (and possibly wrong).

2. In the absence of CONFIG_FAIR_GROUP_SCHED being set, the same
this_best_prio variable needs to be used by all scheduling classes or
there is a risk of moving too much load. E.g. if the highest priority
task on this_rq at the beginning is a fairly low priority task and the rt
class migrates a task (during its turn) then that moved task becomes the
new highest priority task on this_rq but when the sched_fair class
initializes its copy of this_best_prio it will get the priority of the
original highest priority task as, due to the run queue locks being
held, the reschedule triggered by pull_task() will not have taken place.
This could result in inappropriate overriding of skip_for_load and
excessive load being moved.

The attached patch addresses these problems by deleting all reference to
best_prio and best_prio_seen and making this_best_prio a reference
parameter to the various functions involved.

load_balance_fair() has also been modified so that this_best_prio is
only reset (in the loop) if CONFIG_FAIR_GROUP_SCHED is set. This should
preserve the effect of helping spread groups' higher priority tasks
around the available CPUs while improving system performance when
CONFIG_FAIR_GROUP_SCHED isn't set.

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: simplify move_tasks() · 43010659

Authored by Peter Williams on Aug 09, 2007

The move_tasks() function is currently multiplexed with two distinct
capabilities:

1. attempt to move a specified amount of weighted load from one run
queue to another; and
2. attempt to move a specified number of tasks from one run queue to
another.

The first of these capabilities is used in two places, load_balance()
and load_balance_idle(), and in both of these cases the return value of
move_tasks() is used purely to decide if tasks/load were moved and no
notice of the actual number of tasks moved is taken.

The second capability is used in exactly one place,
active_load_balance(), to attempt to move exactly one task and, as
before, the return value is only used as an indicator of success or failure.

This multiplexing of move_tasks() was introduced, by me, as part of the
smpnice patches and was motivated by the fact that the alternative, one
function to move specified load and one to move a single task, would
have led to two functions of roughly the same complexity as the old
move_tasks() (or the new balance_tasks()).  However, the new modular
design of the new CFS scheduler allows a simpler solution to be adopted
and this patch addresses that solution by:

1. adding a new function, move_one_task(), to be used by
active_load_balance(); and
2. making move_tasks() a single purpose function that tries to move a
specified weighted load and returns 1 for success and 0 for failure.

One of the consequences of these changes is that neither move_one_task()
nor the new move_tasks() cares how many tasks sched_class.load_balance()
moves and this enables its interface to be simplified by returning the
amount of load moved as its result and removing the load_moved pointer
from the argument list.  This helps simplify the new move_tasks() and
slightly reduces the amount of work done in each of
sched_class.load_balance()'s implementations.

Further simplification, e.g. changes to balance_tasks(), are possible
but (slightly) complicated by the special needs of load_balance_fair()
so I've left them to a later patch (if this one gets accepted).

NB Since move_tasks() gets called with two run queue locks held even
small reductions in overhead are worthwhile.

[ mingo@elte.hu ]

this change also reduces code size nicely:

   text    data     bss     dec     hex filename
  39216    3618      24   42858    a76a sched.o.before
  39173    3618      24   42815    a73f sched.o.after

Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
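
The simplified interface described in this commit can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's code: the reduced class struct, the made-up load figures, and the helper names are hypothetical. What it tries to show is the shape of the new contract: each class's load_balance() returns the amount of load it moved, and move_tasks() only reports success or failure.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for a scheduling class. */
struct lb_class {
    /* Returns the amount of weighted load actually moved (may be 0). */
    unsigned long (*load_balance)(unsigned long max_load_move);
    const struct lb_class *next;
};

/* Made-up amounts of movable load per class, for the demo only. */
static unsigned long rt_available = 300;
static unsigned long fair_available = 1000;

static unsigned long take(unsigned long *avail, unsigned long want)
{
    unsigned long moved = want < *avail ? want : *avail;

    *avail -= moved;
    return moved;
}

static unsigned long lb_rt(unsigned long max)   { return take(&rt_available, max); }
static unsigned long lb_fair(unsigned long max) { return take(&fair_available, max); }

static const struct lb_class fair_lb = { lb_fair, NULL };
static const struct lb_class rt_lb   = { lb_rt, &fair_lb };

/* Single purpose: try to move max_load_move of weighted load.
 * Returns 1 if any load was moved and 0 otherwise. */
static int move_tasks_sketch(unsigned long max_load_move)
{
    const struct lb_class *cls = &rt_lb;
    unsigned long total = 0;

    do {
        total += cls->load_balance(max_load_move - total);
        cls = cls->next;
    } while (cls && max_load_move > total);

    return total > 0;
}
```

Returning the moved load directly removes the need for a load_moved output pointer, which is the simplification the commit message describes.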

10 Jul, 2007 1 commit

sched: cfs core, kernel/sched_idletask.c · fa72e9e4

Authored by Ingo Molnar on Jul 09, 2007

add kernel/sched_idletask.c - which implements the idle thread
scheduling class. This further simplifies sched.c (under CFS),
for example a number of 'if (p == rq->idle)' type of special-cases
can be removed from sched.c, and schedule() gets simpler too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>