提交 · a8a51d5e59561aa5b4d66e19eca819b537783e8f · OpenHarmony / kernel_linux

27 6月, 2008 12 次提交

sched: persistent average load per task · a8a51d5e

由 Peter Zijlstra 提交于 6月 27, 2008

Remove the fall-back to SCHED_LOAD_SCALE by remembering the previous value of
cpu_avg_load_per_task() - this is useful because of the hierarchical group
model in which task weight can be much smaller.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a8a51d5e

sched: fix sched_balance_self() smp group balancing · 039a1c41

由 Peter Zijlstra 提交于 6月 27, 2008

Finding the least idle cpu is more accurate when done with updated shares.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

039a1c41

sched: fix newidle smp group balancing · 3e5459b4

由 Peter Zijlstra 提交于 6月 27, 2008

Re-compute the shares on newidle - so we can make a decision based on
recent data.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3e5459b4

sched: simplify the group load balancer · c8cba857

由 Peter Zijlstra 提交于 6月 27, 2008

While thinking about the previous patch - I realized that using per domain
aggregate load values in load_balance_fair() is wrong. We should use the
load value for that CPU.

By not needing per domain hierarchical load values we don't need to store
per domain aggregate shares, which greatly simplifies all the math.

It basically falls apart in two separate computations:
 - per domain update of the shares
 - per CPU update of the hierarchical load

Also get rid of the move_group_shares() stuff - just re-compute the shares
again after a successful load balance.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c8cba857

sched: no need to aggregate task_weight · a25b5aca

由 Peter Zijlstra 提交于 6月 27, 2008

We only need to know the task_weight of the busiest rq - nothing to do
if there are no tasks there.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a25b5aca

sched: dont micro manage share losses · d3f40dba

由 Peter Zijlstra 提交于 6月 27, 2008

We used to try and contain the loss of 'shares' by playing arithmetic
games. Replace that by noticing that at the top sched_domain we'll
always have the full weight in shares to distribute.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d3f40dba

sched: update aggregate when holding the RQs · 4d8d595d

由 Peter Zijlstra 提交于 6月 27, 2008

It was observed that in __update_group_shares_cpu()

  rq_weight > aggregate()->rq_weight

This is caused by forks/wakeups in between the initial aggregate pass and
locking of the RQs for load balance. To avoid this situation partially re-do
the aggregation once we have the RQs locked (which avoids new tasks from
appearing).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4d8d595d

sched: fix sched_domain aggregation · b6a86c74

由 Peter Zijlstra 提交于 6月 27, 2008

Keeping the aggregate on the first cpu of the sched domain has two problems:
 - it could collide between different sched domains on different cpus
 - it could slow things down because of the remote accesses
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b6a86c74

sched: fix wakeup granularity and buddy granularity · 103638d9

由 Peter Zijlstra 提交于 6月 27, 2008

Uncouple buddy selection from wakeup granularity.

The initial idea was that buddies could run ahead as far as a normal task
can - do this by measuring a pair 'slice' just as we do for a normal task.

This means we can drop the wakeup_granularity back to 5ms.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

103638d9

sched: sched_clock_cpu() based cpu_clock() · 76a2a6ee

由 Peter Zijlstra 提交于 6月 27, 2008

with sched_clock_cpu() being reasonably in sync between cpus (max 1 jiffy
difference) use this to provide cpu_clock().
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

76a2a6ee

sched: revert revert of: fair-group: SMP-nice for group scheduling · c09595f6

由 Peter Zijlstra 提交于 6月 27, 2008

Try again..

Initial commit: 18d95a28
Revert: 6363ca57Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c09595f6

sched: revert the revert of: weight calculations · a7be37ac

由 Peter Zijlstra 提交于 6月 27, 2008

Try again..

initial commit: 8f1bc385
revert: f9305d4aSigned-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a7be37ac

20 6月, 2008 3 次提交

sched: refactor wait_for_completion_timeout() · ea71a546

由 Oleg Nesterov 提交于 6月 20, 2008

Simplify the code and fix the boundary condition of
wait_for_completion_timeout(,0).

We can kill the first __remove_wait_queue() as well.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

ea71a546

sched: fix wait_for_completion_timeout() spurious failure under heavy load · bb10ed09

由 Roland Dreier 提交于 6月 19, 2008

It seems that the current implementaton of wait_for_completion_timeout()
has a small problem under very high load for the common pattern:

	if (!wait_for_completion_timeout(&done, timeout))
		/* handle failure */

because the implementation very roughly does (lots of code deleted to
show the basic flow):

	static inline long __sched
	do_wait_for_common(struct completion *x, long timeout, int state)
	{
		if (x->done)
			return timeout;

		do {
			timeout = schedule_timeout(timeout);

			if (!timeout)
				return timeout;

		} while (!x->done);

		return timeout;
	}

so if the system is very busy and x->done is not set when
do_wait_for_common() is entered, it is possible that the first call to
schedule_timeout() returns 0 because the task doing wait_for_completion
doesn't get rescheduled for a long time, even if it is woken up early
enough.

In this case, wait_for_completion_timeout() returns 0 without even
checking x->done again, and the code above falls into its failure case
purely for scheduler reasons, even if the hardware event or whatever was
being waited for happened early enough.

It would make sense to add an extra test to do_wait_for() in the timeout
case and return 1 if x->done is actually set.

A quick audit (not exhaustive) of wait_for_completion_timeout() callers
seems to indicate that no one actually cares about the return value in
the success case -- they just test for 0 (timed out) versus non-zero
(wait succeeded).
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bb10ed09

sched: rt: fix the bandwidth contraint computations · 10b612f4

由 Peter Zijlstra 提交于 6月 19, 2008

Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Daniel K." <dk@uw.no>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

10b612f4

19 6月, 2008 4 次提交

cpuset: limit the input of cpuset.sched_relax_domain_level · 30e0e178

由 Li Zefan 提交于 5月 13, 2008

We allow the inputs to be [-1 ... SD_LV_MAX), and return -EINVAL
for inputs outside this range.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Acked-by: NPaul Jackson <pj@sgi.com>
Acked-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

30e0e178

sched: CPU hotplug events must not destroy scheduler domains created by the cpusets · f18f982a

由 Max Krasnyansky 提交于 5月 29, 2008

First issue is not related to the cpusets. We're simply leaking doms_cur.
It's allocated in arch_init_sched_domains() which is called for every
hotplug event. So we just keep reallocation doms_cur without freeing it.
I introduced free_sched_domains() function that cleans things up.

Second issue is that sched domains created by the cpusets are
completely destroyed by the CPU hotplug events. For all CPU hotplug
events scheduler attaches all CPUs to the NULL domain and then puts
them all into the single domain thereby destroying domains created
by the cpusets (partition_sched_domains).
The solution is simple, when cpusets are enabled scheduler should not
create default domain and instead let cpusets do that. Which is
exactly what the patch does.
Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: menage@google.com
Cc: rostedt@goodmis.org
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f18f982a

sched: rt-group: fix hierarchy · 7ea56616

由 Peter Zijlstra 提交于 6月 19, 2008

Don't re-set the entity's runqueue to the wrong rq after we've set it
to the right one.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: NDaniel K. <dk@uw.no>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7ea56616

sched: NULL pointer dereference while setting sched_rt_period_us · 49307fd6

由 Dario Faggioli 提交于 6月 18, 2008

When CONFIG_RT_GROUP_SCHED and CONFIG_CGROUP_SCHED are enabled, with:

 echo 10000 > /proc/sys/kernel/sched_rt_period_us

We get this:

 BUG: unable to handle kernel NULL pointer dereference at 0000008c
 [  947.682233] IP: [<c0216b72>] __rt_schedulable+0x12/0x160
 [  947.683123] *pde = 00000000=20
 [  947.683782] Oops: 0000 [#1]
 [  947.684307] Modules linked in:
 [  947.684308]
 [  947.684308] Pid: 2359, comm: bash Not tainted (2.6.26-rc6 #8)
 [  947.684308] EIP: 0060:[<c0216b72>] EFLAGS: 00000246 CPU: 0
 [  947.684308] EIP is at __rt_schedulable+0x12/0x160
 [  947.684308] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000001
 [  947.684308] ESI: c0521db4 EDI: 00000001 EBP: c6cc9f00 ESP: c6cc9ed0
 [  947.684308]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
 [  947.684308] Process bash (pid: 2359, tiÆcc8000 taskÇa54f00=20 task.tiÆcc8000)
 [  947.684308] Stack: c0222790 00000000 080f8c08 c0521db4 c6cc9f00 00000001 00000000 00000000
 [  947.684308]        c6cc9f9c 00000000 c0521db4 00000001 c6cc9f28 c0216d40 00000000 00000000
 [  947.684308]        c6cc9f9c 000f4240 000e7ef0 ffffffff c0521db4 c79dfb60 c6cc9f58 c02af2cc
 [  947.684308] Call Trace:
 [  947.684308]  [<c0222790>] ? do_proc_dointvec_conv+0x0/0x50
 [  947.684308]  [<c0216d40>] ? sched_rt_handler+0x80/0x110
 [  947.684308]  [<c02af2cc>] ? proc_sys_call_handler+0x9c/0xb0
 [  947.684308]  [<c02af2fa>] ? proc_sys_write+0x1a/0x20
 [  947.684308]  [<c0273c36>] ? vfs_write+0x96/0x160
 [  947.684308]  [<c02af2e0>] ? proc_sys_write+0x0/0x20
 [  947.684308]  [<c027423d>] ? sys_write+0x3d/0x70
 [  947.684308]  [<c0202ef5>] ? sysenter_past_esp+0x6a/0x91
 [  947.684308]  =======================
 [  947.684308] Code: 24 04 e8 62 b1 0e 00 89 c7 89 f8 8b 5d f4 8b 75
 f8 8b 7d fc 89 ec 5d c3 90 55 89 e5 57 56 53 83 ec 24 89 45 ec 89 55 e4
 89 4d e8 <8b> b8 8c 00 00 00 85 ff 0f 84 c9 00 00 00 8b 57 24 39 55 e8
 8b
 [  947.684308] EIP: [<c0216b72>] __rt_schedulable+0x12/0x160 SS:ESP  0068:c6cc9ed0

We think the following patch solves the issue.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NMichael Trimarchi <trimarchimichael@yahoo.it>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

49307fd6

18 6月, 2008 1 次提交

sched: rework of "prioritize non-migratable tasks over migratable ones" · 20b6331b

由 Dmitry Adamushko 提交于 6月 11, 2008

regarding this commit: 45c01e82

I think we can do it simpler. Please take a look at the patch below.

Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and
twice so on x86_64), let's add "exclusive" (the ones that are bound to
this CPU) tasks to the head of the queue and "shared" ones -- to the
end.

In case of a few newly woken up "exclusive" tasks, they are 'stacked'
(not queued as now), meaning that a task {i+1} is being placed in front
of the previously woken up task {i}. But I don't think that this
behavior may cause any realistic problems.

There are a couple of changes on top of this one.

(1) in check_preempt_curr_rt()

I don't think there is a need for the "pick_next_rt_entity(rq, &rq->rt)
!= &rq->curr->rt" check.

enqueue_task_rt(p) and check_preempt_curr_rt() are always called one
after another with rq->lock being held so the following check
"p->rt.nr_cpus_allowed == 1 && rq->curr->rt.nr_cpus_allowed != 1" should
be enough (well, just its left part) to guarantee that 'p' has been
queued in front of the 'curr'.

(2) in set_cpus_allowed_rt()

I don't thinks there is a need for requeue_task_rt() here.

Perhaps, the only case when 'requeue' (+ reschedule) might be useful is
as follows:

i) weight == 1 && cpu_isset(task_cpu(p), *new_mask)

i.e. a task is being bound to this CPU);

ii) 'p' != rq->curr

but here, 'p' has already been on this CPU for a while and was not
migrated. i.e. it's possible that 'rq->curr' would not have high chances
to be migrated right at this particular moment (although, has chance in
a bit longer term), should we allow it to be preempted.

Anyway, I think we should not perhaps make it more complex trying to
address some rare corner cases. For instance, that's why a single queue
approach would be preferable. Unless I'm missing something obvious, this
approach gives us similar functionality at lower cost.

Verified only compilation-wise.

(Almost)-Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

20b6331b

17 6月, 2008 1 次提交

sched: fix defined-but-unused warning · 95e904c7

由 Rabin Vincent 提交于 5月 11, 2008

Fix this warning, which appears with !CONFIG_SMP:
kernel/sched.c:1216: warning: `init_hrtick' defined but not used
Signed-off-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

95e904c7

12 6月, 2008 2 次提交

sched: 64-bit: fix arithmetics overflow · 7a232e03

由 Lai Jiangshan 提交于 6月 12, 2008

(overflow means weight >= 2^32 here, because inv_weigh = 2^32/weight)

A weight of a cfs_rq is the sum of weights of which entities
are queued on this cfs_rq, so it will overflow when there are
too many entities.

Although, overflow occurs very rarely, but it break fairness when
it occurs. 64-bits systems have more memory than 32-bit systems
and 64-bit systems can create more process usually, so overflow may
occur more frequently.

This patch guarantees fairness when overflow happens on 64-bit systems.
Thanks to the optimization of compiler, it changes nothing on 32-bit.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7a232e03

sched: fair group: fix overflow(was: fix divide by zero) · 2e084786

由 Lai Jiangshan 提交于 6月 12, 2008

I found a bug which can be reproduced by this way:(linux-2.6.26-rc5, x86-64)
(use 2^32, 2^33, ...., 2^63 as shares value)

# mkdir /dev/cpuctl
# mount -t cgroup -o cpu cpuctl /dev/cpuctl
# cd /dev/cpuctl
# mkdir sub
# echo 0x8000000000000000 > sub/cpu.shares
# echo $$ > sub/tasks
oops here! divide by zero.

This is because do_div() expects the 2th parameter to be 32 bits,
but unsigned long is 64 bits in x86_64.

Peter Zijstra pointed it out that the sane thing to do is limit the
shares value to something smaller instead of using an even more
expensive divide.

Also, I found another bug about "the shares value is too large":

pid1 and pid2 are set affinity to cpu#0
pid1 is attached to cg1 and pid2 is attached to cg2

if cg1/cpu.shares = 1024 cg2/cpu.shares = 2000000000
then pid2 got 100% usage of cpu, and pid1 0%

if cg1/cpu.shares = 1024 cg2/cpu.shares = 20000000000
then pid2 got 0% usage of cpu, and pid1 100%

And a weight of a cfs_rq is the sum of weights of which entities
are queued on this cfs_rq, so the shares value should be limited
to a smaller value.

I think that (1UL << 18) is a good limited value:

1) it's not too large, we can create a lot of group before overflow
2) it's several times the weight value for nice=-19 (not too small)
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e084786

10 6月, 2008 4 次提交

sched: kill off dead cfs_rq_set_shares() · e9886ca3

由 Paul Mundt 提交于 6月 09, 2008

Building with CONFIG_FAIR_GROUP_SCHED=y on UP results in an unused
cfs_rq_set_shares() reference. As nothing is using this dummy function
in the first place, just kill it off.
Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e9886ca3

sched: prevent bound kthreads from changing cpus_allowed · 9985b0ba

由 David Rientjes 提交于 6月 05, 2008

Kthreads that have called kthread_bind() are bound to specific cpus, so
other tasks should not be able to change their cpus_allowed from under
them.  Otherwise, it is possible to move kthreads, such as the migration
or software watchdog threads, so they are not allowed access to the cpu
they work on.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9985b0ba

sched: fix hotplug cpus on ia64 · 7def2be1

由 Peter Zijlstra 提交于 6月 05, 2008

Cliff Wickman wrote:

> I built an ia64 kernel from Andrew's tree (2.6.26-rc2-mm1)
> and get a very predictable hotplug cpu problem.
> billberry1:/tmp/cpw # ./dis
> disabled cpu 17
> enabled cpu 17
> billberry1:/tmp/cpw # ./dis
> disabled cpu 17
> enabled cpu 17
> billberry1:/tmp/cpw # ./dis
>
> The script that disables the cpu always hangs (unkillable)
> on the 3rd attempt.
>
> And a bit further:
> The kstopmachine thread always sits on the run queue (real time) for about
> 30 minutes before running.

this fix solves some (but not all) issues between CPU hotplug and
RT bandwidth throttling.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7def2be1

sched: fix TASK_WAKEKILL vs SIGKILL race · 16882c1e

由 Oleg Nesterov 提交于 6月 08, 2008

schedule() has the special "TASK_INTERRUPTIBLE && signal_pending()" case,
this allows us to do

	current->state = TASK_INTERRUPTIBLE;
	schedule();

without fear to sleep with pending signal.

However, the code like

	current->state = TASK_KILLABLE;
	schedule();

is not right, schedule() doesn't take TASK_WAKEKILL into account. This means
that mutex_lock_killable(), wait_for_completion_killable(), down_killable(),
schedule_timeout_killable() can miss SIGKILL (and btw the second SIGKILL has
no effect).

Introduce the new helper, signal_pending_state(), and change schedule() to
use it. Hopefully it will have more users, that is why the task's state is
passed separately.

Note this "__TASK_STOPPED | __TASK_TRACED" check in signal_pending_state().
This is needed to preserve the current behaviour (ptrace_notify). I hope
this check will be removed soon, but this (afaics good) change needs the
separate discussion.

The fast path is "(state & (INTERRUPTIBLE | WAKEKILL)) + signal_pending(p)",
basically the same that schedule() does now. However, this patch of course
bloats schedule().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

16882c1e

06 6月, 2008 13 次提交

sched: move weighted_cpuload into #ifdef CONFIG_SMP section · e958b360

由 Thomas Gleixner 提交于 6月 04, 2008

weighted_cpuload is only used on SMP. move it into the CONFIG_SMP
section.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

e958b360

sched: Move cpu masks from kernel/sched.c into kernel/cpu.c · 68f4f1ec

由 Max Krasnyansky 提交于 5月 29, 2008

kernel/cpu.c seems a more logical place for those maps since they do not really
have much to do with the scheduler these days.

kernel/cpu.c is now built for the UP kernel too, but it does not affect the size
the kernel sections.

$ size vmlinux

before
   text       data        bss        dec        hex    filename
3313797     307060     310352    3931209     3bfc49    vmlinux

after
   text       data        bss        dec        hex    filename
3313797     307060     310352    3931209     3bfc49    vmlinux
Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: menage@google.com
Cc: rostedt@goodmis.org
Cc: mingo@elte.hu
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

68f4f1ec

sched: CPU hotplug events must not destroy scheduler domains created by the cpusets · 5c8e1ed1

由 Max Krasnyansky 提交于 5月 29, 2008

First issue is not related to the cpusets. We're simply leaking doms_cur.
It's allocated in arch_init_sched_domains() which is called for every
hotplug event. So we just keep reallocation doms_cur without freeing it.
I introduced free_sched_domains() function that cleans things up.

Second issue is that sched domains created by the cpusets are
completely destroyed by the CPU hotplug events. For all CPU hotplug
events scheduler attaches all CPUs to the NULL domain and then puts
them all into the single domain thereby destroying domains created
by the cpusets (partition_sched_domains).
The solution is simple, when cpusets are enabled scheduler should not
create default domain and instead let cpusets do that. Which is
exactly what the patch does.
Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
Cc: pj@sgi.com
Cc: menage@google.com
Cc: rostedt@goodmis.org
Cc: mingo@elte.hu
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5c8e1ed1

sched: fix cpupri hotplug support · 1f11eb6a

由 Gregory Haskins 提交于 6月 04, 2008

The RT folks over at RedHat found an issue w.r.t. hotplug support which
was traced to problems with the cpupri infrastructure in the scheduler:

https://bugzilla.redhat.com/show_bug.cgi?id=449676

This bug affects 23-rt12+, 24-rtX, 25-rtX, and sched-devel.  This patch
applies to 25.4-rt4, though it should trivially apply to most cpupri enabled
kernels mentioned above.

It turned out that the issue was that offline cpus could get inadvertently
registered with cpupri so that they were erroneously selected during
migration decisions.  The end result would be an OOPS as the offline cpu
had tasks routed to it.

This patch generalizes the old join/leave domain interface into an
online/offline interface, and adjusts the root-domain/hotplug code to
utilize it.

I was able to easily reproduce the issue prior to this patch, and am no
longer able to reproduce it after this patch.  I can offline cpus
indefinately and everything seems to be in working order.

Thanks to Arnaldo (acme), Thomas, and Peter for doing the legwork to point
me in the right direction.  Also thank you to Peter for reviewing the
early iterations of this patch.
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1f11eb6a

sched: print the sd->level in sched_domain_debug code · 099f98c8

由 Gautham R Shenoy 提交于 5月 29, 2008

While printing out the visual representation of the sched-domains, print
the level (MC, SMT, CPU, NODE, ... ) of each of the sched_domains.

Credit: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

099f98c8

sched: add comments for ifdefs in sched.c · 6d6bc0ad

由 Dhaval Giani 提交于 5月 30, 2008

make sched.c easier to read.
Signed-off-by: NDhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

6d6bc0ad

sched: print module list in the "scheduling while atomic" warning · e21f5b15

由 Arjan van de Ven 提交于 5月 23, 2008

For the normal WARN_ON() etc we added a print-the-modules-list already,
which is very useful to figure out candidates for certain types of bugs.

This patch adds the same print to the "scheduling while atomic" BUG warning,
for the same reason: when we get here it's very useful to see which modules
are loaded, to narrow down the candidate code list.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: mingo@elte.hu
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

e21f5b15

sched: fix defined-but-unused warning · 81d41d7e

由 Rabin Vincent 提交于 5月 11, 2008

Fix this warning, which appears with !CONFIG_SMP:
kernel/sched.c:1216: warning: `init_hrtick' defined but not used
Signed-off-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

81d41d7e

namespacecheck: fixes in kernel/sched.c · f7dcd80b

由 Thomas Gleixner 提交于 5月 24, 2008

Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f7dcd80b

sched: check for SD_SERIALIZE atomically in rebalance_domains() · d07355f5

由 Dmitry Adamushko 提交于 5月 12, 2008

Nothing really serious here, mainly just a matter of nit-picking :-/

From: Dmitry Adamushko <dmitry.adamushko@gmail.com>
For CONFIG_SCHED_DEBUG && CONFIG_SYSCT configs, sd->flags can be altered
while being manipulated in rebalance_domains(). Let's do an atomic check.
We rely here on the atomicity of read/write accesses for aligned words.
Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

d07355f5

sched: use a 2-d bitmap for searching lowest-pri CPU · 6e0534f2

由 Gregory Haskins 提交于 5月 12, 2008

The current code use a linear algorithm which causes scaling issues
on larger SMP machines.  This patch replaces that algorithm with a
2-dimensional bitmap to reduce latencies in the wake-up path.
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
Acked-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

6e0534f2

sched: make !hrtick faster · f333fdc9

由 Mike Galbraith 提交于 5月 12, 2008

it is safe to ignore timers and flags when the feature is disabled.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f333fdc9

sched: prioritize non-migratable tasks over migratable ones · 45c01e82

由 Gregory Haskins 提交于 5月 12, 2008

Dmitry Adamushko pointed out a known flaw in the rt-balancing algorithm
that could allow suboptimal balancing if a non-migratable task gets
queued behind a running migratable one.  It is discussed in this thread:

http://lkml.org/lkml/2008/4/22/296

This issue has been further exacerbated by a recent checkin to
sched-devel (git-id 5eee63a5ebc19a870ac40055c0be49457f3a89a3).

>From a pure priority standpoint, the run-queue is doing the "right"
thing. Using Dmitry's nomenclature, if T0 is on cpu1 first, and T1
wakes up at equal or lower priority (affined only to cpu1) later, it
*should* wait for T0 to finish.  However, in reality that is likely
suboptimal from a system perspective if there are other cores that
could allow T0 and T1 to run concurrently.  Since T1 can not migrate,
the only choice for higher concurrency is to try to move T0.  This is
not something we addessed in the recent rt-balancing re-work.

This patch tries to enhance the balancing algorithm by accomodating this
scenario.  It accomplishes this by incorporating the migratability of a
task into its priority calculation.  Within a numerical tsk->prio, a
non-migratable task is logically higher than a migratable one.  We
maintain this by introducing a new per-priority queue (xqueue, or
exclusive-queue) for holding non-migratable tasks.  The scheduler will
draw from the xqueue over the standard shared-queue (squeue) when
available.

There are several details for utilizing this properly.

1) During task-wake-up, we not only need to check if the priority
   preempts the current task, but we also need to check for this
   non-migratable condition.  Therefore, if a non-migratable task wakes
   up and sees an equal priority migratable task already running, it
   will attempt to preempt it *if* there is a likelyhood that the
   current task will find an immediate home.

2) Tasks only get this non-migratable "priority boost" on wake-up.  Any
   requeuing will result in the non-migratable task being queued to the
   end of the shared queue.  This is an attempt to prevent the system
   from being completely unfair to migratable tasks during things like
   SCHED_RR timeslicing.

I am sure this patch introduces potentially "odd" behavior if you
concoct a scenario where a bunch of non-migratable threads could starve
migratable ones given the right pattern.  I am not yet convinced that
this is a problem since we are talking about tasks of equal RT priority
anyway, and there never is much in the way of guarantees against
starvation under that scenario anyway. (e.g. you could come up with a
similar scenario with a specific timing environment verses an affinity
environment).  I can be convinced otherwise, but for now I think this is
"ok".
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

45c01e82

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多