Commits · 634fa8c97cc8f4ee2ae1dea7200ff0df762405e7 · openanolis / cloud-kernel

10 Jul, 2007 27 commits

sched: remove interactivity types · 634fa8c9

Committed by Ingo Molnar on Jul 09, 2007

remove the now-unused interactivity-heuristics related defines and
types of the old scheduler.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

634fa8c9

sched: clean up include files in sched.c · dff06c15

Committed by Ingo Molnar on Jul 09, 2007

clean up include files in sched.c, they were still old-style <asm/>.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

dff06c15

sched: update delay-accounting to use CFS's precise stats · 172ba844

Committed by Balbir Singh on Jul 09, 2007

update delay-accounting to use CFS's precise stats.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

172ba844

sched: turn on the use of unstable events · 1b9f19c2

Committed by Ingo Molnar on Jul 09, 2007

make use of sched-clock-unstable events.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1b9f19c2

sched: x86, track TSC-unstable events · bb29ab26

Committed by Ingo Molnar on Jul 09, 2007

track TSC-unstable events and propagate them to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

bb29ab26

sched: cfs core code · dd41f596

Committed by Ingo Molnar on Jul 09, 2007

apply the CFS core code.

this change switches over the scheduler core to CFS's modular
design and makes use of kernel/sched_fair/rt/idletask.c to implement
Linux's scheduling policies.

thanks to Andrew Morton and Thomas Gleixner for lots of detailed review
feedback and for fixlets.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

dd41f596
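
The modular design mentioned in the commit above can be pictured as a chain of policy classes that the core walks in priority order. The following is only a minimal userspace sketch of that idea: `struct sched_class_sketch`, `struct rq_sketch`, `has_task` and `pick_next_class()` are hypothetical names invented for illustration and are not the kernel's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified run queue: it only counts runnable tasks. */
struct rq_sketch {
    int nr_rt;    /* runnable SCHED_FIFO/SCHED_RR tasks */
    int nr_fair;  /* runnable SCHED_OTHER tasks */
};

/* A policy module: a name, one hook, and a link to the next-lower class. */
struct sched_class_sketch {
    const char *name;
    int (*has_task)(const struct rq_sketch *rq); /* nonzero if runnable */
    const struct sched_class_sketch *next;
};

static int rt_has_task(const struct rq_sketch *rq)   { return rq->nr_rt > 0; }
static int fair_has_task(const struct rq_sketch *rq) { return rq->nr_fair > 0; }
static int idle_has_task(const struct rq_sketch *rq) { (void)rq; return 1; }

/* Priority order: rt, then fair, then idle as the always-runnable fallback. */
static const struct sched_class_sketch idle_class = { "idle", idle_has_task, NULL };
static const struct sched_class_sketch fair_class = { "fair", fair_has_task, &idle_class };
static const struct sched_class_sketch rt_class   = { "rt",   rt_has_task,   &fair_class };

/* Core code: ask each class in turn; no policy-specific special cases here. */
static const struct sched_class_sketch *pick_next_class(const struct rq_sketch *rq)
{
    const struct sched_class_sketch *class;

    assert(rq != NULL);
    for (class = &rt_class; class != NULL; class = class->next)
        if (class->has_task(rq))
            return class;
    return NULL; /* not reached: the idle class always reports a task */
}
```

With an empty run queue the walk falls through to `idle_class`, which is one way to read the later remark that an idle scheduling class removes `if (p == rq->idle)` style special cases from the core.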

sched: remove the sleep-bonus interactivity code · f3479f10

Committed by Ingo Molnar on Jul 09, 2007

remove the sleep-bonus interactivity code from the core scheduler.

scheduling policy is implemented in the policy modules, and CFS does
not need this type of heuristics.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f3479f10

sched: remove expired_starving() · c18a1732

Committed by Ingo Molnar on Jul 09, 2007

remove the expired_starving() heuristics from the core scheduler.

CFS does not need it, and this did not really work well in practice
anyway, due to the rq->nr_running multiplier to STARVATION_LIMIT.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c18a1732

sched: remove sleep_type · f2ac58ee

Committed by Ingo Molnar on Jul 09, 2007

remove the sleep_type heuristics from the core scheduler - scheduling
policy is implemented in the scheduling-policy modules. (and CFS does
not use this type of sleep-type heuristics)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f2ac58ee

sched: cfs, add load-calculation methods · 45bf76df

Committed by Ingo Molnar on Jul 09, 2007

add the new load-calculation methods of CFS.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

45bf76df

sched: clean up __normal_prio() position · 14531189

Committed by Ingo Molnar on Jul 09, 2007

clean up: move __normal_prio() ahead of normal_prio().

no code changed.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

14531189

sched: cleanup: move dequeue/enqueue_task() · 71f8bd46

Committed by Ingo Molnar on Jul 09, 2007

cleanup: move dequeue/enqueue_task() to a more logical place, to
not split up __normal_prio()/normal_prio().
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71f8bd46

sched: move around resched_task() · c24d20db

Committed by Ingo Molnar on Jul 09, 2007

move resched_task()/resched_cpu() into the 'public interfaces'
section of sched.c, for use by kernel/sched_fair/rt/idletask.c
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c24d20db

sched: clean up the rt priority macros · e05606d3

Committed by Ingo Molnar on Jul 09, 2007

clean up the rt priority macros, pointed out by Andrew Morton.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e05606d3

sched: add cfs_rq ops · 138a8aeb

Committed by Ingo Molnar on Jul 09, 2007

add the set_task_cfs_rq() abstraction needed by CONFIG_FAIR_GROUP_SCHED.

(not activated yet)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

138a8aeb

sched: make posix-cpu-timers use CFS's accounting information · 41b86e9c

Committed by Ingo Molnar on Jul 09, 2007

update the posix-cpu-timers code to use CFS's CPU accounting information.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

41b86e9c

sched: add rq_clock()/__rq_clock() · 20d315d4

Committed by Ingo Molnar on Jul 09, 2007

add rq_clock()/__rq_clock(), a robust wrapper around sched_clock(),
used by CFS. It protects against common types of sched_clock() problems
(caused by hardware): time warps forwards and backwards.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

20d315d4

sched: cfs rq data types · 6aa645ea

Committed by Ingo Molnar on Jul 09, 2007

add the CFS rq data types to sched.c.

(the old scheduler fields are still intact, they are removed
 by a later patch)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6aa645ea

sched: cfs core, kernel/sched_idletask.c · fa72e9e4

Committed by Ingo Molnar on Jul 09, 2007

add kernel/sched_idletask.c - which implements the idle thread
scheduling class. This further simplifies sched.c (under CFS),
for example a number of 'if (p == rq->idle)' type of special-cases
can be removed from sched.c, and schedule() gets simpler too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fa72e9e4

sched: cfs core, kernel/sched_rt.c · bb44e5d1

Committed by Ingo Molnar on Jul 09, 2007

add kernel/sched_rt.c: SCHED_FIFO/SCHED_RR support. The behavior
and semantics of SCHED_FIFO/SCHED_RR tasks are unchanged.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

bb44e5d1

sched: cfs core, kernel/sched_fair.c · bf0f6f24

Committed by Ingo Molnar on Jul 09, 2007

add kernel/sched_fair.c - which implements the bulk of CFS's
behavioral changes for SCHED_OTHER tasks.

see Documentation/sched-design-CFS.txt for details.

Authors:

 Ingo Molnar <mingo@elte.hu>
 Dmitry Adamushko <dmitry.adamushko@gmail.com>
 Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
 Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

bf0f6f24

sched: move code into kernel/sched_stats.h · 425e0968

Committed by Ingo Molnar on Jul 09, 2007

create sched_stats.h and move sched.c schedstats code into it.
This cleans up sched.c a bit.

no code changes are caused by this patch.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

425e0968

sched: add init_idle_bootup_task() · 1df21055

Committed by Ingo Molnar on Jul 09, 2007

add the init_idle_bootup_task() callback to the bootup thread,
unused at the moment. (CFS will use it to switch the scheduling
class of the boot thread to the idle class)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1df21055

sched: remove sched_exit() · f64f6114

Committed by Ingo Molnar on Jul 09, 2007

remove sched_exit(): the elaborate dance of us trying to recover
timeslices given to child tasks never really worked.

CFS does not need it either.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f64f6114

sched: uninline set_task_cpu() · c65cc870

Committed by Ingo Molnar on Jul 09, 2007

uninline set_task_cpu(): CFS will add more code to it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c65cc870

sched: zap the migration init / cache-hot balancing code · 0437e109

Committed by Ingo Molnar on Jul 09, 2007

the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.

this code is fundamentally fragile: the boot-time migration cost detector
doesn't really work on systems that have large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty nondeterministic as well.

(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

under CFS the same purpose of cache affinity can be achieved without
any cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0437e109

sched: rename idle_type/SCHED_IDLE · d15bcfdb

Committed by Ingo Molnar on Jul 09, 2007

enum idle_type (used by the load-balancer) clashes with the
SCHED_IDLE name that we want to introduce. 'CPU_IDLE' instead
of 'SCHED_IDLE' is more descriptive as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d15bcfdb

04 Jul, 2007 1 commit

NTP: remove clock_was_set() call to prevent deadlock · 746976a3

Committed by Thomas Gleixner on Jul 03, 2007

The clock_was_set() call in seconds_overflow() which happens only when
leap seconds are inserted / deleted is wrong in two aspects:

1. it results in a call to on_each_cpu() with interrupts disabled
2. it is a potential deadlock source vs. call_lock in smp_call_function()

The only possible side effect of the removal might be that an absolute
CLOCK_REALTIME timer fires 1 second too late, in the rare case of leap
second deletion and an absolute CLOCK_REALTIME timer which expires in
the affected time frame. It will never fire too early.

This was probably observed by the reporter of a June 30th -> July 1st
hang: http://lkml.org/lkml/2007/7/3/103

A similar problem was observed by Dave Jones, who provided a screen shot
with a lockdep back trace, which made it possible to analyse the problem.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

746976a3

02 Jul, 2007 1 commit

PM: introduce set_target method in pm_ops · 2391dae3

Committed by Rafael J. Wysocki on Jul 01, 2007

Commit 52ade9b3 changed the suspend code
ordering to execute pm_ops->prepare() after the device model per-device
.suspend() calls in order to fix some ACPI-related issues.  Unfortunately, it
broke the at91 platform which assumed that pm_ops->prepare() would be called
before suspending devices.

at91 used pm_ops->prepare() to get notified of the target system sleep state,
so that it could use this information while suspending devices.  However, with
the current suspend code ordering pm_ops->prepare() is called too late for
this purpose.  Thus, at91 needs an additional method in 'struct pm_ops' that
will be used for notifying the platform of the target system sleep state.
Moreover, in the future such a method will also be needed by ACPI.

This patch adds the .set_target() method to 'struct pm_ops' and makes the
suspend code call it, if implemented, before executing the device model
per-device .suspend() calls.  It also modifies the at91 code to use
pm_ops->set_target() instead of pm_ops->prepare().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2391dae3
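
The call ordering described in the commit above (an optional set_target hook first, then the per-device suspend calls, then prepare) can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct pm_ops_sketch`, `enter_state_sketch()` and the `example_*` callbacks are hypothetical names, and the real `struct pm_ops` has more members and different types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the platform suspend operations. */
struct pm_ops_sketch {
    int (*set_target)(int state); /* optional: told the target state early */
    int (*prepare)(int state);    /* optional: runs after devices suspend */
};

/* Example platform state, in the spirit of the at91 use case. */
static int example_target = -1;   /* recorded by set_target */
static int seen_by_devices = -1;  /* what device suspend could observe */

static int example_set_target(int state)
{
    example_target = state;
    return 0;
}

static int example_suspend_devices(void)
{
    /* Device code can already rely on the target state here. */
    seen_by_devices = example_target;
    return 0;
}

/* Suspend path: set_target (if implemented), devices, then prepare. */
static int enter_state_sketch(const struct pm_ops_sketch *ops, int state,
                              int (*suspend_devices)(void))
{
    int error;

    assert(ops != NULL && suspend_devices != NULL);
    if (ops->set_target) {
        error = ops->set_target(state);
        if (error)
            return error;
    }
    error = suspend_devices();
    if (error)
        return error;
    if (ops->prepare)
        error = ops->prepare(state);
    return error;
}
```

Because the hook is a nullable function pointer, platforms that do not need early notification simply leave `set_target` unset and see no behavior change.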

29 Jun, 2007 2 commits

relayfs: fix overwrites · a66e356c

Committed by Masami Hiramatsu on Jun 27, 2007

When I use relayfs with "overwrite" mode, read() still sets an incorrect
number of consumed bytes.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Tom Zanussi <zanussi@us.ibm.com>
Acked-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a66e356c

relay file read: start-pos fix · 8d62fdeb

Committed by David Wilder on Jun 27, 2007

Fix a bug in the relay read interface causing the number of consumed bytes
to be set incorrectly.
Signed-off-by: Tom Zanussi <zanussi@us.ibm.com>
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8d62fdeb

25 Jun, 2007 1 commit

FUTEX: Restore the dropped ESRCH fix · a06381fe

Committed by Thomas Gleixner on Jun 23, 2007

The return value of futex_find_get_task() needs to be -ESRCH in case
the search fails.  This was part of the original futex fixes and
got accidentally dropped when the futex-tidy-up patch was split out.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Stable Team <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a06381fe
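
A lookup that reports "no such process" through its pointer return value can be sketched as below. This is a userspace illustration only: `err_ptr()`, `is_err()` and `ptr_err()` are local imitations of the kernel's error-pointer idiom, and `find_get_task_sketch()` with its tiny task table is hypothetical, not the body of futex_find_get_task().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in a pointer value, and decode it again. */
static void *err_ptr(long error)    { return (void *)(intptr_t)error; }
static long ptr_err(const void *p)  { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    /* Error codes occupy the top MAX_ERRNO_SKETCH addresses. */
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct task_sketch {
    int pid;
};

/* Hypothetical task table standing in for the real process list. */
static struct task_sketch tasks[] = { { 100 }, { 200 } };

/* Return the matching task, or an encoded -ESRCH if the search fails. */
static struct task_sketch *find_get_task_sketch(int pid)
{
    size_t i;

    for (i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
        if (tasks[i].pid == pid)
            return &tasks[i];
    return (struct task_sketch *)err_ptr(-ESRCH);
}
```

A caller would check `is_err()` before dereferencing and propagate `ptr_err()` as the error code; returning NULL or a different errno instead would change what user space observes.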

24 Jun, 2007 3 commits

audit: fix oops removing watch if audit disabled · 7b018b28

Committed by Tony Jones on Jun 23, 2007

Removing a watched file will oops if audit is disabled (auditctl -e 0).

To reproduce:
- auditctl -e 1
- touch /tmp/foo
- auditctl -w /tmp/foo
- auditctl -e 0
- rm /tmp/foo (or mv)
Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b018b28

sched: fix next_interval determination in idle_balance() · 92c4ca5c

Committed by Christoph Lameter on Jun 23, 2007

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.  Otherwise
we may defer rebalancing forever.

Siddha also spotted that the conversion of the balance interval
to jiffies is missing. Fix that too.

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

It did in fact trigger under all three of mainline, CFS, and -rt including CFS
-- see below for a couple of emails from last Friday giving results for these
three on the AMD box (where it happened) and on a single-quad NUMA-Q system
(where it did not, at least not with such severity).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

92c4ca5c
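
The three points in the commit above (count every load-balanced domain, convert the interval to jiffies, skip domains without SD_LOAD_BALANCE) can be sketched as a small userspace function. All names here are hypothetical stand-ins: `domain_sketch`, the `*_SKETCH` flag values, `HZ_SKETCH` and `next_balance_sketch()` are assumptions for illustration, not the kernel's idle_balance().

```c
#include <assert.h>
#include <stddef.h>

#define SD_LOAD_BALANCE_SKETCH    0x1u
#define SD_BALANCE_NEWIDLE_SKETCH 0x2u
#define HZ_SKETCH                 250UL /* assumed tick rate */

struct domain_sketch {
    unsigned int flags;
    unsigned long last_balance;        /* in jiffies */
    unsigned long balance_interval_ms; /* in milliseconds */
};

/* Convert milliseconds to jiffies, rounding up. */
static unsigned long ms_to_jiffies_sketch(unsigned long ms)
{
    return (ms * HZ_SKETCH + 999UL) / 1000UL;
}

/* Earliest next-balance time over all domains that balance at all. */
static unsigned long next_balance_sketch(const struct domain_sketch *sd,
                                         size_t n, unsigned long start)
{
    unsigned long next = start;
    size_t i;

    assert(sd != NULL || n == 0);
    for (i = 0; i < n; i++) {
        unsigned long candidate;

        if (!(sd[i].flags & SD_LOAD_BALANCE_SKETCH))
            continue; /* this domain never balances */

        /* Deliberately no NEWIDLE check: every balanced domain counts. */
        candidate = sd[i].last_balance +
                    ms_to_jiffies_sketch(sd[i].balance_interval_ms);
        if ((long)(candidate - next) < 0) /* wrap-safe "is earlier" */
            next = candidate;
    }
    return next;
}
```

If the function only looked at NEWIDLE domains, the second domain in a setup like the one used to exercise this sketch (load-balanced but without NEWIDLE) would never pull the next-balance time forward, which is the deferral the commit describes.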

fix refcounting of nsproxy object when unshared · 4e71e474

Committed by Cedric Le Goater on Jun 23, 2007

When a namespace is unshared, a refcount on the previous nsproxy is
erroneously taken, leading to a memory leak of nsproxy objects.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4e71e474

22 Jun, 2007 1 commit

posix-timers: Prevent softirq starvation by small intervals and SIG_IGN · 58229a18

Committed by Thomas Gleixner on Jun 21, 2007

posix-timers which deliver an ignored signal are currently rearmed in
the timer softirq: This is necessary because the timer needs to be
delivered again when SIG_IGN is removed. This is not a problem, when
the interval is reasonable.

With high resolution timers enabled one might arm a posix timer with a
very small interval and ignore the signal. This might lead to a
softirq starvation when the interval is so small that the timer is
requeued onto the softirq pending list right away.

This problem was pointed out by Jan Kiszka. Thanks Jan!

The correct solution would be to stop the timer, when the signal is
ignored and rearm it when SIG_IGN is removed. Unfortunately this
requires modification in sigaction and involves non trivial sighand
locking. It's too late in the release cycle for such a change.

For now we just keep the timer running and enforce that the timer only
fires every jiffy. This does not break anything as we keep the
overrun counter correct. It adds a little inaccuracy to the
timer_gettime() interface, but...

The more complex change is necessary anyway to fix another shortcoming
of the current implementation, which I discovered while looking
at this problem: A pending signal is discarded when SIG_IGN is set. If
a posix timer signal is pending, it is discarded as well,
but when SIG_IGN is removed later nothing rearms the timer. This is
not new, it has been that way since posix timers were merged. So nothing
to worry about right now.

I have a working solution to fix all of this, but the impact is too
large for both stable and 2.6.22. I'm going to send it out for review
in the next few days.

This should go into 2.6.21.stable as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Stable Team <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

58229a18

19 Jun, 2007 4 commits

Fix possible runqueue lock starvation in wait_task_inactive() · fa490cfd

Committed by Linus Torvalds on Jun 18, 2007

Miklos Szeredi reported very long pauses (several seconds, sometimes
more) on his T60 (with a Core2Duo) which he managed to track down to
wait_task_inactive()'s open-coded busy-loop.

He observed that an interrupt on one core tries to acquire the
runqueue-lock but does not succeed in doing so for a very long time -
while wait_task_inactive() on the other core loops waiting for the first
core to deschedule a task (which it won't do while spinning in an
interrupt handler).

This rewrites wait_task_inactive() to do all its waiting optimistically
without any locks taken at all, and then just double-check the end
result with the proper runqueue lock held over just a very short
section.  If there were races in the optimistic wait, or a preemption
event scheduled the process away, we simply re-synchronize, and start
over.

So the code now looks like this:

	repeat:
		/* Unlocked, optimistic looping! */
		rq = task_rq(p);
		while (task_running(rq, p))
			cpu_relax();

		/* Get the *real* values */
		rq = task_rq_lock(p, &flags);
		running = task_running(rq, p);
		array = p->array;
		task_rq_unlock(rq, &flags);

		/* Check them.. */
		if (unlikely(running)) {
			cpu_relax();
			goto repeat;
		}

		/* Preempted away? Yield if so.. */
		if (unlikely(array)) {
			yield();
			goto repeat;
		}

Basically, that first "while()" loop is done entirely without any
locking at all (and doesn't check for the case where the target process
might have been preempted away), and so it's possibly "incorrect", but
we don't really care.  Both the runqueue used, and the "task_running()"
check might be the wrong tests, but they won't oops - they just mean
that we could possibly get the wrong results due to lack of locking and
exit the loop early in the case of a race condition.

So once we've exited the loop, we then get the proper (and careful) rq
lock, and check the running/runnable state _safely_.  And if it turns
out that our quick-and-dirty and unsafe loop was wrong after all, we
just go back and try it all again.

(The patch also adds a lot of comments, which is the actual bulk of it
all, to make it more obvious why we can do these things without holding
the locks).

Thanks to Miklos for all the testing and tracking it down.
Tested-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fa490cfd

sched: fix SysRq-N (normalize RT tasks) · a0f98a1c

Committed by Ingo Molnar on Jun 17, 2007

Gene Heskett reported the following problem while testing CFS: SysRq-N
is not always effective in normalizing tasks back to SCHED_OTHER.

The reason for that turns out to be the following bug:

 - normalize_rt_tasks() uses for_each_process() to iterate through all
   tasks in the system.  The problem is, this method does not iterate
   through all tasks, it iterates through all thread groups.

The proper mechanism to enumerate over all threads is to use a
do_each_thread() + while_each_thread() loop.
Reported-by: Gene Heskett <gene.heskett@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a0f98a1c
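
The difference between walking thread groups and walking every thread can be shown with a small userspace model. It is a sketch under assumptions: `thread_sketch`, the two list links and both `normalize_*` functions are hypothetical and only mimic the shape of the for_each_process() versus do_each_thread()/while_each_thread() distinction.

```c
#include <assert.h>
#include <stddef.h>

enum { POLICY_OTHER_SKETCH = 0, POLICY_FIFO_SKETCH = 1 };

struct thread_sketch {
    int policy;
    struct thread_sketch *next_in_group; /* circular list within a group */
    struct thread_sketch *next_leader;   /* NULL-terminated, leaders only */
};

/* Reset one thread to the default policy; return 1 if it changed. */
static int normalize_one(struct thread_sketch *t)
{
    if (t->policy == POLICY_OTHER_SKETCH)
        return 0;
    t->policy = POLICY_OTHER_SKETCH;
    return 1;
}

/* Buggy shape: visits one thread (the leader) per thread group. */
static int normalize_leaders_only(struct thread_sketch *first_leader)
{
    struct thread_sketch *g;
    int n = 0;

    for (g = first_leader; g != NULL; g = g->next_leader)
        n += normalize_one(g);
    return n;
}

/* Fixed shape: for each group, also walk all threads in that group. */
static int normalize_all_threads(struct thread_sketch *first_leader)
{
    struct thread_sketch *g, *t;
    int n = 0;

    for (g = first_leader; g != NULL; g = g->next_leader) {
        t = g;
        do {
            n += normalize_one(t);
            t = t->next_in_group;
        } while (t != g);
    }
    return n;
}
```

In a two-thread group where both threads are real-time, the leaders-only walk leaves the second thread untouched, which matches the reported symptom that SysRq-N was "not always effective".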

Fix signalfd interaction with thread-private signals · caec4e8d

Committed by Benjamin Herrenschmidt on Jun 12, 2007

Don't let signalfd dequeue private signals off other threads (in the
case of things like SIGILL or SIGSEGV, trying to do so would result
in undefined behaviour on who actually gets the signal, since they
are force unblocked).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

caec4e8d

Revert "futex_requeue_pi optimization" · bd197234

Committed by Thomas Gleixner on Jun 17, 2007

This reverts commit d0aa7a70.

It not only introduced user space visible changes to the futex syscall,
it is also non-functional and there is no way to fix it properly before
the 2.6.22 release.

The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
unanswered, and unfortunately it turned out that the concept is not
feasible at all.  It violates the rtmutex semantics badly by introducing
a virtual owner, which hacks around the coupling of the user-space
pi_futex and the kernel internal rt_mutex representation.

At the moment the only safe option is to remove it fully as it contains
user-space visible changes to broken kernel code, which we do not want
to expose in the 2.6.22 release.

The patch reverts the original patch mostly 1:1, but contains a couple
of trivial manual cleanups which were necessary due to patches, which
touched the same area of code later.

Verified against the glibc tests and my own PI futex tests.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Pierre Peiffer <pierre.peiffer@bull.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bd197234
