1. 20 September 2007 (6 commits)
  2. 16 September 2007 (5 commits)
    • clockevents: prevent stale tick update on offline cpu · 5e41d0d6
      Committed by Thomas Gleixner
      Taking a cpu offline removes the cpu from the online mask before the
      CPU_DEAD notification is done. The clock events layer does the cleanup
      of the dead CPU from the CPU_DEAD notifier chain. tick_do_timer_cpu is
      used to avoid xtime lock contention by assigning the task of jiffies
      xtime updates to one CPU. If a CPU is taken offline, then this
      assignment becomes stale. This went unnoticed because most of the time
      the offline CPU went dead before the online CPU reached __cpu_die(),
      where the CPU_DEAD state is checked. If the offline CPU has not reached the
      DEAD state by the time we reach __cpu_die(), the code there sleeps for 100ms
      while it waits. Because the jiffies update is still assigned to the now-dead
      CPU, jiffies never advance, the sleep never completes, and the system is
      stuck forever.
      
      Take the assignment away when a cpu is no longer in the cpu_online_mask.
      We do this in the last call to tick_nohz_stop_sched_tick(), when the offline
      CPU is on its way to the final play_dead() idle entry.
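
      A minimal sketch of the idea (assumed shape of the check near the top of
      tick_nohz_stop_sched_tick(); exact names, placement and the "unassigned"
      sentinel may differ):

        int cpu = smp_processor_id();

        /*
         * If this cpu is offline but still owns the jiffies update, drop the
         * assignment so an online cpu can take it over. Otherwise jiffies go
         * stale and the 100ms wait in __cpu_die() never makes progress.
         * (-1 is used here as an assumed "no cpu owns the update" value.)
         */
        if (unlikely(!cpu_online(cpu))) {
                if (cpu == tick_do_timer_cpu)
                        tick_do_timer_cpu = -1;
        }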
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: do not shutdown the oneshot broadcast device · 31d9b393
      Committed by Thomas Gleixner
      When a cpu goes offline it is removed from the broadcast masks. If the
      mask becomes empty, the code shuts down the broadcast device. This is
      wrong, because the broadcast device still needs to be ready for the
      remaining online cpus going idle (into a C-state, which stops the local
      APIC timer).
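
      A sketch of the shape of the fix (assumed; the dead-cpu cleanup simply stops
      shutting the device down when its broadcast mask runs empty):

        /* before: powering the broadcast device off once no cpu was left in the mask */
        if (cpus_empty(tick_broadcast_oneshot_mask))
                clockevents_set_mode(bc, CLOCK_EVT_MODE_SHUTDOWN);

        /* after: drop the shutdown; the remaining online cpus still need the
         * broadcast device as a backup when a C-state stops their local APIC timer */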
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clockevents: Enforce oneshot broadcast when broadcast mask is set on resume · 07eec6af
      Committed by Thomas Gleixner
      The jinxed VAIO refuses to resume without hitting keys on the keyboard
      when this is not enforced. It is unclear why the cpu ends up in a lower
      C State without notifying the clock events layer, but enforcing the
      oneshot broadcast here is safe.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timekeeping: Prevent time going backwards on resume · 6a669ee8
      Committed by Thomas Gleixner
      Timekeeping resume adjusts xtime by adding the slept time in seconds and
      resets the reference value of the clock source (clock->cycle_last).
      clock->cycle_last is used to calculate the delta between the last xtime
      update and the readout of the clock source in __get_nsec_offset(). xtime
      plus the offset is the current time. The resume code ignores the delta
      which had already elapsed between the last xtime update and the actual
      time of suspend. If the suspend time is short, then we can see time
      going backwards on resume.
      
      Suspend:
      offs_s = clock->read() - clock->cycle_last;
      now = xtime + offs_s;
      timekeeping_suspend_time = read_rtc();
      
      Resume:
      sleep_time = read_rtc() - timekeeping_suspend_time;
      xtime.tv_sec += sleep_time;
      clock->cycle_last = clock->read();
      offs_r = clock->read() - clock->cycle_last;
      now = xtime + offs_r;
      
      if sleep_time_seconds == 0 and offs_r < offs_s, then time goes
      backwards.
      
      Fix this by storing the offset from the last xtime update and adding it to
      xtime during resume, when we reset clock->cycle_last:
      
      sleep_time = read_rtc() - timekeeping_suspend_time;
      xtime.tv_sec += sleep_time;
      xtime += offs_s;	/* Fixup xtime offset at suspend time */
      clock->cycle_last = clock->read();
      offs_r = clock->read() - clock->cycle_last;
      now = xtime + offs_r;
      
      Thanks to Marcelo for tracking this down on the OLPC and providing the
      necessary details to analyze the root cause.
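
      In code terms the fix amounts to capturing the not-yet-accounted clocksource
      offset in the suspend path and folding it into xtime on resume; a sketch, with
      names assumed from the timekeeping code of that era:

        static unsigned long timekeeping_suspend_nsecs;

        /* suspend: remember the nanoseconds elapsed since the last xtime update */
        timekeeping_suspend_nsecs = __get_nsec_offset();

        /* resume: fold the saved offset into xtime before resetting cycle_last */
        timespec_add_ns(&xtime, timekeeping_suspend_nsecs);
        clock->cycle_last = clocksource_read(clock);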
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
    • timekeeping: access rtc outside of xtime lock · 3be90950
      Committed by Thomas Gleixner
      Lockdep complains about the RTC access in timekeeping_suspend() inside
      the interrupt-disabled region of the write-locked xtime lock. Move the
      access outside the lock.
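
      A sketch of the reordering (function and variable names assumed from the
      timekeeping code of that era; the point is only that the RTC read happens
      before the lock is taken):

        /* read the persistent clock first, outside any locking */
        timekeeping_suspend_time = read_persistent_clock();

        /* then take the write-locked, irq-off region for the state change */
        write_seqlock_irqsave(&xtime_lock, flags);
        timekeeping_suspended = 1;
        write_sequnlock_irqrestore(&xtime_lock, flags);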
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
  3. 12 September 2007 (3 commits)
    • Fix "no_sync_cmos_clock" logic inversion in kernel/time/ntp.c · 298a5df4
      Committed by Tony Breeds
      The logic is inverted: the sync timer only gets started on platforms
      that set no_sync_cmos_clock, i.e. exactly the ones that say they don't
      want it.
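
      The fix is the one-character inversion you would expect; a sketch of the
      intended logic (the surrounding function in kernel/time/ntp.c is assumed):

        static void notify_cmos_timer(void)
        {
                /* arm the CMOS-sync timer only when syncing is NOT disabled */
                if (!no_sync_cmos_clock)
                        mod_timer(&sync_cmos_timer, jiffies + 1);
        }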
      Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gabriel Paubert <paubert@iram.es>
      Cc: Zachary Amsden <zach@vmware.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Restore call_usermodehelper_pipe() behaviour · 3210f0ec
      Committed by Michael Ellerman
      The semantics of call_usermodehelper_pipe() used to be that it would fork
      the helper, and wait for the kernel thread to be started.  This was
      implemented by setting sub_info.wait to 0 (implicitly), and doing a
      wait_for_completion().
      
      As part of the cleanup done in 0ab4dc92,
      call_usermodehelper_pipe() was changed to pass 1 as the value for wait to
      call_usermodehelper_exec().
      
      This is equivalent to setting sub_info.wait to 1, which is a change from
      the previous behaviour.  Using 1 instead of 0 causes
      __call_usermodehelper() to start the kernel thread running
      wait_for_helper(), rather than directly calling ____call_usermodehelper().
      
      The end result is that the calling kernel code blocks until the user mode
      helper finishes.  As the helper is expecting input on stdin, and now no one
      is writing anything, everything locks up (observed in do_coredump).
      
      The fix is to change the 1 to UMH_WAIT_EXEC (aka 0), indicating that we
      want to wait for the kernel thread to be started, but not for the helper to
      finish.
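
      A sketch of the essence of the change in kernel/kmod.c (shape assumed; the
      commit text itself identifies UMH_WAIT_EXEC as the value 0):

        /* before: waited for the helper to finish, deadlocking on its stdin */
        return call_usermodehelper_exec(sub_info, 1);

        /* after: wait only until the kernel thread has started the helper */
        return call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);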
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Acked-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • futex_compat: fix list traversal bugs · 179c85ea
      Committed by Arnd Bergmann
      The futex list traversal on the compat side appears to have
      a bug.
      
      Its loop termination condition compares:
      
              while (compat_ptr(uentry) != &head->list)
      
      But that can't be right because "uentry" has the special
      "pi" indicator bit still potentially set at bit 0.  This
      is cleared by fetch_robust_entry() into the "entry"
      return value.
      
      What this seems to mean is that the list won't terminate
      when list iteration gets back to the head.  And we'll
      also process the list head like a normal entry, which could
      cause all kinds of problems.
      
      So we should check for equality with "entry".  That pointer
      is of the non-compat type so we have to do a little casting
      to keep the compiler and sparse happy.
      
      The same problem can in theory occur with the 'pending'
      variable, although that has not been reported from users
      so far.
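
      A sketch of the corrected termination test (casting details assumed; the key
      is to compare the cleaned 'entry' pointer instead of the raw 'uentry'):

        /* before: never matches the list head while the pi bit is set in uentry */
        while (compat_ptr(uentry) != &head->list) {

        /* after: compare the cleaned pointer, casting the head to keep sparse happy */
        while (entry != (struct robust_list __user *) &head->list) {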
      
      Based on the original patch from David Miller.
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 11 September 2007 (1 commit)
    • Fix spurious syscall tracing after PTRACE_DETACH + PTRACE_ATTACH · 7d941432
      Committed by Roland McGrath
      When PTRACE_SYSCALL was used and then PTRACE_DETACH is used, the
      TIF_SYSCALL_TRACE flag is left set on the formerly-traced task.  This
      means that when a new tracer comes along and does PTRACE_ATTACH, it's
      possible he gets a syscall tracing stop even though he's never used
      PTRACE_SYSCALL.  This happens if the task was in the middle of a system
      call when the second PTRACE_ATTACH was done.  The symptom is an
      unexpected SIGTRAP when the tracer thinks that only SIGSTOP should have
      been provoked by his ptrace calls so far.
      
      A few machines already fix this in ptrace_disable (i386, ia64, m68k),
      but all other machines do not and still have this bug.  On x86_64, this
      constitutes a regression in IA32 compatibility support.
      
      Since all machines now use TIF_SYSCALL_TRACE for this, I put the
      clearing of TIF_SYSCALL_TRACE in the generic ptrace_detach code rather
      than adding it to every other machine's ptrace_disable.
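
      A sketch of where the generic clearing lands (placement assumed, in
      kernel/ptrace.c right after the per-architecture hook):

        /* architecture-specific hardware disable .. */
        ptrace_disable(child);
        /* generic: no stale syscall-trace stops may survive a detach */
        clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE);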
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 September 2007 (8 commits)
  6. 31 August 2007 (6 commits)
  7. 28 August 2007 (7 commits)
    • sched: clean up task_new_fair() · 9f508f82
      Committed by Ingo Molnar
      cleanup: we have the 'se' and 'curr' entity-pointers already,
      no need to use p->se and current->se.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: small schedstat fix · 213c8af6
      Committed by Ingo Molnar
      small schedstat fix: the cfs_rq->wait_runtime 'sum of all runtimes'
      statistics counters missed newly forked tasks and thus had a constant
      negative skew. Fix this.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix wait_start_fair condition in update_stats_wait_end() · b77d69db
      Committed by Ingo Molnar
      Peter Zijlstra noticed the following bug in SCHED_FEAT_SKIP_INITIAL (which
      is disabled by default at the moment): it relies on se.wait_start_fair
      being 0, but update_stats_wait_end() did not recognize a 0 value,
      so instead of 'skipping' the initial interval we gave the new child
      a maximum boost of +runtime-limit ...
      
      (No impact on the default kernel, but nice to fix for completeness.)
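
      A sketch of the guard (placement assumed at the top of update_stats_wait_end();
      the field name se->wait_start_fair is taken from the description above):

        /* a zero wait_start_fair means "skip the initial interval": bail out
         * instead of treating it as an enormous wait that earns a huge boost */
        if (unlikely(!se->wait_start_fair))
                return;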
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: call update_curr() in task_tick_fair() · 7109c442
      Committed by Ting Yang
      update the fair-clock before using it for the key value.
      
      [ mingo@elte.hu: small cleanups. ]
      Signed-off-by: Ting Yang <tingy@cs.umass.edu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • sched: make the scheduler converge to the ideal latency · f6cf891c
      Committed by Ingo Molnar
      de-HZ-ification of the granularity defaults unearthed a pre-existing
      property of CFS: while it correctly converges to the granularity goal,
      it does not prevent run-time fluctuations in the range of
      [-gran ... 0 ... +gran].
      
      With the increase of the granularity due to the removal of HZ
      dependencies, this becomes visible in chew-max output (with 5 tasks
      running):
      
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:   17 .   13 | per:   44 .   40
       out:  27 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   36 .   40
       out:  29 . 27. 32 | flu:  2 .  0 | ran:   17 .   13 | per:   46 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
       out:  29 . 27. 32 | flu:  0 .  0 | ran:   18 .   13 | per:   47 .   40
       out:  28 . 27. 32 | flu:  0 .  0 | ran:    9 .   13 | per:   37 .   40
      
      average slice is the ideal 13 msecs and the period is picture-perfect 40
      msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no
      mechanism in CFS to keep that from happening: it's a perfectly valid
      solution that CFS finds.
      
      to fix this we add a granularity/preemption rule that knows about
      the "target latency", which makes tasks that run longer than the ideal
      latency run a bit less. The simplest approach is to simply decrease the
      preemption granularity when a task overruns its ideal latency. For this
      we have to track how much the task executed since its last preemption.
      
      ( this adds a new field to task_struct, but we can eliminate that
        overhead in 2.6.24 by putting all the scheduler timestamps into an
        anonymous union. )
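
      A sketch of the rule (field and sysctl names assumed; the idea is the one
      described above: drop the preemption granularity once the current task has
      overrun its ideal share of the latency period):

        ideal_runtime = sysctl_sched_latency / cfs_rq->nr_running;
        delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;

        /* ran longer than its ideal slice since the last preemption:
         * allow the next task to preempt immediately */
        if (delta_exec > ideal_runtime)
                granularity = 0;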
      
      with this change in place, chew-max output is fluctuation-less all
      around:
      
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  2 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
       out:  28 . 27. 39 | flu:  0 .  1 | ran:   13 .   13 | per:   41 .   40
      
      this patch has no impact on any fastpath or on any globally observable
      scheduling property. (unless you have sharp enough eyes to see
      millisecond-level ruckles in glxgears smoothness :-)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
    • sched: fix sleeper bonus limit · 5f01d519
      Committed by Mike Galbraith
      There is an Amarok song switch time increase (regression) under
      hefty load.
      
      What is happening is that sleeper_bonus is never consumed, and only
      rarely goes below runtime_limit, so for the most part, Amarok isn't
      getting any bonus at all.  We're keeping sleeper_bonus right at
      runtime_limit (sched_latency == sched_runtime_limit == 40ms) forever, i.e.
      we don't consume it if we're lower than that, and don't add if we're above
      it.  One Amarok thread waking (or anybody else) will push us past the
      threshold, so the next thread waking gets nada, but will reap pain from
      the previous thread waking until we drop back to runtime_limit.  It
      looks to me like under load, some random task gets a bonus, and
      everybody else pays, whether deserving or not.
      
      This diff fixed the regression for me at any load rate.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • fix bogus hotplug cpu warning · d243769d
      Committed by Hugh Dickins
      Fix a bogus DEBUG_PREEMPT warning on x86_64 when a cpu is brought online
      after bootup: current_is_keventd is right to note that its use of
      smp_processor_id is preempt-safe, but it should use raw_smp_processor_id
      to avoid the warning.
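
      A sketch of the one-line change (in current_is_keventd(), with
      kernel/workqueue.c assumed as the location):

        /* keventd is per-cpu and cannot migrate, so this is preempt-safe;
         * use the raw variant to keep DEBUG_PREEMPT from warning about it */
        int cpu = raw_smp_processor_id();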
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 26 August 2007 (4 commits)