Commits · b92ce55893745e011edae70830b8bc863be881f9 · xiphi1978 / linux

11 Apr, 2006 1 commit

[PATCH] splice: add direct fd <-> fd splicing support · b92ce558

Authored by Jens Axboe on Apr 11, 2006

It's more efficient for sendfile() emulation. Basically we cache an
internal private pipe and just use that as the intermediate area for
pages. Direct splicing is not available from sys_splice(), it is only
meant to be used for sendfile() emulation.

Additional patch from Ingo Molnar to avoid the PIPE_BUFFERS loop at
exit for the normal fast path.
Signed-off-by: Jens Axboe <axboe@suse.de>

b92ce558

01 Apr, 2006 1 commit

[PATCH] task: RCU protect task->usage · 8c7904a0

Authored by Eric W. Biederman on Mar 31, 2006

A big problem with rcu protected data structures that are also reference
counted is that you must jump through several hoops to increase the reference
count.  I think someone finally implemented atomic_inc_not_zero(&count) to
automate the common case.  Unfortunately this means you must special case the
rcu access case.

When data structures are only visible via rcu in a manner that is not
determined by the reference count on the object (i.e.  tasks are visible until
their zombies are reaped) there is a much simpler technique we can employ:
simply delay the decrement of the reference count until the rcu interval is
over.

What that means is that the proc code that looks up a task and later
wants to sleep can now do:

rcu_read_lock();
task = find_task_by_pid(some_pid);
if (task) {
	get_task_struct(task);
}
rcu_read_unlock();

The effect on the rest of the kernel is that put_task_struct becomes cheaper
and immediate, and in the case where the task has been reaped it frees the
task immediately instead of unnecessarily waiting until the rcu interval is
over.

Cleanup of task_struct does not happen when its reference count drops to
zero, instead cleanup happens when release_task is called.  Tasks can only
be looked up via rcu before release_task is called.  All rcu protected
members of task_struct are freed by release_task.

Therefore we can move call_rcu from put_task_struct into release_task.  And
we can modify release_task to not immediately release the reference count
but instead have it call put_task_struct from the function it gives to
call_rcu.

The end result:

- get_task_struct is safe in an rcu context where we have just looked
  up the task.

- put_task_struct() simplifies into its old pre rcu self.

This reorganization also makes put_task_struct uncallable from modules as
it is not exported but it does not appear to be called from any modules so
this should not be an issue, and is trivially fixed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8c7904a0
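
One way to picture the change in 8c7904a0 is that the final reference is
dropped from the RCU callback rather than by release_task() itself.  The
sketch below is a single-threaded toy model in userspace C, not kernel code:
the struct task layout, the rcu_pending list and rcu_grace_period_end() are
hypothetical stand-ins for the real call_rcu() machinery.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names follow the commit message, the layout is assumed. */
struct task {
    int usage;              /* reference count */
    int freed;              /* set once the last reference is dropped */
    struct task *rcu_next;  /* link in the pending-callback list */
};

/* Tasks whose final put is waiting for the "grace period" to end. */
struct task *rcu_pending = NULL;

/* Drop one reference and "free" immediately when the count hits zero. */
void put_task_struct(struct task *t)
{
    if (--t->usage == 0)
        t->freed = 1;
}

/* Take a reference; only legal while the task has not been freed. */
void get_task_struct(struct task *t)
{
    assert(!t->freed);
    t->usage++;
}

/* Stand-in for call_rcu(): keep the reference and defer the put. */
void release_task(struct task *t)
{
    t->rcu_next = rcu_pending;
    rcu_pending = t;
}

/* Stand-in for the end of an RCU grace period: run the deferred puts. */
void rcu_grace_period_end(void)
{
    while (rcu_pending) {
        struct task *t = rcu_pending;
        rcu_pending = t->rcu_next;
        put_task_struct(t);
    }
}
```

In this model a lookup that runs before rcu_grace_period_end() can still call
get_task_struct() safely, because the reference held on behalf of
release_task() has not been dropped yet.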

29 Mar, 2006 14 commits

[PATCH] cleanup __exit_signal->cleanup_sighand path · a7e5328a

Authored by Oleg Nesterov on Mar 28, 2006

Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal().  This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a7e5328a

[PATCH] pids: kill PIDTYPE_TGID · 47e65328

Authored by Oleg Nesterov on Mar 28, 2006

This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
preparation for the further tref/pids rework.

This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.

We don't detach the group leader from the PIDTYPE_PID namespace until another
thread inherits its ->pid == ->tgid, so we are safe wrt a premature
free_pidmap(->tgid) call.

Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

47e65328

[PATCH] do_group_exit: don't take tasklist_lock · aacc9094

Authored by Oleg Nesterov on Mar 28, 2006

do_group_exit() takes tasklist_lock for zap_other_threads(); this is no longer
needed.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

aacc9094

[PATCH] do __unhash_process() under ->siglock · 5876700c

Authored by Oleg Nesterov on Mar 28, 2006

This patch moves the __unhash_process() call from release_task() to
__exit_signal(), so __detach_pid() is called with ->siglock held.

This means we don't need tasklist_lock to iterate over thread group anymore:

	copy_process() was already changed to do attach_pid()
	under ->siglock.

	Eric's "pidhash-kill-switch_exec_pids.patch" from -mm
	changed de_thread() so it doesn't touch PIDTYPE_TGID.

NOTE: de_thread() still needs some attention.  It still changes task->pid
locklessly.  Taking ->sighand.siglock here would allow more tasklist_lock
removals.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5876700c

[PATCH] revert "Optimize sys_times for a single thread process" · 35f5cad8

Authored by Oleg Nesterov on Mar 28, 2006

This patch reverts 'CONFIG_SMP && thread_group_empty()' optimization in
sys_times().  The reason is that the next patch breaks memory ordering which
is needed for that optimization.

tasklist_lock in sys_times() will be eliminated completely by a further patch.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

35f5cad8

[PATCH] move __exit_signal() to kernel/exit.c · 6a14c5c9

Authored by Oleg Nesterov on Mar 28, 2006

__exit_signal() is private to release_task() now.  I think it is better to
make it static in kernel/exit.c and export flush_sigqueue() instead - this
function is much simpler and more straightforward.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6a14c5c9

[PATCH] release_task: replace open-coded ptrace_unlink() · 1f09f974

Authored by Oleg Nesterov on Mar 28, 2006

Use ptrace_unlink() instead of open-coding.  No changes in kernel/exit.o
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1f09f974

[PATCH] reparent_thread: use remove_parent/add_parent · 6ac781b1

Authored by Oleg Nesterov on Mar 28, 2006

Use remove_parent/add_parent instead of open coding.

No changes in kernel/exit.o
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6ac781b1

[PATCH] pidhash: don't count idle threads · 73b9ebfe

Authored by Oleg Nesterov on Mar 28, 2006

fork_idle() does unhash_process() just after copy_process().  In contrast,
boot_cpu's idle thread explicitly registers itself for each pid_type with nr
= 0.

copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].

We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
called with pid == 0 arg, so it will never be reused.  So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.

However with this patch we don't hash pid == 0 for PIDTYPE_PID case.  We still
have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

73b9ebfe

[PATCH] kill SET_LINKS/REMOVE_LINKS · c97d9893

Authored by Oleg Nesterov on Mar 28, 2006

Both SET_LINKS() and REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().

This patch kills these macros because they mix two different things: setting
a process's parent and registering it in the init_task.tasks list.  Callers
are updated to do these actions by hand.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c97d9893

[PATCH] don't use REMOVE_LINKS/SET_LINKS for reparenting · 9b678ece

Authored by Oleg Nesterov on Mar 28, 2006

There are places where the kernel uses REMOVE_LINKS/SET_LINKS while changing
a process's ->parent.  Use add_parent/remove_parent instead; they don't abuse
the global process list.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9b678ece

[PATCH] remove add_parent()'s parent argument · 8fafabd8

Authored by Oleg Nesterov on Mar 28, 2006

add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently.  This patch removes this argument.

No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8fafabd8

[PATCH] choose_new_parent: remove unused arg, sanitize exit_state check · d799f035

Authored by Oleg Nesterov on Mar 28, 2006

'child_reaper' arg is not used in choose_new_parent().

The "->exit_state >= EXIT_ZOMBIE" check is a leftover; it was
valid when EXIT_ZOMBIE lived in the ->state var.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

d799f035

[PATCH] exec: allow init to exec from any thread. · fef23e7f

Authored by Eric W. Biederman on Mar 28, 2006

After looking at the problem of init calling exec some more I figured out
an easy way to make the code work.

The actual symptom without this patch is that all threads will die
except pid == 1, and the thread calling exec.  The thread calling exec will
wait forever for pid == 1 to die.

Since pid == 1 does not install a handler for SIGKILL it will never die.

This modifies the tests for init from current->pid == 1 to the equivalent
current == child_reaper.  And then it causes exec in the ugly case to
modify child_reaper.

The only weird symptom is that you wind up with an init process that
doesn't have the oldest start time on the box.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

fef23e7f

28 Mar, 2006 2 commits

[PATCH] lightweight robust futexes: compat · 34f192c6

Authored by Ingo Molnar on Mar 27, 2006

32-bit syscall compatibility support.  (This patch also moves all futex
related compat functionality into kernel/futex_compat.c.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

34f192c6

[PATCH] lightweight robust futexes: core · 0771dfef

Authored by Ingo Molnar on Mar 27, 2006

Add the core infrastructure for robust futexes: structure definitions, the new
syscalls and the do_exit() based cleanup mechanism.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

0771dfef

23 Mar, 2006 1 commit

[PATCH] sem2mutex: tty · 70522e12

Authored by Ingo Molnar on Mar 23, 2006

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

70522e12

19 Mar, 2006 1 commit

[PATCH] don't do exit_io_context() until we know we won't be doing any IO · afc847b7

Authored by Al Viro on Feb 28, 2006

testcase:

mount /dev/sdb10 /mnt
touch /mnt/tmp/b
umount /mnt
mount /dev/sdb10 /mnt
rm /mnt/tmp/b </mnt/tmp/b
umount /mnt

and watch blkdev_ioc line in /proc/slabinfo.  Vanilla kernel leaks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

afc847b7

21 Feb, 2006 1 commit

[PATCH] kjournald keeps reference to namespace · 5914811a

Authored by Björn Steinbrink on Feb 18, 2006

In daemonize() a new thread gets cleaned up and 'merged' with init_task.
The current fs_struct is handled there, but not the current namespace.

This adds the namespace part.

[ Eric Biederman pointed out the namespace wrappers, and also notes that
  we can't ever count on using our parent's namespace because we already
  have called exit_fs(), which is the only way to the namespace from a
  process. ]
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5914811a

15 Jan, 2006 2 commits

[PATCH] Unlinline a bunch of other functions · 858119e1

Authored by Arjan van de Ven on Jan 14, 2006

Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

858119e1

[PATCH] sched: add new SCHED_BATCH policy · b0a9499c

Authored by Ingo Molnar on Jan 14, 2006

Add a new SCHED_BATCH (3) scheduling policy: such tasks are presumed
CPU-intensive, and will acquire a constant +5 priority level penalty.  Such
policy is nice for workloads that are non-interactive, but which do not
want to give up their nice levels.  The policy is also useful for workloads
that want a deterministic scheduling policy without interactivity causing
extra preemptions (between that workload's tasks).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b0a9499c
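
From userspace, a process could opt into the policy added by b0a9499c roughly
as follows.  This is a minimal sketch assuming a Linux system whose C library
exposes sched_setscheduler(); become_batch_task() is a hypothetical helper
name, and the fallback value 3 for SCHED_BATCH is taken from the commit
message above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc only exposes SCHED_BATCH with this */
#endif
#include <assert.h>
#include <sched.h>

#ifndef SCHED_BATCH
#define SCHED_BATCH 3          /* value quoted in the commit message */
#endif

/*
 * Ask the kernel to schedule the calling process under SCHED_BATCH.
 * Returns 0 on success and -1 on failure (errno is set by the syscall).
 */
int become_batch_task(void)
{
    /* Non-realtime policies require a static priority of 0. */
    struct sched_param param = { .sched_priority = 0 };
    int rc = sched_setscheduler(0, SCHED_BATCH, &param);

    /* Sanity check: on success the policy should read back as SCHED_BATCH. */
    if (rc == 0)
        assert(sched_getscheduler(0) == SCHED_BATCH);
    return rc;
}
```

The nice level is left untouched, which matches the stated goal of the policy:
the workload is marked as non-interactive without giving up its nice level.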

12 Jan, 2006 1 commit

[PATCH] move capable() to capability.h · c59ede7b

Authored by Randy.Dunlap on Jan 11, 2006

- Move capable() from sched.h to capability.h;

- Use <linux/capability.h> where capable() is used
	(in include/, block/, ipc/, kernel/, a few drivers/,
	mm/, security/, & sound/;
	many more drivers/ to go)
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c59ede7b

11 Jan, 2006 2 commits

[PATCH] Decrease number of pointer derefs in exit.c · 3795e161

Authored by Jesper Juhl on Jan 09, 2006

Decrease the number of pointer derefs in kernel/exit.c

Benefits of the patch:
 - Fewer pointer dereferences should make the code slightly faster.
 - Size of generated code is smaller
 - improved readability
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3795e161

[PATCH] hrtimer: switch itimers to hrtimer · 2ff678b8

Authored by Thomas Gleixner on Jan 09, 2006

switch itimers to a hrtimers-based implementation
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2ff678b8

10 Jan, 2006 1 commit

[PATCH] mutex subsystem, more debugging code · de5097c2

Authored by Ingo Molnar on Jan 09, 2006

more mutex debugging: check for held locks during memory freeing,
task exit, enable sysrq printouts, etc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@infradead.org>

de5097c2

09 Jan, 2006 3 commits

[PATCH] setpgid: should work for sub-threads · e19f247a

Authored by Oren Laadan on Jan 08, 2006

setpgid() does not work unless the calling process is a
thread_group_leader().

'man setpgid' does not say anything about that, so I consider this
behaviour a bug.
Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e19f247a

[PATCH] little do_group_exit() cleanup · 485a6435

Authored by Oleg Nesterov on Jan 08, 2006

zap_other_threads() sets SIGNAL_GROUP_EXIT at the very start,
do_group_exit() doesn't need to do it.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

485a6435

[PATCH] RCU signal handling · e56d0903

Authored by Ingo Molnar on Jan 08, 2006

RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked
instead of tasklist_lock read-locked.  This is a scalability improvement on
SMP and a preemption-latency improvement under PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e56d0903

14 Nov, 2005 1 commit

[PATCH] m68k: introduce task_thread_info · a1261f54

Authored by Al Viro on Nov 13, 2005

new helper - task_thread_info(task). On platforms that have thread_info
allocated separately (i.e. in default case) it simply returns
task->thread_info. m68k wants (and for good reasons) to embed its thread_info
into task_struct. So it will (in later patch) have task_thread_info() of its
own. For now we just add a macro for generic case and convert existing
instances of its body in core kernel to uses of new macro. Obviously safe -
all normal architectures get the same preprocessor output they used to get.
Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a1261f54
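
The helper described in a1261f54 can be pictured with the userspace sketch
below.  The generic macro follows the commit message (it returns
task->thread_info); the embedded variant, the struct layouts and the two
accessor functions are hypothetical and only illustrate why an architecture
might want its own definition.

```c
#include <assert.h>

struct thread_info { int flags; };

/* Generic layout: thread_info is allocated separately, the task points at it. */
struct task_struct {
    struct thread_info *thread_info;
};

/* Generic definition, as described in the commit message. */
#define task_thread_info(task) ((task)->thread_info)

/* Hypothetical embedded layout: thread_info lives inside the task itself. */
struct task_struct_embedded {
    struct thread_info thread_info;
};

/* With an embedded layout the helper has to take the member's address. */
#define task_thread_info_embedded(task) (&(task)->thread_info)

/* Callers look the same in both cases; only the macro body differs. */
int generic_flags(struct task_struct *t)
{
    return task_thread_info(t)->flags;
}

int embedded_flags(struct task_struct_embedded *t)
{
    return task_thread_info_embedded(t)->flags;
}
```

Because callers go through the macro instead of spelling out
task->thread_info, the layout can change per architecture without touching
the call sites.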

07 Nov, 2005 1 commit

[PATCH] Process Events Connector · 9f46080c

Authored by Matt Helsley on Nov 07, 2005

This patch adds a connector that reports fork, exec, id change, and exit
events for all processes to userspace.  It replaces the fork_advisor patch
that ELSA is currently using.  Applications that may find these events
useful include accounting/auditing (e.g.  ELSA), system activity monitoring
(e.g.  top), security, and resource management (e.g.  CKRM).
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9f46080c

31 Oct, 2005 3 commits

[PATCH] remove hardcoded SEND_SIG_xxx constants · b67a1b9e

Authored by Oleg Nesterov on Oct 30, 2005

This patch replaces hardcoded SEND_SIG_xxx constants with
their symbolic names.

No changes in affected .o files.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b67a1b9e

[PATCH] wait4 PTRACE_ATTACH race fix · 7f2a5255

Authored by Roland McGrath on Oct 30, 2005

Back about a year ago when I last fiddled heavily with the do_wait code, I
was thinking too hard about the wrong thing and I now think I introduced a
bug whose inverse I thought I was fixing.

Apparently no one was looking too hard over my shoulder, so as to cite my
bogus reasoning at the time.  In the race condition when PTRACE_ATTACH is
about to steal a child and then the child hits a tracing event (what
my_ptrace_child checks for), the real parent does need to set its flag
noting it has some eligible live children.  Otherwise a spurious ECHILD
error is possible, since the child in question is not yet on the
ptrace_children list.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7f2a5255

[PATCH] PF_DEAD cleanup · 7407251a

Authored by Coywolf Qi Hunt on Oct 30, 2005

The PF_DEAD setting doesn't belong to exit_notify(), move it to a proper
place.
Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7407251a

30 Oct, 2005 1 commit

[PATCH] mm: update_hiwaters just in time · 365e9c87

Authored by Hugh Dickins on Oct 29, 2005

update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability. Originally it was called whenever rss or
total_vm got raised. Then many of those callsites were replaced by a timer
tick call from account_system_time. Now Frank van Maarseveen reports that to
be found inadequate. How about this? Works for Frank.

Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths. Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit. Handle
mm->hiwater_vm in the same way, though it's much less of an issue. Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.

And there has been no collector of these hiwater statistics in the tree. The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).

There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high. A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.

What locking? None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact. But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

365e9c87
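
The "update just in time" convention from 365e9c87 can be sketched in a few
lines of userspace C.  The macro name update_hiwater_rss comes from the commit
message; struct mm, add_rss(), sub_rss() and peak_rss() are hypothetical
stand-ins, and locking is deliberately left out, as in the commit's own
description.

```c
#include <assert.h>

/* Toy stand-in for the relevant mm_struct fields; not the kernel layout. */
struct mm {
    unsigned long rss;          /* current resident pages */
    unsigned long hiwater_rss;  /* recorded peak; may lag behind rss */
};

/* Record the peak only when asked to, i.e. just before rss is lowered. */
#define update_hiwater_rss(mm) do {         \
    if ((mm)->hiwater_rss < (mm)->rss)      \
        (mm)->hiwater_rss = (mm)->rss;      \
} while (0)

/* Hot path: raising rss does not touch the high-water mark at all. */
void add_rss(struct mm *mm, unsigned long pages)
{
    mm->rss += pages;
}

/* Cold path: snapshot the peak, then lower rss. */
void sub_rss(struct mm *mm, unsigned long pages)
{
    assert(pages <= mm->rss);
    update_hiwater_rss(mm);
    mm->rss -= pages;
}

/* Collector: the true peak is the larger of the stored mark and current rss. */
unsigned long peak_rss(const struct mm *mm)
{
    return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```

The cost moves from the frequent increments to the rare decrements and to the
reader, which is the trade-off the commit message argues for.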

28 Oct, 2005 1 commit

Revert "remove false BUG_ON() from run_posix_cpu_timers()" · a362f463

Authored by Linus Torvalds on Oct 27, 2005

This reverts commit 3de463c7.

Roland has another patch that allows us to leave the BUG_ON() in place
by just making sure that the condition it tests for really is always
true.

That goes in next.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a362f463

24 Oct, 2005 1 commit

[PATCH] posix-timers: remove false BUG_ON() from run_posix_cpu_timers() · 3de463c7

Authored by Oleg Nesterov on Oct 24, 2005

do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu from attaching the timer to the exiting process after that.

After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule()', a local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4), the interrupted task has ->signal == NULL.

At this moment exiting task has no pending cpu timers, they were cleaned
up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can just
return from irq.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3de463c7

22 Oct, 2005 1 commit

[PATCH] Call exit_itimers from do_exit, not __exit_signal · 25f407f0

Authored by Roland McGrath on Oct 21, 2005

When I originally moved exit_itimers into __exit_signal, that was the only
place where we could reliably know it was the last thread in the group
dying, without races.  Since then we've gotten the signal_struct.live
counter, and do_exit can reliably do group-wide cleanup work.

This patch moves the call to do_exit, where it's made without locks.  This
avoids the deadlock issues that the old __exit_signal code's comment talks
about, and the one that Oleg found recently with process CPU timers.

[ This replaces e03d13e9, which is why
  it was just reverted. ]
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

25f407f0
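
The idea behind 25f407f0, that a live-thread counter lets the last exiting
thread do the group-wide cleanup without holding a lock, can be sketched with
C11 atomics.  This is an illustration under assumptions, not the kernel code:
struct signal_group, thread_exit() and the itimers_cleaned field are
hypothetical, and only the notion of a "live" counter comes from the commit
message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct signal_group {
    atomic_int live;      /* threads in the group that have not exited yet */
    int itimers_cleaned;  /* how many times group-wide cleanup has run */
};

/*
 * Called once by each exiting thread.  Returns true for the thread that
 * brought the counter to zero; only that thread runs the group cleanup.
 */
bool thread_exit(struct signal_group *sig)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    int old = atomic_fetch_sub(&sig->live, 1);
    assert(old > 0);

    bool group_dead = (old == 1);
    if (group_dead)
        sig->itimers_cleaned++;   /* stand-in for exit_itimers() */
    return group_dead;
}
```

Exactly one caller observes the transition to zero, so the cleanup runs once
no matter how the exiting threads interleave.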

02 Oct, 2005 1 commit

Fix inequality comparison against "task->state" · 14bf01bb

Authored by Linus Torvalds on Oct 01, 2005

We should always use bitmask ops, rather than depend on some ordering of
the different states.  With the TASK_NONINTERACTIVE flag, the inequality
doesn't really work.

Oleg Nesterov argues (likely correctly) that this test is unnecessary in
the first place.  However, the minimal fix for now is to at least make
it work in the presence of TASK_NONINTERACTIVE.  Waiting for consensus
from Roland & co on potential bigger cleanups.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

14bf01bb
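
The point made in 14bf01bb, that an ordering test on a flags word stops
working once an extra modifier bit can be OR'ed in, can be shown with a small
userspace sketch.  The numeric values below are assumptions chosen for
illustration, and the two helper functions are hypothetical; this is not the
actual diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; assumed for this sketch only. */
enum {
    TASK_INTERRUPTIBLE  = 1,
    TASK_STOPPED        = 4,
    TASK_TRACED         = 8,
    TASK_NONINTERACTIVE = 64,  /* modifier bit OR'ed into another state */
};

/* Fragile: relies on the numeric ordering of the state values. */
bool is_stopped_by_order(long state)
{
    return state >= TASK_STOPPED;
}

/* Robust: tests exactly the bits of interest and ignores modifier bits. */
bool is_stopped_by_mask(long state)
{
    return (state & (TASK_STOPPED | TASK_TRACED)) != 0;
}
```

A task sleeping with TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE is misreported
as stopped by the ordering test, while the mask test classifies it correctly.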