提交 · bbea9f69668a3d0cf9feba15a724cd02896f8675 · xiphi1978 / linux

11 12月, 2006 1 次提交

[PATCH] fdtable: Make fdarray and fdsets equal in size · bbea9f69

由 Vadim Lobanov 提交于 12月 10, 2006

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets.  The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal.  This
patch removes fdtable->max_fdset.  As an added bonus, most of the supporting
code becomes simpler.
Signed-off-by: NVadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bbea9f69

09 12月, 2006 2 次提交

[PATCH] add child reaper to pid_namespace · 84d73786

由 Sukadev Bhattiprolu 提交于 12月 08, 2006

Add a per pid_namespace child-reaper.  This is needed so processes are reaped
within the same pid space and do not spill over to the parent pid space.  Its
also needed so containers preserve existing semantic that pid == 1 would reap
orphaned children.

This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

84d73786

[PATCH] VFS: change struct file to use struct path · 0f7fc9e4

由 Josef "Jeff" Sipek 提交于 12月 08, 2006

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.
Signed-off-by: NJosef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0f7fc9e4

08 12月, 2006 2 次提交

[PATCH] do_coredump() and not stopping rewrite attacks? · 6d4df677

由 Alexey Dobriyan 提交于 12月 06, 2006

On Sat, Dec 02, 2006 at 11:47:44PM +0300, Alexey Dobriyan wrote:
> David Binderman compiled 2.6.19 with icc and grepped for "was set but never
> used". Many warnings are on
> 	http://coderock.org/kj/unused-2.6.19-fs

Heh, the very first line:
fs/exec.c(1465): remark #593: variable "flag" was set but never used

fs/exec.c:
  1477		/*
  1478		 *	We cannot trust fsuid as being the "true" uid of the
  1479		 *	process nor do we know its entire history. We only know it
  1480		 *	was tainted so we dump it as root in mode 2.
  1481		 */
  1482		if (mm->dumpable == 2) {	/* Setuid core dump mode */
  1483			flag = O_EXCL;		/* Stop rewrite attacks */
  1484			current->fsuid = 0;	/* Dump root private */
  1485		}

And then filp_open follows with "flag" totally ignored.

(akpm: this restores the code to Alan's original version.  Andi's "Support
piping into commands in /proc/sys/kernel/core_pattern" (cset d025c9db) broke
it).

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kerenl.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6d4df677

[PATCH] slab: remove SLAB_KERNEL · e94b1766

由 Christoph Lameter 提交于 12月 06, 2006

SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e94b1766

02 10月, 2006 1 次提交

[PATCH] namespaces: utsname: switch to using uts namespaces · e9ff3990

由 Serge E. Hallyn 提交于 10月 02, 2006

Replace references to system_utsname to the per-process uts namespace
where appropriate.  This includes things like uname.

Changes: Per Eric Biederman's comments, use the per-process uts namespace
	for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c

[jdike@addtoit.com: UML fix]
[clg@fr.ibm.com: cleanup]
[akpm@osdl.org: build fix]
Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e9ff3990

01 10月, 2006 2 次提交

[PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db

由 Andi Kleen 提交于 9月 30, 2006

Using the infrastructure created in previous patches implement support to
pipe core dumps into programs.

This is done by overloading the existing core_pattern sysctl
with a new syntax:

|program

When the first character of the pattern is a '|' the kernel will instead
threat the rest of the pattern as a command to run.  The core dump will be
written to the standard input of that program instead of to a file.

This is useful for having automatic core dump analysis without filling up
disks.  The program can do some simple analysis and save only a summary of
the core dump.

The core dump proces will run with the privileges and in the name space of
the process that caused the core dump.

I also increased the core pattern size to 128 bytes so that longer command
lines fit.

Most of the changes comes from allowing core dumps without seeks.  They are
fairly straight forward though.

One small incompatibility is that if someone had a core pattern previously
that started with '|' they will get suddenly new behaviour.  I think that's
unlikely to be a real problem though.

Additional background:

> Very nice, do you happen to have a program that can accept this kind of
> input for crash dumps?  I'm guessing that the embedded people will
> really want this functionality.

I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
ran gdb on it to get a backtrace and wrote the summary to a shared directory.
Then there was a simple CGI script to generate a "top 10" crashes HTML
listing.

Unfortunately this still had the disadvantage to needing full disk space for a
dump except for deleting it afterwards (in fact it was worse because over the
pipe holes didn't work so if you have a holey address map it would require
more space).

Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
cores (at least it worked with zsh's =(cat core) syntax), so it would be
likely possible to do it without temporary space with a simple wrapper that
calls it in the right way.  I ran out of time before doing that though.

The demo prototype scripts weren't very good.  If there is really interest I
can dig them out (they are currently on a laptop disk on the desk with the
laptop itself being in service), but I would recommend to rewrite them for any
serious application of this and fix the disk space problem.

Also to be really useful it should probably find a way to automatically fetch
the debuginfos (I cheated and just installed them in advance).  If nobody else
does it I can probably do the rewrite myself again at some point.

My hope at some point was that desktops would support it in their builtin
crash reporters, but at least the KDE people I talked too seemed to be happy
with their user space only solution.

Alan sayeth:

  I don't believe that piping as such as neccessarily the right model, but
  the ability to intercept and processes core dumps from user space is asked
  for by many enterprise users as well.  They want to know about, capture,
  analyse and process core dumps, often centrally and in automated form.

[akpm@osdl.org: loff_t != unsigned long]
Signed-off-by: NAndi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d025c9db

[PATCH] csa: convert CONFIG tag for extended accounting routines · 8f0ab514

由 Jay Lan 提交于 9月 30, 2006

There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT.  This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT.  A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.
Signed-off-by: NJay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8f0ab514

30 9月, 2006 1 次提交

[PATCH] Fix unserialized task->files changing · 3b9b8ab6

由 Kirill Korotaev 提交于 9月 29, 2006

Fixed race on put_files_struct on exec with proc.  Restoring files on
current on error path may lead to proc having a pointer to already kfree-d
files_struct.

->files changing at exit.c and khtread.c are safe as exit_files() makes all
things under lock.

Found during OpenVZ stress testing.

[akpm@osdl.org: add export]
Signed-off-by: NPavel Emelianov <xemul@openvz.org>
Signed-off-by: NKirill Korotaev <dev@openvz.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3b9b8ab6

27 9月, 2006 2 次提交

[PATCH] de_thread: Use tsk not current · aafe6c2a

由 Eric W. Biederman 提交于 9月 27, 2006

Ingo Oeser pointed out that because current expands to an inline function
it is more space efficient and somewhat faster to simply keep a cached copy
of current in another variable.  This patch implements that for the
de_thread function.

(akpm: saves nearly 100 bytes of text on x86)
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aafe6c2a

[PATCH] pid: Implement transfer_pid and use it to simplify de_thread · c18258c6

由 Eric W. Biederman 提交于 9月 27, 2006

In de_thread we move pids from one process to another, a rather ugly case.
The function transfer_pid makes it clear what we are doing, and makes the
action atomic.  This is useful we ever want to atomically traverse the
process group and session lists, in a rcu safe manner.

Even if the atomic properties this change should be a win as transfer_pid
should be less code to execute than executing both attach_pid and
detach_pid, and this should make de_thread slightly smaller as only a
single function call needs to be emitted.  The only downside is that the
code might be slower to execute as the odds are against transfer_pid being
in cache.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c18258c6

28 8月, 2006 1 次提交

[PATCH] fix up lockdep trace in fs/exec.c · 513627d7

由 Dave Jones 提交于 8月 27, 2006

This fixes the locking error noticed by lockdep:

  =============================================
  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  init/1 is trying to acquire lock:
   (&sighand->siglock){....}, at: [<c047a78a>] flush_old_exec+0x3ae/0x859

  but task is already holding lock:
   (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859

  other info that might help us debug this:
  2 locks held by init/1:
   #0:  (tasklist_lock){..--}, at: [<c047a76a>] flush_old_exec+0x38e/0x859
   #1:  (&sighand->siglock){....}, at: [<c047a77a>] flush_old_exec+0x39e/0x859

  stack backtrace:
   [<c04051e1>] show_trace_log_lvl+0x54/0xfd
   [<c040579d>] show_trace+0xd/0x10
   [<c04058b6>] dump_stack+0x19/0x1b
   [<c043b33a>] __lock_acquire+0x773/0x997
   [<c043bacf>] lock_acquire+0x4b/0x6c
   [<c060630b>] _spin_lock+0x19/0x28
   [<c047a78a>] flush_old_exec+0x3ae/0x859
   [<c0498053>] load_elf_binary+0x4aa/0x1628
   [<c0479cab>] search_binary_handler+0xa7/0x24e
   [<c047b577>] do_execve+0x15b/0x1f9
   [<c04022b4>] sys_execve+0x29/0x4d
   [<c0403faf>] syscall_call+0x7/0xb
Signed-off-by: NArjan van de Ven <arjan@infradead.org>
Signed-off-by: NDave Jones <davej@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

513627d7

25 8月, 2006 2 次提交

VFS: Remove redundant open-coded mode bit checks in open_exec(). · a969fd5a

由 Trond Myklebust 提交于 8月 22, 2006

The check in open_exec() for inode->i_mode & 0111 has been made
redundant by the fix to permission().
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 1d3741c5d991686699f100b65b9956f7ee7ae0ae commit)

a969fd5a

VFS: Remove redundant open-coded mode bit check in prepare_binfmt(). · 9167b0b9

由 Trond Myklebust 提交于 8月 22, 2006

The check in prepare_binfmt() for inode->i_mode & 0111 is redundant,
since open_exec() will already have done that.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 822dec482ced07af32c378cd936d77345786572b commit)

9167b0b9

01 7月, 2006 1 次提交

Remove obsolete #include <linux/config.h> · 6ab3d562

由 Jörn Engel 提交于 6月 30, 2006

Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

6ab3d562

27 6月, 2006 8 次提交

[PATCH] coredump: shutdown current process first · 5debfa6d

由 Oleg Nesterov 提交于 6月 26, 2006

This patch optimizes zap_threads() for the case when there are no ->mm
users except the current's thread group.  In that case we can avoid
'for_each_process()' loop.

It also adds a useful invariant: SIGNAL_GROUP_EXIT (if checked under
->siglock) always implies that all threads (except may be current) have
pending SIGKILL.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5debfa6d

[PATCH] coredump: some code relocations · dcf560c5

由 Oleg Nesterov 提交于 6月 26, 2006

This is a preparation for the next patch.  No functional changes.
Basically, this patch moves '->flags & SIGNAL_GROUP_EXIT' check into
zap_threads(), and 'complete(vfork_done)' into coredump_wait outside of
->mmap_sem protected area.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

dcf560c5

[PATCH] coredump: don't take tasklist_lock · 7b1c6154

由 Oleg Nesterov 提交于 6月 26, 2006

This patch removes tasklist_lock from zap_threads().
This is safe wrt:

	do_exit:
		The caller holds mm->mmap_sem. This means that task which
		shares the same ->mm can't pass exit_mm(), so it can't be
		unhashed from init_task.tasks or ->thread_group lists.

	fork:
		None of sub-threads can fork after zap_process(leader). All
		processes which were created before this point should be
		visible to zap_threads() because copy_process() adds the new
		process to the tail of init_task.tasks list, and ->siglock
		lock/unlock provides a memory barrier.

	de_thread:
		It does list_replace_rcu(&leader->tasks, &current->tasks).
		So zap_threads() will see either old or new leader, it does
		not matter. However, it can change p->sighand, so we should
		use lock_task_sighand() in zap_process().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7b1c6154

[PATCH] coredump: kill ptrace related stuff · d5f70c00

由 Oleg Nesterov 提交于 6月 26, 2006

With this patch zap_process() sets SIGNAL_GROUP_EXIT while sending SIGKILL to
the thread group.  This means that a TASK_TRACED task

	1. Will be awakened by signal_wake_up(1)

	2. Can't sleep again via ptrace_notify()

	3. Can't go to do_signal_stop() after return
	   from ptrace_stop() in get_signal_to_deliver()

So we can remove all ptrace related stuff from coredump path.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d5f70c00

[PATCH] coredump: speedup SIGKILL sending · 281de339

由 Oleg Nesterov 提交于 6月 26, 2006

With this patch a thread group is killed atomically under ->siglock.  This is
faster because we can use sigaddset() instead of force_sig_info() and this is
used in further patches.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

281de339

[PATCH] coredump: optimize ->mm users traversal · aceecc04

由 Oleg Nesterov 提交于 6月 26, 2006

zap_threads() iterates over all threads to find those ones which share
current->mm.  All threads in the thread group share the same ->mm, so we can
skip entire thread group if it has another ->mm.

This patch shifts the killing of thread group into the newly added
zap_process() function.  This looks as unnecessary complication, but it is
used in further patches.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aceecc04

[PATCH] de_thread: fix lockless do_each_thread · 2ceb8693

由 Oleg Nesterov 提交于 6月 26, 2006

We should keep the value of old_leader->tasks.next in de_thread, otherwise
we can't do for_each_process/do_each_thread without tasklist_lock held.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2ceb8693

[PATCH] proc: Rewrite the proc dentry flush on exit optimization · 48e6484d

由 Eric W. Biederman 提交于 6月 26, 2006

To keep the dcache from filling up with dead /proc entries we flush them on
process exit.  However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit.  This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

48e6484d

23 6月, 2006 1 次提交

[PATCH] remove steal_locks() · c89681ed

由 Miklos Szeredi 提交于 6月 22, 2006

This patch removes the steal_locks() function.

steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.

In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().

The steal_locks() function has an effect on applications doing:

clone(CLONE_FILES)
  /* in child */
  lock
  execve
  lock

POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".

According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.

Apps using NPTL are not affected, since all other threads are killed before
execve.

Apps using LinuxThreads are only affected if they

  - have multiple threads during exec (LinuxThreads doesn't kill other
    threads, the app may do it with pthread_kill_other_threads_np())
  - rely on POSIX locks being inherited across exec

Both conditions are documented, but not their interaction.

Apps using clone() natively are affected if they

  - use clone(CLONE_FILES)
  - rely on POSIX locks being inherited across exec

The above scenarios are unlikely, but possible.

If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.

That would add more complexity though, so this solution seems to be
preferred by most people.
Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c89681ed

20 6月, 2006 1 次提交
- A
  [PATCH] execve argument logging · 473ae30b
  由 Al Viro 提交于 4月 26, 2006
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  473ae30b
20 4月, 2006 1 次提交

[PATCH] task: Make task list manipulations RCU safe · 5e85d4ab

由 Eric W. Biederman 提交于 4月 18, 2006

While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5e85d4ab

14 4月, 2006 1 次提交

[PATCH] de_thread: Don't change our parents and ptrace flags. · c06511d1

由 Eric W. Biederman 提交于 4月 14, 2006

This is two distinct changes.
 - Not changing our real parents.
 - Not changing our ptrace parents.

Not changing our real parents is trivially correct because both tasks
have the same real parents as they are part of a thread group.  Now that
we demote the leader to a thread there is no longer any reason to change
it's parentage.

Not changing our ptrace parents is a user visible change if someone
looks hard enough.  I don't think user space applications will care or
even notice.

In the practical and I think common case a debugger will have attached
to all of the threads using the same ptrace flags.  From my quick skim
of strace and gdb that appears to be the case.  Which if true means
debuggers will not notice a change.

Before this point we have already generated a ptrace event in do_exit
that reports the leaders pid has died so de_thread is visible to a
debugger.  Which means attempting to hide this case by copying flags
around appears excessive.

By not doing anything it avoids all of the weird locking issues between
de_thread and ptrace attach, and removes one case from consideration for
fixing the ptrace locking.

This only addresses Oleg's first concern with ptrace_attach, that of the
problems caused by reparenting.  Oleg's second concern is essentially a
race between ptrace_attach and release_task that causes an oops when we
get to force_sig_specific.  There is nothing special about de_thread
with respect to that race.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c06511d1

11 4月, 2006 2 次提交

[PATCH] process accounting: take original leader's start_time in non-leader exec · f5e90281

由 Roland McGrath 提交于 4月 10, 2006

The only record we have of the real-time age of a process, regardless of
execs it's done, is start_time.  When a non-leader thread exec, the
original start_time of the process is lost.  Things looking at the
real-time age of the process are fooled, for example the process accounting
record when the process finally dies.  This change makes the oldest
start_time stick around with the process after a non-leader exec.  This way
the association between PID and start_time is kept constant, which seems
correct to me.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f5e90281

[PATCH] de_thread: Don't confuse users do_each_thread. · de12a787

由 Eric W. Biederman 提交于 4月 10, 2006

Oleg Nesterov spotted two interesting bugs with the current de_thread
code.  The simplest is a long standing double decrement of
__get_cpu_var(process_counts) in __unhash_process.  Caused by
two processes exiting when only one was created.

The other is that since we no longer detach from the thread_group list
it is possible for do_each_thread when run under the tasklist_lock to
see the same task_struct twice.  Once on the task list as a
thread_group_leader, and once on the thread list of another
thread.

The double appearance in do_each_thread can cause a double increment
of mm_core_waiters in zap_threads resulting in problems later on in
coredump_wait.

To remedy those two problems this patch takes the simple approach
of changing the old thread group leader into a child thread.
The only routine in release_task that cares is __unhash_process,
and it can be trivially seen that we handle cleaning up a
thread group leader properly.

Since de_thread doesn't change the pid of the exiting leader process
and instead shares it with the new leader process.  I change
thread_group_leader to recognize group leadership based on the
group_leader field and not based on pids.  This should also be
slightly cheaper then the existing thread_group_leader macro.

I performed a quick audit and I couldn't see any user of
thread_group_leader that cared about the difference.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

de12a787

01 4月, 2006 1 次提交

BUG_ON() Conversion in fs/exec.c · 7dddb12c

由 Eric Sesterhenn 提交于 4月 01, 2006

this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away
Signed-off-by: NEric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

7dddb12c

29 3月, 2006 5 次提交

[PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU · aa1757f9

由 Oleg Nesterov 提交于 3月 28, 2006

This patch borrows a clever Hugh's 'struct anon_vma' trick.

Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.

But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.

To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aa1757f9

[PATCH] remove add_parent()'s parent argument · 8fafabd8

由 Oleg Nesterov 提交于 3月 28, 2006

add_parent(p, parent) is always called with parent == p->parent, and it makes
no sense to do it differently.  This patch removes this argument.

No changes in affected .o files.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8fafabd8

[PATCH] pidhash: kill switch_exec_pids · d73d6529

由 Eric W. Biederman 提交于 3月 28, 2006

switch_exec_pids is only called from de_thread by way of exec, and it is
only called when we are exec'ing from a non thread group leader.

Currently switch_exec_pids gives the leader the pid of the thread and
unhashes and rehashes all of the process groups.  The leader is already in
the EXIT_DEAD state so no one cares about it's pids.  The only concern for
the leader is that __unhash_process called from release_task will function
correctly.  If we don't touch the leader at all we know that
__unhash_process will work fine so there is no need to touch the leader.

For the task becomming the thread group leader, we just need to give it the
pid of the old thread group leader, add it to the task list, and attach it
to the session and the process group of the thread group.

Currently de_thread is also adding the task to the task list which is just
silly.

Currently the only leader of __detach_pid besides detach_pid is
switch_exec_pids because of the ugly extra work that was being
performed.

So this patch removes switch_exec_pids because it is doing too much, it is
creating an unnecessary special case in pid.c, duing work duplicated in
de_thread, and generally obscuring what it is going on.

The necessary work is added to de_thread, and it seems to be a little
clearer there what is going on.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d73d6529

[PATCH] simplify exec from init's subthread · 1434261c

由 Oleg Nesterov 提交于 3月 28, 2006

I think it is enough to take tasklist_lock for reading while changing
child_reaper:

	Reparenting needs write_lock(tasklist_lock)

	Only one thread in a thread group can do exec()

	sighand->siglock garantees that get_signal_to_deliver()
	will not see a stale value of child_reaper.

This means that we can change child_reaper earlier, without calling
zap_other_threads() twice.

"child_reaper = current" is a NOOP when init does exec from main thread, we
don't care.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1434261c

[PATCH] exec: allow init to exec from any thread. · fef23e7f

由 Eric W. Biederman 提交于 3月 28, 2006

After looking at the problem of init calling exec some more I figured out
an easy way to make the code work.

The actual symptom without out this patch is that all threads will die
except pid == 1, and the thread calling exec.  The thread calling exec will
wait forever for pid == 1 to die.

Since pid == 1 does not install a handler for SIGKILL it will never die.

This modifies the tests for init from current->pid == 1 to the equivalent
current == child_reaper.  And then it causes exec in the ugly case to
modify child_reaper.

The only weird symptom is that you wind up with an init process that
doesn't have the oldest start time on the box.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fef23e7f

27 3月, 2006 1 次提交

[PATCH] hrtimers: remove data field · 05cfb614

由 Roman Zippel 提交于 3月 26, 2006

The nanosleep cleanup allows to remove the data field of hrtimer.  The
callback function can use container_of() to get it's own data.  Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

05cfb614

26 3月, 2006 2 次提交

[PATCH] use kzalloc and kcalloc in core fs code · 11b0b5ab

由 Oliver Neukum 提交于 3月 25, 2006

Signed-off-by: NOliver Neukum <oliver@neukum.name>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

11b0b5ab

[PATCH] Introduce FMODE_EXEC file flag · b500531e

由 Oleg Drokin 提交于 3月 25, 2006

Introduce FMODE_EXEC file flag, to indicate that file is being opened for
execution.  This is useful for distributed filesystems to maintain
consistent behavior for returning ETXTBUSY when opening for write and
execution happens on different nodes.

akpm:

  Needed by Lustre at present.  I assume their objective to to work towards
  being able to install Lustre on an unmodified distro kernel, which seems
  sane.  It should have zero runtime cost.

  Trond and Chuck indicate that NFS4 can probably use this too, for the same
  thing.

  Steven says it's also on the GFS todo list.
Signed-off-by: NOleg Drokin <green@linuxhacker.ru>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b500531e

01 3月, 2006 1 次提交

[PATCH] Add mm->task_size and fix powerpc vdso · 0551fbd2

由 Benjamin Herrenschmidt 提交于 2月 28, 2006

This patch adds mm->task_size to keep track of the task size of a given mm
and uses that to fix the powerpc vdso so that it uses the mm task size to
decide what pages to fault in instead of the current thread flags (which
broke when ptracing).

(akpm: I expect that mm_struct.task_size will become the way in which we
finally sort out the confusion between 32-bit processes and 32-bit mm's.  It
may need tweaks, but at this stage this patch is powerpc-only.)
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0551fbd2

16 2月, 2006 1 次提交

[PATCH] fix zap_thread's ptrace related problems · 5ecfbae0

由 Oleg Nesterov 提交于 2月 15, 2006

1. The tracee can go from ptrace_stop() to do_signal_stop()
   after __ptrace_unlink(p).

2. It is unsafe to __ptrace_unlink(p) while p->parent may wait
   for tasklist_lock in ptrace_detach().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5ecfbae0