提交 · 48f24c4da1ee7f3f22289cb85e8b8a73e4df4db5 · gsplhtlxg / clone-Linux

04 7月, 2006 4 次提交

[PATCH] lockdep: annotate ->mmap_sem · ad339451

由 Ingo Molnar 提交于 7月 03, 2006

Teach special (recursive) locking code to the lock validator.  Has no effect
on non-lockdep kernels.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ad339451

[PATCH] lockdep: core · fbb9ce95

由 Ingo Molnar 提交于 7月 03, 2006

Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.

Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.

What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.

When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]

Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelyhood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemail-wise", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.

To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.

To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:

lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176

The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.

More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also found at:

http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt

[bunk@stusta.de: cleanups]
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

fbb9ce95

[PATCH] lockdep: irqtrace subsystem, core · de30a2b3

由 Ingo Molnar 提交于 7月 03, 2006

Accurate hard-IRQ-flags and softirq-flags state tracing.

This allows us to attach extra functionality to IRQ flags on/off
events (such as trace-on/off).
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

de30a2b3

[PATCH] lockdep: better lock debugging · 9a11b49a

由 Ingo Molnar 提交于 7月 03, 2006

Generic lock debugging:

 - generalized lock debugging framework. For example, a bug in one lock
   subsystem turns off debugging in all lock subsystems.

 - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
   the mutex/rtmutex debugging code: it caused way too much prototype
   hackery, and lockdep will give the same information anyway.

 - ability to do silent tests

 - check lock freeing in vfree too.

 - more finegrained debugging options, to allow distributions to
   turn off more expensive debugging features.

There's no separate 'held mutexes' list anymore - but there's a 'held locks'
stack within lockdep, which unifies deadlock detection across all lock
classes.  (this is independent of the lockdep validation stuff - lockdep first
checks whether we are holding a lock already)

Here are the current debugging options:

CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y

which do:

 config DEBUG_MUTEXES
          bool "Mutex debugging, basic checks"

 config DEBUG_LOCK_ALLOC
         bool "Detect incorrect freeing of live mutexes"
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9a11b49a

01 7月, 2006 1 次提交

Remove obsolete #include <linux/config.h> · 6ab3d562

由 Jörn Engel 提交于 6月 30, 2006

Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

6ab3d562

28 6月, 2006 2 次提交

[PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support · c87e2837

由 Ingo Molnar 提交于 6月 27, 2006

This adds the actual pi-futex implementation, based on rt-mutexes.

[dino@in.ibm.com: fix an oops-causing race]
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NDinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c87e2837

[PATCH] pi-futex: rt mutex core · 23f78d4a

由 Ingo Molnar 提交于 6月 27, 2006

Core functions for the rt-mutex subsystem.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

23f78d4a

27 6月, 2006 2 次提交

[PATCH] coredump: copy_process: don't check SIGNAL_GROUP_EXIT · cf2dfbfb

由 Oleg Nesterov 提交于 6月 26, 2006

After the previous patch SIGNAL_GROUP_EXIT implies a pending SIGKILL, we
can remove this check from copy_process() because we already checked
!signal_pending().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cf2dfbfb

[PATCH] proc: Rewrite the proc dentry flush on exit optimization · 48e6484d

由 Eric W. Biederman 提交于 6月 26, 2006

To keep the dcache from filling up with dead /proc entries we flush them on
process exit.  However over the years that code has gotten hairy with a
dentry_pointer and a lock in task_struct and misdocumented as a correctness
feature.

I have rewritten this code to look and see if we have a corresponding entry in
the dcache and if so flush it on process exit.  This removes the extra fields
in the task_struct and allows me to trivially handle the case of a
/proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

48e6484d

26 6月, 2006 1 次提交

[PATCH] pacct: add pacct_struct to fix some pacct bugs. · 0e464814

由 KaiGai Kohei 提交于 6月 25, 2006

The pacct facility need an i/o operation when an accounting record is
generated. There is a possibility to wake OOM killer up. If OOM killer is
activated, it kills some processes to make them release process memory
regions.

But acct_process() is called in the killed processes context before calling
exit_mm(), so those processes cannot release own memory. In the results, any
processes stop in this point and it finally cause a system stall.

0e464814

23 6月, 2006 2 次提交

[PATCH] dup fd error fix · 6e667260

由 Prasanna Meda 提交于 6月 23, 2006

Set errorp in dup_fd, it will be used in sys_unshare also.
Signed-off-by: NPrasanna Meda <mlp@google.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6e667260

[PATCH] mmput() might sleep · 0ae26f1b

由 Andrew Morton 提交于 6月 23, 2006

exit_aio() and exit_mmap() can sleep.  But it's easy to accidentally call
mmput() from inside locks.

Cc: Dave Peterson <dsp@llnl.gov>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0ae26f1b

01 5月, 2006 1 次提交
- A
  [PATCH] move call of audit_free() into do_exit() · fa84cb93
  由 Al Viro 提交于 3月 29, 2006
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  fa84cb93
20 4月, 2006 2 次提交

[PATCH] Don't inherit ->splice_pipe across forks · a0aa7f68

由 Jens Axboe 提交于 4月 20, 2006

It's really task private, so clear that field on fork after copying
task structure.
Signed-off-by: NJens Axboe <axboe@suse.de>

a0aa7f68

[PATCH] task: Make task list manipulations RCU safe · 5e85d4ab

由 Eric W. Biederman 提交于 4月 18, 2006

While we can currently walk through thread groups, process groups, and
sessions with just the rcu_read_lock, this opens the door to walking the
entire task list.

We already have all of the other RCU guarantees so there is no cost in
doing this, this should be enough so that proc can stop taking the
tasklist lock during readdir.

prev_task was killed because it has no users, and using it will miss new
tasks when doing an rcu traversal.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5e85d4ab

15 4月, 2006 1 次提交

[PATCH] kill unushed __put_task_struct_cb · 64541d19

由 Eric W. Biederman 提交于 4月 14, 2006

Somehow in the midst of dotting i's and crossing t's during
the merge up to rc1 we wound up keeping __put_task_struct_cb
when it should have been killed as it no longer has any users.
Sorry I probably should have caught this while it was
still in the -mm tree.

Having the old code there gets confusing when reading
through the code and trying to understand what is
happening.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

64541d19

01 4月, 2006 3 次提交

[PATCH] wrong error path in dup_fd() leading to oopses in RCU · 42862298

由 Kirill Korotaev 提交于 3月 31, 2006

Wrong error path in dup_fd() - it should return NULL on error,
not an address of already freed memory :/

Triggered by OpenVZ stress test suite.

What is interesting is that it was causing different oopses in RCU like
below:
Call Trace:
   [<c013492c>] rcu_do_batch+0x2c/0x80
   [<c0134bdd>] rcu_process_callbacks+0x3d/0x70
   [<c0126cf3>] tasklet_action+0x73/0xe0
   [<c01269aa>] __do_softirq+0x10a/0x130
   [<c01058ff>] do_softirq+0x4f/0x60
   =======================
   [<c0113817>] smp_apic_timer_interrupt+0x77/0x110
   [<c0103b54>] apic_timer_interrupt+0x1c/0x24
  Code:  Bad EIP value.
   <0>Kernel panic - not syncing: Fatal exception in interrupt
Signed-Off-By: NPavel Emelianov <xemul@sw.ru>
Signed-Off-By: NDmitry Mishin <dim@openvz.org>
Signed-Off-By: NKirill Korotaev <dev@openvz.org>
Signed-Off-By: NLinus Torvalds <torvalds@osdl.org>

42862298

[PATCH] pidhash: Refactor the pid hash table · 92476d7f

由 Eric W. Biederman 提交于 3月 31, 2006

Simplifies the code, reduces the need for 4 pid hash tables, and makes the
code more capable.

In the discussions I had with Oleg it was felt that to a large extent the
cleanup itself justified the work. With struct pid being dynamically
allocated meant we could create the hash table entry when the pid was
allocated and free the hash table entry when the pid was freed. Instead of
playing with the hash lists when ever a process would attach or detach to a
process.

For myself the fact that it gave what my previous task_ref patch gave for free
with simpler code was a big win. The problem is that if you hold a reference
to struct task_struct you lock in 10K of low memory. If you do that in a user
controllable way like /proc does, with an unprivileged but hostile user space
application with typical resource limits of 1000 fds and 100 processes I can
trigger the OOM killer by consuming all of low memory with task structs, on a
machine wight 1GB of low memory.

If I instead hold a reference to struct pid which holds a pointer to my
task_struct, I don't suffer from that problem because struct pid is 2 orders
of magnitude smaller. In fact struct pid is small enough that most other
kernel data structures dwarf it, so simply limiting the number of referring
data structures is enough to prevent exhaustion of low memory.

This splits the current struct pid into two structures, struct pid and struct
pid_link, and reduces our number of hash tables from PIDTYPE_MAX to just one.
struct pid_link is the per process linkage into the hash tables and lives in
struct task_struct. struct pid is given an indepedent lifetime, and holds
pointers to each of the pid types.

The independent life of struct pid simplifies attach_pid, and detach_pid,
because we are always manipulating the list of pids and not the hash table.
In addition in giving struct pid an indpendent life it makes the concept much
more powerful.

Kernel data structures can now embed a struct pid * instead of a pid_t and
not suffer from pid wrap around problems or from keeping unnecessarily
large amounts of memory allocated.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

92476d7f

[PATCH] resurrect __put_task_struct · 158d9ebd

由 Andrew Morton 提交于 3月 31, 2006

This just got nuked in mainline.  Bring it back because Eric's patches use it.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

158d9ebd

29 3月, 2006 9 次提交

[PATCH] cleanup __exit_signal->cleanup_sighand path · a7e5328a

由 Oleg Nesterov 提交于 3月 28, 2006

Move 'tsk->sighand = NULL' from cleanup_sighand() to __exit_signal().  This
makes the exit path more understandable and allows us to do
cleanup_sighand() outside of ->siglock protected section.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a7e5328a

[PATCH] make fork() atomic wrt pgrp/session signals · 4a2c7a78

由 Oleg Nesterov 提交于 3月 28, 2006

Eric W. Biederman wrote:
>
> Ok. SUSV3/Posix is clear, fork is atomic with respect
> to signals.  Either a signal comes before or after a
> fork but not during. (See the rationale section).
> http://www.opengroup.org/onlinepubs/000095399/functions/fork.html
>
> The tasklist_lock does not stop forks from adding to a process
> group. The forks stall while the tasklist_lock is held, but a fork
> that began before we grabbed the tasklist_lock simply completes
> afterwards, and the child does not receive the signal.

This also means that SIGSTOP or sig_kernel_coredump() signal can't
be delivered to pgrp/session reliably.

With this patch copy_process() returns -ERESTARTNOINTR when it
detects a pending signal, fork() will be restarted transparently
after handling the signals.

This patch also deletes now unneeded "group_stop_count > 0" check,
copy_process() can no longer succeed while group stop in progress.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Acked-By: NEric Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4a2c7a78

[PATCH] pids: kill PIDTYPE_TGID · 47e65328

由 Oleg Nesterov 提交于 3月 28, 2006

This patch kills PIDTYPE_TGID pid_type thus saving one hash table in
kernel/pid.c and speeding up subthreads create/destroy a bit.  It is also a
preparation for the further tref/pids rework.

This patch adds 'struct list_head thread_group' to 'struct task_struct'
instead.

We don't detach group leader from PIDTYPE_PID namespace until another
thread inherits it's ->pid == ->tgid, so we are safe wrt premature
free_pidmap(->tgid) call.

Currently there are no users of find_task_by_pid_type(PIDTYPE_TGID).
Should the need arise, we can use find_task_by_pid()->group_leader.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Acked-By: NEric Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

47e65328

[PATCH] rename __exit_sighand to cleanup_sighand · c81addc9

由 Oleg Nesterov 提交于 3月 28, 2006

Cosmetic, rename __exit_sighand to cleanup_sighand and move it close to
copy_sighand().

This matches copy_signal/cleanup_signal naming, and I think it is easier to
follow.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: N"Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c81addc9

[PATCH] copy_process: cleanup bad_fork_cleanup_signal · 6b3934ef

由 Oleg Nesterov 提交于 3月 28, 2006

__exit_signal() does important cleanups atomically under ->siglock.  It is
also called from copy_process's error path.  This is not good, for example we
can't move __unhash_process() under ->siglock for that reason.

We should not mix these 2 paths, just look at ugly 'if (p->sighand)' under
'bad_fork_cleanup_sighand:' label.  For copy_process() case it is sufficient
to just backout copy_signal(), nothing more.

Again, nobody can see this task yet.  For CLONE_THREAD case we just decrement
signal->count, otherwise nobody can see this ->signal and we can free it
lockless.

This patch assumes it is safe to do exit_thread_group_keys() without
tasklist_lock.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6b3934ef

[PATCH] copy_process: cleanup bad_fork_cleanup_sighand · 7001510d

由 Oleg Nesterov 提交于 3月 28, 2006

The only caller of exit_sighand(tsk) is copy_process's error path.  We can
call __exit_sighand() directly and kill exit_sighand().

This 'tsk' was not yet registered in pid_hash[] or init_task.tasks, it has no
external references, nobody can see it, and

	IF (clone_flags & CLONE_SIGHAND)
		At least 'current' has a reference to ->sighand, this
		means atomic_dec_and_test(sighand->count) can't be true.

	ELSE
		Nobody can see this ->sighand, this means we can free it
		without any locking.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: N"Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7001510d

[PATCH] convert sighand_cache to use SLAB_DESTROY_BY_RCU · aa1757f9

由 Oleg Nesterov 提交于 3月 28, 2006

This patch borrows a clever Hugh's 'struct anon_vma' trick.

Without tasklist_lock held we can't trust task->sighand until we locked it
and re-checked that it is still the same.

But this means we don't need to defer 'kmem_cache_free(sighand)'.  We can
return the memory to slab immediately, all we need is to be sure that
sighand->siglock can't dissapear inside rcu protected section.

To do so we need to initialize ->siglock inside ctor function,
SLAB_DESTROY_BY_RCU does the rest.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aa1757f9

[PATCH] pidhash: don't count idle threads · 73b9ebfe

由 Oleg Nesterov 提交于 3月 28, 2006

fork_idle() does unhash_process() just after copy_process().  Contrary,
boot_cpu's idle thread explicitely registers itself for each pid_type with nr
= 0.

copy_process() already checks p->pid != 0 before process_counts++, I think we
can just skip attach_pid() calls and job control inits for idle threads and
kill unhash_process().  We don't need to cleanup ->proc_dentry in fork_idle()
because with this patch idle threads are never hashed in
kernel/pid.c:pid_hash[].

We don't need to hash pid == 0 in pidmap_init().  free_pidmap() is never
called with pid == 0 arg, so it will never be reused.  So it is still possible
to use pid == 0 in any PIDTYPE_xxx namespace from kernel/pid.c's POV.

However with this patch we don't hash pid == 0 for PIDTYPE_PID case.  We still
have have PIDTYPE_PGID/PIDTYPE_SID entries with pid == 0: /sbin/init and
kernel threads which don't call daemonize().
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

73b9ebfe

[PATCH] kill SET_LINKS/REMOVE_LINKS · c97d9893

由 Oleg Nesterov 提交于 3月 28, 2006

Both SET_LINKS() and SET_LINKS/REMOVE_LINKS() have exactly one caller, and
these callers already check thread_group_leader().

This patch kills theese macros, they mix two different things: setting
process's parent and registering it in init_task.tasks list.  Callers are
updated to do these actions by hand.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c97d9893

28 3月, 2006 1 次提交

[PATCH] lightweight robust futexes updates · 8f17d3a5

由 Ingo Molnar 提交于 3月 27, 2006

- fix: initialize the robust list(s) to NULL in copy_process.

- doc update

- cleanup: rename _inuser to _inatomic

- __user cleanups and other small cleanups
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8f17d3a5

27 3月, 2006 2 次提交

[PATCH] hrtimers: remove data field · 05cfb614

由 Roman Zippel 提交于 3月 26, 2006

The nanosleep cleanup allows to remove the data field of hrtimer.  The
callback function can use container_of() to get it's own data.  Since the
hrtimer structure is anyway embedded in other structures, this adds no
overhead.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

05cfb614

BUG_ON() Conversion in kernel/fork.c · 910dea7f

由 Eric Sesterhenn 提交于 3月 26, 2006

this changes if() BUG(); constructs to BUG_ON() which is
cleaner, contains unlikely() and can better optimized away.
Signed-off-by: NEric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

910dea7f

24 3月, 2006 2 次提交

[PATCH] cpuset memory spread slab cache optimizations · c61afb18

由 Paul Jackson 提交于 3月 24, 2006

The hooks in the slab cache allocator code path for support of NUMA
mempolicies and cpuset memory spreading are in an important code path.  Many
systems will use neither feature.

This patch optimizes those hooks down to a single check of some bits in the
current tasks task_struct flags.  For non NUMA systems, this hook and related
code is already ifdef'd out.

The optimization is done by using another task flag, set if the task is using
a non-default NUMA mempolicy.  Taking this flag bit along with the
PF_SPREAD_PAGE and PF_SPREAD_SLAB flag bits added earlier in this 'cpuset
memory spreading' patch set, one can check for the combination of any of these
special case memory placement mechanisms with a single test of the current
tasks task_struct flags.

This patch also tightens up the code, to save a few bytes of kernel text
space, and moves some of it out of line.  Due to the nested inlines called
from multiple places, we were ending up with three copies of this code, which
once we get off the main code path (for local node allocation) seems a bit
wasteful of instruction memory.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c61afb18

J
[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23 · 2056a782
由 Jens Axboe 提交于 3月 23, 2006
```
Signed-off-by: NJens Axboe <axboe@suse.de>
```
2056a782

23 3月, 2006 1 次提交

[PATCH] Shrinks sizeof(files_struct) and better layout · 0c9e63fd

由 Eric Dumazet 提交于 3月 23, 2006

1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
   platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32 bits (or
   64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
   close_on_exec_init and open_fds_init fields.  This save some ram (248
   bytes per task) as most tasks dont open more than 32 files.  D-Cache
   footprint for such tasks is also reduced to the minimum.

3) Reduce size of allocated fdset.  Currently two full pages are
   allocated, that is 32768 bits on x86 for example, and way too much.  The
   minimum is now L1_CACHE_BYTES.

UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 ..  2] being in the
same cache line)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0c9e63fd

22 3月, 2006 1 次提交

[PATCH] unshare: Error if passed unsupported flags · 06f9d4f9

由 Eric W. Biederman 提交于 3月 22, 2006

A bare bones trivial patch to ensure we always get -EINVAL on the
unsupported cases for sys_unshare.  If this goes in before 2.6.16 it allows
us to forward compatible with future applications using sys_unshare.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: JANAK DESAI <janak@us.ibm.com>
Cc: <stable@kerenl.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

06f9d4f9

19 3月, 2006 1 次提交

[PATCH] disable unshare(CLONE_VM) for now · 2d61b867

由 Oleg Nesterov 提交于 3月 18, 2006

sys_unshare() does mmput(new_mm).  This is not enough if we have
mm->core_waiters.

This patch is a temporary fix for soon to be released 2.6.16.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
[ Checked with Uli: "I'm not planning to use unshare(CLONE_VM).  It's
  not needed for any functionality planned so far.  What we (as in Red
  Hat) need unshare() for now is the filesystem side." ]
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2d61b867

17 3月, 2006 1 次提交

[PATCH] unshare: Use rcu_assign_pointer when setting sighand · e0e8eb54

由 Eric W. Biederman 提交于 3月 16, 2006

The sighand pointer only needs the rcu_read_lock on the
read side.  So only depending on task_lock protection
when setting this pointer is not enough.  We also need
a memory barrier to ensure the initialization is seen first.

Use rcu_assign_pointer as it does this for us, and clearly
documents that we are setting an rcu readable pointer.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NPaul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e0e8eb54

14 3月, 2006 1 次提交

[PATCH] Fix sigaltstack corruption among cloned threads · f9a3879a

由 GOTO Masanori 提交于 3月 13, 2006

This patch fixes alternate signal stack corruption among cloned threads
with CLONE_SIGHAND (and CLONE_VM) for linux-2.6.16-rc6.

The value of alternate signal stack is currently inherited after a call of
clone(...  CLONE_SIGHAND | CLONE_VM).  But if sigaltstack is set by a
parent thread, and then if multiple cloned child threads (+ parent threads)
call signal handler at the same time, some threads may be conflicted -
because they share to use the same alternative signal stack region.
Finally they get sigsegv.  It's an undesirable race condition.  Note that
child threads created from NPTL pthread_create() also hit this conflict
when the parent thread uses sigaltstack, without my patch.

To fix this problem, this patch clears the child threads' sigaltstack
information like exec().  This behavior follows the SUSv3 specification.
In SUSv3, pthread_create() says "The alternate stack shall not be inherited
(when new threads are initialized)".  It means that sigaltstack should be
cleared when sigaltstack memory space is shared by cloned threads with
CLONE_SIGHAND.

Note that I chose "if (clone_flags & CLONE_SIGHAND)" line because:
  - If clone_flags line is not existed, fork() does not inherit sigaltstack.
  - CLONE_VM is another choice, but vfork() does not inherit sigaltstack.
  - CLONE_SIGHAND implies CLONE_VM, and it looks suitable.
  - CLONE_THREAD is another candidate, and includes CLONE_SIGHAND + CLONE_VM,
    but this flag has a bit different semantics.
I decided to use CLONE_SIGHAND.

[ Changed to test for CLONE_VM && !CLONE_VFORK after discussion --Linus ]
Signed-off-by: NGOTO Masanori <gotom@sanori.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NLinus Torvalds <torvalds@osdl.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f9a3879a

12 3月, 2006 1 次提交

[PATCH] remove __put_task_struct_cb export again · 7cd9013b

由 Christoph Hellwig 提交于 3月 11, 2006

The patch '[PATCH] RCU signal handling' [1] added an export for
__put_task_struct_cb, a put_task_struct helper newly introduced in that
patch.  But the put_task_struct couldn't be used modular previously as
__put_task_struct wasn't exported.  There are not callers of it in modular
code, and it shouldn't be exported because we don't want drivers to hold
references to task_structs.

This patch removes the export and folds __put_task_struct into
__put_task_struct_cb as there's no other caller.

[1] http://www2.kernel.org/git/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e56d090310d7625ecb43a1eeebd479f04affb48bSigned-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NPaul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7cd9013b

16 2月, 2006 1 次提交

[PATCH] fix kill_proc_info() vs fork() theoretical race · dadac81b

由 Oleg Nesterov 提交于 2月 15, 2006

copy_process:

	attach_pid(p, PIDTYPE_PID, p->pid);
	attach_pid(p, PIDTYPE_TGID, p->tgid);

What if kill_proc_info(p->pid) happens in between?

copy_process() holds current->sighand.siglock, so we are safe
in CLONE_THREAD case, because current->sighand == p->sighand.

Otherwise, p->sighand is unlocked, the new process is already
visible to the find_task_by_pid(), but have a copy of parent's
'struct pid' in ->pids[PIDTYPE_TGID].

This means that __group_complete_signal() may hang while doing

	do ... while (next_thread() != p)

We can solve this problem if we reverse these 2 attach_pid()s:

	attach_pid() does wmb()

	group_send_sig_info() calls spin_lock(), which
	provides a read barrier. // Yes ?

I don't think we can hit this race in practice, but still.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

dadac81b