提交 · d1f74e20b5b064a130cd0743a256c2d3cfe84010 · OpenHarmony / kernel_linux

04 6月, 2010 1 次提交

tracing/sched: Make preempt_schedule() notrace · d1f74e20

由 Steven Rostedt 提交于 6月 02, 2010

The function tracer code uses ftrace_preempt_disable() to disable
preemption instead of normal preempt_disable(). But there's a slight
race condition that may cause it to lose a preemption check.

This was made to keep the function tracer from recursing on itself
by disabling preemption then having the enable call the function tracer
again, causing infinite recursion.

The bug was assumed to happen if the call was just in schedule, but
this is incorrect. The bug is caused by preempt_schedule() which
is called by preempt_enable(). The calling of preempt_enable() when
NEED_RESCHED was set would call preempt_schedule() which would call
the function tracer again.

By making the preempt_schedule() and add_preempt_count() notrace
then this will prevent the inifinite recursion. This is because
the add_preempt_count() would stop the preempt_enable() in the
function tracer from calling preempt_schedule() again.

The sub_preempt_count() is also made notrace just to keep it
symmetric.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d1f74e20

28 5月, 2010 38 次提交

Avoid warning when CPU hotplug isn't enabled · 00b9b0af

由 Linus Torvalds 提交于 5月 27, 2010

Commit e9fb7631 ("cpu-hotplug: introduce cpu_notify(),
__cpu_notify(), cpu_notify_nofail()") also introduced this annoying
warning:

  kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used

when CONFIG_HOTPLUG_CPU wasn't set.

So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
simplify it while at it.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00b9b0af

numa: in-kernel profiling: use cpu_to_mem() for per cpu allocations · 3dd6b5fb

由 Lee Schermerhorn 提交于 5月 26, 2010

In kernel profiling requires that we be able to allocate "local" memory
for each cpu.  Use "cpu_to_mem()" instead of "cpu_to_node()" to support
memoryless nodes.

Depends on the "numa_mem_id()" patch.
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dd6b5fb

panic: call console_verbose() in panic · 5b530fc1

由 Anton Blanchard 提交于 5月 26, 2010

Most distros turn the console verbosity down and that means a backtrace
after a panic never makes it to the console.  I assume we haven't seen
this because a panic is often preceeded by an oops which will have called
console_verbose.  There are however a lot of places we call panic
directly, and they are broken.

Use console_verbose like we do in the oops path to ensure a directly
called panic will print a backtrace.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b530fc1

pids: fix fork_idle() to setup ->pids correctly · f106eee1

由 Oleg Nesterov 提交于 5月 26, 2010

copy_process(pid => &init_struct_pid) doesn't do attach_pid/etc.

It shouldn't, but this means that the idle threads run with the wrong
pids copied from the caller's task_struct. In x86 case the caller is
either kernel_init() thread or keventd.

In particular, this means that after the series of cpu_up/cpu_down an
idle thread (which never exits) can run with .pid pointing to nowhere.

Change fork_idle() to initialize idle->pids[] correctly. We only set
.pid = &init_struct_pid but do not add .node to list, INIT_TASK() does
the same for the boot-cpu idle thread (swapper).
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f106eee1

pids: increase pid_max based on num_possible_cpus · 72680a19

由 Hedi Berriche 提交于 5月 26, 2010

On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough.  A system with 1664 CPU's, there are
25163 processes started before the login prompt.  It's estimated that with
2048 CPU's we will pass the 32k limit.  With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: NHedi Berriche <hedi@sgi.com>
Signed-off-by: NMike Travis <travis@sgi.com>
Signed-off-by: NRobin Holt <holt@sgi.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: John Stoffel <john@stoffel.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

72680a19

cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n · 79a6cdeb

由 Lai Jiangshan 提交于 5月 26, 2010

Since when CONFIG_HOTPLUG_CPU=n, get_online_cpus() do nothing, so we don't
need cpu_hotplug_begin() either.

This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code
block of CONFIG_HOTPLUG_CPU=y.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79a6cdeb

kernel/: convert cpu notifier to return encapsulate errno value · 80b5184c

由 Akinobu Mita 提交于 5月 26, 2010

By the previous modification, the cpu notifier can return encapsulate
errno value.  This converts the cpu notifiers for kernel/*.c
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80b5184c

cpu-hotplug: return better errno on cpu hotplug failure · e6bde73b

由 Akinobu Mita 提交于 5月 26, 2010

Currently, onlining or offlining a CPU failure by one of the cpu notifiers
error always cause -EINVAL error.  (i.e.  writing 0 or 1 to
/sys/devices/system/cpu/cpuX/online gets EINVAL)

To get better error reporting rather than always getting -EINVAL, This
changes cpu_notify() to return -errno value with notifier_to_errno() and
fix the callers.  Now that cpu notifiers can return encapsulate errno
value.

Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or
NOTIFY_DONE.  So cpu_notify() can returns 0 or -EPERM with this change for
now.

(notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0,
notifier_to_errno(NOTIFY_BAD) == -EPERM)

Forthcoming patches convert several cpu notifiers to return encapsulate
errno value with notifier_from_errno().
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e6bde73b

cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail() · e9fb7631

由 Akinobu Mita 提交于 5月 26, 2010

No functional change.  These are just wrappers of
raw_cpu_notifier_call_chain.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e9fb7631

proc: turn signal_struct->count into "int nr_threads" · b3ac022c

由 Oleg Nesterov 提交于 5月 26, 2010

No functional changes, just s/atomic_t count/int nr_threads/.

With the recent changes this counter has a single user, get_nr_threads()
And, none of its callers need the really accurate number of threads, not
to mention each caller obviously races with fork/exit.  It is only used to
report this value to the user-space, except first_tid() uses it to avoid
the unnecessary while_each_thread() loop in the unlikely case.

It is a bit sad we need a word in struct signal_struct for this, perhaps
we can change get_nr_threads() to approximate the number of threads using
signal->live and kill ->nr_threads later.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b3ac022c

proc_sched_show_task(): use get_nr_threads() · 5089a976

由 Oleg Nesterov 提交于 5月 26, 2010

Trivial, use get_nr_threads() helper to read signal->count which we are
going to change.

Like other callers, proc_sched_show_task() doesn't need the exactly
precise nr_threads.

David said:

: Note that get_nr_threads() isn't completely equivalent (it can return 0
: where proc_sched_show_task() will display a 1).  But I don't think this
: should be a problem.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5089a976

check_unshare_flags: kill the bogus CLONE_SIGHAND/sig->count check · 6e1be45a

由 Oleg Nesterov 提交于 5月 26, 2010

check_unshare_flags(CLONE_SIGHAND) adds CLONE_THREAD to *flags_ptr if the
task is multithreaded to ensure unshare_thread() will fail.

Not only this is a bit strange way to return the error, this is absolutely
meaningless.  If signal->count > 1 then sighand->count must be also > 1,
and unshare_sighand() will fail anyway.

In fact, all CLONE_THREAD/SIGHAND/VM checks inside sys_unshare() do not
look right.  Fortunately this code doesn't really work anyway.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e1be45a

exit: move taskstats_tgid_free() from __exit_signal() to free_signal_struct() · 97101eb4

由 Oleg Nesterov 提交于 5月 26, 2010

Move taskstats_tgid_free() from __exit_signal() to free_signal_struct().

This way signal->stats never points to nowhere and we can read ->stats
lockless.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97101eb4

kill the obsolete thread_group_cputime_free() helper · a705be6b

由 Oleg Nesterov 提交于 5月 26, 2010

Kill the empty thread_group_cputime_free() helper.  It was needed to free
the per-cpu data which we no longer have.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a705be6b

exit: __exit_signal: use thread_group_leader() consistently · d40e48e0

由 Oleg Nesterov 提交于 5月 26, 2010

Cleanup:

- Add the boolean, group_dead = thread_group_leader(), for clarity.

- Do not test/set sig == NULL to detect the all-dead case, use this
  boolean.

- Pass this boolen to __unhash_process() and use it instead of another
  thread_group_leader() call which needs ->group_leader.

  This can be considered as microoptimization, but hopefully this also
  allows us do do other cleanups later.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d40e48e0

signals: kill the awful task_rq_unlock_wait() hack · b7b8ff63

由 Oleg Nesterov 提交于 5月 26, 2010

Now that task->signal can't go away we can revert the horrible hack added
by ad474cac ("fix for
account_group_exec_runtime(), make sure ->signal can't be freed under
rq->lock").

And we can do more cleanups sched_stats.h/posix-cpu-timers.c later.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b7b8ff63

signals: clear signal->tty when the last thread exits · 4ada856f

由 Oleg Nesterov 提交于 5月 26, 2010

When the last thread exits signal->tty is freed, but the pointer is not
cleared and points to nowhere.

This is OK.  Nobody should use signal->tty lockless, and it is no longer
possible to take ->siglock.  However this looks wrong even if correct, and
the nice OOPS is better than subtle and hard to find bugs.

Change __exit_signal() to clear signal->tty under ->siglock.

Note: __exit_signal() needs more cleanups.  It should not check "sig !=
NULL" to detect the all-dead case and we have the same issues with
signal->stats.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ada856f

signals: make task_struct->signal immutable/refcountable · ea6d290c

由 Oleg Nesterov 提交于 5月 26, 2010

We have a lot of problems with accessing task_struct->signal, it can
"disappear" at any moment.  Even current can't use its ->signal safely
after exit_notify().  ->siglock helps, but it is not convenient, not
always possible, and sometimes it makes sense to use task->signal even
after this task has already dead.

This patch adds the reference counter, sigcnt, into signal_struct.  This
reference is owned by task_struct and it is dropped in
__put_task_struct().  Perhaps it makes sense to export
get/put_signal_struct() later, but currently I don't see the immediate
reason.

Rename __cleanup_signal() to free_signal_struct() and unexport it.  With
the previous changes it does nothing except kmem_cache_free().

Change __exit_signal() to not clear/free ->signal, it will be freed when
the last reference to any thread in the thread group goes away.

Note:
	- when the last thead exits signal->tty can point to nowhere, see
	  the next patch.

	- with or without this patch signal_struct->count should go away,
	  or at least it should be "int nr_threads" for fs/proc. This will
	  be addressed later.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ea6d290c

fork/exit: move tty_kref_put() outside of __cleanup_signal() · 4dec2a91

由 Oleg Nesterov 提交于 5月 26, 2010

tty_kref_put() has two callsites in copy_process() paths,

	1. if copy_process() suceeds it is called before we copy
	   signal->tty from parent

	2. otherwise it is called from __cleanup_signal() under
	   bad_fork_cleanup_signal: label

In both cases tty_kref_put() is not right and unneeded because we don't
have the balancing tty_kref_get().  Fortunately, this is harmless because
this can only happen without CLONE_THREAD, and in this case signal->tty
must be NULL.

Remove tty_kref_put() from copy_process() and __cleanup_signal(), and
change another caller of __cleanup_signal(), __exit_signal(), to call
tty_kref_put() by hand.

I hope this change makes sense by itself, but it is also needed to make
->signal refcountable.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NAlan Cox <alan@linux.intel.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4dec2a91

posix-cpu-timers: avoid "task->signal != NULL" checks · d30fda35

由 Oleg Nesterov 提交于 5月 26, 2010

Preparation to make task->signal immutable, no functional changes.

posix-cpu-timers.c checks task->signal != NULL to ensure this task is
alive and didn't pass __exit_signal().  This is correct but we are going
to change the lifetime rules for ->signal and never reset this pointer.

Change the code to check ->sighand instead, it doesn't matter which
pointer we check under tasklist, they both are cleared simultaneously.

As Roland pointed out, some of these changes are not strictly needed and
probably it makes sense to revert them later, when ->signal will be pinned
to task_struct.  But this patch tries to ensure the subsequent changes in
fork/exit can't make any visible impact on posix cpu timers.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d30fda35

exit: avoid sig->count in __exit_signal() to detect the group-dead case · 4a599942

由 Oleg Nesterov 提交于 5月 26, 2010

Change __exit_signal() to check thread_group_leader() instead of
atomic_dec_and_test(&sig->count).  This must be equivalent, the group
leader must be released only after all other threads have exited and
passed __exit_signal().

Henceforth sig->count is not actually used, except in fs/proc for
get_nr_threads/etc.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a599942

exit: avoid sig->count in de_thread/__exit_signal synchronization · d344193a

由 Oleg Nesterov 提交于 5月 26, 2010

de_thread() and __exit_signal() use signal_struct->count/notify_count for
synchronization.  We can simplify the code and use ->notify_count only.
Instead of comparing these two counters, we can change de_thread() to set
->notify_count = nr_of_sub_threads, then change __exit_signal() to
dec-and-test this counter and notify group_exit_task.

Note that __exit_signal() checks "notify_count > 0" just for symmetry with
exit_notify(), we could just check it is != 0.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d344193a

exit: change zap_other_threads() to count sub-threads · 09faef11

由 Oleg Nesterov 提交于 5月 26, 2010

Change zap_other_threads() to return the number of other sub-threads found
on ->thread_group list.

Other changes are cosmetic:

	- change the code to use while_each_thread() helper

	- remove the obsolete comment about SIGKILL/SIGSTOP
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09faef11

exit: exit_notify() can trust signal->notify_count < 0 · 9c339168

由 Oleg Nesterov 提交于 5月 26, 2010

signal_struct->count in its current form must die.

- it has no reasons to be atomic_t

- it looks like a reference counter, but it is not

- otoh, we really need to make task->signal refcountable, just look at
  the extremely ugly task_rq_unlock_wait() called from __exit_signals().

- we should change the lifetime rules for task->signal, it should be
  pinned to task_struct.  We have a lot of code which can be simplified
  after that.

- it is not needed!  while the code is correct, any usage of this
  counter is artificial, except fs/proc uses it correctly to show the
  number of threads.

This series removes the usage of sig->count from exit pathes.

This patch:

Now that Veaceslav changed copy_signal() to use zalloc(), exit_notify()
can just check notify_count < 0 to ensure the execing sub-threads needs
the notification from us.  No need to do other checks, notify_count != 0
must always mean ->group_exit_task != NULL is waiting for us.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c339168

call_usermodehelper: UMH_WAIT_EXEC ignores kernel_thread() failure · 04b1c384

由 Oleg Nesterov 提交于 5月 26, 2010

UMH_WAIT_EXEC should report the error if kernel_thread() fails, like
UMH_WAIT_PROC does.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04b1c384

call_usermodehelper: simplify/fix UMH_NO_WAIT case · d47419cd

由 Oleg Nesterov 提交于 5月 26, 2010

__call_usermodehelper(UMH_NO_WAIT) has 2 problems:

	- if kernel_thread() fails, call_usermodehelper_freeinfo()
	  is not called.

	- for unknown reason UMH_NO_WAIT has UMH_WAIT_PROC logic,
	  we spawn yet another thread which waits until the user
	  mode application exits.

Change the UMH_NO_WAIT code to use ____call_usermodehelper() instead of
wait_for_helper(), and do call_usermodehelper_freeinfo() unconditionally.
We can rely on CLONE_VFORK, do_fork(CLONE_VFORK) until the child exits or
execs.

With or without this patch UMH_NO_WAIT does not report the error if
kernel_thread() fails, this is correct since the caller doesn't wait for
result.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d47419cd

wait_for_helper: SIGCHLD from user-space can lead to use-after-free · 7d642242

由 Oleg Nesterov 提交于 5月 26, 2010

1. wait_for_helper() calls allow_signal(SIGCHLD) to ensure the child
   can't autoreap itself.

   However, this means that a spurious SIGCHILD from user-space can
   set TIF_SIGPENDING and:

   	- kernel_thread() or sys_wait4() can fail due to signal_pending()

   	- worse, wait4() can fail before ____call_usermodehelper() execs
   	  or exits. In this case the caller may kfree(subprocess_info)
   	  while the child still uses this memory.

   Change the code to use SIG_DFL instead of magic "(void __user *)2"
   set by allow_signal(). This means that SIGCHLD won't be delivered,
   yet the child won't autoreap itsefl.

   The problem is minor, only root can send a signal to this kthread.

2. If sys_wait4(&ret) fails it doesn't populate "ret", in this case
   wait_for_helper() reports a random value from uninitialized var.

   With this patch sys_wait4() should never fail, but still it makes
   sense to initialize ret = -ECHILD so that the caller can notice
   the problem.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7d642242

call_usermodehelper: no need to unblock signals · 363da402

由 Oleg Nesterov 提交于 5月 26, 2010

____call_usermodehelper() correctly calls flush_signal_handlers() to set
SIG_DFL, but sigemptyset(->blocked) and recalc_sigpending() are not
needed.

This kthread was forked by workqueue thread, all signals must be unblocked
and ignored, no pending signal is possible.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

363da402

umh: creds: kill subprocess_info->cred logic · c70a626d

由 Oleg Nesterov 提交于 5月 26, 2010

Now that nobody ever changes subprocess_info->cred we can kill this member
and related code.  ____call_usermodehelper() always runs in the context of
freshly forked kernel thread, it has the proper ->cred copied from its
parent kthread, keventd.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c70a626d

umh: creds: convert call_usermodehelper_keys() to use subprocess_info->init() · 685bfd2c

由 Oleg Nesterov 提交于 5月 26, 2010

call_usermodehelper_keys() uses call_usermodehelper_setkeys() to change
subprocess_info->cred in advance.  Now that we have info->init() we can
change this code to set tgcred->session_keyring in context of execing
kernel thread.

Note: since currently call_usermodehelper_keys() is never called with
UMH_NO_WAIT, call_usermodehelper_keys()->key_get() and umh_keys_cleanup()
are not really needed, we could rely on install_session_keyring_to_cred()
which does key_get() on success.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

685bfd2c

exec: replace call_usermodehelper_pipe with use of umh init function and resolve limit · 898b374a

由 Neil Horman 提交于 5月 26, 2010

The first patch in this series introduced an init function to the
call_usermodehelper api so that processes could be customized by caller.
This patch takes advantage of that fact, by customizing the helper in
do_coredump to create the pipe and set its core limit to one (for our
recusrsion check). This lets us clean up the previous uglyness in the
usermodehelper internals and factor call_usermodehelper out entirely.
While I'm at it, we can also modify the helper setup to look for a core
limit value of 1 rather than zero for our recursion check
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

898b374a

kmod: add init function to usermodehelper · a06a4dc3

由 Neil Horman 提交于 5月 26, 2010

About 6 months ago, I made a set of changes to how the core-dump-to-a-pipe
feature in the kernel works. We had reports of several races, including
some reports of apps bypassing our recursion check so that a process that
was forked as part of a core_pattern setup could infinitely crash and
refork until the system crashed.

We fixed those by improving our recursion checks. The new check basically
refuses to fork a process if its core limit is zero, which works well.

Unfortunately, I've been getting grief from maintainer of user space
programs that are inserted as the forked process of core_pattern. They
contend that in order for their programs (such as abrt and apport) to
work, all the running processes in a system must have their core limits
set to a non-zero value, to which I say 'yes'. I did this by design, and
think thats the right way to do things.

But I've been asked to ease this burden on user space enough times that I
thought I would take a look at it. The first suggestion was to make the
recursion check fail on a non-zero 'special' number, like one. That way
the core collector process could set its core size ulimit to 1, and enable
the kernel's recursion detection. This isn't a bad idea on the surface,
but I don't like it since its opt-in, in that if a program like abrt or
apport has a bug and fails to set such a core limit, we're left with a
recursively crashing system again.

So I've come up with this. What I've done is modify the
call_usermodehelper api such that an extra parameter is added, a function
pointer which will be called by the user helper task, after it forks, but
before it exec's the required process. This will give the caller the
opportunity to get a call back in the processes context, allowing it to do
whatever it needs to to the process in the kernel prior to exec-ing the
user space code. In the case of do_coredump, this callback is ues to set
the core ulimit of the helper process to 1. This elimnates the opt-in
problem that I had above, as it allows the ulimit for core sizes to be set
to the value of 1, which is what the recursion check looks for in
do_coredump.

This patch:

Create new function call_usermodehelper_fns() and allow it to assign both
an init and cleanup function, as we'll as arbitrary data.

The init function is called from the context of the forked process and
allows for customization of the helper process prior to calling exec. Its
return code gates the continuation of the process, or causes its exit.
Also add an arbitrary data pointer to the subprocess_info struct allowing
for data to be passed from the caller to the new process, and the
subsequent cleanup process

Also, use this patch to cleanup the cleanup function. It currently takes
an argp and envp pointer for freeing, which is ugly. Lets instead just
make the subprocess_info structure public, and pass that to the cleanup
and init routines
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a06a4dc3

signals: check_kill_permission(): don't check creds if same_thread_group() · 065add39

由 Oleg Nesterov 提交于 5月 26, 2010

Andrew Tridgell reports that aio_read(SIGEV_SIGNAL) can fail if the
notification from the helper thread races with setresuid(), see
http://samba.org/~tridge/junkcode/aio_uid.c

This happens because check_kill_permission() doesn't permit sending a
signal to the task with the different cred->xids.  But there is not any
security reason to check ->cred's when the task sends a signal (private or
group-wide) to its sub-thread.  Whatever we do, any thread can bypass all
security checks and send SIGKILL to all threads, or it can block a signal
SIG and do kill(gettid(), SIG) to deliver this signal to another
sub-thread.  Not to mention that CLONE_THREAD implies CLONE_VM.

Change check_kill_permission() to avoid the credentials check when the
sender and the target are from the same thread group.

Also, move "cred = current_cred()" down to avoid calling get_current()
twice.

Note: David Howells pointed out we could relax this even more, the
CLONE_SIGHAND (without CLONE_THREAD) case probably does not need
these checks too.

Roland said:
: The glibc (libpthread) that does set*id across threads has
: been in use for a while (2.3.4?), probably in distro's using kernels as old
: or older than any active -stable streams.  In the race in question, this
: kernel bug is breaking valid POSIX application expectations.
Reported-by: NAndrew Tridgell <tridge@samba.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: <stable@kernel.org>		[all kernel versions]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

065add39

ptrace: PTRACE_GETFDPIC: fix the unsafe usage of child->mm · e0129ef9

由 Oleg Nesterov 提交于 5月 26, 2010

Now that Mike Frysinger unified the FDPIC ptrace code, we can fix the
unsafe usage of child->mm in ptrace_request(PTRACE_GETFDPIC).

We have the reference to task_struct, and ptrace_check_attach() verified
the tracee is stopped.  But nothing can protect from SIGKILL after that,
we must not assume child->mm != NULL.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NMike Frysinger <vapier.adi@gmail.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0129ef9

ptrace: unify FDPIC implementations · 9c1a1259

由 Mike Frysinger 提交于 5月 26, 2010

The Blackfin/FRV/SuperH guys all have the same exact FDPIC ptrace code in
their arch handlers (since they were probably copied & pasted).  Since
these ptrace interfaces are an arch independent aspect of the FDPIC code,
unify them in the common ptrace code so new FDPIC ports don't need to copy
and paste this fundamental stuff yet again.
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Acked-by: NRoland McGrath <roland@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NPaul Mundt <lethal@linux-sh.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c1a1259

cpusets: randomize node rotor used in cpuset_mem_spread_node() · 0ac0c0d0

由 Jack Steiner 提交于 5月 26, 2010

Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
Signed-off-by: NJack Steiner <steiner@sgi.com>
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ac0c0d0

cpusets: new round-robin rotor for SLAB allocations · 6adef3eb

由 Jack Steiner 提交于 5月 26, 2010

We have observed several workloads running on multi-node systems where
memory is assigned unevenly across the nodes in the system.  There are
numerous reasons for this but one is the round-robin rotor in
cpuset_mem_spread_node().

For example, a simple test that writes a multi-page file will allocate
pages on nodes 0 2 4 6 ...  Odd nodes are skipped.  (Sometimes it
allocates on odd nodes & skips even nodes).

An example is shown below.  The program "lfile" writes a file consisting
of 10 pages.  The program then mmaps the file & uses get_mempolicy(...,
MPOL_F_NODE) to determine the nodes where the file pages were allocated.
The output is shown below:

	# ./lfile
	 allocated on nodes: 2 4 6 0 1 2 6 0 2

There is a single rotor that is used for allocating both file pages & slab
pages.  Writing the file allocates both a data page & a slab page
(buffer_head).  This advances the RR rotor 2 nodes for each page
allocated.

A quick confirmation seems to confirm this is the cause of the uneven
allocation:

	# echo 0 >/dev/cpuset/memory_spread_slab
	# ./lfile
	 allocated on nodes: 6 7 8 9 0 1 2 3 4 5

This patch introduces a second rotor that is used for slab allocations.
Signed-off-by: NJack Steiner <steiner@sgi.com>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6adef3eb

cgroups: make cftype.unregister_event() void-returning · 907860ed

由 Kirill A. Shutemov 提交于 5月 26, 2010

Since we are unable to handle an error returned by
cftype.unregister_event() properly, let's make the callback
void-returning.

mem_cgroup_unregister_event() has been rewritten to be a "never fail"
function.  On mem_cgroup_usage_register_event() we save old buffer for
thresholds array and reuse it in mem_cgroup_usage_unregister_event() to
avoid allocation.
Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

907860ed

26 5月, 2010 1 次提交

Revert "module: drop the lock while waiting for module to complete initialization." · 218ce735

由 Linus Torvalds 提交于 5月 25, 2010

This reverts commit 480b02df, since
Rafael reports that it causes occasional kernel paging request faults in
load_module().

Dropping the module lock and re-taking it deep in the call-chain is
definitely not the right thing to do.  That just turns the mutex from a
lock into a "random non-locking data structure" that doesn't actually
protect what it's supposed to protect.
Requested-and-tested-by: NRafael J. Wysocki <rjw@sisk.pl>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Brandon Philips <brandon@ifup.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

218ce735

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年