- 05 June 2011, 7 commits
-
-
Submitted by Tejun Heo
Remove the following three no-op tracehooks in signal.c:

* tracehook_force_sigpending()
* tracehook_get_signal()
* tracehook_finish_jctl()

The code area is about to be updated and these hooks don't do anything other than obfuscate the logic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
ptracer->signal->wait_chldexit was used to wait for TRAPPING; however, ->wait_chldexit was already complicated with waker-side filtering without adding TRAPPING wait on top of it. Also, it unnecessarily made TRAPPING clearing depend on the current ptrace relationship - if the ptracee is detached, the wakeup is lost.

There is no reason to use signal->wait_chldexit here. We're just waiting for the JOBCTL_TRAPPING bit to clear and, given the relatively infrequent use of ptrace, bit_waitqueue can serve it perfectly.

This patch makes the JOBCTL_TRAPPING wait use bit_waitqueue instead of signal->wait_chldexit.

-v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
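For illustration only: the wait-for-a-bit-to-clear pattern this commit switches to can be modelled outside the kernel. The sketch below is not the kernel's bit_waitqueue code; it is a minimal userspace analogue (pthread mutex/condvar, invented helper names such as jobctl_wait_trapping_clear()) of one side sleeping until the other side clears a TRAPPING-style bit.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define JOBCTL_TRAPPING (1u << 22)		/* bit position taken from the changelog */

static unsigned int jobctl = JOBCTL_TRAPPING;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* model of the ptracer side: sleep until TRAPPING is clear */
static void jobctl_wait_trapping_clear(void)
{
	pthread_mutex_lock(&lock);
	while (jobctl & JOBCTL_TRAPPING)
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
}

/* model of the tracee side: clear TRAPPING and wake any waiter */
static void *tracee(void *arg)
{
	(void)arg;
	sleep(1);				/* pretend to finish the STOPPED->TRACED transition */
	pthread_mutex_lock(&lock);
	jobctl &= ~JOBCTL_TRAPPING;
	pthread_cond_broadcast(&cond);		/* analogous to a wake-up on the bit */
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, tracee, NULL);
	jobctl_wait_trapping_clear();		/* analogous to waiting on the jobctl bit */
	printf("TRAPPING cleared, attach can proceed\n");
	pthread_join(t, NULL);
	return 0;
}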
-
Submitted by Tejun Heo
task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP pending bits too. Setting pending conditions on a dying task may make the task unkillable. Currently, each setting site is responsible for checking for the condition but with the to-be-added job control traps this becomes too fragile.

This patch adds task_set_jobctl_pending() which should be used when setting task->jobctl bits to schedule a stop or trap. The function does the following to ease setting pending bits.

* Sanity checks.
* If a fatal signal is pending or PF_EXITING is set, no bit is set.
* STOP_SIGMASK is automatically cleared if a new value is being set.

do_signal_stop() and ptrace_attach() are updated to use task_set_jobctl_pending() instead of setting STOP_PENDING explicitly. The surrounding structures around setting are changed to fit task_set_jobctl_pending() better but there should be no userland-visible behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
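A rough, self-contained model of the guarantees described above for task_set_jobctl_pending(): refuse to set pending bits on a dying task, and let a new stop signal replace the previously recorded one. The struct layout, bit values and the fatal-signal stand-in below are assumptions for illustration, not the kernel's actual definitions.

#include <assert.h>
#include <stdbool.h>

#define JOBCTL_STOP_SIGMASK	0xffffu		/* low 16 bits: last stop signal (assumed layout) */
#define JOBCTL_STOP_PENDING	(1u << 17)	/* illustrative bit positions */
#define JOBCTL_TRAP_MASK	(1u << 19)
#define PF_EXITING		0x4

struct task {
	unsigned int jobctl;
	unsigned int flags;
	bool fatal_signal_pending;		/* stand-in for fatal_signal_pending(task) */
};

/*
 * Set pending job control bits, refusing to do so on a dying task so the
 * task cannot become unkillable.  Returns true if the bits were set.
 */
static bool task_set_jobctl_pending(struct task *t, unsigned int mask)
{
	/* sanity check: only stop/trap bits plus a stop signal may be requested */
	assert(!(mask & ~(JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK | JOBCTL_STOP_SIGMASK)));

	if (t->fatal_signal_pending || (t->flags & PF_EXITING))
		return false;

	/* a new stop signal replaces the previously recorded one */
	if (mask & JOBCTL_STOP_SIGMASK)
		t->jobctl &= ~JOBCTL_STOP_SIGMASK;

	t->jobctl |= mask;
	return true;
}

int main(void)
{
	struct task t = { .flags = PF_EXITING };

	/* expect 0: no pending bit may be set on an exiting task */
	return task_set_jobctl_pending(&t, JOBCTL_STOP_PENDING) ? 1 : 0;
}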
-
Submitted by Tejun Heo
JOBCTL_TRAPPING indicates that the ptracer is waiting for the tracee to (re)transit into TRACED. task_clear_jobctl_trapping() must be called when either the tracee enters TRACED or the transition is cancelled for some reason. The former is achieved by explicitly calling task_clear_jobctl_trapping() in ptrace_stop() and the latter by calling it at the end of do_signal_stop().

Calling task_clear_jobctl_trapping() at the end of do_signal_stop() limits the scope in which TRAPPING can be used and is fragile in that seemingly unrelated changes to the tracee's control flow can lead to a stuck TRAPPING.

We already have task_clear_jobctl_pending() calls on those cancelling events to clear JOBCTL_STOP_PENDING. Cancellations can be handled by making those call sites use JOBCTL_PENDING_MASK instead and updating task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is called automatically if no stop/trap is pending.

This patch makes the above changes and removes the fallback task_clear_jobctl_trapping() call from do_signal_stop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
This patch introduces JOBCTL_PENDING_MASK and replaces task_clear_jobctl_stop_pending() with task_clear_jobctl_pending() which takes an extra @mask argument.

JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but future patches will add more bits. recalc_sigpending_tsk() is updated to use JOBCTL_PENDING_MASK instead.

task_clear_jobctl_pending() takes @mask, which is a subset of JOBCTL_PENDING_MASK, and clears the relevant jobctl bits. If JOBCTL_STOP_PENDING is set, other STOP bits are cleared together. All task_clear_jobctl_stop_pending() users are updated to call task_clear_jobctl_pending() with JOBCTL_STOP_PENDING, which is functionally identical to task_clear_jobctl_stop_pending().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
In ptrace_stop(), after the arch hook is done, the task state and jobctl bits are updated while holding siglock. The ordering requirement there is that TASK_TRACED is set before JOBCTL_TRAPPING is cleared, to prevent the ptracer waiting on TRAPPING from waking up before TRACED is actually set and seeing TASK_RUNNING in wait(2).

Move set_current_state(TASK_TRACED) to the top of the block and reorganize comments. This makes the ordering more obvious (TASK_TRACED before other updates) and helps future updates to group stop participation.

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
signal->group_stop currently hosts mostly group stop related flags; however, it's gonna be used for wider purposes and the GROUP_STOP_ flag prefix becomes confusing. Rename signal->group_stop to signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.

Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are defined in terms of them to allow using bitops later.

While at it, reassign JOBCTL_TRAPPING to bit 22 to better accommodate future additions.

This doesn't cause any functional change.

-v2: JOBCTL_*_BIT macros added as suggested by Linus.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
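The bit-position/flag convention described above looks roughly like the following. Only JOBCTL_TRAPPING at bit 22 is stated by the changelog; the other names and positions here are illustrative assumptions showing the JOBCTL_*_BIT / JOBCTL_* pattern.

#include <stdio.h>

/* bit positions, usable with set_bit()/clear_bit()-style bitops */
#define JOBCTL_STOP_DEQUEUED_BIT	16	/* illustrative */
#define JOBCTL_STOP_PENDING_BIT		17	/* illustrative */
#define JOBCTL_STOP_CONSUME_BIT		18	/* illustrative */
#define JOBCTL_TRAPPING_BIT		22	/* from the changelog */

/* flag values defined in terms of the bit positions */
#define JOBCTL_STOP_DEQUEUED	(1ul << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING	(1ul << JOBCTL_STOP_PENDING_BIT)
#define JOBCTL_STOP_CONSUME	(1ul << JOBCTL_STOP_CONSUME_BIT)
#define JOBCTL_TRAPPING		(1ul << JOBCTL_TRAPPING_BIT)

int main(void)
{
	printf("JOBCTL_TRAPPING = 0x%lx (bit %d)\n", JOBCTL_TRAPPING, JOBCTL_TRAPPING_BIT);
	return 0;
}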
-
- 26 May 2011, 1 commit
-
-
Submitted by Oleg Nesterov
ERESTART* is always wrong without TIF_SIGPENDING. Teach sys_pause() to handle the spurious wakeup correctly.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
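The fix amounts to retrying the sleep until a signal is really pending, so a spurious wakeup can no longer leak ERESTARTNOHAND to userspace without TIF_SIGPENDING being set. The kernel-style fragment below is a simplified sketch of that loop, not a verbatim quote of the patch.

SYSCALL_DEFINE0(pause)
{
	/* only leave the loop once a signal is actually pending */
	while (!signal_pending(current)) {
		__set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}
	return -ERESTARTNOHAND;
}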
-
- 09 May 2011, 2 commits
-
-
Submitted by Tejun Heo
The GROUP_STOP_TRAPPING waiting mechanism piggybacks on signal->wait_chldexit which is primarily used to implement waiting for wait(2) and friends. When do_wait() waits on signal->wait_chldexit, it uses a custom wake-up callback, child_wait_callback(), which expects the child task which is waking up the parent to be passed in as @key to filter out spurious wakeups.

task_clear_group_stop_trapping() used __wake_up_sync() which uses a NULL @key, causing the following oops if the parent was doing do_wait().

BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
IP: [<ffffffff810499f9>] child_wait_callback+0x29/0x80
PGD 1d899067 PUD 1e418067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
CPU 2
Modules linked in:
Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
RIP: 0010:[<ffffffff810499f9>] [<ffffffff810499f9>] child_wait_callback+0x29/0x80
RSP: 0000:ffff88001b889bf8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
Stack:
 ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
 0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
 0000000000000002 0000000000000082 0000000000000001 0000000000000000
Call Trace:
 [<ffffffff810312f9>] __wake_up_common+0x59/0x90
 [<ffffffff81035263>] __wake_up_sync_key+0x53/0x80
 [<ffffffff810352a0>] __wake_up_sync+0x10/0x20
 [<ffffffff8105a984>] task_clear_jobctl_trapping+0x44/0x50
 [<ffffffff8105bcbc>] ptrace_stop+0x7c/0x290
 [<ffffffff8105c20a>] do_signal_stop+0x28a/0x2d0
 [<ffffffff8105d27f>] get_signal_to_deliver+0x14f/0x5a0
 [<ffffffff81002175>] do_signal+0x75/0x7b0
 [<ffffffff8100292d>] do_notify_resume+0x5d/0x70
 [<ffffffff8182e36a>] retint_signal+0x46/0x8c
Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 <4b> 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44

Fix it by using __wake_up_sync_key() and passing in the child as @key.

I still think it's a mistake to piggyback on wait_chldexit for this. Given the relatively low frequency of ptrace use, we would be much better off leaving the already complex wait_chldexit alone and using bit waitqueue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Oleg Nesterov
sys_sigprocmask() changes current->blocked by hand. Convert this code to use set_current_blocked().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
-
- 28 April 2011, 11 commits
-
-
Submitted by Oleg Nesterov
Cleanup. Remove the unneeded goto's, we can simply read blocked.sig[0] unconditionally and then copy-to-user it if oset != NULL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
-
Submitted by Oleg Nesterov
As Tejun and Linus pointed out, "nand" is the wrong name for "x & ~y", it should be "andn". Rename signandsets() as suggested.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
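The naming point is easy to see with plain bitmasks: "x & ~y" keeps the bits of x that are not in y, i.e. and-not (andn), whereas a true nand would be ~(x & y). A tiny standalone demonstration (not kernel code):

#include <stdio.h>

/* "andn": keep the bits of x that are NOT set in y (x & ~y) */
static unsigned int andn(unsigned int x, unsigned int y) { return x & ~y; }

/* a real "nand" would be ~(x & y), a different operation */
static unsigned int nand(unsigned int x, unsigned int y) { return ~(x & y); }

int main(void)
{
	unsigned int blocked = 0x0f, pending = 0x3c;

	/* e.g. the signals that are pending but not blocked */
	printf("andn(pending, blocked) = 0x%02x\n", andn(pending, blocked));	/* 0x30 */
	printf("nand(pending, blocked) = 0x%08x\n", nand(pending, blocked));
	return 0;
}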
-
Submitted by Oleg Nesterov
do_sigtimedwait() changes current->blocked and thus it needs set_current_blocked()->retarget_shared_pending().

We could use set_current_blocked() directly. It is fine to change ->real_blocked from all-zeroes to ->blocked and vice versa lockless, but this is not immediately clear, looks racy, and needs a huge comment to explain why this is correct.

To keep things simple this patch adds the new static helper, __set_task_blocked(), which should be called with ->siglock held. This way we can change both ->real_blocked and ->blocked atomically under ->siglock as the current code does. This is more understandable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
-
Submitted by Oleg Nesterov
Factor out the common code in sys_rt_sigtimedwait/compat_sys_rt_sigtimedwait into the new helper, do_sigtimedwait().

Add a comment to document the extra tick we add to timespec_to_jiffies(ts), thanks to Linus who explained this to me.

Perhaps it would be better to move compat_sys_rt_sigtimedwait() into signal.c under CONFIG_COMPAT, then we can make do_sigtimedwait() static.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
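For context, this is roughly how the factored-out functionality is exercised from userspace through plain glibc sigtimedwait(); the example makes no assumption about the kernel-internal helper itself. The extra tick mentioned above exists so the caller is guaranteed to sleep at least as long as it asked for rather than having a short timeout rounded down.

#define _POSIX_C_SOURCE 199309L
#include <signal.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
	sigset_t set;
	siginfo_t info;
	struct timespec timeout = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000 };

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_BLOCK, &set, NULL);	/* the waited-for signal must be blocked */

	int sig = sigtimedwait(&set, &info, &timeout);
	if (sig < 0)
		perror("sigtimedwait");		/* EAGAIN on timeout, EINTR on another signal */
	else
		printf("got signal %d from pid %d\n", sig, (int)info.si_pid);
	return 0;
}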
-
Submitted by Oleg Nesterov
No functional changes, cleanup compat_sys_rt_sigtimedwait() and sys_rt_sigtimedwait().

Calculate the timeout before we take ->siglock, this simplifies and shrinks the code. Use timespec_valid() to check the timespec.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
-
Submitted by Oleg Nesterov
sys_rt_sigprocmask() looks unnecessarily complicated, simplify it.

We can just read current->blocked lockless unconditionally before anything else and then copy-to-user it if needed. At worst we copy 4 words on mips.

We could copy-to-user the old mask first and simplify the code even more, but the patch tries to keep the current behaviour: we change current->blocked even if copy_to_user(oset) fails.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
-
Submitted by Oleg Nesterov
In short, almost every changing of current->blocked is wrong, or at least can lead to unexpected results.

For example: two threads T1 and T2, T1 sleeps in sigtimedwait/pause/etc. kill(tgid, SIG) can pick T2 for TIF_SIGPENDING. If T2 calls sigprocmask() and blocks SIG before it notices the pending signal, nobody else can handle this pending shared signal.

I am not sure this is a bug, but at least this looks strange imho. T1 should not sleep forever, there is a signal which should wake it up.

This patch moves the code which actually changes ->blocked into the new helper, set_current_blocked(), and changes this code to call retarget_shared_pending() as exit_signals() does. We should only care about the signals we just blocked, so we use "newset & ~current->blocked" as a mask. We do not check !sigisemptyset(newblocked), retarget_shared_pending() is cheap unless mask & shared_pending.

Note: for this particular case we could simply change sigprocmask() to return -EINTR if signal_pending(), but then we should change other callers and, more importantly, if we need this fix then set_current_blocked() will have more callers and some of them can't restart. See the next patch as a random example.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
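The scenario above is about process-wide (shared) signals meeting per-thread blocked masks. The standalone program below only demonstrates the userspace-visible rule the patch is protecting: a process-directed signal should end up handled by some thread that leaves it unblocked, even when another thread blocks it. Thread roles and the choice of SIGUSR1 are purely illustrative.

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void handler(int sig)
{
	/* async-signal-safe enough for a demo */
	const char msg[] = "SIGUSR1 handled by a thread that left it unblocked\n";
	(void)sig;
	write(1, msg, sizeof(msg) - 1);
}

static void *blocker(void *arg)		/* models T2: blocks the signal */
{
	sigset_t set;

	(void)arg;
	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	pthread_sigmask(SIG_BLOCK, &set, NULL);
	sleep(2);
	return NULL;
}

static void *waiter(void *arg)		/* models T1: keeps the signal unblocked and sleeps */
{
	(void)arg;
	sleep(2);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	signal(SIGUSR1, handler);
	pthread_create(&t2, NULL, blocker, NULL);
	pthread_create(&t1, NULL, waiter, NULL);
	sleep(1);			/* let the blocker install its mask first */
	kill(getpid(), SIGUSR1);	/* process-directed (shared) signal */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}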
-
Submitted by Oleg Nesterov
No functional changes, preparation to simplify the review of the next change.

1. We can read current->blocked lockless, nobody else can ever change this mask.

2. Calculate the resulting sigset_t outside of ->siglock into a temporary variable, then take ->siglock and change ->blocked.

Also, kill the stale comment about BKL.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
-
Submitted by Oleg Nesterov
retarget_shared_pending() blindly does recalc_sigpending_and_wake() for every sub-thread, this is suboptimal. We can check t->blocked and stop looping once every bit in shared_pending has the new target.

Note: we do not take task_is_stopped_or_traced(t) into account, we are not trying to speed up the signal delivery or to avoid the unnecessary (but harmless) signal_wake_up(0) in this unlikely case.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
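A simplified, self-contained model of the early-exit idea: stop walking the thread list as soon as every still-pending bit has found a thread that does not block it. Plain unsigned masks stand in for sigset_t, and the printf stands in for waking the thread; none of this is the kernel's actual code.

#include <stdio.h>

struct thread {
	unsigned int blocked;			/* bitmask of blocked "signals" */
};

/* Clear from @which the signals some thread can take; stop once nothing is left. */
static void retarget_shared_pending(struct thread *threads, int n, unsigned int which)
{
	for (int i = 0; i < n && which; i++) {
		unsigned int retarget = which & ~threads[i].blocked;

		if (!retarget)
			continue;		/* this thread blocks everything still pending */

		which &= ~retarget;		/* these signals now have a target */
		printf("thread %d picked for mask 0x%x\n", i, retarget);
		/* the kernel would wake the thread up here */
	}
}

int main(void)
{
	struct thread threads[] = {
		{ .blocked = 0xff },		/* blocks everything */
		{ .blocked = 0xf0 },		/* can take the low bits */
		{ .blocked = 0x0f },		/* never reached: the loop exits early */
	};

	retarget_shared_pending(threads, 3, 0x0c);
	return 0;
}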
-
Submitted by Oleg Nesterov
exit_signals() checks signal_pending() before retarget_shared_pending() but this is suboptimal. We can avoid the while_each_thread() loop in the case when there are no shared signals visible to us.

Add the "shared_pending.signal & ~blocked" check. We don't use tsk->blocked directly but pass ~blocked as an argument, this is needed for the next patch.

Note: we can optimize this more. while_each_thread(t) can take t->blocked into account and stop after every pending signal has the new target, see the next patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
-
Submitted by Oleg Nesterov
No functional changes. Move the notify-other-threads code from exit_signals() to the new helper, retarget_shared_pending().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
-
- 09 April 2011, 1 commit
-
-
Submitted by Randy Dunlap
Fix erroneous syscall kernel-doc comments in kernel/signal.c.

Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 April 2011, 2 commits
-
-
Submitted by Randy Dunlap
Add kernel-doc to syscalls in signal.c.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Randy Dunlap
General coding style and comment fixes; no code changes:

- Use multi-line-comment coding style.
- Put some function signatures completely on one line.
- Hyphenate some words.
- Spell Posix as POSIX.
- Correct typos & spellos in some comments.
- Drop trailing whitespace.
- End sentences with periods.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 April 2011, 3 commits
-
-
Submitted by Oleg Nesterov
This patch moves SIGNAL_STOP_DEQUEUED from signal_struct->flags to task_struct->group_stop, and thus makes it per-thread.

Like SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can be false-positive after return from get_signal_to_deliver(), this is fine. The only purpose of this bit is: we can drop ->siglock after __dequeue_signal() returns the sig_kernel_stop() signal and before we call do_signal_stop(), in this case we must not miss SIGCONT if it comes in between.

But, unlike SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can not be false-positive in do_signal_stop() if multiple threads dequeue the sig_kernel_stop() signal at the same time.

Consider two threads T1 and T2, SIGTTIN has a handler.

- T1 dequeues SIGTSTP and sets SIGNAL_STOP_DEQUEUED, then it drops ->siglock.

- SIGCONT comes and clears SIGNAL_STOP_DEQUEUED, SIGTSTP should be cancelled.

- T2 dequeues SIGTTIN and sets SIGNAL_STOP_DEQUEUED again. Since we have a handler we should not stop, T2 returns to usermode to run the handler.

- T1 continues, calls do_signal_stop() and wrongly starts the group stop because SIGNAL_STOP_DEQUEUED was restored in between.

With or without this change:

- we need to do something with ptrace_signal() which can return SIGSTOP, but this needs another discussion;

- SIGSTOP can be lost if it races with the mt exec, will be fixed later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Submitted by Oleg Nesterov
A task that is PF_EXITING or TASK_STOPPED has already called task_participate_group_stop() and cleared its ->group_stop. There is no need to do task_clear_group_stop_pending() when we start the new group stop.

Add a small comment to explain the !task_is_stopped() check.

Note that this check is not exactly right and it can lead to an unnecessary stop later if the thread is TASK_PTRACED. What we need is task_participated_in_group_stop(), this will be solved later.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Submitted by Oleg Nesterov
prepare_signal(SIGCONT) should never set TIF_SIGPENDING or wake up the TASK_INTERRUPTIBLE threads. We are going to call complete_signal() which should pick the right thread correctly.

All we need is to wake up the TASK_STOPPED threads. If the task was stopped, it can't return to usermode without taking ->siglock. Otherwise we don't care, and the spurious TIF_SIGPENDING can't be useful.

The comment says:

	* If there is a handler for SIGCONT, we must make
	* sure that no thread returns to user mode before
	* we post the signal

It is not clear what this means. Probably, "when there's only a single thread" and this continues to be true. Otherwise, even if this SIGCONT is not private, with or without this change only one thread can dequeue SIGCONT, other threads can happily return to user mode before that thread handles this signal.

Note also that wake_up_state(t, __TASK_STOPPED) can't race with the task which changes its state, TASK_STOPPED state is protected by ->siglock as well.

In short: when it comes to signal delivery, SIGCONT is a normal signal and does not need any special support.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 31 March 2011, 1 commit
-
-
Submitted by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 29 March 2011, 1 commit
-
-
Submitted by Roland Dreier
Commit da48524e ("Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code") made the check on si_code too strict. There are several legitimate places where glibc wants to queue a negative si_code different from SI_QUEUE:

- This was first noticed with glibc's aio implementation, which wants to queue a signal with si_code SI_ASYNCIO; the current kernel causes glibc's tst-aio4 test to fail because rt_sigqueueinfo() fails with EPERM.

- Further examination of the glibc source shows that getaddrinfo_a() wants to use SI_ASYNCNL (which the kernel does not even define). The timer_create() fallback code wants to queue signals with SI_TIMER.

As suggested by Oleg Nesterov <oleg@redhat.com>, loosen the check to forbid only the problematic SI_TKILL case.

Reported-by: Klaus Dittrich <kladit@arcor.de>
Acked-by: Julien Tinnes <jln@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
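After the loosening described above, the permission test in rt_sigqueueinfo()/rt_tgsigqueueinfo() boils down to something like the fragment below: kernel-generated (non-negative) si_code values and SI_TKILL remain forbidden, while other negative values such as SI_ASYNCIO or SI_TIMER are allowed again. This is a paraphrase for illustration, not a verbatim quote of the patch.

	/*
	 * Not even root may fake a kernel-generated si_code or pretend the
	 * signal came from tkill()/tgkill(); everything else userspace may queue.
	 */
	if (info.si_code >= 0 || info.si_code == SI_TKILL)
		return -EPERM;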
-
- 24 March 2011, 1 commit
-
-
Submitted by Serge E. Hallyn
Changelog:

Dec  8: Fixed bug in my check_kill_permission pointed out by Eric Biederman.
Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred() for clarity.
Dec 31: Address comment by Eric Biederman: don't need cred/tcred in check_kill_permission.
Jan  1: Use const cred struct.
Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
Feb 16: kill_ok_by_cred: fix bad parentheses.
Feb 23: Per akpm, let compiler inline kill_ok_by_cred.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
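The helper named in the changelog wraps the classic "may this cred signal that task?" test. The sketch below is a self-contained approximation using plain uid fields and a has_cap_kill stand-in for the capability check; the real kernel compares struct cred under RCU and uses CAP_KILL via capable().

#include <stdbool.h>
#include <sys/types.h>

struct cred { uid_t uid, euid, suid; };

/*
 * Roughly: the sender may signal the target if the sender's real or
 * effective uid matches the target's real or saved uid, or if the
 * sender holds CAP_KILL.
 */
static bool kill_ok_by_cred(const struct cred *cred, const struct cred *tcred,
			    bool has_cap_kill)
{
	if (cred->euid == tcred->suid || cred->euid == tcred->uid ||
	    cred->uid  == tcred->suid || cred->uid  == tcred->uid)
		return true;

	return has_cap_kill;
}

int main(void)
{
	struct cred me    = { .uid = 1000, .euid = 1000, .suid = 1000 };
	struct cred other = { .uid = 1001, .euid = 1001, .suid = 1001 };

	/* unprivileged, different uids: expect "not allowed" (exit code 1) */
	return kill_ok_by_cred(&me, &other, false) ? 0 : 1;
}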
-
- 23 March 2011, 10 commits
-
-
Submitted by Tejun Heo
Just as group_exit_code shouldn't be generated when a PTRACE_CONT'd task re-enters job control stop, notification for the event should be suppressed too.

The logic is the same as the group_exit_code generation suppression in do_signal_stop(): if SIGNAL_STOP_STOPPED is already set, the task is re-entering job control stop without an intervening SIGCONT and the notifications should be suppressed.

Test case follows.

#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

static const struct timespec ts100ms = { .tv_nsec = 100000000 };
static pid_t tracee, tracer;

static const char *pid_who(pid_t pid)
{
	return pid == tracee ? "tracee" : (pid == tracer ? "tracer" : "mommy ");
}

static void sigchld_sigaction(int signo, siginfo_t *si, void *ucxt)
{
	printf("%s: SIG status=%02d code=%02d (%s)\n",
	       pid_who(getpid()), si->si_status, si->si_code,
	       pid_who(si->si_pid));
}

int main(void)
{
	const struct sigaction chld_sa = { .sa_sigaction = sigchld_sigaction,
					   .sa_flags = SA_SIGINFO|SA_RESTART };
	siginfo_t si;

	sigaction(SIGCHLD, &chld_sa, NULL);

	tracee = fork();
	if (!tracee) {
		tracee = getpid();
		while (1)
			pause();
	}

	kill(tracee, SIGSTOP);
	waitid(P_PID, tracee, &si, WSTOPPED);

	tracer = fork();
	if (!tracer) {
		tracer = getpid();

		ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
		waitid(P_PID, tracee, &si, WSTOPPED);
		ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);

		waitid(P_PID, tracee, &si, WSTOPPED);
		ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);

		waitid(P_PID, tracee, &si, WSTOPPED);
		printf("tracer: detaching\n");
		ptrace(PTRACE_DETACH, tracee, NULL, NULL);
		return 0;
	}

	while (1)
		pause();
	return 0;
}

Before the patch, the parent gets the second notification for the tracee after the tracer detaches. si_status is zero because group_exit_code is not set by the group stop completion which triggered this notification.

mommy : SIG status=19 code=05 (tracee)
tracer: SIG status=00 code=05 (tracee)
tracer: SIG status=19 code=04 (tracee)
tracer: SIG status=00 code=05 (tracee)
tracer: detaching
mommy : SIG status=00 code=05 (tracee)
mommy : SIG status=00 code=01 (tracer)
^C

After the patch, the duplicate notification is gone.

mommy : SIG status=19 code=05 (tracee)
tracer: SIG status=00 code=05 (tracee)
tracer: SIG status=19 code=04 (tracee)
tracer: SIG status=00 code=05 (tracee)
tracer: detaching
mommy : SIG status=00 code=01 (tracer)
^C

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
With recent changes, job control and ptrace stopped states are properly separated and accessible to the real parent and the ptracer respectively; however, notifications of job control stopped/continued events to the real parent while ptraced are still missing.

A ptracee participates in group stop in ptrace_stop() but the completion isn't notified. If participation results in completion of group stop, notify the real parent of the event. The ptrace and group stops are separate and can be handled as such. However, when the real parent and the ptracer are in the same thread group, only the ptrace stop event is visible through wait(2), and the duplicate notifications are different from the current behavior and are confusing. Suppress group stop notification in such cases.

The continued state is shared between the real parent and the ptracer but is only meaningful to the real parent. Always notify the real parent and notify the ptracer too for backward compatibility. Similar to stop notification, if the real parent is the ptracer, suppress a duplicate notification.

Test case follows.

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main(void)
{
	const struct timespec ts100ms = { .tv_nsec = 100000000 };
	pid_t tracee, tracer;
	siginfo_t si;
	int i;

	tracee = fork();
	if (tracee == 0) {
		while (1) {
			printf("tracee: SIGSTOP\n");
			raise(SIGSTOP);
			nanosleep(&ts100ms, NULL);
			printf("tracee: SIGCONT\n");
			raise(SIGCONT);
			nanosleep(&ts100ms, NULL);
		}
	}

	waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);

	tracer = fork();
	if (tracer == 0) {
		nanosleep(&ts100ms, NULL);
		ptrace(PTRACE_ATTACH, tracee, NULL, NULL);

		for (i = 0; i < 11; i++) {
			si.si_pid = 0;
			waitid(P_PID, tracee, &si, WSTOPPED);
			if (si.si_pid && si.si_code == CLD_TRAPPED)
				ptrace(PTRACE_CONT, tracee, NULL,
				       (void *)(long)si.si_status);
		}
		printf("tracer: EXITING\n");
		return 0;
	}

	while (1) {
		si.si_pid = 0;
		waitid(P_PID, tracee, &si, WSTOPPED | WCONTINUED | WEXITED);
		if (si.si_pid)
			printf("mommy : WAIT status=%02d code=%02d\n",
			       si.si_status, si.si_code);
	}
	return 0;
}

Before this patch, while ptraced, the real parent doesn't get notifications for job control events:

tracee: SIGSTOP
mommy : WAIT status=19 code=05
tracee: SIGCONT
tracee: SIGSTOP
tracee: SIGCONT
tracee: SIGSTOP
tracee: SIGCONT
tracee: SIGSTOP
tracer: EXITING
mommy : WAIT status=19 code=05
^C

After this patch, it works as expected.

tracee: SIGSTOP
mommy : WAIT status=19 code=05
tracee: SIGCONT
mommy : WAIT status=18 code=06
tracee: SIGSTOP
mommy : WAIT status=19 code=05
tracee: SIGCONT
mommy : WAIT status=18 code=06
tracee: SIGSTOP
mommy : WAIT status=19 code=05
tracee: SIGCONT
mommy : WAIT status=18 code=06
tracee: SIGSTOP
tracer: EXITING
mommy : WAIT status=19 code=05
^C

-v2: Oleg pointed out that

* Group stop notification to the real parent should also happen when ptracer detach races with ptrace_stop().

* real_parent_is_ptracer() should be testing thread group equality not the task itself as wait(2) and stop/cont notifications are normally thread-group wide.

Both issues are fixed accordingly.

-v3: real_parent_is_ptracer() updated to test child->real_parent instead of child->group_leader->real_parent per Oleg's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
The stopped notifications in do_signal_stop() and exit_signals() are always for the completion of job control. The one in do_signal_stop() may be delivered to the ptracer if PTRACE_ATTACH races with the notification, and the one in exit_signals() if the task exits while ptraced.

In both cases, the notifications are meaningless and confusing to the ptracer, as it never accesses the group stop state, while the real parent would miss notifications for the events it is watching.

Make sure these notifications always go to the real parent by calling do_notify_parent_cldstop() with %false @for_ptrace.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
Currently, do_notify_parent_cldstop() determines whether the notification is for the real parent or the ptracer. Move the decision to the caller by adding a @for_ptrace parameter to do_notify_parent_cldstop(). All the callers are updated to pass task_ptrace(target_task), so this patch doesn't cause any behavior difference.

While at it, add a function comment to do_notify_parent_cldstop().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
While ptraced, a task may be resumed while the containing process is still job control stopped. If the task receives another stop signal in this state, it will still initiate group stop, which generates group_exit_code, which the real parent would be able to see once the ptracer detaches.

In this scenario, the real parent may see two consecutive CLD_STOPPED events from two stop signals without an intervening SIGCONT, which normally is impossible.

Test case follows.

#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

int main(void)
{
	pid_t tracee;
	siginfo_t si;

	tracee = fork();
	if (!tracee)
		while (1)
			pause();

	kill(tracee, SIGSTOP);
	waitid(P_PID, tracee, &si, WSTOPPED);

	if (!fork()) {
		ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
		waitid(P_PID, tracee, &si, WSTOPPED);
		ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		waitid(P_PID, tracee, &si, WSTOPPED);
		ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
		waitid(P_PID, tracee, &si, WSTOPPED);
		ptrace(PTRACE_DETACH, tracee, NULL, NULL);
		return 0;
	}

	while (1) {
		si.si_pid = 0;
		waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG);
		if (si.si_pid)
			printf("st=%02d c=%02d\n", si.si_status, si.si_code);
	}
	return 0;
}

Before the patch, the latter waitid() in polling mode reports the second stopped event generated by the implied SIGSTOP of PTRACE_ATTACH.

st=19 c=05
^C

After the patch, the second event is not reported.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
-
Submitted by Tejun Heo
Currently, if the task is STOPPED on ptrace attach, it's left alone and the state is silently changed to TRACED on the next ptrace call. The behavior breaks the assumption that arch_ptrace_stop() is called before any task is poked by ptrace and is ugly in that a task manipulates the state of another task directly.

With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and TRACED can be made clean. The tracer can use the flag to tell the tracee to retry stop on attach and detach. On retry, the tracee will enter the desired state in the correct way.

The lower 16 bits of task->group_stop are used to remember the signal number which caused the last group stop. This is used while retrying for ptrace attach as the original group_exit_code could have been consumed with wait(2) by then. As the real parent may wait(2) and consume the group_exit_code anytime, the group_exit_code needs to be saved separately so that it can be used when switching from regular sleep to ptrace_stop(). This is recorded in the lower 16 bits of task->group_stop.

If a task is already stopped and there's no intervening SIGCONT, a ptrace request immediately following a successful PTRACE_ATTACH should always succeed even if the tracer doesn't wait(2) for attach completion; however, with this change, the tracee might still be TASK_RUNNING trying to enter TASK_TRACED, which would cause the following request to fail with -ESRCH.

This intermediate state is hidden from the ptracer by setting GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait for it to clear on its signal->wait_chldexit. Completing the transition or getting killed clears TRAPPING and wakes up the tracer.

Note that the STOPPED -> RUNNING -> TRACED transition is still visible to other threads which are in the same group as the ptracer, and the reverse transition is visible to all. Please read the comments for details.

Oleg:

* Spotted a race condition where a task may retry group stop without proper bookkeeping. Fixed by redoing bookkeeping on retry.

* Spotted that the transition is visible to userland in several different ways. Most are fixed with GROUP_STOP_TRAPPING. The unhandled corner case is documented.

* Pointed out that not setting GROUP_STOP_SIGMASK on an already stopped task would result in more consistent behavior.

* Pointed out that calling ptrace_stop() from do_signal_stop() in TASK_STOPPED can race with group stop start logic and then confuse the TRAPPING wait in ptrace_check_attach(). ptrace_stop() is now called with TASK_RUNNING.

* Suggested using signal->wait_chldexit instead of bit wait.

* Spotted a race condition between TRACED transition and clearing of TRAPPING.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
-
Submitted by Tejun Heo
A ptraced task would still stop at do_signal_stop() when it's stopping for stop signals, and do_signal_stop() behaves the same whether the task is ptraced or not. However, in addition to stopping, ptrace_stop() also does ptrace-specific stuff like calling architecture-specific callbacks, so this behavior makes the code more fragile and difficult to understand.

This patch makes do_signal_stop() test whether the task is ptraced and use ptrace_stop() if so. This renders tracehook_notify_jctl() rather pointless as the ptrace notification is now handled by ptrace_stop() regardless of the return value from the tracehook. It probably is a good idea to update it.

This doesn't solve the whole problem as tasks already in stopped state would stay in the regular stop when ptrace attached. That part will be handled by the next patch.

Oleg pointed out that this makes a userland-visible change. Before, SIGCONT would be able to wake up a task in group stop even if the task is ptraced, provided the tracer hasn't issued another ptrace command afterwards (as the next ptrace command transitions the state into TASK_TRACED which ignores SIGCONT wakeups). With this and the next patch, SIGCONT may race with the transition into TASK_TRACED and is ignored if the tracee already entered TASK_TRACED.

Another userland-visible change of this and the next patch is that the ptracee's state would now be TASK_TRACED where it used to be TASK_STOPPED, which is visible via fs/proc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
-
Submitted by Tejun Heo
Currently, ptrace_stop() unconditionally participates in group stop bookkeeping. This is unnecessary and inaccurate. Make it only participate if the task is trapping for group stop - ie. if @why is CLD_STOPPED. As ptrace_stop() currently is not used when trapping for group stop, this equals disabling group stop participation from ptrace_stop().

A visible behavior change is an increased likelihood of delayed group stop completion if the thread group contains one or more ptraced tasks.

This is to prepare for further cleanup of the interaction between group stop and ptrace.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
-
Submitted by Tejun Heo
Currently task->signal->group_stop_count is used to decide whether to stop for group stop. However, if there is a task in the group which is taking a long time to stop, other tasks which are continued by ptrace would repeatedly stop for the same group stop until the group stop is complete.

Conversely, if a ptraced task is in TASK_TRACED state, the debugger won't get notified of group stops, which is inconsistent compared to the ptraced task in any other state.

This patch introduces GROUP_STOP_PENDING which tracks whether a task is yet to stop for the group stop in progress. The flag is set when a group stop starts and cleared when the task stops the first time for the group stop, and consulted whenever it needs to be determined whether the task should participate in a group stop. Note that now tasks in TASK_TRACED also participate in group stop.

This results in the following behavior changes.

* For a single group stop, a ptracer would see at most one stop reported.

* A ptracee in TASK_TRACED now also participates in group stop and the tracer would get the notification. However, as a ptraced task could be in TASK_STOPPED state or any ptrace trap could consume group stop, the notification may still be missing. These will be addressed with further patches.

* A ptracee may start a group stop while one is still in progress if the tracer let it continue with stop signal delivery. Group stop code handles this correctly.

Oleg:

* Spotted that a task might skip the signal check even when its GROUP_STOP_PENDING is set. Fixed by updating recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of group_stop_count.

* Pointed out that task->group_stop should be cleared whenever task->signal->group_stop_count is cleared. Fixed accordingly.

* Pointed out the behavior inconsistency between TASK_TRACED and RUNNING and the last behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
-
Submitted by Tejun Heo
task->signal->group_stop_count is used to track the progress of group stop. It's initialized to the number of tasks which need to stop for the group stop to finish and each stopping or trapping task decrements it. However, each task doesn't keep track of whether it decremented the counter or not, and if woken up before the group stop is complete and stopped again, it can decrement the counter multiple times.

Please consider the following example code.

static void *worker(void *arg)
{
	while (1)
		;
	return NULL;
}

int main(void)
{
	pthread_t thread;
	pid_t pid;
	int i;

	pid = fork();
	if (!pid) {
		for (i = 0; i < 5; i++)
			pthread_create(&thread, NULL, worker, NULL);
		while (1)
			;
		return 0;
	}

	ptrace(PTRACE_ATTACH, pid, NULL, NULL);
	while (1) {
		waitid(P_PID, pid, NULL, WSTOPPED);
		ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
	}
	return 0;
}

The child creates five threads and the parent continuously traps the first thread, and whenever the child gets a signal, SIGSTOP is delivered. If an external process sends SIGSTOP to the child, all other threads in the process should reliably stop. However, due to the above bug, the first thread will often end up consuming group_stop_count multiple times and SIGSTOP often ends up stopping none or only part of the other four threads.

This patch adds a new field, task->group_stop, which is protected by siglock and uses the GROUP_STOP_CONSUME flag to track which task is still to consume group_stop_count, to fix this bug.

task_clear_group_stop_pending() and task_participate_group_stop() are added to help manipulating group stop states. As ptrace_stop() now also uses task_participate_group_stop(), it will set SIGNAL_STOP_STOPPED if it completes a group stop.

There still are many issues regarding the interaction between group stop and ptrace. Patches to address them will follow.

- Oleg spotted duplicate GROUP_STOP_CONSUME. Dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
-