1. 21 3月, 2012 1 次提交
    • O
      exit_signal: fix the "parent has changed security domain" logic · b6e238dc
      Oleg Nesterov 提交于
      exit_notify() changes ->exit_signal if the parent already did exec.
      This doesn't really work, we are not going to send the signal now
      if there is another live thread or the exiting task is traced. The
      parent can exec before the last dies or the tracer detaches.
      
      Move this check into do_notify_parent() which actually sends the
      signal.
      
      The user-visible change is that we do not change ->exit_signal,
      and thus the exiting task is still "clone children" for
      do_wait()->eligible_child(__WCLONE). Hopefully this is fine, the
      current logic is racy anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6e238dc
  2. 14 1月, 2012 2 次提交
  3. 11 1月, 2012 2 次提交
    • S
      user namespace: make signal.c respect user namespaces · 6b550f94
      Serge E. Hallyn 提交于
      ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
      user namespace. (new, thanks Oleg)
      
      __send_signal: convert current's uid to the recipient's user namespace
      for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
      again :)
      
      do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
      user namespace
      
      ptrace_signal maps parent's uid into current's user namespace before
      including in signal to current.  IIUC Oleg has argued that this shouldn't
      matter as the debugger will play with it, but it seems like not converting
      the value currently being set is misleading.
      
      Changelog:
      Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
      	simplify callers and help make clear what we are translating
              (which uid into which namespace).  Passing the target task would
      	make callers even easier to read, but we pass in user_ns because
      	current_user_ns() != task_cred_xxx(current, user_ns).
      Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
      	in ptrace_signal().
      Sep 23: In send_signal(), detect when (user) signal is coming from an
      	ancestor or unrelated user namespace.  Pass that on to __send_signal,
      	which sets si_uid to 0 or overflowuid if needed.
      Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
      	SI_FROMKERNEL cases at callers, because we can't assume sender is
      	current in those cases.
      Nov 10: (mhelsley) rename fixup_uid to more meaningful usern_fixup_signal_uid
      Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Serge Hallyn <serge.hallyn@canonical.com>
      Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
      
      Eric Biederman pointed out that passing info is a bug and could lead to a
      NULL pointer deref to boot.
      
      A collection of signal, securebits, filecaps, cap_bounds, and a few other
      ltp tests passed with this kernel.
      
      Changelog:
          Nov 18: previous patch missed a leading '&'
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Dan Carpenter <dan.carpenter@oracle.com>
      Subject: ipc/mqueue: lock() => unlock() typo
      
      There was a double lock typo introduced in b085f4bd6b21 "user namespace:
      make signal.c respect user namespaces"
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b550f94
    • M
      signal: add block_sigmask() for adding sigmask to current->blocked · 5e6292c0
      Matt Fleming 提交于
      Abstract the code sequence for adding a signal handler's sa_mask to
      current->blocked because the sequence is identical for all architectures.
      Furthermore, in the past some architectures actually got this code wrong,
      so introduce a wrapper that all architectures can use.
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e6292c0
  4. 05 1月, 2012 1 次提交
    • O
      ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b
      Oleg Nesterov 提交于
      This is the temporary simple fix for 3.2, we need more changes in this
      area.
      
      1. do_signal_stop() assumes that the running untraced thread in the
         stopped thread group is not possible. This was our goal but it is
         not yet achieved: a stopped-but-resumed tracee can clone the running
         thread which can initiate another group-stop.
      
         Remove WARN_ON_ONCE(!current->ptrace).
      
      2. A new thread always starts with ->jobctl = 0. If it is auto-attached
         and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
         but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
         in do_jobctl_trap() if another debugger attaches.
      
         Change __ptrace_unlink() to set the artificial SIGSTOP for report.
      
         Alternatively we could change ptrace_init_task() to copy signr from
         current, but this means we can copy it for no reason and hide the
         possible similar problems.
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.1]
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a88951b
  5. 15 12月, 2011 1 次提交
  6. 13 12月, 2011 1 次提交
    • T
      threadgroup: extend threadgroup_lock() to cover exit and exec · 77e4ef99
      Tejun Heo 提交于
      threadgroup_lock() protected only protected against new addition to
      the threadgroup, which was inherently somewhat incomplete and
      problematic for its only user cgroup.  On-going migration could race
      against exec and exit leading to interesting problems - the symmetry
      between various attach methods, task exiting during method execution,
      ->exit() racing against attach methods, migrating task switching basic
      properties during exec and so on.
      
      This patch extends threadgroup_lock() such that it protects against
      all three threadgroup altering operations - fork, exit and exec.  For
      exit, threadgroup_change_begin/end() calls are added to exit_signals
      around assertion of PF_EXITING.  For exec, threadgroup_[un]lock() are
      updated to also grab and release cred_guard_mutex.
      
      With this change, threadgroup_lock() guarantees that the target
      threadgroup will remain stable - no new task will be added, no new
      PF_EXITING will be set and exec won't happen.
      
      The next patch will update cgroup so that it can take full advantage
      of this change.
      
      -v2: beefed up comment as suggested by Frederic.
      
      -v3: narrowed scope of protection in exit path as suggested by
           Frederic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      77e4ef99
  7. 31 10月, 2011 1 次提交
  8. 30 9月, 2011 1 次提交
    • S
      user namespace: usb: make usb urbs user namespace aware (v2) · d178bc3a
      Serge Hallyn 提交于
      Add to the dev_state and alloc_async structures the user namespace
      corresponding to the uid and euid.  Pass these to kill_pid_info_as_uid(),
      which can then implement a proper, user-namespace-aware uid check.
      
      Changelog:
      Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
      	uid, and euid each separately, pass a struct cred.
      Sep 26: Address Alan Stern's comments: don't define a struct cred at
      	usbdev_open(), and take and put a cred at async_completed() to
      	ensure it lasts for the duration of kill_pid_info_as_cred().
      Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      d178bc3a
  9. 28 7月, 2011 1 次提交
  10. 21 7月, 2011 2 次提交
    • O
      ptrace: fix ptrace_signal() && STOP_DEQUEUED interaction · 8a352418
      Oleg Nesterov 提交于
      Simple test-case,
      
      	int main(void)
      	{
      		int pid, status;
      
      		pid = fork();
      		if (!pid) {
      			pause();
      			assert(0);
      			return 0x23;
      		}
      
      		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
      
      		kill(pid, SIGCONT);	// <--- also clears STOP_DEQUEUD
      
      		assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGCONT);
      
      		assert(ptrace(PTRACE_CONT, pid, 0, SIGSTOP) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP);
      
      		kill(pid, SIGKILL);
      		return 0;
      	}
      
      Without the patch it hangs. After the patch SIGSTOP "injected" by the
      tracer is not ignored and stops the tracee.
      
      Note also that if this test-case uses, say, SIGWINCH instead of SIGCONT,
      everything works without the patch. This can't be right, and this is
      confusing.
      
      The problem is that SIGSTOP (or any other sig_kernel_stop() signal) has
      no effect without JOBCTL_STOP_DEQUEUED. This means it is simply ignored
      after PTRACE_CONT unless JOBCTL_STOP_DEQUEUED was set "by accident", say
      it wasn't cleared after initial SIGSTOP sent by PTRACE_ATTACH.
      
      At first glance we could change ptrace_signal() to add STOP_DEQUEUED
      after return from ptrace_stop(), but this is not right in case when the
      tracer does not change the reported SIGSTOP and SIGCONT comes in between.
      This is even more wrong with PT_SEIZED, SIGCONT adds JOBCTL_TRAP_NOTIFY
      which will be "lost" during the TRAP_STOP | TRAP_NOTIFY report.
      
      So lets add STOP_DEQUEUED _before_ we report the signal. It has no effect
      unless sig_kernel_stop() == T after the tracer resumes us, and in the
      latter case the pending STOP_DEQUEUED means no SIGCONT in between, we
      should stop.
      
      Note also that if SIGCONT was sent, PT_SEIZED tracee will correctly
      report PTRACE_EVENT_STOP/SIGTRAP and thus the tracer can notice the fact
      SIGSTOP was cancelled.
      
      Also, move the current->ptrace check from ptrace_signal() to its caller,
      get_signal_to_deliver(), this looks more natural.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      8a352418
    • P
      signal: align __lock_task_sighand() irq disabling and RCU · a841796f
      Paul E. McKenney 提交于
      The __lock_task_sighand() function calls rcu_read_lock() with interrupts
      and preemption enabled, but later calls rcu_read_unlock() with interrupts
      disabled.  It is therefore possible that this RCU read-side critical
      section will be preempted and later RCU priority boosted, which means that
      rcu_read_unlock() will call rt_mutex_unlock() in order to deboost itself, but
      with interrupts disabled. This results in lockdep splats, so this commit
      nests the RCU read-side critical section within the interrupt-disabled
      region of code.  This prevents the RCU read-side critical section from
      being preempted, and thus prevents the attempt to deboost with interrupts
      disabled.
      
      It is quite possible that a better long-term fix is to make rt_mutex_unlock()
      disable irqs when acquiring the rt_mutex structure's ->wait_lock.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a841796f
  11. 28 6月, 2011 3 次提交
  12. 23 6月, 2011 2 次提交
    • T
      ptrace: kill trivial tracehooks · a288eecc
      Tejun Heo 提交于
      At this point, tracehooks aren't useful to mainline kernel and mostly
      just add an extra layer of obfuscation.  Although they have comments,
      without actual in-kernel users, it is difficult to tell what are their
      assumptions and they're actually trying to achieve.  To mainline
      kernel, they just aren't worth keeping around.
      
      This patch kills the following trivial tracehooks.
      
      * Ones testing whether task is ptraced.  Replace with ->ptrace test.
      
      	tracehook_expect_breakpoints()
      	tracehook_consider_ignored_signal()
      	tracehook_consider_fatal_signal()
      
      * ptrace_event() wrappers.  Call directly.
      
      	tracehook_report_exec()
      	tracehook_report_exit()
      	tracehook_report_vfork_done()
      
      * ptrace_release_task() wrapper.  Call directly.
      
      	tracehook_finish_release_task()
      
      * noop
      
      	tracehook_prepare_release_task()
      	tracehook_report_death()
      
      This doesn't introduce any behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      a288eecc
    • T
      ptrace: kill task_ptrace() · d21142ec
      Tejun Heo 提交于
      task_ptrace(task) simply dereferences task->ptrace and isn't even used
      consistently only adding confusion.  Kill it and directly access
      ->ptrace instead.
      
      This doesn't introduce any behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      d21142ec
  13. 17 6月, 2011 4 次提交
    • T
      ptrace: implement PTRACE_LISTEN · 544b2c91
      Tejun Heo 提交于
      The previous patch implemented async notification for ptrace but it
      only worked while trace is running.  This patch introduces
      PTRACE_LISTEN which is suggested by Oleg Nestrov.
      
      It's allowed iff tracee is in STOP trap and puts tracee into
      quasi-running state - tracee never really runs but wait(2) and
      ptrace(2) consider it to be running.  While ptracer is listening,
      tracee is allowed to re-enter STOP to notify an async event.
      Listening state is cleared on the first notification.  Ptracer can
      also clear it by issuing INTERRUPT - tracee will re-trap into STOP
      with listening state cleared.
      
      This allows ptracer to monitor group stop state without running tracee
      - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
      wait(2) to wait for the next group stop event.  When it happens,
      PTRACE_GETSIGINFO provides information to determine the current state.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_INTERRUPT	0x4207
        #define PTRACE_LISTEN		0x4208
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee, tracer;
      	  int i;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  tracer = fork();
      	  if (!tracer) {
      		  siginfo_t si;
      
      		  ptrace(PTRACE_SEIZE, tracee, NULL,
      			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  repeat:
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      
      		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
      		  if (!si.si_code) {
      			  printf("tracer: SIG %d\n", si.si_signo);
      			  ptrace(PTRACE_CONT, tracee, NULL,
      				 (void *)(unsigned long)si.si_signo);
      			  goto repeat;
      		  }
      		  printf("tracer: stopped=%d signo=%d\n",
      			 si.si_signo != SIGTRAP, si.si_signo);
      		  if (si.si_signo != SIGTRAP)
      			  ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
      		  else
      			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      		  goto repeat;
      	  }
      
      	  for (i = 0; i < 3; i++) {
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGSTOP\n");
      		  kill(tracee, SIGSTOP);
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGCONT\n");
      		  kill(tracee, SIGCONT);
      	  }
      	  nanosleep(&ts1s, NULL);
      
      	  kill(tracer, SIGKILL);
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      This is identical to the program to test TRAP_NOTIFY except that
      tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
      This allows ptracer to monitor when group stop ends without running
      tracee.
      
        # ./test-listen
        tracer: stopped=0 signo=5
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
      
      -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
           task_stopped_code() as suggested by Oleg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      544b2c91
    • T
      ptrace: implement TRAP_NOTIFY and use it for group stop events · fb1d910c
      Tejun Heo 提交于
      Currently there's no way for ptracer to find out whether group stop
      finished other than polling with INTERRUPT - GETSIGINFO - CONT
      sequence.  This patch implements group stop notification for ptracer
      using STOP traps.
      
      When group stop state of a seized tracee changes, JOBCTL_TRAP_NOTIFY
      is set, which schedules a STOP trap which is sticky - it isn't cleared
      by other traps and at least one STOP trap will happen eventually.
      STOP trap is synchronization point for event notification and the
      tracer can determine the current group stop state by looking at the
      signal number portion of exit code (si_status from waitid(2) or
      si_code from PTRACE_GETSIGINFO).
      
      Notifications are generated both on start and end of group stops but,
      because group stop participation always happens before STOP trap, this
      doesn't cause an extra trap while tracee is participating in group
      stop.  The symmetry will be useful later.
      
      Note that this notification works iff tracee is not trapped.
      Currently there is no way to be notified of group stop state changes
      while tracee is trapped.  This will be addressed by a later patch.
      
      An example program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_INTERRUPT	0x4207
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee, tracer;
      	  int i;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  tracer = fork();
      	  if (!tracer) {
      		  siginfo_t si;
      
      		  ptrace(PTRACE_SEIZE, tracee, NULL,
      			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  repeat:
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      
      		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
      		  if (!si.si_code) {
      			  printf("tracer: SIG %d\n", si.si_signo);
      			  ptrace(PTRACE_CONT, tracee, NULL,
      				 (void *)(unsigned long)si.si_signo);
      			  goto repeat;
      		  }
      		  printf("tracer: stopped=%d signo=%d\n",
      			 si.si_signo != SIGTRAP, si.si_signo);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      		  goto repeat;
      	  }
      
      	  for (i = 0; i < 3; i++) {
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGSTOP\n");
      		  kill(tracee, SIGSTOP);
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGCONT\n");
      		  kill(tracee, SIGCONT);
      	  }
      	  nanosleep(&ts1s, NULL);
      
      	  kill(tracer, SIGKILL);
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      In the above program, tracer keeps tracee running and gets
      notification of each group stop state changes.
      
        # ./test-notify
        tracer: stopped=0 signo=5
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      fb1d910c
    • T
      ptrace: implement PTRACE_SEIZE · 3544d72a
      Tejun Heo 提交于
      PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
      effects on tracee signal and job control states.  This patch
      implements a new ptrace request PTRACE_SEIZE which attaches a tracee
      without trapping it or affecting its signal and job control states.
      
      The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
      flags in @data.  Currently, the only defined flag is
      PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
      PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
      The changes will be implemented gradually and the DEVEL flag is to
      prevent programs which expect full SEIZE behavior from using it before
      all the behavior modifications are complete while allowing unit
      testing.  The flag will be removed once SEIZE behaviors are completely
      implemented.
      
      * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap.  After
        attaching tracee continues to run unless a trap condition occurs.
      
      * PTRACE_SEIZE doesn't affect signal or group stop state.
      
      * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
        exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
        the stopping signals if group stop is in effect or SIGTRAP
        otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
        instead of NULL.
      
      Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
      used to determine whether new SEIZE behaviors should be enabled.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static const struct timespec ts1s = { .tv_sec = 1 };
        static const struct timespec ts3s = { .tv_sec = 3 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  while (1) {
      			  printf("tracee: alive\n");
      			  nanosleep(&ts1s, NULL);
      		  }
      	  }
      
      	  if (argc > 1)
      		  kill(tracee, SIGSTOP);
      
      	  nanosleep(&ts100ms, NULL);
      
      	  ptrace(PTRACE_SEIZE, tracee, NULL,
      		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      	  if (argc > 1) {
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      	  }
      	  nanosleep(&ts3s, NULL);
      	  printf("tracer: exiting\n");
      	  return 0;
        }
      
      When the above program is called w/o argument, tracee is seized while
      running and remains running.  When tracer exits, tracee continues to
      run and print out messages.
      
        # ./test-seize-simple
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        tracee: alive
        tracee: alive
      
      When called with an argument, tracee is seized from stopped state and
      continued, and returns to stopped state when tracer exits.
      
        # ./test-seize
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        # ps -el|grep test-seize
        1 T     0  4720     1  0  80   0 -   941 signal ttyS0    00:00:00 test-seize
      
      -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
           suggested.
      
      -v3: PTRACE_EVENT_STOP traps now report group stop state by signr.  If
           group stop is in effect the stop signal number is returned as
           part of exit_code; otherwise, SIGTRAP.  This was suggested by
           Denys and Oleg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      3544d72a
    • T
      job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap · 73ddff2b
      Tejun Heo 提交于
      do_signal_stop() implemented both normal group stop and trap for group
      stop while ptraced.  This approach has been enough but scheduled
      changes require trap mechanism which can be used in more generic
      manner and using group stop trap for generic trap site simplifies both
      userland visible interface and implementation.
      
      This patch adds a new jobctl flag - JOBCTL_TRAP_STOP.  When set, it
      triggers a trap site, which behaves like group stop trap, in
      get_signal_to_deliver() after checking for pending signals.  While
      ptraced, do_signal_stop() doesn't stop itself.  It initiates group
      stop if requested and schedules JOBCTL_TRAP_STOP and returns.  The
      caller - get_signal_to_deliver() - is responsible for checking whether
      TRAP_STOP is pending afterwards and handling it.
      
      ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
      JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
      bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
      after detach.
      
      While at it, add proper function comment to do_signal_stop() and make
      it return bool.
      
      -v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
           instead of JOBCTL_PENDING_MASK.  This avoids accidentally
           clearing JOBCTL_STOP_CONSUME.  Spotted by Oleg.
      
      -v3: do_signal_stop() updated to return %false without dropping
           siglock while ptraced and TRAP_STOP check moved inside for(;;)
           loop after group stop participation.  This avoids unnecessary
           relocking and also will help avoiding unnecessary traps by
           consuming group stop before handling pending traps.
      
      -v4: Jobctl trap handling moved into a separate function -
           do_jobctl_trap().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      73ddff2b
  14. 15 6月, 2011 1 次提交
  15. 05 6月, 2011 7 次提交
    • T
      signal: remove three noop tracehooks · dd1d6772
      Tejun Heo 提交于
      Remove the following three noop tracehooks in signals.c.
      
      * tracehook_force_sigpending()
      * tracehook_get_signal()
      * tracehook_finish_jctl()
      
      The code area is about to be updated and these hooks don't do anything
      other than obfuscating the logic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      dd1d6772
    • T
      ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit · 62c124ff
      Tejun Heo 提交于
      ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
      ->wait_chldexit was already complicated with waker-side filtering
      without adding TRAPPING wait on top of it.  Also, it unnecessarily
      made TRAPPING clearing depend on the current ptrace relationship - if
      the ptracee is detached, wakeup is lost.
      
      There is no reason to use signal->wait_chldexit here.  We're just
      waiting for JOBCTL_TRAPPING bit to clear and given the relatively
      infrequent use of ptrace, bit_waitqueue can serve it perfectly.
      
      This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
      signal->wait_chldexit.
      
      -v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      62c124ff
    • T
      job control: introduce task_set_jobctl_pending() · 7dd3db54
      Tejun Heo 提交于
      task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
      pending bits too.  Setting pending conditions on a dying task may make
      the task unkillable.  Currently, each setting site is responsible for
      checking for the condition but with to-be-added job control traps this
      becomes too fragile.
      
      This patch adds task_set_jobctl_pending() which should be used when
      setting task->jobctl bits to schedule a stop or trap.  The function
      performs the followings to ease setting pending bits.
      
      * Sanity checks.
      
      * If fatal signal is pending or PF_EXITING is set, no bit is set.
      
      * STOP_SIGMASK is automatically cleared if new value is being set.
      
      do_signal_stop() and ptrace_attach() are updated to use
      task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
      The surrounding structures around setting are changed to fit
      task_set_jobctl_pending() better but there should be no userland
      visible behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      7dd3db54
    • T
      job control: make task_clear_jobctl_pending() clear TRAPPING automatically · 6dfca329
      Tejun Heo 提交于
      JOBCTL_TRAPPING indicates that ptracer is waiting for tracee to
      (re)transit into TRACED.  task_clear_jobctl_pending() must be called
      when either tracee enters TRACED or the transition is cancelled for
      some reason.  The former is achieved by explicitly calling
      task_clear_jobctl_pending() in ptrace_stop() and the latter by calling
      it at the end of do_signal_stop().
      
      Calling task_clear_jobctl_trapping() at the end of do_signal_stop()
      limits the scope TRAPPING can be used and is fragile in that seemingly
      unrelated changes to tracee's control flow can lead to stuck TRAPPING.
      
      We already have task_clear_jobctl_pending() calls on those cancelling
      events to clear JOBCTL_STOP_PENDING.  Cancellations can be handled by
      making those call sites use JOBCTL_PENDING_MASK instead and updating
      task_clear_jobctl_pending() such that task_clear_jobctl_trapping() is
      called automatically if no stop/trap is pending.
      
      This patch makes the above changes and removes the fallback
      task_clear_jobctl_trapping() call from do_signal_stop().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      6dfca329
    • T
      job control: introduce JOBCTL_PENDING_MASK and task_clear_jobctl_pending() · 3759a0d9
      Tejun Heo 提交于
      This patch introduces JOBCTL_PENDING_MASK and replaces
      task_clear_jobctl_stop_pending() with task_clear_jobctl_pending()
      which takes an extra @mask argument.
      
      JOBCTL_PENDING_MASK is currently equal to JOBCTL_STOP_PENDING but
      future patches will add more bits.  recalc_sigpending_tsk() is updated
      to use JOBCTL_PENDING_MASK instead.
      
      task_clear_jobctl_pending() takes @mask which in subset of
      JOBCTL_PENDING_MASK and clears the relevant jobctl bits.  If
      JOBCTL_STOP_PENDING is set, other STOP bits are cleared together.  All
      task_clear_jobctl_stop_pending() users are updated to call
      task_clear_jobctl_pending() with JOBCTL_STOP_PENDING which is
      functionally identical to task_clear_jobctl_stop_pending().
      
      This patch doesn't cause any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      3759a0d9
    • T
      ptrace: relocate set_current_state(TASK_TRACED) in ptrace_stop() · 81be24b8
      Tejun Heo 提交于
      In ptrace_stop(), after arch hook is done, the task state and jobctl
      bits are updated while holding siglock.  The ordering requirement
      there is that TASK_TRACED is set before JOBCTL_TRAPPING is cleared to
      prevent ptracer waiting on TRAPPING doesn't end up waking up TRACED is
      actually set and sees TASK_RUNNING in wait(2).
      
      Move set_current_state(TASK_TRACED) to the top of the block and
      reorganize comments.  This makes the ordering more obvious
      (TASK_TRACED before other updates) and helps future updates to group
      stop participation.
      
      This patch doesn't cause any functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      81be24b8
    • T
      job control: rename signal->group_stop and flags to jobctl and update them · a8f072c1
      Tejun Heo 提交于
      signal->group_stop currently hosts mostly group stop related flags;
      however, it's gonna be used for wider purposes and the GROUP_STOP_
      flag prefix becomes confusing.  Rename signal->group_stop to
      signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.
      
      Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
      defined in terms of them to allow using bitops later.
      
      While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
      future additions.
      
      This doesn't cause any functional change.
      
      -v2: JOBCTL_*_BIT macros added as suggested by Linus.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      a8f072c1
  16. 26 5月, 2011 1 次提交
  17. 09 5月, 2011 2 次提交
    • T
      ptrace: fix signal->wait_chldexit usage in task_clear_group_stop_trapping() · 40ae717d
      Tejun Heo 提交于
      GROUP_STOP_TRAPPING waiting mechanism piggybacks on
      signal->wait_chldexit which is primarily used to implement waiting for
      wait(2) and friends.  When do_wait() waits on signal->wait_chldexit,
      it uses a custom wake up callback, child_wait_callback(), which
      expects the child task which is waking up the parent to be passed in
      as @key to filter out spurious wakeups.
      
      task_clear_group_stop_trapping() used __wake_up_sync() which uses NULL
      @key causing the following oops if the parent was doing do_wait().
      
        BUG: unable to handle kernel NULL pointer dereference at 00000000000002d8
        IP: [<ffffffff810499f9>] child_wait_callback+0x29/0x80
        PGD 1d899067 PUD 1e418067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP
        last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/local_cpus
        CPU 2
        Modules linked in:
      
        Pid: 4498, comm: test-continued Not tainted 2.6.39-rc6-work+ #32 Bochs Bochs
        RIP: 0010:[<ffffffff810499f9>]  [<ffffffff810499f9>] child_wait_callback+0x29/0x80
        RSP: 0000:ffff88001b889bf8  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: ffff88001fab3af8 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88001d91df20
        RBP: ffff88001b889c08 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
        R13: ffff88001fb70550 R14: 0000000000000000 R15: 0000000000000001
        FS:  00007f26ccae4700(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00000000000002d8 CR3: 000000001b8ac000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process test-continued (pid: 4498, threadinfo ffff88001b888000, task ffff88001fb88000)
        Stack:
         ffff88001b889c18 ffff88001fb70538 ffff88001b889c58 ffffffff810312f9
         0000000000000001 0000000200000001 ffff88001b889c58 ffff88001fb70518
         0000000000000002 0000000000000082 0000000000000001 0000000000000000
        Call Trace:
         [<ffffffff810312f9>] __wake_up_common+0x59/0x90
         [<ffffffff81035263>] __wake_up_sync_key+0x53/0x80
         [<ffffffff810352a0>] __wake_up_sync+0x10/0x20
         [<ffffffff8105a984>] task_clear_jobctl_trapping+0x44/0x50
         [<ffffffff8105bcbc>] ptrace_stop+0x7c/0x290
         [<ffffffff8105c20a>] do_signal_stop+0x28a/0x2d0
         [<ffffffff8105d27f>] get_signal_to_deliver+0x14f/0x5a0
         [<ffffffff81002175>] do_signal+0x75/0x7b0
         [<ffffffff8100292d>] do_notify_resume+0x5d/0x70
         [<ffffffff8182e36a>] retint_signal+0x46/0x8c
        Code: 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 8b 47 d8 83 f8 03 74 3a 85 c0 49 89 c8 75 23 89 c0 48 8b 5f e0 4c 8d 0c 40 31 c0 <4b> 39 9c c8 d8 02 00 00 74 1d 48 83 c4 08 5b c9 c3 66 0f 1f 44
      
      Fix it by using __wake_up_sync_key() and passing in the child as @key.
      
      I still think it's a mistake to piggyback on wait_chldexit for this.
      Given the relative low frequency of ptrace use, we would be much
      better off leaving already complex wait_chldexit alone and using bit
      waitqueue.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      40ae717d
    • O
      signal: sys_sigprocmask() needs retarget_shared_pending() · 2e4f7c77
      Oleg Nesterov 提交于
      sys_sigprocmask() changes current->blocked by hand. Convert this code
      to use set_current_blocked().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      2e4f7c77
  18. 28 4月, 2011 7 次提交