1. 10 7月, 2013 2 次提交
    • O
      ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child) · fab840fc
      Oleg Nesterov 提交于
      Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child).  This
      frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
      ensures that the tracee won't be killed by SIGTRAP triggered by the
      active breakpoints.
      
      Test-case:
      
      	unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
      	{
      		unsigned long dr7;
      
      		dr7 = ((len | type) & 0xf)
      			<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
      		if (enable)
      			dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
      		return dr7;
      	}
      
      	int write_dr(int pid, int dr, unsigned long val)
      	{
      		return ptrace(PTRACE_POKEUSER, pid,
      				offsetof (struct user, u_debugreg[dr]),
      				val);
      	}
      
      	void func(void)
      	{
      	}
      
      	int main(void)
      	{
      		int pid, stat;
      		unsigned long dr7;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			kill(getpid(), SIGHUP);
      
      			func();
      			return 0x13;
      		}
      
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(WSTOPSIG(stat) == SIGHUP);
      
      		assert(write_dr(pid, 0, (long)func) == 0);
      		dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
      		assert(write_dr(pid, 7, dr7) == 0);
      
      		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Before this patch the child is killed after PTRACE_DETACH.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fab840fc
    • O
      ptrace: revert "Prepare to fix racy accesses on task breakpoints" · 7c8df286
      Oleg Nesterov 提交于
      This reverts commit bf26c018 ("Prepare to fix racy accesses on task
      breakpoints").
      
      The patch was fine but we can no longer race with SIGKILL after commit
      9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race
      with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
      ->ptrace_bps[] can't go away.
      
      Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
      we can kill them and remove task->ptrace_bp_refcnt.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMichael Neuling <mikey@neuling.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c8df286
  2. 04 7月, 2013 1 次提交
    • A
      ptrace: add ability to get/set signal-blocked mask · 29000cae
      Andrey Vagin 提交于
      crtools uses a parasite code for dumping processes.  The parasite code is
      injected into a process with help PTRACE_SEIZE.
      
      Currently crtools blocks signals from a parasite code.  If a process has
      pending signals, crtools wait while a process handles these signals.
      
      This method is not suitable for stopped tasks.  A stopped task can have a
      few pending signals, when we will try to execute a parasite code, we will
      need to drop SIGSTOP, but all other signals must remain pending, because a
      state of processes must not be changed during checkpointing.
      
      This patch adds two ptrace commands to set/get signal-blocked mask.
      
      I think gdb can use this commands too.
      
      [akpm@linux-foundation.org: be consistent with brace layout]
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29000cae
  3. 30 6月, 2013 1 次提交
  4. 08 5月, 2013 1 次提交
  5. 01 5月, 2013 1 次提交
    • A
      ptrace: add ability to retrieve signals without removing from a queue (v4) · 84c751bd
      Andrey Vagin 提交于
      This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
      
      This request is used to retrieve information about pending signals
      starting with the specified sequence number.  Siginfo_t structures are
      copied from the child into the buffer starting at "data".
      
      The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
      struct ptrace_peeksiginfo_args {
      	u64 off;	/* from which siginfo to start */
      	u32 flags;
      	s32 nr;		/* how may siginfos to take */
      };
      
      "nr" has type "s32", because ptrace() returns "long", which has 32 bits on
      i386 and a negative values is used for errors.
      
      Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping
      signals from process-wide queue.  If this flag is not set, signals are
      read from a per-thread queue.
      
      The request PTRACE_PEEKSIGINFO returns a number of dumped signals.  If a
      signal with the specified sequence number doesn't exist, ptrace returns
      zero.  The request returns an error, if no signal has been dumped.
      
      Errors:
      EINVAL - one or more specified flags are not supported or nr is negative
      EFAULT - buf or addr is outside your accessible address space.
      
      A result siginfo contains a kernel part of si_code which usually striped,
      but it's required for queuing the same siginfo back during restore of
      pending signals.
      
      This functionality is required for checkpointing pending signals.  Pedro
      Alves suggested using it in "gdb" to peek at pending signals.  gdb already
      uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
      dequeued.  This functionality allows gdb to look at the pending signals
      which were not reported yet.
      
      The prototype of this code was developed by Oleg Nesterov.
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Alves <palves@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      84c751bd
  6. 09 2月, 2013 1 次提交
    • J
      uprobes: Add exports for module use · e8440c14
      Josh Stone 提交于
      The original pull message for uprobes (commit 654443e2) noted:
      
        This tree includes uprobes support in 'perf probe' - but SystemTap
        (and other tools) can take advantage of user probe points as well.
      
      In order to actually be usable in module-based tools like SystemTap, the
      interface needs to be exported.  This patch first adds the obvious
      exports for uprobe_register and uprobe_unregister.  Then it also adds
      one for task_user_regset_view, which is necessary to get the correct
      state of userspace registers.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      e8440c14
  7. 23 1月, 2013 2 次提交
    • O
      ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov 提交于
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
      that debugger can actually read/modify the kernel stack until the tracee
      does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too and in some sense this
      race is even worse, the very fact the tracee can be woken up breaks the
      logic.
      
      As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
      call, this ensures that nobody can ever wakeup the tracee while the
      debugger looks at it.  Not only this fixes the mentioned problems, we
      can do some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers, for example it
      makes sense to make the tracee killable for oom-killer before
      access_process_vm().
      
      While at it, add the comment into may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: NSalman Qazi <sqazi@google.com>
      Reported-by: NSuleiman Souhlal <suleiman@google.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9899d11f
    • O
      ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() · 910ffdb1
      Oleg Nesterov 提交于
      Cleanup and preparation for the next change.
      
      signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
      actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
      necessary mask.
      
      Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
      signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
      which adds __TASK_TRACED.
      
      This way ptrace_signal_wake_up() can work "inside" ptrace_request()
      even if the tracee doesn't have the TASK_WAKEKILL bit set.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      910ffdb1
  8. 21 1月, 2013 1 次提交
  9. 18 12月, 2012 1 次提交
    • O
      ptrace: introduce PTRACE_O_EXITKILL · 992fb6e1
      Oleg Nesterov 提交于
      Ptrace jailers want to be sure that the tracee can never escape
      from the control. However if the tracer dies unexpectedly the
      tracee continues to run in potentially unsafe mode.
      
      Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits
      it sends SIGKILL to every tracee which has this bit set.
      
      Note that the new option is not equal to the last-option << 1.  Because
      currently all options have an event, and the new one starts the eventless
      group.  It uses the random 20 bit, so we have the room for 12 more events,
      but we can also add the new eventless options below this one.
      
      Suggested by Amnon Shiloh.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Tested-by: NAmnon Shiloh <u3557@miso.sublimeip.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Chris Evans <scarybeasts@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      992fb6e1
  10. 20 11月, 2012 1 次提交
  11. 03 8月, 2012 1 次提交
  12. 03 5月, 2012 1 次提交
  13. 08 4月, 2012 1 次提交
  14. 24 3月, 2012 4 次提交
  15. 06 1月, 2012 2 次提交
    • E
      ptrace: do not audit capability check when outputing /proc/pid/stat · 69f594a3
      Eric Paris 提交于
      Reading /proc/pid/stat of another process checks if one has ptrace permissions
      on that process.  If one does have permissions it outputs some data about the
      process which might have security and attack implications.  If the current
      task does not have ptrace permissions the read still works, but those fields
      are filled with inocuous (0) values.  Since this check and a subsequent denial
      is not a violation of the security policy we should not audit such denials.
      
      This can be quite useful to removing ptrace broadly across a system without
      flooding the logs when ps is run or something which harmlessly walks proc.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
      69f594a3
    • E
      capabilities: remove task_ns_* functions · f1c84dae
      Eric Paris 提交于
      task_ in the front of a function, in the security subsystem anyway, means
      to me at least, that we are operating with that task as the subject of the
      security decision.  In this case what it means is that we are using current as
      the subject but we use the task to get the right namespace.  Who in the world
      would ever realize that's what task_ns_capability means just by the name?  This
      patch eliminates the task_ns functions entirely and uses the has_ns_capability
      function instead.  This means we explicitly open code the ns in question in
      the caller.  I think it makes the caller a LOT more clear what is going on.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Acked-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
      f1c84dae
  16. 05 1月, 2012 1 次提交
    • O
      ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b
      Oleg Nesterov 提交于
      This is the temporary simple fix for 3.2, we need more changes in this
      area.
      
      1. do_signal_stop() assumes that the running untraced thread in the
         stopped thread group is not possible. This was our goal but it is
         not yet achieved: a stopped-but-resumed tracee can clone the running
         thread which can initiate another group-stop.
      
         Remove WARN_ON_ONCE(!current->ptrace).
      
      2. A new thread always starts with ->jobctl = 0. If it is auto-attached
         and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
         but JOBCTL_STOP_SIGMASK part is zero, this triggers WANR_ON(!signr)
         in do_jobctl_trap() if another debugger attaches.
      
         Change __ptrace_unlink() to set the artificial SIGSTOP for report.
      
         Alternatively we could change ptrace_init_task() to copy signr from
         current, but this means we can copy it for no reason and hide the
         possible similar problems.
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.1]
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a88951b
  17. 31 10月, 2011 1 次提交
  18. 26 9月, 2011 1 次提交
  19. 19 7月, 2011 1 次提交
    • V
      connector: add an event for monitoring process tracers · f701e5b7
      Vladimir Zapolskiy 提交于
      This change adds a procfs connector event, which is emitted on every
      successful process tracer attach or detach.
      
      If some process connects to other one, kernelspace connector reports
      process id and thread group id of both these involved processes. On
      disconnection null process id is returned.
      
      Such an event allows to create a simple automated userspace mechanism
      to be aware about processes connecting to others, therefore predefined
      process policies can be applied to them if needed.
      
      Note, a detach signal is emitted only in case, if a tracer process
      explicitly executes PTRACE_DETACH request. In other cases like tracee
      or tracer exit detach event from proc connector is not reported.
      Signed-off-by: NVladimir Zapolskiy <vzapolskiy@gmail.com>
      Acked-by: NEvgeniy Polyakov <zbr@ioremap.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      f701e5b7
  20. 28 6月, 2011 2 次提交
  21. 17 6月, 2011 4 次提交
    • T
      ptrace: implement PTRACE_LISTEN · 544b2c91
      Tejun Heo 提交于
      The previous patch implemented async notification for ptrace but it
      only worked while trace is running.  This patch introduces
      PTRACE_LISTEN which is suggested by Oleg Nestrov.
      
      It's allowed iff tracee is in STOP trap and puts tracee into
      quasi-running state - tracee never really runs but wait(2) and
      ptrace(2) consider it to be running.  While ptracer is listening,
      tracee is allowed to re-enter STOP to notify an async event.
      Listening state is cleared on the first notification.  Ptracer can
      also clear it by issuing INTERRUPT - tracee will re-trap into STOP
      with listening state cleared.
      
      This allows ptracer to monitor group stop state without running tracee
      - use INTERRUPT to put tracee into STOP trap, issue LISTEN and then
      wait(2) to wait for the next group stop event.  When it happens,
      PTRACE_GETSIGINFO provides information to determine the current state.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_INTERRUPT	0x4207
        #define PTRACE_LISTEN		0x4208
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee, tracer;
      	  int i;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  tracer = fork();
      	  if (!tracer) {
      		  siginfo_t si;
      
      		  ptrace(PTRACE_SEIZE, tracee, NULL,
      			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  repeat:
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      
      		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
      		  if (!si.si_code) {
      			  printf("tracer: SIG %d\n", si.si_signo);
      			  ptrace(PTRACE_CONT, tracee, NULL,
      				 (void *)(unsigned long)si.si_signo);
      			  goto repeat;
      		  }
      		  printf("tracer: stopped=%d signo=%d\n",
      			 si.si_signo != SIGTRAP, si.si_signo);
      		  if (si.si_signo != SIGTRAP)
      			  ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
      		  else
      			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      		  goto repeat;
      	  }
      
      	  for (i = 0; i < 3; i++) {
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGSTOP\n");
      		  kill(tracee, SIGSTOP);
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGCONT\n");
      		  kill(tracee, SIGCONT);
      	  }
      	  nanosleep(&ts1s, NULL);
      
      	  kill(tracer, SIGKILL);
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      This is identical to the program to test TRAP_NOTIFY except that
      tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
      This allows ptracer to monitor when group stop ends without running
      tracee.
      
        # ./test-listen
        tracer: stopped=0 signo=5
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
      
      -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
           task_stopped_code() as suggested by Oleg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      544b2c91
    • T
      ptrace: implement PTRACE_INTERRUPT · fca26f26
      Tejun Heo 提交于
      Currently, there's no way to trap a running ptracee short of sending a
      signal which has various side effects.  This patch implements
      PTRACE_INTERRUPT which traps ptracee without any signal or job control
      related side effect.
      
      The implementation is almost trivial.  It uses the group stop trap -
      SIGTRAP | PTRACE_EVENT_STOP << 8.  A new trap flag
      JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
      cleared when any trap happens.  As INTERRUPT should be useable
      regardless of the current state of tracee, task_is_traced() test in
      ptrace_check_attach() is skipped for INTERRUPT.
      
      PTRACE_INTERRUPT is available iff tracee is attached with
      PTRACE_SEIZE.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_INTERRUPT	0x4207
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static const struct timespec ts1s = { .tv_sec = 1 };
        static const struct timespec ts3s = { .tv_sec = 3 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  while (1) {
      			  printf("tracee: alive pid=%d\n", getpid());
      			  nanosleep(&ts1s, NULL);
      		  }
      	  }
      
      	  if (argc > 1)
      		  kill(tracee, SIGSTOP);
      
      	  nanosleep(&ts100ms, NULL);
      
      	  ptrace(PTRACE_SEIZE, tracee, NULL,
      		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      	  if (argc > 1) {
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      	  }
      	  nanosleep(&ts3s, NULL);
      
      	  printf("tracer: INTERRUPT and DETACH\n");
      	  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  waitid(P_PID, tracee, NULL, WSTOPPED);
      	  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
      	  nanosleep(&ts3s, NULL);
      
      	  printf("tracer: exiting\n");
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      When called without argument, tracee is seized from running state,
      interrupted and then detached back to running state.
      
        # ./test-interrupt
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracer: INTERRUPT and DETACH
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracer: exiting
      
      When called with argument, tracee is seized from stopped state,
      continued, interrupted and then detached back to stopped state.
      
        # ./test-interrupt  1
        tracee: alive pid=4548
        tracee: alive pid=4548
        tracee: alive pid=4548
        tracer: INTERRUPT and DETACH
        tracer: exiting
      
      Before PTRACE_INTERRUPT, once the tracee was running, there was no way
      to trap tracee and do PTRACE_DETACH without causing side effect.
      
      -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
           up scheduling TRAP_STOP if child is dying which may make the
           child unkillable.  Spotted by Oleg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      fca26f26
    • T
      ptrace: implement PTRACE_SEIZE · 3544d72a
      Tejun Heo 提交于
      PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
      effects on tracee signal and job control states.  This patch
      implements a new ptrace request PTRACE_SEIZE which attaches a tracee
      without trapping it or affecting its signal and job control states.
      
      The usage is the same with PTRACE_ATTACH but it takes PTRACE_SEIZE_*
      flags in @data.  Currently, the only defined flag is
      PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
      PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
      The changes will be implemented gradually and the DEVEL flag is to
      prevent programs which expect full SEIZE behavior from using it before
      all the behavior modifications are complete while allowing unit
      testing.  The flag will be removed once SEIZE behaviors are completely
      implemented.
      
      * PTRACE_SEIZE, unlike ATTACH, doesn't force tracee to trap.  After
        attaching tracee continues to run unless a trap condition occurs.
      
      * PTRACE_SEIZE doesn't affect signal or group stop state.
      
      * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
        exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
        the stopping signals if group stop is in effect or SIGTRAP
        otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
        instead of NULL.
      
      Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
      used to determine whether new SEIZE behaviors should be enabled.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static const struct timespec ts1s = { .tv_sec = 1 };
        static const struct timespec ts3s = { .tv_sec = 3 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  while (1) {
      			  printf("tracee: alive\n");
      			  nanosleep(&ts1s, NULL);
      		  }
      	  }
      
      	  if (argc > 1)
      		  kill(tracee, SIGSTOP);
      
      	  nanosleep(&ts100ms, NULL);
      
      	  ptrace(PTRACE_SEIZE, tracee, NULL,
      		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      	  if (argc > 1) {
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      	  }
      	  nanosleep(&ts3s, NULL);
      	  printf("tracer: exiting\n");
      	  return 0;
        }
      
      When the above program is called w/o argument, tracee is seized while
      running and remains running.  When tracer exits, tracee continues to
      run and print out messages.
      
        # ./test-seize-simple
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        tracee: alive
        tracee: alive
      
      When called with an argument, tracee is seized from stopped state and
      continued, and returns to stopped state when tracer exits.
      
        # ./test-seize
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        # ps -el|grep test-seize
        1 T     0  4720     1  0  80   0 -   941 signal ttyS0    00:00:00 test-seize
      
      -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
           suggested.
      
      -v3: PTRACE_EVENT_STOP traps now report group stop state by signr.  If
           group stop is in effect the stop signal number is returned as
           part of exit_code; otherwise, SIGTRAP.  This was suggested by
           Denys and Oleg.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      3544d72a
    • T
      job control: introduce JOBCTL_TRAP_STOP and use it for group stop trap · 73ddff2b
      Tejun Heo 提交于
      do_signal_stop() implemented both normal group stop and trap for group
      stop while ptraced.  This approach has been enough but scheduled
      changes require trap mechanism which can be used in more generic
      manner and using group stop trap for generic trap site simplifies both
      userland visible interface and implementation.
      
      This patch adds a new jobctl flag - JOBCTL_TRAP_STOP.  When set, it
      triggers a trap site, which behaves like group stop trap, in
      get_signal_to_deliver() after checking for pending signals.  While
      ptraced, do_signal_stop() doesn't stop itself.  It initiates group
      stop if requested and schedules JOBCTL_TRAP_STOP and returns.  The
      caller - get_signal_to_deliver() - is responsible for checking whether
      TRAP_STOP is pending afterwards and handling it.
      
      ptrace_attach() is updated to use JOBCTL_TRAP_STOP instead of
      JOBCTL_STOP_PENDING and __ptrace_unlink() to clear all pending trap
      bits and TRAPPING so that TRAP_STOP and future trap bits don't linger
      after detach.
      
      While at it, add proper function comment to do_signal_stop() and make
      it return bool.
      
      -v2: __ptrace_unlink() updated to clear JOBCTL_TRAP_MASK and TRAPPING
           instead of JOBCTL_PENDING_MASK.  This avoids accidentally
           clearing JOBCTL_STOP_CONSUME.  Spotted by Oleg.
      
      -v3: do_signal_stop() updated to return %false without dropping
           siglock while ptraced and TRAP_STOP check moved inside for(;;)
           loop after group stop participation.  This avoids unnecessary
           relocking and also will help avoiding unnecessary traps by
           consuming group stop before handling pending traps.
      
      -v4: Jobctl trap handling moved into a separate function -
           do_jobctl_trap().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      73ddff2b
  22. 05 6月, 2011 5 次提交
    • T
      ptrace: use bit_waitqueue for TRAPPING instead of wait_chldexit · 62c124ff
      Tejun Heo 提交于
      ptracer->signal->wait_chldexit was used to wait for TRAPPING; however,
      ->wait_chldexit was already complicated with waker-side filtering
      without adding TRAPPING wait on top of it.  Also, it unnecessarily
      made TRAPPING clearing depend on the current ptrace relationship - if
      the ptracee is detached, wakeup is lost.
      
      There is no reason to use signal->wait_chldexit here.  We're just
      waiting for JOBCTL_TRAPPING bit to clear and given the relatively
      infrequent use of ptrace, bit_waitqueue can serve it perfectly.
      
      This patch makes JOBCTL_TRAPPING wait use bit_waitqueue instead of
      signal->wait_chldexit.
      
      -v2: Use JOBCTL_*_BIT macros instead of ilog2() as suggested by Linus.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      62c124ff
    • T
      job control: introduce task_set_jobctl_pending() · 7dd3db54
      Tejun Heo 提交于
      task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
      pending bits too.  Setting pending conditions on a dying task may make
      the task unkillable.  Currently, each setting site is responsible for
      checking for the condition but with to-be-added job control traps this
      becomes too fragile.
      
      This patch adds task_set_jobctl_pending() which should be used when
      setting task->jobctl bits to schedule a stop or trap.  The function
      performs the followings to ease setting pending bits.
      
      * Sanity checks.
      
      * If fatal signal is pending or PF_EXITING is set, no bit is set.
      
      * STOP_SIGMASK is automatically cleared if new value is being set.
      
      do_signal_stop() and ptrace_attach() are updated to use
      task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
      The surrounding structures around setting are changed to fit
      task_set_jobctl_pending() better but there should be no userland
      visible behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      7dd3db54
    • T
      ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments · 755e276b
      Tejun Heo 提交于
      PTRACE_INTERRUPT is going to be added which should also skip
      task_is_traced() check in ptrace_check_attach().  Rename @kill to
      @ignore_state and make it bool.  Add function comment while at it.
      
      This patch doesn't introduce any behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      755e276b
    • T
      job control: rename signal->group_stop and flags to jobctl and update them · a8f072c1
      Tejun Heo 提交于
      signal->group_stop currently hosts mostly group stop related flags;
      however, it's gonna be used for wider purposes and the GROUP_STOP_
      flag prefix becomes confusing.  Rename signal->group_stop to
      signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.
      
      Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
      defined in terms of them to allow using bitops later.
      
      While at it, reassign JOBCTL_TRAPPING to bit 22 to better accomodate
      future additions.
      
      This doesn't cause any functional change.
      
      -v2: JOBCTL_*_BIT macros added as suggested by Linus.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      a8f072c1
    • T
      ptrace: remove silly wait_trap variable from ptrace_attach() · 0b1007c3
      Tejun Heo 提交于
      Remove local variable wait_trap which determines whether to wait for
      !TRAPPING or not and simply wait for it if attach was successful.
      
      -v2: Oleg pointed out wait should happen iff attach was successful.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      0b1007c3
  23. 26 5月, 2011 1 次提交
    • O
      ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread · 0666fb51
      Oleg Nesterov 提交于
      It is not clear why ptrace_resume() does wake_up_process(). Unless the
      caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
      wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
      not need the extra and potentionally spurious wakeup.
      
      If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
      The tracee can sleep in any state in any place, and if we have a buggy
      code which doesn't handle a spurious wakeup correctly PTRACE_KILL can
      be used to exploit it. For example:
      
      	int main(void)
      	{
      		int child, status;
      
      		child = fork();
      		if (!child) {
      			int ret;
      
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      
      			ret = pause();
      			printf("pause: %d %m\n", ret);
      
      			return 0x23;
      		}
      
      		sleep(1);
      		assert(ptrace(PTRACE_KILL, child, 0,0) == 0);
      
      		assert(child == wait(&status));
      		printf("wait: %x\n", status);
      
      		return 0;
      	}
      
      prints "pause: -1 Unknown error 514", -ERESTARTNOHAND leaks to the
      userland. In this case sys_pause() is buggy as well and should be
      fixed.
      
      I do not know what was the original rationality behind PTRACE_KILL.
      The man page is simply wrong and afaics it was always wrong. Imho
      it should be deprecated, or may be it should do send_sig(SIGKILL)
      as Denys suggests, but in any case I do not think that the current
      behaviour was intentional.
      
      Note: there is another problem, ptrace_resume() changes ->exit_code
      and this can race with SIGKILL too. Eventually we should change ptrace
      to not use ->exit_code.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      0666fb51
  24. 25 4月, 2011 1 次提交
    • F
      ptrace: Prepare to fix racy accesses on task breakpoints · bf26c018
      Frederic Weisbecker 提交于
      When a task is traced and is in a stopped state, the tracer
      may execute a ptrace request to examine the tracee state and
      get its task struct. Right after, the tracee can be killed
      and thus its breakpoints released.
      This can happen concurrently when the tracer is in the middle
      of reading or modifying these breakpoints, leading to dereferencing
      a freed pointer.
      
      Hence, to prepare the fix, create a generic breakpoint reference
      holding API. When a reference on the breakpoints of a task is
      held, the breakpoints won't be released until the last reference
      is dropped. After that, no more ptrace request on the task's
      breakpoints can be serviced for the tracer.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: v2.6.33.. <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
      bf26c018
  25. 04 4月, 2011 1 次提交
  26. 24 3月, 2011 1 次提交