1. 05 Jun, 2011 4 commits
    • job control: introduce task_set_jobctl_pending() · 7dd3db54
      Tejun Heo committed
      task->jobctl currently hosts JOBCTL_STOP_PENDING and will host TRAP
      pending bits too.  Setting pending conditions on a dying task may make
      the task unkillable.  Currently, each setting site is responsible for
      checking for the condition but with to-be-added job control traps this
      becomes too fragile.
      
      This patch adds task_set_jobctl_pending() which should be used when
      setting task->jobctl bits to schedule a stop or trap.  The function
      does the following to ease setting pending bits.
      
      * Sanity checks.
      
      * If a fatal signal is pending or PF_EXITING is set, no bit is set.
      
      * STOP_SIGMASK is automatically cleared if a new value is being set.
      
      do_signal_stop() and ptrace_attach() are updated to use
      task_set_jobctl_pending() instead of setting STOP_PENDING explicitly.
      The code around the setting sites is restructured to fit
      task_set_jobctl_pending() better, but there should be no
      userland-visible behavior difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
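      The three steps listed in the changelog above can be modeled in plain
      userspace C. This is only a sketch, not the kernel function: struct
      model_task, its two bool fields and model_set_jobctl_pending() are
      hypothetical stand-ins, and the position chosen for JOBCTL_STOP_PENDING
      is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Low 16 bits carry the stop signal number; the pending bit position is assumed. */
#define JOBCTL_STOP_SIGMASK 0xffffu
#define JOBCTL_STOP_PENDING (1u << 17)

/* Hypothetical userspace stand-in for the task fields the changelog mentions. */
struct model_task {
	unsigned int jobctl;
	bool exiting;        /* models PF_EXITING */
	bool fatal_pending;  /* models a pending fatal signal */
};

/* Returns true if the bits were set, false if the task is dying. */
static bool model_set_jobctl_pending(struct model_task *t, unsigned int mask)
{
	/* Sanity check: only known bits may be requested. */
	assert((mask & ~(JOBCTL_STOP_PENDING | JOBCTL_STOP_SIGMASK)) == 0);

	/* Never schedule a stop or trap on a dying task. */
	if (t->fatal_pending || t->exiting)
		return false;

	/* A new signal number replaces the old one instead of being OR-ed in. */
	if (mask & JOBCTL_STOP_SIGMASK)
		t->jobctl &= ~JOBCTL_STOP_SIGMASK;

	t->jobctl |= mask;
	return true;
}
```

      In this model a caller passes the pending flag together with the signal
      number in a single mask and checks the return value, so the "is the
      task dying?" test no longer has to be repeated at every call site.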
    • ptrace: ptrace_check_attach(): rename @kill to @ignore_state and add comments · 755e276b
      Tejun Heo committed
      PTRACE_INTERRUPT is going to be added which should also skip
      task_is_traced() check in ptrace_check_attach().  Rename @kill to
      @ignore_state and make it bool.  Add function comment while at it.
      
      This patch doesn't introduce any behavior difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • job control: rename signal->group_stop and flags to jobctl and update them · a8f072c1
      Tejun Heo committed
      signal->group_stop currently hosts mostly group stop related flags;
      however, it's gonna be used for wider purposes and the GROUP_STOP_
      flag prefix becomes confusing.  Rename signal->group_stop to
      signal->jobctl and rename all GROUP_STOP_* flags to JOBCTL_*.
      
      Bit position macros JOBCTL_*_BIT are defined and JOBCTL_* flags are
      defined in terms of them to allow using bitops later.
      
      While at it, reassign JOBCTL_TRAPPING to bit 22 to better accommodate
      future additions.
      
      This doesn't cause any functional change.
      
      -v2: JOBCTL_*_BIT macros added as suggested by Linus.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
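      Defining each flag in terms of a bit-position macro, as the changelog
      above describes, might look like the sketch below. Only the low-16-bit
      signal mask and bit 22 for JOBCTL_TRAPPING come from the changelogs on
      this page; the position used for JOBCTL_STOP_PENDING_BIT and the two
      helper functions are assumptions for illustration.

```c
#include <assert.h>

/* Low 16 bits hold the signal number of the last group stop. */
#define JOBCTL_STOP_SIGMASK      0xffffu

#define JOBCTL_STOP_PENDING_BIT  17   /* assumed position */
#define JOBCTL_TRAPPING_BIT      22   /* position stated in the changelog */

/* Flags are derived from the bit positions, so both forms stay in sync. */
#define JOBCTL_STOP_PENDING      (1u << JOBCTL_STOP_PENDING_BIT)
#define JOBCTL_TRAPPING          (1u << JOBCTL_TRAPPING_BIT)

static_assert(JOBCTL_TRAPPING == 0x400000u, "JOBCTL_TRAPPING is bit 22");

/* Hypothetical helpers showing why the *_BIT form is convenient for bitops. */
static inline unsigned int jobctl_set_bit(unsigned int flags, int bit)
{
	return flags | (1u << bit);
}

static inline int jobctl_test_bit(unsigned int flags, int bit)
{
	return (int)((flags >> bit) & 1u);
}
```

      Code that wants a mask uses JOBCTL_TRAPPING, while bit-oriented helpers
      take JOBCTL_TRAPPING_BIT; neither value has to be written out twice.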
    • ptrace: remove silly wait_trap variable from ptrace_attach() · 0b1007c3
      Tejun Heo committed
      Remove local variable wait_trap which determines whether to wait for
      !TRAPPING or not and simply wait for it if attach was successful.
      
      -v2: Oleg pointed out wait should happen iff attach was successful.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  2. 26 May, 2011 1 commit
    • ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread · 0666fb51
      Oleg Nesterov committed
      It is not clear why ptrace_resume() does wake_up_process(). Unless the
      caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
      wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
      not need the extra and potentially spurious wakeup.
      
      If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
      The tracee can sleep in any state in any place, and if some buggy
      code doesn't handle a spurious wakeup correctly, PTRACE_KILL can
      be used to exploit it. For example:
      
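      	#include <assert.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      
      	int main(void)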
      	{
      		int child, status;
      
      		child = fork();
      		if (!child) {
      			int ret;
      
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      
      			ret = pause();
      			printf("pause: %d %m\n", ret);
      
      			return 0x23;
      		}
      
      		sleep(1);
      		assert(ptrace(PTRACE_KILL, child, 0,0) == 0);
      
      		assert(child == wait(&status));
      		printf("wait: %x\n", status);
      
      		return 0;
      	}
      
      This prints "pause: -1 Unknown error 514": -ERESTARTNOHAND leaks to
      userland. In this case sys_pause() is buggy as well and should be
      fixed.
      
      I do not know what the original rationale behind PTRACE_KILL was.
      The man page is simply wrong and afaics it was always wrong. Imho
      it should be deprecated, or maybe it should do send_sig(SIGKILL)
      as Denys suggests, but in any case I do not think that the current
      behaviour was intentional.
      
      Note: there is another problem, ptrace_resume() changes ->exit_code
      and this can race with SIGKILL too. Eventually we should change ptrace
      to not use ->exit_code.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  3. 25 Apr, 2011 1 commit
    • ptrace: Prepare to fix racy accesses on task breakpoints · bf26c018
      Frederic Weisbecker committed
      When a task is traced and is in a stopped state, the tracer
      may execute a ptrace request to examine the tracee state and
      get its task struct. Right after, the tracee can be killed
      and thus its breakpoints released.
      This can happen concurrently when the tracer is in the middle
      of reading or modifying these breakpoints, leading to dereferencing
      a freed pointer.
      
      Hence, to prepare the fix, create a generic breakpoint reference
      holding API. When a reference on the breakpoints of a task is
      held, the breakpoints won't be released until the last reference
      is dropped. After that, no more ptrace request on the task's
      breakpoints can be serviced for the tracer.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: v2.6.33.. <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1302284067-7860-2-git-send-email-fweisbec@gmail.com
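      The reference-holding idea described above can be illustrated with a
      small userspace model built on C11 atomics. It is a sketch under
      assumptions, not the kernel API: struct bp_owner, bp_owner_init(),
      bp_get() and bp_put() are hypothetical names, and the released flag
      merely stands in for actually freeing the breakpoints.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a task's breakpoint set guarded by a refcount. */
struct bp_owner {
	atomic_int refcnt;  /* starts at 1: the owning task's own reference */
	bool released;      /* stand-in for "breakpoints have been freed" */
};

static void bp_owner_init(struct bp_owner *o)
{
	atomic_init(&o->refcnt, 1);
	o->released = false;
}

/* Take a reference unless the count has already dropped to zero. */
static bool bp_get(struct bp_owner *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;  /* too late: the breakpoints are gone or going away */
}

/* Drop a reference; whoever drops the last one performs the release. */
static void bp_put(struct bp_owner *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		o->released = true;
}
```

      In this model a tracer brackets each breakpoint access with
      bp_get()/bp_put(). If the tracee exits in the middle, its own bp_put()
      leaves the count above zero, so the release is deferred until the
      tracer is done, and any later bp_get() fails.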
  4. 04 Apr, 2011 1 commit
  5. 24 Mar, 2011 1 commit
  6. 23 Mar, 2011 4 commits
    • ptrace: Always put ptracee into appropriate execution state · 0e9f0a4a
      Tejun Heo committed
      Currently, __ptrace_unlink() wakes up the tracee iff it's in
      TASK_TRACED.  For unlinking from PTRACE_DETACH, this is correct as the
      tracee is guaranteed to be in TASK_TRACED or dead; however, unlinking
      also happens when the ptracer exits and in this case the ptracee can
      be in any state and the ptracee might be left running even if the
      group it belongs to is stopped.
      
      This patch updates __ptrace_unlink() such that GROUP_STOP_PENDING is
      reinstated regardless of the ptracee's current state as long as it's
      alive and makes sure that signal_wake_up() is called if execution
      state transition is necessary.
      
      Test case follows.
      
        #include <unistd.h>
        #include <time.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(void)
        {
      	  pid_t tracee;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  while (1) {
      			  nanosleep(&ts1s, NULL);
      			  write(1, ".", 1);
      		  }
      	  }
      
      	  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  write(1, "exiting", 7);
      	  return 0;
        }
      
      Before the patch, after the parent process exits, the child is left
      running and prints out "." every second.
      
        exiting..... (continues)
      
      After the patch, the group stop initiated by the implied SIGSTOP from
      PTRACE_ATTACH is re-established when the parent exits.
      
        exiting
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Collapse ptrace_untrace() into __ptrace_unlink() · e3bd058f
      Tejun Heo committed
      Remove the extra task_is_traced() check in __ptrace_unlink() and
      collapse ptrace_untrace() into __ptrace_unlink().  This is to prepare
      for further changes.
      
      While at it, drop the comment on top of ptrace_untrace() and convert
      __ptrace_unlink() comment to docbook format.  Detailed comment will be
      added by the next patch.
      
      This patch doesn't cause any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Clean transitions between TASK_STOPPED and TRACED · d79fdd6d
      Tejun Heo committed
      Currently, if the task is STOPPED on ptrace attach, it's left alone
      and the state is silently changed to TRACED on the next ptrace call.
      The behavior breaks the assumption that arch_ptrace_stop() is called
      before any task is poked by ptrace and is ugly in that a task
      manipulates the state of another task directly.
      
      With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
      TRACED can be made clean.  The tracer can use the flag to tell the
      tracee to retry stop on attach and detach.  On retry, the tracee will
      enter the desired state in the correct way.  The lower 16 bits of
      task->group_stop are used to remember the signal number which caused
      the last group stop.  This is used while retrying for ptrace attach as
      the original group_exit_code could have been consumed with wait(2) by
      then.
      
      As the real parent may wait(2) and consume the group_exit_code
      anytime, the group_exit_code needs to be saved separately so that it
      can be used when switching from regular sleep to ptrace_stop().  This
      is recorded in the lower 16 bits of task->group_stop.
      
      If a task is already stopped and there's no intervening SIGCONT, a
      ptrace request immediately following a successful PTRACE_ATTACH should
      always succeed even if the tracer doesn't wait(2) for attach
      completion; however, with this change, the tracee might still be
      TASK_RUNNING trying to enter TASK_TRACED which would cause the
      following request to fail with -ESRCH.
      
      This intermediate state is hidden from the ptracer by setting
      GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
      for it to clear on its signal->wait_chldexit.  Completing the
      transition or getting killed clears TRAPPING and wakes up the tracer.
      
      Note that the STOPPED -> RUNNING -> TRACED transition is still visible
      to other threads which are in the same group as the ptracer and the
      reverse transition is visible to all.  Please read the comments for
      details.
      
      Oleg:
      
      * Spotted a race condition where a task may retry group stop without
        proper bookkeeping.  Fixed by redoing bookkeeping on retry.
      
      * Spotted that the transition is visible to userland in several
        different ways.  Most are fixed with GROUP_STOP_TRAPPING.  Unhandled
        corner case is documented.
      
      * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
        task would result in more consistent behavior.
      
      * Pointed out that calling ptrace_stop() from do_signal_stop() in
        TASK_STOPPED can race with group stop start logic and then confuse
        the TRAPPING wait in ptrace_check_attach().  ptrace_stop() is now
        called with TASK_RUNNING.
      
      * Suggested using signal->wait_chldexit instead of bit wait.
      
      * Spotted a race condition between TRACED transition and clearing of
        TRAPPING.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    • ptrace: Remove the extra wake_up_state() from ptrace_detach() · 9f2bf651
      Tejun Heo committed
      This wake_up_state() has a turbulent history.  This is a remnant from
      ancient ptrace implementation and patently wrong.  Commit 95a3540d
      (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
      it but the change was reverted later by commit edaba2c5 (ptrace:
      revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
      citing compatibility breakage and general brokenness of the whole group
      stop / ptrace interaction.  Then, recently, it got converted from
      wake_up_process() to wake_up_state() to make it less dangerous.
      
      Digging through the mailing list archives, the compatibility breakage
      doesn't seem to be critical in the sense that the behavior isn't well
      defined or reliable to begin with and it seems to have been agreed to
      remove the wakeup with proper cleanup of the whole thing.
      
      Now that the group stop and its interaction with ptrace are being
      cleaned up, it's high time to finally kill this silliness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
  7. 05 Mar, 2011 1 commit
  8. 12 Feb, 2011 1 commit
    • ptrace: use safer wake up on ptrace_detach() · 01e05e9a
      Tejun Heo committed
      The wake_up_process() call in ptrace_detach() is spurious and not
      interlocked with the tracee state.  IOW, the tracee could be running or
      sleeping in any place in the kernel by the time wake_up_process() is
      called.  This can lead to the tracee waking up unexpectedly which can be
      dangerous.
      
      The wake_up is spurious and should be removed but for now reduce its
      toxicity by only waking up if the tracee is in TRACED or STOPPED state.
      
      This bug can possibly be used as an attack vector.  I don't think it
      will take too much effort to come up with an attack which triggers oops
      somewhere.  Most sleeps are wrapped in condition test loops and should
      be safe but we have quite a number of places where sleep and wakeup
      conditions are expected to be interlocked.  Although the window of
      opportunity is tiny, ptrace can be used by non-privileged users and with
      some loading the window can definitely be extended and exploited.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 28 Oct, 2010 4 commits
  10. 11 Aug, 2010 1 commit
  11. 28 May, 2010 2 commits
  12. 27 Apr, 2010 1 commit
  13. 10 Apr, 2010 1 commit
    • ptrace: kill BKL in ptrace syscall · 5534ecb2
      Arnd Bergmann committed
      The comment suggests that this usage is stale. There is no bkl in the
      exec path so if there is a race lurking there, the bkl in ptrace is
      not going to help in this regard.
      
      Overview of the possibility of "accidental" races this bkl might
      protect:
      
      - ptrace_traceme() is protected against task removal and concurrent
      read/write on current->ptrace as it locks write tasklist_lock.
      
      - arch_ptrace_attach() is serialized by ptrace_traceme() against
      concurrent PTRACE_TRACEME or PTRACE_ATTACH
      
      - ptrace_attach() is protected the same way as ptrace_traceme() and
      in turn serializes arch_ptrace_attach()
      
      - ptrace_check_attach() does its own well described serializing too.
      
      There is no obvious race here.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Roland McGrath <roland@redhat.com>
  14. 26 Mar, 2010 1 commit
    • x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Peter Zijlstra committed
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts. ptrace-bts was
      never used, and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 24 Feb, 2010 1 commit
  16. 12 Feb, 2010 1 commit
    • ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET · 2225a122
      Suresh Siddha committed
      Generic support for PTRACE_GETREGSET/PTRACE_SETREGSET commands which
      export the regsets supported by each architecture using the corresponding
      NT_* types. These NT_* types are already part of the userland ABI, used
      in representing the architecture specific register sets as different NOTES
      in an ELF core file.
      
      The 'addr' parameter of the ptrace system call encodes the REGSET type (using
      the corresponding NT_* type) and the 'data' parameter points to the
      struct iovec holding the user buffer and the length of that buffer.
      
      	struct iovec iov = { buf, len};
      	ret = ptrace(PTRACE_GETREGSET/PTRACE_SETREGSET, pid, NT_XXX_TYPE, &iov);
      
      On successful completion, iov.iov_len will be updated by the kernel to specify
      how much the kernel has written/read to/from the user's iov.iov_base.
      
      x86 extended state registers are primarily exported using this interface.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100211195614.886724710@sbs-t61.sc.intel.com>
      Acked-by: Hongjiu Lu <hjl.tools@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
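      A userspace caller of the interface described above might look like the
      following sketch. fetch_regset() and demo_prstatus_size() are
      hypothetical helper names. The 4096-byte buffer is an arbitrary choice,
      on the assumption that the length should be a multiple of the regset's
      register size. The example further assumes a Linux system where
      ptrace() on a forked child is permitted.

```c
#include <assert.h>
#include <elf.h>         /* NT_PRSTATUS */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>     /* struct iovec */
#include <sys/wait.h>
#include <unistd.h>

/* Read one regset of a stopped tracee; returns bytes written by the kernel, or -1. */
static long fetch_regset(pid_t pid, int note_type, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	assert(buf != NULL && len > 0);
	/* 'addr' carries the NT_* type, 'data' points at the iovec. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)note_type, &iov) == -1)
		return -1;
	/* The kernel shrinks iov_len to the amount it actually copied. */
	return (long)iov.iov_len;
}

/* Hypothetical demo: trace a forked child and fetch its NT_PRSTATUS regset. */
static long demo_prstatus_size(void)
{
	unsigned char buf[4096];
	int status;
	long n = -1;
	pid_t child = fork();

	if (child < 0)
		return -1;
	if (child == 0) {
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);   /* stop so the parent can inspect us */
		_exit(0);
	}
	if (waitpid(child, &status, 0) == child && WIFSTOPPED(status))
		n = fetch_regset(child, NT_PRSTATUS, buf, sizeof(buf));

	kill(child, SIGKILL);
	waitpid(child, &status, 0);
	return n;
}
```

      The value returned by demo_prstatus_size() is the size of the general
      purpose register set on the running architecture, which lets a caller
      size its buffer without hard-coding a per-architecture structure.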
  17. 24 Sep, 2009 1 commit
    • ptrace: __ptrace_detach: do __wake_up_parent() if we reap the tracee · a7f0765e
      Oleg Nesterov committed
      The bug is old, it wasn't caused by recent changes.
      
      Test case:
      
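      	#include <assert.h>
      	#include <pthread.h>
      	#include <signal.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      
      	static void *tfunc(void *arg)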
      	{
      		int pid = (long)arg;
      
      		assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0);
      		kill(pid, SIGKILL);
      
      		sleep(1);
      		return NULL;
      	}
      
      	int main(void)
      	{
      		pthread_t th;
      		long pid = fork();
      
      		if (!pid)
      			pause();
      
      		signal(SIGCHLD, SIG_IGN);
      		assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0);
      
      		int r = waitpid(-1, NULL, __WNOTHREAD);
      		printf("waitpid: %d %m\n", r);
      
      		return 0;
      	}
      
      Before the patch this program hangs; after the patch waitpid() correctly
      fails with errno == ECHILD.
      
      The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its
      ->real_parent is our sub-thread and we ignore SIGCHLD.  But in this case
      we should wake up other threads which can sleep in do_wait().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 07 Jul, 2009 1 commit
  19. 24 Jun, 2009 1 commit
  20. 19 Jun, 2009 5 commits
  21. 05 Jun, 2009 1 commit
    • ptrace: revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic" · edaba2c5
      Oleg Nesterov committed
      Commit 95a3540d ("ptrace_detach: the wrong
      wakeup breaks the ERESTARTxxx logic") removed the "extra"
      wake_up_process() from ptrace_detach(), but as Jan pointed out this breaks
      the compatibility.
      
      I believe the changelog is right and this wake_up() is wrong in many
      ways, but GDB assumes that ptrace(PTRACE_DETACH, child, 0, 0) always
      wakes up the tracee.
      
      This holds despite the fact that the wakeup breaks the
      SIGNAL_STOP_STOPPED/group_stop_count logic, and despite the fact that
      this wake_up_process() can break another assumption: PTRACE_DETACH with
      SIGSTOP should leave the tracee in TASK_STOPPED.  It can break because
      the untraced child can dequeue SIGSTOP and call do_signal_stop() before
      ptrace_detach() calls wake_up_process().
      
      Revert this change for now.  We need some fixes even if we want to keep
      the current behaviour, but these fixes are not for 2.6.30.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 11 May, 2009 1 commit
  23. 27 Apr, 2009 1 commit
  24. 14 Apr, 2009 1 commit
  25. 09 Apr, 2009 1 commit
  26. 07 Apr, 2009 1 commit