1. 16 Jul 2015: 1 commit
    • seccomp: add ptrace options for suspend/resume · 13c4a901
      Tycho Andersen committed
      This patch is the first step in enabling checkpoint/restore of processes
      with seccomp enabled.
      
      One of the things CRIU does while dumping tasks is inject code into them
      via ptrace to collect information that is only available to the process
      itself. However, if we are in a seccomp mode where these processes are
      prohibited from making these syscalls, then what CRIU does kills the task.
      
      This patch adds a new ptrace option, PTRACE_O_SUSPEND_SECCOMP, that enables
      a task from the init user namespace which has CAP_SYS_ADMIN and no seccomp
      filters to disable (and re-enable) seccomp filters for another task so that
      they can be successfully dumped (and restored). We restrict the set of
      processes that can disable seccomp through ptrace because although today
      ptrace can be used to bypass seccomp, there is some discussion of closing
      this loophole in the future and we would like this patch to not depend on
      that behavior and be future-proofed for when it is removed.
      
      Note that seccomp can be suspended before any filters are actually
      installed; this behavior is useful on criu restore, so that we can suspend
      seccomp, restore the filters, unmap our restore code from the restored
      process' address space, and then resume the task by detaching and have the
      filters resumed as well.
      
      v2 changes:
      
      * require that the tracer have no seccomp filters installed
      * drop TIF_NOTSC manipulation from the patch
      * change from ptrace command to a ptrace option and use this ptrace option
        as the flag to check. This means that as soon as the tracer
        detaches/dies, seccomp is re-enabled and, as a corollary, one cannot
        disable seccomp across PTRACE_ATTACHs.
      
      v3 changes:
      
      * get rid of various #ifdefs everywhere
      * report more sensible errors when PTRACE_O_SUSPEND_SECCOMP is incorrectly
        used
      
      v4 changes:
      
      * get rid of may_suspend_seccomp() in favor of a capable() check in ptrace
        directly
      
      v5 changes:
      
      * check that seccomp is not enabled (or suspended) on the tracer
      Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
      CC: Will Drewry <wad@chromium.org>
      CC: Roland McGrath <roland@hack.frob.com>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      [kees: access seccomp.mode through seccomp_mode() instead]
      Signed-off-by: Kees Cook <keescook@chromium.org>
  2. 17 Apr 2015: 2 commits
    • ptrace: ptrace_detach() can no longer race with SIGKILL · 64a4096c
      Oleg Nesterov committed
      ptrace_detach() re-checks ->ptrace under tasklist lock and calls
      release_task() if __ptrace_detach() returns true.  This was needed because
      the __TASK_TRACED tracee could be killed/untraced, and it could even pass
      exit_notify() before we take tasklist_lock.
      
      But this is no longer possible after 9899d11f "ptrace: ensure
      arch_ptrace/ptrace_request can never race with SIGKILL".  We can turn
      these checks into WARN_ON() and remove release_task().
      
      While at it, document the setting of child->exit_code.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Pavel Labath <labath@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: fix race between ptrace_resume() and wait_task_stopped() · b72c1869
      Oleg Nesterov committed
      ptrace_resume() is called when the tracee is still __TASK_TRACED.  We set
      tracee->exit_code and then wake_up_state() changes tracee->state.  If the
      tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
      wrongly looks like another report from tracee.
      
      This confuses the debugger, and since wait_task_stopped() clears ->exit_code,
      the tracee can miss a signal.
      
      Test-case:
      
      	#include <stdio.h>
      	#include <signal.h>
      	#include <unistd.h>
      	#include <sys/wait.h>
      	#include <sys/ptrace.h>
      	#include <pthread.h>
      	#include <assert.h>
      
      	int pid;
      
      	void *waiter(void *arg)
      	{
      		int stat;
      
      		for (;;) {
      			assert(pid == wait(&stat));
      			assert(WIFSTOPPED(stat));
      			if (WSTOPSIG(stat) == SIGHUP)
      				continue;
      
      			assert(WSTOPSIG(stat) == SIGCONT);
      			printf("ERR! extra/wrong report:%x\n", stat);
      		}
      	}
      
      	int main(void)
      	{
      		pthread_t thread;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			for (;;)
      				kill(getpid(), SIGHUP);
      		}
      
      		assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
      
      		for (;;)
      			ptrace(PTRACE_CONT, pid, 0, SIGCONT);
      
      		return 0;
      	}
      
      Note for stable: the bug is very old, but without 9899d11f "ptrace:
      ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
      should use lock_task_sighand(child).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Pavel Labath <labath@google.com>
      Tested-by: Pavel Labath <labath@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 18 Feb 2015: 1 commit
  4. 11 Dec 2014: 1 commit
  5. 16 Jul 2014: 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown committed
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15), two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 06 Mar 2014: 1 commit
  7. 13 Nov 2013: 1 commit
    • exec/ptrace: fix get_dumpable() incorrect tests · d049f74f
      Kees Cook committed
      The get_dumpable() return value is not boolean.  Most users of the
      function actually want to be testing for non-SUID_DUMP_USER(1) rather than
      SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
      protected state.  Almost all places did this correctly, excepting the two
      places fixed in this patch.
      
      Wrong logic:
          if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
              or
          if (dumpable == 0) { /* be protective */ }
              or
          if (!dumpable) { /* be protective */ }
      
      Correct logic:
          if (dumpable != SUID_DUMP_USER) { /* be protective */ }
              or
          if (dumpable != 1) { /* be protective */ }
      
      Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
      user was able to ptrace attach to processes that had dropped privileges to
      that user.  (This may have been partially mitigated if Yama was enabled.)
      
      The macros have been moved into the file that declares get/set_dumpable(),
      which means things like the ia64 code can see them too.
      
      CVE-2013-2929
      Reported-by: Vasily Kulikov <segoon@openwall.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 12 Sep 2013: 1 commit
  9. 07 Aug 2013: 1 commit
  10. 10 Jul 2013: 2 commits
    • ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child) · fab840fc
      Oleg Nesterov committed
      Change ptrace_detach() to call flush_ptrace_hw_breakpoint(child).  This
      frees the slots for non-ptrace PERF_TYPE_BREAKPOINT users, and this
      ensures that the tracee won't be killed by SIGTRAP triggered by the
      active breakpoints.
      
      Test-case:
      
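      	#include <assert.h>
      	#include <signal.h>
      	#include <stddef.h>
      	#include <sys/ptrace.h>
      	#include <sys/user.h>
      	#include <sys/wait.h>
      	#include <unistd.h>
      	#include <asm/debugreg.h>	/* x86-only: DR_* constants */
      
      	unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)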
      	{
      		unsigned long dr7;
      
      		dr7 = ((len | type) & 0xf)
      			<< (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
      		if (enable)
      			dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
      		return dr7;
      	}
      
      	int write_dr(int pid, int dr, unsigned long val)
      	{
      		return ptrace(PTRACE_POKEUSER, pid,
      				offsetof (struct user, u_debugreg[dr]),
      				val);
      	}
      
      	void func(void)
      	{
      	}
      
      	int main(void)
      	{
      		int pid, stat;
      		unsigned long dr7;
      
      		pid = fork();
      		if (!pid) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			kill(getpid(), SIGHUP);
      
      			func();
      			return 0x13;
      		}
      
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(WSTOPSIG(stat) == SIGHUP);
      
      		assert(write_dr(pid, 0, (long)func) == 0);
      		dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
      		assert(write_dr(pid, 7, dr7) == 0);
      
      		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
      		assert(pid == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Before this patch the child is killed after PTRACE_DETACH.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: revert "Prepare to fix racy accesses on task breakpoints" · 7c8df286
      Oleg Nesterov committed
      This reverts commit bf26c018 ("Prepare to fix racy accesses on task
      breakpoints").
      
      The patch was fine, but we can no longer race with SIGKILL after commit
      9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race
      with SIGKILL"): the __TASK_TRACED tracee can't be woken up and
      ->ptrace_bps[] can't go away.
      
      Now that ptrace_get_breakpoints/ptrace_put_breakpoints have no callers,
      we can kill them and remove task->ptrace_bp_refcnt.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 04 Jul 2013: 1 commit
    • ptrace: add ability to get/set signal-blocked mask · 29000cae
      Andrey Vagin committed
      crtools uses parasite code for dumping processes.  The parasite code is
      injected into a process with the help of PTRACE_SEIZE.
      
      Currently crtools blocks signals from the parasite code.  If a process has
      pending signals, crtools waits while the process handles these signals.
      
      This method is not suitable for stopped tasks.  A stopped task can have a
      few pending signals; when we try to execute the parasite code, we need to
      drop SIGSTOP, but all other signals must remain pending, because the
      state of the process must not be changed during checkpointing.
      
      This patch adds two ptrace commands, PTRACE_GETSIGMASK and
      PTRACE_SETSIGMASK, to get/set the signal-blocked mask.
      
      I think gdb can use these commands too.
      
      [akpm@linux-foundation.org: be consistent with brace layout]
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 30 Jun 2013: 1 commit
  13. 08 May 2013: 1 commit
  14. 01 May 2013: 1 commit
    • ptrace: add ability to retrieve signals without removing from a queue (v4) · 84c751bd
      Andrey Vagin committed
      This patch adds a new ptrace request PTRACE_PEEKSIGINFO.
      
      This request is used to retrieve information about pending signals
      starting with the specified sequence number.  The siginfo_t structures are
      copied from the child into the buffer starting at "data".
      
      The argument "addr" is a pointer to struct ptrace_peeksiginfo_args.
      struct ptrace_peeksiginfo_args {
      	u64 off;	/* from which siginfo to start */
      	u32 flags;
      	s32 nr;		/* how many siginfos to take */
      };
      
      "nr" has type "s32" because ptrace() returns "long", which has 32 bits on
      i386, and a negative value is used for errors.
      
      Currently there is only one flag, PTRACE_PEEKSIGINFO_SHARED, for dumping
      signals from the process-wide queue.  If this flag is not set, signals are
      read from a per-thread queue.
      
      The request PTRACE_PEEKSIGINFO returns the number of dumped signals.  If a
      signal with the specified sequence number doesn't exist, ptrace returns
      zero.  The request returns an error if no signal has been dumped.
      
      Errors:
      EINVAL - one or more specified flags are not supported or nr is negative
      EFAULT - buf or addr is outside your accessible address space.
      
      The resulting siginfo contains the kernel part of si_code, which is usually
      stripped, but it is required for queuing the same siginfo back during
      restore of pending signals.
      
      This functionality is required for checkpointing pending signals.  Pedro
      Alves suggested using it in "gdb" to peek at pending signals.  gdb already
      uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already
      dequeued.  This functionality allows gdb to look at the pending signals
      which were not reported yet.
      
      The prototype of this code was developed by Oleg Nesterov.
      Signed-off-by: Andrew Vagin <avagin@openvz.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Alves <palves@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 09 Feb 2013: 1 commit
    • uprobes: Add exports for module use · e8440c14
      Josh Stone committed
      The original pull message for uprobes (commit 654443e2) noted:
      
        This tree includes uprobes support in 'perf probe' - but SystemTap
        (and other tools) can take advantage of user probe points as well.
      
      In order to actually be usable in module-based tools like SystemTap, the
      interface needs to be exported.  This patch first adds the obvious
      exports for uprobe_register and uprobe_unregister.  Then it also adds
      one for task_user_regset_view, which is necessary to get the correct
      state of userspace registers.
      Signed-off-by: Josh Stone <jistone@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  16. 23 Jan 2013: 2 commits
    • ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov committed
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However, a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST; this means
      that the debugger can actually read/modify the kernel stack until the
      tracee does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too, and in some sense this
      race is even worse: the very fact that the tracee can be woken up breaks
      the logic.
      
      As Linus suggested, we can clear TASK_WAKEKILL around the arch_ptrace()
      call; this ensures that nobody can ever wake up the tracee while the
      debugger looks at it.  Not only does this fix the problems mentioned
      above, it also allows some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers; for example, it
      makes sense to make the tracee killable for the oom-killer before
      access_process_vm().
      
      While at it, add a comment to may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: Salman Qazi <sqazi@google.com>
      Reported-by: Suleiman Souhlal <suleiman@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: introduce signal_wake_up_state() and ptrace_signal_wake_up() · 910ffdb1
      Oleg Nesterov committed
      Cleanup and preparation for the next change.
      
      signal_wake_up(resume => true) is overused.  None of the ptrace/jctl
      callers actually want to wake up a TASK_WAKEKILL task, but they can't
      specify the necessary mask.
      
      Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
      signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
      which adds __TASK_TRACED.
      
      This way ptrace_signal_wake_up() can work "inside" ptrace_request()
      even if the tracee doesn't have the TASK_WAKEKILL bit set.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 21 Jan 2013: 1 commit
  18. 18 Dec 2012: 1 commit
    • ptrace: introduce PTRACE_O_EXITKILL · 992fb6e1
      Oleg Nesterov committed
      Ptrace jailers want to be sure that the tracee can never escape
      from their control.  However, if the tracer dies unexpectedly, the
      tracee continues to run in a potentially unsafe mode.
      
      Add the new ptrace option PTRACE_O_EXITKILL.  If the tracer exits,
      it sends SIGKILL to every tracee which has this bit set.
      
      Note that the new option is not equal to the last option << 1, because
      currently all options have an event, and the new one starts the eventless
      group.  It uses the arbitrarily chosen bit 20, so we have room for 12 more
      events, but we can also add new eventless options below this one.
      
      Suggested by Amnon Shiloh.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Chris Evans <scarybeasts@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 20 Nov 2012: 1 commit
  20. 03 Aug 2012: 1 commit
  21. 03 May 2012: 1 commit
  22. 08 Apr 2012: 1 commit
  23. 24 Mar 2012: 4 commits
  24. 06 Jan 2012: 2 commits
    • ptrace: do not audit capability check when outputing /proc/pid/stat · 69f594a3
      Eric Paris committed
      Reading /proc/pid/stat of another process checks if one has ptrace permissions
      on that process.  If one does have permissions it outputs some data about the
      process which might have security and attack implications.  If the current
      task does not have ptrace permissions the read still works, but those fields
      are filled with innocuous (0) values.  Since this check and a subsequent denial
      are not a violation of the security policy, we should not audit such denials.
      
      This can be quite useful when removing ptrace access broadly across a system,
      without flooding the logs whenever ps or anything else that harmlessly walks
      /proc is run.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
    • capabilities: remove task_ns_* functions · f1c84dae
      Eric Paris committed
      A task_ prefix on a function name, in the security subsystem anyway, means
      to me at least that we are operating with that task as the subject of the
      security decision.  In this case what it means is that we are using current as
      the subject but we use the task to get the right namespace.  Who in the world
      would ever realize that's what task_ns_capability means just by the name?  This
      patch eliminates the task_ns functions entirely and uses the has_ns_capability
      function instead.  This means we explicitly open code the ns in question in
      the caller.  I think it makes the caller a LOT more clear what is going on.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com>
  25. 05 Jan 2012: 1 commit
    • ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b
      Oleg Nesterov committed
      This is the temporary simple fix for 3.2, we need more changes in this
      area.
      
      1. do_signal_stop() assumes that the running untraced thread in the
         stopped thread group is not possible. This was our goal but it is
         not yet achieved: a stopped-but-resumed tracee can clone the running
         thread which can initiate another group-stop.
      
         Remove WARN_ON_ONCE(!current->ptrace).
      
      2. A new thread always starts with ->jobctl = 0. If it is auto-attached
         and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
         but the JOBCTL_STOP_SIGMASK part is zero; this triggers WARN_ON(!signr)
         in do_jobctl_trap() if another debugger attaches.
      
         Change __ptrace_unlink() to set the artificial SIGSTOP for report.
      
         Alternatively we could change ptrace_init_task() to copy signr from
         current, but this means we can copy it for no reason and hide the
         possible similar problems.
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.1]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 31 Oct 2011: 1 commit
  27. 26 Sep 2011: 1 commit
  28. 19 Jul 2011: 1 commit
    • connector: add an event for monitoring process tracers · f701e5b7
      Vladimir Zapolskiy committed
      This change adds a procfs connector event, which is emitted on every
      successful process tracer attach or detach.
      
      If a process attaches to another one, the kernel-space connector reports
      the process ID and thread group ID of both processes involved.  On
      detach, a null (zero) tracer process ID is reported.
      
      Such an event makes it possible to create a simple automated userspace
      mechanism that is aware of processes attaching to others, so that
      predefined process policies can be applied to them if needed.
      
      Note that a detach event is emitted only if a tracer process explicitly
      executes a PTRACE_DETACH request.  In other cases, such as tracee or
      tracer exit, the proc connector does not report a detach event.
      Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  29. 28 Jun 2011: 2 commits
  30. 17 Jun 2011: 3 commits
    • ptrace: implement PTRACE_LISTEN · 544b2c91
      Tejun Heo committed
      The previous patch implemented async notification for ptrace, but it
      only worked while the tracee is running.  This patch introduces
      PTRACE_LISTEN, as suggested by Oleg Nesterov.
      
      It's allowed iff the tracee is in a STOP trap, and it puts the tracee into
      a quasi-running state: the tracee never really runs, but wait(2) and
      ptrace(2) consider it to be running.  While the ptracer is listening,
      the tracee is allowed to re-enter STOP to notify an async event.
      The listening state is cleared on the first notification.  The ptracer
      can also clear it by issuing INTERRUPT; the tracee will then re-trap into
      STOP with the listening state cleared.
      
      This allows the ptracer to monitor group stop state without running the
      tracee: use INTERRUPT to put the tracee into a STOP trap, issue LISTEN, and
      then wait(2) for the next group stop event.  When it happens,
      PTRACE_GETSIGINFO provides information to determine the current state.
      
      Test program follows.
      
        #define PTRACE_SEIZE		0x4206
        #define PTRACE_INTERRUPT	0x4207
        #define PTRACE_LISTEN		0x4208
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee, tracer;
      	  int i;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  tracer = fork();
      	  if (!tracer) {
      		  siginfo_t si;
      
      		  ptrace(PTRACE_SEIZE, tracee, NULL,
      			 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      		  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  repeat:
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      
      		  ptrace(PTRACE_GETSIGINFO, tracee, NULL, &si);
      		  if (!si.si_code) {
      			  printf("tracer: SIG %d\n", si.si_signo);
      			  ptrace(PTRACE_CONT, tracee, NULL,
      				 (void *)(unsigned long)si.si_signo);
      			  goto repeat;
      		  }
      		  printf("tracer: stopped=%d signo=%d\n",
      			 si.si_signo != SIGTRAP, si.si_signo);
      		  if (si.si_signo != SIGTRAP)
      			  ptrace(PTRACE_LISTEN, tracee, NULL, NULL);
      		  else
      			  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      		  goto repeat;
      	  }
      
      	  for (i = 0; i < 3; i++) {
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGSTOP\n");
      		  kill(tracee, SIGSTOP);
      		  nanosleep(&ts1s, NULL);
      		  printf("mother: SIGCONT\n");
      		  kill(tracee, SIGCONT);
      	  }
      	  nanosleep(&ts1s, NULL);
      
      	  kill(tracer, SIGKILL);
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      This is identical to the program to test TRAP_NOTIFY except that
      tracee is PTRACE_LISTEN'd instead of PTRACE_CONT'd when group stopped.
      This allows ptracer to monitor when group stop ends without running
      tracee.
      
        # ./test-listen
        tracer: stopped=0 signo=5
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
        mother: SIGSTOP
        tracer: SIG 19
        tracer: stopped=1 signo=19
        mother: SIGCONT
        tracer: stopped=0 signo=5
        tracer: SIG 18
      
      -v2: Moved JOBCTL_LISTENING check in wait_task_stopped() into
           task_stopped_code() as suggested by Oleg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
    • ptrace: implement PTRACE_INTERRUPT · fca26f26
      Tejun Heo committed
      Currently, there's no way to trap a running ptracee short of sending a
      signal which has various side effects.  This patch implements
      PTRACE_INTERRUPT which traps ptracee without any signal or job control
      related side effect.
      
      The implementation is almost trivial.  It uses the group stop trap -
      SIGTRAP | PTRACE_EVENT_STOP << 8.  A new trap flag
      JOBCTL_TRAP_INTERRUPT is added, which is set on PTRACE_INTERRUPT and
      cleared when any trap happens.  As INTERRUPT should be usable
      regardless of the current state of the tracee, the task_is_traced()
      test in ptrace_check_attach() is skipped for INTERRUPT.
      
      PTRACE_INTERRUPT is available iff tracee is attached with
      PTRACE_SEIZE.
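
      To make the trap encoding concrete: exit_code carries signr in the low
      byte and PTRACE_EVENT_STOP in the byte above it, and waitpid() reports
      exit_code shifted left by eight bits with 0x7f in the low byte marking
      a stop.  A small sketch of that arithmetic, taking PTRACE_EVENT_STOP
      from <sys/ptrace.h> rather than hard-coding its value; the function
      names are hypothetical.  Other SIGTRAP event stops share this encoding,
      so the check cannot tell an INTERRUPT trap apart from them:

      ```c
        #include <signal.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>

        /* Hypothetical helpers for illustration, not kernel or libc API. */

        /* exit_code of a PTRACE_EVENT_STOP trap: signr | PTRACE_EVENT_STOP << 8. */
        int event_stop_exit_code(int signr)
        {
      	  return signr | (PTRACE_EVENT_STOP << 8);
        }

        /* Status as seen by waitpid(): exit_code << 8, 0x7f marks a stop. */
        int event_stop_wait_status(int signr)
        {
      	  return (event_stop_exit_code(signr) << 8) | 0x7f;
        }

        /* Nonzero for the SIGTRAP flavour of PTRACE_EVENT_STOP, which is the
         * trap PTRACE_INTERRUPT produces. */
        int is_sigtrap_event_stop(int status)
        {
      	  return WIFSTOPPED(status) &&
      		 (status >> 8) == (SIGTRAP | (PTRACE_EVENT_STOP << 8));
        }
      ```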
      
      Test program follows.
      
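        #include <signal.h>
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <sys/wait.h>

        #define PTRACE_SEIZE		0x4206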
        #define PTRACE_INTERRUPT	0x4207
      
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static const struct timespec ts1s = { .tv_sec = 1 };
        static const struct timespec ts3s = { .tv_sec = 3 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  while (1) {
      			  printf("tracee: alive pid=%d\n", getpid());
      			  nanosleep(&ts1s, NULL);
      		  }
      	  }
      
      	  if (argc > 1)
      		  kill(tracee, SIGSTOP);
      
      	  nanosleep(&ts100ms, NULL);
      
      	  ptrace(PTRACE_SEIZE, tracee, NULL,
      		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      	  if (argc > 1) {
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      	  }
      	  nanosleep(&ts3s, NULL);
      
      	  printf("tracer: INTERRUPT and DETACH\n");
      	  ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL);
      	  waitid(P_PID, tracee, NULL, WSTOPPED);
      	  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
      	  nanosleep(&ts3s, NULL);
      
      	  printf("tracer: exiting\n");
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      When called without argument, tracee is seized from running state,
      interrupted and then detached back to running state.
      
        # ./test-interrupt
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracer: INTERRUPT and DETACH
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracee: alive pid=4546
        tracer: exiting
      
      When called with argument, tracee is seized from stopped state,
      continued, interrupted and then detached back to stopped state.
      
        # ./test-interrupt  1
        tracee: alive pid=4548
        tracee: alive pid=4548
        tracee: alive pid=4548
        tracer: INTERRUPT and DETACH
        tracer: exiting
      
      Before PTRACE_INTERRUPT, once the tracee was running, there was no way
      to trap the tracee and do PTRACE_DETACH without causing side effects.
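
      The INTERRUPT-then-DETACH sequence from the test program can be
      packaged as a helper.  A minimal sketch, assuming the tracee was
      attached with PTRACE_SEIZE (PTRACE_INTERRUPT is not available
      otherwise) on a kernel where SEIZE no longer needs the temporary
      PTRACE_SEIZE_DEVEL flag; interrupt_and_detach() is a hypothetical name,
      and the sketch does not try to preserve a pending signal if the stop it
      collects turns out to be a signal-delivery-stop:

      ```c
        #include <errno.h>
        #include <stddef.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <sys/wait.h>

        /* Hypothetical helper: trap a seized tracee without sending it a
         * signal, then detach.  Returns 0 on success, -1 with errno set. */
        int interrupt_and_detach(pid_t tracee)
        {
      	  int status;

      	  /* Request a trap; no signal or job control side effect. */
      	  if (ptrace(PTRACE_INTERRUPT, tracee, NULL, NULL) == -1)
      		  return -1;

      	  /* Collect the resulting stop; __WALL also matches clone()d tasks. */
      	  if (waitpid(tracee, &status, __WALL) == -1)
      		  return -1;

      	  if (!WIFSTOPPED(status)) {
      		  /* The tracee exited or was killed instead of stopping. */
      		  errno = ESRCH;
      		  return -1;
      	  }

      	  /* Detach with data == NULL: no signal is injected on the way out,
      	   * so a signal-delivery-stop's signal would be dropped here. */
      	  if (ptrace(PTRACE_DETACH, tracee, NULL, NULL) == -1)
      		  return -1;
      	  return 0;
        }
      ```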
      
      -v2: Updated to use task_set_jobctl_pending() so that it doesn't end
           up scheduling TRAP_STOP if child is dying which may make the
           child unkillable.  Spotted by Oleg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
    • ptrace: implement PTRACE_SEIZE · 3544d72a
      Committed by Tejun Heo
      PTRACE_ATTACH implicitly issues SIGSTOP on attach which has side
      effects on tracee signal and job control states.  This patch
      implements a new ptrace request PTRACE_SEIZE which attaches a tracee
      without trapping it or affecting its signal and job control states.
      
      The usage is the same as PTRACE_ATTACH but it takes PTRACE_SEIZE_*
      flags in @data.  Currently, the only defined flag is
      PTRACE_SEIZE_DEVEL which is a temporary flag to enable PTRACE_SEIZE.
      PTRACE_SEIZE will change ptrace behaviors outside of attach itself.
      The changes will be implemented gradually and the DEVEL flag is to
      prevent programs which expect full SEIZE behavior from using it before
      all the behavior modifications are complete while allowing unit
      testing.  The flag will be removed once SEIZE behaviors are completely
      implemented.
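
      Once the SEIZE behaviors were complete and the DEVEL flag was removed
      as planned, @data took on a different role: in the finished API
      described in ptrace(2) it carries a mask of PTRACE_O_* options applied
      at attach time.  A minimal sketch of the call under that assumption;
      seize_with_options() is a hypothetical wrapper name:

      ```c
        #include <stddef.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>

        /* Hypothetical wrapper: attach to @tracee without stopping it or
         * touching its signal and job control state.  @options is a mask of
         * PTRACE_O_* flags (finished API, no PTRACE_SEIZE_DEVEL bit). */
        long seize_with_options(pid_t tracee, unsigned long options)
        {
      	  return ptrace(PTRACE_SEIZE, tracee, NULL, (void *)options);
        }
      ```

      For example, seize_with_options(pid, PTRACE_O_TRACESYSGOOD) attaches
      without trapping pid and sets the option in the same call.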
      
      * PTRACE_SEIZE, unlike ATTACH, doesn't force the tracee to trap.  After
        attaching, the tracee continues to run unless a trap condition occurs.
      
      * PTRACE_SEIZE doesn't affect signal or group stop state.
      
      * If PTRACE_SEIZE'd, group stop uses PTRACE_EVENT_STOP trap which uses
        exit_code of (signr | PTRACE_EVENT_STOP << 8) where signr is one of
        the stopping signals if group stop is in effect or SIGTRAP
        otherwise, and returns usual trap siginfo on PTRACE_GETSIGINFO
        instead of NULL.
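
      The exit_code rule in the last item can be decoded on the tracer side
      roughly as follows.  A sketch, assuming the status comes from waitpid()
      on a seized tracee and that PTRACE_EVENT_STOP is taken from
      <sys/ptrace.h>; event_stop_group_signal() and its return convention are
      made up for illustration:

      ```c
        #include <signal.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>

        /* Hypothetical decoder for a wait status from a seized tracee:
         *   > 0  PTRACE_EVENT_STOP while group stop is in effect; the value
         *        is the stopping signal
         *     0  PTRACE_EVENT_STOP outside group stop (signr is SIGTRAP)
         *    -1  not a PTRACE_EVENT_STOP trap at all */
        int event_stop_group_signal(int status)
        {
      	  if (!WIFSTOPPED(status) || (status >> 16) != PTRACE_EVENT_STOP)
      		  return -1;
      	  if (WSTOPSIG(status) == SIGTRAP)
      		  return 0;
      	  return WSTOPSIG(status);
        }
      ```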
      
      Seizing sets PT_SEIZED in ->ptrace of the tracee.  This flag will be
      used to determine whether new SEIZE behaviors should be enabled.
      
      Test program follows.
      
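        #include <signal.h>
        #include <stdio.h>
        #include <time.h>
        #include <unistd.h>
        #include <sys/ptrace.h>
        #include <sys/types.h>
        #include <sys/wait.h>

        #define PTRACE_SEIZE		0x4206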
        #define PTRACE_SEIZE_DEVEL	0x80000000
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static const struct timespec ts1s = { .tv_sec = 1 };
        static const struct timespec ts3s = { .tv_sec = 3 };
      
        int main(int argc, char **argv)
        {
      	  pid_t tracee;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  while (1) {
      			  printf("tracee: alive\n");
      			  nanosleep(&ts1s, NULL);
      		  }
      	  }
      
      	  if (argc > 1)
      		  kill(tracee, SIGSTOP);
      
      	  nanosleep(&ts100ms, NULL);
      
      	  ptrace(PTRACE_SEIZE, tracee, NULL,
      		 (void *)(unsigned long)PTRACE_SEIZE_DEVEL);
      	  if (argc > 1) {
      		  waitid(P_PID, tracee, NULL, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, NULL);
      	  }
      	  nanosleep(&ts3s, NULL);
      	  printf("tracer: exiting\n");
      	  return 0;
        }
      
      When the above program is called without an argument, tracee is seized
      while running and remains running.  When tracer exits, tracee continues
      to run and print out messages.
      
        # ./test-seize-simple
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        tracee: alive
        tracee: alive
      
      When called with an argument, tracee is seized from stopped state and
      continued, and returns to stopped state when tracer exits.
      
        # ./test-seize
        tracee: alive
        tracee: alive
        tracee: alive
        tracer: exiting
        # ps -el|grep test-seize
        1 T     0  4720     1  0  80   0 -   941 signal ttyS0    00:00:00 test-seize
      
      -v2: SEIZE doesn't schedule TRAP_STOP and leaves tracee running as Jan
           suggested.
      
      -v3: PTRACE_EVENT_STOP traps now report group stop state by signr.  If
           group stop is in effect the stop signal number is returned as
           part of exit_code; otherwise, SIGTRAP.  This was suggested by
           Denys and Oleg.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>