1. 23 3月, 2011 9 次提交
    • T
      ptrace: Clean transitions between TASK_STOPPED and TRACED · d79fdd6d
      Tejun Heo 提交于
      Currently, if the task is STOPPED on ptrace attach, it's left alone
      and the state is silently changed to TRACED on the next ptrace call.
      The behavior breaks the assumption that arch_ptrace_stop() is called
      before any task is poked by ptrace and is ugly in that a task
      manipulates the state of another task directly.
      
      With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
      TRACED can be made clean.  The tracer can use the flag to tell the
      tracee to retry stop on attach and detach.  On retry, the tracee will
      enter the desired state in the correct way.  The lower 16bits of
      task->group_stop is used to remember the signal number which caused
      the last group stop.  This is used while retrying for ptrace attach as
      the original group_exit_code could have been consumed with wait(2) by
      then.
      
      As the real parent may wait(2) and consume the group_exit_code
      anytime, the group_exit_code needs to be saved separately so that it
      can be used when switching from regular sleep to ptrace_stop().  This
      is recorded in the lower 16bits of task->group_stop.
      
      If a task is already stopped and there's no intervening SIGCONT, a
      ptrace request immediately following a successful PTRACE_ATTACH should
      always succeed even if the tracer doesn't wait(2) for attach
      completion; however, with this change, the tracee might still be
      TASK_RUNNING trying to enter TASK_TRACED which would cause the
      following request to fail with -ESRCH.
      
      This intermediate state is hidden from the ptracer by setting
      GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
      for it to clear on its signal->wait_chldexit.  Completing the
      transition or getting killed clears TRAPPING and wakes up the tracer.
      
      Note that the STOPPED -> RUNNING -> TRACED transition is still visible
      to other threads which are in the same group as the ptracer and the
      reverse transition is visible to all.  Please read the comments for
      details.
      
      Oleg:
      
      * Spotted a race condition where a task may retry group stop without
        proper bookkeeping.  Fixed by redoing bookkeeping on retry.
      
      * Spotted that the transition is visible to userland in several
        different ways.  Most are fixed with GROUP_STOP_TRAPPING.  Unhandled
        corner case is documented.
      
      * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
        task would result in more consistent behavior.
      
      * Pointed out that calling ptrace_stop() from do_signal_stop() in
        TASK_STOPPED can race with group stop start logic and then confuse
        the TRAPPING wait in ptrace_check_attach().  ptrace_stop() is now
        called with TASK_RUNNING.
      
      * Suggested using signal->wait_chldexit instead of bit wait.
      
      * Spotted a race condition between TRACED transition and clearing of
        TRAPPING.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      d79fdd6d
    • T
      ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced · 5224fa36
      Tejun Heo 提交于
      A ptraced task would still stop at do_signal_stop() when it's stopping
      for stop signals and do_signal_stop() behaves the same whether the
      task is ptraced or not.  However, in addition to stopping,
      ptrace_stop() also does ptrace specific stuff like calling
      architecture specific callbacks, so this behavior makes the code more
      fragile and difficult to understand.
      
      This patch makes do_signal_stop() test whether the task is ptraced and
      use ptrace_stop() if so.  This renders tracehook_notify_jctl() rather
      pointless as the ptrace notification is now handled by ptrace_stop()
      regardless of the return value from the tracehook.  It probably is a
      good idea to update it.
      
      This doesn't solve the whole problem as tasks already in stopped state
      would stay in the regular stop when ptrace attached.  That part will
      be handled by the next patch.
      
      Oleg pointed out that this makes a userland-visible change.  Before,
      SIGCONT would be able to wake up a task in group stop even if the task
      is ptraced if the tracer hasn't issued another ptrace command
      afterwards (as the next ptrace commands transitions the state into
      TASK_TRACED which ignores SIGCONT wakeups).  With this and the next
      patch, SIGCONT may race with the transition into TASK_TRACED and is
      ignored if the tracee already entered TASK_TRACED.
      
      Another userland visible change of this and the next patch is that the
      ptracee's state would now be TASK_TRACED where it used to be
      TASK_STOPPED, which is visible via fs/proc.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      5224fa36
    • T
      ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop · 0ae8ce1c
      Tejun Heo 提交于
      Currently, ptrace_stop() unconditionally participates in group stop
      bookkeeping.  This is unnecessary and inaccurate.  Make it only
      participate if the task is trapping for group stop - ie. if @why is
      CLD_STOPPED.  As ptrace_stop() currently is not used when trapping for
      group stop, this equals to disabling group stop participation from
      ptrace_stop().
      
      A visible behavior change is increased likelihood of delayed group
      stop completion if the thread group contains one or more ptraced
      tasks.
      
      This is to preapre for further cleanup of the interaction between
      group stop and ptrace.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      0ae8ce1c
    • T
      signal: Use GROUP_STOP_PENDING to stop once for a single group stop · 39efa3ef
      Tejun Heo 提交于
      Currently task->signal->group_stop_count is used to decide whether to
      stop for group stop.  However, if there is a task in the group which
      is taking a long time to stop, other tasks which are continued by
      ptrace would repeatedly stop for the same group stop until the group
      stop is complete.
      
      Conversely, if a ptraced task is in TASK_TRACED state, the debugger
      won't get notified of group stops which is inconsistent compared to
      the ptraced task in any other state.
      
      This patch introduces GROUP_STOP_PENDING which tracks whether a task
      is yet to stop for the group stop in progress.  The flag is set when a
      group stop starts and cleared when the task stops the first time for
      the group stop, and consulted whenever whether the task should
      participate in a group stop needs to be determined.  Note that now
      tasks in TASK_TRACED also participate in group stop.
      
      This results in the following behavior changes.
      
      * For a single group stop, a ptracer would see at most one stop
        reported.
      
      * A ptracee in TASK_TRACED now also participates in group stop and the
        tracer would get the notification.  However, as a ptraced task could
        be in TASK_STOPPED state or any ptrace trap could consume group
        stop, the notification may still be missing.  These will be
        addressed with further patches.
      
      * A ptracee may start a group stop while one is still in progress if
        the tracer let it continue with stop signal delivery.  Group stop
        code handles this correctly.
      
      Oleg:
      
      * Spotted that a task might skip signal check even when its
        GROUP_STOP_PENDING is set.  Fixed by updating
        recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
        group_stop_count.
      
      * Pointed out that task->group_stop should be cleared whenever
        task->signal->group_stop_count is cleared.  Fixed accordingly.
      
      * Pointed out the behavior inconsistency between TASK_TRACED and
        RUNNING and the last behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      39efa3ef
    • T
      signal: Fix premature completion of group stop when interfered by ptrace · e5c1902e
      Tejun Heo 提交于
      task->signal->group_stop_count is used to track the progress of group
      stop.  It's initialized to the number of tasks which need to stop for
      group stop to finish and each stopping or trapping task decrements.
      However, each task doesn't keep track of whether it decremented the
      counter or not and if woken up before the group stop is complete and
      stops again, it can decrement the counter multiple times.
      
      Please consider the following example code.
      
       static void *worker(void *arg)
       {
      	 while (1) ;
      	 return NULL;
       }
      
       int main(void)
       {
      	 pthread_t thread;
      	 pid_t pid;
      	 int i;
      
      	 pid = fork();
      	 if (!pid) {
      		 for (i = 0; i < 5; i++)
      			 pthread_create(&thread, NULL, worker, NULL);
      		 while (1) ;
      		 return 0;
      	 }
      
      	 ptrace(PTRACE_ATTACH, pid, NULL, NULL);
      	 while (1) {
      		 waitid(P_PID, pid, NULL, WSTOPPED);
      		 ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
      	 }
      	 return 0;
       }
      
      The child creates five threads and the parent continuously traps the
      first thread and whenever the child gets a signal, SIGSTOP is
      delivered.  If an external process sends SIGSTOP to the child, all
      other threads in the process should reliably stop.  However, due to
      the above bug, the first thread will often end up consuming
      group_stop_count multiple times and SIGSTOP often ends up stopping
      none or part of the other four threads.
      
      This patch adds a new field task->group_stop which is protected by
      siglock and uses GROUP_STOP_CONSUME flag to track which task is still
      to consume group_stop_count to fix this bug.
      
      task_clear_group_stop_pending() and task_participate_group_stop() are
      added to help manipulating group stop states.  As ptrace_stop() now
      also uses task_participate_group_stop(), it will set
      SIGNAL_STOP_STOPPED if it completes a group stop.
      
      There still are many issues regarding the interaction between group
      stop and ptrace.  Patches to address them will follow.
      
      - Oleg spotted duplicate GROUP_STOP_CONSUME.  Dropped.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      e5c1902e
    • T
      ptrace: Add @why to ptrace_stop() · fe1bc6a0
      Tejun Heo 提交于
      To prepare for cleanup of the interaction between group stop and
      ptrace, add @why to ptrace_stop().  Existing users are updated such
      that there is no behavior change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NRoland McGrath <roland@redhat.com>
      fe1bc6a0
    • T
      ptrace: Kill tracehook_notify_jctl() · edf2ed15
      Tejun Heo 提交于
      tracehook_notify_jctl() aids in determining whether and what to report
      to the parent when a task is stopped or continued.  The function also
      adds an extra requirement that siglock may be released across it,
      which is currently unused and quite difficult to satisfy in
      well-defined manner.
      
      As job control and the notifications are about to receive major
      overhaul, remove the tracehook and open code it.  If ever necessary,
      let's factor it out after the overhaul.
      
      * Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
        Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      edf2ed15
    • T
      signal: Remove superflous try_to_freeze() loop in do_signal_stop() · 71db5eb9
      Tejun Heo 提交于
      do_signal_stop() is used only by get_signal_to_deliver() and after a
      successful signal stop, it always calls try_to_freeze(), so the
      try_to_freeze() loop around schedule() in do_signal_stop() is
      superflous and confusing.  Remove it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      71db5eb9
    • T
      signal: Fix SIGCONT notification code · c672af35
      Tejun Heo 提交于
      After a task receives SIGCONT, its parent is notified via SIGCHLD with
      its siginfo describing what the notified event is.  If SIGCONT is
      received while the child process is stopped, the code should be
      CLD_CONTINUED.  If SIGCONT is recieved while the child process is in
      the process of being stopped, it should be CLD_STOPPED.  Which code to
      use is determined in prepare_signal() and recorded in signal->flags
      using SIGNAL_CLD_CONTINUED|STOP flags.
      
      get_signal_deliver() should test these flags and then notify
      accoringly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
      instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
      CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
      and CLD_STOPPED if the state was fetched already.
      
      Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
      ?: test into if/else clause for better readability.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      c672af35
  2. 22 3月, 2011 1 次提交
    • J
      Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code · da48524e
      Julien Tinnes 提交于
      Userland should be able to trust the pid and uid of the sender of a
      signal if the si_code is SI_TKILL.
      
      Unfortunately, the kernel has historically allowed sigqueueinfo() to
      send any si_code at all (as long as it was negative - to distinguish it
      from kernel-generated signals like SIGILL etc), so it could spoof a
      SI_TKILL with incorrect siginfo values.
      
      Happily, it looks like glibc has always set si_code to the appropriate
      SI_QUEUE, so there are probably no actual user code that ever uses
      anything but the appropriate SI_QUEUE flag.
      
      So just tighten the check for si_code (we used to allow any negative
      value), and add a (one-time) warning in case there are binaries out
      there that might depend on using other si_code values.
      Signed-off-by: NJulien Tinnes <jln@google.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da48524e
  3. 28 10月, 2010 2 次提交
  4. 07 10月, 2010 1 次提交
    • A
      HWPOISON: Copy si_addr_lsb to user · a337fdac
      Andi Kleen 提交于
      The original hwpoison code added a new siginfo field si_addr_lsb to
      pass the granuality of the fault address to user space. Unfortunately
      this field was never copied to user space. Fix this here.
      
      I added explicit checks for the MCEERR codes to avoid having
      to patch all potential callers to initialize the field.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      a337fdac
  5. 05 8月, 2010 1 次提交
  6. 28 5月, 2010 2 次提交
  7. 21 5月, 2010 1 次提交
  8. 07 3月, 2010 1 次提交
  9. 04 3月, 2010 1 次提交
    • L
      Prioritize synchronous signals over 'normal' signals · a27341cd
      Linus Torvalds 提交于
      This makes sure that we pick the synchronous signals caused by a
      processor fault over any pending regular asynchronous signals sent to
      use by [t]kill().
      
      This is not strictly required semantics, but it makes it _much_ easier
      for programs like Wine that expect to find the fault information in the
      signal stack.
      
      Without this, if a non-synchronous signal gets picked first, the delayed
      asynchronous signal will have its signal context pointing to the new
      signal invocation, rather than the instruction that caused the SIGSEGV
      or SIGBUS in the first place.
      
      This is not all that pretty, and we're discussing making the synchronous
      signals more explicit rather than have these kinds of implicit
      preferences of SIGSEGV and friends.  See for example
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=15395
      
      for some of the discussion.  But in the meantime this is a simple and
      fairly straightforward work-around, and the whole
      
      	if (x & Y)
      		x &= Y;
      
      thing can be compiled into (and gcc does do it) just three instructions:
      
      	movq    %rdx, %rax
      	andl    $Y, %eax
      	cmovne  %rax, %rdx
      
      so it is at least a simple solution to a subtle issue.
      Reported-and-tested-by: NPavel Vilim <wylda@volny.cz>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a27341cd
  10. 12 1月, 2010 1 次提交
    • A
      kernel/signal.c: fix kernel information leak with print-fatal-signals=1 · b45c6e76
      Andi Kleen 提交于
      When print-fatal-signals is enabled it's possible to dump any memory
      reachable by the kernel to the log by simply jumping to that address from
      user space.
      
      Or crash the system if there's some hardware with read side effects.
      
      The fatal signals handler will dump 16 bytes at the execution address,
      which is fully controlled by ring 3.
      
      In addition when something jumps to a unmapped address there will be up to
      16 additional useless page faults, which might be potentially slow (and at
      least is not very efficient)
      
      Fortunately this option is off by default and only there on i386.
      
      But fix it by checking for kernel addresses and also stopping when there's
      a page fault.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b45c6e76
  11. 16 12月, 2009 5 次提交
  12. 11 12月, 2009 2 次提交
  13. 26 11月, 2009 3 次提交
  14. 09 11月, 2009 1 次提交
    • N
      signal: Print warning message when dropping signals · f84d49b2
      Naohiro Ooiwa 提交于
      When the system has too many timers or too many aggregate
      queued signals, the EAGAIN error is returned to application
      from kernel, including timer_create() [POSIX.1b].
      
      It means that the app exceeded the limit of pending signals,
      but in general application writers do not expect this
      outcome and the current silent failure can cause rare app
      failures under very high load.
      
      This patch adds a new message when we reach the limit
      and if print_fatal_signals is enabled:
      
          task/1234: reached RLIMIT_SIGPENDING, dropping signal
      
      If you see this message and your system behaved unexpectedly,
      you can run following command to lift the limit:
      
         # ulimit -i unlimited
      
      With help from Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>.
      Signed-off-by: NNaohiro Ooiwa <nooiwa@miraclelinux.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: oleg@redhat.com
      LKML-Reference: <4AF6E7E2.9080406@miraclelinux.com>
      [ Modified a few small details, gave surrounding code some love. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f84d49b2
  15. 24 9月, 2009 4 次提交
  16. 02 8月, 2009 2 次提交
    • L
      do_sigaltstack: small cleanups · 0dd8486b
      Linus Torvalds 提交于
      The previous commit ("do_sigaltstack: avoid copying 'stack_t' as a
      structure to user space") fixed a real bug.  This one just cleans up the
      copy from user space to that gcc can generate better code for it (and so
      that it looks the same as the later copy back to user space).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0dd8486b
    • L
      do_sigaltstack: avoid copying 'stack_t' as a structure to user space · 0083fc2c
      Linus Torvalds 提交于
      Ulrich Drepper correctly points out that there is generally padding in
      the structure on 64-bit hosts, and that copying the structure from
      kernel to user space can leak information from the kernel stack in those
      padding bytes.
      
      Avoid the whole issue by just copying the three members one by one
      instead, which also means that the function also can avoid the need for
      a stack frame.  This also happens to match how we copy the new structure
      from user space, so it all even makes sense.
      
      [ The obvious solution of adding a memset() generates horrid code, gcc
        does really stupid things. ]
      Reported-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0083fc2c
  17. 19 6月, 2009 2 次提交
  18. 15 6月, 2009 1 次提交
    • V
      signal: fix __send_signal() false positive kmemcheck warning · 7a0aeb14
      Vegard Nossum 提交于
      This false positive is due to field padding in struct sigqueue. When
      this dynamically allocated structure is copied to the stack (in arch-
      specific delivery code), kmemcheck sees a read from the padding, which
      is, naturally, uninitialized.
      
      Hide the false positive using the __GFP_NOTRACK_FALSE_POSITIVE flag.
      Also made the rlimit override code a bit clearer by introducing a new
      variable.
      
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      7a0aeb14