1. 04 Apr, 2011 3 commits
    • signal: Turn SIGNAL_STOP_DEQUEUED into GROUP_STOP_DEQUEUED · ee77f075
      Oleg Nesterov committed
      This patch moves SIGNAL_STOP_DEQUEUED from signal_struct->flags to
      task_struct->group_stop, and thus makes it per-thread.
      
      Like SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can be false-positive
      after return from get_signal_to_deliver(); this is fine. The only
      purpose of this bit is: we can drop ->siglock after __dequeue_signal()
      returns the sig_kernel_stop() signal and before we call
      do_signal_stop(), in this case we must not miss SIGCONT if it comes in
      between.
      
      But, unlike SIGNAL_STOP_DEQUEUED, GROUP_STOP_DEQUEUED can not be
      false-positive in do_signal_stop() if multiple threads dequeue the
      sig_kernel_stop() signal at the same time.
      
      Consider two threads T1 and T2, where SIGTTIN has a handler.
      
      	- T1 dequeues SIGTSTP and sets SIGNAL_STOP_DEQUEUED, then
      	  it drops ->siglock
      
      	- SIGCONT comes and clears SIGNAL_STOP_DEQUEUED, SIGTSTP
      	  should be cancelled.
      
      	- T2 dequeues SIGTTIN and sets SIGNAL_STOP_DEQUEUED again.
      	  Since we have a handler we should not stop, T2 returns
      	  to usermode to run the handler.
      
      	- T1 continues, calls do_signal_stop() and wrongly starts
      	  the group stop because SIGNAL_STOP_DEQUEUED was restored
      	  in between.
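      
      The four steps above can be replayed in a small userspace model.
      This is only a sketch: struct model, dequeue_stop_signal(),
      sigcont() and t1_starts_group_stop() are hypothetical names, not
      kernel code, and locking is reduced to a fixed step order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum { T1 = 0, T2 = 1, NR_THREADS = 2 };

struct model {
	/* models SIGNAL_STOP_DEQUEUED in signal_struct->flags (shared) */
	bool shared_dequeued;
	/* models GROUP_STOP_DEQUEUED in task_struct->group_stop (per-thread) */
	bool thread_dequeued[NR_THREADS];
};

/* A thread dequeues a stop signal under ->siglock and marks it. */
static void dequeue_stop_signal(struct model *m, int thread)
{
	assert(thread >= 0 && thread < NR_THREADS);
	m->shared_dequeued = true;
	m->thread_dequeued[thread] = true;
}

/* SIGCONT cancels every stop signal dequeued so far. */
static void sigcont(struct model *m)
{
	m->shared_dequeued = false;
	for (int i = 0; i < NR_THREADS; i++)
		m->thread_dequeued[i] = false;
}

/* Replays the scenario; returns whether T1 would start the group stop. */
static bool t1_starts_group_stop(bool per_thread)
{
	struct model m = { 0 };

	dequeue_stop_signal(&m, T1);	/* T1 dequeues SIGTSTP, drops ->siglock */
	sigcont(&m);			/* SIGCONT cancels the SIGTSTP */
	dequeue_stop_signal(&m, T2);	/* T2 dequeues SIGTTIN, runs its handler */

	/* T1 resumes and checks the flag before stopping */
	return per_thread ? m.thread_dequeued[T1] : m.shared_dequeued;
}
```

      In this model t1_starts_group_stop(false) reports the wrong stop
      through the shared flag, while t1_starts_group_stop(true) does
      not, because T2 only sets its own bit.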
      
      With or without this change:
      
      	- we need to do something with ptrace_signal() which can
      	  return SIGSTOP, but this needs another discussion
      
      	- SIGSTOP can be lost if it races with the mt exec, will
      	  be fixed later.

      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • signal: do_signal_stop: Remove the unneeded task_clear_group_stop_pending() · 780006ea
      Oleg Nesterov committed
      PF_EXITING or TASK_STOPPED has already called task_participate_group_stop()
      and cleared its ->group_stop. No need to do task_clear_group_stop_pending()
      when we start the new group stop.
      
      Add a small comment to explain the !task_is_stopped() check. Note that this
      check is not exactly right and it can lead to unnecessary stop later if the
      thread is TASK_PTRACED. What we need is task_participated_in_group_stop(),
      this will be solved later.

      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • signal: prepare_signal(SIGCONT) shouldn't play with TIF_SIGPENDING · 1deac632
      Oleg Nesterov committed
      prepare_signal(SIGCONT) should never set TIF_SIGPENDING or wake up
      the TASK_INTERRUPTIBLE threads. We are going to call complete_signal()
      which should pick the right thread correctly. All we need is to wake
      up the TASK_STOPPED threads.
      
      If the task was stopped, it can't return to usermode without taking
      ->siglock. Otherwise we don't care, and the spurious TIF_SIGPENDING
      can't be useful.
      
      The comment says:
      
      	* If there is a handler for SIGCONT, we must make
      	* sure that no thread returns to user mode before
      	* we post the signal
      
      It is not clear what this means. Probably, "when there's only a single
      thread" and this continues to be true. Otherwise, even if this SIGCONT
      is not private, with or without this change only one thread can dequeue
      SIGCONT, other threads can happily return to user mode before
      that thread handles this signal.
      
      Note also that wake_up_state(t, __TASK_STOPPED) can't race with the task
      which changes its state, TASK_STOPPED state is protected by ->siglock as
      well.
      
      In short: when it comes to signal delivery, SIGCONT is the normal signal
      and does not need any special support.

      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 23 Mar, 2011 35 commits
    • job control: Don't send duplicate job control stop notification while ptraced · 244056f9
      Tejun Heo committed
      Just as group_exit_code shouldn't be generated when a PTRACE_CONT'd
      task re-enters job control stop, notification for the event should be
      suppressed too.  The logic is the same as the group_exit_code
      generation suppression in do_signal_stop(): if SIGNAL_STOP_STOPPED is
      already set, the task is re-entering job control stop without
      intervening SIGCONT and the notifications should be suppressed.
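      
      The suppression rule can be pictured with a small model.  This is
      a sketch under assumptions: struct group_model and both functions
      are hypothetical, and in the kernel the exit code and the
      notification are handled in separate places.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-process job control state. */
struct group_model {
	bool stop_stopped;	/* models SIGNAL_STOP_STOPPED */
	int group_exit_code;	/* what the real parent reads via wait(2) */
	int notifications;	/* stop notifications sent so far */
};

/* Called when a group stop completes for stop signal @signr. */
static void complete_group_stop(struct group_model *g, int signr)
{
	assert(signr > 0);

	/* Re-entry without an intervening SIGCONT: nothing new to report. */
	if (g->stop_stopped)
		return;

	g->stop_stopped = true;
	g->group_exit_code = signr;
	g->notifications++;
}

/* SIGCONT ends the job control stop and re-arms reporting. */
static void deliver_sigcont(struct group_model *g)
{
	g->stop_stopped = false;
	g->group_exit_code = 0;
}
```

      In this model a second complete_group_stop() call without a
      deliver_sigcont() in between leaves both the exit code and the
      notification count untouched.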
      
      Test case follows.
      
        #include <stdio.h>
        #include <unistd.h>
        #include <signal.h>
        #include <time.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        static const struct timespec ts100ms = { .tv_nsec = 100000000 };
        static pid_t tracee, tracer;
      
        static const char *pid_who(pid_t pid)
        {
      	  return pid == tracee ? "tracee" : (pid == tracer ? "tracer" : "mommy ");
        }
      
        static void sigchld_sigaction(int signo, siginfo_t *si, void *ucxt)
        {
      	  printf("%s: SIG status=%02d code=%02d (%s)\n",
      		 pid_who(getpid()), si->si_status, si->si_code,
      		 pid_who(si->si_pid));
        }
      
        int main(void)
        {
      	  const struct sigaction chld_sa = { .sa_sigaction = sigchld_sigaction,
      					     .sa_flags = SA_SIGINFO|SA_RESTART };
      	  siginfo_t si;
      
      	  sigaction(SIGCHLD, &chld_sa, NULL);
      
      	  tracee = fork();
      	  if (!tracee) {
      		  tracee = getpid();
      		  while (1)
      			  pause();
      	  }
      
      	  kill(tracee, SIGSTOP);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      
      	  tracer = fork();
      	  if (!tracer) {
      		  tracer = getpid();
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  printf("tracer: detaching\n");
      		  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
      		  return 0;
      	  }
      
      	  while (1)
      		  pause();
      	  return 0;
        }
      
      Before the patch, the parent gets the second notification for the
      tracee after the tracer detaches.  si_status is zero because
      group_exit_code is not set by the group stop completion which
      triggered this notification.
      
        mommy : SIG status=19 code=05 (tracee)
        tracer: SIG status=00 code=05 (tracee)
        tracer: SIG status=19 code=04 (tracee)
        tracer: SIG status=00 code=05 (tracee)
        tracer: detaching
        mommy : SIG status=00 code=05 (tracee)
        mommy : SIG status=00 code=01 (tracer)
        ^C
      
      After the patch, the duplicate notification is gone.
      
        mommy : SIG status=19 code=05 (tracee)
        tracer: SIG status=00 code=05 (tracee)
        tracer: SIG status=19 code=04 (tracee)
        tracer: SIG status=00 code=05 (tracee)
        tracer: detaching
        mommy : SIG status=00 code=01 (tracer)
        ^C

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Notify the real parent of job control events regardless of ptrace · ceb6bd67
      Tejun Heo committed
      With recent changes, job control and ptrace stopped states are
      properly separated and accessible to the real parent and the ptracer
      respectively; however, notifications of job control stopped/continued
      events to the real parent while ptraced are still missing.
      
      A ptracee participates in group stop in ptrace_stop() but the
      completion isn't notified.  If participation results in completion of
      group stop, notify the real parent of the event.  The ptrace and group
      stops are separate and can be handled as such.
      
      However, when the real parent and the ptracer are in the same thread
      group, only the ptrace stop event is visible through wait(2) and the
      duplicate notifications are different from the current behavior and
      are confusing.  Suppress group stop notification in such cases.
      
      The continued state is shared between the real parent and the ptracer
      but is only meaningful to the real parent.  Always notify the real
      parent and notify the ptracer too for backward compatibility.  Similar
      to stop notification, if the real parent is the ptracer, suppress a
      duplicate notification.
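      
      The routing rules described above can be summarized in a short
      model.  This is a sketch only: struct child_model, the NOTIFY_*
      bits and the *_recipients() helpers are hypothetical, and thread
      group identity is reduced to comparing two integer IDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical recipient bits. */
enum { NOTIFY_REAL_PARENT = 1 << 0, NOTIFY_PTRACER = 1 << 1 };

/* Hypothetical view of a child: thread group IDs of its two "parents". */
struct child_model {
	int real_parent_tgid;
	int ptracer_tgid;	/* 0 when the child is not ptraced */
};

/* True when the ptracer lives in the real parent's thread group. */
static bool real_parent_is_ptracer_model(const struct child_model *c)
{
	return c->ptracer_tgid != 0 &&
	       c->ptracer_tgid == c->real_parent_tgid;
}

/* Who is told that the group stop completed. */
static int group_stop_recipients(const struct child_model *c)
{
	assert(c->real_parent_tgid > 0);

	/* Same group: only the ptrace stop is reported, so suppress. */
	return real_parent_is_ptracer_model(c) ? 0 : NOTIFY_REAL_PARENT;
}

/* Who is told that the child continued. */
static int continued_recipients(const struct child_model *c)
{
	int who = NOTIFY_REAL_PARENT;	/* always notified */

	assert(c->real_parent_tgid > 0);

	/* The ptracer is told too, unless that would be a duplicate. */
	if (c->ptracer_tgid != 0 && !real_parent_is_ptracer_model(c))
		who |= NOTIFY_PTRACER;
	return who;
}
```

      The ptrace stop notification itself is not modeled here; it goes
      to the ptracer independently of these two helpers.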
      
      Test case follows.
      
        #include <stdio.h>
        #include <unistd.h>
        #include <signal.h>
        #include <time.h>
        #include <errno.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        int main(void)
        {
      	  const struct timespec ts100ms = { .tv_nsec = 100000000 };
      	  pid_t tracee, tracer;
      	  siginfo_t si;
      	  int i;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  while (1) {
      			  printf("tracee: SIGSTOP\n");
      			  raise(SIGSTOP);
      			  nanosleep(&ts100ms, NULL);
      			  printf("tracee: SIGCONT\n");
      			  raise(SIGCONT);
      			  nanosleep(&ts100ms, NULL);
      		  }
      	  }
      
      	  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);
      
      	  tracer = fork();
      	  if (tracer == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      
      		  for (i = 0; i < 11; i++) {
      			  si.si_pid = 0;
      			  waitid(P_PID, tracee, &si, WSTOPPED);
      			  if (si.si_pid && si.si_code == CLD_TRAPPED)
      				  ptrace(PTRACE_CONT, tracee, NULL,
      					 (void *)(long)si.si_status);
      		  }
      		  printf("tracer: EXITING\n");
      		  return 0;
      	  }
      
      	  while (1) {
      		  si.si_pid = 0;
      		  waitid(P_PID, tracee, &si, WSTOPPED | WCONTINUED | WEXITED);
      		  if (si.si_pid)
      			  printf("mommy : WAIT status=%02d code=%02d\n",
      				 si.si_status, si.si_code);
      	  }
      	  return 0;
        }
      
      Before this patch, while ptraced, the real parent doesn't get
      notifications for job control events, so although it can access those
      events, the later waitid(2) call never wakes up.
      
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        tracee: SIGSTOP
        tracee: SIGCONT
        tracee: SIGSTOP
        tracee: SIGCONT
        tracee: SIGSTOP
        tracer: EXITING
        mommy : WAIT status=19 code=05
        ^C
      
      After this patch, it works as expected.
      
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        tracer: EXITING
        mommy : WAIT status=19 code=05
        ^C
      
      -v2: Oleg pointed out that
      
           * Group stop notification to the real parent should also happen
             when ptracer detach races with ptrace_stop().
      
           * real_parent_is_ptracer() should be testing thread group
             equality not the task itself as wait(2) and stop/cont
             notifications are normally thread-group wide.
      
           Both issues are fixed accordingly.
      
      -v3: real_parent_is_ptracer() updated to test child->real_parent
           instead of child->group_leader->real_parent per Oleg's
           suggestion.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Job control stop notifications should always go to the real parent · 62bcf9d9
      Tejun Heo committed
      The stopped notifications in do_signal_stop() and exit_signals() are
      always for the completion of job control.  The one in do_signal_stop()
      may be delivered to the ptracer if PTRACE_ATTACH races with
      notification and the one in exit_signals() if task exits while
      ptraced.
      
      In both cases, the notifications are meaningless and confusing to the
      ptracer as it never accesses the group stop state while the real
      parent would miss notifications for the events it is watching.
      
      Make sure these notifications always go to the real parent by calling
      do_notify_parent_cldstop() with %false @for_ptrace.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Add @for_ptrace to do_notify_parent_cldstop() · 75b95953
      Tejun Heo committed
      Currently, do_notify_parent_cldstop() determines whether the
      notification is for the real parent or ptracer.  Move the decision to
      the caller by adding @for_ptrace parameter to
      do_notify_parent_cldstop().  All the callers are updated to pass
      task_ptrace(target_task), so this patch doesn't cause any behavior
      difference.
      
      While at it, add function comment to do_notify_parent_cldstop().

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Allow access to job control events through ptracees · 45cb24a1
      Tejun Heo committed
      Currently a real parent can't access job control stopped/continued
      events through a ptraced child.  This utterly breaks job control when
      the children are ptraced.
      
      For example, if a program is run from an interactive shell and then
      strace(1) attaches to it, pressing ^Z would send SIGTSTP and strace(1)
      would notice it but the shell has no way to tell whether the child
      entered job control stop and thus can't tell when to take over the
      terminal - leading to awkward lone ^Z on the terminal.
      
      Because the job control and ptrace stopped states are independent,
      there is no reason to prevent real parents from accessing the stopped
      state regardless of ptrace.  The continued state isn't separate but
      ptracers don't have any use for them as ptracees can never resume
      without explicit command from their ptracers, so as long as ptracers
      don't consume it, it should be fine.
      
      Although this is a behavior change, because the previous behavior is
      utterly broken when viewed from real parents and the change is only
      visible to real parents, I don't think it's necessary to make this
      behavior optional.
      
      One situation to be careful about is when a task from the real
      parent's group is ptracing.  The parent group is the recipient of both
      ptrace and job control stop events and one stop can be reported as
      both job control and ptrace stops.  As this can break the current
      ptrace users, suppress job control stopped events for these cases.
      
      If a real parent ptracer wants to know about both job control and
      ptrace stops, it can create a separate process to serve the role of
      real parent.
      
      Note that this only updates wait(2) side of things.  The real parent
      can access the states via wait(2) but still is not properly notified
      (woken up and delivered signal).  Test case polls wait(2) with WNOHANG
      to work around.  Notification will be updated by future patches.
      
      Test case follows.
      
        #include <stdio.h>
        #include <unistd.h>
        #include <signal.h>
        #include <time.h>
        #include <errno.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        int main(void)
        {
      	  const struct timespec ts100ms = { .tv_nsec = 100000000 };
      	  pid_t tracee, tracer;
      	  siginfo_t si;
      	  int i;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  while (1) {
      			  printf("tracee: SIGSTOP\n");
      			  raise(SIGSTOP);
      			  nanosleep(&ts100ms, NULL);
      			  printf("tracee: SIGCONT\n");
      			  raise(SIGCONT);
      			  nanosleep(&ts100ms, NULL);
      		  }
      	  }
      
      	  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG | WNOWAIT);
      
      	  tracer = fork();
      	  if (tracer == 0) {
      		  nanosleep(&ts100ms, NULL);
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      
      		  for (i = 0; i < 11; i++) {
      			  si.si_pid = 0;
      			  waitid(P_PID, tracee, &si, WSTOPPED);
      			  if (si.si_pid && si.si_code == CLD_TRAPPED)
      				  ptrace(PTRACE_CONT, tracee, NULL,
      					 (void *)(long)si.si_status);
      		  }
      		  printf("tracer: EXITING\n");
      		  return 0;
      	  }
      
      	  while (1) {
      		  si.si_pid = 0;
      		  waitid(P_PID, tracee, &si,
      			 WSTOPPED | WCONTINUED | WEXITED | WNOHANG);
      		  if (si.si_pid)
      			  printf("mommy : WAIT status=%02d code=%02d\n",
      				 si.si_status, si.si_code);
      		  nanosleep(&ts100ms, NULL);
      	  }
      	  return 0;
        }
      
      Before the patch, while ptraced, the parent can't see any job control
      events.
      
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        tracee: SIGSTOP
        tracee: SIGCONT
        tracee: SIGSTOP
        tracee: SIGCONT
        tracee: SIGSTOP
        tracer: EXITING
        mommy : WAIT status=19 code=05
        ^C
      
      After the patch,
      
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        mommy : WAIT status=19 code=05
        tracee: SIGCONT
        mommy : WAIT status=18 code=06
        tracee: SIGSTOP
        tracer: EXITING
        mommy : WAIT status=19 code=05
        ^C
      
      -v2: Oleg pointed out that wait(2) should be suppressed for the real
           parent's group instead of only the real parent task itself.
           Updated accordingly.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Fix ptracer wait(2) hang and explain notask_error clearing · 9b84cca2
      Tejun Heo committed
      wait(2) and friends allow access to stopped/continued states through
      zombies, which is required as the states are process-wide and should
      be accessible whether the leader task is alive or undead.
      wait_consider_task() implements this by always clearing notask_error
      and going through wait_task_stopped/continued() for unreaped zombies.
      
      However, while ptraced, the stopped state is per-task and as such if
      the ptracee became a zombie, there's no further stopped event to
      listen to and wait(2) and friends should return -ECHILD on the tracee.
      
      Fix it by clearing notask_error only if WCONTINUED | WEXITED is set
      for ptraced zombies.  While at it, document why clearing notask_error
      is safe for each case.
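      
      The rule for unreaped zombies can be written as a single
      predicate.  This is a simplified sketch: the WO_* bits and
      zombie_clears_notask_error() are hypothetical stand-ins for the
      wait options and for the check in wait_consider_task().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the wait option bits. */
enum {
	WO_STOPPED	= 1 << 0,
	WO_CONTINUED	= 1 << 1,
	WO_EXITED	= 1 << 2,
};

/*
 * For an unreaped zombie: returns true when notask_error is cleared
 * (the waiter keeps waiting) and false when wait fails with -ECHILD.
 */
static bool zombie_clears_notask_error(bool waiter_is_ptracer, int wo_flags)
{
	assert((wo_flags & ~(WO_STOPPED | WO_CONTINUED | WO_EXITED)) == 0);

	/* Process-wide stopped/continued state stays reachable via zombies. */
	if (!waiter_is_ptracer)
		return true;

	/* While ptraced, stopped is per-task and cannot occur after death. */
	return (wo_flags & (WO_CONTINUED | WO_EXITED)) != 0;
}
```

      With these assumptions, a ptracer waiting with only WO_STOPPED
      gets false, which corresponds to the -ECHILD result shown in the
      "after" output of the test case.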
      
      Test case follows.
      
        #include <stdio.h>
        #include <unistd.h>
        #include <signal.h>
        #include <pthread.h>
        #include <time.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        static void *nooper(void *arg)
        {
      	  pause();
      	  return NULL;
        }
      
        int main(void)
        {
      	  const struct timespec ts1s = { .tv_sec = 1 };
      	  pid_t tracee, tracer;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  pthread_t thr;
      
      		  pthread_create(&thr, NULL, nooper, NULL);
      		  nanosleep(&ts1s, NULL);
      		  printf("tracee exiting\n");
      		  pthread_exit(NULL);	/* let subthread run */
      	  }
      
      	  tracer = fork();
      	  if (tracer == 0) {
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      		  while (1) {
      			  if (waitid(P_PID, tracee, &si, WSTOPPED) < 0) {
      				  perror("waitid");
      				  break;
      			  }
      			  ptrace(PTRACE_CONT, tracee, NULL,
      				 (void *)(long)si.si_status);
      		  }
      		  return 0;
      	  }
      
      	  waitid(P_PID, tracer, &si, WEXITED);
      	  kill(tracee, SIGKILL);
      	  return 0;
        }
      
      Before the patch, after the tracee becomes a zombie, the tracer's
      waitid(WSTOPPED) never returns and the program doesn't terminate.
      
        tracee exiting
        ^C
      
      After the patch, tracee exiting triggers waitid() to fail.
      
        tracee exiting
        waitid: No child processes
      
      -v2: Oleg pointed out that exited in addition to continued can happen
           for ptraced dead group leader.  Clear notask_error for ptraced
           child on WEXITED too.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Small reorganization of wait_consider_task() · 823b018e
      Tejun Heo committed
      Move EXIT_DEAD test in wait_consider_task() above ptrace check.  As
      ptraced tasks can't be EXIT_DEAD, this change doesn't cause any
      behavior change.  This is to prepare for further changes.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • job control: Don't set group_stop exit_code if re-entering job control stop · 408a37de
      Tejun Heo committed
      While ptraced, a task may be resumed while the containing process is
      still job control stopped.  If the task receives another stop signal
      in this state, it will still initiate group stop, which generates
      group_exit_code, which the real parent would be able to see once the
      ptracer detaches.
      
      In this scenario, the real parent may see two consecutive CLD_STOPPED
      events from two stop signals without intervening SIGCONT, which
      normally is impossible.
      
      Test case follows.
      
        #include <stdio.h>
        #include <unistd.h>
        #include <signal.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        int main(void)
        {
      	  pid_t tracee;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (!tracee)
      		  while (1)
      			  pause();
      
      	  kill(tracee, SIGSTOP);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      
      	  if (!fork()) {
      		  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      		  waitid(P_PID, tracee, &si, WSTOPPED);
      		  ptrace(PTRACE_DETACH, tracee, NULL, NULL);
      		  return 0;
      	  }
      
      	  while (1) {
      		  si.si_pid = 0;
      		  waitid(P_PID, tracee, &si, WSTOPPED | WNOHANG);
      		  if (si.si_pid)
      			  printf("st=%02d c=%02d\n", si.si_status, si.si_code);
      	  }
      	  return 0;
        }
      
      Before the patch, the latter waitid() in polling mode reports the
      second stopped event generated by the implied SIGSTOP of
      PTRACE_ATTACH.
      
        st=19 c=05
        ^C
      
      After the patch, the second event is not reported.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Always put ptracee into appropriate execution state · 0e9f0a4a
      Tejun Heo committed
      Currently, __ptrace_unlink() wakes up the tracee iff it's in
      TASK_TRACED.  For unlinking from PTRACE_DETACH, this is correct as the
      tracee is guaranteed to be in TASK_TRACED or dead; however, unlinking
      also happens when the ptracer exits and in this case the ptracee can
      be in any state and might be left running even if the group it
      belongs to is stopped.
      
      This patch updates __ptrace_unlink() such that GROUP_STOP_PENDING is
      reinstated regardless of the ptracee's current state as long as it's
      alive and makes sure that signal_wake_up() is called if execution
      state transition is necessary.
      
      Test case follows.
      
        #include <unistd.h>
        #include <time.h>
        #include <sys/types.h>
        #include <sys/ptrace.h>
        #include <sys/wait.h>
      
        static const struct timespec ts1s = { .tv_sec = 1 };
      
        int main(void)
        {
      	  pid_t tracee;
      	  siginfo_t si;
      
      	  tracee = fork();
      	  if (tracee == 0) {
      		  while (1) {
      			  nanosleep(&ts1s, NULL);
      			  write(1, ".", 1);
      		  }
      	  }
      
      	  ptrace(PTRACE_ATTACH, tracee, NULL, NULL);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  waitid(P_PID, tracee, &si, WSTOPPED);
      	  ptrace(PTRACE_CONT, tracee, NULL, (void *)(long)si.si_status);
      	  write(1, "exiting", 7);
      	  return 0;
        }
      
      Before the patch, after the parent process exits, the child is left
      running and prints out "." every second.
      
        exiting..... (continues)
      
      After the patch, the group stop initiated by the implied SIGSTOP from
      PTRACE_ATTACH is re-established when the parent exits.
      
        exiting

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Collapse ptrace_untrace() into __ptrace_unlink() · e3bd058f
      Tejun Heo committed
      Remove the extra task_is_traced() check in __ptrace_unlink() and
      collapse ptrace_untrace() into __ptrace_unlink().  This is to prepare
      for further changes.
      
      While at it, drop the comment on top of ptrace_untrace() and convert
      __ptrace_unlink() comment to docbook format.  Detailed comment will be
      added by the next patch.
      
      This patch doesn't cause any visible behavior changes.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
    • ptrace: Clean transitions between TASK_STOPPED and TRACED · d79fdd6d
      Tejun Heo committed
      Currently, if the task is STOPPED on ptrace attach, it's left alone
      and the state is silently changed to TRACED on the next ptrace call.
      The behavior breaks the assumption that arch_ptrace_stop() is called
      before any task is poked by ptrace and is ugly in that a task
      manipulates the state of another task directly.
      
      With GROUP_STOP_PENDING, the transitions between TASK_STOPPED and
      TRACED can be made clean.  The tracer can use the flag to tell the
      tracee to retry stop on attach and detach.  On retry, the tracee will
      enter the desired state in the correct way.  The lower 16 bits of
      task->group_stop are used to remember the signal number which caused
      the last group stop.  This is used while retrying for ptrace attach as
      the original group_exit_code could have been consumed with wait(2) by
      then.
      
      As the real parent may wait(2) and consume the group_exit_code
      anytime, the group_exit_code needs to be saved separately so that it
      can be used when switching from regular sleep to ptrace_stop().  This
      is recorded in the lower 16 bits of task->group_stop.
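      
      Packing the signal number into the low bits of a flags word can
      be sketched as follows.  The flag names come from this message,
      but the bit positions and both helper functions are assumptions
      made for illustration, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative layout; the exact bit positions are assumptions. */
#define GROUP_STOP_SIGMASK	0xffffU		/* signr of the last group stop */
#define GROUP_STOP_PENDING	(1U << 16)
#define GROUP_STOP_TRAPPING	(1U << 18)

/* Hypothetical helper: store @signr in the low 16 bits, keep the flags. */
static unsigned int record_stop_signal(unsigned int group_stop, int signr)
{
	assert(signr > 0 && (unsigned int)signr <= GROUP_STOP_SIGMASK);

	group_stop &= ~GROUP_STOP_SIGMASK;
	return group_stop | (unsigned int)signr;
}

/* Hypothetical helper: read back the remembered signal number. */
static int last_stop_signal(unsigned int group_stop)
{
	return (int)(group_stop & GROUP_STOP_SIGMASK);
}
```

      Keeping the number in the same word as the flags means it
      survives even after the real parent has consumed group_exit_code
      through wait(2).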
      
      If a task is already stopped and there's no intervening SIGCONT, a
      ptrace request immediately following a successful PTRACE_ATTACH should
      always succeed even if the tracer doesn't wait(2) for attach
      completion; however, with this change, the tracee might still be
      TASK_RUNNING trying to enter TASK_TRACED which would cause the
      following request to fail with -ESRCH.
      
      This intermediate state is hidden from the ptracer by setting
      GROUP_STOP_TRAPPING on attach and making ptrace_check_attach() wait
      for it to clear on its signal->wait_chldexit.  Completing the
      transition or getting killed clears TRAPPING and wakes up the tracer.
      
      Note that the STOPPED -> RUNNING -> TRACED transition is still visible
      to other threads which are in the same group as the ptracer and the
      reverse transition is visible to all.  Please read the comments for
      details.
      
      Oleg:
      
      * Spotted a race condition where a task may retry group stop without
        proper bookkeeping.  Fixed by redoing bookkeeping on retry.
      
      * Spotted that the transition is visible to userland in several
        different ways.  Most are fixed with GROUP_STOP_TRAPPING.  Unhandled
        corner case is documented.
      
      * Pointed out not setting GROUP_STOP_SIGMASK on an already stopped
        task would result in more consistent behavior.
      
      * Pointed out that calling ptrace_stop() from do_signal_stop() in
        TASK_STOPPED can race with group stop start logic and then confuse
        the TRAPPING wait in ptrace_check_attach().  ptrace_stop() is now
        called with TASK_RUNNING.
      
      * Suggested using signal->wait_chldexit instead of bit wait.
      
      * Spotted a race condition between TRACED transition and clearing of
        TRAPPING.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
    • ptrace: Make do_signal_stop() use ptrace_stop() if the task is being ptraced · 5224fa36
      Tejun Heo committed
      A ptraced task would still stop at do_signal_stop() when it's stopping
      for stop signals and do_signal_stop() behaves the same whether the
      task is ptraced or not.  However, in addition to stopping,
      ptrace_stop() also does ptrace specific stuff like calling
      architecture specific callbacks, so this behavior makes the code more
      fragile and difficult to understand.
      
      This patch makes do_signal_stop() test whether the task is ptraced and
      use ptrace_stop() if so.  This renders tracehook_notify_jctl() rather
      pointless as the ptrace notification is now handled by ptrace_stop()
      regardless of the return value from the tracehook.  It probably is a
      good idea to update it.
      
      This doesn't solve the whole problem as tasks already in stopped state
      would stay in the regular stop when ptrace attached.  That part will
      be handled by the next patch.
      
      Oleg pointed out that this makes a userland-visible change.  Before,
      SIGCONT would be able to wake up a task in group stop even if the task
      is ptraced if the tracer hasn't issued another ptrace command
      afterwards (as the next ptrace command transitions the state into
      TASK_TRACED which ignores SIGCONT wakeups).  With this and the next
      patch, SIGCONT may race with the transition into TASK_TRACED and is
      ignored if the tracee already entered TASK_TRACED.
      
      Another userland visible change of this and the next patch is that the
      ptracee's state would now be TASK_TRACED where it used to be
      TASK_STOPPED, which is visible via fs/proc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      5224fa36
    • T
      ptrace: Participate in group stop from ptrace_stop() iff the task is trapping for group stop · 0ae8ce1c
      Tejun Heo committed
      Currently, ptrace_stop() unconditionally participates in group stop
      bookkeeping.  This is unnecessary and inaccurate.  Make it only
      participate if the task is trapping for group stop, i.e. if @why is
      CLD_STOPPED.  As ptrace_stop() currently is not used when trapping for
      group stop, this is equivalent to disabling group stop participation
      from ptrace_stop().
      
      A visible behavior change is increased likelihood of delayed group
      stop completion if the thread group contains one or more ptraced
      tasks.
      
      This is to prepare for further cleanup of the interaction between
      group stop and ptrace.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      0ae8ce1c
    • T
      signal: Use GROUP_STOP_PENDING to stop once for a single group stop · 39efa3ef
      Tejun Heo committed
      Currently task->signal->group_stop_count is used to decide whether to
      stop for group stop.  However, if there is a task in the group which
      is taking a long time to stop, other tasks which are continued by
      ptrace would repeatedly stop for the same group stop until the group
      stop is complete.
      
      Conversely, if a ptraced task is in TASK_TRACED state, the debugger
      won't get notified of group stops which is inconsistent compared to
      the ptraced task in any other state.
      
      This patch introduces GROUP_STOP_PENDING which tracks whether a task
      is yet to stop for the group stop in progress.  The flag is set when a
      group stop starts and cleared when the task stops the first time for
      the group stop.  It is consulted whenever the code needs to decide
      whether the task should participate in a group stop.  Note that tasks
      in TASK_TRACED now also participate in group stop.
      
      This results in the following behavior changes.
      
      * For a single group stop, a ptracer would see at most one stop
        reported.
      
      * A ptracee in TASK_TRACED now also participates in group stop and the
        tracer would get the notification.  However, as a ptraced task could
        be in TASK_STOPPED state or any ptrace trap could consume group
        stop, the notification may still be missing.  These will be
        addressed with further patches.
      
      * A ptracee may start a group stop while one is still in progress if
        the tracer let it continue with stop signal delivery.  Group stop
        code handles this correctly.
      
      Oleg:
      
      * Spotted that a task might skip signal check even when its
        GROUP_STOP_PENDING is set.  Fixed by updating
        recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
        group_stop_count.
      
      * Pointed out that task->group_stop should be cleared whenever
        task->signal->group_stop_count is cleared.  Fixed accordingly.
      
      * Pointed out the behavior inconsistency between TASK_TRACED and
        RUNNING and the last behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      39efa3ef
    • T
      signal: Fix premature completion of group stop when interfered by ptrace · e5c1902e
      Tejun Heo committed
      task->signal->group_stop_count is used to track the progress of group
      stop.  It's initialized to the number of tasks which need to stop for
      group stop to finish and each stopping or trapping task decrements it.
      However, each task doesn't keep track of whether it decremented the
      counter or not and if woken up before the group stop is complete and
      stops again, it can decrement the counter multiple times.
      
      Please consider the following example code.
      
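       #include <pthread.h>
       #include <signal.h>
       #include <sys/ptrace.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
      
       /* Busy-loop forever so the thread never stops on its own. */
       static void *worker(void *arg)
       {
      	 while (1) ;
      	 return NULL;
       }
      
       int main(void)
       {
      	 pthread_t thread;
      	 pid_t pid;
      	 int i;
      
      	 pid = fork();
      	 if (!pid) {
      		 /* Child: spawn five spinning threads, then spin too. */
      		 for (i = 0; i < 5; i++)
      			 pthread_create(&thread, NULL, worker, NULL);
      		 while (1) ;
      		 return 0;
      	 }
      
      	 /* Parent: attach, then re-inject SIGSTOP on every stop. */
      	 ptrace(PTRACE_ATTACH, pid, NULL, NULL);
      	 while (1) {
      		 waitid(P_PID, pid, NULL, WSTOPPED);
      		 ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)SIGSTOP);
      	 }
      	 return 0;
       }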
      
      The child creates five threads and the parent continuously traps the
      first thread and whenever the child gets a signal, SIGSTOP is
      delivered.  If an external process sends SIGSTOP to the child, all
      other threads in the process should reliably stop.  However, due to
      the above bug, the first thread will often end up consuming
      group_stop_count multiple times and SIGSTOP often ends up stopping
      none or only some of the other five threads.
      
      This patch adds a new field task->group_stop which is protected by
      siglock and uses GROUP_STOP_CONSUME flag to track which task is still
      to consume group_stop_count to fix this bug.
      
      task_clear_group_stop_pending() and task_participate_group_stop() are
      added to help manipulate group stop states.  As ptrace_stop() now
      also uses task_participate_group_stop(), it will set
      SIGNAL_STOP_STOPPED if it completes a group stop.
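
      The bookkeeping can be illustrated with a small userland model.  This
      is only a sketch under stated assumptions, not the kernel code: the
      model_* names are hypothetical, the 'consume' field stands in for
      GROUP_STOP_CONSUME, and locking (siglock) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: names and layout are illustrative. */
struct model_group {
	int group_stop_count;	/* tasks that still have to stop */
};

struct model_task {
	struct model_group *group;
	bool consume;		/* stands in for GROUP_STOP_CONSUME */
};

/* Start a group stop across n tasks: every task owes one decrement. */
static void model_start_group_stop(struct model_group *g,
				   struct model_task *tasks, int n)
{
	assert(n > 0);
	g->group_stop_count = n;
	for (int i = 0; i < n; i++) {
		tasks[i].group = g;
		tasks[i].consume = true;
	}
}

/*
 * Called each time a task stops or traps.  Only the first call per
 * group stop decrements the counter.  Returns true if this call
 * completed the group stop.
 */
static bool model_participate(struct model_task *t)
{
	if (!t->consume)
		return false;
	t->consume = false;
	return --t->group->group_stop_count == 0;
}
```

      In this model a task that is woken up and stops again before the
      group stop completes no longer touches the counter, which is the
      property the fix relies on.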
      
      There still are many issues regarding the interaction between group
      stop and ptrace.  Patches to address them will follow.
      
      - Oleg spotted duplicate GROUP_STOP_CONSUME.  Dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      e5c1902e
    • T
      ptrace: Add @why to ptrace_stop() · fe1bc6a0
      Tejun Heo committed
      To prepare for cleanup of the interaction between group stop and
      ptrace, add @why to ptrace_stop().  Existing users are updated such
      that there is no behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
      fe1bc6a0
    • T
      ptrace: Kill tracehook_notify_jctl() · edf2ed15
      Tejun Heo committed
      tracehook_notify_jctl() aids in determining whether and what to report
      to the parent when a task is stopped or continued.  The function also
      adds an extra requirement that siglock may be released across it,
      which is currently unused and quite difficult to satisfy in a
      well-defined manner.
      
      As job control and the notifications are about to receive a major
      overhaul, remove the tracehook and open code it.  If ever necessary,
      let's factor it out after the overhaul.
      
      * Oleg spotted incorrect CLD_CONTINUED/STOPPED selection when ptraced.
        Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      edf2ed15
    • T
      signal: Remove superfluous try_to_freeze() loop in do_signal_stop() · 71db5eb9
      Tejun Heo committed
      do_signal_stop() is used only by get_signal_to_deliver() and after a
      successful signal stop, it always calls try_to_freeze(), so the
      try_to_freeze() loop around schedule() in do_signal_stop() is
      superfluous and confusing.  Remove it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      71db5eb9
    • T
      ptrace: Remove the extra wake_up_state() from ptrace_detach() · 9f2bf651
      Tejun Heo committed
      This wake_up_state() has a turbulent history.  It is a remnant of the
      ancient ptrace implementation and patently wrong.  Commit 95a3540d
      (ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic) removed
      it but the change was reverted later by commit edaba2c5 (ptrace:
      revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic")
      citing compatibility breakage and general brokenness of the whole group
      stop / ptrace interaction.  Then, recently, it got converted from
      wake_up_process() to wake_up_state() to make it less dangerous.
      
      Digging through the mailing list archives, the compatibility breakage
      doesn't seem to be critical in the sense that the behavior isn't well
      defined or reliable to begin with and it seems to have been agreed to
      remove the wakeup with proper cleanup of the whole thing.
      
      Now that the group stop and its interaction with ptrace are being
      cleaned up, it's high time to finally kill this silliness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      9f2bf651
    • T
      signal: Fix SIGCONT notification code · c672af35
      Tejun Heo committed
      After a task receives SIGCONT, its parent is notified via SIGCHLD with
      its siginfo describing what the notified event is.  If SIGCONT is
      received while the child process is stopped, the code should be
      CLD_CONTINUED.  If SIGCONT is received while the child process is in
      the process of being stopped, it should be CLD_STOPPED.  Which code to
      use is determined in prepare_signal() and recorded in signal->flags
      using the SIGNAL_CLD_CONTINUED and SIGNAL_CLD_STOPPED flags.
      
      get_signal_to_deliver() should test these flags and then notify
      accordingly; however, it incorrectly tested SIGNAL_STOP_CONTINUED
      instead of SIGNAL_CLD_CONTINUED, thus incorrectly notifying
      CLD_CONTINUED if the signal is delivered before the task is wait(2)ed
      and CLD_STOPPED if the state was fetched already.
      
      Fix it by testing SIGNAL_CLD_CONTINUED.  While at it, uncompress the
      ?: test into if/else clause for better readability.
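
      The corrected selection can be sketched as follows.  This is a
      userland illustration, not the kernel function: the MODEL_* bit
      values and model_cld_code() are hypothetical, and only CLD_CONTINUED
      and CLD_STOPPED come from <signal.h>.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for CLD_* in <signal.h> */
#endif
#include <assert.h>
#include <signal.h>

/* Illustrative bit values; the real ones live in the kernel headers. */
#define MODEL_SIGNAL_STOP_CONTINUED	0x01u
#define MODEL_SIGNAL_CLD_STOPPED	0x02u
#define MODEL_SIGNAL_CLD_CONTINUED	0x04u
#define MODEL_SIGNAL_CLD_MASK \
	(MODEL_SIGNAL_CLD_STOPPED | MODEL_SIGNAL_CLD_CONTINUED)

/*
 * Pick the si_code for the SIGCHLD sent to the parent.  The caller is
 * expected to have checked that one of the CLD bits is set.
 */
static int model_cld_code(unsigned int flags)
{
	int why;

	assert(flags & MODEL_SIGNAL_CLD_MASK);
	if (flags & MODEL_SIGNAL_CLD_CONTINUED)
		why = CLD_CONTINUED;
	else
		why = CLD_STOPPED;
	return why;
}
```

      With MODEL_SIGNAL_CLD_STOPPED and MODEL_SIGNAL_STOP_CONTINUED both
      set, testing the wrong bit would report CLD_CONTINUED, while the
      sketch above reports CLD_STOPPED.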
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      c672af35
    • M
      printk: allow setting DEFAULT_MESSAGE_LEVEL via Kconfig · 5af5bcb8
      Mandeep Singh Baines committed
      We've been burned by regressions/bugs which we later realized could have
      been triaged quicker if only we'd paid closer attention to dmesg.  To make
      it easier to audit dmesg, we'd like to make DEFAULT_MESSAGE_LEVEL
      Kconfig-settable.  That way we can set it to KERN_NOTICE and audit any
      messages <= KERN_WARNING.
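
      The intent can be illustrated with a small model of the level
      numbering, where a lower number is more severe.  This is a sketch
      only: the MODEL_* names and both helpers are hypothetical, and a
      negative explicit level is used here to represent a printk() call
      without a KERN_* prefix.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric severities behind the KERN_* prefixes: lower is more severe. */
enum model_loglevel {
	MODEL_EMERG, MODEL_ALERT, MODEL_CRIT, MODEL_ERR,
	MODEL_WARNING, MODEL_NOTICE, MODEL_INFO, MODEL_DEBUG,
};

/*
 * Hypothetical helper: level a message ends up with.  A negative
 * 'explicit_level' stands for a printk() call without a KERN_* prefix,
 * which falls back to the configured default.
 */
static int model_effective_level(int explicit_level, int default_level)
{
	assert(default_level >= MODEL_EMERG && default_level <= MODEL_DEBUG);
	return explicit_level < 0 ? default_level : explicit_level;
}

/* Audit rule from the commit message: anything <= KERN_WARNING. */
static bool model_needs_audit(int level)
{
	return level <= MODEL_WARNING;
}
```

      With the default at the warning level, every unprefixed message
      lands in the audited set; moving the default to the notice level
      leaves only explicitly tagged messages there.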
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Cc: Olof Johansson <olofj@chromium.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5af5bcb8
    • K
      printk: use %pK for /proc/kallsyms and /proc/modules · 9f36e2c4
      Kees Cook committed
      In an effort to reduce kernel address leaks that might be used to help
      target kernel privilege escalation exploits, this patch uses %pK when
      displaying addresses in /proc/kallsyms, /proc/modules, and
      /sys/module/*/sections/*.
      
      Note that this changes %x to %p, so some legitimately 0 values in
      /proc/kallsyms would have changed from 00000000 to "(null)".  To avoid
      this, "(null)" is not used when using the "K" format.  Anything that was
      already successfully parsing "(null)" in addition to full hex digits
      should have no problem with this change.  (Thanks to Joe Perches for the
      suggestion.) Due to the %x to %p, "void *" casts are needed since these
      addresses are already "unsigned long" everywhere internally, due to their
      starting life as ELF section offsets.
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: Eugene Teo <eugene@redhat.com>
      Cc: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f36e2c4
    • F
      console: prevent registered consoles from dumping old kernel message over again · fe3d8ad3
      Feng Tang committed
      For a platform with many consoles like:
       "console=tty1 console=ttyMFD2 console=ttyS0 earlyprintk=mrst"
      
      Each time a console other than the "selected_console" (tty1 and
      ttyMFD2 here) gets registered, the existing kernel messages are printed
      again on all registered consoles: the "mrst" early console gets some
      messages three times, and "tty1" gets some twice.
      
      As suggested by Andrew Morton, every time a new console is registered, it
      is set as the "exclusive" console, so that only it dumps the already
      existing kernel messages.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe3d8ad3
    • F
      console: allow to retain boot console via boot option keep_bootcon · 7bf69395
      Fabio M. Di Nitto committed
      On some architectures, the boot process involves de-registering the boot
      console (early boot), initializing drivers and then re-registering the
      console.
      
      This mechanism introduces a window in which no printk can happen on the
      console and messages are buffered and then printed once the new console is
      available.
      
      If a kernel crashes during this window, all that is left on the boot console
      is "console [foo] enabled, bootconsole disabled" making debug of the crash
      rather 'interesting'.
      
      With the "keep_bootcon" option, the boot console is not unregistered,
      which allows everything that happens up to the crash to be printed.
      
      The option is clearly meant only for debugging purposes as it introduces
      lots of duplicated info printed on the console, but it will make bug
      reports from users easier as it doesn't require a kernel build just to
      figure out where we crash.
      Signed-off-by: Fabio M. Di Nitto <fabbione@fabbione.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bf69395
    • D
      kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events · f99a9933
      Don Zickus committed
      This patch addresses a couple of problems.  One was the case when the
      hardlockup failed to start, it also failed to start the softlockup.  There
      were valid cases when the hardlockup shouldn't start and that shouldn't
      block the softlockup (no lapic, bios controls perf counters).
      
      The second problem was when the hardlockup failed to start on boxes (from
      a no lapic or bios controlled perf counter case), it reported failure to
      the cpu notifier chain.  This blocked the notifier from continuing to
      start other more critical pieces of cpu bring-up (in our case based on a
      2.6.32 fork, it was the mce).  As a result, during soft cpu online/offline
      testing, the system would panic when a cpu was offlined because the cpu
      notifier would succeed in processing a watchdog disable cpu event and
      would panic in the mce case as a result of un-initialized variables from a
      never executed cpu up event.
      
      I realized the hardlockup/softlockup cases are really just debugging aids
      and should never impede the progress of a cpu up/down event.  Therefore I
      modified the code to always return NOTIFY_OK and instead rely on printks
      to inform the user of problems.
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f99a9933
    • D
      kernel/watchdog.c: allow hardlockup to panic by default · fef2c9bc
      Don Zickus committed
      When a cpu is considered stuck, instead of limping along and just printing
      a warning, it is sometimes preferred to just panic, let kdump capture the
      vmcore and reboot.  This gets the machine back into a stable state quickly
      while saving the info that got it into a stuck state to begin with.
      
      Add a Kconfig option to allow users to set the hardlockup to panic
      by default.  Also add an 'nmi_watchdog=nopanic' boot option to override
      this.
      
      [akpm@linux-foundation.org: fix strncmp length]
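
      The override parsing can be sketched in userland as below.  This is
      an assumption-laden illustration, not the kernel code: the variable
      and function names are hypothetical, and the exact strncmp() change
      mentioned in the bracketed note is not shown in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Model state; the default would come from the new Kconfig option. */
static bool model_hardlockup_panic = true;

/*
 * Hypothetical parser for the value after "nmi_watchdog=".  The length
 * passed to strncmp() must match the literal being compared: too short
 * a length would make "nopanic" match on a mere prefix.
 */
static void model_nmi_watchdog_setup(const char *str)
{
	assert(str != NULL);
	if (!strncmp(str, "nopanic", strlen("nopanic")))
		model_hardlockup_panic = false;
	else if (!strncmp(str, "panic", strlen("panic")))
		model_hardlockup_panic = true;
}
```

      Unknown values leave the configured default untouched in this model.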
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fef2c9bc
    • O
      sys_unshare: remove the dead CLONE_THREAD/SIGHAND/VM code · 9bfb23fc
      Oleg Nesterov committed
      Cleanup: kill the dead code which does nothing but complicates the code
      and confuses the reader.
      
      sys_unshare(CLONE_THREAD/SIGHAND/VM) is not really implemented, and I
      doubt very much it will ever work.  At least, nobody even tried since the
      original 99d1419d ("unshare system call -v5: system call
      handler function") was applied more than 4 years ago.
      
      And the code is not consistent.  unshare_thread() always fails
      unconditionally, while unshare_sighand() and unshare_vm() pretend to work
      if there is nothing to unshare.
      
      Remove unshare_thread(), unshare_sighand(), unshare_vm() helpers and
      related variables and add a simple CLONE_THREAD | CLONE_SIGHAND | CLONE_VM
      check into check_unshare_flags().
      
      Also, move the "CLONE_NEWNS needs CLONE_FS" check from
      check_unshare_flags() to sys_unshare().  This looks more consistent and
      matches the similar do_sysvsem check in sys_unshare().
      
      Note: with or without this patch "atomic_read(mm->mm_users) > 1" can give
      a false positive due to get_task_mm().
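
      A possible shape of the simplified check is sketched below.  It is a
      userland model built on assumptions: the function name and the
      'mm_users' parameter are hypothetical, and treating the single-user
      case as success is inferred from the mm_users note above rather than
      taken from the patch itself.

```c
#include <assert.h>
#include <errno.h>

/* Same values as the Linux CLONE_* flags, redefined to stay standalone. */
#define MODEL_CLONE_VM		0x00000100UL
#define MODEL_CLONE_SIGHAND	0x00000800UL
#define MODEL_CLONE_THREAD	0x00010000UL

/*
 * Sketch of the simplified check: unsharing any of the three is not
 * implemented, so fail unless there is nothing to unshare.
 * 'mm_users' stands in for the mm_users count read by the kernel.
 */
static int model_check_unshare_flags(unsigned long flags, int mm_users)
{
	assert(mm_users >= 1);
	if (flags & (MODEL_CLONE_THREAD | MODEL_CLONE_SIGHAND | MODEL_CLONE_VM)) {
		if (mm_users > 1)
			return -EINVAL;
	}
	return 0;
}
```

      A temporary extra reference on the mm would raise the count and make
      this model fail as well, which is the false positive the note warns
      about.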
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Janak Desai <janak@us.ibm.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9bfb23fc
    • M
      kernel/cpu.c: fix many errors related to style. · 4d51985e
      Michael Rodriguez committed
      Change the printk() calls to use the KERN_INFO/KERN_ERR levels, and
      fix other coding style errors.  Not _all_ of them are gone, though.
      
      [akpm@linux-foundation.org: revert the bits I disagree with]
      Signed-off-by: Michael Rodriguez <dkingston02@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4d51985e
    • A
      smp: move smp setup functions to kernel/smp.c · 34db18a0
      Amerigo Wang committed
      Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
      setup functions from init/main.c to kernel/smp.c, which saves some
      #ifdef CONFIG_SMP blocks.
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34db18a0
    • O
      move x86 specific oops=panic to generic code · d404ab0a
      Olaf Hering committed
      The oops=panic cmdline option is not x86 specific, move it to generic code.
      Update documentation.
      Signed-off-by: Olaf Hering <olaf@aepfle.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d404ab0a
    • E
      kthread: use kthread_create_on_node() · 94dcf29a
      Eric Dumazet committed
      ksoftirqd, kworker, migration, and pktgend kthreads can be created with
      kthread_create_on_node(), to get proper NUMA affinities for their stack and
      task_struct.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94dcf29a
    • E
      kthread: NUMA aware kthread_create_on_node() · 207205a2
      Eric Dumazet committed
      Because all kthreads are created from a single helper task, they all
      use memory from a single node for their kernel stack and task struct.
      
      This patch series creates kthread_create_on_node(), adding a 'node'
      parameter to the parameters already used by kthread_create().
      
      This parameter serves in allocating memory for the new kthread on its
      memory node if possible.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      207205a2
    • E
      mm: NUMA aware alloc_thread_info_node() · b6a84016
      Eric Dumazet committed
      Add a node parameter to alloc_thread_info(), and change its name to
      alloc_thread_info_node()
      
      This change is needed to allow NUMA aware kthread_create_on_node()
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b6a84016
    • E
      mm: NUMA aware alloc_task_struct_node() · 504f52b5
      Eric Dumazet committed
      Because all kthreads are created from a single helper task, they all
      use memory from a single node for their kernel stack and task struct.
      
      This patch series creates kthread_create_on_node(), adding a 'node'
      parameter to the parameters already used by kthread_create().
      
      This parameter serves in allocating memory for the new kthread on its
      memory node if available.
      
      Users of this new function are: ksoftirqd, kworker, migration, pktgend...
      
      This patch:
      
      Add a node parameter to alloc_task_struct(), and change its name to
      alloc_task_struct_node()
      
      This change is needed to allow NUMA aware kthread_create_on_node()
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      504f52b5
    • P
      cgroups: if you list_empty() a head then don't list_del() it · 8d258797
      Phil Carmody committed
      list_del() leaves poison in the prev and next pointers.  The next
      list_empty() will compare those poisons, and say the list isn't empty.
      Any list operations that assume the node is on a list because of such a
      check will be fooled into dereferencing poison.  One needs to INIT the
      node after the del, and fortunately there's already a wrapper for that -
      list_del_init().
      
      Some of the dels are followed by deallocations, so can be ignored, and one
      can be merged with an add to make a move.  Apart from that, I erred on the
      side of caution in making nodes list_empty()-queriable.
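
      The difference can be reproduced with a stripped-down userland
      re-implementation of the list helpers.  This is a sketch, not the
      kernel's list.h: the poison targets are stand-in objects rather than
      the kernel's fixed poison addresses, and only the helpers needed
      for the demonstration are modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Stand-in poison targets; the kernel uses fixed invalid addresses. */
static struct list_head model_poison1, model_poison2;

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Unlink and poison: the entry no longer looks empty afterwards. */
static void list_del(struct list_head *entry)
{
	assert(entry->next && entry->prev);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = &model_poison1;
	entry->prev = &model_poison2;
}

/* Unlink and reinitialize: list_empty(entry) is true afterwards. */
static void list_del_init(struct list_head *entry)
{
	assert(entry->next && entry->prev);
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}
```

      After list_del() the node's pointers no longer point at itself, so
      list_empty() on it reports false even though the node is on no list;
      after list_del_init() the same query reports true.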
      Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8d258797
  3. 22 Mar, 2011 1 commit
    • J
      Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code · da48524e
      Julien Tinnes committed
      Userland should be able to trust the pid and uid of the sender of a
      signal if the si_code is SI_TKILL.
      
      Unfortunately, the kernel has historically allowed sigqueueinfo() to
      send any si_code at all (as long as it was negative - to distinguish it
      from kernel-generated signals like SIGILL etc), so it could spoof a
      SI_TKILL with incorrect siginfo values.
      
      Happily, it looks like glibc has always set si_code to the appropriate
      SI_QUEUE, so there is probably no actual user code that ever uses
      anything but the appropriate SI_QUEUE flag.
      
      So just tighten the check for si_code (we used to allow any negative
      value), and add a (one-time) warning in case there are binaries out
      there that might depend on using other si_code values.
      Signed-off-by: Julien Tinnes <jln@google.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da48524e
  4. 18 Mar, 2011 1 commit