1. 09 2月, 2008 7 次提交
  2. 07 2月, 2008 2 次提交
  3. 06 2月, 2008 3 次提交
    • O
      exec: rework the group exit and fix the race with kill · ed5d2cac
      Oleg Nesterov 提交于
      As Roland pointed out, we have the very old problem with exec.  de_thread()
      sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
      clears signal->flags.  All signals (even fatal ones) sent in this window
      (which is not too small) will be lost.
      
      With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
      the new helper, should be used to detect exit_group() or exec() in progress.
      It can have more users, but this patch does only strictly necessary changes.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed5d2cac
    • O
      remove handle_group_stop() in favor of do_signal_stop() · f558b7e4
      Oleg Nesterov 提交于
      Every time we set SIGNAL_GROUP_EXIT or clear SIGNAL_STOP_DEQUEUED we also
      reset ->group_stop_count.
      
      This means that the SIGNAL_GROUP_EXIT check in handle_group_stop() is not
      needed, and do_signal_stop() should check SIGNAL_STOP_DEQUEUED only when
      ->group_stop_count == 0. With these changes handle_group_stop() becomes the
      subset of do_signal_stop(), we can kill it and use do_signal_stop() instead.
      
      Also, a preparation for the next patch.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f558b7e4
    • O
      __group_complete_signal(): fix coredump with group stop race · 198466b4
      Oleg Nesterov 提交于
      When __group_complete_signal() sees sig_kernel_coredump() signal, it starts
      the group stop, but sets ->group_exit_task = t in a hope that "t" will
      actually dequeue this signal and invoke do_coredump().  However, by the
      time "t" enters get_signal_to_deliver() it is possible that the signal was
      blocked/ignored or we have another pending !SIG_KERNEL_COREDUMP_MASK signal
      which will be dequeued first.  This means the task could be stopped but not
      killed.
      
      Remove this code from __group_complete_signal().  Note also this patch
      removes the bogus signal_wake_up(t, 1).  This thread can't be
      STOPPED/TRACED, note the corresponding check in wants_signal().
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      198466b4
  4. 01 2月, 2008 1 次提交
  5. 30 1月, 2008 1 次提交
  6. 07 12月, 2007 3 次提交
  7. 13 11月, 2007 1 次提交
  8. 30 10月, 2007 1 次提交
  9. 29 10月, 2007 1 次提交
  10. 20 10月, 2007 8 次提交
    • P
      Use helpers to obtain task pid in printks · ba25f9dc
      Pavel Emelyanov 提交于
      The task_struct->pid member is going to be deprecated, so start
      using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in
      the kernel.
      
      The first thing to start with is the pid, printed to dmesg - in
      this case we may safely use task_pid_nr(). Besides, printks produce
      more (much more) than a half of all the explicit pid usage.
      
      [akpm@linux-foundation.org: git-drm went and changed lots of stuff]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba25f9dc
    • P
      Isolate some explicit usage of task->tgid · bac0abd6
      Pavel Emelyanov 提交于
      With pid namespaces this field is now dangerous to use explicitly, so hide
      it behind the helpers.
      
      Also the pid and pgrp fields o task_struct and signal_struct are to be
      deprecated.  Unfortunately this patch cannot be sent right now as this
      leads to tons of warnings, so start isolating them, and deprecate later.
      
      Actually the p->tgid == pid has to be changed to has_group_leader_pid(),
      but Oleg pointed out that in case of posix cpu timers this is the same, and
      thread_group_leader() is more preferable.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bac0abd6
    • P
      Uninline find_task_by_xxx set of functions · 228ebcbe
      Pavel Emelyanov 提交于
      The find_task_by_something is a set of macros are used to find task by pid
      depending on what kind of pid is proposed - global or virtual one.  All of
      them are wrappers above the most generic one - find_task_by_pid_type_ns() -
      and just substitute some args for it.
      
      It turned out, that dereferencing the current->nsproxy->pid_ns construction
      and pushing one more argument on the stack inline cause kernel text size to
      grow.
      
      This patch moves all this stuff out-of-line into kernel/pid.c.  Together
      with the next patch it saves a bit less than 400 bytes from the .text
      section.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      228ebcbe
    • P
      pid namespaces: changes to show virtual ids to user · b488893a
      Pavel Emelyanov 提交于
      This is the largest patch in the set. Make all (I hope) the places where
      the pid is shown to or get from user operate on the virtual pids.
      
      The idea is:
       - all in-kernel data structures must store either struct pid itself
         or the pid's global nr, obtained with pid_nr() call;
       - when seeking the task from kernel code with the stored id one
         should use find_task_by_pid() call that works with global pids;
       - when showing pid's numerical value to the user the virtual one
         should be used, but however when one shows task's pid outside this
         task's namespace the global one is to be used;
       - when getting the pid from userspace one need to consider this as
         the virtual one and use appropriate task/pid-searching functions.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: nuther build fix]
      [akpm@linux-foundation.org: yet nuther build fix]
      [akpm@linux-foundation.org: remove unneeded casts]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b488893a
    • S
      pid namespaces: allow signalling cgroup-init · 0fbc26a6
      Sukadev Bhattiprolu 提交于
      Only the global-init process must be special - any other cgroup-init
      process must be killable to prevent run-away processes in the system.
      
      TODO: 	Ideally we should allow killing the cgroup-init only from parent
      	cgroup and prevent it being killed from within the cgroup.
      	But that is a more complex change and will be addressed by a follow-on
      	patch. For now allow the cgroup-init to be terminated by any process
      	with sufficient privileges.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: NPavel Emelyanov <xemul@openvz.org>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0fbc26a6
    • S
      pid namespaces: define is_global_init() and is_container_init() · b460cbc5
      Serge E. Hallyn 提交于
      is_init() is an ambiguous name for the pid==1 check.  Split it into
      is_global_init() and is_container_init().
      
      A cgroup init has it's tsk->pid == 1.
      
      A global init also has it's tsk->pid == 1 and it's active pid namespace
      is the init_pid_ns.  But rather than check the active pid namespace,
      compare the task structure with 'init_pid_ns.child_reaper', which is
      initialized during boot to the /sbin/init process and never changes.
      
      Changelog:
      
      	2.6.22-rc4-mm2-pidns1:
      	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
      	  global init (/sbin/init) process. This would improve performance
      	  and remove dependence on the task_pid().
      
      	2.6.21-mm2-pidns2:
      
      	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
      	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
      	  This way, we kill only the cgroup if the cgroup's init has a
      	  bug rather than force a kernel panic.
      
      [akpm@linux-foundation.org: fix comment]
      [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
      [bunk@stusta.de: kernel/pid.c: remove unused exports]
      [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
      Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b460cbc5
    • S
      pid namespaces: rename child_reaper() function · 88f21d81
      Sukadev Bhattiprolu 提交于
      Rename the child_reaper() function to task_child_reaper() to be similar to
      other task_* functions and to distinguish the function from 'struct
      pid_namspace.child_reaper'.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88f21d81
    • P
      pid namespaces: round up the API · a47afb0f
      Pavel Emelianov 提交于
      The set of functions process_session, task_session, process_group and
      task_pgrp is confusing, as the names can be mixed with each other when looking
      at the code for a long time.
      
      The proposals are to
      * equip the functions that return the integer with _nr suffix to
        represent that fact,
      * and to make all functions work with task (not process) by making
        the common prefix of the same name.
      
      For monotony the routines signal_session() and set_signal_session() are
      replaced with task_session_nr() and set_task_session(), especially since they
      are only used with the explicit task->signal dereference.
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a47afb0f
  11. 19 10月, 2007 1 次提交
  12. 17 10月, 2007 4 次提交
  13. 08 10月, 2007 1 次提交
  14. 21 9月, 2007 1 次提交
    • D
      signalfd simplification · b8fceee1
      Davide Libenzi 提交于
      This simplifies signalfd code, by avoiding it to remain attached to the
      sighand during its lifetime.
      
      In this way, the signalfd remain attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows to remove
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".
      
      I think this is also what Ben was suggesting time ago.
      
      The external effect of this, is that a thread can extract only its own
      private signals and the group ones.  I think this is an acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch w/out signalfd.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8fceee1
  15. 31 8月, 2007 1 次提交
    • O
      sigqueue_free: fix the race with collect_signal() · 60187d27
      Oleg Nesterov 提交于
      Spotted by taoyue <yue.tao@windriver.com> and Jeremy Katz <jeremy.katz@windriver.com>.
      
      collect_signal:				sigqueue_free:
      
      	list_del_init(&first->list);
      						if (!list_empty(&q->list)) {
      							// not taken
      						}
      						q->flags &= ~SIGQUEUE_PREALLOC;
      
      	__sigqueue_free(first);			__sigqueue_free(q);
      
      Now, __sigqueue_free() is called twice on the same "struct sigqueue" with the
      obviously bad implications.
      
      In particular, this double free breaks the array_cache->avail logic, so the
      same sigqueue could be "allocated" twice, and the bug can manifest itself via
      the "impossible" BUG_ON(!SIGQUEUE_PREALLOC) in sigqueue_free/send_sigqueue.
      
      Hopefully this can explain these mysterious bug-reports, see
      
      	http://marc.info/?t=118766926500003
      	http://marc.info/?t=118466273000005
      
      Alexey Dobriyan reports this patch makes the difference for the testcase, but
      nobody has an access to the application which opened the problems originally.
      
      Also, this patch removes tasklist lock/unlock, ->siglock is enough.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: taoyue <yue.tao@windriver.com>
      Cc: Jeremy Katz <jeremy.katz@windriver.com>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60187d27
  16. 23 8月, 2007 1 次提交
    • O
      signalfd: fix interaction with posix-timers · 834d216e
      Oleg Nesterov 提交于
      dequeue_signal:
      
      	if (__SI_TIMER) {
      		spin_unlock(&tsk->sighand->siglock);
      		do_schedule_next_timer(info);
      		spin_lock(&tsk->sighand->siglock);
      	}
      
      Unless tsk == curent, this is absolutely unsafe: nothing prevents tsk from
      exiting. If signalfd was passed to another process, do_schedule_next_timer()
      is just wrong.
      
      Add yet another "tsk == current" check into dequeue_signal().
      
      This patch fixes an oopsable bug, but breaks the scheduling of posix timers
      if the shared __SI_TIMER signal was fetched via signalfd attached to another
      sub-thread. Mostly fixed by the next patch.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      834d216e
  17. 04 8月, 2007 1 次提交
  18. 23 7月, 2007 1 次提交
    • M
      x86: i386-show-unhandled-signals-v3 · abd4f750
      Masoud Asgharifard Sharbiani 提交于
      This patch makes the i386 behave the same way that x86_64 does when a
      segfault happens.  A line gets printed to the kernel log so that tools
      that need to check for failures can behave more uniformly between
      debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
      /proc/sys/debug/exception-trace)
      
      Also, all of the lines being printed are now using printk_ratelimit() to
      deny the ability of DoS from a local user with a program like the
      following:
      
      main()
      {
             while (1)
                     if (!fork()) *(int *)0 = 0;
      }
      
      This new revision also includes the fix that Andrew did which got rid of
      new sysctl that was added to the system in earlier versions of this.
      Also, 'show-unhandled-signals' sysctl has been renamed back to the old
      'exception-trace' to avoid breakage of people's scripts.
      
      AK: Enabling by default for i386 will be likely controversal, but let's see what happens
      AK: Really folks, before complaining just fix your segfaults
      AK: I bet this will find a lot of silent issues
      Signed-off-by: NMasoud Sharbiani <masouds@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      [ Personally, I've found the complaints useful on x86-64, so I'm all for
        this. That said, I wonder if we could do it more prettily..   -Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      abd4f750
  19. 17 7月, 2007 1 次提交