1. 27 9月, 2012 3 次提交
  2. 31 7月, 2012 3 次提交
    • J
      coredump: fix wrong comments on core limits of pipe coredump case · 108ceeb0
      Jovi Zhang 提交于
      In commit 898b374a ("exec: replace call_usermodehelper_pipe with use
      of umh init function and resolve limit"), the core limits recursive
      check value was changed from 0 to 1, but the corresponding comments were
      not updated.
      Signed-off-by: NJovi Zhang <bookjovi@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      108ceeb0
    • K
      coredump: warn about unsafe suid_dumpable / core_pattern combo · 54b50199
      Kees Cook 提交于
      When suid_dumpable=2, detect unsafe core_pattern settings and warn when
      they are seen.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54b50199
    • K
      fs: make dumpable=2 require fully qualified path · 9520628e
      Kees Cook 提交于
      When the suid_dumpable sysctl is set to "2", and there is no core dump
      pipe defined in the core_pattern sysctl, a local user can cause core files
      to be written to root-writable directories, potentially with
      user-controlled content.
      
      This means an admin can unknowningly reintroduce a variation of
      CVE-2006-2451, allowing local users to gain root privileges.
      
        $ cat /proc/sys/fs/suid_dumpable
        2
        $ cat /proc/sys/kernel/core_pattern
        core
        $ ulimit -c unlimited
        $ cd /
        $ ls -l core
        ls: cannot access core: No such file or directory
        $ touch core
        touch: cannot touch `core': Permission denied
        $ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
        $ pid=$!
        $ sleep 1
        $ kill -SEGV $pid
        $ ls -l core
        -rw------- 1 root kees 458752 Jun 21 11:35 core
        $ sudo strings core | grep evil
        OHAI=evil-string-here
      
      While cron has been fixed to abort reading a file when there is any
      parse error, there are still other sensitive directories that will read
      any file present and skip unparsable lines.
      
      Instead of introducing a suid_dumpable=3 mode and breaking all users of
      mode 2, this only disables the unsafe portion of mode 2 (writing to disk
      via relative path).  Most users of mode 2 (e.g.  Chrome OS) already use
      a core dump pipe handler, so this change will not break them.  For the
      situations where a pipe handler is not defined but mode 2 is still
      active, crash dumps will only be written to fully qualified paths.  If a
      relative path is defined (e.g.  the default "core" pattern), dump
      attempts will trigger a printk yelling about the lack of a fully
      qualified path.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9520628e
  3. 30 7月, 2012 1 次提交
  4. 27 7月, 2012 1 次提交
    • J
      posix_types.h: Cleanup stale __NFDBITS and related definitions · 8ded2bbc
      Josh Boyer 提交于
      Recently, glibc made a change to suppress sign-conversion warnings in
      FD_SET (glibc commit ceb9e56b3d1).  This uncovered an issue with the
      kernel's definition of __NFDBITS if applications #include
      <linux/types.h> after including <sys/select.h>.  A build failure would
      be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
      flags to gcc.
      
      It was suggested that the kernel should either match the glibc
      definition of __NFDBITS or remove that entirely.  The current in-kernel
      uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
      uses of the related __FDELT and __FDMASK defines.  Given that, we'll
      continue the cleanup that was started with commit 8b3d1cda
      ("posix_types: Remove fd_set macros") and drop the remaining unused
      macros.
      
      Additionally, linux/time.h has similar macros defined that expand to
      nothing so we'll remove those at the same time.
      Reported-by: NJeff Law <law@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      CC: <stable@vger.kernel.org>
      Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
      [ .. and fix up whitespace as per akpm ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ded2bbc
  5. 21 6月, 2012 1 次提交
  6. 08 6月, 2012 2 次提交
    • L
      Revert "mm: correctly synchronize rss-counters at exit/exec" · 48d212a2
      Linus Torvalds 提交于
      This reverts commit 40af1bbd.
      
      It's horribly and utterly broken for at least the following reasons:
      
       - calling sync_mm_rss() from mmput() is fundamentally wrong, because
         there's absolutely no reason to believe that the task that does the
         mmput() always does it on its own VM.  Example: fork, ptrace, /proc -
         you name it.
      
       - calling it *after* having done mmdrop() on it is doubly insane, since
         the mm struct may well be gone now.
      
       - testing mm against NULL before you call it is insane too, since a
      NULL mm there would have caused oopses long before.
      
      .. and those are just the three bugs I found before I decided to give up
      looking for me and revert it asap.  I should have caught it before I
      even took it, but I trusted Andrew too much.
      
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      48d212a2
    • K
      mm: correctly synchronize rss-counters at exit/exec · 40af1bbd
      Konstantin Khlebnikov 提交于
      mm->rss_stat counters have per-task delta: task->rss_stat.  Before
      changing task->mm pointer the kernel must flush this delta with
      sync_mm_rss().
      
      do_exit() already calls sync_mm_rss() to flush the rss-counters before
      committing the rss statistics into task->signal->maxrss, taskstats,
      audit and other stuff.  Unfortunately the kernel does this before
      calling mm_release(), which can call put_user() for processing
      task->clear_child_tid.  So at this point we can trigger page-faults and
      task->rss_stat becomes non-zero again.  As a result mm->rss_stat becomes
      inconsistent and check_mm() will print something like this:
      
      | BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1
      | BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1
      
      This patch moves sync_mm_rss() into mm_release(), and moves mm_release()
      out of do_exit() and calls it earlier.  After mm_release() there should
      be no pagefaults.
      
      [akpm@linux-foundation.org: tweak comment]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>		[3.4.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      40af1bbd
  7. 01 6月, 2012 1 次提交
  8. 17 5月, 2012 1 次提交
    • S
      coredump: ensure the fpu state is flushed for proper multi-threaded core dump · 11aeca0b
      Suresh Siddha 提交于
      Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
      arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
      of a multi-threaded application.
      
      A look at the exit seqeuence shows that other threads can still be on the
      runqueue potentially at the below shown exit_mm() code snippet:
      
      		if (atomic_dec_and_test(&core_state->nr_threads))
      			complete(&core_state->startup);
      
      ===> other threads can still be active here, but we notify the thread
      ===> dumping core to wakeup from the coredump_wait() after the last thread
      ===> joins this point. Core dumping thread will continue dumping
      ===> all the threads state to the core file.
      
      		for (;;) {
      			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
      			if (!self.task) /* see coredump_finish() */
      				break;
      			schedule();
      		}
      
      As some of those threads are on the runqueue and didn't call schedule() yet,
      their fpu state is still active in the live registers and the thread
      proceeding with the coredump will hit the above mentioned BUG_ON while
      trying to dump other threads fpustate to the coredump file.
      
      BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
      in the code paths for processors supporting xsaveopt. With or without
      xsaveopt, multi-threaded coredump is broken and maynot contain
      the correct fpustate at the time of exit.
      
      In coredump_wait(), wait for all the threads to be come inactive, so
      that we are sure all the extended register state is flushed to
      the memory, so that it can be reliably copied to the core file.
      Reported-by: NSuresh Nalluru <suresh@aristanetworks.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Link: http://lkml.kernel.org/r/1336692811-30576-2-git-send-email-suresh.b.siddha@intel.comAcked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      11aeca0b
  9. 16 5月, 2012 1 次提交
  10. 03 5月, 2012 1 次提交
  11. 14 4月, 2012 1 次提交
    • A
      Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs · 259e5e6c
      Andy Lutomirski 提交于
      With this change, calling
        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)
      disables privilege granting operations at execve-time.  For example, a
      process will not be able to execute a setuid binary to change their uid
      or gid if this bit is set.  The same is true for file capabilities.
      
      Additionally, LSM_UNSAFE_NO_NEW_PRIVS is defined to ensure that
      LSMs respect the requested behavior.
      
      To determine if the NO_NEW_PRIVS bit is set, a task may call
        prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
      It returns 1 if set and 0 if it is not set. If any of the arguments are
      non-zero, it will return -1 and set errno to -EINVAL.
      (PR_SET_NO_NEW_PRIVS behaves similarly.)
      
      This functionality is desired for the proposed seccomp filter patch
      series.  By using PR_SET_NO_NEW_PRIVS, it allows a task to modify the
      system call behavior for itself and its child tasks without being
      able to impact the behavior of a more privileged task.
      
      Another potential use is making certain privileged operations
      unprivileged.  For example, chroot may be considered "safe" if it cannot
      affect privileged tasks.
      
      Note, this patch causes execve to fail when PR_SET_NO_NEW_PRIVS is
      set and AppArmor is in use.  It is fixed in a subsequent patch.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NEric Paris <eparis@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      
      v18: updated change desc
      v17: using new define values as per 3.4
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      259e5e6c
  12. 31 3月, 2012 1 次提交
  13. 29 3月, 2012 1 次提交
    • D
      Add #includes needed to permit the removal of asm/system.h · 96f951ed
      David Howells 提交于
      asm/system.h is a cause of circular dependency problems because it contains
      commonly used primitive stuff like barrier definitions and uncommonly used
      stuff like switch_to() that might require MMU definitions.
      
      asm/system.h has been disintegrated by this point on all arches into the
      following common segments:
      
       (1) asm/barrier.h
      
           Moved memory barrier definitions here.
      
       (2) asm/cmpxchg.h
      
           Moved xchg() and cmpxchg() here.  #included in asm/atomic.h.
      
       (3) asm/bug.h
      
           Moved die() and similar here.
      
       (4) asm/exec.h
      
           Moved arch_align_stack() here.
      
       (5) asm/elf.h
      
           Moved AT_VECTOR_SIZE_ARCH here.
      
       (6) asm/switch_to.h
      
           Moved switch_to() here.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      96f951ed
  14. 22 3月, 2012 1 次提交
  15. 21 3月, 2012 4 次提交
  16. 20 3月, 2012 1 次提交
  17. 06 3月, 2012 2 次提交
  18. 23 2月, 2012 1 次提交
    • D
      tracepoint, vfs, sched: Add exec() tracepoint · 4ff16c25
      David Smith 提交于
      Added a minimal exec tracepoint. Exec is an important major event
      in the life of a task, like fork(), clone() or exit(), all of
      which we already trace.
      
      [ We also do scheduling re-balancing during exec() - so it's useful
        from a scheduler instrumentation POV as well. ]
      
      If you want to watch a task start up, when it gets exec'ed is a good place
      to start.  With the addition of this tracepoint, exec's can be monitored
      and better picture of general system activity can be obtained. This
      tracepoint will also enable better process life tracking, allowing you to
      answer questions like "what process keeps starting up binary X?".
      
      This tracepoint can also be useful in ftrace filtering and trigger
      conditions: i.e. starting or stopping filtering when exec is called.
      Signed-off-by: NDavid Smith <dsmith@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      4ff16c25
  19. 20 2月, 2012 2 次提交
    • D
      Replace the fd_sets in struct fdtable with an array of unsigned longs · 1fd36adc
      David Howells 提交于
      Replace the fd_sets in struct fdtable with an array of unsigned longs and then
      use the standard non-atomic bit operations rather than the FD_* macros.
      
      This:
      
       (1) Removes the abuses of struct fd_set:
      
           (a) Since we don't want to allocate a full fd_set the vast majority of the
           	 time, we actually, in effect, just allocate a just-big-enough array of
           	 unsigned longs and cast it to an fd_set type - so why bother with the
           	 fd_set at all?
      
           (b) Some places outside of the core fdtable handling code (such as
           	 SELinux) want to look inside the array of unsigned longs hidden inside
           	 the fd_set struct for more efficient iteration over the entire set.
      
       (2) Eliminates the use of FD_*() macros in the kernel completely.
      
       (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
           userspace.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.ukSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      1fd36adc
    • D
      Wrap accesses to the fd_sets in struct fdtable · 1dce27c5
      David Howells 提交于
      Wrap accesses to the fd_sets in struct fdtable (for recording open files and
      close-on-exec flags) so that we can move away from using fd_sets since we
      abuse the fd_set structs by not allocating the full-sized structure under
      normal circumstances and by non-core code looking at the internals of the
      fd_sets.
      
      The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
      since that cannot be told about their abnormal lengths.
      
      This introduces six wrapper functions for setting, clearing and testing
      close-on-exec flags and fd-is-open flags:
      
      	void __set_close_on_exec(int fd, struct fdtable *fdt);
      	void __clear_close_on_exec(int fd, struct fdtable *fdt);
      	bool close_on_exec(int fd, const struct fdtable *fdt);
      	void __set_open_fd(int fd, struct fdtable *fdt);
      	void __clear_open_fd(int fd, struct fdtable *fdt);
      	bool fd_is_open(int fd, const struct fdtable *fdt);
      
      Note that I've prepended '__' to the names of the set/clear functions because
      they require the caller to hold a lock to use them.
      
      Note also that I haven't added wrappers for looking behind the scenes at the
      the array.  Possibly that should exist too.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.ukSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      1dce27c5
  20. 07 2月, 2012 1 次提交
    • H
      exec: fix use-after-free bug in setup_new_exec() · 96e02d15
      Heiko Carstens 提交于
      Setting the task name is done within setup_new_exec() by accessing
      bprm->filename. However this happens after flush_old_exec().
      This may result in a use after free bug, flush_old_exec() may
      "complete" vfork_done, which will wake up the parent which in turn
      may free the passed in filename.
      To fix this add a new tcomm field in struct linux_binprm which
      contains the now early generated task name until it is used.
      
      Fixes this bug on s390:
      
        Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
        Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
        Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
        Call Trace:
        ([<0000000000282e2c>] setup_new_exec+0x38/0x374)
         [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
         [<0000000000280a42>] search_binary_handler+0x38e/0x5bc
         [<0000000000282b6c>] do_execve_common+0x410/0x514
         [<0000000000282cb6>] do_execve+0x46/0x58
         [<00000000005bce58>] kernel_execve+0x28/0x70
         [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
         [<00000000005bc8da>] kernel_thread_starter+0x6/0xc
         [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
        Last Breaking-Event-Address:
         [<00000000002830f0>] setup_new_exec+0x2fc/0x374
      
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96e02d15
  21. 11 1月, 2012 1 次提交
    • K
      tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki 提交于
      oom_score_adj is used for guarding processes from OOM-Killer.  One of
      problem is that it's inherited at fork().  When a daemon set oom_score_adj
      and make children, it's hard to know where the value is set.
      
      This patch adds some tracepoints useful for debugging. This patch adds
      3 trace points.
        - creating new task
        - renaming a task (exec)
        - set oom_score_adj
      
      To debug, users need to enable some trace pointer. Maybe filtering is useful as
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      output will be like this.
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43d2b113
  22. 04 1月, 2012 1 次提交
  23. 01 11月, 2011 1 次提交
    • D
      oom: remove oom_disable_count · c9f01245
      David Rientjes 提交于
      This removes mm->oom_disable_count entirely since it's unnecessary and
      currently buggy.  The counter was intended to be per-process but it's
      currently decremented in the exit path for each thread that exits, causing
      it to underflow.
      
      The count was originally intended to prevent oom killing threads that
      share memory with threads that cannot be killed since it doesn't lead to
      future memory freeing.  The counter could be fixed to represent all
      threads sharing the same mm, but it's better to remove the count since:
      
       - it is possible that the OOM_DISABLE thread sharing memory with the
         victim is waiting on that thread to exit and will actually cause
         future memory freeing, and
      
       - there is no guarantee that a thread is disabled from oom killing just
         because another thread sharing its mm is oom disabled.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c9f01245
  24. 12 8月, 2011 1 次提交
    • V
      move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Vasiliy Kulikov 提交于
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to check for NPROC exceeding via setuid() and
      similar functions.
      
      Before the check there was a possibility to greatly exceed the allowed
      number of processes by an unprivileged user if the program relied on
      rlimit only.  But the check created new security threat: many poorly
      written programs simply don't check setuid() return code and believe it
      cannot fail if executed with root privileges.  So, the check is removed
      in this patch because of too often privilege escalations related to
      buggy programs.
      
      The NPROC can still be enforced in the common code flow of daemons
      spawning user processes.  Most of daemons do fork()+setuid()+execve().
      The check introduced in execve() (1) enforces the same limit as in
      setuid() and (2) doesn't create similar security issues.
      
      Neil Brown suggested to track what specific process has exceeded the
      limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
      this process would fail on execve(), and other processes' execve()
      behaviour is not changed.
      
      Solar Designer suggested to re-check whether NPROC limit is still
      exceeded at the moment of execve().  If the process was sleeping for
      days between set*uid() and execve(), and the NPROC counter step down
      under the limit, the defered execve() failure because NPROC limit was
      exceeded days ago would be unexpected.  If the limit is not exceeded
      anymore, we clear the flag on successful calls to execve() and fork().
      
      The flag is also cleared on successful calls to set_user() as the limit
      was exceeded for the previous user, not the current one.
      
      Similar check was introduced in -ow patches (without the process flag).
      
      v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72fa5997
  25. 27 7月, 2011 6 次提交