1. 29 Sep, 2008 1 commit
    • mm owner: fix race between swapoff and exit · 31a78f23
      Balbir Singh committed
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via the mm_owner_changed() callback,
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
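
      The ordering described above (notify with a NULL new owner, then clear
      mm->owner, and have readers tolerate NULL) can be sketched in user space.
      This is a minimal illustration, not the kernel code: struct task,
      struct mm, struct subsys and all three functions are hypothetical
      simplifications introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for task_struct and mm_struct. */
struct task { int id; };
struct mm   { struct task *owner; };

/* Hypothetical subsystem state: the task it currently accounts against. */
struct subsys { const struct task *charged; int notifications; };

/* Owner-change callback; next may be NULL when no new owner exists. */
static void owner_changed(struct subsys *ss, const struct task *old,
                          const struct task *next)
{
    assert(old != NULL);   /* the old owner must still be valid here */
    ss->charged = next;    /* NULL tells the subsystem to stop using it */
    ss->notifications++;
}

/* Runs when the owner exits; candidate is NULL if nobody else has this mm. */
static void update_next_owner(struct mm *mm, struct subsys *ss,
                              struct task *candidate)
{
    /* Notify first, while mm->owner still names the old owner ... */
    owner_changed(ss, mm->owner, candidate);
    /* ... and only then overwrite it, possibly with NULL. */
    mm->owner = candidate;
}

/* Readers must cope with an ownerless mm instead of dereferencing blindly. */
static int owner_id_or_none(const struct mm *mm)
{
    return mm->owner ? mm->owner->id : -1;
}
```

      Calling update_next_owner() with a NULL candidate leaves the mm without
      an owner, and owner_id_or_none() then reports -1 rather than touching a
      freed task.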
      Reported-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 29 Jul, 2008 1 commit
  3. 27 Jul, 2008 5 commits
  4. 26 Jul, 2008 12 commits
  5. 25 Jul, 2008 1 commit
  6. 11 Jul, 2008 1 commit
    • exec: fix stack excutability without PT_GNU_STACK · 96a8e13e
      Hugh Dickins committed
      Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
      exec'ing an ELF without a PT_GNU_STACK program header should default to an
      executable stack; but this got broken by the unlimited argv feature because
      stack vma is now created before the right personality has been established:
      so breaking old binaries using nested function trampolines.
      
      Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
      vm_flags used to be set, before the mprotect_fixup.  Checking through
      our existing VM_flags, none would have changed since insert_vm_struct:
      so this seems safer than finding a way through the personality labyrinth.
      
      Reported-by: pageexec@freemail.hu
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 17 Jun, 2008 1 commit
  8. 27 May, 2008 1 commit
  9. 17 May, 2008 1 commit
  10. 13 May, 2008 1 commit
  11. 02 May, 2008 1 commit
  12. 30 Apr, 2008 2 commits
  13. 29 Apr, 2008 3 commits
    • procfs task exe symlink · 925d1c40
      Matt Helsley committed
      The kernel implements readlink of /proc/pid/exe by getting the file from
      the first executable VMA.  Then the path to the file is reconstructed and
      reported as the result.
      
      Because of the VMA walk the code is slightly different on nommu systems.
      This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      walking the VMAs to find the first executable file-backed VMA we store a
      reference to the exec'd file in the mm_struct.
      
      That reference would prevent the filesystem holding the executable file
      from being unmounted even after unmapping the VMAs.  So we track the number
      of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      unmapped.  This avoids pinning the mounted filesystem.
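
      The counting scheme in the paragraph above can be sketched as follows.
      This is a user-space illustration under stated assumptions: struct file,
      struct mm, the refcount field and the three helper names are simplified
      stand-ins chosen for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted file object. */
struct file { int refcount; };

/* Hypothetical slice of an mm: the exec'd file plus a VMA counter. */
struct mm {
    struct file *exe_file;
    unsigned long exe_vma_count;   /* number of executable-file VMAs */
};

/* Record the exec'd file and take one reference on behalf of the mm. */
static void set_exe_file(struct mm *mm, struct file *f)
{
    f->refcount++;
    mm->exe_file = f;
    mm->exe_vma_count = 0;
}

/* Called whenever a VMA mapping the executable is created. */
static void exe_vma_added(struct mm *mm)
{
    mm->exe_vma_count++;
}

/* Called on unmap; the last removal drops the mm's reference. */
static void exe_vma_removed(struct mm *mm)
{
    assert(mm->exe_vma_count > 0);
    if (--mm->exe_vma_count == 0 && mm->exe_file != NULL) {
        mm->exe_file->refcount--;   /* filesystem is no longer pinned */
        mm->exe_file = NULL;
    }
}
```

      With two executable mappings, the first exe_vma_removed() keeps the
      reference and the second one releases it.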
      
      [akpm@linux-foundation.org: improve comments]
      [yamamoto@valinux.co.jp: fix dup_mmap]
      Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh committed
      Remove the mem_cgroup member from mm_struct and add an owner instead.
      
      This approach was suggested by Paul Menage.  The advantage of this approach
      is that, once the mm->owner is known, using the subsystem id, the cgroup
      can be determined.  It also allows several control groups that are
      virtually grouped by mm_struct, to exist independent of the memory
      controller i.e., without adding mem_cgroup's for each controller, to
      mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box, it was compiled with both the
      MM_OWNER config turned on and off.
      
      After the thread group leader exits, it's moved to init_css_set by
      cgroup_exit(), thus all future charges from running threads would be
      redirected to the init_css_set's subsystem.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: remove argv_len from struct linux_binprm · 175a06ae
      Tetsuo Handa committed
      I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve().  But it
      doesn't update bprm->argv_len after "remove_arg_zero() +
      copy_strings_kernel()" at load_script() etc.
      
      audit_bprm() is called from search_binary_handler() and
      search_binary_handler() is called from load_script() etc.  Thus, I think the
      condition check
      
        if (bprm->argv_len > (audit_argv_kb << 10))
                return -E2BIG;
      
      in audit_bprm() might return wrong result when strlen(removed_arg) !=
      strlen(spliced_args).  Why not update bprm->argv_len at load_script() etc.  ?
      
      By the way, 2.6.25-rc3 seems not to do the condition check.  Is the field
      bprm->argv_len no longer needed?
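
      The concern above, a cached length going stale once the argument list is
      spliced, can be reproduced with a small user-space sketch.  The helpers
      argv_bytes() and over_limit() are hypothetical, and counting each
      string's terminating NUL is an assumption made for this example; only
      the comparison against (limit_kb << 10) mirrors the check quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Total bytes used by the argument strings, counting each trailing NUL. */
static size_t argv_bytes(const char *const *argv, size_t argc)
{
    size_t total = 0;

    assert(argv != NULL || argc == 0);
    for (size_t i = 0; i < argc; i++)
        total += strlen(argv[i]) + 1;
    return total;
}

/* Same shape as the quoted check: the limit is given in KiB. */
static int over_limit(size_t argv_len, size_t limit_kb)
{
    return argv_len > (limit_kb << 10);
}
```

      If the length is computed once for { "s" } and the list later becomes
      { <long interpreter path>, "s" }, the cached value still passes a 1 KiB
      limit while a recomputed value does not.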
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Ollie Wild <aaw@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 25 Apr, 2008 2 commits
    • [PATCH] sanitize unshare_files/reset_files_struct · 3b125388
      Al Viro committed
      * let unshare_files() give caller the displaced files_struct
      * don't bother with grabbing reference only to drop it in the
        caller if it hadn't been shared in the first place
      * in that form unshare_files() is trivially implemented via
        unshare_fd(), so we eliminate the duplicate logic in fork.c
      * reset_files_struct() is not merely only ever called for current;
        it will break the system if somebody ever calls it for anything
        else (we can't modify ->files of somebody else).  Lose the
        task_struct * argument.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [PATCH] sanitize handling of shared descriptor tables in failing execve() · fd8328be
      Al Viro committed
      * unshare_files() can fail; doing it after irreversible actions is wrong
        and de_thread() is certainly irreversible.
      * since we do it unconditionally anyway, we might as well do it in do_execve()
        and save ourselves the PITA in binfmt handlers, etc.
      * while we are at it, binfmt_som actually leaked files_struct on failure.
      
      As a side benefit, unshare_files(), put_files_struct() and reset_files_struct()
      become unexported.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  15. 04 Mar, 2008 1 commit
    • Allow ARG_MAX execve string space even with a small stack limit · a64e715f
      Linus Torvalds committed
      The new code that removed the limitation on the execve string size
      (which was historically 32 pages) replaced it with a much softer limit
      based on RLIMIT_STACK which is usually much larger than the traditional
      limit.  See commit b6a2fea3 ("mm: variable length argument support")
      for details.
      
      However, if you have a small stack limit (perhaps because you need lots
      of stacks in a threaded environment), the new heuristic of allowing up
      to 1/4th of RLIMIT_STACK to be used for argument and environment strings
      could actually be smaller than the old limit.
      
      So just say that it's ok to have up to ARG_MAX strings regardless of the
      value of RLIMIT_STACK, and check the rlimit only when going over that
      traditional limit.
      
      (Of course, if you actually have a *really* small stack limit, the whole
      stack itself will be limited before you hit ARG_MAX, but that has always
      been true and is clearly the right behaviour anyway).
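
      The rule described above can be written as a small predicate.  This is a
      sketch, not the kernel's code: arg_size_allowed() and SKETCH_ARG_MAX are
      names invented here, and the value assumes the historical 32 pages of
      4 KiB each.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed traditional limit: 32 pages of 4 KiB = 131072 bytes. */
#define SKETCH_ARG_MAX (32UL * 4096UL)

/*
 * Decide whether argument and environment strings of the given total
 * size are acceptable for the given RLIMIT_STACK value (in bytes).
 */
static bool arg_size_allowed(unsigned long size, unsigned long stack_rlimit)
{
    /* Up to the traditional limit is always fine, whatever the rlimit. */
    if (size <= SKETCH_ARG_MAX)
        return true;

    /* Beyond that, fall back to the 1/4-of-stack heuristic. */
    return size <= stack_rlimit / 4;
}
```

      With a 64 KiB stack limit, 100000 bytes of strings are accepted because
      they fit under the traditional limit, while 200000 bytes are rejected;
      with an 8 MiB stack limit the same 200000 bytes pass the quarter-stack
      check.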
      Acked-by: Carlos O'Donell <carlos@codesourcery.com>
      Cc: Michael Kerrisk <michael.kerrisk@googlemail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ollie Wild <aaw@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Feb, 2008 2 commits
  17. 09 Feb, 2008 3 commits
  18. 06 Feb, 2008 1 commit
    • exec: rework the group exit and fix the race with kill · ed5d2cac
      Oleg Nesterov committed
      As Roland pointed out, we have a very old problem with exec.  de_thread()
      sets SIGNAL_GROUP_EXIT, kills other threads, changes ->group_leader and then
      clears signal->flags.  All signals (even fatal ones) sent in this window
      (which is not too small) will be lost.
      
      With this patch exec doesn't abuse SIGNAL_GROUP_EXIT.  signal_group_exit(),
      the new helper, should be used to detect exit_group() or exec() in progress.
      It can have more users, but this patch does only strictly necessary changes.
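
      One way to picture such a helper is a predicate that is true either when
      the group-exit flag is set or when an exec is busy killing sibling
      threads.  The sketch below is an assumption-laden illustration: struct
      signal_state, the group_exit_task field, SKETCH_GROUP_EXIT and
      group_exit_in_progress() are hypothetical names, not taken from the
      patch text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_GROUP_EXIT 0x1u   /* stand-in for the group-exit flag */

struct task { int pid; };

/* Hypothetical slice of per-process signal state. */
struct signal_state {
    unsigned int flags;
    struct task *group_exit_task;   /* assumed: set while exec de-threads */
};

/* True while exit_group() or exec() is tearing the thread group down. */
static bool group_exit_in_progress(const struct signal_state *sig)
{
    return (sig->flags & SKETCH_GROUP_EXIT) != 0 ||
           sig->group_exit_task != NULL;
}
```

      Because exec marks its progress in a separate field, it never needs to
      set and later clear the group-exit flag, so a fatal signal arriving in
      that window is not masked by a flag that is about to be reset.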
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>