1. 27 May 2011 (1 commit)
    • mm: extract exe_file handling from procfs · 38646013
      Committed by Jiri Slaby
      Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
      This was because exe_file was needed only for /proc/<pid>/exe.  Since we
      will also need the exe_file functionality for core dumps (so that the core
      name can contain the full binary path), build this functionality into the
      kernel unconditionally.
      
      To achieve that, move it out of procfs into kernel/, where it in fact
      belongs.  By doing that we can make dup_mm_exe_file static.  We can also
      drop the linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 25 May 2011 (2 commits)
    • mm: mmu_gather rework · d16dfc55
      Committed by Peter Zijlstra
      Rework the existing mmu_gather infrastructure.
      
      The direct purpose of these patches was to allow preemptible mmu_gather,
      but even without that I think these patches provide an improvement to the
      status quo.
      
      The first 9 patches rework the mmu_gather infrastructure.  For review
      purposes I've split them into generic and per-arch patches, the last of
      which is a generic cleanup.
      
      The next patch provides generic RCU page-table freeing, and the followup
      is a patch converting s390 to use this.  I've also got 4 patches from
      DaveM lined up (not included in this series) that use this to implement
      gup_fast() for sparc64.
      
      Then there is one patch that extends the generic mmu_gather batching.
      
      After that follow the mm preemptibility patches, which make parts of the
      mm a lot more preemptible.  They convert i_mmap_lock and anon_vma->lock to
      mutexes, which, together with the mmu_gather rework, makes mmu_gather
      preemptible as well.
      
      Making i_mmap_lock a mutex also enables a clean-up of the truncate code.
      
      This also allows for preemptible mmu_notifiers, something that XPMEM I
      think wants.
      
      Furthermore, it removes the new and universally detested unmap_mutex.
      
      This patch:
      
      Remove the first obstacle towards a fully preemptible mmu_gather.
      
      The current scheme assumes mmu_gather is always done with preemption
      disabled and uses per-cpu storage for the page batches.  Change this to
      try to allocate a page for batching and, in case of failure, use a small
      on-stack array to make some progress.
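The allocate-or-fall-back batching described above can be modelled in user space. This is a minimal sketch, not the kernel's mmu_gather code. `struct page_batch`, `batch_init()`, `batch_add()`, `batch_flush()`, `batch_finish()` and both sizes are hypothetical. `malloc()` stands in for the page allocator, and `force_fallback` exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes for the model; not taken from the kernel. */
#define MODEL_PAGE_SIZE   4096
#define LOCAL_BATCH_SLOTS 8

struct page_batch {
	void **pages;                    /* slots for pages waiting to be freed */
	size_t nr;                       /* slots in use */
	size_t max;                      /* capacity of `pages` */
	int heap_allocated;              /* 1 if `pages` came from malloc() */
	void *local[LOCAL_BATCH_SLOTS];  /* fallback storage, lives with the batch */
};

/* Try to get a whole page of slots; fall back to the small local array. */
static void batch_init(struct page_batch *b, int force_fallback)
{
	void **page = force_fallback ? NULL : malloc(MODEL_PAGE_SIZE);

	b->nr = 0;
	if (page) {
		b->pages = page;
		b->max = MODEL_PAGE_SIZE / sizeof(void *);
		b->heap_allocated = 1;
	} else {
		b->pages = b->local;
		b->max = LOCAL_BATCH_SLOTS;
		b->heap_allocated = 0;
	}
}

/* Empty the batch; a real implementation would flush the TLB and free pages. */
static size_t batch_flush(struct page_batch *b)
{
	size_t flushed = b->nr;

	b->nr = 0;
	return flushed;
}

/* Queue one page, flushing first if the batch is full; returns 1 if it flushed. */
static int batch_add(struct page_batch *b, void *page)
{
	int flushed = 0;

	if (b->nr == b->max) {
		batch_flush(b);
		flushed = 1;
	}
	assert(b->nr < b->max);
	b->pages[b->nr++] = page;
	return flushed;
}

/* Final flush, then release the batch page if one was allocated. */
static void batch_finish(struct page_batch *b)
{
	batch_flush(b);
	if (b->heap_allocated)
		free(b->pages);
}
```

With the fallback array the batch still makes progress. It simply flushes more often: every 8 entries in this model, instead of every 512 on a 64-bit host.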
      
      Preemptible mmu_gather is desired in general and usable once i_mmap_lock
      becomes a mutex.  Doing it before the mutex conversion saves us from
      having to rework the code by moving the mmu_gather bits inside the
      pte_lock.
      
      Also avoid flushing the TLB batches from under the pte lock; this is
      useful even without the i_mmap_lock conversion, as it significantly
      reduces pte lock hold times.
      
      [akpm@linux-foundation.org: fix comment tpyo]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: make expand_downwards() symmetrical with expand_upwards() · d05f3169
      Committed by Michal Hocko
      Currently we have expand_upwards exported while expand_downwards is
      accessible only via expand_stack or expand_stack_downwards.
      
      check_stack_guard_page is a nice example of the asymmetry.  It uses
      expand_stack for VM_GROWSDOWN while expand_upwards is called for
      VM_GROWSUP case.
      
      Let's clean this up by exporting both functions and making their names
      consistent.  Let's use expand_{upwards,downwards} because expanding
      doesn't always involve stack manipulation (an example is
      ia64_do_page_fault, which uses expand_upwards for register backing store
      expansion).  expand_downwards has to be defined for both
      CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards
      version in the early process initialization phase for the growsup
      configuration.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 14 May 2011 (1 commit)
  4. 09 April 2011 (4 commits)
  5. 23 March 2011 (1 commit)
    • signal: Use GROUP_STOP_PENDING to stop once for a single group stop · 39efa3ef
      Committed by Tejun Heo
      Currently task->signal->group_stop_count is used to decide whether to
      stop for group stop.  However, if there is a task in the group which
      is taking a long time to stop, other tasks which are continued by
      ptrace would repeatedly stop for the same group stop until the group
      stop is complete.
      
      Conversely, if a ptraced task is in TASK_TRACED state, the debugger
      won't get notified of group stops which is inconsistent compared to
      the ptraced task in any other state.
      
      This patch introduces GROUP_STOP_PENDING, which tracks whether a task
      is yet to stop for the group stop in progress.  The flag is set when a
      group stop starts, cleared when the task stops for the first time for
      that group stop, and consulted whenever it must be determined whether
      the task should participate in a group stop.  Note that tasks in
      TASK_TRACED now also participate in group stop.
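A rough user-space model of the GROUP_STOP_PENDING bookkeeping described above may help. The flag name follows the commit. `struct model_task`, `struct model_signal`, the helper functions and the bit value are hypothetical, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name mirrors the commit; the bit value is illustrative. */
#define MODEL_GROUP_STOP_PENDING 0x1u

struct model_task {
	unsigned int group_stop;   /* per-task group stop flags */
	int stops_reported;        /* how many times this task stopped */
};

struct model_signal {
	int group_stop_count;      /* tasks yet to stop for the current group stop */
};

/* Starting a group stop marks every task in the group as pending. */
static void model_start_group_stop(struct model_signal *sig,
				   struct model_task *tasks, size_t n)
{
	sig->group_stop_count = (int)n;
	for (size_t i = 0; i < n; i++)
		tasks[i].group_stop |= MODEL_GROUP_STOP_PENDING;
}

/*
 * Called whenever a task checks whether it should join the group stop.
 * It stops only if its own pending flag is set and clears the flag the
 * first time, so being resumed (e.g. by a ptracer) does not make it stop
 * again for the same group stop.
 */
static bool model_maybe_participate(struct model_signal *sig,
				    struct model_task *t)
{
	if (!(t->group_stop & MODEL_GROUP_STOP_PENDING))
		return false;
	t->group_stop &= ~MODEL_GROUP_STOP_PENDING;
	t->stops_reported++;
	if (sig->group_stop_count > 0)
		sig->group_stop_count--;
	return true;
}
```

A task resumed by its tracer runs the check again but no longer sees the flag. The tracer therefore observes at most one stop per group stop, even while slower tasks are still stopping.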
      
      This results in the following behavior changes.
      
      * For a single group stop, a ptracer would see at most one stop
        reported.
      
      * A ptracee in TASK_TRACED now also participates in group stop and the
        tracer would get the notification.  However, as a ptraced task could
        be in TASK_STOPPED state or any ptrace trap could consume group
        stop, the notification may still be missing.  These will be
        addressed with further patches.
      
      * A ptracee may start a group stop while one is still in progress if
        the tracer let it continue with stop signal delivery.  Group stop
        code handles this correctly.
      
      Oleg:
      
      * Spotted that a task might skip signal check even when its
        GROUP_STOP_PENDING is set.  Fixed by updating
        recalc_sigpending_tsk() to check GROUP_STOP_PENDING instead of
        group_stop_count.
      
      * Pointed out that task->group_stop should be cleared whenever
        task->signal->group_stop_count is cleared.  Fixed accordingly.
      
      * Pointed out the behavior inconsistency between TASK_TRACED and
        RUNNING and the last behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
  6. 21 March 2011 (1 commit)
    • Small typo fix... · 1bef8291
      Committed by Holger Hans Peter Freyther
      Hi,
      
      I was backporting the coredump-over-pipe feature and noticed this small typo.
      I wish I had something bigger to contribute...
      
      From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
      From: Holger Hans Peter Freyther <holger@moiji-mobile.com>
      Date: Thu, 24 Feb 2011 17:42:50 +0100
      Subject: [PATCH] fs: Fix a small typo in the comment
      
      The function is called umh_pipe_setup not uhm_pipe_setup.
      Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 14 March 2011 (1 commit)
  8. 03 February 2011 (1 commit)
  9. 16 December 2010 (1 commit)
    • install_special_mapping skips security_file_mmap check. · 462e635e
      Committed by Tavis Ormandy
      The install_special_mapping routine (used, for example, to set up the
      vdso) skips the security check before insert_vm_struct, allowing a local
      attacker to bypass the mmap_min_addr security restriction by limiting
      the available pages for special mappings.
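The essence of the missing check is that a mapping below `mmap_min_addr` must be refused before it is inserted. The sketch reduces the check to a plain comparison. In the kernel the decision goes through the `security_file_mmap()` hook named in the subject. `model_check_min_addr()`, `model_install_special_mapping()` and the choice of `-EPERM` as the model's error value are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the mmap_min_addr part of the security hook. */
static int model_check_min_addr(uintptr_t addr, uintptr_t mmap_min_addr)
{
	return addr < mmap_min_addr ? -EPERM : 0;
}

/* Insert a special mapping only if the security check passes. */
static int model_install_special_mapping(uintptr_t addr,
					 uintptr_t mmap_min_addr,
					 int *inserted)
{
	int ret = model_check_min_addr(addr, mmap_min_addr);

	if (ret)
		return ret;   /* propagate the error instead of dropping it */
	*inserted = 1;        /* stands in for insert_vm_struct() */
	return 0;
}
```

For example, with `mmap_min_addr` set to 65536, a special mapping at 0xf000 is rejected. That is where the vdso lands in the demonstration in this commit. A mapping at 0x10000 is accepted.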
      
      bprm_mm_init() also skips the check, and although I don't think this can
      be used to bypass any restrictions, I don't see any reason not to have
      the security check.
      
        $ uname -m
        x86_64
        $ cat /proc/sys/vm/mmap_min_addr
        65536
        $ cat install_special_mapping.s
        section .bss
            resb BSS_SIZE
        section .text
            global _start
            _start:
                mov     eax, __NR_pause
                int     0x80
        $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
        $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
        $ ./install_special_mapping &
        [1] 14303
        $ cat /proc/14303/maps
        0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
        00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
        00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]
      
      It's worth noting that Red Hat are shipping with mmap_min_addr set to
      4096.
      Signed-off-by: Tavis Ormandy <taviso@google.com>
      Acked-by: Kees Cook <kees@ubuntu.com>
      Acked-by: Robert Swiecki <swiecki@google.com>
      [ Changed to not drop the error code - akpm ]
      Reviewed-by: James Morris <jmorris@namei.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 01 December 2010 (2 commits)
    • exec: copy-and-paste the fixes into compat_do_execve() paths · 114279be
      Committed by Oleg Nesterov
      Note: this patch targets 2.6.37 and tries to be as simple as possible.
      That is why it adds more copy-and-paste horror into fs/compat.c and
      uglifies fs/exec.c; this will be cleaned up later.
      
      compat_copy_strings() plays with bprm->vma/mm directly and thus has
      two problems: it lacks the RLIMIT_STACK check and argv/envp memory
      is not visible to oom killer.
      
      Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
      to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
      as do_execve() does.
      
      Add the fatal_signal_pending/cond_resched checks into compat_count() and
      compat_copy_strings(), this matches the code in fs/exec.c and certainly
      makes sense.
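The loop shape that the fatal_signal_pending/cond_resched checks give to argument counting can be sketched in user space. This is an assumption-laden model. `model_count()` and `model_fatal_signal_pending` are hypothetical, `sched_yield()` stands in for `cond_resched()`, and `-EINTR` substitutes for the kernel's internal restart code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for fatal_signal_pending(current). */
static volatile sig_atomic_t model_fatal_signal_pending;

/*
 * Count the entries of a NULL-terminated argv/envp array, giving up with
 * -E2BIG past `max` entries. Every iteration checks for a fatal signal and
 * yields the CPU, so a huge array can neither hog the CPU nor keep a
 * killed task busy.
 */
static int model_count(const char *const *argv, int max)
{
	int i = 0;

	if (argv != NULL) {
		while (argv[i] != NULL) {
			if (i++ >= max)
				return -E2BIG;
			if (model_fatal_signal_pending)
				return -EINTR;
			sched_yield();   /* stand-in for cond_resched() */
		}
	}
	return i;
}
```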
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • exec: make argv/envp memory visible to oom-killer · 3c77f845
      Committed by Oleg Nesterov
      Brad Spengler published a local memory-allocation DoS that
      evades the OOM-killer (though not the virtual memory RLIMIT):
      http://www.grsecurity.net/~spender/64bit_dos.c
      
      execve()->copy_strings() can allocate a lot of memory, but
      this is not visible to oom-killer, nobody can see the nascent
      bprm->mm and take it into account.
      
      With this patch get_arg_page() increments current's MM_ANONPAGES
      counter every time we allocate a new page for argv/envp. When
      do_execve() succeeds or fails, we change this counter back.
      
      Technically this is not 100% correct, we can't know if the new
      page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
      I don't think this really matters and everything becomes correct
      once exec changes ->mm or fails.
      Reported-by: Brad Spengler <spender@grsecurity.net>
      Reviewed-and-discussed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 28 October 2010 (3 commits)
  12. 27 October 2010 (1 commit)
    • oom: add per-mm oom disable count · 3d5992d2
      Committed by Ying Han
      It's pointless to kill a task if another thread sharing its mm cannot be
      killed to allow future memory freeing.  A subsequent patch will prevent
      kills in such cases, but first it's necessary to have a way to flag a task
      that shares memory with an OOM_DISABLE task that doesn't incur an
      additional tasklist scan, which would make select_bad_process() an O(n^2)
      function.
      
      This patch adds an atomic counter to struct mm_struct that tracks how
      many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
      They cannot be killed by the kernel, so their memory cannot be freed in
      oom conditions.
      
      This only requires task_lock() on the task that we're operating on, it
      does not require mm->mmap_sem since task_lock() pins the mm and the
      operation is atomic.
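A per-mm counter lets the OOM killer answer "does any user of this mm have OOM_SCORE_ADJ_MIN?" in O(1) instead of rescanning the tasklist. The sketch uses C11 atomics. `struct model_mm`, `struct model_task` and the helpers are hypothetical, the constant mirrors the kernel's OOM_SCORE_ADJ_MIN, and the task_lock() the commit mentions is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_OOM_SCORE_ADJ_MIN (-1000)  /* mirrors OOM_SCORE_ADJ_MIN */

struct model_mm {
	atomic_int oom_disable_count;  /* users of this mm that are unkillable */
};

struct model_task {
	struct model_mm *mm;
	int oom_score_adj;
};

/*
 * Change a thread's oom_score_adj and keep the per-mm counter in sync.
 * Only transitions into or out of OOM_SCORE_ADJ_MIN touch the counter.
 * The kernel does this under task_lock(); this model omits locking.
 */
static void model_set_oom_score_adj(struct model_task *t, int new_adj)
{
	bool was_disabled = t->oom_score_adj == MODEL_OOM_SCORE_ADJ_MIN;
	bool now_disabled = new_adj == MODEL_OOM_SCORE_ADJ_MIN;

	if (t->mm != NULL && was_disabled != now_disabled) {
		if (now_disabled)
			atomic_fetch_add(&t->mm->oom_disable_count, 1);
		else
			atomic_fetch_sub(&t->mm->oom_disable_count, 1);
	}
	t->oom_score_adj = new_adj;
}

/* O(1): killing a user of this mm only frees memory if none is unkillable. */
static bool model_mm_kill_useful(struct model_mm *mm)
{
	return atomic_load(&mm->oom_disable_count) == 0;
}
```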
      
      [rientjes@google.com: changelog and sys_unshare() code]
      [rientjes@google.com: protect oom_disable_count with task_lock in fork]
      [rientjes@google.com: use old_mm for oom_disable_count in exec]
      Signed-off-by: Ying Han <yinghan@google.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 15 October 2010 (2 commits)
  14. 10 September 2010 (3 commits)
  15. 18 August 2010 (2 commits)
    • fs: fs_struct rwlock to spinlock · 2a4419b5
      Committed by Nick Piggin
      fs: fs_struct rwlock to spinlock
      
      struct fs_struct.lock is an rwlock with the read-side used to protect root and
      pwd members while taking references to them. Taking a reference to a path
      typically requires just 2 atomic ops, so the critical section is very small.
      Parallel read-side operations would have cacheline contention on the lock, the
      dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
      real parallelism increase.
      
      Replace it with a spinlock to avoid one or two atomic operations in typical
      path lookup fastpath.
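The pattern the commit settles on is a plain spinlock around a very short "read the pointer and take a reference" section. It looks roughly like the sketch below, written in user space with C11 atomics. `atomic_flag` stands in for `spinlock_t`. `struct model_fs_struct`, `struct model_path` and `model_get_fs_root()` are hypothetical analogues of the kernel's fs_struct, path and get_fs_root().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct path: only the reference count matters. */
struct model_path {
	atomic_int refcount;
};

/* root and pwd are protected by a plain spinlock instead of an rwlock. */
struct model_fs_struct {
	atomic_flag lock;
	struct model_path *root;
	struct model_path *pwd;
};

static void model_spin_lock(atomic_flag *lock)
{
	/* Busy-wait; acceptable because the critical section is tiny. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Take a reference to fs->root under the lock, in the spirit of get_fs_root(). */
static struct model_path *model_get_fs_root(struct model_fs_struct *fs)
{
	struct model_path *root;

	model_spin_lock(&fs->lock);
	root = fs->root;
	atomic_fetch_add_explicit(&root->refcount, 1, memory_order_relaxed);
	model_spin_unlock(&fs->lock);
	return root;
}
```

The critical section is a single pointer read plus one atomic increment. A reader/writer lock would add its own atomic traffic without letting readers overlap meaningfully, which is the trade-off the commit message describes.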
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Make do_execve() take a const filename pointer · d7627467
      Committed by David Howells
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the
      argv or envp lists, as some of them are in .rodata, so marking these
      strings as const should be fine.
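Why the double const matters can be shown with a small model. `model_copy_strings()` and `model_do_execve()` are hypothetical stand-ins that only count bytes. The point is that `const char *const *` lets callers pass arrays of string literals without casts. It also means that a const filename can be placed in such an array and copied like any other string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_strings_kernel(): returns the number of
 * bytes (including NULs) that would be copied from a NULL-terminated array.
 * Neither the array nor the strings it points to are modified.
 */
static size_t model_copy_strings(const char *const *strs)
{
	size_t total = 0;

	for (; *strs != NULL; strs++)
		total += strlen(*strs) + 1;
	return total;
}

/* Hypothetical do_execve()-shaped function with const-qualified parameters. */
static size_t model_do_execve(const char *filename,
			      const char *const *argv,
			      const char *const *envp)
{
	/* The filename is copied like any other string, so it must be const too. */
	const char *const name[] = { filename, NULL };

	return model_copy_strings(name) + model_copy_strings(argv) +
	       model_copy_strings(envp);
}
```

A caller can now write `const char *const argv[] = { "sh", "-c", "true", NULL };` and pass it straight through without discarding qualifiers.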
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 28 July 2010 (1 commit)
    • fsnotify: pass a file instead of an inode to open, read, and write · 2a12a9d7
      Committed by Eric Paris
      fanotify, the upcoming notification system, actually needs a struct path
      so it can do opens in the context of listeners, and it needs a file so it
      can get f_flags from the original process.  Close was the only operation
      that was already passing a struct file to the notification hook.  This
      patch passes a file for access, modify, and open as well, since one is
      easily available to these hooks.
      Signed-off-by: Eric Paris <eparis@redhat.com>
  17. 10 July 2010 (1 commit)
  18. 09 June 2010 (1 commit)
  19. 28 May 2010 (6 commits)
  20. 25 May 2010 (1 commit)
    • mm: migration: avoid race between shift_arg_pages() and rmap_walk() during... · a8bef8ff
      Committed by Mel Gorman
      mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks
      
      Page migration requires rmap to be able to find all ptes mapping a page
      at all times; otherwise the migration entry can be instantiated, but it
      is possible to leave one behind if the second rmap_walk fails to find
      the page.  If this page is later faulted, migration_entry_to_page() will
      call BUG because the page is locked, indicating the page was migrated but
      the migration PTE was not cleaned up.  For example:
      
        kernel BUG at include/linux/swapops.h:105!
        invalid opcode: 0000 [#1] PREEMPT SMP
        ...
        Call Trace:
         [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a
         [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e
         [<ffffffff813099b5>] page_fault+0x25/0x30
         [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b
         [<ffffffff8111329b>] search_binary_handler+0x173/0x313
         [<ffffffff81114896>] do_execve+0x219/0x30a
         [<ffffffff8100a5c6>] sys_execve+0x43/0x5e
         [<ffffffff8100320a>] stub_execve+0x6a/0xc0
        RIP  [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129
      
      There is a race between shift_arg_pages and migration that triggers this
      bug.  A temporary stack is set up during exec and later moved.  If
      migration moves a page in the temporary stack and the VMA is then removed
      before migration completes, the migration PTE may not be found, leading
      to a BUG when the stack is faulted.
      
      This patch causes pages within the temporary stack during exec to be
      skipped by migration.  It does this by marking the VMA covering the
      temporary stack with an otherwise impossible combination of VMA flags.
      These flags are cleared when the temporary stack is moved to its final
      location.
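The marking trick can be sketched with bit masks. The flag names and values are illustrative assumptions. The commit only says that an otherwise impossible combination of VMA flags is used, and the model pairs two mutually exclusive read-ahead hints to play that role.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative flag values, not the kernel's. Sequential and random
 * read-ahead hints are normally mutually exclusive, so setting both is
 * used here as the "otherwise impossible" marker for a temporary stack.
 */
#define MODEL_VM_SEQ_READ  0x1ul
#define MODEL_VM_RAND_READ 0x2ul
#define MODEL_VM_STACK_INCOMPLETE_SETUP (MODEL_VM_SEQ_READ | MODEL_VM_RAND_READ)

static bool model_is_temporary_stack(unsigned long vm_flags)
{
	return (vm_flags & MODEL_VM_STACK_INCOMPLETE_SETUP) ==
	       MODEL_VM_STACK_INCOMPLETE_SETUP;
}

/* Migration leaves the temporary exec stack alone. */
static bool model_should_migrate(unsigned long vm_flags)
{
	return !model_is_temporary_stack(vm_flags);
}

/* Once the stack is at its final location, drop the marker bits. */
static unsigned long model_finish_stack_setup(unsigned long vm_flags)
{
	return vm_flags & ~MODEL_VM_STACK_INCOMPLETE_SETUP;
}
```

Requiring both bits, rather than either one, keeps ordinary VMAs that carry a single read-ahead hint eligible for migration.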
      
      [kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 12 May 2010 (1 commit)
    • revert "procfs: provide stack information for threads" and its fixup commits · 34441427
      Committed by Robin Holt
      Originally, commit d899bf7b ("procfs: provide stack information for
      threads") attempted to introduce a new feature for showing where the
      thread stack was located and how many pages are being utilized by the
      stack.
      
      Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
      applied to fix the NO_MMU case.
      
      Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
      64-bit") was applied to fix a bug in ia32 executables being loaded.
      
      Commit 9ebd4eba ("procfs: fix /proc/<pid>/stat stack pointer for kernel
      threads") was applied to fix a bug which had kernel threads printing a
      userland stack address.
      
      Commit 1306d603 ('proc: partially revert "procfs: provide stack
      information for threads"') was then applied to revert the stack pages
      being used to solve a significant performance regression.
      
      This patch nearly undoes the effect of all these patches.
      
      The reason for reverting these is it provides an unusable value in
      field 28.  For x86_64, a fork will result in the task->stack_start
      value being updated to the current user top of stack and not the stack
      start address.  This unpredictability of the stack_start value makes
      it worthless.  That includes the intended use of showing how much stack
      space a thread has.
      
      Other architectures will get different values.  As an example, ia64
      gets 0.  The do_fork() and copy_process() functions appear to treat the
      stack_start and stack_size parameters as architecture specific.
      
      I only partially reverted c44972f1 ("procfs: disable per-task stack usage
      on NOMMU").  If I had completely reverted it, I would have had to change
      mm/Makefile to only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
      configured.  Since I could not test the builds without significant effort,
      I decided not to change mm/Makefile.
      
      I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
      information for threads on 64-bit").  I left the KSTK_ESP() change in
      place as that seemed worthwhile.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 07 March 2010 (3 commits)
    • coredump: suppress uid comparison test if core output files are pipes · 76595f79
      Committed by Neil Horman
      Modify uid check in do_coredump so as to not apply it in the case of
      pipes.
      
      This just got noticed in testing.  The end of do_coredump validates the
      uid of the inode for the created file against the uid of the crashing
      process to ensure that no one can pre-create a core file with different
      ownership and grab the information contained in the core when they
      shouldn't be able to.  This causes failures when using pipes for core
      dumps if the crashing process is not root, which is the uid of the pipe
      when it is created.
      
      The fix is simple.  Since the check for matching uids isn't relevant for
      pipes (a process can't create a pipe that the usermodehelper code will open
      anyway), we can just skip it in the event ispipe is non-zero.
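Reduced to its decision, the fixed check looks like the sketch below. `model_core_target_ok()` is a hypothetical helper, not the kernel function. It only shows that the ownership comparison applies to regular core files and is skipped when `ispipe` is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical reduction of the ownership check at the end of do_coredump().
 * For a regular file the inode must belong to the dumping fsuid; otherwise
 * someone may have pre-created it to read the core. For a pipe the check
 * is skipped: no user process can pre-create the pipe that the
 * usermodehelper code opens, and that pipe is owned by root.
 */
static bool model_core_target_ok(bool ispipe, uid_t inode_uid, uid_t fsuid)
{
	if (ispipe)
		return true;
	return inode_uid == fsuid;
}
```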
      
      Reverts a pipe-affecting change which was accidentally made in
      
      : commit c46f739d
      : Author:     Ingo Molnar <mingo@elte.hu>
      : AuthorDate: Wed Nov 28 13:59:18 2007 +0100
      : Commit:     Linus Torvalds <torvalds@woody.linux-foundation.org>
      : CommitDate: Wed Nov 28 10:58:01 2007 -0800
      :
      :     vfs: coredumping fix
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: set ->group_exit_code for other CLONE_VM tasks too · 5c99cbf4
      Committed by Oleg Nesterov
      User visible change.
      
      do_coredump() kills all threads which share the same ->mm but only the
      coredumping process gets the proper exit_code.  Other tasks which share
      the same ->mm die "silently" and return status == 0 to parent.
      
      This is historical behaviour, not actually a bug.  But I think Frank
      Heckenbach rightly dislikes the current behaviour.  Simple test-case:
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/wait.h>
      
      	int main(void)
      	{
      		int stat;
      
      		if (!fork()) {
      			if (!vfork())
      				kill(getpid(), SIGQUIT);
      		}
      
      		wait(&stat);
      		printf("stat=%x\n", stat);
      		return 0;
      	}
      
      Before this patch it prints "stat=0" despite the fact the child was killed
      by SIGQUIT.  After this patch the output is "stat=3" which obviously makes
      more sense.
      
      Even with this patch, only the task which originates the coredumping gets
      "|= 0x80" if the core was actually dumped, but at least the coredumping
      signal is visible to do_wait/etc.
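The status values quoted above can be decoded with the standard `<sys/wait.h>` macros. `decode_status()` is an illustrative helper, not part of the patch. `WCOREDUMP()` is not in POSIX, so the code falls back to testing the 0x80 bit directly, which assumes the Linux status layout.

```c
#define _DEFAULT_SOURCE   /* glibc: expose WCOREDUMP() */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

struct decoded_status {
	bool signaled;     /* terminated by a signal */
	int sig;           /* which signal */
	bool core_dumped;  /* the 0x80 "core dumped" bit */
};

/* Classify a wait() status such as the one printed by the test case above. */
static struct decoded_status decode_status(int stat)
{
	struct decoded_status d = { false, 0, false };

	if (WIFSIGNALED(stat)) {
		d.signaled = true;
		d.sig = WTERMSIG(stat);
#ifdef WCOREDUMP
		d.core_dumped = WCOREDUMP(stat) != 0;
#else
		d.core_dumped = (stat & 0x80) != 0;   /* Linux status layout */
#endif
	}
	return d;
}
```

With the patch applied, the sibling reports `stat=3`, which decodes as killed by SIGQUIT with no core-dump bit. Only the task that started the dump can also carry 0x80.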
      Reported-by: Frank Heckenbach <f.heckenbach@fh-soft.de>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: pass mm->flags as a coredump parameter for consistency · 30736a4d
      Committed by Masami Hiramatsu
      Pass mm->flags as a coredump parameter for consistency.
      
       ---
              if (mm->core_state || !get_dumpable(mm)) {  <- (1)
                      up_write(&mm->mmap_sem);
                      put_cred(cred);
                      goto fail;
              }

      [...]
              if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */ <-(2)
                      flag = O_EXCL;          /* Stop rewrite attacks */
                      cred->fsuid = 0;        /* Dump root private */
              }
       ---
      
      Since dumpable bits are not protected by lock, there is a chance to change
      these bits between (1) and (2).
      
      To solve this issue, this patch copies mm->flags to
      coredump_params.mm_flags at the beginning of do_coredump() and uses it
      instead of get_dumpable() while dumping core.
      
      This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
      dump_filter bits in mm->flags.
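The fix is an instance of reading shared state once and making every decision from that snapshot. In this hypothetical model, `struct model_mm`, `struct model_coredump_params`, `model_get_dumpable()` and `model_do_coredump()` are illustrative. Treating the low two bits as the dumpable value (0, 1, or 2 for setuid mode) is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative layout: the low two bits hold the dumpable mode (0, 1, 2). */
#define MODEL_DUMPABLE_MASK 0x3ul

struct model_mm {
	atomic_ulong flags;            /* may be changed concurrently */
};

struct model_coredump_params {
	unsigned long mm_flags;        /* private snapshot of mm->flags */
};

static int model_get_dumpable(unsigned long flags)
{
	return (int)(flags & MODEL_DUMPABLE_MASK);
}

/*
 * Take one snapshot of mm->flags, then make both decisions from it:
 * (1) whether to dump at all and (2) whether setuid core dump mode applies.
 * A concurrent change to mm->flags can no longer make them disagree.
 * Returns -1 if the dump is refused, 0 otherwise.
 */
static int model_do_coredump(struct model_mm *mm,
			     struct model_coredump_params *cprm,
			     int *setuid_mode)
{
	cprm->mm_flags = atomic_load(&mm->flags);

	if (!model_get_dumpable(cprm->mm_flags))                  /* (1) */
		return -1;
	*setuid_mode = model_get_dumpable(cprm->mm_flags) == 2;   /* (2) */
	return 0;
}
```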
      
      [akpm@linux-foundation.org: fix merge]
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>