1. 14 1月, 2010 1 次提交
    • A
      fix autofs/afs/etc. magic mountpoint breakage · 86acdca1
      Al Viro 提交于
      We end up trying to kfree() nd.last.name on open("/mnt/tmp", O_CREAT)
      if /mnt/tmp is an autofs direct mount.  The reason is that nd.last_type
      is bogus here; we want LAST_BIND for everything of that kind and we
      get LAST_NORM left over from finding parent directory.
      
      So make sure that it *is* set properly; set to LAST_BIND before
      doing ->follow_link() - for normal symlinks it will be changed
      by __vfs_follow_link() and everything else needs it set that way.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      86acdca1
  2. 16 12月, 2009 2 次提交
  3. 12 11月, 2009 1 次提交
    • S
      pidns: fix a leak in /proc dentries and inodes with pid namespaces. · 29f12ca3
      Sukadev Bhattiprolu 提交于
      Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace'
      that is discussed in:
      
      	http://lkml.org/lkml/2009/10/2/159.
      
      To summarize the thread, when container-init is terminated, it sets the
      PF_EXITING flag, zaps other processes in the container and waits to reap
      them.  As a part of reaping, the container-init should flush any /proc
      dentries associated with the processes.  But because the container-init is
      itself exiting and the following PF_EXITING check, the dentries are not
      flushed, resulting in leak in /proc inodes and dentries.
      
      This fix reverts the commit 7766755a ("Fix /proc dcache deadlock
      in do_exit") which introduced the check for PF_EXITING.  At the time of
      the commit, shrink_dcache_parent() flushed dentries from other filesystems
      also and could have caused a deadlock which the commit fixed.  But as
      pointed out by Eric Biederman, after commit 0feae5c4,
      shrink_dcache_parent() no longer affects other filesystems.  So reverting
      the commit is now safe.
      
      As pointed out by Jan Kara, the leak is not as critical since the
      unclaimed space will be reclaimed under memory pressure or by:
      
      	echo 3 > /proc/sys/vm/drop_caches
      
      But since this check is no longer required, its best to remove it.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com>
      Reported-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NJan Kara <jack@ucw.cz>
      Cc: Andrea Arcangeli <andrea@cpushare.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29f12ca3
  4. 23 9月, 2009 3 次提交
  5. 22 9月, 2009 3 次提交
  6. 19 8月, 2009 1 次提交
    • K
      mm: revert "oom: move oom_adj value" · 0753ba01
      KOSAKI Motohiro 提交于
      The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
      the mm_struct.  It was a very good first step for sanitize OOM.
      
      However Paul Menage reported the commit makes regression to his job
      scheduler.  Current OOM logic can kill OOM_DISABLED process.
      
      Why? His program has the code of similar to the following.
      
      	...
      	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
      	...
      	if (vfork() == 0) {
      		set_oom_adj(0); /* Invoked child can be killed */
      		execve("foo-bar-cmd");
      	}
      	....
      
      vfork() parent and child are shared the same mm_struct.  then above
      set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
      change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
      lost OOM immune and it was killed.
      
      Actually, fork-setting-exec idiom is very frequently used in userland program.
      We must not break this assumption.
      
      Then, this patch revert commit 2ff05b2b and related commit.
      
      Reverted commit list
      ---------------------
      - commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
      - commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
      - commit 81236810 (oom: only oom kill exiting tasks with attached memory)
      - commit 933b787b (mm: copy over oom_adj value at fork time)
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0753ba01
  7. 10 8月, 2009 5 次提交
  8. 24 6月, 2009 1 次提交
    • O
      mm_for_maps: simplify, use ptrace_may_access() · a893a84e
      Oleg Nesterov 提交于
      It would be nice to kill __ptrace_may_access(). It requires task_lock(),
      but this lock is only needed to read mm->flags in the middle.
      
      Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
      the code a little bit.
      
      Also, we do not need to take ->mmap_sem in advance. In fact I think
      mm_for_maps() should not play with ->mmap_sem at all, the caller should
      take this lock.
      
      With or without this patch, without ->cred_guard_mutex held we can race
      with exec() and get the new ->mm but check old creds.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      a893a84e
  9. 17 6月, 2009 1 次提交
    • D
      oom: move oom_adj value from task_struct to mm_struct · 2ff05b2b
      David Rientjes 提交于
      The per-task oom_adj value is a characteristic of its mm more than the
      task itself since it's not possible to oom kill any thread that shares the
      mm.  If a task were to be killed while attached to an mm that could not be
      freed because another thread were set to OOM_DISABLE, it would have
      needlessly been terminated since there is no potential for future memory
      freeing.
      
      This patch moves oomkilladj (now more appropriately named oom_adj) from
      struct task_struct to struct mm_struct.  This requires task_lock() on a
      task to check its oom_adj value to protect against exec, but it's already
      necessary to take the lock when dereferencing the mm to find the total VM
      size for the badness heuristic.
      
      This fixes a livelock if the oom killer chooses a task and another thread
      sharing the same memory has an oom_adj value of OOM_DISABLE.  This occurs
      because oom_kill_task() repeatedly returns 1 and refuses to kill the
      chosen task while select_bad_process() will repeatedly choose the same
      task during the next retry.
      
      Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
      oom_kill_task() to check for threads sharing the same memory will be
      removed in the next patch in this series where it will no longer be
      necessary.
      
      Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
      these threads are immune from oom killing already.  They simply report an
      oom_adj value of OOM_DISABLE.
      
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ff05b2b
  10. 29 5月, 2009 1 次提交
  11. 11 5月, 2009 1 次提交
  12. 05 5月, 2009 1 次提交
  13. 17 4月, 2009 1 次提交
  14. 01 4月, 2009 1 次提交
  15. 29 3月, 2009 1 次提交
    • H
      fix setuid sometimes wouldn't · 7c2c7d99
      Hugh Dickins 提交于
      check_unsafe_exec() also notes whether the fs_struct is being
      shared by more threads than will get killed by the exec, and if so
      sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
      But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
      use of get_fs_struct(), which also raises that sharing count.
      
      This might occasionally cause a setuid program not to change euid,
      in the same way as happened with files->count (check_unsafe_exec
      also looks at sighand->count, but /proc doesn't raise that one).
      
      We'd prefer exec not to unshare fs_struct: so fix this in procfs,
      replacing get_fs_struct() by get_fs_path(), which does path_get
      while still holding task_lock, instead of raising fs->count.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: stable@kernel.org
      ___
      
       fs/proc/base.c |   50 +++++++++++++++--------------------------------
       1 file changed, 16 insertions(+), 34 deletions(-)
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c2c7d99
  16. 28 3月, 2009 1 次提交
  17. 18 3月, 2009 1 次提交
    • L
      Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25
      Linus Torvalds 提交于
      Commit ee6f779b ("filp->f_pos not
      correctly updated in proc_task_readdir") changed the proc code to use
      filp->f_pos directly, rather than through a temporary variable.  In the
      process, that caused the operations to be done on the full 64 bits, even
      though the offset is never that big.
      
      That's all fine and dandy per se, but for some unfathomable reason gcc
      generates absolutely horrid code when using 64-bit values in switch()
      statements.  To the point of actually calling out to gcc helper
      functions like __cmpdi2 rather than just doing the trivial comparisons
      directly the way gcc does for normal compares.  At which point we get
      link failures, because we really don't want to support that kind of
      crazy code.
      
      Fix this by just casting the f_pos value to "unsigned long", which
      is plenty big enough for /proc, and avoids the gcc code generation issue.
      Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Zhang Le <r0bertz@gentoo.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee568b25
  18. 16 3月, 2009 1 次提交
    • Z
      filp->f_pos not correctly updated in proc_task_readdir · ee6f779b
      Zhang Le 提交于
      filp->f_pos only get updated at the end of the function. Thus d_off of those
      dirents who are in the middle will be 0, and this will cause a problem in
      glibc's readdir implementation, specifically endless loop. Because when overflow
      occurs, f_pos will be set to next dirent to read, however it will be 0, unless
      the next one is the last one. So it will start over again and again.
      
      There is a sample program in man 2 gendents. This is the output of the program
      running on a multithread program's task dir before this patch is applied:
      
        $ ./a.out /proc/3807/task
        --------------- nread=128 ---------------
        i-node#  file type  d_reclen  d_off   d_name
          506442  directory    16          1  .
          506441  directory    16          0  ..
          506443  directory    16          0  3807
          506444  directory    16          0  3809
          506445  directory    16          0  3812
          506446  directory    16          0  3861
          506447  directory    16          0  3862
          506448  directory    16          8  3863
      
      This is the output after this patch is applied
      
        $ ./a.out /proc/3807/task
        --------------- nread=128 ---------------
        i-node#  file type  d_reclen  d_off   d_name
          506442  directory    16          1  .
          506441  directory    16          2  ..
          506443  directory    16          3  3807
          506444  directory    16          4  3809
          506445  directory    16          5  3812
          506446  directory    16          6  3861
          506447  directory    16          7  3862
          506448  directory    16          8  3863
      Signed-off-by: NZhang Le <r0bertz@gentoo.org>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee6f779b
  19. 06 1月, 2009 1 次提交
  20. 05 1月, 2009 5 次提交
  21. 22 12月, 2008 1 次提交
    • I
      sched: fix warning in fs/proc/base.c · 826e08b0
      Ingo Molnar 提交于
      Stephen Rothwell reported this new (harmless) build warning on platforms that
      define u64 to long:
      
       fs/proc/base.c: In function 'proc_pid_schedstat':
       fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
      
      asm-generic/int-l64.h platforms strike again: that file should be eliminated.
      
      Fix it by casting the parameters to long long.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      826e08b0
  22. 18 12月, 2008 1 次提交
  23. 11 12月, 2008 1 次提交
  24. 14 11月, 2008 2 次提交
  25. 21 10月, 2008 1 次提交
  26. 10 10月, 2008 1 次提交