1. 06 9月, 2009 1 次提交
    • O
      exec: do not sleep in TASK_TRACED under ->cred_guard_mutex · a2a8474c
      Oleg Nesterov 提交于
      Tom Horsley reports that his debugger hangs when it tries to read
      /proc/pid_of_tracee/maps, this happens since
      
      	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
      	04b836cbf19e885f8366bccb2e4b0474346c02d
      
      commit in 2.6.31.
      
      But the root of the problem lies in the fact that do_execve() path calls
      tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.
      
      The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
      remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
      another task doing PTRACE_ATTACH should not hang until it is killed or the
      tracee resumes.
      
      With this patch do_execve() does not use ->cred_guard_mutex directly and
      we do not hold it throughout, instead:
      
      	- introduce prepare_bprm_creds() helper, it locks the mutex
      	  and calls prepare_exec_creds() to initialize bprm->cred.
      
      	- install_exec_creds() drops the mutex after commit_creds(),
      	  and thus before tracehook_report_exec()->ptrace_stop().
      
      	  or, if exec fails,
      
      	  free_bprm() drops this mutex when bprm->cred != NULL which
      	  indicates install_exec_creds() was not called.
      Reported-by: NTom Horsley <tom.horsley@att.net>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a2a8474c
  2. 05 9月, 2009 1 次提交
  3. 01 9月, 2009 1 次提交
  4. 29 8月, 2009 1 次提交
  5. 28 8月, 2009 4 次提交
  6. 27 8月, 2009 4 次提交
  7. 25 8月, 2009 2 次提交
    • T
      NFSv4: Fix an infinite looping problem with the nfs4_state_manager · 7111dc73
      Trond Myklebust 提交于
      Commit 76db6d95 (nfs41: add session setup
      to the state manager) introduces an infinite loop possibility in the NFSv4
      state manager. By first checking nfs4_has_session() before clearing the
      NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
      that flag, but it never gets cleared, and so the state manager loops.
      
      In fact commit c3fad1b1 (nfs41: add session
      reset to state manager) causes this to happen every time we get a network
      partition error.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: NDaniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7111dc73
    • H
      mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30
      Hugh Dickins 提交于
      2.6.30's commit 8a0bdec1 removed
      user_shm_lock() calls in hugetlb_file_setup() but left the
      user_shm_unlock call in shm_destroy().
      
      In detail:
      Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
      is not called in hugetlb_file_setup(). However, user_shm_unlock() is
      called in any case in shm_destroy() and in the following
      atomic_dec_and_lock(&up->__count) in free_uid() is executed and if
      up->__count gets zero, also cleanup_user_struct() is scheduled.
      
      Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
      However, the ref counter up->__count gets unexpectedly non-positive and
      the corresponding structs are freed even though there are live
      references to them, resulting in a kernel oops after a lots of
      shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set.
      
      Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
      time of shm_destroy() may give a different answer from at the time
      of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
      which has missed user_shm_unlock() ever since it came in 2.6.9.
      Reported-by: NStefan Huber <shuber2@gmail.com>
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Tested-by: NStefan Huber <shuber2@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      353d5c30
  8. 24 8月, 2009 3 次提交
    • J
      ext3: Improve error message that changing journaling mode on remount is not possible · 3c4cec65
      Jan Kara 提交于
      This patch makes the error message about changing journaling mode on remount
      more descriptive. Some people are going to hit this error now due to commit
      bbae8bcc if they configure a kernel to default
      to data=writeback mode. The problem happens if they have data=ordered set for
      the root filesystem in /etc/fstab but not in the kernel command line (and they
      don't use initrd). Their filesystem then gets mounted as data=writeback by
      kernel but then their boot fails because init scripts won't be able to remount
      the filesystem rw. Better error message will hopefully make it easier for them
      to find the error in their setup and bother us less with error reports :).
      Signed-off-by: NJan Kara <jack@suse.cz>
      3c4cec65
    • T
      ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED · 6d418076
      Theodore Ts'o 提交于
      The old description for this configuration option was perhaps not
      completely balanced in terms of describing the tradeoffs of using a
      default of data=writeback vs. data=ordered.  Despite the fact that old
      description very strongly recomended disabling this feature, all of
      the major distributions have elected to preserve the existing 'legacy'
      default, which is a strong hint that it perhaps wasn't telling the
      whole story.
      
      This revised description has been vetted by a number of ext3
      developers as being better at informing the user about the tradeoffs
      of enabling or disabling this configuration feature.
      
      Cc: linux-ext4@vger.kernel.org
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6d418076
    • M
      kernel_read: redefine offset type · 6777d773
      Mimi Zohar 提交于
      vfs_read() offset is defined as loff_t, but kernel_read()
      offset is only defined as unsigned long. Redefine
      kernel_read() offset as loff_t.
      
      Cc: stable@kernel.org
      Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      6777d773
  9. 22 8月, 2009 1 次提交
    • L
      Re-introduce page mapping check in mark_buffer_dirty() · 8e9d78ed
      Linus Torvalds 提交于
      In commit a8e7d49a ("Fix race in
      create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
      for a NULL page mapping unintentionally when some of the code inside
      __set_page_dirty() was moved to the callers.
      
      That removal generally didn't matter, since a filesystem would serialize
      truncation (which clears the page mapping) against writing (which marks
      the buffer dirty), so locking at a higher level (either per-page or an
      inode at a time) should mean that the buffer page would be stable.  And
      indeed, nothing bad seemed to happen.
      
      Except it turns out that apparently reiserfs does something odd when
      under load and writing out the journal, and we have a number of bugzilla
      entries that look similar:
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13556
      	http://bugzilla.kernel.org/show_bug.cgi?id=13756
      	http://bugzilla.kernel.org/show_bug.cgi?id=13876
      
      and it looks like reiserfs depended on that check (the common theme
      seems to be "data=journal", and a journal writeback during a truncate).
      
      I suspect reiserfs should have some additional locking, but in the
      meantime this should get us back to the pre-2.6.29 behavior.
      Pattern-pointed-out-by: NRoland Kletzing <devzero@web.de>
      Cc: stable@kernel.org (2.6.29 and 2.6.30)
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e9d78ed
  10. 21 8月, 2009 3 次提交
  11. 19 8月, 2009 3 次提交
    • K
      mm: revert "oom: move oom_adj value" · 0753ba01
      KOSAKI Motohiro 提交于
      The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to
      the mm_struct.  It was a very good first step for sanitize OOM.
      
      However Paul Menage reported the commit makes regression to his job
      scheduler.  Current OOM logic can kill OOM_DISABLED process.
      
      Why? His program has the code of similar to the following.
      
      	...
      	set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */
      	...
      	if (vfork() == 0) {
      		set_oom_adj(0); /* Invoked child can be killed */
      		execve("foo-bar-cmd");
      	}
      	....
      
      vfork() parent and child are shared the same mm_struct.  then above
      set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also
      change oom_adj for vfork() parent.  Then, vfork() parent (job scheduler)
      lost OOM immune and it was killed.
      
      Actually, fork-setting-exec idiom is very frequently used in userland program.
      We must not break this assumption.
      
      Then, this patch revert commit 2ff05b2b and related commit.
      
      Reverted commit list
      ---------------------
      - commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
      - commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
      - commit 81236810 (oom: only oom kill exiting tasks with attached memory)
      - commit 933b787b (mm: copy over oom_adj value at fork time)
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0753ba01
    • J
      vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed · 89a4eb4b
      Jeff Layton 提交于
      get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
      to a signed value.  Fix it to use MAX_LFS_FILESIZE which casts properly
      to a positive signed value.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NSteve French <smfrench@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89a4eb4b
    • R
      nilfs2: fix oopses with doubly mounted snapshots · a9245860
      Ryusuke Konishi 提交于
      will fix kernel oopses like the following:
      
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
       # umount /test1
       # umount /test2
      
      BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
      in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
      1 lock held by umount.nilfs2/3886:
       #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
      irq event stamp: 1219
      hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
      hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
      softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
      softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
      Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
      Call Trace:
       [<c1023549>] __might_sleep+0x107/0x10e
       [<c13603c0>] do_page_fault+0x246/0x397
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c135e753>] error_code+0x6b/0x70
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c104f805>] ? __lock_acquire+0x91/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050b2b>] lock_acquire+0xba/0xdd
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c135d4fe>] down_write+0x2a/0x46
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c104ea2c>] ? mark_held_locks+0x43/0x5b
       [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
       [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
       [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
       [<c10b3352>] generic_shutdown_super+0x49/0xb8
       [<c10b33de>] kill_block_super+0x1d/0x31
       [<c10e6599>] ? vfs_quota_off+0x0/0x12
       [<c10b398f>] deactivate_super+0x57/0x6c
       [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
       [<c10c5094>] sys_umount+0x27f/0x2a4
       [<c10c50c6>] sys_oldumount+0xd/0xf
       [<c10031a4>] sysenter_do_call+0x12/0x38
       ...
      
      This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify
      remaining sget() use").
      
      In the patch, a new "put resource" function, nilfs_put_sbinfo()
      was introduced to delay freeing nilfs_sb_info struct.
      
      But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test()
      function to check the reference count, and it caused the nilfs_sb_info
      was freed when user mounted a snapshot twice.
      
      This bug also suggests there was unseen memory leak in usual mount
      /umount operations for nilfs.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      a9245860
  12. 18 8月, 2009 16 次提交