1. 27 Aug, 2009 4 commits
  2. 25 Aug, 2009 2 commits
    • NFSv4: Fix an infinite looping problem with the nfs4_state_manager · 7111dc73
      Authored by Trond Myklebust
      Commit 76db6d95 (nfs41: add session setup
      to the state manager) introduces an infinite loop possibility in the NFSv4
      state manager. By first checking nfs4_has_session() before clearing the
      NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
      that flag, but it never gets cleared, and so the state manager loops.
      
      In fact commit c3fad1b1 (nfs41: add session
      reset to state manager) causes this to happen every time we get a network
      partition error.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: Daniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix hugetlb bug due to user_shm_unlock call · 353d5c30
      Authored by Hugh Dickins
      2.6.30's commit 8a0bdec1 removed
      user_shm_lock() calls in hugetlb_file_setup() but left the
      user_shm_unlock call in shm_destroy().
      
      In detail:
      Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
      is not called in hugetlb_file_setup(). However, user_shm_unlock() is
      called unconditionally in shm_destroy(). As a result,
      atomic_dec_and_lock(&up->__count) in free_uid() is executed and, if
      up->__count reaches zero, cleanup_user_struct() is also scheduled.
      
      Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
      However, the ref counter up->__count unexpectedly becomes non-positive and
      the corresponding structs are freed even though there are live
      references to them, resulting in a kernel oops after many
      shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles with CONFIG_USER_SCHED set.
      
      Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
      time of shm_destroy() may give a different answer from at the time
      of hugetlb_file_setup().  And fixed newseg()'s no_id error path,
      which has missed user_shm_unlock() ever since it came in 2.6.9.
      Reported-by: Stefan Huber <shuber2@gmail.com>
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Tested-by: Stefan Huber <shuber2@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 24 Aug, 2009 3 commits
    • ext3: Improve error message that changing journaling mode on remount is not possible · 3c4cec65
      Authored by Jan Kara
      This patch makes the error message about changing journaling mode on remount
      more descriptive. Some people are going to hit this error now due to commit
      bbae8bcc if they configure a kernel to default
      to data=writeback mode. The problem happens if they have data=ordered set for
      the root filesystem in /etc/fstab but not on the kernel command line (and they
      don't use an initrd). Their filesystem then gets mounted as data=writeback by
      the kernel, but their boot then fails because init scripts are unable to remount
      the filesystem rw. A better error message will hopefully make it easier for them
      to find the error in their setup and bother us less with error reports :).
      Signed-off-by: Jan Kara <jack@suse.cz>
    • ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED · 6d418076
      Authored by Theodore Ts'o
      The old description for this configuration option was perhaps not
      completely balanced in terms of describing the tradeoffs of using a
      default of data=writeback vs. data=ordered.  Despite the fact that the old
      description very strongly recommended disabling this feature, all of
      the major distributions have elected to preserve the existing 'legacy'
      default, which is a strong hint that it perhaps wasn't telling the
      whole story.
      
      This revised description has been vetted by a number of ext3
      developers as being better at informing the user about the tradeoffs
      of enabling or disabling this configuration feature.
      
      Cc: linux-ext4@vger.kernel.org
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • kernel_read: redefine offset type · 6777d773
      Authored by Mimi Zohar
      vfs_read() offset is defined as loff_t, but kernel_read()
      offset is only defined as unsigned long. Redefine
      kernel_read() offset as loff_t.
      
      Cc: stable@kernel.org
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  4. 22 Aug, 2009 1 commit
    • Re-introduce page mapping check in mark_buffer_dirty() · 8e9d78ed
      Authored by Linus Torvalds
      In commit a8e7d49a ("Fix race in
      create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
      for a NULL page mapping unintentionally when some of the code inside
      __set_page_dirty() was moved to the callers.
      
      That removal generally didn't matter, since a filesystem would serialize
      truncation (which clears the page mapping) against writing (which marks
      the buffer dirty), so locking at a higher level (either per-page or an
      inode at a time) should mean that the buffer page would be stable.  And
      indeed, nothing bad seemed to happen.
      
      Except it turns out that apparently reiserfs does something odd when
      under load and writing out the journal, and we have a number of bugzilla
      entries that look similar:
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13556
      	http://bugzilla.kernel.org/show_bug.cgi?id=13756
      	http://bugzilla.kernel.org/show_bug.cgi?id=13876
      
      and it looks like reiserfs depended on that check (the common theme
      seems to be "data=journal", and a journal writeback during a truncate).
      
      I suspect reiserfs should have some additional locking, but in the
      meantime this should get us back to the pre-2.6.29 behavior.
      Pattern-pointed-out-by: Roland Kletzing <devzero@web.de>
      Cc: stable@kernel.org (2.6.29 and 2.6.30)
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 21 Aug, 2009 3 commits
  6. 19 Aug, 2009 3 commits
    • mm: revert "oom: move oom_adj value" · 0753ba01
      Authored by KOSAKI Motohiro
      Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
      the mm_struct.  It was a very good first step toward sanitizing OOM handling.
      
      However, Paul Menage reported that the commit causes a regression in his job
      scheduler: the current OOM logic can kill an OOM_DISABLE process.
      
      Why? His program contains code similar to the following.
      
      	...
      	set_oom_adj(OOM_DISABLE); /* The job scheduler must never be killed by the OOM killer */
      	...
      	if (vfork() == 0) {
      		set_oom_adj(0); /* Invoked child can be killed */
      		execve("foo-bar-cmd");
      	}
      	....
      
      A vfork() parent and child share the same mm_struct, so the
      set_oom_adj(0) above changes oom_adj not only for the vfork() child but
      also for the vfork() parent.  The vfork() parent (the job scheduler) then
      loses its OOM immunity and gets killed.
      
      The fork-setting-exec idiom is used very frequently in userland programs.
      We must not break this assumption.
      
      Therefore, this patch reverts commit 2ff05b2b and the related commits.
      
      Reverted commit list
      ---------------------
      - commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
      - commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
      - commit 81236810 (oom: only oom kill exiting tasks with attached memory)
      - commit 933b787b (mm: copy over oom_adj value at fork time)
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signed · 89a4eb4b
      Authored by Jeff Layton
      get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
      to a signed value.  Fix it to use MAX_LFS_FILESIZE which casts properly
      to a positive signed value.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Robert Love <rlove@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix oopses with doubly mounted snapshots · a9245860
      Authored by Ryusuke Konishi
      This fixes kernel oopses like the following:
      
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
       # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
       # umount /test1
       # umount /test2
      
      BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
      in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
      1 lock held by umount.nilfs2/3886:
       #0:  (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c
      irq event stamp: 1219
      hardirqs last  enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119
      hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119
      softirqs last  enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad
      softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a
      Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
      Call Trace:
       [<c1023549>] __might_sleep+0x107/0x10e
       [<c13603c0>] do_page_fault+0x246/0x397
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c135e753>] error_code+0x6b/0x70
       [<c136017a>] ? do_page_fault+0x0/0x397
       [<c104f805>] ? __lock_acquire+0x91/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd
       [<c1050b2b>] lock_acquire+0xba/0xdd
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c135d4fe>] down_write+0x2a/0x46
       [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
       [<c104ea2c>] ? mark_held_locks+0x43/0x5b
       [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133
       [<c104ece4>] ? trace_hardirqs_on+0xb/0xd
       [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2]
       [<c10b3352>] generic_shutdown_super+0x49/0xb8
       [<c10b33de>] kill_block_super+0x1d/0x31
       [<c10e6599>] ? vfs_quota_off+0x0/0x12
       [<c10b398f>] deactivate_super+0x57/0x6c
       [<c10c4bc3>] mntput_no_expire+0x8c/0xb4
       [<c10c5094>] sys_umount+0x27f/0x2a4
       [<c10c50c6>] sys_oldumount+0xd/0xf
       [<c10031a4>] sysenter_do_call+0x12/0x38
       ...
      
      This turns out to be a bug introduced by an -rc1 patch ("nilfs2: simplify
      remaining sget() use").
      
      That patch introduced a new "put resource" function, nilfs_put_sbinfo(),
      to delay freeing the nilfs_sb_info struct.
      
      However, nilfs_put_sbinfo() mistakenly used the atomic_dec_and_test()
      function to check the reference count, which caused the nilfs_sb_info
      to be freed when a user mounted a snapshot twice.
      
      This bug also suggests there was an unnoticed memory leak in ordinary
      mount/umount operations for nilfs.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
  7. 18 Aug, 2009 7 commits
  8. 17 Aug, 2009 1 commit
    • xfs: fix locking in xfs_iget_cache_hit · bc990f5c
      Authored by Christoph Hellwig
      The locking in xfs_iget_cache_hit currently has numerous problems:
      
       - we clear the reclaim tag without i_flags_lock which protects
         modifications to it
       - we call inode_init_always which can sleep with pag_ici_lock
         held (this is oss.sgi.com BZ #819)
       - we acquire and drop i_flags_lock a lot and thus provide no
         consistency between the various flags we set/clear under it
      
      This patch fixes all that with a major revamp of the locking in
      the function.  The new version acquires i_flags_lock early and
      only drops it once we need to call into inode_init_always or before
      calling xfs_ilock.
      
      This patch fixes a bug seen in the wild where we race modifying the
      reclaim tag.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
  9. 16 Aug, 2009 1 commit
  10. 14 Aug, 2009 1 commit
  11. 12 Aug, 2009 13 commits
  12. 11 Aug, 2009 1 commit
    • ocfs2: Fix possible deadlock when extending quota file · b409d7a0
      Authored by Jan Kara
      In OCFS2, allocator locks rank above transaction start. Thus we
      cannot extend a quota file from inside a transaction, lest we
      deadlock.
      
      We solve the problem by starting the transaction not in
      ocfs2_acquire_dquot() but only later, in ocfs2_local_read_dquot() and
      ocfs2_global_read_dquot(), and by allocating blocks to quota files before
      starting the transaction.  If we crash, the quota files will simply have a
      few extra blocks, which is not a problem since we use them the next time we
      extend the quota file.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>