1. 20 Aug 2012, 3 commits
  2. 17 Aug 2012, 6 commits
    • autofs4 - fix expire check · d807ff83
      Committed by Ian Kent
      In some cases when an autofs indirect mount is contained in a file
      system that is marked as shared (such as when systemd does the
      equivalent of "mount --make-rshared /" early in the boot), mounts
      stop expiring.
      
      When this happens the first expiry check on a mountpoint dentry in
      autofs_expire_indirect() sees a mountpoint dentry with a higher
      than minimal reference count. Consequently the dentry is considered
      busy and the actual expiry check is never done.
      
      This particular check was originally meant as an optimisation to
      detect a path walk in progress but with the addition of rcu-walk
      it can be ineffective anyway.
      
      Removing the test allows automounts to expire again since the
      actual expire check doesn't rely on the dentry reference count.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext4: fix kernel BUG on large-scale rm -rf commands · 89a4e48f
      Committed by Theodore Ts'o
      Commit 968dee77: "ext4: fix hole punch failure when depth is greater
      than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
      crashes when users ran "rm -rf" on large directory hierarchies on
      ext4 filesystems on RAID devices:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      
          Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
          Call Trace:
           [<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110
           [<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
           [<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0
           [<ffffffff81207e05>] ext4_truncate+0xf5/0x100
           [<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
           [<ffffffff811a1312>] evict+0xa2/0x1a0
           [<ffffffff811a1513>] iput+0x103/0x1f0
           [<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
           [<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40
           [<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
           [<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
          Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00
      
          RIP  [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
      
      This could be reproduced as follows:
      
      The problem in commit 968dee77 was that it caused the variable 'i' to
      be left uninitialized if the truncate required more space than was
      available in the journal.  This resulted in the function
      ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
      ext4_ext_remove_space() to restart the truncate operation after
      starting a new jbd2 handle.
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Reported-by: Marti Raudsepp <marti@juffo.org>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: fix long mount times on very big file systems · 0548bbb8
      Committed by Theodore Ts'o
      Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
      ext4_statfs()" introduced an O(n**2) calculation which makes very large
      file systems take forever to mount.  Fix this with an optimization for
      non-bigalloc file systems.  (For bigalloc file systems the overhead
      needs to be set in the superblock.)
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: don't call ext4_error while block group is locked · 7a4c5de2
      Committed by Theodore Ts'o
      While in ext4_validate_block_bitmap(), if a block allocation bitmap
      is found to be invalid, we call ext4_error() while the block group is
      still locked.  This causes ext4_commit_super() to call a function
      which might sleep while in an atomic context.
      
      There's no need to keep the block group locked at this point, so hoist
      the ext4_error() call up to ext4_validate_block_bitmap() and release
      the block group spinlock before calling ext4_error().
      
      The reported stack trace can be found at:
      
      	http://article.gmane.org/gmane.comp.file-systems.ext4/33731
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
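The commit describes a locking pattern but shows no code. The pattern is: do the check under the spinlock, drop the lock, and only then call something that might sleep. Below is a minimal user-space model of it, not the ext4 code. Every name is hypothetical. `struct group` and its `locked` flag stand in for the block group and its spinlock. `report_error()` stands in for ext4_error() and asserts that the lock is not held, the way a might-sleep check would complain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 'locked' stands in for the block group spinlock. */
struct group {
    bool locked;
    uint64_t bad_block;    /* 0 means the bitmap looked valid */
    int errors_reported;
};

static void group_lock(struct group *g)
{
    assert(!g->locked);
    g->locked = true;
}

static void group_unlock(struct group *g)
{
    assert(g->locked);
    g->locked = false;
}

/* Stand-in for ext4_error(): it may sleep, so it must never run while the
 * spinlock is held (the assert plays the role of a might-sleep check). */
static void report_error(struct group *g, uint64_t block)
{
    assert(!g->locked);
    (void)block;           /* a real implementation would log the block */
    g->errors_reported++;
}

/* Runs with the lock held; only reports which block is bad (0 if none). */
static uint64_t check_bitmap_locked(const struct group *g)
{
    assert(g->locked);
    return g->bad_block;
}

/* Check under the lock, drop the lock, then report. Returns 0 or -1. */
static int validate_block_bitmap(struct group *g)
{
    group_lock(g);
    uint64_t bad = check_bitmap_locked(g);
    group_unlock(g);

    if (bad != 0) {
        report_error(g, bad);
        return -1;
    }
    return 0;
}
```

The locked helper only returns what it found. The caller decides what to do after the lock is released, which mirrors "hoist the ext4_error() call up".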
    • autofs4 - fix get_next_positive_subdir() · a45440f0
      Committed by Ian Kent
      Following a report of a crash during an automount expire I found that
      the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong.
      Not only is the locking wrong but the function is more complex than it
      needs to be.
      
      The function is meant to calculate (and dget) the next entry in the list
      of directories contained in the root of an autofs mount point (an autofs
      indirect mount to be precise). The main problem was that the d_lock of
      the owner of the list was not being taken when walking the list, which
      led to list corruption under load. The only other lock that needs to
      be taken is against the next dentry candidate so it can be checked for
      usability.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
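The locking rule described above can be sketched as follows: hold the list owner's lock for the whole walk, and take each candidate's own lock only while checking and referencing it. This is a user-space model, not the autofs4 code. The structures are hypothetical, an array replaces the kernel's child dentry list, `locked` flags stand in for d_lock, and incrementing `count` stands in for dget().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a dentry: 'locked' stands in for d_lock. */
struct dentry_model {
    bool locked;
    bool positive;   /* has an inode */
    int count;       /* reference count */
};

/* Hypothetical model of the directory that owns the child list. */
struct dir_model {
    bool locked;     /* the owner's d_lock */
    struct dentry_model *children;
    size_t nr_children;
};

static void model_lock(bool *l)   { assert(!*l); *l = true; }
static void model_unlock(bool *l) { assert(*l);  *l = false; }

/* Return the index of the first positive child at or after 'start', with a
 * reference taken on it, or -1 if there is none. The owner's lock is held
 * for the entire walk (the step the original code was missing); each
 * candidate's lock is held only while it is checked and referenced. */
static long next_positive_subdir(struct dir_model *dir, size_t start)
{
    long found = -1;

    model_lock(&dir->locked);
    for (size_t i = start; i < dir->nr_children; i++) {
        struct dentry_model *d = &dir->children[i];

        model_lock(&d->locked);          /* parent first, then child */
        if (d->positive) {
            d->count++;                  /* dget() equivalent */
            found = (long)i;
        }
        model_unlock(&d->locked);

        if (found >= 0)
            break;
    }
    model_unlock(&dir->locked);
    return found;
}
```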
    • vfs: fix propagation of atomic_open create error on negative dentry · 62b2ce96
      Committed by Sage Weil
      If ->atomic_open() returns -ENOENT, we take care to return the create
      error (e.g., EACCES), if any.  Do the same when ->atomic_open() returns 1
      and provides a negative dentry.
      
      This fixes a regression where an unprivileged open O_CREAT fails with
      ENOENT instead of EACCES, introduced with the new atomic_open code.  It
      is tested by the open/08.t test in the pjd posix test suite, and was
      observed on top of fuse (backed by ceph-fuse).
      Signed-off-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
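The error-selection rule above can be written as a small decision function. This is a hypothetical helper, not the VFS code. It treats "->atomic_open() returned -ENOENT" and "->atomic_open() returned 1 with a negative dentry" the same way. In both cases it prefers a pending create error such as -EACCES over -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper expressing the rule described above.
 *   atomic_ret       value returned by ->atomic_open()
 *   dentry_negative  whether the dentry it left behind is negative
 *   create_error     0, or the negative errno (e.g. -EACCES) that a create
 *                    attempt would have failed with
 * Returns the error to report, or 0 when the rule does not apply. */
static int negative_dentry_error(int atomic_ret, bool dentry_negative,
                                 int create_error)
{
    assert(create_error <= 0);

    bool negative = atomic_ret == -ENOENT ||
                    (atomic_ret == 1 && dentry_negative);
    if (!negative)
        return 0;

    /* Prefer the create error over a bare "no such file". */
    return create_error != 0 ? create_error : -ENOENT;
}
```

With this rule, an unprivileged open(O_CREAT) that cannot create the file reports EACCES instead of ENOENT in both paths.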
  3. 15 Aug 2012, 4 commits
  4. 09 Aug 2012, 1 commit
  5. 07 Aug 2012, 1 commit
    • fuse: verify all ioctl retry iov elements · fb6ccff6
      Committed by Zach Brown
      Commit 7572777e attempted to verify that
      the total iovec from the client doesn't overflow iov_length() but it
      only checked the first element.  The iovec could still overflow by
      starting with a small element.  The obvious fix is to check all the
      elements.
      
      The overflow case doesn't look dangerous to the kernel as the copy is
      limited by the length after the overflow.  This fix restores the
      intention of returning an error instead of successfully copying less
      than the iovec represented.
      
      I found this by code inspection.  I built it but don't have a test case.
      I'm cc:ing stable because the initial commit did as well.
      Signed-off-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: <stable@vger.kernel.org>         [2.6.37+]
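The difference between checking only the first element and checking all of them can be shown in user space with the POSIX `struct iovec`. This is a sketch of the idea, not the fuse code. `max` is an assumed byte budget (the kernel's actual limit is not given in the commit), and both function names are hypothetical. Subtracting each element from the remaining budget keeps the running total at or below `max`, so it can never wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Every element is checked against the remaining budget 'max', so the sum
 * of all iov_len values can never exceed 'max' or overflow size_t.
 * Returns 0 if the whole iovec fits, -1 otherwise. */
static int verify_iov(const struct iovec *iov, size_t count, size_t max)
{
    assert(iov != NULL || count == 0);
    for (size_t n = 0; n < count; n++) {
        if (iov[n].iov_len > max)
            return -1;              /* this element exceeds what is left */
        max -= iov[n].iov_len;      /* cannot underflow: checked above */
    }
    return 0;
}

/* The incomplete variant for comparison: only the first element is checked,
 * so a small first element lets a huge second element through. */
static int verify_first_only(const struct iovec *iov, size_t count, size_t max)
{
    if (count > 0 && iov[0].iov_len > max)
        return -1;
    return 0;
}
```

An iovec of { 16, SIZE_MAX } passes `verify_first_only()` but is rejected by `verify_iov()`, which is the "starting with a small element" case from the commit message.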
  6. 06 Aug 2012, 2 commits
    • ext4: avoid kmemcheck complaint from reading uninitialized memory · 7e731bc9
      Committed by Theodore Ts'o
      Commit 03179fe9 introduced a kmemcheck complaint in
      ext4_da_get_block_prep() because we save and restore
      ei->i_da_metadata_calc_last_lblock even though it is left
      uninitialized in the case where i_da_metadata_calc_len is zero.
      
      This doesn't hurt anything, but silencing the kmemcheck complaint
      makes it easier for people to find real bugs.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
      (which is marked as a regression).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: make sure the journal sb is written in ext4_clear_journal_err() · d796c52e
      Committed by Theodore Ts'o
      After we transfer the error by setting the EXT4_ERROR_FS bit in the file
      system superblock, it's not enough to call jbd2_journal_clear_err() to
      clear the error indication from the journal superblock --- we need to call
      jbd2_journal_update_sb_errno() as well.  Otherwise, when the root file
      system is mounted read-only, the journal is replayed, and the error
      indicator is transferred to the superblock --- but the s_errno field
      in the jbd2 superblock is left set (since although we cleared it in
      memory, we never flushed it out to disk).
      
      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      
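The bug is the gap between an in-memory copy and its on-disk copy. A tiny model makes that gap visible. The struct and functions below are hypothetical stand-ins. `journal_clear_err()` plays the role of jbd2_journal_clear_err() and touches only memory. `journal_update_sb_errno()` plays the role of jbd2_journal_update_sb_errno() and writes the in-memory value to "disk".

```c
#include <assert.h>

/* Hypothetical model of the jbd2 superblock's s_errno field. */
struct journal_model {
    int mem_errno;   /* in-memory copy */
    int disk_errno;  /* what e2fsck will later read from disk */
};

/* Like jbd2_journal_clear_err(): clears only the in-memory copy. */
static void journal_clear_err(struct journal_model *j)
{
    j->mem_errno = 0;
}

/* Like jbd2_journal_update_sb_errno(): flushes the in-memory value to disk. */
static void journal_update_sb_errno(struct journal_model *j)
{
    j->disk_errno = j->mem_errno;
}

/* After the error has been transferred to the file system superblock,
 * clear it in the journal and make sure the cleared value reaches disk. */
static void clear_journal_err(struct journal_model *j)
{
    journal_clear_err(j);
    journal_update_sb_errno(j);
    assert(j->disk_errno == 0);   /* nothing stale left for e2fsck */
}
```

Calling only `journal_clear_err()` leaves `disk_errno` set. That is the confusing state the commit describes.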
  7. 04 Aug 2012, 13 commits
  8. 03 Aug 2012, 1 commit
    • ceph: simplify+fix atomic_open · 5ef50c3b
      Committed by Sage Weil
      The initial ->atomic_open op was carried over from the old intent code,
      which was incomplete and didn't really work.  Replace it with a fresh
      method.  In particular:
      
       * always attempt to do an atomic open+lookup, both for the create case
         and for lookups of existing files.
       * fix symlink handling by returning 1 to the VFS so that we can follow
         the link to its destination. This fixes a longstanding ceph bug (#2392).
      Signed-off-by: Sage Weil <sage@inktank.com>
  9. 02 Aug 2012, 6 commits
  10. 01 Aug 2012, 3 commits
    • nfs: prevent page allocator recursions with swap over NFS. · 192e501b
      Committed by Mel Gorman
      GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
      just not of any filesystem data.
      
      The problem is that previously NOFS was correct because that avoids
      recursion into the NFS code.  With swap-over-NFS, it is no longer correct
      as swap IO can lead to this recursion.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
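The permission ordering can be made concrete with a small bitmask model. The bit values are made up for illustration, and the real kernel constants differ. The way the composites are built mirrors kernels of this era:

- GFP_NOIO = __GFP_WAIT
- GFP_NOFS = __GFP_WAIT | __GFP_IO
- GFP_KERNEL = __GFP_WAIT | __GFP_IO | __GFP_FS

`safe_from_nfs_recursion()` is a hypothetical helper that encodes the argument above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's real constants. */
enum {
    MODEL_GFP_WAIT = 1 << 0,  /* may sleep */
    MODEL_GFP_IO   = 1 << 1,  /* may start physical I/O, including swap I/O */
    MODEL_GFP_FS   = 1 << 2,  /* may call into filesystem code */
};

#define MODEL_GFP_NOIO   (MODEL_GFP_WAIT)
#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* NOIO permits a subset of what NOFS permits ("NOFS is more permissive"). */
static_assert((MODEL_GFP_NOIO & ~MODEL_GFP_NOFS) == 0,
              "GFP_NOIO must be a subset of GFP_NOFS");

static bool may_start_io(unsigned flags) { return (flags & MODEL_GFP_IO) != 0; }
static bool may_enter_fs(unsigned flags) { return (flags & MODEL_GFP_FS) != 0; }

/* Without swap over NFS, keeping out of filesystem code is enough to avoid
 * re-entering NFS. With it, swap I/O itself goes through NFS, so the
 * allocation must not start any I/O at all. */
static bool safe_from_nfs_recursion(unsigned flags, bool swap_over_nfs)
{
    if (swap_over_nfs)
        return !may_start_io(flags);
    return !may_enter_fs(flags);
}
```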
    • nfs: enable swap on NFS · a564b8f0
      Committed by Mel Gorman
      Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
      will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
      PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
      ->connect() method.
      
      PF_MEMALLOC should allow the allocation of struct socket and related
      objects and the early (re)setting of SOCK_MEMALLOC should allow us to
      receive the packets required for the TCP connection buildup.
      
      [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
      [dfeng@redhat.com: Fix handling of multiple swap files]
      [a.p.zijlstra@chello.nl: Original patch]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nfs: disable data cache revalidation for swapfiles · 29418aa4
      Committed by Mel Gorman
      The VM does not like PG_private set on PG_swapcache pages.  As suggested
      by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
      data cache revalidation on swap files, as it does not make sense to have
      other clients change the file while it is being used as swap.  This avoids
      setting PG_private on swap pages, since there ought to be no further races
      with invalidate_inode_pages2() to deal with.
      
      Since we cannot set PG_private, we cannot use page->private (which
      PG_swapcache pages already use) to store the nfs_page.  Thus augment
      the new nfs_page_find_request logic.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>