1. 25 July 2011 (5 commits)
  2. 21 July 2011 (5 commits)
    • fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Committed by Josef Bacik
      Btrfs needs to control how filemap_write_and_wait_range() is called in fsync
      to make it less of a painful operation, so push the taking of i_mutex and the
      call to filemap_write_and_wait() down into the ->fsync() handlers.  Some
      filesystems, such as ext3 and ocfs2, can apparently drop taking i_mutex
      altogether.  For correctness' sake I pushed everything down in all cases so
      that current behaviour stays the same for everybody; each individual fs
      maintainer can then decide what to do from there (see the sketch after this
      entry).
      Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      02c24a82
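      A minimal sketch of what an ->fsync() handler looks like once the range
      writeback and i_mutex are taken inside the handler.  "examplefs" is a
      made-up name and the range-based prototype shown here is an assumption;
      the placement of the writeback and locking is the point.

      #include <linux/fs.h>

      /* Sketch only; not the actual patch. */
      static int examplefs_fsync(struct file *file, loff_t start, loff_t end,
                                 int datasync)
      {
              struct inode *inode = file->f_mapping->host;
              int ret;

              /* Writeback of the affected range now happens in the handler... */
              ret = filemap_write_and_wait_range(file->f_mapping, start, end);
              if (ret)
                      return ret;

              /* ...as does taking i_mutex, which the VFS used to take for us. */
              mutex_lock(&inode->i_mutex);
              ret = sync_inode_metadata(inode, 1);  /* a real fs would do its own commit here */
              mutex_unlock(&inode->i_mutex);
              return ret;
      }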
    • fs: move inode_dio_done to the end_io handler · 72c5052d
      Committed by Christoph Hellwig
      For filesystems that delay their end_io processing we should keep our
      i_dio_count until the processing is done.  Enable this by moving the
      inode_dio_done call into the end_io handler if one exists (see the sketch
      after this entry).  Note that the actual move to a workqueue for ext4 and
      XFS is not done in this patch yet, but left to the filesystem maintainers.
      At least for XFS it's not needed yet either, as XFS has an internal
      equivalent to i_dio_count.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      72c5052d
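      A hedged sketch of the new responsibility split: when a filesystem
      registers an end_io handler, that handler (not the generic code) drops the
      i_dio_count reference once its own completion work is finished.  The
      "examplefs" name is made up and the callback's argument list is an
      assumption based on the dio_iodone_t of that era.

      #include <linux/fs.h>
      #include <linux/aio.h>

      static void examplefs_end_io(struct kiocb *iocb, loff_t offset,
                                   ssize_t size, void *private, int ret,
                                   bool is_async)
      {
              struct inode *inode = iocb->ki_filp->f_mapping->host;

              /* The filesystem's own completion work (e.g. unwritten extent
               * conversion) would run or be queued here. */

              /* Drop the pending-DIO reference only after that work is done;
               * the generic code no longer calls inode_dio_done() itself when
               * an end_io handler exists. */
              inode_dio_done(inode);
      }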
    • fs: always maintain i_dio_count · df2d6f26
      Committed by Christoph Hellwig
      Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING.
      This allows these filesystems to also protect truncate against direct I/O
      requests using common code (the request-side pairing is sketched after this
      entry).  Right now the only non-DIO_LOCKING filesystem that appears to do so
      is XFS, which uses an open-coded variant of the i_dio_count scheme.
      
      Behaviour doesn't change for filesystems that never call inode_dio_wait.
      For ext4 behaviour changes when using the dioread_nolock option, which
      previously had no protection between truncate and direct I/O reads.
      For ocfs2 the handcrafted i_dio_count manipulations are replaced with
      the common code, now enabled.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      df2d6f26
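      A sketch of the request-side pairing that the common code now applies to
      every filesystem, DIO_LOCKING or not.  example_submit_dio() and
      example_issue_bio() are illustrative names; only the i_dio_count handling
      reflects the change.

      #include <linux/fs.h>

      static int example_issue_bio(struct inode *inode, loff_t pos, size_t len)
      {
              return 0;       /* stand-in for the real I/O submission */
      }

      static int example_submit_dio(struct inode *inode, loff_t pos, size_t len)
      {
              int ret;

              atomic_inc(&inode->i_dio_count); /* visible to truncate from here on */
              ret = example_issue_bio(inode, pos, len);
              if (ret < 0)
                      inode_dio_done(inode);   /* error path drops the reference */
              /* on success the completion path calls inode_dio_done() */
              return ret;
      }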
    • fs: move inode_dio_wait calls into ->setattr · 562c72aa
      Committed by Christoph Hellwig
      Let filesystems handle waiting for direct I/O requests themselves instead
      of doing it beforehand.  This means filesystem-specific locks that prevent
      new dio references from appearing can be held while waiting.  This is
      important for generalizing i_dio_count to non-DIO_LOCKING filesystems
      (a sketch of such a ->setattr follows this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      562c72aa
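      A sketch of a ->setattr that does the drain itself.  inode_change_ok(),
      inode_dio_wait(), truncate_setsize() and setattr_copy() are real helpers of
      that era; "examplefs" is illustrative, and a real filesystem would hold its
      own DIO-exclusion lock (like XFS's i_iolock) around the wait.

      #include <linux/fs.h>

      static int examplefs_setattr(struct dentry *dentry, struct iattr *attr)
      {
              struct inode *inode = dentry->d_inode;
              int error;

              error = inode_change_ok(inode, attr);
              if (error)
                      return error;

              if (attr->ia_valid & ATTR_SIZE) {
                      /* The filesystem-specific lock that stops new direct I/O
                       * from starting would be taken here, so nothing can sneak
                       * in between the wait and the size change. */
                      inode_dio_wait(inode);
                      truncate_setsize(inode, attr->ia_size);
              }

              setattr_copy(inode, attr);
              mark_inode_dirty(inode);
              return 0;
      }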
    • fs: kill i_alloc_sem · bd5fe6c5
      Committed by Christoph Hellwig
      i_alloc_sem is a rather special rw_semaphore.  It's the last one that may
      be released by a non-owner, and its write side is always mirrored by
      real exclusion.  Its intended use is to wait for all pending direct I/O
      requests to finish before starting a truncate.
      
      Replace it with a hand-grown construct:
      
       - exclusion for truncates is already guaranteed by i_mutex, so it can
         simply fall away
       - the reader side is replaced by an i_dio_count member in struct inode
         that counts the number of pending direct I/O requests.  Truncate can't
         proceed as long as it's non-zero
       - when i_dio_count reaches zero we wake up a pending truncate using
         wake_up_bit on a new bit in i_flags
       - new references to i_dio_count can't appear while we are waiting for
         it to read zero because the direct I/O count always needs i_mutex
         (or an equivalent like XFS's i_iolock) for starting a new operation.
      
      This scheme is much simpler and saves the space of a spinlock_t and a
      struct list_head in struct inode (typically 160 bits on a non-debug 64-bit
      system); a toy model of the scheme follows this entry.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bd5fe6c5
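      A toy model of the scheme, assuming nothing beyond the description above.
      The real code avoids a dedicated waitqueue by using wake_up_bit() on an
      inode flag bit; this sketch trades that space saving for simplicity.

      #include <linux/atomic.h>
      #include <linux/wait.h>

      struct example_inode {
              atomic_t                dio_count;  /* pending direct I/O requests */
              wait_queue_head_t       dio_wq;     /* a waiting truncate sleeps here */
      };

      /* Direct I/O side: take a reference before submitting a request.
       * New requests need i_mutex (or an equivalent), so none can appear
       * while a truncate is already waiting. */
      static void example_dio_begin(struct example_inode *ei)
      {
              atomic_inc(&ei->dio_count);
      }

      /* Completion side: drop the reference; wake a waiting truncate at zero. */
      static void example_dio_done(struct example_inode *ei)
      {
              if (atomic_dec_and_test(&ei->dio_count))
                      wake_up_all(&ei->dio_wq);
      }

      /* Truncate side: wait for all pending requests to drain. */
      static void example_dio_wait(struct example_inode *ei)
      {
              wait_event(ei->dio_wq, atomic_read(&ei->dio_count) == 0);
      }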
  3. 20 July 2011 (6 commits)
  4. 04 June 2011 (1 commit)
    • more conservative S_NOSEC handling · 9e1f1de0
      Committed by Al Viro
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another client and
      silently picked up by this client on revalidate, all of that *without*
      clearing the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is:
      	* a new superblock flag; unless it is set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, and so does btrfs's ->mount(); users of mount_bdev() other
      than local block ones clear it).
      	* if any network (or cluster) filesystem wants to use S_NOSEC, it'll
      need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked up from other clients (see the sketch
      after this entry).
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9e1f1de0
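      A sketch of the opt-in rule, with an illustrative helper name; the real
      check sits in the suid-stripping path, and mount_bdev() is what sets
      MS_NOSEC for local block filesystems.

      #include <linux/fs.h>

      static void example_mark_nosec(struct inode *inode)
      {
              /* Cache "nothing left to strip" only if the filesystem opted in. */
              if (inode->i_sb->s_flags & MS_NOSEC)
                      inode->i_flags |= S_NOSEC;
      }

      As the message above notes, a network or cluster filesystem that does set
      MS_NOSEC must also clear S_NOSEC whenever it picks up attribute changes
      from another client.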
  5. 27 May 2011 (3 commits)
    • Ocfs2/move_extents: Validate moving goal after the adjustment. · ea5e1675
      Committed by Tristan Ye
      Though the goal_to_be_moved will be validated again during the subsequent
      move, it's still a good idea to validate it right after the adjustment at
      the very beginning, rather than validating it before the adjustment.
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      ea5e1675
    • Ocfs2/move_extents: Avoid doing division in extent moving. · 6aea6f50
      Committed by Tristan Ye
      It's not wise to do a 64-bit division anywhere on the kernel side; replace
      it with a proper helper or shifts (see the sketch after this entry).
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      6aea6f50
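      For reference, the two usual alternatives to a raw 64-bit "/" in kernel
      code, sketched with made-up names; div_u64() comes from <linux/math64.h>.

      #include <linux/math64.h>

      /* General case: use the helper instead of "/", which would need libgcc
       * support on 32-bit architectures. */
      static u64 example_groups(u64 clusters, u32 clusters_per_group)
      {
              return div_u64(clusters, clusters_per_group);
      }

      /* Power-of-two divisor: a plain shift, no division at all. */
      static u64 example_groups_pow2(u64 clusters, unsigned int group_shift)
      {
              return clusters >> group_shift;
      }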
    • ocfs2: add cleancache support · 1cfd8bd0
      Committed by Dan Magenheimer
      This eighth patch of eight in the cleancache series "opts in" to
      cleancache for ocfs2.  Clustered filesystems must explicitly enable
      cleancache by calling cleancache_init_shared_fs whenever an instance
      of the filesystem is mounted (see the sketch after this entry).  Ocfs2
      is currently the only user of the clustered-filesystem interface, but
      nevertheless the cleancache hooks in the VFS layer are sufficient for
      ocfs2, including the matching cleancache_flush_fs hook, which must be
      called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v8: trivial merge conflict update]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Nitin Gupta <ngupta@vflare.org>
      1cfd8bd0
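      A sketch of the opt-in, assuming the (uuid, superblock) form of the
      shared-fs init hook in that era's <linux/cleancache.h>; "examplefs" and
      the uuid placeholder are illustrative.  The matching flush on unmount is
      handled by the VFS-level hooks, per the message above.

      #include <linux/cleancache.h>
      #include <linux/fs.h>

      static int examplefs_fill_super(struct super_block *sb, void *data,
                                      int silent)
      {
              static char shared_uuid[16];    /* stand-in for the on-disk volume
                                                 uuid, identical on every node */

              /* ... normal superblock setup would happen here ... */

              /* Clustered filesystems must opt in explicitly on every mount;
               * a purely local filesystem would call cleancache_init_fs(sb). */
              cleancache_init_shared_fs(shared_uuid, sb);
              return 0;
      }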
  6. 26 May 2011 (6 commits)
  7. 25 May 2011 (14 commits)