1. 15 Sep, 2009, 23 commits
  2. 14 Sep, 2009, 17 commits
    • C
      fsync: wait for data writeout completion before calling ->fsync · 2daea67e
      Christoph Hellwig committed
      Currently vfs_fsync(_range) first calls filemap_fdatawrite to write out
      the data, then calls into ->fsync to write out the metadata and then finally
      calls filemap_fdatawait to wait for the data I/O to complete.  What sounds
      like a clever micro-optimization is actually a nasty trap for many filesystems.
      
      For many modern filesystems i_size or other inode information is only
      updated on I/O completion and we need to wait for I/O to finish before
      we can write out the metadata.  For old-fashioned filesystems that
      instantiate blocks during the actual write and also update the metadata
      at that point it opens up a large window where we could expose uninitialized
      blocks after a crash.  While a few filesystems that need it already wait
      for the I/O to finish inside their ->fsync methods it is rather suboptimal
      as it is done under the i_mutex and also always for the whole file instead
      of just a part as we could do for O_SYNC handling.
      
      Here is a small audit of all fsync instances in the tree:
      
       - spufs_mfc_fsync:
       - ps3flash_fsync:
       - vol_cdev_fsync:
       - printer_fsync:
       - fb_deferred_io_fsync:
       - bad_file_fsync:
       - simple_sync_file:
      
      	don't care - filesystems/drivers don't use the page cache or are
      	purely in-memory.
      
       - simple_fsync:
       - file_fsync:
       - affs_file_fsync:
       - fat_file_fsync:
       - jfs_fsync:
       - ubifs_fsync:
       - reiserfs_dir_fsync:
       - reiserfs_sync_file:
      
      	never touch pagecache themselves.  We need to wait beforehand if we do
      	not want to expose stale data after an allocation.
      
       - afs_fsync:
       - fuse_fsync_common:
      
      	do the writeback and waiting themselves in awkward ways, would benefit from
      	proper semantics
      
       - block_fsync:
      
      	Does a filemap_write_and_wait on the block device inode.  Because we
      	now have f_mapping that is the same inode we call it on in vfs_fsync.
      	So just removing it and letting the VFS do the work in one go would
      	be an improvement.
      
       - btrfs_sync_file:
       - cifs_fsync:
       - xfs_file_fsync:
      
      	need the wait first and currently do it themselves. would benefit from
      	doing it outside i_mutex.
      
       - coda_fsync:
       - ecryptfs_fsync:
       - exofs_file_fsync:
       - shm_fsync:
      
      	only passes the fsync through to the lower layer
      
       - ext3_sync_file:
      
      	doesn't seem to care, comments are confusing.
      
       - ext4_sync_file:
      
      	would need the wait to work correctly for delalloc mode with late
      	i_size updates.  Otherwise the ext3 comment applies.
      
      	currently implements its own writeback and wait in an odd way,
      	could benefit from doing it properly.
      
       - gfs2_fsync:
      
      	not needed for journaled data mode, but probably harmless there.
      	Currently writes back data asynchronously itself.  Needs some
      	major audit.
      
       - hostfs_fsync:
      
      	just calls fsync/datasync on the host FD.  Without the wait before
      	data might not even be inflight yet if we're unlucky.
      
       - hpfs_file_fsync:
       - ncp_fsync:
      
      	no-ops.  Dangerous before and after.
      
       - jffs2_fsync:
      
      	just calls jffs2_flush_wbuf_gc, not sure how this relates to data.
      
       - nfs_fsync_dir:
      
      	just increments stats, claims all directory operations are synchronous
      
       - nfs_file_fsync:
      
      	only writes out data???  Looks very odd.
      
       - nilfs_sync_file:
      
      	looks like it expects all data done, but not sure from the code
      
       - ntfs_dir_fsync:
       - ntfs_file_fsync:
      
      	appear to do their own data writeback.  Very convoluted code.
      
       - ocfs2_sync_file:
      
      	does its own data writeback, but no wait.  probably needs the wait.
      
       - smb_fsync:
      
      	according to a comment expects all pages written already, probably needs
      	the wait before.
      
      This patch only changes vfs_fsync_range, removal of the wait in the methods
      that have it is left to the filesystem maintainers.  Note that most
      filesystems really do need an audit for their fsync methods given the
      gems found in this very brief audit.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
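The reordering described in the fsync commit above (2daea67e) can be illustrated with a small userspace model. This is only a sketch, not kernel code: `toy_inode` and every `toy_*` function are hypothetical stand-ins for starting data writeout, waiting for I/O completion, and a filesystem's ->fsync method. It assumes a filesystem that updates its size only when I/O completes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_inode {
	size_t pending_bytes; /* data submitted for writeout, I/O still in flight */
	size_t incore_size;   /* size, updated only when the I/O completes */
	size_t ondisk_size;   /* size last written out by the ->fsync stand-in */
};

/* Stand-in for starting data writeout: I/O is queued, not finished. */
static void toy_fdatawrite(struct toy_inode *inode, size_t dirty_bytes)
{
	inode->pending_bytes += dirty_bytes;
}

/* Stand-in for waiting on data I/O: completion updates the in-core size. */
static void toy_fdatawait(struct toy_inode *inode)
{
	inode->incore_size += inode->pending_bytes;
	inode->pending_bytes = 0;
}

/* Stand-in for ->fsync: writes out whatever metadata is current right now. */
static void toy_fsync_method(struct toy_inode *inode)
{
	inode->ondisk_size = inode->incore_size;
}

/* Old order: write, ->fsync, wait. Metadata is written before I/O completes. */
size_t toy_fsync_old(struct toy_inode *inode, size_t dirty_bytes)
{
	toy_fdatawrite(inode, dirty_bytes);
	toy_fsync_method(inode);
	toy_fdatawait(inode);
	return inode->ondisk_size;
}

/* New order: write, wait, ->fsync. Metadata is written after I/O completes. */
size_t toy_fsync_new(struct toy_inode *inode, size_t dirty_bytes)
{
	toy_fdatawrite(inode, dirty_bytes);
	toy_fdatawait(inode);
	assert(inode->pending_bytes == 0); /* no data I/O in flight any more */
	toy_fsync_method(inode);
	return inode->ondisk_size;
}
```

In this model the old order leaves `ondisk_size` stale after one dirty page, while the new order records the full size, which is the trap the commit message describes for filesystems with late i_size updates.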
    • J
      vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() · 18f2ee70
      Jan Kara committed
      Remove these three functions since nobody uses them anymore.
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      fat: Opencode sync_page_range_nolock() · 2f3d675b
      Jan Kara committed
      fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the
      only user of generic_osync_inode() which does not have a file open.  So
      open-code the needed actions for FAT so that we can convert generic_osync_inode() to
      a standard syncing path.
      
      Update a comment about generic_osync_inode().
      
      CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      pohmelfs: Use new syncing helper · aa3caafe
      Jan Kara committed
      Use new generic_write_sync() helper instead of sync_page_range().
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() · af0f4414
      Jan Kara committed
      Christoph Hellwig says that it is enough for XFS to call
      filemap_write_and_wait_range() instead of sync_page_range() because we do
      all the metadata syncing when forcing the log.
      
      CC: Felix Blyakher <felixb@sgi.com>
      CC: xfs@oss.sgi.com
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ocfs2: Update syncing after splicing to match generic version · d23c937b
      Jan Kara committed
      Update ocfs2 specific splicing code to use generic syncing helper. The sync now
      does not happen under rw_lock because generic_write_sync() acquires i_mutex
      which ranks above rw_lock. That should not matter because standard fsync path
      does not hold it either.
      Acked-by: Joel Becker <Joel.Becker@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      CC: ocfs2-devel@oss.oracle.com
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ntfs: Use new syncing helpers and update comments · ebbbf757
      Jan Kara committed
      Use new syncing helpers in .write and .aio_write functions. Also
      remove superfluous syncing in ntfs_file_buffered_write() and update
      comments about generic_osync_inode().
      
      CC: Anton Altaparmakov <aia21@cantab.net>
      CC: linux-ntfs-dev@lists.sourceforge.net
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ext4: Remove syncing logic from ext4_file_write · 0d34ec62
      Jan Kara committed
      The syncing is now properly handled by generic_file_aio_write() so
      no special ext4 code is needed.
      
      CC: linux-ext4@vger.kernel.org
      CC: tytso@mit.edu
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ext3: Remove syncing logic from ext3_file_write · e367626b
      Jan Kara committed
      Syncing is now properly done by generic_file_aio_write() so no special logic is
      needed in ext3.
      
      CC: linux-ext4@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ext2: Update comment about generic_osync_inode · a2a735ad
      Jan Kara committed
      We rely on generic_write_sync() now.
      
      CC: linux-ext4@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode · 148f948b
      Jan Kara committed
      Introduce a new function for generic inode syncing (vfs_fsync_range) and use
      it from the fsync() path. Also introduce a new helper for syncing after a sync
      write (generic_write_sync) using the generic function.
      
      Use these new helpers for syncing from generic VFS functions. This makes
      O_SYNC writes to block devices acquire i_mutex for syncing. If we really
      care about this, we can make block_fsync() drop the i_mutex and reacquire
      it before it returns.
      
      CC: Evgeniy Polyakov <zbr@ioremap.net>
      CC: ocfs2-devel@oss.oracle.com
      CC: Joel Becker <joel.becker@oracle.com>
      CC: Felix Blyakher <felixb@sgi.com>
      CC: xfs@oss.sgi.com
      CC: Anton Altaparmakov <aia21@cantab.net>
      CC: linux-ntfs-dev@lists.sourceforge.net
      CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      CC: linux-ext4@vger.kernel.org
      CC: tytso@mit.edu
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
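The helper introduced in the commit above (148f948b) syncs after a write only when the file is O_SYNC or the inode is IS_SYNC, and then only over the range just written. A rough userspace sketch of that decision follows. It is not the kernel implementation: `toy_file`, `toy_fsync_range` and `toy_write_sync` are hypothetical names, the two booleans stand in for the O_SYNC and IS_SYNC checks, and skipping empty writes is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_file {
	bool o_sync;            /* stand-in for the O_SYNC open flag */
	bool inode_is_sync;     /* stand-in for IS_SYNC() on the inode */
	long long synced_start; /* first byte of the last range synced */
	long long synced_end;   /* last byte (inclusive) of the last range synced */
	int sync_calls;         /* how many times a range sync was issued */
};

/* Stand-in for a ranged fsync: just records what it was asked to sync. */
static int toy_fsync_range(struct toy_file *file, long long start, long long end)
{
	assert(start <= end);
	file->synced_start = start;
	file->synced_end = end;
	file->sync_calls++;
	return 0;
}

/*
 * Sync the bytes [pos, pos + count - 1] after a write, but only for
 * synchronous files or inodes. Empty writes are skipped (sketch assumption).
 */
int toy_write_sync(struct toy_file *file, long long pos, long long count)
{
	if (count <= 0)
		return 0;
	if (!file->o_sync && !file->inode_is_sync)
		return 0;
	return toy_fsync_range(file, pos, pos + count - 1);
}
```

Keeping the check in one helper is what lets the later commits in this series delete the per-filesystem syncing logic from their write paths.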
    • C
      vfs: Rename generic_file_aio_write_nolock · eef99380
      Christoph Hellwig committed
      generic_file_aio_write_nolock() is now used only by block devices and the raw
      character device. Filesystems should use __generic_file_aio_write() in case
      generic_file_aio_write() doesn't suit them. So rename the function to
      blkdev_aio_write() and move it to fs/block_dev.c.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock · 918941a3
      Jan Kara committed
      Use the new helper. We have to submit data pages ourselves in case of O_SYNC
      write because __generic_file_aio_write does not do it for us. OCFS2 developers
      might think about moving the sync out of i_mutex which seems to be easily
      possible but that's out of scope of this patch.
      
      CC: ocfs2-devel@oss.oracle.com
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock · b04f9321
      Jan Kara committed
      Use new helper __generic_file_aio_write(). Since the fs takes care of syncing
      by itself afterwards, there are no more changes needed.
      
      CC: Evgeniy Polyakov <zbr@ioremap.net>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() · c7b50db2
      Jan Kara committed
      generic_file_direct_write() and generic_file_buffered_write() called
      generic_osync_inode() if it was called on O_SYNC file or IS_SYNC inode. But
      this is superfluous since generic_file_aio_write() does the syncing as well.
      Also XFS and OCFS2 which call these functions directly handle syncing
      themselves. So let's have a single place where syncing happens:
      generic_file_aio_write().
      
      We slightly change the behavior by syncing only the range of file to which the
      write happened for buffered writes but that should be all that is required.
      
      CC: ocfs2-devel@oss.oracle.com
      CC: Joel Becker <joel.becker@oracle.com>
      CC: Felix Blyakher <felixb@sgi.com>
      CC: xfs@oss.sgi.com
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      vfs: Export __generic_file_aio_write() and add some comments · e4dd9de3
      Jan Kara committed
      Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add
      comments to write helpers explaining how they should be used and export
      __generic_file_aio_write() since it will be used by some filesystems.
      
      CC: ocfs2-devel@oss.oracle.com
      CC: Joel Becker <joel.becker@oracle.com>
      Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • J
      vfs: Introduce filemap_fdatawait_range · d3bccb6f
      Jan Kara committed
      This simple helper saves some filesystems the conversion from byte offsets
      to page numbers and also makes the fdata* interface more complete.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
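The conversion that the commit above (d3bccb6f) spares filesystems is the mapping from a byte range to the page indices covering it. A minimal sketch of that arithmetic is below. It assumes 4 KiB pages and an inclusive end byte, and `TOY_PAGE_SHIFT`, `toy_page_range` and `toy_byte_range_to_pages` are hypothetical names for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Inclusive range of page indices. */
struct toy_page_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Map the inclusive byte range [start_byte, end_byte] to the inclusive
 * range of page indices that contain those bytes.
 */
struct toy_page_range toy_byte_range_to_pages(uint64_t start_byte,
					      uint64_t end_byte)
{
	struct toy_page_range range;

	assert(start_byte <= end_byte);
	range.first = start_byte >> TOY_PAGE_SHIFT;
	range.last = end_byte >> TOY_PAGE_SHIFT;
	return range;
}
```

A range that straddles a page boundary by a single byte, such as bytes 4095 to 4096, still covers two pages under this mapping.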