1. 09 Oct, 2009 1 commit
    • C
      xfs: implement ->dirty_inode to fix timestamp handling · f9581b14
      Christoph Hellwig committed
      This is picking up on Felix's repost of Dave's patch to implement a
      .dirty_inode method.  We really need this notification because
      the VFS keeps writing directly into the inode structure instead
      of going through methods to update this state.  In addition to
      the long-known atime issue we now also have a caller in VM code
      that updates c/mtime that way for shared writeable mmaps.  And
      I found another one that no one has noticed in practice in the FIFO
      code.
      
      So implement ->dirty_inode to set i_update_core whenever the
      inode gets externally dirtied, and switch the c/mtime handling to
      the same scheme we already use for atime (always picking up
      the value from the Linux inode).
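
      The scheme described above can be sketched in userspace C. This is only
      an illustration under stated assumptions: every name below
      (sketch_vfs_inode, sketch_xfs_inode, sketch_dirty_inode,
      sketch_flush_core) is hypothetical and is not taken from the XFS source.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <time.h>

      /* Hypothetical stand-in for the timestamps kept in the Linux inode. */
      struct sketch_vfs_inode {
          time_t atime, mtime, ctime;
      };

      /* Hypothetical stand-in for the XFS inode and its on-disk core copy. */
      struct sketch_xfs_inode {
          struct sketch_vfs_inode vfs;   /* the VFS writes here directly */
          time_t core_atime, core_mtime, core_ctime;
          bool update_core;              /* models i_update_core */
      };

      /* Models ->dirty_inode: called whenever the inode is dirtied externally. */
      static void sketch_dirty_inode(struct sketch_xfs_inode *ip)
      {
          assert(ip != NULL);
          ip->update_core = true;
      }

      /* Models writing the inode: always pick up the Linux inode's timestamps. */
      static void sketch_flush_core(struct sketch_xfs_inode *ip)
      {
          assert(ip != NULL);
          ip->core_atime = ip->vfs.atime;
          ip->core_mtime = ip->vfs.mtime;
          ip->core_ctime = ip->vfs.ctime;
          ip->update_core = false;
      }
      ```

      The point of the sketch is that the timestamps are copied unconditionally
      at flush time, so it does not matter which caller wrote them into the
      Linux inode.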
      
      Note that this patch also removes the xfs_synchronize_atime call
      in xfs_reclaim; it was superfluous as we already synchronize the time
      when writing the inode via the log (xfs_inode_item_format) or the
      normal buffers (xfs_iflush_int).
      
      In addition also remove the I_CLEAR check before copying the Linux
      timestamps - now that we always have the Linux inode available
      we can always use the timestamps in it.
      
      Also switch to just using file_update_time for regular reads/writes -
      that will get us all optimizations done to it for free and make
      sure we notice early when it breaks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  2. 14 Sep, 2009 1 commit
  3. 02 Sep, 2009 1 commit
    • C
      xfs: merge fsync and O_SYNC handling · 13e6d5cd
      Christoph Hellwig committed
      The guarantees for O_SYNC are exactly the same as the ones we need to
      make for an fsync call (and given that Linux O_SYNC is O_DSYNC the
      equivalent is fdatasync, but we treat both the same in XFS), except
      with a range data writeout.  Jan Kara has started unifying these two
      paths for filesystems using the generic helpers, and I've started to
      look at XFS.
      
      The actual transaction committed by xfs_fsync and xfs_write_sync_logforce
      has a different transaction number, but actually is exactly the same.
      We'll only use the fsync transaction going forward.  One major difference
      is that xfs_write_sync_logforce never issues a cache flush unless we
      commit a transaction causing that as a side-effect, which is an obvious
      bug in the O_SYNC handling.  Second, all the locking and i_update_size
      vs i_update_core changes from 978b7237
      never made it to xfs_write_sync_logforce, so we add them back.
      
      To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait
      call is moved up to xfs_file_fsync, so that we don't wait on the whole
      file after we already waited for our portion in xfs_write.
      
      We'll also use a plain call to filemap_write_and_wait_range instead
      of the previous sync_page_range which did it in two steps including
      a half-hearted inode write out that doesn't help us.
      
      Once we're done with this also remove the now useless i_update_size
      tracking.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
  4. 10 Jun, 2009 1 commit
  5. 07 Apr, 2009 1 commit
    • D
      xfs: make inode flush at ENOSPC synchronous · 5825294e
      Dave Chinner committed
      When we are writing to a single file and hit ENOSPC, we trigger a background
      flush of the inode and try again.  Because we hold page locks and the iolock,
      the flush won't proceed until after we release these locks. This occurs once
      we've given up and ENOSPC has been reported. Hence if this one is the only
      dirty inode in the system, we'll get an ENOSPC prematurely.
      
      To fix this, remove the async flush from the allocation routines and move
      it to the top of the write path where we can do a synchronous flush
      and retry the write again. Only retry once as a second ENOSPC indicates
      that we really are ENOSPC.
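
      The flush-and-retry-once logic can be sketched as follows. This is a
      toy userspace model, not the XFS write path: sketch_fs and all three
      functions are hypothetical, and the idea that a synchronous flush
      releases a fixed amount of "reclaimable" space is a simplifying
      assumption.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical model of a filesystem's space accounting. */
      struct sketch_fs {
          long free_blocks;   /* space available right now */
          long reclaimable;   /* space a synchronous flush would release */
          int  flushes;       /* number of synchronous flushes issued */
      };

      /* Allocation layer: no flushing here, it only reports ENOSPC. */
      static int sketch_alloc(struct sketch_fs *fs, long blocks)
      {
          if (blocks > fs->free_blocks)
              return -ENOSPC;
          fs->free_blocks -= blocks;
          return 0;
      }

      /* Synchronous flush: returns only after the space has been released. */
      static void sketch_flush_sync(struct sketch_fs *fs)
      {
          fs->free_blocks += fs->reclaimable;
          fs->reclaimable = 0;
          fs->flushes++;
      }

      /* Top of the write path: flush and retry once on ENOSPC, then give up. */
      static int sketch_write_retry(struct sketch_fs *fs, long blocks)
      {
          bool retried = false;

          assert(fs != NULL && blocks >= 0);
          for (;;) {
              int error = sketch_alloc(fs, blocks);

              if (error != -ENOSPC || retried)
                  return error;      /* success, other error, or second ENOSPC */
              sketch_flush_sync(fs); /* safe here: no page locks are held yet */
              retried = true;
          }
      }
      ```

      Keeping the flush out of sketch_alloc mirrors the commit's choice to do
      the synchronous flush at the top of the write path rather than in the
      allocation layer.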
      
      This avoids a page cache deadlock when trying to do this flush synchronously
      in the allocation layer that was identified by Mikulas Patocka.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  6. 24 Dec, 2008 1 commit
    • L
      [XFS] Fix race in xfs_write() between direct and buffered I/O with DMAPI · 25051158
      Lachlan McIlroy committed
      The iolock is dropped and re-acquired around the call to XFS_SEND_NAMESP().
      While the iolock is released the file can become cached.  We then
      'goto retry' and - if we are doing direct I/O - mapping->nrpages may now be
      non-zero but need_i_mutex will be zero and we will hit the WARN_ON().
      
      Since we have dropped the I/O lock then the file size may have also changed
      so what we need to do here is 'goto start' like we do for the XFS_SEND_DATA()
      DMAPI event.
      
      We also need to update the filesize before releasing the iolock so that
      needs to be done before the XFS_SEND_NAMESP event.  If we drop the iolock
      before setting the filesize we could race with a truncate.
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  7. 11 Dec, 2008 1 commit
  8. 04 Dec, 2008 1 commit
  9. 01 Dec, 2008 1 commit
    • D
      [XFS] fix error inversion problems with data flushing · 2e656092
      Dave Chinner committed
      XFS gets the sign of the error wrong in several places when
      gathering the error from generic linux functions. These functions
      return negative error values, while the core XFS code returns
      positive error values. Hence when XFS inverts the error to be
      returned to the VFS, it can incorrectly invert a negative
      error and this error will be ignored by the syscall return.
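
      The sign mix-up can be illustrated with a small userspace sketch. The
      helpers below (sketch_generic_flush, sketch_core_buggy,
      sketch_core_fixed, sketch_to_vfs) are hypothetical stand-ins and are
      not functions from the kernel or from XFS.

      ```c
      #include <assert.h>
      #include <errno.h>

      /* Stand-in for a generic Linux helper: 0 or a NEGATIVE errno. */
      static int sketch_generic_flush(int fail)
      {
          return fail ? -EIO : 0;
      }

      /* Buggy pattern: a negative value leaks into code that is expected
       * to carry POSITIVE error values. */
      static int sketch_core_buggy(int fail)
      {
          return sketch_generic_flush(fail);
      }

      /* Fixed pattern: flip the sign where the generic helper is called. */
      static int sketch_core_fixed(int fail)
      {
          return -sketch_generic_flush(fail);
      }

      /* Boundary back to the VFS: positive core errors become negative. */
      static int sketch_to_vfs(int core_error)
      {
          assert(core_error != -EIO || 1); /* buggy input is allowed on purpose */
          return -core_error;
      }
      ```

      With the buggy helper the VFS boundary turns -EIO into +EIO, which a
      syscall return path does not treat as an error; with the fixed helper
      the caller sees -EIO as intended.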
      
      Fix all the problems related to calling filemap_* functions.
      
      Problem initially identified by Nick Piggin in xfs_fsync().
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Niv Sardi <xaiki@sgi.com>
  10. 13 Aug, 2008 2 commits
  11. 27 Jul, 2008 1 commit
  12. 29 Apr, 2008 1 commit
    • C
      [XFS] shrink mrlock_t · 579aa9ca
      Christoph Hellwig committed
      The writer field is not needed for non-DEBUG builds so remove it. While
      we're at it also clean up the interface for is-locked asserts to go through
      an xfs_iget.c helper with an interface like the xfs_ilock routines to
      isolate the XFS codebase from mrlock internals. That way we can kill
      mrlock_t entirely once rw_semaphores grow an islocked facility. Also
      remove unused flags to the ilock family of functions.
      
      SGI-PV: 976035
      SGI-Modid: xfs-linux-melb:xfs-kern:30902a
      Signed-off-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
  13. 19 Apr, 2008 1 commit
  14. 18 Apr, 2008 5 commits
  15. 07 Feb, 2008 4 commits
  16. 06 Feb, 2008 1 commit
    • C
      Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter committed
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
              makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
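
      How the three variants relate can be sketched in userspace C over a
      plain byte buffer. This is an illustration only: the sketch_ names and
      SKETCH_PAGE_SIZE are hypothetical, there is no kmap or cache flushing
      here, and treating each end position as exclusive is an assumption of
      this sketch.

      ```c
      #include <assert.h>
      #include <string.h>

      #define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

      /* Zero [start1, end1) and [start2, end2) inside one page-sized buffer. */
      static inline void sketch_zero_user_segments(unsigned char *page,
              unsigned start1, unsigned end1,
              unsigned start2, unsigned end2)
      {
          assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);
          if (end1 > start1)
              memset(page + start1, 0, end1 - start1);
          if (end2 > start2)
              memset(page + start2, 0, end2 - start2);
      }

      /* Single-segment variant: the second segment is empty. */
      static inline void sketch_zero_user_segment(unsigned char *page,
              unsigned start, unsigned end)
      {
          sketch_zero_user_segments(page, start, end, 0, 0);
      }

      /* Length variant for callers that know the length, not the end. */
      static inline void sketch_zero_user(unsigned char *page,
              unsigned start, unsigned length)
      {
          sketch_zero_user_segment(page, start, start + length);
      }
      ```

      Building the two simpler variants on top of the two-segment one keeps
      the zeroing logic in a single place.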
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means a lot of code no longer has to deal
      with the special casing for HIGHMEM. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
      could not be an inline function. The zero_user_* functions introduced
      here can be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 17 Oct, 2007 1 commit
  18. 16 Oct, 2007 4 commits
  19. 15 Oct, 2007 2 commits
  20. 10 Jul, 2007 1 commit
  21. 19 Jun, 2007 1 commit
  22. 09 May, 2007 1 commit
  23. 08 May, 2007 4 commits
    • L
      [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks. · 71dfd5a3
      Lachlan McIlroy committed
      In xfs_write() the iolock is dropped and reacquired in XFS_SEND_DATA()
      which means that the file could change from not-cached to cached and we
      need to redo the direct I/O checks. We should also redo the direct I/O
      checks when the file size changes, regardless of whether O_APPEND is set.
      
      SGI-PV: 963483
      SGI-Modid: xfs-linux-melb:xfs-kern:28440a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • L
      [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. · ba87ea69
      Lachlan McIlroy committed
      The problem that has been addressed is that of synchronising updates of
      the file size with writes that extend a file. Without the fix the update
      of a file's size, as a result of a write beyond eof, is independent of
      when the cached data is flushed to disk. Often the file size update would
      be written to the filesystem log before the data is flushed to disk. When
      a system crashes between these two events and the filesystem log is
      replayed on mount the file's size will be set but since the contents never
      made it to disk the file is full of holes. If some of the cached data was
      flushed to disk then it may just be a section of the file at the end that
      has holes.
      
      There are existing fixes to help alleviate this problem, particularly in
      the case where a file has been truncated, that force cached data to be
      flushed to disk when the file is closed. If the system crashes while the
      file(s) are still open then this flushing will never occur.
      
      The fix that we have implemented is to introduce a second file size,
      called the in-memory file size, that represents the current file size as
      viewed by the user. The existing file size, called the on-disk file size,
      is the one that gets written to the filesystem log and we only update it
      when it is safe to do so. When we write to a file beyond eof we only
      update the in-memory file size in the write operation. Later, when the I/O
      operation that flushes the cached data to disk completes, an I/O
      completion routine will update the on-disk file size. The on-disk file
      size will be updated to the maximum offset of the I/O or to the value of
      the in-memory file size if the I/O includes eof.
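
      The two-size scheme can be sketched as a small userspace model. The
      names (sketch_inode, sketch_extend_write, sketch_io_done) are
      hypothetical, and the rule that the on-disk size never moves backwards
      at I/O completion is an assumption of this sketch rather than something
      stated above.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical inode carrying both sizes. */
      struct sketch_inode {
          uint64_t mem_size;   /* in-memory size: what the user sees */
          uint64_t disk_size;  /* on-disk size: what goes to the log */
      };

      /* Write path: an extending write only moves the in-memory size. */
      static void sketch_extend_write(struct sketch_inode *ip,
                                      uint64_t offset, uint64_t len)
      {
          assert(ip != NULL);
          if (offset + len > ip->mem_size)
              ip->mem_size = offset + len;
      }

      /* I/O completion: the data in [offset, offset + len) is now on disk. */
      static void sketch_io_done(struct sketch_inode *ip,
                                 uint64_t offset, uint64_t len)
      {
          uint64_t end = offset + len;

          assert(ip != NULL);
          if (end > ip->mem_size)   /* the I/O includes eof */
              end = ip->mem_size;
          if (end > ip->disk_size)  /* assumption: never shrink here */
              ip->disk_size = end;
      }
      ```

      Because the on-disk size only advances after the data has reached
      disk, a log replay after a crash cannot expose a size that covers
      unwritten data.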
      
      SGI-PV: 958522
      SGI-Modid: xfs-linux-melb:xfs-kern:28322a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • L
      [XFS] Fix race condition in xfs_write(). · 2a329631
      Lachlan McIlroy committed
      This change addresses a race in xfs_write() where, for direct I/O, the
      flags need_i_mutex and need_flush are set up before the iolock is acquired.
      The state used to set up the flags may change between setting the flags and
      acquiring the iolock, resulting in these flags having incorrect values. For
      example, if a file is not currently cached then need_i_mutex is set to
      zero and then if the file is cached before the iolock is acquired we will
      fail to do the flushinval before the direct write.
      
      The flush (and also the call to xfs_zero_eof()) need to be done with the
      iolock held exclusive so we need to acquire the iolock before checking for
      cached data (or if the write begins after eof) to prevent this state from
      changing. For direct I/O I've chosen to always acquire the iolock in
      shared mode initially and if there is a need to promote it then drop it
      and reacquire it.
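
      The take-shared, check, then promote pattern can be sketched as
      follows. This is a single-threaded toy model: the lock is just an enum
      field, and every name (sketch_file, sketch_lock,
      sketch_direct_write_prepare) is hypothetical rather than taken from
      xfs_write().

      ```c
      #include <assert.h>
      #include <stddef.h>

      enum sketch_lock { SKETCH_UNLOCKED, SKETCH_SHARED, SKETCH_EXCL };

      /* Hypothetical file state relevant to a direct write. */
      struct sketch_file {
          enum sketch_lock iolock;  /* mode the iolock is currently held in */
          int nrpages;              /* number of cached pages */
          int flushes;              /* flush-and-invalidate calls made */
      };

      /* Returns the mode the iolock is held in when the write may proceed. */
      static enum sketch_lock sketch_direct_write_prepare(struct sketch_file *f)
      {
          enum sketch_lock mode = SKETCH_SHARED;  /* always start shared */

          assert(f != NULL);
          for (;;) {
              f->iolock = mode;                 /* acquire in the chosen mode */
              if (f->nrpages == 0)
                  return mode;                  /* nothing cached: proceed */
              if (mode == SKETCH_EXCL) {
                  f->nrpages = 0;               /* flush and invalidate */
                  f->flushes++;
                  return mode;
              }
              f->iolock = SKETCH_UNLOCKED;      /* drop the shared lock ... */
              mode = SKETCH_EXCL;               /* ... and redo the check */
          }
      }
      ```

      The cached-page check is repeated after the lock is retaken because
      the state may have changed while the lock was dropped.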
      
      There's also some other tidy-ups including removing the O_APPEND offset
      adjustment since that work is done in generic_write_checks() (and we don't
      use offset as an input parameter anywhere).
      
      SGI-PV: 962170
      SGI-Modid: xfs-linux-melb:xfs-kern:28319a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: David Chinner <dgc@sgi.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
    • L
      [XFS] propagate return codes from flush routines · d3cf2094
      Lachlan McIlroy committed
      This patch handles error return values in fs_flush_pages and
      fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we
      can propagate the errors and handle them at higher layers. I also modified
      xfs_itruncate_start so that it could propagate the error further.
      
      SGI-PV: 961990
      SGI-Modid: xfs-linux-melb:xfs-kern:28231a
      Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: Stewart Smith <stewart@flamingspork.com>
      Signed-off-by: Tim Shimmin <tes@sgi.com>
  24. 10 Feb, 2007 2 commits