1. 12 Oct, 2011 7 commits
    • C
      xfs: simplify xfs_trans_ijoin* again · ddc3415a
      Committed by Christoph Hellwig
      There is no reason to keep a reference to the inode even if we unlock
      it during transaction commit because we never drop a reference between
      the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
      back into xfs_trans_ijoin - the third argument decides if an unlock
      is needed now.
      
      I'm actually starting to wonder if allowing inodes to be unlocked
      at transaction commit really is worth the effort.  The only real
      benefit is that they can be unlocked earlier when committing a
      synchronous transaction, but that could be solved by doing the
      log force manually after the unlock, too.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      
    • C
      xfs: unlock the inode before log force in xfs_fsync · b1037058
      Committed by Christoph Hellwig
      Only read the LSN we need to push to with the ilock held, and then release
      it before we do the log force to improve concurrency.
      
      This also removes the only direct caller of _xfs_trans_commit, thus
      allowing it to be merged into the plain xfs_trans_commit again.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      
    • D
      xfs: introduce xfs_bmapi_read() · 5c8ed202
      Committed by Dave Chinner
      xfs_bmapi() currently handles both extent map reading and
      allocation. As a result, the code is littered with "if (wr)"
      branches to conditionally do allocation operations if required.
      This makes the code much harder to follow and causes significant
      indent issues with the code.
      
      Given that read mapping is much simpler than allocation, we can
      split out read mapping from xfs_bmapi() and reuse the logic that
      we have already factored out to do all the hard work of handling the
      extent map manipulations. This results in a much simpler function for
      the common extent read operations, and will allow the allocation
      code to be simplified in another commit.
      
      Once xfs_bmapi_read() is implemented, convert all the callers of
      xfs_bmapi() that are only reading extents to use the new function.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      
    • C
      xfs: avoid direct I/O write vs buffered I/O race · c58cb165
      Committed by Christoph Hellwig
      Currently a buffered reader or writer can add pages to the pagecache
      while we are waiting for the iolock in xfs_file_dio_aio_write.  Prevent
      this by re-checking mapping->nrpages after we got the iolock, and if
      necessary upgrade the lock to exclusive mode.  To simplify this a bit
      only take the ilock inside of xfs_file_aio_write_checks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • C
      xfs: remove i_iocount · 4a06fd26
      Committed by Christoph Hellwig
      We now have an i_dio_count field and surrounding infrastructure to wait
      for direct I/O completion instead of i_iocount, and we have never needed
      iocount waits for buffered I/O given that we only set the page uptodate
      after finishing all required work.  Thus remove i_iocount, and replace
      the actually needed waits with calls to inode_dio_wait.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      
    • D
      xfs: don't serialise adjacent concurrent direct IO appending writes · 7271d243
      Committed by Dave Chinner
      For append write workloads, extending the file requires a certain
      amount of exclusive locking to be done up front to ensure sanity in
      things like ensuring that we've zeroed any allocated regions
      between the old EOF and the start of the new IO.
      
      For single threads, this typically isn't a problem, and for large
      IOs we don't serialise enough for it to be a problem for two
      threads on really fast block devices. However for smaller IO and
      larger thread counts we have a problem.
      
      Take 4 concurrent sequential, single block sized and aligned IOs.
      After the first IO is submitted but before it completes, we end up
      with this state:
      
              IO 1    IO 2    IO 3    IO 4
            +-------+-------+-------+-------+
            ^       ^
            |       |
            |       |
            |       |
            |       \- ip->i_new_size
            \- ip->i_size
      
      And the IO is done without exclusive locking because offset <=
      ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
      grab the IO lock exclusive, because there is a chance we need to do
      EOF zeroing. However, there is already an IO in progress that avoids
      the need for EOF zeroing because offset <= ip->i_new_size. Hence we
      could avoid holding the IO lock exclusive for this. After
      submission of the second IO, we'd end up in this state:
      
              IO 1    IO 2    IO 3    IO 4
            +-------+-------+-------+-------+
            ^               ^
            |               |
            |               |
            |               |
            |               \- ip->i_new_size
            \- ip->i_size
      
      There is no need to grab the i_mutex or the IO lock in exclusive
      mode if we don't need to invalidate the page cache. Taking these
      locks on every direct IO effectively serialises them, as taking the IO
      lock in exclusive mode has to wait for all shared holders to drop
      the lock. That only happens when IO is complete, so effectively it
      prevents dispatch of concurrent direct IO writes to the same inode.
      
      And so you can see that for the third concurrent IO, we'd avoid
      exclusive locking for the same reason we avoided the exclusive lock
      for the second IO.
      
      Fixing this is a bit more complex than that, because we need to hold
      a write-submission local value of ip->i_new_size so that clearing
      the value is only done if no other thread has updated it before our
      IO completes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
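The locking decision walked through in the commit message above can be sketched in plain C. This is a single-threaded model under stated assumptions, not the kernel code: `struct sketch_inode`, `needs_exclusive_iolock()`, `submit_write()` and `complete_write()` are hypothetical names, and the real implementation must also hold the appropriate inode locks around each step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two in-core sizes named in the commit message. */
struct sketch_inode {
	int64_t i_size;     /* current end of file */
	int64_t i_new_size; /* EOF that in-flight extending writes will reach, 0 if none */
};

/*
 * A write only needs the exclusive IO lock (for possible EOF zeroing) if it
 * starts beyond both the current EOF and the EOF already being extended to.
 */
static bool needs_exclusive_iolock(const struct sketch_inode *ip, int64_t offset)
{
	return offset > ip->i_size && offset > ip->i_new_size;
}

/*
 * Record an extending write. Returns the submission-local new size
 * (0 if the write does not extend the file) for use at completion.
 */
static int64_t submit_write(struct sketch_inode *ip, int64_t offset, int64_t count)
{
	int64_t end = offset + count;
	int64_t new_size = 0;

	if (end > ip->i_size) {
		new_size = end;
		if (new_size > ip->i_new_size)
			ip->i_new_size = new_size;
	}
	return new_size;
}

/* Finish a write: move EOF forward, and clear i_new_size only if it is still ours. */
static void complete_write(struct sketch_inode *ip, int64_t new_size)
{
	if (new_size > ip->i_size)
		ip->i_size = new_size;
	if (new_size != 0 && ip->i_new_size == new_size)
		ip->i_new_size = 0;
}
```

Keeping the submission-local value is what lets the first IO complete without wiping out the larger `i_new_size` set by a later, still in-flight IO.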
    • D
      xfs: don't serialise direct IO reads on page cache checks · 0c38a251
      Committed by Dave Chinner
      There is no need to grab the i_mutex or the IO lock in exclusive
      mode if we don't need to invalidate the page cache. Taking these
      locks on every direct IO effectively serialises them, as taking the IO
      lock in exclusive mode has to wait for all shared holders to drop
      the lock. That only happens when IO is complete, so effectively it
      prevents dispatch of concurrent direct IO reads to the same inode.
      
      Fix this by taking the IO lock shared to check the page cache state,
      and only then drop it and take the IO lock exclusively if there is
      work to be done. Hence for the normal direct IO case, no exclusive
      locking will occur.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Joern Engel <joern@logfs.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
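The "check shared, go exclusive only when there is work" pattern from the commit message above might look roughly like this. It is a toy model, not the XFS code: the lock is just a recorded mode with a counter, all names are hypothetical, and returning with the lock held shared after the exclusive section is an assumption of this sketch.

```c
#include <stdbool.h>

enum iolock_mode { IOLOCK_NONE, IOLOCK_SHARED, IOLOCK_EXCL };

/* Hypothetical model: records the held mode and counts exclusive acquisitions. */
struct sketch_inode {
	enum iolock_mode held;
	unsigned long nrpages;   /* cached pages for this file */
	unsigned int excl_taken; /* how often the exclusive lock was needed */
};

static void iolock(struct sketch_inode *ip, enum iolock_mode mode)
{
	ip->held = mode;
	if (mode == IOLOCK_EXCL)
		ip->excl_taken++;
}

static void iounlock(struct sketch_inode *ip)
{
	ip->held = IOLOCK_NONE;
}

/* Prepare for a direct IO read; returns with the IO lock held shared. */
static void dio_read_lock(struct sketch_inode *ip)
{
	iolock(ip, IOLOCK_SHARED);
	if (ip->nrpages) {
		/* Cached pages exist: drop the shared lock and retake it exclusively. */
		iounlock(ip);
		iolock(ip, IOLOCK_EXCL);
		/* Re-check, since the state may have changed while unlocked. */
		if (ip->nrpages)
			ip->nrpages = 0; /* stands in for writeback plus invalidation */
		iounlock(ip);
		iolock(ip, IOLOCK_SHARED);
	}
}
```

In the common case of an empty page cache the exclusive branch is never entered, so concurrent direct IO readers only ever contend on the shared mode.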
  2. 13 Aug, 2011 1 commit
    • C
      xfs: remove subdirectories · c59d87c4
      Committed by Christoph Hellwig
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      number of file renames, the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  3. 27 Jul, 2011 1 commit
  4. 26 Jul, 2011 1 commit
  5. 21 Jul, 2011 1 commit
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Committed by Josef Bacik
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness' sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 08 Jul, 2011 1 commit
  7. 16 Jun, 2011 1 commit
    • C
      xfs: make log devices with write back caches work · a27a263b
      Committed by Christoph Hellwig
      There's no reason not to support cache flushing on external log devices.
      The only thing this really requires is flushing the data device first
      both in fsync and log commits.  A side effect is that we also have to
      remove the barrier write test during mount, which has been superfluous
      since the new FLUSH+FUA code anyway.  Also use the chance to flush the
      RT subvolume write cache before the fsync commit, which is required
      for correct semantics.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  8. 31 Mar, 2011 1 commit
  9. 26 Mar, 2011 1 commit
  10. 17 Jan, 2011 1 commit
    • C
      fallocate should be a file operation · 2fe17c10
      Committed by Christoph Hellwig
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always committing the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and removing the already unneeded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
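The point of having the file structure available can be illustrated with a small sketch: once the handler receives the open file rather than only the inode, it can look at the open flags to decide whether the allocation has to be committed synchronously. `struct sketch_file` and `fallocate_wants_sync_commit()` are hypothetical names for illustration only, and testing `O_SYNC` directly follows the wording of the commit message rather than any particular filesystem's code.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct file. */
struct sketch_file {
	unsigned int f_flags; /* flags the file was opened with, e.g. O_SYNC */
};

/*
 * With the file available, the handler can decide per open file whether
 * the allocation transaction must reach disk before returning.
 */
static bool fallocate_wants_sync_commit(const struct sketch_file *file)
{
	return (file->f_flags & O_SYNC) != 0;
}
```

A fast-path caller that opened the file without `O_SYNC` would then skip the forced commit, while synchronous writers still get their allocation on disk.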
  11. 11 Jan, 2011 4 commits
  12. 12 Jan, 2011 1 commit
    • D
      xfs: introduce xfs_rw_lock() helpers for locking the inode · 487f84f3
      Committed by Dave Chinner
      We need to obtain the i_mutex, i_iolock and i_ilock during the read
      and write paths. Add a set of wrapper functions to neatly
      encapsulate the lock ordering and shared/exclusive semantics to make
      the locking easier to follow and get right.
      
      Note that this changes some of the exclusive locking serialisation in
      that serialisation will occur against the i_mutex instead of the
      XFS_IOLOCK_EXCL. This does not change any behaviour, and it is
      arguably more efficient to use the mutex for such serialisation than
      the rw_sem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
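A wrapper pair that encapsulates the lock ordering might be shaped like the following sketch. It models the locks as plain fields so that the ordering is visible; it is not the kernel code. The names are hypothetical, the i_ilock is left out for brevity, and the assumption that only exclusive mode takes the i_mutex is inferred from the note about serialisation above.

```c
#include <stdbool.h>

#define SK_IOLOCK_SHARED 0x1
#define SK_IOLOCK_EXCL   0x2

/* Hypothetical model of the two locks taken on the read and write paths. */
struct sketch_inode {
	bool i_mutex_held;
	int iolock_mode; /* 0, SK_IOLOCK_SHARED or SK_IOLOCK_EXCL */
};

/* Lock: the i_mutex first (exclusive mode only), then the iolock. */
static void sk_rw_ilock(struct sketch_inode *ip, int type)
{
	if (type & SK_IOLOCK_EXCL)
		ip->i_mutex_held = true;
	ip->iolock_mode = type;
}

/* Unlock in the reverse order: the iolock first, then the i_mutex. */
static void sk_rw_iunlock(struct sketch_inode *ip, int type)
{
	ip->iolock_mode = 0;
	if (type & SK_IOLOCK_EXCL)
		ip->i_mutex_held = false;
}
```

Callers then pass only the mode they want, and the ordering between the two locks lives in one place instead of being repeated at every call site.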
  13. 11 Jan, 2011 3 commits
  14. 27 Jul, 2010 6 commits
  15. 28 May, 2010 1 commit
  16. 19 May, 2010 1 commit
  17. 02 Mar, 2010 6 commits
  18. 12 Dec, 2009 1 commit
  19. 09 Oct, 2009 1 commit