1. 25 2月, 2017 2 次提交
  2. 23 2月, 2017 2 次提交
  3. 07 2月, 2017 2 次提交
  4. 03 2月, 2017 1 次提交
  5. 31 1月, 2017 1 次提交
    • B
      xfs: sync eofblocks scans under iolock are livelock prone · c3155097
      Brian Foster 提交于
      The xfs_eofblocks.eof_scan_owner field is an internal field to
      facilitate invoking eofb scans from the kernel while under the iolock.
      This is necessary because the eofb scan acquires the iolock of each
      inode. Synchronous scans are invoked on certain buffered write failures
      while under iolock. In such cases, the scan owner indicates that the
      context for the scan already owns the particular iolock and prevents a
      double lock deadlock.
      
      eofblocks scans while under iolock are still livelock prone in the event
      of multiple parallel scans, however. If multiple buffered writes to
      different inodes fail and invoke eofblocks scans at the same time, each
      scan avoids a deadlock with its own inode by virtue of the
      eof_scan_owner field, but will never be able to acquire the iolock of
      the inode from the parallel scan. Because the low free space scans are
      invoked with SYNC_WAIT, the scan will not return until it has processed
      every tagged inode and thus both scans will spin indefinitely on the
      iolock being held across the opposite scan. This problem can be
      reproduced reliably by generic/224 on systems with higher cpu counts
      (x16).
      
      To avoid this problem, simplify the semantics of eofblocks scans to
      never invoke a scan while under iolock. This means that the buffered
      write context must drop the iolock before the scan. It must reacquire
      the lock before the write retry and also repeat the initial write
      checks, as the original state might no longer be valid once the iolock
      was dropped.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      c3155097
  6. 10 12月, 2016 1 次提交
    • C
      fs: try to clone files first in vfs_copy_file_range · a76b5b04
      Christoph Hellwig 提交于
      A clone is a perfectly fine implementation of a file copy, so most
      file systems just implement the copy that way.  Instead of duplicating
      this logic move it to the VFS.  Currently btrfs and XFS implement copies
      the same way as clones and there is no behavior change for them, cifs
      only implements clones and grow support for copy_file_range with this
      patch.  NFS implements both, so this will allow copy_file_range to work
      on servers that only implement CLONE and be lot more efficient on servers
      that implements CLONE and COPY.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      a76b5b04
  7. 09 12月, 2016 1 次提交
  8. 05 12月, 2016 1 次提交
  9. 30 11月, 2016 2 次提交
    • C
      xfs: use iomap_dio_rw · acdda3aa
      Christoph Hellwig 提交于
      Straight switch over to using iomap for direct I/O - we already have the
      non-COW dio path in write_begin for DAX and files with extent size hints,
      so nothing to add there.  The COW path is ported over from the old
      get_blocks version and a bit of a mess, but I have some work in progress
      to make it look more like the buffered I/O COW path.
      
      This gets rid of xfs_get_blocks_direct and the last caller of
      xfs_get_blocks with the create flag set, so all that code can be removed.
      
      Last but not least I've removed a comment in xfs_filemap_fault that
      refers to xfs_get_blocks entirely instead of updating it - while the
      reference is correct, the whole DAX fault path looks different than
      the non-DAX one, so it seems rather pointless.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      acdda3aa
    • C
      xfs: remove i_iolock and use i_rwsem in the VFS inode instead · 65523218
      Christoph Hellwig 提交于
      This patch drops the XFS-own i_iolock and uses the VFS i_rwsem which
      recently replaced i_mutex instead.  This means we only have to take
      one lock instead of two in many fast path operations, and we can
      also shrink the xfs_inode structure.  Thanks to the xfs_ilock family
      there is very little churn, the only thing of note is that we need
      to switch to use the lock_two_directory helper for taking the i_rwsem
      on two inodes in a few places to make sure our lock order matches
      the one used in the VFS.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NJens Axboe <axboe@fb.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      65523218
  10. 08 11月, 2016 2 次提交
  11. 20 10月, 2016 6 次提交
  12. 11 10月, 2016 1 次提交
    • A
      fix ITER_PIPE interaction with direct_IO · c3a69024
      Al Viro 提交于
      by making sure we call iov_iter_advance() on original
      iov_iter even if direct_IO (done on its copy) has returned 0.
      It's a no-op for old iov_iter flavours and does the right thing
      (== truncation of the stuff we'd allocated, but not filled) in
      ITER_PIPE case.  Failures (e.g. -EIO) get caught and dealt with
      by cleanup in generic_file_read_iter().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c3a69024
  13. 10 10月, 2016 1 次提交
  14. 08 10月, 2016 1 次提交
  15. 06 10月, 2016 7 次提交
    • D
      xfs: don't mix reflink and DAX mode for now · 4f435ebe
      Darrick J. Wong 提交于
      Since we don't have a strategy for handling both DAX and reflink,
      for now we'll just prohibit both being set at the same time.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      4f435ebe
    • D
      xfs: garbage collect old cowextsz reservations · 83104d44
      Darrick J. Wong 提交于
      Trim CoW reservations made on behalf of a cowextsz hint if they get too
      old or we run low on quota, so long as we don't have dirty data awaiting
      writeback or directio operations in progress.
      
      Garbage collection of the cowextsize extents are kept separate from
      prealloc extent reaping because setting the CoW prealloc lifetime to a
      (much) higher value than the regular prealloc extent lifetime has been
      useful for combatting CoW fragmentation on VM hosts where the VMs
      experience bursty write behaviors and we can keep the utilization ratios
      low enough that we don't start to run out of space.  IOWs, it benefits
      us to keep the CoW fork reservations around for as long as we can unless
      we run out of blocks or hit inode reclaim.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      83104d44
    • D
      xfs: unshare a range of blocks via fallocate · 98cc2db5
      Darrick J. Wong 提交于
      Unshare all shared extents if the user calls fallocate with the new
      unshare mode flag set, so that we can guarantee that a subsequent
      write will not ENOSPC.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [hch: pass inode instead of file to xfs_reflink_dirty_range,
            use iomap infrastructure for copy up]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      98cc2db5
    • D
      xfs: add dedupe range vfs function · cc714660
      Darrick J. Wong 提交于
      Define a VFS function which allows userspace to request that the
      kernel reflink a range of blocks between two files if the ranges'
      contents match.  The function fits the new VFS ioctl that standardizes
      the checking for the btrfs EXTENT SAME ioctl.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      cc714660
    • D
      xfs: add clone file and clone range vfs functions · 9fe26045
      Darrick J. Wong 提交于
      Define two VFS functions which allow userspace to reflink a range of
      blocks between two files or to reflink one file's contents to another.
      These functions fit the new VFS ioctls that standardize the checking
      for the btrfs CLONE and CLONE RANGE ioctls.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      9fe26045
    • D
      xfs: implement CoW for directio writes · 0613f16c
      Darrick J. Wong 提交于
      For O_DIRECT writes to shared blocks, we have to CoW them just like
      we would with buffered writes.  For writes that are not block-aligned,
      just bounce them to the page cache.
      
      For block-aligned writes, however, we can do better than that.  Use
      the same mechanisms that we employ for buffered CoW to set up a
      delalloc reservation, allocate all the blocks at once, issue the
      writes against the new blocks and use the same ioend functions to
      remap the blocks after the write.  This should be fairly performant.
      
      Christoph discovered that xfs_reflink_allocate_cow_range may stumble
      over invalid entries in the extent array given that it drops the ilock
      but still expects the index to be stable.  Simple fixing it to a new
      lookup for every iteration still isn't correct given that
      xfs_bmapi_allocate will trigger a BUG_ON() if hitting a hole, and
      there is nothing preventing a xfs_bunmapi_cow call removing extents
      once we dropped the ilock either.
      
      This patch duplicates the inner loop of xfs_bmapi_allocate into a
      helper for xfs_reflink_allocate_cow_range so that it can be done under
      the same ilock critical section as our CoW fork delayed allocation.
      The directio CoW warts will be revisited in a later patch.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      0613f16c
    • A
      switch generic_file_splice_read() to use of ->read_iter() · 82c156f8
      Al Viro 提交于
      ... and kill the ->splice_read() instances that can be switched to it
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      82c156f8
  16. 03 10月, 2016 1 次提交
  17. 22 9月, 2016 1 次提交
  18. 19 9月, 2016 2 次提交
  19. 17 8月, 2016 1 次提交
  20. 27 7月, 2016 1 次提交
    • R
      dax: remote unused fault wrappers · 6b524995
      Ross Zwisler 提交于
      Remove the unused wrappers dax_fault() and dax_pmd_fault().  After this
      removal, rename __dax_fault() and __dax_pmd_fault() to dax_fault() and
      dax_pmd_fault() respectively, and update all callers.
      
      The dax_fault() and dax_pmd_fault() wrappers were initially intended to
      capture some filesystem independent functionality around page faults
      (calling sb_start_pagefault() & sb_end_pagefault(), updating file mtime
      and ctime).
      
      However, the following commits:
      
         5726b27b ("ext2: Add locking for DAX faults")
         ea3d7209 ("ext4: fix races between page faults and hole punching")
      
      added locking to the ext2 and ext4 filesystems after these common
      operations but before __dax_fault() and __dax_pmd_fault() were called.
      This means that these wrappers are no longer used, and are unlikely to
      be used in the future.
      
      XFS has had locking analogous to what was recently added to ext2 and
      ext4 since DAX support was initially introduced by:
      
         6b698ede ("xfs: add DAX file operations support")
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-2-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b524995
  21. 22 7月, 2016 1 次提交
    • A
      xfs: remove dax code from object file when disabled · f021bd07
      Arnd Bergmann 提交于
      We check IS_DAX(inode) before calling either xfs_file_dax_read or
      xfs_file_dax_write, and this will lead the call being optimized out at
      compile time when CONFIG_FS_DAX is disabled.
      
      However, the two functions are marked STATIC, so they become global
      symbols when CONFIG_XFS_DEBUG is set, leaving us with two unused global
      functions that call into an undefined function and a broken "allmodconfig"
      build:
      
      fs/built-in.o: In function `xfs_file_dax_read':
      fs/xfs/xfs_file.c:348: undefined reference to `dax_do_io'
      fs/built-in.o: In function `xfs_file_dax_write':
      fs/xfs/xfs_file.c:758: undefined reference to `dax_do_io'
      
      Marking the two functions 'static noinline' instead of 'STATIC' will let
      the compiler drop the symbols when there are no callers but avoid the
      implicit inlining.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: 16d4d435 ("xfs: split direct I/O and DAX path")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f021bd07
  22. 20 7月, 2016 2 次提交