1. 23 Nov, 2019 1 commit
  2. 01 Nov, 2019 1 commit
    • xfs: properly serialise fallocate against AIO+DIO · 249bd908
      Committed by Dave Chinner
      AIO+DIO can extend the file size on IO completion, and it holds
      no inode locks while the IO is in flight. Therefore, a race
      condition exists in file size updates if we do something like this:
      
      aio-thread			fallocate-thread
      
      lock inode
      submit IO beyond inode->i_size
      unlock inode
      .....
      				lock inode
      				break layouts
      				if (off + len > inode->i_size)
      					new_size = off + len
      				.....
      				inode_dio_wait()
      				<blocks>
      .....
      completes
      inode->i_size updated
      inode_dio_done()
      ....
      				<wakes>
      				<does stuff no longer beyond EOF>
      				if (new_size)
      					xfs_vn_setattr(inode, new_size)
      
      
      Yup, that attempt to extend the file size in the fallocate code
      turns into a truncate - it removes whatever the aio write
      allocated and put to disk, and reduces the inode size back down to
      where the fallocate operation ends.
      
      Fundamentally, xfs_file_fallocate() is not compatible with racing
      AIO+DIO completions, so we need to move the inode_dio_wait() call
      up to where we lock the inode and break the layouts.
      
      Secondly, storing the inode size and then using it unchecked without
      holding the ILOCK is not safe; we can only do such a thing if we've
      locked out and drained all IO and other modification operations,
      which we don't do initially in xfs_file_fallocate.
      
      It should be noted that some of the fallocate operations are
      compound operations - they are made up of multiple manipulations
      that may zero data, and so we may need to flush and invalidate the
      file multiple times during an operation. However, we only need to
      lock out IO and other space manipulation operations once, as that
      lockout is maintained until the entire fallocate operation has been
      completed.

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
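The ordering problem in the diagram above can be replayed deterministically in a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct model_inode`, `model_dio_wait()` and both `fallocate_*` functions are hypothetical names invented for illustration, and the in-flight AIO+DIO write is reduced to a single "size it will set on completion" field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode; only the file size matters here. */
struct model_inode {
	int64_t size;        /* models inode->i_size */
	int64_t pending_eof; /* size an in-flight AIO+DIO write sets on completion, 0 if none */
};

/* Models inode_dio_wait(): the in-flight write completes and may extend the file. */
void model_dio_wait(struct model_inode *ip)
{
	if (ip->pending_eof > ip->size)
		ip->size = ip->pending_eof;
	ip->pending_eof = 0;
}

/* Racy ordering: sample the size, then drain direct I/O, then apply the stale value. */
int64_t fallocate_sample_then_wait(struct model_inode *ip, int64_t off, int64_t len)
{
	int64_t new_size = 0;

	assert(off >= 0 && len > 0);
	if (off + len > ip->size)
		new_size = off + len; /* decided before the AIO write completed */
	model_dio_wait(ip);
	if (new_size)
		ip->size = new_size;  /* models xfs_vn_setattr(): may now shrink the file */
	return ip->size;
}

/* Serialised ordering: drain direct I/O first, then sample the size. */
int64_t fallocate_wait_then_sample(struct model_inode *ip, int64_t off, int64_t len)
{
	int64_t new_size = 0;

	assert(off >= 0 && len > 0);
	model_dio_wait(ip);
	if (off + len > ip->size)
		new_size = off + len;
	if (new_size)
		ip->size = new_size;
	return ip->size;
}
```

With a 4096-byte file, an in-flight write that will extend it to 1 MiB, and a fallocate over the first 8192 bytes, the first ordering leaves the model file at 8192 bytes (the write's extension is lost), while the second leaves it at 1 MiB.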
  3. 29 Oct, 2019 1 commit
  4. 28 Oct, 2019 2 commits
  5. 22 Oct, 2019 2 commits
  6. 15 Oct, 2019 2 commits
  7. 20 Sep, 2019 2 commits
  8. 31 Aug, 2019 1 commit
  9. 06 Jul, 2019 1 commit
  10. 01 Jul, 2019 1 commit
  11. 29 Jun, 2019 1 commit
  12. 10 Jun, 2019 1 commit
  13. 23 Apr, 2019 1 commit
  14. 26 Mar, 2019 1 commit
    • xfs: serialize unaligned dio writes against all other dio writes · 2032a8a2
      Committed by Brian Foster
      XFS applies more strict serialization constraints to unaligned
      direct writes to accommodate things like direct I/O layer zeroing,
      unwritten extent conversion, etc. Unaligned submissions acquire the
      exclusive iolock and wait for in-flight dio to complete to ensure
      multiple submissions do not race on the same block and cause data
      corruption.
      
      This generally works in the case of an aligned dio followed by an
      unaligned dio, but the serialization is lost if I/Os occur in the
      opposite order. If an unaligned write is submitted first and
      immediately followed by an overlapping, aligned write, the latter
      submits without the typical unaligned serialization barriers because
      there is no indication of an unaligned dio still in-flight. This can
      lead to unpredictable results.
      
      To provide proper unaligned dio serialization, require that such
      direct writes are always the only dio allowed in-flight at one time
      for a particular inode. We already acquire the exclusive iolock and
      drain pending dio before submitting the unaligned dio. Wait once
      more after the dio submission to hold the iolock across the I/O and
      prevent further submissions until the unaligned I/O completes. This
      is heavy handed, but consistent with the current pre-submission
      serialization for unaligned direct writes.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
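The message above does not spell out how a direct write is classified as unaligned. A minimal sketch, assuming the filesystem block size is a power of two and that a write counts as unaligned when either its start or its end is not a multiple of that block size; `dio_write_is_unaligned()` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the write [pos, pos + count) starts or ends inside a
 * filesystem block. block_size must be a non-zero power of two.
 */
bool dio_write_is_unaligned(uint64_t pos, uint64_t count, uint32_t block_size)
{
	uint64_t mask = (uint64_t)block_size - 1;

	assert(block_size != 0 && (block_size & mask) == 0);
	return (pos & mask) != 0 || ((pos + count) & mask) != 0;
}
```

Under the rule described in the commit, a write for which this returns true would be the only direct I/O allowed in flight on the inode until it completes.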
  15. 24 Feb, 2019 1 commit
  16. 21 Feb, 2019 2 commits
    • xfs: introduce an always_cow mode · 66ae56a5
      Committed by Christoph Hellwig
      Add a mode where XFS never overwrites existing blocks in place.  This
      is to aid debugging our COW code, and also to put infrastructure in place
      for things like possible future support for zoned block devices, which
      can't support overwrites.
      
      This mode is enabled globally by doing a:
      
          echo 1 > /sys/fs/xfs/debug/always_cow
      
      Note that the parameter is global to allow running all tests in xfstests
      easily in this mode, which would not easily be possible with a per-fs
      sysfs file.
      
      In always_cow mode persistent preallocations are disabled, and fallocate
      will fail when called with a 0 mode (with or without
      FALLOC_FL_KEEP_SIZE), and will not create unwritten extents for zeroed
      space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.
      
      There are a few interesting xfstests failures when run in always_cow
      mode:
      
       - generic/392 fails because the bytes used in the file used to test
         hole punch recovery are less after the log replay.  This is
         because the blocks written and then punched out are only freed
         with a delay due to the logging mechanism.
       - xfs/170 will fail as the already fragile file streams mechanism
         doesn't seem to interact well with the COW allocator
       - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
         the file system is badly fragmented, but there is not much we
         can do to avoid that when always writing out of place
       - xfs/205 fails because overwriting a file in always_cow mode
         will require new space allocation and the assumptions in the
         test thus don't hold anymore.
       - xfs/326 fails to modify the file at all in always_cow mode after
         injecting the refcount error, leading to an unexpected md5sum
         after the remount, but that again is expected

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
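The fallocate rule stated above ("a 0 mode, with or without FALLOC_FL_KEEP_SIZE") amounts to a simple check on the mode flags. The sketch below is a user-space illustration only; `mode_is_plain_preallocation()` is a hypothetical helper, and the fallback definition of `FALLOC_FL_KEEP_SIZE` assumes the Linux value in case the libc headers do not expose it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE 0x01 /* Linux value, used only if the headers lack it */
#endif

/*
 * Returns true for a plain persistent preallocation request: no mode bits
 * set other than the optional FALLOC_FL_KEEP_SIZE. Per the commit message,
 * these are the requests that fail in always_cow mode.
 */
bool mode_is_plain_preallocation(int mode)
{
	assert(mode >= 0);
	return (mode & ~FALLOC_FL_KEEP_SIZE) == 0;
}
```

A test harness could use such a check to decide which fallocate calls to skip when /sys/fs/xfs/debug/always_cow is set to 1.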
    • xfs: fix SEEK_DATA for speculative COW fork preallocation · 60271ab7
      Committed by Christoph Hellwig
      We speculatively allocate extents in the COW fork to reduce
      fragmentation.  But when we write data into such COW fork blocks that
      do not shadow an allocation in the data fork, SEEK_DATA will not
      correctly report it, as it only looks at the data fork extents.
      The only reason why that hasn't been an issue so far is that
      we don't even use these speculative COW fork preallocations over holes in
      the data fork at all for buffered writes, and blocks in the COW
      fork that are written by direct writes are moved into the data
      fork immediately at I/O completion time.
      
      Add a new set of iomap_ops for SEEK_HOLE/SEEK_DATA which looks into
      both the COW and data fork, and reports all COW extents as unwritten
      to the iomap layer.  While this isn't strictly true for COW fork
      extents that were already converted to real extents, the practical
      semantics that you can't read data from them until they are moved
      into the data fork are very similar, and this will force the iomap
      layer into probing the extents for actually present data.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
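Why consulting only the data fork misses data can be shown with a toy extent lookup. This is a simplified model, not the iomap implementation: `struct model_extent`, `next_mapped()` and `model_seek_data()` are hypothetical names, and the model treats every COW fork extent as containing data, whereas the commit reports them as unwritten and lets the iomap layer probe for data that is actually present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record: a byte range [start, start + len). */
struct model_extent {
	int64_t start;
	int64_t len;
};

/* Lowest offset >= pos covered by any extent in the list, or -1 if none. */
int64_t next_mapped(const struct model_extent *ext, size_t n, int64_t pos)
{
	int64_t best = -1;

	assert(ext != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int64_t end = ext[i].start + ext[i].len;
		int64_t hit;

		if (end <= pos)
			continue; /* extent lies entirely before pos */
		hit = ext[i].start > pos ? ext[i].start : pos;
		if (best < 0 || hit < best)
			best = hit;
	}
	return best;
}

/* SEEK_DATA over both forks: the earlier of the two candidate offsets. */
int64_t model_seek_data(const struct model_extent *data, size_t ndata,
			const struct model_extent *cow, size_t ncow,
			int64_t pos)
{
	int64_t d = next_mapped(data, ndata, pos);
	int64_t c = next_mapped(cow, ncow, pos);

	if (d < 0)
		return c;
	if (c < 0)
		return d;
	return d < c ? d : c;
}
```

With one data fork extent at offset 0 and one COW fork extent at offset 65536 sitting over a hole, a data-fork-only lookup from offset 4096 finds nothing, while the two-fork lookup returns 65536.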
  17. 20 Nov, 2018 1 commit
  18. 30 Oct, 2018 5 commits
  19. 18 Aug, 2018 1 commit
  20. 12 Aug, 2018 1 commit
  21. 12 Jul, 2018 1 commit
  22. 07 Jul, 2018 2 commits
  23. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $

      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  24. 30 May, 2018 1 commit
  25. 22 May, 2018 3 commits
    • xfs, dax: introduce xfs_break_dax_layouts() · d6dc57e2
      Committed by Dan Williams
      xfs_break_dax_layouts(), similar to xfs_break_leased_layouts(), scans
      for busy / pinned dax pages and waits for those pages to go idle before
      any potential extent unmap operation.
      
      dax_layout_busy_page() handles synchronizing against new page-busy
      events (get_user_pages). It invalidates all mappings to trigger the
      get_user_pages slow path which will eventually block on the xfs inode
      lock held in XFS_MMAPLOCK_EXCL mode. If dax_layout_busy_page() finds a
      busy page it returns it for xfs to wait for the page-idle event that
      will fire when the page reference count reaches 1 (recall ZONE_DEVICE
      pages are idle at count 1, see generic_dax_pagefree()).
      
      While waiting, the XFS_MMAPLOCK_EXCL lock is dropped in order to not
      deadlock the process that might be trying to elevate the page count of
      more pages before arranging for any of them to go idle. I.e. the typical
      case of submitting I/O is that iov_iter_get_pages() elevates the
      reference count of all pages in the I/O before starting I/O on the first
      page. The process of elevating the reference count of all pages involved
      in an I/O may cause faults that need to take XFS_MMAPLOCK_EXCL.
      
      Although XFS_MMAPLOCK_EXCL is dropped while waiting, XFS_IOLOCK_EXCL is
      held while sleeping. We need this to prevent starvation of the truncate
      path as continuous submission of direct-I/O could starve the truncate
      path indefinitely if the lock is dropped.
      
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • xfs: prepare xfs_break_layouts() for another layout type · 69eb5fa1
      Committed by Dan Williams
      When xfs is operating as the back-end of a pNFS block server, it
      prevents collisions between local and remote operations by requiring a
      lease to be held for remotely accessed blocks. Local filesystem
      operations break those leases before writing or mutating the extent map
      of the file.
      
      A similar mechanism is needed to prevent operations on pinned dax
      mappings, like device-DMA, from colliding with extent unmap operations.
      
      BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of
      layout breaking.
      
      Layouts are broken in the BREAK_WRITE case to ensure that layout-holders
      do not collide with local writes. Additionally, layouts are broken in
      the BREAK_UNMAP case to make sure the layout-holder has a consistent
      view of the file's extent map. While BREAK_WRITE breaks can be satisfied
      by recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require
      waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL.
      
      After this refactoring xfs_break_layouts() becomes the entry point for
      coordinating both types of breaks. Finally, xfs_break_leased_layouts()
      becomes just the BREAK_WRITE handler.
      
      Note that the unlock tracking is needed in a follow on change. That will
      coordinate retrying either break handler until both successfully test
      for a lease break while maintaining the lock state.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
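The two break levels described above can be pictured as a switch in which the stronger reason falls through to the weaker one. The sketch below is a user-space model under stated assumptions, not the kernel function: the enum, the `STEP_*` flags and `model_break_steps()` are hypothetical names, and the function only reports which steps a given reason requires rather than performing them.

```c
#include <assert.h>

/* Hypothetical mirror of the two break levels named in the commit message. */
enum model_break_reason {
	MODEL_BREAK_WRITE,
	MODEL_BREAK_UNMAP,
};

#define STEP_RECALL_LEASES 0x1u /* recall FL_LAYOUT leases */
#define STEP_WAIT_DAX_IDLE 0x2u /* wait for busy dax pages to go idle */

/* Returns the set of steps a break of the given level has to perform. */
unsigned int model_break_steps(enum model_break_reason reason)
{
	unsigned int steps = 0;

	switch (reason) {
	case MODEL_BREAK_UNMAP:
		steps |= STEP_WAIT_DAX_IDLE;
		/* fall through */
	case MODEL_BREAK_WRITE:
		steps |= STEP_RECALL_LEASES;
		break;
	}

	/* Every valid break level includes the lease recall. */
	assert(steps & STEP_RECALL_LEASES);
	return steps;
}
```

The fall-through expresses the "additionally" in the message: an unmap-level break does everything a write-level break does, plus the wait for idle dax pages.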
    • xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL · c63a8eae
      Committed by Dan Williams
      In preparation for adding coordination between extent unmap operations
      and busy dax-pages, update xfs_break_layouts() to permit it to be called
      with the mmap lock held. This lock scheme will be required for
      coordinating the break of 'dax layouts' (non-idle dax (ZONE_DEVICE)
      pages mapped into the file's address space). Breaking dax layouts will
      be added to xfs_break_layouts() in a future patch; for now this preps
      the unmap call sites to take and hold XFS_MMAPLOCK_EXCL over the call to
      xfs_break_layouts().
      
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: "Darrick J. Wong" <darrick.wong@oracle.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  26. 10 May, 2018 2 commits
  27. 03 May, 2018 1 commit