1. 12 2月, 2019 1 次提交
    • B
      xfs: validate writeback mapping using data fork seq counter · d9252d52
      Brian Foster 提交于
      The writeback code caches the current extent mapping across multiple
      xfs_do_writepage() calls to avoid repeated lookups for sequential
      pages backed by the same extent. This is known to be slightly racy
      with extent fork changes in certain difficult to reproduce
      scenarios. The cached extent is trimmed to within EOF to help avoid
      the most common vector for this problem via speculative
      preallocation management, but this is a band-aid that does not
      address the fundamental problem.
      
      Now that we have an xfs_ifork sequence counter mechanism used to
      facilitate COW writeback, we can use the same mechanism to validate
      consistency between the data fork and cached writeback mappings. On
      its face, this is somewhat of a big hammer approach because any
      change to the data fork invalidates any mapping currently cached by
      a writeback in progress regardless of whether the data fork change
      overlaps with the range under writeback. In practice, however, the
      impact of this approach is minimal in most cases.
      
      First, data fork changes (delayed allocations) caused by sustained
      sequential buffered writes are amortized across speculative
      preallocations. This means that a cached mapping won't be
      invalidated by each buffered write of a common file copy workload,
      but rather only on less frequent allocation events. Second, the
      extent tree is always entirely in-core so an additional lookup of a
      usable extent mostly costs a shared ilock cycle and in-memory tree
      lookup. This means that a cached mapping reval is relatively cheap
      compared to the I/O itself. Third, spurious invalidations don't
      impact ioend construction. This means that even if the same extent
      is revalidated multiple times across multiple writepage instances,
      we still construct and submit the same size ioend (and bio) if the
      blocks are physically contiguous.
      
      Update struct xfs_writepage_ctx with a new field to hold the
      sequence number of the data fork associated with the currently
      cached mapping. Check the wpc seqno against the data fork when the
      mapping is validated and reestablish the mapping whenever the fork
      has changed since the mapping was cached. This ensures that
      writeback always uses a valid extent mapping and thus prevents lost
      writebacks and stale delalloc block problems.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d9252d52
  2. 04 2月, 2019 1 次提交
    • B
      xfs: eof trim writeback mapping as soon as it is cached · aa6ee4ab
      Brian Foster 提交于
      The cached writeback mapping is EOF trimmed to try and avoid races
      between post-eof block management and writeback that result in
      sending cached data to a stale location. The cached mapping is
      currently trimmed on the validation check, which leaves a race
      window between the time the mapping is cached and when it is trimmed
      against the current inode size.
      
      For example, if a new mapping is cached by delalloc conversion on a
      blocksize == page size fs, we could cycle various locks, perform
      memory allocations, etc.  in the writeback codepath before the
      associated mapping is eventually trimmed to i_size. This leaves
      enough time for a post-eof truncate and file append before the
      cached mapping is trimmed. The former event essentially invalidates
      a range of the cached mapping and the latter bumps the inode size
      such the trim on the next writepage event won't trim all of the
      invalid blocks. fstest generic/464 reproduces this scenario
      occasionally and causes a lost writeback and stale delalloc blocks
      warning on inode inactivation.
      
      To work around this problem, trim the cached writeback mapping as
      soon as it is cached in addition to on subsequent validation checks.
      This is a minor tweak to tighten the race window as much as possible
      until a proper invalidation mechanism is available.
      
      Fixes: 40214d12 ("xfs: trim writepage mapping to within eof")
      Cc: <stable@vger.kernel.org> # v4.14+
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      aa6ee4ab
  3. 18 10月, 2018 1 次提交
  4. 08 8月, 2018 1 次提交
  5. 01 8月, 2018 1 次提交
  6. 30 7月, 2018 1 次提交
  7. 12 7月, 2018 20 次提交
  8. 09 6月, 2018 1 次提交
  9. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  10. 02 6月, 2018 1 次提交
  11. 31 5月, 2018 1 次提交
  12. 16 5月, 2018 1 次提交
  13. 12 4月, 2018 1 次提交
  14. 11 4月, 2018 1 次提交
  15. 31 3月, 2018 1 次提交
    • D
      xfs, dax: introduce xfs_dax_aops · 6e2608df
      Dan Williams 提交于
      In preparation for the dax implementation to start associating dax pages
      to inodes via page->mapping, we need to provide a 'struct
      address_space_operations' instance for dax. Otherwise, direct-I/O
      triggers incorrect page cache assumptions and warnings like the
      following:
      
       WARNING: CPU: 27 PID: 1783 at fs/xfs/xfs_aops.c:1468
       xfs_vm_set_page_dirty+0xf3/0x1b0 [xfs]
       [..]
       CPU: 27 PID: 1783 Comm: dma-collision Tainted: G           O 4.15.0-rc2+ #984
       [..]
       Call Trace:
        set_page_dirty_lock+0x40/0x60
        bio_set_pages_dirty+0x37/0x50
        iomap_dio_actor+0x2b7/0x3b0
        ? iomap_dio_zero+0x110/0x110
        iomap_apply+0xa4/0x110
        iomap_dio_rw+0x29e/0x3b0
        ? iomap_dio_zero+0x110/0x110
        ? xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
        xfs_file_dio_aio_read+0x7c/0x1a0 [xfs]
        xfs_file_read_iter+0xa0/0xc0 [xfs]
        __vfs_read+0xf9/0x170
        vfs_read+0xa6/0x150
        SyS_pread64+0x93/0xb0
        entry_SYSCALL_64_fastpath+0x1f/0x96
      
      ...where the default set_page_dirty() handler assumes that dirty state
      is being tracked in 'struct page' flags.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: NJan Kara <jack@suse.cz>
      Suggested-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6e2608df
  16. 19 3月, 2018 1 次提交
  17. 16 3月, 2018 2 次提交
  18. 12 3月, 2018 1 次提交
  19. 29 1月, 2018 1 次提交
    • D
      xfs: skip CoW writes past EOF when writeback races with truncate · 70c57dcd
      Darrick J. Wong 提交于
      Every so often we blow the ASSERT(type != XFS_IO_COW) in xfs_map_blocks
      when running fsstress, as we do in generic/269.  The cause of this is
      writeback racing with truncate -- writeback doesn't take the iolock, so
      truncate can sneak in to decrease i_size and truncate page cache while
      writeback is gathering buffer heads to schedule writeout.
      
      If we hit this race on a block that has a CoW mapping, we'll get a valid
      imap from the CoW fork but the reduced i_size trims the mapping to zero
      length (which makes it invalid), so we call xfs_map_blocks to try again.
      This doesn't do much anyway, since any mapping we get out of that will
      also be invalid, so we might as well skip the assert and just stop.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      70c57dcd
  20. 13 1月, 2018 1 次提交
    • D
      xfs: use %px for data pointers when debugging · c9690043
      Darrick J. Wong 提交于
      Starting with commit 57e73442 ("vsprintf: refactor %pK code out of
      pointer"), the behavior of the raw '%p' printk format specifier was
      changed to print a 32-bit hash of the pointer value to avoid leaking
      kernel pointers into dmesg.  For most situations that's good.
      
      This is /undesirable/ behavior when we're trying to debug XFS, however,
      so define a PTR_FMT that prints the actual pointer when we're in debug
      mode.
      
      Note that %p for tracepoints still prints the raw pointer, so in the
      long run we could consider rewriting some of these messages as
      tracepoints.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      c9690043