1. 03 Sep, 2019 1 commit
  2. 01 Jul, 2019 1 commit
  3. 29 Jun, 2019 1 commit
  4. 26 Feb, 2019 2 commits
  5. 21 Feb, 2019 6 commits
  6. 18 Feb, 2019 4 commits
  7. 12 Feb, 2019 2 commits
    • xfs: use the latest extent at writeback delalloc conversion time · c2b31643
      Committed by Brian Foster
      The writeback delalloc conversion code is racy with respect to
      changes in the currently cached file mapping outside of the current
      page. This is because the ilock is cycled between the time the
      caller originally looked up the mapping and across each real
      allocation of the provided file range. This code has collected
      various hacks over the years to help combat the symptoms of these
      races (i.e., truncate race detection, allocation into hole
      detection, etc.), but none address the fundamental problem that the
      imap may not be valid at allocation time.
      
      Rather than continue to use race detection hacks, update writeback
      delalloc conversion to a model that explicitly converts the delalloc
      extent backing the current file offset being processed. The current
      file offset is the only block we can trust to remain once the ilock
      is dropped because any operation that can remove the block
      (truncate, hole punch, etc.) must flush and discard pagecache pages
      first.
      
      Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc()
      mechanism to request allocation of the entire delalloc extent
      backing the current offset instead of assuming the extent passed by
      the caller is unchanged. Record the range specified by the caller
      and apply it to the resulting allocated extent so previous checks by
      the caller for COW fork overlap are not lost. Finally, overload the
      bmapi delalloc flag with the range reval flag behavior since this is
      the only use case for both.
      
      This ensures that writeback always picks up the correct
      and current extent associated with the page, regardless of races
      with other extent modifying operations. If operating on a data fork
      and the COW overlap state has changed since the ilock was cycled,
      the caller revalidates against the COW fork sequence number before
      using the imap for the next block.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
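The step in c2b31643 of applying the caller's range to the freshly allocated extent can be pictured as an interval intersection. The following is only a minimal sketch under stated assumptions: `struct extent` and `trim_to_caller_range()` are hypothetical names invented for illustration, not the kernel's types or helpers, and all locking is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified extent covering file blocks [start, start + len). */
struct extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Clamp the extent that was actually converted ("got") to the range the
 * caller originally asked for ("want"). The block at "offset" is the one
 * backing the page under writeback, so it must lie inside both ranges.
 * Returns 0 on success, or -1 (leaving "got" untouched) if it does not.
 */
static int trim_to_caller_range(struct extent *got, const struct extent *want,
				uint64_t offset)
{
	uint64_t got_end = got->start + got->len;
	uint64_t want_end = want->start + want->len;
	uint64_t start, end;

	if (offset < got->start || offset >= got_end ||
	    offset < want->start || offset >= want_end)
		return -1;

	/* Intersection of the two ranges; non-empty because both contain offset. */
	start = got->start > want->start ? got->start : want->start;
	end = got_end < want_end ? got_end : want_end;

	got->start = start;
	got->len = end - start;
	return 0;
}
```

In this model, a converted extent that is larger than the caller's range is cut down to that range, while one that is smaller is returned unchanged, so any earlier decision the caller made about blocks outside its range is preserved.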
    • xfs: validate writeback mapping using data fork seq counter · d9252d52
      Committed by Brian Foster
      The writeback code caches the current extent mapping across multiple
      xfs_do_writepage() calls to avoid repeated lookups for sequential
      pages backed by the same extent. This is known to be slightly racy
      with extent fork changes in certain difficult-to-reproduce
      scenarios. The cached extent is trimmed to within EOF to help avoid
      the most common vector for this problem via speculative
      preallocation management, but this is a band-aid that does not
      address the fundamental problem.
      
      Now that we have an xfs_ifork sequence counter mechanism used to
      facilitate COW writeback, we can use the same mechanism to validate
      consistency between the data fork and cached writeback mappings. On
      its face, this is somewhat of a big hammer approach because any
      change to the data fork invalidates any mapping currently cached by
      a writeback in progress regardless of whether the data fork change
      overlaps with the range under writeback. In practice, however, the
      impact of this approach is minimal in most cases.
      
      First, data fork changes (delayed allocations) caused by sustained
      sequential buffered writes are amortized across speculative
      preallocations. This means that a cached mapping won't be
      invalidated by each buffered write of a common file copy workload,
      but rather only on less frequent allocation events. Second, the
      extent tree is always entirely in-core so an additional lookup of a
      usable extent mostly costs a shared ilock cycle and in-memory tree
      lookup. This means that a cached mapping reval is relatively cheap
      compared to the I/O itself. Third, spurious invalidations don't
      impact ioend construction. This means that even if the same extent
      is revalidated multiple times across multiple writepage instances,
      we still construct and submit the same size ioend (and bio) if the
      blocks are physically contiguous.
      
      Update struct xfs_writepage_ctx with a new field to hold the
      sequence number of the data fork associated with the currently
      cached mapping. Check the wpc seqno against the data fork when the
      mapping is validated and reestablish the mapping whenever the fork
      has changed since the mapping was cached. This ensures that
      writeback always uses a valid extent mapping and thus prevents lost
      writebacks and stale delalloc block problems.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
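The sequence counter check described in d9252d52 can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: `struct fork`, `struct wb_ctx`, `wb_cache_map()` and `wb_map_valid()` are hypothetical names, and the locking needed to sample the counter safely is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fork: seq is bumped on every change to its extent list. */
struct fork {
	unsigned int seq;
};

/* Hypothetical writeback context holding one cached mapping. */
struct wb_ctx {
	uint64_t map_start;	/* first file block covered by the mapping */
	uint64_t map_len;	/* number of blocks covered */
	unsigned int data_seq;	/* fork seq sampled when the mapping was cached */
	bool have_map;
};

/* Cache a mapping together with the fork sequence number it was read at. */
static void wb_cache_map(struct wb_ctx *wpc, const struct fork *df,
			 uint64_t start, uint64_t len)
{
	wpc->map_start = start;
	wpc->map_len = len;
	wpc->data_seq = df->seq;
	wpc->have_map = true;
}

/*
 * The cached mapping is usable only if it covers the offset AND the fork
 * has not changed since it was cached. Any fork change invalidates it,
 * whether or not the change overlaps the cached range.
 */
static bool wb_map_valid(const struct wb_ctx *wpc, const struct fork *df,
			 uint64_t offset)
{
	if (!wpc->have_map)
		return false;
	if (offset < wpc->map_start ||
	    offset >= wpc->map_start + wpc->map_len)
		return false;
	return wpc->data_seq == df->seq;
}
```

A caller in this model would look up and re-cache the mapping whenever `wb_map_valid()` returns false, which corresponds to the "big hammer" trade-off the commit message describes: spurious invalidations cost an extra in-memory lookup but never allow a stale mapping to be used.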
  8. 18 Oct, 2018 3 commits
  9. 08 Aug, 2018 1 commit
  10. 03 Aug, 2018 2 commits
    • xfs: automatic dfops inode relogging · a8198666
      Committed by Brian Foster
      Inodes that are held across deferred operations are explicitly
      joined to the dfops structure to ensure appropriate relogging.
      While inodes are currently joined explicitly, we can detect the
      conditions that require relogging at dfops finish time by inspecting
      the transaction item list for inodes with ili_lock_flags == 0.
      
      Replace the xfs_defer_ijoin() infrastructure with such detection and
      automatic relogging of held inodes. This eliminates the need for the
      per-dfops inode list, replaced by an on-stack variant in
      xfs_defer_trans_roll().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
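The detection rule in a8198666 (scan the transaction's item list for inodes whose lock flags are zero) can be modelled roughly as follows. This is an illustrative sketch only: `struct log_item`, `collect_held_inodes()` and the `MAX_HELD` limit are assumptions made for the example and do not reproduce the kernel's structures.

```c
#include <assert.h>
#include <stddef.h>

enum item_type { ITEM_BUF, ITEM_INODE };

/* Hypothetical log item attached to a transaction. */
struct log_item {
	enum item_type type;
	unsigned int lock_flags; /* 0: the transaction will not unlock the
				  * inode, so the caller still holds it */
	int id;			 /* identifier used only by this example */
};

/* Arbitrary capacity of the on-stack array, chosen for this sketch. */
#define MAX_HELD 2

/*
 * Walk the item list and record inode items with lock_flags == 0.
 * Returns how many held inodes were stored in "held".
 */
static size_t collect_held_inodes(const struct log_item *items, size_t n,
				  const struct log_item *held[MAX_HELD])
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i].type != ITEM_INODE || items[i].lock_flags != 0)
			continue;
		if (count == MAX_HELD)
			break;
		held[count++] = &items[i];
	}
	return count;
}
```

Because the result lives in a small array owned by the caller, nothing has to be registered ahead of time, which mirrors the commit's point that the per-dfops inode list becomes unnecessary.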
    • xfs: add missing defer ijoins for held inodes · 488c919a
      Committed by Brian Foster
      Log items that require relogging during deferred operations
      processing are explicitly joined to the associated dfops via the
      xfs_defer_*join() helpers. These calls imply that the associated
      object is "held" by the transaction such that when rolled, the item
      can be immediately joined to a follow up transaction. For buffers,
      this means the buffer remains locked and held after each roll. For
      inodes, this means that the inode remains locked.
      
      Failure to join a held item to the dfops structure means the
      associated object pins the tail of the log while dfops processing
      completes, because the item never relogs and is not unlocked or
      released until deferred processing completes.
      
      Currently, all buffers that are held in transactions (XFS_BLI_HOLD)
      with deferred operations are explicitly joined to the dfops. This is
      not the case for inodes, however, as various contexts defer
      operations to transactions with held inodes without explicit joins
      to the associated dfops (and thus not relogging).
      
      While this is not a catastrophic problem, it is not ideal. Given
      that we want to eventually relog such items automatically during
      dfops processing, start by explicitly adding these missing
      xfs_defer_ijoin() calls. A call is added everywhere an inode is
      joined to a transaction without transferring lock ownership and
      said transaction runs deferred operations.
      
      All xfs_defer_ijoin() calls will eventually be replaced by automatic
      dfops inode relogging. This patch essentially implements the
      behavior change that would otherwise occur due to automatic inode
      dfops relogging.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  11. 01 Aug, 2018 1 commit
  12. 27 Jul, 2018 1 commit
    • xfs: remove all boilerplate defer init/finish code · c8eac49e
      Committed by Brian Foster
      At this point, the transaction subsystem completely manages deferred
      items internally such that the common and boilerplate
      xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() ->
      xfs_trans_commit() sequence can be replaced with a simple
      transaction allocation and commit.
      
      Remove all such boilerplate deferred ops code. In doing so, we
      change each case over to use the dfops in the transaction and
      specifically eliminate:
      
      - The on-stack dfops and associated xfs_defer_init() call, as the
        internal dfops is initialized on transaction allocation.
      - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of
        a transaction.
      - xfs_defer_cancel() calls in error handlers that precede a
        transaction cancel.
      
      The only deferred ops calls that remain are those that are
      non-deterministic with respect to the final commit of the associated
      transaction or are open-coded due to special handling.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  13. 12 Jul, 2018 10 commits
  14. 25 Jun, 2018 1 commit
    • xfs: recheck reflink state after grabbing ILOCK_SHARED for a write · 5bd88d15
      Committed by Darrick J. Wong
      The reflink iflag could have changed since the earlier unlocked check,
      so if we took ILOCK_SHARED for a write but the inode is now a reflink
      inode, we have to switch to ILOCK_EXCL and relock.
      
      This helps us avoid blowing lock assertions in things like generic/166:
      
      XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
      WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:assfail+0x25/0x30 [xfs]
      Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
      RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
      RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
      R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
      FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
      Call Trace:
       xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
       xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
       ? iomap_to_fiemap+0x80/0x80
       iomap_apply+0x5e/0x130
       iomap_dio_rw+0x2e0/0x400
       ? iomap_to_fiemap+0x80/0x80
       ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_write_iter+0x7b/0xb0 [xfs]
       __vfs_write+0x16f/0x1f0
       vfs_write+0xc8/0x1c0
       ksys_pwrite64+0x74/0x90
       do_syscall_64+0x56/0x180
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
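The fix in 5bd88d15 follows a common "check unlocked, lock, then recheck" pattern. The sketch below models it in a single thread, with the race simulated by passing in a stale hint. All names (`struct inode`, `lock_inode()`, `lock_for_write()`) are hypothetical stand-ins for this example and are not the kernel's functions.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_mode { LOCK_SHARED, LOCK_EXCL };

/* Hypothetical inode model; "relocks" just counts upgrades for the example. */
struct inode {
	bool is_reflink;	/* may change while the lock is not held */
	enum lock_mode held;
	int relocks;
};

/* Stand-ins for real lock primitives; they only record the mode. */
static void lock_inode(struct inode *ip, enum lock_mode mode)
{
	ip->held = mode;
}

static void unlock_inode(struct inode *ip)
{
	(void)ip;
}

/*
 * reflink_hint is the result of an earlier check made WITHOUT the lock,
 * so it may be stale by the time the lock is taken. After locking, the
 * flag is checked again; if a shared lock turns out to be too weak, drop
 * it and take the exclusive lock instead.
 */
static enum lock_mode lock_for_write(struct inode *ip, bool reflink_hint)
{
	enum lock_mode mode = reflink_hint ? LOCK_EXCL : LOCK_SHARED;

	lock_inode(ip, mode);
	if (mode == LOCK_SHARED && ip->is_reflink) {
		unlock_inode(ip);
		mode = LOCK_EXCL;
		lock_inode(ip, mode);
		ip->relocks++;
	}
	return mode;
}
```

Calling `lock_for_write()` with a hint of `false` on an inode whose flag has since become `true` ends with the exclusive lock held, which is the condition the failed assertion in the trace above was checking for.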
  15. 21 Jun, 2018 1 commit
  16. 09 Jun, 2018 1 commit
  17. 07 Jun, 2018 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace it
      with SPDX tags. This does not change the license of any of the code;
      it merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  18. 10 May, 2018 1 commit