1. 22 Oct 2019: 3 commits
  2. 21 Oct 2019: 3 commits
  3. 18 Oct 2019: 1 commit
    • iomap: iomap that extends beyond EOF should be marked dirty · 7684e2c4
      Committed by Dave Chinner
      When doing a direct IO that spans the current EOF, and there are
      written blocks beyond EOF that extend beyond the current write, the
      only metadata update that needs to be done is a file size extension.
      
      However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
      IO completion metadata updates are required, and hence we may fail
      to correctly sync file size extensions made at IO completion when
      O_DSYNC writes are used and the hardware supports FUA.
      
      Hence when setting IOMAP_F_DIRTY, we need to also take into account
      whether the iomap spans the current EOF. If it does, then we need to
      mark it dirty so that IO completion will call generic_write_sync()
      to flush the inode size update to stable storage correctly (see the
      sketch after this entry).
      
      Fixes: 3460cac1 ("iomap: Use FUA for pure data O_DSYNC DIO writes")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: removed the ext4 part; they'll handle it separately]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
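      A minimal sketch of the check described above, assuming it sits where
      the write path fills in the struct iomap; the helper name and the
      surrounding call site are illustrative, not the exact kernel diff:

          /*
           * A mapping that reaches beyond the current in-core inode size
           * means IO completion must log a size update, so the iomap is
           * dirty even if no other metadata changes are pending.
           */
          static bool
          iomap_spans_eof(struct inode *inode, loff_t offset, loff_t length)
          {
                  return offset + length > i_size_read(inode);
          }

          /* at iomap construction time (sketch): */
          if (iomap_spans_eof(inode, offset, length))
                  iomap->flags |= IOMAP_F_DIRTY;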
  4. 03 Sep 2019: 1 commit
  5. 01 Jul 2019: 1 commit
  6. 29 Jun 2019: 1 commit
  7. 26 Feb 2019: 2 commits
  8. 21 Feb 2019: 6 commits
  9. 18 Feb 2019: 4 commits
  10. 12 Feb 2019: 2 commits
    • xfs: use the latest extent at writeback delalloc conversion time · c2b31643
      Committed by Brian Foster
      The writeback delalloc conversion code is racy with respect to
      changes in the currently cached file mapping outside of the current
      page. This is because the ilock is cycled between the time the
      caller originally looked up the mapping and across each real
      allocation of the provided file range. This code has collected
      various hacks over the years to help combat the symptoms of these
      races (i.e., truncate race detection, allocation into hole
      detection, etc.), but none address the fundamental problem that the
      imap may not be valid at allocation time.
      
      Rather than continue to use race detection hacks, update writeback
      delalloc conversion to a model that explicitly converts the delalloc
      extent backing the current file offset being processed. The current
      file offset is the only block we can trust to remain once the ilock
      is dropped because any operation that can remove the block
      (truncate, hole punch, etc.) must flush and discard pagecache pages
      first.
      
      Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc()
      mechanism to request allocation of the entire delalloc extent
      backing the current offset, instead of assuming the extent passed by
      the caller is unchanged. Record the range specified by the caller
      and apply it to the resulting allocated extent so previous checks by
      the caller for COW fork overlap are not lost (see the sketch after
      this entry). Finally, overload the bmapi delalloc flag with the
      range revalidation flag behavior, since this is the only use case
      for both.
      
      This ensures that writeback always picks up the correct
      and current extent associated with the page, regardless of races
      with other extent modifying operations. If operating on a data fork
      and the COW overlap state has changed since the ilock was cycled,
      the caller revalidates against the COW fork sequence number before
      using the imap for the next block.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
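      A simplified, self-contained model of the idea: allocate whatever
      delalloc extent currently backs the offset, then trim the result to
      the caller's original range. Every type and helper below
      (lookup_extent_at(), allocate_extent()) is a hypothetical stand-in,
      not an XFS internal:

          struct ext { long long off, len; };     /* file range, in blocks */

          /* intersect the freshly allocated extent with the caller's range */
          static struct ext trim_to_caller(struct ext got, struct ext asked)
          {
                  long long start = got.off > asked.off ? got.off : asked.off;
                  long long gend = got.off + got.len;
                  long long aend = asked.off + asked.len;
                  long long end = gend < aend ? gend : aend;
                  struct ext r = { start, end > start ? end - start : 0 };
                  return r;
          }

          /*
           * Convert the delalloc extent backing @offset. Only @offset is
           * trusted to survive the lock cycle, so the extent is looked up
           * again under the lock rather than taken from the caller's
           * (possibly stale) imap.
           */
          static int convert_delalloc(long long offset, struct ext asked,
                                      struct ext *out)
          {
                  struct ext got;
                  int error;

                  error = lookup_extent_at(offset, &got); /* hypothetical */
                  if (error)
                          return error;
                  error = allocate_extent(&got);          /* hypothetical */
                  if (error)
                          return error;
                  *out = trim_to_caller(got, asked);
                  return 0;
          }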
    • xfs: validate writeback mapping using data fork seq counter · d9252d52
      Committed by Brian Foster
      The writeback code caches the current extent mapping across multiple
      xfs_do_writepage() calls to avoid repeated lookups for sequential
      pages backed by the same extent. This is known to be slightly racy
      with extent fork changes in certain difficult-to-reproduce
      scenarios. The cached extent is trimmed to within EOF to help avoid
      the most common vector for this problem via speculative
      preallocation management, but this is a band-aid that does not
      address the fundamental problem.
      
      Now that we have an xfs_ifork sequence counter mechanism used to
      facilitate COW writeback, we can use the same mechanism to validate
      consistency between the data fork and cached writeback mappings. On
      its face, this is somewhat of a big hammer approach because any
      change to the data fork invalidates any mapping currently cached by
      a writeback in progress regardless of whether the data fork change
      overlaps with the range under writeback. In practice, however, the
      impact of this approach is minimal in most cases.
      
      First, data fork changes (delayed allocations) caused by sustained
      sequential buffered writes are amortized across speculative
      preallocations. This means that a cached mapping won't be
      invalidated by each buffered write of a common file copy workload,
      but rather only on less frequent allocation events. Second, the
      extent tree is always entirely in-core so an additional lookup of a
      usable extent mostly costs a shared ilock cycle and in-memory tree
      lookup. This means that a cached mapping reval is relatively cheap
      compared to the I/O itself. Third, spurious invalidations don't
      impact ioend construction. This means that even if the same extent
      is revalidated multiple times across multiple writepage instances,
      we still construct and submit the same size ioend (and bio) if the
      blocks are physically contiguous.
      
      Update struct xfs_writepage_ctx with a new field to hold the
      sequence number of the data fork associated with the currently
      cached mapping. Check the wpc seqno against the data fork when the
      mapping is validated and reestablish the mapping whenever the fork
      has changed since the mapping was cached (see the sketch after this
      entry). This ensures that writeback always uses a valid extent
      mapping and thus prevents lost writebacks and stale delalloc block
      problems.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
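      A sketch of the revalidation described above; the struct and helper
      names are assumptions modeled on the commit text, not the exact
      kernel identifiers:

          struct ext { long long off, len; };     /* mapping, in blocks */

          struct writepage_ctx {
                  struct ext imap;        /* cached file mapping */
                  unsigned int data_seq;  /* fork seqno when imap was cached */
          };

          static bool
          imap_valid(struct writepage_ctx *wpc, unsigned int cur_fork_seq,
                     long long offset)
          {
                  /* the offset must still fall inside the cached extent... */
                  if (offset < wpc->imap.off ||
                      offset >= wpc->imap.off + wpc->imap.len)
                          return false;
                  /*
                   * ...and the data fork must be unchanged since the mapping
                   * was cached. Any fork change fails the check, overlapping
                   * or not: the "big hammer", kept cheap by the fact that a
                   * re-lookup only costs an in-core extent tree walk.
                   */
                  return wpc->data_seq == cur_fork_seq;
          }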
  11. 18 Oct 2018: 3 commits
  12. 08 Aug 2018: 1 commit
  13. 03 Aug 2018: 2 commits
    • xfs: automatic dfops inode relogging · a8198666
      Committed by Brian Foster
      Inodes that are held across deferred operations are explicitly
      joined to the dfops structure to ensure appropriate relogging.
      While inodes are currently joined explicitly, we can detect the
      conditions that require relogging at dfops finish time by inspecting
      the transaction item list for inodes with ili_lock_flags == 0.
      
      Replace the xfs_defer_ijoin() infrastructure with such detection and
      automatic relogging of held inodes. This eliminates the need for the
      per-dfops inode list, which is replaced by an on-stack variant in
      xfs_defer_trans_roll() (sketched after this entry).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
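      Roughly how the detection could look at roll time. The log-item
      fields used (li_trans, li_type, ili_lock_flags) are real for this
      era of the code, but the body below is a compressed sketch of
      xfs_defer_trans_roll(), not the actual implementation, and the
      array bound is an assumption:

          struct xfs_log_item *lip;
          struct xfs_inode_log_item *iip;
          struct xfs_inode *held[XFS_DEFER_OPS_NR_INODES];  /* on-stack */
          int n = 0;

          /* scan the transaction's items for inodes held across the roll */
          list_for_each_entry(lip, &tp->t_items, li_trans) {
                  if (lip->li_type != XFS_LI_INODE)
                          continue;
                  iip = container_of(lip, struct xfs_inode_log_item, ili_item);
                  /*
                   * ili_lock_flags == 0 means lock ownership was not handed
                   * to the transaction, i.e. the caller still holds the
                   * inode locked and it must be rejoined after the roll.
                   */
                  if (iip->ili_lock_flags == 0)
                          held[n++] = iip->ili_inode;
          }
          /* ...roll the transaction, then xfs_trans_ijoin() each held inode */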
    • xfs: add missing defer ijoins for held inodes · 488c919a
      Committed by Brian Foster
      Log items that require relogging during deferred operations
      processing are explicitly joined to the associated dfops via the
      xfs_defer_*join() helpers. These calls imply that the associated
      object is "held" by the transaction such that when rolled, the item
      can be immediately joined to a follow up transaction. For buffers,
      this means the buffer remains locked and held after each roll. For
      inodes, this means that the inode remains locked.
      
      Failure to join a held item to the dfops structure means the
      associated object pins the tail of the log while dfops processing
      completes, because the item never relogs and is not unlocked or
      released until deferred processing completes.
      
      Currently, all buffers that are held in transactions (XFS_BLI_HOLD)
      with deferred operations are explicitly joined to the dfops. This is
      not the case for inodes, however, as various contexts defer
      operations to transactions with held inodes without explicit joins
      to the associated dfops (and thus not relogging).
      
      While this is not a catastrophic problem, it is not ideal. Given
      that we want to eventually relog such items automatically during
      dfops processing, start by explicitly adding these missing
      xfs_defer_ijoin() calls. A call is added everywhere an inode is
      joined to a transaction without transferring lock ownership and the
      transaction runs deferred operations (the pattern is sketched after
      this entry).
      
      All xfs_defer_ijoin() calls will eventually be replaced by automatic
      dfops inode relogging. This patch essentially implements the
      behavior change that would otherwise occur due to automatic inode
      dfops relogging.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
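      The pattern being added, in sketch form; the arguments are elided,
      and the tp->t_dfops location assumes the dfops had already been
      embedded in the transaction at this point in the series:

          /* join with lock flags 0: the caller keeps the ilock */
          xfs_trans_ijoin(tp, ip, 0);
          /* ensure the held inode is relogged on every dfops roll */
          xfs_defer_ijoin(tp->t_dfops, ip);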
  14. 01 Aug 2018: 1 commit
  15. 27 Jul 2018: 1 commit
    • xfs: remove all boilerplate defer init/finish code · c8eac49e
      Committed by Brian Foster
      At this point, the transaction subsystem completely manages deferred
      items internally such that the common and boilerplate
      xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() ->
      xfs_trans_commit() sequence can be replaced with a simple
      transaction allocation and commit.
      
      Remove all such boilerplate deferred ops code. In doing so, we
      change each case over to use the dfops in the transaction and
      specifically eliminate:
      
      - The on-stack dfops and associated xfs_defer_init() call, as the
        internal dfops is initialized on transaction allocation.
      - xfs_defer_finish() calls that precede a final xfs_trans_commit() of
        a transaction.
      - xfs_defer_cancel() calls in error handlers that precede a
        transaction cancel.
      
      The only deferred ops calls that remain are those that are
      non-deterministic with respect to the final commit of the associated
      transaction or are open-coded due to special handling (a
      before/after sketch follows this entry).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
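      An illustrative before/after of the boilerplate being removed;
      reservation arguments and error paths are elided, and the
      signatures reflect that era of the code as best as can be
      reconstructed:

          /* before: on-stack dfops, explicit init and finish */
          struct xfs_defer_ops dfops;
          xfs_fsblock_t firstfsb;

          error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
                                  0, 0, &tp);
          xfs_defer_init(&dfops, &firstfsb);
          /* ...queue deferred operations... */
          error = xfs_defer_finish(&tp, &dfops);
          if (error)
                  goto out_cancel;
          error = xfs_trans_commit(tp);

          /* after: the transaction's embedded dfops is finished by commit */
          error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
                                  0, 0, &tp);
          /* ...queue deferred operations... */
          error = xfs_trans_commit(tp);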
  16. 12 Jul 2018: 8 commits