1. 22 11月, 2018 1 次提交
    • D
      xfs: flush removing page cache in xfs_reflink_remap_prep · 2c307174
      Dave Chinner 提交于
      On a sub-page block size filesystem, fsx is failing with a data
      corruption after a series of operations involving copying a file
      with the destination offset beyond EOF of the destination of the file:
      
      8093(157 mod 256): TRUNCATE DOWN        from 0x7a120 to 0x50000 ******WWWW
      8094(158 mod 256): INSERT 0x25000 thru 0x25fff  (0x1000 bytes)
      8095(159 mod 256): COPY 0x18000 thru 0x1afff    (0x3000 bytes) to 0x2f400
      8096(160 mod 256): WRITE    0x5da00 thru 0x651ff        (0x7800 bytes) HOLE
      8097(161 mod 256): COPY 0x2000 thru 0x5fff      (0x4000 bytes) to 0x6fc00
      
      The second copy here is beyond EOF, and it is to sub-page (4k) but
      block aligned (1k) offset. The clone runs the EOF zeroing, landing
      in a pre-existing post-eof delalloc extent. This zeroes the post-eof
      extents in the page cache just fine, dirtying the pages correctly.
      
      The problem is that xfs_reflink_remap_prep() now truncates the page
      cache over the range that it is copying it to, and rounds that down
      to cover the entire start page. This removes the dirty page over the
      delalloc extent from the page cache without having written it back.
      Hence later, when the page cache is flushed, the page at offset
      0x6f000 has not been written back and hence exposes stale data,
      which fsx trips over less than 10 operations later.
      
      Fix this by changing xfs_reflink_remap_prep() to use
      xfs_flush_unmap_range().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2c307174
  2. 20 11月, 2018 1 次提交
    • B
      xfs: fix shared extent data corruption due to missing cow reservation · 59e42931
      Brian Foster 提交于
      Page writeback indirectly handles shared extents via the existence
      of overlapping COW fork blocks. If COW fork blocks exist, writeback
      always performs the associated copy-on-write regardless if the
      underlying blocks are actually shared. If the blocks are shared,
      then overlapping COW fork blocks must always exist.
      
      fstests shared/010 reproduces a case where a buffered write occurs
      over a shared block without performing the requisite COW fork
      reservation.  This ultimately causes writeback to the shared extent
      and data corruption that is detected across md5 checks of the
      filesystem across a mount cycle.
      
      The problem occurs when a buffered write lands over a shared extent
      that crosses an extent size hint boundary and that also happens to
      have a partial COW reservation that doesn't cover the start and end
      blocks of the data fork extent.
      
      For example, a buffered write occurs across the file offset (in FSB
      units) range of [29, 57]. A shared extent exists at blocks [29, 35]
      and COW reservation already exists at blocks [32, 34]. After
      accommodating a COW extent size hint of 32 blocks and the existing
      reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
      blocks of reservation at offset 0 and returns with COW reservation
      across the range of [0, 34]. The associated data fork extent is
      still [29, 35], however, which isn't fully covered by the COW
      reservation.
      
      This leads to a buffered write at file offset 35 over a shared
      extent without associated COW reservation. Writeback eventually
      kicks in, performs an overwrite of the underlying shared block and
      causes the associated data corruption.
      
      Update xfs_reflink_reserve_cow() to accommodate the fact that a
      delalloc allocation request may not fully cover the extent in the
      data fork. Trim the data fork extent appropriately, just as is done
      for shared extent boundaries and/or existing COW reservations that
      happen to overlap the start of the data fork extent. This prevents
      shared/010 failures due to data corruption on reflink enabled
      filesystems.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      59e42931
  3. 30 10月, 2018 12 次提交
  4. 18 10月, 2018 3 次提交
  5. 06 10月, 2018 2 次提交
    • D
      xfs: fix data corruption w/ unaligned reflink ranges · b3998900
      Dave Chinner 提交于
      When reflinking sub-file ranges, a data corruption can occur when
      the source file range includes a partial EOF block. This shares the
      unknown data beyond EOF into the second file at a position inside
      EOF, exposing stale data in the second file.
      
      XFS only supports whole block sharing, but we still need to
      support whole file reflink correctly.  Hence if the reflink
      request includes the last block of the souce file, only proceed with
      the reflink operation if it lands at or past the destination file's
      current EOF. If it lands within the destination file EOF, reject the
      entire request with -EINVAL and make the caller go the hard way.
      
      This avoids the data corruption vector, but also avoids disruption
      of returning EINVAL to userspace for the common case of whole file
      cloning.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b3998900
    • D
      xfs: fix data corruption w/ unaligned dedupe ranges · dceeb47b
      Dave Chinner 提交于
      A deduplication data corruption is Exposed by fstests generic/505 on
      XFS. It is caused by extending the block match range to include the
      partial EOF block, but then allowing unknown data beyond EOF to be
      considered a "match" to data in the destination file because the
      comparison is only made to the end of the source file. This corrupts
      the destination file when the source extent is shared with it.
      
      XFS only supports whole block dedupe, but we still need to appear to
      support whole file dedupe correctly.  Hence if the dedupe request
      includes the last block of the souce file, don't include it in the
      actual XFS dedupe operation. If the rest of the range dedupes
      successfully, then report the partial last block as deduped, too, so
      that userspace sees it as a successful dedupe rather than return
      EINVAL because we can't dedupe unaligned blocks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dceeb47b
  6. 05 10月, 2018 3 次提交
  7. 29 9月, 2018 2 次提交
    • C
      xfs: skip delalloc COW blocks in xfs_reflink_end_cow · f5f3f959
      Christoph Hellwig 提交于
      The iomap direct I/O code issues a single ->end_io call for the whole
      I/O request, and if some of the extents cowered needed a COW operation
      it will call xfs_reflink_end_cow over the whole range.
      
      When we do AIO writes we drop the iolock after doing the initial setup,
      but before the I/O completion.  Between dropping the lock and completing
      the I/O we can have a racing buffered write create new delalloc COW fork
      extents in the region covered by the outstanding direct I/O write, and
      thus see delalloc COW fork extents in xfs_reflink_end_cow.  As
      concurrent writes are fundamentally racy and no guarantees are given we
      can simply skip those.
      
      This can be easily reproduced with xfstests generic/208 in always_cow
      mode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      f5f3f959
    • D
      xfs: fix transaction leak in xfs_reflink_allocate_cow() · df307077
      Dave Chinner 提交于
      When xfs_reflink_allocate_cow() allocates a transaction, it drops
      the ILOCK to perform the operation. This Introduces a race condition
      where another thread modifying the file can perform the COW
      allocation operation underneath us. This result in the retry loop
      finding an allocated block and jumping straight to the conversion
      code. It does not, however, cancel the transaction it holds and so
      this gets leaked. This results in a lockdep warning:
      
      ================================================
      WARNING: lock held when returning to user space!
      4.18.5 #1 Not tainted
      ------------------------------------------------
      worker/6123 is leaving the kernel with locks still held!
      1 lock held by worker/6123:
       #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
      
      And eventually the filesystem deadlocks because it runs out of log
      space that is reserved by the leaked transaction and never gets
      released.
      
      The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
      gotos - it's no surprise that it has bug where the flow through
      several goto jumps then fails to clean up context from a non-obvious
      logic path. CLean up the logic flow and make sure every path does
      the right thing.
      Reported-by: NAlexander Y. Fomichev <git.user@gmail.com>
      Tested-by: NAlexander Y. Fomichev <git.user@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981Signed-off-by: NDave Chinner <dchinner@redhat.com>
      [hch: slight refactor]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      df307077
  8. 03 8月, 2018 5 次提交
    • B
      xfs: fold dfops into the transaction · 9d9e6233
      Brian Foster 提交于
      struct xfs_defer_ops has now been reduced to a single list_head. The
      external dfops mechanism is unused and thus everywhere a (permanent)
      transaction is accessible the associated dfops structure is as well.
      
      Remove the xfs_defer_ops structure and fold the list_head into the
      transaction. Also remove the last remnant of external dfops in
      xfs_trans_dup().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9d9e6233
    • B
      xfs: pass transaction to xfs_defer_add() · 0f37d178
      Brian Foster 提交于
      The majority of remaining references to struct xfs_defer_ops in XFS
      are associated with xfs_defer_add(). At this point, there are no
      more external xfs_defer_ops users left. All instances of
      xfs_defer_ops are embedded in the transaction, which means we can
      safely pass the transaction down to the dfops add interface.
      
      Update xfs_defer_add() to receive the transaction as a parameter.
      Various subsystems implement wrappers to allocate and construct the
      context specific data structures for the associated deferred
      operation type. Update these to also carry the transaction down as
      needed and clean up unused dfops parameters along the way.
      
      This removes most of the remaining references to struct
      xfs_defer_ops throughout the code and facilitates removal of the
      structure.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [darrick: fix unused variable warnings with ftrace disabled]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0f37d178
    • B
      xfs: cancel dfops on xfs_defer_finish() error · 9b1f4e98
      Brian Foster 提交于
      The current semantics of xfs_defer_finish() require the caller to
      call xfs_defer_cancel() on error. This is slightly inconsistent with
      transaction commit error handling where a failed commit cleans up
      the transaction before returning.
      
      More significantly, the only requirement for exposure of
      ->dop_pending outside of xfs_defer_finish() is so that
      xfs_defer_cancel() can drain it on error. Since the only recourse of
      xfs_defer_finish() errors is cancellation, mirror the transaction
      logic and cancel remaining dfops before returning from
      xfs_defer_finish() with an error.
      
      Beside simplifying xfs_defer_finish() semantics, this ensures that
      xfs_defer_finish() always returns with an empty ->dop_pending and
      thus facilitates removal of the list from xfs_defer_ops.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9b1f4e98
    • B
      xfs: automatic dfops inode relogging · a8198666
      Brian Foster 提交于
      Inodes that are held across deferred operations are explicitly
      joined to the dfops structure to ensure appropriate relogging.
      While inodes are currently joined explicitly, we can detect the
      conditions that require relogging at dfops finish time by inspecting
      the transaction item list for inodes with ili_lock_flags == 0.
      
      Replace the xfs_defer_ijoin() infrastructure with such detection and
      automatic relogging of held inodes. This eliminates the need for the
      per-dfops inode list, replaced by an on-stack variant in
      xfs_defer_trans_roll().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      a8198666
    • B
      xfs: add missing defer ijoins for held inodes · 488c919a
      Brian Foster 提交于
      Log items that require relogging during deferred operations
      processing are explicitly joined to the associated dfops via the
      xfs_defer_*join() helpers. These calls imply that the associated
      object is "held" by the transaction such that when rolled, the item
      can be immediately joined to a follow up transaction. For buffers,
      this means the buffer remains locked and held after each roll. For
      inodes, this means that the inode remains locked.
      
      Failure to join a held item to the dfops structure means the
      associated object pins the tail of the log while dfops processing
      completes, because the item never relogs and is not unlocked or
      released until deferred processing completes.
      
      Currently, all buffers that are held in transactions (XFS_BLI_HOLD)
      with deferred operations are explicitly joined to the dfops. This is
      not the case for inodes, however, as various contexts defer
      operations to transactions with held inodes without explicit joins
      to the associated dfops (and thus not relogging).
      
      While this is not a catastrophic problem, it is not ideal. Given
      that we want to eventually relog such items automatically during
      dfops processing, start by explicitly adding these missing
      xfs_defer_ijoin() calls. A call is added everywhere an inode is
      joined to a transaction without transferring lock ownership and
      said transaction runs deferred operations.
      
      All xfs_defer_ijoin() calls will eventually be replaced by automatic
      dfops inode relogging. This patch essentially implements the
      behavior change that would otherwise occur due to automatic inode
      dfops relogging.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      488c919a
  9. 30 7月, 2018 1 次提交
  10. 27 7月, 2018 3 次提交
  11. 24 7月, 2018 1 次提交
  12. 12 7月, 2018 6 次提交