1. 30 10月, 2018 11 次提交
  2. 06 10月, 2018 2 次提交
    • D
      xfs: fix data corruption w/ unaligned reflink ranges · b3998900
      Dave Chinner 提交于
      When reflinking sub-file ranges, a data corruption can occur when
      the source file range includes a partial EOF block. This shares the
      unknown data beyond EOF into the second file at a position inside
      EOF, exposing stale data in the second file.
      
      XFS only supports whole block sharing, but we still need to
      support whole file reflink correctly.  Hence if the reflink
      request includes the last block of the souce file, only proceed with
      the reflink operation if it lands at or past the destination file's
      current EOF. If it lands within the destination file EOF, reject the
      entire request with -EINVAL and make the caller go the hard way.
      
      This avoids the data corruption vector, but also avoids disruption
      of returning EINVAL to userspace for the common case of whole file
      cloning.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b3998900
    • D
      xfs: fix data corruption w/ unaligned dedupe ranges · dceeb47b
      Dave Chinner 提交于
      A deduplication data corruption is Exposed by fstests generic/505 on
      XFS. It is caused by extending the block match range to include the
      partial EOF block, but then allowing unknown data beyond EOF to be
      considered a "match" to data in the destination file because the
      comparison is only made to the end of the source file. This corrupts
      the destination file when the source extent is shared with it.
      
      XFS only supports whole block dedupe, but we still need to appear to
      support whole file dedupe correctly.  Hence if the dedupe request
      includes the last block of the souce file, don't include it in the
      actual XFS dedupe operation. If the rest of the range dedupes
      successfully, then report the partial last block as deduped, too, so
      that userspace sees it as a successful dedupe rather than return
      EINVAL because we can't dedupe unaligned blocks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dceeb47b
  3. 05 10月, 2018 3 次提交
  4. 29 9月, 2018 2 次提交
    • C
      xfs: skip delalloc COW blocks in xfs_reflink_end_cow · f5f3f959
      Christoph Hellwig 提交于
      The iomap direct I/O code issues a single ->end_io call for the whole
      I/O request, and if some of the extents cowered needed a COW operation
      it will call xfs_reflink_end_cow over the whole range.
      
      When we do AIO writes we drop the iolock after doing the initial setup,
      but before the I/O completion.  Between dropping the lock and completing
      the I/O we can have a racing buffered write create new delalloc COW fork
      extents in the region covered by the outstanding direct I/O write, and
      thus see delalloc COW fork extents in xfs_reflink_end_cow.  As
      concurrent writes are fundamentally racy and no guarantees are given we
      can simply skip those.
      
      This can be easily reproduced with xfstests generic/208 in always_cow
      mode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      f5f3f959
    • D
      xfs: fix transaction leak in xfs_reflink_allocate_cow() · df307077
      Dave Chinner 提交于
      When xfs_reflink_allocate_cow() allocates a transaction, it drops
      the ILOCK to perform the operation. This Introduces a race condition
      where another thread modifying the file can perform the COW
      allocation operation underneath us. This result in the retry loop
      finding an allocated block and jumping straight to the conversion
      code. It does not, however, cancel the transaction it holds and so
      this gets leaked. This results in a lockdep warning:
      
      ================================================
      WARNING: lock held when returning to user space!
      4.18.5 #1 Not tainted
      ------------------------------------------------
      worker/6123 is leaving the kernel with locks still held!
      1 lock held by worker/6123:
       #0: 000000009eab4f1b (sb_internal#2){.+.+}, at: xfs_trans_alloc+0x17c/0x220
      
      And eventually the filesystem deadlocks because it runs out of log
      space that is reserved by the leaked transaction and never gets
      released.
      
      The logic flow in xfs_reflink_allocate_cow() is a convoluted mess of
      gotos - it's no surprise that it has bug where the flow through
      several goto jumps then fails to clean up context from a non-obvious
      logic path. CLean up the logic flow and make sure every path does
      the right thing.
      Reported-by: NAlexander Y. Fomichev <git.user@gmail.com>
      Tested-by: NAlexander Y. Fomichev <git.user@gmail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200981Signed-off-by: NDave Chinner <dchinner@redhat.com>
      [hch: slight refactor]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      df307077
  5. 03 8月, 2018 5 次提交
    • B
      xfs: fold dfops into the transaction · 9d9e6233
      Brian Foster 提交于
      struct xfs_defer_ops has now been reduced to a single list_head. The
      external dfops mechanism is unused and thus everywhere a (permanent)
      transaction is accessible the associated dfops structure is as well.
      
      Remove the xfs_defer_ops structure and fold the list_head into the
      transaction. Also remove the last remnant of external dfops in
      xfs_trans_dup().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9d9e6233
    • B
      xfs: pass transaction to xfs_defer_add() · 0f37d178
      Brian Foster 提交于
      The majority of remaining references to struct xfs_defer_ops in XFS
      are associated with xfs_defer_add(). At this point, there are no
      more external xfs_defer_ops users left. All instances of
      xfs_defer_ops are embedded in the transaction, which means we can
      safely pass the transaction down to the dfops add interface.
      
      Update xfs_defer_add() to receive the transaction as a parameter.
      Various subsystems implement wrappers to allocate and construct the
      context specific data structures for the associated deferred
      operation type. Update these to also carry the transaction down as
      needed and clean up unused dfops parameters along the way.
      
      This removes most of the remaining references to struct
      xfs_defer_ops throughout the code and facilitates removal of the
      structure.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [darrick: fix unused variable warnings with ftrace disabled]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0f37d178
    • B
      xfs: cancel dfops on xfs_defer_finish() error · 9b1f4e98
      Brian Foster 提交于
      The current semantics of xfs_defer_finish() require the caller to
      call xfs_defer_cancel() on error. This is slightly inconsistent with
      transaction commit error handling where a failed commit cleans up
      the transaction before returning.
      
      More significantly, the only requirement for exposure of
      ->dop_pending outside of xfs_defer_finish() is so that
      xfs_defer_cancel() can drain it on error. Since the only recourse of
      xfs_defer_finish() errors is cancellation, mirror the transaction
      logic and cancel remaining dfops before returning from
      xfs_defer_finish() with an error.
      
      Beside simplifying xfs_defer_finish() semantics, this ensures that
      xfs_defer_finish() always returns with an empty ->dop_pending and
      thus facilitates removal of the list from xfs_defer_ops.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9b1f4e98
    • B
      xfs: automatic dfops inode relogging · a8198666
      Brian Foster 提交于
      Inodes that are held across deferred operations are explicitly
      joined to the dfops structure to ensure appropriate relogging.
      While inodes are currently joined explicitly, we can detect the
      conditions that require relogging at dfops finish time by inspecting
      the transaction item list for inodes with ili_lock_flags == 0.
      
      Replace the xfs_defer_ijoin() infrastructure with such detection and
      automatic relogging of held inodes. This eliminates the need for the
      per-dfops inode list, replaced by an on-stack variant in
      xfs_defer_trans_roll().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      a8198666
    • B
      xfs: add missing defer ijoins for held inodes · 488c919a
      Brian Foster 提交于
      Log items that require relogging during deferred operations
      processing are explicitly joined to the associated dfops via the
      xfs_defer_*join() helpers. These calls imply that the associated
      object is "held" by the transaction such that when rolled, the item
      can be immediately joined to a follow up transaction. For buffers,
      this means the buffer remains locked and held after each roll. For
      inodes, this means that the inode remains locked.
      
      Failure to join a held item to the dfops structure means the
      associated object pins the tail of the log while dfops processing
      completes, because the item never relogs and is not unlocked or
      released until deferred processing completes.
      
      Currently, all buffers that are held in transactions (XFS_BLI_HOLD)
      with deferred operations are explicitly joined to the dfops. This is
      not the case for inodes, however, as various contexts defer
      operations to transactions with held inodes without explicit joins
      to the associated dfops (and thus not relogging).
      
      While this is not a catastrophic problem, it is not ideal. Given
      that we want to eventually relog such items automatically during
      dfops processing, start by explicitly adding these missing
      xfs_defer_ijoin() calls. A call is added everywhere an inode is
      joined to a transaction without transferring lock ownership and
      said transaction runs deferred operations.
      
      All xfs_defer_ijoin() calls will eventually be replaced by automatic
      dfops inode relogging. This patch essentially implements the
      behavior change that would otherwise occur due to automatic inode
      dfops relogging.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      488c919a
  6. 30 7月, 2018 1 次提交
  7. 27 7月, 2018 3 次提交
  8. 24 7月, 2018 1 次提交
  9. 12 7月, 2018 12 次提交