1. 15 12月, 2017 1 次提交
    • D
      xfs: remove dest file's post-eof preallocations before reflinking · 5c989a0e
      Darrick J. Wong 提交于
      If we try to reflink into a file with post-eof preallocations at an
      offset well past the preallocations, we increase i_size as one would
      expect.  However, those allocations do not have page cache backing them,
      so they won't get cleaned out on their own.  This leads to asserts in
      the collapse/insert range code and xfs_destroy_inode when they encounter
      delalloc extents they weren't expecting to find.
      
      Since there are plenty of other places where we dump those post-eof
      blocks, do the same to the reflink destination file before we start
      remapping extents.  This was found by adding clonerange support to
      fsstress and running it in write-only mode.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5c989a0e
  2. 09 12月, 2017 1 次提交
  3. 07 11月, 2017 3 次提交
  4. 27 10月, 2017 1 次提交
  5. 04 10月, 2017 1 次提交
  6. 02 9月, 2017 1 次提交
  7. 21 7月, 2017 2 次提交
  8. 20 6月, 2017 2 次提交
  9. 04 5月, 2017 1 次提交
    • D
      xfs: reserve enough blocks to handle btree splits when remapping · fe0be23e
      Darrick J. Wong 提交于
      In xfs_reflink_end_cow, we erroneously reserve only enough blocks to
      handle adding 1 extent.  This is problematic if we fragment free space,
      have to do CoW, and then have to perform multiple bmap btree expansions.
      Furthermore, the BUI recovery routine doesn't reserve /any/ blocks to
      handle btree splits, so log recovery fails after our first error causes
      the filesystem to go down.
      
      Therefore, refactor the transaction block reservation macros until we
      have a macro that works for our deferred (re)mapping activities, and fix
      both problems by using that macro.
      
      With 1k blocks we can hit this fairly often in g/187 if the scratch fs
      is big enough.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      fe0be23e
  10. 04 4月, 2017 1 次提交
  11. 08 3月, 2017 1 次提交
  12. 17 2月, 2017 1 次提交
  13. 10 2月, 2017 1 次提交
  14. 07 2月, 2017 3 次提交
  15. 03 2月, 2017 1 次提交
    • D
      xfs: mark speculative prealloc CoW fork extents unwritten · 5eda4300
      Darrick J. Wong 提交于
      Christoph Hellwig pointed out that there's a potentially nasty race when
      performing simultaneous nearby directio cow writes:
      
      "Thread 1 writes a range from B to c
      
      "                    B --------- C
                                 p
      
      "a little later thread 2 writes from A to B
      
      "        A --------- B
                     p
      
      [editor's note: the 'p' denote cowextsize boundaries, which I added to
      make this more clear]
      
      "but the code preallocates beyond B into the range where thread
      "1 has just written, but ->end_io hasn't been called yet.
      "But once ->end_io is called thread 2 has already allocated
      "up to the extent size hint into the write range of thread 1,
      "so the end_io handler will splice the unintialized blocks from
      "that preallocation back into the file right after B."
      
      We can avoid this race by ensuring that thread 1 cannot accidentally
      remap the blocks that thread 2 allocated (as part of speculative
      preallocation) as part of t2's write preparation in t1's end_io handler.
      The way we make this happen is by taking advantage of the unwritten
      extent flag as an intermediate step.
      
      Recall that when we begin the process of writing data to shared blocks,
      we create a delayed allocation extent in the CoW fork:
      
      D: --RRRRRRSSSRRRRRRRR---
      C: ------DDDDDDD---------
      
      When a thread prepares to CoW some dirty data out to disk, it will now
      convert the delalloc reservation into an /unwritten/ allocated extent in
      the cow fork.  The da conversion code tries to opportunistically
      allocate as much of a (speculatively prealloc'd) extent as possible, so
      we may end up allocating a larger extent than we're actually writing
      out:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UUUUUUU---------
      
      Next, we convert only the part of the extent that we're actively
      planning to write to normal (i.e. not unwritten) status:
      
      D: --RRRRRRSSSRRRRRRRR---
      U: ------UURRUUU---------
      
      If the write succeeds, the end_cow function will now scan the relevant
      range of the CoW fork for real extents and remap only the real extents
      into the data fork:
      
      D: --RRRRRRRRSRRRRRRRR---
      U: ------UU--UUU---------
      
      This ensures that we never obliterate valid data fork extents with
      unwritten blocks from the CoW fork.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5eda4300
  16. 23 12月, 2016 1 次提交
  17. 10 12月, 2016 1 次提交
  18. 30 11月, 2016 1 次提交
  19. 28 11月, 2016 3 次提交
    • B
      xfs: clean up cow fork reservation and tag inodes correctly · 0260d8ff
      Brian Foster 提交于
      COW fork reservation is implemented via delayed allocation. The code is
      modeled after the traditional delalloc allocation code, but is slightly
      different in terms of how preallocation occurs. Rather than post-eof
      speculative preallocation, COW fork preallocation is implemented via a
      COW extent size hint that is designed to minimize fragmentation as a
      reflinked file is split over time.
      
      xfs_reflink_reserve_cow() still uses logic that is oriented towards
      dealing with post-eof speculative preallocation, however, and is stale
      or not necessarily correct. First, the EOF alignment to the COW extent
      size hint is implemented in xfs_bmapi_reserve_delalloc() (which does so
      correctly by aligning the start and end offsets) and so is not necessary
      in xfs_reflink_reserve_cow(). The backoff and retry logic on ENOSPC is
      also ineffective for the same reason, as xfs_bmapi_reserve_delalloc()
      will simply perform the same allocation request on the retry. Finally,
      since the COW extent size hint aligns the start and end offset of the
      range to allocate, the end_fsb != orig_end_fsb logic is not sufficient.
      Indeed, if a write request happens to end on an aligned offset, it is
      possible that we do not tag the inode for COW preallocation even though
      xfs_bmapi_reserve_delalloc() may have preallocated at the start offset.
      
      Kill the unnecessary, duplicate code in xfs_reflink_reserve_cow().
      Remove the inode tag logic as well since xfs_bmapi_reserve_delalloc()
      has been updated to tag the inode correctly.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      0260d8ff
    • B
      xfs: track preallocation separately in xfs_bmapi_reserve_delalloc() · 974ae922
      Brian Foster 提交于
      Speculative preallocation is currently processed entirely by the callers
      of xfs_bmapi_reserve_delalloc(). The caller determines how much
      preallocation to include, adjusts the extent length and passes down the
      resulting request.
      
      While this works fine for post-eof speculative preallocation, it is not
      as reliable for COW fork preallocation. COW fork preallocation is
      implemented via the cowextszhint, which aligns the start offset as well
      as the length of the extent. Further, it is difficult for the caller to
      accurately identify when preallocation occurs because the returned
      extent could have been merged with neighboring extents in the fork.
      
      To simplify this situation and facilitate further COW fork preallocation
      enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
      preallocation parameter to incorporate into the allocation request. The
      preallocation blocks value is tacked onto the end of the request and
      adjusted to accommodate neighboring extents and extent size limits.
      Since xfs_bmapi_reserve_delalloc() now knows precisely how much
      preallocation was included in the allocation, it can also tag the inodes
      appropriately to support preallocation reclaim.
      
      Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
      use the preallocation mechanism. This patch should not change behavior
      outside of correctly tagging reflink inodes when start offset
      preallocation occurs (which the caller does not handle correctly).
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      974ae922
    • D
      xfs: always succeed when deduping zero bytes · fba3e594
      Darrick J. Wong 提交于
      It turns out that btrfs and xfs had differing interpretations of what
      to do when the dedupe length is zero.  Change xfs to follow btrfs'
      semantics so that the userland interface is consistent.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      fba3e594
  20. 24 11月, 2016 6 次提交
  21. 08 11月, 2016 2 次提交
    • E
      xfs: provide helper for counting extents from if_bytes · 5d829300
      Eric Sandeen 提交于
      The open-coded pattern:
      
      ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)
      
      is all over the xfs code; provide a new helper
      xfs_iext_count(ifp) to count the number of inline extents
      in an inode fork.
      
      [dchinner: pick up several missed conversions]
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5d829300
    • B
      xfs: don't skip cow forks w/ delalloc blocks in cowblocks scan · 39937234
      Brian Foster 提交于
      The cowblocks background scanner currently clears the cowblocks tag
      for inodes without any real allocations in the cow fork. This
      excludes inodes with only delalloc blocks in the cow fork. While we
      might never expect to clear delalloc blocks from the cow fork in the
      background scanner, it is not necessarily correct to clear the
      cowblocks tag from such inodes.
      
      For example, if the background scanner happens to process an inode
      between a buffered write and writeback, the scanner catches the
      inode in a state after delalloc blocks have been allocated to the
      cow fork but before the delalloc blocks have been converted to real
      blocks by writeback. The background scanner then incorrectly clears
      the cowblocks tag, even if part of the aforementioned delalloc
      reservation will not be remapped to the data fork (i.e., extra
      blocks due to the cowextsize hint). This means that any such
      additional blocks in the cow fork might never be reclaimed by the
      background scanner and could persist until the inode itself is
      reclaimed.
      
      To address this problem, only skip and clear inodes without any cow
      fork allocations whatsoever from the background scanner. While we
      generally do not want to cancel delalloc reservations from the
      background scanner, the pagecache dirty check following the
      cowblocks check should prevent that situation. If we do end up with
      delalloc cow fork blocks without a dirty address space mapping, this
      is probably an indication that something has gone wrong and the
      blocks should be reclaimed, as they may never be converted to a real
      allocation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      39937234
  22. 24 10月, 2016 1 次提交
    • B
      xfs: clear cowblocks tag when cow fork is emptied · c17a8ef4
      Brian Foster 提交于
      The background cowblocks scan job takes care of scanning for inodes with
      potentially lingering blocks in the cow fork and clearing them out. If
      the background scanner reclaims the cow fork blocks, however, it doesn't
      immediately clear the cowblocks tag from the inode. Instead, the inode
      remains tagged until the background scanner comes around again,
      discovers the inode cow fork has no blocks, clears the tag and fires the
      trace_xfs_inode_free_cowblocks_invalid() tracepoint to indicate that the
      inode may have been incorrectly tagged.
      
      This is not a major functional problem as the tag is ultimately cleared.
      Nonetheless, clear the tag when an inode cow fork is explicitly emptied
      to avoid the extra round trip through the background scanner and
      spurious "invalid" tracepoint.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      c17a8ef4
  23. 20 10月, 2016 4 次提交