1. 17 Feb, 2017 2 commits
    • xfs: split indlen reservations fairly when under reserved · 75d65361
      Brian Foster committed
      Certain workloads that punch holes into speculative preallocation can
      cause delalloc indirect reservation splits when the delalloc extent is
      split in two. If further splits occur, an already short-handed extent
      can be split into two in a manner that leaves zero indirect blocks for
      one of the two new extents. This occurs because the shortage is large
      enough that the xfs_bmap_split_indlen() algorithm completely drains the
      requested indlen of one of the extents before it honors the existing
      reservation.
      
      This ultimately results in a warning from xfs_bmap_del_extent(). This
      has been observed during file copies of large, sparse files using 'cp
      --sparse=always.'
      
      To avoid this problem, update xfs_bmap_split_indlen() to explicitly
      apply the reservation shortage fairly between both extents. This smooths
      out the overall indlen shortage and defers the situation where we end up
      with a delalloc extent with zero indlen reservation to extreme
      circumstances.
      Reported-by: Patrick Dung <mpatdung@gmail.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
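The fair split described in this commit message can be sketched roughly as follows. This is a simplified Python model and not the kernel's xfs_bmap_split_indlen(): the function name `split_indlen_fairly`, the proportional scaling step, and the alternating hand-out of the remainder are all assumptions made for illustration.

```python
def split_indlen_fairly(available: int, want1: int, want2: int) -> tuple:
    """Split `available` indirect blocks between two new extents.

    want1/want2 are the worst-case reservations each extent asks for.
    Hypothetical model: when there is a shortage, both extents give up
    a proportional share instead of one extent being drained to zero.
    """
    if min(available, want1, want2) < 0:
        raise ValueError("block counts must be non-negative")

    total = want1 + want2
    if total <= available:
        # No shortage: each extent gets what it asked for.
        return want1, want2

    # Scale both requests down by the same ratio (integer math).
    got1 = want1 * available // total
    got2 = want2 * available // total

    # Rounding may leave a few blocks over; hand them out alternately.
    leftover = available - got1 - got2
    while leftover > 0:
        if got1 < want1:
            got1 += 1
            leftover -= 1
        if leftover > 0 and got2 < want2:
            got2 += 1
            leftover -= 1
    return got1, got2


print(split_indlen_fairly(5, 8, 8))  # (3, 2)
```

With equal requests and only 5 blocks on hand, neither extent ends up at zero. An extent can still receive zero blocks in this model when the shortage is extreme, for example `split_indlen_fairly(2, 9, 1)`.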
    • xfs: handle indlen shortage on delalloc extent merge · 0e339ef8
      Brian Foster committed
      When a delalloc extent is created, it can be merged with pre-existing,
      contiguous, delalloc extents. When this occurs,
      xfs_bmap_add_extent_hole_delay() merges the extents along with the
      associated indirect block reservations. The expectation here is that the
      combined worst case indlen reservation is always less than or equal to
      the indlen reservation for the individual extents.
      
      This is not always the case, however, as existing extents can hold less than
      the expected indlen reservation if the extent was previously split due
      to a hole punch. If a new extent merges with such an extent, the total
      indlen requirement may be larger than the sum of the indlen reservations
      held by both extents.
      
      xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen
      reservation is always available and assigns it to the merged extent
      without consideration for the indlen held by the pre-existing extent. As
      a result, the subsequent xfs_mod_fdblocks() call can attempt an
      unintentional allocation rather than a free (indicated by an ASSERT()
      failure). Further, if the allocation happens to fail in this context,
      the failure goes unhandled and creates a filesystem-wide block
      accounting inconsistency.
      
      Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the
      indlen reservation assigned to the merged extent to the sum of the
      indlen reservations held by each of the individual extents.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
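The capping rule from this commit message can be illustrated with a small Python model. It is only a sketch: `merge_indlen` and its arguments are hypothetical names, and the real accounting in xfs_bmap_add_extent_hole_delay() involves more state than is shown here.

```python
def merge_indlen(worst_case: int, held: list) -> tuple:
    """Pick the indlen reservation for a merged delalloc extent.

    worst_case: worst-case indlen for the merged extent.
    held: indlen reservations currently held by the extents being merged.
    Returns (indlen assigned to the merged extent, blocks freed back).
    """
    held_total = sum(held)
    # Never assign more than the extents already hold, so the follow-up
    # accounting step is always a free (>= 0) and never an allocation.
    new_indlen = min(worst_case, held_total)
    return new_indlen, held_total - new_indlen


print(merge_indlen(5, [4, 4]))   # (5, 3)
print(merge_indlen(10, [3, 4]))  # (7, 0)
```

In the second call the extents are short-handed (they hold 7 blocks against a worst case of 10), so the merged extent keeps all 7 and nothing needs to be allocated.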
  2. 07 Feb, 2017 1 commit
    • xfs: go straight to real allocations for direct I/O COW writes · a14234c7
      Christoph Hellwig committed
      When we allocate COW fork blocks for direct I/O writes we currently first
      create a delayed allocation, and then convert it to a real allocation
      once we've got the delayed one.
      
      As there is no good reason for that, this patch instead makes us call
      xfs_bmapi_write from the COW allocation path.  The only interesting bits
      are a few tweaks to the low-level allocator to allow for this, most notably
      the need to remove the call to xfs_bmap_extsize_align for the cowextsize
      in xfs_bmap_btalloc - for the existing convert case it's a no-op, but
      for the direct allocation case it would blow up our block reservation
      way beyond what we reserved for the transaction.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 03 Feb, 2017 2 commits
    • xfs: allow unwritten extents in the CoW fork · 05a630d7
      Darrick J. Wong committed
      In the data fork, we only allow extents to perform the following state
      transitions:
      
      delay -> real <-> unwritten
      
      There's no way to move directly from a delalloc reservation to an
      /unwritten/ allocated extent.  However, for the CoW fork we want to be
      able to do the following to each extent:
      
      delalloc -> unwritten -> written -> remapped to data fork
      
      This will help us to avoid a race in the speculative CoW preallocation
      code between a first thread that is allocating a CoW extent and a second
      thread that is remapping part of a file after a write.  In order to do
      this, however, we need two things: first, we have to be able to
      transition from delalloc to unwritten, and second, the function that converts
      between real and unwritten has to be made aware of the cow fork.  Do
      both of those things.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
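The two transition chains quoted in this commit message can be written down as a small lookup table. This is a sketch only: the state names are copied from the message, while the table layout and the `can_transition` helper are assumptions for illustration. The CoW table lists only the chain the message describes and may not be the complete set the kernel accepts.

```python
# Allowed (from_state, to_state) pairs per fork, as stated in the message.
ALLOWED = {
    # delay -> real <-> unwritten
    "data": {
        ("delay", "real"),
        ("real", "unwritten"),
        ("unwritten", "real"),
    },
    # delalloc -> unwritten -> written -> remapped to data fork
    "cow": {
        ("delalloc", "unwritten"),
        ("unwritten", "written"),
        ("written", "remapped"),
    },
}


def can_transition(fork: str, src: str, dst: str) -> bool:
    """Return True if the fork permits moving an extent from src to dst."""
    return (src, dst) in ALLOWED[fork]


print(can_transition("data", "delay", "unwritten"))    # False
print(can_transition("cow", "delalloc", "unwritten"))  # True
```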
    • xfs: filter out obviously bad btree pointers · d5a91bae
      Darrick J. Wong committed
      Don't let anybody load an obviously bad btree pointer.  Since the values
      come from disk, we must return an error, not just ASSERT.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
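The difference between asserting and returning an error for on-disk values can be sketched as below. The commit message does not say what counts as "obviously bad", so this model assumes a pointer is bad when it is zero or lies at or beyond the block count. The names `check_btree_ptr`, `load_btree_block`, and `CorruptionError` are hypothetical.

```python
class CorruptionError(Exception):
    """Raised when on-disk metadata fails a sanity check."""


def check_btree_ptr(ptr: int, nblocks: int) -> bool:
    """Assumed rule: a valid pointer is non-zero and inside the device."""
    return 0 < ptr < nblocks


def load_btree_block(ptr: int, nblocks: int, read_block):
    """Validate a pointer that came from disk before following it.

    The value is untrusted input, so a bad pointer is reported to the
    caller as an error rather than checked with an assertion that
    would be compiled out of non-debug builds.
    """
    if not check_btree_ptr(ptr, nblocks):
        raise CorruptionError(f"bad btree pointer {ptr}")
    return read_block(ptr)


try:
    load_btree_block(99, 10, lambda p: b"")
except CorruptionError as err:
    print("rejected:", err)
```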
  4. 31 Jan, 2017 2 commits
  5. 26 Jan, 2017 1 commit
  6. 24 Jan, 2017 1 commit
    • xfs: fix COW writeback race · d2b3964a
      Christoph Hellwig committed
      Due to the way xfs_iomap_write_allocate tries to convert the whole
      found extent from delalloc to real space, we can run into a race
      condition with multiple threads doing writes to the same extent.
      For the non-COW case that is harmless as the only thing that can happen
      is that we call xfs_bmapi_write on an extent that has already been
      converted to a real allocation.  For COW writes where we move the extent
      from the COW to the data fork after I/O completion the race is, however,
      not quite as harmless.  In the worst case we are now calling
      xfs_bmapi_write on a region that contains a hole in the COW fork, which
      will trip up an assert in debug builds or lead to file system corruption
      in non-debug builds.  This seems to be reproducible with workloads of
      small O_DSYNC writes, although so far I've not managed to come up with
      an isolated reproducer.
      
      The fix for the issue is relatively simple:  tell xfs_bmapi_write
      that we are only asked to convert delayed allocations and skip holes
      in that case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
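The "convert delayed allocations only, skip holes" behaviour from this commit message can be modelled in a few lines. This is a toy sketch, not xfs_bmapi_write: the fork is represented as a plain dict of block offset to state, and `convert_range` and its `convert_only` parameter are hypothetical names.

```python
HOLE, DELALLOC, REAL = "hole", "delalloc", "real"


def convert_range(fork: dict, start: int, count: int, convert_only: bool) -> list:
    """Turn blocks in [start, start + count) into real allocations.

    fork maps block offset -> state; missing offsets are holes.
    With convert_only=True, holes are left alone, which models the fix:
    a block another thread already moved out of the COW fork is skipped
    instead of being freshly allocated.
    Returns the offsets that were converted or allocated.
    """
    touched = []
    for off in range(start, start + count):
        state = fork.get(off, HOLE)
        if state == DELALLOC:
            fork[off] = REAL
            touched.append(off)
        elif state == HOLE and not convert_only:
            fork[off] = REAL  # fresh allocation into a hole
            touched.append(off)
    return touched


# Block 1 was remapped by a racing writer and is now a hole.
cow_fork = {0: DELALLOC, 2: DELALLOC}
print(convert_range(cow_fork, 0, 3, convert_only=True))  # [0, 2]
```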
  7. 10 Jan, 2017 1 commit
  8. 05 Dec, 2016 4 commits
  9. 28 Nov, 2016 2 commits
    • xfs: track preallocation separately in xfs_bmapi_reserve_delalloc() · 974ae922
      Brian Foster committed
      Speculative preallocation is currently processed entirely by the callers
      of xfs_bmapi_reserve_delalloc(). The caller determines how much
      preallocation to include, adjusts the extent length and passes down the
      resulting request.
      
      While this works fine for post-eof speculative preallocation, it is not
      as reliable for COW fork preallocation. COW fork preallocation is
      implemented via the cowextszhint, which aligns the start offset as well
      as the length of the extent. Further, it is difficult for the caller to
      accurately identify when preallocation occurs because the returned
      extent could have been merged with neighboring extents in the fork.
      
      To simplify this situation and facilitate further COW fork preallocation
      enhancements, update xfs_bmapi_reserve_delalloc() to take a separate
      preallocation parameter to incorporate into the allocation request. The
      preallocation blocks value is tacked onto the end of the request and
      adjusted to accommodate neighboring extents and extent size limits.
      Since xfs_bmapi_reserve_delalloc() now knows precisely how much
      preallocation was included in the allocation, it can also tag the inodes
      appropriately to support preallocation reclaim.
      
      Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to
      use the preallocation mechanism. This patch should not change behavior
      outside of correctly tagging reflink inodes when start offset
      preallocation occurs (which the caller does not handle correctly).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
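The idea of passing preallocation as its own parameter, then trimming it against neighbours and the extent size limit, might look like the sketch below. It is a hypothetical model, not the signature of xfs_bmapi_reserve_delalloc(): the function name, the `max_extent_len` and `next_start` parameters, and the returned tuple are assumptions. The caller is assumed to have already trimmed `length` so the request itself fits.

```python
def reserve_delalloc(offset, length, prealloc, max_extent_len, next_start=None):
    """Reserve `length` blocks at `offset` plus up to `prealloc` extra.

    max_extent_len: largest extent the filesystem allows (assumed input).
    next_start: start offset of the next extent in the fork, if any.
    Returns (total blocks reserved, whether the inode should be tagged
    as holding preallocation).
    """
    # Room left after the request, limited by the extent size cap ...
    room = max_extent_len - length
    # ... and by the gap before the neighbouring extent.
    if next_start is not None:
        room = min(room, next_start - (offset + length))

    granted = max(0, min(prealloc, room))
    # The reserving function knows exactly how much preallocation it
    # added, so it can decide on tagging without guessing.
    return length + granted, granted > 0


print(reserve_delalloc(100, 8, 16, 1024))                  # (24, True)
print(reserve_delalloc(100, 8, 16, 1024, next_start=112))  # (12, True)
print(reserve_delalloc(100, 8, 16, 1024, next_start=108))  # (8, False)
```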
    • xfs: factor rmap btree size into the indlen calculations · fd26a880
      Darrick J. Wong committed
      When we're estimating the amount of space it's going to take to satisfy
      a delalloc reservation, we need to include the space that we might need
      to grow the rmapbt.  This helps us to avoid running out of space later
      when _iomap_write_allocate needs more space than we reserved.  Eryu Guan
      observed this happening on generic/224 when sunit/swidth were set.
      Reported-by: Eryu Guan <eguan@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  10. 24 Nov, 2016 7 commits
  11. 08 Nov, 2016 2 commits
  12. 20 Oct, 2016 5 commits
  13. 06 Oct, 2016 4 commits
    • xfs: try other AGs to allocate a BMBT block · 90e2056d
      Darrick J. Wong committed
      Prior to the introduction of reflink, allocating a block and mapping
      it into a file was performed in a single transaction with a single
      block reservation, and the allocator was supposed to find enough
      blocks to allocate the extent and any BMBT blocks that might be
      necessary (unless we're low on space).
      
      However, due to the way copy on write works, allocation and mapping
      have been split into two transactions, which means that we must be
      able to handle the case where we allocate an extent for CoW but that
      AG runs out of free space before the blocks can be mapped into a file,
      and the mapping requires a new BMBT block.  When this happens, look in
      one of the other AGs for a BMBT block instead of taking the FS down.
      
      The same applies to the functions that convert a data fork to extents
      and later btree format.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
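The fallback described in this commit message, looking in another AG instead of taking the filesystem down, can be sketched with a list of per-AG free block counts. Everything here is a hypothetical model: `alloc_bmbt_block` is not a kernel function, and the search order (preferred AG first, then the rest by index) is an assumption, since the message does not specify one.

```python
def alloc_bmbt_block(ag_free: list, preferred: int):
    """Take one block for a BMBT node, preferring AG `preferred`.

    ag_free[i] is the number of free blocks in AG i (mutated in place).
    Returns the index of the AG that supplied the block, or None if
    every AG is full.
    """
    others = [ag for ag in range(len(ag_free)) if ag != preferred]
    for ag in [preferred] + others:
        if ag_free[ag] > 0:
            ag_free[ag] -= 1
            return ag
    return None


free = [0, 3, 0]
print(alloc_bmbt_block(free, preferred=0))  # 1
print(free)                                 # [0, 2, 0]
```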
    • xfs: create a separate cow extent size hint for the allocator · f7ca3522
      Darrick J. Wong committed
      Create a per-inode extent size allocator hint for copy-on-write.  This
      hint is separate from the existing extent size hint so that CoW can
      take advantage of the fragmentation-reducing properties of extent size
      hints without disabling delalloc for regular writes.
      
      The extent size hint that's fed to the allocator during a copy on
      write operation is the greater of the cowextsize and regular extsize
      hint.
      
      During reflink, if we're sharing the entire source file to the entire
      destination file and the destination file doesn't already have a
      cowextsize hint, propagate the source file's cowextsize hint to the
      destination file.
      
      Furthermore, zero the bulkstat buffer prior to setting the fields
      so that we don't copy kernel memory contents into userspace.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
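The two hint rules in this commit message (use the greater of the two hints for CoW allocations, and propagate the source hint on a whole-file reflink) can be sketched as follows. The helper names are hypothetical, and the model assumes a hint value of 0 means "no hint set".

```python
def cow_alloc_hint(extsize: int, cowextsize: int) -> int:
    """Hint fed to the allocator for a copy-on-write allocation."""
    return max(extsize, cowextsize)


def propagate_cowextsize(src_hint: int, dst_hint: int, whole_file: bool) -> int:
    """Return the destination's cowextsize hint after a reflink.

    whole_file: True when the entire source is shared to the entire
    destination. Assumption: 0 means the destination has no hint yet.
    """
    if whole_file and dst_hint == 0:
        return src_hint
    return dst_hint


print(cow_alloc_hint(extsize=16, cowextsize=32))       # 32
print(propagate_cowextsize(32, 0, whole_file=True))    # 32
print(propagate_cowextsize(32, 8, whole_file=True))    # 8
```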
    • xfs: store in-progress CoW allocations in the refcount btree · 174edb0e
      Darrick J. Wong committed
      Due to the way the CoW algorithm in XFS works, there's an interval
      during which blocks allocated to handle a CoW can be lost -- if the FS
      goes down after the blocks are allocated but before the block
      remapping takes place.  This is exacerbated by the cowextsz hint --
      allocated reservations can sit around for a while, waiting to get
      used.
      
      Since the refcount btree doesn't normally store records with refcount
      of 1, we can use it to record these in-progress extents.  In-progress
      blocks cannot be shared because they're not user-visible, so there
      shouldn't be any conflicts with other programs.  This is a better
      solution than holding EFIs during writeback because (a) EFIs can't be
      relogged currently, (b) even if they could, EFIs are bound by
      available log space, which puts an unnecessary upper bound on how much
      CoW we can have in flight, and (c) we already have a mechanism to
      track blocks.
      
      At mount time, read the refcount records and free anything we find
      with a refcount of 1 because those were in-progress when the FS went
      down.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
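The mount-time cleanup described in this commit message can be illustrated with a toy model in which refcount records are plain tuples. The function name and record layout are assumptions for illustration; the on-disk refcount btree format is not represented here.

```python
def recover_cow_leftovers(records: list) -> tuple:
    """Separate shared-extent records from stale CoW staging records.

    records: list of (start_block, length, refcount) tuples.
    A refcount of 1 is never stored for ordinary extents in this model,
    so any such record marks an allocation that was in progress when
    the filesystem went down and can be freed.
    Returns (records to keep, records to free).
    """
    keep, free = [], []
    for rec in records:
        if rec[2] == 1:
            free.append(rec)
        else:
            keep.append(rec)
    return keep, free


keep, free = recover_cow_leftovers([(100, 8, 2), (200, 4, 1), (300, 16, 3)])
print(keep)  # [(100, 8, 2), (300, 16, 3)]
print(free)  # [(200, 4, 1)]
```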
    • xfs: support removing extents from CoW fork · 4862cfe8
      Darrick J. Wong committed
      Create a helper method to remove extents from the CoW fork without
      any of the side effects (rmapbt/bmbt updates) of the regular extent
      deletion routine.  We'll eventually use this to clear out the CoW fork
      during ioend processing.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  14. 05 Oct, 2016 6 commits