1. 06 Oct, 2016: 18 commits
    • xfs: garbage collect old cowextsz reservations · 83104d44
      Committed by Darrick J. Wong
      Trim CoW reservations made on behalf of a cowextsz hint if they get too
      old or we run low on quota, so long as we don't have dirty data awaiting
      writeback or directio operations in progress.
      
      Garbage collection of the cowextsize extents is kept separate from
      prealloc extent reaping because setting the CoW prealloc lifetime to a
      (much) higher value than the regular prealloc extent lifetime has been
      useful for combatting CoW fragmentation on VM hosts where the VMs
      experience bursty write behaviors and we can keep the utilization ratios
      low enough that we don't start to run out of space.  IOWs, it benefits
      us to keep the CoW fork reservations around for as long as we can unless
      we run out of blocks or hit inode reclaim.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
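      A minimal sketch of the trimming rule described above, assuming a simplified per-inode state. The struct `cow_resv`, its fields, and `cow_resv_can_trim()` are hypothetical names for illustration, not XFS code, and the reservation age is modeled as plain seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-inode state; field names are illustrative only. */
struct cow_resv {
	uint64_t created;        /* time (seconds) the CoW reservation was made */
	bool     dirty_pages;    /* dirty data still awaiting writeback */
	bool     dio_in_flight;  /* direct I/O currently in progress */
};

/*
 * Return true if a cowextsz reservation may be trimmed: it must be at
 * least 'lifetime' seconds old or quota must be running low, and in
 * either case no writeback or directio may still depend on it.
 */
static bool cow_resv_can_trim(const struct cow_resv *r, uint64_t now,
			      uint64_t lifetime, bool quota_low)
{
	assert(now >= r->created);
	if (r->dirty_pages || r->dio_in_flight)
		return false;
	return quota_low || (now - r->created) >= lifetime;
}
```

      Keeping `lifetime` as its own parameter, rather than reusing the regular prealloc lifetime, mirrors the reason given above for collecting CoW reservations separately.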
    • xfs: try other AGs to allocate a BMBT block · 90e2056d
      Committed by Darrick J. Wong
      Prior to the introduction of reflink, allocating a block and mapping
      it into a file was performed in a single transaction with a single
      block reservation, and the allocator was supposed to find enough
      blocks to allocate the extent and any BMBT blocks that might be
      necessary (unless we're low on space).
      
      However, due to the way copy on write works, allocation and mapping
      have been split into two transactions, which means that we must be
      able to handle the case where we allocate an extent for CoW but that
      AG runs out of free space before the blocks can be mapped into a file,
      and the mapping requires a new BMBT block.  When this happens, look in
      one of the other AGs for a BMBT block instead of taking the FS down.
      
      The same applies to the functions that convert a data fork to extents
      and later btree format.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
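      The fallback can be modeled with a simple loop over AGs. This is a hypothetical toy, assuming `agfree[]` holds each AG's free block count; `alloc_bmbt_block()` is an invented name, and the real XFS allocator is considerably more involved.

```c
#include <assert.h>

/*
 * Hypothetical model: agfree[i] is the number of free blocks in AG i.
 * Allocate one BMBT block, preferring the AG the caller asked for and
 * falling back to the other AGs (in wrap-around order) before giving up.
 * Returns the AG that supplied the block, or -1 if every AG is full.
 */
static int alloc_bmbt_block(unsigned long *agfree, int agcount, int pref_ag)
{
	assert(agcount > 0 && pref_ag >= 0 && pref_ag < agcount);
	for (int i = 0; i < agcount; i++) {
		int ag = (pref_ag + i) % agcount;

		if (agfree[ag] > 0) {
			agfree[ag]--;
			return ag;
		}
	}
	return -1; /* every AG is out of space */
}
```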
    • xfs: don't allow reflink when the AG is low on space · 6fa164b8
      Committed by Darrick J. Wong
      If the AG free space is down to the reserves, refuse to reflink our
      way out of space.  Hopefully userspace will make a real copy and/or go
      elsewhere.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: preallocate blocks for worst-case btree expansion · 84d69619
      Committed by Darrick J. Wong
      To gracefully handle the situation where a CoW operation turns a
      single refcount extent into a lot of tiny ones and we then run out of
      space when a tree split has to happen, use the per-AG reserved block
      pool to pre-allocate all the space we'll ever need for a maximal
      btree.  For a 4K block size, this only costs an overhead of 0.3% of
      available disk space.
      
      When reflink is enabled, we have an unfortunate problem with rmap --
      since we can share a block billions of times, this means that the
      reverse mapping btree can expand basically infinitely.  When an AG is
      so full that there are no free blocks with which to expand the rmapbt,
      the filesystem will shut down hard.
      
      This is rather annoying to the user, so use the AG reservation code to
      reserve a "reasonable" amount of space for rmap.  We'll prevent
      reflinks and CoW operations if we think we're getting close to
      exhausting an AG's free space rather than shutting down, but this
      permanent reservation should be enough for "most" users.  Hopefully.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      [hch@lst.de: ensure that we invalidate the freed btree buffer]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
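      The worst case for a btree is every block being only minimally full, so its size can be bounded by working up from the leaves. The sketch below is a generic calculation under that assumption; `btree_worst_case_blocks()` and its parameters are hypothetical, and no per-block record limits for the refcount btree or rmapbt are taken from XFS.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division for positive divisors. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Worst-case number of blocks needed by a btree holding 'nrecs' records
 * when every leaf holds only 'leaf_minrecs' records and every interior
 * node holds only 'node_minrecs' pointers (all blocks minimally full).
 */
static uint64_t btree_worst_case_blocks(uint64_t nrecs, uint64_t leaf_minrecs,
					uint64_t node_minrecs)
{
	assert(leaf_minrecs > 0 && node_minrecs > 1);
	uint64_t level_blocks = div_round_up(nrecs, leaf_minrecs);
	uint64_t total = level_blocks;

	/* Add interior levels until a single root block remains. */
	while (level_blocks > 1) {
		level_blocks = div_round_up(level_blocks, node_minrecs);
		total += level_blocks;
	}
	return total;
}
```

      Passing a record count that cannot be exceeded (for example, one record per AG block, assuming records never overlap) yields a size that could be set aside up front, which is the spirit of the per-AG reservation described above.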
    • xfs: create a separate cow extent size hint for the allocator · f7ca3522
      Committed by Darrick J. Wong
      Create a per-inode extent size allocator hint for copy-on-write.  This
      hint is separate from the existing extent size hint so that CoW can
      take advantage of the fragmentation-reducing properties of extent size
      hints without disabling delalloc for regular writes.
      
      The extent size hint that's fed to the allocator during a copy on
      write operation is the greater of the cowextsize and regular extsize
      hint.
      
      During reflink, if we're sharing the entire source file to the entire
      destination file and the destination file doesn't already have a
      cowextsize hint, propagate the source file's cowextsize hint to the
      destination file.
      
      Furthermore, zero the bulkstat buffer prior to setting the fields
      so that we don't copy kernel memory contents into userspace.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
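      The two rules above (hint selection and hint propagation) can be written as small helpers. This is a sketch with hypothetical names, `cow_alloc_hint()` and `reflink_dest_cowextsize()`, assuming hints are plain block counts and that a value of 0 means no hint is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hint fed to the allocator for a CoW write: the larger of the two hints. */
static uint32_t cow_alloc_hint(uint32_t cowextsize, uint32_t extsize)
{
	return cowextsize > extsize ? cowextsize : extsize;
}

/*
 * Destination cowextsize after a reflink: the source's hint is copied only
 * when the whole source file is shared to the whole destination file and
 * the destination has no hint of its own (0 means "no hint" here).
 */
static uint32_t reflink_dest_cowextsize(uint32_t src_hint, uint32_t dest_hint,
					bool whole_file)
{
	if (whole_file && dest_hint == 0)
		return src_hint;
	return dest_hint;
}
```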
    • xfs: unshare a range of blocks via fallocate · 98cc2db5
      Committed by Darrick J. Wong
      Unshare all shared extents if the user calls fallocate with the new
      unshare mode flag set, so that we can guarantee that a subsequent
      write will not ENOSPC.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      [hch: pass inode instead of file to xfs_reflink_dirty_range,
            use iomap infrastructure for copy up]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
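      From userspace, the unshare mode is requested through fallocate(2). A minimal sketch, assuming a Linux system whose headers may predate the flag (hence the fallback definition of `FALLOC_FL_UNSHARE_RANGE`, whose value 0x40 matches `<linux/falloc.h>`); `unshare_range()` is a hypothetical wrapper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* FALLOC_FL_UNSHARE_RANGE normally comes from <linux/falloc.h>. */
#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40
#endif

/*
 * Ask the filesystem to give [offset, offset + len) private copies of any
 * shared blocks, so that a later write to that range should not fail with
 * ENOSPC. Returns 0 on success or a negative errno (e.g. -EOPNOTSUPP on
 * filesystems that do not implement this mode).
 */
static int unshare_range(int fd, off_t offset, off_t len)
{
	if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE, offset, len) == 0)
		return 0;
	return -errno;
}
```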
    • xfs: swap inode reflink flags when swapping inode extents · f0bc4d13
      Committed by Darrick J. Wong
      When we're swapping the extents of two inodes, be sure to swap the
      reflink inode flag too.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: teach get_bmapx about shared extents and the CoW fork · f86f4037
      Committed by Darrick J. Wong
      Teach xfs_getbmapx how to report shared extents and CoW fork contents
      accurately in the bmap output by querying the refcount btree
      appropriately.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: add dedupe range vfs function · cc714660
      Committed by Darrick J. Wong
      Define a VFS function which allows userspace to request that the
      kernel reflink a range of blocks between two files if the ranges'
      contents match.  The function fits the new VFS ioctl that standardizes
      the checking for the btrfs EXTENT SAME ioctl.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
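      Userspace reaches this function through the FIDEDUPERANGE ioctl declared in `<linux/fs.h>` (Linux 4.5 and later). A minimal single-destination sketch; `dedupe_one()` is a hypothetical wrapper, and the request format itself supports several destinations at once.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Ask the kernel to share [src_off, src_off + len) of src_fd with
 * [dst_off, dst_off + len) of dst_fd, but only if the contents match.
 * Returns the number of bytes deduplicated, 0 if the ranges differ,
 * or a negative errno.
 */
static long long dedupe_one(int src_fd, uint64_t src_off, uint64_t len,
			    int dst_fd, uint64_t dst_off)
{
	struct file_dedupe_range *req;
	long long ret;

	/* One request header followed by a single destination entry. */
	req = calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
	if (!req)
		return -ENOMEM;
	req->src_offset = src_off;
	req->src_length = len;
	req->dest_count = 1;
	req->info[0].dest_fd = dst_fd;
	req->info[0].dest_offset = dst_off;

	if (ioctl(src_fd, FIDEDUPERANGE, req) < 0)
		ret = -errno;                       /* whole request failed */
	else if (req->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
		ret = 0;                            /* contents did not match */
	else if (req->info[0].status < 0)
		ret = req->info[0].status;          /* per-destination -errno */
	else
		ret = (long long)req->info[0].bytes_deduped;
	free(req);
	return ret;
}
```

      Some filesystems limit how much data one call will deduplicate, so callers should compare the return value against `len` and resubmit any remainder.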
    • xfs: add clone file and clone range vfs functions · 9fe26045
      Committed by Darrick J. Wong
      Define two VFS functions which allow userspace to reflink a range of
      blocks between two files or to reflink one file's contents to another.
      These functions fit the new VFS ioctls that standardize the checking
      for the btrfs CLONE and CLONE RANGE ioctls.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
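      The corresponding userspace entry points are the FICLONE and FICLONERANGE ioctls from `<linux/fs.h>`. A minimal sketch with hypothetical wrapper names `clone_file()` and `clone_range()`; note that both ioctls are issued on the destination file descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Reflink all of src_fd's contents into dst_fd. Returns 0 or -errno. */
static int clone_file(int dst_fd, int src_fd)
{
	return ioctl(dst_fd, FICLONE, src_fd) < 0 ? -errno : 0;
}

/*
 * Reflink [src_off, src_off + len) of src_fd into dst_fd at dst_off.
 * A len of 0 means "to the end of the source file". Returns 0 or -errno.
 */
static int clone_range(int dst_fd, int src_fd, uint64_t src_off,
		       uint64_t len, uint64_t dst_off)
{
	struct file_clone_range arg = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	return ioctl(dst_fd, FICLONERANGE, &arg) < 0 ? -errno : 0;
}
```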
    • xfs: reflink extents from one file to another · 862bb360
      Committed by Darrick J. Wong
      Reflink extents from one file to another; that is to say, iteratively
      remove the mappings from the destination file, copy the mappings from
      the source file to the destination file, and increment the reference
      count of all the blocks that got remapped.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
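      The remap loop can be illustrated with a toy block map. This sketch works block by block rather than extent by extent and ignores transactions and locking; the arrays, the `HOLE` convention, and `reflink_blocks()` are all hypothetical, chosen only to show the order of the three steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE 0  /* toy model: physical block 0 means "unmapped" */

/*
 * src_map/dst_map translate file block -> physical block, and refcount[]
 * counts how many file mappings point at each physical block.
 * Remap 'count' blocks from src at src_off into dst at dst_off.
 */
static void reflink_blocks(const uint32_t *src_map, size_t src_off,
			   uint32_t *dst_map, size_t dst_off,
			   size_t count, uint32_t *refcount)
{
	for (size_t i = 0; i < count; i++) {
		uint32_t old = dst_map[dst_off + i];
		uint32_t newb = src_map[src_off + i];

		/* 1. Remove the destination's existing mapping, if any. */
		if (old != HOLE) {
			assert(refcount[old] > 0);
			refcount[old]--;   /* reaching 0 would free the block */
		}
		/* 2. Copy the source mapping into the destination. */
		dst_map[dst_off + i] = newb;
		/* 3. The remapped block now has one more owner. */
		if (newb != HOLE)
			refcount[newb]++;
	}
}
```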
    • xfs: store in-progress CoW allocations in the refcount btree · 174edb0e
      Committed by Darrick J. Wong
      Due to the way the CoW algorithm in XFS works, there's an interval
      during which blocks allocated to handle a CoW can be lost -- if the FS
      goes down after the blocks are allocated but before the block
      remapping takes place.  This is exacerbated by the cowextsz hint --
      allocated reservations can sit around for a while, waiting to get
      used.
      
      Since the refcount btree doesn't normally store records with refcount
      of 1, we can use it to record these in-progress extents.  In-progress
      blocks cannot be shared because they're not user-visible, so there
      shouldn't be any conflicts with other programs.  This is a better
      solution than holding EFIs during writeback because (a) EFIs can't be
      relogged currently, (b) even if they could, EFIs are bound by
      available log space, which puts an unnecessary upper bound on how much
      CoW we can have in flight, and (c) we already have a mechanism to
      track blocks.
      
      At mount time, read the refcount records and free anything we find
      with a refcount of 1 because those were in-progress when the FS went
      down.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
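      The mount-time cleanup amounts to a filter over the refcount records. A simplified in-memory sketch; `struct refc_rec` and `recover_cow_leftovers()` are hypothetical, and the real code would free the extents through the allocator rather than just counting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of a refcount btree record. */
struct refc_rec {
	uint64_t startblock;
	uint64_t blockcount;
	uint32_t refcount;
};

/*
 * Ordinary records describe shared extents (refcount >= 2), so in this
 * model a record with refcount 1 can only be an in-progress CoW extent
 * left over from a crash. Drop those records in place and return how many
 * blocks they covered, i.e. how many blocks go back to the free pool.
 */
static uint64_t recover_cow_leftovers(struct refc_rec *recs, size_t *nrecs)
{
	uint64_t freed = 0;
	size_t kept = 0;

	for (size_t i = 0; i < *nrecs; i++) {
		if (recs[i].refcount == 1)
			freed += recs[i].blockcount;   /* stale CoW extent */
		else
			recs[kept++] = recs[i];        /* genuine shared extent */
	}
	*nrecs = kept;
	return freed;
}
```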
    • xfs: cancel pending CoW reservations when destroying inodes · 5e7e605c
      Committed by Darrick J. Wong
      When destroying the inode, cancel all pending reservations in the CoW
      fork so that all the reserved blocks go back to the free pile.  In
      theory this sort of cleanup is only needed to clean up after write
      errors.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: cancel CoW reservations and clear inode reflink flag when freeing blocks · aa8968f2
      Committed by Darrick J. Wong
      When we're freeing blocks (truncate, punch, etc.), clear all CoW
      reservations in the range being freed.  If the file block count
      drops to zero, also clear the inode reflink flag.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: implement CoW for directio writes · 0613f16c
      Committed by Darrick J. Wong
      For O_DIRECT writes to shared blocks, we have to CoW them just like
      we would with buffered writes.  For writes that are not block-aligned,
      just bounce them to the page cache.
      
      For block-aligned writes, however, we can do better than that.  Use
      the same mechanisms that we employ for buffered CoW to set up a
      delalloc reservation, allocate all the blocks at once, issue the
      writes against the new blocks and use the same ioend functions to
      remap the blocks after the write.  This should be fairly performant.
      
      Christoph discovered that xfs_reflink_allocate_cow_range may stumble
      over invalid entries in the extent array given that it drops the ilock
      but still expects the index to be stable.  Simply changing it to do a
      new lookup on every iteration still isn't correct, given that
      xfs_bmapi_allocate will trigger a BUG_ON() if it hits a hole, and
      there is nothing preventing an xfs_bunmapi_cow call from removing
      extents once we have dropped the ilock either.
      
      This patch duplicates the inner loop of xfs_bmapi_allocate into a
      helper for xfs_reflink_allocate_cow_range so that it can be done under
      the same ilock critical section as our CoW fork delayed allocation.
      The directio CoW warts will be revisited in a later patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
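      The choice between bouncing a write to the page cache and doing CoW directly comes down to an alignment test. A hypothetical helper, `dio_cow_can_go_direct()`, assuming a power-of-two block size; the commit does not say how the check is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A direct write to shared blocks can be CoWed directly only if it starts
 * and ends on filesystem block boundaries; otherwise it is bounced to the
 * page cache. blocksize must be a power of two.
 */
static bool dio_cow_can_go_direct(uint64_t pos, uint64_t count,
				  uint32_t blocksize)
{
	assert(blocksize && (blocksize & (blocksize - 1)) == 0);
	uint64_t mask = blocksize - 1;

	/* Both pos and count aligned implies pos + count is aligned too. */
	return ((pos | count) & mask) == 0;
}
```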
    • xfs: report shared extent mappings to userspace correctly · db1327b1
      Committed by Darrick J. Wong
      Report shared extents through the iomap interface so that FIEMAP flags
      shared blocks accurately.  Have xfs_vm_bmap return zero for reflinked
      files because the bmap-based swap code requires static block mappings,
      which is incompatible with copy on write.
      
      NOTE: Existing userspace bmap users such as lilo will have the same
      problem with reflink files.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
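      Userspace sees the shared extents reported here through the FS_IOC_FIEMAP ioctl, where they carry the `FIEMAP_EXTENT_SHARED` flag. A minimal sketch; `count_shared()` and `file_shared_extents()` are hypothetical helpers, and only the first `max` extents of the file are examined.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

/* Count how many of the returned extents carry FIEMAP_EXTENT_SHARED. */
static unsigned count_shared(const struct fiemap_extent *ext, unsigned n)
{
	unsigned shared = 0;

	for (unsigned i = 0; i < n; i++)
		if (ext[i].fe_flags & FIEMAP_EXTENT_SHARED)
			shared++;
	return shared;
}

/*
 * Map up to 'max' extents of a file with FS_IOC_FIEMAP and return how
 * many of them are shared, or a negative errno.
 */
static int file_shared_extents(int fd, unsigned max)
{
	struct fiemap *fm;
	int ret;

	fm = calloc(1, sizeof(*fm) + max * sizeof(struct fiemap_extent));
	if (!fm)
		return -ENOMEM;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;   /* whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush dirty data first */
	fm->fm_extent_count = max;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
		ret = -errno;
	else
		ret = (int)count_shared(fm->fm_extents, fm->fm_mapped_extents);
	free(fm);
	return ret;
}
```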
    • xfs: move mappings from cow fork to data fork after copy-write · 43caeb18
      Committed by Darrick J. Wong
      After the write component of a copy-write operation finishes, clean up
      the bookkeeping left behind.  On error, we simply free the new blocks
      and pass the error up.  If we succeed, however, then we must remove
      the old data fork mapping and move the cow fork mapping to the data
      fork.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      [hch: Call the CoW failure function during xfs_cancel_ioend]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
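      The success and failure paths can be modeled with a toy data fork and CoW fork. This is a hypothetical sketch, not XFS code: both forks are per-block arrays, `HOLE` marks an unmapped block, `refcount[]` counts mappings per physical block, and `cow_end_io()` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HOLE 0  /* toy model: physical block 0 means "no mapping" */

/*
 * Finish a copy-write over file blocks [off, off + count).
 * On success, each CoW fork mapping replaces the old data fork mapping,
 * whose block loses one reference. On error, the new CoW blocks are
 * simply released. Returns the number of CoW blocks released on error.
 */
static size_t cow_end_io(uint32_t *data_map, uint32_t *cow_map,
			 size_t off, size_t count, uint32_t *refcount,
			 bool io_ok)
{
	size_t released = 0;

	for (size_t i = off; i < off + count; i++) {
		uint32_t cow = cow_map[i];

		if (cow == HOLE)
			continue;                  /* nothing was staged here */
		if (io_ok) {
			uint32_t old = data_map[i];

			if (old != HOLE) {
				assert(refcount[old] > 0);
				refcount[old]--;   /* old shared block drops a user */
			}
			data_map[i] = cow;         /* remap the new block */
			refcount[cow]++;           /* data fork now owns it */
		} else {
			released++;                /* give the staged block back */
		}
		cow_map[i] = HOLE;                 /* CoW fork no longer tracks it */
	}
	return released;
}
```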
    • xfs: support removing extents from CoW fork · 4862cfe8
      Committed by Darrick J. Wong
      Create a helper method to remove extents from the CoW fork without
      any of the side effects (rmapbt/bmbt updates) of the regular extent
      deletion routine.  We'll eventually use this to clear out the CoW fork
      during ioend processing.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  2. 05 Oct, 2016: 15 commits
  3. 04 Oct, 2016: 7 commits