1. 08 Apr, 2021: 2 commits
  2. 04 Feb, 2021: 5 commits
  3. 23 Jan, 2021: 3 commits
  4. 22 Oct, 2020: 1 commit
      xfs: fix fallocate functions when rtextsize is larger than 1 · 25219dbf
      Darrick J. Wong committed
      In commit fe341eb1, I forgot that xfs_free_file_space isn't strictly
      a "remove mapped blocks" function.  It is actually a function to zero
      file space by punching out the middle and writing zeroes to the
      unaligned ends of the specified range.  Therefore, putting an rtextsize
      alignment check in that function is wrong because it breaks unaligned
      ZERO_RANGE on the realtime volume.
      
      Furthermore, xfs_file_fallocate already has alignment checks for the
      functions that require the file range to be aligned to the size of a
      fundamental allocation unit (which is 1 FSB on the data volume and 1 rt
      extent on the realtime volume).  Create a new helper to check fallocate
      arguments against the realtime allocation unit size, fix the fallocate
      frontend to use it, fix free_file_space to delete the correct range, and
      remove a now redundant check from insert_file_space.
      
      NOTE: The realtime extent size is not required to be a power of two!
      
      Fixes: fe341eb1 ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
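The realtime extent size need not be a power of two, so an alignment test cannot rely on the usual `x & (unit - 1)` mask trick. Below is a minimal sketch that assumes byte offsets and an allocation unit given in bytes. `range_is_unit_aligned()` and `range_is_mask_aligned()` are hypothetical helpers for illustration, not the kernel's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [pos, pos + len) starts and ends on a
 * boundary of the allocation unit. alloc_unit is in bytes and is NOT
 * assumed to be a power of two, so modulo is used instead of a mask.
 */
static bool range_is_unit_aligned(uint64_t pos, uint64_t len,
				  uint64_t alloc_unit)
{
	assert(alloc_unit > 0);
	return pos % alloc_unit == 0 && len % alloc_unit == 0;
}

/*
 * For contrast: a mask-based check, which is only correct when
 * alloc_unit is a power of two.
 */
static bool range_is_mask_aligned(uint64_t pos, uint64_t len,
				  uint64_t alloc_unit)
{
	uint64_t mask;

	assert(alloc_unit > 0);
	mask = alloc_unit - 1;
	return (pos & mask) == 0 && (len & mask) == 0;
}
```

Take a 3-block rt extent on 4 KiB blocks (12288 bytes). The mask test rejects the properly aligned range at offset 12288 and accepts the misaligned one at 16384. The modulo test gets both cases right.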
  5. 16 Sep, 2020: 1 commit
  6. 27 Aug, 2020: 1 commit
      xfs: finish dfops on every insert range shift iteration · 9c516e0e
      Brian Foster committed
      The recent change to make insert range an atomic operation used the
      incorrect transaction rolling mechanism. The explicit transaction
      roll does not finish deferred operations. This means that intents
      for rmapbt updates caused by extent shifts are not logged until the
      final transaction commits. Thus if a crash occurs during an insert
      range, log recovery might leave the rmapbt in an inconsistent state.
      This was discovered by repeated runs of generic/455.
      
      Update insert range to finish dfops on every shift iteration. This
      is similar to collapse range and ensures that intents are logged
      with the transactions that make associated changes.
      
      Fixes: dd87f87d ("xfs: rework insert range into an atomic operation")
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
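The following toy model shows why the rolling mechanism matters. Each extent shift queues an rmapbt intent, and only finishing the deferred ops logs it. The struct and functions are hypothetical stand-ins, loosely modeled on the plain-roll versus finish-dfops behavior the commit describes. They are not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy transaction: counts deferred intents that are queued vs. logged. */
struct toy_tx {
	int pending_intents;	/* queued, not yet in the log */
	int logged_intents;	/* written to the (toy) log */
	int rolls;		/* transaction rolls performed */
};

/* Plain roll: a new transaction begins, but (per the commit) deferred
 * work is not finished, so queued intents stay pending. */
static void toy_roll(struct toy_tx *tp)
{
	tp->rolls++;
}

/* Finishing deferred ops logs the queued intents together with the
 * changes that caused them, then rolls. */
static void toy_defer_finish(struct toy_tx *tp)
{
	tp->logged_intents += tp->pending_intents;
	tp->pending_intents = 0;
	tp->rolls++;
}

/*
 * Shift 'nshifts' extents; each shift queues one rmapbt intent. If the
 * toy system "crashes" before iteration 'crash_at', return how many
 * intents were still unlogged at that point (invisible to recovery).
 */
static int unlogged_at_crash(int nshifts, int crash_at, bool finish_each)
{
	struct toy_tx tx = { 0, 0, 0 };

	assert(nshifts >= 0);
	for (int i = 0; i < nshifts; i++) {
		if (i == crash_at)
			return tx.pending_intents;
		tx.pending_intents++;	/* shift queues an rmapbt intent */
		if (finish_each)
			toy_defer_finish(&tx);
		else
			toy_roll(&tx);
	}
	toy_defer_finish(&tx);		/* final commit finishes everything */
	return tx.pending_intents;
}
```

If a crash hits halfway through ten shifts, plain rolling leaves five intents unlogged. Finishing on every iteration leaves none, which mirrors the collapse range behavior the commit cites.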
  7. 17 Jul, 2020: 1 commit
  8. 07 Jul, 2020: 1 commit
      xfs: preserve rmapbt swapext block reservation from freed blocks · f74681ba
      Brian Foster committed
      The rmapbt extent swap algorithm remaps individual extents between
      the source inode and the target to trigger reverse mapping metadata
      updates. If either inode straddles a format or other bmap allocation
      boundary, the individual unmap and map cycles can trigger repeated
      bmap block allocations and frees as the extent count bounces back
      and forth across the boundary. While net block usage is bounded across
      the swap operation, this behavior can prematurely exhaust the
      transaction block reservation because it continuously drains as the
      transaction rolls. Each allocation accounts against the reservation
      and each free returns to global free space on transaction roll.
      
      The previous workaround to this problem attempted to detect this
      boundary condition and provide surplus block reservation to
      accommodate it. This is insufficient because more remaps can occur
      than implied by the extent counts; if start offset boundaries are
      not aligned between the two inodes, for example.
      
      To address this problem more generically and dynamically, add a
      transaction accounting mode that returns freed blocks to the
      transaction reservation instead of the superblock counters on
      transaction roll and use it when the rmapbt based algorithm is
      active. This allows the chain of remap transactions to preserve the
      block reservation based on its own frees and prevent premature
      exhaustion regardless of the remap pattern. Note that this is only
      safe for superblocks with lazy sb accounting, but the latter is
      required for v5 supers and the rmap feature depends on v5.
      
      Fixes: b3fed434 ("xfs: account format bouncing into rmapbt swapext tx reservation")
      Root-caused-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
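Here is a toy sketch of the new accounting mode. It assumes that each remap cycle frees one bmap block on unmap and allocates one on map before rolling. `struct toy_tx` and its `return_frees_to_tx` flag are hypothetical names; the commit describes the behavior, not these identifiers:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_tx {
	long resv;		/* blocks left in the transaction reservation */
	long freed;		/* blocks freed in this tx, settled on roll */
	long sb_free;		/* global free-space counter */
	bool return_frees_to_tx;	/* models the new accounting mode */
};

/* Allocate n blocks against the reservation; false if exhausted. */
static bool toy_alloc(struct toy_tx *tp, long n)
{
	if (tp->resv < n)
		return false;
	tp->resv -= n;
	return true;
}

static void toy_free(struct toy_tx *tp, long n)
{
	tp->freed += n;
}

/* On roll, freed blocks go back to the tx or to global free space. */
static void toy_roll(struct toy_tx *tp)
{
	if (tp->return_frees_to_tx)
		tp->resv += tp->freed;
	else
		tp->sb_free += tp->freed;
	tp->freed = 0;
}

/* Each cycle: unmap frees a bmap block, map allocates one, then roll.
 * Returns the number of cycles completed before the reservation ran out. */
static int cycles_completed(long initial_resv, int ncycles, bool mode)
{
	struct toy_tx tx = { initial_resv, 0, 0, mode };

	assert(initial_resv >= 0);
	for (int i = 0; i < ncycles; i++) {
		toy_free(&tx, 1);
		if (!toy_alloc(&tx, 1))
			return i;
		toy_roll(&tx);
	}
	return ncycles;
}
```

With a 4-block reservation, the old accounting is exhausted after 4 cycles. The new mode sustains all 100 because each free refills the reservation on roll. As the commit notes, this is only safe with lazy superblock accounting.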
  9. 20 May, 2020: 3 commits
  10. 07 May, 2020: 1 commit
  11. 19 Mar, 2020: 1 commit
  12. 03 Mar, 2020: 3 commits
  13. 12 Dec, 2019: 1 commit
      xfs: stabilize insert range start boundary to avoid COW writeback race · d0c22041
      Brian Foster committed
      generic/522 (fsx) occasionally fails with a file corruption due to
      an insert range operation. The primary characteristic of the
      corruption is a misplaced insert range operation that differs from
      the requested target offset. The reason for this behavior is a race
      between the extent shift sequence of an insert range and a COW
      writeback completion that causes a front merge with the first extent
      in the shift.
      
      The shift preparation function flushes and unmaps from the target
      offset of the operation to the end of the file to ensure no
      modifications can be made and page cache is invalidated before file
      data is shifted. An insert range operation then splits the extent at
      the target offset, if necessary, and begins to shift the start
      offset of each extent starting from the end of the file to the start
      offset. The shift sequence operates at extent level and so depends
      on the preparation sequence to guarantee no changes can be made to
      the target range during the shift. If the block immediately prior to
      the target offset was dirty and shared, however, it can undergo
      writeback and move from the COW fork to the data fork at any point
      during the shift. If the block is contiguous with the block at the
      start offset of the insert range, it can front merge and alter the
      start offset of the extent. Once the shift sequence reaches the
      target offset, it shifts based on the latest start offset and
      silently changes the target offset of the operation and corrupts the
      file.
      
      To address this problem, update the shift preparation code to
      stabilize the start boundary along with the full range of the
      insert. Also update the existing corruption check to fail if any
      extent is shifted with a start offset behind the target offset of
      the insert range. This prevents insert from racing with COW
      writeback completion and fails loudly in the event of an unexpected
      extent shift.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
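This simplified sketch illustrates the strengthened corruption check. It walks the extents from the end of the file toward the insert offset and refuses to shift any extent whose start lies behind that offset, which is what a front merge from COW writeback would produce. The extent layout and `toy_insert_shift()` are assumptions for illustration. The sketch also assumes the extent at the target was already split so that one starts exactly there:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the filesystem-corruption error code. */
#define TOY_ECORRUPT	1

struct toy_ext {
	unsigned long startoff;	/* file offset, in blocks */
	unsigned long len;	/* length, in blocks */
};

/* Shift extents at or beyond 'target' right by 'shift', last first. */
static int toy_insert_shift(struct toy_ext *ext, size_t n,
			    unsigned long target, unsigned long shift)
{
	assert(ext != NULL || n == 0);
	for (size_t i = n; i-- > 0; ) {
		/* Extent lies wholly before the insert point: done. */
		if (ext[i].startoff + ext[i].len <= target)
			break;
		/* An extent straddling target means its start moved behind
		 * the insert offset (e.g. a front merge): fail loudly. */
		if (ext[i].startoff < target)
			return -TOY_ECORRUPT;
		ext[i].startoff += shift;
	}
	return 0;
}
```

With extents at blocks 0, 4 and 8, an insert at 4 shifts the last two cleanly. If a merge has pulled the second extent's start back to 3, the shift stops with an error instead of silently moving the insert point.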
  14. 12 Nov, 2019: 1 commit
  15. 04 Nov, 2019: 2 commits
  16. 01 Nov, 2019: 1 commit
      xfs: properly serialise fallocate against AIO+DIO · 249bd908
      Dave Chinner committed
      AIO+DIO can extend the file size on IO completion, and it holds
      no inode locks while the IO is in flight. Therefore, a race
      condition exists in file size updates if we do something like this:
      
      aio-thread			fallocate-thread
      
      lock inode
      submit IO beyond inode->i_size
      unlock inode
      .....
      				lock inode
      				break layouts
      				if (off + len > inode->i_size)
      					new_size = off + len
      				.....
      				inode_dio_wait()
      				<blocks>
      .....
      completes
      inode->i_size updated
      inode_dio_done()
      ....
      				<wakes>
      				<does stuff no longer beyond EOF>
      				if (new_size)
      					xfs_vn_setattr(inode, new_size)
      
      
      Yup, that attempt to extend the file size in the fallocate code
      turns into a truncate: it removes whatever the AIO write
      allocated and put to disk, and reduces the inode size back down to
      where the fallocate operation ends.
      
      Fundamentally, xfs_file_fallocate() is not compatible with racing
      AIO+DIO completions, so we need to move the inode_dio_wait() call
      up to where we lock the inode and break the layouts.
      
      Secondly, storing the inode size and then using it unchecked without
      holding the ILOCK is not safe; we can only do such a thing if we've
      locked out and drained all IO and other modification operations,
      which we don't do initially in xfs_file_fallocate.
      
      It should be noted that some of the fallocate operations are
      compound operations - they are made up of multiple manipulations
      that may zero data, and so we may need to flush and invalidate the
      file multiple times during an operation. However, we only need to
      lock out IO and other space manipulation operations once, as that
      lockout is maintained until the entire fallocate operation has been
      completed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
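The following is a deterministic, single-threaded replay of the interleaving in the diagram above. The AIO+DIO completion is modeled as happening inside the drain. `struct toy_inode` and its helpers are hypothetical, and `fixed` selects whether the drain happens before or after the size is sampled:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	long i_size;
	int dio_inflight;	/* AIO+DIO in flight, no inode lock held */
	long pending_dio_end;	/* size the in-flight DIO sets on completion */
};

static void toy_dio_complete(struct toy_inode *ip)
{
	if (ip->dio_inflight == 0)
		return;
	if (ip->pending_dio_end > ip->i_size)
		ip->i_size = ip->pending_dio_end;
	ip->dio_inflight = 0;
}

/* Stands in for inode_dio_wait(): in this replay, "waiting" means
 * letting the in-flight DIO complete. */
static void toy_dio_wait(struct toy_inode *ip)
{
	toy_dio_complete(ip);
}

/* Returns the final i_size after an fallocate of [off, off + len). */
static long toy_fallocate(struct toy_inode *ip, long off, long len, bool fixed)
{
	long new_size = 0;

	assert(off >= 0 && len > 0);
	if (fixed)
		toy_dio_wait(ip);	/* drain before sampling i_size */
	if (off + len > ip->i_size)
		new_size = off + len;	/* size sampled here */
	if (!fixed)
		toy_dio_wait(ip);	/* old placement: DIO extends EOF now */
	if (new_size)
		ip->i_size = new_size;	/* setattr: may shrink the file */
	return ip->i_size;
}
```

Suppose `i_size = 100` and a DIO is in flight that will end at 300. An fallocate of [0, 200) then leaves the file at 200 in the old ordering, truncating away the AIO data past 200. In the fixed ordering the file stays at 300.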
  17. 30 Oct, 2019: 1 commit
  18. 29 Oct, 2019: 1 commit
  19. 28 Oct, 2019: 1 commit
  20. 24 Oct, 2019: 1 commit
      xfs: don't set bmapi total block req where minleft is · da781e64
      Brian Foster committed
      xfs_bmapi_write() takes a total block requirement parameter that is
      passed down to the block allocation code and is used to specify the
      total block requirement of the associated transaction. This is used
      to try and select an AG that can not only satisfy the requested
      extent allocation, but can also accommodate subsequent allocations
      that might be required to complete the transaction. For example,
      additional bmbt block allocations may be required on insertion of
      the resulting extent to an inode data fork.
      
      While it's important for callers to calculate and reserve such extra
      blocks in the transaction, it is not necessary to pass the total
      value to xfs_bmapi_write() in all cases. The latter automatically
      sets minleft to ensure that sufficient free blocks remain after the
      allocation attempt to expand the format of the associated inode
      (i.e., such as extent to btree conversion, btree splits, etc).
      Therefore, any callers that pass a total block requirement of the
      bmap mapping length plus worst case bmbt expansion essentially
      specify the additional reservation requirement twice. These callers
      can pass a total of zero to rely on the bmapi minleft policy.
      
      Beyond being superfluous, the primary motivation for this change is
      that the total reservation logic in the bmbt code is dubious in
      scenarios where minlen < maxlen and a maxlen extent cannot be
      allocated (which is more common for data extent allocations where
      contiguity is not required). The total value is based on maxlen in
      the xfs_bmapi_write() caller. If the bmbt code falls back to an
      allocation between minlen and maxlen, that allocation will not
      succeed until total is reset to minlen, which essentially throws
      away any additional reservation included in total by the caller. In
      addition, the total value is not reset until after alignment is
      dropped, which means that such callers drop alignment far more
      aggressively than necessary.
      
      Update all callers of xfs_bmapi_write() that pass a total block
      value of the mapping length plus bmbt reservation to instead pass
      zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
      requirement. This trades off slightly less conservative AG selection
      for the ability to preserve alignment in more scenarios.
      xfs_bmapi_write() callers that incorporate unrelated or additional
      reservations in total beyond what is already included in minleft
      must continue to use the former.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
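A toy eligibility check makes the double counting concrete. It is a hypothetical simplification, not the allocator's real AG selection logic. Under this model, an AG qualifies if it has `total` free blocks and still has `minleft` free after allocating `len`:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified AG eligibility check (NOT the kernel's
 * allocator): the AG needs at least 'total' free blocks, and after
 * allocating 'len' blocks at least 'minleft' must remain for bmbt growth.
 */
static bool toy_ag_ok(long ag_free, long len, long total, long minleft)
{
	assert(len >= 0 && total >= 0 && minleft >= 0);
	return ag_free >= total && ag_free - len >= minleft;
}
```

When `total` equals the mapping length plus `minleft`, the two conditions coincide, so passing zero loses nothing for a full-length allocation. On a fallback to a shorter extent, however, a `total` computed from maxlen can reject an AG that could satisfy minlen.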
  21. 22 Oct, 2019: 1 commit
  22. 07 Oct, 2019: 1 commit
      xfs: Fix tail rounding in xfs_alloc_file_space() · e093c4be
      Max Reitz committed
      To ensure that all blocks touched by the range [offset, offset + count)
      are allocated, we need to calculate the block count from the difference
      of the range end (rounded up) and the range start (rounded down).
      
      Before this patch, we just round up the byte count, which may lead to
      unaligned ranges not being fully allocated:
      
      $ touch test_file
      $ block_size=$(stat -fc '%S' test_file)
      $ fallocate -o $((block_size / 2)) -l $block_size test_file
      $ xfs_bmap test_file
      test_file:
              0: [0..7]: 1396264..1396271
              1: [8..15]: hole
      
      There should not be a hole there.  Instead, the first two blocks should
      be fully allocated.
      
      With this patch applied, the result is something like this:
      
      $ touch test_file
      $ block_size=$(stat -fc '%S' test_file)
      $ fallocate -o $((block_size / 2)) -l $block_size test_file
      $ xfs_bmap test_file
      test_file:
              0: [0..15]: 11024..11039
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
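The arithmetic behind the fix is shown below as a small sketch. The helper names are hypothetical, and the byte range is assumed not to overflow `uint64_t`:

```c
#include <assert.h>
#include <stdint.h>

static uint64_t round_up_u64(uint64_t x, uint64_t bs)
{
	assert(bs > 0);
	return (x + bs - 1) / bs * bs;
}

static uint64_t round_down_u64(uint64_t x, uint64_t bs)
{
	assert(bs > 0);
	return x / bs * bs;
}

/* Before the fix: only the byte count is rounded up. */
static uint64_t blocks_old(uint64_t offset, uint64_t count, uint64_t bs)
{
	(void)offset;	/* the start offset was ignored */
	return round_up_u64(count, bs) / bs;
}

/* After the fix: round the end up and the start down, then subtract. */
static uint64_t blocks_fixed(uint64_t offset, uint64_t count, uint64_t bs)
{
	return (round_up_u64(offset + count, bs) -
		round_down_u64(offset, bs)) / bs;
}
```

For a 4096-byte block size, offset 2048 and length 4096, the old formula yields 1 block and the fixed one yields 2. This matches the `xfs_bmap` output above, which reports 512-byte units: `[0..7]` is one 4 KiB block and `[0..15]` is two.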
  23. 31 Aug, 2019: 1 commit
  24. 28 Aug, 2019: 1 commit
  25. 29 Jun, 2019: 1 commit
  26. 13 Jun, 2019: 1 commit
  27. 27 Apr, 2019: 1 commit
  28. 21 Feb, 2019: 1 commit
      xfs: introduce an always_cow mode · 66ae56a5
      Christoph Hellwig committed
      Add a mode where XFS never overwrites existing blocks in place.  This
      is to aid debugging our COW code, and also to put infrastructure in place
      for things like possible future support for zoned block devices, which
      can't support overwrites.
      
      This mode is enabled globally by doing a:
      
          echo 1 > /sys/fs/xfs/debug/always_cow
      
      Note that the parameter is global to allow running all tests in xfstests
      easily in this mode, which would not easily be possible with a per-fs
      sysfs file.
      
      In always_cow mode persistent preallocations are disabled, and fallocate
      will fail when called with a 0 mode (with or without
      FALLOC_FL_KEEP_SIZE), and will not create unwritten extents for zeroed
      space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.
      
      There are a few interesting xfstests failures when run in always_cow
      mode:
      
       - generic/392 fails because the bytes used in the file used to test
         hole punch recovery are less after the log replay.  This is
         because the blocks written and then punched out are only freed
         with a delay due to the logging mechanism.
       - xfs/170 will fail as the already fragile file streams mechanism
         doesn't seem to interact well with the COW allocator.
       - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
         the file system is badly fragmented, but there is not much we
         can do to avoid that when always writing out of place.
       - xfs/205 fails because overwriting a file in always_cow mode
         will require new space allocation and the assumptions in the
         test thus no longer hold.
       - xfs/326 fails to modify the file at all in always_cow mode after
         injecting the refcount error, leading to an unexpected md5sum
         after the remount, but that again is expected.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
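The fallocate behavior described for always_cow mode can be written as a small policy function. This is a sketch under stated assumptions. The flag values are copied from `<linux/falloc.h>` under local names, the error returned is assumed to be `-EOPNOTSUPP`, modes other than those named in the commit are assumed unaffected, and `toy_falloc_policy()` is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Values mirror <linux/falloc.h>; local names keep the sketch portable. */
enum {
	TOY_FL_KEEP_SIZE	= 0x01,
	TOY_FL_PUNCH_HOLE	= 0x02,
	TOY_FL_COLLAPSE_RANGE	= 0x08,
	TOY_FL_ZERO_RANGE	= 0x10,
	TOY_FL_INSERT_RANGE	= 0x20,
	TOY_FL_UNSHARE_RANGE	= 0x40,
};

struct toy_falloc_result {
	int error;		/* 0 or a negative errno */
	bool creates_unwritten;	/* would preallocate unwritten extents */
};

static struct toy_falloc_result toy_falloc_policy(int mode, bool always_cow)
{
	struct toy_falloc_result r = { 0, false };
	int op = mode & ~TOY_FL_KEEP_SIZE;	/* KEEP_SIZE is a modifier */

	assert(mode >= 0);
	if (op == 0 || op == TOY_FL_ZERO_RANGE || op == TOY_FL_UNSHARE_RANGE) {
		if (!always_cow)
			r.creates_unwritten = true;
		else if (op == 0)
			r.error = -EOPNOTSUPP;	/* plain preallocation refused */
		/* zero/unshare still run in always_cow, minus the prealloc */
	}
	return r;
}
```

In always_cow mode, mode 0 fails with or without `KEEP_SIZE`. `ZERO_RANGE` still succeeds but leaves no unwritten extents behind, and a hole punch is untouched by the policy.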