  1. 17 Jan, 2020 4 commits
    • xfs: clean up xfs_buf_item_get_format return value · c64dd49b
      Darrick J. Wong committed
      The only thing that can cause a nonzero return from
      xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
      Get rid of all the unnecessary error handling.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: streamline xfs_attr3_leaf_inactive · 0bb9d159
      Darrick J. Wong committed
      Now that we know we don't have to take a transaction to stale the incore
      buffers for a remote value, get rid of the unnecessary memory allocation
      in the leaf walker and call the rmt_stale function directly.  Flatten
      the loop while we're at it.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix memory corruption during remote attr value buffer invalidation · e8db2aaf
      Darrick J. Wong committed
      While running generic/103, I observed what looks like memory corruption
      and (with slub debugging turned on) a slub redzone warning on i386 when
      inactivating an inode with a 64k remote attr value.
      
      On a v5 filesystem, maximally sized remote attr values require one block
      more than 64k worth of space to hold both the value and the remote
      attribute value header (64 bytes).  On a 4k block filesystem this results
      in a 68k buffer; on a 64k block filesystem, this would be a 128k buffer.
      Note that although we'll never use more than 65,600 bytes of this buffer,
      that is still more than XFS_MAX_BLOCKSIZE, which is 64k.
      
      This is a problem because the definition of struct xfs_buf_log_format
      allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
      invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
      the dirty map, writing right off the end of the log item and corrupting
      memory.  We've gotten away with this on x86_64 for years because the
      compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
      
      Fortunately for us, remote attribute values are written to disk with
      xfs_bwrite(), which is to say that they are not logged.  Fix the problem
      by removing all places where we could end up creating a buffer log item
      for a remote attribute value and leave a note explaining why.  Next,
      replace the open-coded buffer invalidation with a call to the helper we
      created in the previous patch that does better checking for bad metadata
      before marking the buffer stale.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
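The size mismatch described in this commit can be checked with a little arithmetic. The sketch below is illustrative only: it assumes one dirty bit per 128-byte chunk and 32-bit bitmap words (neither figure is stated in the commit message), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Only the 64k XFS_MAX_BLOCKSIZE figure comes from the commit message. */
#define MAX_BLOCKSIZE   (64u * 1024u)
#define CHUNK_BYTES     128u   /* bytes covered by one dirty bit (assumed) */
#define BITS_PER_WORD   32u    /* bits per bitmap word (assumed) */

/* Words the dirty bitmap is sized for: enough to cover MAX_BLOCKSIZE. */
#define MAP_WORDS (MAX_BLOCKSIZE / CHUNK_BYTES / BITS_PER_WORD)

/* Hypothetical helper: bitmap words needed to cover buf_bytes of buffer. */
static size_t dirty_map_words_needed(size_t buf_bytes)
{
    size_t bits;

    assert(buf_bytes > 0);
    bits = (buf_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
    return (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}
```

Under these assumptions a 68k buffer needs one more 32-bit word than the map provides, which would be consistent with the u32 of padding on x86_64 absorbing the overrun.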
    • xfs: refactor remote attr value buffer invalidation · 8edbb26b
      Darrick J. Wong committed
      Hoist the code that invalidates remote extended attribute value buffers
      into a separate helper function.  This prepares us for a memory
      corruption fix in the next patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  2. 16 Jan, 2020 2 commits
  3. 15 Jan, 2020 3 commits
    • xfs: fix s_maxbytes computation on 32-bit kernels · 932befe3
      Darrick J. Wong committed
      I observed a hang in generic/308 while running fstests on an i686 kernel.
      The hang occurred when trying to purge the pagecache on a large sparse
      file that had a page created past MAX_LFS_FILESIZE, which caused an
      integer overflow in the pagecache xarray and resulted in an infinite
      loop.
      
      I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in
      commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so
      that it is now one page short of the maximum page index on 32-bit
      kernels.  Because the XFS function to compute max offset open-codes the
      2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform
      any sanity checking of s_maxbytes, the code in generic/308 can create a
      page above the pagecache's limit and kaboom.
      
      Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and
      aborting the mount with a warning if our assumptions ever break.  I have
      no answer for why this seems to have been broken for years and nobody
      noticed.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
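To see why an old-style limit can produce a page at the very top of a 32-bit page index, here is a rough sketch. Both formulas are assumptions made for illustration (a 4k page size, a legacy limit of (PAGE_SIZE << 32) - 1, and a current limit of ULONG_MAX << PAGE_SHIFT); the commit message does not spell them out, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12u                   /* assumed 4k pages */
#define ULONG_MAX_32  UINT32_C(0xFFFFFFFF)  /* ULONG_MAX on a 32-bit kernel */

/* Assumed legacy open-coded limit: (PAGE_SIZE << 32) - 1. */
static uint64_t legacy_maxbytes_32(void)
{
    return (UINT64_C(1) << (PAGE_SHIFT_4K + 32)) - 1;
}

/* Assumed current 32-bit MAX_LFS_FILESIZE: ULONG_MAX << PAGE_SHIFT. */
static uint64_t current_maxbytes_32(void)
{
    return (uint64_t)ULONG_MAX_32 << PAGE_SHIFT_4K;
}

/* Index of the page holding the last byte of a file of maxbytes bytes. */
static uint64_t last_page_index(uint64_t maxbytes)
{
    assert(maxbytes > 0);
    return (maxbytes - 1) >> PAGE_SHIFT_4K;
}
```

If these formulas hold, the legacy limit allows a page at index ULONG_MAX, where computing the next index wraps to zero, while the current limit stops one page earlier.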
    • xfs: truncate should remove all blocks, not just to the end of the page cache · 4bbb04ab
      Darrick J. Wong committed
      xfs_itruncate_extents_flags() is supposed to unmap every block in a file
      from EOF onwards.  Oddly, it uses s_maxbytes as the upper limit to the
      bunmapi range, even though s_maxbytes reflects the highest offset the
      pagecache can support, not the highest offset that XFS supports.
      
      The result of this confusion is that if you create a 20T file on a
      64-bit machine, mount the filesystem on a 32-bit machine, and remove the
      file, we leak everything above 16T.  Fix this by capping the bunmapi
      request at the maximum possible block offset, not s_maxbytes.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
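The 20T example can be worked through in filesystem blocks. This is a toy calculation, not the kernel's code: it assumes 4k blocks, treats the 32-bit pagecache limit as a round 16T, and assumes file block offsets are 54 bits wide. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: file block offsets in the extent map are 54 bits wide. */
#define MAX_FILEOFF ((UINT64_C(1) << 54) - 1)

/* Hypothetical helper: blocks left mapped when unmapping stops at limit_fsb. */
static uint64_t blocks_leaked(uint64_t file_blocks, uint64_t limit_fsb)
{
    return file_blocks > limit_fsb ? file_blocks - limit_fsb : 0;
}
```

With 4k blocks, a 20T file spans 5 << 30 blocks and a 16T cap sits at 1 << 32 blocks, so capping at the pagecache limit strands 1 << 30 blocks (4T), whereas capping at the maximum file offset strands none.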
    • xfs: introduce XFS_MAX_FILEOFF · a5084865
      Darrick J. Wong committed
      Introduce a new #define for the maximum supported file block offset.
      We'll use this in the next patch to make it more obvious that we're
      doing some operation for all possible inode fork mappings after a given
      offset.  We can't use ULLONG_MAX here because bunmapi uses that to
      detect when it's done.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  4. 10 Jan, 2020 6 commits
  5. 08 Jan, 2020 1 commit
  6. 07 Jan, 2020 2 commits
  7. 21 Dec, 2019 1 commit
  8. 19 Dec, 2019 5 commits
  9. 18 Dec, 2019 1 commit
  10. 12 Dec, 2019 1 commit
    • xfs: stabilize insert range start boundary to avoid COW writeback race · d0c22041
      Brian Foster committed
      generic/522 (fsx) occasionally fails with a file corruption due to
      an insert range operation. The primary characteristic of the
      corruption is a misplaced insert range operation that differs from
      the requested target offset. The reason for this behavior is a race
      between the extent shift sequence of an insert range and a COW
      writeback completion that causes a front merge with the first extent
      in the shift.
      
      The shift preparation function flushes and unmaps from the target
      offset of the operation to the end of the file to ensure no
      modifications can be made and page cache is invalidated before file
      data is shifted. An insert range operation then splits the extent at
      the target offset, if necessary, and begins to shift the start
      offset of each extent starting from the end of the file to the start
      offset. The shift sequence operates at extent level and so depends
      on the preparation sequence to guarantee no changes can be made to
      the target range during the shift. If the block immediately prior to
      the target offset was dirty and shared, however, it can undergo
      writeback and move from the COW fork to the data fork at any point
      during the shift. If the block is contiguous with the block at the
      start offset of the insert range, it can front merge and alter the
      start offset of the extent. Once the shift sequence reaches the
      target offset, it shifts based on the latest start offset and
      silently changes the target offset of the operation and corrupts the
      file.
      
      To address this problem, update the shift preparation code to
      stabilize the start boundary along with the full range of the
      insert. Also update the existing corruption check to fail if any
      extent is shifted with a start offset behind the target offset of
      the insert range. This prevents insert from racing with COW
      writeback completion and fails loudly in the event of an unexpected
      extent shift.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  11. 04 Dec, 2019 1 commit
    • xfs: fix mount failure crash on invalid iclog memory access · 798a9cad
      Brian Foster committed
      syzbot (via KASAN) reports a use-after-free in the error path of
      xlog_alloc_log(). Specifically, the iclog freeing loop doesn't
      handle the case of a fully initialized ->l_iclog linked list.
      Instead, it assumes that the list is partially constructed and NULL
      terminated.
      
      This bug manifested because there was no possible error scenario
      after iclog list setup when the original code was added.  Subsequent
      code and associated error conditions were added some time later,
      while the original error handling code was never updated. Fix up the
      error loop to terminate either on a NULL iclog or reaching the end
      of the list.
      
      Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
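A teardown loop that copes with both list shapes might look like the sketch below. It is a userspace toy with hypothetical names, not the xlog_alloc_log() error path itself: it stops on either a NULL pointer (partially built list) or on arriving back at the head (fully built ring), and frees the head last so that no freed pointer is ever compared.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_iclog {
    struct toy_iclog *next;
};

/*
 * Free a list that is either NULL-terminated (partially built) or circular
 * (fully built). Returns the number of nodes freed.
 */
static int toy_free_iclogs(struct toy_iclog *head)
{
    struct toy_iclog *cur, *next;
    int freed = 0;

    if (head == NULL)
        return 0;
    for (cur = head->next; cur != NULL && cur != head; cur = next) {
        next = cur->next;
        free(cur);
        freed++;
    }
    free(head);
    return freed + 1;
}

/* Build n nodes (n >= 1); close the ring only if circular is nonzero. */
static struct toy_iclog *toy_build(int n, int circular)
{
    struct toy_iclog *head = NULL, *tail = NULL;

    assert(n >= 1);
    for (int i = 0; i < n; i++) {
        struct toy_iclog *node = calloc(1, sizeof(*node));

        assert(node != NULL);
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    if (circular)
        tail->next = head;
    return head;
}
```

A loop that only tests for NULL would never terminate correctly on the closed ring; it would walk back into nodes it had already freed, which matches the use-after-free described above.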
  12. 03 Dec, 2019 2 commits
    • xfs: don't check for AG deadlock for realtime files in bunmapi · 69ffe596
      Omar Sandoval committed
      Commit 5b094d6d ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
      a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
      in the wrong order. However, this check isn't applicable for realtime
      files. In most cases, it just makes us do unnecessary commits. However,
      without the fix from the previous commit ("xfs: fix realtime file data
      space leak"), if the last and second-to-last extents also happen to have
      different "AG numbers", then the break actually causes __xfs_bunmapi()
      to return without making any progress, which sends
      xfs_itruncate_extents_flags() into an infinite loop.
      
      Fixes: 5b094d6d ("xfs: fix multi-AG deadlock in xfs_bunmapi")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: fix realtime file data space leak · 0c4da70c
      Omar Sandoval committed
      Realtime files in XFS allocate extents in rextsize units. However, the
      written/unwritten state of those extents is still tracked in blocksize
      units. Therefore, a realtime file can be split up into written and
      unwritten extents that are not necessarily aligned to the realtime
      extent size. __xfs_bunmapi() has some logic to handle these various
      corner cases. Consider how it handles the following case:
      
      1. The last extent is unwritten.
      2. The last extent is smaller than the realtime extent size.
      3. startblock of the last extent is not aligned to the realtime extent
         size, but startblock + blockcount is.
      
      In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
      to set the second-to-last extent to unwritten. This should merge the
      last and second-to-last extents, so __xfs_bunmapi() moves on to the
      second-to-last extent.
      
      However, if the size of the last and second-to-last extents combined is
      greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
      merge the two extents. When that happens, __xfs_bunmapi() skips past the
      last extent without unmapping it, thus leaking the space.
      
      Fix it by only unwriting the minimum amount needed to align the last
      extent to the realtime extent size, which is guaranteed to merge with
      the last extent.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
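The "minimum amount needed to align" can be illustrated with simple modular arithmetic. This is a toy helper with a hypothetical name, written under the assumption that alignment is measured on the extent's startblock; it is not the logic in __xfs_bunmapi().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of blocks immediately before startblock that
 * must join the extent so that it begins on a realtime extent boundary.
 */
static uint64_t blocks_to_align_start(uint64_t startblock, uint64_t rextsize)
{
    assert(rextsize > 0);
    return startblock % rextsize;
}
```

For instance, with a realtime extent size of 16 blocks and a last extent covering blocks 21 through 31, only the 5 blocks from 16 through 20 would need to be converted to unwritten, rather than the whole preceding extent.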
  13. 28 Nov, 2019 1 commit
  14. 23 Nov, 2019 10 commits