1. 27 Jan, 2020 7 commits
  2. 24 Jan, 2020 2 commits
  3. 21 Jan, 2020 1 commit
  4. 17 Jan, 2020 7 commits
    • xfs: check log iovec size to make sure it's plausibly a buffer log format · 8a6453a8
      Authored by Darrick J. Wong
      When log recovery is processing buffer log items, we should check that
      the incoming iovec actually describes a region of memory large enough to
      contain the log format and the dirty map.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
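The check described in this commit can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `struct blf_sketch` and `blf_iovec_plausible` are hypothetical names, and the layout only mirrors the pahole dump shown elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the buffer log format; illustrative only. */
struct blf_sketch {
	uint16_t blf_type;
	uint16_t blf_size;
	uint16_t blf_flags;
	uint16_t blf_len;
	int64_t  blf_blkno;
	uint32_t blf_map_size;		/* number of 32-bit words in the map */
	uint32_t blf_data_map[];	/* dirty bitmap, variable length */
};

static_assert(offsetof(struct blf_sketch, blf_data_map) == 20,
	      "fixed header is expected to be 20 bytes");

/*
 * Return true if an iovec of iov_len bytes is large enough to hold the
 * fixed header plus the dirty map that the header claims to carry.
 */
bool blf_iovec_plausible(const void *iov_base, size_t iov_len)
{
	const size_t hdr = offsetof(struct blf_sketch, blf_data_map);
	uint32_t map_words;

	if (iov_base == NULL || iov_len < hdr)
		return false;

	/* Read blf_map_size only after the header is known to be present. */
	memcpy(&map_words,
	       (const unsigned char *)iov_base +
			offsetof(struct blf_sketch, blf_map_size),
	       sizeof(map_words));

	return (iov_len - hdr) / sizeof(uint32_t) >= map_words;
}
```

The header length is validated before `blf_map_size` is read, so a truncated region is rejected without touching memory past its end.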
    • xfs: make struct xfs_buf_log_format have a consistent size · b7df5e92
      Authored by Darrick J. Wong
      Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
      end of struct xfs_buf_log_format.  This makes the size consistent so
      that we can check it in xfs_ondisk.h, and will be needed once we start
      logging attribute values.
      
      On amd64 we get the following pahole:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 88, cachelines: 2, members: 7 */
              /* padding: 4 */
              /* last cacheline: 24 bytes */
      };
      
      But on i386 we get the following:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 84, cachelines: 2, members: 7 */
              /* last cacheline: 20 bytes */
      };
      
      Notice how the amd64 compiler inserts 4 bytes of padding to the end of
      the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
      corruption during remote attr value buffer invalidation" we would try to
      write to blf_data_map[17], which is harmless on amd64 but really bad on
      i386.
      
      This shouldn't cause any changes in the ondisk logging formats because
      the log code writes out the log vectors with the appropriate size for
      the log item's map_size, and log recovery treats the data_map array as a
      VLA.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
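The effect of growing the map by one word can be checked with a small sketch. The struct below is a hypothetical copy of the layout in the pahole dumps, using fixed-width types; `BLF_FIXED_DATAMAP_SIZE` and `blf_fixed_size` are illustrative names. With 17 words the members occupy 20 + 68 = 88 bytes, which is already a multiple of both 4 and 8, so no tail padding is needed under either alignment rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 words of bitmap plus 1 word that absorbs the former tail padding. */
#define BLF_FIXED_DATAMAP_SIZE	17

/* Hypothetical copy of the layout from the pahole output. */
struct blf_fixed_sketch {
	uint16_t blf_type;
	uint16_t blf_size;
	uint16_t blf_flags;
	uint16_t blf_len;
	int64_t  blf_blkno;
	uint32_t blf_map_size;
	uint32_t blf_data_map[BLF_FIXED_DATAMAP_SIZE];
};

/* Compile-time checks in the spirit of an on-disk structure size check. */
static_assert(offsetof(struct blf_fixed_sketch, blf_data_map) == 20,
	      "map starts right after the 20-byte header");
static_assert(sizeof(struct blf_fixed_sketch) == 88,
	      "size must not depend on how the ABI aligns 64-bit fields");

/* Expose the size so it can also be inspected at run time. */
size_t blf_fixed_size(void)
{
	return sizeof(struct blf_fixed_sketch);
}
```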
    • xfs: complain if anyone tries to create a too-large buffer log item · c3d5f0c2
      Authored by Darrick J. Wong
      Complain if someone calls xfs_buf_item_init on a buffer that is larger
      than the dirty bitmap can handle, or tries to log a region that's past
      the end of the dirty bitmap.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: clean up xfs_buf_item_get_format return value · c64dd49b
      Authored by Darrick J. Wong
      The only thing that can cause a nonzero return from
      xfs_buf_item_get_format is if the kmem_alloc fails, which it can't.
      Get rid of all the unnecessary error handling.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: streamline xfs_attr3_leaf_inactive · 0bb9d159
      Authored by Darrick J. Wong
      Now that we know we don't have to take a transaction to stale the incore
      buffers for a remote value, get rid of the unnecessary memory allocation
      in the leaf walker and call the rmt_stale function directly.  Flatten
      the loop while we're at it.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix memory corruption during remote attr value buffer invalidation · e8db2aaf
      Authored by Darrick J. Wong
      While running generic/103, I observed what looks like memory corruption
      and (with slub debugging turned on) a slub redzone warning on i386 when
      inactivating an inode with a 64k remote attr value.
      
      On a v5 filesystem, maximally sized remote attr values require one block
      more than 64k worth of space to hold both the value and the remote
      attribute value header (64 bytes).  On a 4k block filesystem this results
      in a 68k buffer; on a 64k block filesystem, this would be a 128k buffer.
      Note that even though we'll never use more than 65,600 bytes of this
      buffer, XFS_MAX_BLOCKSIZE is 64k.
      
      This is a problem because the definition of struct xfs_buf_log_format
      allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
      invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
      the dirty map, writing right off the end of the log item and corrupting
      memory.  We've gotten away with this on x86_64 for years because the
      compiler inserts a u32 padding on the end of struct xfs_buf_log_format.
      
      Fortunately for us, remote attribute values are written to disk with
      xfs_bwrite(), which is to say that they are not logged.  Fix the problem
      by removing all places where we could end up creating a buffer log item
      for a remote attribute value and leave a note explaining why.  Next,
      replace the open-coded buffer invalidation with a call to the helper we
      created in the previous patch that does better checking for bad metadata
      before marking the buffer stale.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
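The arithmetic behind the overflow can be sketched as below. The constants are assumptions for illustration: one dirty bit per 128-byte chunk and 32 bits per bitmap word, which gives a 16-word map for a 64k maximum block size. All `DEMO_` names and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLF_CHUNK		128u		/* bytes tracked per dirty bit (assumed) */
#define DEMO_NBWORD		32u		/* bits per bitmap word */
#define DEMO_MAX_BLOCKSIZE	(64u * 1024u)	/* 64k */
#define DEMO_BLF_DATAMAP_SIZE \
	((DEMO_MAX_BLOCKSIZE / DEMO_BLF_CHUNK) / DEMO_NBWORD)

static_assert(DEMO_BLF_DATAMAP_SIZE == 16, "64k of buffer maps to 16 words");

/* Number of 32-bit bitmap words needed to cover buf_bytes of buffer. */
unsigned int demo_map_words(size_t buf_bytes)
{
	size_t bits = (buf_bytes + DEMO_BLF_CHUNK - 1) / DEMO_BLF_CHUNK;

	return (unsigned int)((bits + DEMO_NBWORD - 1) / DEMO_NBWORD);
}

/* Nonzero if a buffer of buf_bytes fits in the fixed-size dirty map. */
int demo_fits_in_map(size_t buf_bytes)
{
	return demo_map_words(buf_bytes) <= DEMO_BLF_DATAMAP_SIZE;
}
```

Under these assumptions a 68k buffer needs 17 words, one more than the map provides, which matches the write past the end of the log item described above.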
    • xfs: refactor remote attr value buffer invalidation · 8edbb26b
      Authored by Darrick J. Wong
      Hoist the code that invalidates remote extended attribute value buffers
      into a separate helper function.  This prepares us for a memory
      corruption fix in the next patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 16 Jan, 2020 2 commits
  6. 15 Jan, 2020 3 commits
    • xfs: fix s_maxbytes computation on 32-bit kernels · 932befe3
      Authored by Darrick J. Wong
      I observed a hang in generic/308 while running fstests on an i686 kernel.
      The hang occurred when trying to purge the pagecache on a large sparse
      file that had a page created past MAX_LFS_FILESIZE, which caused an
      integer overflow in the pagecache xarray and resulted in an infinite
      loop.
      
      I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in
      commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so
      that it is now one page short of the maximum page index on 32-bit
      kernels.  Because the XFS function to compute max offset open-codes the
      2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform
      any sanity checking of s_maxbytes, the code in generic/308 can create a
      page above the pagecache's limit and kaboom.
      
      Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and
      aborting the mount with a warning if our assumptions ever break.  I have
      no answer for why this seems to have been broken for years and nobody
      noticed.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
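The one-page difference between the two limits can be illustrated with a sketch. The expressions below are assumptions about the shape of the two computations on a 32-bit kernel with 4 KiB pages (an open-coded `PAGE_SIZE << 32` minus one byte versus `ULONG_MAX << PAGE_SHIFT`); they are not quoted from the kernel source, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT	12			/* assumes 4 KiB pages */
#define DEMO_PAGE_SIZE	(UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Assumed shape of the older open-coded limit on a 32-bit kernel. */
uint64_t demo_old_maxbytes_32bit(void)
{
	return (DEMO_PAGE_SIZE << 32) - 1;
}

/* Assumed shape of the newer limit, with ULONG_MAX modelled as UINT32_MAX. */
uint64_t demo_new_maxbytes_32bit(void)
{
	return (uint64_t)UINT32_MAX << DEMO_PAGE_SHIFT;
}

/*
 * Index of the page holding the last byte of a file of size maxbytes,
 * truncated to 32 bits the way a 32-bit unsigned long would store it.
 */
uint32_t demo_last_page_index(uint64_t maxbytes)
{
	assert(maxbytes > 0);
	return (uint32_t)((maxbytes - 1) >> DEMO_PAGE_SHIFT);
}
```

With the older limit the last page index is the largest 32-bit value, so any "index + 1" computation wraps to zero. With the newer limit the last index is one lower, which leaves room for that increment.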
    • xfs: truncate should remove all blocks, not just to the end of the page cache · 4bbb04ab
      Authored by Darrick J. Wong
      xfs_itruncate_extents_flags() is supposed to unmap every block in a file
      from EOF onwards.  Oddly, it uses s_maxbytes as the upper limit to the
      bunmapi range, even though s_maxbytes reflects the highest offset the
      pagecache can support, not the highest offset that XFS supports.
      
      The result of this confusion is that if you create a 20T file on a
      64-bit machine, mount the filesystem on a 32-bit machine, and remove the
      file, we leak everything above 16T.  Fix this by capping the bunmapi
      request at the maximum possible block offset, not s_maxbytes.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
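The leak in the 20T example can be worked through with a short sketch. It assumes 4 KiB filesystem blocks and a 54-bit maximum file block offset. `DEMO_MAX_FILEOFF` and `demo_blocks_past_limit` are hypothetical names, and the 16T cap only stands in for a 32-bit s_maxbytes.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BLOCK_SHIFT	12				/* assumes 4 KiB blocks */
#define DEMO_TIB		(UINT64_C(1) << 40)
#define DEMO_MAX_FILEOFF	((UINT64_C(1) << 54) - 1)	/* assumed 54-bit offsets */

/*
 * Count the blocks of a file with file_blocks blocks (offsets
 * 0..file_blocks-1) that lie beyond limit_block, i.e. the blocks an unmap
 * request capped at limit_block would leave behind.
 */
uint64_t demo_blocks_past_limit(uint64_t file_blocks, uint64_t limit_block)
{
	assert(limit_block <= DEMO_MAX_FILEOFF);

	if (file_blocks <= limit_block + 1)
		return 0;
	return file_blocks - limit_block - 1;
}
```

For a 20T file, capping the request at the block containing the 16T offset leaves roughly 4T worth of blocks (just under 2^30 of them) mapped, while capping at the maximum file block offset leaves none.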
    • xfs: introduce XFS_MAX_FILEOFF · a5084865
      Authored by Darrick J. Wong
      Introduce a new #define for the maximum supported file block offset.
      We'll use this in the next patch to make it more obvious that we're
      doing some operation for all possible inode fork mappings after a given
      offset.  We can't use ULLONG_MAX here because bunmapi uses that to
      detect when it's done.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  7. 10 Jan, 2020 6 commits
  8. 08 Jan, 2020 1 commit
  9. 07 Jan, 2020 2 commits
  10. 21 Dec, 2019 1 commit
  11. 19 Dec, 2019 5 commits
  12. 18 Dec, 2019 1 commit
  13. 12 Dec, 2019 1 commit
    • xfs: stabilize insert range start boundary to avoid COW writeback race · d0c22041
      Authored by Brian Foster
      generic/522 (fsx) occasionally fails with a file corruption due to
      an insert range operation. The primary characteristic of the
      corruption is a misplaced insert range operation that differs from
      the requested target offset. The reason for this behavior is a race
      between the extent shift sequence of an insert range and a COW
      writeback completion that causes a front merge with the first extent
      in the shift.
      
      The shift preparation function flushes and unmaps from the target
      offset of the operation to the end of the file to ensure no
      modifications can be made and page cache is invalidated before file
      data is shifted. An insert range operation then splits the extent at
      the target offset, if necessary, and begins to shift the start
      offset of each extent starting from the end of the file to the start
      offset. The shift sequence operates at extent level and so depends
      on the preparation sequence to guarantee no changes can be made to
      the target range during the shift. If the block immediately prior to
      the target offset was dirty and shared, however, it can undergo
      writeback and move from the COW fork to the data fork at any point
      during the shift. If the block is contiguous with the block at the
      start offset of the insert range, it can front merge and alter the
      start offset of the extent. Once the shift sequence reaches the
      target offset, it shifts based on the latest start offset and
      silently changes the target offset of the operation and corrupts the
      file.
      
      To address this problem, update the shift preparation code to
      stabilize the start boundary along with the full range of the
      insert. Also update the existing corruption check to fail if any
      extent is shifted with a start offset behind the target offset of
      the insert range. This prevents insert from racing with COW
      writeback completion and fails loudly in the event of an unexpected
      extent shift.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  14. 04 Dec, 2019 1 commit
    • xfs: fix mount failure crash on invalid iclog memory access · 798a9cad
      Authored by Brian Foster
      syzbot (via KASAN) reports a use-after-free in the error path of
      xlog_alloc_log(). Specifically, the iclog freeing loop doesn't
      handle the case of a fully initialized ->l_iclog linked list.
      Instead, it assumes that the list is partially constructed and NULL
      terminated.
      
      This bug manifested because there was no possible error scenario
      after iclog list setup when the original code was added.  Subsequent
      code and associated error conditions were added some time later,
      while the original error handling code was never updated. Fix up the
      error loop to terminate either on a NULL iclog or reaching the end
      of the list.
      
      Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
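A teardown loop that handles both list shapes can be sketched in user space. This is an illustration of the termination condition only, with `malloc`/`free` standing in for the kernel allocator. `struct demo_iclog`, `demo_free_all` and `demo_build` are hypothetical, and the loop is arranged differently from the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node; the real in-core log structure carries much more state. */
struct demo_iclog {
	struct demo_iclog *ic_next;
};

/*
 * Free every node reachable from head, stopping at a NULL link (partially
 * built list) or on wrapping back to head (fully built ring).
 * Returns the number of nodes freed.
 */
int demo_free_all(struct demo_iclog *head)
{
	struct demo_iclog *ic, *next;
	int freed = 0;

	if (head == NULL)
		return 0;

	/* Free the followers first so head stays valid for the comparison. */
	for (ic = head->ic_next; ic != NULL && ic != head; ic = next) {
		next = ic->ic_next;
		free(ic);
		freed++;
	}
	free(head);
	return freed + 1;
}

/*
 * Build n nodes linked through ic_next.  If close_ring is nonzero the last
 * node points back at the first; otherwise the list is NULL terminated.
 * Returns the head, or NULL if n is 0 or an allocation fails.
 */
struct demo_iclog *demo_build(int n, int close_ring)
{
	struct demo_iclog *head = NULL, *tail = NULL;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		struct demo_iclog *ic = calloc(1, sizeof(*ic));

		if (ic == NULL) {
			/* The partial list is still NULL terminated here. */
			demo_free_all(head);
			return NULL;
		}
		if (tail != NULL)
			tail->ic_next = ic;
		else
			head = ic;
		tail = ic;
	}
	if (tail != NULL && close_ring)
		tail->ic_next = head;
	return head;
}
```

A loop that only tests for NULL would walk a closed ring back onto the already freed head, which is the use-after-free pattern the commit describes.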