1. 27 Jan 2020: 7 commits
  2. 24 Jan 2020: 2 commits
  3. 21 Jan 2020: 1 commit
  4. 17 Jan 2020: 7 commits
    • xfs: check log iovec size to make sure it's plausibly a buffer log format · 8a6453a8
      Darrick J. Wong committed
      When log recovery is processing buffer log items, we should check that
      the incoming iovec actually describes a region of memory large enough to
      contain the log format and the dirty map.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
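A minimal user-space sketch of the kind of plausibility check described above. It assumes a simplified mirror of the on-disk layout (`struct buf_log_format`, with a 17-word map) and a hypothetical helper `buf_log_iovec_ok()`; neither is the kernel's actual code. The iovec must first be long enough to reach `blf_map_size`, and then long enough to hold the number of dirty-map words that field claims.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the buffer log format (not the kernel's). */
#define BLF_DATAMAP_SIZE 17

struct buf_log_format {
	uint16_t blf_type;
	uint16_t blf_size;
	uint16_t blf_flags;
	uint16_t blf_len;
	int64_t  blf_blkno;
	uint32_t blf_map_size;                  /* number of map words that follow */
	uint32_t blf_data_map[BLF_DATAMAP_SIZE];
};

/* Hypothetical check: can an iovec of iov_len bytes hold the fixed header
 * plus the blf_map_size dirty-map words the header claims to carry? */
static bool buf_log_iovec_ok(const void *iov_base, size_t iov_len)
{
	const struct buf_log_format *blf = iov_base;
	size_t hdr = offsetof(struct buf_log_format, blf_data_map);

	/* Too short to even contain blf_map_size: reject before reading it. */
	if (iov_len < hdr)
		return false;
	/* A map larger than the array can never be valid. */
	if (blf->blf_map_size > BLF_DATAMAP_SIZE)
		return false;
	/* Treat blf_data_map as a variable-length array of map_size words. */
	return iov_len >= hdr + (size_t)blf->blf_map_size * sizeof(uint32_t);
}
```

The header up to `blf_data_map` is 20 bytes on both i386 and amd64, so a log item claiming 4 map words needs at least 36 bytes of iovec.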
    • xfs: make struct xfs_buf_log_format have a consistent size · b7df5e92
      Darrick J. Wong committed
      Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
      end of struct xfs_buf_log_format.  This makes the size consistent so
      that we can check it in xfs_ondisk.h, and will be needed once we start
      logging attribute values.
      
      On amd64 we get the following pahole:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 88, cachelines: 2, members: 7 */
              /* padding: 4 */
              /* last cacheline: 24 bytes */
      };
      
      But on i386 we get the following:
      
      struct xfs_buf_log_format {
              short unsigned int         blf_type;       /*     0     2 */
              short unsigned int         blf_size;       /*     2     2 */
              short unsigned int         blf_flags;      /*     4     2 */
              short unsigned int         blf_len;        /*     6     2 */
              long long int              blf_blkno;      /*     8     8 */
              unsigned int               blf_map_size;   /*    16     4 */
              unsigned int               blf_data_map[16]; /*    20    64 */
              /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */
      
              /* size: 84, cachelines: 2, members: 7 */
              /* last cacheline: 20 bytes */
      };
      
      Notice how the amd64 compiler inserts 4 bytes of padding at the end of
      the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
      corruption during remote attr value buffer invalidation" we would try to
      write to the 17th word of blf_data_map (index 16, one past the end of the
      array), which is harmless on amd64 because it lands in the padding, but
      really bad on i386.
      
      This shouldn't cause any changes in the ondisk logging formats because
      the log code writes out the log vectors with the appropriate size for
      the log item's map_size, and log recovery treats the data_map array as a
      VLA.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
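The size arithmetic behind this change can be pinned down with C11 static assertions. The sketch below uses a hypothetical user-space mirror of the struct with fixed-width types; the kernel's own checks live in xfs_ondisk.h and use its own macros. A 20-byte header plus 16 map words gives 84 bytes, which amd64 pads to 88 but i386 does not; with 17 words the struct is 88 bytes, already a multiple of 8, so no tail padding is needed on either ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* 16 words sized for a 64k buffer, plus 1 word filling the old tail padding. */
#define BLF_DATAMAP_SIZE (16 + 1)

/* Hypothetical mirror of struct xfs_buf_log_format after the change. */
struct buf_log_format {
	uint16_t blf_type;
	uint16_t blf_size;
	uint16_t blf_flags;
	uint16_t blf_len;
	int64_t  blf_blkno;
	uint32_t blf_map_size;
	uint32_t blf_data_map[BLF_DATAMAP_SIZE];
};

/* Header is 2+2+2+2+8+4 = 20 bytes on both i386 and amd64. */
_Static_assert(offsetof(struct buf_log_format, blf_data_map) == 20,
	       "unexpected header layout");

/* 20 + 17 * 4 = 88, a multiple of 8, so i386 and amd64 agree on the size. */
_Static_assert(sizeof(struct buf_log_format) == 88,
	       "struct size differs between ABIs");
```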
    • xfs: complain if anyone tries to create a too-large buffer log item · c3d5f0c2
      Darrick J. Wong committed
      Complain if someone calls xfs_buf_item_init on a buffer that is larger
      than the dirty bitmap can handle, or tries to log a region that's past
      the end of the dirty bitmap.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
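A sketch of the two checks described above, under stated assumptions: one dirty bit per 128-byte chunk, 32 bits per map word, and a map sized for a 64 KiB buffer (16 words). The helper names `buf_fits_dirty_map()` and `range_fits_dirty_map()` are hypothetical, not the kernel's functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed constants: one dirty bit per 128-byte chunk, 32-bit map words,
 * and a map sized for a 64 KiB buffer. */
#define CHUNK_SIZE        128u
#define BITS_PER_MAP_WORD 32u
#define MAP_WORDS         (65536u / CHUNK_SIZE / BITS_PER_MAP_WORD) /* 16 */

/* Number of map words needed to describe a buffer of len bytes. */
static size_t map_words_for_len(size_t len)
{
	size_t chunks = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;

	return (chunks + BITS_PER_MAP_WORD - 1) / BITS_PER_MAP_WORD;
}

/* Hypothetical init-time check: refuse buffers the bitmap cannot cover. */
static bool buf_fits_dirty_map(size_t len)
{
	return map_words_for_len(len) <= MAP_WORDS;
}

/* Hypothetical log-time check: the byte range [first, last] must map to
 * chunk bits that actually exist in the bitmap. */
static bool range_fits_dirty_map(size_t first, size_t last)
{
	if (last < first)
		return false;
	return last / CHUNK_SIZE < (size_t)MAP_WORDS * BITS_PER_MAP_WORD;
}
```

With these assumptions a 64 KiB buffer needs exactly 16 words and fits, while the 68 KiB remote attribute buffer from the memory corruption fix needs 17 and would be rejected.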
    • xfs: clean up xfs_buf_item_get_format return value · c64dd49b
      Darrick J. Wong committed
      The only way xfs_buf_item_get_format can return nonzero is if the
      kmem_alloc fails, and that allocation cannot fail.  Get rid of all the
      unnecessary error handling.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: streamline xfs_attr3_leaf_inactive · 0bb9d159
      Darrick J. Wong committed
      Now that we know we don't have to take a transaction to stale the incore
      buffers for a remote value, get rid of the unnecessary memory allocation
      in the leaf walker and call the rmt_stale function directly.  Flatten
      the loop while we're at it.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: fix memory corruption during remote attr value buffer invalidation · e8db2aaf
      Darrick J. Wong committed
      While running generic/103, I observed what looks like memory corruption
      and (with slub debugging turned on) a slub redzone warning on i386 when
      inactivating an inode with a 64k remote attr value.
      
      On a v5 filesystem, maximally sized remote attr values require one block
      more than 64k worth of space to hold both the 64k value and the remote
      attribute value header (64 bytes).  On a 4k block filesystem this
      results in a 68k buffer; on a 64k block filesystem, this would be a 128k
      buffer.  Note that even though we'll never use more than 65,600 bytes of
      this buffer, XFS_MAX_BLOCKSIZE is 64k.
      
      This is a problem because the definition of struct xfs_buf_log_format
      only allows for a dirty bitmap covering XFS_MAX_BLOCKSIZE (64k) of
      buffer.  On i386, when we invalidate a remote attribute,
      xfs_trans_binval zeroes a dirty map covering all 68k, writing right off
      the end of the log item and corrupting memory.  We've gotten away with
      this on x86_64 for years because the compiler inserts 4 bytes (one u32)
      of padding at the end of struct xfs_buf_log_format.
      
      Fortunately for us, remote attribute values are written to disk with
      xfs_bwrite(), which is to say that they are not logged.  Fix the problem
      by removing all places where we could end up creating a buffer log item
      for a remote attribute value and leave a note explaining why.  Next,
      replace the open-coded buffer invalidation with a call to the helper we
      created in the previous patch that does better checking for bad metadata
      before marking the buffer stale.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
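The buffer-size arithmetic in this commit message can be reproduced in a few lines. This sketch takes the message's figure of 65,600 bytes (64k value plus a 64-byte header) at face value and assumes 128 bytes per dirty-map bit, 32-bit map words and a 16-word map; the helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed constants, following the figures in the commit message. */
#define MAX_ATTR_VALUE 65536u /* largest remote attr value */
#define RMT_HDR_SIZE   64u    /* remote value header, as stated above */
#define CHUNK_SIZE     128u   /* bytes covered by one dirty-map bit */
#define MAP_WORD_BITS  32u
#define MAP_WORDS      16u    /* map sized for XFS_MAX_BLOCKSIZE (64k) */

/* Buffer length for a maximally sized remote value: whole fs blocks. */
static size_t rmt_buf_len(size_t blocksize)
{
	size_t need = MAX_ATTR_VALUE + RMT_HDR_SIZE; /* 65,600 bytes */
	size_t blocks = (need + blocksize - 1) / blocksize;

	return blocks * blocksize;
}

/* Dirty-map words needed to mark every chunk of a buffer of len bytes. */
static size_t map_words_needed(size_t len)
{
	size_t bits = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;

	return (bits + MAP_WORD_BITS - 1) / MAP_WORD_BITS;
}
```

With 4k blocks, `rmt_buf_len(4096)` is 69,632 bytes (17 blocks, i.e. 68k), which needs 17 map words against a 16-word array. Clearing the whole map therefore touches one 4-byte word past the end, which on amd64 happens to fall into the struct's tail padding.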
    • xfs: refactor remote attr value buffer invalidation · 8edbb26b
      Darrick J. Wong committed
      Hoist the code that invalidates remote extended attribute value buffers
      into a separate helper function.  This prepares us for a memory
      corruption fix in the next patch.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  5. 16 Jan 2020: 2 commits
  6. 15 Jan 2020: 3 commits
    • xfs: fix s_maxbytes computation on 32-bit kernels · 932befe3
      Darrick J. Wong committed
      I observed a hang in generic/308 while running fstests on an i686 kernel.
      The hang occurred when trying to purge the pagecache on a large sparse
      file that had a page created past MAX_LFS_FILESIZE, which caused an
      integer overflow in the pagecache xarray and resulted in an infinite
      loop.
      
      I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in
      commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so
      that it is now one page short of the maximum page index on 32-bit
      kernels.  Because the XFS function to compute max offset open-codes the
      2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform
      any sanity checking of s_maxbytes, the code in generic/308 can create a
      page above the pagecache's limit and kaboom.
      
      Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and
      aborting the mount with a warning if our assumptions ever break.  I have
      no answer for why this seems to have been broken for years and nobody
      noticed.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
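A sketch of why a limit that reaches the last page index is dangerous on a 32-bit kernel. It assumes 4 KiB pages, 32-bit page indices, and that MAX_LFS_FILESIZE on 32-bit is `(loff_t)ULONG_MAX << PAGE_SHIFT` after commit 0cc3b0ec; `maxbytes_fits_pagecache()` is a hypothetical check, not the test XFS actually performs at mount time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit kernel with 4 KiB pages: page indices are 32-bit. */
#define PAGE_SHIFT_32 12
#define PGOFF_MAX_32  UINT32_MAX

/* Assumed 32-bit MAX_LFS_FILESIZE: ULONG_MAX << PAGE_SHIFT. */
#define MAX_LFS_FILESIZE_32 ((uint64_t)UINT32_MAX << PAGE_SHIFT_32)

/* Hypothetical sanity check: the last byte of a file of size maxbytes must
 * live in a page whose index is below the maximum index, so that page
 * cache arithmetic such as index + 1 cannot wrap around. */
static bool maxbytes_fits_pagecache(uint64_t maxbytes)
{
	uint64_t last_index;

	if (maxbytes == 0)
		return true;
	last_index = (maxbytes - 1) >> PAGE_SHIFT_32;
	return last_index < PGOFF_MAX_32;
}
```

Under these assumptions MAX_LFS_FILESIZE itself passes (its last byte sits in page index 2^32 - 2), while any limit one page larger reaches page index 2^32 - 1 and fails, which is the kind of page generic/308 managed to create.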
    • xfs: truncate should remove all blocks, not just to the end of the page cache · 4bbb04ab
      Darrick J. Wong committed
      xfs_itruncate_extents_flags() is supposed to unmap every block in a file
      from EOF onwards.  Oddly, it uses s_maxbytes as the upper limit to the
      bunmapi range, even though s_maxbytes reflects the highest offset the
      pagecache can support, not the highest offset that XFS supports.
      
      The result of this confusion is that if you create a 20T file on a
      64-bit machine, mount the filesystem on a 32-bit machine, and remove the
      file, we leak everything above 16T.  Fix this by capping the bunmapi
      request at the maximum possible block offset, not s_maxbytes.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
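The 20T/16T leak can be checked with block arithmetic. This sketch assumes 4 KiB filesystem blocks, a 32-bit s_maxbytes of `UINT32_MAX << 12` bytes (just under 16 TiB), and a hypothetical `MAX_FILEOFF` standing in for the new XFS_MAX_FILEOFF; `leaked_blocks()` is likewise illustrative only.

```c
#include <stdint.h>

/* Assumed 4 KiB filesystem blocks. */
#define BLOCK_SHIFT 12

/* Assumed s_maxbytes on a 32-bit kernel with 4 KiB pages (~16 TiB). */
#define S_MAXBYTES_32 ((uint64_t)UINT32_MAX << 12)

/* Hypothetical stand-in for XFS_MAX_FILEOFF, far beyond any real file. */
#define MAX_FILEOFF (((1ULL << 54) - 1) + ((1ULL << 21) - 1))

/* Blocks still mapped after truncating to zero when the unmap range stops
 * at limit_fsb but the file's mappings extend to file_fsb. */
static uint64_t leaked_blocks(uint64_t file_fsb, uint64_t limit_fsb)
{
	return file_fsb > limit_fsb ? file_fsb - limit_fsb : 0;
}
```

For a 20 TiB file (`(20ULL << 40) >> BLOCK_SHIFT` = 5,368,709,120 blocks), capping at `S_MAXBYTES_32 >> BLOCK_SHIFT` leaves 1,073,741,825 blocks (about 4 TiB) mapped, while capping at `MAX_FILEOFF` leaves none.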
    • xfs: introduce XFS_MAX_FILEOFF · a5084865
      Darrick J. Wong committed
      Introduce a new #define for the maximum supported file block offset.
      We'll use this in the next patch to make it more obvious that we're
      doing some operation for all possible inode fork mappings after a given
      offset.  We can't use ULLONG_MAX here because bunmapi uses that to
      detect when it's done.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
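A sketch of how such a maximum can be derived and why it stays clear of the ULLONG_MAX sentinel. It assumes the extent record stores a 54-bit start offset and a 21-bit block count, and that the maximum is the end of the furthest extent those fields can describe; my understanding is that upstream builds XFS_MAX_FILEOFF from the bmbt masks this way, but treat the names here as hypothetical.

```c
#include <stdint.h>

/* Assumed bmbt record field widths: 54-bit start offset, 21-bit length. */
#define STARTOFF_BITS   54
#define BLOCKCOUNT_BITS 21
#define STARTOFF_MASK   ((UINT64_C(1) << STARTOFF_BITS) - 1)
#define BLOCKCOUNT_MASK ((UINT64_C(1) << BLOCKCOUNT_BITS) - 1)

/* Hypothetical XFS_MAX_FILEOFF-style limit: end of the last possible extent. */
#define MAX_FILEOFF (STARTOFF_MASK + BLOCKCOUNT_MASK)

/* Sentinel an unmap loop might use to mean "done"; it must never collide
 * with a real offset, which is why it cannot double as the maximum. */
#define UNMAP_DONE UINT64_MAX

/* Length of the range covering every possible mapping at or after off. */
static uint64_t range_len_to_max(uint64_t off)
{
	return off >= MAX_FILEOFF ? 0 : MAX_FILEOFF - off;
}
```

The resulting limit, 18,014,398,511,579,134 blocks, is many orders of magnitude below `UNMAP_DONE`, so "unmap everything after offset X" and "the loop is finished" remain distinguishable.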
  7. 10 Jan 2020: 6 commits
  8. 08 Jan 2020: 1 commit
  9. 07 Jan 2020: 2 commits
  10. 29 Dec 2019: 1 commit
  11. 25 Dec 2019: 1 commit
  12. 23 Dec 2019: 4 commits
  13. 22 Dec 2019: 1 commit
  14. 21 Dec 2019: 2 commits
    • io_uring: pass in 'sqe' to the prep handlers · 3529d8c2
      Jens Axboe committed
      This moves the prep handlers outside of the opcode handlers, and allows
      us to pass in the sqe directly. If the sqe is non-NULL, it means that
      the request should be prepared for the first time.
      
      With the opcode handlers not having access to the sqe at all, we are
      guaranteed that the prep handler has set up the request fully by the
      time we get there. As before, for opcodes that need to copy in more
      data than the io_kiocb allows for, the io_async_ctx holds that info. If
      a prep handler is invoked with req->io set, it must use that to retain
      information for later.
      
      Finally, we can remove io_kiocb->sqe as well.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
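A heavily simplified sketch of the split described above, using hypothetical types (`struct sqe`, `struct async_ctx`, `struct req`) rather than the kernel's io_kiocb and io_uring_sqe. The prep handler is the only code that reads the sqe; a non-NULL sqe means first-time prep, and when `req->io` is set the copied data is also kept there. The issue handler works purely from the request, so the sqe slot can be reused by userspace as soon as prep returns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified shapes; not the kernel's structures. */
struct sqe {
	uint8_t  opcode;
	int      fd;
	uint64_t addr;
	uint32_t len;
};

struct async_ctx {              /* stand-in for io_async_ctx */
	uint64_t saved_addr;
	uint32_t saved_len;
};

struct req {                    /* stand-in for io_kiocb */
	uint8_t  opcode;
	int      fd;
	uint64_t addr;
	uint32_t len;
	struct async_ctx *io;   /* set when state must outlive the sqe */
};

/* Prep: the only place that reads the sqe. NULL means already prepared. */
static int prep_rw(struct req *req, const struct sqe *sqe)
{
	if (!sqe)
		return 0;
	req->opcode = sqe->opcode;
	req->fd = sqe->fd;
	req->addr = sqe->addr;
	req->len = sqe->len;
	if (req->io) {          /* retain what a later re-issue will need */
		req->io->saved_addr = sqe->addr;
		req->io->saved_len = sqe->len;
	}
	return 0;
}

/* Opcode handler: never sees the sqe, only the prepared request. */
static uint64_t issue_rw(const struct req *req)
{
	return req->addr + req->len;    /* placeholder for doing the I/O */
}
```

In use, a caller would run `prep_rw(&req, &sqe)`, let the ring slot be overwritten (for example with `memset`), and still get the right result from `issue_rw(&req)`.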
    • io_uring: standardize the prep methods · 06b76d44
      Jens Axboe committed
      We currently have a mix of use cases. Most of the newer ones are pretty
      uniform, but we have some older ones that use different calling
      conventions. This is confusing.
      
      For the opcodes that currently rely on the req->io->sqe copy saving
      them from reuse, add a request type struct in the io_kiocb command
      union to store the data they need.
      
      Prepare for all opcodes having a standard prep method, so we can call
      it in a uniform fashion and outside of the opcode handler. This is in
      preparation for passing in the 'sqe' pointer, rather than storing it
      in the io_kiocb. Once we have uniform prep handlers, we can leave all
      the prep work to that part, and not even pass in the sqe to the opcode
      handler. This ensures that we don't reuse sqe data inadvertently.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
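A sketch of what "a standard prep method for every opcode" can look like: one function-pointer signature, one table indexed by opcode, and per-opcode request data kept in a union inside the request. All names here (`struct sqe`, `struct req`, `OP_*`, `req_prep()`) are hypothetical simplifications, not io_uring's real definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types; not io_uring's real definitions. */
struct sqe {
	uint8_t  opcode;
	int      fd;
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

enum { OP_NOP, OP_READ, OP_FSYNC, OP_LAST };

struct rw_cmd    { uint64_t addr; uint32_t len; };
struct fsync_cmd { uint32_t flags; };

struct req {
	uint8_t opcode;
	int     fd;
	union {                 /* per-opcode data copied out of the sqe */
		struct rw_cmd    rw;
		struct fsync_cmd fsync;
	} cmd;
};

/* Every opcode gets the same prep signature. */
typedef int (*prep_fn)(struct req *, const struct sqe *);

static int prep_nop(struct req *req, const struct sqe *sqe)
{
	(void)req;
	(void)sqe;
	return 0;
}

static int prep_read(struct req *req, const struct sqe *sqe)
{
	req->cmd.rw.addr = sqe->addr;
	req->cmd.rw.len = sqe->len;
	return 0;
}

static int prep_fsync(struct req *req, const struct sqe *sqe)
{
	req->cmd.fsync.flags = sqe->flags;
	return 0;
}

static const prep_fn prep_table[OP_LAST] = {
	[OP_NOP]   = prep_nop,
	[OP_READ]  = prep_read,
	[OP_FSYNC] = prep_fsync,
};

/* Uniform entry point, called outside of any opcode handler. */
static int req_prep(struct req *req, const struct sqe *sqe)
{
	if (sqe->opcode >= OP_LAST)
		return -EINVAL;
	req->opcode = sqe->opcode;
	req->fd = sqe->fd;
	return prep_table[sqe->opcode](req, sqe);
}
```

Because every opcode's data ends up in `req->cmd`, the handlers that run later have no reason to look at the sqe again, which is the property the follow-up commit relies on.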