1. 18 Mar, 2019 1 commit
    •
      xfs: don't trip over uninitialized buffer on extent read of corrupted inode · 6958d11f
      Committed by Brian Foster
      We've had rather rare reports of bmap btree block corruption where
      the bmap root block has a level count of zero. The root cause of the
      corruption is so far unknown. We do have verifier checks to detect
      this form of on-disk corruption, but this doesn't cover a memory
      corruption variant of the problem. The latter is a reasonable
      possibility because the root block is part of the inode fork and can
      reside in-core for some time before inode extents are read.
      
      If this occurs, it leads to a system crash such as the following:
      
       BUG: unable to handle kernel paging request at ffffffff00000221
       PF error: [normal kernel read fault]
       ...
       RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
       ...
       Call Trace:
        xfs_iread_extents+0x379/0x540 [xfs]
        xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
        ? xfs_attr_get+0xd1/0x120 [xfs]
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
        ? __vfs_getxattr+0x53/0x70
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_apply+0x63/0x130
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_file_buffered_write+0x62/0x90
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
        __vfs_write+0x150/0x1b0
        vfs_write+0xba/0x1c0
        ksys_pwrite64+0x64/0xa0
        do_syscall_64+0x5a/0x1d0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The crash occurs because xfs_iread_extents() attempts to release an
      uninitialized buffer pointer as the level == 0 value prevented the
      buffer from ever being allocated or read. Change the level > 0
      assert to an explicit error check in xfs_iread_extents() to avoid
      crashing the kernel in the event of localized, in-core inode
      corruption.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      6958d11f
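The change described in 6958d11f, replacing a `level > 0` assert with an explicit error return, can be sketched in plain user-space C. This is only an illustration under stated assumptions: `demo_root`, `demo_buf`, `demo_read_block`, `demo_release`, `demo_read_extents`, and `DEMO_EFSCORRUPTED` are hypothetical names invented here, not the kernel's types or functions, and the real `xfs_iread_extents()` does considerably more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error code for this sketch; the value is arbitrary. */
#define DEMO_EFSCORRUPTED 117

/* Hypothetical in-core btree root; not the real XFS structure. */
struct demo_root {
    unsigned int level;    /* tree height recorded in the root block */
};

/* Hypothetical buffer handle. */
struct demo_buf {
    int held;
};

/* Stand-in for reading one btree block; always succeeds in this sketch. */
static int demo_read_block(struct demo_buf *storage, struct demo_buf **bpp)
{
    storage->held = 1;
    *bpp = storage;
    return 0;
}

/* Releasing a buffer that was never read is the failure being modelled. */
static void demo_release(struct demo_buf *bp)
{
    assert(bp != NULL);
    bp->held = 0;
}

/*
 * Walk from the root towards the leaves.  With level == 0 the loop would
 * never run and bp would stay unset, so the level is checked explicitly
 * and an error is returned instead of releasing an unset pointer.
 */
int demo_read_extents(const struct demo_root *root)
{
    struct demo_buf storage = { 0 };
    struct demo_buf *bp = NULL;
    unsigned int level = root->level;
    int error;

    if (level == 0)
        return -DEMO_EFSCORRUPTED;

    while (level > 0) {
        error = demo_read_block(&storage, &bp);
        if (error)
            return error;
        level--;
    }

    demo_release(bp);
    return 0;
}
```

In this sketch a root with `level == 0` yields `-DEMO_EFSCORRUPTED`, while any positive level reads at least one block before the release, so the pointer passed to `demo_release()` is always set.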
  2. 13 Mar, 2019 1 commit
  3. 11 Mar, 2019 1 commit
  4. 09 Mar, 2019 2 commits
  5. 26 Feb, 2019 1 commit
  6. 21 Feb, 2019 1 commit
    •
      xfs: make COW fork unwritten extent conversions more robust · 26b91c72
      Committed by Christoph Hellwig
      If we have racing buffered and direct I/O, COW fork extents under
      writeback may already have been moved to the data fork by the time we
      call xfs_reflink_convert_cow from xfs_submit_ioend.  This would be mostly
      harmless as the block numbers don't change by this move, except for
      the fact that xfs_bmapi_write will crash or trigger asserts when
      not finding existing extents, even despite trying to paper over this
      with the XFS_BMAPI_CONVERT_ONLY flag.
      
      Instead of special-casing non-transactional conversions in the already
      way too complicated xfs_bmapi_write, just add a new helper for the much
      simpler non-transactional COW fork case, which simply ignores extents
      that are not found.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      26b91c72
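The idea in 26b91c72 of a conversion helper that tolerates missing extents can be illustrated with a small user-space C sketch. Everything here is an assumption made for illustration: `demo_extent`, `demo_state`, and `demo_convert_cow` are hypothetical names, the fork is modelled as a flat array, and overlapping extents are converted whole rather than trimmed to the requested range as a real implementation would need to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extent states for this sketch. */
enum demo_state { DEMO_NORMAL, DEMO_UNWRITTEN };

/* Hypothetical extent record; a flat array of these models the COW fork. */
struct demo_extent {
    unsigned long start;    /* first file block covered */
    unsigned long count;    /* length in blocks */
    enum demo_state state;
};

/*
 * Mark unwritten extents overlapping [offset, offset + len) as normal.
 * Returns the number of extents converted.  Finding nothing in the range
 * is not an error: the extents may already have moved elsewhere, so the
 * helper simply reports zero conversions.
 */
size_t demo_convert_cow(struct demo_extent *ext, size_t nr,
                        unsigned long offset, unsigned long len)
{
    size_t converted = 0;
    size_t i;

    assert(ext != NULL || nr == 0);

    for (i = 0; i < nr; i++) {
        unsigned long end = ext[i].start + ext[i].count;

        /* Skip extents entirely outside the requested range. */
        if (ext[i].start >= offset + len || end <= offset)
            continue;
        /* Already-written extents need no conversion. */
        if (ext[i].state != DEMO_UNWRITTEN)
            continue;

        ext[i].state = DEMO_NORMAL;
        converted++;
    }
    return converted;
}
```

The design point is that the "nothing to convert" case shares the ordinary return path, so a caller that lost a race does not have to distinguish it from success.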
  7. 19 Feb, 2019 1 commit
  8. 18 Feb, 2019 5 commits
  9. 15 Feb, 2019 1 commit
  10. 12 Feb, 2019 19 commits
  11. 20 Dec, 2018 5 commits
  12. 13 Dec, 2018 2 commits
    •
      xfs: cache minimum realtime summary level · 355e3532
      Committed by Omar Sandoval
      The realtime summary is a two-dimensional array on disk, effectively:
      
      u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]
      
      rsum[log][bbno] is the number of extents of size 2**log which start in
      bitmap block bbno.
      
      xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
      rsum[log][bbno] != 0 for any log level. However, the summary array is
      stored in row-major order (i.e., like an array in C), so all of these
      entries are not adjacent, but rather spread across the entire summary
      file. In the worst case (a full bitmap block), xfs_rtany_summary() has
      to check every level.
      
      This means that on a moderately-used realtime device, an allocation will
      waste a lot of time finding, reading, and releasing buffers for the
      realtime summary. In particular, one of our storage services (which runs
      on servers with 8 very slow CPUs and 15 8 TB XFS realtime filesystems)
      spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
      xfs_trans_brelse() called from xfs_rtany_summary().
      
      One solution would be to also store the summary with the dimensions
      swapped. However, this would require a disk format change to a very old
      component of XFS.
      
      Instead, we can cache the minimum size which contains any extents. We do
      so lazily; rather than guaranteeing that the cache contains the precise
      minimum, it always contains a loose lower bound which we tighten when we
      read or update a summary block. This only uses a few kilobytes of memory
      and is already serialized via the realtime bitmap and summary inode
      locks, so the cost is minimal. With this change, the same workload only
      spends 0.2% of its CPU cycles in the realtime allocator.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      355e3532
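The row-major layout and the lazily tightened lower bound described in 355e3532 can be sketched as follows. This is a user-space illustration under assumptions, not the kernel code: `demo_rt`, `demo_index`, `demo_modify`, `demo_any_summary`, the fixed `DEMO_LEVELS` and `DEMO_BLOCKS` sizes, and the `reads` counter (a stand-in for buffer reads) are all hypothetical. The sketch only tightens the bound when the scan began at the cached bound, which is a conservative choice made here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LEVELS 4    /* hypothetical number of log2 size levels */
#define DEMO_BLOCKS 3    /* hypothetical number of bitmap blocks */

struct demo_rt {
    uint32_t rsum[DEMO_LEVELS * DEMO_BLOCKS]; /* row-major rsum[log][bbno] */
    uint8_t  min_level[DEMO_BLOCKS];  /* loose lower bound per bitmap block */
    unsigned int reads;               /* summary entries examined so far */
};

/* Row-major indexing: entries for one bbno are DEMO_BLOCKS apart. */
static unsigned int demo_index(unsigned int log, unsigned int bbno)
{
    return log * DEMO_BLOCKS + bbno;
}

/* Update one summary counter and loosen the bound if a lower level fills. */
void demo_modify(struct demo_rt *rt, unsigned int log, unsigned int bbno,
                 int32_t delta)
{
    uint32_t *entry;

    assert(log < DEMO_LEVELS && bbno < DEMO_BLOCKS);
    entry = &rt->rsum[demo_index(log, bbno)];
    *entry += (uint32_t)delta;
    if (*entry != 0 && log < rt->min_level[bbno])
        rt->min_level[bbno] = (uint8_t)log;
}

/* Return 1 if any level in [low, high] has extents starting in bbno. */
int demo_any_summary(struct demo_rt *rt, unsigned int low, unsigned int high,
                     unsigned int bbno)
{
    unsigned int bound, log;
    int found = 0;

    assert(high < DEMO_LEVELS && bbno < DEMO_BLOCKS);
    bound = rt->min_level[bbno];

    /* Levels below the cached bound are known to be empty: skip them. */
    if (low < bound)
        low = bound;

    for (log = low; log <= high; log++) {
        rt->reads++;
        if (rt->rsum[demo_index(log, bbno)] != 0) {
            found = 1;
            break;
        }
    }

    /* If the scan began at the bound, nothing exists below log: tighten. */
    if (low == bound && log > bound)
        rt->min_level[bbno] = (uint8_t)log;
    return found;
}
```

A zero-initialized `min_level` array is a valid starting point because 0 is always a correct, if loose, lower bound. After one full scan finds the first populated level, later queries for the same bitmap block start there instead of at level 0.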
    •
      xfs: precalculate cluster alignment in inodes and blocks · c1b4a321
      Committed by Darrick J. Wong
      Store the inode cluster alignment information in units of inodes and
      blocks in the mount data so that we don't have to keep recalculating
      them.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      c1b4a321
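The pattern in c1b4a321, computing alignment values once at mount time and reusing them, might look roughly like this in user-space C. The names `demo_mount`, `demo_mount_init`, and `demo_inode_is_cluster_aligned`, and the assumption that the inode count per block is a power of two expressed as a log2 shift, are illustrative only and are not taken from the XFS mount structure.

```c
#include <assert.h>

/* Hypothetical mount data holding the precalculated values. */
struct demo_mount {
    unsigned int inodes_per_block_log;  /* log2(inodes per block) */
    unsigned int cluster_align;         /* cluster alignment, in blocks */
    unsigned int cluster_align_inodes;  /* the same alignment, in inodes */
};

/* Compute both units once so later callers never recalculate them. */
void demo_mount_init(struct demo_mount *mp, unsigned int align_blocks,
                     unsigned int inodes_per_block_log)
{
    assert(align_blocks > 0);
    mp->inodes_per_block_log = inodes_per_block_log;
    mp->cluster_align = align_blocks;
    mp->cluster_align_inodes = align_blocks << inodes_per_block_log;
}

/* Example consumer: reads the cached value instead of recomputing it. */
int demo_inode_is_cluster_aligned(const struct demo_mount *mp,
                                  unsigned int agino)
{
    return agino % mp->cluster_align_inodes == 0;
}
```

With an alignment of 4 blocks and 8 inodes per block (shift of 3), the cached inode alignment is 32, so inode 64 is aligned and inode 40 is not.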