1. 15 4月, 2019 6 次提交
    • D
      xfs: track metadata health status · 6772c1f1
      Darrick J. Wong 提交于
      Add the necessary in-core metadata fields to keep track of which parts
      of the filesystem have been observed and which parts were observed to be
      unhealthy, and print a warning at unmount time if we have unfixed
      problems.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      6772c1f1
    • W
      xfs,fstrim: fix to return correct minlen · 2bf9d264
      Wang Shilong 提交于
      This patch tries to address two problems:
      
      1) return @minlen we used to trim to
      user space.
      
      2) return EINVAL if granularity is larger than
      avg size, even most of cases, granularity is small(4K),
      but if devices return a lager granularity for some reaons
      (testing, bugs etc), fstrim should return failure directly.
      Signed-off-by: NWang Shilong <wshilong@ddn.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2bf9d264
    • B
      xfs: don't account extra agfl blocks as available · 1ca89fbc
      Brian Foster 提交于
      The block allocation AG selection code has parameters that allow a
      caller to perform multiple allocations from a single AG and
      transaction (under certain conditions). The parameters specify the
      total block allocation count required by the transaction and the AG
      selection code selects and locks an AG that will be able to satisfy
      the overall requirement. If the available block accounting
      calculation turns out to be inaccurate and a subsequent allocation
      call fails with -ENOSPC, the resulting transaction cancel leads to
      filesystem shutdown because the transaction is dirty.
      
      This exact problem can be reproduced with a highly parallel space
      consumer and fsstress workload running long enough to a large
      filesystem against -ENOSPC conditions. A bmbt block allocation
      request made for inode extent to bmap format conversion after an
      extent allocation is expected to be satisfied by the same AG and the
      same transaction as the extent allocation. The bmbt block allocation
      fails, however, because the block availability of the AG has changed
      since the AG was selected (outside of the blocks used for the extent
      itself).
      
      The inconsistent block availability calculation is caused by the
      deferred block freeing behavior of the AGFL. This immediately
      removes extra blocks from the AGFL to free up AGFL slots, but rather
      than immediately freeing such blocks as was done in the past, the
      block free is deferred such that said blocks are not available for
      allocation until the current transaction commits. The AG selection
      logic currently considers all AGFL blocks as available and executes
      shortly before any extra AGFL blocks are freed. This means the block
      availability of the current AG can change before the first
      allocation even occurs, but in practice a failure is more likely to
      manifest via a subsequent allocation because extent allocation
      usually has a contiguity requirement larger than a single block that
      can't be satisfied from the AGFL.
      
      In general, XFS prefers operational robustness to absolute
      allocation efficiency. In other words, we prefer to return -ENOSPC
      slightly earlier at the expense of not being able to allocate every
      last block in an AG to avoid this kind of problem. As such, update
      the AG block availability calculation to consider extra AGFL blocks
      as unavailable since they are immediately removed following the
      calculation and will not become available until the current
      transaction commits.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1ca89fbc
    • B
      xfs: shutdown after buf release in iflush cluster abort path · 22fedd80
      Brian Foster 提交于
      If xfs_iflush_cluster() fails due to corruption, the error path
      issues a shutdown and simulates an I/O completion to release the
      buffer. This code has a couple small problems. First, the shutdown
      sequence can issue a synchronous log force, which is unsafe to do
      with buffer locks held. Second, the simulated I/O completion does not
      guarantee the buffer is async and thus is unlocked and released.
      
      For example, if the last operation on the buffer was a read off disk
      prior to the corruption event, XBF_ASYNC is not set and the buffer
      is left locked and held upon return. This results in a memory leak
      as shown by the following message on module unload:
      
       BUG xfs_buf (...): Objects remaining in xfs_buf on __kmem_cache_shutdown()
      
      Fix both of these problems by setting XBF_ASYNC on the buffer prior
      to the simulated I/O error and performing the shutdown immediately
      after ioend processing when the buffer has been released.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      22fedd80
    • B
      xfs: wake commit waiters on CIL abort before log item abort · 545aa41f
      Brian Foster 提交于
      XFS shutdown deadlocks have been reproduced by fstest generic/475.
      The deadlock signature involves log I/O completion running error
      handling to abort logged items and waiting for an inode cluster
      buffer lock in the buffer item unpin handler. The buffer lock is
      held by xfsaild attempting to flush an inode. The buffer happens to
      be pinned and so xfs_iflush() triggers an async log force to begin
      work required to get it unpinned. The log force is blocked waiting
      on the commit completion, which never occurs and thus leaves the
      filesystem deadlocked.
      
      The root problem is that aborted log I/O completion pots commit
      completion behind callback completion, which is unexpected for async
      log forces. Under normal running conditions, an async log force
      returns to the caller once the CIL ctx has been formatted/submitted
      and the commit completion event triggered at the tail end of
      xlog_cil_push(). If the filesystem has shutdown, however, we rely on
      xlog_cil_committed() to trigger the completion event and it happens
      to do so after running log item unpin callbacks. This makes it
      unsafe to invoke an async log force from contexts that hold locks
      that might also be required in log completion processing.
      
      To address this problem, wake commit completion waiters before
      aborting log items in the log I/O completion handler. This ensures
      that an async log force will not deadlock on held locks if the
      filesystem happens to shutdown. Note that it is still unsafe to
      issue a sync log force while holding such locks because a sync log
      force explicitly waits on the force completion, which occurs after
      log I/O completion processing.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      545aa41f
    • B
      xfs: fix use after free in buf log item unlock assert · 4d09807f
      Brian Foster 提交于
      The xfs_buf_log_item ->iop_unlock() callback asserts that the buffer
      is unlocked when either non-stale or aborted. This assert occurs
      after the bli refcount has been dropped and the log item potentially
      freed. The aborted check is thus a potential use after free. This
      problem has been reproduced with KASAN enabled via generic/475.
      
      Fix up xfs_buf_item_unlock() to query aborted state before the bli
      reference is dropped to prevent a potential use after free.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      4d09807f
  2. 26 3月, 2019 1 次提交
    • B
      xfs: serialize unaligned dio writes against all other dio writes · 2032a8a2
      Brian Foster 提交于
      XFS applies more strict serialization constraints to unaligned
      direct writes to accommodate things like direct I/O layer zeroing,
      unwritten extent conversion, etc. Unaligned submissions acquire the
      exclusive iolock and wait for in-flight dio to complete to ensure
      multiple submissions do not race on the same block and cause data
      corruption.
      
      This generally works in the case of an aligned dio followed by an
      unaligned dio, but the serialization is lost if I/Os occur in the
      opposite order. If an unaligned write is submitted first and
      immediately followed by an overlapping, aligned write, the latter
      submits without the typical unaligned serialization barriers because
      there is no indication of an unaligned dio still in-flight. This can
      lead to unpredictable results.
      
      To provide proper unaligned dio serialization, require that such
      direct writes are always the only dio allowed in-flight at one time
      for a particular inode. We already acquire the exclusive iolock and
      drain pending dio before submitting the unaligned dio. Wait once
      more after the dio submission to hold the iolock across the I/O and
      prevent further submissions until the unaligned I/O completes. This
      is heavy handed, but consistent with the current pre-submission
      serialization for unaligned direct writes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2032a8a2
  3. 25 3月, 2019 1 次提交
  4. 19 3月, 2019 3 次提交
  5. 18 3月, 2019 1 次提交
    • B
      xfs: don't trip over uninitialized buffer on extent read of corrupted inode · 6958d11f
      Brian Foster 提交于
      We've had rather rare reports of bmap btree block corruption where
      the bmap root block has a level count of zero. The root cause of the
      corruption is so far unknown. We do have verifier checks to detect
      this form of on-disk corruption, but this doesn't cover a memory
      corruption variant of the problem. The latter is a reasonable
      possibility because the root block is part of the inode fork and can
      reside in-core for some time before inode extents are read.
      
      If this occurs, it leads to a system crash such as the following:
      
       BUG: unable to handle kernel paging request at ffffffff00000221
       PF error: [normal kernel read fault]
       ...
       RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
       ...
       Call Trace:
        xfs_iread_extents+0x379/0x540 [xfs]
        xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
        ? xfs_attr_get+0xd1/0x120 [xfs]
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
        ? __vfs_getxattr+0x53/0x70
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_apply+0x63/0x130
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        iomap_file_buffered_write+0x62/0x90
        ? iomap_write_begin.constprop.40+0x2d0/0x2d0
        xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
        __vfs_write+0x150/0x1b0
        vfs_write+0xba/0x1c0
        ksys_pwrite64+0x64/0xa0
        do_syscall_64+0x5a/0x1d0
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The crash occurs because xfs_iread_extents() attempts to release an
      uninitialized buffer pointer as the level == 0 value prevented the
      buffer from ever being allocated or read. Change the level > 0
      assert to an explicit error check in xfs_iread_extents() to avoid
      crashing the kernel in the event of localized, in-core inode
      corruption.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6958d11f
  6. 13 3月, 2019 1 次提交
  7. 11 3月, 2019 1 次提交
  8. 09 3月, 2019 2 次提交
  9. 02 3月, 2019 1 次提交
  10. 26 2月, 2019 4 次提交
  11. 24 2月, 2019 1 次提交
  12. 21 2月, 2019 9 次提交
  13. 19 2月, 2019 1 次提交
  14. 18 2月, 2019 8 次提交