1. 24 Mar, 2018 · 2 commits
    • xfs: sanity-check the unused space before trying to use it · 6915ef35
      Darrick J. Wong committed
      In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if
      it doesn't make sense.  Since a carefully crafted fuzzed image can cause
      the kernel to crash after blowing a bunch of assertions, let's move
      those checks into a validator function and rig everything up to return
      EFSCORRUPTED to userspace.  Found by lastbit fuzzing ltail.bestcount via
      xfs/391.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
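The idea in the commit above (validate on-disk values in one function and return an error rather than ASSERT) can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_unused`, `demo_check_free`, `demo_use_free`, `DEMO_FREE_TAG` and the numeric value of `DEMO_EFSCORRUPTED` are hypothetical names and values, not the kernel's actual structures or code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EFSCORRUPTED 117  /* stand-in error code; the value is assumed */
#define DEMO_FREE_TAG 0xffffu  /* hypothetical marker for an unused region */

/* Simplified, hypothetical view of one unused region in a directory block. */
struct demo_unused {
    uint16_t freetag;  /* must equal DEMO_FREE_TAG */
    uint32_t start;    /* byte offset of the region within the block */
    uint32_t length;   /* size of the region in bytes */
};

/* Validate the on-disk values; report corruption instead of asserting. */
static int demo_check_free(const struct demo_unused *dup, uint32_t blksize,
                           uint32_t offset, uint32_t len)
{
    if (dup->freetag != DEMO_FREE_TAG)
        return -DEMO_EFSCORRUPTED;
    /* The free region itself must lie inside the block. */
    if (dup->start > blksize || dup->length > blksize - dup->start)
        return -DEMO_EFSCORRUPTED;
    /* The requested range must lie inside the free region. */
    if (offset < dup->start || len > dup->length ||
        offset - dup->start > dup->length - len)
        return -DEMO_EFSCORRUPTED;
    return 0;
}

/* Caller: run the validator first and propagate its error code. */
static int demo_use_free(const struct demo_unused *dup, uint32_t blksize,
                         uint32_t offset, uint32_t len)
{
    int error = demo_check_free(dup, blksize, offset, len);

    if (error)
        return error;
    /* ... carve [offset, offset + len) out of the free region here ... */
    return 0;
}

/* Usage example: a sane region passes, fuzzed values yield an error. */
static void demo_use_free_example(void)
{
    struct demo_unused good = { DEMO_FREE_TAG, 64, 128 };
    struct demo_unused bad_tag = { 0x1234, 64, 128 };

    assert(demo_use_free(&good, 4096, 64, 32) == 0);
    assert(demo_use_free(&good, 4096, 161, 32) == -DEMO_EFSCORRUPTED);
    assert(demo_use_free(&bad_tag, 4096, 64, 32) == -DEMO_EFSCORRUPTED);
}
```

Keeping every check in one validator means the caller has a single error path to propagate, which is what lets a fuzzed image produce an error code rather than a crash.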
    • xfs: detect agfl count corruption and reset agfl · a27ba260
      Brian Foster committed
      The struct xfs_agfl v5 header was originally introduced with
      unexpected padding that caused the AGFL to operate with one less
      slot than intended. The header has since been packed, but the fix
      left an incompatibility for users who upgrade from an old kernel
      with the unpacked header to a newer kernel with the packed header
      while the AGFL happens to wrap around the end. The newer kernel
      recognizes one extra slot at the physical end of the AGFL that the
      previous kernel did not. The new kernel will eventually attempt to
      allocate a block from that slot, which contains invalid data, and
      cause a crash.
      
      This condition can be detected by comparing the active range of the
      AGFL to the count. While this detects a padding mismatch, it can
      also trigger false positives for unrelated flcount corruption. Since
      we cannot distinguish a size mismatch due to padding from unrelated
      corruption, we can't trust the AGFL enough to simply repopulate the
      empty slot.
      
      Instead, avoid unnecessarily complex detection logic and use a
      solution that can handle any form of flcount corruption that slips
      through read verifiers: distrust the entire AGFL and reset it to an
      empty state. Any valid blocks within the AGFL are intentionally
      leaked. This requires xfs_repair to rectify (which was already
      necessary based on the state the AGFL was found in). The reset
      mitigates the side effect of the padding mismatch problem from a
      filesystem crash to a free space accounting inconsistency. The
      generic approach also means that this patch can be safely backported
      to kernels with or without a packed struct xfs_agfl.
      
      Check the AGF for an invalid freelist count on initial read from
      disk. If detected, set a flag on the xfs_perag to indicate that a
      reset is required before the AGFL can be used. In the first
      transaction that attempts to use a flagged AGFL, reset it to empty,
      warn the user about the inconsistency and allow the freelist fixup
      code to repopulate the AGFL with new blocks. The xfs_perag flag is
      cleared to eliminate the need for repeated checks on each block
      allocation operation.
      
      This allows kernels that include the packing fix commit 96f859d5
      ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
      to handle older unpacked AGFL formats without a filesystem crash.
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chiluk <chiluk+linuxxfs@indeed.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
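The detection described in the commit above, comparing the active range of the AGFL with the stored count and then resetting a flagged list to empty, can be modelled in a few lines of userspace C. This is a sketch under assumptions: the `demo_` structures and functions are hypothetical stand-ins, and the slot counts 118 and 119 are illustrative values for an unpacked and a packed header, not figures taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory copies of the AGF freelist fields. */
struct demo_agf {
    uint32_t flfirst;  /* index of the first active AGFL slot */
    uint32_t fllast;   /* index of the last active AGFL slot */
    uint32_t flcount;  /* number of blocks the AGF claims are listed */
};

struct demo_perag {
    bool agfl_reset;   /* set when the AGFL must be reset before use */
};

/* Compare the active range [flfirst, fllast] (may wrap) with flcount. */
static bool demo_agfl_needs_reset(const struct demo_agf *agf,
                                  uint32_t agfl_size)
{
    uint32_t active;

    if (agf->flfirst >= agfl_size || agf->fllast >= agfl_size)
        return true;
    if (agf->flcount > agfl_size)
        return true;

    if (agf->flcount == 0)
        active = 0;
    else if (agf->fllast >= agf->flfirst)
        active = agf->fllast - agf->flfirst + 1;
    else  /* the list wraps around the physical end of the AGFL */
        active = agfl_size - agf->flfirst + agf->fllast + 1;

    return active != agf->flcount;
}

/* Distrust the whole list: mark it empty and clear the perag flag. */
static void demo_agfl_reset(struct demo_perag *pag, struct demo_agf *agf,
                            uint32_t agfl_size)
{
    agf->flfirst = 0;
    agf->fllast = agfl_size - 1;
    agf->flcount = 0;
    pag->agfl_reset = false;
}

/* Usage example: a wrapped list written with 118 slots, read with 119. */
static void demo_agfl_example(void)
{
    struct demo_agf agf = { 116, 1, 4 };  /* slots 116, 117, 0, 1 */
    struct demo_perag pag = { false };

    assert(!demo_agfl_needs_reset(&agf, 118));  /* consistent for writer */
    pag.agfl_reset = demo_agfl_needs_reset(&agf, 119);
    assert(pag.agfl_reset);                     /* one extra slot seen */

    demo_agfl_reset(&pag, &agf, 119);
    assert(!pag.agfl_reset && !demo_agfl_needs_reset(&agf, 119));
}
```

In this model the check fires only when the list wraps, which matches the condition the commit message describes: without a wrap, the extra physical slot never falls inside the active range.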
  2. 12 Mar, 2018 · 5 commits
    • xfs: account only rmapbt-used blocks against rmapbt perag res · 0ab32086
      Brian Foster committed
      The rmapbt perag metadata reservation reserves blocks for the
      reverse mapping btree (rmapbt). Since the rmapbt uses blocks from
      the agfl and perag accounting is updated as blocks are allocated
      from the allocation btrees, the reservation actually accounts blocks
      as they are allocated to (or freed from) the agfl rather than the
      rmapbt itself.
      
      While this works for blocks that are eventually used for the rmapbt,
      not all agfl blocks are destined for the rmapbt. Blocks that are
      allocated to the agfl (and thus "reserved" for the rmapbt) but then
      used by another structure lead to a growing inconsistency over time
      between the runtime tracking of rmapbt usage vs. actual rmapbt
      usage. Since the runtime tracking thinks all agfl blocks are rmapbt
      blocks, it essentially believes that less future reservation is
      required to satisfy the rmapbt than what is actually necessary.
      
      The inconsistency is rectified across mount cycles because the perag
      reservation is initialized based on the actual rmapbt usage at mount
      time. The problem, however, is that the excessive drain of the
      reservation at runtime opens a window to allocate blocks for other
      purposes that might be required for the rmapbt on a subsequent
      mount. This problem can be demonstrated by a simple test that runs
      an allocation workload to consume agfl blocks over time and then
      observes the difference in the agfl reservation requirement across an
      unmount/mount cycle:
      
        mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194
        ...
        ...      : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1
        umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0
        mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194
      
      As the above tracepoints show, the reservation requirement reduces
      from 3194 blocks to 2956 blocks as the workload runs.  Without any
      other changes in the filesystem, the same reservation requirement
      jumps from 2956 to 3052 blocks over a umount/mount cycle.
      
      To address this divergence, update the RMAPBT reservation to account
      blocks used for the rmapbt only rather than all blocks filled into
      the agfl. This patch makes several high-level changes toward that
      end:
      
      1.) Reintroduce an AGFL reservation type to serve as an accounting
          no-op for blocks allocated to (or freed from) the AGFL.
      2.) Invoke RMAPBT usage accounting from the actual rmapbt block
          allocation path rather than the AGFL allocation path.
      
      The first change is required because agfl blocks are considered free
      blocks throughout their lifetime. The perag reservation subsystem is
      invoked unconditionally by the allocation subsystem, so we need a
      way to tell the perag subsystem (via the allocation subsystem) to
      not make any accounting changes for blocks filled into the AGFL.
      
      The second change causes the in-core RMAPBT reservation usage
      accounting to remain consistent with the on-disk state at all times
      and eliminates the risk of leaving the rmapbt reservation
      underfilled.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
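The two high-level changes in the commit above can be illustrated with a toy accounting model: an AGFL reservation type that acts as a no-op, and RMAPBT accounting that changes only when a block is actually allocated to or freed from the rmapbt. Everything here is a simplified assumption; `demo_resv`, the `DEMO_RESV_*` types and the capping rules are hypothetical and do not reproduce the kernel's perag reservation code.

```c
#include <assert.h>
#include <stdint.h>

enum demo_resv_type {
    DEMO_RESV_NONE,    /* ordinary allocation, untracked in this model */
    DEMO_RESV_AGFL,    /* blocks filled into or drained from the AGFL */
    DEMO_RESV_RMAPBT   /* blocks actually used by the rmapbt */
};

/* Hypothetical rmapbt reservation counters. */
struct demo_resv {
    uint32_t asked;     /* blocks originally requested */
    uint32_t reserved;  /* blocks still held in reserve */
};

static uint32_t demo_min(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Called for every allocation; only RMAPBT consumes the reservation. */
static void demo_resv_alloc_extent(struct demo_resv *resv,
                                   enum demo_resv_type type, uint32_t len)
{
    if (type != DEMO_RESV_RMAPBT)
        return;  /* AGFL blocks stay "free": accounting no-op */
    resv->reserved -= demo_min(len, resv->reserved);
}

/* Called for every free; only RMAPBT refills the reservation. */
static void demo_resv_free_extent(struct demo_resv *resv,
                                  enum demo_resv_type type, uint32_t len)
{
    if (type != DEMO_RESV_RMAPBT)
        return;
    resv->reserved += demo_min(len, resv->asked - resv->reserved);
}

/* Usage example: AGFL refills no longer drain the rmapbt reservation. */
static void demo_resv_example(void)
{
    struct demo_resv resv = { 3194, 3194 };

    demo_resv_alloc_extent(&resv, DEMO_RESV_AGFL, 16);
    assert(resv.reserved == 3194);
    demo_resv_alloc_extent(&resv, DEMO_RESV_RMAPBT, 1);
    assert(resv.reserved == 3193);
    demo_resv_free_extent(&resv, DEMO_RESV_AGFL, 4);
    assert(resv.reserved == 3193);
    demo_resv_free_extent(&resv, DEMO_RESV_RMAPBT, 1);
    assert(resv.reserved == 3194);
}
```

Because the allocator calls the reservation hooks unconditionally, the no-op type is what allows AGFL refills to pass through without touching the counters.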
    • xfs: rename agfl perag res type to rmapbt · 21592863
      Brian Foster committed
      The AGFL perag reservation type accounts all allocations that feed
      into (or are released from) the allocation group free list (agfl).
      The purpose of the reservation is to support worst case conditions
      for the reverse mapping btree (rmapbt). As such, the agfl
      reservation usage accounting only considers rmapbt usage when the
      in-core counters are initialized at mount time.
      
      This implementation inconsistency leads to divergence of the in-core
      and on-disk usage accounting over time. In preparation to resolve
      this inconsistency and adjust the AGFL reservation into an rmapbt
      specific reservation, rename the AGFL reservation type and
      associated accounting fields to something more rmapbt-specific. Also
      fix up a couple tracepoints that incorrectly use the AGFL
      reservation type to pass the agfl state of the associated extent
      where the raw reservation type is expected.
      
      Note that this patch does not change perag reservation behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: convert XFS_AGFL_SIZE to a helper function · a78ee256
      Dave Chinner committed
      The AGFL size calculation is about to get more complex, so let's turn
      the macro into a function first and remove the macro.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [darrick: forward port to newer kernel, simplify the helper]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
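As a rough illustration of turning a size macro into a helper function, the sketch below computes the number of AGFL slots from the sector size and header size. The `demo_mount` structure, its field names and the header sizes used in the example (36 and 40 bytes) are assumptions for illustration only; the actual helper in the kernel may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the mount-time geometry. */
struct demo_mount {
    uint32_t sectsize;       /* sector size in bytes */
    bool     has_crc;        /* v5 filesystem: the AGFL carries a header */
    uint32_t agfl_hdr_size;  /* header size in bytes (assumed parameter) */
};

/*
 * Function replacement for a size macro: a function has typed
 * arguments and one obvious place to grow extra logic later.
 */
static inline uint32_t demo_agfl_size(const struct demo_mount *mp)
{
    uint32_t size = mp->sectsize;

    if (mp->has_crc)
        size -= mp->agfl_hdr_size;

    /* Each slot holds one 32-bit block number. */
    return size / (uint32_t)sizeof(uint32_t);
}

/* Usage example: four bytes of header padding cost exactly one slot. */
static void demo_agfl_size_example(void)
{
    struct demo_mount packed   = { 512, true, 36 };
    struct demo_mount unpacked = { 512, true, 40 };
    struct demo_mount no_crc   = { 512, false, 36 };

    assert(demo_agfl_size(&packed) == 119);
    assert(demo_agfl_size(&unpacked) == 118);
    assert(demo_agfl_size(&no_crc) == 128);
}
```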
    • xfs: convert a few more directory asserts to corruption · 3f883f5b
      Darrick J. Wong committed
      Yet another round of playing whack-a-mole with directory code that
      asserts on corrupt on-disk metadata when it really should be returning
      -EFSCORRUPTED instead of ASSERTing.  Found by an xfs/391 crash while
      lastbit fuzzing ltail.bestcount.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • Cleanup old XFS_BTREE_* traces · e157ebdc
      Carlos Maiolino committed
      Remove unused legacy btree traces from the IRIX era.
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 01 Feb, 2018 · 1 commit
  4. 29 Jan, 2018 · 7 commits
  5. 18 Jan, 2018 · 10 commits
  6. 17 Jan, 2018 · 1 commit
    • xfs: cancel tx on xfs_defer_finish() error during xattr set/remove · c4685628
      Brian Foster committed
      Chris Dunlop reports a problem where an xattr operation fails,
      reports the following error to syslog and hangs during unmount:
      
       ================================================
       [ BUG: lock held when returning to user space! ]
       ...
       ------------------------------------------------
       <PID> is leaving the kernel with locks still held!
       1 lock held by <PID>:
        #0:  (sb_internal){......}, at: [<ffffffffa07692a3>] xfs_trans_alloc+0xe3/0x130 [xfs]
      
      The failure/shutdown occurs during deferred ops processing which
      leads to an error return from xfs_defer_finish() via
      xfs_attr_leaf_addname(). While the root cause of the failure is
      unknown corruption, the cause of the subsequent BUG above and
      unmount hang is failure to cancel the transaction before returning
      to userspace.
      
      The transaction is not cancelled because the out_defer_cancel error
      handling paths in the xfs_attr_[leaf|node]_[add|remove]name()
      functions clear args.trans without releasing the transaction. The
      callers therefore lose the reference to the transaction and fail to
      cancel it.
      
      Since xfs_attr_[set|remove]() always cancel args.trans when != NULL
      and xfs_defer_finish()->...->xfs_trans_roll() should always return
      with a valid transaction, update the leaf/node xattr functions to
      not reset args.trans in the error path responsible for cancelling
      deferred ops.
      Reported-by: Chris Dunlop <chris@onthe.net.au>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
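The ownership problem described in the commit above can be shown with a small model: if the callee clears the transaction pointer on error, the caller can no longer cancel it. The sketch contrasts that behaviour with the fixed pattern. All names (`demo_trans`, `demo_args`, `demo_attr_set`, `DEMO_EIO`) are hypothetical, and the functions only mimic the control flow, not the kernel's xattr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_EIO 5  /* stand-in error code for a failed deferred op */

struct demo_trans {
    bool cancelled;  /* set once the transaction has been cancelled */
};

struct demo_args {
    struct demo_trans *trans;  /* transaction the operation runs under */
};

/* Old pattern: the error path forgets the transaction without releasing it. */
static int demo_addname_buggy(struct demo_args *args)
{
    args->trans = NULL;  /* the caller loses its only reference */
    return -DEMO_EIO;
}

/* Fixed pattern: the error path leaves args->trans alone. */
static int demo_addname_fixed(struct demo_args *args)
{
    (void)args;
    return -DEMO_EIO;
}

/* Caller: always cancels the transaction if it still has a reference. */
static int demo_attr_set(struct demo_trans *tp,
                         int (*addname)(struct demo_args *))
{
    struct demo_args args = { .trans = tp };
    int error = addname(&args);

    if (error && args.trans)
        args.trans->cancelled = true;
    return error;
}

/* Usage example: only the fixed callee lets the caller clean up. */
static void demo_attr_example(void)
{
    struct demo_trans leaked = { false };
    struct demo_trans cleaned = { false };

    assert(demo_attr_set(&leaked, demo_addname_buggy) == -DEMO_EIO);
    assert(!leaked.cancelled);  /* still "held" on return */

    assert(demo_attr_set(&cleaned, demo_addname_fixed) == -DEMO_EIO);
    assert(cleaned.cancelled);
}
```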
  7. 13 Jan, 2018 · 4 commits
  8. 10 Jan, 2018 · 1 commit
    • xfs: harden directory integrity checks some more · 46c59736
      Darrick J. Wong committed
      If a malicious filesystem image contains a block+ format directory
      wherein the directory inode's core.mode is set such that
      S_ISDIR(core.mode) == 0, and if there are subdirectories of the
      corrupted directory, an attempt to traverse up the directory tree will
      crash the kernel in __xfs_dir3_data_check.  Running the online scrub's
      parent checks will tend to do this.
      
      The crash occurs because the directory inode's d_ops get set to
      xfs_dir[23]_nondir_ops (it's not a directory) but the parent pointer
      scrubber's indiscriminate call to xfs_readdir proceeds past the ASSERT
      if we have non-fatal asserts configured.
      
      Fix the null pointer dereference crash in __xfs_dir3_data_check by
      looking for S_ISDIR or wrong d_ops; and teach the parent scrubber
      to bail out if it is fed a non-directory "parent".
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
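The guard described in the commit above (refuse to treat an inode as a directory unless both its mode and its operations table say it is one) might look like the following in a userspace model. The `demo_` types, the two operations tables and the `DEMO_EFSCORRUPTED` value are hypothetical; the mode constants are the traditional Unix file-type bits, defined locally so the sketch does not depend on platform headers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EFSCORRUPTED 117  /* stand-in error code; the value is assumed */

/* Traditional Unix file-type bits, defined locally for this sketch. */
#define DEMO_S_IFMT  0170000u
#define DEMO_S_IFDIR 0040000u
#define DEMO_S_IFREG 0100000u

/* Hypothetical operations tables for directories and non-directories. */
struct demo_dir_ops {
    const char *name;
};

static const struct demo_dir_ops demo_ops_dir = { "dir" };
static const struct demo_dir_ops demo_ops_nondir = { "nondir" };

struct demo_inode {
    uint16_t mode;                     /* on-disk mode, may be fuzzed */
    const struct demo_dir_ops *d_ops;  /* chosen from the mode at load */
};

static int demo_is_dir(uint16_t mode)
{
    return (mode & DEMO_S_IFMT) == DEMO_S_IFDIR;
}

/* Bail out early if the supposed parent is not really a directory. */
static int demo_check_parent(const struct demo_inode *dp)
{
    if (!demo_is_dir(dp->mode))
        return -DEMO_EFSCORRUPTED;
    if (dp->d_ops != &demo_ops_dir)
        return -DEMO_EFSCORRUPTED;
    return 0;
}

/* Usage example: a fuzzed mode or mismatched ops table is rejected. */
static void demo_parent_example(void)
{
    struct demo_inode good = { DEMO_S_IFDIR | 0755, &demo_ops_dir };
    struct demo_inode fuzzed = { DEMO_S_IFREG | 0644, &demo_ops_nondir };
    struct demo_inode mixed = { DEMO_S_IFDIR | 0755, &demo_ops_nondir };

    assert(demo_check_parent(&good) == 0);
    assert(demo_check_parent(&fuzzed) == -DEMO_EFSCORRUPTED);
    assert(demo_check_parent(&mixed) == -DEMO_EFSCORRUPTED);
}
```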
  9. 09 Jan, 2018 · 9 commits