1. 26 Mar, 2021 12 commits
  2. 25 Mar, 2021 1 commit
  3. 15 Mar, 2021 2 commits
  4. 10 Mar, 2021 1 commit
  5. 27 Feb, 2021 1 commit
  6. 26 Feb, 2021 2 commits
    • xfs: use current->journal_info for detecting transaction recursion · 756b1c34
      Dave Chinner committed
      The iomap code uses PF_MEMALLOC_NOFS to detect transaction recursion
      in XFS, which is simply wrong. Remove that check from the iomap code
      and replace it with XFS-specific internal checks based on
      current->journal_info.
      
      [djwong: This change also realigns the lifetime of NOFS flag changes to
      match the incore transaction, instead of the inconsistent scheme we have
      now.]
      
      Fixes: 9070733b ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
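The idea of tracking the active transaction through a per-task pointer can be sketched in userspace as follows. This is only an illustrative model, not the kernel code: `task_journal_info` stands in for `current->journal_info`, and `struct model_trans` and the `model_*` helpers are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-task current->journal_info. */
static _Thread_local void *task_journal_info = NULL;

/* Hypothetical transaction handle; only its address matters here. */
struct model_trans {
    int id;
};

/* True when the calling thread already owns a transaction context. */
static bool model_in_transaction(void)
{
    return task_journal_info != NULL;
}

/* Record tp as the thread's active transaction; refuse to nest. */
static bool model_trans_set_context(struct model_trans *tp)
{
    if (model_in_transaction())
        return false;           /* recursion detected */
    task_journal_info = tp;
    return true;
}

/* Clear the marker; only the transaction that set it may clear it. */
static bool model_trans_clear_context(struct model_trans *tp)
{
    if (task_journal_info != tp)
        return false;
    task_journal_info = NULL;
    return true;
}
```

In this model the recursion check depends only on state owned by the transaction code itself, so it is set and cleared together with the in-memory transaction rather than being inferred from an allocation flag.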
    • xfs: don't nest transactions when scanning for eofblocks · 9febcda6
      Darrick J. Wong committed
      Brian Foster reported a lockdep warning on xfs/167:
      
      ============================================
      WARNING: possible recursive locking detected
      5.11.0-rc4 #35 Tainted: G        W I
      --------------------------------------------
      fsstress/17733 is trying to acquire lock:
      ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_free_eofblocks+0x104/0x1d0 [xfs]
      
      but task is already holding lock:
      ffff8e0fd1d90650 (sb_internal){++++}-{0:0}, at: xfs_trans_alloc_inode+0x5f/0x160 [xfs]
      
      stack backtrace:
      CPU: 38 PID: 17733 Comm: fsstress Tainted: G        W I       5.11.0-rc4 #35
      Hardware name: Dell Inc. PowerEdge R740/01KPX8, BIOS 1.6.11 11/20/2018
      Call Trace:
       dump_stack+0x8b/0xb0
       __lock_acquire.cold+0x159/0x2ab
       lock_acquire+0x116/0x370
       xfs_trans_alloc+0x1ad/0x310 [xfs]
       xfs_free_eofblocks+0x104/0x1d0 [xfs]
       xfs_blockgc_scan_inode+0x24/0x60 [xfs]
       xfs_inode_walk_ag+0x202/0x4b0 [xfs]
       xfs_inode_walk+0x66/0xc0 [xfs]
       xfs_trans_alloc+0x160/0x310 [xfs]
       xfs_trans_alloc_inode+0x5f/0x160 [xfs]
       xfs_alloc_file_space+0x105/0x300 [xfs]
       xfs_file_fallocate+0x270/0x460 [xfs]
       vfs_fallocate+0x14d/0x3d0
       __x64_sys_fallocate+0x3e/0x70
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The cause of this is the new code that spurs a scan to garbage collect
      speculative preallocations if we fail to reserve enough blocks while
      allocating a transaction.  While the warning itself is a fairly benign
      lockdep complaint, it does expose a potential livelock if the rwsem
      behavior ever changes with regard to nesting read locks when someone's
      waiting for a write lock.
      
      Fix this by freeing the transaction and jumping back to xfs_trans_alloc
      like this patch in the V4 submission[1].
      
      [1] https://lore.kernel.org/linux-xfs/161142798066.2171939.9311024588681972086.stgit@magnolia/
      
      Fixes: a1a7d05a ("xfs: flush speculative space allocations when we run out of space")
      Reported-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
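The "free the transaction, scan, then jump back" pattern described in this commit can be modeled in userspace roughly as below. This is a simplified sketch under stated assumptions, not the XFS implementation: `struct model_fs` and every `model_*` function are hypothetical, and the depth counters exist only to show that the scan never runs while the outer transaction is still open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical filesystem state for the model. */
struct model_fs {
    long free_blocks;     /* blocks available for reservation */
    long prealloc_blocks; /* speculative preallocations a scan can reclaim */
    int  trans_depth;     /* transactions currently open */
    int  max_depth;       /* deepest nesting ever observed */
};

static void model_trans_open(struct model_fs *fs)
{
    fs->trans_depth++;
    if (fs->trans_depth > fs->max_depth)
        fs->max_depth = fs->trans_depth;
}

static void model_trans_cancel(struct model_fs *fs)
{
    fs->trans_depth--;
}

/* The scan opens its own short transaction to free preallocations. */
static void model_blockgc_scan(struct model_fs *fs)
{
    model_trans_open(fs);
    fs->free_blocks += fs->prealloc_blocks;
    fs->prealloc_blocks = 0;
    model_trans_cancel(fs);
}

/* Returns 0 with the transaction left open, or -ENOSPC with none open. */
static int model_trans_alloc(struct model_fs *fs, long blocks)
{
    bool want_retry = true;

retry:
    model_trans_open(fs);
    if (fs->free_blocks >= blocks) {
        fs->free_blocks -= blocks;
        return 0;
    }
    /* Reservation failed: drop the transaction BEFORE scanning. */
    model_trans_cancel(fs);
    if (want_retry) {
        want_retry = false;
        model_blockgc_scan(fs);
        goto retry;
    }
    return -ENOSPC;
}
```

Because the failed transaction is cancelled before the scan starts, `max_depth` never exceeds 1 in this model, which mirrors the goal of avoiding a nested acquisition of the same lock.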
  7. 25 Feb, 2021 2 commits
    • xfs: don't reuse busy extents on extent trim · 06058bc4
      Brian Foster committed
      Freed extents are marked busy from the point the freeing transaction
      commits until the associated CIL context is checkpointed to the log.
      This prevents reuse and overwrite of recently freed blocks before
      the changes are committed to disk, which can lead to corruption
      after a crash. The exception to this rule is that metadata
      allocation is allowed to reuse busy extents because metadata changes
      are also logged.
      
      As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
      has allowed modification or complete invalidation of outstanding
      busy extents for metadata allocations. This implementation assumes
      that use of the associated extent is imminent, which is not always
      the case. For example, the trimmed extent might not satisfy the
      minimum length of the allocation request, or the allocation
      algorithm might be involved in a search for the optimal result based
      on locality.
      
      generic/019 reproduces a corruption caused by this scenario. First,
      a metadata block (usually a bmbt or symlink block) is freed from an
      inode. A subsequent bmbt split on an unrelated inode attempts a near
      mode allocation request that invalidates the busy block during the
      search, but does not ultimately allocate it. Due to the busy state
      invalidation, the block is no longer considered busy to subsequent
      allocation. A direct I/O write request immediately allocates the
      block and writes to it. Finally, the filesystem crashes while in a
      state where the initial metadata block free had not committed to the
      on-disk log. After recovery, the original metadata block is in its
      original location as expected, but has been corrupted by the
      aforementioned dio.
      
      This demonstrates that it is fundamentally unsafe to modify busy
      extent state for extents that are not guaranteed to be allocated.
      This applies to pretty much all of the code paths that currently
      trim busy extents for one reason or another. Therefore to address
      this problem, drop the reuse mechanism from the busy extent trim
      path. This code already knows how to return partial non-busy ranges
      of the targeted free extent and higher level code tracks the busy
      state of the allocation attempt. If a block allocation fails where
      one or more candidate extents is busy, we force the log and retry
      the allocation.
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
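A trim that returns a non-busy sub-range without touching busy state might look like the following userspace sketch. It is a deliberately simplified model, not the XFS trim routine: `struct model_extent` and `model_busy_trim` are hypothetical names, the busy list is a plain array, and the "keep the larger side" policy is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent: blocks [start, start + len). */
struct model_extent {
    unsigned long start;
    unsigned long len;
};

/*
 * Shrink *ext to a sub-range that overlaps no busy extent.
 * The busy list is const: trimming never clears or edits busy state.
 * Returns true if any busy extent overlapped the candidate range.
 */
static bool model_busy_trim(const struct model_extent *busy, size_t nbusy,
                            struct model_extent *ext)
{
    bool saw_busy = false;

    for (size_t i = 0; i < nbusy && ext->len > 0; i++) {
        unsigned long fbno = ext->start;
        unsigned long fend = ext->start + ext->len;
        unsigned long bbno = busy[i].start;
        unsigned long bend = busy[i].start + busy[i].len;
        unsigned long left, right;

        if (bend <= fbno || bbno >= fend)
            continue;               /* no overlap with this busy extent */
        saw_busy = true;

        left = bbno > fbno ? bbno - fbno : 0;   /* free blocks before it */
        right = bend < fend ? fend - bend : 0;  /* free blocks after it */
        if (left >= right) {
            ext->len = left;        /* keep the piece before the busy range */
        } else {
            ext->start = bend;      /* keep the piece after the busy range */
            ext->len = right;
        }
    }
    return saw_busy;
}
```

A caller in this model would use the return value the way the commit describes for the higher-level code: if the allocation ultimately fails and a candidate was busy, force the log and retry.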
    • xfs: restore speculative_cow_prealloc_lifetime sysctl · 89e0eb8c
      Darrick J. Wong committed
      In commit 9669f51d I tried to get rid of the undocumented cow gc
      lifetime knob.  The knob's function was never documented and it now
      doesn't really have a function since eof and cow gc have been
      consolidated.
      
      Regrettably, xfs/231 relies on it and regresses on for-next.  I did not
      succeed at getting far enough through fstests patch review for the fixup
      to land in time.
      
      Restore the sysctl knob, document what it did (does?), put it on the
      deprecation schedule, and rip out a redundant function.
      
      Fixes: 9669f51d ("xfs: consolidate the eofblocks and cowblocks workers")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  8. 12 Feb, 2021 1 commit
    • xfs: consider shutdown in bmapbt cursor delete assert · 1cd738b1
      Brian Foster committed
      The assert in xfs_btree_del_cursor() checks that the bmapbt block
      allocation field has been handled correctly before the cursor is
      freed. This field is used for accurate calculation of indirect block
      reservation requirements (for delayed allocations), for example.
      generic/019 reproduces a scenario where this assert fails because
      the filesystem has shut down while in the middle of a bmbt record
      insertion. This occurs after a bmbt block has been allocated via the
      cursor but before the higher level bmap function (i.e.
      xfs_bmap_add_extent_hole_real()) completes and resets the field.
      
      Update the assert to accommodate the transient state if the
      filesystem has shut down. While here, clean up the indentation and
      comments in the function.
      
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
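The relaxed condition can be expressed as a small predicate. This is a sketch only: `struct model_cursor`, its fields, and `model_cursor_delete_ok` are hypothetical names, and the real assert in `xfs_btree_del_cursor()` may check additional conditions not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cursor state relevant to the delete-time check. */
struct model_cursor {
    int  allocated;    /* bmbt blocks allocated via the cursor, not yet reset */
    bool fs_shutdown;  /* filesystem has been shut down */
};

/*
 * Condition the delete-time assert checks in this model: the allocation
 * field must have been reset, unless a shutdown interrupted the operation
 * and left the field in its transient, non-zero state.
 */
static bool model_cursor_delete_ok(const struct model_cursor *cur)
{
    return cur->allocated == 0 || cur->fs_shutdown;
}
```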
  9. 11 Feb, 2021 2 commits
  10. 05 Feb, 2021 1 commit
    • xfs: fix incorrect root dquot corruption error when switching group/project quota types · 45068063
      Darrick J. Wong committed
      While writing up a regression test for broken behavior when a chprojid
      request fails, I noticed that we were logging corruption notices about
      the root dquot of the group/project quota file at mount time when
      testing V4 filesystems.
      
      In commit afeda600, I was trying to improve ondisk dquot validation
      by making sure that when we load an ondisk dquot into memory on behalf
      of an incore dquot, the dquot id and type match.  Unfortunately, I
      forgot that V4 filesystems only have two quota files, and can switch
      that file between group and project quota types at mount time.  When we
      perform that switch, we'll try to load the default quota limits from the
      root dquot prior to running quotacheck and log a corruption error when
      the types don't match.
      
      This is inconsequential because quotacheck will reset the second quota
      file as part of doing the switch, but we shouldn't leave scary messages
      in the kernel log.
      
      Fixes: afeda600 ("xfs: validate ondisk/incore dquot flags")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
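One way to picture a type check that tolerates the V4 group/project switch is the predicate below. It is an assumption-laden sketch rather than the patch itself: the enum, the `model_dquot_type_ok` name, and the exact policy (exact match everywhere except the root group/project dquot on V4) are illustrative choices based on the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical quota types for the model. */
enum model_dqtype { MODEL_DQ_USER, MODEL_DQ_GROUP, MODEL_DQ_PROJ };

/*
 * Decide whether an ondisk dquot type is acceptable for an incore dquot.
 * is_v5: filesystem has separate group and project quota files.
 * id:    dquot id; 0 is the root dquot that holds the default limits.
 */
static bool model_dquot_type_ok(bool is_v5, unsigned int id,
                                enum model_dqtype incore,
                                enum model_dqtype ondisk)
{
    /* Exact match required on V5, for user dquots, and for non-root ids. */
    if (is_v5 || incore == MODEL_DQ_USER || id != 0)
        return incore == ondisk;

    /* V4 root group/project dquot: the shared file may carry either type. */
    return ondisk == MODEL_DQ_GROUP || ondisk == MODEL_DQ_PROJ;
}
```

Under this policy, loading the root dquot right after a group-to-project switch on V4 no longer counts as corruption, while every other mismatch is still rejected.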
  11. 04 Feb, 2021 15 commits