1. 09 3月, 2022 1 次提交
    • D
      xfs: make forced shutdown processing atomic · da90a037
      Dave Chinner 提交于
      mainline-inclusion
      from mainline-v5.14-rc4
      commit b36d4651
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b36d4651e1650082d27fa477318183c4a7210e30
      
      -------------------------------------------------
      
      The running of a forced shutdown is a bit of a mess. It does racy
      checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
      does more racy checks in xfs_log_force_unmount() before finally
      setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
      log->icloglock.
      
      Move the checking and setting of XFS_MOUNT_SHUTDOWN into
      xfs_do_force_shutdown() so we only process a shutdown once and once
      only. Serialise this with the mp->m_sb_lock spinlock so that the
      state change is atomic and won't race. Move all the mount specific
      shutdown state changes from xfs_log_force_unmount() to
      xfs_do_force_shutdown() so they are done atomically with setting
      XFS_MOUNT_SHUTDOWN.
      
      Then get rid of the racy xlog_is_shutdown() check from
      xlog_force_shutdown(), and gate the log shutdown on the
      test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
      means that the log is shutdown once and once only, and code that
      needs to prevent races with shutdown can do so by holding the
      icloglock and checking the return value of xlog_is_shutdown().
      
      This results in a predictable shutdown execution process - we set the
      shutdown flags once and process the shutdown once rather than the
      current "as many concurrent shutdowns as can race to the flag
      setting" situation we have now.
      
      Also, now that shutdown is atomic, alway emit a stack trace when the
      error level for the filesystem is high enough. This means that we
      always get a stack trace when trying to diagnose the cause of
      shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE
      cases.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NLihong Kou <koulihong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      da90a037
  2. 27 12月, 2021 1 次提交
    • D
      xfs: remove obsolete AGF counter debugging · dc0c7ae0
      Darrick J. Wong 提交于
      mainline-inclusion
      from mainline-v5.12-rc4
      commit 1aec7c3d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1aec7c3d05670b92b7339b19999009a93808efb9
      
      -------------------------------------------------
      
      In commit f8f2835a we changed the behavior of XFS to use EFIs to
      remove blocks from an overfilled AGFL because there were complaints
      about transaction overruns that stemmed from trying to free multiple
      blocks in a single transaction.
      
      Unfortunately, that commit missed a subtlety in the debug-mode
      transaction accounting when a realtime volume is attached.  If a
      realtime file undergoes a data fork mapping change such that realtime
      extents are allocated (or freed) in the same transaction that a data
      device block is also allocated (or freed), we can trip a debugging
      assertion.  This can happen (for example) if a realtime extent is
      allocated and it is necessary to reshape the bmbt to hold the new
      mapping.
      
      When we go to allocate a bmbt block from an AG, the first thing the data
      device block allocator does is ensure that the freelist is the proper
      length.  If the freelist is too long, it will trim the freelist to the
      proper length.
      
      In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
      record the decrement in the AG free list count.  Prior to f8f28 we would
      put the free block back in the free space btrees in the same
      transaction, which calls xfs_trans_agblocks_delta() to record the
      increment in the AG free block count.  Since AGFL blocks are included in
      the global free block count (fdblocks), there is no corresponding
      fdblocks update, so the AGFL free satisfies the following condition in
      xfs_trans_apply_sb_deltas:
      
      	/*
      	 * Check that superblock mods match the mods made to AGF counters.
      	 */
      	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
      	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
      		tp->t_ag_btree_delta));
      
      The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
      the number blocks that were allocated.
      
      After commit f8f28 we defer the block freeing to the next chained
      transaction, which means that the calls to xfs_trans_agflist_delta and
      xfs_trans_agblocks_delta occur in separate transactions.  The (first)
      transaction that shortens the free list trips on the comparison, which
      has now become:
      
      (X + 0) == ((X) + -1 + 0)
      
      because we haven't freed the AGFL block yet; we've only logged an
      intention to free it.  When the second transaction (the deferred free)
      commits, it will evaluate the expression as:
      
      (0 + 0) == (1 + 0 + 0)
      
      and trip over that in turn.
      
      At this point, the astute reader may note that the two commits tagged by
      this patch have been in the kernel for a long time but haven't generated
      any bug reports.  How is it that the author became aware of this bug?
      
      This originally surfaced as an intermittent failure when I was testing
      realtime rmap, but a different bug report by Zorro Lang reveals the same
      assertion occuring on !lazysbcount filesystems.
      
      The common factor to both reports (and why this problem wasn't
      previously reported) becomes apparent if we consider when
      xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():
      
      	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
      		xfs_trans_apply_sb_deltas(tp);
      
      With a modern lazysbcount filesystem, transactions update only the
      percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
      xfs_trans_apply_sb_deltas is rarely called.
      
      However, updates to the count of free realtime extents are not part of
      lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
      removing data fork mappings to realtime files; similarly,
      XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.
      
      Dave mentioned in response to an earlier version of this patch:
      
      "IIUC, what you are saying is that this debug code is simply not
      exercised in normal testing and hasn't been for the past decade?  And it
      still won't be exercised on anything other than realtime device testing?
      
      "...it was debugging code from 1994 that was largely turned into dead
      code when lazysbcounters were introduced in 2007. Hence I'm not sure it
      holds any value anymore."
      
      This debugging code isn't especially helpful - you can modify the
      flcount on one AG and the freeblks of another AG, and it won't trigger.
      Add the fact that nobody noticed for a decade, and let's just get rid of
      it (and start testing realtime :P).
      
      This bug was found by running generic/051 on either a V4 filesystem
      lacking lazysbcount; or a V5 filesystem with a realtime volume.
      
      Cc: bfoster@redhat.com, zlang@redhat.com
      Fixes: f8f2835a ("xfs: defer agfl block frees when dfops is available")
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: NLihong Kou <koulihong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      dc0c7ae0
  3. 07 5月, 2020 1 次提交
  4. 29 6月, 2019 1 次提交
  5. 12 6月, 2019 1 次提交
  6. 02 5月, 2019 1 次提交
  7. 15 2月, 2019 1 次提交
  8. 30 12月, 2018 1 次提交
  9. 13 12月, 2018 1 次提交
  10. 18 10月, 2018 1 次提交
  11. 30 7月, 2018 1 次提交
  12. 25 6月, 2018 1 次提交
  13. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  14. 16 5月, 2018 9 次提交
  15. 12 3月, 2018 1 次提交
  16. 13 1月, 2018 2 次提交
  17. 09 1月, 2018 1 次提交
  18. 22 12月, 2017 1 次提交
    • D
      xfs: always honor OWN_UNKNOWN rmap removal requests · 33df3a9c
      Darrick J. Wong 提交于
      Calling xfs_rmap_free with an unknown owner is supposed to remove any
      rmaps covering that range regardless of owner.  This is used by the EFI
      recovery code to say "we're freeing this, it mustn't be owned by
      anything anymore", but for whatever reason xfs_free_ag_extent filters
      them out.
      
      Therefore, remove the filter and make xfs_rmap_unmap actually treat it
      as a wildcard owner -- free anything that's already there, and if
      there's no owner at all then that's fine too.
      
      There are two existing callers of bmap_add_free that take care the rmap
      deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
      cleanup; convert these to use OWN_NULL (via helpers), and now we really
      require that an RUI (if any) gets added to the defer ops before any EFI.
      
      Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
      growfs will have to consult directly with the rmap to ensure that there
      aren't any rmaps in the grown region.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      33df3a9c
  19. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  20. 31 1月, 2017 2 次提交
  21. 04 1月, 2017 1 次提交
  22. 06 10月, 2016 1 次提交
    • D
      xfs: preallocate blocks for worst-case btree expansion · 84d69619
      Darrick J. Wong 提交于
      To gracefully handle the situation where a CoW operation turns a
      single refcount extent into a lot of tiny ones and then run out of
      space when a tree split has to happen, use the per-AG reserved block
      pool to pre-allocate all the space we'll ever need for a maximal
      btree.  For a 4K block size, this only costs an overhead of 0.3% of
      available disk space.
      
      When reflink is enabled, we have an unfortunate problem with rmap --
      since we can share a block billions of times, this means that the
      reverse mapping btree can expand basically infinitely.  When an AG is
      so full that there are no free blocks with which to expand the rmapbt,
      the filesystem will shut down hard.
      
      This is rather annoying to the user, so use the AG reservation code to
      reserve a "reasonable" amount of space for rmap.  We'll prevent
      reflinks and CoW operations if we think we're getting close to
      exhausting an AG's free space rather than shutting down, but this
      permanent reservation should be enough for "most" users.  Hopefully.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [hch@lst.de: ensure that we invalidate the freed btree buffer]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      84d69619
  23. 05 10月, 2016 1 次提交
  24. 04 10月, 2016 1 次提交
  25. 19 9月, 2016 1 次提交
    • D
      xfs: set up per-AG free space reservations · 3fd129b6
      Darrick J. Wong 提交于
      One unfortunate quirk of the reference count and reverse mapping
      btrees -- they can expand in size when blocks are written to *other*
      allocation groups if, say, one large extent becomes a lot of tiny
      extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is contrast to the strategy that the rough
      draft used (actually removing extents from the freespace btrees).
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      3fd129b6
  26. 17 8月, 2016 1 次提交
  27. 03 8月, 2016 4 次提交
    • D
      xfs: add rmap btree geometry feature flag · 5d650e90
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      So xfs_info and other userspace utilities know the filesystem is
      using this feature.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5d650e90
    • D
      xfs: rmap btree requires more reserved free space · 52548852
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      The rmap btree is allocated from the AGFL, which means we have to
      ensure ENOSPC is reported to userspace before we run out of free
      space in each AG. The last allocation in an AG can cause a full
      height rmap btree split, and that means we have to reserve at least
      this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
      Update the various space calculation functions to handle this.
      
      Also, because the macros are now executing conditional code and are
      called quite frequently, convert them to functions that initialise
      variables in the struct xfs_mount, use the new variables everywhere
      and document the calculations better.
      
      [darrick.wong@oracle.com: don't reserve blocks if !rmap]
      [dchinner@redhat.com: update m_ag_max_usable after growfs]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      52548852
    • D
      xfs: add rmap btree growfs support · e70d829f
      Darrick J. Wong 提交于
      Originally-From: Dave Chinner <dchinner@redhat.com>
      
      Now we can read and write rmap btree blocks, we can add support to
      the growfs code to initialise new rmap btree blocks.
      
      [darrick.wong@oracle.com: fill out the rmap offset fields]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e70d829f
    • D
      xfs: add owner field to extent allocation and freeing · 340785cc
      Darrick J. Wong 提交于
      For the rmap btree to work, we have to feed the extent owner
      information to the the allocation and freeing functions. This
      information is what will end up in the rmap btree that tracks
      allocated extents. While we technically don't need the owner
      information when freeing extents, passing it allows us to validate
      that the extent we are removing from the rmap btree actually
      belonged to the owner we expected it to belong to.
      
      We also define a special set of owner values for internal metadata
      that would otherwise have no owner. This allows us to tell the
      difference between metadata owned by different per-ag btrees, as
      well as static fs metadata (e.g. AG headers) and internal journal
      blocks.
      
      There are also a couple of special cases we need to take care of -
      during EFI recovery, we don't actually know who the original owner
      was, so we need to pass a wildcard to indicate that we aren't
      checking the owner for validity. We also need special handling in
      growfs, as we "free" the space in the last AG when extending it, but
      because it's new space it has no actual owner...
      
      While touching the xfs_bmap_add_free() function, re-order the
      parameters to put the struct xfs_mount first.
      
      Extend the owner field to include both the owner type and some sort
      of index within the owner.  The index field will be used to support
      reverse mappings when reflink is enabled.
      
      When we're freeing extents from an EFI, we don't have the owner
      information available (rmap updates have their own redo items).
      xfs_free_extent therefore doesn't need to do an rmap update. Make
      sure that the log replay code signals this correctly.
      
      This is based upon a patch originally from Dave Chinner. It has been
      extended to add more owner information with the intent of helping
      recovery operations when things go wrong (e.g. offset of user data
      block in a file).
      
      [dchinner: de-shout the xfs_rmap_*_owner helpers]
      [darrick: minor style fixes suggested by Christoph Hellwig]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      340785cc