1. 04 May, 2022 1 commit
    • A
      xfs: Set up infrastructure for log attribute replay · fd920008
      Committed by Allison Henderson
      Currently attributes are modified directly across one or more
      transactions. But they are not logged or replayed in the event of an
      error. The goal of log attr replay is to enable logging and replaying
      of attribute operations using the existing delayed operations
      infrastructure.  This will later enable the attributes to become part of
      larger multi part operations that also must first be recorded to the
      log.  This is mostly of interest in the scheme of parent pointers which
      would need to maintain an attribute containing parent inode information
      any time an inode is moved, created, or removed.  Parent pointers would
      then be of interest to any feature that would need to quickly derive an
      inode path from the mount point. Online scrub, nfs lookups and fs grow
      or shrink operations are all features that could take advantage of this.
      
      This patch adds two new log item types for setting or removing
      attributes as deferred operations.  The xfs_attri_log_item will log an
      intent to set or remove an attribute.  The corresponding
      xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is
      freed once the transaction is done.  Both log items use a generic
      xfs_attr_log_format structure that contains the attribute name, value,
      flags, inode, and an op_flag that indicates whether the operation is a set
      or remove.
      
      [dchinner: added extra little bits needed for intent whiteouts]
      Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
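The intent/done pairing described in this commit can be illustrated with a small toy model. This is a minimal sketch, not the kernel implementation: `AttrIntent`, `AttrDone`, `unfinished_intents`, and `replay` are hypothetical names, and the "log" is just a Python list. It assumes that an intent without a matching done item is the work that recovery has to replay.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the log items named in the commit message.
# The field names are illustrative and do not mirror the kernel structures.
@dataclass(frozen=True)
class AttrIntent:          # plays the role of xfs_attri_log_item
    intent_id: int
    inode: int
    op: str                # "set" or "remove"
    name: str
    value: bytes = b""


@dataclass(frozen=True)
class AttrDone:            # plays the role of xfs_attrd_log_item
    intent_id: int         # refers back to the intent it completes


def unfinished_intents(log):
    """Return intents that have no matching done item, in log order."""
    done = {item.intent_id for item in log if isinstance(item, AttrDone)}
    return [item for item in log
            if isinstance(item, AttrIntent) and item.intent_id not in done]


def replay(log, attrs):
    """Re-apply every unfinished intent to attrs, a dict keyed by (inode, name)."""
    for intent in unfinished_intents(log):
        key = (intent.inode, intent.name)
        if intent.op == "set":
            attrs[key] = intent.value
        else:
            attrs.pop(key, None)
    return attrs
```

In this model, an intent that already has a done item is skipped during replay, while a set or remove that was logged but never completed is applied again.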
  2. 23 Oct, 2021 1 commit
  3. 15 Oct, 2021 2 commits
  4. 07 Oct, 2020 4 commits
    • D
      xfs: fix an incore inode UAF in xfs_bui_recover · ff4ab5e0
      Committed by Darrick J. Wong
      In xfs_bui_item_recover, there exists a use-after-free bug with regards
      to the inode that is involved in the bmap replay operation.  If the
      mapping operation does not complete, we call xfs_bmap_unmap_extent to
      create a deferred op to finish the unmapping work, and we retain a
      pointer to the incore inode.
      
      Unfortunately, the very next thing we do is commit the transaction and
      drop the inode.  If reclaim tears down the inode before we try to finish
      the defer ops, we dereference garbage and blow up.  Therefore, create a
      way to join inodes to the defer ops freezer so that we can maintain the
      xfs_inode reference until we're done with the inode.
      
      Note: This imposes the requirement that there be enough memory to keep
      every incore inode in memory throughout recovery.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
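The lifetime problem in this commit can be pictured with a reference-counted toy object. The sketch below is only an illustration under stated assumptions: `Inode` and `Capture` are hypothetical classes, not `xfs_inode` or the kernel's capture structure, and "reclaim" is modelled as the reference count reaching zero.

```python
class Inode:
    """Toy incore inode with a reference count (hypothetical, not xfs_inode)."""

    def __init__(self, ino):
        self.ino = ino
        self.refs = 1          # the reference held by the recovery function
        self.reclaimed = False

    def grab(self):
        if self.reclaimed:
            raise RuntimeError("use after free")
        self.refs += 1
        return self

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.reclaimed = True   # stands in for reclaim tearing it down


class Capture:
    """Holds pending deferred work plus references to the inodes it needs."""

    def __init__(self, pending, inodes):
        self.pending = list(pending)
        # Taking a reference here is what keeps the inode alive until finish().
        self.inodes = [ip.grab() for ip in inodes]

    def finish(self):
        results = [work() for work in self.pending]
        for ip in self.inodes:
            ip.release()
        self.inodes = []
        return results
```

When the recovery function drops its own reference after committing, the inode survives because the capture still holds one. It is released only after the deferred work has run.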
    • D
      xfs: xfs_defer_capture should absorb remaining transaction reservation · 929b92f6
      Committed by Darrick J. Wong
      When xfs_defer_capture extracts the deferred ops and transaction state
      from a transaction, it should record the transaction reservation type
      from the old transaction so that when we continue the dfops chain, we
      still use the same reservation parameters.
      
      Doing this means that the log item recovery functions get to determine
      the transaction reservation instead of abusing tr_itruncate in yet
      another part of xfs.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • D
      xfs: xfs_defer_capture should absorb remaining block reservations · 4f9a60c4
      Committed by Darrick J. Wong
      When xfs_defer_capture extracts the deferred ops and transaction state
      from a transaction, it should record the remaining block reservations so
      that when we continue the dfops chain, we can reserve the same number of
      blocks to use.  We capture the reservations for both data and realtime
      volumes.
      
      This adds the requirement that every log intent item recovery function
      must be careful to reserve enough blocks to handle both itself and all
      defer ops that it can queue.  On the other hand, this enables us to do
      away with the handwaving block estimation nonsense that was going on in
      xlog_finish_defer_ops.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
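The capture behaviour described in this commit and the preceding one (recording the reservation type and the remaining data and realtime block reservations) might look roughly like the following toy model. All names here (`Transaction`, `Capture`, `defer_capture`, `continue_chain`) are illustrative assumptions, and the reservation type is a plain string rather than a kernel reservation structure.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    resv_type: str          # illustrative label for the reservation in use
    blk_res: int            # data device blocks reserved
    rtx_res: int = 0        # realtime blocks reserved
    blk_used: int = 0
    rtx_used: int = 0
    dfops: list = field(default_factory=list)


@dataclass
class Capture:
    resv_type: str
    blk_res: int            # unused data blocks at capture time
    rtx_res: int            # unused realtime blocks at capture time
    dfops: list


def defer_capture(tp):
    """Move the deferred ops out of tp and remember what is left of its reservation."""
    cap = Capture(tp.resv_type,
                  tp.blk_res - tp.blk_used,
                  tp.rtx_res - tp.rtx_used,
                  tp.dfops)
    tp.dfops = []
    return cap


def continue_chain(cap):
    """Start a new transaction with the same reservation parameters as the old one."""
    return Transaction(cap.resv_type, cap.blk_res, cap.rtx_res, dfops=cap.dfops)
```

The point of the model is that the continuation transaction needs no separate estimate: whatever the recovery function reserved and did not use is carried forward with the chain.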
    • D
      xfs: proper replay of deferred ops queued during log recovery · e6fff81e
      Committed by Darrick J. Wong
      When we replay unfinished intent items that have been recovered from the
      log, it's possible that the replay will cause the creation of more
      deferred work items.  As outlined in commit 50995582 ("xfs: log
      recovery should replay deferred ops in order"), later work items have an
      implicit ordering dependency on earlier work items.  Therefore, recovery
      must replay the items (both recovered and created) in the same order
      that they would have been during normal operation.
      
      For log recovery, we enforce this ordering by using an empty transaction
      to collect deferred ops that get created in the process of recovering a
      log intent item to prevent them from being committed before the rest of
      the recovered intent items.  After we finish committing all the
      recovered log items, we allocate a transaction with an enormous block
      reservation, splice our huge list of created deferred ops into that
      transaction, and commit it, thereby finishing all those ops.
      
      This is /really/ hokey -- it's the one place in XFS where we allow
      nested transactions; the splicing of the defer ops list is inelegant
      and has to be done twice per recovery function; and the broken way we
      handle inode pointers and block reservations causes subtle use-after-free
      and allocator problems that will be fixed by this patch and the two
      patches after it.
      
      Therefore, replace the hokey empty transaction with a structure designed
      to capture each chain of deferred ops that are created as part of
      recovering a single unfinished log intent.  Finally, refactor the loop
      that replays those chains to do so using one transaction per chain.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
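The replacement scheme (one captured chain per recovered intent, then one transaction per chain) can be sketched as two loops. This is a simplified model with hypothetical function names: `recover_one` and `finish_one` are callbacks supplied by the caller, and a "transaction" is represented only by the list of ops finished inside it.

```python
from collections import deque


def recover_intents(recovered, recover_one):
    """Recover each logged intent and capture the deferred ops it queues.

    recover_one(intent) performs one unit of work and returns the list of
    new deferred ops it created. Each non-empty list becomes its own chain.
    """
    captures = []
    for intent in recovered:
        new_ops = recover_one(intent)
        if new_ops:
            captures.append(list(new_ops))
    return captures


def finish_captures(captures, finish_one):
    """Finish the captured chains in order, using one 'transaction' per chain."""
    transactions = []
    for chain in captures:
        pending = deque(chain)
        done = []
        while pending:
            op = pending.popleft()
            done.append(op)
            # Finishing an op may queue follow-up ops on the same chain.
            pending.extend(finish_one(op))
        transactions.append(done)
    return transactions
```

All recovered intents are processed before any captured chain is finished, which preserves the ordering requirement stated in the commit message. The ordering of follow-up ops within a chain is simplified here to plain append order.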
  5. 23 Sep, 2020 1 commit
    • D
      xfs: log new intent items created as part of finishing recovered intent items · 93293bcb
      Committed by Darrick J. Wong
      During a code inspection, I found a serious bug in the log intent item
      recovery code when an intent item cannot complete all the work and
      decides to requeue itself to get that done.  When this happens, the
      item recovery creates a new incore deferred op representing the
      remaining work and attaches it to the transaction that it allocated.  At
      the end of _item_recover, it moves the entire chain of deferred ops to
      the dummy parent_tp that xlog_recover_process_intents passed to it, but
      fails to log a new intent item for the remaining work before committing
      the transaction for the single unit of work.
      
      xlog_finish_defer_ops logs those new intent items once recovery has
      finished dealing with the intent items that it recovered, but this isn't
      sufficient.  If the log is forced to disk after a recovered log item
      decides to requeue itself and the system goes down before we call
      xlog_finish_defer_ops, the second log recovery will never see the new
      intent item and therefore has no idea that there was more work to do.
      It will finish recovery leaving the filesystem in a corrupted state.
      
      The same logic applies to /any/ deferred ops added during intent item
      recovery, not just the one handling the remaining work.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
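The failure window described in this commit can be reproduced in a toy log model. The sketch below assumes a log made of atomic commit records, each listing the intents it finishes and the new intents it logs. `commit_unit` and `pending_after_crash` are hypothetical helpers, and the `log_new_intents` flag simply switches between the buggy and the fixed behaviour.

```python
def commit_unit(log, finished_intent, new_ops, log_new_intents):
    """Append one atomic commit record for a recovered intent's unit of work."""
    record = {"done": [finished_intent], "intents": []}
    if log_new_intents:
        # Fixed behaviour: new intents reach the log in the same commit.
        record["intents"] = list(new_ops)
    log.append(record)


def pending_after_crash(log, original_intents):
    """Return the work that a second recovery pass would find in the log."""
    intents = list(original_intents)
    done = set()
    for record in log:
        intents.extend(record["intents"])
        done.update(record["done"])
    return [i for i in intents if i not in done]
```

If the system goes down right after the commit, the buggy variant leaves nothing for the second recovery to find, while the fixed variant still shows the remaining work as an unfinished intent.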
  6. 14 May, 2020 1 commit
  7. 05 May, 2020 6 commits
  8. 13 Dec, 2018 2 commits
  9. 03 Aug, 2018 10 commits
  10. 27 Jul, 2018 5 commits
  11. 12 Jul, 2018 2 commits
  12. 07 Jun, 2018 1 commit
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  13. 10 May, 2018 1 commit
    • B
      xfs: defer agfl block frees when dfops is available · f8f2835a
      Committed by Brian Foster
      The AGFL fixup code executes before every block allocation/free and
      rectifies the AGFL based on the current, dynamic allocation
      requirements of the fs. The AGFL must hold a minimum number of
      blocks to satisfy a worst case split of the free space btrees caused
      by the impending allocation operation. The AGFL is also updated to
      maintain the implicit requirement for a minimum number of free slots
      to satisfy a worst case join of the free space btrees.
      
      Since the AGFL caches individual blocks, AGFL reduction typically
      involves multiple, single block frees. We've had reports of
      transaction overrun problems during certain workloads that boil down
      to AGFL reduction freeing multiple blocks and consuming more space
      in the log than was reserved for the transaction.
      
      Since the objective of freeing AGFL blocks is to ensure that free AGFL
      slots are available for the upcoming allocation, one way to
      address this problem is to release surplus blocks from the AGFL
      immediately but defer the free of those blocks (similar to how
      file-mapped blocks are unmapped from the file in one transaction and
      freed via a deferred operation) until the transaction is rolled.
      This turns AGFL reduction into an operation with predictable log
      reservation consumption.
      
      Add the capability to defer AGFL block frees when a deferred ops
      list is available to the AGFL fixup code. Add a dfops pointer to the
      transaction to carry dfops through various contexts to the allocator
      context. Deferring AGFL frees is conditional behavior based on
      whether the transaction pointer is populated. The long term
      objective is to reuse the transaction pointer to clean up all
      unrelated callchains that pass dfops on the stack along with a
      transaction and in doing so, consistently defer AGFL blocks from the
      allocator.
      
      A bit of customization is required to handle deferred completion
      processing because AGFL blocks are accounted against a per-ag
      reservation pool and AGFL blocks are not inserted into the extent
      busy list when freed (they are inserted when used and released back
      to the AGFL). Reuse the majority of the existing deferred extent
      free infrastructure and customize it appropriately to handle AGFL
      blocks.
      
      Note that this patch only adds infrastructure. It does not change
      behavior because no callers have been updated to pass ->t_agfl_dfops
      into the allocation code.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
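The idea of releasing surplus AGFL blocks immediately while deferring the actual frees can be shown with a small model. This is a sketch under assumptions, not the allocator code: the AGFL is a plain list of block numbers, `fix_freelist` and `roll` are hypothetical names, and finishing one deferred free per rolled transaction is a simplification chosen to make the bounded per-transaction cost visible.

```python
def fix_freelist(agfl, needed, dfops=None):
    """Trim agfl (a list of block numbers) down to `needed` entries.

    Returns the number of blocks freed inside the current transaction.
    If dfops (a list) is provided, surplus blocks are queued there instead,
    so the current transaction frees nothing itself.
    """
    freed_now = 0
    while len(agfl) > needed:
        block = agfl.pop()
        if dfops is not None:
            dfops.append(block)     # defer the free until the transaction rolls
        else:
            freed_now += 1          # immediate free, consumes log space now
    return freed_now


def roll(dfops):
    """Finish deferred frees, one per rolled 'transaction' (an assumption here)."""
    finished = []
    while dfops:
        finished.append([dfops.pop(0)])
    return finished
```

Without a dfops list, the number of frees charged to the current transaction grows with the AGFL surplus. With one, the current transaction frees nothing directly and each later transaction handles a fixed amount of work.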
  14. 15 Dec, 2017 1 commit
  15. 02 Sep, 2017 2 commits