1. 27 7月, 2018 3 次提交
    • B
      xfs: drop unnecessary xfs_defer_finish() dfops parameter · 9e28a242
      Brian Foster 提交于
      Every caller of xfs_defer_finish() now passes the transaction and
      its associated ->t_dfops. The xfs_defer_ops parameter is therefore
      no longer necessary and can be removed.
      
      Since most xfs_defer_finish() callers also have to consider
      xfs_defer_cancel() on error, update the latter to also receive the
      transaction for consistency. The log recovery code contains an
      outlier case that cancels a dfops directly without an available
      transaction. Retain an internal wrapper to support this outlier case
      for the time being.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9e28a242
    • B
      xfs: support embedded dfops in transaction · e021a2e5
      Brian Foster 提交于
      The dfops structure used by multi-transaction operations is
      typically stored on the stack and carried around by the associated
      transaction. The lifecycle of dfops does not quite match that of the
      transaction, but they are tightly related in that the former depends
      on the latter.
      
      The relationship of these objects is tight enough that we can avoid
      the cumbersome boilerplate code required in most cases to manage
      them separately by just embedding an xfs_defer_ops in the
      transaction itself. This means that a transaction allocation returns
      with an initialized dfops, a transaction commit finishes pending
      deferred items before the tx commit, a transaction cancel cancels
      the dfops before the transaction and a transaction dup operation
      transfers the current dfops state to the new transaction.
      
      The dup operation is slightly complicated by the fact that we can no
      longer just copy a dfops pointer from the old transaction to the new
      transaction. This is solved through a dfops move helper that
      transfers the pending items and other dfops state across the
      transactions. This also requires that transaction rolling code
      always refer to the transaction for the current dfops reference.
      
      Finally, to facilitate incremental conversion to the internal dfops
      and continue to support the current external dfops mode of
      operation, create the new ->t_dfops_internal field with a layer of
      indirection. On allocation, ->t_dfops points to the internal dfops.
      This state is overridden by callers who re-init a local dfops on the
      transaction. Once ->t_dfops is overridden, the external dfops
      reference is maintained as the transaction rolls.
      
      This patch adds the fundamental ability to support an internal
      dfops. All codepaths that perform deferred processing continue to
      override the internal dfops until they are converted over in
      subsequent patches.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e021a2e5
    • B
      xfs: pack holes in xfs_defer_ops and xfs_trans · 44fd2946
      Brian Foster 提交于
      Both structures have holes due to member alignment. Move dop_low to
      the end of xfs_defer ops to sanitize the cache line alignment and
      move t_flags to save 8 bytes in xfs_trans.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      44fd2946
  2. 12 7月, 2018 2 次提交
  3. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  4. 10 5月, 2018 4 次提交
    • B
      xfs: add bmapi nodiscard flag · fcb762f5
      Brian Foster 提交于
      Freed extents are unconditionally discarded when online discard is
      enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass
      discards when unnecessary. For example, this will be useful for
      eofblocks trimming.
      
      This patch does not change behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      fcb762f5
    • D
      xfs: get rid of the log item descriptor · e6631f85
      Dave Chinner 提交于
      It's just a connector between a transaction and a log item. There's
      a 1:1 relationship between a log item descriptor and a log item,
      and a 1:1 relationship between a log item descriptor and a
      transaction. Both relationships are created and terminated at the
      same time, so why do we even have the descriptor?
      
      Replace it with a specific list_head in the log item and a new
      log item dirtied flag to replace the XFS_LID_DIRTY flag.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY]
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e6631f85
    • D
      xfs: log item flags are racy · 22525c17
      Dave Chinner 提交于
      The log item flags contain a field that is protected by the AIL
      lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to
      set and clear these flags, but most of the updates and checks are
      not done with the AIL lock held and so are susceptible to update
      races.
      
      Fix this by changing the log item flags to use atomic bitops rather
      than be reliant on the AIL lock for update serialisation.
      Signed-Off-By: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      22525c17
    • B
      xfs: defer agfl block frees when dfops is available · f8f2835a
      Brian Foster 提交于
      The AGFL fixup code executes before every block allocation/free and
      rectifies the AGFL based on the current, dynamic allocation
      requirements of the fs. The AGFL must hold a minimum number of
      blocks to satisfy a worst case split of the free space btrees caused
      by the impending allocation operation. The AGFL is also updated to
      maintain the implicit requirement for a minimum number of free slots
      to satisfy a worst case join of the free space btrees.
      
      Since the AGFL caches individual blocks, AGFL reduction typically
      involves multiple, single block frees. We've had reports of
      transaction overrun problems during certain workloads that boil down
      to AGFL reduction freeing multiple blocks and consuming more space
      in the log than was reserved for the transaction.
      
      Since the objective of freeing AGFL blocks is to ensure free AGFL
      free slots are available for the upcoming allocation, one way to
      address this problem is to release surplus blocks from the AGFL
      immediately but defer the free of those blocks (similar to how
      file-mapped blocks are unmapped from the file in one transaction and
      freed via a deferred operation) until the transaction is rolled.
      This turns AGFL reduction into an operation with predictable log
      reservation consumption.
      
      Add the capability to defer AGFL block frees when a deferred ops
      list is available to the AGFL fixup code. Add a dfops pointer to the
      transaction to carry dfops through various contexts to the allocator
      context. Deferring AGFL frees is  conditional behavior based on
      whether the transaction pointer is populated. The long term
      objective is to reuse the transaction pointer to clean up all
      unrelated callchains that pass dfops on the stack along with a
      transaction and in doing so, consistently defer AGFL blocks from the
      allocator.
      
      A bit of customization is required to handle deferred completion
      processing because AGFL blocks are accounted against a per-ag
      reservation pool and AGFL blocks are not inserted into the extent
      busy list when freed (they are inserted when used and released back
      to the AGFL). Reuse the majority of the existing deferred extent
      free infrastructure and customize it appropriately to handle AGFL
      blocks.
      
      Note that this patch only adds infrastructure. It does not change
      behavior because no callers have been updated to pass ->t_agfl_dfops
      into the allocation code.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f8f2835a
  5. 29 1月, 2018 1 次提交
  6. 02 9月, 2017 3 次提交
    • B
      xfs: disallow marking previously dirty buffers as ordered · a5814bce
      Brian Foster 提交于
      Ordered buffers are used in situations where the buffer is not
      physically logged but must pass through the transaction/logging
      pipeline for a particular transaction. As a result, ordered buffers
      are not unpinned and written back until the transaction commits to
      the log. Ordered buffers have a strict requirement that the target
      buffer must not be currently dirty and resident in the log pipeline
      at the time it is marked ordered. If a dirty+ordered buffer is
      committed, the buffer is reinserted to the AIL but not physically
      relogged at the LSN of the associated checkpoint. The buffer log
      item is assigned the LSN of the latest checkpoint and the AIL
      effectively releases the previously logged buffer content from the
      active log before the buffer has been written back. If the tail
      pushes forward and a filesystem crash occurs while in this state, an
      inconsistent filesystem could result.
      
      It is currently the caller responsibility to ensure an ordered
      buffer is not already dirty from a previous modification. This is
      unclear and error prone when not used in situations where it is
      guaranteed a buffer has not been previously modified (such as new
      metadata allocations).
      
      To facilitate general purpose use of ordered buffers, update
      xfs_trans_ordered_buf() to conditionally order the buffer based on
      state of the log item and return the status of the result. If the
      bli is dirty, do not order the buffer and return false. The caller
      must either physically log the buffer (having acquired the
      appropriate log reservation) or push it from the AIL to clean it
      before it can be marked ordered in the current transaction.
      
      Note that ordered buffers are currently only used in two situations:
      1.) inode chunk allocation where previously logged buffers are not
      possible and 2.) extent swap which will be updated to handle ordered
      buffer failures in a separate patch.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      a5814bce
    • B
      xfs: refactor buffer logging into buffer dirtying helper · 9684010d
      Brian Foster 提交于
      xfs_trans_log_buf() is responsible for logging the dirty segments of
      a buffer along with setting all of the necessary state on the
      transaction, buffer, bli, etc., to ensure that the associated items
      are marked as dirty and prepared for I/O. We have a couple use cases
      that need to to dirty a buffer in a transaction without actually
      logging dirty ranges of the buffer.  One existing use case is
      ordered buffers, which are currently logged with arbitrary ranges to
      accomplish this even though the content of ordered buffers is never
      written to the log. Another pending use case is to relog an already
      dirty buffer across rolled transactions within the deferred
      operations infrastructure. This is required to prevent a held
      (XFS_BLI_HOLD) buffer from pinning the tail of the log.
      
      Refactor xfs_trans_log_buf() into a new function that contains all
      of the logic responsible to dirty the transaction, lidp, buffer and
      bli. This new function can be used in the future for the use cases
      outlined above. This patch does not introduce functional changes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9684010d
    • C
      xfs: refactor xfs_trans_roll · 411350df
      Christoph Hellwig 提交于
      Split xfs_trans_roll into a low-level helper that just rolls the
      actual transaction and a new higher level xfs_trans_roll_inode
      that takes care of logging and rejoining the inode.  This gets
      rid of the NULL inode case, and allows to simplify the special
      cases in the deferred operation code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      411350df
  7. 23 8月, 2017 2 次提交
  8. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  9. 19 6月, 2017 2 次提交
  10. 07 4月, 2017 1 次提交
  11. 04 4月, 2017 1 次提交
  12. 31 1月, 2017 1 次提交
  13. 05 10月, 2016 2 次提交
  14. 04 10月, 2016 2 次提交
  15. 03 8月, 2016 8 次提交
  16. 22 7月, 2016 1 次提交
    • D
      xfs: allocate log vector buffers outside CIL context lock · b1c5ebb2
      Dave Chinner 提交于
      One of the problems we currently have with delayed logging is that
      under serious memory pressure we can deadlock memory reclaim. THis
      occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
      inodes and issues a log force to unpin inodes that are dirty in the
      CIL.
      
      The CIL is pushed, but this will only occur once it gets the CIL
      context lock to ensure that all committing transactions are complete
      and no new transactions start being committed to the CIL while the
      push switches to a new context.
      
      The deadlock occurs when the CIL context lock is held by a
      committing process that is doing memory allocation for log vector
      buffers, and that allocation is then blocked on memory reclaim
      making progress. Memory reclaim, however, is blocked waiting for
      a log force to make progress, and so we effectively deadlock at this
      point.
      
      To solve this problem, we have to move the CIL log vector buffer
      allocation outside of the context lock so that memory reclaim can
      always make progress when it needs to force the log. The problem
      with doing this is that a CIL push can take place while we are
      determining if we need to allocate a new log vector buffer for
      an item and hence the current log vector may go away without
      warning. That means we canot rely on the existing log vector being
      present when we finally grab the context lock and so we must have a
      replacement buffer ready to go at all times.
      
      To ensure this, introduce a "shadow log vector" buffer that is
      always guaranteed to be present when we gain the CIL context lock
      and format the item. This shadow buffer may or may not be used
      during the formatting, but if the log item does not have an existing
      log vector buffer or that buffer is too small for the new
      modifications, we swap it for the new shadow buffer and format
      the modifications into that new log vector buffer.
      
      The result of this is that for any object we modify more than once
      in a given CIL checkpoint, we double the memory required
      to track dirty regions in the log. For single modifications then
      we consume the shadow log vectorwe allocate on commit, and that gets
      consumed by the checkpoint. However, if we make multiple
      modifications, then the second transaction commit will allocate a
      shadow log vector and hence we will end up with double the memory
      usage as only one of the log vectors is consumed by the CIL
      checkpoint. The remaining shadow vector will be freed when th elog
      item is freed.
      
      This can probably be optimised in future - access to the shadow log
      vector is serialised by the object lock (as opposited to the active
      log vector, which is controlled by the CIL context lock) and so we
      can probably free shadow log vector from some objects when the log
      item is marked clean on removal from the AIL.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b1c5ebb2
  17. 06 4月, 2016 1 次提交
    • C
      xfs: better xfs_trans_alloc interface · 253f4911
      Christoph Hellwig 提交于
      Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
      that returns a transaction with all the required log and block reservations,
      and which allows passing transaction flags directly to avoid the cumbersome
      _xfs_trans_alloc interface.
      
      While we're at it we also get rid of the transaction type argument that has
      been superflous since we stopped supporting the non-CIL logging mode.  The
      guts of it will be removed in another patch.
      
      [dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      253f4911
  18. 02 3月, 2016 1 次提交
  19. 19 8月, 2015 3 次提交
    • B
      xfs: ensure EFD trans aborts on log recovery extent free failure · 6bc43af3
      Brian Foster 提交于
      Log recovery attempts to free extents with leftover EFIs in the AIL
      after initial processing. If the extent free fails (e.g., due to
      unrelated fs corruption), the transaction is cancelled, though it
      might not be dirtied at the time. If this is the case, the EFD does
      not abort and thus does not release the EFI. This can lead to hangs
      as the EFI pins the AIL.
      
      Update xlog_recover_process_efi() to log the EFD in the transaction
      before xfs_free_extent() errors are handled to ensure the
      transaction is dirty, aborts the EFD and releases the EFI on error.
      Since this is a requirement for EFD processing (and consistent with
      xfs_bmap_finish()), update the EFD logging helper to do the extent
      free and unconditionally log the EFD. This encodes the required EFD
      logging behavior into the helper and reduces the likelihood of
      errors down the road.
      
      [dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build
       failure.]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6bc43af3
    • B
      xfs: return committed status from xfs_trans_roll() · d43ac29b
      Brian Foster 提交于
      Some callers need to make error handling decisions based on whether
      the current transaction successfully committed or not. Rename
      xfs_trans_roll(), add a new parameter and provide a wrapper to
      preserve existing callers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d43ac29b
    • B
      xfs: disentagle EFI release from the extent count · 5e4b5386
      Brian Foster 提交于
      Release of the EFI either occurs based on the reference count or the
      extent count. The extent count used is either the count tracked in
      the EFI or EFD, depending on the particular situation. In either
      case, the count is initialized to the final value and thus always
      matches the current efi_next_extent value once the EFI is completely
      constructed.  For example, the EFI extent count is increased as the
      extents are logged in xfs_bmap_finish() and the full free list is
      always completely processed. Therefore, the count is guaranteed to
      be complete once the EFI transaction is committed. The EFD uses the
      efd_nextents counter to release the EFI. This counter is initialized
      to the count of the EFI when the EFD is created. Thus the EFD, as
      currently used, has no concept of partial EFI release based on
      extent count.
      
      Given that the EFI extent count is always released in whole, use of
      the extent count for reference counting is unnecessary. Remove this
      level of the API and release the EFI based on the core reference
      count. The efi_next_extent counter remains because it is still used
      to track the slot to log the next extent to free.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5e4b5386