1. 30 Oct, 2019 1 commit
  2. 22 Oct, 2019 4 commits
    • B
      xfs: optimize near mode bnobt scans with concurrent cntbt lookups · dc8e69bd
      Brian Foster committed
      The near mode fallback algorithm consists of a left/right scan of
      the bnobt. This algorithm has very poor breakdown characteristics
      under worst case free space fragmentation conditions. If a suitable
      extent is far enough from the locality hint, each allocation may
      scan most or all of the bnobt before it completes. This causes
      pathological behavior and extremely high allocation latencies.
      
      While locality is important to near mode allocations, it is not so
      important as to incur pathological allocation latency to provide the
      absolute best available locality for every allocation. If the
      allocation is large enough or far enough away, there is a point of
      diminishing returns. As such, we can bound the overall operation by
      including an iterative cntbt lookup in the broader search. The cntbt
      lookup is optimized to immediately find the extent with best
      locality for the given size on each iteration. Since the cntbt is
      indexed by extent size, the lookup repeats with a variably
      aggressive increasing search key size until it runs off the edge of
      the tree.
      
      This approach provides a natural balance between the two algorithms
      for various situations. For example, the bnobt scan is able to
      satisfy smaller allocations such as for inode chunks or btree blocks
      more quickly where the cntbt search may have to search through a
      large set of extent sizes when the search key starts off small
      relative to the largest extent in the tree. On the other hand, the
      cntbt search more deterministically covers the set of suitable
      extents for larger data extent allocation requests that the bnobt
      scan may have to search the entire tree to locate.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      dc8e69bd
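The iterative size-indexed lookup described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: the "cntbt" is modelled as an array sorted by (length, start), every `sketch_*` name is hypothetical, and the key simply doubles after each lookup, whereas the commit only says the key grows "variably aggressively".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free extent record; not the on-disk XFS format. */
struct sketch_extent {
	uint32_t start;	/* first block of the free extent */
	uint32_t len;	/* length in blocks */
};

/* First index whose (len, start) is >= (key_len, key_start), or n. */
static size_t sketch_cnt_lookup_ge(const struct sketch_extent *recs, size_t n,
				   uint32_t key_len, uint32_t key_start)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].len < key_len ||
		    (recs[mid].len == key_len && recs[mid].start < key_start))
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static uint32_t sketch_abs_diff(uint32_t a, uint32_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * recs must be sorted by (len, start). Returns the index of the closest
 * candidate seen, or n if no extent of at least minlen blocks exists.
 */
size_t sketch_near_cnt_search(const struct sketch_extent *recs, size_t n,
			      uint32_t hint, uint32_t minlen)
{
	size_t best = n;
	uint32_t best_diff = UINT32_MAX;
	uint32_t key_len = minlen;

	assert(minlen > 0);
	for (;;) {
		size_t i = sketch_cnt_lookup_ge(recs, n, key_len, hint);
		uint32_t diff;

		if (i == n)
			break;		/* ran off the edge of the index */
		diff = sketch_abs_diff(recs[i].start, hint);
		if (best == n || diff < best_diff) {
			best_diff = diff;
			best = i;
		}
		if (recs[i].len > UINT32_MAX / 2)
			break;		/* doubling would overflow */
		key_len = recs[i].len * 2;	/* assumed growth policy */
	}
	return best;
}
```

Because the key at least doubles on every pass, the number of lookups in this model is bounded by the logarithm of the largest extent size, which is the kind of bound the commit message is after. The model gives up some locality accuracy, since it only inspects one record per lookup.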
    • B
      xfs: factor out tree fixup logic into helper · d2968825
      Brian Foster committed
      Lift the btree fixup path into a helper function.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d2968825
    • B
      xfs: reuse best extent tracking logic for bnobt scan · fec0afda
      Brian Foster committed
      The near mode bnobt scan searches left and right in the bnobt
      looking for the closest free extent to the allocation hint that
      satisfies minlen. Once such an extent is found, the left/right
      search terminates, we search one more time in the opposite direction
      and finish the allocation with the best overall extent.
      
      The left/right and find best searches are currently controlled via a
      combination of cursor state and local variables. Clean up this code
      and prepare for further improvements to the near mode fallback
      algorithm by reusing the allocation cursor best extent tracking
      mechanism. Update the tracking logic to deactivate bnobt cursors
      when out of allocation range and replace open-coded extent checks
      with calls to the common helper. In doing so, rename some misnamed
      local variables in the top-level near mode allocation function.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      fec0afda
    • B
      xfs: refactor cntbt lastblock scan best extent logic into helper · 396bbf3c
      Brian Foster committed
      The cntbt lastblock scan checks the size, alignment, locality, etc.
      of each free extent in the block and compares it with the current
      best candidate. This logic will be reused by the upcoming optimized
      cntbt algorithm, so refactor it into a separate helper. Note that
      acur->diff is now initialized to -1 (unsigned) instead of 0 to
      support the more granular comparison logic in the new helper.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      396bbf3c
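One plausible reading of why a best-candidate tracker starts its distance at -1 (unsigned) is that the maximum value acts as a "nothing found yet" sentinel, so a plain "strictly closer" comparison accepts the first suitable extent, including one at distance 0. The sketch below illustrates that idea only. The struct and function names are hypothetical, and the real helper also weighs alignment and other constraints that are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocation cursor; field names are illustrative. */
struct sketch_acur {
	uint32_t want_bno;	/* locality hint */
	uint32_t minlen;	/* smallest acceptable extent */
	uint32_t best_bno;	/* start of best candidate so far */
	uint32_t best_len;	/* length of best candidate so far */
	uint32_t diff;		/* distance of best candidate from hint */
};

void sketch_acur_init(struct sketch_acur *acur, uint32_t want_bno,
		      uint32_t minlen)
{
	assert(minlen > 0);
	acur->want_bno = want_bno;
	acur->minlen = minlen;
	acur->best_bno = 0;
	acur->best_len = 0;
	acur->diff = (uint32_t)-1;	/* sentinel: no candidate yet */
}

/* Returns 1 if (bno, len) became the new best candidate, else 0. */
int sketch_acur_check(struct sketch_acur *acur, uint32_t bno, uint32_t len)
{
	uint32_t diff;

	if (len < acur->minlen)
		return 0;		/* too small to satisfy the request */
	diff = bno > acur->want_bno ? bno - acur->want_bno
				    : acur->want_bno - bno;
	if (diff >= acur->diff)
		return 0;		/* not strictly closer than current best */
	acur->best_bno = bno;
	acur->best_len = len;
	acur->diff = diff;
	return 1;
}
```

With a zero initial value the same comparison could never accept a candidate, so the caller would need a separate "have we found anything" check.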
  3. 21 Oct, 2019 2 commits
  4. 27 Aug, 2019 2 commits
    • D
      xfs: add kmem_alloc_io() · f8f9ee47
      Dave Chinner committed
      Memory we submit for IO must be strictly aligned to the
      underlying driver constraints. Worst case, this is 512 bytes. Given
      that all allocations for IO are always a power of 2 multiple of 512
      bytes, the kernel heap provides natural alignment for objects of
      these sizes and that suffices.
      
      Until, of course, memory debugging of some kind is turned on (e.g.
      red zones, poisoning, KASAN) and then the alignment of the heap
      objects is thrown out the window. Then we get weird IO errors and
      data corruption problems because drivers don't validate alignment
      and do the wrong thing when passed unaligned memory buffers in bios.
      
      To fix this, introduce kmem_alloc_io(), which will guarantee at
      least 512 byte alignment of buffers for IO, even if memory debugging
      options are turned on. It is assumed that the minimum allocation
      size will be 512 bytes, and that sizes will be power of 2 multiples
      of 512 bytes.
      
      Use this everywhere we allocate buffers for IO.
      
      This no longer fails with log recovery errors when KASAN is enabled
      due to the brd driver not handling unaligned memory buffers:
      
      # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      f8f9ee47
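The alignment guarantee described above can be approximated in user space as "try the ordinary heap, verify the alignment, and fall back to an explicitly aligned allocation". The following is a minimal sketch of that pattern, assuming a C11 libc that provides `aligned_alloc()`. `sketch_alloc_io()` and `SKETCH_IO_ALIGN` are hypothetical names, and the kernel's actual fallback path is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_IO_ALIGN 512u	/* worst-case alignment named in the commit */

/* True if p sits on a SKETCH_IO_ALIGN boundary. */
static int sketch_is_io_aligned(const void *p)
{
	return ((uintptr_t)p & (SKETCH_IO_ALIGN - 1)) == 0;
}

/*
 * Allocate a buffer suitable for IO. The size must be a power-of-2
 * multiple of 512 bytes, matching the assumption stated in the commit.
 * The result can be released with free().
 */
void *sketch_alloc_io(size_t size)
{
	void *p;

	assert(size >= SKETCH_IO_ALIGN && (size & (size - 1)) == 0);

	/* Fast path: the heap often returns naturally aligned memory. */
	p = malloc(size);
	if (p && sketch_is_io_aligned(p))
		return p;

	/* Slow path: heap debugging or padding broke natural alignment. */
	free(p);
	return aligned_alloc(SKETCH_IO_ALIGN, size);
}
```

The point of checking rather than always using the aligned allocator is that the common case stays as cheap as a plain allocation, while the guarantee holds even when the heap adds red zones or poisoning.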
    • D
      xfs: add kmem allocation trace points · 0ad95687
      Dave Chinner committed
      When trying to correlate XFS kernel allocations to memory reclaim
      behaviour, it is useful to know what allocations XFS is actually
      attempting. This information is not directly available from the
      generic memory allocation and reclaim tracepoints, so these new
      trace points provide a high-level
      indication of what the XFS memory demand actually is.
      
      There is no per-filesystem context in this code, so we just trace
      the type of allocation, the size and the allocation constraints.
      The kmem code also doesn't include much of the common XFS headers,
      so there are a few definitions that need to be added to the trace
      headers and a couple of types that need to be made common to avoid
      needing to include the whole world in the kmem code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0ad95687
  5. 03 Jul, 2019 2 commits
  6. 29 Jun, 2019 2 commits
    • C
      xfs: split iop_unlock · ddf92053
      Christoph Hellwig committed
      The iop_unlock method is called when committing or cancelling a
      transaction.  In the latter case, the transaction may or may not be
      aborted.  While there is no known problem with the current code in
      practice, this implementation is limited in that any log item
      implementation that might want to differentiate between a commit and a
      cancellation must rely on the aborted state.  The aborted bit is only
      set when the cancelled transaction is dirty, however.  This means that
      there is no way to distinguish between a commit and a clean transaction
      cancellation.
      
      For example, intent log items currently rely on this distinction.  The
      log item is either transferred to the CIL on commit or released on
      transaction cancel. There is currently no possibility for a clean intent
      log item in a transaction, but if that state is ever introduced a cancel
      of such a transaction will immediately result in memory leaks of the
      associated log item(s).  This is an interface deficiency and landmine.
      
      To clean this up, replace the iop_unlock method with an iop_release
      method that is specific to transaction cancel.  The existing
      iop_committing method occurs at the same time as iop_unlock in the
      commit path and there is no need for two separate callbacks here.
      Overload the iop_committing method with the current commit time
      iop_unlock implementations to eliminate the need for the latter and
      further simplify the interface.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      ddf92053
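The interface change described in this commit, one callback for the commit path and a separate one for cancellation, can be pictured with a small ops table. This is an illustrative sketch only: the struct layout, the counters, and all `sketch_*` names are hypothetical and do not mirror the kernel's log item definitions.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_log_item;

/* Hypothetical per-item callbacks, one per transaction outcome. */
struct sketch_item_ops {
	void (*committing)(struct sketch_log_item *lip);	/* commit path */
	void (*release)(struct sketch_log_item *lip);	/* cancel path */
};

struct sketch_log_item {
	const struct sketch_item_ops *ops;
	int committed;	/* times the commit callback ran */
	int released;	/* times the cancel callback ran */
};

static void sketch_on_committing(struct sketch_log_item *lip)
{
	lip->committed++;	/* e.g. hand the item over to the log */
}

static void sketch_on_release(struct sketch_log_item *lip)
{
	lip->released++;	/* e.g. drop the item, clean or dirty */
}

const struct sketch_item_ops sketch_intent_ops = {
	.committing = sketch_on_committing,
	.release = sketch_on_release,
};

void sketch_trans_commit(struct sketch_log_item **items, size_t n)
{
	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (items[i]->ops->committing)
			items[i]->ops->committing(items[i]);
}

void sketch_trans_cancel(struct sketch_log_item **items, size_t n)
{
	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (items[i]->ops->release)
			items[i]->ops->release(items[i]);
}
```

With two distinct entry points, an item never has to infer from an aborted bit whether it is being committed or cancelled, so a clean cancellation is handled the same way as a dirty one.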
    • C
      xfs: don't use xfs_trans_free_items in the commit path · 195cd83d
      Christoph Hellwig committed
      While committing items looks very similar to freeing them on error, it is
      a different operation, and they will diverge a bit soon.
      
      Split out the commit case from xfs_trans_free_items, inline it into
      xfs_log_commit_cil and give it a separate trace point.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      195cd83d
  7. 15 Apr, 2019 2 commits
  8. 21 Feb, 2019 1 commit
  9. 18 Feb, 2019 1 commit
  10. 12 Feb, 2019 4 commits
  11. 20 Dec, 2018 4 commits
  12. 13 Dec, 2018 2 commits
  13. 21 Nov, 2018 1 commit
    • D
      xfs: uncached buffer tracing needs to print bno · d61fa8cb
      Dave Chinner committed
      Useless:
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      
      Useful:
      
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d61fa8cb
  14. 29 Sep, 2018 1 commit
    • B
      xfs: don't unlock invalidated buf on aborted tx commit · d9183105
      Brian Foster committed
      xfstests generic/388,475 occasionally reproduce assertion failures
      in xfs_buf_item_unpin() when the final bli reference is dropped on
      an invalidated buffer and the buffer is not locked as it is expected
      to be. Invalidated buffers should remain locked on transaction
      commit until the final unpin, at which point the buffer is removed
      from the AIL and the bli is freed since stale buffers are not
      written back.
      
      The assert failures are associated with filesystem shutdown,
      typically due to log I/O errors injected by the test. The
      problematic situation can occur if the shutdown happens to cause a
      race between an active transaction that has invalidated a particular
      buffer and an I/O error on a log buffer that contains the bli
      associated with the same (now stale) buffer.
      
      Both transaction and log contexts acquire a bli reference. If the
      transaction has already invalidated the buffer by the time the I/O
      error occurs and ends up aborting due to shutdown, the transaction
      and log hold the last two references to a stale bli. If the
      transaction cancel occurs first, it treats the buffer as non-stale
      due to the aborted state: the bli reference is dropped and the
      buffer is released/unlocked. The log buffer I/O error handling
      eventually calls into xfs_buf_item_unpin(), drops the final
      reference to the bli and treats it as stale. The buffer wasn't left
      locked by xfs_buf_item_unlock(), however, so the assert fails and
      the buffer is double unlocked. The latter problem is mitigated by
      the fact that the fs is shut down and no further damage is possible.
      
      ->iop_unlock() of an invalidated buffer should behave consistently
      with respect to the bli refcount, regardless of aborted state. If
      the refcount remains elevated on commit, we know the bli is awaiting
      an unpin (since it can't be in another transaction) and will be
      handled appropriately on log buffer completion. If the final bli
      reference of an invalidated buffer is dropped in ->iop_unlock(), we
      can assume the transaction has aborted because invalidation implies
      a dirty transaction. In the non-abort case, the log would have
      acquired a bli reference in ->iop_pin() and prevented bli release at
      ->iop_unlock() time. In the abort case the item must be freed and
      buffer unlocked because it wasn't pinned by the log.
      
      Rework xfs_buf_item_unlock() to simplify the currently circuitous
      and duplicate logic and leave invalidated buffers locked based on
      bli refcount, regardless of aborted state. This ensures that a
      pinned, stale buffer is always found locked when eventually
      unpinned.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      d9183105
  15. 03 Aug, 2018 3 commits
    • B
      xfs: fold dfops into the transaction · 9d9e6233
      Brian Foster committed
      struct xfs_defer_ops has now been reduced to a single list_head. The
      external dfops mechanism is unused and thus everywhere a (permanent)
      transaction is accessible the associated dfops structure is as well.
      
      Remove the xfs_defer_ops structure and fold the list_head into the
      transaction. Also remove the last remnant of external dfops in
      xfs_trans_dup().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      9d9e6233
    • B
      xfs: replace xfs_defer_ops ->dop_pending with on-stack list · 1ae093cb
      Brian Foster committed
      The xfs_defer_ops ->dop_pending list is used to track active
      deferred operations once intents are logged. These items must be
      aborted in the event of an error. The list is populated as intents
      are logged and items are removed as they complete (or are aborted).
      
      Now that xfs_defer_finish() cancels on error, there is no need to
      ever access ->dop_pending outside of xfs_defer_finish(). The list is
      only ever populated after xfs_defer_finish() begins and is either
      completed or cancelled before it returns.
      
      Remove ->dop_pending from xfs_defer_ops and replace it with a local
      list in the xfs_defer_finish() path. Pass the local list to the
      various helpers now that it is not accessible via dfops. Note that
      we have to check for NULL in the abort case as the final tx roll
      occurs outside of the scope of the new local list (once the dfops
      has completed and thus drained the list).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      1ae093cb
    • B
      xfs: replace dop_low with transaction flag · 1214f1cf
      Brian Foster committed
      The dop_low field enables the low free space allocation mode when a
      previous allocation has detected difficulty allocating blocks. It
      has historically been part of the xfs_defer_ops structure, which
      means if enabled, it remains enabled across a set of transactions
      until the deferred operations have completed and the dfops is reset.
      
      Now that the dfops is embedded in the transaction, we can save a bit
      more space by using a transaction flag rather than a standalone
      boolean. Drop the ->dop_low field and replace it with a transaction
      flag that is set at the same points, carried across rolling
      transactions and cleared on completion of deferred operations. This
      essentially emulates the behavior of ->dop_low and so should not
      change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      1214f1cf
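The lifetime described in this commit (set when allocation gets difficult, carried across transaction rolls, cleared when deferred operations complete) maps naturally onto a single flag bit. The sketch below models only that lifetime. The flag values, struct, and function names are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define SKETCH_TRANS_DIRTY	0x1u
#define SKETCH_TRANS_LOWMODE	0x2u	/* low free space allocation mode */

struct sketch_trans {
	unsigned int flags;
};

/* Called when an allocation has detected difficulty finding blocks. */
void sketch_trans_set_lowmode(struct sketch_trans *tp)
{
	assert(tp != NULL);
	tp->flags |= SKETCH_TRANS_LOWMODE;
}

/*
 * Roll to a new transaction: per-transaction state such as the dirty
 * bit starts fresh, but low space mode is carried over.
 */
struct sketch_trans sketch_trans_roll(const struct sketch_trans *old)
{
	struct sketch_trans new_tp;

	assert(old != NULL);
	new_tp.flags = old->flags & SKETCH_TRANS_LOWMODE;
	return new_tp;
}

/* Called once all deferred operations have completed. */
void sketch_defer_finish_done(struct sketch_trans *tp)
{
	assert(tp != NULL);
	tp->flags &= ~SKETCH_TRANS_LOWMODE;
}
```

A flag bit costs no extra storage when the transaction already has a flags word, which matches the space saving the commit message mentions.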
  16. 27 Jul, 2018 1 commit
  17. 12 Jul, 2018 5 commits
  18. 07 Jun, 2018 1 commit
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner committed
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/.
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  19. 10 May, 2018 1 commit