1. 20 12月, 2018 1 次提交
  2. 13 12月, 2018 2 次提交
  3. 21 11月, 2018 1 次提交
    • D
      xfs: uncached buffer tracing needs to print bno · d61fa8cb
      Dave Chinner 提交于
      Useless:
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      
      Useful:
      
      
      xfs_buf_get_uncached: dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_unlock:       dev 253:32 bno 0xffffffffffffffff nblks 0x1 ...
      xfs_buf_submit:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_hold:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iodone:       dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_iowait_done:  dev 253:32 bno 0x200b5 nblks 0x1 ...
      xfs_buf_rele:         dev 253:32 bno 0x200b5 nblks 0x1 ...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d61fa8cb
  4. 29 9月, 2018 1 次提交
    • B
      xfs: don't unlock invalidated buf on aborted tx commit · d9183105
      Brian Foster 提交于
      xfstests generic/388,475 occasionally reproduce assertion failures
      in xfs_buf_item_unpin() when the final bli reference is dropped on
      an invalidated buffer and the buffer is not locked as it is expected
      to be. Invalidated buffers should remain locked on transaction
      commit until the final unpin, at which point the buffer is removed
      from the AIL and the bli is freed since stale buffers are not
      written back.
      
      The assert failures are associated with filesystem shutdown,
      typically due to log I/O errors injected by the test. The
      problematic situation can occur if the shutdown happens to cause a
      race between an active transaction that has invalidated a particular
      buffer and an I/O error on a log buffer that contains the bli
      associated with the same (now stale) buffer.
      
      Both transaction and log contexts acquire a bli reference. If the
      transaction has already invalidated the buffer by the time the I/O
      error occurs and ends up aborting due to shutdown, the transaction
      and log hold the last two references to a stale bli. If the
      transaction cancel occurs first, it treats the buffer as non-stale
      due to the aborted state: the bli reference is dropped and the
      buffer is released/unlocked. The log buffer I/O error handling
      eventually calls into xfs_buf_item_unpin(), drops the final
      reference to the bli and treats it as stale. The buffer wasn't left
      locked by xfs_buf_item_unlock(), however, so the assert fails and
      the buffer is double unlocked. The latter problem is mitigated by
      the fact that the fs is shutdown and no further damage is possible.
      
      ->iop_unlock() of an invalidated buffer should behave consistently
      with respect to the bli refcount, regardless of aborted state. If
      the refcount remains elevated on commit, we know the bli is awaiting
      an unpin (since it can't be in another transaction) and will be
      handled appropriately on log buffer completion. If the final bli
      reference of an invalidated buffer is dropped in ->iop_unlock(), we
      can assume the transaction has aborted because invalidation implies
      a dirty transaction. In the non-abort case, the log would have
      acquired a bli reference in ->iop_pin() and prevented bli release at
      ->iop_unlock() time. In the abort case the item must be freed and
      buffer unlocked because it wasn't pinned by the log.
      
      Rework xfs_buf_item_unlock() to simplify the currently circuitous
      and duplicate logic and leave invalidated buffers locked based on
      bli refcount, regardless of aborted state. This ensures that a
      pinned, stale buffer is always found locked when eventually
      unpinned.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d9183105
  5. 03 8月, 2018 3 次提交
    • B
      xfs: fold dfops into the transaction · 9d9e6233
      Brian Foster 提交于
      struct xfs_defer_ops has now been reduced to a single list_head. The
      external dfops mechanism is unused and thus everywhere a (permanent)
      transaction is accessible the associated dfops structure is as well.
      
      Remove the xfs_defer_ops structure and fold the list_head into the
      transaction. Also remove the last remnant of external dfops in
      xfs_trans_dup().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      9d9e6233
    • B
      xfs: replace xfs_defer_ops ->dop_pending with on-stack list · 1ae093cb
      Brian Foster 提交于
      The xfs_defer_ops ->dop_pending list is used to track active
      deferred operations once intents are logged. These items must be
      aborted in the event of an error. The list is populated as intents
      are logged and items are removed as they complete (or are aborted).
      
      Now that xfs_defer_finish() cancels on error, there is no need to
      ever access ->dop_pending outside of xfs_defer_finish(). The list is
      only ever populated after xfs_defer_finish() begins and is either
      completed or cancelled before it returns.
      
      Remove ->dop_pending from xfs_defer_ops and replace it with a local
      list in the xfs_defer_finish() path. Pass the local list to the
      various helpers now that it is not accessible via dfops. Note that
      we have to check for NULL in the abort case as the final tx roll
      occurs outside of the scope of the new local list (once the dfops
      has completed and thus drained the list).
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1ae093cb
    • B
      xfs: replace dop_low with transaction flag · 1214f1cf
      Brian Foster 提交于
      The dop_low field enables the low free space allocation mode when a
      previous allocation has detected difficulty allocating blocks. It
      has historically been part of the xfs_defer_ops structure, which
      means if enabled, it remains enabled across a set of transactions
      until the deferred operations have completed and the dfops is reset.
      
      Now that the dfops is embedded in the transaction, we can save a bit
      more space by using a transaction flag rather than a standalone
      boolean. Drop the ->dop_low field and replace it with a transaction
      flag that is set at the same points, carried across rolling
      transactions and cleared on completion of deferred operations. This
      essentially emulates the behavior of ->dop_low and so should not
      change behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1214f1cf
  6. 27 7月, 2018 1 次提交
  7. 12 7月, 2018 5 次提交
  8. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  9. 10 5月, 2018 6 次提交
  10. 10 4月, 2018 1 次提交
    • C
      xfs: remove filestream item xfs_inode reference · 7fcd3efa
      Christoph Hellwig 提交于
      The filestreams allocator stores an xfs_fstrm_item structure in the MRU to
      cache inode number to agno mappings for a particular length of time.  Each
      xfs_fstrm_item contains the internal MRU structure, an inode pointer and
      agno value.
      
      The inode pointer stored in the xfs_fstrm_item is not referenced, however,
      which means the inode itself can be removed and reclaimed before the MRU
      item is freed. If this occurs, xfs_fstrm_free_func() can access freed or
      unrelated memory through xfs_fstrm_item->ip and crash.
      
      The obvious solution is to grab an inode reference for xfs_fstrm_item.
      The filestream mechanism only actually uses the inode pointer as a means
      to access the xfs_mount, however.  Rather than add unnecessary
      complexity, simplify the implementation to store an xfs_mount pointer in
      struct xfs_mru_cache, and pass it to the free callback.  This also
      requires updates to the tracepoint class to provide the associated data
      via parameters rather than the inode and a minor hack to peek at the MRU
      key to establish the inode number at free time.
      
      Based on debugging work and an earlier patch from Brian Foster, who
      also wrote most of this changelog.
      Reported-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7fcd3efa
  11. 24 3月, 2018 1 次提交
    • B
      xfs: detect agfl count corruption and reset agfl · a27ba260
      Brian Foster 提交于
      The struct xfs_agfl v5 header was originally introduced with
      unexpected padding that caused the AGFL to operate with one less
      slot than intended. The header has since been packed, but the fix
      left an incompatibility for users who upgrade from an old kernel
      with the unpacked header to a newer kernel with the packed header
      while the AGFL happens to wrap around the end. The newer kernel
      recognizes one extra slot at the physical end of the AGFL that the
      previous kernel did not. The new kernel will eventually attempt to
      allocate a block from that slot, which contains invalid data, and
      cause a crash.
      
      This condition can be detected by comparing the active range of the
      AGFL to the count. While this detects a padding mismatch, it can
      also trigger false positives for unrelated flcount corruption. Since
      we cannot distinguish a size mismatch due to padding from unrelated
      corruption, we can't trust the AGFL enough to simply repopulate the
      empty slot.
      
      Instead, avoid unnecessarily complex detection logic and and use a
      solution that can handle any form of flcount corruption that slips
      through read verifiers: distrust the entire AGFL and reset it to an
      empty state. Any valid blocks within the AGFL are intentionally
      leaked. This requires xfs_repair to rectify (which was already
      necessary based on the state the AGFL was found in). The reset
      mitigates the side effect of the padding mismatch problem from a
      filesystem crash to a free space accounting inconsistency. The
      generic approach also means that this patch can be safely backported
      to kernels with or without a packed struct xfs_agfl.
      
      Check the AGF for an invalid freelist count on initial read from
      disk. If detected, set a flag on the xfs_perag to indicate that a
      reset is required before the AGFL can be used. In the first
      transaction that attempts to use a flagged AGFL, reset it to empty,
      warn the user about the inconsistency and allow the freelist fixup
      code to repopulate the AGFL with new blocks. The xfs_perag flag is
      cleared to eliminate the need for repeated checks on each block
      allocation operation.
      
      This allows kernels that include the packing fix commit 96f859d5
      ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")
      to handle older unpacked AGFL formats without a filesystem crash.
      Suggested-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      a27ba260
  12. 13 1月, 2018 2 次提交
  13. 09 1月, 2018 2 次提交
  14. 07 11月, 2017 2 次提交
    • C
      xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Christoph Hellwig 提交于
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementations leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records so that the start
      offset and length which are required for lookups can be retreived with
      simple mask operations.  The inner nodes store a 64-bit key containing
      the start offset in the first half of the node, and the pointers to the
      next lower level in the second half.  In either case we walk the node
      from the beginninig to the end and do a linear search, as that is more
      efficient for the low number of cache lines touched during a search
      (2 for the inner nodes, 4 for the leaf nodes) than a binary search.
      We store termination markers (zero length for the leaf nodes, an
      otherwise impossible high bit for the inner nodes) to terminate the key
      list / records instead of storing a count to use the available cache
      lines as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half and
      half like usual btree implementations we just spill over entries added at
      the very end of the list to a new node on its own.  This means we get a
      100% fill grade for the common cases of bulk insertion when reading an
      inode into memory, and when only sequentially appending to a file.  The
      downside is a slightly higher chance of splits on the first random
      insertions.
      
      Both insert and removal manually recurse into the lower levels, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall depth and with very
      little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block
      and that building the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel based on earlier work from Peter Zijlstra, but has since been
      rewritten beyond recognition.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
    • C
      xfs: introduce the xfs_iext_cursor abstraction · b2b1712a
      Christoph Hellwig 提交于
      Add a new xfs_iext_cursor structure to hide the direct extent map
      index manipulations. In addition to the existing lookup/get/insert/
      remove and update routines new primitives to get the first and last
      extent cursor, as well as moving up and down by one extent are
      provided.  Also new are convenience to increment/decrement the
      cursor and retreive the new extent, as well as to peek into the
      previous/next extent without updating the cursor and last but not
      least a macro to iterate over all extents in a fork.
      
      [darrick: rename for_each_iext to for_each_xfs_iext]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      b2b1712a
  15. 03 11月, 2017 1 次提交
  16. 27 10月, 2017 2 次提交
  17. 02 9月, 2017 2 次提交
  18. 23 8月, 2017 1 次提交
  19. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  20. 19 6月, 2017 2 次提交
    • S
      xfs: remove lsn relevant fields from xfs_trans structure and its users · f990fc5a
      Shan Hai 提交于
      The t_lsn is not used anymore and the t_commit_lsn is used as a tmp
      storage for the checkpoint sequence number only in the current code.
      
      And the start/commit lsn are tracked as a transaction group tag in
      the xfs_cil_ctx instead of a single transaction, so remove them from
      the xfs_trans structure and their users to match with the design.
      Signed-off-by: NShan Hai <shan.hai@oracle.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f990fc5a
    • B
      xfs: push buffer of flush locked dquot to avoid quotacheck deadlock · 7912e7fe
      Brian Foster 提交于
      Reclaim during quotacheck can lead to deadlocks on the dquot flush
      lock:
      
       - Quotacheck populates a local delwri queue with the physical dquot
         buffers.
       - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
         dirties all of the dquots.
       - Reclaim kicks in and attempts to flush a dquot whose buffer is
         already queud on the quotacheck queue. The flush succeeds but
         queueing to the reclaim delwri queue fails as the backing buffer is
         already queued. The flush unlock is now deferred to I/O completion
         of the buffer from the quotacheck queue.
       - The dqadjust bulkstat continues and dirties the recently flushed
         dquot once again.
       - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
         the flush lock to update the backing buffers with the in-core
         recalculated values. It deadlocks on the redirtied dquot as the
         flush lock was already acquired by reclaim, but the buffer resides
         on the local delwri queue which isn't submitted until the end of
         quotacheck.
      
      This is reproduced by running quotacheck on a filesystem with a
      couple million inodes in low memory (512MB-1GB) situations. This is
      a regression as of commit 43ff2122 ("xfs: on-stack delayed write
      buffer lists"), which removed a trylock and buffer I/O submission
      from the quotacheck dquot flush sequence.
      
      Quotacheck first resets and collects the physical dquot buffers in a
      delwri queue. Then, it traverses the filesystem inodes via bulkstat,
      updates the in-core dquots, flushes the corrected dquots to the
      backing buffers and finally submits the delwri queue for I/O. Since
      the backing buffers are queued across the entire quotacheck
      operation, dquot reclaim cannot possibly complete a dquot flush
      before quotacheck completes.
      
      Therefore, quotacheck must submit the buffer for I/O in order to
      cycle the flush lock and flush the dirty in-core dquot to the
      buffer. Add a delwri queue buffer push mechanism to submit an
      individual buffer for I/O without losing the delwri queue status and
      use it from quotacheck to avoid the deadlock. This restores
      quotacheck behavior to as before the regression was introduced.
      Reported-by: NMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7912e7fe
  21. 26 4月, 2017 2 次提交