1. 07 7月, 2020 2 次提交
    • D
      xfs: add an inode item lock · 1319ebef
      Dave Chinner 提交于
      The inode log item is kind of special in that it can be aggregating
      new changes in memory at the same time time existing changes are
      being written back to disk. This means there are fields in the log
      item that are accessed concurrently from contexts that don't share
      any locking at all.
      
      e.g. updating ili_last_fields occurs at flush time under the
      ILOCK_EXCL and flush lock at flush time, under the flush lock at IO
      completion time, and is read under the ILOCK_EXCL when the inode is
      logged.  Hence there is no actual serialisation between reading the
      field during logging of the inode in transactions vs clearing the
      field in IO completion.
      
      We currently get away with this by the fact that we are only
      clearing fields in IO completion, and nothing bad happens if we
      accidentally log more of the inode than we actually modify. Worst
      case is we consume a tiny bit more memory and log bandwidth.
      
      However, if we want to do more complex state manipulations on the
      log item that requires updates at all three of these potential
      locations, we need to have some mechanism of serialising those
      operations. To do this, introduce a spinlock into the log item to
      serialise internal state.
      
      This could be done via the xfs_inode i_flags_lock, but this then
      leads to potential lock inversion issues where inode flag updates
      need to occur inside locks that best nest inside the inode log item
      locks (e.g. marking inodes stale during inode cluster freeing).
      Using a separate spinlock avoids these sorts of problems and
      simplifies future code.
      
      This does not touch the use of ili_fields in the item formatting
      code - that is entirely protected by the ILOCK_EXCL at this point in
      time, so it remains untouched.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1319ebef
    • D
      xfs: remove logged flag from inode log item · 1dfde687
      Dave Chinner 提交于
      This was used to track if the item had logged fields being flushed
      to disk. We log everything in the inode these days, so this logic is
      no longer needed. Remove it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1dfde687
  2. 20 5月, 2020 2 次提交
  3. 07 5月, 2020 4 次提交
  4. 05 5月, 2020 1 次提交
  5. 29 3月, 2020 1 次提交
  6. 27 3月, 2020 2 次提交
  7. 19 3月, 2020 2 次提交
  8. 03 3月, 2020 2 次提交
  9. 19 11月, 2019 1 次提交
  10. 14 11月, 2019 2 次提交
  11. 05 11月, 2019 1 次提交
  12. 27 8月, 2019 1 次提交
  13. 29 6月, 2019 4 次提交
  14. 30 7月, 2018 1 次提交
  15. 07 6月, 2018 1 次提交
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner 提交于
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  16. 10 5月, 2018 1 次提交
  17. 15 3月, 2018 2 次提交
  18. 12 3月, 2018 1 次提交
  19. 29 1月, 2018 3 次提交
  20. 07 11月, 2017 1 次提交
    • C
      xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Christoph Hellwig 提交于
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementations leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records so that the start
      offset and length which are required for lookups can be retreived with
      simple mask operations.  The inner nodes store a 64-bit key containing
      the start offset in the first half of the node, and the pointers to the
      next lower level in the second half.  In either case we walk the node
      from the beginninig to the end and do a linear search, as that is more
      efficient for the low number of cache lines touched during a search
      (2 for the inner nodes, 4 for the leaf nodes) than a binary search.
      We store termination markers (zero length for the leaf nodes, an
      otherwise impossible high bit for the inner nodes) to terminate the key
      list / records instead of storing a count to use the available cache
      lines as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half and
      half like usual btree implementations we just spill over entries added at
      the very end of the list to a new node on its own.  This means we get a
      100% fill grade for the common cases of bulk insertion when reading an
      inode into memory, and when only sequentially appending to a file.  The
      downside is a slightly higher chance of splits on the first random
      insertions.
      
      Both insert and removal manually recurse into the lower levels, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall depth and with very
      little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block
      and that building the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel based on earlier work from Peter Zijlstra, but has since been
      rewritten beyond recognition.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
  21. 27 10月, 2017 2 次提交
  22. 12 10月, 2017 1 次提交
  23. 27 9月, 2017 1 次提交
  24. 23 8月, 2017 1 次提交
    • C
      xfs: Properly retry failed inode items in case of error during buffer writeback · d3a304b6
      Carlos Maiolino 提交于
      When a buffer has been failed during writeback, the inode items into it
      are kept flush locked, and are never resubmitted due the flush lock, so,
      if any buffer fails to be written, the items in AIL are never written to
      disk and never unlocked.
      
      This causes unmount operation to hang due these items flush locked in AIL,
      but this also causes the items in AIL to never be written back, even when
      the IO device comes back to normal.
      
      I've been testing this patch with a DM-thin device, creating a
      filesystem larger than the real device.
      
      When writing enough data to fill the DM-thin device, XFS receives ENOSPC
      errors from the device, and keep spinning on xfsaild (when 'retry
      forever' configuration is set).
      
      At this point, the filesystem can not be unmounted because of the flush locked
      items in AIL, but worse, the items in AIL are never retried at all
      (once xfs_inode_item_push() will skip the items that are flush locked),
      even if the underlying DM-thin device is expanded to the proper size.
      
      This patch fixes both cases, retrying any item that has been failed
      previously, using the infra-structure provided by the previous patch.
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d3a304b6