1. 16 5月, 2018 1 次提交
  2. 07 11月, 2017 1 次提交
    • C
      xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Christoph Hellwig 提交于
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementations leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records so that the start
      offset and length which are required for lookups can be retreived with
      simple mask operations.  The inner nodes store a 64-bit key containing
      the start offset in the first half of the node, and the pointers to the
      next lower level in the second half.  In either case we walk the node
      from the beginninig to the end and do a linear search, as that is more
      efficient for the low number of cache lines touched during a search
      (2 for the inner nodes, 4 for the leaf nodes) than a binary search.
      We store termination markers (zero length for the leaf nodes, an
      otherwise impossible high bit for the inner nodes) to terminate the key
      list / records instead of storing a count to use the available cache
      lines as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half and
      half like usual btree implementations we just spill over entries added at
      the very end of the list to a new node on its own.  This means we get a
      100% fill grade for the common cases of bulk insertion when reading an
      inode into memory, and when only sequentially appending to a file.  The
      downside is a slightly higher chance of splits on the first random
      insertions.
      
      Both insert and removal manually recurse into the lower levels, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall depth and with very
      little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block
      and that building the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel based on earlier work from Peter Zijlstra, but has since been
      rewritten beyond recognition.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
  3. 27 10月, 2017 17 次提交
  4. 05 6月, 2017 1 次提交
  5. 04 4月, 2017 1 次提交
  6. 05 10月, 2016 3 次提交
  7. 04 10月, 2016 4 次提交
  8. 19 9月, 2016 1 次提交
    • D
      xfs: set up per-AG free space reservations · 3fd129b6
      Darrick J. Wong 提交于
      One unfortunate quirk of the reference count and reverse mapping
      btrees -- they can expand in size when blocks are written to *other*
      allocation groups if, say, one large extent becomes a lot of tiny
      extents.  Since we don't want to start throwing errors in the middle
      of CoWing, we need to reserve some blocks to handle future expansion.
      The transaction block reservation counters aren't sufficient here
      because we have to have a reserve of blocks in every AG, not just
      somewhere in the filesystem.
      
      Therefore, create two per-AG block reservation pools.  One feeds the
      AGFL so that rmapbt expansion always succeeds, and the other feeds all
      other metadata so that refcountbt expansion never fails.
      
      Use the count of how many reserved blocks we need to have on hand to
      create a virtual reservation in the AG.  Through selective clamping of
      the maximum length of allocation requests and of the length of the
      longest free extent, we can make it look like there's less free space
      in the AG unless the reservation owner is asking for blocks.
      
      In other words, play some accounting tricks in-core to make sure that
      we always have blocks available.  On the plus side, there's nothing to
      clean up if we crash, which is contrast to the strategy that the rough
      draft used (actually removing extents from the freespace btrees).
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      3fd129b6
  9. 03 8月, 2016 5 次提交
  10. 16 7月, 2016 1 次提交
  11. 18 3月, 2016 2 次提交
  12. 19 10月, 2015 1 次提交
  13. 29 7月, 2015 1 次提交
  14. 16 2月, 2015 1 次提交