1. 09 Jan 2018, 6 commits
  2. 07 Nov 2017, 2 commits
    • xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Authored by Christoph Hellwig
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementation leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records, so that the start
      offset and length, which are required for lookups, can be retrieved with
      simple mask operations (a minimal sketch of this encoding follows the
      entry below).  The inner nodes store a 64-bit key containing the start
      offset in the first half of the node, and the pointers to the next lower
      level in the second half.  In either case we walk the node from the
      beginning to the end and do a linear search, as that is more efficient
      than a binary search given the low number of cache lines touched during
      a search (2 for the inner nodes, 4 for the leaf nodes).  We store
      termination markers (a zero length for the leaf nodes, an otherwise
      impossible high bit for the inner nodes) at the end of the key list /
      records instead of storing a count, so that the available cache lines
      are used as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half
      and half, like usual btree implementations, entries added at the very
      end of the list are simply spilled over to a new node of their own.
      This means we get a 100% fill grade for the common cases of bulk
      insertion when reading an inode into memory, and when only sequentially
      appending to a file.  The downside is a slightly higher chance of splits
      on the first random insertions.
      
      Both insertion and removal recurse into the lower levels manually, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall tree depth and with
      very little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block,
      and only then build the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel, which was in turn based on earlier work from Peter Zijlstra, but
      it has since been rewritten beyond recognition.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
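      A minimal standalone sketch of the leaf encoding described above, using
      assumed field widths and names (the actual in-core record layout differs
      in detail); it shows how the start offset and length come out of the two
      u64 words with simple masks, and how a zero length can act as the leaf
      terminator:

      #include <stdbool.h>
      #include <stdint.h>

      #define REC_OFF_BITS  54   /* assumed width of the start offset */
      #define REC_LEN_BITS  21   /* assumed width of the length */
      #define REC_OFF_MASK  ((1ULL << REC_OFF_BITS) - 1)
      #define REC_LEN_MASK  ((1ULL << REC_LEN_BITS) - 1)

      struct iext_rec_sketch {
              uint64_t lo;       /* start offset lives in the low bits */
              uint64_t hi;       /* length lives in the low bits */
      };

      static inline uint64_t rec_startoff(const struct iext_rec_sketch *r)
      {
              return r->lo & REC_OFF_MASK;   /* lookup key: one mask, no unpacking */
      }

      static inline uint64_t rec_len(const struct iext_rec_sketch *r)
      {
              return r->hi & REC_LEN_MASK;
      }

      static inline bool rec_is_terminator(const struct iext_rec_sketch *r)
      {
              return rec_len(r) == 0;        /* zero length marks the end of a leaf */
      }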
    • xfs: allow unaligned extent records in xfs_bmbt_disk_set_all · 135dcc10
      Authored by Christoph Hellwig
      To make life a little simpler, make xfs_bmbt_set_all unaligned-access
      aware so that we can use it directly on the destination buffer (a brief
      standalone sketch of the idea follows this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      135dcc10
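      A standalone sketch of what unaligned-access awareness means here; the
      names are simplified and the byte-swap helper assumes a little-endian
      host, so this is illustration rather than the kernel code:

      #include <stdint.h>
      #include <string.h>

      static inline uint64_t to_be64_sketch(uint64_t x)
      {
              return __builtin_bswap64(x);   /* assumes a little-endian host */
      }

      /* dst may point anywhere inside an on-disk buffer; no 8-byte alignment
       * is required because the words are copied bytewise. */
      static void set_all_unaligned(void *dst, uint64_t l0, uint64_t l1)
      {
              uint64_t be0 = to_be64_sketch(l0);
              uint64_t be1 = to_be64_sketch(l1);

              memcpy(dst, &be0, sizeof(be0));
              memcpy((char *)dst + sizeof(be0), &be1, sizeof(be1));
      }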
  3. 27 Oct 2017, 3 commits
  4. 02 Sep 2017, 1 commit
    • xfs: skip bmbt block ino validation during owner change · 99c794c6
      Authored by Brian Foster
      Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
      owners on v5 (!rmapbt) filesystems. The bmbt scan uses
      xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
      current owner of the block against the parent inode of the bmbt.
      This works during extent swap because the bmbt owners are updated to
      the opposite inode number before the inode extent forks are swapped.
      
      The modified bmbt blocks are marked as ordered buffers, which allows
      everything to commit in a single transaction.  If the transaction
      commits to the log and the system crashes such that recovery of the
      extent swap is required, log recovery restarts the bmbt scan to fix
      up any bmbt blocks that may not have been written back before the
      crash.  The log recovery bmbt scan occurs after the inode forks have
      been swapped, however.  This causes the bmbt block owner verification
      to fail, which leads to log recovery failure and requires xfs_repair
      to zap the log in order to recover.
      
      Define a new invalid inode owner flag to inform the btree block
      lookup mechanism that the current inode may be invalid with respect
      to the current owner of the bmbt block.  Set this flag on the cursor
      used for change-owner scans so that the operation works both at
      runtime and during log recovery (a rough standalone sketch follows
      this entry).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Fixes: bb3be7e7 ("xfs: check for bogus values in btree block headers")
      Cc: stable@vger.kernel.org
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      99c794c6
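      A rough standalone sketch of the idea, with hypothetical names rather
      than the kernel's structures: a per-cursor flag that tells the block
      lookup code to skip the owner comparison while a change-owner scan is
      in progress:

      #include <stdbool.h>
      #include <stdint.h>

      #define CUR_INVALID_OWNER   (1u << 0)   /* assumed flag bit */

      struct owner_cur_sketch {
              uint32_t flags;
              uint64_t owner_ino;   /* inode expected to own the bmbt blocks */
      };

      static bool owner_check(const struct owner_cur_sketch *cur, uint64_t block_owner)
      {
              /* During extent-swap log recovery the forks are already swapped,
               * so the owner recorded in the block may legitimately be the
               * other inode; with the flag set we simply do not verify it. */
              if (cur->flags & CUR_INVALID_OWNER)
                      return true;
              return block_owner == cur->owner_ino;
      }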
  5. 20 Jun 2017, 3 commits
  6. 26 Apr 2017, 2 commits
  7. 09 Mar 2017, 1 commit
    • xfs: try any AG when allocating the first btree block when reflinking · 2fcc319d
      Authored by Christoph Hellwig
      When a reflink operation causes the bmap code to allocate a btree block
      we're currently doing single-AG allocations due to having ->firstblock
      set, and then try any higher AG due to a little reflink quirk we put in
      when adding the reflink code.  But given that we do not have a minleft
      reservation of any kind in this AG, we can still end up with no space in
      the same or a higher AG even if the file system has enough free space.
      To fix this, use an XFS_ALLOCTYPE_FIRST_AG allocation in this fallback
      path instead (sketched after this entry).
      
      [And yes, we need to redo this properly instead of piling hacks over
       hacks.  I'm working on that, but it's not going to be a small series.
       In the meantime this fixes the customer-reported issue.]
      
      Also add a warning for failing allocations to make it easier to debug.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      2fcc319d
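      A standalone control-flow sketch of the fallback, with a toy allocator
      and hypothetical names standing in for the real allocation-type based
      call:

      #include <stdint.h>
      #include <stdio.h>

      #define NO_BLOCK ((uint64_t)-1)

      enum alloc_scope { SCOPE_SAME_AG, SCOPE_ANY_AG };  /* stand-ins for the allocation types */

      /* Toy allocator: pretend the current AG is full but another AG has space. */
      static uint64_t toy_alloc(enum alloc_scope scope)
      {
              return scope == SCOPE_SAME_AG ? NO_BLOCK : 12345;
      }

      static uint64_t alloc_btree_block(void)
      {
              uint64_t blk = toy_alloc(SCOPE_SAME_AG);   /* constrained by ->firstblock */

              if (blk == NO_BLOCK)
                      blk = toy_alloc(SCOPE_ANY_AG);     /* the new fallback path */
              if (blk == NO_BLOCK)
                      fprintf(stderr, "btree block allocation failed\n");   /* the new warning */
              return blk;
      }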
  8. 31 Jan 2017, 2 commits
  9. 10 Jan 2017, 1 commit
  10. 09 Dec 2016, 1 commit
  11. 05 Dec 2016, 1 commit
    • xfs: make xfs btree stats less huge · 11ef38af
      Authored by Dave Chinner
      Embedding a switch statement in every btree stats inc/add adds a lot
      of code overhead to the core btree infrastructure paths. Stats are
      supposed to be small and lightweight, but the btree stats have
      become big and bloated as we've added more btrees. It needs fixing
      because the reflink code will just add more overhead again.
      
      Convert the v2 btree stats to arrays instead of independent variables,
      and use the btree type to index into the specific btree array via an
      enum.  This allows us to use array-based indexing to update the stats,
      rather than having to dereference variables specific to the btree type.
      
      If we then wrap the xfsstats structure in a union and place a uint32_t
      array beside it, and calculate the correct btree stats base array index
      when creating a btree cursor, we can easily access entries in the stats
      structure without having to switch names based on the btree type.
      
      We then replace the switch statement with a simple set of stats wrapper
      macros (a small standalone sketch of the arrangement follows this
      entry), resulting in a significant simplification of the btree stats
      code, and:
      
         text	   data	    bss	    dec	    hex	filename
        48905	    144	      8	  49057	   bfa1	fs/xfs/libxfs/xfs_btree.o.old
        36793	    144	      8	  36945	   9051	fs/xfs/libxfs/xfs_btree.o
      
      it reduces the core btree infrastructure code size by close to 25%!
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      11ef38af
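      A small standalone sketch of the arrangement described above, with
      hypothetical names: per-btree stats live in arrays, a union exposes the
      same memory as a flat uint32_t array, and a base index stored in the
      cursor plus a macro replaces the old switch statement:

      #include <stdint.h>

      enum { STAT_LOOKUP, STAT_COMPARE, STAT_INSREC, STAT_MAX };  /* counters per btree */
      enum { BTREE_BNO, BTREE_CNT, BTREE_BMBT, BTREE_TYPES };     /* btree kinds */

      union stats_sketch {
              struct {
                      uint32_t btree[BTREE_TYPES][STAT_MAX];
              } s;
              uint32_t a[BTREE_TYPES * STAT_MAX];   /* flat view used for index math */
      };

      static union stats_sketch stats;

      struct cur_sketch {
              unsigned statoff;   /* base index, computed once at cursor creation */
      };

      /* One macro instead of a switch on the btree type in every hot path. */
      #define STATS_INC_SKETCH(cur, which)  (stats.a[(cur)->statoff + (which)]++)

      static void record_lookup(struct cur_sketch *cur)
      {
              STATS_INC_SKETCH(cur, STAT_LOOKUP);
      }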
  12. 06 Oct 2016, 1 commit
    • xfs: try other AGs to allocate a BMBT block · 90e2056d
      Authored by Darrick J. Wong
      Prior to the introduction of reflink, allocating a block and mapping
      it into a file was performed in a single transaction with a single
      block reservation, and the allocator was supposed to find enough
      blocks to allocate the extent and any BMBT blocks that might be
      necessary (unless we're low on space).
      
      However, due to the way copy on write works, allocation and mapping
      have been split into two transactions, which means that we must be
      able to handle the case where we allocate an extent for CoW but that
      AG runs out of free space before the blocks can be mapped into a file,
      and the mapping requires a new BMBT block.  When this happens, look in
      one of the other AGs for a BMBT block instead of taking the FS down
      (see the sketch after this entry).

      The same applies to the functions that convert a data fork to extents
      and later to btree format.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      90e2056d
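      A compact standalone sketch of "look in one of the other AGs", with
      hypothetical names and a toy per-AG allocator:

      #include <stdint.h>

      #define NO_BLOCK ((uint64_t)-1)

      /* Toy per-AG allocator: pretend only AG 2 has free space left. */
      static uint64_t try_alloc_in_ag(uint32_t agno)
      {
              return agno == 2 ? 777 : NO_BLOCK;
      }

      static uint64_t alloc_bmbt_block(uint32_t start_agno, uint32_t agcount)
      {
              uint64_t blk = try_alloc_in_ag(start_agno);   /* AG holding the CoW extent */
              uint32_t agno;

              for (agno = 0; agno < agcount && blk == NO_BLOCK; agno++) {
                      if (agno == start_agno)
                              continue;                     /* already tried */
                      blk = try_alloc_in_ag(agno);          /* any other AG beats shutting down */
              }
              return blk;
      }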
  13. 05 Oct 2016, 1 commit
  14. 03 Aug 2016, 6 commits
  15. 21 Jun 2016, 1 commit
  16. 02 Mar 2016, 1 commit
  17. 08 Feb 2016, 1 commit
  18. 04 Jan 2016, 1 commit
  19. 29 Jul 2015, 1 commit
    • xfs: create new metadata UUID field and incompat flag · ce748eaa
      Authored by Eric Sandeen
      This adds a new superblock field, sb_meta_uuid.  If it is set, along
      with a new incompat flag, the code will use that field on a V5
      filesystem to compare against metadata UUIDs, which allows us to change
      the user-visible UUID at will.  Userspace handles the setting and
      clearing of the incompat flag as appropriate as the UUID gets changed;
      i.e. setting the user-visible UUID back to the original UUID (as stored
      in the new field) will remove the incompatible feature flag.
      
      If the incompat flag is not set, the user-visible UUID is copied into
      the meta_uuid slot in memory when the superblock is read from disk;
      the meta_uuid field is not written back to disk in this case (a
      standalone sketch of this behaviour follows the entry).
      
      The remainder of this patch simply switches verifiers, initializers,
      etc to use the new sb_meta_uuid field.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
      ce748eaa
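      A standalone sketch of the read-from-disk behaviour described above;
      the types and the incompat flag bit are assumptions for illustration,
      while sb_meta_uuid itself is the field the patch introduces:

      #include <stdint.h>
      #include <string.h>

      typedef struct { uint8_t b[16]; } sketch_uuid_t;

      #define FEAT_INCOMPAT_META_UUID  (1u << 2)   /* assumed bit, for illustration only */

      struct sb_sketch {
              sketch_uuid_t sb_uuid;               /* user-visible UUID */
              sketch_uuid_t sb_meta_uuid;          /* UUID compared against metadata */
              uint32_t      sb_features_incompat;
      };

      static void sb_from_disk_sketch(struct sb_sketch *sbp)
      {
              /* Without the incompat flag the two UUIDs are identical by
               * definition, so mirror the user-visible UUID into the in-memory
               * meta UUID; it is never written back to disk in this case. */
              if (!(sbp->sb_features_incompat & FEAT_INCOMPAT_META_UUID))
                      memcpy(&sbp->sb_meta_uuid, &sbp->sb_uuid, sizeof(sbp->sb_meta_uuid));
      }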
  20. 28 Nov 2014, 3 commits
  21. 30 Jul 2014, 1 commit