1. 07 11月, 2017 6 次提交
    • C
      xfs: remove the nr_extents argument to xfs_iext_insert · 0254c2f2
      Christoph Hellwig 提交于
      We only have two places that insert 2 extents at the same time, so unroll
      the loop there.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0254c2f2
    • C
      xfs: use a b+tree for the in-core extent list · 6bdcf26a
      Christoph Hellwig 提交于
      Replace the current linear list and the indirection array for the in-core
      extent list with a b+tree to avoid the need for larger memory allocations
      for the indirection array when lots of extents are present.  The current
      extent list implementations leads to heavy pressure on the memory
      allocator when modifying files with a high extent count, and can lead
      to high latencies because of that.
      
      The replacement is a b+tree with a few quirks.  The leaf nodes directly
      store the extent record in two u64 values.  The encoding is a little bit
      different from the existing in-core extent records so that the start
      offset and length which are required for lookups can be retreived with
      simple mask operations.  The inner nodes store a 64-bit key containing
      the start offset in the first half of the node, and the pointers to the
      next lower level in the second half.  In either case we walk the node
      from the beginninig to the end and do a linear search, as that is more
      efficient for the low number of cache lines touched during a search
      (2 for the inner nodes, 4 for the leaf nodes) than a binary search.
      We store termination markers (zero length for the leaf nodes, an
      otherwise impossible high bit for the inner nodes) to terminate the key
      list / records instead of storing a count to use the available cache
      lines as efficiently as possible.
      
      One quirk of the algorithm is that while we normally split a node half and
      half like usual btree implementations we just spill over entries added at
      the very end of the list to a new node on its own.  This means we get a
      100% fill grade for the common cases of bulk insertion when reading an
      inode into memory, and when only sequentially appending to a file.  The
      downside is a slightly higher chance of splits on the first random
      insertions.
      
      Both insert and removal manually recurse into the lower levels, but
      the bulk deletion of the whole tree is still implemented as a recursive
      function call, although one limited by the overall depth and with very
      little stack usage in every iteration.
      
      For the first few extents we dynamically grow the list from a single
      extent to the next powers of two until we have a first full leaf block
      and that building the actual tree.
      
      The code started out based on the generic lib/btree.c code from Joern
      Engel based on earlier work from Peter Zijlstra, but has since been
      rewritten beyond recognition.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      6bdcf26a
    • C
      xfs: remove support for inlining data/extents into the inode fork · 43518812
      Christoph Hellwig 提交于
      Supporting a small bit of data inside the inode fork blows up the fork size
      a lot, removing the 32 bytes of inline data halves the effective size of
      the inode fork (and it still has a lot of unused padding left), and the
      performance of a single kmalloc doesn't show up compared to the size to read
      an inode or create one.
      
      It also simplifies the fork management code a lot.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      43518812
    • C
      xfs: introduce the xfs_iext_cursor abstraction · b2b1712a
      Christoph Hellwig 提交于
      Add a new xfs_iext_cursor structure to hide the direct extent map
      index manipulations. In addition to the existing lookup/get/insert/
      remove and update routines new primitives to get the first and last
      extent cursor, as well as moving up and down by one extent are
      provided.  Also new are convenience to increment/decrement the
      cursor and retreive the new extent, as well as to peek into the
      previous/next extent without updating the cursor and last but not
      least a macro to iterate over all extents in a fork.
      
      [darrick: rename for_each_iext to for_each_xfs_iext]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      b2b1712a
    • C
      xfs: iterate over extents in xfs_iextents_copy · 71565f4b
      Christoph Hellwig 提交于
      This actually makes the function very slightly less efficient for now as we
      detour through the expanded irect format between the in-core extent format
      and the on-disk one instead of just endian swapping them.  But with the
      incore extent btree the in-core one will use a different format and the
      representation will be entirely hidden.  It also happens to make the
      function a whole more readable.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      71565f4b
    • C
      xfs: pass an on-disk extent to xfs_bmbt_validate_extent · f36bc228
      Christoph Hellwig 提交于
      This prepares for getting rid of the current in-memory extent format.
      At the end of the series we will change the calling convention again
      to pass the xfs_bmbt_irec structure once it is available everywhere.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f36bc228
  2. 27 10月, 2017 6 次提交
  3. 02 9月, 2017 2 次提交
  4. 26 4月, 2017 1 次提交
    • C
      xfs: simplify validation of the unwritten extent bit · 0c1d9e4a
      Christoph Hellwig 提交于
      XFS only supports the unwritten extent bit in the data fork, and only if
      the file system has a version 5 superblock or the unwritten extent
      feature bit.
      
      We currently have two routines that validate the invariant:
      xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
      and xfs_validate_extent that triggers and assert in debug build.
      
      Both of them iterate over all extents of an inode fork when called,
      which isn't very efficient.
      
      This patch instead adds a new helper that verifies the invariant one
      extent at a time, and calls it from the places where we iterate over
      all extents to converted them from or two the in-memory format.  The
      callers then return -EFSCORRUPTED when reading invalid extents from
      disk, or trigger an assert when writing them to disk.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      0c1d9e4a
  5. 04 4月, 2017 1 次提交
    • D
      xfs: rework the inline directory verifiers · 78420281
      Darrick J. Wong 提交于
      The inline directory verifiers should be called on the inode fork data,
      which means after iformat_local on the read side, and prior to
      ifork_flush on the write side.  This makes the fork verifier more
      consistent with the way buffer verifiers work -- i.e. they will operate
      on the memory buffer that the code will be reading and writing directly.
      
      Furthermore, revise the verifier function to return -EFSCORRUPTED so
      that we don't flood the logs with corruption messages and assert
      notices.  This has been a particular problem with xfs/348, which
      triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
      kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
      that, at least not in a verifier.
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      78420281
  6. 29 3月, 2017 1 次提交
    • D
      xfs: rework the inline directory verifiers · 005c5db8
      Darrick J. Wong 提交于
      The inline directory verifiers should be called on the inode fork data,
      which means after iformat_local on the read side, and prior to
      ifork_flush on the write side.  This makes the fork verifier more
      consistent with the way buffer verifiers work -- i.e. they will operate
      on the memory buffer that the code will be reading and writing directly.
      
      Furthermore, revise the verifier function to return -EFSCORRUPTED so
      that we don't flood the logs with corruption messages and assert
      notices.  This has been a particular problem with xfs/348, which
      triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
      kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
      that, at least not in a verifier.
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      ---
      v2: get the inode d_ops the proper way
      v3: describe the bug that this patch fixes; no code changes
      005c5db8
  7. 15 3月, 2017 1 次提交
  8. 03 2月, 2017 2 次提交
  9. 24 11月, 2016 1 次提交
    • C
      xfs: new inode extent list lookup helpers · 93533c78
      Christoph Hellwig 提交于
      xfs_iext_lookup_extent looks up a single extent at the passed in offset,
      and returns the extent covering the area, or the one behind it in case
      of a hole, as well as the index of the returned extent in arguments,
      as well as a simple bool as return value that is set to false if no
      extent could be found because the offset is behind EOF.  It is a simpler
      replacement for xfs_bmap_search_extent that leaves looking up the rarely
      needed previous extent to the caller and has a nicer calling convention.
      
      xfs_iext_get_extent is a helper for iterating over the extent list,
      it takes an extent index as input, and returns the extent at that index
      in it's expanded form in an argument if it exists.  The actual return
      value is a bool whether the index is valid or not.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      93533c78
  10. 08 11月, 2016 1 次提交
  11. 05 10月, 2016 2 次提交
  12. 18 5月, 2016 1 次提交
    • A
      xfs: optimise xfs_iext_destroy · 32b43ab6
      Alex Lyakas 提交于
      When unmounting XFS, we call:
      
      xfs_inode_free => xfs_idestroy_fork => xfs_iext_destroy
      
      This goes over the whole indirection array and calls
      xfs_iext_irec_remove for each one of the erps (from the last one to
      the first one). As a result, we keep shrinking (reallocating
      actually) the indirection array until we shrink out all of its
      elements. When we have files with huge numbers of extents, umount
      takes 30-80 sec, depending on the amount of files that XFS loaded
      and the amount of indirection entries of each file. The unmount
      stack looks like:
      
      [<ffffffffc0b6d200>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
      [<ffffffffc0b6cd8e>] xfs_iext_irec_remove+0xee/0xf0 [xfs]
      [<ffffffffc0b6cdcd>] xfs_iext_destroy+0x3d/0xb0 [xfs]
      [<ffffffffc0b6cef6>] xfs_idestroy_fork+0xb6/0xf0 [xfs]
      [<ffffffffc0b87002>] xfs_inode_free+0xb2/0xc0 [xfs]
      [<ffffffffc0b87260>] xfs_reclaim_inode+0x250/0x340 [xfs]
      [<ffffffffc0b87583>] xfs_reclaim_inodes_ag+0x233/0x370 [xfs]
      [<ffffffffc0b8823d>] xfs_reclaim_inodes+0x1d/0x20 [xfs]
      [<ffffffffc0b96feb>] xfs_unmountfs+0x7b/0x1a0 [xfs]
      [<ffffffffc0b98e4d>] xfs_fs_put_super+0x2d/0x70 [xfs]
      [<ffffffff811e9e36>] generic_shutdown_super+0x76/0x100
      [<ffffffff811ea207>] kill_block_super+0x27/0x70
      [<ffffffff811ea519>] deactivate_locked_super+0x49/0x60
      [<ffffffff811eaaee>] deactivate_super+0x4e/0x70
      [<ffffffff81207593>] cleanup_mnt+0x43/0x90
      [<ffffffff81207632>] __cleanup_mnt+0x12/0x20
      [<ffffffff8108f8e7>] task_work_run+0xa7/0xe0
      [<ffffffff81014ff7>] do_notify_resume+0x97/0xb0
      [<ffffffff81717c6f>] int_signal+0x12/0x17
      
      Further, this reallocation prevents us from freeing the extent list
      from a RCU callback as allocation can block. Hence if the extent
      list is in indirect format, optimise the freeing of the extent list
      to only use kmem_free calls by freeing entire extent buffer pages at
      a time, rather than extent by extent.
      
      [dchinner: simplified freeing loop based on Christoph's suggestion]
      Signed-off-by: NAlex Lyakas <alex@zadarastorage.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      32b43ab6
  13. 06 4月, 2016 3 次提交
  14. 09 2月, 2016 1 次提交
  15. 08 2月, 2016 1 次提交
  16. 28 11月, 2014 4 次提交
  17. 30 7月, 2014 1 次提交
  18. 25 6月, 2014 2 次提交
  19. 22 6月, 2014 2 次提交
  20. 14 4月, 2014 1 次提交