1. 03 8月, 2016 13 次提交
  2. 22 7月, 2016 1 次提交
    • D
      libxfs: directory node splitting does not have an extra block · 160ae76f
      Dave Chinner 提交于
      xfsprogs source commit 4280e59dcbc4cd8e01585efe788a68eb378048e8
      
      xfs_da3_split() has to handle all three versions of the
      directory/attribute btree structure. The attr tree is v1, the dir
      tre is v2 or v3. The main difference between the v1 and v2/3 trees
      is the way tree nodes are split - in the v1 tree we can require a
      double split to occur because the object to be inserted may be
      larger than the space made by splitting a leaf. In this case we need
      to do a double split - one to split the full leaf, then another to
      allocate an empty leaf block in the correct location for the new
      entry.  This does not happen with dir (v2/v3) formats as the objects
      being inserted are always guaranteed to fit into the new space in
      the split blocks.
      
      Indeed, for directories they *may* be an extra block on this buffer
      pointer. However, it's guaranteed not to be a leaf block (i.e. a
      directory data block) - the directory code only ever places hash
      index or free space blocks in this pointer (as a cursor of
      sorts), and so to use it as a directory data block will immediately
      corrupt the directory.
      
      The problem is that the code assumes that there may be extra blocks
      that we need to link into the tree once we've split the root, but
      this is not true for either dir or attr trees, because the extra
      attr block is always consumed by the last node split before we split
      the root. Hence the linking in an extra block is always wrong at the
      root split level, and this manifests itself in repair as a directory
      corruption in a repaired directory, leaving the directory rebuild
      incomplete.
      
      This is a dir v2 zero-day bug - it was in the initial dir v2 commit
      that was made back in February 1998.
      
      Fix this by ensuring the linking of the blocks after the root split
      never tries to make use of the extra blocks that may be held in the
      cursor. They are held there for other purposes and should never be
      touched by the root splitting code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      160ae76f
  3. 20 7月, 2016 5 次提交
  4. 21 6月, 2016 4 次提交
  5. 01 6月, 2016 2 次提交
  6. 18 5月, 2016 1 次提交
    • A
      xfs: optimise xfs_iext_destroy · 32b43ab6
      Alex Lyakas 提交于
      When unmounting XFS, we call:
      
      xfs_inode_free => xfs_idestroy_fork => xfs_iext_destroy
      
      This goes over the whole indirection array and calls
      xfs_iext_irec_remove for each one of the erps (from the last one to
      the first one). As a result, we keep shrinking (reallocating
      actually) the indirection array until we shrink out all of its
      elements. When we have files with huge numbers of extents, umount
      takes 30-80 sec, depending on the amount of files that XFS loaded
      and the amount of indirection entries of each file. The unmount
      stack looks like:
      
      [<ffffffffc0b6d200>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
      [<ffffffffc0b6cd8e>] xfs_iext_irec_remove+0xee/0xf0 [xfs]
      [<ffffffffc0b6cdcd>] xfs_iext_destroy+0x3d/0xb0 [xfs]
      [<ffffffffc0b6cef6>] xfs_idestroy_fork+0xb6/0xf0 [xfs]
      [<ffffffffc0b87002>] xfs_inode_free+0xb2/0xc0 [xfs]
      [<ffffffffc0b87260>] xfs_reclaim_inode+0x250/0x340 [xfs]
      [<ffffffffc0b87583>] xfs_reclaim_inodes_ag+0x233/0x370 [xfs]
      [<ffffffffc0b8823d>] xfs_reclaim_inodes+0x1d/0x20 [xfs]
      [<ffffffffc0b96feb>] xfs_unmountfs+0x7b/0x1a0 [xfs]
      [<ffffffffc0b98e4d>] xfs_fs_put_super+0x2d/0x70 [xfs]
      [<ffffffff811e9e36>] generic_shutdown_super+0x76/0x100
      [<ffffffff811ea207>] kill_block_super+0x27/0x70
      [<ffffffff811ea519>] deactivate_locked_super+0x49/0x60
      [<ffffffff811eaaee>] deactivate_super+0x4e/0x70
      [<ffffffff81207593>] cleanup_mnt+0x43/0x90
      [<ffffffff81207632>] __cleanup_mnt+0x12/0x20
      [<ffffffff8108f8e7>] task_work_run+0xa7/0xe0
      [<ffffffff81014ff7>] do_notify_resume+0x97/0xb0
      [<ffffffff81717c6f>] int_signal+0x12/0x17
      
      Further, this reallocation prevents us from freeing the extent list
      from a RCU callback as allocation can block. Hence if the extent
      list is in indirect format, optimise the freeing of the extent list
      to only use kmem_free calls by freeing entire extent buffer pages at
      a time, rather than extent by extent.
      
      [dchinner: simplified freeing loop based on Christoph's suggestion]
      Signed-off-by: NAlex Lyakas <alex@zadarastorage.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      32b43ab6
  7. 06 4月, 2016 5 次提交
  8. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  9. 15 3月, 2016 4 次提交
    • C
      xfs: always set rvalp in xfs_dir2_node_trim_free · 355cced4
      Christoph Hellwig 提交于
      xfs_dir2_node_trim_free can return with setting the rvalp argument
      pointer.  Initialize it to 0 at the beginning of the function and
      only update it to 1 if we succeeded trimming a freespace block.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      355cced4
    • B
      xfs: borrow indirect blocks from freed extent when available · d34999c9
      Brian Foster 提交于
      xfs_bmap_del_extent() handles extent removal from the in-core and
      on-disk extent lists. When removing a delalloc range, it updates the
      indirect block reservation appropriately based on the removal. It
      currently enforces that the new indirect block reservation is less than
      or equal to the original. This is normally the case in all situations
      except for in certain cases when the removed range creates a hole in a
      single delalloc extent, thus splitting a single delalloc extent in two.
      
      It is possible with small enough extents to split an indlen==1 extent
      into two such slightly smaller extents. This leaves one extent with 0
      indirect blocks and leads to assert failures in other areas (e.g.,
      xfs_bunmapi() if the extent happens to be removed).
      
      Update the indlen distribution code to steal blocks from the deleted
      extent, if necessary, to satisfy the worst case total indirect
      reservation for the new extents. This is safe as the caller does not
      update the fdblocks counters until the extent is removed. Blocks stolen
      in this manner simply remain accounted as allocated, having ownership
      transferred from the data extent to an indirect reservation.
      
      As a precaution, fall back to the original reservation algorithm if the
      new indlen requirement is not met and warn if we end up with extents
      without any reservation at all to detect this more easily in the future.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      d34999c9
    • B
      xfs: refactor delalloc indlen reservation split into helper · a9bd24ac
      Brian Foster 提交于
      The delayed allocation indirect reservation splitting code is not
      sufficient in some cases where a delalloc extent is split in two. In
      preparation for enhancements to this code, refactor the current indlen
      distribution algorithm into a new helper function.
      
      [dchinner: rename temp, temp2 variables]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      a9bd24ac
    • B
      xfs: update freeblocks counter after extent deletion · b2706a05
      Brian Foster 提交于
      xfs_bunmapi() currently updates the fdblocks counter, unreserves quota,
      etc. before the extent is deleted by xfs_bmap_del_extent(). The function
      has problems dividing up the indirect reserved blocks for scenarios
      where a single delalloc extent is split in two. Particularly, there
      aren't always enough blocks reserved for multiple extents in a single
      extent reservation.
      
      The solution to this problem is to allow the extent removal code to
      steal from the deleted extent to meet indirect reservation requirements.
      Move the block of code in xfs_bmapi() that updates the fdblocks counter
      to after the call to xfs_bmap_del_extent() to allow the codepath to
      update the extent record before the free blocks are accounted. Also,
      reshuffle the code slightly so the delalloc accounting occurs near the
      xfs_bmap_del_extent() call to provide context for the comments.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      b2706a05
  10. 09 3月, 2016 1 次提交
  11. 07 3月, 2016 1 次提交
  12. 02 3月, 2016 1 次提交
  13. 09 2月, 2016 1 次提交