1. 29 January 2014, 4 commits
    • Btrfs: fix very slow inode eviction and fs unmount · 131e404a
      Filipe David Borba Manana committed
      Inode eviction can be very slow, because during eviction we tell the
      VFS to truncate all of the inode's pages. This results in calls to
      btrfs_invalidatepage(), which in turn calls lock_extent_bits() and
      clear_extent_bit(). These calls cause many merges and splits of
      extent_state structures, which consume a lot of time and CPU when the
      inode has many pages. In some scenarios I have seen umount times higher
      than 15 minutes, even when there is no pending IO (after a btrfs fs
      sync). A simplified sketch of the idea behind the fix follows this entry.
      
      A quick way to reproduce this issue:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ cd /mnt/btrfs
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ time btrfs fi sync .
      FSSync '.'
      
      real	0m25.457s
      user	0m0.000s
      sys	0m0.092s
      $ cd ..
      $ time umount /mnt/btrfs
      
      real	1m38.234s
      user	0m0.000s
      sys	1m25.760s
      
      The same test on ext4 runs much faster:
      
      $ mkfs.ext4 /dev/sdb3
      $ mount /dev/sdb3 /mnt/ext4
      $ cd /mnt/ext4
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ sync
      $ cd ..
      $ time umount /mnt/ext4
      
      real	0m3.626s
      user	0m0.004s
      sys	0m3.012s
      
      After this patch, the unmount (inode evictions) is much faster:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ cd /mnt/btrfs
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ time btrfs fi sync .
      FSSync '.'
      
      real	0m26.774s
      user	0m0.000s
      sys	0m0.084s
      $ cd ..
      $ time umount /mnt/btrfs
      
      real	0m1.811s
      user	0m0.000s
      sys	0m1.564s
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      131e404a
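      The core of the fix is conceptual: drop the extent_state records for the
      whole file in one pass instead of letting each page invalidation lock
      and clear its own small range. The sketch below only illustrates that
      idea under my own assumptions; the helper name and the exact bit mask
      are not taken from the patch.

      /*
       * Simplified illustration, not the real code from 131e404a: clear the
       * io tree once over the whole range before the pages are dropped, so
       * the per-page invalidate callbacks no longer split and re-merge
       * extent_state records.
       */
      static void evict_drop_io_tree_sketch(struct inode *inode)
      {
              struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;

              /* one clear over [0, EOF] instead of one clear per page */
              clear_extent_bit(io_tree, 0, (u64)-1,
                               EXTENT_LOCKED | EXTENT_DIRTY | EXTENT_DELALLOC |
                               EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
                               1, 1, NULL, GFP_NOFS);

              /* the page cache can now be dropped cheaply */
              truncate_inode_pages(&inode->i_data, 0);
      }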
    • btrfs: expand btrfs_find_item() to include find_root_ref functionality · 75ac2dd9
      Kelley Nielsen committed
      This patch is the second step in bootstrapping the btrfs_find_item
      interface. btrfs_find_root_ref() is similar to the former
      __inode_info(): it accepts four of __inode_info()'s parameters and
      duplicates the first half of its functionality.

      Replace the one former call to btrfs_find_root_ref() with a call to
      btrfs_find_item(), passing the key type that btrfs_find_root_ref()
      used internally and a NULL found key. In btrfs_find_item(), add a test
      for the NULL key at the point where the functionality of
      btrfs_find_root_ref() ends, and return there if the test passes.
      Finally, remove btrfs_find_root_ref(). A hedged sketch of the resulting
      call shape follows this entry.
      Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
      Suggested-by: Zach Brown <zab@redhat.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      75ac2dd9
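      For orientation, this is roughly what the new call shape looks like; the
      argument order and the specific key type below are my assumptions, not
      copied from the patch.

      /*
       * Hedged sketch of the interface described above. A caller that only
       * needs the old btrfs_find_root_ref() behaviour passes a NULL found_key:
       */
      ret = btrfs_find_item(tree_root, path, parent_objectid, child_objectid,
                            BTRFS_ROOT_REF_KEY, NULL);

      /*
       * and inside btrfs_find_item(), at the point where the old
       * btrfs_find_root_ref() functionality ends:
       */
      if (!found_key)
              return ret;     /* caller only wanted the search, stop here */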
    • btrfs: remove unused variable from btrfs_new_inode · 99e22f78
      Valentina Giusti committed
      The variable 'owner' in btrfs_new_inode() has been unused since commit
      d82a6f1d ("Btrfs: kill BTRFS_I(inode)->block_group").
      Signed-off-by: Valentina Giusti <valentina.giusti@microon.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      99e22f78
    • Btrfs: incompatible format change to remove hole extents · 16e7549f
      Josef Bacik committed
      Btrfs has always had these filler extent data items for holes in inodes.
      This has made some things very easy, like logging hole punches and
      sending hole punches. However, for large holey files these extent data
      items are pure overhead. So add an incompatible feature that stops
      adding hole extents, to reduce the amount of metadata used by this sort
      of file. This obviously requires a few changes for logging and send,
      since they will need to detect holes and log/send them if there are any.
      I've tested this thoroughly with xfstests and it doesn't cause any
      issues with or without the incompat format set. A rough sketch of the
      incompat gate follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      16e7549f
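      The gate itself is a one-line check on the incompat bit; below is a
      rough sketch of how that looks. The flag name matches the no-holes
      feature this patch introduces, but the surrounding helper and the
      hypothetical insert_hole_extent_item() are illustrative only.

      /*
       * Illustrative only, not the patch's code: skip the filler item when
       * the NO_HOLES incompat bit is set on the filesystem.
       */
      static int maybe_insert_hole_sketch(struct btrfs_trans_handle *trans,
                                          struct btrfs_root *root,
                                          struct inode *inode,
                                          u64 offset, u64 len)
      {
              if (btrfs_fs_incompat(root->fs_info, NO_HOLES))
                      return 0;       /* new format: holes are implicit */

              /* old format: keep inserting the filler extent item */
              return insert_hole_extent_item(trans, root, inode, offset, len);
      }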
  2. 21 November 2013, 2 commits
  3. 12 November 2013, 19 commits
  4. 19 October 2013, 1 commit
  5. 11 October 2013, 1 commit
  6. 21 September 2013, 3 commits
  7. 13 September 2013, 1 commit
  8. 01 September 2013, 9 commits
    • Btrfs: only update disk_i_size as we remove extents · 7f4f6e0a
      Josef Bacik committed
      This fixes a problem where, if a truncate fails, we leave i_size set to
      where we wanted to truncate to instead of where we were actually able to
      truncate to. Fix this by making btrfs_truncate_inode_items do the
      disk_i_size update as it removes extents; that way it is always
      consistent with where its extents are. Then, if the truncate fails at
      all, we can update the in-memory i_size with what we have on disk and
      delete the orphan item. A hedged sketch of this bookkeeping follows this
      entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      7f4f6e0a
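      The bookkeeping described above can be summarised as "record how far the
      deletion loop actually got, then update from that". The fragment below
      is a hedged sketch with made-up variable names, not the patch itself.

      /* start from the size we are aiming for */
      u64 last_size = new_size;

      /* inside the loop that removes extent items past new_size: */
      last_size = found_extent_offset;  /* truncation reached at least here */

      /* after the loop, even if it bailed out early, record what was reached */
      btrfs_ordered_update_i_size(inode, last_size, NULL);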
    • Btrfs: allow partial ordered extent completion · 77cef2ec
      Josef Bacik committed
      We currently have a problem where you can truncate pages that have not
      yet been written for an ordered extent. We do this because the truncate
      will come along behind to clean us up anyway, so what's the harm? Well,
      if the truncate fails for whatever reason, we leave an orphan item
      around for the file to be cleaned up later. But if the user then
      truncates the file back up and tries to read from the area that had been
      discarded previously, they will get a csum error because we never
      actually wrote that data out.
      
      This patch fixes this by allowing us to either discard the ordered
      extent completely, by which I mean we just free up the space we had
      allocated and do not add the file extent, or adjust the length of the
      file extent we write. We do this by recording, in the ordered extent,
      the length we truncated down to, and then setting the file extent length
      and ram bytes to this length. The total disk space stays unchanged,
      since we may be compressed and we can't just chop off the disk space,
      but at least this way the file extent only points to the valid data.
      Then, when the file extent is freed, the extent and csums will be freed
      normally. A hedged sketch of the two adjustments follows this entry.
      
      This patch is needed for the next series which will give us more graceful
      recovery of failed truncates.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      77cef2ec
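      The two adjustments described above boil down to remembering the
      truncated length on the ordered extent and shrinking the logical length
      of the file extent item. This is a hedged sketch; the field and setter
      names follow btrfs conventions, but the surrounding code is omitted and
      may differ from the patch.

      /* remember how far the range was truncated on the ordered extent */
      ordered->truncated_len = min(ordered->truncated_len, new_len);

      /*
       * later, when the file extent item is written: shrink its logical
       * length and ram bytes, but leave disk_num_bytes alone, since
       * compressed space cannot simply be chopped off
       */
      btrfs_set_file_extent_num_bytes(leaf, fi, new_len);
      btrfs_set_file_extent_ram_bytes(leaf, fi, new_len);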
    • Btrfs: do not clear our orphan item runtime flag on eexist · e8e7cff6
      Josef Bacik committed
      We were unconditionally clearing our runtime flag on the inode on error
      when trying to insert an orphan item. This is wrong in the case of
      -EEXIST, since we obviously do have an orphan item. This was causing us
      to not do the correct cleanup of our orphan items, which caused problems
      later on. It happens because, currently, when truncate fails we just
      leave the orphan item in place so it can be cleaned up, so if we go to
      remove the file later we will hit this issue. What we do for truncate
      isn't right either, but we shouldn't screw this sort of thing up on
      error, so fix this here and I'll fix truncate in a different patch. A
      hedged sketch of the corrected control flow follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      e8e7cff6
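      The corrected control flow is small enough to sketch: only clear the
      runtime bit when the failure really means no orphan item exists. This is
      a hedged illustration of the rule, not the patch verbatim.

      ret = btrfs_insert_orphan_item(trans, root, btrfs_ino(inode));
      if (ret && ret != -EEXIST) {
              /* genuine failure: no orphan item exists, clearing is correct */
              clear_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
                        &BTRFS_I(inode)->runtime_flags);
      }
      /* on -EEXIST the item is already there, so the flag must stay set */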
    • Btrfs: Remove superfluous casts from u64 to unsigned long long · c1c9ff7c
      Geert Uytterhoeven committed
      u64 is "unsigned long long" on all architectures now, so there's no need
      to cast it when formatting it using the "ll" length modifier. A minimal
      example follows this entry.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      c1c9ff7c
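      A minimal example of the point being made, using pr_info() (any
      printk-style call behaves the same way):

      static void print_bytenr_example(u64 bytenr)
      {
              pr_info("extent at %llu\n", bytenr);                      /* fine as-is */
              pr_info("extent at %llu\n", (unsigned long long)bytenr);  /* redundant cast */
      }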
    • Btrfs: avoid starting a transaction in the write path · 00361589
      Josef Bacik committed
      I noticed while looking at a deadlock that we always start a transaction
      in cow_file_range(). This isn't really needed, since we only need a
      transaction if we are doing an inline extent or if the allocator needs
      to allocate a chunk. So push the transaction start down closer to where
      we actually need it in all of these cases. This will hopefully reduce
      our write latency when we are committing often. A hedged sketch of the
      restructuring follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      00361589
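      A hedged sketch of the restructuring, not the patch itself: the common
      COW path runs without a transaction, and one is joined only on the
      branch that actually touches metadata. The helper name and the "fits
      inline" check are placeholders.

      static int cow_range_sketch(struct inode *inode, u64 start, u64 end)
      {
              struct btrfs_root *root = BTRFS_I(inode)->root;
              struct btrfs_trans_handle *trans;

              if (end - start + 1 <= 4096) {  /* placeholder for "fits inline" */
                      trans = btrfs_join_transaction(root);
                      if (IS_ERR(trans))
                              return PTR_ERR(trans);
                      /* create the inline extent under the transaction here */
                      return btrfs_end_transaction(trans, root);
              }

              /* plain COW allocation path: no transaction needed up front */
              return 0;
      }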
    • Btrfs: fix the error handling wrt orphan items · 4ef31a45
      Josef Bacik committed
      There are several places where we BUG_ON() if we fail to remove the
      orphan items and such, which is not OK, so remove those and either abort
      or just carry on. This also fixes a problem where, if we couldn't start
      a transaction, we wouldn't actually remove the orphan item reserve for
      the inode. A hedged sketch of the pattern follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      4ef31a45
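      The replacement pattern is simple; this is a hedged sketch (the error
      code and the abort call are the usual ones, but the exact sites differ
      across the patch):

      ret = btrfs_del_orphan_item(trans, root, btrfs_ino(inode));
      if (ret == -ENOENT)
              ret = 0;                /* already gone: just carry on */
      else if (ret)
              btrfs_abort_transaction(trans, root, ret);  /* instead of BUG_ON(ret) */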
    • Btrfs: allow compressed extents to be merged during defragment · 116e0024
      Liu Bo committed
      The rule originally comes from nocow writing, but snapshot-aware defrag
      is a different case: the extent has already been written, and we're not
      going to change the extent but only add a reference to the data.

      So we can allow such compressed extents to be merged into one bigger
      extent if they point to the same data.
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      116e0024
    • Btrfs: fix what bits we clear when erroring out from delalloc · 151a41bc
      Josef Bacik committed
      First of all, we no longer set EXTENT_DIRTY when we dirty an extent, so
      this patch removes the clearing of EXTENT_DIRTY, since it confuses me.
      This patch also clears EXTENT_DEFRAG and does EXTENT_DO_ACCOUNTING when
      we have errors. This is because, if we are clearing delalloc without
      adding an ordered extent, we need to make sure the enospc handling is
      accounted for. Also, if this range was DEFRAG we need to make sure that
      bit is cleared so we don't leak it. A hedged sketch of the error-path
      clear mask follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      151a41bc
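      As a hedged sketch, the error-path clear mask described above looks
      roughly like this; the exact call used in the patch may differ.

      clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
                       EXTENT_LOCKED | EXTENT_DELALLOC |
                       EXTENT_DEFRAG | EXTENT_DO_ACCOUNTING,
                       1, 1, NULL, GFP_NOFS);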
    • Btrfs: cleanup arguments to extent_clear_unlock_delalloc · c2790a2e
      Josef Bacik committed
      This patch removes the io_tree argument from
      extent_clear_unlock_delalloc(), since we always use
      &BTRFS_I(inode)->io_tree, and it separates the extent tree operations
      from the page operations. This way we just pass in the extent bits we
      want to clear and then pass in the operations we want done to the pages.
      This is because I'm going to fix what extent bits we clear in some
      cases, and rather than add a bunch of new flags we'll just use the
      actual extent bits we want to clear. A hedged sketch of the new call
      shape follows this entry. Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      c2790a2e
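      A hedged sketch of the new call shape: the extent bits to clear and the
      page operations are separate arguments, and the io_tree argument is
      gone. The flag names are recalled from the btrfs code of that era and
      may not match the patch exactly.

      extent_clear_unlock_delalloc(inode, start, end, locked_page,
                                   EXTENT_LOCKED | EXTENT_DELALLOC,
                                   PAGE_UNLOCK | PAGE_CLEAR_DIRTY |
                                   PAGE_SET_WRITEBACK | PAGE_END_WRITEBACK);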