1. 29 1月, 2014 12 次提交
    • F
      Btrfs: faster file extent item replace operations · 1acae57b
      Filipe David Borba Manana 提交于
      When writing to a file we drop existing file extent items that cover the
      write range and then add a new file extent item that represents that write
      range.
      
      Before this change we were doing a tree lookup to remove the file extent
      items, and then after we did another tree lookup to insert the new file
      extent item.
      Most of the time all the file extent items we need to drop are located
      within a single leaf - this is the leaf where our new file extent item ends
      up at. Therefore, in this common case just combine these 2 operations into
      a single one.
      
      By avoiding the second btree navigation for insertion of the new file extent
      item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf
      COW operations, CPU time on btree node/leaf key binary searches, etc.
      
      Besides for file writes, this is an operation that happens for file fsync's
      as well. However log btrees are much less likely to big as big as regular
      fs btrees, therefore the impact of this change is smaller.
      
      The following benchmark was performed against an SSD drive and a
      HDD drive, both for random and sequential writes:
      
        sysbench --test=fileio --file-num=4096 --file-total-size=8G \
           --file-test-mode=[rndwr|seqwr] --num-threads=512 \
           --file-block-size=8192 \ --max-requests=1000000 \
           --file-fsync-freq=0 --file-io-mode=sync [prepare|run]
      
      All results below are averages of 10 runs of the respective test.
      
      ** SSD sequential writes
      
      Before this change: 225.88 Mb/sec
      After this change:  277.26 Mb/sec
      
      ** SSD random writes
      
      Before this change: 49.91 Mb/sec
      After this change:  56.39 Mb/sec
      
      ** HDD sequential writes
      
      Before this change: 68.53 Mb/sec
      After this change:  69.87 Mb/sec
      
      ** HDD random writes
      
      Before this change: 13.04 Mb/sec
      After this change:  14.39 Mb/sec
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1acae57b
    • M
      Btrfs: fix the wrong nocow range check · e77751aa
      Miao Xie 提交于
      The following warning message was outputed when running the 274th case
      of xfstests with nodatacow option:
       BUG: Bad page state in process kswapd0  pfn:1c66f
       page:ffffea0000636848 count:0 mapcount:0 mapping:(null) index:0x78000
       page flags: 0x1000000000100a(error|uptodate|private_2)
      
      It is because the check of nocow range was wrong, we should compare the
      start and end position of the extent with the write position to verify
      if the write position was in the extent, but the current code just used
      the start postion to do the check, so we got the wrong extent and told
      the caller that it was a nocow write. And then when we write back the
      dirty pages, we found we should cow the extent, but at that time, there
      was no space in the fs, we had to the error flag for the page. When
      someone reclaimed that page, the above warning outputed. Fix it.
      Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e77751aa
    • F
      Btrfs: reduce btree node locking duration on item update · eb653de1
      Filipe David Borba Manana 提交于
      If we do a btree search with the goal of updating an existing item
      without changing its size (ins_len == 0 and cow == 1), then we never
      need to hold locks on upper level nodes (even when slot == 0) after we
      COW their child nodes/leaves, as we won't have node splits or merges
      in this scenario (that is, no key additions, removals or shifts on any
      nodes or leaves).
      
      Therefore release the locks immediately after COWing the child nodes/leaves
      while navigating the btree, even if their parent slot is 0, instead of
      returning a path to the caller with those nodes locked, which would get
      released only when the caller releases or frees the path (or if it calls
      btrfs_unlock_up_safe).
      
      This is a common scenario, for example when updating inode items in fs
      trees and block group items in the extent tree.
      
      The following benchmarks were performed on a quad core machine with 32Gb
      of ram, using a leaf/node size of 4Kb (to generate deeper fs trees more
      quickly).
      
        sysbench --test=fileio --file-num=131072 --file-total-size=8G \
          --file-test-mode=seqwr --num-threads=512 --file-block-size=8192 \
          --max-requests=100000 --file-io-mode=sync [prepare|run]
      
      Before this change:  49.85Mb/s (average of 5 runs)
      After this change:   50.38Mb/s (average of 5 runs)
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      eb653de1
    • M
      Btrfs: introduce the delayed inode ref deletion for the single link inode · 67de1176
      Miao Xie 提交于
      The inode reference item is close to inode item, so we insert it simultaneously
      with the inode item insertion when we create a file/directory.. In fact, we also
      can handle the inode reference deletion by the same way. So we made this patch to
      introduce the delayed inode reference deletion for the single link inode(At most
      case, the file doesn't has hard link, so we don't take the hard link into account).
      
      This function is based on the delayed inode mechanism. After applying this patch,
      we can reduce the time of the file/directory deletion by ~10%.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      67de1176
    • F
      Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton 提交于
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: NFrank Holton <fholton@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      efe120a0
    • W
      Btrfs: fix a warning when iput a file · 180589ef
      Wang Shilong 提交于
      See the warning below:
      
      [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
      [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
      [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
      [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
      [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
      [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
      [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
      
      clear extent bit here to avoid triggering WARN_ON() in remove_extent_mapping()
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      180589ef
    • W
      Btrfs: remove dead comments for read_csums() · 663df053
      Wang Shilong 提交于
      Chris introduced hleper function  read_csums() and this function
      has been removed, but we forgot to remove its corresponding comments.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      663df053
    • T
      Btrfs: fix error check of btrfs_lookup_dentry() · 5662344b
      Tsutomu Itoh 提交于
      Clean up btrfs_lookup_dentry() to never return NULL, but PTR_ERR(-ENOENT)
      instead. This keeps the return value convention consistent.
      
      Callers who use btrfs_lookup_dentry() require a trivial update.
      
      create_snapshot() in particular looks like it can also lose a BUG_ON(!inode)
      which is not really needed - there seems less harm in returning ENOENT to
      userspace at that point in the stack than there is to crash the machine.
      Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5662344b
    • F
      Btrfs: fix very slow inode eviction and fs unmount · 131e404a
      Filipe David Borba Manana 提交于
      The inode eviction can be very slow, because during eviction we
      tell the VFS to truncate all of the inode's pages. This results
      in calls to btrfs_invalidatepage() which in turn does calls to
      lock_extent_bits() and clear_extent_bit(). These calls result in
      too many merges and splits of extent_state structures, which
      consume a lot of time and cpu when the inode has many pages. In
      some scenarios I have experienced umount times higher than 15
      minutes, even when there's no pending IO (after a btrfs fs sync).
      
      A quick way to reproduce this issue:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ cd /mnt/btrfs
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ time btrfs fi sync .
      FSSync '.'
      
      real	0m25.457s
      user	0m0.000s
      sys	0m0.092s
      $ cd ..
      $ time umount /mnt/btrfs
      
      real	1m38.234s
      user	0m0.000s
      sys	1m25.760s
      
      The same test on ext4 runs much faster:
      
      $ mkfs.ext4 /dev/sdb3
      $ mount /dev/sdb3 /mnt/ext4
      $ cd /mnt/ext4
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ sync
      $ cd ..
      $ time umount /mnt/ext4
      
      real	0m3.626s
      user	0m0.004s
      sys	0m3.012s
      
      After this patch, the unmount (inode evictions) is much faster:
      
      $ mkfs.btrfs -f /dev/sdb3
      $ mount /dev/sdb3 /mnt/btrfs
      $ cd /mnt/btrfs
      $ sysbench --test=fileio --file-num=128 --file-total-size=16G \
          --file-test-mode=seqwr --num-threads=128 \
          --file-block-size=16384 --max-time=60 --max-requests=0 run
      $ time btrfs fi sync .
      FSSync '.'
      
      real	0m26.774s
      user	0m0.000s
      sys	0m0.084s
      $ cd ..
      $ time umount /mnt/btrfs
      
      real	0m1.811s
      user	0m0.000s
      sys	0m1.564s
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      131e404a
    • K
      btrfs: expand btrfs_find_item() to include find_root_ref functionality · 75ac2dd9
      Kelley Nielsen 提交于
      This patch is the second step in bootstrapping the btrfs_find_item
      interface. The btrfs_find_root_ref() is similar to the former
      __inode_info(); it accepts four of its parameters, and duplicates the
      first half of its functionality.
      
      Replace the one former call to btrfs_find_root_ref() with a call to
      btrfs_find_item(), along with the defined key type that was used
      internally by btrfs_find_root ref, and a null found key. In
      btrfs_find_item(), add a test for the null key at the place where
      the functionality of btrfs_find_root_ref() ends; btrfs_find_item()
      then returns if the test passes. Finally, remove btrfs_find_root_ref().
      Signed-off-by: NKelley Nielsen <kelleynnn@gmail.com>
      Suggested-by: NZach Brown <zab@redhat.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      75ac2dd9
    • V
      btrfs: remove unused variable from btrfs_new_inode · 99e22f78
      Valentina Giusti 提交于
      Variable owner in btrfs_new_inode is unused since commit
      d82a6f1d
      (Btrfs: kill BTRFS_I(inode)->block_group)
      Signed-off-by: NValentina Giusti <valentina.giusti@microon.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      99e22f78
    • J
      Btrfs: incompatible format change to remove hole extents · 16e7549f
      Josef Bacik 提交于
      Btrfs has always had these filler extent data items for holes in inodes.  This
      has made somethings very easy, like logging hole punches and sending hole
      punches.  However for large holey files these extent data items are pure
      overhead.  So add an incompatible feature to no longer add hole extents to
      reduce the amount of metadata used by these sort of files.  This has a few
      changes for logging and send obviously since they will need to detect holes and
      log/send the holes if there are any.  I've tested this thoroughly with xfstests
      and it doesn't cause any issues with and without the incompat format set.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      16e7549f
  2. 21 11月, 2013 2 次提交
  3. 12 11月, 2013 19 次提交
  4. 19 10月, 2013 1 次提交
  5. 11 10月, 2013 1 次提交
  6. 21 9月, 2013 3 次提交
  7. 13 9月, 2013 1 次提交
  8. 01 9月, 2013 1 次提交
    • J
      Btrfs: only update disk_i_size as we remove extents · 7f4f6e0a
      Josef Bacik 提交于
      This fixes a problem where if we fail a truncate we will leave the i_size set
      where we wanted to truncate to instead of where we were able to truncate to.
      Fix this by making btrfs_truncate_inode_items do the disk_i_size update as it
      removes extents, that way it will always be consistent with where its extents
      are.  Then if the truncate fails at all we can update the in-ram i_size with
      what we have on disk and delete the orphan item.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      7f4f6e0a