1. 10 Jun, 2014 2 commits
  2. 11 Mar, 2014 9 commits
  3. 29 Jan, 2014 5 commits
    • C
      Btrfs: don't use ram_bytes for uncompressed inline items · 514ac8ad
      Chris Mason authored
      If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
      the new size.  The fix uses the size directly from the item header when
      reading uncompressed inlines, and also fixes truncate to update the
      size as it goes.
      Reported-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
      514ac8ad
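The mismatch described in this commit can be illustrated with a small toy model. This is only a sketch, not kernel code: `InlineItem`, `HEADER_SIZE`, and the three helper functions are hypothetical names introduced for illustration, and the header length is an arbitrary constant for the model.

```python
from dataclasses import dataclass

HEADER_SIZE = 21  # assumed fixed header length, chosen for this toy model


@dataclass
class InlineItem:
    """Toy stand-in for an uncompressed inline file extent item."""
    item_size: int  # total item size recorded in the item header
    ram_bytes: int  # size field stored inside the item itself


def truncate_buggy(item, new_len):
    # Shrinks the item but leaves ram_bytes stale.
    item.item_size = HEADER_SIZE + new_len


def truncate_fixed(item, new_len):
    # Shrinks the item and updates ram_bytes as it goes.
    item.item_size = HEADER_SIZE + new_len
    item.ram_bytes = new_len


def inline_len(item):
    # Reader side: derive the data length from the item header only,
    # so a stale ram_bytes cannot mislead the reader.
    return item.item_size - HEADER_SIZE
```

Reading through `inline_len()` gives the right length even for items that were truncated before `ram_bytes` was kept up to date.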
    • M
      Btrfs: flush the dirty pages of the ordered extent aggressively during logging csum · 23c671a5
      Miao Xie authored
      The performance of fsync sometimes dropped suddenly. The main cause was
      that we might flush only part of the dirty pages in an ordered extent,
      then get that ordered extent and wait for the csum calculation. But if
      no task flushed the remaining part, we would wait until the flusher
      flushed it, sometimes for several seconds, which made the performance
      drop suddenly. (On my box, it dropped from 56MB/s to 4-10MB/s)
      
      This patch mitigates the problem by flushing the remaining dirty pages aggressively.
      
      Test Environment:
      CPU:		2CPU * 2Cores
      Memory:		4GB
      Partition:	20GB(HDD)
      
      Test Command:
       # sysbench --num-threads=8 --test=fileio --file-num=1 \
       > --file-total-size=8G --file-block-size=32768 \
       > --file-io-mode=sync --file-fsync-freq=100 \
       > --file-fsync-end=no --max-requests=10000 \
       > --file-test-mode=rndwr run
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      23c671a5
    • F
      Btrfs: faster file extent item replace operations · 1acae57b
      Filipe David Borba Manana authored
      When writing to a file we drop existing file extent items that cover the
      write range and then add a new file extent item that represents that write
      range.
      
      Before this change we were doing a tree lookup to remove the file extent
      items, and then after we did another tree lookup to insert the new file
      extent item.
      Most of the time all the file extent items we need to drop are located
      within a single leaf - this is the leaf where our new file extent item ends
      up at. Therefore, in this common case just combine these 2 operations into
      a single one.
      
      By avoiding the second btree navigation for insertion of the new file extent
      item, we reduce btree node/leaf lock acquisitions/releases, btree block/leaf
      COW operations, CPU time on btree node/leaf key binary searches, etc.
      
      Besides file writes, this operation also happens for file fsyncs.
      However, log btrees are much less likely to be as big as regular
      fs btrees, therefore the impact of this change is smaller there.
      
      The following benchmark was performed against an SSD drive and a
      HDD drive, both for random and sequential writes:
      
        sysbench --test=fileio --file-num=4096 --file-total-size=8G \
           --file-test-mode=[rndwr|seqwr] --num-threads=512 \
           --file-block-size=8192 --max-requests=1000000 \
           --file-fsync-freq=0 --file-io-mode=sync [prepare|run]
      
      All results below are averages of 10 runs of the respective test.
      
      ** SSD sequential writes
      
      Before this change: 225.88 Mb/sec
      After this change:  277.26 Mb/sec
      
      ** SSD random writes
      
      Before this change: 49.91 Mb/sec
      After this change:  56.39 Mb/sec
      
      ** HDD sequential writes
      
      Before this change: 68.53 Mb/sec
      After this change:  69.87 Mb/sec
      
      ** HDD random writes
      
      Before this change: 13.04 Mb/sec
      After this change:  14.39 Mb/sec
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      1acae57b
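To put the benchmark figures in this commit side by side, the relative gains can be computed directly from the reported averages. The snippet below is a small sketch; the numbers are copied from the commit message, and `percent_gain` is a helper name introduced here.

```python
# Throughput figures copied from the commit message (Mb/sec, averages of 10 runs).
RESULTS = {
    "SSD sequential writes": (225.88, 277.26),
    "SSD random writes": (49.91, 56.39),
    "HDD sequential writes": (68.53, 69.87),
    "HDD random writes": (13.04, 14.39),
}


def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(f"{name}: {percent_gain(before, after):+.1f}%")
```

By this measure SSD sequential writes gain the most (roughly 23%) and HDD sequential writes the least (about 2%).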
    • K
      btrfs: expand btrfs_find_item() to include find_orphan_item functionality · 3f870c28
      Kelley Nielsen authored
      This is the third step in bootstrapping the btrfs_find_item interface.
      The function find_orphan_item(), in orphan.c, is similar to the two
      functions already replaced by the new interface. It uses two parameters,
      which are already present in the interface, and is nearly identical to
      the function brought in in the previous patch.
      
      Replace the two calls to find_orphan_item() with calls to
      btrfs_find_item(), with the defined objectid and type that was used
      internally by find_orphan_item(), a null path, and a null key. Add a
      test for a null path to btrfs_find_item, and if it passes, allocate and
      free the path. Finally, remove find_orphan_item().
      Signed-off-by: Kelley Nielsen <kelleynnn@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      3f870c28
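The null-path handling described in this commit follows a common pattern: if the caller supplies no path, the lookup allocates a temporary one and frees it before returning. A minimal sketch of that pattern follows. It is a toy model over a Python dict, not the kernel's `btrfs_find_item()` signature or return convention; `Path` and `find_item` are hypothetical names.

```python
class Path:
    """Toy stand-in for a btree path; records whether it was freed."""

    def __init__(self):
        self.freed = False
        self.slot = None

    def free(self):
        self.freed = True


def find_item(tree, objectid, item_type, path=None):
    """Look up (objectid, item_type) in a dict-based toy tree.

    If the caller passes no path, allocate a temporary one and free it
    before returning. A caller-supplied path is left for the caller.
    """
    owns_path = path is None
    if owns_path:
        path = Path()
    try:
        key = (objectid, item_type)
        found = key in tree
        if found:
            path.slot = key  # remember where the item was found
        return found
    finally:
        if owns_path:
            path.free()
```

Callers that only need a yes/no answer, as the orphan item check does, can omit the path; callers that want to inspect the item afterwards pass their own.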
    • J
      Btrfs: incompatible format change to remove hole extents · 16e7549f
      Josef Bacik authored
      Btrfs has always had these filler extent data items for holes in inodes.  This
      has made some things very easy, like logging hole punches and sending hole
      punches.  However for large holey files these extent data items are pure
      overhead.  So add an incompatible feature to no longer add hole extents to
      reduce the amount of metadata used by these sorts of files.  This has a few
      changes for logging and send obviously since they will need to detect holes and
      log/send the holes if there are any.  I've tested this thoroughly with xfstests
      and it doesn't cause any issues with and without the incompat format set.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      16e7549f
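Without filler items, a hole has to be inferred from the gap between neighbouring extents. The sketch below shows that idea on plain (offset, length) tuples. It is an illustration under simplified assumptions, not the logging or send code; `find_holes` and its inputs are hypothetical.

```python
def find_holes(extents, file_size):
    """Return (offset, length) holes not covered by any extent.

    `extents` is an iterable of (offset, length) pairs and `file_size`
    is the inode size; both are hypothetical inputs for this toy model.
    """
    holes = []
    pos = 0  # end of the region covered so far
    for offset, length in sorted(extents):
        if offset > pos:
            # Gap between the previous extent and this one is a hole.
            holes.append((pos, offset - pos))
        pos = max(pos, offset + length)
    if pos < file_size:
        # Trailing hole between the last extent and the end of the file.
        holes.append((pos, file_size - pos))
    return holes
```

For a file with extents at 0 and 8192 (4096 bytes each) and a size of 16384, this reports holes at 4096 and 12288.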
  4. 21 Nov, 2013 2 commits
  5. 12 Nov, 2013 12 commits
  6. 21 Sep, 2013 3 commits
    • J
      Btrfs: drop dir i_size when adding new names on replay · d555438b
      Josef Bacik authored
      So if we have dir_index items in the log that means we also have the inode item
      as well, which means that the inode's i_size is correct.  However when we
      process dir_index'es we call btrfs_add_link() which will increase the
      directory's i_size for the new entry.  To fix this we need to just set the dir
      item's i_size to 0, and then as we find dir_index items we adjust the i_size.
      btrfs_add_link() will do it for new entries, and if the entry already exists we
      can just add the name_len to the i_size ourselves.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      d555438b
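The accounting described in this commit can be sketched as follows. The directory is modelled as a plain dict, and `add_link` only stands in for the effect `btrfs_add_link()` is said to have on the size; the size unit (name length) follows the commit text and is a simplification.

```python
def add_link(directory, name):
    # Stand-in for adding a new entry: inserting the name grows i_size.
    directory["names"].add(name)
    directory["i_size"] += len(name)


def replay_dir_index(directory, logged_names):
    """Rebuild i_size while replaying the logged dir_index names."""
    directory["i_size"] = 0  # drop the size taken from the logged inode
    for name in logged_names:
        if name in directory["names"]:
            # Entry already exists: account for its name ourselves.
            directory["i_size"] += len(name)
        else:
            # New entry: add_link() adjusts i_size for us.
            add_link(directory, name)
```

Starting from zero avoids counting a new entry twice, once through the logged inode's i_size and once through the link being added.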
    • J
      Btrfs: replay dir_index items before other items · dd8e7217
      Josef Bacik authored
      A user reported a bug where his log would not replay because he was getting
      -EEXIST back.  This was because he had a file moved into a directory that was
      logged.  What happens is the file had a lower inode number, and so it is
      processed first when replaying the log, and so we add the inode ref in for the
      directory it was moved to.  But then we process the directory's DIR_INDEX item
      and try to add the inode ref for that inode and it fails because we already
      added it when we replayed the inode.  To solve this problem we need to just
      process any DIR_INDEX items we have in the log first so this all is taken care
      of, and then we can replay the rest of the items.  With this patch my reproducer
      can remount the file system properly instead of erroring out.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      dd8e7217
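The ordering change in this commit amounts to handling every DIR_INDEX item before anything else in the log. One way to sketch that is a stable partition of the log items. The tuples and type strings below are illustrative, and the kernel may well implement the ordering differently, for instance with separate passes over the log.

```python
DIR_INDEX = "DIR_INDEX"  # item type names here are illustrative strings


def replay_order(log_items):
    """Return log items with every DIR_INDEX item ahead of the rest.

    Items are (inode_number, item_type) tuples. Within each group the
    original order is kept, because sorted() is stable.
    """
    return sorted(log_items, key=lambda item: item[1] != DIR_INDEX)
```

With this order, the directory entry for a moved file is in place before the lower-numbered inode's own items are replayed.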
    • J
      Btrfs: actually log directory we are fsync()'ing · de2b530b
      Josef Bacik authored
      If you just create a directory and then fsync that directory and then pull the
      power plug you will come back up and the directory will not be there.  That is
      because we won't actually create directories if we've logged files inside of
      them since they will be created on replay, but in this check we will set our
      logged_trans of our current directory if it happens to be a directory, making us
      think it doesn't need to be logged.  Fix the logic to only do this to parent
      directories.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      de2b530b
  7. 01 Sep, 2013 2 commits
  8. 10 Aug, 2013 1 commit
  9. 14 Jun, 2013 4 commits
    • J
      Btrfs: exclude logged extents before replying when we are mixed · 8c2a1a30
      Josef Bacik authored
      With non-mixed block groups we replay the logs before we're allowed to do any
      writes, so we get away with not pinning/removing the data extents until right
      when we replay them.  However with mixed block groups we allocate out of the
      same pool, so we could easily allocate a metadata block that was logged in our
      tree log.  To deal with this we just need to notice that we have mixed block
      groups and do the normal excluding/removal dance during the pin stage of the log
      replay and that way we don't allocate metadata blocks from areas we have logged
      data extents.  With this patch we now pass xfstests generic/311 with mixed
      block groups turned on.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      8c2a1a30
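The hazard with mixed block groups is that a metadata allocation may land on space already holding a logged data extent. The toy first-fit allocator below shows the effect of excluding those ranges up front. All names and the range representation are hypothetical; this is not the btrfs allocator.

```python
def overlaps(a_start, a_len, b_start, b_len):
    """True if the two half-open ranges share at least one byte."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def alloc_metadata(free_ranges, size, excluded):
    """First-fit allocation that skips anything overlapping `excluded`.

    `free_ranges` and `excluded` are lists of (start, length) pairs with
    positive lengths. Returns the chosen start offset, or None.
    """
    for start, length in free_ranges:
        pos = start
        while pos + size <= start + length:
            hit = next((e for e in excluded if overlaps(pos, size, *e)), None)
            if hit is None:
                return pos
            pos = hit[0] + hit[1]  # jump past the excluded extent
    return None
```

With an empty exclusion list the allocator hands out the first free offset, which is the situation the commit describes when logged data extents have not yet been excluded.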
    • M
      Btrfs: merge pending IO for tree log write back · c6adc9cc
      Miao Xie authored
      Before this patch, we first flushed the log tree of the fs/file
      tree, and then flushed the log root tree. This is inefficient,
      especially on a hard disk. This patch improves the situation by wrapping
      the two flushes in the same blk_plug.
      
      In testing, the performance of sync writes went up ~60% (2.9MB/s -> 4.6MB/s)
      on my scsi disk whose disk buffer was enabled.
      
      Test step:
       # mkfs.btrfs -f -m single <disk>
       # mount <disk> <mnt>
       # dd if=/dev/zero of=<mnt>/file0 bs=32K count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c6adc9cc
    • L
      Btrfs: kill replicate code in replay_one_buffer · 2da1c669
      Liu Bo authored
      EXTREF is treated the same as REF, so we can tidy up the code.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2da1c669
    • M
      Btrfs: cleanup the similar code of the fs root read · cb517eab
      Miao Xie authored
      There are several functions whose code is similar, such as
        btrfs_find_last_root()
        btrfs_read_fs_root_no_radix()
      
      Besides that, some functions are invoked twice, which is unnecessary.
      For example, we are sure that all roots found in
        btrfs_find_orphan_roots()
      have their orphan items, so it is unnecessary to check the orphan
      item again.
      
      So clean this up.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      cb517eab