1. 02 10月, 2012 4 次提交
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik 提交于
      This patch adds hole punching via fallocate.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2aaa6655
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik 提交于
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never be
      immediately useful anyway so lets just remove it.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2671485d
    • J
      Btrfs: do not needlessly restart the transaction for enospc · ca7e70f5
      Josef Bacik 提交于
      We will stop and restart a transaction every time we move to a different leaf
      when truncating a file.  This is for enospc reasons, but really we could
      probably get away with doing this a little better by actually working until we
      hit an ENOSPC.  So add a ->failfast flag to the block_rsv and set it when we do
      truncates which will fail as soon as the block rsv runs out of space, and then
      at that point we can stop and restart the transaction and refill the block rsv
      and carry on.  This will make rm'ing of a file with lots of extents a bit
      faster.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ca7e70f5
    • J
      Btrfs: turbo charge fsync · 5dc562c5
      Josef Bacik 提交于
      At least for the vm workload.  Currently on fsync we will
      
      1) Truncate all items in the log tree for the given inode if they exist
      
      and
      
      2) Copy all items for a given inode into the log
      
      The problem with this is that for things like VMs you can have lots of
      extents from the fragmented writing behavior, and worst yet you may have
      only modified a few extents, not the entire thing.  This patch fixes this
      problem by tracking which transid modified our extent, and then when we do
      the tree logging we find all of the extents we've modified in our current
      transaction, sort them and commit them.  We also only truncate up to the
      xattrs of the inode and copy that stuff in normally, and then just drop any
      extents in the range we have that exist in the log already.  Here are some
      numbers of a 50 meg fio job that does random writes and fsync()s after every
      write
      
      		Original	Patched
      SATA drive	82KB/s		140KB/s
      Fusion drive	431KB/s		2532KB/s
      
      So around 2-6 times faster depending on your hardware.  There are a few
      corner cases, for example if you truncate at all we have to do it the old
      way since there is no way to be sure what is in the log is ok.  This
      probably could be done smarter, but if you write-fsync-truncate-write-fsync
      you deserve what you get.  All this work is in RAM of course so if your
      inode gets evicted from cache and you read it in and fsync it we'll do it
      the slow way if we are still in the same transaction that we last modified
      the inode in.
      
      The biggest cool part of this is that it requires no changes to the recovery
      code, so if you fsync with this patch and crash and load an old kernel, it
      will run the recovery and be a-ok.  I have tested this pretty thoroughly
      with an fsync tester and everything comes back fine, as well as xfstests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      5dc562c5
  2. 29 8月, 2012 2 次提交
    • A
      Btrfs: fix deadlock in wait_for_more_refs · 1fa11e26
      Arne Jansen 提交于
      Commit a168650c introduced a waiting mechanism to prevent busy waiting in
      btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
      where a tree_mod_seq is held while waiting for the io to complete, while
      the end_io calls btrfs_run_delayed_refs.
      This whole mechanism is unnecessary. If not enough runnable refs are
      available to satisfy count, just return as count is more like a guideline
      than a strict requirement.
      In case we have to run all refs, commit transaction makes sure that no
      other threads are working in the transaction anymore, so we just assert
      here that no refs are blocked.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      1fa11e26
    • J
      Btrfs: don't allocate a seperate csums array for direct reads · c329861d
      Josef Bacik 提交于
      We've been allocating a big array for csums instead of storing them in the
      io_tree like we do for buffered reads because previously we were locking the
      entire range, so we didn't have an extent state for each sector of the
      range.  But now that we do the range locking as we map the buffers we can
      limit the mapping lenght to sectorsize and use the private part of the
      io_tree for our csums.  This allows us to avoid an extra memory allocation
      for direct reads which could incur latency.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c329861d
  3. 31 7月, 2012 1 次提交
  4. 26 7月, 2012 3 次提交
  5. 25 7月, 2012 1 次提交
  6. 24 7月, 2012 2 次提交
    • L
      Btrfs: rewrite BTRFS_SETGET_FUNCS · 18077bb4
      Li Zefan 提交于
      BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
      btrfs_foo() functions, which read and write specific fields in the
      extent buffer.
      
      The total number of set/get functions is ~200, but in fact we only
      need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.
      
      It results in redunction of ~37K bytes.
      
         text    data     bss     dec     hex filename
       629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
       592637   12489     216  605342   93c9e fs/btrfs/btrfs.o
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      18077bb4
    • L
      Btrfs: kill free_space pointer from inode structure · b4d7c3c9
      Li Zefan 提交于
      Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
      which means every inode has the same BTRFS_I(inode)->free_space pointer.
      
      This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      b4d7c3c9
  7. 12 7月, 2012 2 次提交
  8. 10 7月, 2012 4 次提交
  9. 21 6月, 2012 1 次提交
  10. 15 6月, 2012 1 次提交
    • J
      Btrfs: add btrfs_next_old_leaf · 3d7806ec
      Jan Schmidt 提交于
      To make sense of the tree mod log, the backref walker not only needs
      btrfs_search_old_slot, but it also called btrfs_next_leaf, which in turn was
      calling btrfs_search_slot. This obviously didn't give the correct result.
      
      This commit adds btrfs_next_old_leaf, a drop-in replacement for
      btrfs_next_leaf with a time_seq parameter. If it is zero, it behaves exactly
      like btrfs_next_leaf. If it is non-zero, it will use btrfs_search_old_slot
      with this time_seq parameter.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      3d7806ec
  11. 02 6月, 2012 1 次提交
  12. 31 5月, 2012 1 次提交
    • J
      Btrfs: use delayed ref sequence numbers for all fs-tree updates · 95a06077
      Jan Schmidt 提交于
      The sequence number for delayed refs is needed to postpone certain delayed
      refs for a very short period while walking backrefs. Before the tree
      modification log, we thought we'd only have to hold back those references
      that don't have a counter operation.
      
      While now we've the tree mod log, we're rewinding fs tree blocks to a
      defined consistent state. We cannot know in advance for which tree block
      we'll be doing rewind operations later. Therefore, we must postpone all the
      delayed refs for fs-tree blocks, even those having a counter operation.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      95a06077
  13. 30 5月, 2012 5 次提交
    • S
      Btrfs: set ioprio of scrub readahead to idle · 3d136a11
      Stefan Behrens 提交于
      Reduce ioprio class of scrub readahead threads to idle priority.
      This setting is fixed. This priority has shown the best performance
      during all measurements.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      3d136a11
    • S
      Btrfs: read device stats on mount, write modified ones during commit · 733f4fbb
      Stefan Behrens 提交于
      The device statistics are written into the device tree with each
      transaction commit. Only modified statistics are written.
      When a filesystem is mounted, the device statistics for each involved
      device are read from the device tree and used to initialize the
      counters.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      733f4fbb
    • J
      Btrfs: fix how we deal with the orphan block rsv · 8a35d95f
      Josef Bacik 提交于
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Btrfs: fix how we deal with the orphan block rsv
      
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      8a35d95f
    • J
      Btrfs: add btrfs_search_old_slot · 5d9e75c4
      Jan Schmidt 提交于
      The tree modification log together with the current state of the tree gives
      a consistent, old version of the tree. btrfs_search_old_slot is used to
      search through this old version and return old (dummy!) extent buffers.
      Naturally, this function cannot do any tree modifications.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      5d9e75c4
    • J
      Btrfs: add tree modification log functions · bd989ba3
      Jan Schmidt 提交于
      The tree mod log will log modifications made fs-tree nodes. Most
      modifications are done by autobalance of the tree. Such changes are recorded
      as long as a block entry exists. When released, the log is cleaned.
      
      With the tree modification log, it's possible to reconstruct a consistent
      old state of the tree. This is required to do backref walking on a busy
      file system.
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      bd989ba3
  14. 26 5月, 2012 3 次提交
  15. 19 4月, 2012 1 次提交
  16. 13 4月, 2012 1 次提交
  17. 28 3月, 2012 1 次提交
  18. 27 3月, 2012 6 次提交