1. 07 5月, 2013 1 次提交
    • J
      Btrfs: add a incompatible format change for smaller metadata extent refs · 3173a18f
      Josef Bacik 提交于
      We currently store the first key of the tree block inside the reference for the
      tree block in the extent tree.  This takes up quite a bit of space.  Make a new
      key type for metadata which holds the level as the offset and completely removes
      storing the btrfs_tree_block_info inside the extent ref.  This reduces the size
      from 51 bytes to 33 bytes per extent reference for each tree block.  In practice
      this results in a 30-35% decrease in the size of our extent tree, which means we
      COW less and can keep more of the extent tree in memory which makes our heavy
      metadata operations go much faster.  This is not an automatic format change, you
      must enable it at mkfs time or with btrfstune.  This patch deals with having
      metadata stored as either the old format or the new format so it is easy to
      convert.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3173a18f
  2. 05 3月, 2013 5 次提交
  3. 20 2月, 2013 1 次提交
  4. 09 1月, 2013 1 次提交
  5. 13 12月, 2012 1 次提交
  6. 12 12月, 2012 2 次提交
    • M
      Btrfs: make delalloc inodes be flushed by multi-task · 8ccf6f19
      Miao Xie 提交于
      This patch introduce a new worker pool named "flush_workers", and if we
      want to force all the inode with pending delalloc to the disks, we can
      queue those inodes into the work queue of the worker pool, in this way,
      those inodes will be flushed by multi-task.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      8ccf6f19
    • M
      Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie 提交于
      In some places(such as: evicting inode), we just can not flush the reserved
      space of delalloc, flushing the delayed directory index and delayed inode
      is OK, but we don't try to flush those things and just go back when there is
      no enough space to be reserved. This patch fixes this problem.
      
      We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we can in the transaction, we should not flush anything, or the deadlock
      would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc
      would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
      and we will flush all things.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      08e007d2
  7. 09 10月, 2012 1 次提交
    • J
      Btrfs: cache extent state when writing out dirty metadata pages · e6138876
      Josef Bacik 提交于
      Everytime we write out dirty pages we search for an offset in the tree,
      convert the bits in the state, and then when we wait we search for the
      offset again and clear the bits.  So for every dirty range in the io tree we
      are doing 4 rb searches, which is suboptimal.  With this patch we are only
      doing 2 searches for every cycle (modulo weird things happening).  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      e6138876
  8. 04 10月, 2012 1 次提交
  9. 02 10月, 2012 2 次提交
  10. 31 7月, 2012 1 次提交
  11. 24 7月, 2012 1 次提交
  12. 28 4月, 2012 2 次提交
  13. 22 3月, 2012 6 次提交
  14. 17 1月, 2012 1 次提交
    • J
      Btrfs: add a delalloc mutex to inodes for delalloc reservations · f248679e
      Josef Bacik 提交于
      I was using i_mutex for this, but we're getting bogus lockdep warnings by doing
      that and theres no real way to get rid of those, so just stop using i_mutex to
      protect delalloc metadata reservations and use a delalloc mutex instead.  This
      shouldn't be contended often at all, only if you are writing and mmap writing to
      the file at the same time.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      f248679e
  15. 22 12月, 2011 1 次提交
    • A
      Btrfs: mark delayed refs as for cow · 66d7e7f0
      Arne Jansen 提交于
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      66d7e7f0
  16. 16 12月, 2011 1 次提交
    • J
      Btrfs: fix how we do delalloc reservations and how we free reservations on error · 660d3f6c
      Josef Bacik 提交于
      Running xfstests 269 with some tracing my scripts kept spitting out errors about
      releasing bytes that we didn't actually have reserved.  This took me down a huge
      rabbit hole and it turns out the way we deal with reserved_extents is wrong,
      we need to only be setting it if the reservation succeeds, otherwise the free()
      method will come in and unreserve space that isn't actually reserved yet, which
      can lead to other warnings and such.  The math was all working out right in the
      end, but it caused all sorts of other issues in addition to making my scripts
      yell and scream and generally make it impossible for me to track down the
      original issue I was looking for.  The other problem is with our error handling
      in the reservation code.  There are two cases that we need to deal with
      
      1) We raced with free.  In this case free won't free anything because csum_bytes
      is modified before we dro the lock in our reservation path, so free rightly
      doesn't release any space because the reservation code may be depending on that
      reservation.  However if we fail, we need the reservation side to do the free at
      that point since that space is no longer in use.  So as it stands the code was
      doing this fine and it worked out, except in case #2
      
      2) We don't race with free.  Nobody comes in and changes anything, and our
      reservation fails.  In this case we didn't reserve anything anyway and we just
      need to clean up csum_bytes but not free anything.  So we keep track of
      csum_bytes before we drop the lock and if it hasn't changed we know we can just
      decrement csum_bytes and carry on.
      
      Because of the case where we can race with free()'s since we have to drop our
      spin_lock to do the reservation, I'm going to serialize all reservations with
      the i_mutex.  We already get this for free in the heavy use paths, truncate and
      file write all hold the i_mutex, just needed to add it to page_mkwrite and
      various ioctl/balance things.  With this patch my space leak scripts no longer
      scream bloody murder.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      660d3f6c
  17. 11 11月, 2011 1 次提交
    • M
      Btrfs: fix orphan backref nodes · 76b9e23d
      Miao Xie 提交于
      If the root node of a fs/file tree is in the block group that is
      being relocated, but the others are not in the other block groups.
      when we create a snapshot for this tree between the relocation tree
      creation ends and ->create_reloc_tree is set to 0, Btrfs will create
      some backref nodes that are the lowest nodes of the backrefs cache.
      But we forget to add them into ->leaves list of the backref cache
      and deal with them, and at last, they will triggered BUG_ON().
      
        kernel BUG at fs/btrfs/relocation.c:239!
      
      This patch fixes it by adding them into ->leaves list of backref cache.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      76b9e23d
  18. 21 10月, 2011 1 次提交
  19. 20 10月, 2011 6 次提交
  20. 28 7月, 2011 1 次提交
  21. 18 6月, 2011 1 次提交
    • C
      Btrfs: fix relocation races · 7585717f
      Chris Mason 提交于
      The recent commit to get rid of our trans_mutex introduced
      some races with block group relocation.  The problem is that relocation
      needs to do some record keeping about each root, and it was relying
      on the transaction mutex to coordinate things in subtle ways.
      
      This fix adds a mutex just for the relocation code and makes sure
      it doesn't have a big impact on normal operations.  The race is
      really fixed in btrfs_record_root_in_trans, which is where we
      step back and wait for the relocation code to finish accounting
      setup.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7585717f
  22. 24 5月, 2011 2 次提交
    • J
      Btrfs: don't always do readahead · 026fd317
      Josef Bacik 提交于
      Our readahead is sort of sloppy, and really isn't always needed.  For example if
      ls is doing a stating ls (which is the default) it's going to stat in non-disk
      order, so if say you have a directory with a stupid amount of files, readahead
      is going to do nothing but waste time in the case of doing the stat.  Taking the
      unconditional readahead out made my test go from 57 minutes to 36 minutes.  This
      means that everywhere we do loop through the tree we want to make sure we do set
      path->reada properly, so I went through and found all of the places where we
      loop through the path and set reada to 1.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      026fd317
    • J
      Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik 提交于
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      a4abeea4