1. 01 9月, 2013 1 次提交
  2. 10 8月, 2013 1 次提交
  3. 02 7月, 2013 1 次提交
    • J
      Btrfs: make the chunk allocator completely tree lockless · 6df9a95e
      Josef Bacik 提交于
      When adjusting the enospc rules for relocation I ran into a deadlock because we
      were relocating the only system chunk and that forced us to try and allocate a
      new system chunk while holding locks in the chunk tree, which caused us to
      deadlock.  To fix this I've moved all of the dev extent addition and chunk
      addition out to the delayed chunk completion stuff.  We still keep the in-memory
      stuff which makes sure everything is consistent.
      
      One change I had to make was to search the commit root of the device tree to
      find a free dev extent, and hold onto any chunk em's that we allocated in that
      transaction so we do not allocate the same dev extent twice.  This has the side
      effect of fixing a bug with balance that has been there ever since balance
      existed.  Basically you can free a block group and it's dev extent and then
      immediately allocate that dev extent for a new block group and write stuff to
      that dev extent, all within the same transaction.  So if you happen to crash
      during a balance you could come back to a completely broken file system.  This
      patch should keep these sort of things from happening in the future since we
      won't be able to allocate free'd dev extents until after the transaction
      commits.  This has passed all of the xfstests and my super annoying stress test
      followed by a balance.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      6df9a95e
  4. 14 6月, 2013 3 次提交
    • M
      Btrfs: make the state of the transaction more readable · 4a9d8bde
      Miao Xie 提交于
      We used 3 variants to track the state of the transaction, it was complex
      and wasted the memory space. Besides that, it was hard to understand that
      which types of the transaction handles should be blocked in each transaction
      state, so the developers often made mistakes.
      
      This patch improved the above problem. In this patch, we define 6 states
      for the transaction,
        enum btrfs_trans_state {
      	TRANS_STATE_RUNNING		= 0,
      	TRANS_STATE_BLOCKED		= 1,
      	TRANS_STATE_COMMIT_START	= 2,
      	TRANS_STATE_COMMIT_DOING	= 3,
      	TRANS_STATE_UNBLOCKED		= 4,
      	TRANS_STATE_COMPLETED		= 5,
      	TRANS_STATE_MAX			= 6,
        }
      and just use 1 variant to track those state.
      
      In order to make the blocked handle types for each state more clear,
      we introduce a array:
        unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
      	[TRANS_STATE_RUNNING]		= 0U,
      	[TRANS_STATE_BLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START),
      	[TRANS_STATE_COMMIT_START]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH),
      	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN),
      	[TRANS_STATE_UNBLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
      	[TRANS_STATE_COMPLETED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
        }
      it is very intuitionistic.
      
      Besides that, because we remove ->in_commit in transaction structure, so
      the lock ->commit_lock which was used to protect it is unnecessary, remove
      ->commit_lock.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      4a9d8bde
    • M
      Btrfs: remove unnecessary varient ->num_joined in btrfs_transaction structure · 3f1e3fa6
      Miao Xie 提交于
      We used ->num_joined track if there were some writers which join the current
      transaction when the committer was sleeping. If some writers joined the current
      transaction, we has to continue the while loop to do some necessary stuff, such
      as flush the ordered operations. But it is unnecessary because we will do it
      after the while loop.
      
      Besides that, tracking ->num_joined would make the committer drop into the while
      loop when there are lots of internal writers(TRANS_JOIN).
      
      So we remove ->num_joined and don't track if there are some writers which join
      the current transaction when the committer is sleeping.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      3f1e3fa6
    • M
      Btrfs: don't wait for all the writers circularly during the transaction commit · 0860adfd
      Miao Xie 提交于
      btrfs_commit_transaction has the following loop before we commit the
      transaction.
      
      do {
          // attempt to do some useful stuff and/or sleep
      } while (atomic_read(&cur_trans->num_writers) > 1 ||
      	 (should_grow && cur_trans->num_joined != joined));
      
      This is used to prevent from the TRANS_START to get in the way of a
      committing transaction. But it does not prevent from TRANS_JOIN, that
      is we would do this loop for a long time if some writers JOIN the
      current transaction endlessly.
      
      Because we need join the current transaction to do some useful stuff,
      we can not block TRANS_JOIN here. So we introduce a external writer
      counter, which is used to count the TRANS_USERSPACE/TRANS_START writers.
      If the external writer counter is zero, we can break the above loop.
      
      In order to make the code more clear, we don't use enum variant
      to define the type of the transaction handle, use bitmask instead.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0860adfd
  5. 07 5月, 2013 2 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
    • D
      btrfs: clean snapshots one by one · 9d1a2a3a
      David Sterba 提交于
      Each time pick one dead root from the list and let the caller know if
      it's needed to continue. This should improve responsiveness during
      umount and balance which at some point waits for cleaning all currently
      queued dead roots.
      
      A new dead root is added to the end of the list, so the snapshots
      disappear in the order of deletion.
      
      The snapshot cleaning work is now done only from the cleaner thread and the
      others wake it if needed.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      9d1a2a3a
  6. 01 3月, 2013 2 次提交
  7. 21 2月, 2013 3 次提交
    • M
      Btrfs: fix uncompleted transaction · d4edf39b
      Miao Xie 提交于
      In some cases, we need commit the current transaction, but don't want
      to start a new one if there is no running transaction, so we introduce
      the function - btrfs_attach_transaction(), which can catch the current
      transaction, and return -ENOENT if there is no running transaction.
      
      But no running transaction doesn't mean the current transction completely,
      because we removed the running transaction before it completes. In some
      cases, it doesn't matter. But in some special cases, such as freeze fs, we
      hope the transaction is fully on disk, it will introduce some bugs, for
      example, we may feeze the fs and dump the data in the disk, if the transction
      doesn't complete, we would dump inconsistent data. So we need fix the above
      problem for those cases.
      
      We fixes this problem by introducing a function:
      	btrfs_attach_transaction_barrier()
      if we hope all the transaction is fully on the disk, even they are not
      running, we can use this function.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      d4edf39b
    • J
      Btrfs: place ordered operations on a per transaction list · 569e0f35
      Josef Bacik 提交于
      Miao made the ordered operations stuff run async, which introduced a
      deadlock where we could get somebody (sync) racing in and committing the
      transaction while a commit was already happening.  The new committer would
      try and flush ordered operations which would hang waiting for the commit to
      finish because it is done asynchronously and no longer inherits the callers
      trans handle.  To fix this we need to make the ordered operations list a per
      transaction list.  We can get new inodes added to the ordered operation list
      by truncating them and then having another process writing to them, so this
      makes it so that anybody trying to add an ordered operation _must_ start a
      transaction in order to add itself to the list, which will keep new inodes
      from getting added to the ordered operations list after we start committing.
      This should fix the deadlock and also keeps us from doing a lot more work
      than we need to during commit.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      569e0f35
    • E
      btrfs: remove cache only arguments from defrag path · de78b51a
      Eric Sandeen 提交于
      The entry point at the defrag ioctl always sets "cache only" to 0;
      the codepaths haven't run for a long time as far as I can
      tell.  Chris says they're dead code, so remove them.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      de78b51a
  8. 20 2月, 2013 1 次提交
    • J
      Btrfs: don't re-enter when allocating a chunk · c6b305a8
      Josef Bacik 提交于
      If we start running low on metadata space we will try to allocate a chunk,
      which could then try to allocate a chunk to add the device entry.  The thing
      is we allocate a chunk before we try really hard to make the allocation, so
      we should be able to find space for the device entry.  Add a flag to the
      trans handle so we know we're currently allocating a chunk so we can just
      bail out if we try to allocate another chunk.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c6b305a8
  9. 12 12月, 2012 1 次提交
    • M
      Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie 提交于
      In some places(such as: evicting inode), we just can not flush the reserved
      space of delalloc, flushing the delayed directory index and delayed inode
      is OK, but we don't try to flush those things and just go back when there is
      no enough space to be reserved. This patch fixes this problem.
      
      We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we can in the transaction, we should not flush anything, or the deadlock
      would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc
      would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
      and we will flush all things.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      08e007d2
  10. 09 10月, 2012 2 次提交
    • M
      Btrfs: fix orphan transaction on the freezed filesystem · 354aa0fb
      Miao Xie 提交于
      With the following debug patch:
      
       static int btrfs_freeze(struct super_block *sb)
       {
      + 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
      +	struct btrfs_transaction *trans;
      +
      +	spin_lock(&fs_info->trans_lock);
      +	trans = fs_info->running_transaction;
      +	if (trans) {
      +		printk("Transid %llu, use_count %d, num_writer %d\n",
      +			trans->transid, atomic_read(&trans->use_count),
      +			atomic_read(&trans->num_writers));
      +	}
      +	spin_unlock(&fs_info->trans_lock);
       	return 0;
       }
      
      I found there was a orphan transaction after the freeze operation was done.
      
      It is because the transaction may not be committed when the transaction handle
      end even though it is the last handle of the current transaction. This design
      avoid committing the transaction frequently, but also introduce the above
      problem.
      
      So I add btrfs_attach_transaction() which can catch the current transaction
      and commit it. If there is no transaction, it will return ENOENT, and do not
      anything.
      
      This function also can be used to instead of btrfs_join_transaction_freeze()
      because it don't increase the writer counter and don't start a new transaction,
      so it also can fix the deadlock between sync and freeze.
      
      Besides that, it is used to instead of btrfs_join_transaction() in
      transaction_kthread(), because if there is no transaction, the transaction
      kthread needn't anything.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      354aa0fb
    • M
      Btrfs: add a type field for the transaction handle · a698d075
      Miao Xie 提交于
      This patch add a type field into the transaction handle structure,
      in this way, we needn't implement various end-transaction functions
      and can make the code more simple and readable.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      a698d075
  11. 04 10月, 2012 1 次提交
    • J
      Btrfs: fix race in sync and freeze again · 60376ce4
      Josef Bacik 提交于
      I screwed this up, there is a race between checking if there is a running
      transaction and actually starting a transaction in sync where we could race
      with a freezer and get ourselves into trouble.  To fix this we need to make
      a new join type to only do the try lock on the freeze stuff.  If it fails
      we'll return EPERM and just return from sync.  This fixes a hang Liu Bo
      reported when running xfstest 68 in a loop.  Thanks,
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      60376ce4
  12. 02 10月, 2012 3 次提交
    • J
      Btrfs: delay block group item insertion · ea658bad
      Josef Bacik 提交于
      So we have lots of places where we try to preallocate chunks in order to
      make sure we have enough space as we make our allocations.  This has
      historically meant that we're constantly tweaking when we should allocate a
      new chunk, and historically we have gotten this horribly wrong so we way
      over allocate either metadata or data.  To try and keep this from happening
      we are going to make it so that the block group item insertion is done out
      of band at the end of a transaction.  This will allow us to create chunks
      even if we are trying to make an allocation for the extent tree.  With this
      patch my enospc tests run faster (didn't expect this) and more efficiently
      use the disk space (this is what I wanted).  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      ea658bad
    • M
      Btrfs: fix corrupted metadata in the snapshot · 8407aa46
      Miao Xie 提交于
      When we delete a inode, we will remove all the delayed items including delayed
      inode update, and then truncate all the relative metadata. If there is lots of
      metadata, we will end the current transaction, and start a new transaction to
      truncate the left metadata. In this way, we will leave a inode item that its
      link counter is > 0, and also may leave some directory index items in fs/file tree
      after the current transaction ends. In other words, the metadata in this fs/file tree
      is inconsistent. If we create a snapshot for this tree now, we will find a inode with
      corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
      because its link counter is not 0.
      
      We fix this problem by updating the inode item before the current transaction ends.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      8407aa46
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo 提交于
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      The current btrfs checks if an inode is in log by comparing
      root's last_log_commit to inode's last_sub_trans[2].
      
      But the problem is that this root->last_log_commit is shared among
      inodes.
      
      Say we have N inodes to be logged, after the first inode,
      root's last_log_commit is updated and the N-1 remained files will
      be skipped.
      
      This fixes the bug by keeping a local copy of root's last_log_commit
      inside each inode and this local copy will be maintained itself.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      46d8bc34
  13. 24 7月, 2012 1 次提交
    • J
      Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik 提交于
      There is weird logic I had to put in place to make sure that when we were
      adding csums that we'd used the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per-se, it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0e721106
  14. 12 7月, 2012 3 次提交
  15. 10 7月, 2012 1 次提交
  16. 22 3月, 2012 1 次提交
  17. 24 5月, 2011 4 次提交
    • J
      Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik 提交于
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere and it's not even used at all
      for anything usefull.  In addition to be completely useless, when we initialize
      an inode we try and find a freeish block group to set as the inodes block group,
      and with a completely full 40gb fs this takes _forever_, so I imagine with say
      1tb fs this is just unbearable.  So just axe the thing altoghether, we don't
      need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
      inode lookup in my testcase.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      d82a6f1d
    • J
      Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik 提交于
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      a4abeea4
    • J
      Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Josef Bacik 提交于
      We currently track trans handles in current->journal_info, but we don't actually
      use it.  This patch fixes it.  This will cover the case where we have multiple
      people starting transactions down the call chain.  This keeps us from having to
      allocate a new handle and all of that, we just increase the use count of the
      current handle, save the old block_rsv, and return.  I tested this with xfstests
      and it worked out fine.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      2a1eb461
    • J
      Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Josef Bacik 提交于
      I keep forgetting that btrfs_join_transaction() just ignores the num_items
      argument, which leads me to sending pointless patches and looking stupid :).  So
      just kill the num_items argument from btrfs_join_transaction and
      btrfs_start_ioctl_transaction, since neither of them use it.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      7a7eaa40
  18. 21 5月, 2011 1 次提交
    • M
      btrfs: implement delayed inode items operation · 16cdcec7
      Miao Xie 提交于
      Changelog V5 -> V6:
      - Fix oom when the memory load is high, by storing the delayed nodes into the
        root's radix tree, and letting btrfs inodes go.
      
      Changelog V4 -> V5:
      - Fix the race on adding the delayed node to the inode, which is spotted by
        Chris Mason.
      - Merge Chris Mason's incremental patch into this patch.
      - Fix deadlock between readdir() and memory fault, which is reported by
        Itaru Kitayama.
      
      Changelog V3 -> V4:
      - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache
        inode in time.
      
      Changelog V2 -> V3:
      - Fix the race between the delayed worker and the task which does delayed items
        balance, which is reported by Tsutomu Itoh.
      - Modify the patch address David Sterba's comment.
      - Fix the bug of the cpu recursion spinlock, reported by Chris Mason
      
      Changelog V1 -> V2:
      - break up the global rb-tree, use a list to manage the delayed nodes,
        which is created for every directory and file, and used to manage the
        delayed directory name index items and the delayed inode item.
      - introduce a worker to deal with the delayed nodes.
      
      Compare with Ext3/4, the performance of file creation and deletion on btrfs
      is very poor. the reason is that btrfs must do a lot of b+ tree insertions,
      such as inode item, directory name item, directory name index and so on.
      
      If we can do some delayed b+ tree insertion or deletion, we can improve the
      performance, so we made this patch which implemented delayed directory name
      index insertion/deletion and delayed inode update.
      
      Implementation:
      - introduce a delayed root object into the filesystem, that use two lists to
        manage the delayed nodes which are created for every file/directory.
        One is used to manage all the delayed nodes that have delayed items. And the
        other is used to manage the delayed nodes which is waiting to be dealt with
        by the work thread.
      - Every delayed node has two rb-tree, one is used to manage the directory name
        index which is going to be inserted into b+ tree, and the other is used to
        manage the directory name index which is going to be deleted from b+ tree.
      - introduce a worker to deal with the delayed operation. This worker is used
        to deal with the works of the delayed directory name index items insertion
        and deletion and the delayed inode update.
        When the delayed items is beyond the lower limit, we create works for some
        delayed nodes and insert them into the work queue of the worker, and then
        go back.
        When the delayed items is beyond the upper bound, we create works for all
        the delayed nodes that haven't been dealt with, and insert them into the work
        queue of the worker, and then wait for that the untreated items is below some
        threshold value.
      - When we want to insert a directory name index into b+ tree, we just add the
        information into the delayed inserting rb-tree.
        And then we check the number of the delayed items and do delayed items
        balance. (The balance policy is above.)
      - When we want to delete a directory name index from the b+ tree, we search it
        in the inserting rb-tree at first. If we look it up, just drop it. If not,
        add the key of it into the delayed deleting rb-tree.
        Similar to the delayed inserting rb-tree, we also check the number of the
        delayed items and do delayed items balance.
        (The same to inserting manipulation)
      - When we want to update the metadata of some inode, we cached the data of the
        inode into the delayed node. the worker will flush it into the b+ tree after
        dealing with the delayed insertion and deletion.
      - We will move the delayed node to the tail of the list after we access the
        delayed node, By this way, we can cache more delayed items and merge more
        inode updates.
      - If we want to commit transaction, we will deal with all the delayed node.
      - the delayed node will be freed when we free the btrfs inode.
      - Before we log the inode items, we commit all the directory name index items
        and the delayed inode update.
      
      I did a quick test by the benchmark tool[1] and found we can improve the
      performance of file creation by ~15%, and file deletion by ~20%.
      
      Before applying this patch:
      Create files:
              Total files: 50000
              Total time: 1.096108
              Average time: 0.000022
      Delete files:
              Total files: 50000
              Total time: 1.510403
              Average time: 0.000030
      
      After applying this patch:
      Create files:
              Total files: 50000
              Total time: 0.932899
              Average time: 0.000019
      Delete files:
              Total files: 50000
              Total time: 1.215732
              Average time: 0.000024
      
      [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
      
      Many thanks for Kitayama-san's help!
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dave@jikos.cz>
      Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Tested-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      16cdcec7
  19. 04 5月, 2011 1 次提交
  20. 12 4月, 2011 1 次提交
    • J
      Btrfs: avoid taking the trans_mutex in btrfs_end_transaction · 13c5a93e
      Josef Bacik 提交于
      I've been working on making our O_DIRECT latency not suck and I noticed we were
      taking the trans_mutex in btrfs_end_transaction.  So to do this we convert
      num_writers and use_count to atomic_t's and just decrement them in
      btrfs_end_transaction.  Instead of deleting the transaction from the trans list
      in put_transaction we do that in btrfs_commit_transaction() since that's the
      only time it actually needs to be removed from the list.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      13c5a93e
  21. 23 12月, 2010 1 次提交
    • L
      Btrfs: Add readonly snapshots support · b83cc969
      Li Zefan 提交于
      Usage:
      
      Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call
      ioctl(BTRFS_I0CTL_SNAP_CREATE_V2).
      
      Implementation:
      
      - Set readonly bit of btrfs_root_item->flags.
      - Add readonly checks in btrfs_permission (inode_permission),
      btrfs_setattr, btrfs_set/remove_xattr and some ioctls.
      
      Changelog for v3:
      
      - Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags.
      - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      b83cc969
  22. 30 10月, 2010 2 次提交
    • S
      Btrfs: add START_SYNC, WAIT_SYNC ioctls · 46204592
      Sage Weil 提交于
      START_SYNC will start a sync/commit, but not wait for it to
      complete.  Any modification started after the ioctl returns is
      guaranteed not to be included in the commit.  If a non-NULL
      pointer is passed, the transaction id will be returned to
      userspace.
      
      WAIT_SYNC will wait for any in-progress commit to complete.  If a
      transaction id is specified, the ioctl will block and then
      return (success) when the specified transaction has committed.
      If it has already committed when we call the ioctl, it returns
      immediately.  If the specified transaction doesn't exist, it
      returns EINVAL.
      
      If no transaction id is specified, WAIT_SYNC will wait for the
      currently committing transaction to finish it's commit to disk.
      If there is no currently committing transaction, it returns
      success.
      
      These ioctls are useful for applications which want to impose an
      ordering on when fs modifications reach disk, but do not want to
      wait for the full (slow) commit process to do so.
      
      Picky callers can take the transid returned by START_SYNC and
      feed it to WAIT_SYNC, and be certain to wait only as long as
      necessary for the transaction _they_ started to reach disk.
      
      Sloppy callers can START_SYNC and WAIT_SYNC without a transid,
      and provided they didn't wait too long between the calls, they
      will get the same result.  However, if a second commit starts
      before they call WAIT_SYNC, they may end up waiting longer for
      it to commit as well.  Even so, a START_SYNC+WAIT_SYNC still
      guarantees that any operation completed before the START_SYNC
      reaches disk.
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      46204592
    • S
      Btrfs: async transaction commit · bb9c12c9
      Sage Weil 提交于
      Add support for an async transaction commit that is ordered such that any
      subsequent operations will join the following transaction, but does not
      wait until the current commit is fully on disk.  This avoids much of the
      latency associated with the btrfs_commit_transaction for callers concerned
      with serialization and not safety.
      
      The wait_for_unblock flag controls whether we wait for the 'middle' portion
      of commit_transaction to complete, which is necessary if the caller expects
      some of the modifications contained in the commit to be available (this is
      the case for subvol/snapshot creation).
      Signed-off-by: NSage Weil <sage@newdream.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      bb9c12c9
  23. 29 10月, 2010 1 次提交
    • J
      Btrfs: create special free space cache inode · 0af3d00b
      Josef Bacik 提交于
      In order to save free space cache, we need an inode to hold the data, and we
      need a special item to point at the right inode for the right block group.  So
      first, create a special item that will point to the right inode, and the number
      of extent entries we will have and the number of bitmaps we will have.  We
      truncate and pre-allocate space everytime to make sure it's uptodate.
      
      This feature will be turned on as soon as you mount with -o space_cache, however
      it is safe to boot into old kernels, they will just generate the cache the old
      fashion way.  When you boot back into a newer kernel we will notice that we
      modified and not the cache and automatically discard the cache.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0af3d00b
  24. 25 5月, 2010 2 次提交