Commits · afa427cf9d6ef64e73df68882cbabde0e6a61639 · openeuler / raspberrypi-kernel

07 Dec, 2015 1 commit

btrfs: remove a trivial helper btrfs_set_buffer_uptodate · 4db8c528
Committed by David Sterba on Dec 03, 2015
```
Signed-off-by: David Sterba <dsterba@suse.com>
```
01 Oct, 2015 1 commit

Btrfs: add btrfs_read_dev_one_super() to read one specific SB · 29c36d72

Committed by Anand Jain on Aug 14, 2015

This uses a chunk of code from btrfs_read_dev_super() and creates
a function called btrfs_read_dev_one_super() so that the next patch
can use it for the scratch superblock.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
[renamed bufhead to bh]
Signed-off-by: David Sterba <dsterba@suse.com>

01 Sep, 2015 1 commit

btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance · 943c6e99

Committed by Zhao Lei on Aug 19, 2015

Code for updating fs_info->num_tolerated_disk_barrier_failures in
btrfs_balance() lacks raid56 support.

Reason:
 The above code was written on 2012-08-01, together with the first
 version of btrfs_calc_num_tolerated_disk_barrier_failures().

 btrfs_calc_num_tolerated_disk_barrier_failures() was later updated
 to support raid56, but the code in btrfs_balance() was not
 updated with it.

Fix:
 Merge the similar code into a common function,
 btrfs_get_num_tolerated_disk_barrier_failures(),
 and make it support both cases.

 This fixes the bug with a bonus of cleanup, and keeps the two code
 paths from drifting out of sync again.
Suggested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
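
The idea of one shared helper can be illustrated with a minimal sketch. It is not the kernel implementation: the `enum profile` names and both function names are hypothetical stand-ins for the real `BTRFS_BLOCK_GROUP_*` flag handling, and the per-profile tolerances for RAID5 (1) and RAID6 (2) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical profile names; the kernel works on block group flag bits. */
enum profile {
	PROFILE_SINGLE, PROFILE_DUP, PROFILE_RAID0,
	PROFILE_RAID1, PROFILE_RAID10, PROFILE_RAID5, PROFILE_RAID6
};

/* Assumed number of devices that may fail a barrier for one profile. */
static int tolerated_failures(enum profile p)
{
	switch (p) {
	case PROFILE_RAID6:
		return 2;
	case PROFILE_RAID1:
	case PROFILE_RAID10:
	case PROFILE_RAID5:
		return 1;
	default: /* single, dup, raid0: any device failure is fatal */
		return 0;
	}
}

/*
 * Single helper shared by all callers: the filesystem tolerates only as
 * many failures as the weakest profile currently in use.
 */
static int get_num_tolerated_failures(const enum profile *in_use, size_t n)
{
	assert(n > 0);
	int min = tolerated_failures(in_use[0]);

	for (size_t i = 1; i < n; i++) {
		int t = tolerated_failures(in_use[i]);
		if (t < min)
			min = t;
	}
	return min;
}
```

Because every caller goes through the same function, adding a new profile means touching one `switch` rather than two copies of the logic.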

17 Feb, 2015 1 commit

Btrfs: disk-io: replace root args iff only fs_info used · 01d58472

Committed by Daniel Dressler on Nov 21, 2014

This is the 3rd independent patch of a larger project to cleanup btrfs's
internal usage of btrfs_root. Many functions take btrfs_root only to
grab the fs_info struct.

By requiring a root these functions cause programmer overhead. That
these functions can accept any valid root is not obvious until
inspection.

This patch reduces the specificity of such functions to accept the
fs_info directly.

These patches can be applied independently and thus are not being
submitted as a patch series. There should be about 26 patches by the
project's completion. Each patch will cleanup between 1 and 34 functions
apiece.  Each patch covers a single file's functions.

This patch affects the following function(s):
  1) csum_tree_block
  2) csum_dirty_buffer
  3) check_tree_block_fsid
  4) btrfs_find_tree_block
  5) clean_tree_block
Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>

13 Dec, 2014 3 commits

btrfs: sink blocksize parameter to btrfs_find_create_tree_block · a83fffb7

Committed by David Sterba on Jun 15, 2014

Finally it's clear that the requested blocksize is always equal to
nodesize, with one exception, the superblock.

Superblock has fixed size regardless of the metadata block size, but
uses the same helpers to initialize sys array/chunk tree and to work
with the chunk items. So it pretends to be an extent_buffer for a
moment. btrfs_read_sys_array is full of special cases, and we're adding
one more.
Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: sink blocksize parameter to reada_tree_block_flagged · c0dcaa4d
Committed by David Sterba on Jun 15, 2014
```
Signed-off-by: David Sterba <dsterba@suse.cz>
```
btrfs: sink blocksize parameter to readahead_tree_block · d3e46fea
Committed by David Sterba on Jun 15, 2014
```
All callers pass nodesize.
Signed-off-by: David Sterba <dsterba@suse.cz>
```

02 Oct, 2014 6 commits

btrfs: use slab for end_io_wq structures · 97eb6b69

Committed by David Sterba on Jul 30, 2014

The structure is frequently reused.  Rename it according to the slab
name.
Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: use enum for wq endio metadata type · bfebd8b5
Committed by David Sterba on Jul 30, 2014
```
The enum exists but is not consistently used.
Signed-off-by: David Sterba <dsterba@suse.cz>
```
btrfs: remove unused parameter blocksize from btrfs_find_tree_block · 0308af44
Committed by David Sterba on Jun 15, 2014
```
Signed-off-by: David Sterba <dsterba@suse.cz>
```
btrfs: remove parameter blocksize from read_tree_block · ce86cd59
Committed by David Sterba on Jun 15, 2014
```
We know the tree block size, no need to pass it around.
Signed-off-by: David Sterba <dsterba@suse.cz>
```

btrfs: return void from readahead_tree_block · 6197d86e

Committed by David Sterba on Jun 15, 2014

Errors in readahead are not fatal and ignored elsewhere in the code.
Signed-off-by: David Sterba <dsterba@suse.cz>

btrfs: remove unused parameter from readahead_tree_block · 58dc4ce4

Committed by David Sterba on Jun 15, 2014

The parent_transid parameter has been unused since its introduction in
ca7a79ad ("Pass down the expected generation number when reading
tree blocks"). In reada_tree_block, it was even wrongly set to leafsize.
Transid check is done in the proper read and readahead ignores errors.
Signed-off-by: David Sterba <dsterba@suse.cz>

18 Sep, 2014 2 commits

Btrfs: implement repair function when direct read fails · 8b110e39

Committed by Miao Xie on Sep 12, 2014

This patch implements a data repair function for when a direct read fails.

The details of the implementation are:
- When we find the data is not right, we try to read the data from the other
  mirror.
- When the io on the mirror ends, we insert the endio work into the
  dedicated btrfs workqueue, not the common read endio workqueue. The
  original endio work is still blocked in the btrfs endio workqueue, so
  inserting the endio work of the io on the mirror into that workqueue
  would cause a deadlock.
- After we get the right data, we write it back to the corrupted mirror.
- If the data on the new mirror is still corrupted, we try the next
  mirror until we read the right data or all the mirrors are traversed.
- After the above work, we set the uptodate flag according to the result.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
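
The mirror-retry part of the steps above can be sketched roughly as follows. This is an illustrative model only: `repair_direct_read` and the `mirror_good` array are hypothetical, real I/O and checksum verification are replaced by a boolean per mirror, and the dedicated-workqueue handling is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * mirror_good[i] says whether the copy on mirror i passes verification.
 * Starting from the mirror whose read failed, try each other mirror in
 * turn. On the first good copy, "write it back" to the failed mirror and
 * report the data as uptodate. If every mirror is bad, report failure.
 */
static bool repair_direct_read(bool *mirror_good, int num_mirrors,
			       int failed_mirror)
{
	assert(failed_mirror >= 0 && failed_mirror < num_mirrors);

	for (int step = 1; step < num_mirrors; step++) {
		int m = (failed_mirror + step) % num_mirrors;

		if (mirror_good[m]) {
			/* Good data found: repair the corrupted mirror. */
			mirror_good[failed_mirror] = true;
			return true;
		}
	}
	return false; /* all mirrors traversed, nothing usable */
}
```

The return value plays the role of the uptodate flag that is set according to the result.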

btrfs: make close_ctree return void · 3abdbd78

Committed by David Sterba on Jun 04, 2014

There's no user of the return value and we can get rid of the comment in
put_super.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

10 Jun, 2014 1 commit

Btrfs: add sanity tests for new qgroup accounting code · faa2dbf0

Committed by Josef Bacik on May 07, 2014

This exercises the various parts of the new qgroup accounting code. We do some
basic stuff and do some things with the shared refs to make sure all that code
works. I had to add a bunch of infrastructure because I needed to be able to
insert items into a fake tree without having to do all the hard work myself;
hopefully this will be useful in the future. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

12 Nov, 2013 1 commit

Btrfs: add a sanity test for btrfs_split_item · 06ea65a3

Committed by Josef Bacik on Sep 19, 2013

While looking at somebody's corruption I became completely convinced that
btrfs_split_item was broken, so I wrote this test to verify that it was working
as it was supposed to. Thankfully it appears to be working as intended, so just
add this test to make sure nobody breaks it in the future. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

11 Oct, 2013 1 commit

Btrfs: fix oops caused by the space balance and dead roots · c00869f1

Committed by Miao Xie on Sep 25, 2013

When doing space balance and subvolume destroy at the same time, we met
the following oops:

kernel BUG at fs/btrfs/relocation.c:2247!
RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
Call Trace:
 [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
 [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
 [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
 [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
 [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
 [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
 [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
 [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
 [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
 [<ffffffff81122f53>] ? path_openat+0x234/0x4db
 [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
 [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
 [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
 [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
 [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
 [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
 [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b

This happened because we returned an error if the reference count of the root
was 0 when doing space relocation. That was not right, because even though the
root was dead (refs == 0), the space it held still needed to be relocated, or
we could not remove the block group. So in this case, we should return the
root whether it is dead or not.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

14 Jun, 2013 2 commits

Btrfs: introduce grab/put functions for the root of the fs/file tree · b0feb9d9

Committed by Miao Xie on May 15, 2013

The grab/put functions will be used in the next patch, which needs to grab
the root object and ensure it is not freed. We use a reference counter
instead of the srcu lock to avoid blocking the memory reclaim task,
which invokes synchronize_srcu().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
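
A grab/put pair built on a reference counter might look like the userspace sketch below. It uses C11 atomics rather than kernel primitives, and `struct root`, `grab_root` and `put_root` are hypothetical names chosen for illustration. Whether the real helpers behave exactly this way is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a root object carrying a reference count. */
struct root {
	atomic_int refs;
};

/*
 * Take a reference only while the object is still live (refs > 0).
 * Returns the root on success, NULL if it is already on its way out.
 * Neither path sleeps, so a caller never blocks waiting for readers.
 */
static struct root *grab_root(struct root *r)
{
	int old = atomic_load(&r->refs);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return r;
	}
	return NULL;
}

/* Drop a reference; true means the caller dropped the last one. */
static bool put_root(struct root *r)
{
	int prev = atomic_fetch_sub(&r->refs, 1);

	assert(prev > 0);
	return prev == 1;
}
```

A caller that gets `true` back from `put_root` is the one responsible for freeing the object.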

Btrfs: cleanup the similar code of the fs root read · cb517eab

Committed by Miao Xie on May 15, 2013

There are several functions whose code is similar, such as
  btrfs_find_last_root()
  btrfs_read_fs_root_no_radix()

Besides that, some functions are invoked twice, which is unnecessary.
For example, we are sure that all roots found in
  btrfs_find_orphan_roots()
have their orphan items, so it is unnecessary to check the orphan
item again.

So clean it up.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

07 May, 2013 2 commits

btrfs: make static code static & remove dead code · 48a3b636

Committed by Eric Sandeen on Apr 25, 2013

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: cleanup unused arguments of btrfs_csum_data · b0496686

Committed by Liu Bo on Mar 14, 2013

Argument 'root' is no longer used in btrfs_csum_data().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

30 Apr, 2013 1 commit

Btrfs: cleanup unused function · e75206cf

Committed by Liu Bo on Mar 06, 2013

btrfs_abort_devices() is no longer used.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

02 Feb, 2013 1 commit

Btrfs: RAID5 and RAID6 · 53b381b3

Committed by David Woodhouse on Jan 29, 2013

This builds on David Woodhouse's original Btrfs raid5/6 implementation.
The code has changed quite a bit, blame Chris Mason for any bugs.

Read/modify/write is done after the higher levels of the filesystem have
prepared a given bio.  This means the higher layers are not responsible
for building full stripes, and they don't need to query for the topology
of the extents that may get allocated during delayed allocation runs.
It also means different files can easily share the same stripe.

But, it does expose us to incorrect parity if we crash or lose power
while doing a read/modify/write cycle.  This will be addressed in a
later commit.

Scrub is unable to repair crc errors on raid5/6 chunks.

Discard does not work on raid5/6 (yet)

The stripe size is fixed at 64KiB per disk.  This will be tunable
in a later commit.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

13 Dec, 2012 1 commit

Btrfs: cleanup for btrfs_btree_balance_dirty · b53d3f5d

Committed by Liu Bo on Nov 14, 2012

- 'nr' is no longer used.
- btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share
  a bunch of code.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

09 Oct, 2012 1 commit

Btrfs: make filesystem read-only when submitting barrier fails · 5af3e8cc

Committed by Stefan Behrens on Aug 01, 2012

So far the return code of barrier_all_devices() is ignored, which
means that errors are ignored. The result can be a corrupt
filesystem which is not consistent.
This commit adds code to evaluate the return code of
barrier_all_devices(). The normal btrfs_error() mechanism is used to
switch the filesystem into read-only mode when errors are detected.

In order to decide whether barrier_all_devices() should return
error or success, the number of disks that are allowed to fail the
barrier submission is calculated. This calculation accounts for the
worst RAID level of metadata, system and data. If single, dup or
RAID0 is in use, a single disk error is already considered to be
fatal. Otherwise a single disk error is tolerated.

The calculation of the number of disks that are tolerated to fail
the barrier operation is performed when the filesystem gets mounted,
when a balance operation is started and finished, and when devices
are added or removed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
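
The success-or-error decision described above can be sketched as a comparison between the number of devices whose barrier submission failed and the precomputed tolerance. The function name `check_barrier_result`, its parameters, and the choice of `-EIO` as the error value are assumptions for illustration, not the kernel's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * dev_failed[i] is true if the barrier submission to device i failed.
 * tolerated is the number of failing devices the filesystem can survive,
 * derived from the worst RAID level of metadata, system and data.
 * Returns 0 if the failures are within tolerance, -EIO otherwise.
 */
static int check_barrier_result(const bool *dev_failed, size_t ndevs,
				int tolerated)
{
	assert(tolerated >= 0);
	int failures = 0;

	for (size_t i = 0; i < ndevs; i++) {
		if (dev_failed[i])
			failures++;
	}
	return failures > tolerated ? -EIO : 0;
}
```

With a tolerance of 0 (single, dup or RAID0 in use) one failed device already yields an error. With a tolerance of 1 the same failure is accepted.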

29 Aug, 2012 1 commit

Btrfs: remove superblock writing after fatal error · 68ce9682

Committed by Stefan Behrens on Aug 01, 2012

With commit acce952b, btrfs was changed to flag the filesystem with
BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
error happened, such as write I/O errors on all mirrors.
In such situations, on unmount, the superblock is written in
btrfs_error_commit_super(). This is done with the intention to be able
to evaluate the error flag on the next mount. A warning is printed
in this case during the next mount and the log tree is ignored.

The issue is that it is possible that the superblock points to a root
that was not written (due to write I/O errors).
The result is that the filesystem cannot be mounted. btrfsck also does
not start and all the other btrfs-progs tools fail to start as well.
However, mount -o recovery is working well and does the right things
to recover the filesystem (i.e., don't use the log root, clear the
free space cache and use the next mountable root that is stored in the
root backup array).

This patch removes the writing of the superblock when
BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
flag in the mount function.

These lines can be used to reproduce the issue (using /dev/sdm):
SCRATCH_DEV=/dev/sdm
SCRATCH_MNT=/mnt
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
ls -alLF /dev/mapper/foo
mkfs.btrfs /dev/mapper/foo
mount /dev/mapper/foo $SCRATCH_MNT
echo bar > $SCRATCH_MNT/foo
sync
echo 0 25165824 error | dmsetup reload foo
dmsetup resume foo
ls -alF $SCRATCH_MNT
touch $SCRATCH_MNT/1
ls -alF $SCRATCH_MNT
sleep 35
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
dmsetup resume foo
sleep 1
umount $SCRATCH_MNT
btrfsck /dev/mapper/foo
dmsetup remove foo
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

10 Jul, 2012 1 commit

Btrfs: added helper to create new trees · 20897f5c

Committed by Arne Jansen on Sep 13, 2011

This creates a brand new tree. Will be used to create
the quota tree.
Signed-off-by: Arne Jansen <sensille@gmx.net>

30 May, 2012 1 commit

btrfs: Drop unused function btrfs_abort_devices() · d07eb911

Committed by Asias He on May 25, 2012

1) This function is not used anywhere.

2) Using the blk_abort_queue() to abort the queue seems not correct.
blk_abort_queue() is used for timeout handling (block/blk-timeout.c).

Cc: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Asias He <asias@redhat.com>

06 May, 2012 1 commit

Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919

Committed by Chris Mason on May 06, 2012

verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.

But, a few callers are using it with spinlocks held.  Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case.  This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verify things.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
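
The control flow this describes can be sketched in plain C. The sketch is a simplified model under stated assumptions: `struct buffer` and `verify_transid` are hypothetical, the extent range lock is only marked by a comment, and treating a zero `parent_transid` as "nothing to check" is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent buffer. */
struct buffer {
	uint64_t generation;
	bool uptodate;
};

/*
 * Returns 0 when the generation matches (the common case, no lock taken),
 * -EAGAIN when it does not match and the caller is in atomic context,
 * and 1 after clearing the uptodate flag in a context that may block.
 */
static int verify_transid(struct buffer *b, uint64_t parent_transid,
			  bool atomic)
{
	assert(b != NULL);

	if (parent_transid == 0 || b->generation == parent_transid)
		return 0;

	if (atomic)
		return -EAGAIN; /* caller must retry where sleeping is allowed */

	/* The real code would take a blocking extent range lock here. */
	b->uptodate = false;
	return 1;
}
```

A caller holding a spinlock passes `atomic = true`, and on `-EAGAIN` drops the lock and retries from a context that may sleep.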

22 Mar, 2012 3 commits

btrfs: enhance transaction abort infrastructure · 49b25e05
Committed by Jeff Mahoney on Mar 01, 2012
```
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
```
btrfs: return void in functions without error conditions · 143bede5
Committed by Jeff Mahoney on Mar 01, 2012
```
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
```

btrfs: clean_tree_block should panic on observed memory corruption and return void · d5c13f92

Committed by Jeff Mahoney on Mar 01, 2012

The only error condition in clean_tree_block is an accounting bug.
Returning without modifying dirty_metadata_bytes and as if the cleaning
has been performed may cause problems later so it should panic instead.

It should probably be a BUG_ON but we have btrfs_panic now.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

09 Jan, 2012 3 commits

btrfs: take allocation of ->tree_root into open_ctree() · f84a8bd6
Committed by Al Viro on Nov 17, 2011
```
now that we don't need it for sget() anymore...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```

btrfs: make open_ctree() return int · ad2b2c80

Committed by Al Viro on Nov 17, 2011

It returns either ERR_PTR(-ve) or sb->s_fs_info.  The latter can
be found by caller just as well, TYVM, no need to return it.  Just
return -ve or 0...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

btrfs: sanitizing ->fs_info, part 4 · 6f07e42e

Committed by Al Viro on Nov 17, 2011

A new helper: btrfs_alloc_root(fs_info); allocates btrfs_root
and sets ->fs_info.  All places allocating the suckers converted
to it.  At that point we *never* reassign ->fs_info of btrfs_root;
it's set before anyone sees the address of newly allocated
struct btrfs_root and never assigned anywhere else.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06 Nov, 2011 1 commit

Btrfs: make sure to flush queued bios if write_cache_pages waits · 01d658f2

Committed by Chris Mason on Nov 01, 2011

write_cache_pages tries to build up a large bio to stuff down the pipe.
But if it needs to wait for a page lock, it needs to make sure and send
down any pending writes so we don't deadlock with anyone who has the
page lock and is waiting for writeback of things inside the bio.

Dave Sterba triggered this as a deadlock between the autodefrag code and
the extent write_cache_pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

02 Oct, 2011 1 commit

btrfs: add READAHEAD extent buffer flag · ab0fff03

Committed by Arne Jansen on May 23, 2011

Add a READAHEAD extent buffer flag.
Add a function to trigger a read with this flag set.

Changes v2:
 - use extent buffer flags instead of extent state flags

Changes v5:
 - adapt to changed read_extent_buffer_pages interface
 - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set
Signed-off-by: Arne Jansen <sensille@gmx.net>

28 Jul, 2011 1 commit

Btrfs: make a lockdep class for each root · 85d4e461

Committed by Chris Mason on Jul 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

21 May, 2011 1 commit

btrfs: implement delayed inode items operation · 16cdcec7

Committed by Miao Xie on Apr 22, 2011

Changelog V5 -> V6:
- Fix oom when the memory load is high, by storing the delayed nodes into the
  root's radix tree, and letting btrfs inodes go.

Changelog V4 -> V5:
- Fix the race on adding the delayed node to the inode, which is spotted by
  Chris Mason.
- Merge Chris Mason's incremental patch into this patch.
- Fix deadlock between readdir() and memory fault, which is reported by
  Itaru Kitayama.

Changelog V3 -> V4:
- Fix nested lock, which is reported by Itaru Kitayama, by updating space cache
  inode in time.

Changelog V2 -> V3:
- Fix the race between the delayed worker and the task which does delayed items
  balance, which is reported by Tsutomu Itoh.
- Modify the patch to address David Sterba's comment.
- Fix the bug of the cpu recursion spinlock, reported by Chris Mason.

Changelog V1 -> V2:
- break up the global rb-tree, use a list to manage the delayed nodes,
  which is created for every directory and file, and used to manage the
  delayed directory name index items and the delayed inode item.
- introduce a worker to deal with the delayed nodes.

Compared with Ext3/4, the performance of file creation and deletion on btrfs
is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
such as the inode item, directory name item, directory name index and so on.

If we can delay some b+ tree insertions or deletions, we can improve the
performance, so we made this patch, which implements delayed directory name
index insertion/deletion and delayed inode update.

Implementation:
- Introduce a delayed root object into the filesystem, which uses two lists to
  manage the delayed nodes that are created for every file/directory.
  One list manages all the delayed nodes that have delayed items. The
  other manages the delayed nodes that are waiting to be dealt with
  by the work thread.
- Every delayed node has two rb-trees: one manages the directory name
  index items that are going to be inserted into the b+ tree, and the other
  manages the directory name index items that are going to be deleted from
  the b+ tree.
- Introduce a worker to deal with the delayed operations: the delayed
  directory name index insertions and deletions and the delayed inode update.
  When the number of delayed items is beyond the lower limit, we create works
  for some delayed nodes, insert them into the work queue of the worker, and
  then go back.
  When the number of delayed items is beyond the upper bound, we create works
  for all the delayed nodes that haven't been dealt with, insert them into the
  work queue of the worker, and then wait until the number of untreated items
  is below some threshold value.
- When we want to insert a directory name index into the b+ tree, we just add
  the information into the delayed inserting rb-tree.
  Then we check the number of delayed items and do delayed items
  balance. (The balance policy is above.)
- When we want to delete a directory name index from the b+ tree, we search
  for it in the inserting rb-tree first. If we find it, we just drop it. If
  not, we add its key into the delayed deleting rb-tree.
  As with the delayed inserting rb-tree, we also check the number of
  delayed items and do delayed items balance.
  (The same as for the inserting manipulation.)
- When we want to update the metadata of some inode, we cache the data of the
  inode in the delayed node. The worker will flush it into the b+ tree after
  dealing with the delayed insertions and deletions.
- We move the delayed node to the tail of the list after we access the
  delayed node. This way, we can cache more delayed items and merge more
  inode updates.
- If we want to commit the transaction, we deal with all the delayed nodes.
- The delayed node is freed when we free the btrfs inode.
- Before we log the inode items, we commit all the directory name index items
  and the delayed inode update.

I did a quick test by the benchmark tool[1] and found we can improve the
performance of file creation by ~15%, and file deletion by ~20%.

Before applying this patch:
Create files:
        Total files: 50000
        Total time: 1.096108
        Average time: 0.000022
Delete files:
        Total files: 50000
        Total time: 1.510403
        Average time: 0.000030

After applying this patch:
Create files:
        Total files: 50000
        Total time: 0.932899
        Average time: 0.000019
Delete files:
        Total files: 50000
        Total time: 1.215732
        Average time: 0.000024
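
As a quick check of the quoted improvements, the relative reduction in total time can be computed from the figures above. The helper name `improvement_pct` is hypothetical. The inputs are the four "Total time" values reported in this message.

```c
#include <assert.h>

/* Relative reduction in total time, expressed in percent. */
static double improvement_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Calling `improvement_pct(1.096108, 0.932899)` for file creation and `improvement_pct(1.510403, 1.215732)` for file deletion gives values just under 15% and just under 20%, in line with the rounded figures quoted in the message.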

[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3

Many thanks for Kitayama-san's help!
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
