Commits · 3268a2468eb6a31af89930cbae58a62fe6ca6d2d · gsplhtlxg / clone-Linux

15 Jan, 2013 · 3 commits

Btrfs: reset path lock state to zero · 3268a246

Committed by Liu Bo on Dec 28, 2012

We forgot to reset the path lock state to zero after unlocking the path block,
and this can trigger the ASSERT checks in the tree unlock API.
Reported-by: Slava Barinov <rayslava@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: let allocation start from the right raid type · ac5c9300

Committed by Liu Bo on Dec 27, 2012

This avoids needless empty loops.

Say we have only one disk: the metadata raid type then defaults to DUP, so
there is no need to start from index=0 (RAID10) and go through two empty loops
before reaching index=2 (DUP).
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

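A minimal user-space sketch of why the starting index matters. It assumes the raid index order quoted above (RAID10=0, DUP=2), with RAID1=1, RAID0=3 and SINGLE=4 assumed for the remaining slots; the flag values and the helpers raid_index() and passes_before_match() are hypothetical, and each skipped index stands for one empty pass of the allocator loop.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the block group profile bits. */
#define BG_RAID0  (1u << 0)
#define BG_RAID1  (1u << 1)
#define BG_DUP    (1u << 2)
#define BG_RAID10 (1u << 3)

/* Assumed search order: 0=RAID10, 1=RAID1, 2=DUP, 3=RAID0, 4=SINGLE. */
enum raid_index { IDX_RAID10, IDX_RAID1, IDX_DUP, IDX_RAID0, IDX_SINGLE, NR_RAID_IDX };

/* Map a profile to its position in the search order. */
static int raid_index(uint32_t flags)
{
	if (flags & BG_RAID10)
		return IDX_RAID10;
	if (flags & BG_RAID1)
		return IDX_RAID1;
	if (flags & BG_DUP)
		return IDX_DUP;
	if (flags & BG_RAID0)
		return IDX_RAID0;
	return IDX_SINGLE;
}

/* Count the passes made over indexes that hold no block groups of the
 * wanted profile before reaching the one that does. */
static int passes_before_match(uint32_t flags, int start)
{
	int target = raid_index(flags);
	int passes = 0;

	for (int idx = start; idx < NR_RAID_IDX; idx++) {
		if (idx == target)
			break;
		passes++;	/* an empty loop: nothing of this type to try */
	}
	return passes;
}
```

On a DUP-only filesystem, starting at index 0 costs two empty passes, while starting at raid_index(BG_DUP) costs none.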

Btrfs: set flushing if we're limited flushing · 72bcd99d

Committed by Josef Bacik on Dec 18, 2012

We still need to mark ourselves as flushing when we are doing limited flushing,
to keep somebody from coming in and stealing our reservation.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


17 Dec, 2012 · 4 commits

Btrfs: don't take inode delalloc mutex if we're a free space inode · c64c2bd8

Committed by Josef Bacik on Dec 14, 2012

This confuses and angers lockdep even though it's OK.  We don't really need
the lock for free space inodes, since only the transaction committer will be
reserving space.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: fix autodefrag and umount lockup · 1135d6df

Committed by Josef Bacik on Dec 14, 2012

This happens because writeback_inodes_sb_nr_if_idle does a down_read, which
doesn't work for us. It has not been fixed upstream yet, so do it ourselves
and use our own version instead, so we can stop having this stupid
long-standing lockup.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: put raid properties into global table · 31e50229

Committed by Liu Bo on Nov 21, 2012

Raid properties can be shared among the raid calculation code, so we can put
them into a global table to keep things simple.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

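A sketch of the "global table" idea, assuming a simplified, hypothetical field set (only ncopies and devs_min) and hypothetical names; the values are the well-known copy counts and minimum device counts of each btrfs profile. The point is that callers index one constant array instead of repeating per-profile if/else chains.

```c
#include <stdint.h>

/* Assumed to follow the allocator's raid index order. */
enum raid_type { RAID10, RAID1, DUP, RAID0, SINGLE, NR_RAID_TYPES };

/* Simplified, hypothetical subset of per-profile properties. */
struct raid_attr {
	int ncopies;	/* copies stored of each logical byte */
	int devs_min;	/* devices needed to allocate a chunk */
};

static const struct raid_attr raid_array[NR_RAID_TYPES] = {
	[RAID10] = { .ncopies = 2, .devs_min = 4 },
	[RAID1]  = { .ncopies = 2, .devs_min = 2 },
	[DUP]    = { .ncopies = 2, .devs_min = 1 },
	[RAID0]  = { .ncopies = 1, .devs_min = 2 },
	[SINGLE] = { .ncopies = 1, .devs_min = 1 },
};

/* Raw device bytes consumed by 'logical_bytes' of data in profile t. */
static uint64_t raw_bytes_needed(enum raid_type t, uint64_t logical_bytes)
{
	return logical_bytes * (uint64_t)raid_array[t].ncopies;
}

/* Whether profile t can be used with 'num_devices' devices. */
static int profile_allowed(enum raid_type t, int num_devices)
{
	return num_devices >= raid_array[t].devs_min;
}
```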

Btrfs: fix missing reserved space release in error path of delalloc reservation · 4b5829a8

Committed by Miao Xie on Dec 05, 2012

We forgot to release the reserved space in the error path of the delalloc
reservation; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


13 Dec, 2012 · 4 commits

Btrfs: disallow some operations on the device replace target device · 63a212ab

Committed by Stefan Behrens on Nov 05, 2012

This patch adds some code to disallow operations on the device that
is used as the target for the device replace operation.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: pass fs_info to btrfs_map_block() instead of mapping_tree · 3ec706c8

Committed by Stefan Behrens on Nov 05, 2012

This is required for the device replace procedure in a later step.
Two calling functions also had to be changed to have the fs_info
pointer: repair_io_failure() and scrub_setup_recheck_block().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: fix a deadlock in aborting transaction due to ENOSPC · 37c4146d

Committed by Liu Bo on Nov 05, 2012

When committing a transaction, we may bail out of running delayed refs
due to ENOSPC, and then abort the current transaction to flip into readonly.

But we'll then hit a deadlock on the ref head's lock, since we forget to
release that lock and do the other cleanup.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


fs/btrfs: use WARN · 31b1a2bd

Committed by Julia Lawall on Nov 03, 2012

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


12 Dec, 2012 · 4 commits

Btrfs: fill the global reserve when unpinning space · 7b398f8e

Committed by Josef Bacik on Oct 22, 2012

Dave gave me an image of a very full file system that would abort the
transaction because it ran out of space while committing the transaction.
This is because we would think there was plenty of room to create a snapshot
even though the global reserve was not full. This happens because we
calculate the global reserve size before we unpin any space, so after we
unpin the space we allow reservations to occur even though we haven't
reserved all of the space for our global reserve. Fix this by adding to the
global reserve while unpinning, in order to make sure we always have enough
space to do our work. With this patch we no longer end up with an aborted
transaction; instead we return ENOSPC properly to the person trying to create
the snapshot. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

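A minimal sketch of "adding to the global reserve while unpinning", using hypothetical structures that are much simpler than the kernel's block reservation code: unpinned bytes first top the global reserve up to its target size, and only the remainder becomes available to ordinary reservations.

```c
#include <stdint.h>

/* Hypothetical, simplified accounting; not the kernel's structures. */
struct global_rsv {
	uint64_t size;		/* how much the global reserve should hold */
	uint64_t reserved;	/* how much it currently holds */
};

struct space_info {
	uint64_t bytes_pinned;	/* freed this transaction, not yet reusable */
	uint64_t bytes_free;	/* available to ordinary reservations */
};

/* Unpin 'len' bytes (assumes len <= bytes_pinned): refill the global
 * reserve first, then release whatever is left to everyone else. */
static void unpin_bytes(struct space_info *si, struct global_rsv *g, uint64_t len)
{
	uint64_t want = g->size > g->reserved ? g->size - g->reserved : 0;
	uint64_t to_rsv = len < want ? len : want;

	si->bytes_pinned -= len;
	g->reserved += to_rsv;
	si->bytes_free += len - to_rsv;
}
```

For example, with a 64-byte target holding only 16 bytes, unpinning 100 bytes sends 48 to the global reserve and leaves 52 for other reservations.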

Btrfs: improve the noflush reservation · 08e007d2

Committed by Miao Xie on Oct 16, 2012

In some places (such as evicting an inode), we cannot flush the reserved
space of delalloc, but flushing the delayed directory index and delayed inode
is OK. However, we don't try to flush those things and just give up when there
is not enough space to be reserved. This patch fixes this problem.

We define 3 types of flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
If we are in a transaction, we should not flush anything, or a deadlock
would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
would cause a deadlock, use FLUSH_LIMIT. In all other cases, FLUSH_ALL is
used, and we flush everything.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

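The decision rule above can be sketched as follows. The enum uses the commit's shorthand names (the kernel's actual identifiers may differ), and choose_flush(), may_flush_delalloc() and may_flush_delayed_items() are hypothetical helpers written only to show which work each level permits.

```c
#include <stdbool.h>

/* Shorthand names from the commit text. */
enum flush_type { NO_FLUSH, FLUSH_LIMIT, FLUSH_ALL };

/* Pick the most aggressive flush level that cannot deadlock. */
static enum flush_type choose_flush(bool in_transaction, bool delalloc_flush_deadlocks)
{
	if (in_transaction)
		return NO_FLUSH;	/* flushing anything here could deadlock */
	if (delalloc_flush_deadlocks)
		return FLUSH_LIMIT;	/* e.g. evicting an inode */
	return FLUSH_ALL;
}

/* Delalloc may only be flushed at the FLUSH_ALL level. */
static bool may_flush_delalloc(enum flush_type t)
{
	return t == FLUSH_ALL;
}

/* Delayed dir index / delayed inode items may be flushed unless NO_FLUSH. */
static bool may_flush_delayed_items(enum flush_type t)
{
	return t != NO_FLUSH;
}
```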

Btrfs: fix wrong comment in can_overcommit() · 561c294d

Committed by Miao Xie on Oct 16, 2012

The comment does not match the code. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: cleanup duplicated division functions · 3fed40cc

Committed by Miao Xie on Sep 13, 2012

div_factor{_fine} has been implemented twice; clean it up.
Also move them into an independent file named math.h, because they are
common math functions.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

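For readers unfamiliar with these helpers, a user-space sketch of what they compute as we understand them: div_factor() scales a value by factor/10 and div_factor_fine() by factor/100. The kernel versions operate on u64 and divide with do_div(); plain 64-bit division stands in for that here.

```c
#include <stdint.h>

/* num * factor / 10; e.g. div_factor(x, 8) is 80% of x. */
static inline uint64_t div_factor(uint64_t num, int factor)
{
	if (factor == 10)
		return num;	/* 10/10: nothing to scale */
	num *= factor;
	return num / 10;
}

/* num * factor / 100, for percentage-granularity thresholds. */
static inline uint64_t div_factor_fine(uint64_t num, int factor)
{
	if (factor == 100)
		return num;	/* 100/100: nothing to scale */
	num *= factor;
	return num / 100;
}
```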

09 Oct, 2012 · 4 commits

Btrfs: don't commit instead of overcommitting · 44734ed1

Committed by Josef Bacik on Sep 28, 2012

I don't think we have the same problem that this was supposed to fix
originally, since we can allocate chunks in the enospc path now. This code
is causing us to constantly commit the transaction as we get close to using
all of our available space in our currently allocated chunks, instead of
allocating another chunk and carrying on with life, which is not nice for
performance. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: cache extent state when writing out dirty metadata pages · e6138876

Committed by Josef Bacik on Sep 27, 2012

Every time we write out dirty pages, we search for an offset in the tree,
convert the bits in the state, and then when we wait we search for the
offset again and clear the bits. So for every dirty range in the io tree we
are doing 4 rb-tree searches, which is suboptimal. With this patch we only
do 2 searches for every cycle (modulo weird things happening). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: run delayed refs first when out of space · 67b0fd63

Committed by Josef Bacik on Sep 24, 2012

Running delayed refs is faster than running delalloc, so let's do that first
to try and reclaim space.  This makes my fs_mark test about 20% faster.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


btrfs: move transaction aborts to the point of failure · 005d6427

Committed by David Sterba on Sep 18, 2012

Call btrfs_abort_transaction as early as possible when an error
condition is detected; that way the reported line number is useful
and we're no longer clueless about which error path led to the abort.
Signed-off-by: David Sterba <dsterba@suse.cz>

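A user-space sketch of why the call site matters, assuming (as the commit implies) that the abort helper records the caller's function and line. The abort_transaction() macro, report_abort(), last_abort_line and the step functions are all hypothetical; because __LINE__ expands where the macro is used, each error path reports its own line.

```c
#include <stdio.h>

static int last_abort_line;	/* remembered for inspection */

static void report_abort(const char *func, int line, int err)
{
	last_abort_line = line;
	fprintf(stderr, "%s:%d: aborting transaction, error %d\n", func, line, err);
}

/* __func__ and __LINE__ expand at the call site, not inside report_abort(). */
#define abort_transaction(err) report_abort(__func__, __LINE__, (err))

static int step_a(void) { return 0; }
static int step_b(void) { return -28; /* pretend ENOSPC */ }

static int do_work(void)
{
	int ret = step_a();
	if (ret) {
		abort_transaction(ret);	/* reported line points here... */
		return ret;
	}
	ret = step_b();
	if (ret) {
		abort_transaction(ret);	/* ...or here, so we know which step failed */
		return ret;
	}
	return 0;
}
```

Had both paths jumped to a shared label that called abort_transaction() once, every failure would report that same line.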

04 Oct, 2012 · 2 commits

Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents · 6bbe3a9c

Committed by Liu Bo on Sep 14, 2012

nocow_only is now an obsolete argument.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

Btrfs: cleanup for duplicated code in find_free_extent · ab26e9d6

Committed by Liu Bo on Sep 14, 2012

There is already an 'add free space' phase before this one, so we
needn't redo it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>


02 Oct, 2012 · 9 commits

Btrfs: remove bytes argument from do_chunk_alloc · 698d0082

Committed by Josef Bacik on Sep 12, 2012

Everybody is just making stuff up, and it's only used to see if we really do
need to allocate a chunk; since we call this when we already know we really
do, it's just a waste of space.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: delay block group item insertion · ea658bad

Committed by Josef Bacik on Sep 11, 2012

So we have lots of places where we try to preallocate chunks in order to
make sure we have enough space as we make our allocations. This has
historically meant that we're constantly tweaking when we should allocate a
new chunk, and we have often gotten this horribly wrong, so we way
over-allocate either metadata or data. To try and keep this from happening,
we are going to make it so that the block group item insertion is done out
of band at the end of a transaction. This will allow us to create chunks
even if we are trying to make an allocation for the extent tree. With this
patch my enospc tests run faster (didn't expect this) and use the disk space
more efficiently (this is what I wanted). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

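A hypothetical model of "out of band" insertion: a new block group is usable in memory immediately and is only queued on a per-transaction pending list; its item is inserted when the transaction ends. The structures and helpers are illustrative only, and the counter stands in for real extent-tree insertions.

```c
#include <stdint.h>
#include <stdlib.h>

struct block_group {
	uint64_t start, len;
	struct block_group *next_pending;
};

struct transaction {
	struct block_group *pending;	/* groups whose items are not yet inserted */
	int items_inserted;		/* stands in for extent-tree insertions */
};

/* Create a block group without touching the extent tree, so this is safe
 * even while we are allocating on behalf of the extent tree itself. */
static struct block_group *make_block_group(struct transaction *t,
					    uint64_t start, uint64_t len)
{
	struct block_group *bg = calloc(1, sizeof(*bg));

	if (!bg)
		return NULL;
	bg->start = start;
	bg->len = len;
	bg->next_pending = t->pending;
	t->pending = bg;
	return bg;
}

/* At the end of the transaction, insert all queued items. */
static void insert_pending_items(struct transaction *t)
{
	while (t->pending) {
		struct block_group *bg = t->pending;

		t->pending = bg->next_pending;
		bg->next_pending = NULL;
		t->items_inserted++;	/* a real implementation inserts an item here */
	}
}
```

The caller still owns the block groups; only the item insertion is deferred.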

Btrfs: fix our overcommit math · a80c8dcf

Committed by Josef Bacik on Sep 06, 2012

I noticed I was seeing large lags when running my torrent test in a VM on my
laptop. While trying to make it lag less, I noticed that our overcommit math
was taking into account the number of bytes we wanted to reclaim, not the
number of bytes we actually wanted to allocate, which means we wouldn't
overcommit as often. This patch fixes the overcommit math and makes
shrink_delalloc() use that logic so that it will stop looping faster. We
still have pretty high latency spikes, but the test now takes 3 minutes
less (about 5% faster). Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

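A simplified, hypothetical overcommit check to show why the argument matters: the request is allowed if the bytes already in use plus the new request fit in the allocated space plus an assumed fraction (1/8 here) of the still-unallocated device space. The real function's inputs and fractions are not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow overcommitting up to an assumed 1/8 of unallocated space. */
static bool can_overcommit(uint64_t used, uint64_t total,
			   uint64_t unallocated, uint64_t bytes)
{
	uint64_t avail = unallocated / 8;

	return used + bytes < total + avail;
}
```

With 900 of 1000 bytes used and 800 unallocated, a 50-byte allocation fits (950 < 1100), but passing a 300-byte reclaim target instead (1200 < 1100 is false) refuses an overcommit the real request would have been allowed.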

Btrfs: wait on async pages when shrinking delalloc · dea31f52

Committed by Josef Bacik on Sep 06, 2012

Mitch reported a problem where you could get an ENOSPC error when untarring
a kernel git tree onto a 16GB file system with compress-force=zlib. This is
because compression is a huge pain: it will return from ->writepages()
without having actually created any ordered extents. To get around this, we
check whether the async submit counter is up, and if it is, wait until it
drops to 0 before doing our normal ordered wait dance. With this patch I
can now untar a kernel git tree onto a 16GB file system without getting
ENOSPC errors. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

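A user-space analogue of "wait until the async submit counter drops to 0", built on POSIX mutexes and condition variables rather than the kernel's primitives. The struct async_counter, its helpers, and worker() are hypothetical.

```c
#include <pthread.h>

/* Hypothetical counter of queued async (compression) submissions. */
struct async_counter {
	pthread_mutex_t lock;
	pthread_cond_t  zero;
	int             pending;
};

static void async_start(struct async_counter *c)
{
	pthread_mutex_lock(&c->lock);
	c->pending++;
	pthread_mutex_unlock(&c->lock);
}

static void async_done(struct async_counter *c)
{
	pthread_mutex_lock(&c->lock);
	if (--c->pending == 0)
		pthread_cond_broadcast(&c->zero);	/* wake anyone draining */
	pthread_mutex_unlock(&c->lock);
}

/* Drain queued async work before the ordered-extent wait, since until it
 * runs there may be no ordered extents to wait on. */
static void wait_async_drained(struct async_counter *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->pending > 0)
		pthread_cond_wait(&c->zero, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Example worker: the queued work finishes and is submitted. */
static void *worker(void *arg)
{
	async_done(arg);
	return NULL;
}
```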

Btrfs: fix wrong size for the reservation of the, snapshot creation · 48c03c4b

Committed by Miao Xie on Sep 06, 2012

We should insert/update 6 items (root ref, root backref, dir item, dir index,
root item and parent inode) when creating a snapshot, not 5 items; fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

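To see what one missing item costs, a sketch of the reservation arithmetic. It assumes a worst-case per-item cost of (leafsize + nodesize * (MAX_LEVEL - 1)) * 2 with MAX_LEVEL = 8 and 4 KiB leaves and nodes; treat this formula and trans_metadata_size() as assumptions for illustration.

```c
#include <stdint.h>

#define MAX_LEVEL 8	/* assumed maximum tree height */

/* Assumed worst-case metadata reservation for 'num_items' item updates:
 * one leaf plus every node up to the root, times two, per item. */
static uint64_t trans_metadata_size(uint32_t leafsize, uint32_t nodesize,
				    unsigned num_items)
{
	uint64_t per_item = ((uint64_t)leafsize +
			     (uint64_t)nodesize * (MAX_LEVEL - 1)) * 2;

	return per_item * num_items;
}
```

Under these assumptions each item costs (4096 + 7 * 4096) * 2 = 65536 bytes, so reserving for 5 items instead of 6 leaves the snapshot 65536 bytes short (327680 vs 393216).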

Btrfs: add a new "type" field into the block reservation structure · 66d8f3dd

Committed by Miao Xie on Sep 06, 2012

Sometimes we need to choose the reservation method according to the type
of the block reservation, such as the reservation for the delayed inode update.
Currently we identify the type just by comparing the address of the reservation
variable, which is very ugly for a temporary reservation because we need to
compare it with all the common reservation variables. So we add a new "type"
field to record the type of the reservation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

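A sketch of the design change, with hypothetical type tags and helpers: the type is stored once at initialization, so a single field check replaces comparing a pointer against every well-known reservation, and temporary reservations (which have no well-known address) can be identified too.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type tags; the kernel's real names are not reproduced. */
enum rsv_type { RSV_GLOBAL, RSV_DELALLOC, RSV_TRANS, RSV_CHUNK, RSV_DELOPS, RSV_TEMP };

struct block_rsv {
	uint64_t size;
	uint64_t reserved;
	enum rsv_type type;	/* set once at init */
};

static void rsv_init(struct block_rsv *rsv, enum rsv_type type)
{
	rsv->size = 0;
	rsv->reserved = 0;
	rsv->type = type;
}

/* Before: rsv == &fs->global_rsv || rsv == &fs->delalloc_rsv || ...
 * After: one field check, valid for any instance, temporary or not. */
static bool is_temp_rsv(const struct block_rsv *rsv)
{
	return rsv->type == RSV_TEMP;
}
```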

Btrfs: add hole punching · 2aaa6655

Committed by Josef Bacik on Aug 29, 2012

This patch adds hole punching via fallocate.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: do not needlessly restart the transaction for enospc · ca7e70f5

Committed by Josef Bacik on Aug 27, 2012

We stop and restart a transaction every time we move to a different leaf
when truncating a file. This is for enospc reasons, but really we could
probably get away with doing this a little better by working until we
actually hit an ENOSPC. So add a ->failfast flag to the block_rsv and set it
when we do truncates, so that they fail as soon as the block rsv runs out of
space; at that point we can stop and restart the transaction, refill the
block rsv and carry on. This will make rm'ing a file with lots of extents a
bit faster. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

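A small simulation of the new truncate loop, assuming the reservation behaves as if failfast were set: rsv_use() fails as soon as the reservation is exhausted, and only then is the transaction restarted and the reservation refilled. The structure, helpers and cost numbers are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reservation; behaves as if failfast were set. */
struct block_rsv {
	uint64_t reserved;
};

static int rsv_use(struct block_rsv *rsv, uint64_t bytes)
{
	if (rsv->reserved < bytes)
		return -ENOSPC;	/* fail immediately, no reclaim attempt */
	rsv->reserved -= bytes;
	return 0;
}

/* Truncate 'leaves' leaves; return the number of transactions used,
 * or -1 if a full refill could never cover one leaf. */
static int truncate_leaves(int leaves, uint64_t cost_per_leaf, uint64_t refill)
{
	struct block_rsv rsv = { refill };
	int transactions = 1;

	if (refill < cost_per_leaf)
		return -1;

	for (int i = 0; i < leaves; ) {
		if (rsv_use(&rsv, cost_per_leaf) == 0) {
			i++;	/* progress on this leaf */
			continue;
		}
		/* ENOSPC: end this transaction, start a new one, refill */
		transactions++;
		rsv.reserved = refill;
	}
	return transactions;
}
```

With room for four leaves per refill, truncating ten leaves takes 3 transactions instead of the 10 that a restart-per-leaf policy would use.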

Btrfs: do not allocate chunks as agressively · 54338b5c

Committed by Josef Bacik on Aug 14, 2012

Swinging this pendulum back the other way. We've been allocating chunks up
to 2% of the disk no matter how much we actually have allocated. So instead
fix this calculation to only allocate chunks once more than 80% of the space
we have already allocated is in use. Please test this, as it will likely cause
all sorts of ENOSPC problems to pop up suddenly. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

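The 80% rule can be sketched with div_factor(x, 8), i.e. 80% of x. The function should_alloc_chunk() and its two inputs are a hypothetical simplification: the real check considers more kinds of in-use bytes than a single counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* num * factor / 10 */
static inline uint64_t div_factor(uint64_t num, int factor)
{
	return num * (uint64_t)factor / 10;
}

/* Allocate a new chunk only once the bytes in use reach 80% of the
 * space already allocated to chunks of this type. */
static bool should_alloc_chunk(uint64_t chunk_bytes, uint64_t bytes_in_use)
{
	return bytes_in_use >= div_factor(chunk_bytes, 8);
}
```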

29 Aug, 2012 · 5 commits

Btrfs: allow delayed refs to be merged · ae1e206b

Committed by Josef Bacik on Aug 07, 2012

Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
Basically, balance relocates a tree block, which drops the implicit refs
for all of its children and adds a full backref. Once the block is
relocated we have to add the implicit refs back, so when we cow the block
again we add the implicit refs for its children back. The problem comes
when the original drop ref doesn't get run before we add the implicit refs
back. The delayed ref code specifically prefers ADD operations over DROP,
to keep us from freeing up an extent that will have references to it, so we
try to add the implicit ref before it is actually removed and we panic. This
worked fine before because the add would have just canceled out the drop and
we would have been fine. But the backref walking code needs to be able to
freeze the delayed ref state in time, so we have an ever-increasing
sequence number attached to all new delayed ref updates, which keeps us from
merging refs, and so we run into this issue.

To fix this, we need to merge delayed refs: every time we run a clustered
ref, we try to merge all of its delayed refs. The backref walking code locks
the delayed ref head before processing, so if we hold that lock we are safe
to merge any refs inside of the sequence number. If there is no sequence
number, we can merge all refs. Doing this not only fixes our bug but also
keeps the delayed ref code from adding and removing useless refs, and batches
multiple refs into one search instead of one search per delayed ref, which
will really help our commit times. I ran this with Daniel's test and 276 and
I haven't seen any problems. Thanks,
Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

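A minimal sketch of merging, under simplifying assumptions: each delayed ref adds or drops exactly one reference for a given parent, an ADD and a DROP for the same parent cancel out, and when a backref walker holds sequence number min_seq, refs at or above it are left alone (0 means no walker). The structure and merge_refs() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical delayed-ref record for one extent. */
struct delayed_ref {
	uint64_t parent;	/* which referencing block */
	int action;		/* +1 add, -1 drop */
	uint64_t seq;		/* sequence number of this update */
	int dead;		/* merged away */
};

/* Returns 1 if ref r must not be merged. */
static int frozen(const struct delayed_ref *r, uint64_t min_seq)
{
	return r->dead || (min_seq && r->seq >= min_seq);
}

/* Cancel ADD/DROP pairs for the same parent; returns refs removed. */
static int merge_refs(struct delayed_ref *refs, int n, uint64_t min_seq)
{
	int removed = 0;

	for (int i = 0; i < n; i++) {
		if (frozen(&refs[i], min_seq))
			continue;
		for (int j = i + 1; j < n; j++) {
			if (frozen(&refs[j], min_seq))
				continue;
			if (refs[j].parent == refs[i].parent &&
			    refs[j].action == -refs[i].action) {
				refs[i].dead = refs[j].dead = 1;	/* they cancel out */
				removed += 2;
				break;
			}
		}
	}
	return removed;
}
```

A DROP followed by an ADD for the same parent disappears entirely, so the ordering problem described above cannot arise for that pair.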

Btrfs: fix race in run_clustered_refs · 22cd2e7d

Committed by Arne Jansen on Aug 09, 2012

With commit

commit d1270cd9
Author: Arne Jansen <sensille@gmx.net>
Date:   Tue Sep 13 15:16:43 2011 +0200

     Btrfs: put back delayed refs that are too new

I added a window where the delayed ref's head->ref_mod can diverge
from the sum of the remaining refs, because we release head->mutex
in the middle. This leads to btrfs_lookup_extent_info returning wrong
numbers. This patch fixes this by adjusting the head's ref_mod with each
delayed ref we run.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: increase the size of the free space cache · 6fc823b1

Committed by Josef Bacik on Aug 06, 2012

Arne was complaining about the space cache having mismatching generation
numbers when debugging a deadlock. This is because we can run out of space
in our preallocated range for the space cache if the pinned space is pretty
fragmented. So just increase the amount of space we preallocate for the
space cache, so we can be sure to have enough space. This will only really
affect data ranges, since they're the only chunks that end up larger than
256MB. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: fix deadlock in wait_for_more_refs · 1fa11e26

Committed by Arne Jansen on Aug 06, 2012

Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the IO to complete, while
end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return, as count is more of a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>


Btrfs: unlock on error in btrfs_delalloc_reserve_metadata() · 55e591ff

Committed by Dan Carpenter on Jul 30, 2012

We should release this mutex before returning the error code.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

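The usual shape of this kind of fix, sketched in user space with a POSIX mutex: every exit taken after the lock is acquired goes through a single unlock label. The lock, the counter, and reserve_metadata() are hypothetical stand-ins for the kernel function.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t reserve_lock = PTHREAD_MUTEX_INITIALIZER;
static long available = 100;	/* hypothetical free metadata budget */

static int reserve_metadata(long bytes)
{
	int ret = 0;

	pthread_mutex_lock(&reserve_lock);
	if (bytes > available) {
		ret = -ENOSPC;
		goto out_unlock;	/* a bare "return ret;" here would leak the lock */
	}
	available -= bytes;
out_unlock:
	pthread_mutex_unlock(&reserve_lock);
	return ret;
}
```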

26 Jul, 2012 · 1 commit

Btrfs: add a barrier before a waitqueue_active check · cd1cfc49

Committed by Chris Mason on Jul 25, 2012

We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

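A user-space analogue of the race, assuming that each side needs a full barrier between its store and its load. This uses C11 atomics, not the kernel's waitqueue API, and all names are hypothetical. Without the fence in waker_should_wake(), the load of waiters could be ordered before the store of work_done; the waker might then see no sleepers while the waiter still sees the work as unfinished, and the wakeup is lost. Single-threaded calls below exercise only the logic; the ordering guarantee comes from the fences.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool work_done;	/* the condition waiters sleep on */
static atomic_int  waiters;	/* stands in for "waitqueue is active" */

/* Waker: publish the condition, full fence, then look for sleepers. */
static bool waker_should_wake(void)
{
	atomic_store_explicit(&work_done, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the added barrier */
	return atomic_load_explicit(&waiters, memory_order_relaxed) > 0;
}

/* Waiter: announce ourselves, full fence, then re-check before sleeping. */
static bool waiter_must_sleep(void)
{
	atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&work_done, memory_order_relaxed);
}
```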

24 Jul, 2012 · 4 commits

Btrfs: make btrfs's allocation smoothly with preallocation · df57dbe6

Committed by Liu Bo on Jul 23, 2012

For backref walking, we've introduced the delayed ref sequence.  However,
it changes our preallocation behavior.

The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case is that we don't need to COW the extent,
which is why we preallocate in the first place.

But we may fail to make use of the preallocation: when we check for cross
refs on the extent, we may find two ref entries that have the same content
except for the sequence value, recognize them as cross refs, and do COW to
allocate another extent.

So we end up with several pieces of space instead of a whole extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: kill free_space pointer from inode structure · b4d7c3c9

Committed by Li Zefan on Jul 09, 2012

Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)->free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).
Signed-off-by: Li Zefan <lizefan@huawei.com>


Btrfs: add ro notification to dump_space_info · 799ffc3c

Committed by Liu Bo on Jul 06, 2012

Block groups have a ro attribute; make dump_space_info show it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>


Btrfs: fix a bug of writting free space cache during balance · cf7c1ef6

Committed by Liu Bo on Jul 06, 2012

Here is the whole story:
1)
A free space cache consists of two parts:
o  free space cache inode, which is special because it's stored in the root tree.
o  free space info, which is stored as the above inode's file data.

But when we _clear and set up_ the free space cache, we only build up another
new inode and do not flush its free space info onto disk, and this leaves the
block group cache's cache_state at DC_SETUP instead of DC_WRITTEN.

And holding DC_SETUP means that we will not truncate this free space cache inode,
which means the disk offset of its file extent will remain _unchanged_ at least
until the next transaction finishes committing.

2)
We can set a block group readonly when we relocate the block group.

However,
if the readonly block group covers the disk offset where our free space cache
inode is going to write, it will force the free space cache inode into
cow_file_range() and it'll end up hitting a BUG_ON.

3)
Given the above analysis, we fix this bug by adding the missing dirty flag.

4)
However, it's not over; there is still another case, nospace_cache.

With nospace_cache, we do not want to set the dirty flag; instead we just truncate
the free space cache inode and bail out, setting the cache state to DC_WRITTEN.

We can benefit from this, since it saves us another 'pre-allocation' step, which
usually costs a lot.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
