提交 · b24e03db0df3e9164c9649db12fecc8c2d81b0d1 · gsplhtlxg / clone-Linux

20 10月, 2011 25 次提交

Btrfs: release trans metadata bytes before flushing delayed refs · b24e03db

由 Josef Bacik 提交于 10月 14, 2011

We started setting trans->block_rsv = NULL to allow the delayed refs flushing
stuff to use the right block_rsv and then just made
btrfs_trans_release_metadata() unconditionally use the trans block rsv. The
problem with this is we need to reserve some space in the transaction and then
migrate it to the global block rsv, so we need to be able to free that out
properly. So instead just move btrfs_trans_release_metadata() before the
delayed ref flushing and use trans->block_rsv for the freeing. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

b24e03db

Btrfs: allow shrink_delalloc flush the needed reclaimed pages · 877da174

由 Josef Bacik 提交于 10月 14, 2011

Currently we only allow a maximum of 2 megabytes of pages to be flushed at a
time. This was ok before, but now we have overcommit which will screw us in a
heartbeat if we are quickly filling the disk. So instead pick either 2
megabytes or the number of pages we need to reclaim to be safe again, which ever
is larger. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

877da174

Btrfs: wait for ordered extents if we're in trouble when shrinking delalloc · f104d044

由 Josef Bacik 提交于 10月 14, 2011

The only way we actually reclaim delalloc space is waiting for the IO to
completely finish. Usually we kick off a bunch of IO and wait for a little bit
and hope we can make our reservation, and usually this works out pretty well.
With overcommit however we can get seriously underwater if we're filling up the
disk quickly, so we need to be able to force the delalloc shrinker to wait for
the ordered IO to finish to give us a better chance of actually reclaiming
enough space to get our reservation. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

f104d044

Btrfs: don't check bytes_pinned to determine if we should commit the transaction · bbb495c2

由 Josef Bacik 提交于 10月 14, 2011

Before the only reason to commit the transaction to recover space in
reserve_metadata_bytes() was if there were enough pinned_bytes to satisfy our
reservation. But now we have the delayed inode stuff which will hold it's
reservations until we commit the transaction. So say we max out our reservation
by creating a bunch of files but don't have any pinned bytes we will ENOSPC out
early even though we could commit the transaction and get that space back. So
now just unconditionally commit the transaction since currently there is no way
to know how much metadata space is being reserved by delayed inode stuff.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

bbb495c2

Btrfs: wait for ordered extents if we didn't reclaim enough · 4b91c14f

由 Josef Bacik 提交于 10月 07, 2011

I noticed recently that my overcommit patch was causing one of my enospc tests
to fail 25% of the time with early ENOSPC. This is because my overcommit patch
was letting us go way over board, but it wasn't waiting long enough to let the
delalloc shrinker do it's job. The problem is we just start writeback and wait
a little bit hoping we flush enough, but we only free up delalloc space by
having the writes complete all the way. We do this by waiting for ordered
extents, which we do but only if we already free'd enough for the reservation,
which isn't right, we should flush ordered extents if we didn't reclaim enough
in case that will push us over the edge. With this patch I've not seen a
failure in this enospc test after running it in a loop for an hour. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

4b91c14f

Btrfs: inline checksums into the disk free space cache · 5b0e95bf

由 Josef Bacik 提交于 10月 06, 2011

Yeah yeah I know this is how we used to do it and then I changed it, but damnit
I'm changing it back. The fact is that writing out checksums will modify
metadata, which could cause us to dirty a block group we've already written out,
so we have to truncate it and all of it's checksums and re-write it which will
write new checksums which could dirty a blockg roup that has already been
written and you see where I'm going with this? This can cause unmount or really
anything that depends on a transaction to commit to take it's sweet damned time
to happen. So go back to the way it was, only this time we're specifically
setting NODATACOW because we can't go through the COW pathway anyway and we're
doing our own built-in cow'ing by truncating the free space cache. The other
new thing is once we truncate the old cache and preallocate the new space, we
don't need to do that song and dance at all for the rest of the transaction, we
can just overwrite the existing space with the new cache if the block group
changes for whatever reason, and the NODATACOW will let us do this fine. So
keep track of which transaction we last cleared our cache in and if we cleared
it in this transaction just say we're all setup and carry on. This survives
xfstests and stress.sh.

The inode cache will continue to use the normal csum infrastructure since it
only gets written once and there will be no more modifications to the fs tree in
a transaction commit.
Signed-off-by: NJosef Bacik <josef@redhat.com>

5b0e95bf

Btrfs: take overflow into account in reserving space · 9a82ca65

由 Josef Bacik 提交于 10月 05, 2011

My overcommit stuff can be a little racy when we're filling up the disk with
fs_mark and we overcommit into things that quickly get used up for data. So use
num_bytes to see if we have enough available space so we're less likely to
overcommit ourselves out of the ability to make reservations. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

9a82ca65

Btrfs: introduce mount option no_space_cache · 73bc1876

由 Josef Bacik 提交于 10月 03, 2011

Some users have requested this and I've found I needed a way to disable cache
loading without actually clearing the cache, so introduce the no_space_cache
option. Before we check the super blocks cache generation field and if it was
populated we always turned space caching on. Now we check this and set the
space cache option on, and then parse the mount options so that if we want it
off it get's turned off. Then we check the mount option all the places we do
the caching work instead of checking the super's cache generation. This makes
things more consistent and lets us turn space caching off. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

73bc1876

Btrfs: allow us to overcommit our enospc reservations · 2bf64758

由 Josef Bacik 提交于 9月 26, 2011

One of the things that kills us is the fact that our ENOSPC reservations are
horribly over the top in most normal cases.  There isn't too much that can be
done about this because when we are completely full we really need them to work
like this so we don't under reserve.  However if there is plenty of unallocated
chunks on the disk we can use that to gauge how much we can overcommit.  So this
patch adds chunk free space accounting so we always know how much unallocated
space we have.  Then if we fail to make a reservation within our allocated
space, check to see if we can overcommit.  In the normal flushing case (like
with delalloc metadata reservations) we'll take the free space and divide it by
2 if our metadata profile is setup for DUP or any of those, and then divide it
by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
case (we really need this reservation now!) we only limit ourselves to half of
the free space.  This makes this fio test

[torrent]
filename=torrent-test
rw=randwrite
size=4g
ioengine=sync
directory=/mnt/btrfs-test

go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
file system.  This doesn't seem to break my other enospc tests, but could really
use some more testing as this is a super scary change.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

2bf64758

Btrfs: check unused against how much space we actually want · ef3be457

由 Josef Bacik 提交于 9月 22, 2011

There is a bug that may lead to early ENOSPC in our reservation code. We've
been checking against num_bytes which may be above and beyond what we want to
actually reserve, which could give us a false ENOSPC. Fix this by making sure
the unused space is above how much we want to reserve and not how much we're
trying to flush. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

ef3be457

Btrfs: delay iput when deleting a block group · 455757c3

由 Josef Bacik 提交于 9月 19, 2011

I kept getting warnings from evict because we were calling
btrfs_start_transaction() with a transaction already started when doing a
balance. This is because we remove a block group which requires a transaction,
and the put the last reference on the cache inode. Instead of doing this we
need to delay the iput so it is done not within a transaction having started.
This gets rid of our warnings. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

455757c3

Btrfs: stop passing a trans handle all around the reservation code · 4a92b1b8

由 Josef Bacik 提交于 8月 30, 2011

The only thing that we need to have a trans handle for is in
reserve_metadata_bytes and thats to know how much flushing we can do.  So
instead of passing it around, just check current->journal_info for a
trans_handle so we know if we can commit a transaction to try and free up space
or not.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

4a92b1b8

Btrfs: don't get the block_rsv in btrfs_free_tree_block · d02c9955

由 Josef Bacik 提交于 8月 30, 2011

Since the durable block rsv stuff has been killed there is no need to get the
block_rsv in btrfs_free_tree_block anymore.
Signed-off-by: NJosef Bacik <josef@redhat.com>

d02c9955

Btrfs: use the transactions block_rsv for the csum root · 4c13d758

由 Josef Bacik 提交于 8月 30, 2011

The alloc warnings everybody has been seeing is because we have been reserving
space for csums, but we weren't actually using that space. So make
get_block_rsv() return the trans->block_rsv if we're modifying the csum root.
Also set the trans->block_rsv to NULL so that if we modify the csum root when
running delayed ref's that comes out of the global reserve like it's supposed
to. With this patch I'm not seeing those alloc warnings anymore. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

4c13d758

Btrfs: handle enospc accounting for free space inodes · c09544e0

由 Josef Bacik 提交于 8月 30, 2011

Since free space inodes now use normal checksumming we need to make sure to
account for their metadata use. So reserve metadata space, and then if we fail
to write out the metadata we can just release it, otherwise it will be freed up
when the io completes. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

c09544e0

Btrfs: don't increase the block_rsv's size when emergency allocating space · 7f701508

由 Josef Bacik 提交于 8月 22, 2011

If we have to emergency reserve space we need to not increase the block_rsv
size, otherwise we'll leak space. Take for instance delalloc, say we reserve
4k, and we use that 4k, and then we have to emergency allocate another 4k, we
bump the size up to 8k, however we've only accounted for 4k in reservations in
all of our supporting logic, so we'll go to free the 4k and end up having a size
of 4k, which will cause us to later not free as much space. I saw this doing
testing where I wasn't reserving enough space for something but was still
leaking space, very frustrating. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7f701508

Btrfs: fix space leak when we fail to make an allocation · 7ed49f18

由 Josef Bacik 提交于 8月 19, 2011

When changing back to using a spin_lock to protect the extent counters I decided
that since we would only be dropping our original extent, it was ok to just drop
the extent and return. However since somebody else could have come in and done
a reservation, we need to do the normal song and dance to clear the reservation
out properly. So calculate how much space we need to free, and then subtract
what we just attempted to reserve. If it's more then we know we need to drop
those bytes from the delalloc block rsv. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7ed49f18

Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check · 482e6dc5

由 Josef Bacik 提交于 8月 19, 2011

If you run xfstest 224 it you will get lots of messages about not being able to
delete inodes and that they will be cleaned up next mount. This is because
btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
flush, so if there was not enough space, it simply failed. But in truncate and
evict case we could easily flush space to try and get enough space to do our
work, so make btrfs_block_rsv_check take a flush argument to pass down to
reserve_metadata_bytes. Now xfstests 224 runs fine without all those
complaints. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

482e6dc5

Btrfs: kill btrfs_truncate_reserve_metadata · 5e962c78

由 Josef Bacik 提交于 8月 08, 2011

Since we've optimized the truncate path, we no longer require this function.
Signed-off-by: NJosef Bacik <josef@redhat.com>

5e962c78

Btrfs: don't try to commit in btrfs_block_rsv_check · 13553e52

由 Josef Bacik 提交于 8月 08, 2011

We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot
because we have a transaction open it will return EAGAIN, so we do not need to
try and commit the transaction again.
Signed-off-by: NJosef Bacik <josef@redhat.com>

13553e52

Btrfs: kill unused parts of block_rsv · dabdb640

由 Josef Bacik 提交于 8月 08, 2011

The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.
Signed-off-by: NJosef Bacik <josef@redhat.com>

dabdb640

Btrfs: kill the durable block rsv stuff · 37be25bc

由 Josef Bacik 提交于 8月 05, 2011

This is confusing code and isn't used by anything anymore, so delete it.
Signed-off-by: NJosef Bacik <josef@redhat.com>

37be25bc

Btrfs: calculate checksum space correctly · 7709cde3

由 Josef Bacik 提交于 8月 04, 2011

We have not been reserving enough space for checksums. We were just reserving
bytes for the checksum items themselves, we were not taking into account having
to cow the tree and such. This patch adds a csum_bytes counter to the inode for
keeping track of the number of bytes outstanding we have for checksums. Then we
calculate how many leaves would be required for the checksums we are given and
use that to reserve space. This adds a significant amount of bytes to our
reservations, but we will handle this later. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7709cde3

Btrfs: use bytes_may_use for all ENOSPC reservations · fb25e914

由 Josef Bacik 提交于 7月 26, 2011

We have been using bytes_reserved for metadata reservations, which is wrong
since we use that to keep track of outstanding reservations from the allocator.
This resulted in us doing a lot of silly things to make sure we don't allocate a
bunch of metadata chunks since we never had a real view of how much space was
actually in use by metadata.

This passes Arne's enospc test and xfstests as well as my own enospc tests.
Hopefully this will get us moving in the right direction. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fb25e914

Btrfs: kill reserved_bytes in inode · 0cbbdf7c

由 Josef Bacik 提交于 7月 14, 2011

reserved_bytes is not used for anything in the inode, remove it.
Signed-off-by: NJosef Bacik <josef@redhat.com>

0cbbdf7c

21 8月, 2011 1 次提交

Btrfs: fix 64 bit divide problem · 6719db6a

由 Josef Bacik 提交于 8月 20, 2011

This fixes a regression introduced by commit cdcb725c ("Btrfs: check
if there is enough space for balancing smarter").  We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just left/right
shift respectively, and in cases where theres N number of devices use
do_div.  Also make the counters u64 to match up with rw_devices.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Acked-and-tested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6719db6a

17 8月, 2011 3 次提交

Btrfs: forced readonly when btrfs_drop_snapshot() fails · cb1b69f4

由 Tsutomu Itoh 提交于 8月 09, 2011

The filesystem turns readonly instead of returning the error to the
caller when detected error in btrfs_drop_snapshot().
and, because the caller doesn't check the error, the function type is
changed to 'void'.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cb1b69f4

Btrfs: check if there is enough space for balancing smarter · cdcb725c

由 liubo 提交于 8月 03, 2011

When checking if there is enough space for balancing a block group,
since we do not take raid types into consideration, we do not account
corrent amounts of space that we needed.  This makes us do some extra
work before we get ENOSPC.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cdcb725c

Btrfs: detect wether a device supports discard · d5e2003c

由 Josef Bacik 提交于 8月 04, 2011

We have a problem where if a user specifies discard but doesn't actually support
it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem
because this gets called (in a fashion) from the tree log recovery code, which
has a nice little BUG_ON(ret) after it, which causes us to fail the tree log
replay. So instead detect wether our devices support discard when we're adding
them and then don't issue discards if we know that the device doesn't support
it. And just for good measure set ret = 0 in btrfs_issue_discard just in case
we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d5e2003c

02 8月, 2011 4 次提交

Btrfs: don't print the leaf if we had an error · b783e62d

由 Josef Bacik 提交于 7月 13, 2011

In __btrfs_free_extent we will print the leaf if we fail to find the extent we
wanted, but the problem is if we get an error we won't have a leaf so often this
leads to a NULL pointer dereference and we lose the error that actually
occurred. So only print the leaf if ret > 0, which means we didn't find the
item we were looking for but we didn't error either. This way the error is
preserved.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b783e62d

Btrfs: fix oops while writing data to SSD partitions · ff1f2b44

由 liubo 提交于 7月 27, 2011

Here I have a two SSD-partitions btrfs, and they are defaultly set to
"data=raid0, metadata=raid1", then I try to fill my btrfs partition
till "No space left on device", via "dd if=/dev/zero of=/mnt/btrfs/tmp".

I get an oops panic from kernel BUG at fs/btrfs/extent-tree.c:5199!, which
refers to find_free_extent's
BUG_ON(index != get_block_group_index(block_group));

In SSD mode, in order to find enough space to alloc, we may check the
block_group cache which has been checked sometime before, but the index is not
updated, where it hits the BUG_ON.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Acked-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ff1f2b44

Btrfs: Protect the readonly flag of block group · 61cfea9b

由 WuBo 提交于 7月 26, 2011

The access for ro in btrfs_block_group_cache should be protected
because of the racy lock in relocation.
Signed-off-by: NWu Bo <wu.bo@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

61cfea9b

Btrfs: return error to caller when btrfs_unlink() failes · b532402e

由 Tsutomu Itoh 提交于 7月 19, 2011

When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink()
are error, the error code is returned to the caller instead of
BUG_ON().
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b532402e

28 7月, 2011 7 次提交

Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors · 75c195a2

由 Chris Mason 提交于 7月 27, 2011

The btrfs transaction code will return any errors that come from
reserve_metadata_bytes.  We need to make sure we don't return funny
things like 1 or EAGAIN.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

75c195a2

Btrfs: make a lockdep class for each root · 85d4e461

由 Chris Mason 提交于 7月 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

85d4e461

Btrfs: switch the btrfs tree locks to reader/writer · bd681513

由 Chris Mason 提交于 7月 16, 2011

The btrfs metadata btree is the source of significant
lock contention, especially in the root node.   This
commit changes our locking to use a reader/writer
lock.

The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking.  Atomics
count the number of blocking readers or writers at any
given time.

It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.

In read heavy workloads this is dramatically faster.  In write
heavy workloads we're still faster because of less contention
on the root node lock.

We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bd681513

Btrfs: fix BUG_ON() caused by ENOSPC when relocating space · 199c36ea

由 Miao Xie 提交于 7月 15, 2011

When we balanced the chunks across the devices, BUG_ON() in
__finish_chunk_alloc() was triggered.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2568!
[SNIP]
Call Trace:
 [<ffffffffa049525e>] btrfs_alloc_chunk+0x8e/0xa0 [btrfs]
 [<ffffffffa04546b0>] do_chunk_alloc+0x330/0x3a0 [btrfs]
 [<ffffffffa045c654>] btrfs_reserve_extent+0xb4/0x1f0 [btrfs]
 [<ffffffffa045c86b>] btrfs_alloc_free_block+0xdb/0x350 [btrfs]
 [<ffffffffa048a8d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa04476fd>] __btrfs_cow_block+0x14d/0x5e0 [btrfs]
 [<ffffffffa044660d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0447c9b>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa044dd5e>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa044f07d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
 [<ffffffffa045e973>] insert_with_overflow+0x43/0x110 [btrfs]
 [<ffffffffa045eb0d>] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs]
 [<ffffffffa0489bd0>] ? map_extent_buffer+0xb0/0xc0 [btrfs]
 [<ffffffff812276ad>] ? rb_insert_color+0x9d/0x160
 [<ffffffffa046cc40>] ? inode_tree_add+0xf0/0x150 [btrfs]
 [<ffffffffa0474801>] btrfs_add_link+0xc1/0x1c0 [btrfs]
 [<ffffffff811dacac>] ? security_inode_init_security+0x1c/0x30
 [<ffffffffa04a28aa>] ? btrfs_init_acl+0x4a/0x180 [btrfs]
 [<ffffffffa047492f>] btrfs_add_nondir+0x2f/0x70 [btrfs]
 [<ffffffffa046af16>] ? btrfs_init_inode_security+0x46/0x60 [btrfs]
 [<ffffffffa0474ac0>] btrfs_create+0x150/0x1d0 [btrfs]
 [<ffffffff81159c63>] ? generic_permission+0x23/0xb0
 [<ffffffff8115b415>] vfs_create+0xa5/0xc0
 [<ffffffff8115ce6e>] do_last+0x5fe/0x880
 [<ffffffff8115dc0d>] path_openat+0xcd/0x3d0
 [<ffffffff8115e029>] do_filp_open+0x49/0xa0
 [<ffffffff8116a965>] ? alloc_fd+0x95/0x160
 [<ffffffff8114f0c7>] do_sys_open+0x107/0x1e0
 [<ffffffff810bcc3f>] ? audit_syscall_entry+0x1bf/0x1f0
 [<ffffffff8114f1e0>] sys_open+0x20/0x30
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa049444a>] __finish_chunk_alloc+0x20a/0x220 [btrfs]

The reason is:
Task1					Space balance task
do_chunk_alloc()
  __finish_chunk_alloc()
    update device info
    in the chunk tree
      alloc system metadata block
					relocate system metadata block group
					  set system metadata block group
					  readonly, This block group is the
					  only one that can allocate space. So
					  there is no free space that can be
					  allocated now.
        find no space and don't try
        to alloc new chunk, and then
        return ENOSPC
  BUG_ON() in __finish_chunk_alloc()
  was triggered.

Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.
Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

199c36ea

Btrfs: fix enospc problems with delalloc · 9e0baf60

由 Josef Bacik 提交于 7月 15, 2011

So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea.  Consider this where we have 1
outstanding extent and 1 reserved extent

Reserver				Releaser
					atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
					atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
					free reserved space for 1 extent

Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do it's allocation and you
get those lovely warnings.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9e0baf60

Btrfs: don't flush delalloc arbitrarily · a5991428

由 Josef Bacik 提交于 7月 15, 2011

Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do.  This causes unexpected latencies and we have other
logic to see if we need to throttle.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a5991428

Btrfs: use a worker thread to do caching · bab39bf9

由 Josef Bacik 提交于 6月 30, 2011

A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,
Reported-by: NRoman Mamedov <rm@romanrm.ru>
Signed-off-by: NJosef Bacik <josef@redhat.com>

bab39bf9