提交 · a9b5fcddce594a408a48d523087b5bb64ce82469 · openanolis / cloud-kernel

20 10月, 2011 8 次提交

Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check · 482e6dc5

由 Josef Bacik 提交于 8月 19, 2011

If you run xfstest 224 it you will get lots of messages about not being able to
delete inodes and that they will be cleaned up next mount. This is because
btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
flush, so if there was not enough space, it simply failed. But in truncate and
evict case we could easily flush space to try and get enough space to do our
work, so make btrfs_block_rsv_check take a flush argument to pass down to
reserve_metadata_bytes. Now xfstests 224 runs fine without all those
complaints. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

482e6dc5

Btrfs: kill btrfs_truncate_reserve_metadata · 5e962c78

由 Josef Bacik 提交于 8月 08, 2011

Since we've optimized the truncate path, we no longer require this function.
Signed-off-by: NJosef Bacik <josef@redhat.com>

5e962c78

Btrfs: don't try to commit in btrfs_block_rsv_check · 13553e52

由 Josef Bacik 提交于 8月 08, 2011

We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot
because we have a transaction open it will return EAGAIN, so we do not need to
try and commit the transaction again.
Signed-off-by: NJosef Bacik <josef@redhat.com>

13553e52

Btrfs: kill unused parts of block_rsv · dabdb640

由 Josef Bacik 提交于 8月 08, 2011

The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.
Signed-off-by: NJosef Bacik <josef@redhat.com>

dabdb640

Btrfs: kill the durable block rsv stuff · 37be25bc

由 Josef Bacik 提交于 8月 05, 2011

This is confusing code and isn't used by anything anymore, so delete it.
Signed-off-by: NJosef Bacik <josef@redhat.com>

37be25bc

Btrfs: calculate checksum space correctly · 7709cde3

由 Josef Bacik 提交于 8月 04, 2011

We have not been reserving enough space for checksums. We were just reserving
bytes for the checksum items themselves, we were not taking into account having
to cow the tree and such. This patch adds a csum_bytes counter to the inode for
keeping track of the number of bytes outstanding we have for checksums. Then we
calculate how many leaves would be required for the checksums we are given and
use that to reserve space. This adds a significant amount of bytes to our
reservations, but we will handle this later. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7709cde3

Btrfs: use bytes_may_use for all ENOSPC reservations · fb25e914

由 Josef Bacik 提交于 7月 26, 2011

We have been using bytes_reserved for metadata reservations, which is wrong
since we use that to keep track of outstanding reservations from the allocator.
This resulted in us doing a lot of silly things to make sure we don't allocate a
bunch of metadata chunks since we never had a real view of how much space was
actually in use by metadata.

This passes Arne's enospc test and xfstests as well as my own enospc tests.
Hopefully this will get us moving in the right direction. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fb25e914

Btrfs: kill reserved_bytes in inode · 0cbbdf7c

由 Josef Bacik 提交于 7月 14, 2011

reserved_bytes is not used for anything in the inode, remove it.
Signed-off-by: NJosef Bacik <josef@redhat.com>

0cbbdf7c

21 8月, 2011 1 次提交

Btrfs: fix 64 bit divide problem · 6719db6a

由 Josef Bacik 提交于 8月 20, 2011

This fixes a regression introduced by commit cdcb725c ("Btrfs: check
if there is enough space for balancing smarter").  We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just left/right
shift respectively, and in cases where theres N number of devices use
do_div.  Also make the counters u64 to match up with rw_devices.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Acked-and-tested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6719db6a

17 8月, 2011 3 次提交

Btrfs: forced readonly when btrfs_drop_snapshot() fails · cb1b69f4

由 Tsutomu Itoh 提交于 8月 09, 2011

The filesystem turns readonly instead of returning the error to the
caller when detected error in btrfs_drop_snapshot().
and, because the caller doesn't check the error, the function type is
changed to 'void'.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cb1b69f4

Btrfs: check if there is enough space for balancing smarter · cdcb725c

由 liubo 提交于 8月 03, 2011

When checking if there is enough space for balancing a block group,
since we do not take raid types into consideration, we do not account
corrent amounts of space that we needed.  This makes us do some extra
work before we get ENOSPC.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cdcb725c

Btrfs: detect wether a device supports discard · d5e2003c

由 Josef Bacik 提交于 8月 04, 2011

We have a problem where if a user specifies discard but doesn't actually support
it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem
because this gets called (in a fashion) from the tree log recovery code, which
has a nice little BUG_ON(ret) after it, which causes us to fail the tree log
replay. So instead detect wether our devices support discard when we're adding
them and then don't issue discards if we know that the device doesn't support
it. And just for good measure set ret = 0 in btrfs_issue_discard just in case
we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d5e2003c

02 8月, 2011 4 次提交

Btrfs: don't print the leaf if we had an error · b783e62d

由 Josef Bacik 提交于 7月 13, 2011

In __btrfs_free_extent we will print the leaf if we fail to find the extent we
wanted, but the problem is if we get an error we won't have a leaf so often this
leads to a NULL pointer dereference and we lose the error that actually
occurred. So only print the leaf if ret > 0, which means we didn't find the
item we were looking for but we didn't error either. This way the error is
preserved.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b783e62d

Btrfs: fix oops while writing data to SSD partitions · ff1f2b44

由 liubo 提交于 7月 27, 2011

Here I have a two SSD-partitions btrfs, and they are defaultly set to
"data=raid0, metadata=raid1", then I try to fill my btrfs partition
till "No space left on device", via "dd if=/dev/zero of=/mnt/btrfs/tmp".

I get an oops panic from kernel BUG at fs/btrfs/extent-tree.c:5199!, which
refers to find_free_extent's
BUG_ON(index != get_block_group_index(block_group));

In SSD mode, in order to find enough space to alloc, we may check the
block_group cache which has been checked sometime before, but the index is not
updated, where it hits the BUG_ON.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Acked-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ff1f2b44

Btrfs: Protect the readonly flag of block group · 61cfea9b

由 WuBo 提交于 7月 26, 2011

The access for ro in btrfs_block_group_cache should be protected
because of the racy lock in relocation.
Signed-off-by: NWu Bo <wu.bo@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

61cfea9b

Btrfs: return error to caller when btrfs_unlink() failes · b532402e

由 Tsutomu Itoh 提交于 7月 19, 2011

When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink()
are error, the error code is returned to the caller instead of
BUG_ON().
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b532402e

28 7月, 2011 7 次提交

Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors · 75c195a2

由 Chris Mason 提交于 7月 27, 2011

The btrfs transaction code will return any errors that come from
reserve_metadata_bytes.  We need to make sure we don't return funny
things like 1 or EAGAIN.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

75c195a2

Btrfs: make a lockdep class for each root · 85d4e461

由 Chris Mason 提交于 7月 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

85d4e461

Btrfs: switch the btrfs tree locks to reader/writer · bd681513

由 Chris Mason 提交于 7月 16, 2011

The btrfs metadata btree is the source of significant
lock contention, especially in the root node.   This
commit changes our locking to use a reader/writer
lock.

The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking.  Atomics
count the number of blocking readers or writers at any
given time.

It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.

In read heavy workloads this is dramatically faster.  In write
heavy workloads we're still faster because of less contention
on the root node lock.

We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bd681513

Btrfs: fix BUG_ON() caused by ENOSPC when relocating space · 199c36ea

由 Miao Xie 提交于 7月 15, 2011

When we balanced the chunks across the devices, BUG_ON() in
__finish_chunk_alloc() was triggered.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2568!
[SNIP]
Call Trace:
 [<ffffffffa049525e>] btrfs_alloc_chunk+0x8e/0xa0 [btrfs]
 [<ffffffffa04546b0>] do_chunk_alloc+0x330/0x3a0 [btrfs]
 [<ffffffffa045c654>] btrfs_reserve_extent+0xb4/0x1f0 [btrfs]
 [<ffffffffa045c86b>] btrfs_alloc_free_block+0xdb/0x350 [btrfs]
 [<ffffffffa048a8d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa04476fd>] __btrfs_cow_block+0x14d/0x5e0 [btrfs]
 [<ffffffffa044660d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0447c9b>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa044dd5e>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa044f07d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
 [<ffffffffa045e973>] insert_with_overflow+0x43/0x110 [btrfs]
 [<ffffffffa045eb0d>] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs]
 [<ffffffffa0489bd0>] ? map_extent_buffer+0xb0/0xc0 [btrfs]
 [<ffffffff812276ad>] ? rb_insert_color+0x9d/0x160
 [<ffffffffa046cc40>] ? inode_tree_add+0xf0/0x150 [btrfs]
 [<ffffffffa0474801>] btrfs_add_link+0xc1/0x1c0 [btrfs]
 [<ffffffff811dacac>] ? security_inode_init_security+0x1c/0x30
 [<ffffffffa04a28aa>] ? btrfs_init_acl+0x4a/0x180 [btrfs]
 [<ffffffffa047492f>] btrfs_add_nondir+0x2f/0x70 [btrfs]
 [<ffffffffa046af16>] ? btrfs_init_inode_security+0x46/0x60 [btrfs]
 [<ffffffffa0474ac0>] btrfs_create+0x150/0x1d0 [btrfs]
 [<ffffffff81159c63>] ? generic_permission+0x23/0xb0
 [<ffffffff8115b415>] vfs_create+0xa5/0xc0
 [<ffffffff8115ce6e>] do_last+0x5fe/0x880
 [<ffffffff8115dc0d>] path_openat+0xcd/0x3d0
 [<ffffffff8115e029>] do_filp_open+0x49/0xa0
 [<ffffffff8116a965>] ? alloc_fd+0x95/0x160
 [<ffffffff8114f0c7>] do_sys_open+0x107/0x1e0
 [<ffffffff810bcc3f>] ? audit_syscall_entry+0x1bf/0x1f0
 [<ffffffff8114f1e0>] sys_open+0x20/0x30
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa049444a>] __finish_chunk_alloc+0x20a/0x220 [btrfs]

The reason is:
Task1					Space balance task
do_chunk_alloc()
  __finish_chunk_alloc()
    update device info
    in the chunk tree
      alloc system metadata block
					relocate system metadata block group
					  set system metadata block group
					  readonly, This block group is the
					  only one that can allocate space. So
					  there is no free space that can be
					  allocated now.
        find no space and don't try
        to alloc new chunk, and then
        return ENOSPC
  BUG_ON() in __finish_chunk_alloc()
  was triggered.

Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.
Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

199c36ea

Btrfs: fix enospc problems with delalloc · 9e0baf60

由 Josef Bacik 提交于 7月 15, 2011

So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea.  Consider this where we have 1
outstanding extent and 1 reserved extent

Reserver				Releaser
					atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
					atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
					free reserved space for 1 extent

Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do it's allocation and you
get those lovely warnings.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9e0baf60

Btrfs: don't flush delalloc arbitrarily · a5991428

由 Josef Bacik 提交于 7月 15, 2011

Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do.  This causes unexpected latencies and we have other
logic to see if we need to throttle.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a5991428

Btrfs: use a worker thread to do caching · bab39bf9

由 Josef Bacik 提交于 6月 30, 2011

A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,
Reported-by: NRoman Mamedov <rm@romanrm.ru>
Signed-off-by: NJosef Bacik <josef@redhat.com>

bab39bf9

26 7月, 2011 2 次提交

btrfs: don't BUG_ON allocation errors in btrfs_drop_snapshot · 38a1a919

由 Mark Fasheh 提交于 7月 13, 2011

In addition to properly handling allocation failure from btrfs_alloc_path, I
also fixed up the kzalloc error handling code immediately below it.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

38a1a919

btrfs: Don't BUG_ON alloc_path errors in find_next_chunk · 92b8e897

由 Mark Fasheh 提交于 7月 12, 2011

I also removed the BUG_ON from error return of find_next_chunk in
init_first_rw_device(). It turns out that the only caller of
init_first_rw_device() also BUGS on any nonzero return so no actual behavior
change has occurred here.

do_chunk_alloc() also needed an update since it calls btrfs_alloc_chunk()
which can now return -ENOMEM. Instead of setting space_info->full on any
error from btrfs_alloc_chunk() I catch and return every error value _except_
-ENOSPC. Thanks goes to Tsutomu Itoh for pointing that issue out.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

92b8e897

15 7月, 2011 1 次提交

btrfs: don't BUG_ON btrfs_alloc_path() errors · d8926bb3

由 Mark Fasheh 提交于 7月 13, 2011

This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
failure. All the sites that are fixed in this patch were checked by me to
be fairly trivial to fix because of at least one of two criteria:

 - Callers of the function catch errors from it already so bubbling the
   error up will be handled.
 - Callers of the function might BUG_ON any nonzero return code in which
   case there is no behavior changed (but we still got to remove a BUG_ON)

The following functions were updated:

btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
insert_reserved_file_extent, and run_delalloc_nocow
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

d8926bb3

11 7月, 2011 2 次提交

Btrfs: serialize flushers in reserve_metadata_bytes · fdb5effd

由 Josef Bacik 提交于 6月 07, 2011

We keep having problems with early enospc, and that's because our method of
making space is inherently racy. The problem is we can have one guy trying to
make space for himself, and in the meantime people come in and steal his
reservation. In order to stop this we make a waitqueue and put anybody who
comes into reserve_metadata_bytes on that waitqueue if somebody is trying to
make more space. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fdb5effd

Btrfs: do transaction space reservation before joining the transaction · b5009945

由 Josef Bacik 提交于 6月 07, 2011

We have to do weird things when handling enospc in the transaction joining code.
Because we've already joined the transaction we cannot commit the transaction
within the reservation code since it will deadlock, so we have to return EAGAIN
and then make sure we don't retry too many times. Instead of doing this, just
do the reservation the normal way before we join the transaction, that way we
can do whatever we want to try and reclaim space, and then if it fails we know
for sure we are out of space and we can return ENOSPC. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

b5009945

25 6月, 2011 1 次提交

Btrfs: fix type mismatch in find_free_extent() · e0f54067

由 Ilya Dryomov 提交于 6月 18, 2011

data parameter should be u64 because a full-sized chunk flags field is
passed instead of 0/1 for distinguishing data from metadata.  All
underlying functions expect u64.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e0f54067

13 6月, 2011 1 次提交

Btrfs: drop the delalloc_bytes check in shrink_delalloc · f4c44016

由 Chris Mason 提交于 6月 13, 2011

Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f4c44016

09 6月, 2011 2 次提交

Btrfs: fix the allocator loop logic · 723bda20

由 Josef Bacik 提交于 5月 27, 2011

I was testing with empty_cluster = 0 to try and reproduce a problem and kept
hitting early enospc panics.  This was because our loop logic was a little
confused.  So this is what I did

1) Make the loop variable the ultimate decider on wether we should loop again
isntead of checking to see if we had an uncached bg, empty size or empty
cluster.

2) Increment loop before checking to see what we are on to make the loop
definitions make more sense.

3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0
unless we didn't actually allocate a chunk.  If we did allocate a chunk we
should be able to easily setup a new cluster so clearing
empty_size/empty_cluster makes us less efficient.

This kept me from hitting panics while trying to reproduce the other problem.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

723bda20

Btrfs: don't commit the transaction if we dont have enough pinned bytes · f2bb8f5c

由 Josef Bacik 提交于 5月 25, 2011

I noticed when running an enospc test that we would get stuck committing the
transaction in check_data_space even though we truly didn't have enough space.
So check to see if bytes_pinned is bigger than num_bytes, if it's not don't
commit the transaction. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

f2bb8f5c

04 6月, 2011 1 次提交

btrfs: add helper for fs_info->closing · 7841cb28

由 David Sterba 提交于 5月 31, 2011

wrap checking of filesystem 'closing' flag and fix a few missing memory
barriers.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

7841cb28

24 5月, 2011 7 次提交

Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item · 1cd30799

由 Tsutomu Itoh 提交于 5月 19, 2011

Currently, btrfs_truncate_item and btrfs_extend_item returns only 0.
So, the check by BUG_ON in the caller is unnecessary.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1cd30799

btrfs: don't spin in shrink_delalloc if there is nothing to free · c4f675cd

由 Sergei Trofimovich 提交于 5月 20, 2011

Observed as a large delay when --mixed filesystem is filled up.
Test example:
1. create tiny --mixed FS:
   $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1
   $ mkfs.btrfs --mixed 2G.img
   $ mount -oloop 2G.img /mnt/ut/
2. Try to fill it up:
   $ dd if=/dev/urandom of=10M.file bs=10240 count=1024
   $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done

Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC
message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux
it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution).

No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning
in shrink_delalloc.
Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c4f675cd

Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f

由 Josef Bacik 提交于 5月 13, 2011

If we have a very large filesystem, we can spend a lot of time in
find_free_extent just trying to allocate from empty block groups.  So instead
check to see if the block group even has enough space for the allocation, and if
not go on to the next block group.
Signed-off-by: NJosef Bacik <josef@redhat.com>

cca1c81f

Btrfs: don't always do readahead · 026fd317

由 Josef Bacik 提交于 5月 13, 2011

Our readahead is sort of sloppy, and really isn't always needed. For example if
ls is doing a stating ls (which is the default) it's going to stat in non-disk
order, so if say you have a directory with a stupid amount of files, readahead
is going to do nothing but waste time in the case of doing the stat. Taking the
unconditional readahead out made my test go from 57 minutes to 36 minutes. This
means that everywhere we do loop through the tree we want to make sure we do set
path->reada properly, so I went through and found all of the places where we
loop through the path and set reada to 1. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

026fd317

Btrfs: try not to sleep as much when doing slow caching · 589d8ade

由 Josef Bacik 提交于 5月 11, 2011

When the fs is super full and we unmount the fs, we could get stuck in this
thing where unmount is waiting for the caching kthread to make progress and the
caching kthread keeps scheduling because we're in the middle of a commit. So
instead just let the caching kthread keep going and only yeild if
need_resched(). This makes my horrible umount case go from taking up to 10
minutes to taking less than 20 seconds. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

589d8ade

Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d

由 Josef Bacik 提交于 5月 11, 2011

Originally this was going to be used as a way to give hints to the allocator,
but frankly we can get much better hints elsewhere and it's not even used at all
for anything usefull. In addition to be completely useless, when we initialize
an inode we try and find a freeish block group to set as the inodes block group,
and with a completely full 40gb fs this takes _forever_, so I imagine with say
1tb fs this is just unbearable. So just axe the thing altoghether, we don't
need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
inode lookup in my testcase. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

d82a6f1d

Btrfs: fix how we do space reservation for truncate · fcb80c2a

由 Josef Bacik 提交于 5月 03, 2011

The ceph guys keep running into problems where we have space reserved in our
orphan block rsv when freeing it up. This is because they tend to do snapshots
alot, so their truncates tend to use a bunch of space, so when we go to do
things like update the inode we have to steal reservation space in order to make
the reservation happen. This happens because truncate can use as much space as
it freaking feels like, but we still have to hold space for removing the orphan
item and updating the inode, which will definitely always happen. So in order
to fix this we need to split all of the reservation stuf up. So with this patch
we have

1) The orphan block reserve which only holds the space for deleting our orphan
item when everything is over.

2) The truncate block reserve which gets allocated and used specifically for the
space that the truncate will use on a per truncate basis.

3) The transaction will always have 1 item's worth of data reserved so we can
update the inode normally.

Hopefully this will make the ceph problem go away. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fcb80c2a

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功