提交 · 75c195a2cac2c3c8366c0b87de2d6814c4f4d638 · openeuler / raspberrypi-kernel

28 7月, 2011 14 次提交

Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors · 75c195a2

由 Chris Mason 提交于 7月 27, 2011

The btrfs transaction code will return any errors that come from
reserve_metadata_bytes.  We need to make sure we don't return funny
things like 1 or EAGAIN.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

75c195a2

Btrfs: use the commit_root for reading free_space_inode crcs · 2cf8572d

由 Chris Mason 提交于 7月 26, 2011

Now that we are using regular file crcs for the free space cache,
we can deadlock if we try to read the free_space_inode while we are
updating the crc tree.

This commit fixes things by using the commit_root to read the crcs.  This is
safe because we the free space cache file would already be loaded if
that block group had been changed in the current transaction.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2cf8572d

Btrfs: reduce extent_state lock contention for metadata · 19b6caf4

由 Chris Mason 提交于 7月 25, 2011

For metadata buffers that don't straddle pages (all of them), btrfs
can safely use the page uptodate bits and extent_buffer uptodate bit
instead of needing to use the extent_state tree.

This greatly reduces contention on the state tree lock.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

19b6caf4

Btrfs: remove lockdep magic from btrfs_next_leaf · 31533fb2

由 Chris Mason 提交于 7月 26, 2011

Before the reader/writer locks, btrfs_next_leaf needed to keep
the path blocking to avoid making lockdep upset.

Now that btrfs_next_leaf only takes read locks, this isn't required.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

31533fb2

Btrfs: make a lockdep class for each root · 85d4e461

由 Chris Mason 提交于 7月 26, 2011

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

85d4e461

Btrfs: switch the btrfs tree locks to reader/writer · bd681513

由 Chris Mason 提交于 7月 16, 2011

The btrfs metadata btree is the source of significant
lock contention, especially in the root node.   This
commit changes our locking to use a reader/writer
lock.

The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking.  Atomics
count the number of blocking readers or writers at any
given time.

It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.

In read heavy workloads this is dramatically faster.  In write
heavy workloads we're still faster because of less contention
on the root node lock.

We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bd681513

Btrfs: fix deadlock when throttling transactions · 81317fde

由 Josef Bacik 提交于 7月 24, 2011

Hit this nice little deadlock.  What happens is this

__btrfs_end_transaction with throttle set, --use_count so it equals 0
  btrfs_commit_transaction
    <somebody else actually manages to start the commit>
    btrfs_end_transaction --use_count so now its -1 <== BAD
      we just return and wait on the transaction

This is bad because we just return after our use_count is -1 and don't let go
of our num_writer count on the transaction, so the guy committing the
transaction just sits there forever.  Fix this by inc'ing our use_count if we're
going to call commit_transaction so that if we call btrfs_end_transaction it's
valid.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

81317fde

Btrfs: stop using highmem for extent_buffers · a6591715

由 Chris Mason 提交于 7月 19, 2011

The extent_buffers have a very complex interface where
we use HIGHMEM for metadata and try to cache a kmap mapping
to access the memory.

The next commit adds reader/writer locks, and concurrent use
of this kmap cache would make it even more complex.

This commit drops the ability to use HIGHMEM with extent buffers,
and rips out all of the related code.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a6591715

Btrfs: fix BUG_ON() caused by ENOSPC when relocating space · 199c36ea

由 Miao Xie 提交于 7月 15, 2011

When we balanced the chunks across the devices, BUG_ON() in
__finish_chunk_alloc() was triggered.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2568!
[SNIP]
Call Trace:
 [<ffffffffa049525e>] btrfs_alloc_chunk+0x8e/0xa0 [btrfs]
 [<ffffffffa04546b0>] do_chunk_alloc+0x330/0x3a0 [btrfs]
 [<ffffffffa045c654>] btrfs_reserve_extent+0xb4/0x1f0 [btrfs]
 [<ffffffffa045c86b>] btrfs_alloc_free_block+0xdb/0x350 [btrfs]
 [<ffffffffa048a8d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa04476fd>] __btrfs_cow_block+0x14d/0x5e0 [btrfs]
 [<ffffffffa044660d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0447c9b>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa044dd5e>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa044f07d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
 [<ffffffffa045e973>] insert_with_overflow+0x43/0x110 [btrfs]
 [<ffffffffa045eb0d>] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs]
 [<ffffffffa0489bd0>] ? map_extent_buffer+0xb0/0xc0 [btrfs]
 [<ffffffff812276ad>] ? rb_insert_color+0x9d/0x160
 [<ffffffffa046cc40>] ? inode_tree_add+0xf0/0x150 [btrfs]
 [<ffffffffa0474801>] btrfs_add_link+0xc1/0x1c0 [btrfs]
 [<ffffffff811dacac>] ? security_inode_init_security+0x1c/0x30
 [<ffffffffa04a28aa>] ? btrfs_init_acl+0x4a/0x180 [btrfs]
 [<ffffffffa047492f>] btrfs_add_nondir+0x2f/0x70 [btrfs]
 [<ffffffffa046af16>] ? btrfs_init_inode_security+0x46/0x60 [btrfs]
 [<ffffffffa0474ac0>] btrfs_create+0x150/0x1d0 [btrfs]
 [<ffffffff81159c63>] ? generic_permission+0x23/0xb0
 [<ffffffff8115b415>] vfs_create+0xa5/0xc0
 [<ffffffff8115ce6e>] do_last+0x5fe/0x880
 [<ffffffff8115dc0d>] path_openat+0xcd/0x3d0
 [<ffffffff8115e029>] do_filp_open+0x49/0xa0
 [<ffffffff8116a965>] ? alloc_fd+0x95/0x160
 [<ffffffff8114f0c7>] do_sys_open+0x107/0x1e0
 [<ffffffff810bcc3f>] ? audit_syscall_entry+0x1bf/0x1f0
 [<ffffffff8114f1e0>] sys_open+0x20/0x30
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa049444a>] __finish_chunk_alloc+0x20a/0x220 [btrfs]

The reason is:
Task1					Space balance task
do_chunk_alloc()
  __finish_chunk_alloc()
    update device info
    in the chunk tree
      alloc system metadata block
					relocate system metadata block group
					  set system metadata block group
					  readonly, This block group is the
					  only one that can allocate space. So
					  there is no free space that can be
					  allocated now.
        find no space and don't try
        to alloc new chunk, and then
        return ENOSPC
  BUG_ON() in __finish_chunk_alloc()
  was triggered.

Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.
Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

199c36ea

Btrfs: tag pages for writeback in sync · f7aaa06b

由 Josef Bacik 提交于 7月 15, 2011

Everybody else does this, we need to do it too. If we're syncing, we need to
tag the pages we're going to write for writeback so we don't end up writing the
same stuff over and over again if somebody is constantly redirtying our file.
This will keep us from having latencies with heavy sync workloads. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f7aaa06b

Btrfs: fix enospc problems with delalloc · 9e0baf60

由 Josef Bacik 提交于 7月 15, 2011

So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea.  Consider this where we have 1
outstanding extent and 1 reserved extent

Reserver				Releaser
					atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
					atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
					free reserved space for 1 extent

Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do it's allocation and you
get those lovely warnings.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9e0baf60

Btrfs: don't flush delalloc arbitrarily · a5991428

由 Josef Bacik 提交于 7月 15, 2011

Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do.  This causes unexpected latencies and we have other
logic to see if we need to throttle.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a5991428

Btrfs: use find_or_create_page instead of grab_cache_page · a94733d0

由 Josef Bacik 提交于 7月 11, 2011

grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
need GFP_NOFS so we don't deadlock. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

a94733d0

Btrfs: use a worker thread to do caching · bab39bf9

由 Josef Bacik 提交于 6月 30, 2011

A user reported a deadlock when copying a bunch of files. This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread. The page was locked by the person
starting the caching kthread. To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks. Thanks,
Reported-by: NRoman Mamedov <rm@romanrm.ru>
Signed-off-by: NJosef Bacik <josef@redhat.com>

bab39bf9

11 7月, 2011 5 次提交

Btrfs: fix how we merge extent states and deal with cached states · df98b6e2

由 Josef Bacik 提交于 6月 20, 2011

First, we can sometimes free the state we're merging, which means anybody who
calls merge_state() may have the state it passed in free'ed. This is
problematic because we could end up caching the state, which makes caching
useless as the state will no longer be part of the tree. So instead of free'ing
the state we passed into merge_state(), set it's end to the other->end and free
the other state. This way we are sure to cache the correct state. Also because
we can merge states together, instead of only using the cache'd state if it's
start == the start we are looking for, go ahead and use it if the start we are
looking for is within the range of the cached state. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

df98b6e2

Btrfs: use the normal checksumming infrastructure for free space cache · 2f356126

由 Josef Bacik 提交于 6月 10, 2011

We used to store the checksums of the space cache directly in the space cache,
however that doesn't work out too well if we have more space than we can fit the
checksums into the first page. So instead use the normal checksumming
infrastructure. There were problems with doing this originally but those
problems don't exist now so this works out fine. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

2f356126

Btrfs: serialize flushers in reserve_metadata_bytes · fdb5effd

由 Josef Bacik 提交于 6月 07, 2011

We keep having problems with early enospc, and that's because our method of
making space is inherently racy. The problem is we can have one guy trying to
make space for himself, and in the meantime people come in and steal his
reservation. In order to stop this we make a waitqueue and put anybody who
comes into reserve_metadata_bytes on that waitqueue if somebody is trying to
make more space. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fdb5effd

Btrfs: do transaction space reservation before joining the transaction · b5009945

由 Josef Bacik 提交于 6月 07, 2011

We have to do weird things when handling enospc in the transaction joining code.
Because we've already joined the transaction we cannot commit the transaction
within the reservation code since it will deadlock, so we have to return EAGAIN
and then make sure we don't retry too many times. Instead of doing this, just
do the reservation the normal way before we join the transaction, that way we
can do whatever we want to try and reclaim space, and then if it fails we know
for sure we are out of space and we can return ENOSPC. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

b5009945

Btrfs: try to only do one btrfs_search_slot in do_setxattr · fa09200b

由 Josef Bacik 提交于 5月 27, 2011

I've been watching how many btrfs_search_slot()'s we do and I noticed that when
we create a file with selinux enabled we were doing 2 each time we initialize
the security context. That's because we lookup the xattr first so we can delete
it if we're setting a new value to an existing xattr. But in the create case we
don't have any xattrs, so it is completely useless to have the extra lookup. So
re-arrange things so that we only lookup first if we specifically have
XATTR_REPLACE. That way in the basic case we only do 1 search, and in the more
complicated case we do the normal 2 lookups. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fa09200b

07 7月, 2011 3 次提交

btrfs: fix oops when doing space balance · 149e2d76

由 Miao Xie 提交于 7月 06, 2011

We need to make sure the data relocation inode doesn't go through
the delayed metadata updates, otherwise we get an oops during balance:

kernel BUG at fs/btrfs/relocation.c:4303!
[SNIP]
Call Trace:
 [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs]
 [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs]
 [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
 [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs]
 [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs]
 [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs]
 [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs]
 [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs]
 [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130
 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs]
 [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs]
 [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs]
 [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs]
 [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs]
 [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs]
 [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs]
 [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs]
 [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs]
 [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270
 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0
 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540
 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0
 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

149e2d76

Btrfs: don't panic if we get an error while balancing V2 · 508794eb

由 Josef Bacik 提交于 7月 02, 2011

A user reported an error where if we try to balance an fs after a device has
been removed it will blow up. This is because we get an EIO back and this is
where BUG_ON(ret) bites us in the ass. To fix we just exit. Thanks,
Reported-by: NAnand Jain <Anand.Jain@oracle.com>
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

508794eb

btrfs: add missing options displayed in mount output · 0942caa3

由 David Sterba 提交于 6月 28, 2011

There are three missed mount options settable by user which are not
currently displayed in mount output.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0942caa3

27 6月, 2011 1 次提交

btrfs: fix inconsonant inode information · 2f7e33d4

由 Miao Xie 提交于 6月 23, 2011

When iputting the inode, We may leave the delayed nodes if they have some
delayed items that have not been dealt with. So when the inode is read again,
we must look up the relative delayed node, and use the information in it to
initialize the inode. Or we will get inconsonant inode information, it may
cause that the same directory index number is allocated again, and hit the
following oops:

[ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the
insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17)
[ 5447.569766] ------------[ cut here ]------------
[ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301!
[SNIP]
[ 5447.790721] Call Trace:
[ 5447.793191]  [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs]
[ 5447.800156]  [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs]
[ 5447.806517]  [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs]
[ 5447.812876]  [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs]
[ 5447.818961]  [<ffffffff8111f840>] vfs_create+0x72/0x92
[ 5447.824090]  [<ffffffff8111fa8c>] do_last+0x22c/0x40b
[ 5447.829133]  [<ffffffff8112076a>] path_openat+0xc0/0x2ef
[ 5447.834438]  [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44
[ 5447.841216]  [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67
[ 5447.847846]  [<ffffffff81121a79>] do_filp_open+0x3d/0x87
[ 5447.853156]  [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d
[ 5447.859072]  [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80
[ 5447.864636]  [<ffffffff8111f179>] ? do_getname+0x14b/0x173
[ 5447.870112]  [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26
[ 5447.875682]  [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10
[ 5447.880882]  [<ffffffff81112d39>] do_sys_open+0x69/0xae
[ 5447.886153]  [<ffffffff81112db1>] sys_open+0x20/0x22
[ 5447.891114]  [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b

Fix it by reusing the old delayed node.
Reported-by: NJim Schutt <jaschut@sandia.gov>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Tested-by: NJim Schutt <jaschut@sandia.gov>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2f7e33d4

25 6月, 2011 3 次提交

Btrfs: make sure to update total_bitmaps when freeing cache V3 · 9b90f513

由 Josef Bacik 提交于 6月 24, 2011

A user reported this bug again where we have more bitmaps than we are supposed
to. This is because we failed to load the free space cache, but don't update
the ctl->total_bitmaps counter when we remove entries from the tree. This patch
fixes this problem and we should be good to go again. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9b90f513

Btrfs: fix type mismatch in find_free_extent() · e0f54067

由 Ilya Dryomov 提交于 6月 18, 2011

data parameter should be u64 because a full-sized chunk flags field is
passed instead of 0/1 for distinguishing data from metadata.  All
underlying functions expect u64.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e0f54067

Btrfs: make sure to record the transid in new inodes · 1973f0fa

由 Chris Mason 提交于 6月 24, 2011

When we create a new inode, we aren't filling in the
field that records the transaction that last changed this
inode.

If we then go to fsync that inode, it will be skipped because the field
isn't filled in.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1973f0fa

18 6月, 2011 6 次提交

Btrfs: avoid delayed metadata items during commits · e999376f

由 Chris Mason 提交于 6月 17, 2011

Snapshot creation has two phases.  One is the initial snapshot setup,
and the second is done during commit, while nobody is allowed to modify
the root we are snapshotting.

The delayed metadata insertion code can break that rule, it does a
delayed inode update on the inode of the parent of the snapshot,
and delayed directory item insertion.

This makes sure to run the pending delayed operations before we
record the snapshot root, which avoids corruptions.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e999376f

btrfs: fix uninitialized return value · 35a30d7c

由 David Sterba 提交于 6月 13, 2011

When allocation fails in btrfs_read_fs_root_no_name, ret is not set
although it is returned, holding a garbage value.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

35a30d7c

btrfs: fix wrong reservation when doing delayed inode operations · 19fd2949

由 Miao Xie 提交于 6月 15, 2011

We have migrated the space for the delayed inode items from
trans_block_rsv to global_block_rsv, but we forgot to set trans->block_rsv to
global_block_rsv when we doing delayed inode operations, and the following Oops
happened:

[ 9792.654889] ------------[ cut here ]------------
[ 9792.654898] WARNING: at fs/btrfs/extent-tree.c:5681
btrfs_alloc_free_block+0xca/0x27c [btrfs]()
[ 9792.654899] Hardware name: To Be Filled By O.E.M.
[ 9792.654900] Modules linked in: btrfs zlib_deflate libcrc32c
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
arc4 rt61pci rt2x00pci rt2x00lib snd_hda_codec_hdmi mac80211
snd_hda_codec_realtek cfg80211 snd_hda_intel edac_core snd_seq rfkill
pcspkr serio_raw snd_hda_codec eeprom_93cx6 edac_mce_amd sp5100_tco
i2c_piix4 k10temp snd_hwdep snd_seq_device snd_pcm floppy r8169 xhci_hcd
mii snd_timer snd soundcore snd_page_alloc ipv6 firewire_ohci pata_acpi
ata_generic firewire_core pata_via crc_itu_t radeon ttm drm_kms_helper
drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
[ 9792.654919] Pid: 2762, comm: rm Tainted: G        W   2.6.39+ #1
[ 9792.654920] Call Trace:
[ 9792.654922]  [<ffffffff81053c4a>] warn_slowpath_common+0x83/0x9b
[ 9792.654925]  [<ffffffff81053c7c>] warn_slowpath_null+0x1a/0x1c
[ 9792.654933]  [<ffffffffa038e747>] btrfs_alloc_free_block+0xca/0x27c [btrfs]
[ 9792.654945]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.654953]  [<ffffffffa038189b>] __btrfs_cow_block+0xfc/0x30c [btrfs]
[ 9792.654963]  [<ffffffffa0396aa6>] ? btrfs_buffer_uptodate+0x47/0x58 [btrfs]
[ 9792.654970]  [<ffffffffa0382e48>] ? read_block_for_search+0x94/0x368 [btrfs]
[ 9792.654978]  [<ffffffffa0381ba9>] btrfs_cow_block+0xfe/0x146 [btrfs]
[ 9792.654986]  [<ffffffffa03848b0>] btrfs_search_slot+0x14d/0x4b6 [btrfs]
[ 9792.654997]  [<ffffffffa03b8562>] ? map_extent_buffer+0x6e/0xa8 [btrfs]
[ 9792.655022]  [<ffffffffa03938e8>] btrfs_lookup_inode+0x2f/0x8f [btrfs]
[ 9792.655025]  [<ffffffff8147afac>] ? _cond_resched+0xe/0x22
[ 9792.655027]  [<ffffffff8147b892>] ? mutex_lock+0x29/0x50
[ 9792.655039]  [<ffffffffa03d41b1>] btrfs_update_delayed_inode+0x72/0x137 [btrfs]
[ 9792.655051]  [<ffffffffa03d4ea2>] btrfs_run_delayed_items+0x90/0xdb [btrfs]
[ 9792.655062]  [<ffffffffa039a69b>] btrfs_commit_transaction+0x228/0x654 [btrfs]
[ 9792.655064]  [<ffffffff8106e8da>] ? remove_wait_queue+0x3a/0x3a
[ 9792.655075]  [<ffffffffa03a2fa5>] btrfs_evict_inode+0x14d/0x202 [btrfs]
[ 9792.655077]  [<ffffffff81132bd6>] evict+0x71/0x111
[ 9792.655079]  [<ffffffff81132de0>] iput+0x12a/0x132
[ 9792.655081]  [<ffffffff8112aa3a>] do_unlinkat+0x106/0x155
[ 9792.655083]  [<ffffffff81127b83>] ? path_put+0x1f/0x23
[ 9792.655085]  [<ffffffff8109c53c>] ? audit_syscall_entry+0x145/0x171
[ 9792.655087]  [<ffffffff81128410>] ? putname+0x34/0x36
[ 9792.655090]  [<ffffffff8112b441>] sys_unlinkat+0x29/0x2b
[ 9792.655092]  [<ffffffff81482c42>] system_call_fastpath+0x16/0x1b
[ 9792.655093] ---[ end trace 02b696eb02b3f768 ]---

This patch fix it by setting the reservation of the transaction handle to the
correct one.
Reported-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

19fd2949

btrfs: Remove unused sysfs code · 9fe6a50f

由 Maarten Lankhorst 提交于 6月 16, 2011

Removes code no longer used. The sysfs file itself is kept, because the
btrfs developers expressed interest in putting new entries to sysfs.
Signed-off-by: NMaarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9fe6a50f

btrfs: fix dereference of ERR_PTR value · 3ed4498c

由 David Sterba 提交于 6月 13, 2011

smatch reports:

btrfs_recover_log_trees error: 'wc.replay_dest' dereferencing
possible ERR_PTR()
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3ed4498c

Btrfs: fix relocation races · 7585717f

由 Chris Mason 提交于 6月 13, 2011

The recent commit to get rid of our trans_mutex introduced
some races with block group relocation.  The problem is that relocation
needs to do some record keeping about each root, and it was relying
on the transaction mutex to coordinate things in subtle ways.

This fix adds a mutex just for the relocation code and makes sure
it doesn't have a big impact on normal operations.  The race is
really fixed in btrfs_record_root_in_trans, which is where we
step back and wait for the relocation code to finish accounting
setup.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

7585717f

16 6月, 2011 3 次提交

Btrfs: set no_trans_join after trying to expand the transaction · ed0ca140

由 Josef Bacik 提交于 6月 14, 2011

We can lockup if we try to allow new writers join the transaction and we have
flushoncommit set or have a pending snapshot. This is because we set
no_trans_join and then loop around and try to wait for ordered extents again.
The problem is the ordered endio stuff needs to join the transaction, which it
can't do because no_trans_join is set. So instead wait until after this loop to
set no_trans_join and then make sure to wait for num_writers == 1 in case
anybody got started in between us exiting the loop and setting no_trans_join.
This could easily be reproduced by mounting -o flushoncommit and running xfstest
13. It cannot be reproduced with this patch. Thanks,
Reported-by: NJim Schutt <jaschut@sandia.gov>
Signed-off-by: NJosef Bacik <josef@redhat.com>

ed0ca140

Btrfs: protect the pending_snapshots list with trans_lock · 8351583e

由 Josef Bacik 提交于 6月 14, 2011

Currently there is nothing protecting the pending_snapshots list on the
transaction. We only hold the directory mutex that we are snapshotting and a
read lock on the subvol_sem, so we could race with somebody else creating a
snapshot in a different directory and end up with list corruption. So protect
this list with the trans_lock. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

8351583e

Btrfs: fix path leakage on subvol deletion · 71d7aed0

由 Josef Bacik 提交于 6月 14, 2011

The delayed ref patch accidently removed the btrfs_free_path in
btrfs_unlink_subvol, this puts it back and means we don't leak a path. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

71d7aed0

13 6月, 2011 2 次提交

Btrfs: drop the delalloc_bytes check in shrink_delalloc · f4c44016

由 Chris Mason 提交于 6月 13, 2011

Even when delalloc_bytes is zero, we may need to sleep while waiting
for delalloc space.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f4c44016

Btrfs: check the return value from set_anon_super · ac08aedf

由 Chris Mason 提交于 6月 13, 2011

Al Viro noticed we weren't checking for set_anon_super failures.  This
adds the required checks.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ac08aedf

11 6月, 2011 3 次提交

Btrfs: use join_transaction in btrfs_evict_inode() · 30b4caf5

由 Li Zefan 提交于 6月 08, 2011

The WARN_ON() in start_transaction() was triggered while balancing.

The cause is btrfs_relocate_chunk() started a transaction and
then called iput() on the inode that stores free space cache,
and iput() called btrfs_start_transaction() again.
Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

30b4caf5

Btrfs - use %pU to print fsid · 22b63a29

由 Ilya Dryomov 提交于 2月 09, 2011

Get rid of FIXME comment.  Uuids from dmesg are now the same as uuids
given by btrfs-progs.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

22b63a29

Btrfs: fix extent state leak on failed nodatasum reads · 08d2f347

由 Jan Schmidt 提交于 5月 04, 2011

When encountering an EIO while reading from a nodatasum extent, we
insert an error record into the inode's failure tree.
btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd
better clear the failure tree in that case, otherwise the kernel
complains about

	BUG extent_state: Objects remaining on kmem_cache_close()

on rmmod.
Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

08d2f347