- 06 12月, 2016 3 次提交
-
-
由 Jeff Mahoney 提交于
We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
There are many functions that are always called with the same root argument. Rather than passing the same root every time, we can pass an fs_info pointer instead and have the function get the root pointer itself. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
There are 11 functions that accept a root parameter and immediately overwrite it. We can pass those an fs_info pointer instead. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 28 9月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
current_fs_time() uses struct super_block* as an argument. As per Linus's suggestion, this is changed to take struct inode* as a parameter instead. This is because the function is primarily meant for vfs inode timestamps. Also the function was renamed as per Arnd's suggestion. Change all calls to current_fs_time() to use the new current_time() function instead. current_fs_time() will be deleted. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 9月, 2016 4 次提交
-
-
由 Jeff Mahoney 提交于
For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Since we could get errors from the concurrent aborted transaction, the check of this BUG_ON in start_transaction is not true any more. Say, while flushing free space cache inode's dirty pages, btrfs_finish_ordered_io -> btrfs_join_transaction_nolock (the transaction has been aborted.) -> BUG_ON(type == TRANS_JOIN_NOLOCK); Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 26 9月, 2016 1 次提交
-
-
由 Josef Bacik 提交于
We have a lot of random ints in btrfs_fs_info that can be put into flags. This is mostly equivalent with the exception of how we deal with quota going on or off, now instead we set a flag when we are turning it on or off and deal with that appropriately, rather than just having a pending state that the current quota_enabled gets set to. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 25 8月, 2016 1 次提交
-
-
由 Wang Xiaoguang 提交于
When running fstests generic/068, sometimes we got below deadlock: xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080 ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000 ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001 ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8 Call Trace: [<ffffffff816a9045>] schedule+0x35/0x80 [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140 [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100 [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30 [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs] [<ffffffff810d32b5>] percpu_down_read+0x35/0x50 [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40 [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs] [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs] [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs] [<ffffffff81230a1a>] evict+0xba/0x1a0 [<ffffffff812316b6>] iput+0x196/0x200 [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs] [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs] [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs] [<ffffffff81218040>] freeze_super+0xf0/0x190 [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0 [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140 [<ffffffff81229409>] SyS_ioctl+0x79/0x90 [<ffffffff81003c12>] do_syscall_64+0x62/0x110 [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25 >From this warning, freeze_super() already holds SB_FREEZE_FS, but btrfs_freeze() will call btrfs_commit_transaction() again, if btrfs_commit_transaction() finds that it has delayed iputs to handle, it'll start_transaction(), which will try to get SB_FREEZE_FS lock again, then deadlock occurs. The root cause is that in btrfs, sync_filesystem(sb) does not make sure all metadata is updated. There still maybe some codes adding delayed iputs, see below sample race window: CPU1 | CPU2 |-> freeze_super() | |-> sync_filesystem(sb); | | |-> cleaner_kthread() | | |-> btrfs_delete_unused_bgs() | | |-> btrfs_remove_chunk() | | |-> btrfs_remove_block_group() | | |-> btrfs_add_delayed_iput() | | |-> sb->s_writers.frozen = SB_FREEZE_FS; | |-> sb_wait_write(sb, SB_FREEZE_FS); | | acquire SB_FREEZE_FS lock. | | | |-> btrfs_freeze() | |-> btrfs_commit_transaction() | |-> btrfs_run_delayed_iputs() | | will handle delayed iputs, | | that means start_transaction() | | will be called, which will try | | to get SB_FREEZE_FS lock. | To fix this issue, introduce a "int fs_frozen" to record internally whether fs has been frozen. If fs has been frozen, we can not handle delayed iputs. Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ add comment to btrfs_freeze ] Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 26 7月, 2016 3 次提交
-
-
由 Jeff Mahoney 提交于
__btrfs_abort_transaction doesn't use its root parameter except to obtain an fs_info pointer. We can obtain that from trans->root->fs_info for now and from trans->fs_info in a later patch. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
btrfs_trans_handle->root is documented as for use for confirming that the root passed in to start the transaction is the same as the one ending it. It's used in several places when an fs_info pointer is needed, so let's just add an fs_info pointer directly. Eventually, the root pointer can be removed. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
btrfs_test_opt and friends only use the root pointer to access the fs_info. Let's pass the fs_info directly in preparation to eliminate similar patterns all over btrfs. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 23 6月, 2016 1 次提交
-
-
由 Josef Bacik 提交于
Using the offwakecputime bpf script I noticed most of our time was spent waiting on the delayed ref throttling. This is what is supposed to happen, but sometimes the transaction can commit and then we're waiting for throttling that doesn't matter anymore. So change this stuff to be a little smarter by tracking the transid we were in when we initiated the throttling. If the transaction we get is different then we can just bail out. This resulted in a 50% speedup in my fs_mark test, and reduced the amount of time spent throttling by 60 seconds over the entire run (which is about 30 minutes). Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 18 6月, 2016 2 次提交
-
-
由 David Sterba 提交于
Introduced in 2c1984f2 ("btrfs: build fixup for qgroup_account_snapshot") as temporary bisectability build fixup. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We've renamed btrfs_std_error, this one is left from last merge. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 13 5月, 2016 1 次提交
-
-
由 Filipe Manana 提交于
Before the relocation process of a block group starts, it sets the block group to readonly mode, then flushes all delalloc writes and then finally it waits for all ordered extents to complete. This last step includes waiting for ordered extents destinated at extents allocated in other block groups, making us waste unecessary time. So improve this by waiting only for ordered extents that fall into the block group's range. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
-
- 12 5月, 2016 2 次提交
-
-
由 David Sterba 提交于
The macro btrfs_std_error got renamed to btrfs_handle_fs_error in an independent branch for the same merge target (4.7). To make the code compilable for bisectability reasons, add a temporary stub. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Current btrfs qgroup design implies a requirement that after calling btrfs_qgroup_account_extents() there must be a commit root switch. Normally this is OK, as btrfs_qgroup_accounting_extents() is only called inside btrfs_commit_transaction() just be commit_cowonly_roots(). However there is a exception at create_pending_snapshot(), which will call btrfs_qgroup_account_extents() but no any commit root switch. In case of creating a snapshot whose parent root is itself (create a snapshot of fs tree), it will corrupt qgroup by the following trace: (skipped unrelated data) ====== btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 1 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 0, excl = 0 qgroup_update_counters: qgid = 5, cur_old_count = 0, cur_new_count = 1, rfer = 16384, excl = 16384 btrfs_qgroup_account_extent: bytenr = 29786112, num_bytes = 16384, nr_old_roots = 0, nr_new_roots = 0 ====== The problem here is in first qgroup_account_extent(), the nr_new_roots of the extent is 1, which means its reference got increased, and qgroup increased its rfer and excl. But at second qgroup_account_extent(), its reference got decreased, but between these two qgroup_account_extent(), there is no switch roots. This leads to the same nr_old_roots, and this extent just got ignored by qgroup, which means this extent is wrongly accounted. Fix it by call commit_cowonly_roots() after qgroup_account_extent() in create_pending_snapshot(), with needed preparation. Mark: I added a check at the top of qgroup_account_snapshot() to skip this code if qgroups are turned off. xfstest btrfs/122 exposes this problem. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NMark Fasheh <mfasheh@suse.de> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 29 4月, 2016 1 次提交
-
-
由 David Sterba 提交于
Single caller passes GFP_NOFS. We can get rid of the gfpflags_allow_blocking checks as NOFS can block but does not recurse to filesystem through reclaim. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 28 4月, 2016 1 次提交
-
-
由 Anand Jain 提交于
btrfs_std_error() handles errors, puts FS into readonly mode (as of now). So its good idea to rename it to btrfs_handle_fs_error(). Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ edit changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 18 2月, 2016 2 次提交
-
-
由 Deepa Dinamani 提交于
CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_fs_time() instead. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: linux-btrfs@vger.kernel.org Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
There are a few places where we add to trans->bytes_reserved but don't have the corresponding trace point. With these added my tool no longer sees transaction leaks. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 07 1月, 2016 3 次提交
-
-
由 David Sterba 提交于
We can also preallocate btrfs_path that's used during pending snapshot creation and avoid another late ENOMEM failure. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The actual snapshot creation is delayed until transaction commit. If we cannot get enough memory for the root item there, we have to fail the whole transaction commit which is bad. So we'll allocate the memory at the ioctl call and pass it along with the pending_snapshot struct. The potential ENOMEM will be returned to the caller of snapshot ioctl. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 17 12月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
When a transaction is aborted, or its commit fails before writing the new superblock and calling btrfs_finish_extent_commit(), we leak reference counts on the block groups attached to the transaction's delete_bgs list, because btrfs_finish_extent_commit() is never called for those two cases. Fix this by dropping their references at btrfs_put_transaction(), which is called when transactions are aborted (by making the transaction kthread commit the transaction) or if their commits fail. Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
- 10 12月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
As of my previous change titled "Btrfs: fix scrub preventing unused block groups from being deleted", the following warning at extent-tree.c:btrfs_delete_unused_bgs() can be hit when we mount the a filesysten with "-o discard": 10263 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info) 10264 { (...) 10405 if (trimming) { 10406 WARN_ON(!list_empty(&block_group->bg_list)); 10407 spin_lock(&trans->transaction->deleted_bgs_lock); 10408 list_move(&block_group->bg_list, 10409 &trans->transaction->deleted_bgs); 10410 spin_unlock(&trans->transaction->deleted_bgs_lock); 10411 btrfs_get_block_group(block_group); 10412 } (...) This happens because scrub can now add back the block group to the list of unused block groups (fs_info->unused_bgs). This is dangerous because we are moving the block group from the unused block groups list to the list of deleted block groups without holding the lock that protects the source list (fs_info->unused_bgs_lock). The following diagram illustrates how this happens: CPU 1 CPU 2 cleaner_kthread() btrfs_delete_unused_bgs() sees bg X in list fs_info->unused_bgs deletes bg X from list fs_info->unused_bgs scrub_enumerate_chunks() searches device tree using its commit root finds device extent for block group X gets block group X from the tree fs_info->block_group_cache_tree (via btrfs_lookup_block_group()) sets bg X to RO (again) scrub_chunk(bg X) sets bg X back to RW mode adds bg X to the list fs_info->unused_bgs again, since it's still unused and currently not in that list sets bg X to RO mode btrfs_remove_chunk(bg X) --> discard is enabled and bg X is in the fs_info->unused_bgs list again so the warning is triggered --> we move it from that list into the transaction's delete_bgs list, but we can have another task currently manipulating the first list (fs_info->unused_bgs) Fix this by using the same lock (fs_info->unused_bgs_lock) to protect both the list of unused block groups and the list of deleted block groups. This makes it safe and there's not much worry for more lock contention, as this lock is seldom used and only the cleaner kthread adds elements to the list of deleted block groups. The warning goes away too, as this was previously an impossible case (and would have been better a BUG_ON/ASSERT) but it's not impossible anymore. Reproduced with fstest btrfs/073 (using MOUNT_OPTIONS="-o discard"). Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
- 25 11月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
It's possible to reach a state where the cleaner kthread isn't able to start a transaction to delete an unused block group due to lack of enough free metadata space and due to lack of unallocated device space to allocate a new metadata block group as well. If this happens try to use space from the global block group reserve just like we do for unlink operations, so that we don't reach a permanent state where starting a transaction for filesystem operations (file creation, renames, etc) keeps failing with -ENOSPC. Such an unfortunate state was observed on a machine where over a dozen unused data block groups existed and the cleaner kthread was failing to delete them due to ENOSPC error when attempting to start a transaction, and even running balance with a -dusage=0 filter failed with ENOSPC as well. Also unmounting and mounting again the filesystem didn't help. Allowing the cleaner kthread to use the global block reserve to delete the unused data block groups fixed the problem. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 22 10月, 2015 6 次提交
-
-
由 Josef Bacik 提交于
I want to set some per transaction flags, so instead of adding yet another int lets just convert the current two int indicators to flags and add a flags field for future use. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
We have a mechanism to make sure we don't lose updates for ordered extents that were logged in the transaction that is currently running. We add the ordered extent to a transaction list and then the transaction waits on all the ordered extents in that list. However are substantially large file systems this list can be extremely large, and can give us soft lockups, since the ordered extents don't remove themselves from the list when they do complete. To fix this we simply add a counter to the transaction that is incremented any time we have a logged extent that needs to be completed in the current transaction. Then when the ordered extent finally completes it decrements the per transaction counter and wakes up the transaction if we are the last ones. This will eliminate the softlockup. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
As we have the new metadata reservation functions, use them to replace the old btrfs_qgroup_reserve() call for metadata. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Alexandru Moise 提交于
The value of num_items that start_transaction() ultimately always takes is a small one, so a 64 bit integer is overkill. Also change num_items for btrfs_start_transaction() and btrfs_start_transaction_lflush() as well. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Use memset() to null out the btrfs_delayed_ref_root of btrfs_transaction instead of setting all the members to 0 by hand. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Alexandru Moise 提交于
Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc(). We're just initializing most fields to 0, false and NULL later on _anyway_, so to make the code mode readable and potentially gain a bit of performance (completely untested claim), we should fill our btrfs_trans_handle with zeros on allocation then just initialize those five remaining fields (not counting the list_heads) as normal. Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 11 10月, 2015 2 次提交
-
-
由 David Sterba 提交于
Reduce number of undocumented barriers out there. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
waitqueue_active should be preceded by a barrier, in this function we don't need to call it all the time. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 06 10月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
Josef ran into a deadlock while a transaction handle was finalizing the creation of its block groups, which produced the following trace: [260445.593112] fio D ffff88022a9df468 0 8924 4518 0x00000084 [260445.593119] ffff88022a9df468 ffffffff81c134c0 ffff880429693c00 ffff88022a9df488 [260445.593126] ffff88022a9e0000 ffff8803490d7b00 ffff8803490d7b18 ffff88022a9df4b0 [260445.593132] ffff8803490d7af8 ffff88022a9df488 ffffffff8175a437 ffff8803490d7b00 [260445.593137] Call Trace: [260445.593145] [<ffffffff8175a437>] schedule+0x37/0x80 [260445.593189] [<ffffffffa0850f37>] btrfs_tree_lock+0xa7/0x1f0 [btrfs] [260445.593197] [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0 [260445.593225] [<ffffffffa07eac44>] btrfs_lock_root_node+0x34/0x50 [btrfs] [260445.593253] [<ffffffffa07eff6b>] btrfs_search_slot+0x88b/0xa00 [btrfs] [260445.593295] [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs] [260445.593324] [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [260445.593351] [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [260445.593394] [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs] [260445.593427] [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs] [260445.593459] [<ffffffffa0800964>] do_chunk_alloc+0x2a4/0x2e0 [btrfs] [260445.593491] [<ffffffffa0803815>] find_free_extent+0xa55/0xd90 [btrfs] [260445.593524] [<ffffffffa0803c22>] btrfs_reserve_extent+0xd2/0x220 [btrfs] [260445.593532] [<ffffffff8119fe5d>] ? account_page_dirtied+0xdd/0x170 [260445.593564] [<ffffffffa0803e78>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs] [260445.593597] [<ffffffffa080c9de>] ? btree_set_page_dirty+0xe/0x10 [btrfs] [260445.593626] [<ffffffffa07eb5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs] [260445.593654] [<ffffffffa07ebbff>] btrfs_cow_block+0x11f/0x1c0 [btrfs] [260445.593682] [<ffffffffa07ef8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs] [260445.593724] [<ffffffffa08389df>] ? free_extent_buffer+0x4f/0x90 [btrfs] [260445.593752] [<ffffffffa07f1a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [260445.593830] [<ffffffffa07ea94a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [260445.593905] [<ffffffffa08403b9>] btrfs_finish_chunk_alloc+0x1c9/0x570 [btrfs] [260445.593946] [<ffffffffa08002ab>] btrfs_create_pending_block_groups+0x11b/0x200 [btrfs] [260445.593990] [<ffffffffa0815798>] btrfs_commit_transaction+0xa8/0xb40 [btrfs] [260445.594042] [<ffffffffa085abcd>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs] [260445.594089] [<ffffffffa082bc84>] btrfs_sync_file+0x294/0x350 [btrfs] [260445.594115] [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0 [260445.594133] [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180 [260445.594149] [<ffffffff8123e35d>] do_fsync+0x3d/0x70 [260445.594169] [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110 [260445.594187] [<ffffffff8123e600>] SyS_fsync+0x10/0x20 [260445.594204] [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71 This happened because the same transaction handle created a large number of block groups and while finalizing their creation (inserting new items and updating existing items in the chunk and device trees) a new metadata extent had to be allocated and no free space was found in the current metadata block groups, which made find_free_extent() attempt to allocate a new block group via do_chunk_alloc(). However at do_chunk_alloc() we ended up allocating a new system chunk too and exceeded the threshold of 2Mb of reserved chunk bytes, which makes do_chunk_alloc() enter the final part of block group creation again (at btrfs_create_pending_block_groups()) and attempt to lock again the root of the chunk tree when it's already write locked by the same task. Similarly we can deadlock on extent tree nodes/leafs if while we are running delayed references we end up creating a new metadata block group in order to allocate a new node/leaf for the extent tree (as part of a CoW operation or growing the tree), as btrfs_create_pending_block_groups inserts items into the extent tree as well. In this case we get the following trace: [14242.773581] fio D ffff880428ca3418 0 3615 3100 0x00000084 [14242.773588] ffff880428ca3418 ffff88042d66b000 ffff88042a03c800 ffff880428ca3438 [14242.773594] ffff880428ca4000 ffff8803e4b20190 ffff8803e4b201a8 ffff880428ca3460 [14242.773600] ffff8803e4b20188 ffff880428ca3438 ffffffff8175a437 ffff8803e4b20190 [14242.773606] Call Trace: [14242.773613] [<ffffffff8175a437>] schedule+0x37/0x80 [14242.773656] [<ffffffffa057ff07>] btrfs_tree_lock+0xa7/0x1f0 [btrfs] [14242.773664] [<ffffffff810db7c0>] ? prepare_to_wait_event+0xf0/0xf0 [14242.773692] [<ffffffffa0519c44>] btrfs_lock_root_node+0x34/0x50 [btrfs] [14242.773720] [<ffffffffa051ef6b>] btrfs_search_slot+0x88b/0xa00 [btrfs] [14242.773750] [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [14242.773758] [<ffffffff811ef4a2>] ? kmem_cache_alloc+0x1d2/0x200 [14242.773786] [<ffffffffa0520ad1>] btrfs_insert_item+0x71/0xf0 [btrfs] [14242.773818] [<ffffffffa052f292>] btrfs_create_pending_block_groups+0x102/0x200 [btrfs] [14242.773850] [<ffffffffa052f96e>] do_chunk_alloc+0x2ae/0x2f0 [btrfs] [14242.773934] [<ffffffffa0532825>] find_free_extent+0xa55/0xd90 [btrfs] [14242.773998] [<ffffffffa0532c22>] btrfs_reserve_extent+0xc2/0x1d0 [btrfs] [14242.774041] [<ffffffffa0532e38>] btrfs_alloc_tree_block+0x108/0x4a0 [btrfs] [14242.774078] [<ffffffffa051a5cd>] __btrfs_cow_block+0x12d/0x5b0 [btrfs] [14242.774118] [<ffffffffa051abff>] btrfs_cow_block+0x11f/0x1c0 [btrfs] [14242.774155] [<ffffffffa051e8c7>] btrfs_search_slot+0x1e7/0xa00 [btrfs] [14242.774194] [<ffffffffa0528021>] ? __btrfs_free_extent.isra.70+0x2e1/0xcb0 [btrfs] [14242.774235] [<ffffffffa0520a06>] btrfs_insert_empty_items+0x66/0xc0 [btrfs] [14242.774274] [<ffffffffa051994a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [14242.774318] [<ffffffffa052c433>] __btrfs_run_delayed_refs+0xbb3/0x1020 [btrfs] [14242.774358] [<ffffffffa052f404>] btrfs_run_delayed_refs.part.78+0x74/0x280 [btrfs] [14242.774391] [<ffffffffa052f627>] btrfs_run_delayed_refs+0x17/0x20 [btrfs] [14242.774432] [<ffffffffa05be236>] commit_cowonly_roots+0x8d/0x2bd [btrfs] [14242.774474] [<ffffffffa059d07f>] ? __btrfs_run_delayed_items+0x1cf/0x210 [btrfs] [14242.774516] [<ffffffffa05adac3>] ? btrfs_qgroup_account_extents+0x83/0x130 [btrfs] [14242.774558] [<ffffffffa0544c40>] btrfs_commit_transaction+0x590/0xb40 [btrfs] [14242.774599] [<ffffffffa0589b9d>] ? btrfs_log_dentry_safe+0x6d/0x80 [btrfs] [14242.774642] [<ffffffffa055ac54>] btrfs_sync_file+0x294/0x350 [btrfs] [14242.774650] [<ffffffff8123e29b>] vfs_fsync_range+0x3b/0xa0 [14242.774657] [<ffffffff81023891>] ? syscall_trace_enter_phase1+0x131/0x180 [14242.774663] [<ffffffff8123e35d>] do_fsync+0x3d/0x70 [14242.774669] [<ffffffff81023bb8>] ? syscall_trace_leave+0xb8/0x110 [14242.774675] [<ffffffff8123e600>] SyS_fsync+0x10/0x20 [14242.774681] [<ffffffff8175de6e>] entry_SYSCALL_64_fastpath+0x12/0x71 Fix this by never recursing into the finalization phase of block group creation and making sure we never trigger the finalization of block group creation while running delayed references. Reported-by: NJosef Bacik <jbacik@fb.com> Fixes: 00d80e34 ("Btrfs: fix quick exhaustion of the system array in the superblock") Signed-off-by: NFilipe Manana <fdmanana@suse.com>
-
- 29 9月, 2015 1 次提交
-
-
由 Anand Jain 提交于
btrfs_error() and btrfs_std_error() does the same thing and calls _btrfs_std_error(), so consolidate them together. And the main motivation is that btrfs_error() is closely named with btrfs_err(), one handles error action the other is to log the error, so don't closely name them. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Suggested-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 23 9月, 2015 1 次提交
-
-
由 Josef Bacik 提交于
When dropping a snapshot we need to account for the qgroup changes. If we drop the snapshot in all one go then the backref code will fail to find blocks from the snapshot we dropped since it won't be able to find the root in the fs root cache. This can lead to us failing to find refs from other roots that pointed at blocks in the now deleted root. To handle this we need to not remove the fs roots from the cache until after we process the qgroup operations. Do this by adding dropped roots to a list on the transaction, and letting the transaction remove the roots at the same time it drops the commit roots. This will keep all of the backref searching code in sync properly, and fixes a problem Mark was seeing with snapshot delete and qgroups. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Tested-by: NHolger Hoffstätte <holger.hoffstaette@googlemail.com> Signed-off-by: NChris Mason <clm@fb.com>
-