- 07 1月, 2016 1 次提交
-
-
由 David Sterba 提交于
Replace the integers by enums for better readability. The value 2 does not have any meaning since a7175319 "Btrfs: do less aggressive btree readahead" (2009-01-22). Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 26 10月, 2015 1 次提交
-
-
由 Filipe Manana 提交于
In the kernel 4.2 merge window we had a big changes to the implementation of delayed references and qgroups which made the no_quota field of delayed references not used anymore. More specifically the no_quota field is not used anymore as of: commit 0ed4792a ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.") Leaving the no_quota field actually prevents delayed references from getting merged, which in turn cause the following BUG_ON(), at fs/btrfs/extent-tree.c, to be hit when qgroups are enabled: static int run_delayed_tree_ref(...) { (...) BUG_ON(node->ref_mod != 1); (...) } This happens on a scenario like the following: 1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added. 2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added. It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota. 3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added. It's not merged with the reference at the tail of the list of refs for bytenr X because the reference at the tail, Ref2 is incompatible due to Ref2->no_quota != Ref3->no_quota. 4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added. It's not merged with the reference at the tail of the list of refs for bytenr X because the reference at the tail, Ref3 is incompatible due to Ref3->no_quota != Ref4->no_quota. 5) We run delayed references, trigger merging of delayed references, through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs(). 6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and all other conditions are satisfied too. So Ref1 gets a ref_mod value of 2. 7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and all other conditions are satisfied too. So Ref2 gets a ref_mod value of 2. 8) Ref1 and Ref2 aren't merged, because they have different values for their no_quota field. 9) Delayed reference Ref1 is picked for running (select_delayed_ref() always prefers references with an action == BTRFS_ADD_DELAYED_REF). So run_delayed_tree_ref() is called for Ref1 which triggers the BUG_ON because Ref1->red_mod != 1 (equals 2). So fix this by removing the no_quota field, as it's not used anymore as of commit 0ed4792a ("btrfs: qgroup: Switch to new extent-oriented qgroup mechanism."). The use of no_quota was also buggy in at least two places: 1) At delayed-refs.c:btrfs_add_delayed_tree_ref() - we were setting no_quota to 0 instead of 1 when the following condition was true: is_fstree(ref_root) || !fs_info->quota_enabled 2) At extent-tree.c:__btrfs_inc_extent_ref() - we were attempting to reset a node's no_quota when the condition "!is_fstree(root_objectid) || !root->fs_info->quota_enabled" was true but we did it only in an unused local stack variable, that is, we never reset the no_quota value in the node itself. This fixes the remainder of problems several people have been having when running delayed references, mostly while a balance is running in parallel, on a 4.2+ kernel. Very special thanks to Stéphane Lesimple for helping debugging this issue and testing this fix on his multi terabyte filesystem (which took more than one day to balance alone, plus fsck, etc). Also, this fixes deadlock issue when using the clone ioctl with qgroups enabled, as reported by Elias Probst in the mailing list. The deadlock happens because after calling btrfs_insert_empty_item we have our path holding a write lock on a leaf of the fs/subvol tree and then before releasing the path we called check_ref() which did backref walking, when qgroups are enabled, and tried to read lock the same leaf. The trace for this case is the following: INFO: task systemd-nspawn:6095 blocked for more than 120 seconds. (...) Call Trace: [<ffffffff86999201>] schedule+0x74/0x83 [<ffffffff863ef64c>] btrfs_tree_read_lock+0xc0/0xea [<ffffffff86137ed7>] ? wait_woken+0x74/0x74 [<ffffffff8639f0a7>] btrfs_search_old_slot+0x51a/0x810 [<ffffffff863a129b>] btrfs_next_old_leaf+0xdf/0x3ce [<ffffffff86413a00>] ? ulist_add_merge+0x1b/0x127 [<ffffffff86411688>] __resolve_indirect_refs+0x62a/0x667 [<ffffffff863ef546>] ? btrfs_clear_lock_blocking_rw+0x78/0xbe [<ffffffff864122d3>] find_parent_nodes+0xaf3/0xfc6 [<ffffffff86412838>] __btrfs_find_all_roots+0x92/0xf0 [<ffffffff864128f2>] btrfs_find_all_roots+0x45/0x65 [<ffffffff8639a75b>] ? btrfs_get_tree_mod_seq+0x2b/0x88 [<ffffffff863e852e>] check_ref+0x64/0xc4 [<ffffffff863e9e01>] btrfs_clone+0x66e/0xb5d [<ffffffff863ea77f>] btrfs_ioctl_clone+0x48f/0x5bb [<ffffffff86048a68>] ? native_sched_clock+0x28/0x77 [<ffffffff863ed9b0>] btrfs_ioctl+0xabc/0x25cb (...) The problem goes away by eleminating check_ref(), which no longer is needed as its purpose was to get a value for the no_quota field of a delayed reference (this patch removes the no_quota field as mentioned earlier). Reported-by: NStéphane Lesimple <stephane_btrfs@lesimple.fr> Tested-by: NStéphane Lesimple <stephane_btrfs@lesimple.fr> Reported-by: NElias Probst <mail@eliasprobst.eu> Reported-by: NPeter Becker <floyd.net@gmail.com> Reported-by: NMalte Schröder <malte@tnxip.de> Reported-by: NDerek Dongray <derek@valedon.co.uk> Reported-by: NErkki Seppala <flux-btrfs@inside.org> Cc: stable@vger.kernel.org # 4.2+ Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
-
- 22 10月, 2015 2 次提交
-
-
由 Qu Wenruo 提交于
Cleanup the old facilities which use old btrfs_qgroup_reserve() function call, replace them with the newer version, and remove the "__" prefix in them. Also, make btrfs_qgroup_reserve/free() functions private, as they are now only used inside qgroup codes. Now, the whole btrfs qgroup is swithed to use the new reserve facilities. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Qu Wenruo 提交于
Use new reserve/free for buffered write and inode cache. For buffered write case, as nodatacow write won't increase quota account, so unlike old behavior which does reserve before check nocow, now we check nocow first and then only reserve data if we can't do nocow write. Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 29 9月, 2015 1 次提交
-
-
由 Anand Jain 提交于
btrfs_error() and btrfs_std_error() does the same thing and calls _btrfs_std_error(), so consolidate them together. And the main motivation is that btrfs_error() is closely named with btrfs_err(), one handles error action the other is to log the error, so don't closely name them. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Suggested-by: NDavid Sterba <dsterba@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 09 8月, 2015 4 次提交
-
-
由 Zhaolei 提交于
These arguments are not used in functions, remove them for cleanup and make kernel stack happy. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhaolei 提交于
objectid's init-value is not used in any case, remove it. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhaolei 提交于
We need error checking code for get_ref_objectid_v0() in relocate_block_group(), to avoid unpredictable result, especially for accessing uninitialized value(when function failed) after this line. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Zhaolei 提交于
More than one code call set_block_group_ro() and restore rw in fail. Old code use bool bit to save blockgroup's ro state, it can not support parallel case(it is confirmd exist in my debug log). This patch use ref count to store ro state, and rename set_block_group_ro/set_block_group_rw to inc_block_group_ro/dec_block_group_ro. Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 02 7月, 2015 1 次提交
-
-
由 Shilong Wang 提交于
btrfs_force_chunk_alloc() return 1 for allocation chunk successfully. This problem exists since commit c87f08ca. With this patch, we might fix some enospc problems for balances. Signed-off-by: NWang Shilong <wangshilong1991@gmail.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Tested-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 03 6月, 2015 1 次提交
-
-
由 Liu Bo 提交于
The return value of read_tree_block() can confuse callers as it always returns NULL for either -ENOMEM or -EIO, so it's likely that callers parse it to a wrong error, for instance, in btrfs_read_tree_root(). This fixes the above issue. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
- 13 4月, 2015 1 次提交
-
-
由 Dongsheng Yang 提交于
There are two problems in qgroup: a). The PAGE_CACHE is 4K, even when we are writing a data of 1K, qgroup will reserve a 4K size. It will cause the last 3K in a qgroup is not available to user. b). When user is writing a inline data, qgroup will not reserve it, it means this is a window we can exceed the limit of a qgroup. The main idea of this patch is reserving the data size of write_bytes rather than the reserve_bytes. It means qgroup will not care about the data size btrfs will reserve for user, but only care about the data size user is going to write. Then reserve it when user want to write and release it in transaction committed. In this way, qgroup can be released from the complex procedure in btrfs and only do the reserve when user want to write and account when the data is written in commit_transaction(). Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 11 4月, 2015 1 次提交
-
-
由 Chris Mason 提交于
We loop through all of the dirty block groups during commit and write the free space cache. In order to make sure the cache is currect, we do this while no other writers are allowed in the commit. If a large number of block groups are dirty, this can introduce long stalls during the final stages of the commit, which can block new procs trying to change the filesystem. This commit changes the block group cache writeout to take appropriate locks and allow it to run earlier in the commit. We'll still have to redo some of the block groups, but it means we can get most of the work out of the way without blocking the entire FS. Signed-off-by: NChris Mason <clm@fb.com>
-
- 13 12月, 2014 2 次提交
-
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
All callers pass nodesize. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 04 10月, 2014 2 次提交
-
-
由 Josef Bacik 提交于
Marc Merlin sent me a broken fs image months ago where it would blow up in the upper->checked BUG_ON() in build_backref_tree. This is because we had a scenario like this block a -- level 4 (not shared) | block b -- level 3 (reloc block, shared) | block c -- level 2 (not shared) | block d -- level 1 (shared) | block e -- level 0 (shared) We go to build a backref tree for block e, we notice block d is shared and add it to the list of blocks to lookup it's backrefs for. Now when we loop around we will check edges for the block, so we will see we looked up block c last time. So we lookup block d and then see that the block that points to it is block c and we can just skip that edge since we've already been up this path. The problem is because we clear need_check when we see block d (as it is shared) we never add block b as needing to be checked. And because block c is in our path already we bail out before we walk up to block b and add it to the backref check list. To fix this we need to reset need_check if we trip over a block that doesn't need to be checked. This will make sure that any subsequent blocks in the path as we're walking up afterwards are added to the list to be processed. With this patch I can now mount Marc's fs image and it'll complete the balance without panicing. Thanks, Reported-by: NMarc MERLIN <marc@merlins.org> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Josef Bacik 提交于
When balance panics it tends to panic in the BUG_ON(!upper->checked); test, because it means it couldn't build the backref tree properly. This is annoying to users and frankly a recoverable error, nothing in this function is actually fatal since it is just an in-memory building of the backrefs for a given bytenr. So go through and change all the BUG_ON()'s to ASSERT()'s, and fix the BUG_ON(!upper->checked) thing to just return an error. This patch also fixes the error handling so it tears down the work we've done properly. This code was horribly broken since we always just panic'ed instead of actually erroring out, so it needed to be completely re-worked. With this patch my broken image no longer panics when I mount it. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 02 10月, 2014 4 次提交
-
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
We know the tree block size, no need to pass it around. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
It's trivial with a single user. And remove one pointless BUG_ON. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
The parent_transid parameter has been unused since its introduction in ca7a79ad ("Pass down the expected generation number when reading tree blocks"). In reada_tree_block, it was even wrongly set to leafsize. Transid check is done in the proper read and readahead ignores errors. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 18 9月, 2014 2 次提交
-
-
由 David Sterba 提交于
The nodesize and leafsize were never of different values. Unify the usage and make nodesize the one. Cleanup the redundant checks and helpers. Shaves a few bytes from .text: text data bss dec hex filename 852418 24560 23112 900090 dbbfa btrfs.ko.before 851074 24584 23112 898770 db6d2 btrfs.ko.after Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
- 10 6月, 2014 2 次提交
-
-
由 David Sterba 提交于
I've noticed an extra line after "use no compression", but search revealed much more in messages of more critical levels and rare errors. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Miao Xie 提交于
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 07 4月, 2014 1 次提交
-
-
由 Josef Bacik 提交于
This was done to allow NO_COW to continue to be NO_COW after relocation but it is not right. When relocating we will convert blocks to FULL_BACKREF that we relocate. We can leave some of these full backref blocks behind if they are not cow'ed out during the relocation, like if we fail the relocation with ENOSPC and then just drop the reloc tree. Then when we go to cow the block again we won't lookup the extent flags because we won't think there has been a snapshot recently which means we will do our normal ref drop thing instead of adding back a tree ref and dropping the shared ref. This will cause btrfs_free_extent to blow up because it can't find the ref we are trying to free. This was found with my ref verifying tool. Thanks, Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 11 3月, 2014 1 次提交
-
-
由 Miao Xie 提交于
We needn't flush all delalloc inodes when we doesn't get s_umount lock, or we would make the tasks wait for a long time. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com>
-
- 29 1月, 2014 6 次提交
-
-
由 Wang Shilong 提交于
During balance test, we hit an oops: [ 2013.841551] kernel BUG at fs/btrfs/relocation.c:1174! The problem is that if we fail to relocate tree blocks, we should update backref cache, otherwise, some pending nodes are not updated while snapshot check @cache->last_trans is within one transaction and won't update it and then oops happen. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
Previously, we will free reloc root memory and then force filesystem to be readonly. The problem is that there may be another thread commiting transaction which will try to access freed reloc root during merging reloc roots process. To keep consistency snapshots shared space, we should allow snapshot finished if possible, so here we don't free reloc root memory. signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
@nr is no longer used, remove it from select_reloc_root() Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Frank Holton 提交于
Convert all applicable cases of printk and pr_* to the btrfs_* macros. Fix all uses of the BTRFS prefix. Signed-off-by: NFrank Holton <fholton@gmail.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
We have commited transaction before, remove redundant filemap writting and waiting here, it can speed up balance relocation process. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
We hit a forever loop when doing balance relocation,the reason is that we firstly reserve 4M(node size is 16k).and within transaction we will try to add extra reservation for snapshot roots,this will return -EAGAIN if there has been a thread flushing space to reserve space.We will do this again and again with filesystem becoming nearly full. If the above '-EAGAIN' case happens, we try to refill reservation more outsize of transaction, and this will return eariler in enospc case,however, this dosen't really hurt because it makes no sense doing balance relocation with the filesystem nearly full. Miao Xie helped a lot to track this issue, thanks. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 12 12月, 2013 3 次提交
-
-
由 Wang Shilong 提交于
I hit an oops when merging reloc roots fails, the reason is that new reloc roots may be added and we should make sure we cleanup all reloc roots. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
Quota tree and UUID Tree is only cowed, they can not be snapshoted. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Wang Shilong 提交于
I hit an oops when inserting reloc root into @reloc_root_tree(it can be easily triggered when forcing cow for relocation root) [ 866.494539] [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs] [ 866.495321] [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs] [ 866.496109] [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs] [ 866.496908] [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs] [ 866.497703] [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs] This is because reloc root inserted into @reloc_root_tree is not within one transaction,reloc root may be cowed and root block bytenr will be reused then oops happens.We should update reloc root in @reloc_root_tree when cow reloc root node, fix it. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 12 11月, 2013 4 次提交
-
-
由 Miao Xie 提交于
rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Dulshani Gunawardhana 提交于
Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source code that outputs a more descriptive warnings. Also fix the styling warning of redundant braces that came up as a result of this fix. Signed-off-by: NDulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: NZach Brown <zab@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Wang Shilong 提交于
We define a 'int' to get extent's generation by mistake,fix it. Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-