- 07 10月, 2020 5 次提交
-
-
由 Josef Bacik 提交于
When we COW a block we are holding a lock on the original block, and then we lock the new COW block. Because our lockdep maps are based on root + level, this will make lockdep complain. We need a way to indicate a subclass for locking the COW'ed block, so plumb through our btrfs_lock_nesting from btrfs_cow_block down to the btrfs_init_buffer, and then introduce BTRFS_NESTING_COW to be used for cow'ing blocks. The reason I've added all this extra infrastructure is because there will be need of different nesting classes in follow up patches. Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
[BUG] With a crafted image, btrfs can panic at insert_inline_extent_backref(): kernel BUG at fs/btrfs/extent-tree.c:1857! invalid opcode: 0000 [#1] SMP PTI CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9 RIP: 0010:insert_inline_extent_backref+0xcc/0xe0 RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001 RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0 Call Trace: ? _cond_resched+0x1a/0x50 __btrfs_inc_extent_ref.isra.64+0x7e/0x240 ? btrfs_merge_delayed_refs+0xa5/0x330 __btrfs_run_delayed_refs+0x653/0x1120 btrfs_run_delayed_refs+0xdb/0x1b0 btrfs_commit_transaction+0x52/0x950 ? start_transaction+0x94/0x450 transaction_kthread+0x163/0x190 kthread+0x105/0x140 ? btrfs_cleanup_transaction+0x560/0x560 ? kthread_destroy_worker+0x50/0x50 ret_from_fork+0x35/0x40 Modules linked in: ---[ end trace 2ad8b3de903cf825 ]--- [CAUSE] Due to extent tree corruption (still valid by itself, but bad cross ref), we can allocate an extent which is still in extent tree. The offending tree block of that case is from csum tree. The newly allocated tree block is also for csum tree. Then we will try to insert a tree block ref for the existing tree block ref. For a tree extent item, tree block can never be shared directly by the same tree twice. We have such BUG_ON() to prevent such problem, but this is not a proper error handling. [FIX] Replace that BUG_ON() with proper error message and leaf dump for debug build. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202829Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
__btrfs_free_extent() is doing two things: 1. Reduce the refs number of an extent backref Either it's an inline extent backref (inside EXTENT/METADATA item) or a keyed extent backref (SHARED_* item). We only need to locate that backref line, either reduce the number or remove the backref line completely. 2. Update the refs count in EXTENT/METADATA_ITEM During step 1), we will try to locate the EXTENT/METADATA_ITEM without triggering another btrfs_search_slot() as fast path. Only when we fail to locate that item, we will trigger another btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we updated/deleted the backref line. And we have a lot of strict checks on things like refs_to_drop against extent refs and special case checks for single ref extents. There are 7 BUG_ON()s, although they're doing correct checks, they can be triggered by crafted images. This patch improves the function: - Introduce two examples to show what __btrfs_free_extent() is doing One inline backref case and one keyed case. Should cover most cases. - Kill all BUG_ON()s with proper error message and optional leaf dump - Add comment to show the overall flow Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202819 [ The report triggers one BUG_ON() in __btrfs_free_extent() ] Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
When unpinning we were only calling btrfs_try_granting_tickets() if global_rsv->space_info == space_info, which is problematic because we use ticketing for SYSTEM chunks, and want to use it for DATA as well. Fix this by moving this call outside of that if statement. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The current trace event always output result like this: find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA) find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA) find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA) find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA) T's saying we're allocating data extent for EXTENT tree, which is not even possible. It's because we always use EXTENT tree as the owner for trace_find_free_extent() without using the @root from btrfs_reserve_extent(). This patch will change the parameter to use proper @root for trace_find_free_extent(): Now it looks much better: find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=4096 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA) find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=7(CSUM_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) find_free_extent: root=1(ROOT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP) Reported-by: NHans van Kranenburg <hans@knorrie.org> CC: stable@vger.kernel.org # 5.4+ Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 07 9月, 2020 1 次提交
-
-
由 Qu Wenruo 提交于
[BUG] A completely sane converted fs will cause kernel warning at balance time: [ 1557.188633] BTRFS info (device sda7): relocating block group 8162107392 flags data [ 1563.358078] BTRFS info (device sda7): found 11722 extents [ 1563.358277] BTRFS info (device sda7): leaf 7989321728 gen 95 total ptrs 213 free space 3458 owner 2 [ 1563.358280] item 0 key (7984947200 169 0) itemoff 16250 itemsize 33 [ 1563.358281] extent refs 1 gen 90 flags 2 [ 1563.358282] ref#0: tree block backref root 4 [ 1563.358285] item 1 key (7985602560 169 0) itemoff 16217 itemsize 33 [ 1563.358286] extent refs 1 gen 93 flags 258 [ 1563.358287] ref#0: shared block backref parent 7985602560 [ 1563.358288] (parent 7985602560 is NOT ALIGNED to nodesize 16384) [ 1563.358290] item 2 key (7985635328 169 0) itemoff 16184 itemsize 33 ... [ 1563.358995] BTRFS error (device sda7): eb 7989321728 invalid extent inline ref type 182 [ 1563.358996] ------------[ cut here ]------------ [ 1563.359005] WARNING: CPU: 14 PID: 2930 at 0xffffffff9f231766 Then with transaction abort, and obviously failed to balance the fs. [CAUSE] That mentioned inline ref type 182 is completely sane, it's BTRFS_SHARED_BLOCK_REF_KEY, it's some extra check making kernel to believe it's invalid. Commit 64ecdb64 ("Btrfs: add one more sanity check for shared ref type") introduced extra checks for backref type. One of the requirement is, parent bytenr must be aligned to node size, which is not correct. One example is like this: 0 1G 1G+4K 2G 2G+4K | |///////////////////|//| <- A chunk starts at 1G+4K | | <- A tree block get reserved at bytenr 1G+4K Then we have a valid tree block at bytenr 1G+4K, but not aligned to nodesize (16K). Such chunk is not ideal, but current kernel can handle it pretty well. We may warn about such tree block in the future, but should not reject them. [FIX] Change the alignment requirement from node size alignment to sector size alignment. Also, to make our lives a little easier, also output @iref when btrfs_get_extent_inline_ref_type() failed, so we can locate the item easier. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205475 Fixes: 64ecdb64 ("Btrfs: add one more sanity check for shared ref type") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NQu Wenruo <wqu@suse.com> [ update comments and messages ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 27 8月, 2020 1 次提交
-
-
由 Josef Bacik 提交于
When flipping over to the rw_semaphore I noticed I'd get a lockdep splat in replace_path(), which is weird because we're swapping the reloc root with the actual target root. Turns out this is because we're using the root->root_key.objectid as the root id for the newly allocated tree block when setting the lockdep class, however we need to be using the actual owner of this new block, which is saved in owner. The affected path is through btrfs_copy_root as all other callers of btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid == root->root_key.objectid . CC: stable@vger.kernel.org # 5.4+ Reviewed-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 21 8月, 2020 1 次提交
-
-
由 Boris Burkov 提交于
can_nocow_extent and btrfs_cross_ref_exist both rely on a heuristic for detecting a must cow condition which is not exactly accurate, but saves unnecessary tree traversal. The incorrect assumption is that if the extent was created in a generation smaller than the last snapshot generation, it must be referenced by that snapshot. That is true, except the snapshot could have since been deleted, without affecting the last snapshot generation. The original patch claimed a performance win from this check, but it also leads to a bug where you are unable to use a swapfile if you ever snapshotted the subvolume it's in. Make the check slower and more strict for the swapon case, without modifying the general cow checks as a compromise. Turning swap on does not seem to be a particularly performance sensitive operation, so incurring a possibly unnecessary btrfs_search_slot seems worthwhile for the added usability. Note: Until the snapshot is competely cleaned after deletion, check_committed_refs will still cause the logic to think that cow is necessary, so the user must until 'btrfs subvolu sync' finished before activating the swapfile swapon. CC: stable@vger.kernel.org # 5.4+ Suggested-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NBoris Burkov <boris@bur.io> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 12 8月, 2020 1 次提交
-
-
由 Qu Wenruo 提交于
[BUG] The following script can lead to tons of beyond device boundary access: mkfs.btrfs -f $dev -b 10G mount $dev $mnt trimfs $mnt btrfs filesystem resize 1:-1G $mnt trimfs $mnt [CAUSE] Since commit 929be17a ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit"), we try to avoid trimming ranges that's already trimmed. So we check device->alloc_state by finding the first range which doesn't have CHUNK_TRIMMED and CHUNK_ALLOCATED not set. But if we shrunk the device, that bits are not cleared, thus we could easily got a range starts beyond the shrunk device size. This results the returned @start and @end are all beyond device size, then we call "end = min(end, device->total_bytes -1);" making @end smaller than device size. Then finally we goes "len = end - start + 1", totally underflow the result, and lead to the beyond-device-boundary access. [FIX] This patch will fix the problem in two ways: - Clear CHUNK_TRIMMED | CHUNK_ALLOCATED bits when shrinking device This is the root fix - Add extra safety check when trimming free device extents We check and warn if the returned range is already beyond current device. Link: https://github.com/kdave/btrfs-progs/issues/282 Fixes: 929be17a ("btrfs: Switch btrfs_trim_free_extents to find_first_clear_extent_bit") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 27 7月, 2020 2 次提交
-
-
由 Qu Wenruo 提交于
[BUG] There is a bug report about bad signal timing could lead to read-only fs during balance: BTRFS info (device xvdb): balance: start -d -m -s BTRFS info (device xvdb): relocating block group 73001861120 flags metadata BTRFS info (device xvdb): found 12236 extents, stage: move data extents BTRFS info (device xvdb): relocating block group 71928119296 flags data BTRFS info (device xvdb): found 3 extents, stage: move data extents BTRFS info (device xvdb): found 3 extents, stage: update data pointers BTRFS info (device xvdb): relocating block group 60922265600 flags metadata BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4 unknown BTRFS info (device xvdb): forced readonly BTRFS info (device xvdb): balance: ended with status: -4 [CAUSE] The direct cause is the -EINTR from the following call chain when a fatal signal is pending: relocate_block_group() |- clean_dirty_subvols() |- btrfs_drop_snapshot() |- btrfs_start_transaction() |- btrfs_delayed_refs_rsv_refill() |- btrfs_reserve_metadata_bytes() |- __reserve_metadata_bytes() |- wait_reserve_ticket() |- prepare_to_wait_event(); |- ticket->error = -EINTR; Normally this behavior is fine for most btrfs_start_transaction() callers, as they need to catch any other error, same for the signal, and exit ASAP. However for balance, especially for the clean_dirty_subvols() case, we're already doing cleanup works, getting -EINTR from btrfs_drop_snapshot() could cause a lot of unexpected problems. From the mentioned forced read-only report, to later balance error due to half dropped reloc trees. [FIX] Fix this problem by using btrfs_join_transaction() if btrfs_drop_snapshot() is called from relocation context. Since btrfs_join_transaction() won't get interrupted by signal, we can continue the cleanup. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>3 Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
[BUG] Sometime fsstress could lead to qgroup warning for case like generic/013: BTRFS warning (device dm-3): qgroup 0/259 has unreleased space, type 1 rsv 81920 ------------[ cut here ]------------ WARNING: CPU: 9 PID: 24535 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:close_ctree+0x1dc/0x323 [btrfs] Call Trace: btrfs_put_super+0x15/0x17 [btrfs] generic_shutdown_super+0x72/0x110 kill_anon_super+0x18/0x30 btrfs_kill_super+0x17/0x30 [btrfs] deactivate_locked_super+0x3b/0xa0 deactivate_super+0x40/0x50 cleanup_mnt+0x135/0x190 __cleanup_mnt+0x12/0x20 task_work_run+0x64/0xb0 __prepare_exit_to_usermode+0x1bc/0x1c0 __syscall_return_slowpath+0x47/0x230 do_syscall_64+0x64/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 6c341cdf9b6cc3c1 ]--- BTRFS error (device dm-3): qgroup reserved space leaked While that subvolume 259 is no longer in that filesystem. [CAUSE] Normally per-trans qgroup reserved space is freed when a transaction is committed, in commit_fs_roots(). However for completely dropped subvolume, that subvolume is completely gone, thus is no longer in the fs_roots_radix, and its per-trans reserved qgroup will never be freed. Since the subvolume is already gone, leaked per-trans space won't cause any trouble for end users. [FIX] Just call btrfs_qgroup_free_meta_all_pertrans() before a subvolume is completely dropped. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 25 5月, 2020 5 次提交
-
-
由 Qu Wenruo 提交于
The name BTRFS_ROOT_REF_COWS is not very clear about the meaning. In fact, that bit can only be set to those trees: - Subvolume roots - Data reloc root - Reloc roots for above roots All other trees won't get this bit set. So just by the result, it is obvious that, roots with this bit set can have tree blocks shared with other trees. Either shared by snapshots, or by reloc roots (an special snapshot created by relocation). This patch will rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE to make it easier to understand, and update all comment mentioning "reference counted" to follow the rename. Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Filipe Manana 提交于
Back in 2014, commit 04216820 ("Btrfs: fix race between fs trimming and block group remove/allocation"), I added the 'trimming' member to the block group structure. Its purpose was to prevent races between trimming and block group deletion/allocation by pinning the block group in a way that prevents its logical address and device extents from being reused while trimming is in progress for a block group, so that if another task deletes the block group and then another task allocates a new block group that gets the same logical address and device extents while the trimming task is still in progress. After the previous fix for scrub (patch "btrfs: fix a race between scrub and block group removal/allocation"), scrub now also has the same needs that trimming has, so the member name 'trimming' no longer makes sense. Since there is already a 'pinned' member in the block group that refers to space reservations (pinned bytes), rename the member to 'frozen', add a comment on top of it to describe its general purpose and rename the helpers to increment and decrement the counter as well, to match the new member name. The next patch in the series will move the helpers into a more suitable file (from free-space-cache.c to block-group.c). Signed-off-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 YueHaibing 提交于
There's no callers in-tree anymore since commit 64403612 ("btrfs: rework btrfs_check_space_for_delayed_refs") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Deleting a subvolume on a full filesystem leads to ENOSPC followed by a forced read-only. This is not a transaction abort and the filesystem is otherwise ok, so the error should be just propagated to the callers. This is caused by unnecessary call to btrfs_handle_fs_error for all errors, except EAGAIN. This does not make sense as the standard transaction abort mechanism is in btrfs_drop_snapshot so all relevant failures are handled. Originally in commit cb1b69f4 ("Btrfs: forced readonly when btrfs_drop_snapshot() fails") there was no return value at all, so the btrfs_std_error made some sense but once the error handling and propagation has been implemented we don't need it anymore. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jules Irenge 提交于
Sparse reports a warning at btrfs_lock_cluster() warning: context imbalance in btrfs_lock_cluster() - wrong count The root cause is the missing annotation at btrfs_lock_cluster() Add the missing __acquires(&cluster->refill_lock) annotation. Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 24 3月, 2020 24 次提交
-
-
由 Josef Bacik 提交于
Readahead will generate a lot of extra reads for adjacent nodes, but when running delayed refs we have no idea if the next ref is going to be adjacent or not, so this potentially just generates a lot of extra IO. To make matters worse each ref is truly just looking for one item, it doesn't generally search forward, so we simply don't need it here. Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
There are a few different ways to free roots, either you allocated them yourself and you just do free_extent_buffer(root->node); free_extent_buffer(root->commit_node); btrfs_put_root(root); Which is the pattern for log roots. Or for snapshots/subvolumes that are being dropped you simply call btrfs_free_fs_root() which does all the cleanup for you. Unify this all into btrfs_put_root(), so that we don't free up things associated with the root until the last reference is dropped. This makes the root freeing code much more significant. The only caveat is at close_ctree() time we have to free the extent buffers for all of our main roots (extent_root, chunk_root, etc) because we have to drop the btree_inode and we'll run into issues if we hold onto those nodes until ->kill_sb() time. This will be addressed in the future when we kill the btree_inode. Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
It's no longer used following 30d40577 ("btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON()"), so just remove it. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
No need to add a level of indirection for hiding a simple 'if'. Open code insert_extent_backref in its sole caller. No functional changes. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
This function finally factor out prepare_allocation() form find_free_extent(). This function is called before the allocation loop and a specific allocator function like prepare_allocation_clustered() should initialize their private information and can set proper hint_byte to indicate where to start the allocation with. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
LOOP_NO_EMPTY_SIZE is solely dedicated for clustered allocation. So, we can skip this stage and give up the allocation. Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Factor out chunk_allocation_failed() from find_free_extent_update_loop(). This function is called when it failed to allocate a chunk. The function can modify "ffe_ctl->loop" and return 0 to continue with the next stage. Or, it can return -ENOSPC to give up here. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Now that, we don't use last_ptr and use_cluster in the function. Drop these arguments from it. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Factor out found_extent() from find_free_extent_update_loop(). This function is called when a proper extent is found and before returning from find_free_extent(). Hook functions like found_extent_clustered() should save information for a next allocation. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Factor out release_block_group() from find_free_extent(). This function is called when it gives up an allocation from a block group. Each allocation policy should reset its information for an allocation in the next block group. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Now that, find_free_extent_clustered() and find_free_extent_unclustered() can access "last_ptr" from the "clustered" variable, we can drop it from the arguments. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Factor out do_allocation() from find_free_extent(). This function do an actual extent allocation in a given block group. The ffe_ctl->policy is used to determine the actual allocator function to use. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Move "last_ptr" and "use_cluster" into struct find_free_extent_ctl, so that hook functions for clustered allocator can use these variables. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
This commit moves hint_byte into find_free_extent_ctl, so that we can modify the hint_byte in the other functions. This will help us split find_free_extent further. This commit also renames the function argument "hint_byte" to "hint_byte_orig" to avoid misuse. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
This commit introduces extent allocation policy for btrfs. This policy controls how btrfs allocate an extents from block groups. There is no functional change introduced with this commit. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
While the "full_search" variable defined in find_free_extent() is bool, but the full_search argument of find_free_extent_update_loop() is defined as int. Let's trivially fix the argument type. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
All callers pass extent buffer start and length so the extent buffer itself should work fine. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This patch removes all haphazard code implementing nocow writers exclusion from pending snapshot creation and switches to using the drew lock to ensure this invariant still holds. 'Readers' are snapshot creators from create_snapshot and 'writers' are nocow writers from buffered write path or btrfs_setsize. This locking scheme allows for multiple snapshots to happen while any nocow writers are blocked, since writes to page cache in the nocow path will make snapshots inconsistent. So for performance reasons we'd like to have the ability to run multiple concurrent snapshots and also favors readers in this case. And in case there aren't pending snapshots (which will be the majority of the cases) we rely on the percpu's writers counter to avoid cacheline contention. The main gain from using the drew lock is it's now a lot easier to reason about the guarantees of the locking scheme and whether there is some silent breakage lurking. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Josef Bacik 提交于
If we're allocating a logged extent we attempt to insert an extent record for the file extent directly. We increase space_info->bytes_reserved, because the extent entry addition will call btrfs_update_block_group(), which will convert the ->bytes_reserved to ->bytes_used. However if we fail at any point while inserting the extent entry we will bail and leave space on ->bytes_reserved, which will trigger a WARN_ON() on umount. Fix this by pinning the space if we fail to insert, which is what happens in every other failure case that involves adding the extent entry. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
This commit flips the switch to start tracking/processing pinned extents on a per-transaction basis. It mostly replaces all references from btrfs_fs_info::(pinned_extents|freed_extents[]) to btrfs_transaction::pinned_extents. Two notable modifications that warrant explicit mention are changing clean_pinned_extents to get a reference to the previously running transaction. The other one is removal of call to btrfs_destroy_pinned_extent since transactions are going to be cleaned in btrfs_cleanup_one_transaction. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
In preparation to making pinned extents per-transaction ensure that log such extents are always excluded from caching. To achieve this in addition to marking them via btrfs_pin_extent_for_log_replay they also need to be marked with btrfs_add_excluded_extent to prevent log tree extent buffer being loaded by the free space caching thread. That's required since log tree blocks are not recorded in the extent tree, hence they always look free. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
All callers have a reference to a transaction handle so pass it to pin_down_extent. This is the final step before switching pinned extent tracking to a per-transaction basis. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Preparation for refactoring pinned extents tracking. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
btrfs_pin_reserved_extent is now only called with a valid transaction so exploit the fact to take a transaction. This is preparation for tracking pinned extents on a per-transaction basis. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-