- 14 November 2018, 5 commits
-
Committed by Jeff Mahoney

commit d4e329de upstream.

btrfs_trim_fs iterates over the fs_devices->alloc_list while holding the device_list_mutex. The problem is that ->alloc_list is protected by the chunk mutex, and we don't want to hold the chunk mutex over the trim of the entire file system. Fortunately, the ->dev_list list is protected by the dev_list mutex, and while it yields all devices, including read-only ones, we already skip the read-only devices. We can then continue to take and release the chunk mutex while scanning each device.

Fixes: 499f377f ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
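A minimal sketch of the iteration pattern the fix describes — walking ->dev_list under the device_list_mutex and skipping read-only devices — with the loop body condensed; the read-only test and helper signature follow 4.x-era sources and should be treated as assumptions:

```c
struct btrfs_device *device;
u64 group_trimmed, trimmed = 0;
int ret = 0;

/* ->dev_list is protected by device_list_mutex, so the chunk mutex is
 * no longer held across the whole trim; btrfs_trim_free_extents() can
 * take and release it per device internally. */
mutex_lock(&fs_devices->device_list_mutex);
list_for_each_entry(device, &fs_devices->dev_list, dev_list) {
	/* read-only devices have nothing to trim, as before */
	if (!device->writeable)	/* field name varies by kernel version */
		continue;
	ret = btrfs_trim_free_extents(device, range->minlen, &group_trimmed);
	if (ret)
		break;
	trimmed += group_trimmed;
}
mutex_unlock(&fs_devices->device_list_mutex);
```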
-
Committed by Qu Wenruo

commit 6ba9fc8e upstream.

[BUG]
fstrim on some btrfs only trims the unallocated space, without trimming any space in existing block groups.

[CAUSE]
Before the fstrim_range is passed to btrfs_trim_fs(), it gets truncated to the range [0, super->total_bytes), so btrfs_trim_fs() is only able to trim block groups in the range [0, super->total_bytes).

For btrfs, however, any bytenr aligned to sectorsize is valid: since btrfs uses its own logical address space, there is nothing limiting where block groups are placed. For a filesystem with frequent balance, it's quite easy for all block groups to be relocated so that their bytenrs start beyond super->total_bytes. In that case, btrfs will not trim existing block groups.

[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs() gets the unmodified range, which is normally set to [0, U64_MAX].

Reported-by: Chris Murphy <lists@colorremedies.com>
Fixes: f4c697e6 ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
CC: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
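A condensed sketch of the ioctl after the fix; the removed clamping is shown as a comment, and the discard-granularity handling of the real function is omitted, so treat the exact shape as illustrative:

```c
static long btrfs_ioctl_fitrim(struct file *file, void __user *arg)
{
	struct btrfs_fs_info *fs_info = btrfs_sb(file_inode(file)->i_sb);
	struct fstrim_range range;
	int ret;

	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	if (copy_from_user(&range, arg, sizeof(range)))
		return -EFAULT;

	/*
	 * Removed: truncating the range to [0, super->total_bytes).
	 * Block groups may live beyond total_bytes in btrfs's logical
	 * address space, so the caller's range (normally [0, U64_MAX])
	 * is now passed through unmodified.
	 */
	ret = btrfs_trim_fs(fs_info, &range);
	if (ret < 0)
		return ret;

	if (copy_to_user(arg, &range, sizeof(range)))
		return -EFAULT;
	return 0;
}
```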
-
Committed by Qu Wenruo

commit 93bba24d upstream.

Function btrfs_trim_fs() doesn't handle errors in a consistent way. If an error happens while trimming existing block groups, it skips the remaining block groups and continues to trim unallocated space for each device, and the return value only reflects the final error from device trimming.

This patch fixes that behavior by:

1) Recording the last error from block group or device trimming, so the return value also reflects the last error seen during trimming. This makes developers more aware of the problem.

2) Continuing to trim if possible. If we fail to trim one block group or device, we can still try the next one.

3) Reporting the number of failures during block group and device trimming. This is less noisy but still gives the user a brief summary of what went wrong.

Such behavior avoids confusion in cases like a failure to trim the first block group, after which only unallocated space gets trimmed.

Reported-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add bg_ret and dev_ret to the messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
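A sketch of the resulting control flow in btrfs_trim_fs(): failures are counted and remembered but do not stop the scan. Helper names follow the 4.19-era sources and the per-group range computation is condensed, so treat the details as illustrative:

```c
u64 bg_failed = 0, dev_failed = 0, group_trimmed;
int bg_ret = 0, dev_ret = 0, ret;
struct btrfs_block_group_cache *cache;

cache = btrfs_lookup_first_block_group(fs_info, range->start);
for (; cache; cache = next_block_group(fs_info, cache)) {
	/* start/end clamping against cache->key omitted for brevity */
	ret = btrfs_trim_block_group(cache, &group_trimmed,
				     start, end, range->minlen);
	if (ret) {
		bg_failed++;
		bg_ret = ret;	/* remember the last error, keep going */
	}
}
if (bg_failed)
	btrfs_warn(fs_info,
		   "failed to trim %llu block group(s), last error %d",
		   bg_failed, bg_ret);

/* the device loop accumulates dev_failed/dev_ret the same way */

return bg_ret ? bg_ret : dev_ret;
```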
-
Committed by Qu Wenruo

commit b72c3aba upstream.

[BUG]
For a certain crafted image whose csum root leaf has a missing backref, trying to trigger a write with data csum can cause a deadlock with the following kernel WARN_ON():

    WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
    CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
    Workqueue: btrfs-endio-write btrfs_endio_write_helper
    RIP: 0010:btrfs_tree_lock+0x3e2/0x400
    Call Trace:
     btrfs_alloc_tree_block+0x39f/0x770
     __btrfs_cow_block+0x285/0x9e0
     btrfs_cow_block+0x191/0x2e0
     btrfs_search_slot+0x492/0x1160
     btrfs_lookup_csum+0xec/0x280
     btrfs_csum_file_blocks+0x2be/0xa60
     add_pending_csums+0xaf/0xf0
     btrfs_finish_ordered_io+0x74b/0xc90
     finish_ordered_fn+0x15/0x20
     normal_work_helper+0xf6/0x500
     btrfs_endio_write_helper+0x12/0x20
     process_one_work+0x302/0x770
     worker_thread+0x81/0x6d0
     kthread+0x180/0x1d0
     ret_from_fork+0x35/0x40

[CAUSE]
That crafted image has a missing backref for the csum tree root leaf. When we try to allocate a new tree block, since there is no EXTENT/METADATA_ITEM for the csum tree root, btrfs considers it a free slot and uses it. The extent tree of the image looks like:

    Normal image                      | This fuzzed image
    ----------------------------------+--------------------------------
    BG 29360128                       | BG 29360128
     One empty slot                   |  One empty slot
    29364224: backref to UUID tree    | 29364224: backref to UUID tree
     Two empty slots                  |  Two empty slots
    29376512: backref to CSUM tree    |  One empty slot (bad type) <<<
    29380608: backref to D_RELOC tree | 29380608: backref to D_RELOC tree
    ...                               | ...

Since bytenr 29376512 has no METADATA/EXTENT_ITEM, it is a valid slot for btrfs when it tries to allocate a tree block.

For finish_ordered_write, when we need to insert a csum, we try to CoW the csum tree root. By accident, the empty slots at bytenr BG_OFFSET, BG_OFFSET + 8K, and BG_OFFSET + 12K are already used by tree block CoW for other trees; the next empty slot is BG_OFFSET + 16K, which should be the backref for the CSUM tree. But due to the bad type, btrfs can't recognize it and still considers it an empty slot, and will try to use it for the csum tree CoW.

Then, in the following call trace, we try to lock the new tree block, which turns out to be the old csum tree root that we already locked ourselves:

    btrfs_search_slot() called on csum tree root, which is at 29376512
    |- btrfs_cow_block()
       |- btrfs_set_lock_block()
       |  |- Now locks tree block 29376512 (old csum tree root)
       |- __btrfs_cow_block()
          |- btrfs_alloc_tree_block()
             |- btrfs_reserve_extent()
             |  Now it returns tree block 29376512, which the extent tree
             |  shows as an empty slot, but it's already held by the csum tree
             |- btrfs_init_new_buffer()
                |- btrfs_tree_lock()
                |  Triggers WARN_ON(eb->lock_owner == current->pid)
                |- wait_event()
                   Waits for the lock owner to release the lock, but it's
                   locked by ourselves, so it deadlocks

[FIX]
This patch adds a lock_owner vs. current->pid check in btrfs_init_new_buffer(), so the above deadlock can be avoided. Since such a problem can only happen with a crafted image, we will still trigger a kernel warning for the later aborted transaction, but with a slightly more meaningful warning message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
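A condensed sketch of the guard added in btrfs_init_new_buffer(); the error message text and the surrounding initialization are simplified from the upstream fix:

```c
buf = btrfs_find_create_tree_block(fs_info, bytenr);
if (IS_ERR(buf))
	return buf;

/*
 * Extent-tree corruption can hand us a bytenr we have already locked
 * ourselves; calling btrfs_tree_lock() on it would self-deadlock.
 * Fail gracefully and let the transaction abort with a clearer message.
 */
if (buf->lock_owner == current->pid) {
	btrfs_err_rl(fs_info,
"tree block %llu owned by locking thread, possible corruption", bytenr);
	free_extent_buffer(buf);
	return ERR_PTR(-EUCLEAN);
}

btrfs_tree_lock(buf);
```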
-
Committed by Qu Wenruo

commit 65c6e82b upstream.

[BUG]
When mounting a certain crafted image, btrfs triggers a kernel BUG_ON() while trying to recover balance:

    kernel BUG at fs/btrfs/extent-tree.c:8956!
    invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 662 Comm: mount Not tainted 4.18.0-rc1-custom+ #10
    RIP: 0010:walk_up_proc+0x336/0x480 [btrfs]
    RSP: 0018:ffffb53540c9b890 EFLAGS: 00010202
    Call Trace:
     walk_up_tree+0x172/0x1f0 [btrfs]
     btrfs_drop_snapshot+0x3a4/0x830 [btrfs]
     merge_reloc_roots+0xe1/0x1d0 [btrfs]
     btrfs_recover_relocation+0x3ea/0x420 [btrfs]
     open_ctree+0x1af3/0x1dd0 [btrfs]
     btrfs_mount_root+0x66b/0x740 [btrfs]
     mount_fs+0x3b/0x16a
     vfs_kern_mount.part.9+0x54/0x140
     btrfs_mount+0x16d/0x890 [btrfs]
     mount_fs+0x3b/0x16a
     vfs_kern_mount.part.9+0x54/0x140
     do_mount+0x1fd/0xda0
     ksys_mount+0xba/0xd0
     __x64_sys_mount+0x21/0x30
     do_syscall_64+0x60/0x210
     entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
Extent tree corruption. In this particular case, the reloc tree root's owner is DATA_RELOC_TREE (it should be TREE_RELOC), so its backref is corrupted and we fail the owner check in walk_up_tree().

[FIX]
It's pretty hard to take care of every possible extent tree corruption, but at least we can remove such BUG_ON()s and exit more gracefully.

And since in this particular image DATA_RELOC_TREE and TREE_RELOC share the same root (which is obviously invalid), we also need to make __del_reloc_root() more robust so it detects such invalid sharing, to avoid a possible NULL dereference, as root->node can be NULL in this case.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200411
Reported-by: Xu Wen <wen.xu@gatech.edu>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
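A sketch of the direction of the fix: the owner check in walk_up_proc() turns from a BUG_ON() into an error return. The message text and the `ret`/`out` context are assumptions based on the description above, not the verbatim diff:

```c
/* walk_up_proc(): a corrupted backref used to hit BUG_ON() here */
if (unlikely(btrfs_header_owner(eb) != root->root_key.objectid)) {
	btrfs_err_rl(fs_info,
		     "unexpected tree owner, have %llu expect %llu",
		     btrfs_header_owner(eb), root->root_key.objectid);
	ret = -EUCLEAN;	/* propagate instead of crashing the mount */
	goto out;
}
/* __del_reloc_root() additionally checks root->node for NULL before
 * dereferencing it, to survive two reloc roots sharing one root node. */
```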
-
- 23 August 2018, 1 commit
-
Committed by Lu Fengqi

After btrfs_qgroup_reserve_meta_prealloc(), num_bytes is assigned again by btrfs_calc_trans_metadata_size(). Once block_rsv fails, we can't properly free the num_bytes of the earlier qgroup reservation. Use a separate variable to store the num_bytes of the qgroup reservation.

Delete the comment for the qgroup_reserved parameter that no longer exists and add a comment about use_global_rsv.

Fixes: c4c129db ("btrfs: drop unused parameter qgroup_reserved")
CC: stable@vger.kernel.org # 4.18+
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
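A condensed sketch of the bug and the fix: the qgroup reservation size is kept in its own variable so the error path frees exactly what was reserved. The sizing expression and surrounding context are illustrative, not the verbatim function:

```c
u64 qgroup_num_bytes = 0;
u64 num_bytes;
int ret;

if (test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) {
	/* illustrative sizing; kept in its own variable on purpose */
	qgroup_num_bytes = 3 * fs_info->nodesize;
	ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_num_bytes, true);
	if (ret)
		return ret;
}

/* this used to clobber the value we'd need to free on failure */
num_bytes = btrfs_calc_trans_metadata_size(fs_info, items);
ret = btrfs_block_rsv_add(root, rsv, num_bytes, BTRFS_RESERVE_FLUSH_ALL);
if (ret && qgroup_num_bytes)
	btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
```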
-
- 06 August 2018, 34 commits
-
Committed by Qu Wenruo

If a crafted image has missing block group items, it could cause unexpected behavior and break the assumption of a 1:1 chunk<->block group mapping.

Although we have the block group -> chunk mapping check, we still need the chunk -> block group mapping check. This patch adds an extra check to ensure each chunk has its corresponding block group.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
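A sketch of a mount-time walk in the chunk -> block group direction: every chunk mapping must be matched by a block group with the same start, length, and type. Names follow 4.19-era sources and the loop is condensed, so treat the details as assumptions:

```c
struct extent_map_tree *map_tree = &fs_info->mapping_tree.map_tree;
u64 start = 0;
int ret = 0;

while (1) {
	struct extent_map *em;
	struct btrfs_block_group_cache *bg;

	read_lock(&map_tree->lock);
	em = lookup_extent_mapping(map_tree, start, (u64)-1 - start);
	read_unlock(&map_tree->lock);
	if (!em)
		break;

	bg = btrfs_lookup_block_group(fs_info, em->start);
	if (!bg ||
	    bg->key.objectid != em->start ||
	    bg->key.offset != em->len ||
	    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
	    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK))
		ret = -EUCLEAN;	/* chunk without a matching block group */

	start = em->start + em->len;
	free_extent_map(em);
	if (bg)
		btrfs_put_block_group(bg);
	if (ret)
		break;
}
```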
-
Committed by Qu Wenruo

A crafted btrfs image with an incorrect chunk<->block group mapping will trigger a lot of unexpected behavior, as the mapping is essential.

Although the problem can be caught by the block group item checker added in "btrfs: tree-checker: Verify block_group_item", that is still not sufficient: a superficially valid block group item can pass the check added by the mentioned patch but still fail to match the existing chunk.

This patch adds an extra block group -> chunk mapping check, to ensure we have a completely matching (start, len, flags) chunk for each block group at mount time. Here we reuse the original helper find_first_block_group(), which already does the basic bg -> chunk checks, adding further checks of the start/len and type flags.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
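The reverse direction of the previous sketch, as it might look inside find_first_block_group(): the block group item found must agree with its chunk mapping field by field. Variable names like `bg_flags` are assumptions; the structure is condensed:

```c
struct extent_map_tree *map_tree = &fs_info->mapping_tree.map_tree;
struct extent_map *em;

read_lock(&map_tree->lock);
em = lookup_extent_mapping(map_tree, found_key.objectid,
			   found_key.offset);
read_unlock(&map_tree->lock);

if (!em || em->start != found_key.objectid ||
    em->len != found_key.offset) {
	ret = -EUCLEAN;	/* block group without a matching chunk */
} else if ((bg_flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
	   (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
	ret = -EUCLEAN;	/* type flags disagree with the chunk */
}
if (em)
	free_extent_map(em);
```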
-
Committed by Misono Tomohiro

There is no user of this function anymore. It was forgotten to be removed in commit a575ceeb ("Btrfs: get rid of unused orphan infrastructure").

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Lu Fengqi

It can be referenced from the passed transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
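This is the first of many commits in this section with the same shape. A sketch of the recurring pattern (btrfs_foo is a hypothetical placeholder, not a real function):

```c
/* before: callers had to pass fs_info alongside the handle */
int btrfs_foo(struct btrfs_trans_handle *trans,
	      struct btrfs_fs_info *fs_info, u64 bytenr);

/* after: derive it from the transaction handle, which always has it */
int btrfs_foo(struct btrfs_trans_handle *trans, u64 bytenr)
{
	struct btrfs_fs_info *fs_info = trans->fs_info;

	/* ...body unchanged... */
	return 0;
}
```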
-
Committed by David Sterba

Leftover after commit e339a6b0 ("Btrfs: __btrfs_mod_ref should always use no_quota"), which removed it from the function calls but not from the structure.

Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Josef Bacik

If we're trying to make a data reservation and we have to allocate a data chunk, we could leak ret == 1, as do_chunk_alloc() returns 1 if it allocated a chunk. Since the end of the function is the success path, just return 0.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
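A minimal sketch of the tail of the reservation path, assuming the 4.x shape of the data reservation function; everything but the final return is condensed:

```c
	ret = do_chunk_alloc(trans, alloc_target, CHUNK_ALLOC_NO_FORCE);
	btrfs_end_transaction(trans);
	if (ret < 0) {
		if (ret != -ENOSPC)
			return ret;
		/* ENOSPC: retry/commit handling, condensed here */
	}

	/* ...reservation bookkeeping... */

	return 0;	/* was "return ret;", which could leak ret == 1 */
```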
-
Committed by Nikolay Borisov

It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Qu Wenruo

In find_free_extent(), under the checks: label, we have the following code:

    search_start = ALIGN(offset, fs_info->stripesize);

    /* move on to the next group */
    if (search_start + num_bytes >
        block_group->key.objectid + block_group->key.offset) {
            btrfs_add_free_space(block_group, offset, num_bytes);
            goto loop;
    }

    if (offset < search_start)
            btrfs_add_free_space(block_group, offset,
                                 search_start - offset);
    BUG_ON(offset > search_start);

However, ALIGN() rounds up, thus @search_start >= @offset and that BUG_ON() can never be triggered.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
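Why the BUG_ON() is dead code, shown with the usual power-of-two ALIGN() definition (the kernel's macro is equivalent for these inputs):

```c
/* round x up to the next multiple of a (a must be a power of two) */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/*
 * ALIGN(4096, 4096) == 4096 and ALIGN(4097, 4096) == 8192: the result
 * is never smaller than the input. So after
 *     search_start = ALIGN(offset, stripesize);
 * we always have offset <= search_start, and
 * BUG_ON(offset > search_start) cannot fire.
 */
```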
-
Committed by David Sterba

There are many places that open code the duplicity factor of the block group profiles; create a common helper. This can be easily extended for more copies.

Signed-off-by: David Sterba <dsterba@suse.com>
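A sketch of such a helper; upstream named it btrfs_bg_type_to_factor(), and the profile set shown reflects the pre-RAID1C3/C4 era:

```c
/* how many copies of each byte a block group profile keeps */
static int btrfs_bg_type_to_factor(u64 flags)
{
	if (flags & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
		     BTRFS_BLOCK_GROUP_RAID10))
		return 2;
	return 1;	/* SINGLE, RAID0, RAID5, RAID6 */
}
```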
-
Committed by Lu Fengqi

The fs_info can be fetched from the transaction handle directly.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Lu Fengqi

It can be fetched from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Qu Wenruo

Introduce a small helper, btrfs_mark_bg_unused(), to acquire the locks and add a block group to the unused_bgs list. No functional modification; only 3 callers are involved.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
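A sketch of what such a helper plausibly factors out, based on the description — take the unused_bgs lock, add the group once, and pin a reference for the list:

```c
void btrfs_mark_bg_unused(struct btrfs_block_group_cache *bg)
{
	struct btrfs_fs_info *fs_info = bg->fs_info;

	spin_lock(&fs_info->unused_bgs_lock);
	if (list_empty(&bg->bg_list)) {
		btrfs_get_block_group(bg);	/* the list holds a ref */
		list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
	}
	spin_unlock(&fs_info->unused_bgs_lock);
}
```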
-
Committed by Nikolay Borisov

do_chunk_alloc implements logic to detect whether there is currently a pending chunk allocation (by means of space_info->chunk_alloc being set) and, if so, it loops around to the 'again' label. Additionally, based on the state of the space_info (e.g. whether it's full or not) and the return value of should_alloc_chunk(), it decides whether this is a "hard" error (ENOSPC) or we can just return 0.

This patch refactors all of this (see the sketch after the list):

1. Put order to the scattered ifs handling the various cases in easy-to-read if {} else if {} branches. This makes the cases we are interested in handling clear.

2. Call should_alloc_chunk only once and use the result in the if/else if constructs. All of this is done under space_info->lock, so even before, multiple calls of should_alloc_chunk were unnecessary.

3. Rewrite the "do {} while()" loop currently implemented via a label into an explicit loop construct.

4. Move the mutex locking to the case where the caller is the one doing the allocation. For the case where the caller needs to wait for a concurrent allocation, introduce a pair of mutex_lock/mutex_unlock calls to act as a barrier, and reword the comment.

5. Switch local vars to bool type where pertinent.

All in all this shouldn't introduce any functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
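A condensed sketch of the refactored entry of do_chunk_alloc() as the list above describes it; the branch bodies are simplified and should be read as illustrative:

```c
	bool wait_for_alloc = false;
	bool should_alloc;
	int ret;

	do {
		spin_lock(&space_info->lock);
		if (force < space_info->force_alloc)
			force = space_info->force_alloc;
		should_alloc = should_alloc_chunk(fs_info, space_info, force);
		if (space_info->full) {
			/* full space_info: hard error only if we needed it */
			ret = should_alloc ? -ENOSPC : 0;
			spin_unlock(&space_info->lock);
			return ret;
		} else if (!should_alloc) {
			spin_unlock(&space_info->lock);
			return 0;
		} else if (space_info->chunk_alloc) {
			/*
			 * Someone else is allocating: take and drop the
			 * chunk mutex purely as a barrier, then re-evaluate.
			 */
			wait_for_alloc = true;
			spin_unlock(&space_info->lock);
			mutex_lock(&fs_info->chunk_mutex);
			mutex_unlock(&fs_info->chunk_mutex);
		} else {
			/* we do the allocation ourselves */
			space_info->chunk_alloc = 1;
			wait_for_alloc = false;
			spin_unlock(&space_info->lock);
			mutex_lock(&fs_info->chunk_mutex);
			break;
		}
	} while (wait_for_alloc);
```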
-
Committed by Ethan Lien

In commit b150a4f1 ("Btrfs: use a percpu to keep track of possibly pinned bytes") we use total_bytes_pinned to track how many bytes we are going to free in this transaction. When we are close to ENOSPC, we check it and know whether we can make the allocation by committing the current transaction. For every data/metadata extent we are going to free, we add to total_bytes_pinned in btrfs_free_extent() and btrfs_free_tree_block(), and release it in unpin_extent_range() when we finish the transaction. So this is a variable we frequently update but rarely read — exactly the suitable use case for a percpu_counter.

But in the commit above we update total_bytes_pinned with the default batch size of 32, making every update essentially a spinlock-protected update. Since every spin lock/unlock operation involves syncing a globally used variable and some kind of barrier in an SMP system, this is more expensive than using total_bytes_pinned as a simple atomic64_t.

So fix this by using a customized batch size. Since we only read total_bytes_pinned when we are close to ENOSPC and fail to allocate a new chunk, we can use a really large batch size, with nearly no penalty in most cases.

[Test]
We tested the patch on a 4-core x86 machine:

1. fallocate a 16GiB test file
2. take a snapshot (so all following writes will be COW)
3. run a 180 sec, 4 jobs, 4K random write fio on the test file

We also added a temporary lockdep class on the percpu_counter's spinlock used by total_bytes_pinned to track it with lock_stat.

[Results]

unpatched:

    lock_stat version 0.4
    -----------------------------------------------------------------------
    class name  con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
    total_bytes_pinned_percpu:  82  82  0.21  0.61  29.46  0.36  298340  635973  0.09  11.01  173476.25  0.27

patched:

    lock_stat version 0.4
    -----------------------------------------------------------------------
    class name  con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
    total_bytes_pinned_percpu:  1  1  0.62  0.62  0.62  0.62  13601  31542  0.14  9.61  11016.90  0.35

[Analysis]
Since the spinlock only protects a single in-memory variable, the contentions (number of lock acquisitions that had to wait) in both the unpatched and patched versions are low. But when we look at acquisitions and acq-bounces, we get much lower counts in the patched version. Here the most important metric is acq-bounces: it counts how many times the lock gets transferred between different cpus, so the patch really reduces cacheline bouncing of the spinlock (and of the percpu_counter's global counter) in an SMP system.

Fixes: b150a4f1 ("Btrfs: use a percpu to keep track of possibly pinned bytes")
Signed-off-by: Ethan Lien <ethanlien@synology.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
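A sketch of the mechanism: percpu_counter_add_batch() only takes the shared spinlock once a CPU's local delta exceeds the batch, so a large batch keeps almost every update CPU-local. The 128M constant matches the patch's intent, but treat the exact value as an assumption:

```c
#include <linux/percpu_counter.h>
#include <linux/sizes.h>

#define BTRFS_TOTAL_BYTES_PINNED_BATCH	SZ_128M

s64 pinned;

/* hot path: almost always stays in this CPU's slot, no spinlock taken */
percpu_counter_add_batch(&space_info->total_bytes_pinned, num_bytes,
			 BTRFS_TOTAL_BYTES_PINNED_BATCH);

/* rare ENOSPC path: fold the per-cpu slots for an exact value */
pinned = percpu_counter_sum(&space_info->total_bytes_pinned);
```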
-
Committed by David Sterba

Functions that get a btrfs inode can simply reach the fs_info by dereferencing the root, and this looks a bit more straightforward than the btrfs_sb(...) indirection. If the transaction handle is available and not NULL, it's used instead.

Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by David Sterba

The v0 extent type checks are the right case for the unlikely annotations, as we don't expect to ever see them, so let's give the compiler a hint.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

Following the removal of the v0 handling code, let's be courteous and print an error message when such extents are handled. In the cases where we have a transaction, just abort it; otherwise call btrfs_handle_fs_error. Both cases result in the filesystem being remounted RO. Where the error handling would be too intrusive, the BUG_ON is left in place, as in extent_data_ref_count; other proper handling would catch that earlier.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
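A sketch combining this commit with the unlikely() hint from two commits up — how a v0 check plausibly looks after both changes; the message text is an assumption:

```c
/* a too-small extent item means a v0 leftover: never expected anymore */
if (unlikely(item_size < sizeof(*ei))) {
	btrfs_err(fs_info,
		  "unsupported or corrupted v0 extent item found");
	if (trans)
		btrfs_abort_transaction(trans, -EINVAL);	/* goes RO */
	else
		btrfs_handle_fs_error(fs_info, -EINVAL, NULL);
	return -EINVAL;
}
```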
-
Committed by Nikolay Borisov

The v0 compat code was introduced in commit 5d4f98a2 ("Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)") 9 years ago, which was merged in 2.6.31. This means the code is there to support filesystems which are _VERY_ old, and if you are using btrfs on such an old kernel, you have much bigger problems. This, coupled with the fact that no one is likely testing/maintaining this code, means it likely has bugs lurking. All things considered, I think 43 kernel releases later it's high time this remnant of the past got removed.

This patch removes all the code wrapped in #ifdefs but leaves the BUG_ONs in case we have a v0 with no support intact, as a sort of safety net.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Su Yue

If the type of an extent_inline_ref found is not as expected, the filesystem may have been corrupted, so we should return EUCLEAN instead of EINVAL.

Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
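The distinction in a nutshell: EINVAL blames the caller's arguments, while EUCLEAN signals corrupted on-disk metadata. A hypothetical shape of such a check (the switch context is illustrative, not the verbatim diff):

```c
switch (btrfs_get_extent_inline_ref_type(eb, iref, BTRFS_REF_TYPE_ANY)) {
case BTRFS_TREE_BLOCK_REF_KEY:
case BTRFS_SHARED_BLOCK_REF_KEY:
	/* ...expected types, normal handling... */
	break;
default:
	/* not a bad argument: the metadata itself is inconsistent */
	return -EUCLEAN;	/* was -EINVAL */
}
```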
-
Committed by Qu Wenruo

[BUG]
Under certain KVM load and LTP tests, it is possible to hit the following calltrace if quota is enabled:

    BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
    BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
    WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 blk_status_to_errno+0x1a/0x30
    CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 (unreleased)
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
    Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
    task: ffff9f827b340bc0 task.stack: ffffb4f8c0304000
    RIP: 0010:blk_status_to_errno+0x1a/0x30
    Call Trace:
     submit_extent_page+0x191/0x270 [btrfs]
     ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
     __do_readpage+0x2d2/0x810 [btrfs]
     ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
     ? run_one_async_done+0xc0/0xc0 [btrfs]
     __extent_read_full_page+0xe7/0x100 [btrfs]
     ? run_one_async_done+0xc0/0xc0 [btrfs]
     read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
     ? run_one_async_done+0xc0/0xc0 [btrfs]
     btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
     read_tree_block+0x31/0x60 [btrfs]
     read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
     btrfs_search_slot+0x46b/0xa00 [btrfs]
     ? kmem_cache_alloc+0x1a8/0x510
     ? btrfs_get_token_32+0x5b/0x120 [btrfs]
     find_parent_nodes+0x11d/0xeb0 [btrfs]
     ? leaf_space_used+0xb8/0xd0 [btrfs]
     ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
     ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
     btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
     btrfs_find_all_roots+0x45/0x60 [btrfs]
     btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
     btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
     btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
     insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
     btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
     ? pick_next_task_fair+0x2cd/0x530
     ? __switch_to+0x92/0x4b0
     btrfs_worker_helper+0x81/0x300 [btrfs]
     process_one_work+0x1da/0x3f0
     worker_thread+0x2b/0x3f0
     ? process_one_work+0x3f0/0x3f0
     kthread+0x11a/0x130
     ? kthread_create_on_node+0x40/0x40
     ret_from_fork+0x35/0x40
    BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
    BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO failure
    BTRFS info (device vda2): forced readonly
    BTRFS error (device vda2): pending csums is 2887680

[CAUSE]
It's caused by a race with block group auto removal:

- There is a meta block group X which has only one tree block. The tree block belongs to fs tree 257.
- In the current transaction, some operation modifies fs tree 257. The tree block gets COWed, so block group X is now empty, marked as unused, and queued to be deleted.
- Some workload (like fsync) wakes up cleaner_kthread(), which calls btrfs_delete_unused_bgs() to remove unused block groups. So block group X, along with its chunk map, gets removed.
- Some delalloc work finishes for fs tree 257. Quota needs to get the original reference of the extent, which requires reading tree blocks of the commit root of 257. Since the chunk map is gone, the above warning is triggered.

[FIX]
Just let btrfs_delete_unused_bgs() skip block groups which still have pinned bytes.

However, there is a minor side effect: currently we only queue empty block groups at update_block_group(), and such an empty block group with pinned bytes won't go through update_block_group() again. Such a block group won't be removed until it gets a new extent allocated and removed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
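A sketch of the skip condition in btrfs_delete_unused_bgs() with the pinned check added; the surrounding loop is condensed and the field names are assumptions from 4.x-era sources:

```c
spin_lock(&block_group->lock);
if (block_group->reserved || block_group->pinned ||
    btrfs_block_group_used(&block_group->item) || block_group->ro ||
    list_is_singular(&block_group->list)) {
	/*
	 * Still reserved, pinned (this fix), in use, read-only, or the
	 * last block group of its raid type: leave it alone for now.
	 */
	spin_unlock(&block_group->lock);
	up_write(&space_info->groups_sem);
	goto next;
}
spin_unlock(&block_group->lock);
```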
-
Committed by Nikolay Borisov

When a new extent buffer is allocated, there are a few mandatory fields which need to be set in order for the buffer to be sane: level, generation, bytenr, backref_rev, owner and FSID/UUID. Currently this is open coded in the callers of btrfs_alloc_tree_block, meaning it sits fairly high in the abstraction hierarchy of operations. This patch solves this by simply moving the init code into btrfs_init_new_buffer, since that is the function which initializes a newly allocated extent buffer. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
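A sketch of the consolidated initialization as it might read inside btrfs_init_new_buffer(); the setters are the usual btrfs header accessors, with the surrounding locking and dirtying omitted:

```c
/* mandatory header fields for a sane, freshly allocated buffer */
memzero_extent_buffer(buf, 0, sizeof(struct btrfs_header));
btrfs_set_header_level(buf, level);
btrfs_set_header_bytenr(buf, buf->start);
btrfs_set_header_generation(buf, trans->transid);
btrfs_set_header_backref_rev(buf, BTRFS_MIXED_BACKREF_REV);
btrfs_set_header_owner(buf, owner);
write_extent_buffer_fsid(buf, fs_info->fsid);
write_extent_buffer_chunk_tree_uuid(buf, fs_info->chunk_tree_uuid);
```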
-
Committed by Nikolay Borisov

It can be referenced from the passed transaction handle.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from the passed bg cache.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from trans since the function is always called within a valid transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced directly from the transaction handle, since it's always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from the passed transaction handle, since it's always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from the passed transaction handle, since it's always valid.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from the passed block group.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from the passed block group.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from trans since the function is always called within a transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can be referenced from trans since the function is always called within a transaction.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

This function is always called with a valid transaction handle, from which fs_info can be referenced. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

It can always be referenced from the passed transaction handle, since it's always valid. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Nikolay Borisov

fs_info can be referenced from the transaction handle, since it's always valid. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-