- 16 5月, 2022 40 次提交
-
-
由 Christoph Hellwig 提交于
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there. Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that. Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Add a wrapper around iomap_dio_rw that keeps the direct I/O internals isolated in inode.c. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
While the active zones within an active block group are reset, and their active resource is released, the block group itself is kept in the active block group list and marked as active. As a result, the list will contain more than max_active_zones block groups. That itself is not fatal for the device as the zones are properly reset. However, that inflated list is, of course, strange. Also, a to-appear patch series, which deactivates an active block group on demand, gets confused with the wrong list. So, fix the issue by finishing the unused block group once it gets read-only, so that we can release the active resource in an early stage. Fixes: be1a1d7a ("btrfs: zoned: finish fully written block group") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Commit be1a1d7a ("btrfs: zoned: finish fully written block group") introduced zone finishing code both for data and metadata end_io path. However, the metadata side is not working as it should. First, it compares logical address (eb->start + eb->len) with offset within a block group (cache->zone_capacity) in submit_eb_page(). That essentially disabled zone finishing on metadata end_io path. Furthermore, fixing the issue above revealed we cannot call btrfs_zone_finish_endio() in end_extent_buffer_writeback(). We cannot call btrfs_lookup_block_group() which require spin lock inside end_io context. Introduce btrfs_schedule_zone_finish_bg() to wait for the extent buffer writeback and do the zone finish IO in a workqueue. Also, drop EXTENT_BUFFER_ZONE_FINISH as it is no longer used. Fixes: be1a1d7a ("btrfs: zoned: finish fully written block group") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Currently, btrfs_zone_finish_endio() finishes a block group only when the written region reaches the end of the block group. We can also finish the block group when no more allocation is possible. Fixes: be1a1d7a ("btrfs: zoned: finish fully written block group") CC: stable@vger.kernel.org # 5.16+ Reviewed-by: NPankaj Raghav <p.raghav@samsung.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
btrfs_zone_finish() and btrfs_zone_finish_endio() have similar code. Introduce do_zone_finish() to factor out the common code. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Naohiro Aota 提交于
Introduce a wrapper to check if all the space in a block group is allocated or not. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
When iterating the backrefs in an extent item if the ptr to the 'current' backref record goes beyond the extent item a warning is generated and -ENOENT is returned. However what's more appropriate to debug such cases would be to return EUCLEAN and also print identifying information about the performed search as well as the current content of the leaf containing the possibly corrupted extent item. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The bio_ctrl is the last use of bio_flags that has been converted to compress type everywhere else. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Several functions take parameter bio_flags that was simplified to just compress type, unify it and change the type accordingly. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The bio_flags is now used to store unchanged compress type, so unify that. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The helpers extent_set_compress_type and extent_compress_type have become trivial after previous cleanups and can be removed. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The bio_flags are used only to encode the compression and there are no other EXTENT_BIO_* flags, so the compress type can be stored directly. The struct member name is left unchanged and will be cleaned in later patches. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The helper used to do more with the wbc state but now it's just one subtraction, no need to have a special helper. It became trivial in a9132667 ("Btrfs: make mapping->writeback_index point to the last written page"). Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The value of btrfs_delayed_extent_op::is_data is always false, we can cascade the change and simplify code that depends on it, removing the structure member eventually. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
The parameter has been added in 2009 in the infamous monster commit 5d4f98a2 ("Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE)") but not used ever since. We can sink it and allow further simplifications. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Filipe Manana 提交于
When reserving data space for a direct IO write we can end up deadlocking if we have multiple tasks attempting a write to the same file range, there are multiple extents covered by that file range, we are low on available space for data and the writes don't expand the inode's i_size. The deadlock can happen like this: 1) We have a file with an i_size of 1M, at offset 0 it has an extent with a size of 128K and at offset 128K it has another extent also with a size of 128K; 2) Task A does a direct IO write against file range [0, 256K), and because the write is within the i_size boundary, it takes the inode's lock (VFS level) in shared mode; 3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and then gets the extent map for the extent covering the range [0, 128K). At btrfs_get_blocks_direct_write(), it creates an ordered extent for that file range ([0, 128K)); 4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file range [0, 256K); 5) Task A executes btrfs_dio_iomap_begin() again, this time for the file range [128K, 256K), and locks the file range [128K, 256K); 6) Task B starts a direct IO write against file range [0, 256K) as well. It also locks the inode in shared mode, as it's within the i_size limit, and then tries to lock file range [0, 256K). It is able to lock the subrange [0, 128K) but then blocks waiting for the range [128K, 256K), as it is currently locked by task A; 7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data space. Because we are low on available free space, it triggers the async data reclaim task, and waits for it to reserve data space; 8) The async reclaim task decides to wait for all existing ordered extents to complete (through btrfs_wait_ordered_roots()). It finds the ordered extent previously created by task A for the file range [0, 128K) and waits for it to complete; 9) The ordered extent for the file range [0, 128K) can not complete because it blocks at btrfs_finish_ordered_io() when trying to lock the file range [0, 128K). This results in a deadlock, because: - task B is holding the file range [0, 128K) locked, waiting for the range [128K, 256K) to be unlocked by task A; - task A is holding the file range [128K, 256K) locked and it's waiting for the async data reclaim task to satisfy its space reservation request; - the async data reclaim task is waiting for ordered extent [0, 128K) to complete, but the ordered extent can not complete because the file range [0, 128K) is currently locked by task B, which is waiting on task A to unlock file range [128K, 256K) and task A waiting on the async data reclaim task. This results in a deadlock between 4 task: task A, task B, the async data reclaim task and the task doing ordered extent completion (a work queue task). This type of deadlock can sporadically be triggered by the test case generic/300 from fstests, and results in a stack trace like the following: [12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds. [12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1 [12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000 [12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs] [12084.036599] Call Trace: [12084.036601] <TASK> [12084.036606] __schedule+0x3cb/0xed0 [12084.036616] schedule+0x4e/0xb0 [12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs] [12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0 [12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs] [12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs] [12084.036719] ? lock_is_held_type+0xe8/0x140 [12084.036727] process_one_work+0x252/0x5a0 [12084.036736] ? process_one_work+0x5a0/0x5a0 [12084.036738] worker_thread+0x52/0x3b0 [12084.036743] ? process_one_work+0x5a0/0x5a0 [12084.036745] kthread+0xf2/0x120 [12084.036747] ? kthread_complete_and_exit+0x20/0x20 [12084.036751] ret_from_fork+0x22/0x30 [12084.036765] </TASK> [12084.036769] INFO: task kworker/u16:11:153787 blocked for more than 241 seconds. [12084.037702] Not tainted 5.18.0-rc2-btrfs-next-115 #1 [12084.038540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12084.039506] task:kworker/u16:11 state:D stack: 0 pid:153787 ppid: 2 flags:0x00004000 [12084.039511] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [12084.039551] Call Trace: [12084.039553] <TASK> [12084.039557] __schedule+0x3cb/0xed0 [12084.039566] schedule+0x4e/0xb0 [12084.039569] schedule_timeout+0xed/0x130 [12084.039573] ? mark_held_locks+0x50/0x80 [12084.039578] ? _raw_spin_unlock_irq+0x24/0x50 [12084.039580] ? lockdep_hardirqs_on+0x7d/0x100 [12084.039585] __wait_for_common+0xaf/0x1f0 [12084.039587] ? usleep_range_state+0xb0/0xb0 [12084.039596] btrfs_wait_ordered_extents+0x3d6/0x470 [btrfs] [12084.039636] btrfs_wait_ordered_roots+0x175/0x240 [btrfs] [12084.039670] flush_space+0x25b/0x630 [btrfs] [12084.039712] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [12084.039747] process_one_work+0x252/0x5a0 [12084.039756] ? process_one_work+0x5a0/0x5a0 [12084.039758] worker_thread+0x52/0x3b0 [12084.039762] ? process_one_work+0x5a0/0x5a0 [12084.039765] kthread+0xf2/0x120 [12084.039766] ? kthread_complete_and_exit+0x20/0x20 [12084.039770] ret_from_fork+0x22/0x30 [12084.039783] </TASK> [12084.039800] INFO: task kworker/u16:17:217907 blocked for more than 241 seconds. [12084.040709] Not tainted 5.18.0-rc2-btrfs-next-115 #1 [12084.041398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12084.042404] task:kworker/u16:17 state:D stack: 0 pid:217907 ppid: 2 flags:0x00004000 [12084.042411] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] [12084.042461] Call Trace: [12084.042463] <TASK> [12084.042471] __schedule+0x3cb/0xed0 [12084.042485] schedule+0x4e/0xb0 [12084.042490] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs] [12084.042539] ? prepare_to_wait_exclusive+0xc0/0xc0 [12084.042551] lock_extent_bits+0x37/0x90 [btrfs] [12084.042601] btrfs_finish_ordered_io.isra.0+0x3fd/0x960 [btrfs] [12084.042656] ? lock_is_held_type+0xe8/0x140 [12084.042667] btrfs_work_helper+0xf8/0x400 [btrfs] [12084.042716] ? lock_is_held_type+0xe8/0x140 [12084.042727] process_one_work+0x252/0x5a0 [12084.042742] worker_thread+0x52/0x3b0 [12084.042750] ? process_one_work+0x5a0/0x5a0 [12084.042754] kthread+0xf2/0x120 [12084.042757] ? kthread_complete_and_exit+0x20/0x20 [12084.042763] ret_from_fork+0x22/0x30 [12084.042783] </TASK> [12084.042798] INFO: task fio:234517 blocked for more than 241 seconds. [12084.043598] Not tainted 5.18.0-rc2-btrfs-next-115 #1 [12084.044282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [12084.045244] task:fio state:D stack: 0 pid:234517 ppid:234515 flags:0x00004000 [12084.045248] Call Trace: [12084.045250] <TASK> [12084.045254] __schedule+0x3cb/0xed0 [12084.045263] schedule+0x4e/0xb0 [12084.045266] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs] [12084.045298] ? prepare_to_wait_exclusive+0xc0/0xc0 [12084.045306] lock_extent_bits+0x37/0x90 [btrfs] [12084.045336] btrfs_dio_iomap_begin+0x336/0xc60 [btrfs] [12084.045370] ? lock_is_held_type+0xe8/0x140 [12084.045378] iomap_iter+0x184/0x4c0 [12084.045383] __iomap_dio_rw+0x2c6/0x8a0 [12084.045406] iomap_dio_rw+0xa/0x30 [12084.045408] btrfs_do_write_iter+0x370/0x5e0 [btrfs] [12084.045440] aio_write+0xfa/0x2c0 [12084.045448] ? __might_fault+0x2a/0x70 [12084.045451] ? kvm_sched_clock_read+0x14/0x40 [12084.045455] ? lock_release+0x153/0x4a0 [12084.045463] io_submit_one+0x615/0x9f0 [12084.045467] ? __might_fault+0x2a/0x70 [12084.045469] ? kvm_sched_clock_read+0x14/0x40 [12084.045478] __x64_sys_io_submit+0x83/0x160 [12084.045483] ? syscall_enter_from_user_mode+0x1d/0x50 [12084.045489] do_syscall_64+0x3b/0x90 [12084.045517] entry_SYSCALL_64_after_hwframe+0x44/0xae [12084.045521] RIP: 0033:0x7fa76511af79 [12084.045525] RSP: 002b:00007ffd6d6b9058 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1 [12084.045530] RAX: ffffffffffffffda RBX: 00007fa75ba6e760 RCX: 00007fa76511af79 [12084.045532] RDX: 0000557b304ff3f0 RSI: 0000000000000001 RDI: 00007fa75ba4c000 [12084.045535] RBP: 00007fa75ba4c000 R08: 00007fa751b76000 R09: 0000000000000330 [12084.045537] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [12084.045540] R13: 0000000000000000 R14: 0000557b304ff3f0 R15: 0000557b30521eb0 [12084.045561] </TASK> Fix this issue by always reserving data space before locking a file range at btrfs_dio_iomap_begin(). If we can't reserve the space, then we don't error out immediately - instead after locking the file range, check if we can do a NOCOW write, and if we can we don't error out since we don't need to allocate a data extent, however if we can't NOCOW then error out with -ENOSPC. This also implies that we may end up reserving space when it's not needed because the write will end up being done in NOCOW mode - in that case we just release the space after we noticed we did a NOCOW write - this is the same type of logic that is done in the path for buffered IO writes. Fixes: f0bfa76a ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range") CC: stable@vger.kernel.org # 5.17+ Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Goldwyn Rodrigues 提交于
Derive the compression type from extent map as opposed to the bio flags passed. This makes it more precise and not reliant on function parameters. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
[SUSPICIOUS CODE] When refactoring scrub code, I noticed a very strange behavior around scrub_remap_extent(): if (sctx->is_dev_replace) scrub_remap_extent(fs_info, cur_logical, scrub_len, &cur_physical, &target_dev, &cur_mirror); As replace target is a 1:1 copy of the source device, thus physical offset inside the target should be the same as physical inside source, thus this remap call makes no sense to me. [REAL FUNCTIONALITY] After more investigation, the function name scrub_remap_extent() doesn't tell anything of the truth, nor does its if () condition. The real story behind this function is that, for scrub_pages() we never expect missing device, even for replacing missing device. What scrub_remap_extent() is really doing is to find a live mirror, and make later scrub_pages() to read data from the good copy, other than from the missing device and increase error counters unnecessarily. [IMPROVEMENT] We have no need to bother scrub_remap_extent() in scrub_simple_mirror() at all, we only need to call it before we call scrub_pages(). And rename the function to scrub_find_live_copy(), add extra comments on them. By this we can remove one parameter from scrub_extent(), and reduce the unnecessary calls to scrub_remap_extent() for regular replace. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Since we have find_first_extent_item() to iterate the extent items of a certain range, there is no need to use the open-coded version. Replace the final scrub call site with find_first_extent_item(). Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Currently scrub_raid56_parity() has a large double loop, handling the following things at the same time: - Iterate each data stripe - Iterate each extent item in one data stripe Refactor this by: - Introduce a new helper to handle data stripe iteration The new helper is scrub_raid56_data_stripe_for_parity(), which only has one while() loop handling the extent items inside the data stripe. The code is still mostly the same as the old code. - Call cond_resched() for each extent Previously we only call cond_resched() under a complex if () check. I see no special reason to do that, and for other scrub functions, like scrub_simple_mirror() we're already doing the same cond_resched() after scrubbing one extent. - Add more comments Please note that, this patch is only to address the double loop, there are incoming patches to do extra cleanup. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Although RAID56 has complex repair mechanism, which involves reading the whole full stripe, but inside one data stripe, it's in fact no different than SINGLE/RAID1. The point here is, for data stripe we just check the csum for each extent we hit. Only for csum mismatch case, our repair paths divide. So we can still reuse scrub_simple_mirror() for RAID56 data stripes, which saves quite some code. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Since we have moved all other profiles handling into their own functions, now the main body of scrub_stripe() is just handling RAID56 profiles. There is no need to address other profiles in the main loop of scrub_stripe(), so we can remove those dead branches. Since we're here, also slightly change the timing of initialization of variables like @offset, @increment and @logical. Especially for @logical, we don't really need to initialize it for btrfs_extent_root()/btrfs_csum_root(), we can use bg->start for that purpose. Now those variables are only initialize for RAID56 branches. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The new entrance will iterate through each data stripe which belongs to the target device. And since inside each data stripe, RAID0 is just SINGLE, while RAID10 is just RAID1, we can reuse scrub_simple_mirror() to do the scrub properly. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The new helper, scrub_simple_mirror(), will scrub all extents inside a range which only has simple mirror based duplication. This covers every range of SINGLE/DUP/RAID1/RAID1C*, and inside each data stripe for RAID0/RAID10. Currently we will use this function to scrub SINGLE/DUP/RAID1/RAID1C* profiles. As one can see, the new entrance for those simple-mirror based profiles can be small enough (with comments, just reach 100 lines). This function will be the basis for the incoming scrub refactor. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The new helper, find_first_extent_item(), will locate an extent item (either EXTENT_ITEM or METADATA_ITEM) which covers any byte of the search range. This helper will later be used to refactor scrub code. Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The variable @physical_end is the exclusive stripe end, currently it's calculated using @physical + @dev_extent_len / map->stripe_len * map->stripe_len. And since at allocation time we ensured dev_extent_len is stripe_len aligned, the result is the same as @physical + @dev_extent_len. So this patch will just assign @physical and @physical_end early, without using @nstripes. This is especially helpful for any possible out: label user, as now we only need to initialize @offset before going to out: label. Since we're here, also make @physical_end constant. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Gabriel Niebler 提交于
… rename it to simply fs_roots and adjust all usages of this object to use the XArray API, because it is notionally easier to use and understand, as it provides array semantics, and also takes care of locking for us, further simplifying the code. Also do some refactoring, esp. where the API change requires largely rewriting some functions, anyway. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NGabriel Niebler <gniebler@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Gabriel Niebler 提交于
… named 'extent_buffers'. Also adjust all usages of this object to use the XArray API, which greatly simplifies the code as it takes care of locking and is generally easier to use and understand, providing notionally simpler array semantics. Also perform some light refactoring. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NGabriel Niebler <gniebler@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Gabriel Niebler 提交于
… and adjust all usages of this object to use the XArray API for the sake of consistency. XArray API provides array semantics, so it is notionally easier to use and understand, and it also takes care of locking for us. None of this makes a real difference in this particular patch, but it does in other places where similar replacements are or have been made and we want to be consistent in our usage of data structures in btrfs. Signed-off-by: NGabriel Niebler <gniebler@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Gabriel Niebler 提交于
… in the btrfs_root struct and adjust all usages of this object to use the XArray API, because it is notionally easier to use and understand, as it provides array semantics, and also takes care of locking for us, further simplifying the code. Also use the opportunity to do some light refactoring. Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NGabriel Niebler <gniebler@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
In function btrfs_bg_flags_to_raid_index(), we use quite some if () to convert the BTRFS_BLOCK_GROUP_* bits to a index number. But the truth is, there is really no such need for so many branches at all. Since all BTRFS_BLOCK_GROUP_* flags are just one single bit set inside BTRFS_BLOCK_GROUP_PROFILES_MASK, we can easily use ilog2() to calculate their values. This calculation has an anchor point, the lowest PROFILE bit, which is RAID0. Even it's fixed on-disk format and should never change, here I added extra compile time checks to make it super safe: 1. Make sure RAID0 is always the lowest bit in PROFILE_MASK This is done by finding the first (least significant) bit set of RAID0 and PROFILE_MASK & ~RAID0. 2. Make sure RAID0 bit set beyond the highest bit of TYPE_MASK Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
It's only internally used as another way to represent btrfs profiles, it's not exposed through any on-disk format, in fact this btrfs_raid_types is diverted from the on-disk format values. Furthermore, since it's internal structure, its definition can change in the future. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
rmw_workers doesn't need ordered execution or thread disabling threshold (as the thresh parameter is less than DFT_THRESHOLD). Just switch to the normal workqueues that use a lot less resources, especially in the work_struct vs btrfs_work structures. Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
All three scrub workqueues don't need ordered execution or thread disabling threshold (as the thresh parameter is less than DFT_THRESHOLD). Just switch to the normal workqueues that use a lot less resources, especially in the work_struct vs btrfs_work structures. Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Christoph Hellwig 提交于
Just let the one caller that wants optional WQ_HIGHPRI handling allocate a separate btrfs_workqueue for that. This allows to rename struct __btrfs_workqueue to btrfs_workqueue, remove a pointer indirection and separate allocation for all btrfs_workqueue users and generally simplify the code. Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Now the btrfs RAID56 infrastructure has migrated to use sector_ptr interface, it should be safe to enable subpage support for RAID56. Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
The non-compatible part is only the bitmap iteration part, now the bitmap size is extended to rbio::stripe_nsectors, not the old rbio::stripe_npages. Since we're here, also slightly improve the function by: - Rename @i to @stripe - Rename @bit to @sectornr - Move @page and @index into the inner loop Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-