提交 · 381b9b4c9cf968c3154d1bad736d11559a38f1c9 · openeuler / Kernel

25 7月, 2022 1 次提交

btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap · 381b9b4c

由 Qu Wenruo 提交于 5月 27, 2022

Previously we use "unsigned long *" for those two bitmaps.

But since we only support fixed stripe length (64KiB, already checked in
tree-checker), "unsigned long *" is really a waste of memory, while we
can just use "unsigned long".

This saves us 8 bytes in total for scrub_parity.

To be extra safe, add an ASSERT() making sure calclulated @nsectors is
always smaller than BITS_PER_LONG.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

381b9b4c

16 5月, 2022 19 次提交

btrfs: scrub: move scrub_remap_extent() call into scrub_extent() · a13467ee

由 Qu Wenruo 提交于 3月 11, 2022

[SUSPICIOUS CODE]
When refactoring scrub code, I noticed a very strange behavior around
scrub_remap_extent():

	if (sctx->is_dev_replace)
		scrub_remap_extent(fs_info, cur_logical, scrub_len,
				   &cur_physical, &target_dev, &cur_mirror);

As replace target is a 1:1 copy of the source device, thus physical
offset inside the target should be the same as physical inside source,
thus this remap call makes no sense to me.

[REAL FUNCTIONALITY]
After more investigation, the function name scrub_remap_extent()
doesn't tell anything of the truth, nor does its if () condition.

The real story behind this function is that, for scrub_pages() we never
expect missing device, even for replacing missing device.

What scrub_remap_extent() is really doing is to find a live mirror, and
make later scrub_pages() to read data from the good copy, other than
from the missing device and increase error counters unnecessarily.

[IMPROVEMENT]
We have no need to bother scrub_remap_extent() in scrub_simple_mirror()
at all, we only need to call it before we call scrub_pages().

And rename the function to scrub_find_live_copy(), add extra comments on
them.

By this we can remove one parameter from scrub_extent(), and reduce the
unnecessary calls to scrub_remap_extent() for regular replace.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a13467ee

btrfs: scrub: use find_first_extent_item to for extent item search · d483bfd2

由 Qu Wenruo 提交于 3月 11, 2022

Since we have find_first_extent_item() to iterate the extent items of a
certain range, there is no need to use the open-coded version.

Replace the final scrub call site with find_first_extent_item().
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d483bfd2

btrfs: scrub: refactor scrub_raid56_parity() · 9ae53bf9

由 Qu Wenruo 提交于 3月 11, 2022

Currently scrub_raid56_parity() has a large double loop, handling the
following things at the same time:

- Iterate each data stripe
- Iterate each extent item in one data stripe

Refactor this by:

- Introduce a new helper to handle data stripe iteration
  The new helper is scrub_raid56_data_stripe_for_parity(), which
  only has one while() loop handling the extent items inside the
  data stripe.

  The code is still mostly the same as the old code.

- Call cond_resched() for each extent
  Previously we only call cond_resched() under a complex if () check.
  I see no special reason to do that, and for other scrub functions,
  like scrub_simple_mirror() we're already doing the same cond_resched()
  after scrubbing one extent.

- Add more comments

Please note that, this patch is only to address the double loop, there
are incoming patches to do extra cleanup.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9ae53bf9

btrfs: scrub: use scrub_simple_mirror() to handle RAID56 data stripe scrub · 18d30ab9

由 Qu Wenruo 提交于 3月 11, 2022

Although RAID56 has complex repair mechanism, which involves reading the
whole full stripe, but inside one data stripe, it's in fact no different
than SINGLE/RAID1.

The point here is, for data stripe we just check the csum for each
extent we hit.  Only for csum mismatch case, our repair paths divide.

So we can still reuse scrub_simple_mirror() for RAID56 data stripes,
which saves quite some code.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

18d30ab9

btrfs: scrub: cleanup the non-RAID56 branches in scrub_stripe() · e430c428

由 Qu Wenruo 提交于 3月 11, 2022

Since we have moved all other profiles handling into their own
functions, now the main body of scrub_stripe() is just handling RAID56
profiles.

There is no need to address other profiles in the main loop of
scrub_stripe(), so we can remove those dead branches.

Since we're here, also slightly change the timing of initialization of
variables like @offset, @increment and @logical.

Especially for @logical, we don't really need to initialize it for
btrfs_extent_root()/btrfs_csum_root(), we can use bg->start for that
purpose.

Now those variables are only initialize for RAID56 branches.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e430c428

btrfs: scrub: introduce dedicated helper to scrub simple-stripe based range · 8557635e

由 Qu Wenruo 提交于 3月 11, 2022

The new entrance will iterate through each data stripe which belongs to
the target device.

And since inside each data stripe, RAID0 is just SINGLE, while RAID10 is
just RAID1, we can reuse scrub_simple_mirror() to do the scrub properly.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

8557635e

btrfs: scrub: introduce dedicated helper to scrub simple-mirror based range · 09022b14

由 Qu Wenruo 提交于 3月 11, 2022

The new helper, scrub_simple_mirror(), will scrub all extents inside a
range which only has simple mirror based duplication.

This covers every range of SINGLE/DUP/RAID1/RAID1C*, and inside each
data stripe for RAID0/RAID10.

Currently we will use this function to scrub SINGLE/DUP/RAID1/RAID1C*
profiles.  As one can see, the new entrance for those simple-mirror
based profiles can be small enough (with comments, just reach 100
lines).

This function will be the basis for the incoming scrub refactor.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

09022b14

btrfs: scrub: introduce a helper to locate an extent item · 416bd7e7

由 Qu Wenruo 提交于 3月 11, 2022

The new helper, find_first_extent_item(), will locate an extent item
(either EXTENT_ITEM or METADATA_ITEM) which covers any byte of the
search range.

This helper will later be used to refactor scrub code.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

416bd7e7

btrfs: calculate physical_end using dev_extent_len directly in scrub_stripe() · 1194a824

由 Qu Wenruo 提交于 3月 11, 2022

The variable @physical_end is the exclusive stripe end, currently it's
calculated using @physical + @dev_extent_len / map->stripe_len *
 map->stripe_len.

And since at allocation time we ensured dev_extent_len is stripe_len
aligned, the result is the same as @physical + @dev_extent_len.

So this patch will just assign @physical and @physical_end early,
without using @nstripes.

This is especially helpful for any possible out: label user, as now we
only need to initialize @offset before going to out: label.

Since we're here, also make @physical_end constant.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

1194a824

btrfs: use normal workqueues for scrub · be539518

由 Christoph Hellwig 提交于 4月 18, 2022

All three scrub workqueues don't need ordered execution or thread
disabling threshold (as the thresh parameter is less than DFT_THRESHOLD).
Just switch to the normal workqueues that use a lot less resources,
especially in the work_struct vs btrfs_work structures.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

be539518

btrfs: raid56: make raid56_add_scrub_pages() subpage compatible · 6346f6bf

由 Qu Wenruo 提交于 4月 01, 2022

This requires one extra parameter @pgoff for the function.

In the current code base, scrub is still one page per sector, thus the
new parameter will always be 0.

It needs the extra subpage scrub optimization code to fully take
advantage.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6346f6bf

btrfs: don't allocate a btrfs_bio for scrub bios · 75c17e66

由 Christoph Hellwig 提交于 4月 04, 2022

All the scrub bios go straight to the block device or the raid56 code,
none of which looks at the btrfs_bio.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

75c17e66

btrfs: use on-stack bio in scrub_repair_page_from_good_copy · f77dcc0d

由 Christoph Hellwig 提交于 4月 04, 2022

The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio,
so just use an on-stack bio.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f77dcc0d

btrfs: use on-stack bio in scrub_recheck_block · f3b8a7f3

由 Christoph Hellwig 提交于 4月 04, 2022

The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio,
so just use an on-stack bio.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f3b8a7f3

btrfs: check-integrity: split submit_bio from btrfsic checking · 58ff51f1

由 Christoph Hellwig 提交于 4月 04, 2022

Require a separate call to the integrity checking helpers from the
actual bio submission.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

58ff51f1

btrfs: remove unnecessary type casts · 0d031dc4

由 Yu Zhe 提交于 3月 31, 2022

Explicit type casts are not necessary when it's void* to another pointer
type.
Signed-off-by: NYu Zhe <yuzhe@nfschina.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0d031dc4

btrfs: scrub: rename scrub_bio::pagev and related members · e360d2f5

由 Qu Wenruo 提交于 3月 13, 2022

Since the subpage support for scrub, one page no longer always represents
one sector, thus scrub_bio::pagev and scrub_bio::sector_count are no
longer accurate.

Rename them to scrub_bio::sectors and scrub_bio::sector_count respectively.
This also involves scrub_ctx::pages_per_bio and other macros involved.

Now the renaming of pages involved in scrub is be finished.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e360d2f5

btrfs: scrub: rename scrub_page to scrub_sector · 46343501

由 Qu Wenruo 提交于 3月 13, 2022

Since the subpage support of scrub, scrub_sector is in fact just
representing one sector.

Thus the name scrub_page is no longer correct, rename it to
scrub_sector.

This also involves the following renames:

- spage -> sector
  Normally we would just replace "page" with "sector" and result
  something like "ssector".
  But the repeating 's' is not really eye friendly.

  So here we just simple use "sector", as there is nothing from MM layer
  called "sector" to cause any confusion.

- scrub_parity::spages -> sectors_list
  Normally we use plural to indicate an array, not a list.
  Rename it to @sectors_list to be more explicit on the list part.

- Also reformat and update comments that get changed
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

46343501

btrfs: scrub: rename members related to scrub_block::pagev · 7e737cbc

由 Qu Wenruo 提交于 3月 13, 2022

The following will be renamed in this patch:

- scrub_block::pagev -> sectors

- scrub_block::page_count -> sector_count

- SCRUB_MAX_PAGES_PER_BLOCK -> SCRUB_MAX_SECTORS_PER_BLOCK

- page_num -> sector_num to iterate scrub_block::sectors

For now scrub_page is not yet renamed to keep the patch reasonable and
it will be updated in a followup.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

7e737cbc

21 4月, 2022 1 次提交

btrfs: fix assertion failure during scrub due to block group reallocation · a692e13d

由 Filipe Manana 提交于 4月 19, 2022

During a scrub, or device replace, we can race with block group removal
and allocation and trigger the following assertion failure:

[7526.385524] assertion failed: cache->start == chunk_offset, in fs/btrfs/scrub.c:3817
[7526.387351] ------------[ cut here ]------------
[7526.387373] kernel BUG at fs/btrfs/ctree.h:3599!
[7526.388001] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[7526.388970] CPU: 2 PID: 1158150 Comm: btrfs Not tainted 5.17.0-rc8-btrfs-next-114 #4
[7526.390279] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[7526.392430] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[7526.393520] Code: f3 48 c7 c7 20 (...)
[7526.396926] RSP: 0018:ffffb9154176bc40 EFLAGS: 00010246
[7526.397690] RAX: 0000000000000048 RBX: ffffa0db8a910000 RCX: 0000000000000000
[7526.398732] RDX: 0000000000000000 RSI: ffffffff9d7239a2 RDI: 00000000ffffffff
[7526.399766] RBP: ffffa0db8a911e10 R08: ffffffffa71a3ca0 R09: 0000000000000001
[7526.400793] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa0db4b170800
[7526.401839] R13: 00000003494b0000 R14: ffffa0db7c55b488 R15: ffffa0db8b19a000
[7526.402874] FS:  00007f6c99c40640(0000) GS:ffffa0de6d200000(0000) knlGS:0000000000000000
[7526.404038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7526.405040] CR2: 00007f31b0882160 CR3: 000000014b38c004 CR4: 0000000000370ee0
[7526.406112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7526.407148] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7526.408169] Call Trace:
[7526.408529]  <TASK>
[7526.408839]  scrub_enumerate_chunks.cold+0x11/0x79 [btrfs]
[7526.409690]  ? do_wait_intr_irq+0xb0/0xb0
[7526.410276]  btrfs_scrub_dev+0x226/0x620 [btrfs]
[7526.410995]  ? preempt_count_add+0x49/0xa0
[7526.411592]  btrfs_ioctl+0x1ab5/0x36d0 [btrfs]
[7526.412278]  ? __fget_files+0xc9/0x1b0
[7526.412825]  ? kvm_sched_clock_read+0x14/0x40
[7526.413459]  ? lock_release+0x155/0x4a0
[7526.414022]  ? __x64_sys_ioctl+0x83/0xb0
[7526.414601]  __x64_sys_ioctl+0x83/0xb0
[7526.415150]  do_syscall_64+0x3b/0xc0
[7526.415675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[7526.416408] RIP: 0033:0x7f6c99d34397
[7526.416931] Code: 3c 1c e8 1c ff (...)
[7526.419641] RSP: 002b:00007f6c99c3fca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[7526.420735] RAX: ffffffffffffffda RBX: 00005624e1e007b0 RCX: 00007f6c99d34397
[7526.421779] RDX: 00005624e1e007b0 RSI: 00000000c400941b RDI: 0000000000000003
[7526.422820] RBP: 0000000000000000 R08: 00007f6c99c40640 R09: 0000000000000000
[7526.423906] R10: 00007f6c99c40640 R11: 0000000000000246 R12: 00007fff746755de
[7526.424924] R13: 00007fff746755df R14: 0000000000000000 R15: 00007f6c99c40640
[7526.425950]  </TASK>

That assertion is relatively new, introduced with commit d04fbe19
("btrfs: scrub: cleanup the argument list of scrub_chunk()").

The block group we get at scrub_enumerate_chunks() can actually have a
start address that is smaller then the chunk offset we extracted from a
device extent item we got from the commit root of the device tree.
This is very rare, but it can happen due to a race with block group
removal and allocation. For example, the following steps show how this
can happen:

1) We are at transaction T, and we have the following blocks groups,
   sorted by their logical start address:

   [ bg A, start address A, length 1G (data) ]
   [ bg B, start address B, length 1G (data) ]
   (...)
   [ bg W, start address W, length 1G (data) ]

     --> logical address space hole of 256M,
         there used to be a 256M metadata block group here

   [ bg Y, start address Y, length 256M (metadata) ]

      --> Y matches W's end offset + 256M

   Block group Y is the block group with the highest logical address in
   the whole filesystem;

2) Block group Y is deleted and its extent mapping is removed by the call
   to remove_extent_mapping() made from btrfs_remove_block_group().

   So after this point, the last element of the mapping red black tree,
   its rightmost node, is the mapping for block group W;

3) While still at transaction T, a new data block group is allocated,
   with a length of 1G. When creating the block group we do a call to
   find_next_chunk(), which returns the logical start address for the
   new block group. This calls returns X, which corresponds to the
   end offset of the last block group, the rightmost node in the mapping
   red black tree (fs_info->mapping_tree), plus one.

   So we get a new block group that starts at logical address X and with
   a length of 1G. It spans over the whole logical range of the old block
   group Y, that was previously removed in the same transaction.

   However the device extent allocated to block group X is not the same
   device extent that was used by block group Y, and it also does not
   overlap that extent, which must be always the case because we allocate
   extents by searching through the commit root of the device tree
   (otherwise it could corrupt a filesystem after a power failure or
   an unclean shutdown in general), so the extent allocator is behaving
   as expected;

4) We have a task running scrub, currently at scrub_enumerate_chunks().
   There it searches for device extent items in the device tree, using
   its commit root. It finds a device extent item that was used by
   block group Y, and it extracts the value Y from that item into the
   local variable 'chunk_offset', using btrfs_dev_extent_chunk_offset();

   It then calls btrfs_lookup_block_group() to find block group for
   the logical address Y - since there's currently no block group that
   starts at that logical address, it returns block group X, because
   its range contains Y.

   This results in triggering the assertion:

      ASSERT(cache->start == chunk_offset);

   right before calling scrub_chunk(), as cache->start is X and
   chunk_offset is Y.

This is more likely to happen of filesystems not larger than 50G, because
for these filesystems we use a 256M size for metadata block groups and
a 1G size for data block groups, while for filesystems larger than 50G,
we use a 1G size for both data and metadata block groups (except for
zoned filesystems). It could also happen on any filesystem size due to
the fact that system block groups are always smaller (32M) than both
data and metadata block groups, but these are not frequently deleted, so
much less likely to trigger the race.

So make scrub skip any block group with a start offset that is less than
the value we expect, as that means it's a new block group that was created
in the current transaction. It's pointless to continue and try to scrub
its extents, because scrub searches for extents using the commit root, so
it won't find any. For a device replace, skip it as well for the same
reasons, and we don't need to worry about the possibility of extents of
the new block group not being to the new device, because we have the write
duplication setup done through btrfs_map_block().

Fixes: d04fbe19 ("btrfs: scrub: cleanup the argument list of scrub_chunk()")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a692e13d

14 3月, 2022 1 次提交

btrfs: scrub: remove redundant initialization of increment · 5c07c53f

由 Jiapeng Chong 提交于 1月 21, 2022

increment is being initialized to map->stripe_len but this is never
read as increment is overwritten later on. Remove the redundant
initialization.

Cleans up the following clang-analyzer warning:

fs/btrfs/scrub.c:3193:6: warning: Value stored to 'increment' during its
initialization is never read [clang-analyzer-deadcode.DeadStores].
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5c07c53f

07 1月, 2022 8 次提交

btrfs: scrub: cleanup the argument list of scrub_stripe() · 2ae8ae3d

由 Qu Wenruo 提交于 12月 15, 2021

The argument list of btrfs_stripe() has similar problems of
scrub_chunk():

- Duplicated and ambiguous @base argument
  Can be fetched from btrfs_block_group::bg.

- Ambiguous argument @length
  It's again device extent length

- Ambiguous argument @num
  The instinctive guess would be mirror number, but in fact it's stripe
  index.

Fix it by:

- Remove @base parameter

- Rename @length to @dev_extent_len

- Rename @num to @stripe_index
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2ae8ae3d

btrfs: scrub: cleanup the argument list of scrub_chunk() · d04fbe19

由 Qu Wenruo 提交于 12月 15, 2021

The argument list of scrub_chunk() has the following problems:

- Duplicated @chunk_offset
  It is the same as btrfs_block_group::start.

- Confusing @length
  The most instinctive guess is chunk length, and one may want to delete
  it, but the truth is, it's the device extent length.

Fix this by:

- Remove @chunk_offset
  Use btrfs_block_group::start instead.

- Rename @length to @dev_extent_len
  Also rename the caller to remove the ambiguous naming.

- Rename @cache to @bg
  The "_cache" suffix for btrfs_block_group has been removed for a while.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d04fbe19

btrfs: remove reada infrastructure · f26c9238

由 Qu Wenruo 提交于 12月 14, 2021

Currently there is only one user for btrfs metadata readahead, and
that's scrub.

But even for the single user, it's not providing the correct
functionality it needs, as scrub needs reada for commit root, which
current readahead can't provide. (Although it's pretty easy to add such
feature).

Despite this, there are some extra problems related to metadata
readahead:

- Duplicated feature with btrfs_path::reada

- Partly duplicated feature of btrfs_fs_info::buffer_radix
  Btrfs already caches its metadata in buffer_radix, while readahead
  tries to read the tree block no matter if it's already cached.

- Poor layer separation
  Metadata readahead works kinda at device level.
  This is definitely not the correct layer it should be, since metadata
  is at btrfs logical address space, it should not bother device at all.

  This brings extra chance for bugs to sneak in, while brings
  unnecessary complexity.

- Dead code
  In the very beginning of scrub.c we have #undef DEBUG, rendering all
  the debug related code useless and unable to test.

Thus here I purpose to remove the metadata readahead mechanism
completely.

[BENCHMARK]
There is a full benchmark for the scrub performance difference using the
old btrfs_reada_add() and btrfs_path::reada.

For the worst case (no dirty metadata, slow HDD), there could be a 5%
performance drop for scrub.
For other cases (even SATA SSD), there is no distinguishable performance
difference.

The number is reported scrub speed, in MiB/s.
The resolution is limited by the reported duration, which only has a
resolution of 1 second.

	Old		New		Diff
SSD	455.3		466.332		+2.42%
HDD	103.927 	98.012		-5.69%

Comprehensive test methodology is in the cover letter of the patch.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f26c9238

btrfs: scrub: use btrfs_path::reada for extent tree readahead · dcf62b20

由 Qu Wenruo 提交于 12月 14, 2021

For scrub, we trigger two readaheads for two trees, extent tree to get
where to scrub, and csum tree to get the data checksum.

For csum tree we already trigger readahead in
btrfs_lookup_csums_range(), by setting path->reada.
But for extent tree we don't have any path based readahead.

Add the readahead for extent tree as well, so we can later remove the
btrfs_reada_add() based readahead.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

dcf62b20

btrfs: scrub: remove the unnecessary path parameter for scrub_raid56_parity() · 2522dbe8

由 Qu Wenruo 提交于 12月 14, 2021

In function scrub_stripe() we allocated two btrfs_path's, one @path for
extent tree search and another @ppath for full stripe extent tree search
for RAID56.

This is totally umncessary, as the @ppath usage is completely inside
scrub_raid56_parity(), thus we can move the path allocation into
scrub_raid56_parity() completely.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2522dbe8

btrfs: zoned: sink zone check into btrfs_repair_one_zone · 554aed7d

由 Johannes Thumshirn 提交于 12月 07, 2021

Sink zone check into btrfs_repair_one_zone() so we don't need to do it
in all callers.

Also as btrfs_repair_one_zone() doesn't return a sensible error, make it
a boolean function and return false in case it got called on a non-zoned
filesystem and true on a zoned filesystem.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

554aed7d

btrfs: scrub: merge SCRUB_PAGES_PER_RD_BIO and SCRUB_PAGES_PER_WR_BIO · c9d328c0

由 Qu Wenruo 提交于 12月 06, 2021

These two values were introduced in commit ff023aac ("Btrfs: add code
to scrub to copy read data to another disk") as an optimization.

But the truth is, block layer scheduler can do whatever it wants to
merge/split bios to improve performance.

Doing such "optimization" is not really going to affect much, especially
considering how good current block layer optimizations are doing.
Remove such old and immature optimization from our code.

Since we're here, also change BUG_ON()s using these two macros to use
ASSERT()s.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c9d328c0

btrfs: update SCRUB_MAX_PAGES_PER_BLOCK · 0bb3acdc

由 Qu Wenruo 提交于 12月 06, 2021

Use BTRFS_MAX_METADATA_BLOCKSIZE and SZ_4K (minimal sectorsize) to
calculate this value.

And remove one stale comment on the value, in fact with recent subpage
support, BTRFS_MAX_METADATA_BLOCKSIZE * PAGE_SIZE is already beyond
BTRFS_STRIPE_LEN, just we don't use the full page.

Also since we're here, update the BUG_ON() related to
SCRUB_MAX_PAGES_PER_BLOCK to ASSERT().

As those ASSERT() are really only for developers to catch early obvious
bugs, not to let end users suffer.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

0bb3acdc

03 1月, 2022 3 次提交

btrfs: stop accessing ->csum_root directly · fc28b25e

由 Josef Bacik 提交于 11月 05, 2021

We are going to have multiple csum roots in the future, so convert all
users of ->csum_root to btrfs_csum_root() and rename ->csum_root to
->_csum_root so we can easily find remaining users in the future.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

fc28b25e

btrfs: stop accessing ->extent_root directly · 29cbcf40

由 Josef Bacik 提交于 11月 05, 2021

When we start having multiple extent roots we'll need to use a helper to
get to the correct extent_root.  Rename fs_info->extent_root to
_extent_root and convert all of the users of the extent root to using
the btrfs_extent_root() helper.  This will allow us to easily clean up
the remaining direct accesses in the future.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

29cbcf40

btrfs: drop the _nr from the item helpers · 3212fa14

由 Josef Bacik 提交于 10月 21, 2021

Now that all call sites are using the slot number to modify item values,
rename the SETGET helpers to raw_item_*(), and then rework the _nr()
helpers to be the btrfs_item_*() btrfs_set_item_*() helpers, and then
rename all of the callers to the new helpers.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

3212fa14

16 11月, 2021 1 次提交

btrfs: make 1-bit bit-fields of scrub_page unsigned int · d08e38b6

由 Colin Ian King 提交于 11月 10, 2021

The bitfields have_csum and io_error are currently signed which is not
recommended as the representation is an implementation defined
behaviour. Fix this by making the bit-fields unsigned ints.

Fixes: 2c363954 ("btrfs: scrub: remove the anonymous structure from scrub_page")
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NColin Ian King <colin.i.king@gmail.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

d08e38b6

27 10月, 2021 6 次提交

btrfs: handle device lookup with btrfs_dev_lookup_args · 562d7b15

由 Josef Bacik 提交于 10月 05, 2021

We have a lot of device lookup functions that all do something slightly
different.  Clean this up by adding a struct to hold the different
lookup criteria, and then pass this around to btrfs_find_device() so it
can do the proper matching based on the lookup criteria.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

562d7b15

btrfs: add a BTRFS_FS_ERROR helper · 84961539

由 Josef Bacik 提交于 10月 05, 2021

We have a few flags that are inconsistently used to describe the fs in
different states of failure.  As of 5963ffca ("btrfs: always abort
the transaction if we abort a trans handle") we will always set
BTRFS_FS_STATE_ERROR if we abort, so we don't have to check both ABORTED
and ERROR to see if things have gone wrong.  Add a helper to check
BTRFS_FS_STATE_ERROR and then convert all checkers of FS_STATE_ERROR to
use the helper.

The TRANS_ABORTED bit check was added in af722733 ("Btrfs: clean up
resources during umount after trans is aborted") but is not actually
specific.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

84961539

btrfs: remove btrfs_raid_bio::fs_info member · 6a258d72

由 Qu Wenruo 提交于 9月 23, 2021

We can grab fs_info reliably from btrfs_raid_bio::bioc, as the bioc is
always passed into alloc_rbio(), and only get released when the raid bio
is released.

Remove btrfs_raid_bio::fs_info member, and cleanup all the @fs_info
parameters for alloc_rbio() callers.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6a258d72

btrfs: rename struct btrfs_io_bio to btrfs_bio · c3a3b19b

由 Qu Wenruo 提交于 9月 15, 2021

Previously we had "struct btrfs_bio", which records IO context for
mirrored IO and RAID56, and "strcut btrfs_io_bio", which records extra
btrfs specific info for logical bytenr bio.

With "btrfs_bio" renamed to "btrfs_io_context", we are safe to rename
"btrfs_io_bio" to "btrfs_bio" which is a more suitable name now.

The struct btrfs_bio changes meaning by this commit. There was a
suggested name like btrfs_logical_bio but it's a bit long and we'd
prefer to use a shorter name.

This could be a concern for backports to older kernels where the
different meaning could possibly cause confusion or bugs. Comparing the
new and old structures, there's no overlap among the struct members so a
build would break in case of incorrect backport.

We haven't had many backports to bio code anyway so this is more of a
theoretical cause of bugs and a matter of precaution but we'll need to
keep the semantic change in mind.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c3a3b19b

btrfs: remove btrfs_bio_alloc() helper · cd8e0cca

由 Qu Wenruo 提交于 9月 15, 2021

The helper btrfs_bio_alloc() is almost the same as btrfs_io_bio_alloc(),
except it's allocating using BIO_MAX_VECS as @nr_iovecs, and initializes
bio->bi_iter.bi_sector.

However the naming itself is not using "btrfs_io_bio" to indicate its
parameter is "strcut btrfs_io_bio" and can be easily confused with
"struct btrfs_bio".

Considering assigned bio->bi_iter.bi_sector is such a simple work and
there are already tons of call sites doing that manually, there is no
need to do that in a helper.

Remove btrfs_bio_alloc() helper, and enhance btrfs_io_bio_alloc()
function to provide a fail-safe value for its @nr_iovecs.

And then replace all btrfs_bio_alloc() callers with
btrfs_io_bio_alloc().
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

cd8e0cca

btrfs: rename btrfs_bio to btrfs_io_context · 4c664611

由 Qu Wenruo 提交于 9月 15, 2021

The structure btrfs_bio is used by two different sites:

- bio->bi_private for mirror based profiles
  For those profiles (SINGLE/DUP/RAID1*/RAID10), this structures records
  how many mirrors are still pending, and save the original endio
  function of the bio.

- RAID56 code
  In that case, RAID56 only utilize the stripes info, and no long uses
  that to trace the pending mirrors.

So btrfs_bio is not always bind to a bio, and contains more info for IO
context, thus renaming it will make the naming less confusing.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4c664611

openeuler / Kernel 11 个月 前同步成功

openeuler / Kernel
11 个月前同步成功