提交 · e14bfdb5a1f5fbb0a45f64a763c8da00637072d1 · openeuler / Kernel

14 3月, 2022 4 次提交

btrfs: remove the bio argument from finish_compressed_bio_read · e14bfdb5

由 Josef Bacik 提交于 2月 18, 2022

This bio is usually one of the compressed bio's, and we don't actually
need it in this function, so remove the argument and stop passing it
around.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e14bfdb5

btrfs: check correct bio in finish_compressed_bio_read · b0bbc8a3

由 Josef Bacik 提交于 2月 18, 2022

Commit c09abff8 ("btrfs: cloned bios must not be iterated by
bio_for_each_segment_all") added ASSERT()'s to make sure we weren't
calling bio_for_each_segment_all() on a RAID5/6 bio.  However it was
checking the bio that the compression code passed in, not the
cb->orig_bio that we actually iterate over, so adjust this ASSERT() to
check the correct bio.
Reviewed-by: NBoris Burkov <boris@bur.io>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b0bbc8a3

btrfs: add BTRFS_IOC_ENCODED_WRITE · 7c0c7269

由 Omar Sandoval 提交于 8月 13, 2019

The implementation resembles direct I/O: we have to flush any ordered
extents, invalidate the page cache, and do the io tree/delalloc/extent
map/ordered extent dance. From there, we can reuse the compression code
with a minor modification to distinguish the write from writeback. This
also creates inline extents when possible.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

7c0c7269

btrfs: don't advance offset for compressed bios in btrfs_csum_one_bio() · e331f6b1

由 Omar Sandoval 提交于 11月 06, 2019

btrfs_csum_one_bio() loops over each filesystem block in the bio while
keeping a cursor of its current logical position in the file in order to
look up the ordered extent to add the checksums to. However, this
doesn't make much sense for compressed extents, as a sector on disk does
not correspond to a sector of decompressed file data. It happens to work
because:

1) the compressed bio always covers one ordered extent
2) the size of the bio is always less than the size of the ordered
   extent

However, the second point will not always be true for encoded writes.

Let's add a boolean parameter to btrfs_csum_one_bio() to indicate that
it can assume that the bio only covers one ordered extent. Since we're
already changing the signature, let's get rid of the contig parameter
and make it implied by the offset parameter, similar to the change we
recently made to btrfs_lookup_bio_sums(). Additionally, let's rename
nr_sectors to blockcount to make it clear that it's the number of
filesystem blocks, not the number of 512-byte sectors.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e331f6b1

07 1月, 2022 1 次提交

btrfs: remove unnecessary parameter type from compression_decompress_bio · 4a9e803e

由 Su Yue 提交于 12月 27, 2021

btrfs_decompress_bio, the only caller of compression_decompress_bio gets
type from @CB and passes it to compression_decompress_bio.
However, compression_decompress_bio can get compression type directly
from @CB.

So remove the parameter and access it through @CB.  No functional
change.
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NSu Yue <l@damenly.su>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4a9e803e

03 1月, 2022 1 次提交

btrfs: set BTRFS_FS_STATE_NO_CSUMS if we fail to load the csum root · 056c8311

由 Josef Bacik 提交于 11月 05, 2021

We have a few places where we skip doing csums if we mounted with one of
the rescue options that ignores bad csum roots.  In the future when
there are multiple csum roots it'll be costly to check and see if there
are any missing csum roots, so simply add a flag to indicate the fs
should skip loading csums in case of errors.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

056c8311

27 10月, 2021 15 次提交

Revert "btrfs: compression: drop kmap/kunmap from generic helpers" · 3a60f653

由 David Sterba 提交于 10月 27, 2021

This reverts commit 4c2bf276.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839Signed-off-by: NDavid Sterba <dsterba@suse.com>

3a60f653

btrfs: subpage: make end_compressed_bio_writeback() compatible · 741ec653

由 Qu Wenruo 提交于 9月 27, 2021

In end_compressed_writeback() we just clear the full page writeback.
For subpage case, if there are two delalloc ranges in the same page, the
2nd range will trigger a BUG_ON() as the page writeback is already
cleared by previous range.

Fix it by using btrfs_page_clamp_clear_writeback() helper.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

741ec653

btrfs: subpage: make btrfs_submit_compressed_write() compatible · bbbff01a

由 Qu Wenruo 提交于 9月 27, 2021

There is a WARN_ON() checking if @start is aligned to PAGE_SIZE, not
sectorsize, which will cause false alert for subpage.  Fix it to check
against sectorsize.

Furthermore:

- Use ASSERT() to do the check
  So that in the future we may skip the check for production build

- Also check alignment for @len
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bbbff01a

btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write · 91507240

由 Qu Wenruo 提交于 9月 27, 2021

Currently btrfs_submit_compressed_write() will check
btrfs_bio_fits_in_stripe() each time a new page is going to be added.
Even if compressed extent is small, we don't really need to do that for
every page.

Align the behavior to extent_io.c, by determining the stripe boundary
when allocating a bio.

Unlike extent_io.c, in compressed.c we don't need to bother things like
different bio flags, thus no need to re-use bio_ctrl.

Here we just manually introduce new local variable, next_stripe_start,
and use that value returned from alloc_compressed_bio() to calculate
the stripe boundary.

Then each time we add some page range into the bio, we check if we
reached the boundary.  And if reached, submit it.

Also, since we have @cur_disk_bytenr to determine whether we're the last
bio, we don't need a explicit last_bio: tag for error handling any more.

And since we use @cur_disk_bytenr to wait, there is no need for
pending_bios, also remove it to save some memory of compressed_bio.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

91507240

btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_read · f472c28f

由 Qu Wenruo 提交于 9月 27, 2021

Currently btrfs_submit_compressed_read() will check
btrfs_bio_fits_in_stripe() each time a new page is going to be added.
Even if compressed extent is small, we don't really need to do that for
every page.

This patch will align the behavior to extent_io.c, by determining the
stripe boundary when allocating a bio.

Unlike extent_io.c, in compressed.c we don't need to bother things like
different bio flags, thus no need to re-use bio_ctrl.

Here we just manually introduce new local variable, next_stripe_start,
and teach alloc_compressed_bio() to calculate the stripe boundary.

Then each time we add some page range into the bio, we check if we
reached the boundary.  And if reached, submit it.

Also, since we have @cur_disk_byte to determine whether we're the last
bio, we don't need a explicit last_bio: tag for error handling any more.

And we can use @cur_disk_byte to track which range has been added to
bio, we can also use @cur_disk_byte to calculate the wait condition, no
need for @pending_bios.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

f472c28f

btrfs: introduce alloc_compressed_bio() for compression · 22c306fe

由 Qu Wenruo 提交于 9月 27, 2021

Just aggregate the bio allocation code into one helper, so that we can
replace 4 call sites.

There is one special note for zoned write.

Currently btrfs_submit_compressed_write() will only allocate the first
bio using ZONE_APPEND.  If we have to submit current bio due to stripe
boundary, the new bio allocated will not use ZONE_APPEND.

In theory this should be a bug, but considering zoned mode currently
only support SINGLE profile, which doesn't have any stripe boundary
limit, it should never be a problem and we have assertions in place.

This function will provide a good entrance for any work which needs to
be done at bio allocation time. Like determining the stripe boundary.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

22c306fe

btrfs: introduce submit_compressed_bio() for compression · 2d4e0b84

由 Qu Wenruo 提交于 9月 27, 2021

The new helper, submit_compressed_bio(), will aggregate the following
work:

- Increase compressed_bio::pending_bios
- Remap the endio function
- Map and submit the bio

This slightly reorders calls to btrfs_csum_one_bio or
btrfs_lookup_bio_sums but but none of them does anything regarding IO
submission so this is effectively no change. We mainly care about order
of

- atomic_inc
- btrfs_bio_wq_end_io
- btrfs_map_bio
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

2d4e0b84

btrfs: handle errors properly inside btrfs_submit_compressed_write() · 6853c64a

由 Qu Wenruo 提交于 9月 27, 2021

Just like btrfs_submit_compressed_read(), there are quite some BUG_ON()s
inside btrfs_submit_compressed_write() for the bio submission path.

Fix them using the same method:

- For last bio, just endio the bio
  As in that case, one of the endio function of all these submitted bio
  will be able to free the compressed_bio

- For half-submitted bio, wait and finish the compressed_bio manually
  In this case, as long as all other bio finish, we're the only one
  referring the compressed bio, and can manually finish it.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6853c64a

btrfs: handle errors properly inside btrfs_submit_compressed_read() · 86ccbb4d

由 Qu Wenruo 提交于 9月 27, 2021

There are quite some BUG_ON()s inside btrfs_submit_compressed_read(),
namely all errors inside the for() loop relies on BUG_ON() to handle
-ENOMEM.

Handle these errors properly by:

- Wait for submitted bios to finish first
  Using wake_var_event() APIs to wait without introducing extra memory
  overhead inside compressed_bio.
  This allows us to wait for any submitted bio to finish, while still
  keeps the compressed_bio from being freed.

- Introduce finish_compressed_bio_read() to finish the compressed_bio

- Properly end the bio and finish compressed_bio when error happens

Now in btrfs_submit_compressed_read() even when the bio submission
failed, we can properly handle the error without triggering BUG_ON().
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

86ccbb4d

btrfs: subpage: add bitmap for PageChecked flag · e4f94347

由 Qu Wenruo 提交于 9月 27, 2021

Although in btrfs we have very limited usage of PageChecked flag, it's
still some page flag not yet subpage compatible.

Fix it by introducing btrfs_subpage::checked_offset to do the convert.

For most call sites, especially for free-space cache, COW fixup and
btrfs_invalidatepage(), they all work in full page mode anyway.

For other call sites, they work as subpage compatible mode.

Some call sites need extra modification:

- btrfs_drop_pages()
  Needs extra parameter to get the real range we need to clear checked
  flag.

  Also since btrfs_drop_pages() will accept pages beyond the dirtied
  range, update btrfs_subpage_clamp_range() to handle such case
  by setting @len to 0 if the page is beyond target range.

- btrfs_invalidatepage()
  We need to call subpage helper before calling __btrfs_releasepage(),
  or it will trigger ASSERT() as page->private will be cleared.

- btrfs_verify_data_csum()
  In theory we don't need the io_bio->csum check anymore, but it's
  won't hurt.  Just change the comment.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e4f94347

btrfs: introduce compressed_bio::pending_sectors to trace compressed bio · 6ec9765d

由 Qu Wenruo 提交于 9月 27, 2021

For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(),
we have a pretty weird dance around compressed_bio::pending_bios:

  btrfs_submit_compressed_read/write()
  {
	cb = kmalloc()
	refcount_set(&cb->pending_bios, 0);
	bio = btrfs_alloc_bio();

	/* NOTE here, we haven't yet submitted any bio */
	refcount_set(&cb->pending_bios, 1);

	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
		if (submit) {
			/* Here we submit bio, but we always have one
			 * extra pending_bios */
			refcount_inc(&cb->pending_bios);
			ret = btrfs_map_bio();
		}
	}

	/* Submit the last bio */
	ret = btrfs_map_bio();
  }

There are two reasons why we do this:

- compressed_bio::pending_bios is a refcount
  Thus if it's reduced to 0, it can not be increased again.

- To ensure the compressed_bio is not freed by some submitted bios
  If the submitted bio is finished before the next bio submitted,
  we can free the compressed_bio completely.

But the above code is sometimes confusing, and we can do it better by
introducing a new member, compressed_bio::pending_sectors.

Now we use compressed_bio::pending_sectors to indicate whether we have
any pending sectors under IO or not yet submitted.

If pending_sectors == 0, we're definitely the last bio of compressed_bio,
and is OK to release the compressed bio.

Now the workflow looks like this:

  btrfs_submit_compressed_read/write()
  {
	cb = kmalloc()
	atomic_set(&cb->pending_bios, 0);
	refcount_set(&cb->pending_sectors,
		     compressed_len >> sectorsize_bits);
	bio = btrfs_alloc_bio();

	for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
		if (submit) {
			refcount_inc(&cb->pending_bios);
			ret = btrfs_map_bio();
		}
	}

	/* Submit the last bio */
	refcount_inc(&cb->pending_bios);
	ret = btrfs_map_bio();
  }

For now we still need pending_bios for later error handling, but will
remove pending_bios eventually after properly handling the errors.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6ec9765d

btrfs: subpage: make add_ra_bio_pages() compatible · 6a404910

由 Qu Wenruo 提交于 9月 27, 2021

[BUG]
If we remove the subpage limitation in add_ra_bio_pages(), then read a
compressed extent which has part of its range in next page, like the
following inode layout:

	0	32K	64K	96K	128K
	|<--------------|-------------->|

Btrfs will trigger ASSERT() in endio function:

  assertion failed: atomic_read(&subpage->readers) >= nbits
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3431!
  Internal error: Oops - BUG: 0 [#1] SMP
  Workqueue: btrfs-endio btrfs_work_helper [btrfs]
  Call trace:
   assertfail.constprop.0+0x28/0x2c [btrfs]
   btrfs_subpage_end_reader+0x148/0x14c [btrfs]
   end_page_read+0x8c/0x100 [btrfs]
   end_bio_extent_readpage+0x320/0x6b0 [btrfs]
   bio_endio+0x15c/0x1dc
   end_workqueue_fn+0x44/0x64 [btrfs]
   btrfs_work_helper+0x74/0x250 [btrfs]
   process_one_work+0x1d4/0x47c
   worker_thread+0x180/0x400
   kthread+0x11c/0x120
   ret_from_fork+0x10/0x30
  ---[ end trace c8b7b552d3bb408c ]---

[CAUSE]
When we read the page range [0, 64K), we find it's a compressed extent,
and we will try to add extra pages in add_ra_bio_pages() to avoid
reading the same compressed extent.

But when we add such page into the read bio, it doesn't follow the
behavior of btrfs_do_readpage() to properly set subpage::readers.

This means, for page [64K, 128K), its subpage::readers is still 0.

And when endio is executed on both pages, since page [64K, 128K) has 0
subpage::readers, it triggers above ASSERT()

[FIX]
Function add_ra_bio_pages() is far from subpage compatible, it always
assume PAGE_SIZE == sectorsize, thus when it skip to next range it
always just skip PAGE_SIZE.

Make it subpage compatible by:

- Skip to next page properly when needed
  If we find there is already a page cache, we need to skip to next page.
  For that case, we shouldn't just skip PAGE_SIZE bytes, but use
  @pg_index to calculate the next bytenr and continue.

- Only add the page range covered by current extent map
  We need to calculate which range is covered by current extent map and
  only add that part into the read bio.

- Update subpage::readers before submitting the bio

- Use proper cursor other than confusing @last_offset

- Calculate the missed threshold based on sector size
  It's no longer using missed pages, as for 64K page size, we have at
  most 3 pages to skip. (If aligned only 2 pages)

- Add ASSERT() to make sure our bytenr is always aligned

- Add comment for the function
  Add a special note for subpage case, as the function won't really
  work well for subpage cases.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6a404910

btrfs: remove unused parameter nr_pages in add_ra_bio_pages() · cd9255be

由 Qu Wenruo 提交于 9月 27, 2021

Variable @nr_pages only gets increased but never used.  Remove it.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

cd9255be

btrfs: rename struct btrfs_io_bio to btrfs_bio · c3a3b19b

由 Qu Wenruo 提交于 9月 15, 2021

Previously we had "struct btrfs_bio", which records IO context for
mirrored IO and RAID56, and "strcut btrfs_io_bio", which records extra
btrfs specific info for logical bytenr bio.

With "btrfs_bio" renamed to "btrfs_io_context", we are safe to rename
"btrfs_io_bio" to "btrfs_bio" which is a more suitable name now.

The struct btrfs_bio changes meaning by this commit. There was a
suggested name like btrfs_logical_bio but it's a bit long and we'd
prefer to use a shorter name.

This could be a concern for backports to older kernels where the
different meaning could possibly cause confusion or bugs. Comparing the
new and old structures, there's no overlap among the struct members so a
build would break in case of incorrect backport.

We haven't had many backports to bio code anyway so this is more of a
theoretical cause of bugs and a matter of precaution but we'll need to
keep the semantic change in mind.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c3a3b19b

btrfs: remove btrfs_bio_alloc() helper · cd8e0cca

由 Qu Wenruo 提交于 9月 15, 2021

The helper btrfs_bio_alloc() is almost the same as btrfs_io_bio_alloc(),
except it's allocating using BIO_MAX_VECS as @nr_iovecs, and initializes
bio->bi_iter.bi_sector.

However the naming itself is not using "btrfs_io_bio" to indicate its
parameter is "strcut btrfs_io_bio" and can be easily confused with
"struct btrfs_bio".

Considering assigned bio->bi_iter.bi_sector is such a simple work and
there are already tons of call sites doing that manually, there is no
need to do that in a helper.

Remove btrfs_bio_alloc() helper, and enhance btrfs_io_bio_alloc()
function to provide a fail-safe value for its @nr_iovecs.

And then replace all btrfs_bio_alloc() callers with
btrfs_io_bio_alloc().
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

cd8e0cca

18 10月, 2021 1 次提交

mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h> · e41d12f5

由 Christoph Hellwig 提交于 9月 20, 2021

There is no need to pull blk-cgroup.h and thus blkdev.h in here, so
break the include chain.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

e41d12f5

23 8月, 2021 5 次提交

btrfs: rework btrfs_decompress_buf2page() · 1c3dc173

由 Qu Wenruo 提交于 7月 05, 2021

There are several bugs inside the function btrfs_decompress_buf2page()

- @start_byte doesn't take bvec.bv_offset into consideration
  Thus it can't handle case where the target range is not page aligned.

- Too many helper variables
  There are tons of helper variables, @buf_offset, @current_buf_start,
  @start_byte, @prev_start_byte, @working_bytes, @bytes.
  This hurts anyone who wants to read the function.

- No obvious main cursor for the iteartion
  A new problem caused by previous problem.

- Comments for parameter list makes no sense
  Like @buf_start is the offset to @buf, or offset inside the full
  decompressed extent? (Spoiler alert, the later case)
  And @total_out acts more like @buf_start + @size_of_buf.

  The worst is @disk_start.
  The real meaning of it is the file offset of the full decompressed
  extent.

This patch will rework the whole function by:

- Add a proper comment with ASCII art to explain the parameter list

- Rework parameter list
  The old @buf_start is renamed to @decompressed, to show how many bytes
  are already decompressed inside the full decompressed extent.
  The old @total_out is replaced by @buf_len, which is the decompressed
  data size.
  For old @disk_start and @bio, just pass @compressed_bio in.

- Use single main cursor
  The main cursor will be @cur_file_offset, to show what's the current
  file offset.
  Other helper variables will be declared inside the main loop, and only
  minimal amount of helper variables:
  * offset_inside_decompressed_buf:	The only real helper
  * copy_start_file_offset:		File offset we start memcpy
  * bvec_file_offset:			File offset of current bvec

Even with all these extensive comments, the final function is still
smaller than the original function, which is definitely a win.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

1c3dc173

btrfs: grab correct extent map for subpage compressed extent read · 557023ea

由 Qu Wenruo 提交于 7月 05, 2021

[BUG]
When subpage compressed read write support is enabled, btrfs/038 always
fails with EIO.

A simplified script can easily trigger the problem:

  mkfs.btrfs -f -s 4k $dev
  mount $dev $mnt -o compress=lzo

  xfs_io -f -c "truncate 118811" $mnt/foo
  xfs_io -c "pwrite -S 0x0d -b 39987 92267 39987" $mnt/foo > /dev/null

  sync
  btrfs subvolume snapshot -r $mnt $mnt/mysnap1

  xfs_io -c "pwrite -S 0x3e -b 80000 200000 80000" $mnt/foo > /dev/null
  sync

  xfs_io -c "pwrite -S 0xdc -b 10000 250000 10000" $mnt/foo > /dev/null
  xfs_io -c "pwrite -S 0xff -b 10000 300000 10000" $mnt/foo > /dev/null

  sync
  btrfs subvolume snapshot -r $mnt $mnt/mysnap2

  cat $mnt/mysnap2/foo
  # Above cat will fail due to EIO

[CAUSE]
The problem is in btrfs_submit_compressed_read().

When it tries to grab the extent map of the read range, it uses the
following call:

	em = lookup_extent_mapping(em_tree,
  				   page_offset(bio_first_page_all(bio)),
				   fs_info->sectorsize);

The problem is in the page_offset(bio_first_page_all(bio)) part.

The offending inode has the following file extent layout

        item 10 key (257 EXTENT_DATA 131072) itemoff 15639 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13680640 nr 4096
                extent data offset 0 nr 4096 ram 4096
                extent compression 0 (none)
        item 11 key (257 EXTENT_DATA 135168) itemoff 15586 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 0 nr 0
        item 12 key (257 EXTENT_DATA 196608) itemoff 15533 itemsize 53
                generation 8 type 1 (regular)
                extent data disk byte 13676544 nr 4096
                extent data offset 0 nr 53248 ram 86016
                extent compression 2 (lzo)

And the bio passed in has the following parameters:

page_offset(bio_first_page_all(bio))	= 131072
bio_first_bvec_all(bio)->bv_offset	= 65536

If we use page_offset(bio_first_page_all(bio) without adding bv_offset,
we will get an extent map for file offset 131072, not 196608.

This means we read uncompressed data from disk, and later decompression
will definitely fail.

[FIX]
Take bv_offset into consideration when trying to grab an extent map.

And add an ASSERT() to ensure we're really getting a compressed extent.

Thankfully this won't affect anything but subpage, thus we only need to
ensure this patch get merged before we enabled basic subpage support.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

557023ea

btrfs: disable compressed readahead for subpage · ca62e85d

由 Qu Wenruo 提交于 7月 26, 2021

For current subpage support, we only support 64K page size with 4K
sector size.

This makes compressed readahead less effective, as maximum compressed
extent size is only 128K, 2x the page size.

On the other hand, the function add_ra_bio_pages() is still assuming
sectorsize == PAGE_SIZE, and code change may affect 4K page size
systems.

So for now, let's disable subpage compressed readahead for now.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ca62e85d

btrfs: compression: drop kmap/kunmap from generic helpers · 4c2bf276

由 David Sterba 提交于 6月 15, 2021

The pages in compressed_pages are not from highmem anymore so we can
drop the mapping for checksum calculation and inline extent.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4c2bf276

btrfs: drop from __GFP_HIGHMEM all allocations · b0ee5e1e

由 David Sterba 提交于 6月 14, 2021

The highmem flag is used for allocating pages for compression and for
raid56 pages. The high memory makes sense on 32bit systems but is not
without problems. On 64bit system's it's just another layer of wrappers.

The time the pages are allocated for compression or raid56 is relatively
short (about a transaction commit), so the pages are not blocked
indefinitely. As the number of pages depends on the amount of data being
written/read, there's a theoretical problem. A fast device on a 32bit
system could use most of the low memory pool, while with the highmem
allocation that would not happen. This was possibly the original idea
long time ago, but nowadays we optimize for 64bit systems.

This patch removes all usage of the __GFP_HIGHMEM flag for page
allocation, the kmap/kunmap are still in place and will be removed in
followup patches. Remaining is masking out the bit in
alloc_extent_state and __lookup_free_space_inode, that can safely stay.
Signed-off-by: NDavid Sterba <dsterba@suse.com>

b0ee5e1e

29 7月, 2021 1 次提交

btrfs: mark compressed range uptodate only if all bio succeed · 240246f6

由 Goldwyn Rodrigues 提交于 7月 09, 2021

In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.

Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.

Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

240246f6

22 6月, 2021 1 次提交

btrfs: remove a stale comment for btrfs_decompress_bio() · c86bdc9b

由 Qu Wenruo 提交于 6月 11, 2021

Since commit 8140dc30 ("btrfs: btrfs_decompress_bio() could accept
compressed_bio instead"), btrfs_decompress_bio() accepts
"struct compressed_bio" other than open-coded parameter list.

Thus the comments for the parameter list is no longer needed.
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c86bdc9b

21 6月, 2021 7 次提交

btrfs: pass btrfs_inode to btrfs_writepage_endio_finish_ordered() · 38a39ac7

由 Qu Wenruo 提交于 4月 08, 2021

There is a pretty bad abuse of btrfs_writepage_endio_finish_ordered() in
end_compressed_bio_write().

It passes compressed pages to btrfs_writepage_endio_finish_ordered(),
which is only supposed to accept inode pages.

Thankfully the important info here is the inode, so let's pass
btrfs_inode directly into btrfs_writepage_endio_finish_ordered(), and
make @page parameter optional.

By this, end_compressed_bio_write() can happily pass page=NULL while
still getting everything done properly.

Also, to cooperate with such modification, replace @page parameter for
trace_btrfs_writepage_end_io_hook() with btrfs_inode.
Although this removes page_index info, the existing start/len should be
enough for most usage.
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

38a39ac7

btrfs: fix comment about max_out in btrfs_compress_pages · 4183abf6

由 Anand Jain 提交于 5月 29, 2021

Commit e5d74902 ("btrfs: derive maximum output size in the
compression implementation") removed @max_out argument in
btrfs_compress_pages() but its comment remained, remove it.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4183abf6

btrfs: optimize variables size in btrfs_submit_compressed_write · 65b5355f

由 Anand Jain 提交于 5月 29, 2021

Patch "btrfs: reduce compressed_bio member's types" reduced some
member's size. Function arguments @len, @compressed_len and @nr_pages
can be declared as unsigned int.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

65b5355f

btrfs: optimize variables size in btrfs_submit_compressed_read · 356b4a2d

由 Anand Jain 提交于 5月 29, 2021

Patch "btrfs: reduce compressed_bio member's types" reduced some
member's size. Declare the variables @compressed_len, @nr_pages and
@pg_index size as an unsigned int in the function
btrfs_submit_compressed_read.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

356b4a2d

btrfs: reduce the variable size to fit nr_pages · 1d08ce58

由 Anand Jain 提交于 5月 29, 2021

Patch "btrfs: reduce compressed_bio member's types" reduced the
@nr_pages size to unsigned int, its cascading effects are updated here.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

1d08ce58

btrfs: reduce compressed_bio members' types · 282ab3ff

由 David Sterba 提交于 10月 14, 2019

Several members of compressed_bio are of type that's unnecessarily big
for the values that they'd hold:

- the size of the uncompressed and compressed data is 128K now, we can
  keep is as int
- same for number of pages
- the compress type fits to a byte
- the errors is 0/1

The size of the unpatched structure is 80 bytes with several holes.
Reordering nr_pages next to the pages the hole after pending_bios is
filled and the resulting size is 56 bytes. This keeps the csums array
aligned to 8 bytes, which is nice. Further size optimizations may be
possible but right now it looks good to me:

struct compressed_bio {
        refcount_t                 pending_bios;         /*     0     4 */
        unsigned int               nr_pages;             /*     4     4 */
        struct page * *            compressed_pages;     /*     8     8 */
        struct inode *             inode;                /*    16     8 */
        u64                        start;                /*    24     8 */
        unsigned int               len;                  /*    32     4 */
        unsigned int               compressed_len;       /*    36     4 */
        u8                         compress_type;        /*    40     1 */
        u8                         errors;               /*    41     1 */

        /* XXX 2 bytes hole, try to pack */

        int                        mirror_num;           /*    44     4 */
        struct bio *               orig_bio;             /*    48     8 */
        u8                         sums[];               /*    56     0 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 54, holes: 1, sum holes: 2 */
        /* last cacheline: 56 bytes */
};
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

282ab3ff

btrfs: zoned: factor out zoned device lookup · e7ff9e6b

由 Johannes Thumshirn 提交于 5月 19, 2021

To be able to construct a zone append bio we need to look up the
btrfs_device. The code doing the chunk map lookup to get the device is
present in btrfs_submit_compressed_write and submit_extent_page.

Factor out the lookup calls into a helper and use it in the submission
paths.
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

e7ff9e6b

28 5月, 2021 1 次提交

btrfs: fix compressed writes that cross stripe boundary · 4c80a97d

由 Qu Wenruo 提交于 5月 25, 2021

[BUG]
When running btrfs/027 with "-o compress" mount option, it always
crashes with the following call trace:

  BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/volumes.c:6651!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G           OE     5.13.0-rc2-custom+ #26
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs]
  Call Trace:
   btrfs_submit_compressed_write+0x2d7/0x470 [btrfs]
   submit_compressed_extents+0x3b0/0x470 [btrfs]
   ? mark_held_locks+0x49/0x70
   btrfs_work_helper+0x131/0x3e0 [btrfs]
   process_one_work+0x28f/0x5d0
   worker_thread+0x55/0x3c0
   ? process_one_work+0x5d0/0x5d0
   kthread+0x141/0x160
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  ---[ end trace 63113a3a91f34e68 ]---

[CAUSE]
The critical message before the crash means we have a bio at logical
bytenr 298901504 length 12288, but only 8192 bytes can fit into one
stripe, the remaining 4096 bytes go to another stripe.

In btrfs, all bios are properly split to avoid cross stripe boundary,
but commit 764c7c9a ("btrfs: zoned: fix parallel compressed writes")
changed the behavior for compressed writes.

Previously if we find our new page can't be fitted into current stripe,
ie. "submit == 1" case, we submit current bio without adding current
page.

       submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

   page->mapping = NULL;
   if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
       PAGE_SIZE) {

But after the modification, we will add the page no matter if it crosses
stripe boundary, leading to the above crash.

       submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0);

   if (pg_index == 0 && use_append)
           len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0);
   else
           len = bio_add_page(bio, page, PAGE_SIZE, 0);

   page->mapping = NULL;
   if (submit || len < PAGE_SIZE) {

[FIX]
It's no longer possible to revert to the original code style as we have
two different bio_add_*_page() calls now.

The new fix is to skip the bio_add_*_page() call if @submit is true.

Also to avoid @len to be uninitialized, always initialize it to zero.

If @submit is true, @len will not be checked.
If @submit is not true, @len will be the return value of
bio_add_*_page() call.
Either way, the behavior is still the same as the old code.
Reported-by: NJosef Bacik <josef@toxicpanda.com>
Fixes: 764c7c9a ("btrfs: zoned: fix parallel compressed writes")
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4c80a97d

20 5月, 2021 1 次提交

btrfs: zoned: fix parallel compressed writes · 764c7c9a

由 Johannes Thumshirn 提交于 5月 19, 2021

When multiple processes write data to the same block group on a
compressed zoned filesystem, the underlying device could report I/O
errors and data corruption is possible.

This happens because on a zoned file system, compressed data writes
where sent to the device via a REQ_OP_WRITE instead of a
REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel
submission it cannot be guaranteed that the data is always submitted
aligned to the underlying zone's write pointer.

The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a
zoned filesystem is non intrusive on a regular file system or when
submitting to a conventional zone on a zoned filesystem, as it is
guarded by btrfs_use_zone_append.
Reported-by: NDavid Sterba <dsterba@suse.com>
Fixes: 9d294a68 ("btrfs: zoned: enable to mount ZONED incompat flag")
CC: stable@vger.kernel.org # 5.12.x: e380adfc: btrfs: zoned: pass start block to btrfs_use_zone_append
CC: stable@vger.kernel.org # 5.12.x
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

764c7c9a

06 5月, 2021 1 次提交

btrfs: use memzero_page() instead of open coded kmap pattern · d048b9c2

由 Ira Weiny 提交于 5月 04, 2021

There are many places where kmap/memset/kunmap patterns occur.

Use the newly lifted memzero_page() to eliminate direct uses of kmap and
leverage the new core functions use of kmap_local_page().

The development of this patch was aided by the following coccinelle
script:

// <smpl>
// SPDX-License-Identifier: GPL-2.0-only
// Find kmap/memset/kunmap pattern and replace with memset*page calls
//
// NOTE: Offsets and other expressions may be more complex than what the script
// will automatically generate.  Therefore a catchall rule is provided to find
// the pattern which then must be evaluated by hand.
//
// Confidence: Low
// Copyright: (C) 2021 Intel Corporation
// URL: http://coccinelle.lip6.fr/
// Comments:
// Options:

//
// Then the memset pattern
//
@ memset_rule1 @
expression page, V, L, Off;
identifier ptr;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
-memset(ptr, 0, L);
+memzero_page(page, 0, L);
|
-memset(ptr + Off, 0, L);
+memzero_page(page, Off, L);
|
-memset(ptr, V, L);
+memset_page(page, V, 0, L);
|
-memset(ptr + Off, V, L);
+memset_page(page, V, Off, L);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memset_rule1
@
identifier memset_rule1.ptr;
type VP, VP1;
@@

-VP ptr;
	... when != ptr;
? VP1 ptr;

//
// Catch all
//
@ memset_rule2 @
expression page;
identifier ptr;
expression GenTo, GenSize, GenValue;
type VP;
@@

(
-VP ptr = kmap(page);
|
-ptr = kmap(page);
|
-VP ptr = kmap_atomic(page);
|
-ptr = kmap_atomic(page);
)
<+...
(
//
// Some call sites have complex expressions within the memset/memcpy
// The follow are catch alls which need to be evaluated by hand.
//
-memset(GenTo, 0, GenSize);
+memzero_pageExtra(page, GenTo, GenSize);
|
-memset(GenTo, GenValue, GenSize);
+memset_pageExtra(page, GenValue, GenTo, GenSize);
)
...+>
(
-kunmap(page);
|
-kunmap_atomic(ptr);
)

// Remove any pointers left unused
@
depends on memset_rule2
@
identifier memset_rule2.ptr;
type VP, VP1;
@@

-VP ptr;
	... when != ptr;
? VP1 ptr;

// </smpl>

Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d048b9c2

20 4月, 2021 1 次提交

btrfs: handle remount to no compress during compression · 1d8ba9e7

由 Qu Wenruo 提交于 8月 04, 2020

[BUG]
When running btrfs/071 with inode_need_compress() removed from
compress_file_range(), we got the following crash:

  BUG: kernel NULL pointer dereference, address: 0000000000000018
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
  RIP: 0010:compress_file_range+0x476/0x7b0 [btrfs]
  Call Trace:
   ? submit_compressed_extents+0x450/0x450 [btrfs]
   async_cow_start+0x16/0x40 [btrfs]
   btrfs_work_helper+0xf2/0x3e0 [btrfs]
   process_one_work+0x278/0x5e0
   worker_thread+0x55/0x400
   ? process_one_work+0x5e0/0x5e0
   kthread+0x168/0x190
   ? kthread_create_worker_on_cpu+0x70/0x70
   ret_from_fork+0x22/0x30
  ---[ end trace 65faf4eae941fa7d ]---

This is already after the patch "btrfs: inode: fix NULL pointer
dereference if inode doesn't need compression."

[CAUSE]
@pages is firstly created by kcalloc() in compress_file_extent():
                pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);

Then passed to btrfs_compress_pages() to be utilized there:

                ret = btrfs_compress_pages(...
                                           pages,
                                           &nr_pages,
                                           ...);

btrfs_compress_pages() will initialize each page as output, in
zlib_compress_pages() we have:

                        pages[nr_pages] = out_page;
                        nr_pages++;

Normally this is completely fine, but there is a special case which
is in btrfs_compress_pages() itself:

        switch (type) {
        default:
                return -E2BIG;
        }

In this case, we didn't modify @pages nor @out_pages, leaving them
untouched, then when we cleanup pages, the we can hit NULL pointer
dereference again:

        if (pages) {
                for (i = 0; i < nr_pages; i++) {
                        WARN_ON(pages[i]->mapping);
                        put_page(pages[i]);
                }
        ...
        }

Since pages[i] are all initialized to zero, and btrfs_compress_pages()
doesn't change them at all, accessing pages[i]->mapping would lead to
NULL pointer dereference.

This is not possible for current kernel, as we check
inode_need_compress() before doing pages allocation.
But if we're going to remove that inode_need_compress() in
compress_file_extent(), then it's going to be a problem.

[FIX]
When btrfs_compress_pages() hits its default case, modify @out_pages to
0 to prevent such problem from happening.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212331
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

1d8ba9e7

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功