1. 31 Mar 2018 (1 commit)
    • Btrfs: scrub: batch rebuild for raid56 · 6ca1765b
      Committed by Liu Bo
      For raid56, writes and rebuilds always use BTRFS_STRIPE_LEN (64K) as
      the unit, but scrub_extent() works in units of blocksize, so the
      rebuild process may be triggered for every block of the same stripe.
      
      A typical example: when we're replacing a disk that has disappeared,
      all reads from that disk get -EIO, and every block (4K, if blocksize
      is 4K) goes through the following path:
      
      scrub_handle_errored_block
        scrub_recheck_block # re-read pages one by one
        scrub_recheck_block # rebuild by calling raid56_parity_recover()
                              page by page
      
      Although most reads during rebuild can be avoided thanks to the raid56
      stripe cache, the parity recovery calculation (xor or the raid6
      algorithms) still has to be done (BTRFS_STRIPE_LEN / blocksize) times,
      i.e. 16 times for a 4K blocksize.
      
      Make this smarter by doing raid56 scrub/replace at stripe-length
      granularity (see the sketch after this entry).
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
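      For scale, a minimal sketch of the unit arithmetic. The constant value
      mirrors the kernel's BTRFS_STRIPE_LEN, but the helper below is
      illustrative and not taken from the patch:

              #define BTRFS_STRIPE_LEN (64 * 1024)    /* mirrors the kernel's SZ_64K */

              /*
               * Block-by-block rebuild runs the parity computation once per
               * block, i.e. BTRFS_STRIPE_LEN / blocksize times per stripe
               * (16 times for 4K blocks); batching at stripe length runs it
               * once.
               */
              static inline unsigned int rebuild_ops_per_stripe(unsigned int blocksize,
                                                                int batch_by_stripe)
              {
                      return batch_by_stripe ? 1 : BTRFS_STRIPE_LEN / blocksize;
              }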
  2. 26 Mar 2018 (3 commits)
  3. 22 Jan 2018 (8 commits)
  4. 02 Nov 2017 (1 commit)
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Committed by Zygo Blaxell
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not otherwise
      encoded),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M
      extent), the data we are interested in is collected in the kernel,
      then discarded by the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
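      As a hedged illustration of the intended endpoint: the sketch below
      assumes the LOGICAL_INO_V2 ioctl and its
      BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET flag, which this patch only
      prepares for and which land in a follow-up, so treat the interface
      details as assumptions rather than part of this commit:

              #include <stdint.h>
              #include <sys/ioctl.h>
              #include <linux/btrfs.h>

              /*
               * One call replaces the per-block loop above: ask for every
               * ref to the extent containing 'logical', ignoring the
               * extent offset.
               */
              static int logical_ino_all_refs(int fd, __u64 logical,
                                              struct btrfs_data_container *inodes,
                                              __u64 inodes_size)
              {
                      struct btrfs_logical_ino_args args = {
                              .logical = logical,
                              .size    = inodes_size,
                              .flags   = BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET,
                              .inodes  = (__u64)(uintptr_t)inodes,
                      };

                      return ioctl(fd, BTRFS_IOC_LOGICAL_INO_V2, &args);
              }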
  5. 30 Oct 2017 (1 commit)
  6. 24 Aug 2017 (1 commit)
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Committed by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
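      In practice the conversion is mechanical; a minimal sketch, assuming
      the bio_set_dev() helper that this change introduces (the surrounding
      function is illustrative, not from the patch):

              #include <linux/bio.h>
              #include <linux/blkdev.h>

              /*
               * Instead of storing the block_device in bio->bi_bdev,
               * record the gendisk and the partition index on the bio.
               */
              static void start_io(struct bio *bio, struct block_device *bdev,
                                   sector_t sector)
              {
                      bio_set_dev(bio, bdev); /* sets bi_disk and bi_partno */
                      bio->bi_iter.bi_sector = sector;
                      submit_bio(bio);
              }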
  7. 21 Aug 2017 (4 commits)
  8. 18 Aug 2017 (1 commit)
  9. 16 Aug 2017 (2 commits)
  10. 30 Jun 2017 (2 commits)
  11. 20 Jun 2017 (9 commits)
  12. 09 Jun 2017 (1 commit)
  13. 18 Apr 2017 (6 commits)
    • btrfs: scrub: Fix RAID56 recovery race condition · 28d70e23
      Committed by Qu Wenruo
      When scrubbing a RAID5 filesystem with recoverable data corruption
      (only one data stripe corrupted), scrub sometimes reports more csum
      errors than expected, and sometimes even reports an unrecoverable
      error.
      
      The problem can be easily reproduced by the following steps:
      1) Create a btrfs with RAID5 data profile with 3 devs
      2) Mount it with nospace_cache or space_cache=v2
         To avoid extra data space usage.
      3) Create a 128K file and sync the fs, unmount it
         Now the 128K file lies at the beginning of the data chunk
      4) Locate the physical bytenr of data chunk on dev3
         Dev3 is the 1st data stripe.
      5) Corrupt the first 64K of the data chunk stripe on dev3
      6) Mount the fs and scrub it
      
      The correct csum error count is 16 (assuming x86_64 with 4K pages).
      A larger csum error count is reported with roughly 1/3 probability,
      and an unrecoverable error with roughly 1/10 probability.
      
      The root cause is a race condition in the RAID5/6 recovery code,
      stemming from the fact that a full scrub is initiated per device.
      
      For other mirror-based profiles, each mirror is independent of the
      others, so the race causes no real problem there.
      
      For example:
      |      Corrupted        |       Correct          |      Correct        |
      |   Scrub dev3 (D1)     |    Scrub dev2 (D2)     |   Scrub dev1 (P)    |
      ------------------------------------------------------------------------
      Read out D1             |Read out D2             |Read full stripe     |
      Check csum              |Check csum              |Check parity         |
      Csum mismatch           |Csum match, continue    |Parity mismatch      |
      handle_errored_block    |                        |handle_errored_block |
       Read out full stripe   |                        | Read out full stripe|
       D1 csum error(err++)   |                        | D1 csum error(err++)|
       Recover D1             |                        | Recover D1          |
      
      So D1's csum error is counted twice, simply because
      handle_errored_block() lacks sufficient protection and the race can
      happen.
      
      In an even worse case, when D1's recovery code is re-writing D1/D2/P
      while P's recovery code is reading out the full stripe, we can end up
      with an unrecoverable error.
      
      This patch uses the previously introduced lock_full_stripe() and
      unlock_full_stripe() to protect the whole of
      scrub_handle_errored_block() for RAID56 recovery, so there are no
      extra csum errors and no unrecoverable errors.
      Reported-by: Goffredo Baroncelli <kreijack@libero.it>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
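      A minimal sketch of the resulting locking pattern; the helper
      signatures follow the companion patch below, but the surrounding
      function is illustrative, not lifted from scrub.c:

              /* Serialize recovery for the full stripe containing 'logical'. */
              static int recover_raid56_block(struct btrfs_fs_info *fs_info,
                                              u64 logical)
              {
                      bool locked = false;
                      int ret;

                      ret = lock_full_stripe(fs_info, logical, &locked);
                      if (ret < 0)
                              return ret;

                      /* ... recheck csums, read the full stripe, rebuild ... */

                      unlock_full_stripe(fs_info, logical, locked);
                      return 0;
              }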
    • btrfs: scrub: Introduce full stripe lock for RAID56 · 0966a7b1
      Committed by Qu Wenruo
      Unlike mirror-based profiles, RAID5/6 recovery needs to read out the
      whole full stripe.
      
      Without proper protection, this can easily cause a race condition.
      
      Introduce two new functions for RAID5/6, lock_full_stripe() and
      unlock_full_stripe(), backed by an rb_tree of per-full-stripe mutexes,
      so scrub callers can lock a full stripe to avoid the race.
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor comment adjustments ]
      Signed-off-by: David Sterba <dsterba@suse.com>
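      A sketch of the data structure this implies; the field names mirror
      the patch, but take them as illustrative:

              /*
               * One lock per RAID5/6 full stripe, keyed by the logical
               * start of the full stripe and kept in a per-block-group
               * rb_tree.
               */
              struct full_stripe_lock {
                      struct rb_node node;    /* rb_tree linkage, sorted by logical */
                      u64 logical;            /* start of the full stripe */
                      u64 refs;               /* concurrent lockers; freed at zero */
                      struct mutex mutex;     /* serializes recovery on this stripe */
              };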
    • Btrfs: switch to div64_u64 if with a u64 divisor · 42c61ab6
      Committed by Liu Bo
      This fixes code where div_u64 is called with a u64 divisor: div_u64
      takes a u32 divisor, so passing a u64 silently truncates it (see the
      sketch after this entry).
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
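      A minimal sketch of the distinction, using the math64.h helpers (the
      function and its arguments are illustrative):

              #include <linux/math64.h>

              static u64 stripe_index(u64 offset, u64 stripe_len)
              {
                      /*
                       * Buggy with a u64 divisor: div_u64() takes a u32
                       * divisor, so stripe_len would be silently truncated:
                       *
                       *      return div_u64(offset, stripe_len);
                       *
                       * Correct: div64_u64() accepts a full u64 divisor.
                       */
                      return div64_u64(offset, stripe_len);
              }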
    • Btrfs: update scrub_parity to use u64 stripe_len · 972d7219
      Committed by Liu Bo
      Commit 3d8da678 ("Btrfs: fix divide error upon chunk's stripe_len")
      changed stripe_len in struct map_lookup to u64, but didn't update
      stripe_len in struct scrub_parity.
      
      This updates the type and switches to div64_u64_rem to match the u64
      divisor.
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
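      For the remainder case, a sketch with assumed names (only the
      div64_u64_rem() call itself comes from math64.h):

              #include <linux/math64.h>

              /* Offset of 'logical' within its full stripe, with u64 length. */
              static u64 offset_in_full_stripe(u64 logical, u64 full_stripe_len)
              {
                      u64 rem;

                      div64_u64_rem(logical, full_stripe_len, &rem);
                      return rem;
              }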
    • btrfs: use clear_page where appropriate · 619a9742
      Committed by David Sterba
      There's a helper to clear a whole page, with arch-specific optimized
      code. The replaced cases do not seem to be in performance-critical
      code, but we might still gain a few percent.
      Signed-off-by: David Sterba <dsterba@suse.com>
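      The transformation is mechanical; a sketch (the wrapper is
      illustrative, assuming an already-mapped page):

              #include <linux/mm.h>

              static void zero_one_page(struct page *page)
              {
                      /* Was: memset(page_address(page), 0, PAGE_SIZE); */
                      clear_page(page_address(page));
              }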
    • btrfs: Prevent scrub recheck from racing with dev replace · e501bfe3
      Committed by Qu Wenruo
      scrub_setup_recheck_block() calls btrfs_map_sblock() and then accesses
      bbio without the protection of bio_counter.
      
      This can lead to a use-after-free if it races with a dev-replace
      cancel.
      
      Fix it by increasing bio_counter before calling btrfs_map_sblock() and
      decreasing it when the corresponding recovery is finished.
      
      Cc: Liu Bo <bo.li.liu@oracle.com>
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
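      The pattern, sketched with the existing btrfs bio-counter helpers (the
      wrapper function and its error handling are illustrative):

              /* Hold the bio counter across the mapping and any use of bbio. */
              static int map_sblock_protected(struct btrfs_fs_info *fs_info,
                                              u64 logical, u64 *mapped_length,
                                              struct btrfs_bio **bbio)
              {
                      int ret;

                      btrfs_bio_counter_inc_blocked(fs_info);
                      ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
                                             logical, mapped_length, bbio);
                      if (ret)
                              btrfs_bio_counter_dec(fs_info);
                      /* On success, drop the counter once recovery finishes. */
                      return ret;
              }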