1. 22 Jan 2018: 19 commits
  2. 07 Dec 2017: 1 commit
    • btrfs: Fix quota reservation leak on preallocated files · b430b775
      Authored by Justin Maggard
      Commit c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      changed the behavior of __btrfs_buffered_write() so that it first tries
      to get a data space reservation, and then skips the relatively expensive
      nocow check if the reservation succeeded.
      
      If we have quotas enabled, the data space reservation also includes a
      quota reservation.  But in the rewrite case, the space has already been
      accounted for in qgroups.  So btrfs_check_data_free_space() increases
      the quota reservation, but it never gets decreased when the data
      actually gets written and overwrites the pre-existing data.  So we're
      left with both the qgroup usage and the qgroup reservation accounting
      for the same space.
      
      This commit adds the missing btrfs_qgroup_free_data() call in the case
      of BTRFS_ORDERED_PREALLOC extents.
      
      Fixes: c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      Signed-off-by: Justin Maggard <jmaggard@netgear.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 16 Nov 2017: 1 commit
    • Btrfs: fix reported number of inode blocks after buffered append writes · e3b8a485
      Authored by Filipe Manana
      The patch from commit a7e3b975 ("Btrfs: fix reported number of inode
      blocks") introduced a regression where if we do a buffered write starting
      at position equal to or greater than the file's size and then stat(2) the
      file before writeback is triggered, the number of used blocks does not
      change (unless there's a prealloc/unwritten extent). Example:
      
        $ xfs_io -f -c "pwrite -S 0xab 0 64K" foobar
        $ du -h foobar
        0	foobar
        $ sync
        $ du -h foobar
        64K	foobar
      
      The first version of that patch didn't have this regression; the second
      version, which was the one committed, was changed only to address a
      performance regression detected by the Intel test robots using fs_mark.
      
      This fixes the regression by setting the new delalloc bit in the range,
      and doing it at btrfs_dirty_pages() while setting the regular delalloc
      bit as well, so that we set both bits at once, avoiding navigation of
      the inode's io tree twice. Doing it at btrfs_dirty_pages() is also the
      most meaningful place, as we should set the new delalloc bit whenever
      we set the regular delalloc bit, which happens only if we copied bytes
      into the pages at __btrfs_buffered_write().
      
      This was making some of LTP's du tests fail, which can be quickly run
      using a command line like the following:
      
        $ ./runltp -q -p -l /ltp.log -f commands -s du -d /mnt
      
      Fixes: a7e3b975 ("Btrfs: fix reported number of inode blocks")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 15 Nov 2017: 2 commits
  5. 02 Nov 2017: 5 commits
    • btrfs: move btrfs_truncate_block out of trans handle · ddfae63c
      Authored by Josef Bacik
      Since we do a delalloc reserve in btrfs_truncate_block we can deadlock
      with freeze.  If somebody else is trying to allocate metadata for this
      inode and it gets stuck in start_delalloc_inodes because of freeze we
      will deadlock.  Be safe and move this outside of a trans handle.  This
      also has a side-effect of making sure that we're not leaving stale data
      behind in the other_encoding or encryption case.  Not an issue now since
      nobody uses it, but it would be a problem in the future.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make the delalloc block rsv per inode · 69fe2d75
      Authored by Josef Bacik
      The way we handle delalloc metadata reservations has gotten
      progressively more complicated over the years.  There is so much cruft
      and weirdness around keeping the reserved count and outstanding counters
      consistent and handling the error cases that it's impossible to
      understand.
      
      Fix this by making the delalloc block rsv per-inode.  This way we can
      calculate the actual size of the outstanding metadata reservations every
      time we make a change, and then reserve the delta based on that amount.
      This greatly simplifies the code everywhere, and makes the error
      handling in btrfs_delalloc_reserve_metadata far less terrifying.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: rework outstanding_extents · 8b62f87b
      Authored by Josef Bacik
      Right now we do a lot of weird hoops around outstanding_extents in order
      to keep the extent count consistent.  This is because we logically
      transfer the outstanding_extent count from the initial reservation
      through the set_delalloc_bits.  This makes it pretty difficult to get a
      handle on how and when we need to mess with outstanding_extents.
      
      Fix this by revamping the rules of how we deal with outstanding_extents.
      Now instead everybody that is holding on to a delalloc extent is
      required to increase the outstanding extents count for itself.  This
      means we'll have something like this
      
      btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
       btrfs_set_extent_delalloc	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      for an initial file write.  Now take the append write where we extend an
      existing delalloc range but still under the maximum extent size
      
      btrfs_delalloc_reserve_metadata - outstanding_extents = 2
        btrfs_set_extent_delalloc
          btrfs_set_bit_hook		- outstanding_extents = 3
          btrfs_merge_extent_hook	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      In order to make the ordered extent transition we of course must now
      make ordered extents carry their own outstanding_extent reservation, so
      for cow_file_range we end up with
      
      btrfs_add_ordered_extent	- outstanding_extents = 2
      clear_extent_bit		- outstanding_extents = 1
      btrfs_remove_ordered_extent	- outstanding_extents = 0
      
      This makes all manipulations of outstanding_extents much more explicit.
      Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
      combined with btrfs_delalloc_release_extents, even in the error case, as
      that is the only function that actually modifies the
      outstanding_extents counter.
      
      The drawback to this is now we are much more likely to have transient
      cases where outstanding_extents is much larger than it actually should
      be.  This could happen before as we manipulated the delalloc bits, but
      now it happens basically at every write.  This may put more pressure on
      the ENOSPC flushing code, but I think making this code simpler is worth
      the cost.  I have another change coming to mitigate this side-effect
      somewhat.
      
      I also added trace points for the counter manipulation.  These were used
      by a bpf script I wrote to help track down leak issues.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Authored by Zygo Blaxell
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not other),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allow to set compression level for zlib · f51d2b59
      Authored by David Sterba
      Preliminary support for setting the compression level for zlib; the
      following works:
      
      $ mount -o compress=zlib                # default
      $ mount -o compress=zlib0               # same
      $ mount -o compress=zlib9               # level 9, slower sync, less data
      $ mount -o compress=zlib1               # level 1, faster sync, more data
      $ mount -o remount,compress=zlib3       # level set by remount
      
      The compress-force option works the same as compress.  The level is
      visible in the same format in /proc/mounts. Setting the level via a
      file property does not work yet.
      
      Required patch: "btrfs: prepare for extensions in compression options"
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 30 Oct 2017: 8 commits
  7. 26 Sep 2017: 3 commits
    • Btrfs: fix unexpected result when dio reading corrupted blocks · 99c4e3b9
      Authored by Liu Bo
      commit 4246a0b6 ("block: add a bi_error field to struct bio")
      changed the logic of how dio read endio reports errors.
      
      For a single stripe dio read, %bio->bi_status reflects the error before
      the checksum is verified, and currently we update it when the data block
      matches its checksum, while in the mismatching case %bio->bi_status is
      not updated to reflect that.
      
      When some blocks in a file have been corrupted on disk, reading such a
      file ends up with
      
      1) checksum errors being reported in the kernel log
      2) read(2) returning successfully with some content being 0x01.
      
      In order to fix it, we need to report its checksum mismatch error to
      the upper layer (dio layer in this case) as well.
      
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reported-by: Goffredo Baroncelli <kreijack@inwind.it>
      Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: finish ordered extent cleaning if no progress is found · 67c003f9
      Authored by Naohiro Aota
      __endio_write_update_ordered() repeats the search until it reaches the
      end of the specified range. This works well with the direct IO path,
      because before the function is called it is ensured that ordered
      extents fill the whole range. That is not the case, however, when it is
      called from run_delalloc_range(): an error can occur in the middle of
      the loop in e.g. run_delalloc_nocow(), so that part of the range is not
      covered by any ordered extent. When cleaning up such an "incomplete"
      range, __endio_write_update_ordered() gets stuck at an offset where
      there are no ordered extents.
      
      Since the ordered extents are created from head to tail, we can stop
      the search if there is no offset progress.
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: clear ordered flag on cleaning up ordered extents · 63d71450
      Authored by Naohiro Aota
      Commit 52427260 ("btrfs: Handle delalloc error correctly to avoid
      ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup
      submitted ordered extents. However, it does not clear the ordered bit
      (Private2) of corresponding pages. Thus, the following BUG occurs from
      free_pages_check_bad() (on btrfs/125 with nospace_cache).
      
      BUG: Bad page state in process btrfs  pfn:3fa787
      page:ffffdf2acfe9e1c0 count:0 mapcount:0 mapping:          (null) index:0xd
      flags: 0x8000000000002008(uptodate|private_2)
      raw: 8000000000002008 0000000000000000 000000000000000d 00000000ffffffff
      raw: ffffdf2acf5c1b20 ffffb443802238b0 0000000000000000 0000000000000000
      page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      bad because of flags: 0x2000(private_2)
      
      This patch clears the flag in the same way as other places that call
      btrfs_dec_test_ordered_pending(), for every page in the specified range.
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 24 Aug 2017: 1 commit