Commits · 9a33944bdf075ca93062cde206cb25e62044890e · openeuler / raspberrypi-kernel

18 Apr, 2017 40 commits

btrfs: scrub: Don't append on-disk pages for raid56 scrub · 9a33944b

Committed by Qu Wenruo on Mar 29, 2017

In the following situation, scrub calculates a wrong parity and
overwrites the correct one with it:

RAID5 full stripe:

Before
|     Dev 1      |     Dev  2     |     Dev 3     |
| Data stripe 1  | Data stripe 2  | Parity Stripe |
--------------------------------------------------- 0
| 0x0000 (Bad)   |     0xcdcd     |     0x0000    |
--------------------------------------------------- 4K
|     0xcdcd     |     0xcdcd     |     0x0000    |
...
|     0xcdcd     |     0xcdcd     |     0x0000    |
--------------------------------------------------- 64K

After scrubbing dev3 only:

|     Dev 1      |     Dev  2     |     Dev 3     |
| Data stripe 1  | Data stripe 2  | Parity Stripe |
--------------------------------------------------- 0
| 0xcdcd (Good)  |     0xcdcd     | 0xcdcd (Bad)  |
--------------------------------------------------- 4K
|     0xcdcd     |     0xcdcd     |     0x0000    |
...
|     0xcdcd     |     0xcdcd     |     0x0000    |
--------------------------------------------------- 64K

The reason is that after raid56 read rebuild, rbio->stripe_pages are all
correctly recovered (0xcd for data stripes).

However, when we check and repair parity in
scrub_parity_check_and_repair(), we append the pages in the
sparity->spages list, which contain old on-disk data, to
rbio->bio_pages[].

And when we submit parity data to disk, we calculate parity using
rbio->bio_pages[] first, and fall back to rbio->stripe_pages[] only if
no page is found in rbio->bio_pages[].

This patch fixes it by not appending pages from sparity->spages, so
finish_parity_scrub() will use rbio->stripe_pages[], which is correct.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
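
The page selection described in this commit message can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `xor_parity`, `pick_page` and `parity_first_byte` are hypothetical helpers, not kernel functions, and the model uses a single 4K page per stripe with two data stripes as in the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096

/* RAID5 parity for two data stripes: byte-wise XOR of the data pages. */
static void xor_parity(const uint8_t *d1, const uint8_t *d2,
                       uint8_t *parity, size_t len)
{
    assert(d1 && d2 && parity);
    for (size_t i = 0; i < len; i++)
        parity[i] = d1[i] ^ d2[i];
}

/* Prefer an attached bio page; fall back to the stripe page if there is none. */
static const uint8_t *pick_page(const uint8_t *bio_page,
                                const uint8_t *stripe_page)
{
    return bio_page ? bio_page : stripe_page;
}

/*
 * Hypothetical model of the first 4K of the full stripe.
 * append_on_disk_page != 0 mimics the old behaviour (the stale on-disk
 * page is attached as a bio page); 0 mimics the behaviour after the patch.
 * Returns the first byte of the parity that would be written.
 */
static uint8_t parity_first_byte(int append_on_disk_page)
{
    static uint8_t on_disk_d1[PAGE_BYTES];   /* bad data still on disk */
    static uint8_t recovered_d1[PAGE_BYTES]; /* rebuilt stripe page */
    static uint8_t d2[PAGE_BYTES];
    static uint8_t parity[PAGE_BYTES];

    memset(on_disk_d1, 0x00, PAGE_BYTES);
    memset(recovered_d1, 0xcd, PAGE_BYTES);
    memset(d2, 0xcd, PAGE_BYTES);

    const uint8_t *bio_page = append_on_disk_page ? on_disk_d1 : NULL;

    xor_parity(pick_page(bio_page, recovered_d1), d2, parity, PAGE_BYTES);
    return parity[0];
}
```

With the stale page attached, the model computes 0x00 ^ 0xcd = 0xcd, the bad parity from the "After" table. Without it, the recovered page is used and the parity is 0xcd ^ 0xcd = 0x00.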


btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint · d51ea5dd

Committed by Qu Wenruo on Mar 13, 2017

Newly introduced qgroup reserved space trace points are normally nested
into several common qgroup operations.

However, some other trace points are not well placed to co-operate with
them, which causes confusing output.

This patch re-arranges the trace_btrfs_qgroup_release_data() and
trace_btrfs_qgroup_free_delayed_ref() trace points so they are triggered
before the reserved space ones.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: qgroup: Add trace point for qgroup reserved space · 3159fe7b

Committed by Qu Wenruo on Mar 13, 2017

Introduce the following trace points:
qgroup_update_reserve
qgroup_meta_reserve

These trace points are handy for tracing problems related to qgroup
reserved space.

Also export the btrfs_qgroup structure, since it is now passed directly
to the trace points.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: drop redundant parameters from btrfs_map_sblock · 825ad4c9

Committed by David Sterba on Mar 28, 2017

All callers pass 0 for mirror_num and 1 for need_raid_map.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: sink GFP flags parameter to tree_mod_log_insert_root · bcc8e07f

Committed by David Sterba on Mar 28, 2017

All (1) callers pass the same value.
Signed-off-by: David Sterba <dsterba@suse.com>

btrfs: sink GFP flags parameter to tree_mod_log_insert_move · 176ef8f5

Committed by David Sterba on Mar 28, 2017

All (1) callers pass the same value.
Signed-off-by: David Sterba <dsterba@suse.com>

Btrfs: fix wrong failed mirror_num of read-repair on raid56 · abad60c6

Committed by Liu Bo on Mar 29, 2017

In the raid56 scenario, after trying parity recovery, we didn't set
mirror_num for the btrfs_bio with the failed mirror_num, hence
end_bio_extent_readpage() reports a random mirror_num in the dmesg
log.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


Btrfs: set scrub page's io_error if failing to submit io · 1bcd7aa1

Committed by Liu Bo on Mar 29, 2017

Scrub repairs data in units called scrub_block, each of which may contain
several pages.  Scrub always tries to look up a good copy of a whole
block, but if there is no such copy, it tries to repair page by page.

If we don't set the page's io_error when checking this bad copy, then in
the last step we may skip this page when repairing the bad copy from the
good copy.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: track exclusive filesystem operation in flags · 171938e5

Committed by David Sterba on Mar 28, 2017

There are several operations, usually started from ioctls, that cannot
run concurrently. The status is tracked in
mutually_exclusive_operation_running as an atomic_t. We can easily track
the status as one of the per-filesystem flag bits with the same
synchronization guarantees.

The conversion replaces:

* atomic_xchg(..., 1)    ->   test_and_set_bit(FLAG, ...)
* atomic_set(..., 0)     ->   clear_bit(FLAG, ...)
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
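
The conversion listed in this commit message can be approximated in userspace with C11 atomics. This is a sketch, not the kernel code: `struct fs_state`, `FS_EXCL_OP_BIT` and the helper names are hypothetical, and `atomic_fetch_or` / `atomic_fetch_and` stand in for the kernel's test_and_set_bit() and clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit index for "an exclusive operation is running". */
#define FS_EXCL_OP_BIT 0

/* Hypothetical per-filesystem state holding a word of flag bits. */
struct fs_state {
    atomic_ulong flags;
};

/* Userspace analogue of test_and_set_bit(): returns the previous bit value. */
static bool flag_test_and_set(struct fs_state *fs, unsigned int bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 1UL << bit;
    return (atomic_fetch_or(&fs->flags, mask) & mask) != 0;
}

/* Userspace analogue of clear_bit(). */
static void flag_clear(struct fs_state *fs, unsigned int bit)
{
    assert(bit < sizeof(unsigned long) * CHAR_BIT);
    atomic_fetch_and(&fs->flags, ~(1UL << bit));
}

/* Returns true if the caller now owns the exclusive operation. */
static bool excl_op_start(struct fs_state *fs)
{
    return !flag_test_and_set(fs, FS_EXCL_OP_BIT);
}

static void excl_op_finish(struct fs_state *fs)
{
    flag_clear(fs, FS_EXCL_OP_BIT);
}
```

As with atomic_xchg(..., 1), only the caller that observes the old value as 0 may proceed; a second caller sees the bit already set and has to back off. The other bits in the flag word are left untouched.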


btrfs: qgroups: Retry after commit on getting EDQUOT · 48a89bc4

Committed by Goldwyn Rodrigues on Mar 27, 2017

We are facing the same problem with EDQUOT which was experienced with
ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but
here is a quick fix, which may be too big a hammer.

Quotas are reserved at the start of an operation, incrementing
qg->reserved. However, the reservation is written to disk in
commit_transaction, which could take as long as commit_interval. In the
meantime there could be deletions which are not accounted for, because
deletions are accounted for only at commit time (free_refroot). So, when
we get an EDQUOT, flush the data to disk and try again.

This fixes fstests btrfs/139.

Here is a sample script which shows this issue.

DEVICE=/dev/vdb
MOUNTPOINT=/mnt
TESTVOL=$MOUNTPOINT/tmp
QUOTA=5
PROG=btrfs
DD_BS="4k"
DD_COUNT="256"
RUN_TIMES=5000

mkfs.btrfs -f $DEVICE
mount -o commit=240 $DEVICE $MOUNTPOINT
$PROG subvolume create $TESTVOL
$PROG quota enable $TESTVOL
$PROG qgroup limit ${QUOTA}G $TESTVOL

typeset -i DD_RUN_GOOD
typeset -i QUOTA

function _check_cmd() {
        if (( $? != 0 )); then
                echo -n "$(date) E: Running previous command"
                echo ${*}
                echo "Without sync"
                $PROG qgroup show -pcreFf ${TESTVOL}
                echo "With sync"
                $PROG qgroup show -pcreFf --sync ${TESTVOL}
                exit 1
        fi
}

while true; do
  DD_RUN_GOOD=$RUN_TIMES

  while (( ${DD_RUN_GOOD} != 0 )); do
        dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}
        _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}"
        DD_RUN_GOOD=$(( DD_RUN_GOOD - 1 ))
  done

  $PROG qgroup show -pcref $TESTVOL
  echo "----------- Cleanup ---------- "
  rm $TESTVOL/quotatest*

done
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
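
The "flush and try again" idea from this commit message can be sketched as a retry-once wrapper. Everything here is a toy model with hypothetical names (`reserve_with_retry`, `struct toy_qgroup`, `toy_reserve`, `toy_commit`); it only illustrates the control flow and is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef int (*step_fn)(void *ctx);

/*
 * Try to reserve; on -EDQUOT, flush once and retry.
 * A second -EDQUOT, or any other error, is returned to the caller.
 */
static int reserve_with_retry(step_fn reserve, step_fn flush, void *ctx)
{
    bool retried = false;
    int ret;

    assert(reserve && flush);
    for (;;) {
        ret = reserve(ctx);
        if (ret != -EDQUOT || retried)
            return ret;
        ret = flush(ctx);
        if (ret)
            return ret;
        retried = true;
    }
}

/* Toy qgroup: deletions only reduce 'reserved' once a commit has run. */
struct toy_qgroup {
    long limit;        /* quota limit */
    long reserved;     /* space currently counted against the limit */
    long pending_free; /* deleted space not yet accounted for */
    long request;      /* size of the reservation being attempted */
};

static int toy_reserve(void *ctx)
{
    struct toy_qgroup *qg = ctx;

    if (qg->reserved + qg->request > qg->limit)
        return -EDQUOT;
    qg->reserved += qg->request;
    return 0;
}

/* Stand-in for committing the transaction: account pending deletions. */
static int toy_commit(void *ctx)
{
    struct toy_qgroup *qg = ctx;

    qg->reserved -= qg->pending_free;
    qg->pending_free = 0;
    return 0;
}
```

In this model a reservation that fails only because deletions have not been accounted yet succeeds on the second attempt, while a qgroup that is genuinely full still reports EDQUOT after one retry.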


btrfs: replace hardcoded value with SEQ_LAST macro · de47c9d3

Committed by Edmund Nadolski on Mar 16, 2017

Define the SEQ_LAST macro to replace (u64)-1 in places where said
value triggers a special-case ref search behavior.
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
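
A sentinel macro of this kind might look as follows. The `u64` typedef and the `wants_special_search` helper are assumptions made for this sketch so that it compiles in userspace; only the SEQ_LAST name and the (u64)-1 value come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64; /* userspace stand-in for the kernel type */

/* Named sentinel replacing a bare (u64)-1 at call sites. */
#define SEQ_LAST ((u64)-1)

/* (u64)-1 wraps around to the largest value a u64 can hold. */
static_assert(SEQ_LAST == UINT64_MAX, "SEQ_LAST is the largest u64 value");

/* Hypothetical check for the special-case ref search behavior. */
static bool wants_special_search(u64 time_seq)
{
    return time_seq == SEQ_LAST;
}
```

Call sites then read `wants_special_search(SEQ_LAST)` rather than passing an unexplained `(u64)-1`.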


btrfs: provide enumeration for __merge_refs mode argument · f58d88b3

Committed by Edmund Nadolski on Mar 16, 2017

Replace hardcoded numeric values for __merge_refs 'mode' argument
with descriptive constants.
Signed-off-by: Edmund Nadolski <enadolski@suse.com>
Reviewed-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
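
Replacing numeric mode values with an enumeration might look like this. The constant names, and the assumption that the two modes were previously passed as 1 and 2, are illustrative only and are not taken from the commit message; `merge_mode_name` is a hypothetical helper.

```c
#include <assert.h>

/* Illustrative constants; the first value is pinned so old numbers still match. */
enum merge_mode {
    MERGE_IDENTICAL_KEYS = 1,
    MERGE_IDENTICAL_PARENTS,
};

/* The numeric values stay the same, only the spelling at call sites changes. */
static_assert(MERGE_IDENTICAL_PARENTS == 2, "numeric values are preserved");

/* Hypothetical helper showing that the mode is now self-describing. */
static const char *merge_mode_name(enum merge_mode mode)
{
    switch (mode) {
    case MERGE_IDENTICAL_KEYS:
        return "identical keys";
    case MERGE_IDENTICAL_PARENTS:
        return "identical parents";
    }
    return "unknown";
}
```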


btrfs: remove unused qgroup members from btrfs_trans_handle · f486135e

Committed by David Sterba on Mar 15, 2017

The members have been effectively unused since "Btrfs: rework qgroup
accounting" (fcebe456). There's no substitute for
assert_qgroups_uptodate, so it's removed as well.
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: remove local blocksize variable in reada_find_extent · 994a5d2b

Committed by David Sterba on Mar 15, 2017

The name is misleading and the local variable serves no purpose.
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>


btrfs: remove redundant parameter from reada_start_machine_dev · 5721b8ad