Commits · 9799d2c32bef6fba098fbef763002bc8d4851a2c · openeuler / Kernel

11 Nov, 2015 1 commit

btrfs: scrub: set error stats when tree block spanning stripes · 9799d2c3

Committed by Zhao Lei on Aug 25, 2015

It is better to show error stats to the user when we find a tree block
spanning stripes.

On a btrfs filesystem created by an old version of btrfs-convert:
Before patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
          scrub started at Tue Aug 25 21:19:09 2015 and finished after 00:00:00
          total bytes scrubbed: 53.54MiB with 0 errors
  # dmesg
  ...
  [  128.711434] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [  128.712744] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  ...

After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
          scrub started at Tue Aug 25 21:42:29 2015 and finished after 00:00:00
          total bytes scrubbed: 53.60MiB with 2 errors
          error details:
          corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
  ERROR: There are uncorrectable errors.
  # dmesg
  ...omit...
  #
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

9799d2c3

08 Oct, 2015 3 commits

btrfs: switch message printers to ratelimited variants · 94647322

Committed by David Sterba on Oct 08, 2015

Signed-off-by: David Sterba <dsterba@suse.com>

94647322

btrfs: switch message printers to ratelimited _in_rcu variants · b14af3b4

Committed by David Sterba on Oct 08, 2015

Signed-off-by: David Sterba <dsterba@suse.com>

b14af3b4

btrfs: switch message printers to _in_rcu variants · ecaeb14b

Committed by David Sterba on Oct 08, 2015

Signed-off-by: David Sterba <dsterba@suse.com>

ecaeb14b

01 Sep, 2015 2 commits

btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks and scrub_chunk · 8c204c96

Committed by Zhao Lei on Aug 19, 2015

These variables have been unused since they were introduced; remove them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

8c204c96

btrfs: Update out-of-date "skip parity stripe" comment · 7955323b

Committed by Zhao Lei on Aug 18, 2015

Because btrfs now supports scrubbing raid56 parity stripes.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

7955323b

14 Aug, 2015 1 commit

block: remove bio_get_nr_vecs() · b54ffb73

Committed by Kent Overstreet on May 19, 2015

We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

b54ffb73

09 Aug, 2015 13 commits

Btrfs: fix parity scrub of RAID 5/6 with missing device · 4a770891

Committed by Omar Sandoval on Jun 19, 2015

When testing the previous patch, Zhao Lei reported a similar bug when
attempting to scrub a degraded RAID 5/6 filesystem with a missing
device, leading to NULL pointer dereferences from the RAID 5/6 parity
scrubbing code.

The first cause was the same as in the previous patch: attempting to
call bio_add_page() on a missing block device. To fix this,
scrub_extent_for_parity() can just mark the sectors on the missing
device as errors instead of attempting to read from it.

Additionally, the code uses scrub_remap_extent() to map the extent of
the corresponding data stripe, but the extent wasn't already mapped. If
scrub_remap_extent() finds a missing block device, it doesn't initialize
extent_dev, so we're left with a NULL struct btrfs_device. The solution
is to use btrfs_map_block() directly.
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

4a770891

Btrfs: fix device replace of a missing RAID 5/6 device · 73ff61db

Committed by Omar Sandoval on Jun 19, 2015

The original implementation of device replace on RAID 5/6 seems to have
missed support for replacing a missing device. When this is attempted,
we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which
crashes when we try to dereference it. This happens because
btrfs_map_block() has no choice but to return us the missing device
because RAID 5/6 don't have any alternate mirrors to read from, and a
missing device has a NULL bdev.

The idea implemented here is to handle the missing device case
separately, which better only happen when we're replacing a missing RAID
5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to
reconstruct the data from parity, check it with
scrub_recheck_block_checksum(), and write it out with
scrub_write_block_to_dev_replace().
Reported-by: Philip <bugzilla@philip-seeger.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

73ff61db

Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation · b4ee1782

Committed by Omar Sandoval on Jun 19, 2015

The current RAID 5/6 recovery code isn't quite prepared to handle
missing devices. In particular, it expects a bio that we previously
attempted to use in the read path, meaning that it has valid pages
allocated. However, missing devices have a NULL blkdev, and we can't
call bio_add_page() on a bio with a NULL blkdev. We could do manual
manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add
a separate path that allows us to manually add pages to the rbio.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

b4ee1782

Btrfs: remove misleading handling of missing device scrub · 03679ade

Committed by Omar Sandoval on Jun 19, 2015

scrub_submit() claims that it can handle a bio with a NULL block device,
but this is misleading, as calling bio_add_page() on a bio with a NULL
->bi_bdev would've already crashed. Delete this, as we're about to
properly handle a missing block device.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

03679ade

btrfs: Fix data checksum error cause by replace with io-load. · 55e3a601

Committed by Zhaolei on Aug 05, 2015

xfstests btrfs/070 sometimes fails.
On my test machine, its failure rate is about 30%.
In another VM (VMware), its failure rate is about 50%.

Reason:
  btrfs/070 runs replace and defrag simultaneously with fsstress;
  after these operations, scrub finds checksum errors.

  Actually, this has nothing to do with the defrag operation; replace
  together with fsstress is enough to trigger the bug.

  New data written to the target device may be overwritten with
  old data from the source device by the replace code (seen in
  debugging). To avoid this problem, we can set the target block group
  to read-only during the replace period, so that new data requested
  by other operations is not written to the same place as the replace
  code writes.

  Before patch (4.1-rc3):
    30% failed in 100 xfstests runs.
  After patch:
    0% failed in 300 xfstests runs.

It also happened in btrfs/071, as it is another scrub-with-IO-load test.
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

55e3a601

btrfs: use scrub_pause_on/off() to reduce code in scrub_enumerate_chunks() · b708ce96

Committed by Zhaolei on Aug 05, 2015

Using the newly introduced scrub_pause_on/off() makes this code block
cleaner and more readable.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

b708ce96

btrfs: Separate scrub_blocked_if_needed() to scrub_pause_on/off() · 0e22be89

Committed by Zhaolei on Aug 05, 2015

This reduces the existing duplicated code that is similar to
scrub_blocked_if_needed() but cannot call it because of small
differences.
It is also used by my next patch, which is the same case.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

0e22be89

btrfs: Bypass unrelated items before accessing its contents in scrub · d7cad238

Committed by Zhao Lei on Jul 22, 2015

When we access extent_root in scrub_stripe() and
scrub_raid56_parity(), we need to bypass unrelated tree items first,
before using their contents in other conditions.

It is not a bug fix; it only puts the code in a logical order.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

d7cad238

btrfs: Load only necessary csums into list in scrub · fe8cf654

Committed by Zhao Lei on Jul 22, 2015

We need not load the csums of the whole stripe in scrub, because the
stripe is trimmed before use. That is to say, the data we really need
csums for is the segment [extent_logical, extent_logical + extent_len).

This patch changes scrub_stripe() to use the above segment for
btrfs_lookup_csums_range().
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

fe8cf654

btrfs: Fix calculate typo caused by ambiguous meaning of logic_end · a0dd59de

Committed by Zhao Lei on Jul 21, 2015

For example, in scrub_raid56_parity(), the following lines are used
to judge whether all data has been processed:
 place1: if (key.objectid > logic_end) ...
 place2: if (logic_start >= logic_end) ...
 ...
 (place2 is a typo, it should be ">"; it was copied from another
  place, where logic_end's meaning is different, long story...)

We could fix the above typo directly, but the root cause is the
ambiguous meaning of logic_end in scrub raid56 parity.

In other places, XXX_end points to data which is not included,
and we need to process the segment [XXX_start, XXX_end).

But for scrub raid56 parity, logic_end points to the last data that
needs to be processed, which introduced many "+ 1" and "- 1" in the
code, as below:
 length = sparity->logic_end - sparity->logic_start + 1
 logic_end - logic_start + 1
 stripe_logical + increment - 1

This patch changes logic_end's meaning to the usual one in the raid56
parity functions and data struct, along with the above bugfix.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
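
The off-by-one hazard described in this message can be illustrated with a small userspace sketch. The struct and function names below are hypothetical and are not taken from the kernel source; they only show why a half-open [start, end) range needs no "+ 1" or "- 1" corrections.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical range type: end is exclusive, i.e. [start, end). */
struct logic_range {
	uint64_t start;
	uint64_t end;
};

/* With an exclusive end, the length needs no "+ 1" correction. */
static uint64_t range_len(const struct logic_range *r)
{
	assert(r->end >= r->start);
	return r->end - r->start;
}

/* All data is processed once the cursor reaches the exclusive end. */
static int range_done(const struct logic_range *r, uint64_t cursor)
{
	return cursor >= r->end;
}
```

With an inclusive end, the same two helpers would need `end - start + 1` and `cursor > end`, which is the kind of mix-up the message describes.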

a0dd59de

btrfs: Free checksum list on scrub_extent() fail · 6fa96d72

Committed by Zhao Lei on Jul 21, 2015

When scrub_extent() fails, we need to free the previously created
checksum list.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

6fa96d72

btrfs: Check cancel and pause in interval of scrub operation · f2f66a2f

Committed by Zhao Lei on Jul 21, 2015

The old code checks for cancel and pause requests inside the scrub
stripe operation, like:
  loop() {
    if (parity) {
      scrub_parity_stripe();
      continue;
    }

    check_cancel_and_pause()

    scrub_normal_stripe();
  }

The reason is that when raid56 stripe scrub was introduced, the new
code was simply inserted at the front of the loop.

Better to:
  loop() {
    check_cancel_and_pause()

    if (parity)
      scrub_parity_stripe();
    else
      scrub_normal_stripe();
  }

This patch moves the code to realize the above sequence.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

f2f66a2f

btrfs: Fix scrub panic when leaf crosses stripes · a323e813

Committed by Zhao Lei on Jul 23, 2015

Scrub panics in the following operation:
  mkfs.ext4 /dev/vdh
  btrfs-convert /dev/vdh
  mount /dev/vdh /mnt/tmp1
  btrfs scrub start -B /dev/vdh
  (panic)

Reason:
  1: In some cases, a leaf created by btrfs-convert is split across 2
     stripes.
  2: Scrub bypassed part of the above wrong leaf data, but the remaining
     data caused a panic in scrub_checksum_tree_block().

For reason 1:
  we can get the following information after some simple operations.
  a. mkfs.ext4 /dev/vdh
     btrfs-convert /dev/vdh
  b. btrfs-debug-tree /dev/vdh
     we can see the following item in the extent tree:
     item 25 key (27054080 METADATA_ITEM 0) itemoff 15083 itemsize 33
     Its logical address range is [27054080, 27070464)
     and it crosses 2 stripes:
     [27000832, 27066368)
     [27066368, 27131904)
  This will be fixed in btrfs-progs (btrfs-convert, btrfsck, ...)

For reason 2:
  Scrub tries to do a "bypass" in this case, but the result is a
  "panic", because the current code lacks some conditions in the bypass
  and lets some wrong leaf data escape.

This patch fixes the above scrub code.

Before patch:
  # btrfs scrub start -B /dev/vdh
  (panic)

After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 353cec8f-da31-4a94-aa35-be72d997b06e
  ...
  # dmesg
  ...
  [   59.088697] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [   59.089929] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  #
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
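
The address arithmetic in this message can be checked with a short userspace sketch. It is not the check the kernel patch adds; the function name is hypothetical, and it assumes stripes are aligned to their own length, which holds for the 64 KiB stripes quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the block [start, start + len) crosses a stripe boundary.
 * Assumes stripes are stripe_len bytes long and aligned to stripe_len.
 */
static int block_spans_stripes(uint64_t start, uint64_t len,
			       uint64_t stripe_len)
{
	assert(len > 0 && stripe_len > 0);
	/* Compare the stripe index of the first and the last byte. */
	return start / stripe_len != (start + len - 1) / stripe_len;
}
```

For the tree block at 27054080 with a 16384-byte node size and 65536-byte stripes, the first byte falls in stripe 412 and the last byte in stripe 413, so the function reports a span, matching the two stripes listed in the message.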

a323e813

29 Jul, 2015 1 commit

block: add a bi_error field to struct bio · 4246a0b6

Committed by Christoph Hellwig on Jul 20, 2015

Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not being persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
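
The idea of keeping the errno in the I/O object itself, so that it survives queuing and travels from child to parent, can be sketched in userspace as follows. `struct toy_bio` and `toy_bio_complete()` are hypothetical names for illustration and greatly simplify the real struct bio and its completion path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bio. */
struct toy_bio {
	int error;              /* 0 on success, negative errno on failure */
	struct toy_bio *parent; /* NULL unless this bio is chained */
};

/*
 * Complete a bio: propagate its error up the parent chain, keeping the
 * first error each parent saw. Returns the error visible at the top.
 */
static int toy_bio_complete(struct toy_bio *bio)
{
	assert(bio != NULL);
	while (bio->parent) {
		if (bio->error && !bio->parent->error)
			bio->parent->error = bio->error;
		bio = bio->parent;
	}
	return bio->error;
}
```

Because the error lives in the object, a submitter only has to look in one place, and a specific value such as -ENOSPC is not flattened into -EIO.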

4246a0b6

01 Jul, 2015 1 commit

btrfs: add error handling for scrub_workers_get() · e82afc52

Committed by Zhao Lei on Jun 12, 2015

Although it is a rare case, we'd better free previously allocated
memory on error.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

e82afc52

10 Jun, 2015 1 commit

btrfs: Fix lockdep warning of wr_ctx->wr_lock in scrub_free_wr_ctx() · 20b2e302

Committed by Zhao Lei on Jun 04, 2015

lockdep reports the following warning in testing:
 [25176.843958] =================================
 [25176.844519] [ INFO: inconsistent lock state ]
 [25176.845047] 4.1.0-rc3 #22 Tainted: G        W
 [25176.845591] ---------------------------------
 [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [25176.847246]  (&wr_ctx->wr_lock){+.?...}, at: [<ffffffffa04cdc6d>] scrub_free_ctx+0x2d/0xf0 [btrfs]
 [25176.847838] {SOFTIRQ-ON-W} state was registered at:
 [25176.848396]   [<ffffffff810bf460>] __lock_acquire+0x6a0/0xe10
 [25176.848955]   [<ffffffff810bfd1e>] lock_acquire+0xce/0x2c0
 [25176.849491]   [<ffffffff816489af>] mutex_lock_nested+0x7f/0x410
 [25176.850029]   [<ffffffffa04d04ff>] scrub_stripe+0x4df/0x1080 [btrfs]
 [25176.850575]   [<ffffffffa04d11b1>] scrub_chunk.isra.19+0x111/0x130 [btrfs]
 [25176.851110]   [<ffffffffa04d144c>] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
 [25176.851660]   [<ffffffffa04d3b87>] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
 [25176.852189]   [<ffffffffa04e918e>] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
 [25176.852771]   [<ffffffffa04a98e0>] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
 [25176.853315]   [<ffffffff8121c5b8>] do_vfs_ioctl+0x318/0x570
 [25176.853868]   [<ffffffff8121c851>] SyS_ioctl+0x41/0x80
 [25176.854406]   [<ffffffff8164da17>] system_call_fastpath+0x12/0x6f
 [25176.854935] irq event stamp: 51506
 [25176.855511] hardirqs last  enabled at (51506): [<ffffffff810d4ce5>] vprintk_emit+0x225/0x5e0
 [25176.856059] hardirqs last disabled at (51505): [<ffffffff810d4b77>] vprintk_emit+0xb7/0x5e0
 [25176.856642] softirqs last  enabled at (50886): [<ffffffff81067a23>] __do_softirq+0x363/0x640
 [25176.857184] softirqs last disabled at (50949): [<ffffffff8106804d>] irq_exit+0x10d/0x120
 [25176.857746]
 other info that might help us debug this:
 [25176.858845]  Possible unsafe locking scenario:
 [25176.859981]        CPU0
 [25176.860537]        ----
 [25176.861059]   lock(&wr_ctx->wr_lock);
 [25176.861705]   <Interrupt>
 [25176.862272]     lock(&wr_ctx->wr_lock);
 [25176.862881]
  *** DEADLOCK ***

Reason:
 The above warning is caused by:
 Interrupt
 -> bio_endio()
 -> ...
 -> scrub_put_ctx()
 -> scrub_free_ctx() *1
 -> ...
 -> mutex_lock(&wr_ctx->wr_lock);

 scrub_put_ctx() is allowed to be called in the end_bio interrupt, but
 by design it should never call scrub_free_ctx(sctx) in interrupt
 context (*1 above), because btrfs_scrub_dev() takes one additional
 reference on sctx->refs, which makes scrub_free_ctx() only get called
 within btrfs_scrub_dev().

 Now the code does not behave as we wish, because the free sequence in
 scrub_pending_bio_dec() has a gap.

 Current code:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
 -> scrub_free_ctx()                |
 -----------------------------------+-----------------------------------

 We expected:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            |  btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
                                    | -> scrub_free_ctx()
 -----------------------------------+-----------------------------------

Fix:
 Move scrub_pending_bio_dec() to a workqueue, to avoid running this
 function in interrupt context.
 Tested by checking the trace log in debug.

Changelog v1->v2:
 Use a workqueue instead of adjusting the function call sequence as in
 v1, because v1 would introduce a bug pointed out by:
 Filipe David Manana <fdmanana@gmail.com>
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

20b2e302

04 Mar, 2015 7 commits

btrfs: cleanup, use correct type in div_u64_rem · 9d644a62

Committed by David Sterba on Feb 20, 2015

div_u64_rem expects u32 for divisor and remainder.
Signed-off-by: David Sterba <dsterba@suse.cz>

9d644a62

btrfs: replace remaining do_div calls with div_u64 variants · 47c5713f

Committed by David Sterba on Feb 20, 2015

Switch to div_u64_rem that does type checking and has more obvious
semantics than do_div.
Signed-off-by: David Sterba <dsterba@suse.cz>
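
To make the "more obvious semantics" concrete: do_div() is a macro that overwrites its first argument with the quotient and returns the remainder, while div_u64_rem() returns the quotient and stores the remainder through a pointer. The function below is a userspace stand-in with the same shape, not the kernel implementation, and the `_sketch` suffix marks it as hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in mirroring the shape of the kernel's div_u64_rem(). */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
				   uint32_t *remainder)
{
	assert(divisor != 0);
	/* The remainder is always smaller than the 32-bit divisor. */
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}
```

The typed parameters let the compiler flag a 64-bit divisor or remainder variable, which is the mismatch the "use correct type" cleanup addresses.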

47c5713f

btrfs: cleanup 64bit/32bit divs, provably bounded values · b8b93add

Committed by David Sterba on Jan 16, 2015

The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type.
Get rid of a few more do_div instances.
Signed-off-by: David Sterba <dsterba@suse.cz>

b8b93add

btrfs: cleanup, use kmalloc_array/kcalloc array helpers · 31e818fe

Committed by David Sterba on Feb 20, 2015

Convert kmalloc(nr * size, ..) to kmalloc_array that does additional
overflow checks, the zeroing variant is kcalloc.
Signed-off-by: David Sterba <dsterba@suse.cz>
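
The overflow check mentioned here guards against `nr * size` wrapping around and silently producing a too-small allocation. A minimal userspace sketch of that idea follows; `alloc_array_sketch()` is a hypothetical helper built on malloc(), not the kernel's kmalloc_array().

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace sketch of an overflow-checked array allocation:
 * refuse the request when nr * size would wrap around size_t.
 */
static void *alloc_array_sketch(size_t nr, size_t size)
{
	if (size != 0 && nr > SIZE_MAX / size)
		return NULL;
	return malloc(nr * size);
}
```

In userspace, calloc() plays the role of the zeroing variant and performs the same multiplication check.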

31e818fe

btrfs: cleanup, use correct type in div_u64_rem · 29cf342b

Committed by David Sterba on Feb 20, 2015

div_u64_rem expects u32 for divisor and remainder.
Signed-off-by: David Sterba <dsterba@suse.cz>

29cf342b

btrfs: replace remaining do_div calls with div_u64 variants · 35b850f1

Committed by David Sterba on Feb 20, 2015

Switch to div_u64_rem that does type checking and has more obvious
semantics than do_div.
Signed-off-by: David Sterba <dsterba@suse.cz>

35b850f1

btrfs: cleanup 64bit/32bit divs, provably bounded values · c7abe829

Committed by David Sterba on Jan 16, 2015

The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type.
Get rid of a few more do_div instances.
Signed-off-by: David Sterba <dsterba@suse.cz>

c7abe829

17 Feb, 2015 1 commit

btrfs: use correct type for workqueue flags · 6f011058

Committed by David Sterba on Feb 16, 2015

Through all the local wrappers to alloc_workqueue, __alloc_workqueue_key
takes an unsigned int.
Signed-off-by: David Sterba <dsterba@suse.cz>

6f011058

15 Feb, 2015 1 commit

Btrfs: scrub, fix sleep in atomic context · f55985f4

Committed by Filipe Manana on Feb 09, 2015

My previous patch "Btrfs: fix scrub race leading to use-after-free"
introduced the possibility to sleep in an atomic context, which happens
when the scrub_lock mutex is held at the time scrub_pending_bio_dec()
is called - this function can be called under an atomic context.
Chris ran into this in a debug kernel which gave the following trace:

[ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
[ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
[ 1928.981324] INFO: lockdep is turned off.
[ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G        W     3.19.0-rc7-mason+ #41
[ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 1929.022207]  ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
[ 1929.037267]  ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
[ 1929.052315]  0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
[ 1929.067381] Call Trace:
[ 1929.072344]  <IRQ>  [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
[ 1929.083968]  [<ffffffff810856c4>] ___might_sleep+0x174/0x230
[ 1929.095352]  [<ffffffff810857d2>] __might_sleep+0x52/0x90
[ 1929.106223]  [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
[ 1929.117951]  [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
[ 1929.129708]  [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
[ 1929.143370]  [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
[ 1929.157191]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.167382]  [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
[ 1929.180161]  [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
[ 1929.194318]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.204496]  [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
[ 1929.216569]  [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
[ 1929.229176]  [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
[ 1929.240740]  [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
[ 1929.252654]  [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
[ 1929.264725]  [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
[ 1929.276635]  [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
[ 1929.288014]  [<ffffffff8105d92e>] __do_softirq+0xde/0x600
[ 1929.298885]  [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
(...)

Fix this by using a reference count on the scrub context structure
instead of locking the scrub_lock mutex.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
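
The reference-counting approach named in this message can be sketched in userspace with C11 atomics. The type and function names are hypothetical and the sketch omits everything the real scrub context holds; it only shows that dropping a reference needs no sleeping lock, and that whoever drops the last reference frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical context object; only the reference count matters here. */
struct ctx_sketch {
	atomic_int refs;
};

static struct ctx_sketch *ctx_alloc(void)
{
	struct ctx_sketch *ctx = malloc(sizeof(*ctx));

	if (ctx)
		atomic_init(&ctx->refs, 1); /* the creator holds one reference */
	return ctx;
}

static void ctx_get(struct ctx_sketch *ctx)
{
	atomic_fetch_add(&ctx->refs, 1);
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int ctx_put(struct ctx_sketch *ctx)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	if (atomic_fetch_sub(&ctx->refs, 1) == 1) {
		free(ctx);
		return 1;
	}
	return 0;
}
```

A completion path would take a reference before touching the context and drop it afterwards, so the context cannot be freed underneath it.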

f55985f4

03 Feb, 2015 1 commit

Btrfs: fix scrub race leading to use-after-free · de554a4f

Committed by Filipe Manana on Jan 27, 2015

While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got
the following trace:

[68127.807663] BUG: unable to handle kernel paging request at ffff8803f8947a50
[68127.807663] IP: [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] PGD 3003067 PUD 43e1f5067 PMD 43e030067 PTE 80000003f8947060
[68127.807663] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[68127.807663] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc processor parpo
[68127.807663] CPU: 2 PID: 3081 Comm: kworker/u8:5 Not tainted 3.18.0-rc6-btrfs-next-3+ #4
[68127.807663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68127.807663] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[68127.807663] task: ffff880101fc5250 ti: ffff8803f097c000 task.ti: ffff8803f097c000
[68127.807663] RIP: 0010:[<ffffffff8107da31>]  [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] RSP: 0018:ffff8803f097fbb8  EFLAGS: 00010093
[68127.807663] RAX: 0000000028dd386c RBX: ffff8803f8947a50 RCX: 0000000028dd3854
[68127.807663] RDX: 0000000000000018 RSI: 0000000000000002 RDI: 0000000000000001
[68127.807663] RBP: ffff8803f097fbd8 R08: 0000000000000004 R09: 0000000000000001
[68127.807663] R10: ffff880102620980 R11: ffff8801f3e8c900 R12: 000000000001d390
[68127.807663] R13: 00000000cabd13c8 R14: ffff8803f8947800 R15: ffff88037c574f00
[68127.807663] FS:  0000000000000000(0000) GS:ffff88043dd00000(0000) knlGS:0000000000000000
[68127.807663] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[68127.807663] CR2: ffff8803f8947a50 CR3: 00000000b6481000 CR4: 00000000000006e0
[68127.807663] Stack:
[68127.807663]  ffffffff823942a8 ffff8803f8947a50 ffff8802a3416f80 0000000000000000
[68127.807663]  ffff8803f097fc18 ffffffff8141e7c0 ffffffff81072948 000000000034f314
[68127.807663]  ffff8803f097fc08 0000000000000292 ffff8803f097fc48 ffff8803f8947a50
[68127.807663] Call Trace:
[68127.807663]  [<ffffffff8141e7c0>] _raw_spin_lock_irqsave+0x4b/0x55
[68127.807663]  [<ffffffff81072948>] ? __wake_up+0x22/0x4b
[68127.807663]  [<ffffffff81072948>] __wake_up+0x22/0x4b
[68127.807663]  [<ffffffffa0392327>] scrub_pending_bio_dec+0x32/0x36 [btrfs]
[68127.807663]  [<ffffffffa0395e70>] scrub_bio_end_io_worker+0x5a3/0x5c9 [btrfs]
[68127.807663]  [<ffffffff810e0c7c>] ? time_hardirqs_off+0x15/0x28
[68127.807663]  [<ffffffff81078106>] ? trace_hardirqs_off_caller+0x4c/0xb9
[68127.807663]  [<ffffffffa0372a7c>] normal_work_helper+0xf1/0x238 [btrfs]
[68127.807663]  [<ffffffffa0372d3d>] btrfs_scrub_helper+0x12/0x14 [btrfs]
[68127.807663]  [<ffffffff810582d2>] process_one_work+0x1e4/0x3b6
[68127.807663]  [<ffffffff81078180>] ? trace_hardirqs_off+0xd/0xf
[68127.807663]  [<ffffffff81058dc9>] worker_thread+0x1fb/0x2a8
[68127.807663]  [<ffffffff81058bce>] ? rescuer_thread+0x219/0x219
[68127.807663]  [<ffffffff8105cd75>] kthread+0xdb/0xe3
[68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663]  [<ffffffff8141f1ec>] ret_from_fork+0x7c/0xb0
[68127.807663]  [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663] Code: 39 c2 75 14 8d 8a 00 00 01 00 89 d0 f0 0f b1 0b 39 d0 0f 84 81 00 00 00 4c 69 2d 27 86 99 00 fa 00 00 00 45 31 e4 4d 39 ec 74 2b <8b> 13 89 d0 c1 e8 10 66 39 c2 75
[68127.807663] RIP  [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663]  RSP <ffff8803f097fbb8>
[68127.807663] CR2: ffff8803f8947a50
[68127.807663] ---[ end trace d7045aac00a66cd8 ]---

This is due to a race that can happen in a very tiny time window and is
illustrated by the following sequence diagram:

         CPU 1                                                     CPU 2

                                                                btrfs_scrub_dev()
scrub_bio_end_io_worker()
   scrub_pending_bio_dec()
       atomic_dec(&sctx->bios_in_flight)
                                                                   wait sctx->bios_in_flight == 0
                                                                   wait sctx->workers_pending == 0
                                                                   mutex_lock(&fs_info->scrub_lock)
                                                                   (...)
                                                                   mutex_unlock(&fs_info->scrub_lock)
                                                                   scrub_free_ctx(sctx)
                                                                      kfree(sctx)
       wake_up(&sctx->list_wait)
          __wake_up()
              spin_lock_irqsave(&sctx->list_wait->lock, flags)

Another variation of this scenario that results in the same use-after-free
issue is:

         CPU 1                                                     CPU 2

                                                                btrfs_scrub_dev()
                                                                   wait sctx->bios_in_flight == 0
scrub_bio_end_io_worker()
   scrub_pending_bio_dec()
       __wake_up(&sctx->list_wait)
          spin_lock_irqsave(&sctx->list_wait->lock, flags)
          default_wake_function()
              wake up task at CPU 2
                                                                   wait sctx->workers_pending == 0
                                                                   mutex_lock(&fs_info->scrub_lock)
                                                                   (...)
                                                                   mutex_unlock(&fs_info->scrub_lock)
                                                                   scrub_free_ctx(sctx)
                                                                      kfree(sctx)
          spin_unlock_irqrestore(&sctx->list_wait->lock, flags)

Fix this by holding the scrub lock while doing the wakeup.

This isn't a recent regression; the issue has been around since the scrub
feature was added (2011, commit a2de733c).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

de554a4f

28 Jan, 2015 1 commit

btrfs: fix raid56 scrub failed in xfstests btrfs/072 · 063c54dc

Committed by Gui Hecheng on Jan 09, 2015

The xfstests btrfs/072 reports uncorrectable read errors in dmesg,
because scrub forgets to use commit_root for the parity scrub routine
and attempts to scrub extent items whose contents are not fully on
disk.

To fix it, we just add the @search_commit_root flag back.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chris Mason <clm@fb.com>

063c54dc

22 Jan, 2015 6 commits

Rename all ref_count to refs in struct · 57019345

Committed by Zhao Lei on Jan 20, 2015

refs is better than ref_count to record a struct's ref count.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Suggested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

57019345

Btrfs: Introduce BTRFS_BLOCK_GROUP_RAID56_MASK to check raid56 simply · ffe2d203

Committed by Zhao Lei on Jan 20, 2015

So we can check raid56 with:
 (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)
instead of the long:
 (map->type & (BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6))
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
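
The mask pattern from this message can be tried out in a standalone sketch. The macro names carry a `_SKETCH` suffix to mark them as stand-ins, and the bit positions are assumptions for illustration rather than values quoted from this commit.

```c
#include <stdint.h>

/* Illustrative bit positions; treat the exact values as assumptions. */
#define BG_RAID5_SKETCH       (1ULL << 7)
#define BG_RAID6_SKETCH       (1ULL << 8)
#define BG_RAID56_MASK_SKETCH (BG_RAID5_SKETCH | BG_RAID6_SKETCH)

/* Return 1 if the block group type uses RAID5 or RAID6. */
static int is_raid56(uint64_t type)
{
	return (type & BG_RAID56_MASK_SKETCH) != 0;
}
```

Defining the mask once keeps every call site short and means a future parity profile would only need to be added in one place.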

ffe2d203

Btrfs: Include map_type in raid_bio · 10f11900

Committed by Zhao Lei on Jan 20, 2015

Current code uses many kinds of "clever" ways to determine the operation
target's raid type, such as:
  raid_map != NULL
  or
  raid_map[MAX_NR] == RAID[56]_Q_STRIPE

To make the code easier to maintain, this patch puts the raid type into
bbio, so we can always get the raid type from bbio in a "stupid"
way.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

10f11900

Btrfs: Simplify scrub_setup_recheck_block()'s argument · be50a8dd

Committed by Zhao Lei on Jan 20, 2015

scrub_setup_recheck_block() has many arguments, but most of them
can be derived from one of them, so we can remove them to make the code
cleaner. Some other cleanup for that function is also included in this
patch.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

be50a8dd

Btrfs: Combine per-page recover in dev-replace and scrub · b968fed1

Committed by Zhao Lei on Jan 20, 2015

The two pieces of code are similar; combine them to make the code
cleaner and easier to maintain.
Some missing conditions are also filled in as a benefit of this
combination.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

b968fed1

Btrfs: Separate finding-right-mirror and writing-to-target's process in... · 8d6738c1

Committed by Zhao Lei on Jan 20, 2015

Btrfs: Separate finding-right-mirror and writing-to-target's process in scrub_handle_errored_block()

In the current code, the logic for finding the right mirror and the
logic for writing to the target are mixed. If we find a right mirror
but fail to write to the target, it is treated as "hadn't found right
block", and the target is filled with sblock_bad.

Actually, "failed in writing to target" does not mean "source
block is wrong". This patch separates the two conditions in the logic
and does some cleanup to make the code cleaner.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

8d6738c1

openeuler / Kernel synced successfully 11 months ago