- 11 Nov, 2015 6 commits
-
-
Committed by Zhao Lei
It is useless.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
We no longer need to pass so many arguments to recheck an sblock; this patch removes them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
We can use the existing scrub_checksum_data() and scrub_checksum_tree_block() in scrub_recheck_block_checksum() instead of writing duplicated code.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
We should reset the sblock->xxx_error stats before calling scrub_recheck_block_checksum(). The current code runs correctly because every sblock is allocated by k[cz]alloc() and the error stats have not been changed.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
For historical reasons, scrub_setup_recheck_block() does not set up all the necessary fields of sblock_to_check. As a result, the current code needs extra arguments in several functions, and extra local variables, just to pass the missing values to where they are needed.
This patch sets up those fields of sblock_to_check in scrub_setup_recheck_block(), in order to:
1: clean up more function arguments and local variables
2: make sblock_to_check complete, so that we can use it without worrying about uninitialized members.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
It is better to show error stats to the user when we find a tree block spanning stripes.
On a btrfs created by an old version of btrfs-convert:
Before patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
    scrub started at Tue Aug 25 21:19:09 2015 and finished after 00:00:00
    total bytes scrubbed: 53.54MiB with 0 errors
  # dmesg
  ...
  [ 128.711434] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [ 128.712744] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  ...
After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
    scrub started at Tue Aug 25 21:42:29 2015 and finished after 00:00:00
    total bytes scrubbed: 53.60MiB with 2 errors
    error details:
    corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
  ERROR: There are uncorrectable errors.
  # dmesg
  ...omit...
  #
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 08 Oct, 2015 3 commits
-
-
Committed by David Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by David Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by David Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
-
- 01 Sep, 2015 2 commits
-
-
Committed by Zhao Lei
These variables have been unused since the version that introduced them; remove them.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
Because btrfs now supports scrubbing raid56 parity stripes.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 14 Aug, 2015 1 commit
-
-
Committed by Kent Overstreet
We can always fill up the bio now, no need to estimate the possible size based on queue parameters.
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 Aug, 2015 13 commits
-
-
Committed by Omar Sandoval
When testing the previous patch, Zhao Lei reported a similar bug when attempting to scrub a degraded RAID 5/6 filesystem with a missing device, leading to NULL pointer dereferences from the RAID 5/6 parity scrubbing code.
The first cause was the same as in the previous patch: attempting to call bio_add_page() on a missing block device. To fix this, scrub_extent_for_parity() can just mark the sectors on the missing device as errors instead of attempting to read from it.
Additionally, the code uses scrub_remap_extent() to map the extent of the corresponding data stripe, but the extent wasn't already mapped. If scrub_remap_extent() finds a missing block device, it doesn't initialize extent_dev, so we're left with a NULL struct btrfs_device. The solution is to use btrfs_map_block() directly.
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Omar Sandoval
The original implementation of device replace on RAID 5/6 seems to have missed support for replacing a missing device. When this is attempted, we end up calling bio_add_page() on a bio with a NULL ->bi_bdev, which crashes when we try to dereference it. This happens because btrfs_map_block() has no choice but to return us the missing device because RAID 5/6 don't have any alternate mirrors to read from, and a missing device has a NULL bdev.
The idea implemented here is to handle the missing device case separately, which better only happen when we're replacing a missing RAID 5/6 device. We use the new BTRFS_RBIO_REBUILD_MISSING operation to reconstruct the data from parity, check it with scrub_recheck_block_checksum(), and write it out with scrub_write_block_to_dev_replace().
Reported-by: Philip <bugzilla@philip-seeger.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=96141
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Omar Sandoval
The current RAID 5/6 recovery code isn't quite prepared to handle missing devices. In particular, it expects a bio that we previously attempted to use in the read path, meaning that it has valid pages allocated. However, missing devices have a NULL blkdev, and we can't call bio_add_page() on a bio with a NULL blkdev. We could do manual manipulation of bio->bi_io_vec, but that's pretty gross. So instead, add a separate path that allows us to manually add pages to the rbio.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Omar Sandoval
scrub_submit() claims that it can handle a bio with a NULL block device, but this is misleading, as calling bio_add_page() on a bio with a NULL ->bi_bdev would've already crashed. Delete this, as we're about to properly handle a missing block device.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhaolei
xfstests btrfs/070 sometimes fails. On my test machine its failure rate is about 30%; on another VM (VMware) it is about 50%.
Reason: btrfs/070 runs replace and defrag simultaneously with fsstress, and afterwards scrub finds checksum errors. The failure actually has nothing to do with the defrag operation; replace together with fsstress is enough to trigger the bug.
Debugging shows that new data written to the target device can be overwritten with old data from the source device by the replace code. To avoid this, we can set the target block group to read-only for the duration of the replace, so that new data requested by other operations is not written to the same place as the replace code writes.
Before patch (4.1-rc3): 30% failed in 100 xfstests.
After patch: 0% failed in 300 xfstests.
The same failure also occurred in btrfs/071, which is another test of scrub under I/O load.
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhaolei
Using the newly introduced scrub_pause_on/off() makes this code block cleaner and more readable.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhaolei
It reduces the current duplicated code, which is similar to scrub_blocked_if_needed() but cannot call it because of small differences. It is also used by my next patch, which is the same case.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
When we access extent_root in scrub_stripe() and scrub_raid56_parity(), we need to skip unrelated tree items first, before using their contents in other conditions. This is not a bug fix; it only puts the code in a logical order.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
We need not load the csums of the whole stripe in scrub, because the stripe is trimmed before use; that is, the data we really need csums for is [extent_logical, extent_len). This patch uses that segment for btrfs_lookup_csums_range() in scrub_stripe().
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
For example, in scrub_raid56_parity(), the following lines are used to judge whether all data has been processed:
  place1: if (key.objectid > logic_end) ...
  place2: if (logic_start >= logic_end) ...
  ...
(place2 is a typo; it should be ">". It was copied from another place where logic_end has a different meaning. Long story...)
We could fix the typo directly, but the root cause is the ambiguous meaning of logic_end in scrub raid56 parity. Elsewhere, XXX_end points to data which is not included, and we process the segment [XXX_start, XXX_end). But for scrub raid56 parity, logic_end points to the last data that needs processing, which introduced many "+ 1" and "- 1" in the code, as below:
  length = sparity->logic_end - sparity->logic_start + 1
  logic_end - logic_start + 1
  stripe_logical + increment - 1
This patch changes the meaning of logic_end to the usual one in the raid56 parity functions and data structures, along with the bug fix above.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
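The difference between the two meanings of an "end" value can be sketched in a few lines of C. This is only an illustration of the convention the message describes, not code from the patch; the helper names `len_inclusive`, `len_half_open`, and `range_done` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: length of a closed range [start, last],
 * where `last` is the final byte that still needs processing.
 * This is the convention that forces "+ 1" into every length. */
static uint64_t len_inclusive(uint64_t start, uint64_t last)
{
    return last - start + 1;
}

/* Hypothetical helper: length of a half-open range [start, end),
 * where `end` is the first byte that is NOT included. */
static uint64_t len_half_open(uint64_t start, uint64_t end)
{
    return end - start;
}

/* With a half-open range, "everything processed" is start >= end. */
static int range_done(uint64_t start, uint64_t end)
{
    return start >= end;
}
```

With the half-open convention, the ">=" comparison at place2 becomes the natural termination test, and the "+ 1" / "- 1" adjustments disappear from the length calculations.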
-
Committed by Zhao Lei
When scrub_extent() fails, we need to free the previously created checksum list.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
Committed by Zhao Lei
The old code checks for cancel and pause requests inside the scrub stripe operation, like:
  loop() {
    if (parity) {
      scrub_parity_stripe();
      continue;
    }
    check_cancel_and_pause()
    scrub_normal_stripe();
  }
The reason is that when raid56 stripe scrub was introduced, the new code was simply inserted at the front of the loop.
Better to:
  loop() {
    check_cancel_and_pause()
    if (parity)
      scrub_parity_stripe();
    else
      scrub_normal_stripe();
  }
This patch moves the code to realize the above sequence.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
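The reordered loop can be modelled in plain C as follows. This is a toy sketch under stated assumptions, not the kernel code: `struct scrub_stats`, the `cancel_at` parameter, and the counters are hypothetical stand-ins for the real scrub state, and only the control flow mirrors the pseudocode above.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the real scrub work. */
struct scrub_stats {
    int parity;  /* parity stripes scrubbed */
    int normal;  /* normal stripes scrubbed */
    int checks;  /* times cancel/pause was checked */
};

/* Toy check: reports a cancel request when iteration i == cancel_at. */
static bool check_cancel_and_pause(struct scrub_stats *st, int cancel_at, int i)
{
    st->checks++;
    return i == cancel_at;
}

/* The check runs first on every iteration, so parity stripes
 * can no longer bypass it with an early "continue". */
static void scrub_loop(const bool *is_parity, int n, int cancel_at,
                       struct scrub_stats *st)
{
    for (int i = 0; i < n; i++) {
        if (check_cancel_and_pause(st, cancel_at, i))
            break;
        if (is_parity[i])
            st->parity++;   /* stands in for scrub_parity_stripe() */
        else
            st->normal++;   /* stands in for scrub_normal_stripe() */
    }
}
```

In this model a cancel request is honoured even when the pending stripe is a parity stripe, which is the behaviour the reordering is meant to achieve.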
-
Committed by Zhao Lei
Scrub panics in the following operation:
  mkfs.ext4 /dev/vdh
  btrfs-convert /dev/vdh
  mount /dev/vdh /mnt/tmp1
  btrfs scrub start -B /dev/vdh
  (panic)
Reason:
1: In some cases, a leaf created by btrfs-convert is split across 2 stripes.
2: Scrub bypasses part of the bad leaf data, but the remaining data causes a panic in scrub_checksum_tree_block().
For reason 1:
We can get the following information after some simple operations.
  a. mkfs.ext4 /dev/vdh
     btrfs-convert /dev/vdh
  b. btrfs-debug-tree /dev/vdh
We can see the following item in the extent tree:
  item 25 key (27054080 METADATA_ITEM 0) itemoff 15083 itemsize 33
Its logical address is [27054080, 27070464), which crosses 2 stripes:
  [27000832, 27066368)
  [27066368, 27131904)
This will be fixed in btrfs-progs (btrfs-convert, btrfsck, ...).
For reason 2:
Scrub tries to do a "bypass" in this case, but the result is a panic, because the current code lacks some conditions in the bypass and lets some bad leaf data escape. This patch fixes the scrub code.
Before patch:
  # btrfs scrub start -B /dev/vdh
  (panic)
After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 353cec8f-da31-4a94-aa35-be72d997b06e
  ...
  # dmesg
  ...
  [ 59.088697] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [ 59.089929] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  #
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
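The "spanning stripes" condition can be checked with simple integer arithmetic. The sketch below is an illustration, not the kernel's check: the function name `block_spans_stripes` is hypothetical, and it assumes stripes are aligned to multiples of the stripe length in the logical address space, which matches the numbers quoted above (stripe length 65536, block length 16384) but is a simplification of how btrfs maps chunks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the block [start, start + len)
 * touches more than one stripe. Assumes len > 0 and that stripe
 * boundaries fall on multiples of stripe_len (a simplification). */
static bool block_spans_stripes(uint64_t start, uint64_t len,
                                uint64_t stripe_len)
{
    uint64_t first = start / stripe_len;             /* stripe of first byte */
    uint64_t last = (start + len - 1) / stripe_len;  /* stripe of last byte */

    return first != last;
}
```

For the item in the message, the block starts at 27054080 inside the stripe beginning at 27000832 and ends inside the stripe beginning at 27066368, so the helper reports a span; a block starting exactly at 27000832 does not.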
-
- 29 Jul, 2015 1 commit
-
-
Committed by Christoph Hellwig
Currently we have two different ways to signal an I/O error on a BIO:
 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback
The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not being persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns.
So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
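The benefit of storing the error in the structure itself can be shown with a small userspace model. Everything here is hypothetical: `struct toy_bio` and `toy_bio_endio` are illustrative stand-ins, not the kernel's `struct bio` or `bio_endio()`, and the propagation rule (first error wins) is an assumption made for the sketch.

```c
#include <stddef.h>

/* Toy model of a bio: the error lives in the object, so it persists
 * while the bio is queued and can be handed on to a parent. */
struct toy_bio {
    int error;               /* plays the role of bi_error: 0 or -errno */
    struct toy_bio *parent;  /* NULL when the bio is not chained */
};

/* On completion, a failed child records its error in the parent
 * unless the parent already carries an earlier error. */
static void toy_bio_endio(struct toy_bio *bio)
{
    struct toy_bio *parent = bio->parent;

    if (parent != NULL && bio->error != 0 && parent->error == 0)
        parent->error = bio->error;
}
```

Because the value is an errno rather than a single flag, a child failing with, say, -28 (-ENOSPC on Linux) is distinguishable at the parent from one failing with -EIO.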
-
- 01 Jul, 2015 1 commit
-
-
Committed by Zhao Lei
Although it is a rare case, we'd better free previously allocated memory on error.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 10 Jun, 2015 1 commit
-
-
Committed by Zhao Lei
lockdep reports the following warning in test:
 [25176.843958] =================================
 [25176.844519] [ INFO: inconsistent lock state ]
 [25176.845047] 4.1.0-rc3 #22 Tainted: G W
 [25176.845591] ---------------------------------
 [25176.846153] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
 [25176.846713] fsstress/26661 [HC0[0]:SC1[1]:HE1:SE0] takes:
 [25176.847246] (&wr_ctx->wr_lock){+.?...}, at: [<ffffffffa04cdc6d>] scrub_free_ctx+0x2d/0xf0 [btrfs]
 [25176.847838] {SOFTIRQ-ON-W} state was registered at:
 [25176.848396] [<ffffffff810bf460>] __lock_acquire+0x6a0/0xe10
 [25176.848955] [<ffffffff810bfd1e>] lock_acquire+0xce/0x2c0
 [25176.849491] [<ffffffff816489af>] mutex_lock_nested+0x7f/0x410
 [25176.850029] [<ffffffffa04d04ff>] scrub_stripe+0x4df/0x1080 [btrfs]
 [25176.850575] [<ffffffffa04d11b1>] scrub_chunk.isra.19+0x111/0x130 [btrfs]
 [25176.851110] [<ffffffffa04d144c>] scrub_enumerate_chunks+0x27c/0x510 [btrfs]
 [25176.851660] [<ffffffffa04d3b87>] btrfs_scrub_dev+0x1c7/0x6c0 [btrfs]
 [25176.852189] [<ffffffffa04e918e>] btrfs_dev_replace_start+0x36e/0x450 [btrfs]
 [25176.852771] [<ffffffffa04a98e0>] btrfs_ioctl+0x1e10/0x2d20 [btrfs]
 [25176.853315] [<ffffffff8121c5b8>] do_vfs_ioctl+0x318/0x570
 [25176.853868] [<ffffffff8121c851>] SyS_ioctl+0x41/0x80
 [25176.854406] [<ffffffff8164da17>] system_call_fastpath+0x12/0x6f
 [25176.854935] irq event stamp: 51506
 [25176.855511] hardirqs last enabled at (51506): [<ffffffff810d4ce5>] vprintk_emit+0x225/0x5e0
 [25176.856059] hardirqs last disabled at (51505): [<ffffffff810d4b77>] vprintk_emit+0xb7/0x5e0
 [25176.856642] softirqs last enabled at (50886): [<ffffffff81067a23>] __do_softirq+0x363/0x640
 [25176.857184] softirqs last disabled at (50949): [<ffffffff8106804d>] irq_exit+0x10d/0x120
 [25176.857746] other info that might help us debug this:
 [25176.858845] Possible unsafe locking scenario:
 [25176.859981]        CPU0
 [25176.860537]        ----
 [25176.861059]   lock(&wr_ctx->wr_lock);
 [25176.861705]   <Interrupt>
 [25176.862272]     lock(&wr_ctx->wr_lock);
 [25176.862881]  *** DEADLOCK ***
Reason:
The above warning is caused by:
  Interrupt
  -> bio_endio()
  -> ...
  -> scrub_put_ctx()
  -> scrub_free_ctx() *1
  -> ...
  -> mutex_lock(&wr_ctx->wr_lock);
scrub_put_ctx() is allowed to be called in the end_bio interrupt, but by design it should never call scrub_free_ctx(sctx) in interrupt context (*1 above), because btrfs_scrub_dev() takes one additional reference on sctx->refs, which makes scrub_free_ctx() only get called within btrfs_scrub_dev().
The code does not behave as intended, because the free sequence in scrub_pending_bio_dec() has a gap.
Current code:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            | btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
 -> scrub_free_ctx()                |
 -----------------------------------+-----------------------------------
We expected:
 -----------------------------------+-----------------------------------
 scrub_pending_bio_dec()            | btrfs_scrub_dev
 -----------------------------------+-----------------------------------
 atomic_dec(&sctx->bios_in_flight); |
 wake_up(&sctx->list_wait);         |
 scrub_put_ctx(sctx);               |
 -> atomic_dec_and_test(&sctx->refs)|
                                    | scrub_put_ctx()
                                    | -> atomic_dec_and_test(&sctx->refs)
                                    | -> scrub_free_ctx()
 -----------------------------------+-----------------------------------
Fix:
Move scrub_pending_bio_dec() to a workqueue, so that this function does not run in interrupt context.
Tested by checking the trace log in debug.
Changelog v1->v2:
Use a workqueue instead of adjusting the function call sequence as in v1, because v1 would introduce a bug pointed out by: Filipe David Manana <fdmanana@gmail.com>
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 04 Mar, 2015 7 commits
-
-
Committed by David Sterba
div_u64_rem expects u32 for divisor and remainder.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
Committed by David Sterba
Switch to div_u64_rem that does type checking and has more obvious semantics than do_div.
Signed-off-by: David Sterba <dsterba@suse.cz>
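The "more obvious semantics" can be seen from the shape of the call. In the kernel, do_div(n, base) is a macro that overwrites n with the quotient and returns the remainder, while div_u64_rem(dividend, divisor, &remainder) returns the quotient and stores the remainder through a pointer. The function below is a userspace stand-in written for illustration only; the name `div_u64_rem_sketch` is hypothetical and it is not the kernel implementation.

```c
#include <stdint.h>

/* Userspace stand-in with the same argument shape as the kernel helper:
 * a 64-bit dividend, a 32-bit divisor, and a 32-bit remainder written
 * through a pointer. The quotient is the return value, and the dividend
 * is left untouched. The divisor must be non-zero. */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
                                   uint32_t *remainder)
{
    *remainder = (uint32_t)(dividend % divisor);
    return dividend / divisor;
}
```

Because the divisor and remainder parameters have fixed 32-bit types, passing a 64-bit variable for either is visible to the compiler, which is the type checking the message refers to.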
-
Committed by David Sterba
The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type. Get rid of a few more do_div instances.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
Committed by David Sterba
Convert kmalloc(nr * size, ..) to kmalloc_array that does additional overflow checks, the zeroing variant is kcalloc.
Signed-off-by: David Sterba <dsterba@suse.cz>
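The overflow check that motivates this conversion can be sketched in userspace C. This is an illustration of the idea only, assuming the check is "refuse the allocation when n * size would wrap around"; `alloc_array_sketch` is a hypothetical name and uses malloc() rather than any kernel allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n elements of `size` bytes,
 * returning NULL instead of a too-small buffer when the multiplication
 * n * size would overflow size_t. */
static void *alloc_array_sketch(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;  /* n * size would wrap around */

    return malloc(n * size);
}
```

A plain malloc(n * size) with a wrapped product would succeed and return a buffer far smaller than the caller expects, which is the failure mode the checked variant avoids.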
-
Committed by David Sterba
div_u64_rem expects u32 for divisor and remainder.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
Committed by David Sterba
Switch to div_u64_rem that does type checking and has more obvious semantics than do_div.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
Committed by David Sterba
The divisor is derived from nodesize or PAGE_SIZE, fits into 32bit type. Get rid of a few more do_div instances.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
- 17 Feb, 2015 1 commit
-
-
Committed by David Sterba
Through all the local wrappers to alloc_workqueue, __alloc_workqueue_key takes an unsigned int.
Signed-off-by: David Sterba <dsterba@suse.cz>
-
- 15 Feb, 2015 1 commit
-
-
Committed by Filipe Manana
My previous patch "Btrfs: fix scrub race leading to use-after-free" introduced the possibility to sleep in an atomic context, which happens when the scrub_lock mutex is held at the time scrub_pending_bio_dec() is called - this function can be called under an atomic context.
Chris ran into this in a debug kernel which gave the following trace:
 [ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
 [ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
 [ 1928.981324] INFO: lockdep is turned off.
 [ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G W 3.19.0-rc7-mason+ #41
 [ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
 [ 1929.022207] ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
 [ 1929.037267] ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
 [ 1929.052315] 0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
 [ 1929.067381] Call Trace:
 [ 1929.072344] <IRQ> [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
 [ 1929.083968] [<ffffffff810856c4>] ___might_sleep+0x174/0x230
 [ 1929.095352] [<ffffffff810857d2>] __might_sleep+0x52/0x90
 [ 1929.106223] [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
 [ 1929.117951] [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
 [ 1929.129708] [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
 [ 1929.143370] [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
 [ 1929.157191] [<ffffffff812fa603>] bio_endio+0x53/0xa0
 [ 1929.167382] [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
 [ 1929.180161] [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
 [ 1929.194318] [<ffffffff812fa603>] bio_endio+0x53/0xa0
 [ 1929.204496] [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
 [ 1929.216569] [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
 [ 1929.229176] [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
 [ 1929.240740] [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
 [ 1929.252654] [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
 [ 1929.264725] [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
 [ 1929.276635] [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
 [ 1929.288014] [<ffffffff8105d92e>] __do_softirq+0xde/0x600
 [ 1929.298885] [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
 (...)
Fix this by using a reference count on the scrub context structure instead of locking the scrub_lock mutex.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
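The general pattern behind "use a reference count instead of a mutex" can be sketched with C11 atomics. This is a userspace illustration, not the btrfs code: `struct ctx_sketch`, `ctx_get`, and `ctx_put` are hypothetical names, and the `freed` flag stands in for the actual free so the behaviour can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context object protected by a reference count. */
struct ctx_sketch {
    atomic_int refs;  /* number of holders still using the object */
    bool freed;       /* set where a real implementation would free */
};

/* Take an additional reference. */
static void ctx_get(struct ctx_sketch *c)
{
    atomic_fetch_add(&c->refs, 1);
}

/* Drop a reference. Only the holder that drops the last one releases
 * the object; no sleeping lock is taken, so the call is usable from
 * contexts that must not sleep. Returns true if this call released it. */
static bool ctx_put(struct ctx_sketch *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        c->freed = true;
        return true;
    }
    return false;
}
```

Whichever side finishes last performs the release, so neither the completion path nor the main path can touch the object after it has been freed, and neither needs to take a mutex to coordinate.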
-
- 03 Feb, 2015 1 commit
-
-
Committed by Filipe Manana
While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got the following trace:
 [68127.807663] BUG: unable to handle kernel paging request at ffff8803f8947a50
 [68127.807663] IP: [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
 [68127.807663] PGD 3003067 PUD 43e1f5067 PMD 43e030067 PTE 80000003f8947060
 [68127.807663] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
 [68127.807663] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc processor parpo
 [68127.807663] CPU: 2 PID: 3081 Comm: kworker/u8:5 Not tainted 3.18.0-rc6-btrfs-next-3+ #4
 [68127.807663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
 [68127.807663] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
 [68127.807663] task: ffff880101fc5250 ti: ffff8803f097c000 task.ti: ffff8803f097c000
 [68127.807663] RIP: 0010:[<ffffffff8107da31>] [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
 [68127.807663] RSP: 0018:ffff8803f097fbb8 EFLAGS: 00010093
 [68127.807663] RAX: 0000000028dd386c RBX: ffff8803f8947a50 RCX: 0000000028dd3854
 [68127.807663] RDX: 0000000000000018 RSI: 0000000000000002 RDI: 0000000000000001
 [68127.807663] RBP: ffff8803f097fbd8 R08: 0000000000000004 R09: 0000000000000001
 [68127.807663] R10: ffff880102620980 R11: ffff8801f3e8c900 R12: 000000000001d390
 [68127.807663] R13: 00000000cabd13c8 R14: ffff8803f8947800 R15: ffff88037c574f00
 [68127.807663] FS: 0000000000000000(0000) GS:ffff88043dd00000(0000) knlGS:0000000000000000
 [68127.807663] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 [68127.807663] CR2: ffff8803f8947a50 CR3: 00000000b6481000 CR4: 00000000000006e0
 [68127.807663] Stack:
 [68127.807663] ffffffff823942a8 ffff8803f8947a50 ffff8802a3416f80 0000000000000000
 [68127.807663] ffff8803f097fc18 ffffffff8141e7c0 ffffffff81072948 000000000034f314
 [68127.807663] ffff8803f097fc08 0000000000000292 ffff8803f097fc48 ffff8803f8947a50
 [68127.807663] Call Trace:
 [68127.807663] [<ffffffff8141e7c0>] _raw_spin_lock_irqsave+0x4b/0x55
 [68127.807663] [<ffffffff81072948>] ? __wake_up+0x22/0x4b
 [68127.807663] [<ffffffff81072948>] __wake_up+0x22/0x4b
 [68127.807663] [<ffffffffa0392327>] scrub_pending_bio_dec+0x32/0x36 [btrfs]
 [68127.807663] [<ffffffffa0395e70>] scrub_bio_end_io_worker+0x5a3/0x5c9 [btrfs]
 [68127.807663] [<ffffffff810e0c7c>] ? time_hardirqs_off+0x15/0x28
 [68127.807663] [<ffffffff81078106>] ? trace_hardirqs_off_caller+0x4c/0xb9
 [68127.807663] [<ffffffffa0372a7c>] normal_work_helper+0xf1/0x238 [btrfs]
 [68127.807663] [<ffffffffa0372d3d>] btrfs_scrub_helper+0x12/0x14 [btrfs]
 [68127.807663] [<ffffffff810582d2>] process_one_work+0x1e4/0x3b6
 [68127.807663] [<ffffffff81078180>] ? trace_hardirqs_off+0xd/0xf
 [68127.807663] [<ffffffff81058dc9>] worker_thread+0x1fb/0x2a8
 [68127.807663] [<ffffffff81058bce>] ? rescuer_thread+0x219/0x219
 [68127.807663] [<ffffffff8105cd75>] kthread+0xdb/0xe3
 [68127.807663] [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
 [68127.807663] [<ffffffff8141f1ec>] ret_from_fork+0x7c/0xb0
 [68127.807663] [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
 [68127.807663] Code: 39 c2 75 14 8d 8a 00 00 01 00 89 d0 f0 0f b1 0b 39 d0 0f 84 81 00 00 00 4c 69 2d 27 86 99 00 fa 00 00 00 45 31 e4 4d 39 ec 74 2b <8b> 13 89 d0 c1 e8 10 66 39 c2 75
 [68127.807663] RIP [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
 [68127.807663] RSP <ffff8803f097fbb8>
 [68127.807663] CR2: ffff8803f8947a50
 [68127.807663] ---[ end trace d7045aac00a66cd8 ]---
This is due to a race that can happen in a very tiny time window and is illustrated by the following sequence diagram:
             CPU 1                                   CPU 2

                                         btrfs_scrub_dev()
  scrub_bio_end_io_worker()
    scrub_pending_bio_dec()
      atomic_dec(&sctx->bios_in_flight)
                                           wait sctx->bios_in_flight == 0
                                           wait sctx->workers_pending == 0
                                           mutex_lock(&fs_info->scrub_lock)
                                           (...)
                                           mutex_lock(&fs_info->scrub_lock)
                                           scrub_free_ctx(sctx)
                                             kfree(sctx)
      wake_up(&sctx->list_wait)
        __wake_up()
          spin_lock_irqsave(&sctx->list_wait->lock, flags)
Another variation of this scenario that results in the same use-after-free issue is:
             CPU 1                                   CPU 2

                                         btrfs_scrub_dev()
                                           wait sctx->bios_in_flight == 0
  scrub_bio_end_io_worker()
    scrub_pending_bio_dec()
      __wake_up(&sctx->list_wait)
        spin_lock_irqsave(&sctx->list_wait->lock, flags)
        default_wake_function()
          wake up task at CPU 2
                                           wait sctx->workers_pending == 0
                                           mutex_lock(&fs_info->scrub_lock)
                                           (...)
                                           mutex_lock(&fs_info->scrub_lock)
                                           scrub_free_ctx(sctx)
                                             kfree(sctx)
        spin_unlock_irqrestore(&sctx->list_wait->lock, flags)
Fix this by holding the scrub lock while doing the wakeup.
This isn't a recent regression; the issue has been around since the scrub feature was added (2011, commit a2de733c).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 28 Jan, 2015 1 commit
-
-
Committed by Gui Hecheng
The xfstests btrfs/072 reports uncorrectable read errors in dmesg, because scrub forgets to use commit_root in the parity scrub routine and so attempts to scrub extent items whose contents are not yet fully on disk.
To fix it, we just add the @search_commit_root flag back.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Chris Mason <clm@fb.com>
-
- 22 Jan, 2015 1 commit
-
-
Committed by Zhao Lei
refs is a better name than ref_count for recording a struct's reference count.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Suggested-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
-