提交 · 2c17620384cd754daf834683d9793460656ec0c3 · openeuler / Kernel

03 6月, 2023 1 次提交

md/raid1: stop mdx_raid1 thread when raid1 array run failed · 025dac6f

由 Jiang Li 提交于 6月 03, 2023

mainline inclusion
from mainline-v6.2-rc1
commit b611ad14
category: bugfix
bugzilla: 188662, https://gitee.com/openeuler/kernel/issues/I6UMUF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b611ad14006e5be2170d9e8e611bf49dff288911

--------------------------------

fail run raid1 array when we assemble array with the inactive disk only,
but the mdx_raid1 thread were not stop, Even if the associated resources
have been released. it will caused a NULL dereference when we do poweroff.

This causes the following Oops:
    [  287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070
    [  287.594762] #PF: supervisor read access in kernel mode
    [  287.599912] #PF: error_code(0x0000) - not-present page
    [  287.605061] PGD 0 P4D 0
    [  287.607612] Oops: 0000 [#1] SMP NOPTI
    [  287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G     U            5.10.146 #0
    [  287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022
    [  287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod]
    [  287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ......
    [  287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202
    [  287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000
    [  287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800
    [  287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff
    [  287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800
    [  287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500
    [  287.692052] FS:  0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000
    [  287.700149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0
    [  287.713033] Call Trace:
    [  287.715498]  raid1d+0x6c/0xbbb [raid1]
    [  287.719256]  ? __schedule+0x1ff/0x760
    [  287.722930]  ? schedule+0x3b/0xb0
    [  287.726260]  ? schedule_timeout+0x1ed/0x290
    [  287.730456]  ? __switch_to+0x11f/0x400
    [  287.734219]  md_thread+0xe9/0x140 [md_mod]
    [  287.738328]  ? md_thread+0xe9/0x140 [md_mod]
    [  287.742601]  ? wait_woken+0x80/0x80
    [  287.746097]  ? md_register_thread+0xe0/0xe0 [md_mod]
    [  287.751064]  kthread+0x11a/0x140
    [  287.754300]  ? kthread_park+0x90/0x90
    [  287.757974]  ret_from_fork+0x1f/0x30

In fact, when raid1 array run fail, we need to do
md_unregister_thread() before raid1_free().
Signed-off-by: NJiang Li <jiang.li@ugreen.com>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NLi Nan <linan122@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
(cherry picked from commit 22eeb5d1)

025dac6f

15 3月, 2023 1 次提交

raid1, raid10: switch to precise io accounting · 77d9bc78

由 Yu Kuai 提交于 3月 15, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6L586
CVE: NA

--------------------------------

'ios' and 'sectors' is counted in bio_start_io_acct() while io is
started insted of io is done. Hence switch to precise io accounting to
count them when io is done.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

77d9bc78

04 8月, 2022 2 次提交

md/raid1: enable io accounting · 3232bfe2

由 Guoqing Jiang 提交于 8月 04, 2022

mainline inclusion
from mainline-v5.14-rc1
commit a0159832
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I587H6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a0159832e51e3af03b89ecc5d6b9db451e529b5f

-------------------------------

For raid1, we record the start time between split bio and clone bio,
and finish the accounting in the final endio.

Also introduce start_time in r1bio accordingly.
Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: NSong Liu <song@kernel.org>
Conflict:
	drivers/md/raid1.c
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3232bfe2

md/raid1: rename print_msg with r1bio_existed · a6502b42

由 Guoqing Jiang 提交于 8月 04, 2022

mainline inclusion
from mainline-v5.14-rc1
commit 9b8ae7b9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I587H6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9b8ae7b938235229ccb112c4e887ff1bcc232836

-------------------------------

The caller of raid1_read_request could pass NULL or a valid pointer for
"struct r1bio *r1_bio", so it actually means whether r1_bio is existed
or not.
Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NZhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a6502b42

10 5月, 2022 3 次提交

md/raid1: fix missing bitmap update w/o WriteMostly devices · 24fc8753

由 Song Liu 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.16
commit 46669e86
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I566T6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46669e8616c649c71c4cfcd712fd3d107e771380

--------------------------------

commit [1] causes missing bitmap updates when there isn't any WriteMostly
devices.

Detailed steps to reproduce by Norbert (which somehow didn't make to lore):

   # setup md10 (raid1) with two drives (1 GByte sparse files)
   dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0
   dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0

   losetup /dev/loop11 disk1
   losetup /dev/loop12 disk2

   mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12

   # add bitmap (aka write-intent log)
   mdadm /dev/md10 --grow --bitmap=internal

   echo check > /sys/block/md10/md/sync_action

   root:# cat /sys/block/md10/md/mismatch_cnt
   0
   root:#

   # remove member drive disk2 (loop12)
   mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12

   # modify degraded md device
   dd if=/dev/urandom of=/dev/md10 bs=512 count=1

   # no blocks recorded as out of sync on the remaining member disk1/loop11
   root:# mdadm -X /dev/loop11 | grep Bitmap
             Bitmap : 16 bits (chunks), 0 dirty (0.0%)
   root:#

   # re-add disk2, nothing synced because of empty bitmap
   mdadm /dev/md10 --re-add /dev/loop12

   # check integrity again
   echo check > /sys/block/md10/md/sync_action

   # disk1 and disk2 are no longer in sync, reads return differend data
   root:# cat /sys/block/md10/md/mismatch_cnt
   128
   root:#

   # clean up
   mdadm -S /dev/md10
   losetup -d /dev/loop11
   losetup -d /dev/loop12
   rm disk1 disk2

Fix this by moving the WriteMostly check to the if condition for
alloc_behind_master_bio().

[1] commit fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
Fixes: fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
Cc: stable@vger.kernel.org # v5.12+
Cc: Guoqing Jiang <guoqing.jiang@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: NNorbert Warmuth <nwarmuth@t-online.de>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NLuo Meng <luomeng12@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

24fc8753

md/raid1: only allocate write behind bio for WriteMostly device · 6037574a

由 Guoqing Jiang 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.16-rc1
commit fd3b6975
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I566T6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fd3b6975e9c11c4fa00965f82a0bfbb3b7b44101

--------------------------------

Commit 6607cd31 ("raid1: ensure write
behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
size of behind bio is not bigger than BIO_MAX_VECS sectors.

Unfortunately the same calltrace still could happen since an array could
enable write-behind without write mostly device.

To match the manpage of mdadm (which says "write-behind is only attempted
on drives marked as write-mostly"), we need to check WriteMostly flag to
avoid such unexpected behavior.

[1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25

Cc: stable@vger.kernel.org # v5.12+
Cc: Jens Stutte <jens@chianterastutte.eu>
Reported-by: NJens Stutte <jens@chianterastutte.eu>
Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NLuo Meng <luomeng12@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6037574a

raid1: ensure write behind bio has less than BIO_MAX_VECS sectors · 36ac0d56

由 Guoqing Jiang 提交于 5月 10, 2022

mainline inclusion
from mainline-v5.15-rc1
commit 6607cd31
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I566T6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6607cd319b6b91bff94e90f798a61c031650b514

--------------------------------

We can't split write behind bio with more than BIO_MAX_VECS sectors,
otherwise the below call trace was triggered because we could allocate
oversized write behind bio later.

[ 8.097936] bvec_alloc+0x90/0xc0
[ 8.098934] bio_alloc_bioset+0x1b3/0x260
[ 8.099959] raid1_make_request+0x9ce/0xc50 [raid1]
[ 8.100988] ? __bio_clone_fast+0xa8/0xe0
[ 8.102008] md_handle_request+0x158/0x1d0 [md_mod]
[ 8.103050] md_submit_bio+0xcd/0x110 [md_mod]
[ 8.104084] submit_bio_noacct+0x139/0x530
[ 8.105127] submit_bio+0x78/0x1d0
[ 8.106163] ext4_io_submit+0x48/0x60 [ext4]
[ 8.107242] ext4_writepages+0x652/0x1170 [ext4]
[ 8.108300] ? do_writepages+0x41/0x100
[ 8.109338] ? __ext4_mark_inode_dirty+0x240/0x240 [ext4]
[ 8.110406] do_writepages+0x41/0x100
[ 8.111450] __filemap_fdatawrite_range+0xc5/0x100
[ 8.112513] file_write_and_wait_range+0x61/0xb0
[ 8.113564] ext4_sync_file+0x73/0x370 [ext4]
[ 8.114607] __x64_sys_fsync+0x33/0x60
[ 8.115635] do_syscall_64+0x33/0x40
[ 8.116670] entry_SYSCALL_64_after_hwframe+0x44/0xae

Thanks for the comment from Christoph.

[1]. https://bugs.archlinux.org/task/70992

Cc: stable@vger.kernel.org # v5.12+
Reported-by: NJens Stutte <jens@chianterastutte.eu>
Tested-by: NJens Stutte <jens@chianterastutte.eu>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NGuoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NLuo Meng <luomeng12@huawei.com>

Conflicts:
	drivers/md/raid1.c
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

36ac0d56

23 12月, 2021 1 次提交

md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev · ceff49d9

由 Yufen Yu 提交于 12月 23, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JYYO?from=project-issue
CVE: NA

---------------------------

We get a NULL pointer dereference oops when test raid1 as follow:

mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab]

mdadm /dev/md1 -f /dev/sda
mdadm /dev/md1 -r /dev/sda
mdadm /dev/md1 -a /dev/sda
sleep 5
mdadm /dev/md1 -f /dev/sdb
mdadm /dev/md1 -r /dev/sdb
mdadm /dev/md1 -a /dev/sdb

After a disk(/dev/sda) has been removed, we add the disk to
raid array again, which would trigger recovery action.
Since the rdev current state is 'spare', read/write bio can
be issued to the disk.

Then we set the other disk (/dev/sdb) faulty. Since the raid
array is now in degraded state and /dev/sdb is the only
'In_sync' disk, raid1_error() will return but without set
faulty success.

However, that can interrupt the recovery action and
md_check_recovery will try to call remove_and_add_spares()
to remove the spare disk. And the race condition between
remove_and_add_spares() and raid1_write_request() in follow
can cause NULL pointer dereference for conf->mirrors[i].rdev:

raid1_write_request()   md_check_recovery    raid1_error()
rcu_read_lock()
rdev != NULL
!test_bit(Faulty, &rdev->flags)

                                           conf->recovery_disabled=
                                             mddev->recovery_disabled;
                                            return busy

                        remove_and_add_spares
                        raid1_remove_disk
                        rdev->nr_pending == 0

atomic_inc(&rdev->nr_pending);
rcu_read_unlock()

                        p->rdev=NULL

conf->mirrors[i].rdev->data_offset
NULL pointer deref!!!

                        if (!test_bit(RemoveSynchronized,
                          &rdev->flags))
                         synchronize_rcu();
                         p->rdev=rdev

To fix the race condition, we add a new flag 'WantRemove' for rdev.
Before access conf->mirrors[i].rdev, we need to ensure the rdev
without 'WantRemove' bit.

Link: https://marc.info/?l=linux-raid&m=156412052717709&w=2Reported-by: NZou Wei <zou_wei@huawei.com>
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Confilct:
        drivers/md/md.h
Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Nyuyufen <yuyufen@huawei.com>
Reviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ceff49d9

15 10月, 2021 1 次提交

md/raid10: properly indicate failure when ending a failed write request · 92948f37

由 Wei Shuyu 提交于 10月 15, 2021

stable inclusion
from stable-5.10.58
commit a786282b55b4d1134931c679a80e900c46461eae
bugzilla: 176984 https://gitee.com/openeuler/kernel/issues/I4E2P4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a786282b55b4d1134931c679a80e900c46461eae

--------------------------------

commit 5ba03936 upstream.

Similar to [1], this patch fixes the same bug in raid10. Also cleanup the
comments.

[1] commit 2417b986 ("md/raid1: properly indicate failure when ending
                         a failed write request")
Cc: stable@vger.kernel.org
Fixes: 7cee6d4e ("md/raid10: end bio when the device faulty")
Signed-off-by: NWei Shuyu <wsy@dogben.com>
Acked-by: NGuoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

92948f37

03 6月, 2021 1 次提交

md/raid1: properly indicate failure when ending a failed write request · aeefa199

由 Paul Clements 提交于 5月 22, 2021

stable inclusion
from stable-5.10.36
commit 661061a45e32d8b2cc0e306da9f169ad44011382
bugzilla: 51867
CVE: NA

--------------------------------

commit 2417b986 upstream.

This patch addresses a data corruption bug in raid1 arrays using bitmaps.
Without this fix, the bitmap bits for the failed I/O end up being cleared.

Since we are in the failure leg of raid1_end_write_request, the request
either needs to be retried (R1BIO_WriteError) or failed (R1BIO_Degraded).

Fixes: eeba6809 ("md/raid1: end bio when the device faulty")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: NPaul Clements <paul.clements@us.sios.com>
Signed-off-by: NSong Liu <song@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

aeefa199

09 7月, 2020 1 次提交

writeback: remove bdi->congested_fn · 21cf8661

由 Christoph Hellwig 提交于 7月 01, 2020

Except for pktdvd, the only places setting congested bits are file
systems that allocate their own backing_dev_info structures.  And
pktdvd is a deprecated driver that isn't useful in stack setup
either.  So remove the dead congested_fn stacking infrastructure.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSong Liu <song@kernel.org>
Acked-by: NDavid Sterba <dsterba@suse.com>
[axboe: fixup unused variables in bcache/request.c]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

21cf8661

01 7月, 2020 1 次提交

block: rename generic_make_request to submit_bio_noacct · ed00aabd

由 Christoph Hellwig 提交于 7月 01, 2020

generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ed00aabd

14 5月, 2020 1 次提交

md/raid1: release pending accounting for an I/O only after write-behind is also finished · c91114c2

由 David Jeffery 提交于 1月 27, 2020

When using RAID1 and write-behind, md can deadlock when errors occur. With
write-behind, r1bio structs can be accounted by raid1 as queued but not
counted as pending. The pending count is dropped when the original bio is
returned complete but write-behind for the r1bio may still be active.

This breaks the accounting used in some conditions to know when the raid1
md device has reached an idle state. It can result in calls to
freeze_array deadlocking. freeze_array will never complete from a negative
"unqueued" value being calculated due to a queued count larger than the
pending count.

To properly account for write-behind, move the call to allow_barrier from
call_bio_endio to raid_end_bio_io. When using write-behind, md can call
call_bio_endio before all write-behind I/O is complete. Using
raid_end_bio_io for the point to call allow_barrier will release the
pending count at a point where all I/O for an r1bio, even write-behind, is
done.
Signed-off-by: NDavid Jeffery <djeffery@redhat.com>
Signed-off-by: NSong Liu <songliubraving@fb.com>

c91114c2

14 1月, 2020 5 次提交

md/raid1: introduce wait_for_serialization · d0d2d8ba