md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev
hulk inclusion
category: bugfix
bugzilla: 18683
CVE: NA
---------------------------
We hit a NULL pointer dereference oops when testing raid1 as follows:
mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab]
mdadm /dev/md1 -f /dev/sda
mdadm /dev/md1 -r /dev/sda
mdadm /dev/md1 -a /dev/sda
sleep 5
mdadm /dev/md1 -f /dev/sdb
mdadm /dev/md1 -r /dev/sdb
mdadm /dev/md1 -a /dev/sdb
After a disk (/dev/sda) has been removed, we add the disk to the
raid array again, which triggers a recovery action.
Since the rdev's current state is 'spare', read/write bios can
still be issued to the disk.
Then we set the other disk (/dev/sdb) faulty. Since the raid
array is now degraded and /dev/sdb is the only 'In_sync' disk,
raid1_error() just records conf->recovery_disabled and returns
without actually marking the disk Faulty.
However, that interrupts the recovery action, so
md_check_recovery() calls remove_and_add_spares() to remove the
spare disk. The following race between remove_and_add_spares()
and raid1_write_request() can then dereference a NULL
conf->mirrors[i].rdev:
raid1_write_request()               md_check_recovery()        raid1_error()
rcu_read_lock()
rdev != NULL
!test_bit(Faulty, &rdev->flags)
                                                               conf->recovery_disabled =
                                                                   mddev->recovery_disabled;
                                                               /* return busy */
                                    remove_and_add_spares()
                                      raid1_remove_disk()
                                        rdev->nr_pending == 0
atomic_inc(&rdev->nr_pending);
rcu_read_unlock()
                                        p->rdev = NULL
conf->mirrors[i].rdev->data_offset
  /* NULL pointer deref!!! */
                                        if (!test_bit(RemoveSynchronized,
                                                      &rdev->flags))
                                            synchronize_rcu();
                                        p->rdev = rdev
To fix the race condition, add a new rdev flag 'WantRemove'.
Before accessing conf->mirrors[i].rdev, we must check that the
rdev does not have the 'WantRemove' bit set.
Link: https://marc.info/?l=linux-raid&m=156412052717709&w=2
Reported-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>