    md/raid1: fix a race between removing rdev and access conf->mirrors[i].rdev · 10783f79
    Committed by Yufen Yu
    hulk inclusion
    category: bugfix
    bugzilla: 18683
    CVE: NA
    ---------------------------
    
    We get a NULL pointer dereference oops when testing raid1 as follows:
    
    mdadm -CR /dev/md1 -l 1 -n 2 /dev/sd[ab]
    
    mdadm /dev/md1 -f /dev/sda
    mdadm /dev/md1 -r /dev/sda
    mdadm /dev/md1 -a /dev/sda
    sleep 5
    mdadm /dev/md1 -f /dev/sdb
    mdadm /dev/md1 -r /dev/sdb
    mdadm /dev/md1 -a /dev/sdb
    
    After a disk (/dev/sda) has been removed, we add it to the
    raid array again, which triggers a recovery action. Since
    the rdev's current state is 'spare', read/write bios can
    be issued to the disk.
    
    Then we set the other disk (/dev/sdb) faulty. Since the raid
    array is now in a degraded state and /dev/sdb is the only
    'In_sync' disk, raid1_error() returns without actually
    marking the disk Faulty.
    
    However, that can interrupt the recovery action, and
    md_check_recovery() will then call remove_and_add_spares()
    to remove the spare disk. The following race between
    remove_and_add_spares() and raid1_write_request() can
    cause a NULL pointer dereference of conf->mirrors[i].rdev:
    
    raid1_write_request()   md_check_recovery    raid1_error()
    rcu_read_lock()
    rdev != NULL
    !test_bit(Faulty, &rdev->flags)
    
                                               conf->recovery_disabled=
                                                 mddev->recovery_disabled;
                                                return busy
    
                            remove_and_add_spares
                            raid1_remove_disk
                            rdev->nr_pending == 0
    
    atomic_inc(&rdev->nr_pending);
    rcu_read_unlock()
    
                            p->rdev=NULL
    
    conf->mirrors[i].rdev->data_offset
    NULL pointer deref!!!
    
                            if (!test_bit(RemoveSynchronized,
                              &rdev->flags))
                             synchronize_rcu();
                             p->rdev=rdev
    
    To fix the race condition, we add a new flag 'WantRemove' for rdev.
    Before accessing conf->mirrors[i].rdev, we need to ensure that the
    rdev does not have the 'WantRemove' bit set.
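    
    The idea can be illustrated with a small user-space model. This is
    only a sketch under stated assumptions, not the kernel patch: the
    model_* names, the FLAG_* macros and the use of C11 atomics in
    place of RCU and the kernel bit operations are all hypothetical,
    and the re-check after taking the reference is a choice made for
    this model rather than something the text above specifies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rdev flag bits; not kernel definitions. */
#define FLAG_FAULTY      (1u << 0)
#define FLAG_WANT_REMOVE (1u << 1)

struct model_rdev {
    atomic_uint flags;
    atomic_int  nr_pending;   /* in-flight I/O references */
    long        data_offset;
};

struct model_mirror {
    struct model_rdev *_Atomic rdev;  /* models conf->mirrors[i].rdev */
};

static void model_rdev_init(struct model_rdev *rdev, long data_offset)
{
    atomic_init(&rdev->flags, 0u);
    atomic_init(&rdev->nr_pending, 0);
    rdev->data_offset = data_offset;
}

/* Writer side: take a reference only if the rdev is usable.
 * Returns the rdev (reference held) or NULL. The caller keeps using
 * the returned pointer and does not re-read the mirror slot. */
static struct model_rdev *model_get_rdev(struct model_mirror *m)
{
    struct model_rdev *rdev = atomic_load(&m->rdev);

    if (!rdev)
        return NULL;
    if (atomic_load(&rdev->flags) & (FLAG_FAULTY | FLAG_WANT_REMOVE))
        return NULL;

    atomic_fetch_add(&rdev->nr_pending, 1);

    /* Assumption of this model: re-check after publishing the
     * reference, so a remover that set the flag in between is seen. */
    if (atomic_load(&rdev->flags) & FLAG_WANT_REMOVE) {
        atomic_fetch_sub(&rdev->nr_pending, 1);
        return NULL;
    }
    return rdev;
}

/* Writer side: drop a reference taken by model_get_rdev(). */
static void model_put_rdev(struct model_rdev *rdev)
{
    int prev = atomic_fetch_sub(&rdev->nr_pending, 1);

    assert(prev > 0);
    (void)prev;
}

/* Remover side: announce the intent first, then look at nr_pending.
 * Returns 0 on success, -1 if the device is busy. */
static int model_remove_disk(struct model_mirror *m)
{
    struct model_rdev *rdev = atomic_load(&m->rdev);

    if (!rdev)
        return 0;

    atomic_fetch_or(&rdev->flags, FLAG_WANT_REMOVE);
    if (atomic_load(&rdev->nr_pending) != 0) {
        /* Someone holds a reference: back off and stay in the array. */
        atomic_fetch_and(&rdev->flags, ~FLAG_WANT_REMOVE);
        return -1;
    }
    atomic_store(&m->rdev, (struct model_rdev *)NULL);
    return 0;
}
```

    In this model the remover publishes 'WantRemove' before it reads
    nr_pending, and the writer publishes its reference before it
    re-reads the flag, so at least one side always notices the other.
    A writer holding a reference makes model_remove_disk() return busy,
    and a writer arriving after the flag is set gets NULL instead of a
    device that is about to be cleared from the slot.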
    
    Link: https://marc.info/?l=linux-raid&m=156412052717709&w=2
    Reported-by: Zou Wei <zou_wei@huawei.com>
    Signed-off-by: Yufen Yu <yuyufen@huawei.com>
    Reviewed-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>