• A
    md/raid10: Fix raid10 replace hang when new added disk faulty · 52c7cc01
    Alex Wu 提交于
    commit ee37d7314a32ab6809eacc3389bad0406c69a81f upstream.
    
    [Symptom]
    
    Resync thread hang when new added disk faulty during replacing.
    
    [Root Cause]
    
    In raid10_sync_request(), we expect to issue a bio with callback
    end_sync_read(), and a bio with callback end_sync_write().
    
    In normal situation, we will add resyncing sectors into
    mddev->recovery_active when raid10_sync_request() returned, and sub
    resynced sectors from mddev->recovery_active when end_sync_write()
    calls end_sync_request().
    
    If new added disk, which are replacing the old disk, is set faulty,
    there is a race condition:
        1. In the first rcu protected section, resync thread did not detect
           that mreplace is set faulty and pass the condition.
        2. In the second rcu protected section, mreplace is set faulty.
        3. But, resync thread will prepare the read object first, and then
           check the write condition.
        4. It will find that mreplace is set faulty and do not have to
           prepare write object.
    This cause we add resync sectors but never sub it.
    
    [How to Reproduce]
    
    This issue can be easily reproduced by the following steps:
        mdadm -C /dev/md0 --assume-clean -l 10 -n 4 /dev/sd[abcd]
        mdadm /dev/md0 -a /dev/sde
        mdadm /dev/md0 --replace /dev/sdd
        sleep 1
        mdadm /dev/md0 -f /dev/sde
    
    [How to Fix]
    
    This issue can be fixed by using local variables to record the result
    of test conditions. Once the conditions are satisfied, we can make sure
    that we need to issue a bio for read and a bio for write.
    
    Previous 'commit 24afd80d ("md/raid10: handle recovery of
    replacement devices.")' will also check whether bio is NULL, but leave
    the comment saying that it is a pointless test. So we remove this dummy
    check.
    Reported-by: NAlex Chen <alexchen@synology.com>
    Reviewed-by: NAllen Peng <allenpeng@synology.com>
    Reviewed-by: NBingJing Chang <bingjingc@synology.com>
    Signed-off-by: NAlex Wu <alexwu@synology.com>
    Signed-off-by: NShaohua Li <shli@fb.com>
    Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
    Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
    52c7cc01
raid10.c 135.3 KB