    Revert "md: unlock mddev before reap sync_thread in action_store" · 75396cbf
    Committed by Yu Kuai
    hulk inclusion
    category: bugfix
    bugzilla: https://gitee.com/openeuler/kernel/issues/I6OMCC
    CVE: NA
    
    --------------------------------
    
    This reverts commit 9dfbdafd.
    
    That commit introduces a defect: sync_thread can be running while
    MD_RECOVERY_RUNNING is cleared, which causes unexpected problems,
    for example:
    
    list_add corruption. prev->next should be next (ffff0001ac1daba0), but was ffff0000ce1a02a0. (prev=ffff0000ce1a02a0).
    Call trace:
     __list_add_valid+0xfc/0x140
     insert_work+0x78/0x1a0
     __queue_work+0x500/0xcf4
     queue_work_on+0xe8/0x12c
     md_check_recovery+0xa34/0xf30
     raid10d+0xb8/0x900 [raid10]
     md_thread+0x16c/0x2cc
     kthread+0x1a4/0x1ec
     ret_from_fork+0x10/0x18
    
    This happens because the work is queued again while it is still pending
    in the workqueue:
    
    t1:			t2:
    action_store
     mddev_lock
      if (mddev->sync_thread)
       mddev_unlock
       md_unregister_thread
       // first sync_thread is done
    			md_check_recovery
    			 mddev_try_lock
    			 /*
    			  * once MD_RECOVERY_DONE is set, new sync_thread
    			  * can start.
    			  */
    			 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
    			 INIT_WORK(&mddev->del_work, md_start_sync)
    			 queue_work(md_misc_wq, &mddev->del_work)
    			  test_and_set_bit(WORK_STRUCT_PENDING_BIT, ...)
    			  // set pending bit
    			  insert_work
    			   list_add_tail
    			 mddev_unlock
       mddev_lock_nointr
       md_reap_sync_thread
       // MD_RECOVERY_RUNNING is cleared
     mddev_unlock
    
    t3:
    
    // before queued work started from t2
    md_check_recovery
     // MD_RECOVERY_RUNNING is not set, a new sync_thread can be started
     INIT_WORK(&mddev->del_work, md_start_sync)
      work->data = 0
      // work pending bit is cleared
     queue_work(md_misc_wq, &mddev->del_work)
      insert_work
       list_add_tail
       // list is corrupted
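
The t2/t3 sequence above can be modeled in user space. The following is a minimal sketch, not kernel code: `struct work`, `init_work`, `queue_work`, `list_add_tail_checked` and `simulate_requeue` are hypothetical stand-ins for `work_struct`, `INIT_WORK()`, `queue_work()` and the checks performed by `__list_add_valid()`, and locking and concurrency are left out entirely. It only illustrates why re-initializing a work item that is still linked on the worklist defeats the pending-bit guard.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's list_head and work_struct. */
struct list_head { struct list_head *next, *prev; };

struct work {
	unsigned long data;        /* bit 0 models WORK_STRUCT_PENDING_BIT */
	struct list_head entry;    /* link on the workqueue's worklist */
};

enum { QUEUED = 0, ALREADY_PENDING = 1, LIST_CORRUPTED = -1 };

#define PENDING_BIT 1UL

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Models INIT_WORK(): clears the pending bit and resets the list entry. */
static void init_work(struct work *w)
{
	w->data = 0;
	init_list_head(&w->entry);
}

/* Models list_add_tail() with the sanity checks of __list_add_valid(). */
static bool list_add_tail_checked(struct list_head *new_entry,
				  struct list_head *head)
{
	struct list_head *prev = head->prev;
	struct list_head *next = head;

	if (next->prev != prev || prev->next != next ||
	    new_entry == prev || new_entry == next)
		return false;   /* "list_add corruption" */

	next->prev = new_entry;
	new_entry->next = next;
	new_entry->prev = prev;
	prev->next = new_entry;
	return true;
}

/* Models queue_work(): the pending bit guards against double insertion. */
static int queue_work(struct list_head *worklist, struct work *w)
{
	if (w->data & PENDING_BIT)
		return ALREADY_PENDING;
	w->data |= PENDING_BIT;
	return list_add_tail_checked(&w->entry, worklist) ? QUEUED
							  : LIST_CORRUPTED;
}

/*
 * t2 queues the work; t3 queues it again before it has run.
 * With reinit_before_requeue set, t3 first calls init_work(), as
 * md_check_recovery does via INIT_WORK() in the trace above.
 */
int simulate_requeue(bool reinit_before_requeue)
{
	struct list_head worklist;
	struct work del_work;

	init_list_head(&worklist);
	init_work(&del_work);
	if (queue_work(&worklist, &del_work) != QUEUED)
		return LIST_CORRUPTED;

	if (reinit_before_requeue)
		init_work(&del_work);   /* pending bit lost, entry still linked */

	return queue_work(&worklist, &del_work);
}
```

In this model, `simulate_requeue(false)` is rejected by the pending bit, while `simulate_requeue(true)` reaches the list insertion with `prev->next` pointing back at `prev`, the same shape as the values in the corruption message above.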
    
    This patch reverts the commit to fix the problem. The deadlock that the
    reverted commit tried to fix will be addressed in the following patches.

    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20230322064122.2384589-2-yukuai1@huaweicloud.com
    Reviewed-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>