• Z
    scsi: fix iscsi rescan fails to create block · 3b649a02
    Zhong Jinghua 提交于
    hulk inclusion
    category: bugfix
    bugzilla: 188150, https://gitee.com/openeuler/kernel/issues/I643OL
    CVE: NA
    
    --------------------------------
    
    When the three iscsi operations delete, logout, and rescan are concurrent
    at the same time, there is a probability of failure to add disk through
    device_add_disk(). The concurrent process is as follows:
    
    T0: scan host // echo 1 > /sys/devices/platform/host1/scsi_host/host1/scan
    T1: delete target // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
    T2: logout // iscsiadm -m node --login
    T3: T2 scsi_queue_work
    T4: T0 bus_probe_device
    
    T0                          T1                     T2                     T3
    scsi_scan_target
     mutex_lock(&shost->scan_mutex);
      __scsi_scan_target
       scsi_report_lun_scan
        scsi_add_lun
         scsi_sysfs_add_sdev
          device_add
           kobject_add
           //create session1/target1:0:0/1:0:0:1/
           ...
           bus_probe_device
           // Create block asynchronously
     mutex_unlock(&shost->scan_mutex);
                           sdev_store_delete
                            scsi_remove_device
                             device_remove_file
                              mutex_lock(scan_mutex)
                               __scsi_remove_device
                                res = scsi_device_set_state(sdev, SDEV_CANCEL)
                                                 iscsi_if_recv_msg
                                                  scsi_queue_work
                                                                     __iscsi_unbind_session
                                                                     session->target_id = ISCSI_MAX_TARGET
                                                                       __scsi_remove_target
                                                                       sdev->sdev_state == SDEV_CANCEL
                                                                       continue;
                                                                       // end, No delete kobject 1:0:0:1
                                                 iscsi_if_recv_msg
                                                  transport->destroy_session(session)
                                                   __iscsi_destroy_session
                                                   iscsi_session_teardown
                                                    iscsi_remove_session
                                                     __iscsi_unbind_session
                                                      iscsi_session_event
                                                     device_del
                                                     // delete session
    T4:
    // create the block, its parent is 1:0:0:1
    // If kobject 1:0:0:1 does not exist, it won't go down
    __device_attach_async_helper
     device_lock
     ...
     __device_attach_driver
      driver_probe_device
       really_probe
        sd_probe
         device_add_disk
          register_disk
           device_add
          // error
    
    The block is created after the seesion is deleted.
    When T2 deletes the session, it will mark block'parent 1:0:01 as unusable:
    T2
    device_del
     kobject_del
      sysfs_remove_dir
       __kernfs_remove
       // Mark the children under the session as unusable
        while ((pos = kernfs_next_descendant_post(pos, kn)))
    		if (kernfs_active(pos))
    			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
    
    Then, create the block:
    T4
    device_add
     kobject_add
      kobject_add_varg
       kobject_add_internal
        create_dir
         sysfs_create_dir_ns
          kernfs_create_dir_ns
           kernfs_add_one
            if ((parent->flags & KERNFS_ACTIVATED) && !kernfs_active(parent))
    		goto out_unlock;
    		// return error
    
    This error will cause a warning:
    kobject_add_internal failed for block (error: -2 parent: 1:0:0:1).
    In the lower version (such as 5.10), there is no corresponding error handling, continuing
    to go down will trigger a kernel panic, so cc stable.
    
    Therefore, creating the block should not be done after deleting the session.
    More practically, we should ensure that the target under the session is deleted first,
    and then the session is deleted. In this way, there are two possibilities:
    
    1) if the process(T1) of deleting the target execute first, it will grab the device_lock(),
    and the process(T4) of creating the block will wait for the deletion to complete.
    Then, block's parent 1:0:0:1 has been deleted, it won't go down.
    
    2) if the process(T4) of creating block execute first, it will grab the device_lock(),
    and the process(T1) of deleting the target will wait for the creation block to complete.
    Then, the process(T2) of deleting the session should need wait for the deletion to complete.
    
    Fix it by removing the judgment of state equal to SDEV_CANCEL in
    __scsi_remove_target() to ensure the order of deletion. Then, it will wait for
    T1's mutex_lock(scan_mutex) and device_del() in __scsi_remove_device() will wait for
    T4's device_lock(dev).
    But we found that such a fix would cause the previous problem:
    commit 81b6c999 ("scsi: core: check for device state in __scsi_remove_target()").
    So we use scsi_device_try_get() instead of get_devcie() to fix the previous problem.
    
    Fixes: 81b6c999 ("scsi: core: check for device state in __scsi_remove_target()")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: NZhong Jinghua <zhongjinghua@huawei.com>
    Reviewed-by: NHou Tao <houtao1@huawei.com>
    Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
    (cherry picked from commit e06779a6)
    3b649a02
scsi_sysfs.c 43.0 KB