    scsi: libiscsi: Fix cmds hung when sd_shutdown · 30a5327f
    Wu Bo committed
    hulk inclusion
    category: bugfix
    bugzilla: NA
    CVE: NA
    
    https://gitee.com/src-openeuler/kernel/issues/I28N9J
    ---------------------------
    
    For some reason, iscsi.service failed to log out of all sessions
    during system reboot. The kernel then hangs forever in
    sd_sync_cache() after issuing the SYNCHRONIZE_CACHE command to all
    paths that still exist.
    
    [ 1044.098991] reboot: Mddev shutdown finished.
    [ 1044.099311] reboot: Usermodehelper disable finished.
    [ 1050.611244]  connection2:0: ping timeout of 5 secs expired, recv timeout 5,
    last rx 4295152378, last ping 4295153633, now 4295154944
    [ 1348.599676] Call trace:
    [ 1348.599887]  __switch_to+0xe8/0x150
    [ 1348.600113]  __schedule+0x33c/0xa08
    [ 1348.600372]  schedule+0x2c/0x88
    [ 1348.600567]  schedule_timeout+0x184/0x3a8
    [ 1348.600820]  io_schedule_timeout+0x28/0x48
    [ 1348.601089]  wait_for_common_io.constprop.2+0x168/0x258
    [ 1348.601425]  wait_for_completion_io_timeout+0x28/0x38
    [ 1348.601762]  blk_execute_rq+0x98/0xd8
    [ 1348.602006]  __scsi_execute+0xe0/0x1e8
    [ 1348.602262]  sd_sync_cache+0xd0/0x220 [sd_mod]
    [ 1348.602551]  sd_shutdown+0x6c/0xf8 [sd_mod]
    [ 1348.602826]  device_shutdown+0x13c/0x250
    [ 1348.603078]  kernel_restart_prepare+0x5c/0x68
    [ 1348.603400]  kernel_restart+0x20/0x98
    [ 1348.603683]  __se_sys_reboot+0x214/0x260
    [ 1348.603987]  __arm64_sys_reboot+0x24/0x30
    [ 1348.604300]  el0_svc_common+0x80/0x1b8
    [ 1348.604590]  el0_svc_handler+0x78/0xe0
    [ 1348.604877]  el0_svc+0x10/0x260
    
    Commit d7549412 ("scsi: libiscsi: Allow sd_shutdown on bad transport")
    once solved this problem. It added a system_state check to
    iscsi_eh_cmd_timed_out(): when system_state is not SYSTEM_RUNNING, the
    function marks the result as DID_NO_CONNECT and returns BLK_EH_DONE
    to tell the upper layers that the command was handled by the
    transport layer error handler helper.
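
The decision described above can be sketched as a small user-space model.
This is not the kernel code: every model_* / MODEL_* name is a hypothetical
stand-in, and all other paths of the real timeout handler are collapsed into
a single placeholder return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's system_state values. */
enum model_system_state { MODEL_SYSTEM_RUNNING, MODEL_SYSTEM_RESTART };

/* Hypothetical stand-ins for the block layer timeout return codes. */
enum model_eh_result { MODEL_BLK_EH_DONE, MODEL_BLK_EH_RESET_TIMER };

/* Host byte meaning "no connection"; stored in bits 16..23 of the result. */
enum { MODEL_DID_NO_CONNECT = 0x01 };

struct model_cmd {
    int result;
};

/*
 * Model of the check added by d7549412: while the system is shutting
 * down, complete the command with DID_NO_CONNECT instead of waiting
 * for a recovery that will not happen.
 */
static enum model_eh_result
model_cmd_timed_out(struct model_cmd *sc, enum model_system_state state)
{
    assert(sc != NULL);

    if (state != MODEL_SYSTEM_RUNNING) {
        sc->result = MODEL_DID_NO_CONNECT << 16;
        return MODEL_BLK_EH_DONE;
    }

    /* Placeholder for every other path of the real handler. */
    return MODEL_BLK_EH_RESET_TIMER;
}
```

Calling model_cmd_timed_out() with MODEL_SYSTEM_RESTART returns
MODEL_BLK_EH_DONE and sets the host byte of the result; with
MODEL_SYSTEM_RUNNING the command is left untouched.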
    
    The SCSI midlayer timeout handler, scsi_times_out(), aborts the
    command if the SCSI LLD timeout handler returns BLK_EH_DONE.
    If the abort fails, the command enters SCSI EH.
    
    SCSI EH then attempts a target reset. If the target reset fails, it
    calls iscsi_eh_session_reset() to drop the session.
    
    iscsi_eh_session_reset() waits for a relogin, a session termination
    from userspace, or a recovery/replacement timeout. But at this point
    iscsid has already exited and the session is marked as
    ISCSI_STATE_FAILED, so the SCSI EH thread is never scheduled back
    again.
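
Why the wait never finishes can be illustrated with a minimal user-space
model of the wake-up condition. The enum and function names are
hypothetical; the three "done" states are assumed to correspond to the three
events listed above (relogin, termination from userspace, recovery or
replacement timeout).

```c
#include <stdbool.h>

/* Hypothetical model of the session states relevant to the wait. */
enum model_session_state {
    MODEL_STATE_LOGGED_IN,        /* relogin succeeded */
    MODEL_STATE_FAILED,           /* session failed, recovery pending */
    MODEL_STATE_TERMINATE,        /* session terminated from userspace */
    MODEL_STATE_RECOVERY_FAILED,  /* recovery/replacement timeout fired */
};

/*
 * Returns true when a waiter in the session reset path would be allowed
 * to continue. A session stuck in the FAILED state never satisfies the
 * condition, so the waiter sleeps until some other party changes the
 * state.
 */
static bool model_reset_wait_done(enum model_session_state s)
{
    return s == MODEL_STATE_LOGGED_IN ||
           s == MODEL_STATE_TERMINATE ||
           s == MODEL_STATE_RECOVERY_FAILED;
}
```

In this model model_reset_wait_done(MODEL_STATE_FAILED) is false, which
mirrors the scsi_eh_2 thread shown blocked in the trace below.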
    
    PID: 9123   TASK: ffff80020c1b4d80  CPU: 3   COMMAND: "scsi_eh_2"
     #0 [ffff00008632bb70] __switch_to at ffff000080088738
     #1 [ffff00008632bb90] __schedule at ffff000080a00480
     #2 [ffff00008632bc20] schedule at ffff000080a00b58
     #3 [ffff00008632bc30] iscsi_eh_session_reset at ffff000000d1ab9c [libiscsi]
     #4 [ffff00008632bcb0] iscsi_eh_recover_target at ffff000000d1d1fc [libiscsi]
     #5 [ffff00008632bd00] scsi_try_target_reset at ffff0000806f0bac
     #6 [ffff00008632bd30] scsi_eh_ready_devs at ffff0000806f2724
     #7 [ffff00008632bde0] scsi_error_handler at ffff0000806f41d4
     #8 [ffff00008632be70] kthread at ffff000080119ae0

    Reported-by: Tianxiong Lu <lutianxiong@huawei.com>
    Signed-off-by: Wu Bo <wubo40@huawei.com>
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jason Yan <yanaijie@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
libiscsi.c 99.6 KB