    scsi: fix ata_port_wait_eh() hang caused by a missed wakeup of the EH thread · c2b89ae5
    zhengbin committed
    hulk inclusion
    category: bugfix
    bugzilla: 12843
    CVE: NA
    
    ---------------------------
    
    When I test the kernel with fio using the following steps:
    1. The SAS controller has a mix of SAS and SATA disks attached
    2. Run fio against all disks
    3. Simultaneously enable/disable/link_reset/hard_reset the PHYs

    the kernel hangs in ata_port_wait_eh:
    Call trace:
     __switch_to+0xb4/0x1b8
     __schedule+0x1e8/0x718
     schedule+0x38/0x90
     ata_port_wait_eh+0x70/0xf8
     sas_ata_wait_eh+0x24/0x30 [libsas]
     transport_sas_phy_reset.isra.3+0x128/0x160 [libsas]
     phy_reset_work+0x20/0x30 [libsas]
     process_one_work+0x1e4/0x460
     worker_thread+0x40/0x450
     kthread+0x12c/0x130
     ret_from_fork+0x10/0x18
    
    The key code paths are:
    scsi_dec_host_busy
    	atomic_dec(&shost->host_busy);
    	if (unlikely(scsi_host_in_recovery(shost))) {
    		spin_lock_irqsave(shost->host_lock, flags);
    		...
    		scsi_eh_wakeup(shost)
    		...
    	}
    
    scsi_schedule_eh
    	spin_lock_irqsave(shost->host_lock, flags);
    	if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
    	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
    		...
    		scsi_eh_wakeup(shost);
    	}
    
    scsi_eh_wakeup
    	if (scsi_host_busy(shost) == shost->host_failed)
    		wake_up_process(shost->ehandler);
    
    In scsi_dec_host_busy, the host_busy decrement and the shost_state
    check are not done under the host lock, so they can be reordered. With
    the following timing, neither function wakes up the SCSI error handler:
    
    CPU 0(call scsi_dec_host_busy)    CPU 1(call scsi_schedule_eh)
    LOAD shost_state(!=recovery)
                                      scsi_host_set_state(SHOST_RECOVERY)
                                      scsi_eh_wakeup(host_busy != host_failed)
    atomic_dec(&shost->host_busy);
    if (scsi_host_in_recovery(shost))
    
    Add an smp_mb() between the host_busy decrement and the shost_state check.
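The race and the barrier can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `dec_host_busy`, `schedule_eh`, `count_missed_wakeups` and `in_recovery` are hypothetical names, host_failed is assumed to be 0, a C11 `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb(), and the model puts a full fence on both sides (how the scsi_schedule_eh side is ordered in the kernel is not covered here).

```c
#include <pthread.h>
#include <stdatomic.h>

/* User-space stand-ins (hypothetical) for shost->host_busy and the
 * "host is in recovery" state. host_failed is assumed to be 0. */
static atomic_int host_busy;
static atomic_int in_recovery;

/* Set by each side when it would have called wake_up_process(). */
static int woke_from_dec;
static int woke_from_schedule;

/* Models scsi_dec_host_busy(): decrement, full barrier, then check state. */
static void *dec_host_busy(void *unused)
{
    (void)unused;
    atomic_fetch_sub_explicit(&host_busy, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
    if (atomic_load_explicit(&in_recovery, memory_order_relaxed))
        woke_from_dec = 1; /* host_busy is 0 here, equal to host_failed */
    return NULL;
}

/* Models scsi_schedule_eh(): publish the recovery state, then test host_busy. */
static void *schedule_eh(void *unused)
{
    (void)unused;
    atomic_store_explicit(&in_recovery, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&host_busy, memory_order_relaxed) == 0)
        woke_from_schedule = 1;
    return NULL;
}

/* Runs the race `rounds` times. Returns how many rounds ended with neither
 * side waking the handler, or -1 if a thread could not be created. */
int count_missed_wakeups(int rounds)
{
    int missed = 0;

    for (int i = 0; i < rounds; i++) {
        pthread_t t0, t1;

        atomic_store(&host_busy, 1);
        atomic_store(&in_recovery, 0);
        woke_from_dec = woke_from_schedule = 0;

        if (pthread_create(&t0, NULL, dec_host_busy, NULL))
            return -1;
        if (pthread_create(&t1, NULL, schedule_eh, NULL)) {
            pthread_join(t0, NULL);
            return -1;
        }
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        if (!woke_from_dec && !woke_from_schedule)
            missed++;
    }
    return missed;
}
```

With both fences in place, at least one side must observe the other's store, so `count_missed_wakeups(2000)` is expected to return 0. Removing the fence in `dec_host_busy` makes a lost wakeup possible on weakly ordered hardware, although a short run may not happen to hit it.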
    Signed-off-by: zhengbin <zhengbin13@huawei.com>
    [yan: backport from 5.0]
    Signed-off-by: Jason Yan <yanaijie@huawei.com>
    Reviewed-by: Miao Xie <miaoxie@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
scsi_error.c 67.8 KB