Commit c2b89ae5 Author: zhengbin Committer: Xie XiuQi

scsi: fix ata_port_wait_eh() hang caused by missing to wake up eh thread

hulk inclusion
category: bugfix
bugzilla: 12843
CVE: NA

---------------------------

When testing the kernel with fio using the following steps:
1. Attach a mix of SAS and SATA disks to the SAS controller
2. Run fio against all disks
3. Simultaneously enable/disable/link_reset/hard_reset the PHYs

the kernel hangs in ata_port_wait_eh().
Call trace:
 __switch_to+0xb4/0x1b8
 __schedule+0x1e8/0x718
 schedule+0x38/0x90
 ata_port_wait_eh+0x70/0xf8
 sas_ata_wait_eh+0x24/0x30 [libsas]
 transport_sas_phy_reset.isra.3+0x128/0x160 [libsas]
 phy_reset_work+0x20/0x30 [libsas]
 process_one_work+0x1e4/0x460
 worker_thread+0x40/0x450
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x18

The relevant code paths are:
scsi_dec_host_busy
	atomic_dec(&shost->host_busy);
	if (unlikely(scsi_host_in_recovery(shost))) {
		spin_lock_irqsave(shost->host_lock, flags);
		...
		scsi_eh_wakeup(shost)
		...
	}

scsi_schedule_eh
	spin_lock_irqsave(shost->host_lock, flags);
	if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
		...
		scsi_eh_wakeup(shost);
	}

scsi_eh_wakeup
	if (scsi_host_busy(shost) == shost->host_failed)
		wake_up_process(shost->ehandler);

scsi_dec_host_busy accesses host_busy and shost_state without holding
host_lock, so nothing orders the two accesses. With the following timing,
neither function wakes up the SCSI error handler:

CPU 0(call scsi_dec_host_busy)    CPU 1(call scsi_schedule_eh)
LOAD shost_state(!=recovery)
                                  scsi_host_set_state(SHOST_RECOVERY)
                                  scsi_eh_wakeup(host_busy != host_failed)
atomic_dec(&shost->host_busy);
if (scsi_host_in_recovery(shost))
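
The interleaving above can be replayed deterministically in a single thread.
The following is only a sketch: `fake_host`, `fake_eh_wakeup`,
`fake_schedule_eh` and `replay` are hypothetical stand-ins, not kernel code,
and the reordering is modelled by sampling the state before CPU 1 runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the Scsi_Host fields named in this message. */
enum host_state { HOST_RUNNING, HOST_RECOVERY };

struct fake_host {
	int host_busy;          /* commands in flight */
	int host_failed;        /* commands handed to the error handler */
	int host_eh_scheduled;  /* pending scsi_schedule_eh() requests */
	enum host_state state;
	bool eh_woken;          /* set where the kernel would wake the EH thread */
};

/* Mirrors scsi_eh_wakeup(): wake only when busy == failed. */
static void fake_eh_wakeup(struct fake_host *h)
{
	if (h->host_busy == h->host_failed)
		h->eh_woken = true;
}

/* CPU 1: the part of scsi_schedule_eh() shown above. */
static void fake_schedule_eh(struct fake_host *h)
{
	h->state = HOST_RECOVERY;
	h->host_eh_scheduled++;
	fake_eh_wakeup(h);
}

/*
 * Replay the timing diagram with one command in flight.
 * state_load_hoisted == true models CPU 0 loading shost_state before
 * its own decrement of host_busy becomes visible (the unordered case).
 * Returns whether the error handler was woken.
 */
static bool replay(bool state_load_hoisted)
{
	struct fake_host h = { .host_busy = 1, .state = HOST_RUNNING };
	enum host_state observed = h.state;   /* early load on CPU 0 */

	fake_schedule_eh(&h);                 /* CPU 1: busy(1) != failed(0) */

	assert(h.host_busy > 0);
	h.host_busy--;                        /* CPU 0: atomic_dec() */
	if (!state_load_hoisted)
		observed = h.state;           /* ordered: load after the dec */
	if (observed == HOST_RECOVERY &&
	    (h.host_failed || h.host_eh_scheduled))
		fake_eh_wakeup(&h);

	return h.eh_woken;
}
```

With the hoisted load, `replay(true)` ends with nobody having woken the error
handler; with the load ordered after the decrement, `replay(false)` wakes it.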

Add memory barriers to order the accesses to host_busy and shost_state:
smp_mb__after_atomic() in scsi_dec_host_busy and smp_mb__before_atomic()
in scsi_schedule_eh.

Signed-off-by: zhengbin <zhengbin13@huawei.com>
[yan: backport from 5.0]
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Parent f73a53f1
@@ -88,6 +88,13 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
 	if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
 	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
+		/*
+		 * We have to order shost_state store above and test of
+		 * the host_busy(scsi_eh_wakeup will test it), because
+		 * scsi_dec_host_busy accesses these variables without
+		 * host_lock.
+		 */
+		smp_mb__before_atomic();
 		shost->host_eh_scheduled++;
 		scsi_eh_wakeup(shost);
 	}
...
@@ -346,6 +346,11 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost)
 	rcu_read_lock();
 	atomic_dec(&shost->host_busy);
+	/*
+	 * We have to order host_busy dec above and test of the shost_state
+	 * below outside the host_lock.
+	 */
+	smp_mb__after_atomic();
 	if (unlikely(scsi_host_in_recovery(shost))) {
 		spin_lock_irqsave(shost->host_lock, flags);
 		if (shost->host_failed || shost->host_eh_scheduled)
...
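
The barrier placement in the two hunks above follows the classic
"store, full barrier, load the other variable" pairing. A rough userspace
analogue with C11 atomics is sketched below. It is an illustration under
assumptions, not the kernel implementation: the `sketch_*` names are
hypothetical, `atomic_thread_fence(memory_order_seq_cst)` stands in for the
kernel barriers, host_failed is taken to be 0, and the host_lock taken by the
real code is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { SKETCH_RUNNING = 0, SKETCH_RECOVERY = 1 };

/* Hypothetical userspace model of the two variables the patch orders. */
struct sketch_host {
	atomic_int host_busy;
	atomic_int state;
	atomic_bool eh_woken;
};

static void sketch_host_init(struct sketch_host *h, int busy)
{
	assert(busy >= 0);
	atomic_init(&h->host_busy, busy);
	atomic_init(&h->state, SKETCH_RUNNING);
	atomic_init(&h->eh_woken, false);
}

/* Models scsi_eh_wakeup() with host_failed assumed to be 0. */
static void sketch_eh_wakeup(struct sketch_host *h)
{
	if (atomic_load_explicit(&h->host_busy, memory_order_relaxed) == 0)
		atomic_store_explicit(&h->eh_woken, true, memory_order_relaxed);
}

/* Side A: publish the decrement, full fence, then look at the state. */
static void sketch_dec_host_busy(struct sketch_host *h)
{
	atomic_fetch_sub_explicit(&h->host_busy, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&h->state, memory_order_relaxed) ==
	    SKETCH_RECOVERY)
		sketch_eh_wakeup(h);
}

/* Side B: publish the state, full fence, then look at host_busy. */
static void sketch_schedule_eh(struct sketch_host *h)
{
	atomic_store_explicit(&h->state, SKETCH_RECOVERY, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	sketch_eh_wakeup(h);
}
```

Because each side fences between its own store and its load of the other
variable, at least one side observes the other's store, so the wakeup cannot
be missed by both. Calling the two functions in either order on a host
initialised with one busy command leaves `eh_woken` set.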