[SCSI] zfcp: Recover from stalled outbound queue

Depending on interruptions on some storage systems, the complete channel can stall, which looks like an outbound queue stall to Linux. When trying to acquire a free SBAL for a non-SCSI command, zfcp waits for 5 seconds for a free slot to appear. This is the right place to detect a queue stall: If the wait times out, we assume a stalled queue and try to recover this.

The overall strategy should be to trigger the erp from specific events, and not try an overall escalation from one failed port to a full-blown queue recovery. If we manage to send a command, the status codes for this command or a timeout will trigger the right follow-on actions.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

cbf1ed02 · Christof Schmitt · James Bottomley · 85600f7f · cbf1ed02

Showing 4 additions and 1 deletion

drivers/s390/scsi/zfcp_fsf.c +4 -1

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -670,8 +670,11 @@ static int zfcp_fsf_req_sbal_get(struct zfcp_adapter *adapter)
 			       zfcp_fsf_sbal_check(adapter), 5 * HZ);
 	if (ret > 0)
 		return 0;
-	if (!ret)
+	if (!ret) {
 		atomic_inc(&adapter->qdio_outb_full);
+		/* assume hanging outbound queue, try queue recovery */
+		zfcp_erp_adapter_reopen(adapter, 0, "fsrsg_1", NULL);
+	}

 	spin_lock_bh(&adapter->req_q_lock);
 	return -EIO;
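
The control flow this patch introduces can be tried outside the kernel with a small user-space sketch. It is not the driver code: `struct sketch_adapter`, `sketch_slot_available()`, `sketch_adapter_reopen()` and `sketch_sbal_get()` are hypothetical stand-ins, and the 5 * HZ wait is modeled as a simple loop over simulated ticks rather than a real timed wait. It also leaves out the interrupted-wait case (`ret < 0`), which the patch returns as `-EIO` without triggering recovery.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter state; not the real struct zfcp_adapter. */
struct sketch_adapter {
	int free_slots;      /* free outbound queue entries (SBALs in zfcp) */
	int refill_at_tick;  /* tick at which a slot frees up; -1 models a stalled queue */
	int outb_full;       /* counts timed-out waits, like qdio_outb_full */
	int reopen_requests; /* counts how often recovery was requested */
};

/* Stand-in for zfcp_erp_adapter_reopen(): only records that recovery was requested. */
static void sketch_adapter_reopen(struct sketch_adapter *a)
{
	a->reopen_requests++;
}

/* Simulated queue check: a slot appears once the configured tick is reached. */
static bool sketch_slot_available(struct sketch_adapter *a, int tick)
{
	if (a->free_slots == 0 && a->refill_at_tick >= 0 && tick >= a->refill_at_tick)
		a->free_slots = 1;
	return a->free_slots > 0;
}

/*
 * Wait up to timeout_ticks for a free slot. On success return 0.
 * On timeout, count the event, assume a hanging outbound queue,
 * request adapter recovery, and return -EIO.
 */
static int sketch_sbal_get(struct sketch_adapter *a, int timeout_ticks)
{
	assert(a != NULL && timeout_ticks > 0);

	for (int tick = 0; tick < timeout_ticks; tick++) {
		if (sketch_slot_available(a, tick))
			return 0;
	}

	a->outb_full++;
	sketch_adapter_reopen(a);
	return -EIO;
}
```

With `refill_at_tick` set to a tick inside the timeout window, the call returns 0 and no recovery is requested. With `refill_at_tick` set to -1, the wait runs out, the counter is incremented, and exactly one reopen is requested, which mirrors the commit message's point that recovery is triggered by this one specific event rather than by escalation from a failed port.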