提交 45fa613f 编写于 作者: M Mike Christie 提交者: Zheng Zengkai

scsi: core: sysfs: Fix hang when device state is set via sysfs

stable inclusion
form stable-v5.10.82
commit a792e0128d232251edb5fdf42fb0f9fbb0b44a73
bugzilla: 185877 https://gitee.com/openeuler/kernel/issues/I4QU6V

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a792e0128d232251edb5fdf42fb0f9fbb0b44a73

--------------------------------

[ Upstream commit 4edd8cd4 ]

This fixes a regression added with:

commit f0f82e24 ("scsi: core: Fix capacity set to zero after
offlinining device")

The problem is that after iSCSI recovery, iscsid will call into the kernel
to set the dev's state to running, and with that patch we now call
scsi_rescan_device() with the state_mutex held. If the SCSI error handler
thread is just starting to test the device in scsi_send_eh_cmnd() then it's
going to try to grab the state_mutex.

We are then stuck, because when scsi_rescan_device() tries to send its I/O
scsi_queue_rq() calls -> scsi_host_queue_ready() -> scsi_host_in_recovery()
which will return true (the host state is still in recovery) and I/O will
just be requeued. scsi_send_eh_cmnd() will then never be able to grab the
state_mutex to finish error handling.

To prevent the deadlock move the rescan-related code to after we drop the
state_mutex.

This also adds a check for if we are already in the running state. This
prevents extra scans and helps the iscsid case where if the transport class
has already onlined the device during its recovery process then we don't
need userspace to do it again plus possibly block that daemon.

Link: https://lore.kernel.org/r/20211105221048.6541-3-michael.christie@oracle.com
Fixes: f0f82e24 ("scsi: core: Fix capacity set to zero after offlinining device")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: lijinlin <lijinlin3@huawei.com>
Cc: Wu Bo <wubo40@huawei.com>
Reviewed-by: NLee Duncan <lduncan@suse.com>
Reviewed-by: NWu Bo <wubo40@huawei.com>
Signed-off-by: NMike Christie <michael.christie@oracle.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
上级 c1d313c4
...@@ -796,6 +796,7 @@ store_state_field(struct device *dev, struct device_attribute *attr, ...@@ -796,6 +796,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
int i, ret; int i, ret;
struct scsi_device *sdev = to_scsi_device(dev); struct scsi_device *sdev = to_scsi_device(dev);
enum scsi_device_state state = 0; enum scsi_device_state state = 0;
bool rescan_dev = false;
for (i = 0; i < ARRAY_SIZE(sdev_states); i++) { for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
const int len = strlen(sdev_states[i].name); const int len = strlen(sdev_states[i].name);
...@@ -814,7 +815,16 @@ store_state_field(struct device *dev, struct device_attribute *attr, ...@@ -814,7 +815,16 @@ store_state_field(struct device *dev, struct device_attribute *attr,
} }
mutex_lock(&sdev->state_mutex); mutex_lock(&sdev->state_mutex);
if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING) {
ret = count;
} else {
ret = scsi_device_set_state(sdev, state); ret = scsi_device_set_state(sdev, state);
if (ret == 0 && state == SDEV_RUNNING)
rescan_dev = true;
}
mutex_unlock(&sdev->state_mutex);
if (rescan_dev) {
/* /*
* If the device state changes to SDEV_RUNNING, we need to * If the device state changes to SDEV_RUNNING, we need to
* run the queue to avoid I/O hang, and rescan the device * run the queue to avoid I/O hang, and rescan the device
...@@ -823,11 +833,9 @@ store_state_field(struct device *dev, struct device_attribute *attr, ...@@ -823,11 +833,9 @@ store_state_field(struct device *dev, struct device_attribute *attr,
* blk_mq_freeze_queue_wait() and because that call may be * blk_mq_freeze_queue_wait() and because that call may be
* waiting for pending I/O to finish. * waiting for pending I/O to finish.
*/ */
if (ret == 0 && state == SDEV_RUNNING) {
blk_mq_run_hw_queues(sdev->request_queue, true); blk_mq_run_hw_queues(sdev->request_queue, true);
scsi_rescan_device(dev); scsi_rescan_device(dev);
} }
mutex_unlock(&sdev->state_mutex);
return ret == 0 ? count : -EINVAL; return ret == 0 ? count : -EINVAL;
} }
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册