提交 d5bf4b7f 编写于 作者: S Sagi Grimberg 提交者: Christoph Hellwig

nvme-rdma: fix concurrent reset and reconnect

Now ctrl state machine allows to transition from RESETTING to
RECONNECTING.  In nvme-rdma when we receive a rdma cm DISONNECTED event,
we trigger nvme_rdma_error_recovery. This happens also when we execute a
controller reset, issue a cm diconnect request and receive a cm
disconnect reply, as a result, the reset work and the error recovery work
can run concurrently.

Until now the state machine prevented from the error recovery work from
running as a result of a controller reset (RESETTING -> RECONNECTING was
not allowed).

To fix this, we adopt the FC state machine approach, we always transition
from LIVE to RESETTING and only then to RECONNECTING.  We do this both
for the error recovery work and the controller reset work:

 1. transition to RESETTING
 2. teardown the controller association
 3. transition to RECONNECTING

This will restore the protection against reset work and error recovery work
from concurrently running together.

Fixes: 3cec7f9d ("nvme: allow controller RESETTING to RECONNECTING transition")
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
上级 cee160fd
...@@ -974,12 +974,18 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work) ...@@ -974,12 +974,18 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q); blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_start_queues(&ctrl->ctrl); nvme_start_queues(&ctrl->ctrl);
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
/* state change failure should never happen */
WARN_ON_ONCE(1);
return;
}
nvme_rdma_reconnect_or_remove(ctrl); nvme_rdma_reconnect_or_remove(ctrl);
} }
static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl) static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
{ {
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING))
return; return;
queue_work(nvme_wq, &ctrl->err_work); queue_work(nvme_wq, &ctrl->err_work);
...@@ -1753,6 +1759,12 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work) ...@@ -1753,6 +1759,12 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
nvme_stop_ctrl(&ctrl->ctrl); nvme_stop_ctrl(&ctrl->ctrl);
nvme_rdma_shutdown_ctrl(ctrl, false); nvme_rdma_shutdown_ctrl(ctrl, false);
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING)) {
/* state change failure should never happen */
WARN_ON_ONCE(1);
return;
}
ret = nvme_rdma_configure_admin_queue(ctrl, false); ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret) if (ret)
goto out_fail; goto out_fail;
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册