- 22 1月, 2014 1 次提交
-
-
由 Bart Van Assche 提交于
The rport timers must be stopped before the SRP initiator destroys the resources associated with the SCSI host. This is necessary because otherwise the callback functions invoked from the SRP transport layer could trigger a use-after-free. Stopping the rport timers before invoking scsi_remove_host() can trigger long delays in the SCSI error handler if a transport layer failure occurs while scsi_remove_host() is in progress. Hence move the code for stopping the rport timers from srp_rport_release() into a new function and invoke that function after scsi_remove_host() has finished. This patch fixes the following sporadic kernel crash: kernel BUG at include/asm-generic/dma-mapping-common.h:64! invalid opcode: 0000 [#1] SMP RIP: 0010:[<ffffffffa03b20b1>] [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp] Call Trace: [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp] [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp] [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp] [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp] [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp] [<ffffffff8108b370>] worker_thread+0x170/0x2a0 [<ffffffff81090876>] kthread+0x96/0xa0 [<ffffffff8100c0ca>] child_rip+0xa/0x20 Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 09 11月, 2013 13 次提交
-
-
由 Bart Van Assche 提交于
The IB spec does not guarantee that the opcode is available in error completions. Hence do not rely on it. See also commit 948d1e88 ("IB/srp: Introduce srp_handle_qp_err()"). Signed-off-by: NBart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
If SCSI commands are submitted with a SCSI request timeout that is lower than the the IB RC timeout, it can happen that the SCSI error handler has already started device recovery before transport layer error handling starts. So it can happen that the SCSI error handler tries to abort a SCSI command after it has been reset by srp_rport_reconnect(). Tell the SCSI error handler that such commands have finished and that it is not necessary to continue its recovery strategy for commands that have been reset by srp_rport_reconnect(). Signed-off-by: NBart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Vu Pham 提交于
Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because that last put frees the memory that holds the srp_target_port structure. This patch prevents the following kernel oops: RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570 Call Trace: [<ffffffff810b11e4>] lock_acquire+0xa4/0x120 [<ffffffff81531206>] _spin_lock+0x36/0x70 [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp] [<ffffffff8109125c>] worker_thread+0x21c/0x3d0 [<ffffffff81096e86>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 Signed-off-by: NVu Pham <vuhuong@mellanox.com> [ bvanassche - Modified path description and CC'ed stable. ] Signed-off-by: NBart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Jack Wang 提交于
Currently, it's not possible to change queue depth for a device behind SRP host. Sometimes, we need to adjust queue_depth for performance reason (eg storage busy, we need lower queue_depth to avoid running into SCSI error handler), so this patch add support for SRP driver. Signed-off-by: NJack Wang <jinpu.wang@profitbricks.com> Tested-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Tested-by: NJack Wang <xjtuwjp@gmail.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
This patch does not change any functionality. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: Vu Pham <vu@mellanox.com> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
On an initiator system with multiple IB ports it is not yet possible to figure out what the originating port of an SRP connection is. Hence make the source GID available in sysfs. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
After a transport layer occurred, periodically try to reconnect to the target until the dev_loss timer expires. Protect the callback functions that can be invoked from inside the SCSI EH against concurrent invocation with srp_reconnect_rport() via the rport mutex. Change the default dev_loss_tmo from 60s into 600s to give the reconnect mechanism a chance to kick in. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Add support for periodically reconnecting to an SRP target until the dev_loss timer expires. After the tenth reconnection attempt, gradually slow down subsequent reconnect attempts. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these parameters. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback could get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback can get invoked after the rport data structure has been freed. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Vu Pham 提交于
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count allows to reduce the path failover time. Signed-off-by: NVu Pham <vu@mellanox.com> [ bvanassche: Rewrote patch description / changed default retry count ] Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 12 7月, 2013 1 次提交
-
-
由 Bart Van Assche 提交于
If the transport layer is offline it is more appropriate to let srp_abort() return FAST_IO_FAIL instead of SUCCESS. Reported-by: NSebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 02 7月, 2013 4 次提交
-
-
由 Vu Pham 提交于
Signed-off-by: NVu Pham <vu@mellanox.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Several InfiniBand HCAs allow configuring the completion vector per CQ. This allows spreading the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows reducing latency on an initiator connected to multiple SRP targets but also allows improving throughput. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
An SRP target is required to maintain a single connection between initiator and target. This means that if the 'add_target' attribute is used to create a second connection to a target, the first connection will be logged out and that the SCSI error handler will kick in. The SCSI error handler will cause the SRP initiator to reconnect, which will cause I/O over the second connection to fail. Avoid such ping-pong behavior by disabling relogins. If reconnecting manually is necessary, that is possible by deleting and recreating an rport via sysfs. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NSebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
If reconnecting failed we know that no command completion will be received anymore. Hence let the SCSI error handler fail such commands immediately. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 28 6月, 2013 3 次提交
-
-
由 Bart Van Assche 提交于
The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NSebastian Riemer <sebastian.riemer@profitbricks.com> Reviewed-by: NChristoph Hellwig <hch@infradead.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
The SCSI error handler assumes that the transport layer is operational if an eh_abort_handler() returns SUCCESS. Hence srp_abort() only should return SUCCESS if sending the ABORT TASK task management function succeeded. This patch avoids the SCSI error handler skipping the srp_reset_host() call after a transport layer error. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Dotan Barak 提交于
If the add_one callback fails during driver load no resources are allocated so there isn't a need to release any resources. Trying to clean the resource may lead to the following kernel panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] RIP: 0010:[<ffffffffa0132331>] [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0) Call Trace: [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core] [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp] [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: NEli Cohen <eli@mellanox.co.il> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NSebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 26 2月, 2013 4 次提交
-
-
由 Bart Van Assche 提交于
If an SRP target is no longer reachable and srp_reset_host() fails to reconnect then ib_srp will invoke scsi_remove_host(). That function will invoke __scsi_remove_device() for each LUN. And that last function will change the device state from SDEV_TRANSPORT_OFFLINE into SDEV_CANCEL. Certain user space software, e.g. older versions of multipathd, continue queueing I/O to SCSI devices that are in the SDEV_CANCEL state. If these I/O requests are submitted as SG_IO that means that the REQ_PREEMPT flag will be set and hence that these requests will be passed to srp_queuecommand(). These requests will time out. If new requests are queued fast enough from user space these active requests will prevent __scsi_remove_device() to finish. Avoid this by failing I/O requests in the SDEV_CANCEL state if the transport is offline. Introduce a new variable to keep track of the transport state instead of failing requests if (!target->connected || target->qp_in_error), so that the SCSI error handler has a chance to retry commands after a transport layer failure occurred. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
If a SCSI command times out it is passed to the SCSI error handler. The SCSI error handler will try to abort the commands that timed out. If aborting fails, a device reset will be attempted. If the device reset also fails a host reset will be attempted. If the host reset also fails the whole procedure will be repeated. srp_abort() and srp_reset_device() fail for a QP in the error state. srp_reset_host() fails after host removal has started. Hence if the SCSI error handler gets invoked after host removal has started and with the QP in the error state an endless loop will be triggered. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 01 12月, 2012 12 次提交
-
-
由 Bart Van Assche 提交于
Make it possible to disconnect the IB RC connection used by the SRP protocol to communicate with a target. Have the SRP transport layer create a sysfs "delete" attribute for initiator drivers that support this functionality. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Vu Pham 提交于
Now that SRP recreates the CM ID, QP, and CQ for each connection, there is no need to wait for the timewait state to complete. Signed-off-by: NVu Pham <vu@mellanox.com> Signed-off-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Ishai Rabinovitz 提交于
HW QP FATAL errors persist over a reset operation, but we can recover from that by recreating the QP and associated CQs for each connection. Creating a new QP/CQ also completely forecloses any possibility of getting stale completions or packets on the new connection. Signed-off-by: NIshai Rabinovitz <ishai@mellanox.co.il> Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il> [ updated to current code from OFED, cleaned up commit message ] Signed-off-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Only queue removal work after having changed the target state into SRP_TARGET_REMOVED and not if that state was already equal to SRP_TARGET_REMOVED. That allows us to remove the state SRP_TARGET_DEAD. Add a call to srp_disconnect_target() in srp_remove_target() -- due to previous changes it is now safe to invoke that function even if the IB connection has already been disconnected. This change allows us to replace the target removal code in srp_remove_one() by an (indirect) call to srp_remove_target(). Rename srp_target_port.work into srp_target_port.remove_work to reflect its usage. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Keep track of the connection state. Only report QP errors while connected. Only invoke ib_send_cm_dreq() when connected so that invoking srp_disconnect_target() after having received a DREQ does not cause an error message to be printed. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
If the RDMA RC connection is closed, tell the SCSI mid-layer to terminate all pending commands instead of only the first. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Introduce the function srp_handle_qp_err(), change the type of qp_in_error from int into bool and move the initialization of that variable from srp_reconnect_target() to srp_connect_target(). Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Since scsi_remove_host() has been modified so that SCSI error handling functions will no longer be invoked after scsi_remove_host() returns, the test at the start of srp_send_tsk_mgmt() is now superfluous. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()). Make sure that these commands have a chance to reach the SCSI device. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Block the SCSI host while reconnecting instead of representing the reconnection activity as a distinct SRP target state. This allows us to eliminate the target state SRP_TARGET_CONNECTING. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
Increase the block layer timeout for disks so that it is above the InfiniBand transport layer timeout. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 01 10月, 2012 2 次提交
-
-
由 Bart Van Assche 提交于
We need to call scsi_done() for commands after we abort them. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
由 Bart Van Assche 提交于
srp_free_req() uses the scsi_cmnd structure contents to unmap buffers, so we must invoke srp_free_req() before we release ownership of that structure. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Acked-by: NDavid Dillow <dillowda@ornl.gov> Cc: <stable@vger.kernel.org> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-