- 24 September 2016, 2 commits
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Christoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning every time this feature is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
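A minimal sketch of how a ULP opts in after this change, assuming the IB_PD_UNSAFE_GLOBAL_RKEY allocation flag and the pd->unsafe_global_rkey field this series adds to the RDMA core; the wrapper function itself is hypothetical:

```c
#include <rdma/ib_verbs.h>

/* Ask the core for a PD carrying the all-physical-memory rkey instead of
 * calling ib_get_dma_mr() directly; the core creates (and warns about) it. */
static int ulp_alloc_pd(struct ib_device *dev, struct ib_pd **pd, u32 *rkey)
{
	*pd = ib_alloc_pd(dev, IB_PD_UNSAFE_GLOBAL_RKEY);
	if (IS_ERR(*pd))
		return PTR_ERR(*pd);

	*rkey = (*pd)->unsafe_global_rkey;	/* global rkey minted by the core */
	return 0;
}
```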
-
- 23 September 2016, 1 commit
-
Committed by Sagi Grimberg
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed.

Fixes: e89ca58f ("nvme-rdma: add DELETING queue flag")
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
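A sketch of the kind of guard this fix relies on, assuming a per-queue flags word and the NVME_RDMA_Q_CONNECTED bit from the DELETING-flag series further down this log; the function and field layout are illustrative, not the exact driver code:

```c
/* Only the caller that actually clears CONNECTED tears the queue down, so
 * already-freed QPs/cm_ids are never touched a second time. */
static void stop_and_free_queue(struct nvme_rdma_queue *queue)
{
	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
		return;				/* already stopped and freed elsewhere */

	rdma_disconnect(queue->cm_id);		/* stop */
	ib_drain_qp(queue->cm_id->qp);		/* wait for in-flight work to finish */
	rdma_destroy_qp(queue->cm_id);		/* free */
	rdma_destroy_id(queue->cm_id);
}
```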
-
- 15 September 2016, 1 commit
-
Committed by Christoph Hellwig
All drivers use the default, so provide an inline version of it. If we ever need another queue mapping we can add an optional method back, although supporting it will also require major changes to the queue setup code. This provides better code generation, and better debuggability as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
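For context, the default mapping roughly amounts to the inline below; this is a sketch from memory of that era's blk-mq internals, and the exact field layout may differ between kernel versions:

```c
/* Default software-context -> hardware-queue mapping: look the CPU up in the
 * per-queue cpu-to-hw-queue table built at queue/tag-set initialization. */
static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
						     int cpu)
{
	return q->queue_hw_ctx[q->mq_map[cpu]];
}
```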
-
- 13 September 2016, 3 commits
-
Committed by Colin Ian King
If there is an error on req->mr, req->mr is set to NULL; however, the following statement sets req->mr->need_inval, causing a null pointer dereference. Fix this by bailing out to label 'out' to return immediately and hence skip over the offending null pointer dereference.

Fixes: f5b7b559 ("nvme-rdma: Get rid of duplicate variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
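The shape of the bug and the fix, as a hedged sketch; the surrounding function and the error value are stand-ins for the driver's real completion path:

```c
/* req->mr may have been dropped on the error path; the goto skips the
 * dereference that would otherwise oops. */
static void handle_mr_error(struct nvme_rdma_request *req, int ret)
{
	if (unlikely(ret < 0)) {
		req->mr = NULL;			/* the MR has been released */
		goto out;			/* fix: bail before req->mr is used below */
	}
	req->mr->need_inval = true;		/* NULL dereference without the goto */
out:
	return;
}
```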
-
Committed by Steve Wise
Change nvme-rdma to use the IB Client API to detect device removal. This has the wonderful benefit of being able to blow away all the ib/rdma_cm resources for the device being removed. No craziness about not destroying the cm_id handling the event. No deadlocks due to broken iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id around during controller recovery/reconnect to catch device removal events. We don't use the device_add aspect of the ib_client service since we only want to create resources for an IB device if we have a target utilizing that device.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
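A sketch of the IB client registration pattern this refers to; struct ib_client and ib_register_client()/ib_unregister_client() are the real core API, while the callback body is illustrative:

```c
#include <rdma/ib_verbs.h>

/* Called by the IB core when ib_device is being removed: delete every
 * controller that still holds resources on that device. */
static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
{
	/* walk the controller list and schedule deletion (driver-specific) */
}

static struct ib_client nvme_rdma_ib_client = {
	.name	= "nvme_rdma",
	.remove	= nvme_rdma_remove_one,
	/* no .add: resources are only created once a target on the device
	 * is actually connected to */
};

/* at module init:	ib_register_client(&nvme_rdma_ib_client);	*/
/* at module exit:	ib_unregister_client(&nvme_rdma_ib_client);	*/
```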
-
Committed by Sagi Grimberg
When we get a surprise disconnect from the target we queue a periodic reconnect (which is the sane thing to do...). We only move the queues out of CONNECTED when we retry to reconnect (after 10 seconds in the default case), but we stop the blk queues immediately so we are not bothered with traffic from now on. If delete() kicks off in this period the queues are still in CONNECTED state. Part of the delete sequence is trying to issue a ctrl shutdown if the admin queue is CONNECTED (which it is!). This request is issued but gets stuck in blk-mq waiting for the queues to start again, which might be what prevents us from making forward progress. The patch separates the queue flags into CONNECTED and DELETING. Now we move out of CONNECTED as soon as error recovery kicks in (before stopping the queues), and DELETING is set when we start the queue deletion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
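A sketch of the two flags and the transitions described above; the flag names follow the commit text, while the helpers and the per-queue flags word are assumptions about the driver layout:

```c
enum nvme_rdma_queue_flags {
	NVME_RDMA_Q_CONNECTED	= 0,
	NVME_RDMA_Q_DELETING	= 1,
};

/* error recovery: leave CONNECTED before stopping the blk-mq queues, so a
 * concurrent delete no longer tries to send a shutdown over a dead queue */
static void error_recovery_prep(struct nvme_rdma_queue *queue)
{
	clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags);
}

/* queue deletion: claim DELETING exactly once */
static bool start_queue_delete(struct nvme_rdma_queue *queue)
{
	return !test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags);
}
```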
-
- 04 September 2016, 2 commits
-
Committed by Steve Wise
After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Committed by Steve Wise
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 28 August 2016, 2 commits
-
Committed by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
We already have need_inval in ib_mr, so let's use that instead.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 18 August 2016, 2 commits
-
Committed by Jay Freyensee
Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as a 0-based value. Also per spec, the RDMA binding values shall be set to sqsize, which makes hsqsize a 0-based value as well. Thus, the sqsize during the NVMf connect() is now:

[root@fedora23-fabrics-host1 for-48]# dmesg
[ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31
[ 318.720884] nvme nvme0: creating 16 I/O queues.
[ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127

Finally, the current interpretation implies hrqsize is 1-based, so set it appropriately.

Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Committed by Jay Freyensee
Upon admin queue connect(), the rdma qp was being set based on NVMF_AQ_DEPTH. However, the fabrics layer was using the sqsize field value set for I/O queues for the admin queue, which threw the nvme layer and rdma layer off-whack:

[root@fedora23-fabrics-host1 nvmf]# dmesg
[ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue(): admin sqsize being sent is: 128
[ 3507.798858] nvme nvme0: creating 16 I/O queues.
[ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420

Thus, to have a different admin queue value, we use NVMF_AQ_DEPTH for connect() and the RDMA private data as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec (and in that RDMA private data we treat hrqsize as a 1-based value, per the current understanding of the fabrics spec).

Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
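A hedged sketch of the admin-queue private data being described, using the nvme_rdma_cm_req layout from include/linux/nvme-rdma.h; the exact values the driver ends up sending should be checked against the real code, so treat the assignments below as illustrative:

```c
#include <linux/nvme.h>		/* NVMF_AQ_DEPTH */
#include <linux/nvme-rdma.h>	/* struct nvme_rdma_cm_req, NVME_RDMA_CM_FMT_1_0 */

static void fill_admin_cm_req(struct nvme_rdma_cm_req *priv)
{
	memset(priv, 0, sizeof(*priv));			/* zero rsvd[] too */
	priv->recfmt  = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
	priv->qid     = cpu_to_le16(0);			/* admin queue */
	priv->hrqsize = cpu_to_le16(NVMF_AQ_DEPTH);	/* treated as 1-based */
	priv->hsqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);	/* 0-based per spec */
}
```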
-
- 16 August 2016, 1 commit
-
Committed by Colin Ian King
ret is not initialized so it contains garbage. Ensure garbage is not returned by initializing ret to 0.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 04 August 2016, 2 commits
-
Committed by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
-
Committed by Sagi Grimberg
When we reset or reconnect to a controller, we are cancelling the async event handler so we can safely re-establish resources, but we need to remember to start it again when we successfully reconnect.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 03 August 2016, 6 commits
-
Committed by Sagi Grimberg
Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it). Instead, check that the admin queue is connected. Note that this is safe because we can never see a competing thread trying to destroy the admin queue (reset or delete controller).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
If we wait until we free the controller (free_ctrl) we might lose our rdma device without any notification while we still have open resources (tags, MRs and DMA mappings). Instead, destroy the tags with their rdma resources once we delete the device and not when freeing it. Note that we don't do that in nvme_rdma_shutdown_ctrl because controller reset uses it as well and we want to give active I/O a chance to complete successfully.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
nvme_uninit_ctrl already does that for us. Note that we reordered nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl; this is perfectly fine because we actually want ctrl uninit (aen, scan cancellation and namespaces removal) to happen before we shutdown the rdma resources. Also, centralize the deletion work and the dead controller removal work code duplication into __nvme_rdma_shutdown_ctrl, which accepts a shutdown boolean.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
The device removal sequence may have crashed because the controller (and admin queue space) was freed before we destroyed the admin queue resources. Thus we want to destroy the admin queue and only then queue controller deletion and wait for it to complete. More specifically we:
1. own the controller deletion (make sure we are not competing with another deletion).
2. get rid of inflight reconnects, if any exist (which also destroy and create queues).
3. destroy the queue.
4. safely queue controller deletion (and wait for it to complete).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
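A sketch of that ordering on the device-removal path; nvme_change_ctrl_state(), cancel_delayed_work_sync(), queue_work() and flush_work() are real kernel helpers, while the work items, workqueue and teardown helper names are assumptions about the driver:

```c
static void device_unplug(struct nvme_rdma_queue *queue)
{
	struct nvme_rdma_ctrl *ctrl = queue->ctrl;

	/* 1. own the deletion: bail if another path is already deleting */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
		return;

	/* 2. kill any in-flight periodic reconnect (it creates/destroys queues) */
	cancel_delayed_work_sync(&ctrl->reconnect_work);

	/* 3. destroy the queue sitting on the dying device
	 *    (see the earlier stop_and_free_queue() guard sketch) */
	stop_and_free_queue(queue);

	/* 4. queue controller deletion and wait for it to complete */
	queue_work(nvme_rdma_wq, &ctrl->delete_work);
	flush_work(&ctrl->delete_work);
}
```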
-
Committed by Sagi Grimberg
On an ordered target shutdown, the target can send an AEN on a namespace removal, which triggers the host to queue an ns-list query. The shutdown also triggers error recovery, which will attempt a periodic reconnect. We can hit a race where the ns rescanning fails (error recovery kicked in and we're not connected), causing removal of all the namespaces, and when we reconnect we won't see any namespaces for this controller. So, queue a namespace rescan after we successfully reconnect to the target.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
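A short fragment sketching the tail of a successful reconnect, assuming the core's nvme_change_ctrl_state()/nvme_queue_scan() helpers of that era; the surrounding reconnect work function is elided:

```c
	bool changed;

	/* reconnect succeeded: mark the controller LIVE again ... */
	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
	WARN_ON_ONCE(!changed);

	/* ... and re-discover namespaces that a rescan racing with error
	 * recovery may have removed */
	nvme_queue_scan(&ctrl->ctrl);
```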
-
Committed by Roland Dreier
Zero out the full nvme_rdma_cm_req structure before sending it. Otherwise we end up leaking kernel memory in the reserved field, which might break forward compatibility in the future.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
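Either of the usual C idioms achieves the zeroing described here; which one the actual patch uses is not shown in this log, so this is just the general pattern:

```c
	struct nvme_rdma_cm_req priv = { };	/* zero-initialize, rsvd[] included */

	/* or, equivalently: */
	memset(&priv, 0, sizeof(priv));
```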
-
- 12 July 2016, 2 commits
-
Committed by Sagi Grimberg
Always use the maximum qp retry count, as the error recovery timeout is dictated by the nvme keep-alive.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
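A sketch of connect parameters with the retry count pinned at the maximum; struct rdma_conn_param and rdma_connect() are the real RDMA/CM API, while the wrapper and private-data arguments are illustrative:

```c
#include <rdma/rdma_cm.h>

static int ulp_connect(struct rdma_cm_id *cm_id, void *priv, u8 priv_len)
{
	struct rdma_conn_param param = { };

	param.retry_count      = 7;	/* 3-bit field, so 7 is the maximum */
	param.rnr_retry_count  = 7;	/* likewise for receiver-not-ready retries */
	param.private_data     = priv;
	param.private_data_len = priv_len;

	/* a dead connection is detected by the NVMe keep-alive rather than by
	 * giving up early on QP retries */
	return rdma_connect(cm_id, &param);
}
```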
-
Committed by Wei Yongjun
PTR_ERR should be applied before its argument is reassigned; otherwise the return value will be set to 0, not the error code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
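The bug class in miniature, with hypothetical names (create_queue, q, ret) standing in for whatever the patched function actually uses:

```c
	q = create_queue();		/* returns ERR_PTR(-Exx) on failure */

	/* broken: the pointer is reassigned before PTR_ERR() reads it,
	 * so PTR_ERR(NULL) == 0 and the error is silently swallowed */
	if (IS_ERR(q)) {
		q = NULL;
		ret = PTR_ERR(q);
		goto out;
	}

	/* fixed: capture the error code first, then clear the pointer */
	if (IS_ERR(q)) {
		ret = PTR_ERR(q);
		q = NULL;
		goto out;
	}
```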
-
- 08 July 2016, 1 commit
-
Committed by Christoph Hellwig
This patch implements the RDMA host (initiator in SCSI speak) driver. It can be used to connect to remote NVMe over Fabrics controllers over InfiniBand, RoCE or iWARP, and uses the existing NVMe core driver as well as the new fabrics library. To connect to all NVMe over Fabrics controllers reachable on a given target port using RDMA/CM, use the following command:

nvme connect-all -t rdma -a $IPADDR

This requires the latest version of nvme-cli with Fabrics support.

Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-