- 26 11月, 2017 5 次提交
-
-
由 Israel Rukshin 提交于
Currently, blk_mq_tagset_iter() iterate over initial hctx tags only. If an I/O scheduler is used, it doesn't iterate the hctx scheduler tags and the static request aren't been updated. For example, while using NVMe over Fabrics RDMA host, this cause us not to reinit the scheduler requests and thus not re-register all the memory regions during the tagset re-initialization in the reconnect flow. This may lead to a memory registration error: "MEMREG for CQE 0xffff88044c14dce8 failed with status memory management operation error (6)" With this commit we don't need to reinit the requests, and thus fix this failure. Signed-off-by: NIsrael Rukshin <israelr@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If we got a remote invalidation on a bogus rkey, this is a protocol error. Fail the connection in this case. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We must not complete a request before the host memory region is invalidated. Luckily we have send with invalidate protocol support so we usually don't need to execute it, but in case the target did not invalidate a memory region for us, we must wait for the invalidation to complete before unmapping host memory and completing the I/O. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
In order to guarantee that the HCA will never get an access violation (either from invalidated rkey or from iommu) when retrying a send operation we must complete a request only when both send completion and the nvme cqe has arrived. We need to set the send/recv completions flags atomically because we might have more than a single context accessing the request concurrently (one is cq irq-poll context and the other is user-polling used in IOCB_HIPRI). Only then we are safe to invalidate the rkey (if needed), unmap the host buffers, and complete the IO. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
The entire completions suppress mechanism is currently broken because the HCA might retry a send operation (due to dropped ack) after the nvme transaction has completed. In order to handle this, we signal all send completions and introduce a separate done handler for async events as they will be handled differently (as they don't include in-capsule data by definition). Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 20 11月, 2017 1 次提交
-
-
由 Sagi Grimberg 提交于
When the fabrics queue is not alive and fully functional, no commands should be allowed to pass but connect (which moves the queue to a fully functional state). Any other command should be failed, with either temporary status BLK_STS_RESOUCE or permanent status BLK_STS_IOERR. This is shared across all fabrics, hence move the check to fabrics library. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 11 11月, 2017 3 次提交
-
-
由 Keith Busch 提交于
The driver can handle tracking only one AEN request, so this patch removes handling for multiple ones. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
All the transports were unnecessarilly duplicating the AEN request accounting. This patch defines everything in one place. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NGuan Junxiong <guanjunxiong@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Max Gurtovoy 提交于
QP object is created using rdma_cm api, therefore the destruction should use the same api for symmetry. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 11月, 2017 4 次提交
-
-
由 Sagi Grimberg 提交于
instead of just queueing delete work Reported-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
No change in behavior except that the FC code cancels two work items a little later now. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
It is only used in two places, and some of the work done by it will be taken into common code soon. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Move the ->delete_work and the associated helpers to common code instead of duplicating them in every driver. This also adds the missing reference get/put for the loop driver. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 27 10月, 2017 2 次提交
-
-
由 James Smart 提交于
Adds support for the duplicate_connect option. When set to true, checks whether there's an existing controller via the same target address (traddr), target port (trsvcid), and if specified, host address (host_traddr). Fails the connection request if there is an existing controller. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Instead of allocating a separate struct device for the character device handle embedd it into struct nvme_ctrl and use it for the main controller refcounting. This removes double refcounting and gets us an automatic reference for the character device operations. We keep ctrl->device as a pointer for now to avoid chaning printks all over, but in the future we could look into message printing helpers that take a controller structure similar to what other subsystems do. Note the delete_ctrl operation always already has a reference (either through sysfs due this change, or because every open file on the /dev/nvme-fabrics node has a refernece) when it is entered now, so we don't need to do the unless_zero variant there. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
- 23 10月, 2017 3 次提交
-
-
由 Nitzan Carmi 提交于
Signed-off-by: NNitzan Carmi <nitzanc@mellanox.com> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
nvme_rdma_queue_is_ready() fails requests in case a queue is not LIVE. If the controller is in RECONNECTING state, we might be in this state for a long time (until we successfully reconnect) and we are better off with failing the request fast. Otherwise, we fail with BLK_STS_RESOURCE to have the block layer try again soon. In case we are removing the controller when the admin queue is not LIVE, we will terminate the request with BLK_STS_RESOURCE but it happens before we call blk_mq_start_request() so the request timeout never expires, and the queue will never get back to LIVE (because we are removing the controller). This causes the removal operation to block infinitly [1]. Thus, if we are removing (state DELETING), and the queue is not LIVE, we need to fail the request permanently as there is no chance for it to ever complete successfully. [1] -- sysrq: SysRq : Show Blocked State task PC stack pid father kworker/u66:2 D 0 440 2 0x80000000 Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma] Call Trace: __schedule+0x3e9/0xb00 schedule+0x40/0x90 schedule_timeout+0x221/0x580 io_schedule_timeout+0x1e/0x50 wait_for_completion_io_timeout+0x118/0x180 blk_execute_rq+0x86/0xc0 __nvme_submit_sync_cmd+0x89/0xf0 nvmf_reg_write32+0x4b/0x90 [nvme_fabrics] nvme_shutdown_ctrl+0x41/0xe0 nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma] nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma] nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma] process_one_work+0x1fd/0x630 worker_thread+0x1db/0x3b0 kthread+0x11e/0x150 ret_from_fork+0x27/0x40 01 D 0 2868 2862 0x00000000 Call Trace: __schedule+0x3e9/0xb00 schedule+0x40/0x90 schedule_timeout+0x260/0x580 wait_for_completion+0x108/0x170 flush_work+0x1e0/0x270 nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma] nvme_sysfs_delete+0x2a/0x40 dev_attr_store+0x18/0x30 sysfs_kf_write+0x45/0x60 kernfs_fop_write+0x124/0x1c0 __vfs_write+0x28/0x150 vfs_write+0xc7/0x1b0 SyS_write+0x49/0xa0 entry_SYSCALL_64_fastpath+0x18/0xad -- Reported-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 19 10月, 2017 12 次提交
-
-
由 Sagi Grimberg 提交于
We should make sure to escelate allocation failures to prevent a use-after-free in nvmf_create_ctrl. Fixes: b28a308e ("nvme-rdma: move tagset allocation to a dedicated routine") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
The fact that we free the async event buffer in nvme_rdma_destroy_admin_queue can cause us to free it more than once because this happens in every reconnect attempt since commit 31fdf184. we rely on the queue state flags DELETING to avoid this for other resources. A more complete fix is to not destroy the admin/io queues unconditionally on every reconnect attempt, but its a bit more extensive and will go in the next release. Fixes: 31fdf184 ("nvme-rdma: reuse configure/destroy_admin_queue") Reported-by: NYi Zhang <yi.zhang@redhat.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Israel Rukshin 提交于
Since commit b86dd815 "block: get rid of blk-mq default scheduler choice Kconfig entries", when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler. This flag is useful for admin queues that aren't used for normal IO. Signed-off-by: NIsrael Rukshin <israelr@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If the controller is deleting (in case the user decided to delete it), we have no point to continue reset sequence. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Instead of marking we are deleting, mark we are allocated and check that instead. This makes the logic symmetrical to connected mark check. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
No chance for the local invalidate to succeed if the queue-pair is in error state. Most likely the target will do a remote invalidation of our mr so not a big loss on the test_bit. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Relying on the queue state while tearing down on every reconnect attempt is not a good design. We should do it once in err_work and simply try to establish the queues for each reconnect attempt. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Warn if req->mr is NULL as it should never happen. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
No need for the extra line for trivial assignments. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Not necessarily address resolution failed. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Instead of flagging admin/io. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Move blk_mq_reinit_tagset from blk-mq to nvme core as the only user of it. Current transports that use it (rdma, fc) simply implement .reinit_request op. This patch does not change any functionality. Reviewed-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 26 9月, 2017 2 次提交
-
-
由 Sagi Grimberg 提交于
By calling nvme_stop_ctrl on a already failed controller will wait for the scan work to complete (only by identify timeout expiration which is 60 seconds). This is unnecessary when we already know that the controller has failed. Reported-by: NYi Zhang <yizhan@redhat.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Sagi Grimberg 提交于
If we failed to transition to state LIVE after a successful reconnect, then controller deletion already started. In this case there is no point moving forward with reconnect. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 8月, 2017 1 次提交
-
-
由 Max Gurtovoy 提交于
Due to various page sizes in the system (IOMMU/device/kernel), we set the fabrics controller page size to 4k and block layer boundaries accordinglly. In architectures that uses different kernel page size we'll have a mismatch to the MR page size that may cause a mapping error. Update the MR page size to correspond to the core ctrl settings. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 29 8月, 2017 7 次提交
-
-
由 Max Gurtovoy 提交于
This patch slightly improves performance (mainly for small block sizes). Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
To make the nvme_rdma_configure_admin_queue generic in preparation of moving it to common code. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
No need to queue an extra work to indirect controller removal, just call the ctrl remove routine. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
This should pair with nvme_rdma_stop_queue. While this is not a complete inverse, it still pairs up pretty well because in fabrics we don't have a disconnect capsule (yet) but we simply teardown the transport association. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Give it a name symmetric to nvme_rdma_free_queue. Also pass in the ctrl sqsize+1 and not the opts queue_size. And suppress a superflous failure message. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
If we move the queues from LIVE state, we might as well stop them (drain for rdma). Do it after we stop the request queues to prevent a stray request sneaking in .queue_rq after we stop the queue. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Make a symmetrical handling with admin queue. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-