- 25 June 2020, 1 commit
-
-
By Max Gurtovoy
The completion vector index that is given during CQ creation can't exceed the number of vectors supported by the underlying RDMA device. This violation can currently occur, for example, when one tries to connect with N regular read/write queues and M poll queues and the sum N + M > num_supported_vectors. This will lead to a failure to establish a connection to the remote target. Instead, in that case, share a completion vector between queues. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
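A minimal sketch of the sharing policy described above (illustrative only, not the literal upstream patch; the helper name is invented): wrap the queue index around the number of completion vectors the device actually exposes, so N + M queues can never ask for a vector index the device does not have.
--
/* Illustrative helper: map a queue index onto an available completion vector. */
static int example_comp_vector(int queue_idx, int num_comp_vectors)
{
	/* queue 0 is the admin queue; I/O queues start at index 1 */
	return (queue_idx == 0 ? 0 : queue_idx - 1) % num_comp_vectors;
}
--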
-
- 27 May 2020, 2 commits
-
-
By Max Gurtovoy
For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end protection information passthrough and validation for the NVMe over RDMA transport. Metadata offload support was implemented over the new RDMA signature verbs API and is enabled for capable controllers. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
By Israel Rukshin
Remove the first_sgl pointer from struct nvme_rdma_request and use pointer arithmetic instead. The inline scatterlist, if it exists, will be located right after the nvme_rdma_request. This patch is needed as a preparation for adding PI support. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
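A rough sketch of the pointer arithmetic in question, assuming the kernel's struct scatterlist and struct nvme_rdma_request definitions (the helper name is illustrative, not necessarily what the patch adds): since the inline scatterlist is laid out immediately after the request structure, its address is simply one element past the request pointer.
--
static inline struct scatterlist *example_inline_sgl(struct nvme_rdma_request *req)
{
	/* The inline SGL, when present, directly follows the request in memory. */
	return (struct scatterlist *)(req + 1);
}
--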
-
- 31 March 2020, 1 commit
-
-
By Israel Rukshin
Use a semicolon at the end of an assignment expression. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 26 March 2020, 3 commits
-
-
By Israel Rukshin
The transition to the LIVE state should not fail in the case of a new controller. Moving to the DELETING state before nvme_tcp_create_ctrl() allocates all the resources may lead to a NULL dereference in the teardown flow (e.g., IO tagset, admin_q, connect_q). Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin
Put the ctrl reference count in nvme_uninit_ctrl, as opposed to nvme_init_ctrl which takes it. This decreases the reference count at the core layer instead of decreasing it in each transport separately. Also move the call to nvme_uninit_ctrl in the PCI driver to after the calls to nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference count after the dev is used. This is safe because those functions use nvme_dev, which is freed only later in nvme_pci_free_ctrl. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Israel Rukshin
If nvme_sysfs_delete() is called by the user before the ctrl reference count is taken, the ctrl may be freed during creation, causing a use-after-free. Take the reference as soon as the controller is externally visible, which is done by cdev_device_add() in nvme_init_ctrl(). Also take the reference count at the core layer instead of taking it in each transport separately. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 17 March 2020, 1 commit
-
-
By Bart Van Assche
Move the get_unaligned_be24(), get_unaligned_le24() and put_unaligned_le24() definitions from various drivers into include/linux/unaligned/generic.h. Add a put_unaligned_be24() implementation. Link: https://lore.kernel.org/r/20200313203102.16613-4-bvanassche@acm.org Cc: Keith Busch <kbusch@kernel.org> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jens Axboe <axboe@fb.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> # For drivers/usb Reviewed-by: Felipe Balbi <balbi@kernel.org> # For drivers/usb/gadget Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
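For readers unfamiliar with these helpers, here is a small self-contained userspace illustration of the 24-bit big-endian accessors' semantics (the _demo names are invented; the kernel versions live in the unaligned headers):
--
#include <stdint.h>
#include <stdio.h>

static uint32_t get_unaligned_be24_demo(const uint8_t *p)
{
	/* Big-endian: most significant byte first. */
	return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

static void put_unaligned_be24_demo(uint32_t val, uint8_t *p)
{
	p[0] = val >> 16;
	p[1] = val >> 8;
	p[2] = val;
}

int main(void)
{
	uint8_t buf[3];

	put_unaligned_be24_demo(0x123456, buf);
	printf("0x%06x\n", get_unaligned_be24_demo(buf)); /* prints 0x123456 */
	return 0;
}
--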
-
- 11 March 2020, 1 commit
-
-
By Prabhath Sajeepa
A timeout of the identify command, which is invoked as part of admin queue creation, can result in freeing the async event data both in the nvme_rdma_timeout handler and in the error handling path of nvme_rdma_configure_admin_queue, thus causing a NULL pointer dereference. Call Trace: ? nvme_rdma_setup_ctrl+0x223/0x800 [nvme_rdma] nvme_rdma_create_ctrl+0x2ba/0x3f7 [nvme_rdma] nvmf_dev_write+0xa54/0xcc6 [nvme_fabrics] __vfs_write+0x1b/0x40 vfs_write+0xb2/0x1b0 ksys_write+0x61/0xd0 __x64_sys_write+0x1a/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reviewed-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Prabhath Sajeepa <psajeepa@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 15 February 2020, 1 commit
-
-
By Nigel Kirkland
Delayed keep alive work is queued on the system workqueue and may be cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq. check_flush_dependency detects mismatched attributes between the workqueue context used to cancel the keep alive work and the system workqueue. Specifically, the system workqueue does not have the WQ_MEM_RECLAIM flag, whereas the contexts used to cancel keep alive work do have the WQ_MEM_RECLAIM flag. Example warning: workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc] is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core] To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq. However, this creates a secondary concern where work and a request to cancel that work may be in the same work queue - namely err_work in the rdma and tcp transports, which will want to flush/cancel the keep alive work that will now be on nvme_wq. After reviewing the transports, it looks like err_work can be moved to nvme_reset_wq. In fact that aligns them better with the transition into RESETTING and performing related reset work in nvme_reset_wq. Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq. Signed-off-by: Nigel Kirkland <nigel.kirkland@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 November 2019, 1 commit
-
-
By Israel Rukshin
nvme_rdma_alloc_tagset() preallocates a big buffer for the IO SGL based on SG_CHUNK_SIZE. Modern DMA engines are often capable of dealing with very big segments, so SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB SGL allocation per command. If a controller has lots of deep queues, preallocation for the sg list can consume substantial amounts of memory. For nvme-rdma, nr_hw_queues can be 128 and each queue's depth 128. This means the resulting preallocation for the data SGL is 128*128*4K = 64MB per controller. Switch to runtime allocation for SGLs longer than 2 entries. This is the approach used by NVMe PCI, so it should be reasonable for NVMeOF as well. Runtime SGL allocation has always been the case for the legacy I/O path, so this is nothing new. The preallocated small SGL depends on SG_CHAIN, so if the ARCH doesn't support SG_CHAIN, use only runtime allocation for the SGL. We did not notice any performance degradation, since for small IOs we'll use the inline SG and for bigger IOs the allocation of a bigger SGL from the slab is fast enough. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
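The 64MB figure follows directly from the numbers in the message; a tiny standalone calculation for reference:
--
#include <stdio.h>

int main(void)
{
	const unsigned long nr_hw_queues = 128;
	const unsigned long queue_depth = 128;
	const unsigned long sgl_bytes_per_cmd = 4096;	/* static SGL preallocation per command */

	/* 128 * 128 * 4K = 64 MiB preallocated per controller with the old scheme. */
	printf("%lu MiB\n", (nr_hw_queues * queue_depth * sgl_bytes_per_cmd) >> 20);
	return 0;
}
--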
-
- 05 November 2019, 3 commits
-
-
By Max Gurtovoy
If there are controllers that are not associated with any RDMA device (e.g. during an unsuccessful reconnection) and the user unloads the module, these controllers will not be freed and will access already freed memory. The same logic appears in other fabric drivers as well. Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
By Max Gurtovoy
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd (symmetrical functions). Move the call to nvme_cleanup_cmd to the common core layer and call it during nvme_complete_rq for the good flow. For the error flow, each transport will call nvme_cleanup_cmd independently. Also take care of a special case of path failure, where we call nvme_complete_rq without doing nvme_setup_cmd. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Israel Rukshin
This new helper function improves code readability and reduces code duplication. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 October 2019, 1 commit
-
-
By Keith Busch
A controller in the resetting state has not yet completed its recovery actions. The PCI and FC transports were already handling this, so update the remaining transports to not attempt additional recovery in this state. Instead, just restart the request timer. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
-
- 28 September 2019, 1 commit
-
-
By Sagi Grimberg
If the connect times out, we may have already destroyed the queue in the timeout handler, so test whether the queue is still allocated in the connect error handler. Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 26 September 2019, 1 commit
-
-
By Max Gurtovoy
By default, the NVMe/RDMA driver should support a max io_size of 1MiB (or up to the maximum size supported by the HCA). Currently, one will see that /sys/class/block/<bdev>/queue/max_hw_sectors_kb is 1020 instead of 1024. A non-power-of-2 value can cause performance degradation due to unnecessary splitting of IO requests and unoptimized allocation units. The number of pages per MR has been fixed here, so there is no longer any need to reduce max_sectors by 1. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
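The 1020 vs. 1024 difference is just the pages-per-MR arithmetic; a small standalone check (assuming 4K pages and 512-byte sectors):
--
#include <stdio.h>

int main(void)
{
	const unsigned int page_size = 4096, sector_size = 512;

	/* max_hw_sectors_kb = pages * page_size / sector_size / 2 (512B sectors, 2 per KiB) */
	printf("255 pages per MR -> %u KiB\n", 255 * page_size / sector_size / 2); /* 1020 */
	printf("256 pages per MR -> %u KiB\n", 256 * page_size / sector_size / 2); /* 1024 */
	return 0;
}
--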
-
- 30 August 2019, 5 commits
-
-
By Israel Rukshin
Remove code duplication. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
By Sagi Grimberg
We have a fundamental issue in that fabric commands use the admin_q. The reason is that admin-connect, register reads and writes, and admin commands cannot be guaranteed ordering while we are running controller resets. For example, when we reset a controller we perform: 1. disable the controller 2. teardown the admin queue 3. re-establish the admin queue 4. enable the controller In order to perform (3), we need to unquiesce the admin queue; however, we may have some admin commands that are already pending on the quiesced admin_q and will immediately execute when we unquiesce it, before we execute (4). The host must not send admin commands to the controller before enabling the controller. To fix this, have the fabric commands (admin connect and property get/set, but not I/O queue connect) use a separate fabrics_q, and make sure to quiesce the admin_q before we disable the controller and unquiesce it only after we enable the controller. This fixes the error prints from nvmet in a controller reset storm test: kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0 which indicate that the host is sending an admin command when the controller is not enabled. Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
By Israel Rukshin
For RDMA transports, TOS is an extension of IB QoS that gives clients the ability to segregate traffic flows for different types of data. RDMA CM abstracts it for ULPs using rdma_set_service_type(). Internally, each traffic flow is represented by a connection with all of its independent resources, like those of a normal connection, and is differentiated by service type. In other words, there can be multiple QP connections between an IP pair and each supports a unique service type. One use of TOS is bandwidth management, which allows setting bandwidth limits for QoS classes, e.g. 80% bandwidth to controllers at QoS class A and 20% to controllers at QoS class B. Note: in addition to the TOS configuration, QoS must be configured on the relevant HCA on the target (send RDMA commands) and initiator for it to take effect on the traffic. Usage example: nvme connect --tos=0 --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
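A hedged sketch of how a ULP hands the TOS to RDMA CM. rdma_set_service_type() is the existing helper from <rdma/rdma_cm.h>; the wrapper below is purely illustrative.
--
#include <rdma/rdma_cm.h>

static void example_set_tos(struct rdma_cm_id *cm_id, u8 tos)
{
	/* Tag every connection established on this cm_id with the requested service type. */
	rdma_set_service_type(cm_id, tos);
}
--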
-
By Sagi Grimberg
All callers seem to call it with ctrl->cap, so there is no need to pass it at all. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
By Sagi Grimberg
nvme_enable_ctrl reads the cap register right after, so there is no need to do that locally in the transport driver. Set sqsize in nvme_init_identify. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
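Roughly, the core-level sqsize setting amounts to clamping the requested queue size against CAP.MQES; a hedged sketch (NVME_CAP_MQES() is the existing helper from the NVMe headers, the wrapper function is invented for illustration):
--
static void example_clamp_sqsize(struct nvme_ctrl *ctrl)
{
	/* sqsize is zero's based; MQES is the controller's maximum queue entries supported. */
	ctrl->sqsize = min_t(u16, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
}
--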
-
- 05 August 2019, 1 commit
-
-
By Ming Lei
When aborting in-flight requests to recover the controller, we have to make sure that the queue's complete function is called on completed requests before moving on. Otherwise, for example, the warning WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered on nvme-rdma. Fix this issue by using blk_mq_tagset_wait_completed_request. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 August 2019, 1 commit
-
-
By Sagi Grimberg
When start_queue fails, we need to make sure to drain the queue cq before freeing the rdma resources, because we might still race with the completion path. Have the start_queue() error path safely stop the queue. -- [30371.808111] nvme nvme1: Failed reconnect attempt 11 [30371.808113] nvme nvme1: Reconnecting in 10 seconds... [...] [30382.069315] nvme nvme1: creating 4 I/O queues. [30382.257058] nvme nvme1: Connect Invalid SQE Parameter, qid 4 [30382.257061] nvme nvme1: failed to connect queue: 4 ret=386 [30382.305001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [30382.305022] IP: qedr_poll_cq+0x8a3/0x1170 [qedr] [30382.305028] PGD 0 P4D 0 [30382.305037] Oops: 0000 [#1] SMP PTI [...] [30382.305153] Call Trace: [30382.305166] ? __switch_to_asm+0x34/0x70 [30382.305187] __ib_process_cq+0x56/0xd0 [ib_core] [30382.305201] ib_poll_handler+0x26/0x70 [ib_core] [30382.305213] irq_poll_softirq+0x88/0x110 [30382.305223] ? sort_range+0x20/0x20 [30382.305232] __do_softirq+0xde/0x2c6 [30382.305241] ? sort_range+0x20/0x20 [30382.305249] run_ksoftirqd+0x1c/0x60 [30382.305258] smpboot_thread_fn+0xef/0x160 [30382.305265] kthread+0x113/0x130 [30382.305273] ? kthread_create_worker_on_cpu+0x50/0x50 [30382.305281] ret_from_fork+0x35/0x40 -- Reported-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 24 June 2019, 1 commit
-
-
By Israel Rukshin
This is a preparation for adding a new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-
- 21 June 2019, 1 commit
-
-
By Ming Lei
sg_alloc_table_chained() currently allows the caller to provide one preallocated SGL and returns early if the requested number of entries isn't bigger than the size of that SGL. This is used to inline an SGL for an IO request. However, the scattergather code only allows the size of the first preallocated SGL to be SG_CHUNK_SIZE (128). This means a substantial amount of memory (4KB) is claimed for the SGL for each IO request. If the I/O is small, it would be prudent to allocate a smaller SGL. Introduce an extra parameter to sg_alloc_table_chained() and sg_free_table_chained() for specifying the size of the preallocated SGL. Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the same size except for the last one. Change the code to allow both functions to accept a variable size for the first preallocated SGL. [mkp: attempted to clarify commit desc] Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: netdev@vger.kernel.org Cc: linux-nvme@lists.infradead.org Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
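A hedged sketch of a caller after this change: the size of the inline (preallocated) SGL is passed as the new last argument and mirrored when freeing. EXAMPLE_INLINE_SG_CNT and the two wrapper functions are invented for illustration.
--
#include <linux/blk-mq.h>
#include <linux/scatterlist.h>

#define EXAMPLE_INLINE_SG_CNT	2

static int example_alloc_sgl(struct request *rq, struct sg_table *sgt,
			     struct scatterlist *inline_sgl)
{
	sgt->sgl = inline_sgl;
	/* Only fall back to a slab allocation when more than 2 entries are needed. */
	return sg_alloc_table_chained(sgt, blk_rq_nr_phys_segments(rq),
				      sgt->sgl, EXAMPLE_INLINE_SG_CNT);
}

static void example_free_sgl(struct sg_table *sgt)
{
	sg_free_table_chained(sgt, EXAMPLE_INLINE_SG_CNT);
}
--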
-
- 07 June 2019, 1 commit
-
-
By Max Gurtovoy
Commit 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset") caused a kernel panic when disconnecting from an inaccessible controller (disconnect during re-connection). -- nvme nvme0: Removing ctrl: NQN "testnqn1" nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1 BUG: unable to handle kernel paging request at 0000000080000228 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... Call Trace: blk_mq_exit_hctx+0x5c/0xf0 blk_mq_exit_queue+0xd4/0x100 blk_cleanup_queue+0x9a/0xc0 nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma] nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma] nvme_do_delete_ctrl+0x53/0x80 [nvme_core] nvme_sysfs_delete+0x45/0x60 [nvme_core] kernfs_fop_write+0x105/0x180 vfs_write+0xad/0x1a0 ksys_write+0x5a/0xd0 do_syscall_64+0x55/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fa215417154 -- The reason for this crash is accessing an already freed ib_device when performing dma_unmap during exit_request commands. The root cause is that during re-connection all the queues are destroyed and re-created (and the ib_device is reference counted by the queues and freed as well), but the tagset stays alive and all the DMA mappings (that we perform in init_request) are kept in the request context. The original commit fixed a different bug that was introduced during bonding (aka NIC teaming) tests that for some scenarios change the underlying ib_device and caused memory leakage and a possible segmentation fault. This commit is a complementary commit that also changes the wrong DMA mappings that were saved in the request context, making the request sqe DMA mappings dynamic with the command lifetime (i.e. mapped in .queue_rq and unmapped in .complete). It also fixes the above crash of accessing a freed ib_device during destruction of the tagset. Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset") Reported-by: Jim Harris <james.r.harris@intel.com> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Jim Harris <james.r.harris@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 31 May 2019, 1 commit
-
-
By Sagi Grimberg
When the controller supports fewer queues than requested, we should make sure that queue mapping does the right thing and not assume that all queues are available. This fixes a crash when the controller supports fewer queues than requested. The rules are: 1. if no write/poll queues are requested, we assign the available queues to the default queue map. The default and read queue maps share the existing queues. 2. if write queues are requested: - first make sure that the read queue map gets the requested nr_io_queues count - then grant the default queue map the minimum between the requested nr_write_queues and the remaining queues. If there are no available queues to dedicate to the default queue map, fall back to (1) and share all the queues in the existing queue map. 3. if poll queues are requested: - map the remaining queues to the poll queue map. Also, provide a log indication of how we constructed the different queue maps. Reported-by: Harris, James R <james.r.harris@intel.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Tested-by: Jim Harris <james.r.harris@intel.com> Cc: <stable@vger.kernel.org> # v5.0+ Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
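A simplified sketch of the three rules above (not the literal driver code; the function name and the io_queues[] array are illustrative, and struct nvmf_ctrl_options is assumed from the fabrics header), distributing the nr_io_queues the controller actually granted among the default/read/poll sets:
--
#include <linux/blk-mq.h>

static void example_set_io_queues(struct nvmf_ctrl_options *opts,
				  unsigned int nr_io_queues,
				  unsigned int io_queues[HCTX_MAX_TYPES])
{
	if (opts->nr_write_queues && opts->nr_io_queues < nr_io_queues) {
		/* Rule 2: read map gets its full share, writes take what is left. */
		io_queues[HCTX_TYPE_READ] = opts->nr_io_queues;
		nr_io_queues -= io_queues[HCTX_TYPE_READ];
		io_queues[HCTX_TYPE_DEFAULT] =
			min(opts->nr_write_queues, nr_io_queues);
		nr_io_queues -= io_queues[HCTX_TYPE_DEFAULT];
	} else {
		/* Rule 1: default and read maps share the same queues. */
		io_queues[HCTX_TYPE_DEFAULT] =
			min(opts->nr_io_queues, nr_io_queues);
		nr_io_queues -= io_queues[HCTX_TYPE_DEFAULT];
	}

	if (opts->nr_poll_queues && nr_io_queues) {
		/* Rule 3: whatever remains is mapped to the poll queue map. */
		io_queues[HCTX_TYPE_POLL] =
			min(opts->nr_poll_queues, nr_io_queues);
	}
}
--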
-
- 13 May 2019, 1 commit
-
-
By Max Gurtovoy
In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we needed a reference on the ib_device as long as the tagset was alive, as the MRs in the request structures needed a valid ib_device. Now, we allocate/deallocate an MR pool per QP and consume on demand. Also remove the nvme_rdma_free_tagset function and use blk_mq_free_tag_set instead, as it is no longer needed. This commit also fixes a memory leak and a possible segmentation fault. When configuring the system with NIC teaming (aka bonding), we use one network interface to create an HA connection to the target side. In case one connection breaks down, the nvme-rdma driver will get a notification from the rdma-cm layer that the underlying address was changed and will start the error recovery process. During this process, we'll reconnect to the target via the second interface in the bond without destroying the tagset. This will cause a leak of the initial rdma device (ndev) and a miscount in the reference count of the newly created rdma device (new ndev). In the final destruction (or in another error flow), we'll get a warning dump from ib_dealloc_pd that we still have in-flight MRs related to that pd. This happens because of the miscount of the rdma device's reference count, causing access violations to its elements (some queues are not destroyed yet). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 25 April 2019, 1 commit
-
-
By Sagi Grimberg
If we time out during the admin startup sequence, we might not yet have an I/O tagset allocated, which causes the teardown sequence to crash. Make nvme_tcp_teardown_io_queues safe by not iterating inflight tags if the tagset wasn't allocated. Fixes: 4c174e63 ("nvme-rdma: fix timeout handler") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 21 February 2019, 1 commit
-
-
By Chaitanya Kulkarni
Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check whether a command contains data to be mapped. This fixes the case where a struct request contains LBAs but has no payload, such as with Write Zeroes support. Fixes: 6e02318e ("nvme: add support for the Write Zeroes command") Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Tested-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 20 February 2019, 1 commit
-
-
By Christoph Hellwig
Update the license to use an SPDX-License-Identifier instead of verbose license text. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
- 04 February 2019, 1 commit
-
-
By Sagi Grimberg
It is now used just to flush the error recovery and reconnect work items in the RDMA and TCP transports, which can simply be moved to the corresponding teardown routines. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 24 January 2019, 2 commits
-
-
By Sagi Grimberg
If the device supports fewer queues than provided (if the device has fewer completion vectors), we might hit a bug due to the fact that we ignore that in nvme_rdma_map_queues (we override the maps' nr_queues with user opts). Instead, keep track of how many default/read/poll queues we actually allocated (rather than were asked for by the user) and use that to assign our queue mappings. Fixes: b65bb777 ("nvme-rdma: support separate queue maps for read and write") Reported-by: Saleem, Shiraz <shiraz.saleem@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Sagi Grimberg
Currently, we have several problems with the timeout handler: 1. If we time out in the controller establishment flow, we will hang because we don't execute the error recovery (and we shouldn't, because the create_ctrl flow needs to fail and clean up on its own). 2. We might also hang if we get a disconnect on a queue while the controller is already deleting. This racy flow can cause the controller disable/shutdown admin command to hang. We cannot complete a timed out request from the timeout handler without mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work). So we serialize it in the timeout handler and tear down the io and admin queues to guarantee that no one races with us in completing the request. Reported-by: Jaesoo Lee <jalee@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 19 December 2018, 2 commits
-
-
By Sagi Grimberg
When passed nr_poll_queues, set up additional queues with the CQ polling context IB_POLL_DIRECT (no interrupts) and make sure to set QUEUE_FLAG_POLL on the connect_q. In addition, add the third queue mapping for polling queues. The nvmf connect on this queue is polled for like all other requests, so make nvmf_connect_io_queue poll for polling queues. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
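A hedged sketch of the CQ-side difference for a poll queue: a poll queue's CQ uses IB_POLL_DIRECT, so no interrupts are generated and completions must be reaped from the block layer's poll path (e.g. via ib_process_cq_direct()). The wrapper below and its bool parameter are illustrative only.
--
#include <rdma/ib_verbs.h>

static struct ib_cq *example_alloc_queue_cq(struct ib_device *ibdev, int nr_cqe,
					    int comp_vector, bool poll_queue)
{
	enum ib_poll_context ctx = poll_queue ? IB_POLL_DIRECT : IB_POLL_SOFTIRQ;

	/* IB_POLL_DIRECT CQs are never armed; the caller must poll them explicitly. */
	return ib_alloc_cq(ibdev, NULL, nr_cqe, comp_vector, ctx);
}
--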
-
By Sagi Grimberg
Preparation for polling support for fabrics. Polling support means that our completion queues are not generating any interrupts, which means we need to poll for the nvmf io queue connect as well. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 13 December 2018, 2 commits
-
-
By Sagi Grimberg
Allow NVMF_OPT_NR_WRITE_QUEUES to describe additional write queues. In addition, implement .map_queues, which will apply 2 queue maps for the read and write queue sets. Note that with the separate queue maps, HCTX_TYPE_READ will always use nr_io_queues and HCTX_TYPE_DEFAULT will use nr_write_queues. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
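A simplified sketch of a ->map_queues callback with two sets (illustrative, not the literal nvme-rdma code, and the layout order is an assumption): the write (HCTX_TYPE_DEFAULT) set is laid out first, the read set follows it, and each set is mapped with the generic blk_mq_map_queues() helper.
--
#include <linux/blk-mq.h>

static int example_map_queues(struct blk_mq_tag_set *set)
{
	struct blk_mq_queue_map *dflt = &set->map[HCTX_TYPE_DEFAULT];
	struct blk_mq_queue_map *read = &set->map[HCTX_TYPE_READ];

	/* nr_queues of each map would be filled in from the connect options. */
	dflt->queue_offset = 0;			/* write set comes first */
	read->queue_offset = dflt->nr_queues;	/* read set follows it */

	blk_mq_map_queues(dflt);
	blk_mq_map_queues(read);
	return 0;
}
--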
-
By Sagi Grimberg
Will be used by nvme-rdma for queue map separation support. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 08 December 2018, 1 commit
-
-
By Hannes Reinecke
Instead of directly poking into the struct device, add a new numa_node field to struct nvme_ctrl. This allows fabrics drivers, where ctrl->dev is a virtual device, to support NUMA affinity as well. Also expose the field as a sysfs attribute, and populate it for the RDMA and FC transports. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-