1. 15 January 2021, 1 commit
  2. 06 January 2021, 1 commit
    • nvmet-rdma: Fix list_del corruption on queue establishment failure · 9ceb7863
      Authored by Israel Rukshin
      While a queue is in the NVMET_RDMA_Q_CONNECTING state, it may have
      requests parked on rsp_wait_list. If a disconnect occurs in this
      state, nothing empties that list or returns its requests to the
      free_rsps list. Normally nvmet_rdma_queue_established() frees those
      requests after moving the queue to the NVMET_RDMA_Q_LIVE state, but
      in this case __nvmet_rdma_queue_disconnect() is called first. The
      crash then happens in nvmet_rdma_free_rsps() when it calls
      list_del(&rsp->free_list), because the request sits only on the
      wait list. To fix the issue, simply clear rsp_wait_list when
      destroying the queue (see the sketch after this entry).
      Signed-off-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
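
      A minimal sketch of the fix described above, assuming the
      driver-internal struct nvmet_rdma_queue / struct nvmet_rdma_rsp
      fields named in the commit message (rsp_wait_list, wait_list,
      free_rsps) and a put helper that returns a response to the free
      list; this is illustrative, not the upstream diff:

          #include <linux/list.h>

          /* Drain the wait list on queue teardown so every rsp is back on
           * free_rsps before nvmet_rdma_free_rsps() runs list_del() on that
           * list. Helper and field names are assumptions from the text. */
          static void nvmet_rdma_drain_wait_list(struct nvmet_rdma_queue *queue)
          {
              struct nvmet_rdma_rsp *rsp;

              while (!list_empty(&queue->rsp_wait_list)) {
                  rsp = list_first_entry(&queue->rsp_wait_list,
                                         struct nvmet_rdma_rsp, wait_list);
                  list_del(&rsp->wait_list);
                  nvmet_rdma_put_rsp(rsp); /* back onto queue->free_rsps */
              }
          }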
  3. 18 November 2020, 1 commit
  4. 24 August 2020, 1 commit
  5. 29 July 2020, 1 commit
  6. 08 July 2020, 1 commit
  7. 28 May 2020, 1 commit
  8. 27 May 2020, 2 commits
  9. 10 May 2020, 1 commit
    • nvmet-rdma: use SRQ per completion vector · b0012dd3
      Authored by Max Gurtovoy
      To save resource allocations and make better use of completion
      locality (compared with the per-device SRQ that exists today),
      allocate Shared Receive Queues (SRQs) per completion vector and
      associate each created QP/CQ with the appropriate SRQ according to
      the queue index. This association reduces lock contention in the
      fast path (compared with the per-device SRQ solution) and improves
      locality in memory buffers. Add a new module parameter for the SRQ
      size so it can be tuned to the expected load; users should keep the
      size >= 256 to avoid running out of resources. Also lower the debug
      level of the "last WQE reached" event that is raised when a QP
      using an SRQ is destroyed, to reduce log noise. A sketch of the
      per-vector SRQ setup follows this entry.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
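
      A hedged sketch of the per-vector SRQ allocation described above,
      using the kernel's ib_create_srq()/ib_destroy_srq() verbs API. The
      nvmet_rdma_device fields (srqs, srq_count, pd, device) and the
      srq_size parameter name are assumptions taken from the commit
      message, not the exact upstream code:

          #include <linux/module.h>
          #include <linux/slab.h>
          #include <rdma/ib_verbs.h>

          static int srq_size = 1024;
          module_param(srq_size, int, 0444);
          MODULE_PARM_DESC(srq_size, "RDMA shared receive queue size, >= 256 recommended");

          /* Allocate one SRQ per completion vector of the RDMA device. */
          static int nvmet_rdma_init_srqs(struct nvmet_rdma_device *ndev)
          {
              struct ib_srq_init_attr attr = { };
              int i, nr = ndev->device->num_comp_vectors;

              ndev->srqs = kcalloc(nr, sizeof(*ndev->srqs), GFP_KERNEL);
              if (!ndev->srqs)
                  return -ENOMEM;

              attr.attr.max_wr = srq_size;
              attr.attr.max_sge = 2;
              attr.srq_type = IB_SRQT_BASIC;

              for (i = 0; i < nr; i++) {
                  ndev->srqs[i] = ib_create_srq(ndev->pd, &attr);
                  if (IS_ERR(ndev->srqs[i]))
                      goto out_destroy;
              }
              ndev->srq_count = nr;
              return 0;

          out_destroy:
              while (--i >= 0)
                  ib_destroy_srq(ndev->srqs[i]);
              kfree(ndev->srqs);
              return -ENOMEM;
          }

          /* A queue's QP then picks its SRQ by completion-vector index, e.g.
           * qp_attr.srq = ndev->srqs[queue->comp_vector % ndev->srq_count]; */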
  10. 08 April 2020, 1 commit
  11. 04 April 2020, 1 commit
  12. 26 March 2020, 2 commits
  13. 17 March 2020, 1 commit
  14. 05 November 2019, 2 commits
  15. 25 April 2019, 2 commits
  16. 20 February 2019, 1 commit
  17. 24 January 2019, 1 commit
  18. 13 December 2018, 1 commit
  19. 08 December 2018, 1 commit
  20. 07 December 2018, 1 commit
  21. 09 November 2018, 1 commit
  22. 18 October 2018, 2 commits
    • nvmet: Optionally use PCI P2P memory · c6925093
      Authored by Logan Gunthorpe
      Create a configfs attribute in each nvme-fabrics namespace to enable P2P
      memory use.  The attribute may be enabled (with a boolean) or a specific
      P2P device may be given (with the device's PCI name).
      
      When enabled, the namespace will ensure the underlying block device
      supports P2P and is compatible with any specified P2P device.  If no device
      was specified it will ensure there is compatible P2P memory somewhere in
      the system.  Enabling a namespace with P2P memory will fail with EINVAL
      (and an appropriate dmesg error) if any of these conditions are not met.
      
      Once a controller is set up on a specific port, the P2P device to use for
      each namespace will be found and stored in a radix tree by namespace ID.
      When memory is allocated for a request, the tree is used to look up the P2P
      device to allocate memory against.  If no device is in the tree (because no
      appropriate device was found), or if allocation of P2P memory fails, the
      request falls back to regular memory (see the sketch after this entry).
      Signed-off-by: Stephen Bates <sbates@raithlin.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      [hch: partial rewrite of the initial code]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
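
      A hedged sketch of the allocation-time lookup described above:
      consult the controller's radix tree (keyed by namespace ID) for a
      P2P device and fall back to regular memory when none is found or
      the P2P allocation fails. It uses radix_tree_lookup(),
      pci_p2pmem_alloc_sgl() and sgl_alloc(); the nvmet_req/nvmet_ctrl
      field names (p2p_ns_map, p2p_dev, sg, sg_cnt, transfer_len) are
      assumptions based on the commit message:

          #include <linux/pci-p2pdma.h>
          #include <linux/radix-tree.h>
          #include <linux/scatterlist.h>

          static int nvmet_req_alloc_sgl_sketch(struct nvmet_req *req)
          {
              struct pci_dev *p2p_dev = NULL;
              unsigned int nents;

              if (req->sq->ctrl && req->ns)
                  p2p_dev = radix_tree_lookup(&req->sq->ctrl->p2p_ns_map,
                                              req->ns->nsid);

              if (p2p_dev) {
                  /* Try P2P memory first if a compatible device was recorded. */
                  req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &nents,
                                                 req->transfer_len);
                  if (req->sg) {
                      req->p2p_dev = p2p_dev;
                      req->sg_cnt = nents;
                      return 0;
                  }
                  /* otherwise fall through to regular memory */
              }

              req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &nents);
              if (!req->sg)
                  return -ENOMEM;
              req->sg_cnt = nents;
              return 0;
          }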
    • nvmet: Introduce helper functions to allocate and free request SGLs · 5b2322e4
      Authored by Logan Gunthorpe
      Add helpers to allocate and free the SGL in a struct nvmet_req:
      
        int nvmet_req_alloc_sgl(struct nvmet_req *req)
        void nvmet_req_free_sgl(struct nvmet_req *req)
      
      These will be expanded in a future patch to implement peer-to-peer memory
      DMAs and should be common to all target drivers.
      
      The new helpers are used in nvmet-rdma.  Because req.transfer_len is used
      as the length of the SGL, it is now set earlier and cleared on any error.
      It also appears unnecessary to accumulate the length, as the map_sgl
      functions should only ever be called once per request.  A sketch of the
      helpers follows this entry.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Sagi Grimberg <sagi@grimberg.me>
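
      A minimal sketch of the two helpers named above, built on the
      kernel's sgl_alloc()/sgl_free() from lib/scatterlist.c. It assumes
      req->transfer_len is already set, as the commit message describes,
      and that sg/sg_cnt live in struct nvmet_req:

          #include <linux/scatterlist.h>

          int nvmet_req_alloc_sgl(struct nvmet_req *req)
          {
              unsigned int nents;

              /* transfer_len was set earlier; use it as the SGL length. */
              req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &nents);
              if (!req->sg)
                  return -ENOMEM;
              req->sg_cnt = nents;
              return 0;
          }

          void nvmet_req_free_sgl(struct nvmet_req *req)
          {
              sgl_free(req->sg);
              req->sg = NULL;
              req->sg_cnt = 0;
          }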
  23. 17 October 2018, 1 commit
  24. 05 October 2018, 1 commit
    • nvmet-rdma: use a private workqueue for delete · 2acf70ad
      Authored by Sagi Grimberg
      Queue deletion is done asynchronously when the last reference on
      the queue is dropped.  Thus, to make sure we don't over-allocate
      under a connect/disconnect storm, we let queue deletion complete
      before making forward progress.
      
      However, because we flush the system_wq from rdma_cm context, which
      itself runs from a workqueue context, we can get a circular locking
      complaint [1].  Fix that by using a private workqueue for queue
      deletion (see the sketch after this entry).
      
      [1]:
      ======================================================
      WARNING: possible circular locking dependency detected
      4.19.0-rc4-dbg+ #3 Not tainted
      ------------------------------------------------------
      kworker/5:0/39 is trying to acquire lock:
      00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]
      
      but task is already holding lock:
      00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 ((work_completion)(&queue->release_work)){+.+.}:
             process_one_work+0x474/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #2 ((wq_completion)"events"){+.+.}:
             flush_workqueue+0xf3/0x970
             nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
             cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
      nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      nvme nvme0: Reconnecting in 10 seconds...
      
      -> #1 (&id_priv->handler_mutex/1){+.+.}:
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 (&id_priv->handler_mutex){+.+.}:
             lock_acquire+0xc5/0x200
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             rdma_destroy_id+0x6f/0x440 [rdma_cm]
             nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      Fixes: 777dc823 ("nvmet-rdma: occasionally flush ongoing controller teardown")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
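
      A hedged sketch of the private-workqueue approach: allocate a
      dedicated workqueue at module init and queue (and later flush) the
      release work on it instead of system_wq, so the flush done from the
      rdma_cm handler no longer depends on the global workqueue. The
      workqueue name and flags here are assumptions:

          #include <linux/workqueue.h>

          static struct workqueue_struct *nvmet_rdma_delete_wq;

          static int __init nvmet_rdma_init_sketch(void)
          {
              nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
                                                     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
              if (!nvmet_rdma_delete_wq)
                  return -ENOMEM;
              return 0;
          }

          static void __exit nvmet_rdma_exit_sketch(void)
          {
              destroy_workqueue(nvmet_rdma_delete_wq);
          }

          /* Queue release work is now scheduled on, and flushed from, the
           * private workqueue rather than system_wq:
           *     queue_work(nvmet_rdma_delete_wq, &queue->release_work);
           *     flush_workqueue(nvmet_rdma_delete_wq);
           */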
  25. 06 September 2018, 1 commit
    • nvmet-rdma: fix possible bogus dereference under heavy load · 8407879c
      Authored by Sagi Grimberg
      Currently we always repost the recv buffer before we send a
      response capsule back to the host. Since ordering is not guaranteed
      for send and recv completions, it is possible that we will receive
      a new request from the host before we get a send completion for the
      response capsule.
      
      Today we pre-allocate twice the queue depth of rsps, but in
      reality, under heavy load nothing really prevents that gap from
      growing until we exhaust all our rsps.
      
      To fix this, if we don't have any pre-allocated rsps left, we
      dynamically allocate a rsp and make sure to free it when we are
      done (see the sketch after this entry). If, under memory pressure,
      we fail to allocate a rsp, we silently drop the command and wait
      for the host to retry.
      Reported-by: Steve Wise <swise@opengridcomputing.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: dropped a superfluous assignment]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
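
      A hedged sketch of the get/put path described above: take a
      response from the pre-allocated free list when one is available,
      otherwise allocate one on the fly and flag it so the put side frees
      it instead of returning it to the list. The locking and field names
      (rsps_lock, free_rsps, free_list, allocated) follow the driver's
      style but are assumptions here:

          #include <linux/list.h>
          #include <linux/slab.h>
          #include <linux/spinlock.h>

          static struct nvmet_rdma_rsp *
          nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
          {
              struct nvmet_rdma_rsp *rsp;
              unsigned long flags;

              spin_lock_irqsave(&queue->rsps_lock, flags);
              rsp = list_first_entry_or_null(&queue->free_rsps,
                                             struct nvmet_rdma_rsp, free_list);
              if (rsp)
                  list_del(&rsp->free_list);
              spin_unlock_irqrestore(&queue->rsps_lock, flags);

              if (!rsp) {
                  /* Free list exhausted under load: allocate a one-off rsp. */
                  rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
                  if (!rsp)
                      return NULL; /* drop the command; the host will retry */
                  rsp->allocated = true;
              }
              return rsp;
          }

          static void nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
          {
              unsigned long flags;

              if (rsp->allocated) {
                  kfree(rsp);
                  return;
              }
              spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
              list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
              spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
          }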
  26. 25 July 2018, 1 commit
  27. 23 July 2018, 3 commits
  28. 19 June 2018, 1 commit
  29. 26 March 2018, 5 commits