1. 28 Sep, 2019 (1 commit)
  2. 26 Sep, 2019 (1 commit)
  3. 30 Aug, 2019 (5 commits)
  4. 05 Aug, 2019 (1 commit)
  5. 01 Aug, 2019 (1 commit)
    • nvme-rdma: fix possible use-after-free in connect error flow · d94211b8
      Authored by Sagi Grimberg
      When start_queue fails, we need to make sure to drain the queue's
      CQ before freeing the RDMA resources, because we might still race
      with the completion path. Have the start_queue() error path safely
      stop the queue.
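
      A minimal sketch of the ordering this change enforces, assuming the
      driver's usual helpers (rdma_disconnect(), ib_drain_qp()) and abridged
      fabrics-connect signatures; it is a reconstruction for illustration,
      not the verbatim patch:

      static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
      {
          rdma_disconnect(queue->cm_id);  /* stop new work on the queue */
          ib_drain_qp(queue->qp);         /* flush and reap outstanding completions */
      }

      static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
      {
          struct nvme_rdma_queue *queue = &ctrl->queues[idx];
          int ret;

          ret = idx ? nvmf_connect_io_queue(&ctrl->ctrl, idx, false) :
                      nvmf_connect_admin_queue(&ctrl->ctrl);
          if (ret) {
              /* error path: quiesce and drain before the caller frees the
               * rdma resources, so a late completion cannot touch freed memory */
              __nvme_rdma_stop_queue(queue);
              dev_info(ctrl->ctrl.device,
                       "failed to connect queue: %d ret=%d\n", idx, ret);
              return ret;
          }

          set_bit(NVME_RDMA_Q_LIVE, &queue->flags);
          return 0;
      }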
      
      --
      [30371.808111] nvme nvme1: Failed reconnect attempt 11
      [30371.808113] nvme nvme1: Reconnecting in 10 seconds...
      [...]
      [30382.069315] nvme nvme1: creating 4 I/O queues.
      [30382.257058] nvme nvme1: Connect Invalid SQE Parameter, qid 4
      [30382.257061] nvme nvme1: failed to connect queue: 4 ret=386
      [30382.305001] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [30382.305022] IP: qedr_poll_cq+0x8a3/0x1170 [qedr]
      [30382.305028] PGD 0 P4D 0
      [30382.305037] Oops: 0000 [#1] SMP PTI
      [...]
      [30382.305153] Call Trace:
      [30382.305166]  ? __switch_to_asm+0x34/0x70
      [30382.305187]  __ib_process_cq+0x56/0xd0 [ib_core]
      [30382.305201]  ib_poll_handler+0x26/0x70 [ib_core]
      [30382.305213]  irq_poll_softirq+0x88/0x110
      [30382.305223]  ? sort_range+0x20/0x20
      [30382.305232]  __do_softirq+0xde/0x2c6
      [30382.305241]  ? sort_range+0x20/0x20
      [30382.305249]  run_ksoftirqd+0x1c/0x60
      [30382.305258]  smpboot_thread_fn+0xef/0x160
      [30382.305265]  kthread+0x113/0x130
      [30382.305273]  ? kthread_create_worker_on_cpu+0x50/0x50
      [30382.305281]  ret_from_fork+0x35/0x40
      --
      Reported-by: Nicolas Morey-Chaisemartin <NMoreyChaisemartin@suse.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  6. 24 Jun, 2019 (1 commit)
  7. 21 Jun, 2019 (1 commit)
    • scsi: lib/sg_pool.c: improve APIs for allocating sg pool · 4635873c
      Authored by Ming Lei
      sg_alloc_table_chained() currently allows the caller to provide one
      preallocated SGL and returns immediately if the requested number of
      entries fits within that SGL. This is used to inline an SGL for an
      I/O request.

      However, the scatterlist code only allows the first preallocated SGL
      to have the fixed size SG_CHUNK_SIZE (128 entries). This means a
      substantial amount of memory (4KB) is claimed for the SGL of each
      I/O request. If the I/O is small, it would be prudent to allocate a
      smaller SGL.
      
      Introduce an extra parameter to sg_alloc_table_chained() and
      sg_free_table_chained() for specifying size of the preallocated SGL.
      
      Both __sg_free_table() and __sg_alloc_table() assume that each SGL
      has the same size, except for the last one. Change the code to allow
      both functions to accept a variable size for the first preallocated
      SGL.
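
      A hedged usage sketch of the extended interface: the extra
      first-chunk-size argument is what this patch introduces, while
      struct example_cmd and its two-entry inline chunk are hypothetical
      caller state:

      #include <linux/kernel.h>
      #include <linux/scatterlist.h>

      struct example_cmd {                      /* hypothetical per-command state */
          struct sg_table table;
          struct scatterlist inline_sg[2];      /* small inline first chunk */
      };

      static int example_cmd_alloc_sgl(struct example_cmd *cmd, int nents)
      {
          /*
           * Falls back to the sg pool allocator only when nents exceeds the
           * two inline entries, instead of always reserving SG_CHUNK_SIZE
           * (128) entries per request.
           */
          return sg_alloc_table_chained(&cmd->table, nents, cmd->inline_sg,
                                        ARRAY_SIZE(cmd->inline_sg));
      }

      static void example_cmd_free_sgl(struct example_cmd *cmd)
      {
          /* must pass the same first-chunk size used at allocation time */
          sg_free_table_chained(&cmd->table, ARRAY_SIZE(cmd->inline_sg));
      }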
      
      [mkp: attempted to clarify commit desc]
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  8. 07 Jun, 2019 (1 commit)
    • nvme-rdma: use dynamic dma mapping per command · 62f99b62
      Authored by Max Gurtovoy
      Commit 87fd1253 ("nvme-rdma: remove redundant reference between
      ib_device and tagset") caused a kernel panic when disconnecting from an
      inaccessible controller (disconnect during re-connection).
      
      --
      nvme nvme0: Removing ctrl: NQN "testnqn1"
      nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1
      BUG: unable to handle kernel paging request at 0000000080000228
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      ...
      Call Trace:
       blk_mq_exit_hctx+0x5c/0xf0
       blk_mq_exit_queue+0xd4/0x100
       blk_cleanup_queue+0x9a/0xc0
       nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma]
       nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma]
       nvme_do_delete_ctrl+0x53/0x80 [nvme_core]
       nvme_sysfs_delete+0x45/0x60 [nvme_core]
       kernfs_fop_write+0x105/0x180
       vfs_write+0xad/0x1a0
       ksys_write+0x5a/0xd0
       do_syscall_64+0x55/0x110
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fa215417154
      --
      
      The reason for this crash is that an already freed ib_device was
      accessed when performing dma_unmap in the exit_request commands. The
      root cause is that during re-connection all the queues are destroyed
      and re-created (and the ib_device, which is reference counted by the
      queues, is freed as well), but the tagset stays alive and all the DMA
      mappings (performed in init_request) are kept in the request context.
      The original commit fixed a different bug, seen during bonding (aka
      NIC teaming) tests, where some scenarios change the underlying
      ib_device and cause a memory leak and a possible segmentation fault.
      This is a complementary commit that also removes the stale DMA
      mappings saved in the request context, making the request SQE DMA
      mappings dynamic with the command lifetime (i.e. mapped in .queue_rq
      and unmapped in .complete). It also fixes the above crash of
      accessing a freed ib_device during destruction of the tagset.
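
      A minimal sketch of the per-command mapping lifetime, using the
      in-tree ib_dma_* helpers; the example_* wrappers are illustrative,
      as the driver open-codes this in its .queue_rq and .complete
      handlers:

      /* called from .queue_rq, right before posting the send WR */
      static blk_status_t example_map_sqe(struct ib_device *ibdev,
                                          struct nvme_rdma_qe *qe)
      {
          qe->dma = ib_dma_map_single(ibdev, qe->data,
                                      sizeof(struct nvme_command),
                                      DMA_TO_DEVICE);
          if (unlikely(ib_dma_mapping_error(ibdev, qe->dma)))
              return BLK_STS_RESOURCE;
          return BLK_STS_OK;
      }

      /* called from .complete, so no mapping outlives its command or
       * survives a queue teardown/reconnect cycle */
      static void example_unmap_sqe(struct ib_device *ibdev,
                                    struct nvme_rdma_qe *qe)
      {
          ib_dma_unmap_single(ibdev, qe->dma,
                              sizeof(struct nvme_command),
                              DMA_TO_DEVICE);
      }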
      
      Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
      Reported-by: Jim Harris <james.r.harris@intel.com>
      Suggested-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  9. 31 May, 2019 (1 commit)
    • nvme-rdma: fix queue mapping when queue count is limited · 5651cd3c
      Authored by Sagi Grimberg
      When the controller supports fewer queues than requested, we should
      make sure that the queue mapping does the right thing instead of
      assuming that all requested queues are available. This fixes a crash
      seen in exactly that situation.

      The rules are (a standalone model of them follows below):
      1. if no write/poll queues are requested, we assign the available queues
         to the default queue map. The default and read queue maps share the
         existing queues.
      2. if write queues are requested:
        - first make sure that the read queue map gets the requested
          nr_io_queues count
        - then grant the default queue map the minimum of the requested
          nr_write_queues and the remaining queues. If there are no available
          queues to dedicate to the default queue map, fall back to (1) and
          share all the queues in the existing queue map.
      3. if poll queues are requested:
        - map the remaining queues to the poll queue map.

      Also, provide a log indication of how we constructed the different
      queue maps.
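
      Below is a small, self-contained model of the rules above in plain
      user-space C (not the driver code); the queue counts in main() are
      an arbitrary example:

      #include <stdio.h>

      struct qmap { unsigned def, read, poll; };

      static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

      static struct qmap split_queues(unsigned avail, unsigned nr_io,
                                      unsigned nr_write, unsigned nr_poll)
      {
          struct qmap m = { 0, 0, 0 };

          if (nr_write) {
              /* rule 2: the read map gets the requested nr_io_queues first */
              m.read = min_u(nr_io, avail);
              avail -= m.read;
              /* the default map gets what is left, capped at nr_write_queues */
              m.def = min_u(nr_write, avail);
              avail -= m.def;
              if (!m.def) {           /* nothing left to dedicate: fall back to rule 1 */
                  m.def = m.read;     /* default and read share the same queues */
                  m.read = 0;
              }
          } else {
              /* rule 1: no dedicated write queues requested, share everything */
              m.def = min_u(nr_io, avail);
              avail -= m.def;
          }
          /* rule 3: the poll map gets the remaining queues, if any were requested */
          if (nr_poll)
              m.poll = min_u(nr_poll, avail);
          return m;
      }

      int main(void)
      {
          /* e.g. the controller grants 4 queues; 8 io + 4 write + 2 poll requested */
          struct qmap m = split_queues(4, 8, 4, 2);
          printf("default=%u read=%u poll=%u\n", m.def, m.read, m.poll);
          return 0;
      }
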
      Reported-by: Harris, James R <james.r.harris@intel.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Cc: <stable@vger.kernel.org> # v5.0+
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  10. 13 May, 2019 (1 commit)
    • nvme-rdma: remove redundant reference between ib_device and tagset · 87fd1253
      Authored by Max Gurtovoy
      In the past, before commit f41725bb ("nvme-rdma: Use mr pool"), we
      needed a reference on the ib_device for as long as the tagset was
      alive, as the MRs in the request structures needed a valid ib_device.
      Now we allocate/deallocate an MR pool per QP and consume MRs on demand.
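
      A hedged sketch of the per-QP MR pool usage this relies on; the
      example_* wrappers are hypothetical and the ib_mr_pool_init()
      signature is quoted from memory for the v5.2-era API:

      #include <rdma/mr_pool.h>

      static int example_setup_mr_pool(struct ib_qp *qp, int queue_size,
                                       int max_page_list_len)
      {
          /* MRs are tied to the QP, so nothing in the tagset needs to
           * pin the ib_device anymore */
          return ib_mr_pool_init(qp, &qp->rdma_mrs, queue_size,
                                 IB_MR_TYPE_MEM_REG, max_page_list_len);
      }

      static void example_teardown_mr_pool(struct ib_qp *qp)
      {
          /* freed together with the queue, not with the tagset */
          ib_mr_pool_destroy(qp, &qp->rdma_mrs);
      }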
      
      Also remove the nvme_rdma_free_tagset function and use
      blk_mq_free_tag_set instead, as it is no longer needed.
      
      This commit also fixes a memory leak and a possible segmentation
      fault. When configuring the system with NIC teaming (aka bonding), we
      use one network interface to create an HA connection to the target
      side. In case one connection breaks down, the nvme-rdma driver gets a
      notification from the rdma-cm layer that the underlying address was
      changed and starts the error recovery process. During this process,
      we reconnect to the target via the second interface in the bond
      without destroying the tagset. This causes a leak of the initial rdma
      device (ndev) and a miscount in the reference count of the newly
      created rdma device (new ndev). In the final destruction (or in
      another error flow), we get a warning dump from ib_dealloc_pd that we
      still have inflight MRs related to that pd. This happens because of
      the miscounted reference on the rdma device, which leads to access
      violations on its elements (some queues are not destroyed yet).
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  11. 25 Apr, 2019 (1 commit)
  12. 21 Feb, 2019 (1 commit)
  13. 20 Feb, 2019 (1 commit)
  14. 04 Feb, 2019 (1 commit)
  15. 24 Jan, 2019 (2 commits)
    • nvme-rdma: rework queue maps handling · b1064d3e
      Authored by Sagi Grimberg
      If the device supports fewer queues than provided (i.e. the device
      has fewer completion vectors), we might hit a bug because we ignore
      that in nvme_rdma_map_queues (we override the maps' nr_queues with
      the user options).

      Instead, keep track of how many default/read/poll queues we actually
      allocated (rather than how many were requested by the user) and use
      that to assign our queue mappings.
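
      A hedged sketch of the idea: a .map_queues-style callback consumes
      the recorded allocation counts instead of the user options. The
      counts container and function names are illustrative, and the
      queue_offset bookkeeping plus the blk_mq_map_queues() calls are
      omitted:

      #include <linux/blk-mq.h>

      struct example_queue_counts {                   /* illustrative container */
          unsigned int io_queues[HCTX_MAX_TYPES];     /* filled at allocation time */
      };

      static void example_map_queues(struct blk_mq_tag_set *set,
                                     struct example_queue_counts *c)
      {
          set->map[HCTX_TYPE_DEFAULT].nr_queues = c->io_queues[HCTX_TYPE_DEFAULT];
          set->map[HCTX_TYPE_READ].nr_queues = c->io_queues[HCTX_TYPE_READ];
          set->map[HCTX_TYPE_POLL].nr_queues = c->io_queues[HCTX_TYPE_POLL];
      }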
      
      Fixes: b65bb777 ("nvme-rdma: support separate queue maps for read and write")
      Reported-by: Saleem, Shiraz <shiraz.saleem@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-rdma: fix timeout handler · 4c174e63
      Authored by Sagi Grimberg
      Currently, we have several problems with the timeout handler:
      1. If we time out in the controller establishment flow, we will hang
      because we don't execute the error recovery (and we shouldn't, because
      the create_ctrl flow needs to fail and clean up on its own).
      2. We might also hang if we get a disconnect on a queue while the
      controller is already deleting. This racy flow can cause the controller
      disable/shutdown admin command to hang.

      We cannot complete a timed-out request from the timeout handler without
      mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work).
      So we serialize it in the timeout handler and tear down the I/O and admin
      queues to guarantee that no one races with us in completing the
      request.
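
      A sketch of the resulting timeout-handler flow, reconstructed from
      memory and abridged; the teardown helper names follow the driver of
      that era but should be treated as illustrative:

      static enum blk_eh_timer_return
      nvme_rdma_timeout(struct request *rq, bool reserved)
      {
          struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
          struct nvme_rdma_ctrl *ctrl = req->queue->ctrl;

          if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
              /*
               * Controller is connecting or deleting: error recovery cannot
               * be relied on, so tear down serially right here, which also
               * completes the timed-out request.
               */
              nvme_rdma_teardown_io_queues(ctrl, false);
              nvme_rdma_teardown_admin_queue(ctrl, false);
              return BLK_EH_DONE;
          }

          /* live controller: kick error recovery and reset the timer */
          nvme_rdma_error_recovery(ctrl);
          return BLK_EH_RESET_TIMER;
      }
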
      Reported-by: Jaesoo Lee <jalee@purestorage.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 19 Dec, 2018 (2 commits)
  17. 13 Dec, 2018 (2 commits)
  18. 08 Dec, 2018 (1 commit)
  19. 05 Dec, 2018 (1 commit)
  20. 01 Dec, 2018 (1 commit)
  21. 26 Nov, 2018 (2 commits)
  22. 19 Oct, 2018 (2 commits)
  23. 17 Oct, 2018 (1 commit)
  24. 25 Jul, 2018 (1 commit)
  25. 24 Jul, 2018 (5 commits)
  26. 23 Jul, 2018 (2 commits)