1. 15 January 2021, 1 commit
  2. 06 January 2021, 1 commit
    • nvmet-rdma: Fix list_del corruption on queue establishment failure · 9ceb7863
      Authored by Israel Rukshin
      While a queue is in the NVMET_RDMA_Q_CONNECTING state, it may have
      requests parked on rsp_wait_list. If a disconnect occurs in this
      state, nothing empties that list or returns its requests to the
      free_rsps list. Normally nvmet_rdma_queue_established() frees those
      requests after moving the queue to the NVMET_RDMA_Q_LIVE state, but
      in this case __nvmet_rdma_queue_disconnect() is called first. The
      crash then happens in nvmet_rdma_free_rsps() when it calls
      list_del(&rsp->free_list), because the request sits only on the
      wait list. To fix the issue, simply clear rsp_wait_list when
      destroying the queue (see the sketch after this entry).
      Signed-off-by: Israel Rukshin <israelr@nvidia.com>
      Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
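
      A minimal sketch of the fix described above, assuming the
      driver-internal struct nvmet_rdma_queue / struct nvmet_rdma_rsp
      fields named in the commit message (rsp_wait_list, wait_list,
      free_rsps) and a put helper that returns a response to the free
      list; this is illustrative, not the upstream diff:

          #include <linux/list.h>

          /* Drain the wait list on queue teardown so every rsp is back on
           * free_rsps before nvmet_rdma_free_rsps() runs list_del() on that
           * list. Helper and field names are assumptions from the text. */
          static void nvmet_rdma_drain_wait_list(struct nvmet_rdma_queue *queue)
          {
              struct nvmet_rdma_rsp *rsp;

              while (!list_empty(&queue->rsp_wait_list)) {
                  rsp = list_first_entry(&queue->rsp_wait_list,
                                         struct nvmet_rdma_rsp, wait_list);
                  list_del(&rsp->wait_list);
                  nvmet_rdma_put_rsp(rsp); /* back onto queue->free_rsps */
              }
          }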
  3. 18 November 2020, 1 commit
  4. 24 August 2020, 1 commit
  5. 29 July 2020, 1 commit
  6. 08 July 2020, 1 commit
  7. 28 May 2020, 1 commit
  8. 27 May 2020, 2 commits
  9. 10 May 2020, 1 commit
    • nvmet-rdma: use SRQ per completion vector · b0012dd3
      Authored by Max Gurtovoy
      To save resource allocations and make better use of completion
      locality (compared with the per-device SRQ that exists today),
      allocate Shared Receive Queues (SRQs) per completion vector and
      associate each created QP/CQ with the appropriate SRQ according to
      the queue index. This association reduces lock contention in the
      fast path (compared with the per-device SRQ solution) and improves
      locality in memory buffers. Add a new module parameter for the SRQ
      size so it can be tuned to the expected load; users should keep the
      size >= 256 to avoid running out of resources. Also lower the debug
      level of the "last WQE reached" event that is raised when a QP
      using an SRQ is destroyed, to reduce log noise. A sketch of the
      per-vector SRQ setup follows this entry.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
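
      A hedged sketch of the per-vector SRQ allocation described above,
      using the kernel's ib_create_srq()/ib_destroy_srq() verbs API. The
      nvmet_rdma_device fields (srqs, srq_count, pd, device) and the
      srq_size parameter name are assumptions taken from the commit
      message, not the exact upstream code:

          #include <linux/module.h>
          #include <linux/slab.h>
          #include <rdma/ib_verbs.h>

          static int srq_size = 1024;
          module_param(srq_size, int, 0444);
          MODULE_PARM_DESC(srq_size, "RDMA shared receive queue size, >= 256 recommended");

          /* Allocate one SRQ per completion vector of the RDMA device. */
          static int nvmet_rdma_init_srqs(struct nvmet_rdma_device *ndev)
          {
              struct ib_srq_init_attr attr = { };
              int i, nr = ndev->device->num_comp_vectors;

              ndev->srqs = kcalloc(nr, sizeof(*ndev->srqs), GFP_KERNEL);
              if (!ndev->srqs)
                  return -ENOMEM;

              attr.attr.max_wr = srq_size;
              attr.attr.max_sge = 2;
              attr.srq_type = IB_SRQT_BASIC;

              for (i = 0; i < nr; i++) {
                  ndev->srqs[i] = ib_create_srq(ndev->pd, &attr);
                  if (IS_ERR(ndev->srqs[i]))
                      goto out_destroy;
              }
              ndev->srq_count = nr;
              return 0;

          out_destroy:
              while (--i >= 0)
                  ib_destroy_srq(ndev->srqs[i]);
              kfree(ndev->srqs);
              return -ENOMEM;
          }

          /* A queue's QP then picks its SRQ by completion-vector index, e.g.
           * qp_attr.srq = ndev->srqs[queue->comp_vector % ndev->srq_count]; */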
  10. 08 April 2020, 1 commit
  11. 04 April 2020, 1 commit
  12. 26 March 2020, 2 commits
  13. 17 March 2020, 1 commit
  14. 05 November 2019, 2 commits
  15. 25 April 2019, 2 commits
  16. 20 February 2019, 1 commit
  17. 24 January 2019, 1 commit
  18. 13 December 2018, 1 commit
  19. 08 December 2018, 1 commit
  20. 07 December 2018, 1 commit
  21. 09 November 2018, 1 commit
  22. 18 October 2018, 2 commits
    • nvmet: Optionally use PCI P2P memory · c6925093
      Authored by Logan Gunthorpe
      Create a configfs attribute in each nvme-fabrics namespace to enable P2P
      memory use.  The attribute may be enabled (with a boolean) or a specific
      P2P device may be given (with the device's PCI name).
      
      When enabled, the namespace will ensure the underlying block device
      supports P2P and is compatible with any specified P2P device.  If no device
      was specified it will ensure there is compatible P2P memory somewhere in
      the system.  Enabling a namespace with P2P memory will fail with EINVAL
      (and an appropriate dmesg error) if any of these conditions are not met.
      
      Once a controller is set up on a specific port, the P2P device to use for
      each namespace will be found and stored in a radix tree by namespace ID.
      When memory is allocated for a request, the tree is used to look up the P2P
      device to allocate memory against.  If no device is in the tree (because no
      appropriate device was found), or if allocation of P2P memory fails, the
      request falls back to regular memory (see the sketch after this entry).
      Signed-off-by: Stephen Bates <sbates@raithlin.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      [hch: partial rewrite of the initial code]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
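
      A hedged sketch of the allocation-time lookup described above:
      consult the controller's radix tree (keyed by namespace ID) for a
      P2P device and fall back to regular memory when none is found or
      the P2P allocation fails. It uses radix_tree_lookup(),
      pci_p2pmem_alloc_sgl() and sgl_alloc(); the nvmet_req/nvmet_ctrl
      field names (p2p_ns_map, p2p_dev, sg, sg_cnt, transfer_len) are
      assumptions based on the commit message:

          #include <linux/pci-p2pdma.h>
          #include <linux/radix-tree.h>
          #include <linux/scatterlist.h>

          static int nvmet_req_alloc_sgl_sketch(struct nvmet_req *req)
          {
              struct pci_dev *p2p_dev = NULL;
              unsigned int nents;

              if (req->sq->ctrl && req->ns)
                  p2p_dev = radix_tree_lookup(&req->sq->ctrl->p2p_ns_map,
                                              req->ns->nsid);

              if (p2p_dev) {
                  /* Try P2P memory first if a compatible device was recorded. */
                  req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &nents,
                                                 req->transfer_len);
                  if (req->sg) {
                      req->p2p_dev = p2p_dev;
                      req->sg_cnt = nents;
                      return 0;
                  }
                  /* otherwise fall through to regular memory */
              }

              req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &nents);
              if (!req->sg)
                  return -ENOMEM;
              req->sg_cnt = nents;
              return 0;
          }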
    • nvmet: Introduce helper functions to allocate and free request SGLs · 5b2322e4
      Authored by Logan Gunthorpe
      Add helpers to allocate and free the SGL in a struct nvmet_req:
      
        int nvmet_req_alloc_sgl(struct nvmet_req *req)
        void nvmet_req_free_sgl(struct nvmet_req *req)
      
      These will be expanded in a future patch to implement peer-to-peer memory
      DMAs and should be common to all target drivers.
      
      The new helpers are used in nvmet-rdma.  Because req.transfer_len is used
      as the length of the SGL, it is now set earlier and cleared on any error.
      It also appears unnecessary to accumulate the length, as the map_sgl
      functions should only ever be called once per request.  A sketch of the
      helpers follows this entry.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Sagi Grimberg <sagi@grimberg.me>
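
      A minimal sketch of the two helpers named above, built on the
      kernel's sgl_alloc()/sgl_free() from lib/scatterlist.c. It assumes
      req->transfer_len is already set, as the commit message describes,
      and that sg/sg_cnt live in struct nvmet_req:

          #include <linux/scatterlist.h>

          int nvmet_req_alloc_sgl(struct nvmet_req *req)
          {
              unsigned int nents;

              /* transfer_len was set earlier; use it as the SGL length. */
              req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &nents);
              if (!req->sg)
                  return -ENOMEM;
              req->sg_cnt = nents;
              return 0;
          }

          void nvmet_req_free_sgl(struct nvmet_req *req)
          {
              sgl_free(req->sg);
              req->sg = NULL;
              req->sg_cnt = 0;
          }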
  23. 17 October 2018, 1 commit
  24. 05 October 2018, 1 commit
    • nvmet-rdma: use a private workqueue for delete · 2acf70ad
      Authored by Sagi Grimberg
      Queue deletion is done asynchronously when the last reference on
      the queue is dropped.  Thus, to make sure we don't over-allocate
      under a connect/disconnect storm, we let queue deletion complete
      before making forward progress.
      
      However, because we flush the system_wq from rdma_cm context, which
      itself runs from a workqueue context, we can get a circular locking
      complaint [1].  Fix that by using a private workqueue for queue
      deletion (see the sketch after this entry).
      
      [1]:
      ======================================================
      WARNING: possible circular locking dependency detected
      4.19.0-rc4-dbg+ #3 Not tainted
      ------------------------------------------------------
      kworker/5:0/39 is trying to acquire lock:
      00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]
      
      but task is already holding lock:
      00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 ((work_completion)(&queue->release_work)){+.+.}:
             process_one_work+0x474/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #2 ((wq_completion)"events"){+.+.}:
             flush_workqueue+0xf3/0x970
             nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
             cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
      nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      nvme nvme0: Reconnecting in 10 seconds...
      
      -> #1 (&id_priv->handler_mutex/1){+.+.}:
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 (&id_priv->handler_mutex){+.+.}:
             lock_acquire+0xc5/0x200
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             rdma_destroy_id+0x6f/0x440 [rdma_cm]
             nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      Fixes: 777dc823 ("nvmet-rdma: occasionally flush ongoing controller teardown")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
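
      A hedged sketch of the private-workqueue approach: allocate a
      dedicated workqueue at module init and queue (and later flush) the
      release work on it instead of system_wq, so the flush done from the
      rdma_cm handler no longer depends on the global workqueue. The
      workqueue name and flags here are assumptions:

          #include <linux/workqueue.h>

          static struct workqueue_struct *nvmet_rdma_delete_wq;

          static int __init nvmet_rdma_init_sketch(void)
          {
              nvmet_rdma_delete_wq = alloc_workqueue("nvmet-rdma-delete-wq",
                                                     WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
              if (!nvmet_rdma_delete_wq)
                  return -ENOMEM;
              return 0;
          }

          static void __exit nvmet_rdma_exit_sketch(void)
          {
              destroy_workqueue(nvmet_rdma_delete_wq);
          }

          /* Queue release work is now scheduled on, and flushed from, the
           * private workqueue rather than system_wq:
           *     queue_work(nvmet_rdma_delete_wq, &queue->release_work);
           *     flush_workqueue(nvmet_rdma_delete_wq);
           */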
  25. 06 September 2018, 1 commit
    • nvmet-rdma: fix possible bogus dereference under heavy load · 8407879c
      Authored by Sagi Grimberg
      Currently we always repost the recv buffer before we send a
      response capsule back to the host. Since ordering is not guaranteed
      for send and recv completions, it is possible that we will receive
      a new request from the host before we get a send completion for the
      response capsule.
      
      Today we pre-allocate twice the queue depth of rsps, but in
      reality, under heavy load nothing really prevents that gap from
      growing until we exhaust all our rsps.
      
      To fix this, if we don't have any pre-allocated rsps left, we
      dynamically allocate a rsp and make sure to free it when we are
      done (see the sketch after this entry). If, under memory pressure,
      we fail to allocate a rsp, we silently drop the command and wait
      for the host to retry.
      Reported-by: Steve Wise <swise@opengridcomputing.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: dropped a superfluous assignment]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
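
      A hedged sketch of the get/put path described above: take a
      response from the pre-allocated free list when one is available,
      otherwise allocate one on the fly and flag it so the put side frees
      it instead of returning it to the list. The locking and field names
      (rsps_lock, free_rsps, free_list, allocated) follow the driver's
      style but are assumptions here:

          #include <linux/list.h>
          #include <linux/slab.h>
          #include <linux/spinlock.h>

          static struct nvmet_rdma_rsp *
          nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
          {
              struct nvmet_rdma_rsp *rsp;
              unsigned long flags;

              spin_lock_irqsave(&queue->rsps_lock, flags);
              rsp = list_first_entry_or_null(&queue->free_rsps,
                                             struct nvmet_rdma_rsp, free_list);
              if (rsp)
                  list_del(&rsp->free_list);
              spin_unlock_irqrestore(&queue->rsps_lock, flags);

              if (!rsp) {
                  /* Free list exhausted under load: allocate a one-off rsp. */
                  rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
                  if (!rsp)
                      return NULL; /* drop the command; the host will retry */
                  rsp->allocated = true;
              }
              return rsp;
          }

          static void nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
          {
              unsigned long flags;

              if (rsp->allocated) {
                  kfree(rsp);
                  return;
              }
              spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
              list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
              spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
          }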
  26. 25 July 2018, 1 commit
  27. 23 July 2018, 3 commits
  28. 19 June 2018, 1 commit
  29. 26 March 2018, 5 commits