- 24 September 2016, 2 commits
-
Committed by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Christoph Hellwig
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning every time this feature is used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
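A minimal sketch of how a ULP opts in after this change, assuming the IB_PD_UNSAFE_GLOBAL_RKEY allocation flag and the pd->unsafe_global_rkey field this series adds to the RDMA core; the wrapper function itself is hypothetical:

```c
#include <rdma/ib_verbs.h>

/* Ask the core for a PD carrying the all-physical-memory rkey instead of
 * calling ib_get_dma_mr() directly; the core creates (and warns about) it. */
static int ulp_alloc_pd(struct ib_device *dev, struct ib_pd **pd, u32 *rkey)
{
	*pd = ib_alloc_pd(dev, IB_PD_UNSAFE_GLOBAL_RKEY);
	if (IS_ERR(*pd))
		return PTR_ERR(*pd);

	*rkey = (*pd)->unsafe_global_rkey;	/* global rkey minted by the core */
	return 0;
}
```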
-
- 23 September 2016, 1 commit
-
Committed by Sagi Grimberg
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed.

Fixes: e89ca58f ("nvme-rdma: add DELETING queue flag")
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
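A sketch of the kind of guard this fix relies on, assuming a per-queue flags word and the NVME_RDMA_Q_CONNECTED bit from the DELETING-flag series further down this log; the function and field layout are illustrative, not the exact driver code:

```c
/* Only the caller that actually clears CONNECTED tears the queue down, so
 * already-freed QPs/cm_ids are never touched a second time. */
static void stop_and_free_queue(struct nvme_rdma_queue *queue)
{
	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
		return;				/* already stopped and freed elsewhere */

	rdma_disconnect(queue->cm_id);		/* stop */
	ib_drain_qp(queue->cm_id->qp);		/* wait for in-flight work to finish */
	rdma_destroy_qp(queue->cm_id);		/* free */
	rdma_destroy_id(queue->cm_id);
}
```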
-
- 15 September 2016, 1 commit
-
Committed by Christoph Hellwig
All drivers use the default, so provide an inline version of it. If we ever need another queue mapping we can add an optional method back, although supporting it will also require major changes to the queue setup code. This provides better code generation, and better debuggability as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
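For context, the default mapping roughly amounts to the inline below; this is a sketch from memory of that era's blk-mq internals, and the exact field layout may differ between kernel versions:

```c
/* Default software-context -> hardware-queue mapping: look the CPU up in the
 * per-queue cpu-to-hw-queue table built at queue/tag-set initialization. */
static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
						     int cpu)
{
	return q->queue_hw_ctx[q->mq_map[cpu]];
}
```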
-
- 13 September 2016, 3 commits
-
Committed by Colin Ian King
If there is an error on req->mr, req->mr is set to NULL; however, the following statement sets req->mr->need_inval, causing a null pointer dereference. Fix this by bailing out to label 'out' to return immediately and hence skip over the offending null pointer dereference.

Fixes: f5b7b559 ("nvme-rdma: Get rid of duplicate variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
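The shape of the bug and the fix, as a hedged sketch; the surrounding function and the error value are stand-ins for the driver's real completion path:

```c
/* req->mr may have been dropped on the error path; the goto skips the
 * dereference that would otherwise oops. */
static void handle_mr_error(struct nvme_rdma_request *req, int ret)
{
	if (unlikely(ret < 0)) {
		req->mr = NULL;			/* the MR has been released */
		goto out;			/* fix: bail before req->mr is used below */
	}
	req->mr->need_inval = true;		/* NULL dereference without the goto */
out:
	return;
}
```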
-
Committed by Steve Wise
Change nvme-rdma to use the IB Client API to detect device removal. This has the wonderful benefit of being able to blow away all the ib/rdma_cm resources for the device being removed. No craziness about not destroying the cm_id handling the event. No deadlocks due to broken iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id around during controller recovery/reconnect to catch device removal events. We don't use the device_add aspect of the ib_client service since we only want to create resources for an IB device if we have a target utilizing that device.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
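A sketch of the IB client registration pattern this refers to; struct ib_client and ib_register_client()/ib_unregister_client() are the real core API, while the callback body is illustrative:

```c
#include <rdma/ib_verbs.h>

/* Called by the IB core when ib_device is being removed: delete every
 * controller that still holds resources on that device. */
static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
{
	/* walk the controller list and schedule deletion (driver-specific) */
}

static struct ib_client nvme_rdma_ib_client = {
	.name	= "nvme_rdma",
	.remove	= nvme_rdma_remove_one,
	/* no .add: resources are only created once a target on the device
	 * is actually connected to */
};

/* at module init:	ib_register_client(&nvme_rdma_ib_client);	*/
/* at module exit:	ib_unregister_client(&nvme_rdma_ib_client);	*/
```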
-
Committed by Sagi Grimberg
When we get a surprise disconnect from the target we queue a periodic reconnect (which is the sane thing to do...). We only move the queues out of CONNECTED when we retry to reconnect (after 10 seconds in the default case), but we stop the blk queues immediately so we are not bothered with traffic from now on. If delete() kicks off in this period the queues are still in CONNECTED state. Part of the delete sequence is trying to issue a ctrl shutdown if the admin queue is CONNECTED (which it is!). This request is issued but gets stuck in blk-mq waiting for the queues to start again, which might be what prevents us from making forward progress. The patch separates the queue flags into CONNECTED and DELETING. Now we move out of CONNECTED as soon as error recovery kicks in (before stopping the queues), and DELETING is set when we start the queue deletion.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
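A sketch of the two flags and the transitions described above; the flag names follow the commit text, while the helpers and the per-queue flags word are assumptions about the driver layout:

```c
enum nvme_rdma_queue_flags {
	NVME_RDMA_Q_CONNECTED	= 0,
	NVME_RDMA_Q_DELETING	= 1,
};

/* error recovery: leave CONNECTED before stopping the blk-mq queues, so a
 * concurrent delete no longer tries to send a shutdown over a dead queue */
static void error_recovery_prep(struct nvme_rdma_queue *queue)
{
	clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags);
}

/* queue deletion: claim DELETING exactly once */
static bool start_queue_delete(struct nvme_rdma_queue *queue)
{
	return !test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags);
}
```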
-
- 04 September 2016, 2 commits
-
Committed by Steve Wise
After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Committed by Steve Wise
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimbrg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 28 August 2016, 2 commits
-
Committed by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
We already have need_inval in ib_mr, so let's use that instead.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 18 August 2016, 2 commits
-
Committed by Jay Freyensee
Per the NVMe-over-Fabrics 1.0 spec, sqsize is represented as a 0-based value. Also per spec, the RDMA binding values shall be set to sqsize, which makes hsqsize a 0-based value as well. Thus, the sqsize during the NVMf connect() is now:

[root@fedora23-fabrics-host1 for-48]# dmesg
[ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31
[ 318.720884] nvme nvme0: creating 16 I/O queues.
[ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127

Finally, the current interpretation implies hrqsize is 1-based, so set it appropriately.

Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Committed by Jay Freyensee
Upon admin queue connect(), the rdma qp was being set based on NVMF_AQ_DEPTH. However, the fabrics layer was using the sqsize field value set for I/O queues for the admin queue, which threw the nvme layer and rdma layer off-whack:

[root@fedora23-fabrics-host1 nvmf]# dmesg
[ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue(): admin sqsize being sent is: 128
[ 3507.798858] nvme nvme0: creating 16 I/O queues.
[ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420

Thus, to have a different admin queue value, we use NVMF_AQ_DEPTH for connect() and the RDMA private data as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec (and in that RDMA private data we treat hrqsize as a 1-based value, per the current understanding of the fabrics spec).

Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
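A hedged sketch of the admin-queue private data being described, using the nvme_rdma_cm_req layout from include/linux/nvme-rdma.h; the exact values the driver ends up sending should be checked against the real code, so treat the assignments below as illustrative:

```c
#include <linux/nvme.h>		/* NVMF_AQ_DEPTH */
#include <linux/nvme-rdma.h>	/* struct nvme_rdma_cm_req, NVME_RDMA_CM_FMT_1_0 */

static void fill_admin_cm_req(struct nvme_rdma_cm_req *priv)
{
	memset(priv, 0, sizeof(*priv));			/* zero rsvd[] too */
	priv->recfmt  = cpu_to_le16(NVME_RDMA_CM_FMT_1_0);
	priv->qid     = cpu_to_le16(0);			/* admin queue */
	priv->hrqsize = cpu_to_le16(NVMF_AQ_DEPTH);	/* treated as 1-based */
	priv->hsqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1);	/* 0-based per spec */
}
```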
-
- 16 August 2016, 1 commit
-
Committed by Colin Ian King
ret is not initialized so it contains garbage. Ensure garbage is not returned by initializing ret to 0.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 04 August 2016, 2 commits
-
Committed by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
-
Committed by Sagi Grimberg
When we reset or reconnect to a controller, we are cancelling the async event handler so we can safely re-establish resources, but we need to remember to start it again when we successfully reconnect.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 03 August 2016, 6 commits
-
Committed by Sagi Grimberg
Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it). Instead, check that the admin queue is connected. Note that this is safe because we can never see a competing thread trying to destroy the admin queue (reset or delete controller).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
If we wait until we free the controller (free_ctrl) we might lose our rdma device without any notification while we still have open resources (tags, MRs and DMA mappings). Instead, destroy the tags with their rdma resources once we delete the device and not when freeing it. Note that we don't do that in nvme_rdma_shutdown_ctrl because controller reset uses it as well and we want to give active I/O a chance to complete successfully.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
nvme_uninit_ctrl already does that for us. Note that we reordered nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl; this is perfectly fine because we actually want ctrl uninit (aen, scan cancellation and namespaces removal) to happen before we shutdown the rdma resources. Also, centralize the deletion work and the dead controller removal work code duplication into __nvme_rdma_shutdown_ctrl, which accepts a shutdown boolean.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
The device removal sequence may have crashed because the controller (and admin queue space) was freed before we destroyed the admin queue resources. Thus we want to destroy the admin queue and only then queue controller deletion and wait for it to complete. More specifically we:
1. own the controller deletion (make sure we are not competing with another deletion).
2. get rid of inflight reconnects, if any exist (which also destroy and create queues).
3. destroy the queue.
4. safely queue controller deletion (and wait for it to complete).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
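A sketch of that ordering on the device-removal path; nvme_change_ctrl_state(), cancel_delayed_work_sync(), queue_work() and flush_work() are real kernel helpers, while the work items, workqueue and teardown helper names are assumptions about the driver:

```c
static void device_unplug(struct nvme_rdma_queue *queue)
{
	struct nvme_rdma_ctrl *ctrl = queue->ctrl;

	/* 1. own the deletion: bail if another path is already deleting */
	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
		return;

	/* 2. kill any in-flight periodic reconnect (it creates/destroys queues) */
	cancel_delayed_work_sync(&ctrl->reconnect_work);

	/* 3. destroy the queue sitting on the dying device
	 *    (see the earlier stop_and_free_queue() guard sketch) */
	stop_and_free_queue(queue);

	/* 4. queue controller deletion and wait for it to complete */
	queue_work(nvme_rdma_wq, &ctrl->delete_work);
	flush_work(&ctrl->delete_work);
}
```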
-
Committed by Sagi Grimberg
On an ordered target shutdown, the target can send an AEN on a namespace removal, which triggers the host to queue an ns-list query. The shutdown also triggers error recovery, which will attempt a periodic reconnect. We can hit a race where the ns rescanning fails (error recovery kicked in and we're not connected), causing removal of all the namespaces, and when we reconnect we won't see any namespaces for this controller. So, queue a namespace rescan after we successfully reconnect to the target.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
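A short fragment sketching the tail of a successful reconnect, assuming the core's nvme_change_ctrl_state()/nvme_queue_scan() helpers of that era; the surrounding reconnect work function is elided:

```c
	bool changed;

	/* reconnect succeeded: mark the controller LIVE again ... */
	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
	WARN_ON_ONCE(!changed);

	/* ... and re-discover namespaces that a rescan racing with error
	 * recovery may have removed */
	nvme_queue_scan(&ctrl->ctrl);
```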
-
Committed by Roland Dreier
Zero out the full nvme_rdma_cm_req structure before sending it. Otherwise we end up leaking kernel memory in the reserved field, which might break forward compatibility in the future.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
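Either of the usual C idioms achieves the zeroing described here; which one the actual patch uses is not shown in this log, so this is just the general pattern:

```c
	struct nvme_rdma_cm_req priv = { };	/* zero-initialize, rsvd[] included */

	/* or, equivalently: */
	memset(&priv, 0, sizeof(priv));
```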
-
- 12 July 2016, 2 commits
-
Committed by Sagi Grimberg
Always use the maximum qp retry count, as the error recovery timeout is dictated by the nvme keep-alive.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
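A sketch of connect parameters with the retry count pinned at the maximum; struct rdma_conn_param and rdma_connect() are the real RDMA/CM API, while the wrapper and private-data arguments are illustrative:

```c
#include <rdma/rdma_cm.h>

static int ulp_connect(struct rdma_cm_id *cm_id, void *priv, u8 priv_len)
{
	struct rdma_conn_param param = { };

	param.retry_count      = 7;	/* 3-bit field, so 7 is the maximum */
	param.rnr_retry_count  = 7;	/* likewise for receiver-not-ready retries */
	param.private_data     = priv;
	param.private_data_len = priv_len;

	/* a dead connection is detected by the NVMe keep-alive rather than by
	 * giving up early on QP retries */
	return rdma_connect(cm_id, &param);
}
```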
-
Committed by Wei Yongjun
PTR_ERR should be applied before its argument is reassigned; otherwise the return value will be set to 0, not the error code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
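The bug class in miniature, with hypothetical names (create_queue, q, ret) standing in for whatever the patched function actually uses:

```c
	q = create_queue();		/* returns ERR_PTR(-Exx) on failure */

	/* broken: the pointer is reassigned before PTR_ERR() reads it,
	 * so PTR_ERR(NULL) == 0 and the error is silently swallowed */
	if (IS_ERR(q)) {
		q = NULL;
		ret = PTR_ERR(q);
		goto out;
	}

	/* fixed: capture the error code first, then clear the pointer */
	if (IS_ERR(q)) {
		ret = PTR_ERR(q);
		q = NULL;
		goto out;
	}
```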
-
- 08 July 2016, 1 commit
-
Committed by Christoph Hellwig
This patch implements the RDMA host (initiator in SCSI speak) driver. It can be used to connect to remote NVMe over Fabrics controllers over InfiniBand, RoCE or iWARP, and uses the existing NVMe core driver as well as the new fabrics library. To connect to all NVMe over Fabrics controllers reachable on a given target port using RDMA/CM, use the following command:

nvme connect-all -t rdma -a $IPADDR

This requires the latest version of nvme-cli with Fabrics support.

Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-