提交 · 3639efef8fb128f99ae38bf603189639efc3850d · openeuler / Kernel

01 11月, 2017 13 次提交

由 Keith Busch 提交于 10月 20, 2017

Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3639efef

nvme-fc: add dev_loss_tmo timeout and remoteport resume support · 2b632970

由 James Smart 提交于 10月 25, 2017

When a remoteport is unregistered (connectivity lost), the following
actions are taken:

 - the remoteport is marked DELETED
 - the time when dev_loss_tmo would expire is set in the remoteport
 - all controllers on the remoteport are reset.

After a controller resets, it will stall in a RECONNECTING state waiting
for one of the following:

 - the controller will continue to attempt reconnect per max_retries and
   reconnect_delay.  As no remoteport connectivity, the reconnect attempt
   will immediately fail.  If max reconnects has not been reached, a new
   reconnect_delay timer will be schedule.  If the current time plus
   another reconnect_delay exceeds when dev_loss_tmo expires on the remote
   port, then the reconnect_delay will be shortend to schedule no later
   than when dev_loss_tmo expires.  If max reconnect attempts are reached
   (e.g. ctrl_loss_tmo reached) or dev_loss_tmo ix exceeded without
   connectivity, the controller is deleted.
 - the remoteport is re-registered prior to dev_loss_tmo expiring.
   The resume of the remoteport will immediately attempt to reconnect
   each of its suspended controllers.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
[hch: updated to use nvme_delete_ctrl]
Signed-off-by: NChristoph Hellwig <hch@lst.de>

2b632970

nvme: allow controller RESETTING to RECONNECTING transition · 3cec7f9d

由 James Smart 提交于 10月 25, 2017

Transport will typically transition from LIVE to RESETTING when initially
performing a reset or recovering from an error.  Adding this transition
allows a transport to transition to RECONNECTING when it checks/waits for
connectivity then creates new transport connections and reinits the
controller.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3cec7f9d

nvme-fc: check connectivity before initiating reconnects · 96e24801

由 James Smart 提交于 10月 25, 2017

Check remoteport connectivity before initiating reconnects
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

96e24801

nvme-fc: add a dev_loss_tmo field to the remoteport · ac7fe82b

由 James Smart 提交于 10月 25, 2017

Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.

The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is set.

Add a new routine to the api, nvme_fc_set_remoteport_devloss() routine,
which allows the lldd to dynamically update the value on an existing
remoteport.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ac7fe82b

nvme-fc: change ctlr state assignments during reset/reconnect · 44c6ec77

由 James Smart 提交于 10月 25, 2017

Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.

Specifically:
- the movement of the RESETTING state change and schedule of reset_work
  to core doesn't work wiht nvme_fc_error_recovery setting state to
  RECONNECTING before attempting to reset.  Remove the state change as
  the reset request does it.
- In the rare cases where an error occurs right as we're transitioning
  to LIVE, defer the controller start actions.
- In error handling on teardown of associations while performing initial
  controller creation - avoid quiesce calls on the admin_q.  They are
  unneeded.
- Add the RESETTING->RECONNECTING transition in the reset handler.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

44c6ec77

nvme: flush reset_work before safely continuing with delete operation · 4054637c

由 Sagi Grimberg 提交于 10月 29, 2017

Prevent racing controller reset and delete flows. reset_work must not
ever self-requeue so flushing it suffices.
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

4054637c

nvme-rdma: reuse nvme_delete_ctrl when reconnect attempts expire · 12fa1304

由 Sagi Grimberg 提交于 10月 29, 2017

instead of just queueing delete work
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

12fa1304

nvme: consolidate common code from ->reset_work · 6cd53d14

由 Christoph Hellwig 提交于 10月 29, 2017

No change in behavior except that the FC code cancels two work items a
little later now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

6cd53d14

nvme-rdma: remove nvme_rdma_remove_ctrl · e9bc2587

由 Christoph Hellwig 提交于 10月 29, 2017

It is only used in two places, and some of the work done by it will
be taken into common code soon.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e9bc2587

nvme: move controller deletion to common code · c5017e85

由 Christoph Hellwig 提交于 10月 29, 2017

Move the ->delete_work and the associated helpers to common code instead
of duplicating them in every driver.  This also adds the missing reference
get/put for the loop driver.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

c5017e85

nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl · 29c09648

由 Christoph Hellwig 提交于 10月 29, 2017

No need to have two functions doing the same thing.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

29c09648

nvme-fc: avoid workqueue flush stalls · 71c691fd

由 James Smart 提交于 9月 28, 2017

There's no need to wait for the full nvme_wq, which is now shared,
to flush. flush only the delete_work item.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NSagi Grimberg <sgi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

71c691fd

27 10月, 2017 10 次提交

nvme-fc: remove NVME_FC_MAX_SEGMENTS · ecad0d2c

由 James Smart 提交于 10月 23, 2017

The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.

Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ecad0d2c

nvme-fc: add support for duplicate_connect option · 56d5f4f1

由 James Smart 提交于 10月 20, 2017

Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same host port
and target port for the same host (hostnqn, hostid) to the same
subsystem. Fails the connection request if an existing controller.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

56d5f4f1

nvme-rdma: add support for duplicate_connect option · 36e835f2

由 James Smart 提交于 10月 20, 2017

Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same target
address (traddr), target port (trsvcid), and if specified, host
address (host_traddr). Fails the connection request if there is
an existing controller.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

36e835f2

nvme: add helper to compare options to controller · 991231dc

由 James Smart 提交于 10月 20, 2017

Adds a helper function that compares the host and subsytem
specified in a connect options list vs a controller.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

991231dc

nvme: add duplicate_connect option · 3b338762

由 James Smart 提交于 10月 20, 2017

Add the "duplicate_connect" boolean option (presence means true).
Default is false.

When false, the transport should validate whether a new controller request
is targeted for the same host transport addressing and target transport
addressing as an existing controller. If so, the new controller request
should be rejected.

When true, the callee is explicitly requesting a duplicate controller
connection to be made and the new request should be attempted.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3b338762

nvme: check for a live controller in nvme_dev_open · 999ada28

由 Christoph Hellwig 提交于 10月 18, 2017

This is a much more sensible check than just the admin queue.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@rimbeg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

999ada28

nvme: get rid of nvme_ctrl_list · a6a5149b

由 Christoph Hellwig 提交于 10月 18, 2017

Use the core chrdev code to set up the link between the character device
and the nvme controller.  This allows us to get rid of the global list
of all controllers, and also ensures that we have both a reference to
the controller and the transport module before the open method of the
character device is called.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sgi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>

a6a5149b

nvme: switch controller refcounting to use struct device · d22524a4

由 Christoph Hellwig 提交于 10月 18, 2017

Instead of allocating a separate struct device for the character device
handle embedd it into struct nvme_ctrl and use it for the main controller
refcounting.  This removes double refcounting and gets us an automatic
reference for the character device operations.  We keep ctrl->device as a
pointer for now to avoid chaning printks all over, but in the future we
could look into message printing helpers that take a controller structure
similar to what other subsystems do.

Note the delete_ctrl operation always already has a reference (either
through sysfs due this change, or because every open file on the
/dev/nvme-fabrics node has a refernece) when it is entered now, so we
don't need to do the unless_zero variant there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>

d22524a4

nvme: simplify nvme_open · c6424a90

由 Christoph Hellwig 提交于 10月 18, 2017

Now that we are protected against lookup vs free races for the namespace
by using kref_get_unless_zero we don't need the hack of NULLing out the
disk private data during removal.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

c6424a90

nvme: use kref_get_unless_zero in nvme_find_get_ns · 2dd41228

由 Christoph Hellwig 提交于 10月 18, 2017

For kref_get_unless_zero to protect against lookup vs free races we need
to use it in all places where we aren't guaranteed to already hold a
reference.  There is no such guarantee in nvme_find_get_ns, so switch to
kref_get_unless_zero in this function.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

2dd41228

23 10月, 2017 2 次提交

nvme-rdma: Add debug message when reaches timeout · e62a538d

由 Nitzan Carmi 提交于 10月 22, 2017

Signed-off-by: NNitzan Carmi <nitzanc@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e62a538d

M
nvme-rdma: align nvme_rdma_device structure · f87c89ad
由 Max Gurtovoy 提交于 10月 23, 2017
```
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
```
f87c89ad

20 10月, 2017 4 次提交

nvme-fc: correct io timeout behavior · 134aedc9

由 James Smart 提交于 10月 19, 2017

The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous so it possibly
allowed the blk request to be restarted while the io associated was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out thus they hung out forever.

To correct:
If an io is times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.

If an io is times out while it was marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.

Otherwise, if an io times out, abort the io. If the abort was
unsuccessful (unlikely) give up and return not handled.

If the abort was successful, as the abort process is underway it will
terminate the io, so rather than synchronously waiting, just restart
the io timer.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

134aedc9

nvme-fc: correct io termination handling · 0a02e39f

由 James Smart 提交于 10月 19, 2017

The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

0a02e39f

nvme-pci: add SGL support · a7a7cbe3

由 Chaitanya Kulkarni 提交于 10月 16, 2017

This adds SGL support for NVMe PCIe driver, based on an earlier patch
from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch
refactors the original code and adds new module parameter sgl_threshold
to determine whether to use SGL or PRP for IOs.

The usage of SGLs is controlled by the sgl_threshold module parameter,
which allows to conditionally use SGLs if average request segment
size (avg_seg_size) is greater than sgl_threshold. In the original patch,
the decision of using SGLs was dependent only on the IO size,
with the new approach we consider not only IO size but also the
number of physical segments present in the IO.

We calculate avg_seg_size based on request payload bytes and number
of physical segments present in the request.

For e.g.:-

1. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 8k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.

2. blk_rq_nr_phys_segments = 2 blk_rq_payload_bytes = 64k
avg_seg_size = 32K use sgl if avg_seg_size >= sgl_threshold.

3. blk_rq_nr_phys_segments = 16 blk_rq_payload_bytes = 64k
avg_seg_size = 4K use sgl if avg_seg_size >= sgl_threshold.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a7a7cbe3

nvme: use ida_simple_{get,remove} for the controller instance · 9843f685

由 Christoph Hellwig 提交于 10月 18, 2017

Switch to the ida_simple_* helpers instead of opencoding them.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

9843f685

19 10月, 2017 11 次提交

nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set · 5a22e2bf

由 Israel Rukshin 提交于 10月 18, 2017

Since commit b86dd815
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJames Smart  <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5a22e2bf

nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set · 94f29d4f

由 Israel Rukshin 提交于 10月 18, 2017

Since commit b86dd815
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

94f29d4f

nvme-pci: fix typos in comments · 16772ae6

由 Minwoo Im 提交于 10月 18, 2017

fixed comment typos in adapter_alloc_cq() and adapter_alloc_sq().
'the the' duplications are replaced with 'that the'.
Signed-off-by: NMinwoo Im <dn3108@gmail.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

16772ae6

nvme-rdma: stop controller reset if the controller is deleting · 0ad0bfa2