Commits · 842594c8775b585c58459e044708c0335b6aa6b7 · openanolis / cloud-kernel

06 Jul, 2017 4 commits

nvme-rdma: unconditionally recycle the request mr · 842594c8

Committed by Sagi Grimberg on Jul 05, 2017

When our RDMA queue-pair is torn down under a high load
of I/O traffic, we have no way of knowing whether the
memory region was actually registered by the reg_mr
work request, as its completion is flushed with an error
(the hw might have done it or not).

So in order not to deal with all this uncertainty, we
simply recycle the MR in reinit_request.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

842594c8

nvme: split nvme_uninit_ctrl into stop and uninit · d09f2b45

Committed by Sagi Grimberg on Jul 02, 2017

Usually before we tear down the controller we want to:
1. complete/cancel any ctrl inflight works
2. remove ctrl namespaces (only for removal though, resets
   shouldn't remove any namespaces).

But we do not want to destroy the controller device, as
we might use it for logging during the teardown stage.

This patch adds nvme_start_ctrl(), which queues inflight
controller works (aen, ns scan, queue start and keep-alive
if kato is set), and nvme_stop_ctrl(), which cancels those works.
Namespace removal is left to the callers to handle.

Move nvme_uninit_ctrl after we are done with the
controller device.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

d09f2b45

nvme-rdma: quiesce/unquiesce admin_q instead of start/stop its hw queues · fb051339

Committed by Sagi Grimberg on Jul 02, 2017

Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues,
quiescing/unquiescing respects the submission path RCU grace period.
Also make sure to kick the requeue list when appropriate.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

fb051339

nvme-rdma: remove race conditions from IB signalling · 5e599d73

Committed by Marta Rybczynska on Jun 06, 2017

This patch improves the way the RDMA IB signalling is done by using atomic
operations for the signalling variable. This avoids race conditions on
sig_count.

The signalling interval changes slightly and is now the largest power of
two not larger than queue depth / 2.

ilog2() usage idea by Bart Van Assche.
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org

5e599d73
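
The interval rule described in this commit message can be illustrated with a small userspace sketch. It is not the driver's code: `sketch_queue`, `sketch_ilog2()`, `sketch_signal_interval()` and `sketch_should_signal()` are hypothetical names, C11 atomics stand in for the kernel's atomic helpers, and clamping the interval to a minimum of 1 is an assumption made here for tiny queues.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's ilog2(); valid for v >= 1. */
static int sketch_ilog2(unsigned int v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/* Hypothetical queue state; only the fields this sketch needs. */
struct sketch_queue {
	int queue_size;
	atomic_int sig_count;
};

/* Largest power of two not larger than queue_size / 2 (assumed minimum 1). */
static int sketch_signal_interval(int queue_size)
{
	int half = queue_size / 2;

	return 1 << sketch_ilog2(half > 1 ? (unsigned int)half : 1u);
}

/* True when this send should ask for a signalled completion. */
static bool sketch_should_signal(struct sketch_queue *q)
{
	int limit = sketch_signal_interval(q->queue_size);
	/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
	int count = atomic_fetch_add(&q->sig_count, 1) + 1;

	/* limit is a power of two, so the mask test fires every 'limit' sends. */
	return (count & (limit - 1)) == 0;
}
```

Because the counter is bumped with a single atomic read-modify-write, two submitters can never observe the same count, which is the race the message refers to.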

04 Jul, 2017 1 commit

nvme-rdma: update tagset nr_hw_queues after reconnecting/resetting · 4c8b99f6

Committed by Sagi Grimberg on Jun 29, 2017

We might have more or fewer queues once we reconnect/reset, for
example due to CPUs going online/offline or controller constraints.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

4c8b99f6

02 Jul, 2017 2 commits

nvme: move ctrl cap to struct nvme_ctrl · 20d0dfe6

Committed by Sagi Grimberg on Jun 27, 2017

All transports use either a private cache of the controller cap or an
on-stack copy; move it to the generic struct nvme_ctrl. In the future it
will also be maintained by the core.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

20d0dfe6

nvme: move queue_count to the nvme_ctrl · d858e5f0

Committed by Sagi Grimberg on Apr 24, 2017

All transports use the queue_count in exactly the same way, so move it to
the generic struct nvme_ctrl. In the future it will also be maintained by
the core.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

d858e5f0

28 Jun, 2017 2 commits

nvme: read the subsystem NQN from Identify Controller · 180de007

Committed by Christoph Hellwig on Jun 26, 2017

NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures.  Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it.  For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

180de007

nvme: use a single NVME_AQ_DEPTH and relax it to 32 · 7aa1f427

Committed by Sagi Grimberg on Jun 18, 2017

No need to differentiate fabrics from pci/loop. Also lower
it to 32, as we don't really need 256 inflight admin commands.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7aa1f427

15 Jun, 2017 11 commits

nvme: move reset workqueue handling to common code · d86c4d8e

Committed by Christoph Hellwig on Jun 15, 2017

This moves the nvme_reset function from the PCIe driver to common code,
renaming it to nvme_reset_ctrl in the process. Additionally a new
helper nvme_reset_ctrl_sync is added for the case where we want to
wait for the reset. To facilitate that, the reset_work work structure is
moved to the common nvme_ctrl structure and the ->reset_ctrl method is
removed. For now the drivers initialize the reset_work with their own
callback, but longer term we should move to callouts for specific
parts of the reset process and move even more code to the core.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

d86c4d8e

nvme-rdma: merge init_request and exit_request methods · 385475ee

Committed by Christoph Hellwig on Jun 13, 2017

Now that we get the tagset passed we can have a single implementation for
the I/O and admin queues.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

385475ee

nvme-rdma: fix error code in nvme_rdma_create_ctrl() · bb472baa

Committed by Dan Carpenter on Jun 14, 2017

We accidentally return ERR_PTR(0), which is NULL.  The caller isn't
explicitly checking for that, but I couldn't immediately spot whether
this would lead to a NULL dereference.  Anyway, we can add an
error code easily enough.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bb472baa
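
Why ERR_PTR(0) is a problem can be shown with a userspace re-creation of the error-pointer convention. This is only a sketch: the `sketch_*` helpers are hypothetical stand-ins modelled on the kernel's ERR_PTR/PTR_ERR/IS_ERR idea (errors encoded in the top 4095 pointer values), and `sketch_create_ctrl()` is an invented function, not the driver's create routine.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed range of encodable errno values, as in the kernel convention. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno back out of an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Only the top SKETCH_MAX_ERRNO addresses count as errors; NULL does not. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/*
 * Hypothetical create routine that always takes its error path and
 * forwards whatever 'ret' the failing step left behind.
 */
static void *sketch_create_ctrl(int ret)
{
	/* If ret was never set to a negative errno, this returns NULL. */
	return sketch_err_ptr(ret);
}
```

A caller that checks only `sketch_is_err()` would treat the NULL from `sketch_create_ctrl(0)` as success, which is why setting an explicit negative error code before the error path matters.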

nvme: move nr_reconnects to nvme_ctrl · fdf9dfa8

Committed by Sagi Grimberg on May 04, 2017

It is not a user option but rather a variable controller
attribute.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

fdf9dfa8

nvme: Move transports to use nvme-core workqueue · 9a6327d2

Committed by Sagi Grimberg on Jun 07, 2017

Instead of each transport using its own workqueue, export
a single nvme-core workqueue and use that instead.

In the future, this will help us move towards some unification
of controller setup/teardown flows.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

9a6327d2

nvme-rdma: Get rid of CONNECTED state · b282a88d

Committed by Sagi Grimberg on May 04, 2017

We only care whether the queue is LIVE for request submission,
so there is no need for CONNECTED.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

b282a88d

nvme-rdma: rework rdma connection establishment error path · abf87d5e

Committed by Sagi Grimberg on May 04, 2017

Instead of introducing a flag for whether the queue is allocated,
simply free the rdma resources when we get the error.

We allocate the queue rdma resources when we have an address
resolution; there we allocate (or take a reference on) our device,
so we should free it when we hit an error after the address resolution,
namely:
1. route resolution error
2. connect reject
3. connect error
4. peer unreachable error
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

abf87d5e

nvme-rdma: make nvme_rdma_[create|destroy]_queue_ib symmetrical · ca6e95bb

Committed by Sagi Grimberg on May 04, 2017

We put the reference on the device in the destroy routine,
so we should look up and take the reference in the create
routine.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ca6e95bb

nvme-rdma: Don't rearm the CQ when polling directly · c8295d11

Committed by Sagi Grimberg on May 04, 2017

We don't need it, as the core polling context will take
care of rearming the completion queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

c8295d11

nvme-rdma: Make queue flags bit numbers and not shifts · dc5bc6a9

Committed by Sagi Grimberg on May 04, 2017

bitops accept bit numbers.
Reported-by: Vijay Immanuel <vijayi@attalasystems.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

dc5bc6a9
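
The one-line rationale above ("bitops accept bit numbers") is easier to see with a small sketch. The flag names and the non-atomic `sketch_set_bit()` below are hypothetical userspace stand-ins, not the driver's definitions; the point is only what happens when a shifted mask is passed where a bit index is expected.

```c
/* Hypothetical queue flags, defined as bit numbers (indices), not masks. */
enum sketch_queue_flags {
	SKETCH_Q_LIVE     = 0,
	SKETCH_Q_DELETING = 1,
};

/* Non-atomic stand-in for a set_bit()-style helper: 'nr' is a bit index. */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

/* Return the flags word produced by setting bit index 'nr' in an empty word. */
static unsigned long sketch_flags_after_set(unsigned int nr)
{
	unsigned long flags = 0;

	sketch_set_bit(nr, &flags);
	return flags;
}
```

Passing `SKETCH_Q_DELETING` sets bit 1 (word value 2), whereas passing the shifted form `1 << SKETCH_Q_DELETING` is interpreted as bit index 2 and sets a different bit (word value 4). With larger shifts the index can even exceed the width of the word.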

nvme-rdma: get rid of unused ctrl lock · 3dee63c7

Committed by Sagi Grimberg on May 04, 2017

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

3dee63c7

13 Jun, 2017 1 commit

nvme-rdma: fix merge error · a104c9f2

Committed by Christoph Hellwig on Jun 12, 2017

The merge of 4.12-rc5 into the for-4.13/block tree didn't handle the queue
ready case correctly.  Fix this by propagating blk_status_t into
nvme_rdma_queue_is_ready.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

a104c9f2

09 Jun, 2017 1 commit

blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653

Committed by Christoph Hellwig on Jun 03, 2017

Use the same values that are used for request completion errors as the
return value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
a requeue, and all the others are completed as-is.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

fc17b653
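
The dispatch rule in this message can be sketched as a tiny decision function. Everything here is hypothetical: the `sketch_status` and `sketch_action` enums only mimic the idea of a status type with one special "resource" value, and do not reproduce the real blk_status_t values or the blk-mq dispatch code.

```c
/* Hypothetical status codes standing in for a blk_status_t-like type. */
enum sketch_status {
	SKETCH_STS_OK,
	SKETCH_STS_RESOURCE,	/* driver is temporarily out of resources */
	SKETCH_STS_IOERR,
};

/* What the caller of ->queue_rq does with the request afterwards. */
enum sketch_action {
	SKETCH_ACT_DISPATCHED,		/* request accepted by the driver */
	SKETCH_ACT_REQUEUE,		/* try the request again later */
	SKETCH_ACT_COMPLETE_WITH_STATUS	/* finish it, passing the status on */
};

/* Only the "resource" status triggers a requeue; other errors pass through. */
static enum sketch_action sketch_handle_queue_rq(enum sketch_status status)
{
	switch (status) {
	case SKETCH_STS_OK:
		return SKETCH_ACT_DISPATCHED;
	case SKETCH_STS_RESOURCE:
		return SKETCH_ACT_REQUEUE;
	default:
		return SKETCH_ACT_COMPLETE_WITH_STATUS;
	}
}
```

Using one status type for both the return value and the completion error means the non-requeue cases need no translation step before the request is completed.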

07 Jun, 2017 1 commit

nvme-rdma: fast fail incoming requests while we reconnect · e818a5b4

Committed by Sagi Grimberg on Jun 05, 2017

When we encounter a transport/controller error, error recovery
kicks in, which:
1. stops io/admin queues
2. moves transport queues out of LIVE state
3. fast fails pending io
4. schedules periodic reconnects.

But we also need to fast fail incoming IO that enters after we have
already scheduled the reconnect. Given that our queue is not LIVE anymore,
simply restart the request queues so that requests fail in .queue_rq.
Reported-by: Alex Turin <alex@vastdata.com>
Reported-by: shahar.salzman <shahar.salzman@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org

e818a5b4

26 May, 2017 1 commit

nvme: replace is_flags field in nvme_ctrl_ops with a flags field · d3d5b87d

Committed by Christoph Hellwig on May 20, 2017

So that we can have more flags for transport-specific behavior.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

d3d5b87d

23 May, 2017 1 commit

nvme-rdma: support devices with queue size < 32 · 0544f549

Committed by Marta Rybczynska on Apr 10, 2017

In the case of a small NVMe-oF queue size (<32) we may enter a deadlock,
because IB completions are not signalled until 32 sends have been posted,
so the send queue fills up first.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Cc: stable@vger.kernel.org
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

0544f549

02 May, 2017 1 commit

blk-mq: update ->init_request and ->exit_request prototypes · d6296d39

Committed by Christoph Hellwig on May 01, 2017

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq.  Except for a superfluous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow-on cleanup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

d6296d39

21 Apr, 2017 1 commit

nvme: split nvme status from block req->errors · 27fa9bc5

Committed by Christoph Hellwig on Apr 20, 2017

We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.

Just store the status and result field in the nvme_request field from
hardirq completion context (using a new helper) and then generate a
Linux errno for the block layer only when we actually need it.

Because we can't overload the status value with a negative error code
for cancelled commands, we now have a flags field in struct nvme_request
that contains a bit for this condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

27fa9bc5

10 Apr, 2017 1 commit

nvme-rdma: Fix sqsize wrong assignment based on ctrl MQES capability · 1af76dda

Committed by Sagi Grimberg on Apr 06, 2017

Both our sqsize and the controller MQES cap are 0-based values,
so making it 1-based is wrong.
Reported-by: Trapp, Darren <Darren.Trapp@cavium.com>
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

1af76dda
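
A short arithmetic sketch of the 0-based convention may help. It assumes that MQES occupies the low 16 bits of the controller CAP register and that a 0-based value n denotes n + 1 queue entries; the `sketch_*` helper names are hypothetical and this is not the driver's code.

```c
#include <stdint.h>

/* Assumed layout: MQES is the low 16 bits of the CAP register (0-based). */
static uint16_t sketch_cap_mqes(uint64_t cap)
{
	return (uint16_t)(cap & 0xffff);
}

/* Clamp a 0-based sqsize to the 0-based MQES: no +1 on either side. */
static uint16_t sketch_clamp_sqsize(uint16_t sqsize, uint64_t cap)
{
	uint16_t mqes = sketch_cap_mqes(cap);

	return sqsize < mqes ? sqsize : mqes;
}

/* Convert a 0-based size into an actual number of queue entries. */
static unsigned int sketch_queue_entries(uint16_t zero_based)
{
	return (unsigned int)zero_based + 1;
}
```

For example, with MQES = 127 the controller supports 128 entries. Clamping a requested 0-based sqsize of 255 yields 127, whereas comparing against MQES + 1 would have produced 128, one entry more than the controller allows.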

04 Apr, 2017 8 commits

nvme: factor request completion code into a common helper · 77f02a7a

Committed by Christoph Hellwig on Mar 30, 2017

This avoids duplicating the logic four times, and it also allows us to keep
some helpers static in core.c or just open-code them.

Note that this loses printing the aborted status on completions in the
PCI driver, as that uses a data structure that is not available any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

77f02a7a

nvme-rdma: increment request retries counter before requeuing · e806666e

Committed by Sagi Grimberg on Mar 29, 2017

This way our max retry limit holds as well.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e806666e

nvme-rdma: Support ctrl_loss_tmo · fd8563ce

Committed by Sagi Grimberg on Mar 18, 2017

Before scheduling a reconnect attempt, check
nr_reconnects against max_reconnects. If it is not
exhausted (or max_reconnects is -1), schedule
a reconnect attempt; otherwise schedule ctrl
removal.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

fd8563ce
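
The reconnect decision described in this commit can be sketched as a single predicate. This is a minimal illustration, assuming that a max_reconnects value of -1 means "retry indefinitely"; `sketch_should_reconnect()` is a hypothetical name, not a function taken from the driver.

```c
#include <stdbool.h>

/*
 * Decide whether another reconnect attempt should be scheduled.
 * Assumption: max_reconnects == -1 means there is no limit.
 */
static bool sketch_should_reconnect(int nr_reconnects, int max_reconnects)
{
	return max_reconnects == -1 || nr_reconnects < max_reconnects;
}
```

When the predicate returns false, the caller would schedule controller removal instead of another reconnect attempt.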

nvme-rdma: get rid of local reconnect_delay · 7777bded

Committed by Sagi Grimberg on Mar 18, 2017

We already have it in opts.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

7777bded

nvme-rdma: fix module_init (theoretical) error path · a56c79cf

Committed by Sagi Grimberg on Mar 19, 2017

If nvmf_register_transport happened to fail
(it can't, but theoretically) we would leak memory.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

a56c79cf

nvme-rdma: use inet_pton_with_scope helper · 0928f9b4

Committed by Sagi Grimberg on Feb 05, 2017

Both the destination and the host addresses are now
parsed using the inet_pton_with_scope helper. We also
get ipv6 support (including address scopes).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

0928f9b4

nvme-rdma: Give some more grace for rdma connection establishment · 782d820c

Committed by Sagi Grimberg on Mar 21, 2017

The target might be occupied with multiple hosts, so let's
give it some more grace before failing the connection
establishment.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

782d820c

nvme-rdma: handle cpu unplug when re-establishing the controller · dc2ad16a

Committed by Sagi Grimberg on Mar 09, 2017

If a cpu unplug event has occurred, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them, as the blk-mq mapping
won't dispatch to those queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

dc2ad16a
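
The clamping rule in this message amounts to taking a minimum. The sketch below takes the online CPU count as a plain parameter rather than querying it, and `sketch_effective_io_queues()` is a hypothetical helper name used only for illustration.

```c
/*
 * Number of I/O queues to (re)connect: never more than the number of
 * online CPUs, since queues without a CPU mapped to them get no requests.
 */
static unsigned int sketch_effective_io_queues(unsigned int nr_io_queues,
					       unsigned int online_cpus)
{
	return nr_io_queues < online_cpus ? nr_io_queues : online_cpus;
}
```

For instance, a controller originally set up with 8 I/O queues that reconnects after two CPUs went offline on an 8-CPU host would ask for only 6 queues.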

31 Mar, 2017 1 commit

blk-mq: constify struct blk_mq_ops · f363b089

Committed by Eric Biggers on Mar 30, 2017

Constify all instances of blk_mq_ops, as they are never modified.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f363b089

22 Mar, 2017 1 commit

nvme-rdma: handle cpu unplug when re-establishing the controller · c248c643

Committed by Sagi Grimberg on Mar 09, 2017

If a cpu unplug event has occurred, we need to take the minimum
of the provided nr_io_queues and the number of online cpus,
otherwise we won't be able to connect them, as the blk-mq mapping
won't dispatch to those queues.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

c248c643

28 Feb, 2017 1 commit

scripts/spelling.txt: add "embeded" pattern and fix typo instances · b43daedc

Committed by Masahiro Yamada on Feb 27, 2017

Fix typos and add the following to the scripts/spelling.txt:

embeded||embedded

Link: http://lkml.kernel.org/r/1481573103-11329-12-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b43daedc

23 Feb, 2017 1 commit

nvme-rdma: add support for host_traddr · 8f4e8dac

Committed by Max Gurtovoy on Feb 19, 2017

This will enable the user to control the specific interface for
connection establishment in case the host has more than 1 interface
under the same subnet.
E.g.:
Host interfaces configured as:
 - ib0 1.1.1.1/16
 - ib1 1.1.1.2/16

Target interfaces configured as:
 - ib0 1.1.1.3/16 (listener interface)
 - ib1 1.1.1.4/16

The following connect command will go through host iface ib0 (default):
nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023

but the following command will go through host iface ib1:
nvme connect -t rdma -n testsubsystem -a 1.1.1.3 -s 1023 -w 1.1.1.2
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

8f4e8dac

openanolis / cloud-kernel: synced successfully more than 1 year ago