Commits · 7db814465395f3196ee98c8bd40d214d63e4f708 · openeuler / Kernel

23 Oct 2017 · 1 commit

nvme-rdma: fix possible hang when issuing commands during ctrl removal · 7db81446

Committed by Sagi Grimberg on Oct 23, 2017

nvme_rdma_queue_is_ready() fails requests in case a queue is not
LIVE. If the controller is in RECONNECTING state, we might be in
this state for a long time (until we successfully reconnect) and
we are better off failing the request fast. Otherwise, we
fail with BLK_STS_RESOURCE to have the block layer try again
soon.

In case we are removing the controller when the admin queue
is not LIVE, we will terminate the request with BLK_STS_RESOURCE
but it happens before we call blk_mq_start_request() so the
request timeout never expires, and the queue will never get
back to LIVE (because we are removing the controller). This
causes the removal operation to block indefinitely [1].

Thus, if we are removing (state DELETING), and the queue is
not LIVE, we need to fail the request permanently as there is
no chance for it to ever complete successfully.

[1]
--
sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/u66:2   D    0   440      2 0x80000000
Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x221/0x580
 io_schedule_timeout+0x1e/0x50
 wait_for_completion_io_timeout+0x118/0x180
 blk_execute_rq+0x86/0xc0
 __nvme_submit_sync_cmd+0x89/0xf0
 nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
 nvme_shutdown_ctrl+0x41/0xe0
 nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
 nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
 nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
 process_one_work+0x1fd/0x630
 worker_thread+0x1db/0x3b0
 kthread+0x11e/0x150
 ret_from_fork+0x27/0x40
01              D    0  2868   2862 0x00000000
Call Trace:
 __schedule+0x3e9/0xb00
 schedule+0x40/0x90
 schedule_timeout+0x260/0x580
 wait_for_completion+0x108/0x170
 flush_work+0x1e0/0x270
 nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
 nvme_sysfs_delete+0x2a/0x40
 dev_attr_store+0x18/0x30
 sysfs_kf_write+0x45/0x60
 kernfs_fop_write+0x124/0x1c0
 __vfs_write+0x28/0x150
 vfs_write+0xc7/0x1b0
 SyS_write+0x49/0xa0
 entry_SYSCALL_64_fastpath+0x18/0xad
--
Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
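
A minimal userspace C sketch of the decision this commit describes: a request on a queue that is not LIVE is failed permanently when the controller is RECONNECTING or DELETING, and otherwise bounced back for a retry. The enums and `queue_ready_status()` are hypothetical stand-ins, not the driver's definitions, and the real `nvme_rdma_queue_is_ready()` has additional checks that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for controller states and blk_status_t values. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING, CTRL_DELETING };
enum blk_status { STS_OK, STS_RESOURCE, STS_IOERR };

/* Decide what to do with a request that arrives on a queue. */
static enum blk_status queue_ready_status(bool queue_live, enum ctrl_state state)
{
	if (queue_live)
		return STS_OK;

	switch (state) {
	case CTRL_RECONNECTING:
		/* May stay in this state for a long time: fail fast. */
		return STS_IOERR;
	case CTRL_DELETING:
		/* The queue will never be LIVE again: fail permanently,
		 * otherwise a sync command would wait forever. */
		return STS_IOERR;
	default:
		/* Transient condition: let the block layer retry soon. */
		return STS_RESOURCE;
	}
}
```

With the DELETING case in place, the shutdown register write issued during removal completes with an error instead of being requeued indefinitely.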

19 Oct 2017 · 2 commits

nvme-rdma: Fix error status return in tagset allocation failure · f04b9cc8

Committed by Sagi Grimberg on Oct 19, 2017

We should make sure to escalate allocation failures to prevent a
use-after-free in nvmf_create_ctrl.

Fixes: b28a308e ("nvme-rdma: move tagset allocation to a dedicated routine")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: Fix possible double free in reconnect flow · bd9f0759

Committed by Sagi Grimberg on Oct 19, 2017

The fact that we free the async event buffer in
nvme_rdma_destroy_admin_queue can cause us to free it
more than once because this happens in every reconnect
attempt since commit 31fdf184. We rely on the queue
DELETING state flag to avoid this for other resources.

A more complete fix is to not destroy the admin/io queues
unconditionally on every reconnect attempt, but it's a bit
more extensive and will go in the next release.

Fixes: 31fdf184 ("nvme-rdma: reuse configure/destroy_admin_queue")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
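
The DELETING-flag approach mentioned in this message makes teardown idempotent: whichever caller flips the flag first does the freeing, and every later call returns early. Below is a minimal userspace sketch of that pattern, assuming C11 atomics; `struct model_queue` and its helpers are hypothetical, and this is not the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queue owning a buffer that must be freed exactly once. */
struct model_queue {
	atomic_bool deleting; /* stands in for the DELETING state flag */
	void *async_buf;      /* stands in for the async event buffer */
	int frees;            /* number of real frees, for illustration */
};

static void model_queue_init(struct model_queue *q, size_t size)
{
	atomic_init(&q->deleting, false);
	q->async_buf = malloc(size);
	q->frees = 0;
	assert(q->async_buf != NULL);
}

/* Safe to call on every reconnect attempt: atomic_exchange() returns the
 * previous value, so only the first caller sees false and frees. */
static void model_queue_free(struct model_queue *q)
{
	if (atomic_exchange(&q->deleting, true))
		return;

	free(q->async_buf);
	q->async_buf = NULL;
	q->frees++;
}
```

A real reconnect would re-allocate the resources and clear the flag again; the sketch only shows the guard that prevents the second free.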

26 Sep 2017 · 2 commits

nvme-rdma: don't fully stop the controller in error recovery · e4d753d7

Committed by Sagi Grimberg on Sep 21, 2017

Calling nvme_stop_ctrl on an already failed controller waits for the
scan work to complete, which only happens once the identify command
times out (60 seconds). This is unnecessary when we already know that
the controller has failed.

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-rdma: give up reconnect if state change fails · 0a960afd

Committed by Sagi Grimberg on Sep 21, 2017

If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point in moving forward with reconnect.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
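
A simplified sketch of the check described in this commit: the final transition to LIVE doubles as a test for concurrent deletion, so a failed transition means reconnect should stop. The three-state machine, `change_state()` and `finish_reconnect()` are hypothetical simplifications; the real nvme core has more states and transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of controller states. */
enum ctrl_state { ST_LIVE, ST_RECONNECTING, ST_DELETING };

/* Simplified transition table: returns false if the move is not allowed. */
static bool change_state(enum ctrl_state *cur, enum ctrl_state next)
{
	bool ok = false;

	switch (next) {
	case ST_LIVE:
		ok = (*cur == ST_RECONNECTING);
		break;
	case ST_RECONNECTING:
		ok = (*cur == ST_LIVE);
		break;
	case ST_DELETING:
		ok = (*cur != ST_DELETING);
		break;
	}
	if (ok)
		*cur = next;
	return ok;
}

/* Called after the transport reconnected successfully. */
static bool finish_reconnect(enum ctrl_state *cur)
{
	if (!change_state(cur, ST_LIVE))
		return false; /* deletion already started: give up */
	/* ... restart queues, kick requeue lists, etc. ... */
	return true;
}
```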

30 Aug 2017 · 1 commit

nvme-rdma: default MR page size to 4k · b925a2dc

Committed by Max Gurtovoy on Aug 28, 2017

Due to various page sizes in the system (IOMMU/device/kernel), we
set the fabrics controller page size to 4k and block layer boundaries
accordingly. On architectures that use a different kernel page size,
we'll have a mismatch with the MR page size that may cause a mapping error.
Update the MR page size to correspond to the core ctrl settings.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
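
To see why a 64 KiB MR page size can fail where 4 KiB works, here is a simplified model of the fast-registration constraint: interior scatter/gather boundaries must fall on MR page boundaries. Because the block layer only guarantees 4 KiB-aligned segments, a 64 KiB MR page size can reject them. `struct sge` and `mr_can_map()` are hypothetical, and the model ignores the merging of physically contiguous elements that real mapping code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified scatter/gather element: DMA address and length. */
struct sge {
	uint64_t addr;
	uint32_t len;
};

/* Every element except the first must start on an MR page boundary, and
 * every element except the last must end on one. */
static bool mr_can_map(const struct sge *sg, size_t n, uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = sg[i].addr + sg[i].len;

		if (i > 0 && (sg[i].addr & (page_size - 1)))
			return false;
		if (i + 1 < n && (end & (page_size - 1)))
			return false;
	}
	return true;
}
```

Two non-contiguous 4 KiB segments map cleanly with a 4 KiB MR page size but not with a 64 KiB one, since the first segment ends mid-page.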

29 Aug 2017 · 14 commits

nvme-rdma: Use unlikely macro in the fast path · a7b7c7a1

Committed by Max Gurtovoy on Aug 14, 2017

This patch slightly improves performance (mainly for small block sizes).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
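
For reference, the kernel's `likely()`/`unlikely()` macros are thin wrappers around the GCC/Clang builtin `__builtin_expect`. The sketch below copies that definition into userspace and applies it to a hypothetical fast-path helper, `sum_lengths()`; the hint only influences code layout, never the result.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's definitions in include/linux/compiler.h. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical fast-path helper: the error check is marked unlikely so
 * the compiler keeps the common case as straight-line code. */
static int sum_lengths(const unsigned *lens, size_t n, unsigned long *out)
{
	unsigned long total = 0;

	if (unlikely(lens == NULL || out == NULL))
		return -1;
	for (size_t i = 0; i < n; i++)
		total += lens[i];
	*out = total;
	return 0;
}
```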

nvme-rdma: call ops->reg_read64 instead of nvmf_reg_read64 · 09fdc23b

Committed by Sagi Grimberg on Jul 10, 2017

This makes nvme_rdma_configure_admin_queue generic in preparation for
moving it to common code.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: cleanup error path in controller reset · 370ae6e4

Committed by Sagi Grimberg on Jul 10, 2017

No need to queue an extra work to indirect controller removal, just call the
ctrl remove routine.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: introduce nvme_rdma_start_queue · 68e16fcf

Committed by Sagi Grimberg on Jul 10, 2017

This should pair with nvme_rdma_stop_queue. While this is not a complete
inverse, it still pairs up pretty well because in fabrics we don't have a
disconnect capsule (yet); we simply tear down the transport association.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: rename nvme_rdma_init_queue to nvme_rdma_alloc_queue · 41e8cfa1

Committed by Sagi Grimberg on Jul 10, 2017

Give it a name symmetric to nvme_rdma_free_queue. Also pass in the ctrl
sqsize+1 rather than the opts queue_size, and suppress a superfluous
failure message.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: stop queues instead of simply flipping their state · 148b4e7f

Committed by Sagi Grimberg on Jul 10, 2017

If we move the queues out of the LIVE state, we might as well stop them
(drain for rdma). Do it after we stop the request queues to prevent a stray
request from sneaking into .queue_rq after we stop the queue.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: introduce configure/destroy io queues · a57bd541

Committed by Sagi Grimberg on Aug 28, 2017

Make the handling symmetrical with the admin queue.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: reuse configure/destroy_admin_queue · 31fdf184

Committed by Sagi Grimberg on Aug 28, 2017

No need to open-code it.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: don't free tagset on resets · 3f02fffb

Committed by Sagi Grimberg on Jul 10, 2017

We're not supposed to do that.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: disable the controller on resets · 18398af2

Committed by Sagi Grimberg on Jul 10, 2017

Mimic the pci driver as a controller disable might be more lightweight
than a shutdown.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: move tagset allocation to a dedicated routine · b28a308e

Committed by Sagi Grimberg on Jul 10, 2017

We always pair tagset allocation with an rdma device reference, and the two
share some code, so centralize it with an argument saying whether it's an admin or IO tagset.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: Add admin_tagset pointer to nvme_ctrl · 34b6c231

Committed by Sagi Grimberg on Jul 10, 2017

Will be used when we centralize control flows.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: move nvme_rdma_configure_admin_queue code location · 90af3512

Committed by Sagi Grimberg on Jul 10, 2017

We will call it from other places so avoid having to forward declare it.
Also move it next to nvme_rdma_destroy_admin_queue.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: remove NVME_RDMA_MAX_SEGMENT_SIZE · 4897ad4e

Committed by Johannes Thumshirn on Aug 03, 2017

NVME_RDMA_MAX_SEGMENT_SIZE is not used anywhere, zap it.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

18 Aug 2017 · 2 commits

nvme-rdma: remove redundant empty device add callout · 5138e4bd

Committed by Sagi Grimberg on Jul 02, 2017

Now that it's not needed, we can simply not assign it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

blk-mq: Make blk_mq_reinit_tagset() calls easier to read · d352ae20

Committed by Bart Van Assche on Aug 17, 2017

Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
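
A userspace sketch of the calling convention this change adopts: the per-request callback is an explicit argument rather than a member of an ops table, so each call site names the function that will run. `struct tag_set`, `reinit_tagset()` and `bump_generation()` are hypothetical models, not the blk-mq definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a tag set and its requests. */
struct request {
	int tag;
	int generation;
};

struct tag_set {
	struct request *rqs;
	size_t nr;
	void *driver_data;
};

/* Callback type: (driver data, request) -> 0 or a negative error. */
typedef int (*reinit_fn)(void *data, struct request *rq);

/* The callback is a parameter, not looked up in an ops structure. */
static int reinit_tagset(struct tag_set *set, reinit_fn reinit)
{
	if (!reinit)
		return 0;
	for (size_t i = 0; i < set->nr; i++) {
		int ret = reinit(set->driver_data, &set->rqs[i]);

		if (ret)
			return ret; /* stop at the first failure */
	}
	return 0;
}

/* Example callback: bump a per-request generation counter. */
static int bump_generation(void *data, struct request *rq)
{
	(void)data;
	rq->generation++;
	return 0;
}
```

A call such as `reinit_tagset(&set, bump_generation)` reads as a complete statement of what happens to each request.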

09 Aug 2017 · 1 commit

nvme-rdma: use intelligent affinity based queue mappings · 0b36658c

Committed by Sagi Grimberg on Jul 13, 2017

Use the generic block layer affinity mapping helper. Also,
limit nr_hw_queues to the rdma device number of irq vectors
as we don't really need more.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
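
The queue-count limit amounts to taking a minimum. A minimal sketch, assuming the count is also capped by the number of online CPUs in addition to the device's interrupt (completion) vectors; `nr_io_queues()` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Cap the I/O queue count by the requested count, the online CPUs
 * (an assumption of this sketch) and the device's vectors, since per
 * the commit message more queues than vectors are not needed. */
static unsigned int nr_io_queues(unsigned int requested,
				 unsigned int online_cpus,
				 unsigned int dev_vectors)
{
	unsigned int n = requested;

	if (n > online_cpus)
		n = online_cpus;
	if (n > dev_vectors)
		n = dev_vectors;
	return n;
}
```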

06 Jul 2017 · 4 commits

nvme-rdma: unconditionally recycle the request mr · 842594c8

Committed by Sagi Grimberg on Jul 05, 2017

When our RDMA queue-pair is torn down with high load
of I/O traffic, we have no way of knowing if the
memory region was actually registered by the reg_mr
work request as its completion flushes with error (hw
might have done it or not).

So, in order not to deal with all this uncertainty, we
simply recycle the MR in reinit_request.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: split nvme_uninit_ctrl into stop and uninit · d09f2b45

Committed by Sagi Grimberg on Jul 02, 2017

Usually before we teardown the controller we want to:
1. complete/cancel any ctrl inflight works
2. remove ctrl namespaces (only for removal though, resets
   shouldn't remove any namespaces).

but we do not want to destroy the controller device as
we might use it for logging during the teardown stage.

This patch adds nvme_start_ctrl() which queues inflight
controller works (aen, ns scan, queue start and keep-alive
if kato is set) and nvme_stop_ctrl() which cancels the works;
namespace removal is left to the callers to handle.

Move nvme_uninit_ctrl after we are done with the
controller device.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme-rdma: quiesce/unquiesce admin_q instead of start/stop its hw queues · fb051339

Committed by Sagi Grimberg on Jul 02, 2017

Unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues,
quiescing/unquiescing respects the RCU grace period of the submission path.
Also make sure to kick the requeue list when appropriate.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme-rdma: remove race conditions from IB signalling · 5e599d73

Committed by Marta Rybczynska on Jun 06, 2017

This patch improves the way the RDMA IB signalling is done by using atomic
operations for the signalling variable. This avoids race conditions on
sig_count.

The signalling interval changes slightly and is now the largest power of
two not larger than queue depth / 2.

ilog() usage idea by Bart Van Assche.

Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
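
The signalling rule in this commit can be modeled with C11 atomics. Because `limit` is a power of two, `count % limit == 0` reduces to a mask test on the atomically incremented counter, so concurrent senders never race on a plain read-modify-write of `sig_count`. `should_signal()` and `ilog2_u()` are hypothetical userspace helpers that follow the description in the message, not the driver code; treating a depth-1 queue as "signal every send" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* floor(log2(v)) for v > 0, like the kernel's ilog2(). */
static int ilog2_u(unsigned int v)
{
	int r = -1;

	assert(v > 0);
	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Called once per posted send; true means "request a signaled
 * completion for this one". */
static bool should_signal(atomic_int *sig_count, int queue_size)
{
	int half = queue_size / 2;
	/* Largest power of two not larger than queue_size / 2 (at least 1). */
	int limit = 1 << ilog2_u(half > 0 ? (unsigned int)half : 1u);
	/* atomic_fetch_add() returns the old value; +1 gives the new count. */
	int count = atomic_fetch_add(sig_count, 1) + 1;

	return (count & (limit - 1)) == 0;
}
```

With a queue depth of 128 the limit is 64, so 256 sends produce 4 signaled completions; with a depth of 100 the limit drops to 32.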

04 Jul 2017 · 1 commit

nvme-rdma: update tagset nr_hw_queues after reconnecting/resetting · 4c8b99f6

Committed by Sagi Grimberg on Jun 29, 2017

We might have more or fewer queues once we reconnect/reset, for
example due to CPUs going online/offline or controller constraints.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

02 Jul 2017 · 2 commits

nvme: move ctrl cap to struct nvme_ctrl · 20d0dfe6

Committed by Sagi Grimberg on Jun 27, 2017

All transports use either a private cache of controller cap or an on-stack
copy, move it to the generic struct nvme_ctrl. In the future it will also
be maintained by the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: move queue_count to the nvme_ctrl · d858e5f0

Committed by Sagi Grimberg on Apr 24, 2017

All transports use the queue_count in exactly the same way, so move it to
the generic struct nvme_ctrl. In the future it will also be maintained by
the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

28 Jun 2017 · 2 commits

nvme: read the subsystem NQN from Identify Controller · 180de007

Committed by Christoph Hellwig on Jun 26, 2017

NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures.  Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it.  For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: use a single NVME_AQ_DEPTH and relax it to 32 · 7aa1f427

Committed by Sagi Grimberg on Jun 18, 2017

No need to differentiate fabrics from pci/loop, also lower
it to 32 as we don't really need 256 inflight admin commands.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 Jun 2017 · 8 commits

nvme: move reset workqueue handling to common code · d86c4d8e

Committed by Christoph Hellwig on Jun 15, 2017

This moves the nvme_reset function from the PCIe driver to common code,
renaming it to nvme_reset_ctrl in the process. Additionally a new
helper nvme_reset_ctrl_sync is added for the case where we want to
wait for the reset. To facilitate that the reset_work work structure is
moved to the common nvme_ctrl structure and the ->reset_ctrl method is
removed. For now the drivers initialize the reset_work with their own
callback, but longer term we should move to callouts for specific
parts of the reset process and move even more code to the core.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme-rdma: merge init_request and exit_request methods · 385475ee

Committed by Christoph Hellwig on Jun 13, 2017

Now that we get the tagset passed we can have a single implementation for
the I/O and admin queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: fix error code in nvme_rdma_create_ctrl() · bb472baa

Committed by Dan Carpenter on Jun 14, 2017

We accidentally return ERR_PTR(0) which is NULL.  The caller isn't
explicitly checking for that but I couldn't immediately spot whether
this would lead to a NULL dereference. Anyway, we can add an
error code easily enough.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
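
The bug is easy to reproduce with userspace copies of the kernel's error-pointer helpers: `ERR_PTR(0)` is simply `NULL`, and `IS_ERR(NULL)` is false, so a caller that only checks `IS_ERR()` treats the failure as success. `MAX_ERRNO` and the three helpers mirror the kernel's `include/linux/err.h`; `create_ctrl()` and `struct ctrl` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Error codes -1..-MAX_ERRNO are encoded as pointers in the last page
 * of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct ctrl { int id; };
static struct ctrl the_ctrl = { 1 };

/* Hypothetical constructor. With set_error == false it reproduces the
 * bug: the failure path returns ERR_PTR(ret) while ret is still 0. */
static struct ctrl *create_ctrl(bool fail, bool set_error)
{
	long ret = 0;

	if (fail) {
		if (set_error)
			ret = -ENOMEM; /* the fix: always set an error code */
		goto out_err;
	}
	return &the_ctrl;

out_err:
	return ERR_PTR(ret);
}
```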

nvme: move nr_reconnects to nvme_ctrl · fdf9dfa8

Committed by Sagi Grimberg on May 04, 2017

It is not a user option but rather a variable controller
attribute.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: Move transports to use nvme-core workqueue · 9a6327d2

Committed by Sagi Grimberg on Jun 07, 2017

Instead of each transport using its own workqueue, export
a single nvme-core workqueue and use that instead.

In the future, this will help us move towards some unification
of controller setup/teardown flows.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: Get rid of CONNECTED state · b282a88d

Committed by Sagi Grimberg on May 04, 2017

We only care about whether the queue is LIVE for request submission,
so no need for CONNECTED.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-rdma: rework rdma connection establishment error path · abf87d5e

Committed by Sagi Grimberg on May 04, 2017

Instead of introducing a flag to track whether the queue is allocated,
simply free the rdma resources when we get the error.

We allocate the queue rdma resources when we have an address
resolution; there we allocate (or take a reference on) our device,
so we should free them when we get an error after the address resolution,
namely:
1. route resolution error
2. connect reject
3. connect error
4. peer unreachable error

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
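
A simplified userspace model of the error handling described in this commit: resources are created when the address resolves, and each of the four listed post-resolution errors frees them immediately, so no separate "allocated" flag is needed. The event enum and the `cm_queue` helpers are hypothetical stand-ins for the RDMA CM events and the driver's routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of the connection-manager events involved. */
enum cm_event {
	EV_ADDR_RESOLVED,
	EV_ADDR_ERROR,
	EV_ROUTE_ERROR,
	EV_REJECTED,
	EV_CONNECT_ERROR,
	EV_UNREACHABLE,
	EV_ESTABLISHED,
};

struct cm_queue {
	bool ib_allocated;
	int device_refs;
};

/* Address resolved: allocate queue resources and take a device ref. */
static void create_queue_ib(struct cm_queue *q)
{
	q->ib_allocated = true;
	q->device_refs++;
}

static void destroy_queue_ib(struct cm_queue *q)
{
	assert(q->ib_allocated);
	q->ib_allocated = false;
	q->device_refs--;
}

/* Returns 0 on progress, -1 on error. */
static int handle_event(struct cm_queue *q, enum cm_event ev)
{
	switch (ev) {
	case EV_ADDR_RESOLVED:
		create_queue_ib(q);
		return 0;
	case EV_ROUTE_ERROR:   /* 1. route resolution error */
	case EV_REJECTED:      /* 2. connect reject */
	case EV_CONNECT_ERROR: /* 3. connect error */
	case EV_UNREACHABLE:   /* 4. peer unreachable */
		destroy_queue_ib(q);
		return -1;
	case EV_ADDR_ERROR:    /* nothing allocated yet */
		return -1;
	default:
		return 0;
	}
}
```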

nvme-rdma: make nvme_rdma_[create|destroy]_queue_ib symmetrical · ca6e95bb

Committed by Sagi Grimberg on May 04, 2017

We put the reference on the device in the destroy routine
so we should lookup and take the reference in the create
routine.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
