1. 25 Jul, 2018 1 commit
  2. 15 Jun, 2018 1 commit
  3. 11 Jun, 2018 1 commit
  4. 09 Jun, 2018 1 commit
  5. 29 May, 2018 1 commit
  6. 25 May, 2018 1 commit
  7. 12 Apr, 2018 1 commit
    • nvme: expand nvmf_check_if_ready checks · bb06ec31
      James Smart committed
      The nvmf_check_if_ready() checks that were added are very simplistic.
      As such, the routine fails a lot of I/Os during windows of reset or
      re-connection. In cases where no multipath option is present, the
      error goes back to the caller - the filesystem or application. Not
      good.
      
      The common routine was rewritten and calling syntax slightly expanded
      so that per-transport is_ready routines don't need to be present.
      The transports now call the routine directly. The routine is now a
      fabrics routine rather than an inline function.
      
      The routine now looks at controller state to decide the action to
      take. Some states mandate I/O failure; others define the conditions
      under which a command can be accepted. When the decision is unclear,
      a generic queue-or-reject check looks for failfast or multipath I/Os
      and fails the I/O only if it is so marked; otherwise the I/O is
      queued to wait for the controller state to resolve.
      
      Admin commands issued via ioctl share a live admin queue with commands
      from the transport used for controller init, so the ioctls could be
      intermixed with the initialization commands. It is possible for an
      ioctl command to be issued before the controller is enabled. To block
      this, the ioctl admin commands need to be distinguished from the admin
      commands used for controller init. Add a USERCMD
      nvme_req(req)->rq_flags bit to reflect this division and set it on
      ioctl requests. As the nvmf_check_if_ready() routine is called prior
      to nvme_setup_cmd(), ensure that the ioctl path (actually anything in
      core.c) preps nvme_req(req) before starting the I/O. This preserves
      the USERCMD flag during execution and/or retry.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bb06ec31
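      A minimal sketch of the state-driven queue-or-reject decision described
      above. The enum values, struct fields and the exact per-state policy are
      simplified stand-ins for illustration, not the kernel's actual
      nvmf_check_if_ready() implementation:

      #include <stdbool.h>

      /* Simplified controller states and request attributes (illustrative only). */
      enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING,
                        CTRL_DELETING, CTRL_DEAD };
      enum io_verdict { IO_ACCEPT, IO_QUEUE, IO_REJECT };

      struct io_attrs {
          bool connect;   /* fabrics Connect command used for controller init */
          bool user_cmd;  /* analogue of the new USERCMD flag on ioctl admin commands */
          bool failfast;  /* REQ_FAILFAST_* style marking */
          bool multipath; /* request issued through the multipath device */
      };

      static enum io_verdict check_if_ready(enum ctrl_state state, struct io_attrs a)
      {
          switch (state) {
          case CTRL_LIVE:
              return IO_ACCEPT;               /* normal case */
          case CTRL_DELETING:
          case CTRL_DEAD:
              return IO_REJECT;               /* these states mandate failure */
          case CTRL_NEW:
          case CTRL_CONNECTING:
              if (a.connect)
                  return IO_ACCEPT;           /* init commands may proceed */
              if (a.user_cmd)
                  return IO_REJECT;           /* ioctls must wait for a live ctrl */
              /* fall through to the generic queue-or-reject check */
          case CTRL_RESETTING:
          default:
              /* Only failfast/multipath I/O is failed; everything else is
               * queued until the controller state resolves. */
              return (a.failfast || a.multipath) ? IO_REJECT : IO_QUEUE;
          }
      }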
  8. 26 Mar, 2018 4 commits
  9. 22 Feb, 2018 1 commit
  10. 14 Feb, 2018 1 commit
    • nvme-rdma: fix sysfs invoked reset_ctrl error flow · 8000d1fd
      Nitzan Carmi committed
      When a reset_controller invoked via sysfs fails, it enters an error
      flow that practically removes the nvme ctrl entirely (similar to the
      delete_ctrl flow). This causes the system to hang, since a sysfs
      attribute cannot be unregistered by one of its own methods.
      
      Fix this by scheduling delete_ctrl as a work item rather than calling
      it as sequential code. In addition, give the ctrl a chance to recover
      using the reconnection mechanism (consistent with the FC reset_ctrl
      error flow). Also, while we're here, return a suitable errno in case
      the reset ends with a non-live ctrl.
      Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      8000d1fd
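      A rough sketch of the pattern this fix describes: the sysfs store method
      never tears the controller down itself, it only schedules the delete work
      and reports an errno derived from the resulting controller state. The
      do_reset() helper, the delete_scheduled field, and the -ENETRESET choice
      are hypothetical placeholders, not the driver's actual code:

      #include <errno.h>
      #include <stdbool.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

      struct ctrl {
          enum ctrl_state state;
          bool delete_scheduled;   /* stands in for queueing ctrl->delete_work */
      };

      /* Placeholder for the actual reset; returns 0 on success. */
      static int do_reset(struct ctrl *c)
      {
          c->state = CTRL_RESETTING;
          /* ... teardown and re-establish the association ... */
          c->state = CTRL_LIVE;
          return 0;
      }

      static int reset_store(struct ctrl *c)
      {
          int ret = do_reset(c);

          if (ret) {
              /* Defer removal to a worker: a sysfs attribute cannot be
               * unregistered from within one of its own methods. */
              c->delete_scheduled = true;
              return ret;
          }
          /* Report an error if the controller did not come back live;
           * -ENETRESET is just one plausible choice of errno. */
          return c->state == CTRL_LIVE ? 0 : -ENETRESET;
      }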
  11. 09 Feb, 2018 2 commits
  12. 26 Jan, 2018 1 commit
  13. 16 Jan, 2018 1 commit
    • nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Roy Shterman committed
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes.
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:
      
      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock("nvme-wq");
      [  178.503716]   lock("nvme-wq");
      [  178.504601]
                      *** DEADLOCK ***
      
      [  178.505441]  May be due to missing lock nesting notation
      
      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0:  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1:  ((&ctrl->delete_work)){+.+.+.}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b227c59b
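      An illustrative sketch of the rule this commit applies: a work item must
      not flush or cancel another work hosted on its own workqueue, so reset
      and delete are placed on separate queues. The types and helpers below are
      simplified stand-ins, not the kernel's alloc_workqueue()/flush_work():

      #include <assert.h>

      struct workqueue { const char *name; };
      struct work_item { struct workqueue *host; };

      static struct workqueue nvme_reset_wq  = { "nvme-reset-wq"  };  /* hosts reset_work  */
      static struct workqueue nvme_delete_wq = { "nvme-delete-wq" };  /* hosts delete_work */

      /* Flushing a work from a work running on the same workqueue is exactly the
       * recursive "nvme-wq" dependency lockdep reported above. */
      static void flush_work_from(struct workqueue *running_on, struct work_item *w)
      {
          assert(w->host != running_on);   /* must live on a different queue */
          /* ... wait for w to finish ... */
      }

      struct ctrl { struct work_item reset_work, delete_work; };

      /* delete_work flushes reset_work, so the two never share a workqueue. */
      static void delete_ctrl_work(struct ctrl *c)
      {
          flush_work_from(&nvme_delete_wq, &c->reset_work);
          /* ... proceed with controller teardown ... */
      }

      static void ctrl_init_works(struct ctrl *c)
      {
          c->reset_work.host  = &nvme_reset_wq;
          c->delete_work.host = &nvme_delete_wq;
      }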
  14. 08 Jan, 2018 1 commit
  15. 29 Dec, 2017 1 commit
    • nvme-rdma: fix concurrent reset and reconnect · d5bf4b7f
      Sagi Grimberg committed
      The ctrl state machine now allows a transition from RESETTING to
      RECONNECTING. In nvme-rdma, when we receive an rdma_cm DISCONNECTED
      event, we trigger nvme_rdma_error_recovery. This also happens when we
      execute a controller reset, issue a cm disconnect request and receive
      the cm disconnect reply; as a result, the reset work and the error
      recovery work can run concurrently.
      
      Until now the state machine prevented the error recovery work from
      running as a result of a controller reset (RESETTING -> RECONNECTING
      was not allowed).
      
      To fix this, we adopt the FC state machine approach: we always
      transition from LIVE to RESETTING and only then to RECONNECTING. We do
      this for both the error recovery work and the controller reset work:
      
       1. transition to RESETTING
       2. teardown the controller association
       3. transition to RECONNECTING
      
      This restores the protection against the reset work and the error
      recovery work running concurrently.
      
      Fixes: 3cec7f9d ("nvme: allow controller RESETTING to RECONNECTING transition")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      d5bf4b7f
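      A small sketch of the transition ordering described above: whichever of
      the reset work or the error recovery work wins the LIVE -> RESETTING
      transition owns the teardown, and the loser backs off. State names and
      helpers are simplified, not the kernel's nvme_change_ctrl_state():

      #include <stdbool.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING, CTRL_DELETING };

      struct ctrl { enum ctrl_state state; };

      /* Permit only the transitions this scheme relies on. */
      static bool change_state(struct ctrl *c, enum ctrl_state new_state)
      {
          bool ok = false;

          switch (new_state) {
          case CTRL_RESETTING:
              ok = (c->state == CTRL_LIVE);        /* claim the controller first */
              break;
          case CTRL_RECONNECTING:
              ok = (c->state == CTRL_RESETTING);   /* only after the teardown step */
              break;
          default:
              break;
          }
          if (ok)
              c->state = new_state;
          return ok;
      }

      /* Both the reset work and the error recovery work follow the same steps. */
      static void reset_or_recover(struct ctrl *c)
      {
          if (!change_state(c, CTRL_RESETTING))
              return;                              /* the other work already owns it */
          /* ... teardown the controller association ... */
          change_state(c, CTRL_RECONNECTING);
          /* ... re-establish the association ... */
      }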
  16. 29 Nov, 2017 1 commit
    • nvme-rdma: fix memory leak during queue allocation · eb1bd249
      Max Gurtovoy committed
      If the nvme_rdma_wait_for_cm timeout expires before we get an
      established or rejected event from rdma_cm (after rdma_connect
      succeeded), we end up leaking the IB transport resources for the
      dedicated queue. This scenario is easily reproduced by running a
      traffic test while toggling the port.
      Also, to protect against parallel IB queue destruction, which may be
      invoked from different contexts, introduce a new flag that indicates
      transport readiness. While we're here, also protect against receiving
      rdma_cm events during IB queue destruction.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      eb1bd249
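      A sketch of the readiness-flag idea: the IB resources for a queue are
      torn down exactly once, whether the teardown is triggered by the connect
      timeout, a late rdma_cm event, or normal shutdown. The flag and helpers
      are illustrative assumptions; the driver would use an atomic
      test_and_clear_bit() rather than a plain bool:

      #include <stdbool.h>

      struct rdma_queue {
          bool ib_ready;    /* set once the IB transport resources are allocated */
      };

      /* Stand-in for an atomic test_and_clear_bit(). */
      static bool test_and_clear(bool *flag)
      {
          bool was = *flag;
          *flag = false;
          return was;
      }

      static void destroy_ib_queue(struct rdma_queue *q)
      {
          /* Whoever clears the flag does the teardown; a second caller
           * (e.g. a late rdma_cm event) finds it already cleared and returns. */
          if (!test_and_clear(&q->ib_ready))
              return;
          /* ... release the QP, CQ and device reference here ... */
      }

      /* Timeout path: rdma_connect() succeeded but no established/rejected event
       * arrived in time, so the queue still owns IB resources that must be freed. */
      static void wait_for_cm_timed_out(struct rdma_queue *q)
      {
          destroy_ib_queue(q);
      }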
  17. 26 Nov, 2017 5 commits
  18. 20 Nov, 2017 1 commit
  19. 11 Nov, 2017 3 commits
  20. 01 Nov, 2017 4 commits
  21. 27 Oct, 2017 2 commits
    • nvme-rdma: add support for duplicate_connect option · 36e835f2
      James Smart committed
      Adds support for the duplicate_connect option. Unless the option is
      set to true, the connect request checks whether an existing controller
      already matches the same target address (traddr), target port
      (trsvcid) and, if specified, host address (host_traddr), and fails the
      connection request if such a controller exists.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      36e835f2
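      A hypothetical sketch of the duplicate-controller check: unless
      duplicate_connect is set, a connect request whose traddr, trsvcid and
      (if given) host_traddr match an existing controller is refused. The
      option structure and helpers are simplified stand-ins for the fabrics
      option handling, not the driver's actual code:

      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      struct conn_opts {
          const char *traddr;       /* target address        */
          const char *trsvcid;      /* target port           */
          const char *host_traddr;  /* optional host address */
          bool duplicate_connect;
      };

      static bool field_match(const char *a, const char *b)
      {
          if (!a || !b)
              return a == b;            /* both unspecified counts as equal */
          return strcmp(a, b) == 0;
      }

      static bool opts_match(const struct conn_opts *a, const struct conn_opts *b)
      {
          return field_match(a->traddr, b->traddr) &&
                 field_match(a->trsvcid, b->trsvcid) &&
                 field_match(a->host_traddr, b->host_traddr);
      }

      /* Returns true if the new connect request must be rejected. */
      static bool reject_duplicate(const struct conn_opts *new_opts,
                                   const struct conn_opts *existing, size_t count)
      {
          if (new_opts->duplicate_connect)
              return false;             /* duplicates explicitly allowed */
          for (size_t i = 0; i < count; i++)
              if (opts_match(new_opts, &existing[i]))
                  return true;
          return false;
      }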
    • nvme: switch controller refcounting to use struct device · d22524a4
      Christoph Hellwig committed
      Instead of allocating a separate struct device for the character
      device handle, embed it into struct nvme_ctrl and use it for the main
      controller refcounting. This removes the double refcounting and gets
      us an automatic reference for the character device operations. We keep
      ctrl->device as a pointer for now to avoid changing printks all over,
      but in the future we could look into message printing helpers that
      take a controller structure, similar to what other subsystems do.
      
      Note that the delete_ctrl operation now always already holds a
      reference when it is entered (either through sysfs due to this change,
      or because every open file on the /dev/nvme-fabrics node holds a
      reference), so we don't need the unless_zero variant there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      d22524a4
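      A toy sketch of the refcounting scheme described above: the device is
      embedded in the controller and the controller's lifetime is tied to that
      single refcount. The types and helpers here are simplified userspace
      stand-ins for struct device, get_device() and put_device():

      #include <stddef.h>
      #include <stdlib.h>

      struct device_sketch {
          int refcount;
          void (*release)(struct device_sketch *dev);
      };

      struct ctrl_sketch {
          struct device_sketch ctrl_device;   /* embedded: the one controller refcount */
          /* ... remaining controller state ... */
      };

      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      static void ctrl_release(struct device_sketch *dev)
      {
          struct ctrl_sketch *ctrl = container_of(dev, struct ctrl_sketch, ctrl_device);

          free(ctrl);                          /* the final put frees the controller */
      }

      static struct ctrl_sketch *ctrl_alloc(void)
      {
          struct ctrl_sketch *c = calloc(1, sizeof(*c));

          if (c) {
              c->ctrl_device.refcount = 1;     /* reference held by the creator */
              c->ctrl_device.release = ctrl_release;
          }
          return c;
      }

      static void ctrl_get(struct ctrl_sketch *c)
      {
          c->ctrl_device.refcount++;           /* kernel: get_device(ctrl->device) */
      }

      static void ctrl_put(struct ctrl_sketch *c)
      {
          if (--c->ctrl_device.refcount == 0)
              c->ctrl_device.release(&c->ctrl_device);
      }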
  22. 23 Oct, 2017 3 commits
    • e62a538d
    • nvme-rdma: align nvme_rdma_device structure · f87c89ad
      Max Gurtovoy committed
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f87c89ad
    • nvme-rdma: fix possible hang when issuing commands during ctrl removal · 7db81446
      Sagi Grimberg committed
      nvme_rdma_queue_is_ready() fails requests when a queue is not LIVE.
      If the controller is in the RECONNECTING state, we might stay there
      for a long time (until we successfully reconnect), so we are better
      off failing the request fast. Otherwise, we fail with BLK_STS_RESOURCE
      to have the block layer try again soon.
      
      If we are removing the controller while the admin queue is not LIVE,
      we terminate the request with BLK_STS_RESOURCE, but this happens
      before we call blk_mq_start_request(), so the request timeout never
      expires and the queue never gets back to LIVE (because we are removing
      the controller). This causes the removal operation to block
      indefinitely [1].
      
      Thus, if we are removing the controller (state DELETING) and the queue
      is not LIVE, we need to fail the request permanently, as there is no
      chance for it to ever complete successfully.
      
      [1]
      --
      sysrq: SysRq : Show Blocked State
        task                        PC stack   pid father
      kworker/u66:2   D    0   440      2 0x80000000
      Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x221/0x580
       io_schedule_timeout+0x1e/0x50
       wait_for_completion_io_timeout+0x118/0x180
       blk_execute_rq+0x86/0xc0
       __nvme_submit_sync_cmd+0x89/0xf0
       nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
       nvme_shutdown_ctrl+0x41/0xe0
       nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
       nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
       nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
       process_one_work+0x1fd/0x630
       worker_thread+0x1db/0x3b0
       kthread+0x11e/0x150
       ret_from_fork+0x27/0x40
      01              D    0  2868   2862 0x00000000
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x260/0x580
       wait_for_completion+0x108/0x170
       flush_work+0x1e0/0x270
       nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
       nvme_sysfs_delete+0x2a/0x40
       dev_attr_store+0x18/0x30
       sysfs_kf_write+0x45/0x60
       kernfs_fop_write+0x124/0x1c0
       __vfs_write+0x28/0x150
       vfs_write+0xc7/0x1b0
       SyS_write+0x49/0xa0
       entry_SYSCALL_64_fastpath+0x18/0xad
      --
      Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7db81446
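      A condensed sketch of the decision described above, with mocked-up
      status values standing in for blk_status_t; the key point is that a
      non-LIVE queue on a DELETING controller fails the request outright
      instead of asking the block layer to retry:

      #include <stdbool.h>

      enum blk_status_sketch { STS_OK, STS_RESOURCE, STS_IOERR };
      enum ctrl_state { CTRL_LIVE, CTRL_RECONNECTING, CTRL_DELETING };

      struct rdma_queue { bool live; };

      static enum blk_status_sketch queue_is_ready(const struct rdma_queue *q,
                                                   enum ctrl_state state,
                                                   bool is_connect_cmd)
      {
          if (q->live || is_connect_cmd)
              return STS_OK;          /* dispatch normally */
          if (state == CTRL_DELETING)
              /* Fail permanently: the controller never comes back, so
               * requeueing would block removal forever. */
              return STS_IOERR;
          return STS_RESOURCE;        /* transient: let the block layer retry soon */
      }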
  23. 19 Oct, 2017 2 commits