- 22 Dec 2022, 1 commit
-
-
By Yanjun Zhang

The flush request initialized by blk_kick_flush has a NULL bio, and it may be handled by nvme_end_req during I/O completion. When blktrace is enabled and multipath is active, nvme_trace_bio_complete tries to dereference the NULL bio pointer of the flush request, resulting in the following crash:

[ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a
[ 2517.835213] #PF: supervisor read access in kernel mode
[ 2517.838724] #PF: error_code(0x0000) - not-present page
[ 2517.842222] PGD 7b2d51067 P4D 0
[ 2517.845684] Oops: 0000 [#1] SMP NOPTI
[ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1
[ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022
[ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30
[ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba
[ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286
[ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000
[ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000
[ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000
[ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8
[ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018
[ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000
[ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0
[ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2517.913761] PKRU: 55555554
[ 2517.917558] Call Trace:
[ 2517.921294]  <TASK>
[ 2517.924982]  nvme_complete_rq+0x1c3/0x1e0 [nvme_core]
[ 2517.928715]  nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp]
[ 2517.932442]  nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp]
[ 2517.936137]  ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp]
[ 2517.939830]  tcp_read_sock+0x9c/0x260
[ 2517.943486]  nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp]
[ 2517.947173]  nvme_tcp_io_work+0x64/0x90 [nvme_tcp]
[ 2517.950834]  process_one_work+0x1e8/0x390
[ 2517.954473]  worker_thread+0x53/0x3c0
[ 2517.958069]  ? process_one_work+0x390/0x390
[ 2517.961655]  kthread+0x10c/0x130
[ 2517.965211]  ? set_kthread_struct+0x40/0x40
[ 2517.968760]  ret_from_fork+0x1f/0x30
[ 2517.972285]  </TASK>

To avoid this situation, add a NULL check for req->bio before calling trace_block_bio_complete.

Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
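The fix amounts to a one-line guard in the multipath completion-tracing path. A minimal sketch of what the helper ends up looking like, assuming it keeps its current shape in drivers/nvme/host/nvme.h:

    static inline void nvme_trace_bio_complete(struct request *req)
    {
            struct nvme_ns *ns = req->q->queuedata;

            /* Flush requests carry no bio; bail out before dereferencing it. */
            if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
                    trace_block_bio_complete(ns->head->disk->queue, req->bio);
    }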
-
- 07 Dec 2022, 2 commits
-
-
By Christoph Hellwig

All nvme transports should be using the same flags for their tagsets, with the exception of the blocking flag, which should only be set for transports that can block in ->queue_rq. Add an NVME_F_BLOCKING flag to nvme_ctrl_ops to control the blocking behavior, and lift setting the flags into nvme_alloc_{admin,io}_tag_set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
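A rough sketch of the lifted flag handling inside the shared tag-set setup helper; only NVME_F_BLOCKING comes from the commit text, the placement and surrounding code are assumptions:

    /* In nvme_alloc_io_tag_set(): derive the blk-mq flags from the ctrl ops
     * instead of having every transport pass them in. */
    set->flags = BLK_MQ_F_SHOULD_MERGE;
    if (ctrl->ops->flags & NVME_F_BLOCKING)
            set->flags |= BLK_MQ_F_BLOCKING;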
-
By Christoph Hellwig

Don't look at ctrl->ops, as only RDMA and TCP actually support multiple maps.

Fixes: 6dfba1c0 ("nvme-fc: use the tagset alloc/free helpers")
Fixes: ceee1953 ("nvme-loop: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
- 06 Dec 2022, 3 commits
-
-
By Christoph Hellwig

Many of the callers decide which one to use based on a bool argument, and there is at least some code to be shared, so merge these two. Also move a comment specific to a single call site to that call site.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hector Martin <marcan@marcan.st>
-
By Sagi Grimberg

Our mpath stack device is just a shim that selects a bottom namespace and submits the bio to it without any fancy splitting. This also means that we don't clone the bio or keep any context for it beyond submission. However, it is a real drawback that we don't see the mpath device I/O stats.

Given that the mpath device can't do this without adding some context to the bio, we let the bottom device do it on its behalf (somewhat similar to the approach taken in nvme_trace_bio_complete). When the I/O starts, we account the request for multipath I/O stats using the REQ_NVME_MPATH_IO_STATS nvme_request flag, to avoid problems if queue io stats are disabled in the middle of the request.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
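A rough sketch of the start-side accounting this describes; the helper name, the flag, and the start_time field follow the commit description, while the exact bdev_start_io_acct() call is an assumption about the block-layer helper of that era:

    void nvme_mpath_start_request(struct request *rq)
    {
            struct nvme_ns *ns = rq->q->queuedata;
            struct gendisk *disk = ns->head->disk;

            if (!blk_queue_io_stat(disk->queue) || blk_rq_is_passthrough(rq))
                    return;

            /* Remember that accounting started, so completion still does the
             * matching end-accounting even if io_stat is toggled meanwhile. */
            nvme_req(rq)->flags |= NVME_MPATH_IO_STATS;
            nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0,
                            blk_rq_bytes(rq) >> SECTOR_SHIFT,
                            req_op(rq), jiffies);
    }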
-
By Sagi Grimberg

In preparation for nvme-multipath I/O stats accounting, we want the accounting to happen in a centralized place. Request completion is already centralized, but we also need a common helper for request I/O start.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
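At this point the helper is only a thin wrapper that gives the follow-up patch a single place to hook into; a minimal sketch:

    static inline void nvme_start_request(struct request *rq)
    {
            /* Today just blk_mq_start_request(); the multipath I/O stats
             * accounting hooks in here in the next patch. */
            blk_mq_start_request(rq);
    }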
-
- 18 Nov 2022, 1 commit
-
-
By Christoph Hellwig

Naming the nvme helpers that wrap the block-layer quiesce functionality _start/_stop is rather confusing. Switch to the quiesce naming used by the block layer instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
- 16 Nov 2022, 4 commits
-
-
By Sagi Grimberg

We know exactly how many dhchap contexts we will need; there is no need to hold a list that we must protect with a mutex. Convert to a dynamically allocated array, and let the chap context itself maintain its access state. Make dhchap_auth_mutex protect only the ctrl host_key and ctrl_key in a fine-grained way, so that there is no long-lasting acquisition of the lock and no need to take/release it when flushing authentication works.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
By Sagi Grimberg

Now that the chap context is reset upon completion, this is no longer needed. Also remove nvme_auth_reset, as no callers are left.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
By Sagi Grimberg

We want to guarantee that we have chap buffers available when a controller reconnects under memory pressure. Add a mempool specifically for that.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
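A minimal sketch of such a reserved pool, assuming the usual mempool API; the init-function name, pool depth, and buffer size are illustrative, not the values picked upstream:

    #include <linux/mempool.h>

    static mempool_t *nvme_chap_buf_pool;

    static int __init nvme_auth_init(void)
    {
            /* Pre-reserve a handful of 4k buffers so re-authentication can
             * still make progress under memory pressure. */
            nvme_chap_buf_pool = mempool_create_kmalloc_pool(16, 4096);
            if (!nvme_chap_buf_pool)
                    return -ENOMEM;
            return 0;
    }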
-
By Sagi Grimberg

nvme_auth_generate_key can fail; don't ignore the result during initialization.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
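The shape of the fix is simply to propagate the error instead of dropping it; a sketch, with the exact call site and unwind path assumed rather than quoted:

    ret = nvme_auth_generate_key(ctrl->opts->dhchap_secret, &ctrl->host_key);
    if (ret)
            return ret;

    ret = nvme_auth_generate_key(ctrl->opts->dhchap_ctrl_secret, &ctrl->ctrl_key);
    if (ret) {
            /* Unwind the host key if the controller key cannot be derived. */
            nvme_auth_free_key(ctrl->host_key);
            ctrl->host_key = NULL;
            return ret;
    }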
-
- 15 Nov 2022, 3 commits
-
-
By Christoph Hellwig

Allow the transport driver to override the attribute groups for the control device, so that the PCIe driver doesn't have to manually add a group after device creation and keep track of it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Gerd Bayer <gbayer@linxu.ibm.com>
-
By Christoph Hellwig

Nothing about the TCG Opal support is specific to the PCIe transport, so move it to the core code. For this, nvme_init_ctrl_finish grows a new was_suspended argument that allows the transport driver to tell the OPAL code whether the controller came out of a suspend cycle.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Gerd Bayer <gbayer@linxu.ibm.com>
-
By Christoph Hellwig

While the specification allows devices to either deallocate data or actually write zeroes on any Write Zeroes command, many SSDs only do the sensible thing and deallocate data when the DEAC bit is specified. Set it when it is supported and the caller doesn't explicitly opt out of deallocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
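A sketch of what the Write Zeroes setup path does as a result; the per-namespace feature bit (NVME_NS_DEAC) and command bit (NVME_WZ_DEAC) names follow the commit's intent and should be treated as approximate:

    /* In nvme_setup_write_zeroes(): ask the device to deallocate unless the
     * caller explicitly wants the blocks to stay allocated (REQ_NOUNMAP). */
    if (!(req->cmd_flags & REQ_NOUNMAP) && (ns->features & NVME_NS_DEAC))
            cmnd->write_zeroes.control |= cpu_to_le16(NVME_WZ_DEAC);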
-
- 02 Nov 2022, 2 commits
-
-
By Chao Leng

All controller namespaces share the same tagset, so we can use this interface, which performs the optimal operation for parallel quiesce based on the tagset type (e.g. blocking vs. non-blocking tagsets).

The nvme connect_q should not be quiesced when quiescing the tagset, so set QUEUE_FLAG_SKIP_TAGSET_QUIESCE to skip it when initializing connect_q.

Currently we use NVME_NS_STOPPED to ensure pairing of quiesce and unquiesce. With blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED becomes invalid, so introduce NVME_CTRL_STOPPED to replace it. In addition, we never really quiesce a single namespace, so it is a better choice to move the flag from the ns to the ctrl.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: rebased on top of prep patches]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20221101150050.3510-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
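A sketch of the paired quiesce/unquiesce this results in, with NVME_CTRL_STOPPED keeping the calls balanced; the helper names are assumed to match the description above:

    void nvme_quiesce_io_queues(struct nvme_ctrl *ctrl)
    {
            /* One tagset-wide quiesce per stop/start cycle. */
            if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
                    blk_mq_quiesce_tagset(ctrl->tagset);
    }

    void nvme_unquiesce_io_queues(struct nvme_ctrl *ctrl)
    {
            /* Only undo a quiesce we actually did. */
            if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
                    blk_mq_unquiesce_tagset(ctrl->tagset);
    }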
-
By Christoph Hellwig

nvme_kill_queues does two things: 1) mark the gendisk of all namespaces dead, and 2) unquiesce all I/O queues. These used to be intertwined due to block layer issues, but aren't any more. So move the unquiescing of the I/O queues into the callers, and rename the rest of the function to the now more descriptive nvme_mark_namespaces_dead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20221101150050.3510-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 Sep 2022, 6 commits
-
-
By Christoph Hellwig

Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
By Christoph Hellwig

Add common helpers to allocate and tear down the admin and I/O tag sets, including the special queues allocated with them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
By Sagi Grimberg

When a discovery controller is disconnected, no AENs will arrive to notify the host about discovery log change events. In order to solve this, send a uevent notification when a persistent discovery controller reconnects. We add a new ctrl flag, NVME_CTRL_STARTED_ONCE, that is set on the first start; subsequent starts find it already set and send the event to userspace if the controller is a discovery controller. Upon receiving the event, userspace re-reads the discovery log page and acts on changes as it sees fit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
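A sketch of the check in nvme_start_ctrl(); the uevent helper name and payload string here are assumptions based on the description, not a quote of the merged code:

    /* Only a persistent discovery controller that has already started once
     * (i.e. this is a reconnect) notifies userspace to re-read the log. */
    if (test_and_set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
        nvme_discovery_ctrl(ctrl))
            nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");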
-
By Sagi Grimberg

We expect to grow a few of these flags for various purposes, so make them a proper enumeration.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
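For illustration, the ctrl flags that existed at this point would turn into an enum of bit numbers roughly like the following (member set approximate):

    /* Bit numbers used with test_bit()/set_bit() on ctrl->flags. */
    enum nvme_ctrl_flags {
            NVME_CTRL_FAILFAST_EXPIRED      = 0,
            NVME_CTRL_ADMIN_Q_STOPPED       = 1,
            /* NVME_CTRL_STARTED_ONCE is added by the follow-up patch. */
    };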
-
By Keith Busch

The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that; otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
By Keith Busch

If a reset occurs after the scan work attempts to issue a command, the reset may quiesce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced.

Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after the scan_work flushes, so we're deadlocked.

Fix this by handling the end effects after the request has been freed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 22 Sep 2022, 2 commits
-
-
By Jens Axboe

We need the poll_flags to know how to poll for the I/O, and we should have the batch structure in preparation for supporting batched completions with iopoll.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
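Concretely, this widens the ->uring_cmd_iopoll() file operation; a sketch of the resulting hook, with the prototype reconstructed from the description:

    /* New shape of the member in struct file_operations: callers pass the
     * poll flags and an io_comp_batch for future batched completions. */
    int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd,
                            struct io_comp_batch *iob,
                            unsigned int poll_flags);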
-
By Kanchan Joshi

Store a cookie during submission, and use it to implement completion polling inside the ->uring_cmd_iopoll handler. This handler makes use of the existing bio poll facility.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20220823161443.49436-5-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 Aug 2022, 4 commits
-
-
By Joel Granados

Pass anagrpid as the second argument. This is a prep patch that allows reusing this function to support unknown command sets.

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Hannes Reinecke

Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds two new fabric options: 'dhchap_secret' to specify the pre-shared key (in ASCII representation according to NVMe 2.0 section 8.13.5.8 'Secret representation'), and 'dhchap_ctrl_secret' to specify the pre-shared controller key for bi-directional authentication of both the host and the controller.

Re-authentication can be triggered by writing the PSK into the new controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: fold in clang build fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Chaitanya Kulkarni

The function __nvme_submit_sync_cmd() has the following list of callers, all of which set the timeout value to 0:

    Callers                     | Timeout value
    ----------------------------+--------------
    nvme_submit_sync_cmd()      | 0
    nvme_features()             | 0
    nvme_sec_submit()           | 0
    nvmf_reg_read32()           | 0
    nvmf_reg_read64()           | 0
    nvmf_reg_write32()          | 0
    nvmf_connect_admin_queue()  | 0
    nvmf_connect_io_queue()     | 0

Remove the timeout function parameter from __nvme_submit_sync_cmd() and adjust the rest of the code accordingly.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
By Xiang wangx

Delete the redundant word 'be'.

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 26 Jul 2022, 1 commit
-
-
By Logan Gunthorpe

Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to replace the fixed NVME_F_PCI_P2PDMA flag, so that the dma_map_ops flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
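A sketch of how the PCIe transport might implement the new callback; dma_pci_p2pdma_supported() is the dma-mapping helper from the same series, while the wrapper name here is an assumption:

    static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
    {
            struct nvme_dev *dev = to_nvme_dev(ctrl);

            /* The answer now depends on the DMA mapping in use,
             * not on a fixed per-driver flag. */
            return dma_pci_p2pdma_supported(dev->dev);
    }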
-
- 15 Jul 2022, 1 commit
-
-
By Bart Van Assche

Improve static type checking by using the enum req_op type for variables that represent a request operation, and the new blk_opf_t type for variables that represent request flags.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-38-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
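As an illustration of the kind of change involved (this particular helper is a plausible example rather than a quote of the patch), an operation-returning helper now uses enum req_op instead of a bare integer:

    static inline enum req_op nvme_req_op(struct nvme_command *cmd)
    {
            return nvme_is_write(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
    }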
-
- 06 Jul 2022, 1 commit
-
-
By John Garry

We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function, so it may be dropped.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Jun 2022, 1 commit
-
-
By Ruozhu Li

We encountered a problem where the disconnect command hangs. After analyzing the log and stack, we found that the triggering sequence is as follows:

    CPU0                                 CPU1
                                         nvme_rdma_error_recovery_work
                                           nvme_rdma_teardown_io_queues
    nvme_do_delete_ctrl                      nvme_stop_queues
      nvme_remove_namespaces
        --clear ctrl->namespaces
                                           nvme_start_queues
                                             --no ns in ctrl->namespaces
      nvme_ns_remove                       return (because ctrl is deleting)
        blk_freeze_queue
          blk_mq_freeze_queue_wait
          --wait for ns to unquiesce to clean inflight IO, hang forever

This problem was not found in older kernels because we used to flush the err work in nvme_stop_ctrl before nvme_remove_namespaces. The change does not appear to have been made for functional reasons, so the patch can be reverted to solve the problem.

Revert commit 794a4cb3 ("nvme: remove the .stop_ctrl callout")

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 14 Jun 2022, 1 commit
-
-
By Keith Busch

The recent global id check is finding poorly implemented devices in the wild. Include relevant device information in the output to help speed up an appropriate quirk patch.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 20 May 2022, 1 commit
-
-
By Kanchan Joshi

Add two new opcodes that userspace can use for admin commands:

    NVME_URING_CMD_ADMIN     : non-vectored
    NVME_URING_CMD_ADMIN_VEC : vectored variant

Wire up support when these are issued on the controller node (/dev/nvmeX).

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 May 2022, 1 commit
-
-
By Tom Yan

DMRSL is in the unit of logical blocks, while max_discard_sectors is in the unit of "linux sector" (512 bytes).

Signed-off-by: Tom Yan <tom.ty89@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
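The fix therefore boils down to converting the DMRSL value from logical blocks to 512-byte sectors before using it as the discard limit; a hedged sketch using the driver's existing conversion helper:

    /* ctrl->dmrsl counts logical blocks (1 << ns->lba_shift bytes each);
     * scale it to 512-byte sectors before it becomes max_discard_sectors. */
    if (ctrl->dmrsl)
            max_discard_sectors = nvme_lba_to_sect(ns, ctrl->dmrsl);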
-
- 11 May 2022, 1 commit
-
-
By Kanchan Joshi

Introduce a handler for fops->uring_cmd(), implementing async passthru on the char device (/dev/ngX). The handler supports the newly introduced operation NVME_URING_CMD_IO. It operates on a new structure, nvme_uring_cmd, which is similar to struct nvme_passthru_cmd64 but without the embedded 8-byte result field. That field is not needed, since uring-cmd allows returning an additional result via the big CQE.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-5-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
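For reference, the new UAPI structure looks roughly like the following; the field list is reconstructed from the description of "nvme_passthru_cmd64 minus the result", so treat the details as approximate:

    struct nvme_uring_cmd {
            __u8    opcode;
            __u8    flags;
            __u16   rsvd1;
            __u32   nsid;
            __u32   cdw2;
            __u32   cdw3;
            __u64   metadata;
            __u64   addr;
            __u32   metadata_len;
            __u32   data_len;
            __u32   cdw10;
            __u32   cdw11;
            __u32   cdw12;
            __u32   cdw13;
            __u32   cdw14;
            __u32   cdw15;
            __u32   timeout_ms;
            __u32   rsvd2;
            /* no 'result' member: the result is returned in the big CQE */
    };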
-
- 15 Apr 2022, 1 commit
-
-
By Christoph Hellwig

Add a quirk to disable using and exporting namespace identifiers for controllers where they are broken beyond repair.

The most directly visible problem with non-unique namespace identifiers is that they break the /dev/disk/by-id/ links, with the link for a supposedly unique identifier now pointing to one of multiple possible namespaces that share the same ID, and a somewhat random selection of which one actually shows up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
- 29 Mar 2022, 2 commits
-
-
By Anton Eidelman

nvme_mpath_init_identify(), invoked from nvme_init_identify(), fetches a fresh ANA log from the ctrl. This is essential to have up-to-date path states both for existing namespaces and for those that scan_work may discover once the ctrl is up.

This happens in the following cases:
  1) A new ctrl is being connected.
  2) An existing ctrl is successfully reconnected.
  3) An existing ctrl is being reset.

While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and nvme_read_ana_log() may call nvme_update_ns_ana_state(). This results in a hang when the ANA state of an existing namespace changes and makes the disk live: nvme_mpath_set_live() issues IO to the namespace through the ctrl, which does NOT have IO queues yet. See the sample hang below.

Solution:
  - nvme_update_ns_ana_state() calls set_live only if the ctrl is live.
  - The nvme_read_ana_log() call from nvme_mpath_init_identify() therefore only fetches and parses the ANA log; any errors in this process fail the ctrl setup as appropriate.
  - A separate function, nvme_mpath_update(), is called in nvme_start_ctrl(); it parses the ANA log without fetching it. At this point the ctrl is live, so disks can be set live normally.

Sample failure:

    nvme nvme0: starting error recovery
    nvme nvme0: Reconnecting in 10 seconds...
    block nvme0n6: no usable path - requeuing I/O
    INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
    Tainted: G E 5.14.5-1.el7.elrepo.x86_64 #1
    Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
    Call Trace:
     __schedule+0x2a2/0x7e0
     schedule+0x4e/0xb0
     io_schedule+0x16/0x40
     wait_on_page_bit_common+0x15c/0x3e0
     do_read_cache_page+0x1e0/0x410
     read_cache_page+0x12/0x20
     read_part_sector+0x46/0x100
     read_lba+0x121/0x240
     efi_partition+0x1d2/0x6a0
     bdev_disk_changed.part.0+0x1df/0x430
     bdev_disk_changed+0x18/0x20
     blkdev_get_whole+0x77/0xe0
     blkdev_get_by_dev+0xd2/0x3a0
     __device_add_disk+0x1ed/0x310
     device_add_disk+0x13/0x20
     nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
     nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
     nvme_update_ana_state+0xca/0xe0 [nvme_core]
     nvme_parse_ana_log+0xac/0x170 [nvme_core]
     nvme_read_ana_log+0x7d/0xe0 [nvme_core]
     nvme_mpath_init_identify+0x105/0x150 [nvme_core]
     nvme_init_identify+0x2df/0x4d0 [nvme_core]
     nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
     nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
     nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
     process_one_work+0x1bd/0x360
     worker_thread+0x50/0x3d0

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
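A sketch of the parse-only helper this solution introduces, assuming it reuses the already-fetched ANA log buffer and the existing parse callback:

    void nvme_mpath_update(struct nvme_ctrl *ctrl)
    {
            u32 nr_change_groups = 0;

            if (!ctrl->ana_log_buf)
                    return;

            /* The ctrl is live here, so setting disks live cannot hang. */
            mutex_lock(&ctrl->ana_lock);
            nvme_parse_ana_log(ctrl, &nr_change_groups, nvme_update_ana_state);
            mutex_unlock(&ctrl->ana_lock);
    }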
-
By Sungup Moon

An NVMe subsystem with multiple controllers can have private namespaces that use the same NSID under some conditions:

    "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
    NSIDs shall be unique within the NVM subsystem. If the Namespace
    Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
     a) for shared namespace shall be unique; and
     b) for private namespace are not required to be unique."

    Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.

Make sure this specific setup is supported in Linux.

Fixes: 9ad1927a ("nvme: always search for namespace head")
Signed-off-by: Sungup Moon <sungup.moon@samsung.com>
[hch: refactored and fixed the controller vs subsystem based naming conflict]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
- 16 Mar 2022, 1 commit
-
-
By Christoph Hellwig

Just open code the allocation + initialization in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-