1. 06 Dec 2022, 1 commit
2. 30 Nov 2022, 1 commit
• nvme: fix SRCU protection of nvme_ns_head list · 899d2a05
  Authored by Caleb Sander
      Walking the nvme_ns_head siblings list is protected by the head's srcu
      in nvme_ns_head_submit_bio() but not nvme_mpath_revalidate_paths().
      Removing namespaces from the list also fails to synchronize the srcu.
      Concurrent scan work can therefore cause use-after-frees.
      
      Hold the head's srcu lock in nvme_mpath_revalidate_paths() and
      synchronize with the srcu, not the global RCU, in nvme_ns_remove().
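As a minimal sketch of the shape of the fix (field and helper names follow
the upstream driver; this is not the verbatim patch):

    /* Reader side: walk the siblings list under the head's srcu. */
    void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
    {
            struct nvme_ns_head *head = ns->head;
            sector_t capacity = get_capacity(head->disk);
            int srcu_idx;

            srcu_idx = srcu_read_lock(&head->srcu);
            list_for_each_entry_rcu(ns, &head->list, siblings) {
                    if (capacity != get_capacity(ns->disk))
                            clear_bit(NVME_NS_READY, &ns->flags);
            }
            srcu_read_unlock(&head->srcu, srcu_idx);
    }

    /* Updater side: before freeing a namespace, nvme_ns_remove() must
     * wait for the head's srcu readers, not just global RCU. */
    synchronize_srcu(&ns->head->srcu);      /* was: synchronize_rcu() */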
      
      Observed the following panic when making NVMe/RDMA connections
      with native multipath on the Rocky Linux 8.6 kernel
      (it seems the upstream kernel has the same race condition).
      Disassembly shows the faulting instruction is cmp 0x50(%rdx),%rcx;
      computing capacity != get_capacity(ns->disk).
      Address 0x50 is dereferenced because ns->disk is NULL.
      The NULL disk appears to be the result of concurrent scan work
      freeing the namespace (note the log line in the middle of the panic).
      
      [37314.206036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [37314.206036] nvme0n3: detected capacity change from 0 to 11811160064
      [37314.299753] PGD 0 P4D 0
      [37314.299756] Oops: 0000 [#1] SMP PTI
      [37314.299759] CPU: 29 PID: 322046 Comm: kworker/u98:3 Kdump: loaded Tainted: G        W      X --------- -  - 4.18.0-372.32.1.el8test86.x86_64 #1
      [37314.299762] Hardware name: Dell Inc. PowerEdge R720/0JP31P, BIOS 2.7.0 05/23/2018
      [37314.299763] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [37314.299783] RIP: 0010:nvme_mpath_revalidate_paths+0x26/0xb0 [nvme_core]
      [37314.299790] Code: 1f 44 00 00 66 66 66 66 90 55 53 48 8b 5f 50 48 8b 83 c8 c9 00 00 48 8b 13 48 8b 48 50 48 39 d3 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 50 74 05 f0 80 60 70 ef 48 8b 50 30 48 8d 42 d0 48 39 d3
      [37315.058803] RSP: 0018:ffffabe28f913d10 EFLAGS: 00010202
      [37315.121316] RAX: ffff927a077da800 RBX: ffff92991dd70000 RCX: 0000000001600000
      [37315.206704] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92991b719800
      [37315.292106] RBP: ffff929a6b70c000 R08: 000000010234cd4a R09: c0000000ffff7fff
      [37315.377501] R10: 0000000000000001 R11: ffffabe28f913a30 R12: 0000000000000000
      [37315.462889] R13: ffff92992716600c R14: ffff929964e6e030 R15: ffff92991dd70000
      [37315.548286] FS:  0000000000000000(0000) GS:ffff92b87fb80000(0000) knlGS:0000000000000000
      [37315.645111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [37315.713871] CR2: 0000000000000050 CR3: 0000002208810006 CR4: 00000000000606e0
      [37315.799267] Call Trace:
      [37315.828515]  nvme_update_ns_info+0x1ac/0x250 [nvme_core]
      [37315.892075]  nvme_validate_or_alloc_ns+0x2ff/0xa00 [nvme_core]
      [37315.961871]  ? __blk_mq_free_request+0x6b/0x90
      [37316.015021]  nvme_scan_work+0x151/0x240 [nvme_core]
      [37316.073371]  process_one_work+0x1a7/0x360
      [37316.121318]  ? create_worker+0x1a0/0x1a0
      [37316.168227]  worker_thread+0x30/0x390
      [37316.212024]  ? create_worker+0x1a0/0x1a0
      [37316.258939]  kthread+0x10a/0x120
      [37316.297557]  ? set_kthread_struct+0x50/0x50
      [37316.347590]  ret_from_fork+0x35/0x40
      [37316.390360] Modules linked in: nvme_rdma nvme_tcp(X) nvme_fabrics nvme_core netconsole iscsi_tcp libiscsi_tcp dm_queue_length dm_service_time nf_conntrack_netlink br_netfilter bridge stp llc overlay nft_chain_nat ipt_MASQUERADE nf_nat xt_addrtype xt_CT nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_multiport nft_compat nf_tables libcrc32c nfnetlink dm_multipath tg3 rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm intel_rapl_msr iTCO_wdt iTCO_vendor_support dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel ib_uverbs rapl intel_cstate intel_uncore ib_core ipmi_si joydev mei_me pcspkr ipmi_devintf mei lpc_ich wmi ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 mlx5_core drm_kms_helper syscopyarea
      [37316.390419]  sysfillrect ahci sysimgblt fb_sys_fops libahci drm crc32c_intel libata mlxfw pci_hyperv_intf tls i2c_algo_bit psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core]
      [37317.645908] CR2: 0000000000000050
      
      Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3. 09 Nov 2022, 1 commit
4. 19 Oct 2022, 2 commits
5. 30 Sep 2022, 1 commit
6. 27 Sep 2022, 5 commits
7. 22 Sep 2022, 1 commit
8. 19 Sep 2022, 2 commits
9. 07 Sep 2022, 1 commit
10. 03 Aug 2022, 12 commits
11. 26 Jul 2022, 1 commit
12. 14 Jul 2022, 1 commit
13. 06 Jul 2022, 1 commit
14. 29 Jun 2022, 1 commit
• nvme: fix regression when disconnect a recovering ctrl · f7f70f4a
  Authored by Ruozhu Li
We encountered a problem where the disconnect command hangs.
      After analyzing the log and stack, we found that the triggering
      process is as follows:
      CPU0                          CPU1
                                      nvme_rdma_error_recovery_work
                                        nvme_rdma_teardown_io_queues
      nvme_do_delete_ctrl                 nvme_stop_queues
        nvme_remove_namespaces
        --clear ctrl->namespaces
                                          nvme_start_queues
                                          --no ns in ctrl->namespaces
          nvme_ns_remove                  return(because ctrl is deleting)
            blk_freeze_queue
              blk_mq_freeze_queue_wait
--wait for ns to unquiesce to clean inflight IO, hang forever
      
This problem was not seen on older kernels because they flush the err
work in nvme_stop_ctrl() before nvme_remove_namespaces(). That behavior
does not appear to have been removed for functional reasons, so the
patch can be reverted to solve the problem.
      
      Revert commit 794a4cb3 ("nvme: remove the .stop_ctrl callout")
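
A hedged sketch of what the revert restores (names follow the
pre-794a4cb3 code; not the verbatim patch): nvme_stop_ctrl() again
invokes an optional transport .stop_ctrl hook, which for RDMA flushes
error recovery before nvme_remove_namespaces() runs, so the race shown
above cannot occur:

    /* core: give the transport a chance to quiesce its own work */
    void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
    {
            ...
            if (ctrl->ops->stop_ctrl)
                    ctrl->ops->stop_ctrl(ctrl);
    }

    /* rdma: ensure error recovery is finished before teardown */
    static void nvme_rdma_stop_ctrl(struct nvme_ctrl *nctrl)
    {
            struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);

            cancel_work_sync(&ctrl->err_work);
            cancel_delayed_work_sync(&ctrl->reconnect_work);
    }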
Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
15. 28 Jun 2022, 2 commits
16. 23 Jun 2022, 1 commit
17. 14 Jun 2022, 2 commits
18. 31 May 2022, 1 commit
• nvme: set controller enable bit in a separate write · aa41d2fe
  Authored by Niklas Cassel
The NVM Express Base Specification 2.0 specifies in the description
of the CC (Controller Configuration) register:
"Host software shall set the Arbitration Mechanism Selected (CC.AMS),
the Memory Page Size (CC.MPS), and the I/O Command Set Selected (CC.CSS)
to valid values prior to enabling the controller by setting CC.EN to '1'."
      
While we haven't seen any controller misbehave when all bits are set
in a single write, let's follow the order the spec describes, as there
could be controllers implemented to rely on the configuration bits
being set before the controller is enabled.
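
A minimal sketch of the resulting two-step enable in nvme_enable_ctrl()
(register and bit names from the upstream driver; the CC field setup is
elided):

    /* First write: AMS, MPS, CSS etc., with CC.EN still clear ... */
    ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
    if (ret)
            return ret;

    /* ... second write: only now flip CC.EN to 1. */
    ctrl->ctrl_config |= NVME_CC_ENABLE;
    return ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);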
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
19. 28 May 2022, 1 commit
20. 20 May 2022, 2 commits
• nvme: enable uring-passthrough for admin commands · 58e5bdeb
  Authored by Kanchan Joshi
      Add two new opcodes that userspace can use for admin commands:
NVME_URING_CMD_ADMIN : non-vectored
      NVME_URING_CMD_ADMIN_VEC : vectored variant
      
Wire up support when these are issued on the controller node (/dev/nvmeX).
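
A hedged userspace sketch of issuing an admin Identify Controller
command through the new opcode (assumes liburing with big-SQE support
and the struct nvme_uring_cmd layout from linux/nvme_ioctl.h; error
handling elided, fragment only):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <liburing.h>
    #include <linux/nvme_ioctl.h>

    struct io_uring ring;
    /* NVMe passthrough needs the 128-byte SQE / 32-byte CQE ring. */
    io_uring_queue_init(8, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

    int fd = open("/dev/nvme0", O_RDONLY);      /* controller node */
    void *buf = aligned_alloc(4096, 4096);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    sqe->opcode = IORING_OP_URING_CMD;
    sqe->fd = fd;
    sqe->cmd_op = NVME_URING_CMD_ADMIN;

    struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode = 0x06;                         /* Identify */
    cmd->addr = (__u64)(uintptr_t)buf;
    cmd->data_len = 4096;
    cmd->cdw10 = 0x01;                          /* CNS 01h: controller */

    io_uring_submit(&ring);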
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
• nvme: set non-mdts limits in nvme_scan_work · 78288665
  Authored by Chaitanya Kulkarni
In the current implementation we set the non-mdts limits by calling
nvme_init_non_mdts_limits() from nvme_init_ctrl_finish(). This also
tries to set the limits for the discovery controller, which has no I/O
queues, resulting in the warning reported by nvme_log_error() when
running blktests nvme/002:
      
      [ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47
      [ 2005.192223] loop: module loaded
      [ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
      [ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
      
      <------------------------------SNIP---------------------------------->
      
      [ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997
      [ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998
      [ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999
      [ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn.
[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR
      [ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
      [ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      
Move the call to nvme_init_non_mdts_limits() into nvme_scan_work(),
after we have verified that the I/O queues were created, since that is
the converging point for each transport where these limits are actually
used; a sketch of the moved call follows the call chains below.
      
1. FC:
      nvme_fc_create_association()
       ...
       nvme_fc_create_io_queues(ctrl);
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
2. PCIe:
      nvme_reset_work()
       ...
       nvme_setup_io_queues()
        nvme_create_io_queues()
         nvme_alloc_queue()
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
3. RDMA:
nvme_rdma_setup_ctrl()
 ...
 nvme_rdma_configure_io_queues()
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()
      
4. TCP:
nvme_tcp_setup_ctrl()
 ...
 nvme_tcp_configure_io_queues()
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()
      
nvme_scan_work()
      ...
      nvme_validate_or_alloc_ns()
        nvme_alloc_ns()
         nvme_update_ns_info()
          nvme_update_disk_info()
           nvme_config_discard() <---
           blk_queue_max_write_zeroes_sectors() <---
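
The moved call then sits near the top of nvme_scan_work(), roughly (a
sketch, not the verbatim patch):

    static void nvme_scan_work(struct work_struct *work)
    {
            ...
            /* I/O queues exist by now on every transport (see above). */
            ret = nvme_init_non_mdts_limits(ctrl);
            if (ret < 0) {
                    dev_warn(ctrl->device,
                             "reading non-mdts-limits failed: %d\n", ret);
                    return;
            }
            ...
    }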
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>