- 25 October 2022, 1 commit
-
-
Submitted by Christoph Hellwig

The fact that blk_mq_destroy_queue also drops a queue reference leads to various places having to grab an extra reference. Move the call to blk_put_queue into the callers to allow removing the extra references.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-2-hch@lst.de
[axboe: fix fabrics_q vs admin_q conflict in nvme core.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
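A minimal sketch of the resulting caller-side pattern (names follow the nvme core, but treat this as an illustration rather than the exact diff):

--
/* teardown in the caller: destroy the queue, then explicitly
 * drop the reference the caller still holds */
blk_mq_destroy_queue(ctrl->admin_q);
blk_put_queue(ctrl->admin_q);
--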
-
- 19 October 2022, 5 commits
-
-
Submitted by Serge Semin

Recent commit 52fde2c0 ("nvme: set dma alignment to dword") has caused a regression on our platform. It turned out that the nvme_get_log() method invocation corrupted the nvme_hwmon_data structure instance; in particular, the nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with garbage.

After some research we discovered that the problem happened even before the actual NVMe DMA execution, during the buffer mapping. Since our platform is DMA-noncoherent, the mapping implies cache-line invalidations or write-backs depending on the DMA direction. When getting the NVMe SMART log, the DMA is performed from device to memory, so cache invalidation is triggered during the buffer mapping. Since the log buffer isn't cache-line aligned, the invalidation discards the neighbouring data, which happens to be the data surrounding the buffer within the nvme_hwmon_data structure.

To fix this we need to make sure that the whole log buffer is defined within a cache-line-aligned memory region, so the cache invalidation doesn't touch the adjacent data. One of the options that guarantees this is to kmalloc the DMA buffer [1]. Seeing that the rest of the NVMe core driver prefers that method, it has been chosen to fix this problem too.

Note that after deeper research we found the denoted commit wasn't the root cause of the problem. It merely exposed the issue by activating the DMA-based NVMe SMART log read performed by the NVMe hwmon driver. The problem has been here since the initial commit of the driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
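A hedged sketch of the fix's shape, assuming the embedded log buffer becomes a separately kmalloc'ed allocation (kmalloc'ed buffers are DMA-safe per the DMA API howto referenced above; field layout is an assumption):

--
struct nvme_hwmon_data {
	struct nvme_ctrl	*ctrl;
	/* was an embedded member; kmalloc'ed so the DMA buffer
	 * owns whole cache lines and invalidation cannot clobber
	 * the surrounding structure fields */
	struct nvme_smart_log	*log;
	struct mutex		read_lock;
};

data->log = kmalloc(sizeof(*data->log), GFP_KERNEL);
if (!data->log)
	return -ENOMEM;
--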
-
Submitted by Christoph Hellwig

An NVMe controller works perfectly fine even when the hwmon initialization fails. To handle this case consistently, stop returning errors from nvme_hwmon_init unless they come from a controller reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
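A sketch of the resulting error handling, assuming (as elsewhere in the nvme core) that -EINTR marks a command interrupted by a controller reset; the helper name is an assumption:

--
err = nvme_hwmon_get_smart_log(data);
if (err) {
	dev_warn(dev, "Failed to read smart log (error %d)\n", err);
	/* the controller works fine without hwmon: swallow all
	 * errors except those caused by a controller reset */
	if (err != -EINTR)
		err = 0;
	goto err_free;
}
--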
-
Submitted by Russell King (Oracle)

NVMe uses PRPs for data transfers and has no specific limit for a single DMA segment. Limiting the size will cause problems because the block layer assumes PRP-ish devices using a virt boundary mask don't have a segment limit. And while this is true, we also really need to tell the DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927a ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
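A sketch of the likely one-liner; the device-pointer field name in nvme-apple is an assumption here:

--
/* PRPs impose no segment-size limit; tell the DMA mapping
 * layer so dma-debug does not warn about large segments */
dma_set_max_seg_size(anv->dev, UINT_MAX);
--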
-
Submitted by Xander Li

Kingston SSDs do support the NVMe Write_Zeroes command, but take a long time to process it. The firmware on these SSDs is locked, so we cannot expect firmware improvements; disable the Write_Zeroes command instead.

Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
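This and the other quirk commits in this log (the ZHITAI, Lexar, and Phison entries below) all follow the same pattern: add a PCI ID entry to the quirk table in drivers/nvme/host/pci.c, varying only the driver_data flag (NVME_QUIRK_DISABLE_WRITE_ZEROES here, NVME_QUIRK_NO_DEEPEST_PS for the APST case, NVME_QUIRK_BOGUS_NID for duplicate nsids). A sketch with illustrative IDs (the real vendor/device IDs come from the respective patches):

--
{ PCI_DEVICE(0x2646, 0x501e),	/* Kingston NVMe SSD (IDs illustrative) */
	.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
--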
-
Submitted by Dan Carpenter

There is a typo here, so the wrong variable is released: "ctrl->admin_q" was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c5 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
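The shape of the one-word fix, sketched in diff form (the surrounding function context is an assumption):

--
-	blk_mq_destroy_queue(ctrl->fabrics_q);
+	blk_mq_destroy_queue(ctrl->admin_q);
--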
-
- 12 October 2022, 5 commits
-
-
Submitted by Sagi Grimberg

When we revalidate paths as part of an ns size change (as of commit e7d65803), it is possible that during the path revalidation the only I/O-capable paths (i.e. optimized/non-optimized) are those for which the ns resize has not yet been reported to the host, which causes inflight requests to be requeued (we have available paths, but none are I/O capable). These requests sit on the requeue list waiting for someone to resubmit them at some point.

The I/O-capable paths will eventually report the ns resize to the host, but nothing kicks the requeue list to resubmit the queued requests. Fix this by always kicking the requeue list; if no I/O-capable path exists, the requests will simply be queued again.

A typical log that indicates that I/Os are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--

Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>
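A sketch of the fix, assuming it lands at the end of path revalidation in nvme-multipath (the exact placement is an assumption):

--
/* always kick the requeue list: if no usable path exists yet,
 * the requests simply land back on the list until one appears */
kblockd_schedule_work(&ns->head->requeue_work);
--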
-
Submitted by Xi Ruoyao

ZHITAI TiPro5000 SSDs have the same APST sleep problem as their cousin, the TiPro7000. The quirk for the TiPro7000 was added in commit 6b961bce ("nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs"); use the same quirk for the TiPro5000.

The APST data from "nvme id-ctrl /dev/nvme1":

vid       : 0x1e49
ssvid     : 0x1e49
sn        : ZTA21T0KA2227304LM
mn        : ZHITAI TiPlus5000 1TB
fr        : ZTA09139
[...]
ps    0 : mp:6.50W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:5.80W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0025W non-operational enlat:8000 exlat:45000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-

Reported-and-tested-by: Chang Feng <flukehn@gmail.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Abhijit

Add a quirk to fix Lexar NM760 SSD drives reporting duplicate nsids.

Signed-off-by: Abhijit <abhijit@abhijittomar.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

When we delete a controller, we execute the following:

1. nvme_stop_ctrl() - stop some work elements that may be inflight or scheduled (specifically also .stop_ctrl, which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to cancel any inflight I/O, particularly I/O originating from ns scan_work in (2); but because it is cancelled in .stop_ctrl(), we can prevent forward progress of (2), as ns scanning blocks on I/O that will never be cancelled.

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work) - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed to be cancelled by err_work, which was cancelled before executing (see stack trace [1]).

Fix this by flushing err_work instead of cancelling it, forcing it to execute and cancel all inflight I/O.

[1]:
--
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? scan_shadow_nodes+0x40/0x40
 schedule+0x55/0xe0
 io_schedule+0x16/0x40
 do_read_cache_page+0x55d/0x850
 ? __page_cache_alloc+0x90/0x90
 read_cache_page+0x12/0x20
 read_part_sector+0x3f/0x110
 amiga_partition+0x3d/0x3e0
 ? osf_partition+0x33/0x220
 ? put_partition+0x90/0x90
 bdev_disk_changed+0x1fe/0x4d0
 blkdev_get_whole+0x7b/0x90
 blkdev_get_by_dev+0xda/0x2d0
 device_add_disk+0x356/0x3b0
 nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
 ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
 nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
 nvme_mpath_add_disk+0x120/0x160 [nvme_core]
 nvme_alloc_ns+0x594/0xa00 [nvme_core]
 nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
 ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
 nvme_scan_work+0x281/0x410 [nvme_core]
 process_one_work+0x1be/0x380
 worker_thread+0x37/0x3b0
 ? process_one_work+0x380/0x380
 kthread+0x12d/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
 </TASK>
INFO: task nvme:6725 blocked for more than 491 seconds.
      Not tainted 5.15.65-f0.el7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvme state:D stack: 0 pid: 6725 ppid: 1761 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? sched_clock+0x9/0x10
 schedule+0x55/0xe0
 schedule_timeout+0x24b/0x2e0
 ? try_to_wake_up+0x358/0x510
 ? finish_task_switch+0x88/0x2c0
 wait_for_completion+0xa5/0x110
 __flush_work+0x144/0x210
 ? worker_attach_to_pool+0xc0/0xc0
 flush_work+0x10/0x20
 nvme_remove_namespaces+0x41/0xf0 [nvme_core]
 nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
 nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
 dev_attr_store+0x14/0x30
 sysfs_kf_write+0x38/0x50
 kernfs_fop_write_iter+0x146/0x1d0
 new_sync_write+0x114/0x1b0
 ? intel_pmu_handle_irq+0xe0/0x420
 vfs_write+0x18d/0x270
 ksys_write+0x61/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
--

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
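The shape of the fix, sketched as a diff against the transport's teardown path (the exact cast helper is an assumption):

--
-	cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work);
+	flush_work(&to_tcp_ctrl(ctrl)->err_work);
--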
-
Submitted by Sagi Grimberg

When we delete a controller, we execute the following:

1. nvme_stop_ctrl() - stop some work elements that may be inflight or scheduled (specifically also .stop_ctrl, which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to cancel any inflight I/O, particularly I/O originating from ns scan_work in (2); but because it is cancelled in .stop_ctrl(), we can prevent forward progress of (2), as ns scanning blocks on I/O that will never be cancelled.

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work) - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed to be cancelled by err_work, which was cancelled before executing.

Fix this by flushing err_work instead of cancelling it, forcing it to execute and cancel all inflight I/O.

Fixes: b435ecea ("nvme: Add .stop_ctrl to nvme ctrl ops")
Fixes: f6c8e432 ("nvme: flush namespace scanning work just before removing namespaces")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 30 September 2022, 8 commits
-
-
Submitted by Kanchan Joshi

If io_uring sends a passthrough command with the IORING_URING_CMD_FIXED flag, use the pre-registered buffer for I/O (non-vectored variant). Pass the buffer/length to io_uring and get the bvec iterator for the range. Next, pass this bvec to the block layer and obtain a bio/request for subsequent processing.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220930062749.152261-13-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
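A hedged sketch of the mapping path for the fixed-buffer case (signatures follow the io_uring cmd and blk-map APIs of this era; treat the exact prototypes as assumptions):

--
if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
	struct iov_iter iter;

	/* resolve the pre-registered buffer into a bvec iterator */
	ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
			rq_data_dir(req), &iter, ioucmd);
	if (ret < 0)
		goto out;
	/* hand the bvec to the block layer to build the bio */
	ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
}
--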
-
Submitted by Kanchan Joshi

This is a prep patch. Modify nvme_submit_user_cmd and nvme_map_user_request to take ubuffer as a plain integer argument, and do away with the nvme_to_user_ptr conversion in callers.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-12-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Kanchan Joshi

nvme_alloc_request expects a large number of parameters. Split it into two functions to reduce the parameter count: the first retains the name nvme_alloc_request, while the second is named nvme_map_user_request.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-8-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Kanchan Joshi

Pass a struct request rather than a bio. This kills a parameter and allows some processing cleanup too.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-7-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Anuj Gupta

Use blk_rq_map_user_io instead of duplicating the same code in different places.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20220930062749.152261-6-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
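A sketch of the consolidated call (the blk_rq_map_user_io signature here is an assumption based on the helper introduced earlier in this series):

--
/* one call replaces the open-coded user-pointer/iov mapping
 * sequences previously duplicated across callers */
ret = blk_rq_map_user_io(req, NULL, nvme_to_user_ptr(ubuffer),
		bufflen, GFP_KERNEL, vec, 0, 0, rq_data_dir(req));
--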
-
Submitted by Jens Axboe

Now that the normal passthrough end_io path doesn't need the request anymore, we can kill the explicit blk_mq_free_request() and just return RQ_END_IO_FREE instead. This enables batched completions, freeing batches of requests at a time. It brings passthrough I/O performance at least on par with bdev-based O_DIRECT with io_uring. With this and batched allocations, peak performance goes from 110M IOPS to 122M IOPS. For IRQ-based I/O, passthrough is now also about 10% faster than before, going from ~61M to ~67M IOPS.

Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
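A sketch of an end_io handler handing ownership back (the enum and callback type follow the rq_end_io rework in this series; the function name is illustrative):

--
static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
						blk_status_t err)
{
	/* ... harvest result/status from the completed request ... */

	/* let the block layer free the request, enabling batched
	 * completions instead of an explicit blk_mq_free_request() */
	return RQ_END_IO_FREE;
}
--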
-
Submitted by Jens Axboe

By splitting up the metadata and non-metadata end_io handling, we can remove any request dependencies from the normal non-metadata I/O path. This is in preparation for letting the normal I/O passthrough path pass ownership of the request back to the block layer.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe

Everything is just converted to returning RQ_END_IO_NONE, and there should be no functional changes with this patch. This is in preparation for allowing the end_io handler to pass ownership back to the block layer rather than retaining ownership of the request.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 September 2022, 20 commits
-
-
Submitted by Christoph Hellwig

Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_fc_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code, and use the chance to clean up the init_hctx methods a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Also update the sqsize field when capping the queue size, and remove the check for a queue size larger than sqsize, given that sqsize is only initialized from opts->queue_size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_rdma_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_tcp_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

->nvme_tcp_queue is not used anywhere, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Add common helpers to allocate and tear down the admin and I/O tag sets, including the special queues allocated with them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
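A hedged sketch of what a transport-side call might look like (the helper signatures are assumptions based on this series; the flags and cmd_size arguments in particular may differ):

--
/* allocate the admin tagset plus the admin/fabrics queues */
ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
		&nvme_tcp_admin_mq_ops, BLK_MQ_F_BLOCKING,
		sizeof(struct nvme_tcp_request));
if (ret)
	return ret;
/* ... and on teardown, one call undoes all of it */
nvme_remove_admin_tag_set(&ctrl->ctrl);
--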
-
Submitted by Keith Busch

We've been reporting two maps regardless of whether the module parameter asked for anything beyond the default queues. As a consequence, blk-mq reinitializes all the hardware contexts and io schedulers on every controller reset even when the mapping is exactly the same as before. This unnecessary overhead adds several milliseconds to a reset in environments that don't need it. Report the actual number of mappings in use.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
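A sketch of the idea in nvme-pci (field names assumed from the driver):

--
/* report only the queue maps actually in use */
dev->tagset.nr_maps = 2;	/* default + read */
if (dev->io_queues[HCTX_TYPE_POLL])
	dev->tagset.nr_maps++;
--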
-
Submitted by Rishabh Bhatnagar

If swiotlb is force-enabled, dma_max_mapping_size ends up calling swiotlb_max_mapping_size, which takes into account the min align mask for the device. Set the min align mask for the nvme driver before calling dma_max_mapping_size when calculating max hw sectors.

Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
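A sketch of the ordering fix in nvme-pci (constants and fields assumed from the driver):

--
/* set the alignment mask before querying the mapping size so
 * swiotlb_max_mapping_size() can account for it */
dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
		dma_max_mapping_size(dev->dev) >> 9);
--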
-
Submitted by Sagi Grimberg

When a discovery controller is disconnected, no AENs will arrive to notify the host about discovery log change events. To solve this, send a uevent notification when a persistent discovery controller reconnects. We add a new ctrl flag, NVME_CTRL_STARTED_ONCE, that is set on the first start; consecutive calls will find it set and send the event to userspace if the controller is a discovery controller. Upon receiving the event, userspace will re-read the discovery log page and act on changes as it sees fit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
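A sketch of where the event is raised (nvme_change_uevent is an assumed helper name; the test_and_set_bit pattern follows the flag semantics described above):

--
/* in nvme_start_ctrl(): not the first start, and this is a
 * persistent discovery controller -> tell userspace to re-read
 * the discovery log page */
if (test_and_set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
    nvme_discovery_ctrl(ctrl))
	nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
--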
-
Submitted by Sagi Grimberg

We expect to grow a few of these flags for various purposes, so make them a proper enumeration.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
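A sketch of the resulting shape (the exact flag set at this point is an assumption; NVME_CTRL_STARTED_ONCE from the entry above is added on top of this conversion):

--
enum nvme_ctrl_flags {
	NVME_CTRL_FAILFAST_EXPIRED	= 0,
	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
};
--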
-
Submitted by Tina Hsu

E3C/E4C SSDs do support the Write Zeroes command in theory, but have very bad performance when using it. As the firmware has been frozen for these products, we cannot expect firmware improvements, so disable Write Zeroes.

Signed-off-by: Tina Hsu <tina_hsu@phison.corp-partner.google.com>
[hch: update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Michael Kelley

The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are non-functional on NVMe devices because the nvme_pr_clear() and nvme_pr_release() functions set the IEKEY field incorrectly. The IEKEY field should be set only when the key is zero (i.e., not specified); the current code does it backwards.

Furthermore, the NVMe spec describes the persistent reservation "clear" function as an option on the reservation release command, but the current implementation of nvme_pr_clear() erroneously uses the reservation register command. Fix these errors.

Note that NVMe version 1.3 and later specify that setting the IEKEY field returns an error of Invalid Field in Command. The fix sets IEKEY when the key is zero, which is appropriate as these ioctls consider a zero key to be "unspecified", and the intention of the spec change is to require a valid key.

Tested on a version 1.4 PCI NVMe device in an Azure VM.

Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
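A sketch of the corrected handlers (helper names follow the nvme core; treat the exact cdw10 encoding as an assumption reconstructed from the description above):

--
static int nvme_pr_release(struct block_device *bdev, u64 key,
		enum pr_type type)
{
	u32 cdw10 = nvme_pr_type(type) << 8;

	cdw10 |= key ? 0 : 1 << 3;	/* IEKEY only when no key given */
	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
}

static int nvme_pr_clear(struct block_device *bdev, u64 key)
{
	u32 cdw10 = 0x1;		/* RRELA: clear action on release */

	cdw10 |= key ? 0 : 1 << 3;	/* IEKEY only when no key given */
	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
}
--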
-
Submitted by Keith Busch

The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that; otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
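A sketch of the guard, assuming the register write goes through the ctrl ops as elsewhere in the nvme core (0x4e564d65 is the spec's "NVMe" NSSR magic):

--
/* refuse to touch registers unless we can move into RESETTING;
 * this also serializes against an in-progress reset */
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
	return -EBUSY;
return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4e564d65);
--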
-
Submitted by Keith Busch

The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users, as all of these operations, which include resets and rescans, can be disruptive.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
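The check itself is the standard capability test; a minimal sketch:

--
/* resets, rescans, etc. are as disruptive as passthrough */
if (!capable(CAP_SYS_ADMIN))
	return -EACCES;
--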
-
Submitted by Keith Busch

The firmware revision can change after a reset, so copy the most recent info each time instead of just the first time; otherwise the sysfs firmware_rev entry may contain stale data.

Reported-by: Jeff Lien <jeff.lien@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Keith Busch

If a reset occurs after the scan work attempts to issue a command, the reset may quiesce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced.

Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after scan_work flushes, so we're deadlocked. Fix this by handling the end effects after the request has been freed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
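A hedged sketch of the reordering (helper names follow the nvme passthrough code, but treat the exact signatures as assumptions):

--
effects = nvme_passthru_start(ctrl, ns, cmd->common.opcode);
ret = nvme_execute_rq(rq, false);
blk_mq_free_request(rq);	/* request goes idle first... */
if (effects)			/* ...then act on command effects */
	nvme_passthru_end(ctrl, effects, cmd, ret);
--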
-
- 22 September 2022, 1 commit
-
-
Submitted by Jens Axboe

We need the poll_flags to know how to poll for the I/O, and we should have the batch structure in preparation for supporting batched completions with iopoll.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
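A sketch of the resulting hook prototype (per this series; treat the exact signature as an assumption):

--
/* in struct file_operations */
int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd,
			struct io_comp_batch *iob,
			unsigned int poll_flags);
--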
-