- 25 October 2022, 1 commit
-
-
Submitted by Christoph Hellwig

The fact that blk_mq_destroy_queue also drops a queue reference leads to various places having to grab an extra reference. Move the call to blk_put_queue into the callers to allow removing the extra references.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20221018135720.670094-2-hch@lst.de
[axboe: fix fabrics_q vs admin_q conflict in nvme core.c]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
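A minimal sketch of the resulting caller-side pattern (names follow the nvme core, but treat this as an illustration rather than the exact diff):

--
/* teardown in the caller: destroy the queue, then explicitly
 * drop the reference the caller still holds */
blk_mq_destroy_queue(ctrl->admin_q);
blk_put_queue(ctrl->admin_q);
--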
-
- 19 October 2022, 5 commits
-
-
Submitted by Serge Semin

Recent commit 52fde2c0 ("nvme: set dma alignment to dword") has caused a regression on our platform. It turned out that the nvme_get_log() method invocation corrupted the nvme_hwmon_data structure instance; in particular, the nvme_hwmon_data.ctrl pointer was overwritten either with zeros or with garbage.

After some research we discovered that the problem happened even before the actual NVMe DMA execution, during the buffer mapping. Since our platform is DMA-noncoherent, the mapping implies cache-line invalidations or write-backs depending on the DMA direction. When getting the NVMe SMART log, the DMA is performed from device to memory, so cache invalidation is triggered during the buffer mapping. Since the log buffer isn't cache-line aligned, the invalidation discards the neighbouring data, which happens to be the data surrounding the buffer within the nvme_hwmon_data structure.

To fix this we need to make sure that the whole log buffer is defined within a cache-line-aligned memory region, so the cache invalidation doesn't touch the adjacent data. One of the options that guarantees this is to kmalloc the DMA buffer [1]. Seeing that the rest of the NVMe core driver prefers that method, it has been chosen to fix this problem too.

Note that after deeper research we found the denoted commit wasn't the root cause of the problem. It merely exposed the issue by activating the DMA-based NVMe SMART log read performed by the NVMe hwmon driver. The problem has been here since the initial commit of the driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
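A hedged sketch of the fix's shape, assuming the embedded log buffer becomes a separately kmalloc'ed allocation (kmalloc'ed buffers are DMA-safe per the DMA API howto referenced above; field layout is an assumption):

--
struct nvme_hwmon_data {
	struct nvme_ctrl	*ctrl;
	/* was an embedded member; kmalloc'ed so the DMA buffer
	 * owns whole cache lines and invalidation cannot clobber
	 * the surrounding structure fields */
	struct nvme_smart_log	*log;
	struct mutex		read_lock;
};

data->log = kmalloc(sizeof(*data->log), GFP_KERNEL);
if (!data->log)
	return -ENOMEM;
--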
-
Submitted by Christoph Hellwig

An NVMe controller works perfectly fine even when the hwmon initialization fails. To handle this case consistently, stop returning errors from nvme_hwmon_init unless they come from a controller reset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
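A sketch of the resulting error handling, assuming (as elsewhere in the nvme core) that -EINTR marks a command interrupted by a controller reset; the helper name is an assumption:

--
err = nvme_hwmon_get_smart_log(data);
if (err) {
	dev_warn(dev, "Failed to read smart log (error %d)\n", err);
	/* the controller works fine without hwmon: swallow all
	 * errors except those caused by a controller reset */
	if (err != -EINTR)
		err = 0;
	goto err_free;
}
--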
-
Submitted by Russell King (Oracle)

NVMe uses PRPs for data transfers and has no specific limit for a single DMA segment. Limiting the size will cause problems because the block layer assumes PRP-ish devices using a virt boundary mask don't have a segment limit. And while this is true, we also really need to tell the DMA mapping layer about it, otherwise dma-debug will trip over it.

Fixes: 5bd2927a ("nvme-apple: Add initial Apple SoC NVMe driver")
Suggested-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[hch: rewrote the commit message based on the PCIe commit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Curtin <ecurtin@redhat.com>
Reviewed-by: Sven Peter <sven@svenpeter.dev>
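A sketch of the likely one-liner; the device-pointer field name in nvme-apple is an assumption here:

--
/* PRPs impose no segment-size limit; tell the DMA mapping
 * layer so dma-debug does not warn about large segments */
dma_set_max_seg_size(anv->dev, UINT_MAX);
--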
-
Submitted by Xander Li

Kingston SSDs do support the NVMe Write_Zeroes command, but take a long time to process it. The firmware on these SSDs is locked, so we cannot expect firmware improvements; disable the Write_Zeroes command instead.

Signed-off-by: Xander Li <xander_li@kingston.com.tw>
Signed-off-by: Christoph Hellwig <hch@lst.de>
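This and the other quirk commits in this log (the ZHITAI, Lexar, and Phison entries below) all follow the same pattern: add a PCI ID entry to the quirk table in drivers/nvme/host/pci.c, varying only the driver_data flag (NVME_QUIRK_DISABLE_WRITE_ZEROES here, NVME_QUIRK_NO_DEEPEST_PS for the APST case, NVME_QUIRK_BOGUS_NID for duplicate nsids). A sketch with illustrative IDs (the real vendor/device IDs come from the respective patches):

--
{ PCI_DEVICE(0x2646, 0x501e),	/* Kingston NVMe SSD (IDs illustrative) */
	.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
--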
-
Submitted by Dan Carpenter

There is a typo here, so the wrong variable is released: "ctrl->admin_q" was intended instead of "ctrl->fabrics_q".

Fixes: fe60e8c5 ("nvme: add common helpers to allocate and free tagsets")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
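The shape of the one-word fix, sketched in diff form (the surrounding function context is an assumption):

--
-	blk_mq_destroy_queue(ctrl->fabrics_q);
+	blk_mq_destroy_queue(ctrl->admin_q);
--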
-
- 12 October 2022, 5 commits
-
-
Submitted by Sagi Grimberg

When we revalidate paths as part of an ns size change (as of commit e7d65803), it is possible that during the path revalidation the only I/O-capable paths (i.e. optimized/non-optimized) are those for which the ns resize has not yet been reported to the host, which causes inflight requests to be requeued (we have available paths, but none are I/O capable). These requests sit on the requeue list waiting for someone to resubmit them at some point.

The I/O-capable paths will eventually report the ns resize to the host, but nothing kicks the requeue list to resubmit the queued requests. Fix this by always kicking the requeue list; if no I/O-capable path exists, the requests will simply be queued again.

A typical log that indicates that I/Os are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--

Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>
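A sketch of the fix, assuming it lands at the end of path revalidation in nvme-multipath (the exact placement is an assumption):

--
/* always kick the requeue list: if no usable path exists yet,
 * the requests simply land back on the list until one appears */
kblockd_schedule_work(&ns->head->requeue_work);
--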
-
Submitted by Xi Ruoyao

ZHITAI TiPro5000 SSDs have the same APST sleep problem as their cousin, the TiPro7000. The quirk for the TiPro7000 was added in commit 6b961bce ("nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs"); use the same quirk for the TiPro5000.

The APST data from "nvme id-ctrl /dev/nvme1":

vid       : 0x1e49
ssvid     : 0x1e49
sn        : ZTA21T0KA2227304LM
mn        : ZHITAI TiPlus5000 1TB
fr        : ZTA09139
[...]
ps    0 : mp:6.50W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:5.80W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0500W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0025W non-operational enlat:8000 exlat:45000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-

Reported-and-tested-by: Chang Feng <flukehn@gmail.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Abhijit

Add a quirk to fix Lexar NM760 SSD drives reporting duplicate nsids.

Signed-off-by: Abhijit <abhijit@abhijittomar.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

When we delete a controller, we execute the following:

1. nvme_stop_ctrl() - stop some work elements that may be inflight or scheduled (specifically also .stop_ctrl, which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to cancel any inflight I/O, particularly I/O originating from ns scan_work in (2); but because it is cancelled in .stop_ctrl(), we can prevent forward progress of (2), as ns scanning blocks on I/O that will never be cancelled.

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work) - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed to be cancelled by err_work, which was cancelled before executing (see stack trace [1]).

Fix this by flushing err_work instead of cancelling it, forcing it to execute and cancel all inflight I/O.

[1]:
--
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? scan_shadow_nodes+0x40/0x40
 schedule+0x55/0xe0
 io_schedule+0x16/0x40
 do_read_cache_page+0x55d/0x850
 ? __page_cache_alloc+0x90/0x90
 read_cache_page+0x12/0x20
 read_part_sector+0x3f/0x110
 amiga_partition+0x3d/0x3e0
 ? osf_partition+0x33/0x220
 ? put_partition+0x90/0x90
 bdev_disk_changed+0x1fe/0x4d0
 blkdev_get_whole+0x7b/0x90
 blkdev_get_by_dev+0xda/0x2d0
 device_add_disk+0x356/0x3b0
 nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
 ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
 nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
 nvme_mpath_add_disk+0x120/0x160 [nvme_core]
 nvme_alloc_ns+0x594/0xa00 [nvme_core]
 nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
 ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
 nvme_scan_work+0x281/0x410 [nvme_core]
 process_one_work+0x1be/0x380
 worker_thread+0x37/0x3b0
 ? process_one_work+0x380/0x380
 kthread+0x12d/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
 </TASK>
INFO: task nvme:6725 blocked for more than 491 seconds.
      Not tainted 5.15.65-f0.el7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nvme state:D stack: 0 pid: 6725 ppid: 1761 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x390/0x910
 ? sched_clock+0x9/0x10
 schedule+0x55/0xe0
 schedule_timeout+0x24b/0x2e0
 ? try_to_wake_up+0x358/0x510
 ? finish_task_switch+0x88/0x2c0
 wait_for_completion+0xa5/0x110
 __flush_work+0x144/0x210
 ? worker_attach_to_pool+0xc0/0xc0
 flush_work+0x10/0x20
 nvme_remove_namespaces+0x41/0xf0 [nvme_core]
 nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
 nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
 dev_attr_store+0x14/0x30
 sysfs_kf_write+0x38/0x50
 kernfs_fop_write_iter+0x146/0x1d0
 new_sync_write+0x114/0x1b0
 ? intel_pmu_handle_irq+0xe0/0x420
 vfs_write+0x18d/0x270
 ksys_write+0x61/0xe0
 __x64_sys_write+0x1a/0x20
 do_syscall_64+0x37/0x90
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
--

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Jonathan Nicklin <jnicklin@blockbridge.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
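The shape of the fix, sketched as a diff against the transport's teardown path (the exact cast helper is an assumption):

--
-	cancel_work_sync(&to_tcp_ctrl(ctrl)->err_work);
+	flush_work(&to_tcp_ctrl(ctrl)->err_work);
--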
-
Submitted by Sagi Grimberg

When we delete a controller, we execute the following:

1. nvme_stop_ctrl() - stop some work elements that may be inflight or scheduled (specifically also .stop_ctrl, which cancels ctrl error recovery work)
2. nvme_remove_namespaces() - which first flushes scan_work to avoid competing ns addition/removal
3. continue to teardown the controller

However, if err_work was scheduled to run in (1), it is designed to cancel any inflight I/O, particularly I/O originating from ns scan_work in (2); but because it is cancelled in .stop_ctrl(), we can prevent forward progress of (2), as ns scanning blocks on I/O that will never be cancelled.

The race is:
1. transport layer error observed -> err_work is scheduled
2. scan_work executes, discovers ns, generates I/O to it
3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work) - err_work never executed
4. nvme_remove_namespaces() -> flush_work(scan_work)
--> deadlock, because scan_work is blocked on I/O that was supposed to be cancelled by err_work, which was cancelled before executing.

Fix this by flushing err_work instead of cancelling it, forcing it to execute and cancel all inflight I/O.

Fixes: b435ecea ("nvme: Add .stop_ctrl to nvme ctrl ops")
Fixes: f6c8e432 ("nvme: flush namespace scanning work just before removing namespaces")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 30 September 2022, 8 commits
-
-
Submitted by Kanchan Joshi

If io_uring sends a passthrough command with the IORING_URING_CMD_FIXED flag, use the pre-registered buffer for I/O (non-vectored variant). Pass the buffer/length to io_uring and get the bvec iterator for the range. Next, pass this bvec to the block layer and obtain a bio/request for subsequent processing.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220930062749.152261-13-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
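A hedged sketch of the mapping path for the fixed-buffer case (signatures follow the io_uring cmd and blk-map APIs of this era; treat the exact prototypes as assumptions):

--
if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
	struct iov_iter iter;

	/* resolve the pre-registered buffer into a bvec iterator */
	ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
			rq_data_dir(req), &iter, ioucmd);
	if (ret < 0)
		goto out;
	/* hand the bvec to the block layer to build the bio */
	ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
}
--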
-
Submitted by Kanchan Joshi

This is a prep patch. Modify nvme_submit_user_cmd and nvme_map_user_request to take ubuffer as a plain integer argument, and do away with the nvme_to_user_ptr conversion in callers.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-12-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Kanchan Joshi

nvme_alloc_request expects a large number of parameters. Split it into two functions to reduce the parameter count: the first retains the name nvme_alloc_request, while the second is named nvme_map_user_request.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-8-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Kanchan Joshi

Pass a struct request rather than a bio. This kills a parameter and allows some processing cleanup too.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220930062749.152261-7-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Anuj Gupta

Use blk_rq_map_user_io instead of duplicating the same code in different places.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://lore.kernel.org/r/20220930062749.152261-6-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
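A sketch of the consolidated call (the blk_rq_map_user_io signature here is an assumption based on the helper introduced earlier in this series):

--
/* one call replaces the open-coded user-pointer/iov mapping
 * sequences previously duplicated across callers */
ret = blk_rq_map_user_io(req, NULL, nvme_to_user_ptr(ubuffer),
		bufflen, GFP_KERNEL, vec, 0, 0, rq_data_dir(req));
--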
-
Submitted by Jens Axboe

Now that the normal passthrough end_io path doesn't need the request anymore, we can kill the explicit blk_mq_free_request() and just return RQ_END_IO_FREE instead. This enables batched completions, freeing batches of requests at a time. It brings passthrough I/O performance at least on par with bdev-based O_DIRECT with io_uring. With this and batched allocations, peak performance goes from 110M IOPS to 122M IOPS. For IRQ-based I/O, passthrough is now also about 10% faster than before, going from ~61M to ~67M IOPS.

Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
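A sketch of an end_io handler handing ownership back (the enum and callback type follow the rq_end_io rework in this series; the function name is illustrative):

--
static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
						blk_status_t err)
{
	/* ... harvest result/status from the completed request ... */

	/* let the block layer free the request, enabling batched
	 * completions instead of an explicit blk_mq_free_request() */
	return RQ_END_IO_FREE;
}
--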
-
Submitted by Jens Axboe

By splitting up the metadata and non-metadata end_io handling, we can remove any request dependencies from the normal non-metadata I/O path. This is in preparation for letting the normal I/O passthrough path pass ownership of the request back to the block layer.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe

Everything is just converted to returning RQ_END_IO_NONE, and there should be no functional changes with this patch. This is in preparation for allowing the end_io handler to pass ownership back to the block layer rather than retaining ownership of the request.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 September 2022, 20 commits
-
-
Submitted by Christoph Hellwig

Unused now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_fc_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code, and use the chance to clean up the init_hctx methods a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Also update the sqsize field when capping the queue size, and remove the check for a queue size larger than sqsize, given that sqsize is only initialized from opts->queue_size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_rdma_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Use the common helpers to allocate and free the tagsets. To make this work, the generic nvme_ctrl now needs to be stored in the hctx private data instead of the nvme_tcp_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Point the private data to the generic controller structure in preparation for using the common tagset init/exit code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

->nvme_tcp_queue is not used anywhere, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-
Submitted by Christoph Hellwig

Add common helpers to allocate and tear down the admin and I/O tag sets, including the special queues allocated with them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
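A hedged sketch of what a transport-side call might look like (the helper signatures are assumptions based on this series; the flags and cmd_size arguments in particular may differ):

--
/* allocate the admin tagset plus the admin/fabrics queues */
ret = nvme_alloc_admin_tag_set(&ctrl->ctrl, &ctrl->admin_tag_set,
		&nvme_tcp_admin_mq_ops, BLK_MQ_F_BLOCKING,
		sizeof(struct nvme_tcp_request));
if (ret)
	return ret;
/* ... and on teardown, one call undoes all of it */
nvme_remove_admin_tag_set(&ctrl->ctrl);
--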
-
Submitted by Keith Busch

We've been reporting two maps regardless of whether the module parameter asked for anything beyond the default queues. As a consequence, blk-mq reinitializes all the hardware contexts and io schedulers on every controller reset even when the mapping is exactly the same as before. This unnecessary overhead adds several milliseconds to a reset in environments that don't need it. Report the actual number of mappings in use.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
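A sketch of the idea in nvme-pci (field names assumed from the driver):

--
/* report only the queue maps actually in use */
dev->tagset.nr_maps = 2;	/* default + read */
if (dev->io_queues[HCTX_TYPE_POLL])
	dev->tagset.nr_maps++;
--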
-
Submitted by Rishabh Bhatnagar

If swiotlb is force-enabled, dma_max_mapping_size ends up calling swiotlb_max_mapping_size, which takes into account the min align mask for the device. Set the min align mask for the nvme driver before calling dma_max_mapping_size when calculating max hw sectors.

Signed-off-by: Rishabh Bhatnagar <risbhat@amazon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
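A sketch of the ordering fix in nvme-pci (constants and fields assumed from the driver):

--
/* set the alignment mask before querying the mapping size so
 * swiotlb_max_mapping_size() can account for it */
dma_set_min_align_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1);
dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
		dma_max_mapping_size(dev->dev) >> 9);
--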
-
Submitted by Sagi Grimberg

When a discovery controller is disconnected, no AENs will arrive to notify the host about discovery log change events. To solve this, send a uevent notification when a persistent discovery controller reconnects. We add a new ctrl flag, NVME_CTRL_STARTED_ONCE, that is set on the first start; consecutive calls will find it set and send the event to userspace if the controller is a discovery controller. Upon receiving the event, userspace will re-read the discovery log page and act on changes as it sees fit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
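A sketch of where the event is raised (nvme_change_uevent is an assumed helper name; the test_and_set_bit pattern follows the flag semantics described above):

--
/* in nvme_start_ctrl(): not the first start, and this is a
 * persistent discovery controller -> tell userspace to re-read
 * the discovery log page */
if (test_and_set_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
    nvme_discovery_ctrl(ctrl))
	nvme_change_uevent(ctrl, "NVME_EVENT=rediscover");
--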
-
Submitted by Sagi Grimberg

We expect to grow a few of these flags for various purposes, so make them a proper enumeration.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
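A sketch of the resulting shape (the exact flag set at this point is an assumption; NVME_CTRL_STARTED_ONCE from the entry above is added on top of this conversion):

--
enum nvme_ctrl_flags {
	NVME_CTRL_FAILFAST_EXPIRED	= 0,
	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
};
--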
-
Submitted by Tina Hsu

E3C/E4C SSDs do support the Write Zeroes command in theory, but have very bad performance when using it. As the firmware has been frozen for these products, we cannot expect firmware improvements, so disable Write Zeroes.

Signed-off-by: Tina Hsu <tina_hsu@phison.corp-partner.google.com>
[hch: update the commit message]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Michael Kelley

The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are non-functional on NVMe devices because the nvme_pr_clear() and nvme_pr_release() functions set the IEKEY field incorrectly. The IEKEY field should be set only when the key is zero (i.e., not specified); the current code does it backwards.

Furthermore, the NVMe spec describes the persistent reservation "clear" function as an option on the reservation release command, but the current implementation of nvme_pr_clear() erroneously uses the reservation register command. Fix these errors.

Note that NVMe version 1.3 and later specify that setting the IEKEY field returns an error of Invalid Field in Command. The fix sets IEKEY when the key is zero, which is appropriate as these ioctls consider a zero key to be "unspecified", and the intention of the spec change is to require a valid key.

Tested on a version 1.4 PCI NVMe device in an Azure VM.

Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
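A sketch of the corrected handlers (helper names follow the nvme core; treat the exact cdw10 encoding as an assumption reconstructed from the description above):

--
static int nvme_pr_release(struct block_device *bdev, u64 key,
		enum pr_type type)
{
	u32 cdw10 = nvme_pr_type(type) << 8;

	cdw10 |= key ? 0 : 1 << 3;	/* IEKEY only when no key given */
	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
}

static int nvme_pr_clear(struct block_device *bdev, u64 key)
{
	u32 cdw10 = 0x1;		/* RRELA: clear action on release */

	cdw10 |= key ? 0 : 1 << 3;	/* IEKEY only when no key given */
	return nvme_pr_command(bdev, cdw10, key, 0, nvme_cmd_resv_release);
}
--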
-
Submitted by Keith Busch

The subsystem reset writes to a register, so we have to ensure the device state is capable of handling that; otherwise the driver may access unmapped registers. Use the state machine to ensure the subsystem reset doesn't try to write registers on a device already undergoing this type of reset.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214771
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
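A sketch of the guard, assuming the register write goes through the ctrl ops as elsewhere in the nvme core (0x4e564d65 is the spec's "NVMe" NSSR magic):

--
/* refuse to touch registers unless we can move into RESETTING;
 * this also serializes against an in-progress reset */
if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
	return -EBUSY;
return ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4e564d65);
--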
-
Submitted by Keith Busch

The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users, as all of these operations, which include resets and rescans, can be disruptive.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
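The check itself is the standard capability test; a minimal sketch:

--
/* resets, rescans, etc. are as disruptive as passthrough */
if (!capable(CAP_SYS_ADMIN))
	return -EACCES;
--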
-
Submitted by Keith Busch

The firmware revision can change after a reset, so copy the most recent info each time instead of just the first time; otherwise the sysfs firmware_rev entry may contain stale data.

Reported-by: Jeff Lien <jeff.lien@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Keith Busch

If a reset occurs after the scan work attempts to issue a command, the reset may quiesce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced.

Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after scan_work flushes, so we're deadlocked. Fix this by handling the end effects after the request has been freed.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354
Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
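A hedged sketch of the reordering (helper names follow the nvme passthrough code, but treat the exact signatures as assumptions):

--
effects = nvme_passthru_start(ctrl, ns, cmd->common.opcode);
ret = nvme_execute_rq(rq, false);
blk_mq_free_request(rq);	/* request goes idle first... */
if (effects)			/* ...then act on command effects */
	nvme_passthru_end(ctrl, effects, cmd, ret);
--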
-
- 22 September 2022, 1 commit
-
-
Submitted by Jens Axboe

We need the poll_flags to know how to poll for the I/O, and we should have the batch structure in preparation for supporting batched completions with iopoll.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
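A sketch of the resulting hook prototype (per this series; treat the exact signature as an assumption):

--
/* in struct file_operations */
int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd,
			struct io_comp_batch *iob,
			unsigned int poll_flags);
--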
-