- 07 10月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The removal of the ->revalidate_disk method broke the initialization of the zone bitmaps, as nvme_revalidate_disk now never gets called during initialization. Move the zone related code from nvme_revalidate_disk into a new helper in zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to ensure the zone bitmaps are initialized during probe. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
-
- 27 9月, 2020 2 次提交
-
-
由 Chaitanya Kulkarni 提交于
When using linked list we have to open code the locking, search, and destroy operations with the loops even if data structure doesn't fall into the fast path. One of the main advantage of having XArray to store, search, and remove items is that it handles all the locking by itself, avoids the loops when using linked lists, provides clear API to replace the linked list's search and destroy loops. This patch replaces the ctrl->cel list with XArray and removes :- a. Extra code needed for the linked list for ctrl->cel item management such as nvme_find_cel(). b. Destroy loop in the nvme_free_ctrl(). c. Explicit insertion locking in the nvme_get_effects_log(). Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
Lift opening the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
-
- 25 9月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Drivers shouldn't really mess with the readahead size, as that is a VM concept. Instead set it based on the optimal I/O size by lifting the algorithm from the md driver when registering the disk. Also set bdi->io_pages there as well by applying the same scheme based on max_sectors. To ensure the limits work well for stacking drivers a new helper is added to update the readahead limits from the block limits, which is also called from disk_stack_limits. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 9月, 2020 1 次提交
-
-
由 Chaitanya Kulkarni 提交于
Get and put the reference to the ctrl in the nvme_dev_open() and nvme_dev_release() before and after module get/put for ctrl in char device file operations. Introduce char_dev relase function, get/put the controller and module which allows us to fix the potential Oops which can be easily reproduced with a passthru ctrl (although the problem also exists with pure user access): Entering kdb (current=0xffff8887f8290000, pid 3128) on processor 30 Oops: (null) due to oops @ 0xffffffffa01019ad CPU: 30 PID: 3128 Comm: bash Tainted: G W OE 5.8.0-rc4nvme-5.9+ #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.4 RIP: 0010:nvme_free_ctrl+0x234/0x285 [nvme_core] Code: 57 10 a0 e8 73 bf 02 e1 ba 3d 11 00 00 48 c7 c6 98 33 10 a0 48 c7 c7 1d 57 10 a0 e8 5b bf 02 e1 8 RSP: 0018:ffffc90001d63de0 EFLAGS: 00010246 RAX: ffffffffa05c0440 RBX: ffff8888119e45a0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8888177e9550 RDI: ffff8888119e43b0 RBP: ffff8887d4768000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: ffffc90001d63c90 R12: ffff8888119e43b0 R13: ffff8888119e5108 R14: dead000000000100 R15: ffff8888119e5108 FS: 00007f1ef27b0740(0000) GS:ffff888817600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa05c0470 CR3: 00000007f6bee000 CR4: 00000000003406e0 Call Trace: device_release+0x27/0x80 kobject_put+0x98/0x170 nvmet_passthru_ctrl_disable+0x4a/0x70 [nvmet] nvmet_passthru_enable_store+0x4c/0x90 [nvmet] configfs_write_file+0xe6/0x150 vfs_write+0xba/0x1e0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f1ef1eb2840 Code: Bad RIP value. RSP: 002b:00007fffdbff0eb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1ef1eb2840 RDX: 0000000000000002 RSI: 00007f1ef27d2000 RDI: 0000000000000001 RBP: 00007f1ef27d2000 R08: 000000000000000a R09: 00007f1ef27b0740 R10: 0000000000000001 R11: 0000000000000246 R12: 00007f1ef2186400 R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000000 With this patch fix we take the module ref count in nvme_dev_open() and release that ref count in newly introduced nvme_dev_release(). Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 09 9月, 2020 1 次提交
-
-
由 James Smart 提交于
The indicated patch introduced a barrier in the sysfs_delete attribute for the controller that rejects the request if the controller isn't created. "Created" is defined as at least 1 call to nvme_start_ctrl(). This is problematic in error-injection testing. If an error occurs on the initial attempt to create an association and the controller enters reconnect(s) attempts, the admin cannot delete the controller until either there is a successful association created or ctrl_loss_tmo times out. Where this issue is particularly hurtful is when the "admin" is the nvme-cli, it is performing a connection to a discovery controller, and it is initiated via auto-connect scripts. With the FC transport, if the first connection attempt fails, the controller enters a normal reconnect state but returns control to the cli thread that created the controller. In this scenario, the cli attempts to read the discovery log via ioctl, which fails, causing the cli to see it as an empty log and then proceeds to delete the discovery controller. The delete is rejected and the controller is left live. If the discovery controller reconnect then succeeds, there is no action to delete it, and it sits live doing nothing. Cc: <stable@vger.kernel.org> # v5.7+ Fixes: ce151813 ("nvme: Fix controller creation races with teardown flow") Signed-off-by: NJames Smart <james.smart@broadcom.com> CC: Israel Rukshin <israelr@mellanox.com> CC: Max Gurtovoy <maxg@mellanox.com> CC: Christoph Hellwig <hch@lst.de> CC: Keith Busch <kbusch@kernel.org> CC: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 02 9月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
Keep control in the NVMe driver instead of going through an indirect call back into ->revalidate_disk. Also reorder the function a bit to be easier to follow with the additional code. And now that we have removed all callers of revalidate_disk() in the nvme code, ->revalidate_disk is only called from the open code when first opening the device. Which is of course totally pointless as we have a valid size since the initial scan, and will get an updated view through the asynchronous notifiation everytime the size changes. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
In nvme_set_queue_dying we really just want to ensure the disk and bdev sizes are set to zero. Going through revalidate_disk leads to a somewhat arcance and complex callchain relying on special behavior in a few places. Instead just lift the set_capacity directly to nvme_set_queue_dying, and rename and move the nvme_mpath_update_disk_size helper so that we can use it in nvme_set_queue_dying to propagate the size to the bdev without detours. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 8月, 2020 4 次提交
-
-
由 Keith Busch 提交于
The kernel requires a power of two for boundaries because that's the only way it can efficiently split commands that cross them. A controller, however, may report a non-power of two boundary. The driver had been rounding the controller's value to one the kernel can use, but splitting on the wrong boundary provides no benefit on the device side, and incurs additional submission overhead from non-optimal splits. Don't provide any boundary hint if the controller's value can't be used and log a warning when first scanning a disk's unreported IO boundary. Since the chunk sector logic has grown, move it to a separate function. Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Keith Busch 提交于
If the driver has to unbind from the controller for an early failure before the subsystem has been set up, there won't be a subsystem holding the controller's instance, so the controller needs to free its own instance in this case. Fixes: 733e4b69 ("nvme: Assign subsys instance from first ctrl") Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Sagi Grimberg 提交于
PCIe controllers do not have fabric opts, verify they exist before showing ctrl_loss_tmo or reconnect_delay attributes. Fixes: 764075fd ("nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs") Reported-by: NTobias Markus <tobias@markus-regensburg.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Sagi Grimberg 提交于
Users can detect if the wait has completed or not and take appropriate actions based on this information (e.g. weather to continue initialization or rather fail and schedule another initialization attempt). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
- 24 8月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 22 8月, 2020 4 次提交
-
-
由 Chao Leng 提交于
If a command send through nvme-multipath failed on a dying queue, resend it on another path. Signed-off-by: NChao Leng <lengchao@huawei.com> [hch: rebased on top of the completion refactoring] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Lift all the code to decide the dispostition of a completed command from nvme_complete_rq and nvme_failover_req into a new helper, which returns an emum of the potential actions. nvme_complete_rq then just switches on those and calls the proper helper for the action. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
Zoned block devices reuse the chunk_sectors queue limit to define zone boundaries. If a such a device happens to also report an optimal boundary, do not use that to define the chunk_sectors as that may intermittently interfere with io splitting and zone size queries. Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Logan Gunthorpe 提交于
When locking the ctrl->lock spinlock IRQs need to be disabled to avoid a dead lock. The new spin_lock() calls recently added produce the following lockdep warning when running the blktest nvme/003: ================================ WARNING: inconsistent lock state -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/22 [HC0[0]:SC1[1]:HE0:SE0] takes: ffff888276a8c4c0 (&ctrl->lock){+.?.}-{2:2}, at: nvme_keep_alive_end_io+0x50/0xc0 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x164/0x500 _raw_spin_lock+0x28/0x40 nvme_get_effects_log+0x37/0x1c0 nvme_init_identify+0x9e4/0x14f0 nvme_reset_work+0xadd/0x2360 process_one_work+0x66b/0xb70 worker_thread+0x6e/0x6c0 kthread+0x1e7/0x210 ret_from_fork+0x22/0x30 irq event stamp: 1449221 hardirqs last enabled at (1449220): [<ffffffff81c58e69>] ktime_get+0xf9/0x140 hardirqs last disabled at (1449221): [<ffffffff83129665>] _raw_spin_lock_irqsave+0x25/0x60 softirqs last enabled at (1449210): [<ffffffff83400447>] __do_softirq+0x447/0x595 softirqs last disabled at (1449215): [<ffffffff81b489b5>] run_ksoftirqd+0x35/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&ctrl->lock); <Interrupt> lock(&ctrl->lock); *** DEADLOCK *** no locks held by ksoftirqd/2/22. stack backtrace: CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 5.8.0-rc4-eid-vmlocalyes-dbg-00157-g7236657c #1450 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0xc8/0x11a print_usage_bug.cold.63+0x235/0x23e mark_lock+0xa9c/0xcf0 __lock_acquire+0xd9a/0x2b50 lock_acquire+0x164/0x500 _raw_spin_lock_irqsave+0x40/0x60 nvme_keep_alive_end_io+0x50/0xc0 blk_mq_end_request+0x158/0x210 nvme_complete_rq+0x146/0x500 nvme_loop_complete_rq+0x26/0x30 [nvme_loop] blk_done_softirq+0x187/0x1e0 __do_softirq+0x118/0x595 run_ksoftirqd+0x35/0x50 smpboot_thread_fn+0x1d3/0x310 kthread+0x1e7/0x210 ret_from_fork+0x22/0x30 Fixes: be93e87e ("nvme: support for multiple Command Sets Supported and Effects log pages") Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Tested-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 7月, 2020 10 次提交
-
-
由 Christoph Hellwig 提交于
Add a quirk for a device that does not support the Identify Namespace Identification Descriptor list despite claiming 1.3 compliance. Fixes: ea43d970 ("nvme: fix identify error status silent ignore") Reported-by: NIngo Brunberg <ingo_brunberg@web.de> Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NIngo Brunberg <ingo_brunberg@web.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Logan Gunthorpe 提交于
nvme_find_get_ns() and nvme_put_ns() are required by the target passthru code and are exported under the NVME_TARGET_PASSTHRU namespace. Based-on-a-patch-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Logan Gunthorpe 提交于
nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). It makes use of filp_open() to open the file and uses the private data to obtain a pointer to the struct nvme_ctrl. If the fops of the file do not match, -EINVAL is returned. The purpose of this function is to support NVMe-OF target passthru and is exported under the NVME_TARGET_PASSTHRU namespace. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Logan Gunthorpe 提交于
Introduce a new nvme_execute_passthru_rq() helper which calls nvme_passthru_[start|end]() around blk_execute_rq(). This ensures all passthru calls (including nvme_submit_io()) will be wrapped appropriately. nvme_execute_passthru_rq() will also be useful for the nvmet passthru code and is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Logan Gunthorpe 提交于
Separate the code to obtain command effects from the code to start a passthru request and move the nvme_passthru_start() and nvme_passthru_end() functions up above nvme_submit_user_cmd() in order that they may be used in a new helper a subsequent patch. The new helper function will be necessary for nvmet passthru code to determine if we need to change out of interrupt context to handle the effects. It is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Logan Gunthorpe 提交于
The host driver should decide whether to use SGLs or PRPs and they currently assume the flags are cleared after the call to nvme_setup_cmd(). However, passed-through commands may erroneously set these bits; so clear them for all cases. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f ("nvme: add ANA support") Reported-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
Saving the nvme controller's page size was from a time when the driver tried to use different sized pages, but this value is always set to a constant, and has been this way for some time. Remove the 'page_size' field and replace its usage with the constant value. This also lets the compiler make some micro-optimizations in the io path, and that's always a good thing. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Baolin Wang 提交于
We've already validated the 'kato' in nvme_start_keep_alive(), thus no need to validate it again in nvme_start_ctrl(). Remove it. Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Dan Carpenter 提交于
"v" is an unsigned int so it can't be more than UINT_MAX. Removing this check makes it easier to preserve the error code as well. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 21 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
This function is just a tiny wrapper around blk_stack_limits. Open code it int the two callers. Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Tested-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 7月, 2020 1 次提交
-
-
由 Anthony Iliopoulos 提交于
Commit 3b4b1972 ("nvme: fix possible deadlock when I/O is blocked") reverted multipath head disk revalidation due to deadlocks caused by holding the bd_mutex during revalidate. Updating the multipath disk blockdev size is still required though for userspace to be able to observe any resizing while the device is mounted. Directly update the bdev inode size to avoid unnecessarily holding the bdev->bd_mutex. Fixes: 3b4b1972 ("nvme: fix possible deadlock when I/O is blocked") Signed-off-by: NAnthony Iliopoulos <ailiop@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 09 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
By the time a namespace is added to ctrl->namespaces list and thus can be looked up ns->disk has been assigned, and it it never cleared. Remove all the checks for ns->disk being NULL. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
- 08 7月, 2020 7 次提交
-
-
由 Sagi Grimberg 提交于
This is useful information, and moreover its it's useful to be able to alter these parameters per controller after it has been established. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined in NVM Express TP4053. Zoned namespaces are discovered based on their Command Set Identifier reported in the namespaces Namespace Identification Descriptor list. A successfully discovered Zoned Namespace will be registered with the block layer as a host managed zoned block device with Zone Append command support. A namespace that does not support append is not supported by the driver. Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NHans Holmberg <hans.holmberg@wdc.com> Signed-off-by: NDmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: NAjay Joshi <ajay.joshi@wdc.com> Signed-off-by: NAravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NMatias Bjørling <matias.bjorling@wdc.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NKeith Busch <keith.busch@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
The Commands Supported and Effects log page was extended with a CSI field that enables the host to query the log page for each command set supported. Retrieve this log page for each command set that an attached namespace supports, and save a pointer to that log in the namespace head. Reviewed-by: NMatias Bjørling <matias.bjorling@wdc.com> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NKeith Busch <keith.busch@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Cassel 提交于
Implements support for multiple I/O Command Sets. NVMe TP 4056 introduces a method to enumerate multiple command sets per namespace. If the command set is exposed, this method for enumeration will be used instead of the traditional method that uses the CC.CSS register command set register for command set identification. For namespaces where the Command Set Identifier is not supported or recognized, the specific namespace will not be created. Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NMatias Bjørling <matias.bjorling@wdc.com> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Baolin Wang 提交于
Use USEC_PER_SEC instead of magic numbers to make code more readable. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
In nvme_init_identify() when calculating submission queue size use u16 instead of int type in the min_t() since target variable ctrl->sqsize is of type u16. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
In nvme_configure_directives() when calculating number of streams use u16 instead of unsigned type in the min_t() since target variable ctrl->nr_streams is of type u16. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 02 7月, 2020 1 次提交
-
-
由 Sagi Grimberg 提交于
Commit 59c7c3ca intended to only silently ignore non retry-able errors (DNR bit set) such that we can still identify misbehaving controllers, and in the other hand propagate retry-able errors (DNR bit cleared) so we don't wrongly abandon a namespace just because it happens to be temporarily inaccessible. The goal remains the same as the original commit where this was introduced but unfortunately had the logic backwards. Fixes: 59c7c3ca ("nvme: fix possible hang when ns scanning fails during error recovery") Reported-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 01 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-