- 13 8月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Use the nvme-internal NVME_NSHEAD_DISK_LIVE flag instead of abusing the block layer state. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210809064028.1198327-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 7月, 2021 1 次提交
-
-
由 Hannes Reinecke 提交于
When the last path to a ns_head drops the current code removes the ns_head from the subsystem list, but will only delete the disk itself if the last reference to the ns_head drops. This is causing an refcounting imbalance eg when applications have a reference to the disk, as then they'll never get notified that the disk is in fact dead. This patch moves the call 'del_gendisk' into nvme_mpath_check_last_path(), ensuring that the disk can be properly removed and applications get the appropriate notifications. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 16 6月, 2021 2 次提交
-
-
由 Daniel Wagner 提交于
Commit ce86dad2 ("nvme-multipath: reset bdev to ns head when failover") moved the reset code where the bio is added to the requeue_list for the failover path. But it left the original bio_set_dev in nvme_requeue_work. There is a second path to nvme_requee_work. It is via nvme_ns_head_submit_bio. Though we don't have to set bio->bi_bdev for this path either, as it points to the correct bdev already. Let's remove the bio_set_dev. It's updating the bio->bi_bdev with the same pointer and thus it's unnecessary. Signed-off-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Daniel Wagner 提交于
The controller is required to have a non-zero MNAN value if it supports ANA: If the controller supports Asymmetric Namespace Access Reporting, then this field shall be set to a non-zero value that is less than or equal to the NN value. Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 03 6月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Split multipath support out of nvme_report_zones into a separate helper and simplify the non-multipath version as a result. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
- 01 6月, 2021 2 次提交
-
-
由 Christoph Hellwig 提交于
Convert the nvme-multipath driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-19-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Automatically set the GENHD_FL_EXT_DEVT flag for all disks allocated without an explicit number of minors. This is what all new block drivers should do, so make sure it is the default without boilerplate code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NLuis Chamberlain <mcgrof@kernel.org> Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 5月, 2021 1 次提交
-
-
由 Hou Pu 提交于
The new ana_log_size should be used instead of the old one. Or kernel NULL pointer dereference will happen like below: [ 38.957849][ T69] BUG: kernel NULL pointer dereference, address: 000000000000003c [ 38.975550][ T69] #PF: supervisor write access in kernel mode [ 38.975955][ T69] #PF: error_code(0x0002) - not-present page [ 38.976905][ T69] PGD 0 P4D 0 [ 38.979388][ T69] Oops: 0002 [#1] SMP NOPTI [ 38.980488][ T69] CPU: 0 PID: 69 Comm: kworker/0:2 Not tainted 5.12.0+ #54 [ 38.981254][ T69] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 38.982502][ T69] Workqueue: events nvme_loop_execute_work [ 38.985219][ T69] RIP: 0010:memcpy_orig+0x68/0x10f [ 38.986203][ T69] Code: 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 4c 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 <4c> 89 47 f8 4c 89 4f f0 4c 89 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2 [ 38.987677][ T69] RSP: 0018:ffffc900001b7d48 EFLAGS: 00000287 [ 38.987996][ T69] RAX: 0000000000000020 RBX: 0000000000000024 RCX: 0000000000000010 [ 38.988327][ T69] RDX: ffffffffffffffe4 RSI: ffff8881084bc004 RDI: 0000000000000044 [ 38.988620][ T69] RBP: 0000000000000024 R08: 0000000100000000 R09: 0000000000000000 [ 38.988991][ T69] R10: 0000000100000000 R11: 0000000000000001 R12: 0000000000000024 [ 38.989289][ T69] R13: ffff8881084bc000 R14: 0000000000000000 R15: 0000000000000024 [ 38.989845][ T69] FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000 [ 38.990234][ T69] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 38.990490][ T69] CR2: 000000000000003c CR3: 00000001085b2000 CR4: 00000000000006f0 [ 38.991105][ T69] Call Trace: [ 38.994157][ T69] sg_copy_buffer+0xb8/0xf0 [ 38.995357][ T69] nvmet_copy_to_sgl+0x48/0x6d [ 38.995565][ T69] nvmet_execute_get_log_page_ana+0xd4/0x1cb [ 38.995792][ T69] nvmet_execute_get_log_page+0xc9/0x146 [ 38.995992][ T69] nvme_loop_execute_work+0x3e/0x44 [ 38.996181][ T69] process_one_work+0x1c3/0x3c0 [ 38.996393][ T69] worker_thread+0x44/0x3d0 [ 38.996600][ T69] ? cancel_delayed_work+0x90/0x90 [ 38.996804][ T69] kthread+0xf7/0x130 [ 38.996961][ T69] ? kthread_create_worker_on_cpu+0x70/0x70 [ 38.997171][ T69] ret_from_fork+0x22/0x30 [ 38.997705][ T69] Modules linked in: [ 38.998741][ T69] CR2: 000000000000003c [ 39.000104][ T69] ---[ end trace e719927b609d0fa0 ]--- Fixes: 5e1f6899 ("nvme-multipath: fix double initialization of ANA state") Signed-off-by: NHou Pu <houpu.main@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 12 5月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
nvme_init_identify and thus nvme_mpath_init can be called multiple times and thus must not overwrite potentially initialized or in-use fields. Split out a helper for the basic initialization when the controller is initialized and make sure the init_identify path does not blindly change in-use data structures. Fixes: 0d0b660f ("nvme: add ANA support") Reported-by: NMartin Wilck <mwilck@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NHannes Reinecke <hare@suse.de>
-
- 04 5月, 2021 1 次提交
-
-
由 Daniel Wagner 提交于
When a request finally completes in end_io() after it has failed over, the bdev pointer can be stale and thus the system can crash. Set the bdev back to ns head, so the request is map to an active path when resubmitted. Signed-off-by: NDaniel Wagner <dwagner@suse.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 22 4月, 2021 1 次提交
-
-
由 Minwoo Im 提交于
Userspace has not been allowed to I/O to device that's failed to be initialized. This patch introduces generic per-namespace character device to allow userspace to I/O regardless the block device is there or not. The chardev naming convention will similar to the existing blkdev naming, using a ng prefix instead of nvme, i.e. - /dev/ngXnY It also supports multipath which means it will not expose chardev for the hidden namespace blkdevs (e.g., nvmeXcYnZ). If /dev/ngXnY is created for a ns_head, then I/O request will be routed to a specific controller selected by the iopolicy of the subsystem. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Tested-by: NKanchan Joshi <joshi.k@samsung.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 15 4月, 2021 3 次提交
-
-
由 Christoph Hellwig 提交于
Move the multipath block_device_operations to multipath.c, where they belong. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
由 Christoph Hellwig 提交于
Return false from nvme_set_disk_name and let the caller set the non-multipath name instead of duplicating the naming information in two places. Also remove the pointless local variables for the disk name and flags and the not needed ctrl argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NJavier González <javier.gonz@samsung.com>
-
由 Niklas Cassel 提交于
There is a single trailing whitespace in multipath.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it. Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 06 4月, 2021 1 次提交
-
-
由 Hannes Reinecke 提交于
If ANA is enabled but no ANA group descriptor is found when creating a new namespace the ANA log is most likely out of date, so trigger a re-read. The namespace will be tagged with the NS_ANA_PENDING flag to exclude it from path selection until the ANA log has been re-read. Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems") Reported-by: NMartin George <marting@netapp.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 03 4月, 2021 1 次提交
-
-
由 Daniel Wagner 提交于
sysfs_emit is the recommended API to use for formatting strings to be returned to user space. It is equivalent to scnprintf and aware of the PAGE_SIZE buffer size. Suggested-by: NChaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> Signed-off-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 10 2月, 2021 1 次提交
-
-
由 Keith Busch 提交于
The bio based drivers only require the request_queue's nr_zones is set, so set this field in the head if the namespace path is zoned. Fixes: 240e6ee2 ("nvme: support for zoned namespaces") Reported-by: NMinwoo Im <minwoo.im.dev@gmail.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 29 1月, 2021 1 次提交
-
-
由 Daniel Wagner 提交于
nvme_round_robin_path() should test if the return ns pointer is valid. nvme_next_ns() will return a NULL pointer if there is no path left. Fixes: 75c10e73 ("nvme-multipath: round-robin I/O policy") Signed-off-by: NDaniel Wagner <dwagner@suse.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 26 1月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Always use the bio_set_dev helper to assign ->bi_bdev to make sure other state related to the device is uptodate. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 1月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 12月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The request_queue can trivially be derived from the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 12月, 2020 1 次提交
-
-
由 Victor Gladkov 提交于
Commands get stuck while Host NVMe-oF controller is in reconnect state. The controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until a successful reconnect or until the reconnect time-out is reached. The default reconnect time out is 10 minutes. Applications are expecting commands to complete with success or error within a certain timeout (30 seconds by default). The NVMe host is enforcing that timeout while it is connected, but during reconnect the timeout is not enforced and commands may get stuck for a long period or even forever. To fix this long delay due to the default timeout, introduce new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the controller reconnect and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (similar to current behavior). Signed-off-by: NVictor Gladkov <victor.gladkov@kioxia.com> Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChao Leng <lengchao@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 25 9月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The BDI_CAP_STABLE_WRITES is one of the few bits of information in the backing_dev_info shared between the block drivers and the writeback code. To help untangling the dependency replace it with a queue flag and a superblock flag derived from it. This also helps with the case of e.g. a file system requiring stable writes due to its own checksumming, but not forcing it on other users of the block device like the swap code. One downside is that we an't support the stable_pages_required bdi attribute in sysfs anymore. It is replaced with a queue attribute which also is writable for easier testing. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 8月, 2020 3 次提交
-
-
由 Christoph Hellwig 提交于
Lift all the code to decide the dispostition of a completed command from nvme_complete_rq and nvme_failover_req into a new helper, which returns an emum of the potential actions. nvme_complete_rq then just switches on those and calls the proper helper for the action. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Martin Wilck 提交于
If we find an optimized path, we quit the loop immediately. Thus we can use just one variable for the next path, slighly simplifying the code. Signed-off-by: NMartin Wilck <mwilck@suse.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Martin Wilck 提交于
If there's only one usable, non-optimized path, nvme_round_robin_path() returns NULL, which is wrong. Fix it by falling back to "old", like in the single optimized path case. Also, if the active path isn't changed, there's no need to re-assign the pointer. Fixes: 3f6e3246 ("nvme-multipath: fix logic for non-optimized paths") Signed-off-by: NMartin Wilck <mwilck@suse.com> Signed-off-by: NMartin George <marting@netapp.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 7月, 2020 3 次提交
-
-
由 Hannes Reinecke 提交于
When nvme_round_robin_path() finds a valid namespace we should be using it; falling back to __nvme_find_path() for non-optimized paths will cause the result from nvme_round_robin_path() to be ignored for non-optimized paths. Fixes: 75c10e73 ("nvme-multipath: round-robin I/O policy") Signed-off-by: NMartin Wilck <mwilck@suse.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Martin Wilck 提交于
Handle the special case where we have exactly one optimized path, which we should keep using in this case. Fixes: 75c10e73 ("nvme-multipath: round-robin I/O policy") Signed off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f ("nvme: add ANA support") Reported-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 08 7月, 2020 1 次提交
-
-
由 Keith Busch 提交于
The Commands Supported and Effects log page was extended with a CSI field that enables the host to query the log page for each command set supported. Retrieve this log page for each command set that an attached namespace supports, and save a pointer to that log in the namespace head. Reviewed-by: NMatias Bjørling <matias.bjorling@wdc.com> Reviewed-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NKeith Busch <keith.busch@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 02 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
For private namespaces ns->head_disk is NULL, so add a NULL check before updating the BDI capabilities. Fixes: b2ce4d90 ("nvme-multipath: set bdi capabilities once") Reported-by: NAvinash M N <Avinash.M.N@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
-
- 01 7月, 2020 4 次提交
-
-
由 Christoph Hellwig 提交于
Now that submit_bio_noacct has a decent blk-mq fast path there is no more need for this bypass. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The queue can be trivially derived from the bio, so pass one less argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 6月, 2020 4 次提交
-
-
由 Sagi Grimberg 提交于
The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1]. Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference. [1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: NAnton Eidelman <anton@lightbitslabs.com> Tested-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Anton Eidelman 提交于
In the following scenario scan_work and ana_work will deadlock: When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on ANY ctrl will not be able to complete nvme_mpath_set_live() on the same ns->head, which is required in order to update the new accessible path and remove NVME_NS_ANA_PENDING.. Therefore IO never completes: deadlock [1]. Fix: Move device_add_disk out of the head->lock and protect it with an atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit. [1]: kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:2 D 0 160 2 0x80004000 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:4 D 0 439 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x256/0x390 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f ("nvme: add ANA support") Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
Right now ns->head->lock is protecting namespace mutation which is wrong and unneeded. Move it to only protect against head mutations. While we're at it, remove unnecessary ns->head reference as we already have head pointer. The problem with this is that the head->lock spans mpath disk node I/O that may block under some conditions (if for example the controller is disconnecting or the path became inaccessible), The locking scheme does not allow any other path to enable itself, preventing blocked I/O to complete and forward-progress from there. This is a preparation patch for the fix in a subsequent patch where the disk I/O will also be done outside the head->lock. Fixes: 0d0b660f ("nvme: add ANA support") Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Anton Eidelman 提交于
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO. In order to recover and complete the IO ana_work on the same ctrl should be able to update the path state and remove NVME_NS_ANA_PENDING. The deadlock occurs because scan_work keeps holding ana_lock, so ana_work hangs [1]. Fix: Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy of the ANA group desc, and then calls nvme_update_ns_ana_state() without holding ana_lock. [1]: kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? select_task_rq_fair+0x1aa/0x5c0 kernel: ? kvm_sched_clock_read+0x11/0x20 kernel: ? sched_clock+0x9/0x10 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x35/0x40 Fixes: 0d0b660f ("nvme: add ANA support") Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 10 5月, 2020 1 次提交
-
-
由 Keith Busch 提交于
Improve code readability by defining the specification's constants that the driver is using when decoding identification payloads. Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NBart van Assche <bvanassche@acm.org> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-