- 08 Apr, 2021 2 commits
-
-
Committed by Christoph Hellwig
Factor out a self-contained helper to just look up a mddev by the dev_t "unit".

Cc: stable@vger.kernel.org
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>
-
Committed by Zhao Heming
Commit d3374825 ("md: make devices disappear when they are no longer needed.") introduced protection between mddev creation and removal. md_open shouldn't create an mddev when the all_mddevs list doesn't contain it. With the current code logic, it is very easy to trigger a soft lockup in a non-preempt environment.

This patch changes the md_open return value from -ERESTARTSYS to -EBUSY, which breaks the infinite retry when md_open enters the racing area.

This patch is a partial fix for the soft lockup issue. The full fix needs mddev_find to be split into two functions: mddev_find and mddev_find_or_alloc. md_open should then call the new mddev_find, which only does the search. For more detail, please refer to Christoph's "split mddev_find" patch in later commits.

*** env ***
kvm-qemu VM 2C1G with 2 iscsi luns
kernel should be non-preempt

*** script ***
The script below triggers the lockup almost every time.
```
 1 node1="mdcluster1"
 2 node2="mdcluster2"
 3
 4 mdadm -Ss
 5 ssh ${node2} "mdadm -Ss"
 6 wipefs -a /dev/sda /dev/sdb
 7 mdadm -CR /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda \
   /dev/sdb --assume-clean
 8
 9 for i in {1..10}; do
10    echo ==== $i ====;
11
12    echo "test  ...."
13    ssh ${node2} "mdadm -A /dev/md0 /dev/sda /dev/sdb"
14    sleep 1
15
16    echo "clean  ....."
17    ssh ${node2} "mdadm -Ss"
18 done
```
I use an mdcluster env to trigger the soft lockup, but it isn't an mdcluster-specific bug. Stopping an md array in an mdcluster env does more work than stopping a non-cluster array, which leaves enough of a time gap for the kernel to run md_open.

*** stack ***
```
[  884.226509]  mddev_put+0x1c/0xe0 [md_mod]
[  884.226515]  md_open+0x3c/0xe0 [md_mod]
[  884.226518]  __blkdev_get+0x30d/0x710
[  884.226520]  ? bd_acquire+0xd0/0xd0
[  884.226522]  blkdev_get+0x14/0x30
[  884.226524]  do_dentry_open+0x204/0x3a0
[  884.226531]  path_openat+0x2fc/0x1520
[  884.226534]  ? seq_printf+0x4e/0x70
[  884.226536]  do_filp_open+0x9b/0x110
[  884.226542]  ? md_release+0x20/0x20 [md_mod]
[  884.226543]  ? seq_read+0x1d8/0x3e0
[  884.226545]  ? kmem_cache_alloc+0x18a/0x270
[  884.226547]  ? do_sys_open+0x1bd/0x260
[  884.226548]  do_sys_open+0x1bd/0x260
[  884.226551]  do_syscall_64+0x5b/0x1e0
[  884.226554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

*** rootcause ***
"mdadm -A" (or other array assemble commands) starts a daemon "mdadm --monitor" by default. When "mdadm -Ss" is running, the stop action wakes up "mdadm --monitor". The "--monitor" daemon immediately reads /proc/mdstat. At this time the mddev still exists in the kernel, so /proc/mdstat still shows the md device, which makes "mdadm --monitor" open /dev/md0.

The earlier "mdadm -Ss" is a removing action, while the "mdadm --monitor" open triggers md_open, which is a creating action. The two race.

```
<thread 1>: "mdadm -Ss"
md_release
  mddev_put deletes mddev from all_mddevs
  queue_work for mddev_delayed_delete
  at this time, "/dev/md0" is still available for opening

<thread 2>: "mdadm --monitor ..."
md_open
 + mddev_find can't find mddev of /dev/md0, and create a new mddev and
 |    return.
 + trigger "if (mddev->gendisk != bdev->bd_disk)" and return
      -ERESTARTSYS.
```

In a non-preempt kernel, <thread 2> occupies the current CPU, so mddev_delayed_delete, which was queued in <thread 1>, can't be scheduled.

A preempt kernel can also hit the race above, but the kernel doesn't allow one thread to run on a CPU all the time. After <thread 2> has run for some time, the later "mdadm -A" (see script line 13 above) calls md_alloc to allocate a new gendisk for the mddev. The md_open check "if (mddev->gendisk != bdev->bd_disk)" then no longer matches, md_open returns 0 to the caller, and the soft lockup is broken.

Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Zhao Heming <heming.zhao@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
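The split between a search-only lookup and a lookup-or-allocate helper described above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: `struct dev_entry`, `dev_find` and `dev_find_or_alloc` are hypothetical names, and locking and reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mddev entry; not the kernel structure. */
struct dev_entry {
    unsigned int unit;
    struct dev_entry *next;
};

/* Search only: returns NULL when the unit is not registered. */
static struct dev_entry *dev_find(struct dev_entry *head, unsigned int unit)
{
    for (struct dev_entry *e = head; e != NULL; e = e->next)
        if (e->unit == unit)
            return e;
    return NULL;
}

/* Search, and create a new entry at the list head when none exists. */
static struct dev_entry *dev_find_or_alloc(struct dev_entry **head,
                                           unsigned int unit)
{
    struct dev_entry *e;

    assert(head != NULL);
    e = dev_find(*head, unit);
    if (e != NULL)
        return e;

    e = calloc(1, sizeof(*e));
    if (e == NULL)
        return NULL;
    e->unit = unit;
    e->next = *head;
    *head = e;
    return e;
}
```

An open path that calls only `dev_find` can never re-create an entry that a concurrent teardown has just removed, which is the property the commit message asks for.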
-
- 06 Apr, 2021 15 commits
-
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1968:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'flags' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:930: warning: Function parameter or member 'rs' not described in 'drbd_determine_dev_size'
 drivers/block/drbd/drbd_nl.c:1148: warning: Function parameter or member 'dc' not described in 'drbd_check_al_size'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-12-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'dev' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: Function parameter or member 'id' not described in 'blkfront_probe'
 drivers/block/xen-blkfront.c:1960: warning: expecting prototype for Allocate the basic(). Prototype was for blkfront_probe() instead
 drivers/block/xen-blkfront.c:2085: warning: Function parameter or member 'dev' not described in 'blkfront_resume'
 drivers/block/xen-blkfront.c:2085: warning: expecting prototype for or a backend(). Prototype was for blkfront_resume() instead
 drivers/block/xen-blkfront.c:2444: warning: wrong kernel-doc identifier on line:

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: xen-devel@lists.xenproject.org
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210312105530.2219008-11-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1641: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-10-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c:278: warning: Function parameter or member 'connection' not described in 'tl_clear'
 drivers/block/drbd/drbd_main.c:278: warning: Excess function parameter 'device' description in 'tl_clear'
 drivers/block/drbd/drbd_main.c:489: warning: Function parameter or member 'cpu_mask' not described in 'drbd_calc_cpu_mask'
 drivers/block/drbd/drbd_main.c:528: warning: Excess function parameter 'device' description in 'drbd_thread_current_set_cpu'
 drivers/block/drbd/drbd_main.c:549: warning: Function parameter or member 'connection' not described in 'drbd_header_size'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'device' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1204: warning: Function parameter or member 'c' not described in 'send_bitmap_rle_or_plain'
 drivers/block/drbd/drbd_main.c:1335: warning: Function parameter or member 'peer_device' not described in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1335: warning: Excess function parameter 'device' description in '_drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Function parameter or member 'peer_device' not described in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1379: warning: Excess function parameter 'device' description in 'drbd_send_ack'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'connection' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'sock' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'buffer' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'size' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:1892: warning: Function parameter or member 'msg_flags' not described in 'drbd_send_all'
 drivers/block/drbd/drbd_main.c:3525: warning: Function parameter or member 'flags' not described in 'drbd_queue_bitmap_io'
 drivers/block/drbd/drbd_main.c:3563: warning: Function parameter or member 'flags' not described in 'drbd_bitmap_io'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-9-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 from drivers/block/drbd/drbd_nl.c:24:
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_set_role’:
 drivers/block/drbd/drbd_nl.c:793:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c:795:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_attach’:
 drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_connect’:
 drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]
 drivers/block/drbd/drbd_nl.c: In function ‘drbd_adm_disconnect’:
 drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from ‘enum drbd_state_rv’ to ‘enum drbd_ret_code’ [-Wenum-conversion]

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-8-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
[P_RETRY_WRITE] is initialised more than once.

Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_main.c: In function ‘cmdname’:
 drivers/block/drbd/drbd_main.c:3660:22: warning: initialized field overwritten [-Woverride-init]
 drivers/block/drbd/drbd_main.c:3660:22: note: (near initialization for ‘cmdnames[44]’)

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-7-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
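For readers unfamiliar with -Woverride-init, the following is a minimal sketch of the pattern behind the warning. The enum and table are hypothetical and are not the DRBD `cmdnames` array; they only show a designated-initializer table in which each index appears exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command set, for illustration only. */
enum cmd { CMD_READ, CMD_WRITE, CMD_RETRY_WRITE, CMD_COUNT };

static const char *const cmd_names[CMD_COUNT] = {
    [CMD_READ]        = "Read",
    [CMD_WRITE]       = "Write",
    [CMD_RETRY_WRITE] = "RetryWrite",
    /*
     * Adding a second [CMD_RETRY_WRITE] = "..." entry here would silently
     * replace the one above; GCC reports that under -Woverride-init.
     */
};

/* The table must have exactly one slot per command. */
static_assert(sizeof(cmd_names) / sizeof(cmd_names[0]) == CMD_COUNT,
              "cmd_names must cover every command");

/* Map a command number to its name, with a fallback for gaps. */
static const char *cmd_name(unsigned int cmd)
{
    if (cmd >= CMD_COUNT || cmd_names[cmd] == NULL)
        return "Unknown";
    return cmd_names[cmd];
}
```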
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_receiver.c:265: warning: Function parameter or member 'peer_device' not described in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:265: warning: Excess function parameter 'device' description in 'drbd_alloc_pages'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Function parameter or member 'connection' not described in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1362: warning: Excess function parameter 'device' description in 'drbd_may_finish_epoch'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'resource' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Function parameter or member 'bdev' not described in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1451: warning: Excess function parameter 'connection' description in 'drbd_bump_write_ordering'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'op_flags' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Function parameter or member 'fault_type' not described in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:1643: warning: Excess function parameter 'rw' description in 'drbd_submit_peer_request'
 drivers/block/drbd/drbd_receiver.c:3055: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_0p'
 drivers/block/drbd/drbd_receiver.c:3138: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_1p'
 drivers/block/drbd/drbd_receiver.c:3195: warning: Function parameter or member 'peer_device' not described in 'drbd_asb_recover_2p'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'peer_device' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'size' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'p' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4684: warning: Function parameter or member 'c' not described in 'receive_bitmap_plain'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'peer_device' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'p' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'c' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4738: warning: Function parameter or member 'len' not described in 'recv_bm_rle_bits'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'peer_device' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'p' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'c' not described in 'decode_bitmap_c'
 drivers/block/drbd/drbd_receiver.c:4807: warning: Function parameter or member 'len' not described in 'decode_bitmap_c'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-6-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_state.c:913: warning: Function parameter or member 'connection' not described in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:913: warning: Excess function parameter 'device' description in 'is_valid_soft_transition'
 drivers/block/drbd/drbd_state.c:1054: warning: Function parameter or member 'warn' not described in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1054: warning: Excess function parameter 'warn_sync_abort' description in 'sanitize_state'
 drivers/block/drbd/drbd_state.c:1703: warning: Function parameter or member 'state_change' not described in 'after_state_ch'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-5-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_standby_immediate’:
 drivers/block/mtip32xx/mtip32xx.c:1216:16: warning: variable ‘start’ set but not used [-Wunused-but-set-variable]

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-4-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lee Jones
Fixes the following W=1 kernel build warning(s):

 drivers/block/drbd/drbd_interval.c:11: warning: Function parameter or member 'node' not described in 'interval_end'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'root' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:26: warning: Function parameter or member 'this' not described in 'drbd_insert_interval'
 drivers/block/drbd/drbd_interval.c:70: warning: Function parameter or member 'root' not described in 'drbd_contains_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'root' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:96: warning: Function parameter or member 'this' not described in 'drbd_remove_interval'
 drivers/block/drbd/drbd_interval.c:113: warning: Function parameter or member 'root' not described in 'drbd_find_overlap'

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Link: https://lore.kernel.org/r/20210312105530.2219008-3-lee.jones@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
git://git.infradead.org/nvme
Committed by Jens Axboe
Pull NVMe updates from Christoph:

"nvme updates for Linux 5.13

 - fix handling of very large MDTS values (Bart Van Assche)
 - retrigger ANA log update if group descriptor isn't found (Hannes Reinecke)
 - fix locking contexts in nvme-tcp and nvmet-tcp (Sagi Grimberg)
 - return proper error code from discovery ctrl (Hou Pu)
 - verify the SGLS field in nvmet-tcp and nvmet-fc (Max Gurtovoy)
 - disallow passthru cmd from targeting a nsid != nsid of the block dev (Niklas Cassel)
 - do not allow model_number exceed 40 bytes in nvmet (Noam Gottlieb)
 - enable optional queue idle period tracking in nvmet-tcp (Mark Wunderlich)
 - various cleanups and optimizations (Chaitanya Kulkarni, Kanchan Joshi)
 - expose fast_io_fail_tmo in sysfs (Daniel Wagner)
 - implement non-MDTS command limits (Keith Busch)
 - reduce warnings for unhandled command effects (Keith Busch)
 - allocate storage for the SQE as part of the nvme_request (Keith Busch)"

* tag 'nvme-5.13-2021-04-06' of git://git.infradead.org/nvme: (33 commits)
  nvme: fix handling of large MDTS values
  nvme: implement non-mdts command limits
  nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
  nvme: retrigger ANA log update if group descriptor isn't found
  nvme: export fast_io_fail_tmo to sysfs
  nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
  nvme: use sysfs_emit instead of sprintf
  nvme-fc: check sgl supported by target
  nvme-tcp: check sgl supported by target
  nvmet-tcp: enable optional queue idle period tracking
  nvmet-tcp: fix incorrect locking in state_change sk callback
  nvme-tcp: block BH in sk state_change sk callback
  nvmet: return proper error code from discovery ctrl
  nvme: warn of unhandled effects only once
  nvme: use driver pdu command for passthrough
  nvme-pci: allocate nvme_command within driver pdu
  nvmet: do not allow model_number exceed 40 bytes
  nvmet: remove unnecessary ctrl parameter
  nvmet-fc: update function documentation
  nvme-fc: fix the function documentation comment
  ...
-
Committed by Bart Van Assche
Instead of triggering an integer overflow and undefined behavior if MDTS is large, set max_hw_sectors to UINT_MAX.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: rebased to account for the new nvme_mps_to_sectors helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
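The saturating conversion described above can be sketched as follows. This is an illustrative userspace function, not the kernel's nvme_mps_to_sectors helper; `mps_units_to_sectors` and its parameters are hypothetical, and it assumes the limit is a power-of-two exponent counted in units of a page of `1 << page_shift` bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Convert a power-of-two exponent, expressed in units of the minimum
 * page size, into 512-byte sectors. Saturates at UINT_MAX instead of
 * shifting past the width of unsigned int, which would be undefined.
 */
static unsigned int mps_units_to_sectors(uint8_t page_shift, uint8_t units)
{
    unsigned int shift;

    assert(page_shift >= 9);            /* a page holds at least one sector */
    shift = (unsigned int)units + page_shift - 9;
    if (shift >= sizeof(unsigned int) * CHAR_BIT)
        return UINT_MAX;
    return 1U << shift;
}
```

With a 4 KiB page (`page_shift` of 12) and an exponent of 5, the limit is 32 pages, which is 256 sectors; any exponent that would push the shift to the width of `unsigned int` or beyond yields `UINT_MAX`.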
-
Committed by Keith Busch
Commands that access LBA contents without a data transfer to or from the host historically have not had a spec-defined upper limit. The driver set the queue constraints for such commands to the max data transfer size just to be safe, but this artificial constraint frequently limits devices below their capabilities.

TP4040, ratified by the NVMe Workgroup, defines how a controller may advertise its non-MDTS limits. Use these if provided and default to the current constraints if not.

Since the Dataset Management command limits are defined in logical blocks, but without a namespace to tell us the logical block size, the code defaults to the safe 512b size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Niklas Cassel
When a passthru command targets a specific namespace, the ns parameter to nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no validation that the nsid specified in the passthru command targets the namespace/nsid represented by the block device that the ioctl was performed on.

Add a check that validates that the nsid in the passthru command matches that of the supplied namespace.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
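The check described above amounts to comparing two namespace IDs before the command is issued. A minimal sketch, assuming hypothetical `struct ns_handle` and `struct user_cmd` types rather than the driver's own structures, and assuming -EINVAL as the rejection code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the nvme driver's types. */
struct ns_handle { uint32_t ns_id; };
struct user_cmd  { uint8_t opcode; uint32_t nsid; };

/*
 * Reject a passthru command whose nsid differs from the namespace the
 * ioctl was issued on. A NULL ns models an ioctl on a handle that is
 * not bound to one namespace, where no comparison is possible.
 */
static int validate_passthru_nsid(const struct ns_handle *ns,
                                  const struct user_cmd *cmd)
{
    assert(cmd != NULL);
    if (ns != NULL && cmd->nsid != ns->ns_id)
        return -EINVAL;
    return 0;
}
```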
-
Committed by Hannes Reinecke
If ANA is enabled but no ANA group descriptor is found when creating a new namespace, the ANA log is most likely out of date, so trigger a re-read. The namespace will be tagged with the NS_ANA_PENDING flag to exclude it from path selection until the ANA log has been re-read.

Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems")
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 03 Apr, 2021 23 commits
-
-
Committed by Daniel Wagner
Commit 8c4dfea9 ("nvme-fabrics: reject I/O to offline device") introduced fast_io_fail_tmo but didn't export the value to sysfs. The value can be set during 'nvme connect'. Export the timeout value to user space via sysfs to allow runtime configuration.

Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Daniel Wagner
If there is an error we will leave the function early. So there is no need for an else. Remove it.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Daniel Wagner
sysfs_emit is the recommended API to use for formatting strings to be returned to user space. It is equivalent to scnprintf and aware of the PAGE_SIZE buffer size.

Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
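To illustrate what "equivalent to scnprintf and aware of the PAGE_SIZE buffer size" means in practice, here is a userspace approximation. It is not the kernel's sysfs_emit; `emit_bounded` and `SKETCH_PAGE_SIZE` are hypothetical, and the page size of 4096 is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size for this sketch */

/*
 * Format into a page-sized buffer and return the number of characters
 * actually stored (excluding the terminating NUL), never more than
 * SKETCH_PAGE_SIZE - 1. Plain sprintf has no such bound.
 */
static int emit_bounded(char *buf, const char *fmt, ...)
{
    va_list args;
    int len;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    len = vsnprintf(buf, SKETCH_PAGE_SIZE, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;                       /* encoding error: nothing usable */
    if (len >= SKETCH_PAGE_SIZE)
        return SKETCH_PAGE_SIZE - 1;    /* output was truncated */
    return len;
}
```

The return value differs from snprintf on truncation: it reports what fits in the buffer, so a caller can pass it on as the length of the data read.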
-
Committed by Max Gurtovoy
SGL support is mandatory for NVMe/FC; make sure that the target is aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Max Gurtovoy
SGL support is mandatory for NVMe/tcp; make sure that the target is aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Wunderlich, Mark
Add 'idle_poll_period_usecs' option used by io_work() to support network devices enabled with advanced interrupt moderation supporting a relaxed interrupt model. It was discovered that such a NIC used on the target was unable to support initiator connection establishment, caused by the existing io_work() flow that immediately exits after a loop with no activity and does not re-queue itself.

With this new option a queue is assigned a period of time during which no activity must occur before it becomes 'idle'. Until the queue is idle the work item is requeued.

The new module option is defined as changeable, making it flexible for testing purposes.

The pre-existing legacy behavior is preserved when no module option for idle_poll_period_usecs is specified.

Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
We are not changing anything in the TCP connection state so we should not take a write_lock but rather a read lock.

This caused a deadlock when running nvmet-tcp and nvme-tcp on the same system, where state_change callbacks on the host and on the controller side have a causal relationship and made lockdep report on this with blktests:

================================
WARNING: inconsistent lock state
5.12.0-rc3 #1 Tainted: G I
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
{IN-SOFTIRQ-W} state was registered at:
  __lock_acquire+0x79b/0x18d0
  lock_acquire+0x1ca/0x480
  _raw_write_lock_bh+0x39/0x80
  nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
  tcp_fin+0x2a8/0x780
  tcp_data_queue+0xf94/0x1f20
  tcp_rcv_established+0x6ba/0x1f00
  tcp_v4_do_rcv+0x502/0x760
  tcp_v4_rcv+0x257e/0x3430
  ip_protocol_deliver_rcu+0x69/0x6a0
  ip_local_deliver_finish+0x1e2/0x2f0
  ip_local_deliver+0x1a2/0x420
  ip_rcv+0x4fb/0x6b0
  __netif_receive_skb_one_core+0x162/0x1b0
  process_backlog+0x1ff/0x770
  __napi_poll.constprop.0+0xa9/0x5c0
  net_rx_action+0x7b3/0xb30
  __do_softirq+0x1f0/0x940
  do_softirq+0xa1/0xd0
  __local_bh_enable_ip+0xd8/0x100
  ip_finish_output2+0x6b7/0x18a0
  __ip_queue_xmit+0x706/0x1aa0
  __tcp_transmit_skb+0x2068/0x2e20
  tcp_write_xmit+0xc9e/0x2bb0
  __tcp_push_pending_frames+0x92/0x310
  inet_shutdown+0x158/0x300
  __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
  nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
  nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
  nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
  nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
  kernfs_fop_write_iter+0x2c7/0x460
  new_sync_write+0x36c/0x610
  vfs_write+0x5c0/0x870
  ksys_write+0xf9/0x1d0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 10687
hardirqs last enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
softirqs last enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940
softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(clock-AF_INET);
  <Interrupt>
    lock(clock-AF_INET);

 *** DEADLOCK ***

5 locks held by nvme/1324:
 #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
 #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300

stack backtrace:
CPU: 26 PID: 1324 Comm: nvme Tainted: G I 5.12.0-rc3 #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Call Trace:
 dump_stack+0x93/0xc2
 mark_lock_irq.cold+0x2c/0xb3
 ? verify_lock_unused+0x390/0x390
 ? stack_trace_consume_entry+0x160/0x160
 ? lock_downgrade+0x100/0x100
 ? save_trace+0x88/0x5e0
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 mark_lock+0x530/0x1470
 ? mark_lock_irq+0x1d10/0x1d10
 ? enqueue_timer+0x660/0x660
 mark_usage+0x215/0x2a0
 __lock_acquire+0x79b/0x18d0
 ? tcp_schedule_loss_probe.part.0+0x38c/0x520
 lock_acquire+0x1ca/0x480
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? rcu_read_unlock+0x40/0x40
 ? tcp_mtu_probe+0x1ae0/0x1ae0
 ? kmalloc_reserve+0xa0/0xa0
 ? sysfs_file_ops+0x170/0x170
 _raw_read_lock+0x3d/0xa0
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? sysfs_file_ops+0x170/0x170
 inet_shutdown+0x189/0x300
 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
 nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
 nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
 nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x2c7/0x460
 new_sync_write+0x36c/0x610
 ? new_sync_read+0x600/0x600
 ? lock_acquire+0x1ca/0x480
 ? rcu_read_unlock+0x40/0x40
 ? lock_is_held_type+0x9a/0x110
 vfs_write+0x5c0/0x870
 ksys_write+0xf9/0x1d0
 ? __ia32_sys_read+0xa0/0xa0
 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
 ? syscall_enter_from_user_mode+0x27/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Sagi Grimberg
The TCP stack can run from process context for a long time so we should disable BH here.

Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Hou Pu
Return NVME_SC_INVALID_FIELD from the discovery controller, as a normal controller does, when executing an identify or get log page command.

Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Keith Busch
We don't need to repeatedly spam the kernel logs with the same warning about unhandled passthrough IO effects. Just one warning is sufficient to observe that this condition occurs.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
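The warn-only-once idea can be sketched in userspace with a flag that latches after the first message. This is an illustration only; `warn_once` is a hypothetical helper and does not reproduce the kernel's logging macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print msg the first time this is called for a given flag, then stay
 * silent. Returns true only when the message was actually printed.
 * Not thread-safe: a concurrent caller could print a duplicate.
 */
static bool warn_once(bool *warned, const char *msg)
{
    assert(warned != NULL && msg != NULL);
    if (*warned)
        return false;
    *warned = true;
    fprintf(stderr, "warning: %s\n", msg);
    return true;
}
```

A caller would keep one `static bool` per warning site, so each distinct condition is reported at most once per process lifetime.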
-
Committed by Keith Busch
All nvme transport drivers preallocate an nvme command for each request. Use that command in nvme_setup_cmd() instead of requiring drivers to pass a pointer to it. All nvme drivers must initialize the generic nvme_request 'cmd' to point to the transport's preallocated nvme_command.

The generic nvme_request cmd pointer had previously been used only as a temporary copy for passthrough commands. Since it now points to the command that gets dispatched, passthrough commands must directly set it up prior to executing the request.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Keith Busch
Except for pci, all the nvme transport drivers allocate a command within the driver's pdu. Align pci with everyone else by allocating the nvme command within pci's pdu and replace the .queue_rq() stack variable with this.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Noam Gottlieb
According to the NVM specifications, the model number size should be 40 bytes (bytes 63:24 of the Identify Controller data structure). Therefore, any attempt to store a value into model_number which exceeds 40 bytes should return an error. Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
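As an illustration of the length check, here is a small userspace sketch. The function name, the buffer layout, the newline handling, and the use of -EINVAL are assumptions for the example; only the 40-byte limit comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Bytes 63:24 of the Identify Controller data structure: 40 bytes. */
#define DEMO_MN_LEN 40

/*
 * Hypothetical store routine: copy src (up to a trailing newline) into
 * dst, or fail if it would not fit in the 40-byte model number field.
 */
static int demo_store_model_number(char dst[DEMO_MN_LEN + 1], const char *src)
{
	size_t len = strcspn(src, "\n");

	if (len > DEMO_MN_LEN)
		return -EINVAL;	/* assumed error code */

	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}
```

A 40-character value is accepted, while a 41-character value is rejected without modifying the stored model number.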
-
Committed by Chaitanya Kulkarni
The function nvmet_ctrl_find_get() accepts an out pointer to the nvmet_ctrl structure. It returns the same error value, NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR, from two places. Move this to the caller so that we can change the return type to nvmet_ctrl. Now that we can change the return type, remove the out-pointer parameter and instead return a valid nvmet_ctrl pointer on success and NULL on failure. Also, add and rename the goto labels, with comments, for better readability. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
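The shape of this refactoring can be sketched with stand-in types. Everything below is hypothetical (the table of controllers, the status constants, the reference counter); it only shows a lookup that returns a pointer or NULL, with the caller mapping NULL to the single error status.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct nvmet_ctrl. */
struct demo_ctrl {
	uint16_t cntlid;
	int refs;
};

static struct demo_ctrl demo_ctrls[] = {
	{ .cntlid = 1 },
	{ .cntlid = 2 },
};

/* Placeholder status values, not the real NVMe status codes. */
#define DEMO_STATUS_OK			0
#define DEMO_STATUS_INVALID_PARAM	1

/* Return the matching controller with a reference taken, or NULL. */
static struct demo_ctrl *demo_ctrl_find_get(uint16_t cntlid)
{
	size_t i;

	for (i = 0; i < sizeof(demo_ctrls) / sizeof(demo_ctrls[0]); i++) {
		if (demo_ctrls[i].cntlid == cntlid) {
			demo_ctrls[i].refs++;
			return &demo_ctrls[i];
		}
	}
	return NULL;
}

/* The caller now owns the translation from NULL to an error status. */
static int demo_connect(uint16_t cntlid)
{
	struct demo_ctrl *ctrl = demo_ctrl_find_get(cntlid);

	if (!ctrl)
		return DEMO_STATUS_INVALID_PARAM;
	return DEMO_STATUS_OK;
}
```

With one failure value left, the lookup no longer needs both an out parameter and a status return, and the error code is written in one place.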
-
Committed by Chaitanya Kulkarni
Add a minimal description of the hosthandle parameter for nvmet_fc_rcv_ls_req() so that we can get rid of the following warning: drivers/nvme//target/fc.c:2009: warning: Function parameter or member 'hosthandle' not described in 'nvmet_fc_rcv_ls_req' Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Chaitanya Kulkarni
The first argument of nvme_fc_rcv_ls_req() is a pointer to the remoteport named portptr, but the documentation comment names it remoteport. Fix that to get rid of the compilation warnings: drivers/nvme//host/fc.c:1724: warning: Function parameter or member 'portptr' not described in 'nvme_fc_rcv_ls_req' drivers/nvme//host/fc.c:1724: warning: Excess function parameter 'remoteport' description in 'nvme_fc_rcv_ls_req' Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Chaitanya Kulkarni
Add a blank line after the variable declarations in nvme_pr_preempt(), nvme_pr_clear(), and nvme_pr_release(), matching the rest of the code in nvme/host/core.c. No functional change in this patch. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Chaitanya Kulkarni
nvme_clear_nvme_request() checks the RQF_DONTPREP flag and is called from nvme_init_request() and nvme_setup_cmd(). nvme_init_request() is called from nvme_alloc_request() and nvme_alloc_request_qid(), and both of these callers allocate a new request every time. RQF_DONTPREP is never set on a newly allocated request: after getting a tag, the block layer sets req->rq_flags = 0 and never sets RQF_DONTPREP when returning the request:

    nvme_alloc_request()
      blk_mq_alloc_request()
        blk_mq_rq_ctx_init()
          rq->rq_flags = 0 <----

    nvme_alloc_request_qid()
      blk_mq_alloc_request_hctx()
        blk_mq_rq_ctx_init()
          rq->rq_flags = 0 <----

The block layer does set other bits in req->rq_flags, but RQF_DONTPREP is not one of them; that flag is set by the driver. That means we can unconditionally set RQF_DONTPREP in rq->rq_flags when nvme_init_request()->nvme_clear_nvme_request() is called from the two callers above. Move the check for RQF_DONTPREP from nvme_clear_nvme_request() into nvme_setup_cmd(). This avoids unnecessary checks in the fast path, which matters because nvme_alloc_request() is now called from the fast path when the NVMeOF target is configured with the passthru backend. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
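The control flow after this change might look roughly like the sketch below. The types, the flag value, and the fields that get reset are hypothetical stand-ins for illustration; only the placement of the flag check follows the commit message.

```c
#include <stdint.h>

/* Hypothetical flag bit standing in for RQF_DONTPREP. */
#define DEMO_RQF_DONTPREP (1u << 0)

/* Hypothetical request with the driver state that gets reset. */
struct demo_request {
	uint32_t rq_flags;
	int retries;
	uint32_t nvme_flags;
};

/* Unconditional now: reset driver state and mark the request prepared. */
static void demo_clear_request(struct demo_request *rq)
{
	rq->retries = 0;
	rq->nvme_flags = 0;
	rq->rq_flags |= DEMO_RQF_DONTPREP;
}

/* Allocation path: the request is new, so no flag check is needed. */
static void demo_init_request(struct demo_request *rq)
{
	rq->rq_flags = 0;	/* what the block layer does on allocation */
	demo_clear_request(rq);
}

/* Setup path: may see a requeued request, so the check lives here. */
static void demo_setup_cmd(struct demo_request *rq)
{
	if (!(rq->rq_flags & DEMO_RQF_DONTPREP))
		demo_clear_request(rq);
}
```

A request that passes through `demo_setup_cmd()` a second time keeps its retry count, while the allocation path skips the test it could never fail.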
-
Committed by Chaitanya Kulkarni
Since the nvmet_setup_passthru() function is in the fast path when called from the NVMeOF passthru backend, make it inline. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Chaitanya Kulkarni
The function nvme_init_ctrl_finish() (formerly nvme_init_identify()) has grown over time to about 200 lines, given the size of the nvme_id_ctrl data structure. Move the nvme_id_ctrl-related initialization into a helper, nvme_init_identify(), and call it from nvme_init_ctrl_finish(). While moving the code into nvme_init_identify(), change the local variable i from int to unsigned int, remove the duplicate kfree() after nvme_mpath_init(), and jump to the label out_free if nvme_mpath_init() fails. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Chaitanya Kulkarni
This is a prep patch so that we can move the identify data structure initialization code from nvme_init_identify() into a helper. Rename the function nvme_init_identify() to nvme_init_ctrl_finish(). The next patch will move the nvme_id_ctrl-related initialization from the newly renamed function nvme_init_ctrl_finish() into the nvme_init_identify() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Kanchan Joshi
For passthrough I/O commands, effects are usually zero. In that case nvme_passthrough_end() performs three checks to no purpose. Bail out early and skip the function call and its checks. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
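A rough sketch of the early bail-out, using hypothetical names and a call counter in place of the real post-command handling. It assumes the caller already knows the effects value before deciding whether to run the end-of-command checks.

```c
#include <stdint.h>

/* Hypothetical counter, present only so the behaviour can be observed. */
static int demo_end_calls;

/* Stand-in for the post-command handler and its effect checks. */
static void demo_passthru_end(uint32_t effects)
{
	(void)effects;
	demo_end_calls++;
}

/* Skip the handler entirely in the common case of zero effects. */
static void demo_execute_passthru(uint32_t effects)
{
	/* ... the command itself would be submitted here ... */
	if (effects)
		demo_passthru_end(effects);
}
```

For the usual zero-effects command the handler is never entered, so neither the function call nor its checks cost anything.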
-
Committed by Kanchan Joshi
Use the proper macro instead of a hard-coded value. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-