- March 29, 2023 (1 commit)

Submitted by Zhong Jinghua
hulk inclusion
category: bugfix
bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162

----------------------------------------

This reverts commit 51e35e67.

There is a newer fix for this problem in mainline, so this patch should
return to the mainline solution.

Mainline patch: d36a9ea5 ("block: fix use-after-free of
q->q_usage_counter")

Fixes: 51e35e67 ("block: fix null-deref in percpu_ref_put")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

- December 12, 2022 (1 commit)

Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65K8D
CVE: NA

--------------------------------

Before commit f60df4a0 ("blk-mq: fix kabi broken in struct request"),
drivers got the cmd address right after the request; after that commit,
drivers get the cmd address after request_wrapper instead, which is
bigger than request and causes compatibility issues.

Fix the problem by placing request_wrapper behind cmd, so that the cmd
address seen by drivers stays the same.

Before commit:   |request|cmd|
After commit:    |request|request_wrapper|cmd|
With this patch: |request|cmd|request_wrapper|

Performance test: arm64 Kunpeng-920, 96 cores

1) null_blk setup:

    modprobe null_blk nr_devices=0 && udevadm settle &&
    cd /sys/kernel/config/nullb && mkdir nullb0 && cd nullb0 &&
    echo 0 > completion_nsec && echo 512 > blocksize &&
    echo 0 > home_node && echo 0 > irqmode && echo 1024 > size &&
    echo 0 > memory_backed && echo 2 > queue_mode &&
    echo 4096 > hw_queue_depth && echo 96 > submit_queues &&
    echo 1 > power

2) fio test script:

    [global]
    ioengine=libaio
    direct=1
    numjobs=96
    iodepth=32
    bs=4k
    rw=randwrite
    allow_mounted_write=0
    time_based
    runtime=60
    group_reporting=1
    ioscheduler=none
    cpus_allowed_policy=split
    cpus_allowed=0-95

    [test]
    filename=/dev/nullb0

3) iops test result:

    without this patch: 23.9M
    with this patch:    24.1M

Fixes: f60df4a0 ("blk-mq: fix kabi broken in struct request")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
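
To make the layout concrete, here is a minimal sketch of the pointer
arithmetic this implies, assuming the driver cmd size is available from
the tag set; the helper name and the wrapper's field are illustrative,
not the exact openEuler code:

    /*
     * Hedged sketch: with the |request|cmd|request_wrapper| layout, the
     * wrapper sits past the request and the driver cmd payload.
     */
    struct request_wrapper {
            u64 stat_time_ns;       /* illustrative extension field */
    };

    static inline struct request_wrapper *request_to_wrapper(struct request *rq)
    {
            return (void *)rq + sizeof(struct request) +
                   rq->q->tag_set->cmd_size;
    }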

- November 2, 2022 (3 commits)

Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by John Garry
mainline inclusion
from mainline-v5.16-rc1
commit e155b0c2
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e155b0c238b20f0a866f4334d292656665836c8a

--------------------------------

Currently we use separate sbitmap pairs and an active_queues atomic_t
for shared sbitmap support. However, a full set of static requests is
allocated per HW queue, which is quite wasteful, considering that the
total number of requests usable at any given time across all HW queues
is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the shared sbitmap
case to allocate a set of static rqs per tag set or request queue, not
per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with shared
tags per tagset and request queue, which will hold a set of shared
static rqs.

Since there is now no valid HW queue index to be passed to the
blk_mq_ops .init and .exit_request callbacks, pass an invalid index
token. This changes the semantics of the APIs, such that the callback
needs to validate the HW queue index before using it. Currently no user
of the shared sbitmap actually uses the HW queue index (as would be
expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: c6f9c0e2 ("blk-mq: allow hardware queue to get more tag while
sharing a tag set") is merged, which causes lots of conflicts for this
patch; the functionality is adapted in that patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
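
As a sketch of what the changed callback semantics mean for a driver,
assuming the invalid-index token is the BLK_MQ_NO_HCTX_IDX definition
this change introduces (the foo_* names are illustrative):

    /* Validate the hctx index now that shared tags pass an invalid
     * token to .init_request / .exit_request. */
    static int foo_init_request(struct blk_mq_tag_set *set, struct request *rq,
                                unsigned int hctx_idx, unsigned int numa_node)
    {
            if (hctx_idx != BLK_MQ_NO_HCTX_IDX) {
                    /* per-hctx setup only when a real index is passed */
                    foo_bind_cmd_to_hwq(blk_mq_rq_to_pdu(rq), hctx_idx);
            }
            return 0;
    }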

Submitted by Zhang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5N162
CVE: NA

--------------------------------

blk_cleanup_queue() uses "wait_event(q->mq_freeze_wq,
percpu_ref_is_zero(&q->q_usage_counter))" to wait for q_usage_counter
to drop to zero. However, if q_usage_counter reaches zero quickly,
percpu_ref_exit() may run and free ref->data while another process is
still inside percpu_ref_put(), causing a null-deref like the following:

    CPU0                               CPU1
    blk_cleanup_queue
     blk_freeze_queue
      blk_mq_freeze_queue_wait
                                       scsi_end_request
                                        percpu_ref_get
                                        ...
                                        percpu_ref_put
                                         atomic_long_sub_and_test
     percpu_ref_exit
      ref->data -> NULL
                                          ref->data->release(ref) -> null-deref

Fix it by adding a flag (QUEUE_FLAG_USAGE_COUNT_SYNC) as a
synchronization mechanism: the flag is set when ref->data->release is
called, and the wait_event in blk_mq_freeze_queue_wait must also wait
for the flag to become true, which prevents percpu_ref_exit() from
executing ahead of time.

Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
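
A rough sketch of the ordering this creates, using the flag and wait
condition named in the message (how the flag is cleared on queue
re-initialization is omitted; surrounding details are illustrative):

    static void blk_queue_usage_counter_release(struct percpu_ref *ref)
    {
            struct request_queue *q =
                    container_of(ref, struct request_queue, q_usage_counter);

            /* mark that the release callback has finished running... */
            blk_queue_flag_set(QUEUE_FLAG_USAGE_COUNT_SYNC, q);
            wake_up_all(&q->mq_freeze_wq);
    }

    void blk_mq_freeze_queue_wait(struct request_queue *q)
    {
            /* ...so the freezer cannot reach percpu_ref_exit() before it */
            wait_event(q->mq_freeze_wq,
                       percpu_ref_is_zero(&q->q_usage_counter) &&
                       test_bit(QUEUE_FLAG_USAGE_COUNT_SYNC, &q->queue_flags));
    }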

- July 13, 2022 (1 commit)

Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I57S8D
CVE: NA

--------------------------------

Since there are no reserved fields in struct request, declare a wrapper
to fix the kabi breakage.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
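
A minimal sketch of the wrapper idiom, assuming new fields simply wrap
the kabi-frozen structure (the extension field is illustrative):

    /* Extend struct request without changing its kabi-frozen layout:
     * existing code keeps using rq, new code goes through the wrapper. */
    struct request_wrapper {
            struct request rq;
            u64 stat_time_ns;       /* illustrative new field */
    };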

- March 21, 2022 (1 commit)

Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: 186389, https://gitee.com/openeuler/kernel/issues/I4Y43S
CVE: NA

--------------------------------

blk_mq_realloc_hw_ctxs() will free 'queue_hw_ctx' (e.g. when updating
submit_queues through configfs for null_blk), while it might still be
used from another context (e.g. when switching the elevator to none):

    t1                                  t2
    elevator_switch
     blk_mq_unquiesce_queue
      blk_mq_run_hw_queues
       queue_for_each_hw_ctx
        // assembly code for hctx = (q)->queue_hw_ctx[i]
        mov 0x48(%rbp),%rdx  -> read old queue_hw_ctx
                                        __blk_mq_update_nr_hw_queues
                                         blk_mq_realloc_hw_ctxs
                                          hctxs = q->queue_hw_ctx
                                          q->queue_hw_ctx = new_hctxs
                                          kfree(hctxs)
        movslq %ebx,%rax
        mov (%rdx,%rax,8),%rdi  -> uaf

Since the queue is frozen in __blk_mq_update_nr_hw_queues(), fix the
problem by protecting 'queue_hw_ctx' through RCU where it can be
accessed without grabbing 'q_usage_counter'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
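
A rough sketch of the RCU pattern described, with illustrative helper
names (the actual patch touches blk_mq_realloc_hw_ctxs() and the
queue_for_each_hw_ctx readers):

    /* Reader: pick up the array under RCU instead of relying on
     * q_usage_counter. */
    static struct blk_mq_hw_ctx *blk_mq_get_hctx(struct request_queue *q,
                                                 unsigned int i)
    {
            struct blk_mq_hw_ctx *hctx;

            rcu_read_lock();
            hctx = rcu_dereference(q->queue_hw_ctx)[i];
            rcu_read_unlock();

            return hctx;
    }

    /* Resizer: publish the new array, free the old one only after all
     * RCU readers are done with it. */
    static void blk_mq_swap_hw_ctx_array(struct request_queue *q,
                                         struct blk_mq_hw_ctx **new_hctxs)
    {
            struct blk_mq_hw_ctx **old = q->queue_hw_ctx;

            rcu_assign_pointer(q->queue_hw_ctx, new_hctxs);
            synchronize_rcu();
            kfree(old);
    }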

- March 8, 2022 (3 commits)

Submitted by Yu Kuai
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------

If pending_queues is increased once, it will only be decreased when
nr_active reaches zero, which leads to under-utilization of host tags:
while pending_queues is non-zero, the available tags for the queue are
max(host tags / active_queues, 4) instead of what the queue actually
needs.

Fix it by attaching an expiration time to each increase of
pending_queues and decreasing it when it expires. That way
pending_queues drops back to zero as long as there are no further tag
allocation failures, and the available tags for the queue become the
whole host tag space.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Yu Kuai
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------

When sharing a tag set, most disks may issue only a small amount of IO
while a few issue a lot. The current approach limits the maximum number
of tags a disk can get to the average share of the total, so the few
heavily loaded disks cannot get enough tags even though many tags are
still free in the tag set.

Add 'pending_queues' to blk_mq_tag_set to count how many queues failed
to get a driver tag. If this value is zero, there is no need to limit
the maximum number of available tags, as sketched below.

On the other hand, if a queue stops issuing IO, 'active_queues' is not
decreased for a period of time (the request timeout), so many tags stay
unavailable because the maximum number of available tags is capped at
max(total tags / active_queues, 4). Therefore, decrease it when
'nr_active' is 0.

This functionality is enabled by default; to disable it, add
"blk_mq.unfair_dtag=0" to the boot cmdline.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
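
A sketch of the intended gating in the tag-sharing check; the
pending_queues placement follows the commit text loosely, and the
surrounding details are illustrative rather than the exact patch:

    /* Skip the fair-share cap entirely while no queue has recorded a
     * driver-tag allocation failure. */
    static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
                                      struct sbitmap_queue *bt)
    {
            unsigned int depth, users;

            if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
                    return true;
            if (!atomic_read(&hctx->tags->pending_queues))
                    return true;    /* nobody starved: no per-queue cap */

            users = atomic_read(&hctx->tags->active_queues);
            if (!users)
                    return true;

            depth = max((bt->sb.depth + users - 1) / users, 4U);
            return atomic_read(&hctx->nr_active) < depth;
    }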

Submitted by Jan Kara
mainline inclusion
from mainline-5.12-rc1
commit 5ac83c64
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SW26
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ac83c644f5fb924f0b2c09102ab82fc788f8411

-------------------------------------------------

This reverts commit b445547e.

Since both mq-deadline and BFQ completely ignore the hctx they are
passed to their dispatch function, and dispatch whatever request they
deem fit, checking whether any request for a particular hctx is queued
is just pointless, since we'll very likely get a request from a
different hctx anyway. In the following commit we'll deal with lock
contention in these IO schedulers in presence of multiple HW queues in
a different way.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
    block/bfq-iosched.c

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- December 31, 2021 (1 commit)

Submitted by Zhihao Cheng
hulk inclusion
category: feature
bugzilla: 185747, https://gitee.com/openeuler/kernel/issues/I4OUFN
CVE: NA

-------------------------------

Introduce kabi reserved fields for the storage module.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
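
A minimal sketch of the reserved-field idiom such a patch adds; the
macro body below is a simplified stand-in for the tree's actual kabi
helpers, and foo_ops is illustrative:

    struct request;

    /* simplified stand-in for the tree's kabi reserve helper */
    #define KABI_RESERVE(n) unsigned long kabi_reserve##n

    struct foo_ops {
            void (*existing_cb)(struct request *rq);

            /* padding that later fixes can turn into real members
             * without changing the structure's size or layout */
            KABI_RESERVE(1);
            KABI_RESERVE(2);
            KABI_RESERVE(3);
            KABI_RESERVE(4);
    };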

- November 15, 2021 (1 commit)

Submitted by Bart Van Assche
mainline inclusion
from mainline-5.15-rc1
commit 90b71980
category: performance
bugzilla: 174005, https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=90b7198001f23ea37d3b46dc631bdaa2357a20b1

---------------------------

elevator_get_default() uses the following algorithm to select an I/O
scheduler from inside add_disk():
- In case of a single hardware queue or if sharing hardware queues
  across multiple request queues (BLK_MQ_F_TAG_HCTX_SHARED), use
  mq-deadline.
- Otherwise, use 'none'.

This is a good choice for most but not for all block drivers. Make it
possible to override the selection of mq-deadline with a new flag,
namely BLK_MQ_F_NO_SCHED_BY_DEFAULT.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Martijn Coenen <maco@android.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210805174200.3250718-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
    block/elevator.c

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
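
The resulting selection logic looks roughly like this (a sketch based
on the commit description; helper names as in mainline v5.15, details
may differ across stable trees):

    static struct elevator_type *elevator_get_default(struct request_queue *q)
    {
            if (q->tag_set && q->tag_set->flags & BLK_MQ_F_NO_SCHED_BY_DEFAULT)
                    return NULL;    /* driver opted out: stay with 'none' */

            if (q->nr_hw_queues != 1 &&
                !blk_mq_is_sbitmap_shared(q->tag_set->flags))
                    return NULL;

            return elevator_get(q, "mq-deadline", false);
    }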

- July 6, 2021 (1 commit)

Submitted by Ming Lei
mainline inclusion
from mainline-5.11-rc1
commit fb01a293
category: bugfix
bugzilla: 108493
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb01a2932e81a1fb2273f87ff92dc8172b8880ee

---------------------------

flush_end_io() may be called recursively from some drivers, such as
nvme-loop, so lockdep may complain about 'possible recursive locking'.
Commit b3c6a599 ("block: Fix a lockdep complaint triggered by request
queue flushing") tried to address this issue by assigning a dynamically
allocated per-flush-queue lock class. This solution adds
synchronize_rcu() to each hctx's release handler, and causes a horrible
SCSI MQ probe delay (more than half an hour on megaraid sas).

Add the new API blk_mq_hctx_set_fq_lock_class() for these drivers, so
we just need to use a driver-specific lock class to avoid the lockdep
warning of 'possible recursive locking'.

Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Qian Cai <cai@redhat.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
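
Usage from a driver looks roughly like this (the foo_* names are
illustrative):

    /* Give all hctx flush queues one driver-private lock class so
     * recursive flush_end_io() calls don't trip lockdep. */
    static struct lock_class_key foo_flush_fq_lock_class;

    static void foo_set_fq_lock_class(struct request_queue *q)
    {
            struct blk_mq_hw_ctx *hctx;
            unsigned int i;

            queue_for_each_hw_ctx(q, hctx, i)
                    blk_mq_hctx_set_fq_lock_class(hctx,
                                                  &foo_flush_fq_lock_class);
    }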

- January 27, 2021 (2 commits)

Submitted by Bart Van Assche
stable inclusion
from stable-5.10.7
commit 782c9ef2ac059a25d6afbac344319574414258db
bugzilla: 47429

--------------------------------

[ Upstream commit a4d34da7 ]

Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no
longer used by any kernel code.

Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

Submitted by Bart Van Assche
stable inclusion
from stable-5.10.7
commit 8ed46b329d4e62a1d0c7b17361c0e364eaf4a9da
bugzilla: 47429

--------------------------------

[ Upstream commit 0854bcdc ]

Introduce the BLK_MQ_REQ_PM flag. This flag makes the request
allocation functions set RQF_PM. This is the first step towards
removing BLK_MQ_REQ_PREEMPT.

Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Can Guo <cang@codeaurora.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
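
A sketch of an allocation site after this step (the surrounding foo_*
code is illustrative):

    static int foo_send_pm_cmd(struct request_queue *q)
    {
            struct request *rq;

            /* BLK_MQ_REQ_PM makes the allocator set RQF_PM */
            rq = blk_mq_alloc_request(q, REQ_OP_DRV_IN, BLK_MQ_REQ_PM);
            if (IS_ERR(rq))
                    return PTR_ERR(rq);

            blk_execute_rq(q, NULL, rq, 0);
            blk_mq_free_request(rq);
            return 0;
    }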

- October 29, 2020 (1 commit)

Submitted by Mauro Carvalho Chehab
As reported by kernel-doc:

    ./include/linux/blk-mq.h:267: warning: Function parameter or member
    'active_queues_shared_sbitmap' not described in 'blk_mq_tag_set'

There is now a new member in struct blk_mq_tag_set. Add a description
for it, based on the commit that introduced it.

Fixes: f1b49fdc ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/8e513153b83eefc05e358f51f2632b592c3f6772.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
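
The added description is an ordinary kernel-doc member line, along
these lines (abridged sketch; the exact wording is in the patch):

    /**
     * struct blk_mq_tag_set - tag set that can be shared between request queues
     * @active_queues_shared_sbitmap:
     *      number of active request queues when using a shared sbitmap,
     *      accounted per tag_set rather than per hctx
     */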

- September 4, 2020 (4 commits)

Submitted by Kashyap Desai
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock
contention is possible for the mq-deadline and bfq IO schedulers when
nr_hw_queues is more than one.

It is because the kblockd work queue can submit IO from all online CPUs
(through blk_mq_run_hw_queues()) even though only one hctx has pending
commands.

The elevator callback .has_work for the mq-deadline and bfq schedulers
considers pending work if there are any IOs on the request queue, but
it does not account for the hctx context.

Add a per-hctx 'elevator_queued' count to avoid triggering the elevator
even though there are no requests queued.

[jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit
message per Kashyap]

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
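
The mq-deadline side of the change looks roughly like this (a sketch;
BFQ gets an equivalent early-out):

    /* Bail out of .has_work early when this hctx queued nothing, so
     * kblockd does not touch the scheduler on every online CPU. */
    static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
    {
            struct deadline_data *dd = hctx->queue->elevator->elevator_data;

            if (!atomic_read(&hctx->elevator_queued))
                    return false;

            return !list_empty_careful(&dd->dispatch) ||
                   !list_empty_careful(&dd->fifo_list[READ]) ||
                   !list_empty_careful(&dd->fifo_list[WRITE]);
    }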

Submitted by John Garry
When using a shared sbitmap, the number of active request queues per
hctx should no longer be relied on when judging how to share the tag
bitmap.

Instead, maintain the number of active request queues per tag_set, and
make the judgement based on that.

Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Submitted by John Garry
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ...)
support multiple reply queues with single hostwide tags. In addition,
these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completions to not be serviced when
an interrupt is shut down. That problem is solved in commit bf0beec0
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queues
are required to be mapped to the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.

In making that transition, the per-SCSI-command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such,
the HBA LLDD would have to generate this tag internally, which has a
certain performance overhead.

However, another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e0
("scsi: core: avoid host-wide host_busy counter for scsi_mq"), the Scsi
host busy counter was removed, which would stop the LLDD being sent
more than .can_queue commands; however, it should still be ensured that
the block layer does not issue more than .can_queue commands to the
Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time. The new flag
BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the tagset to
indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and
requests is still allocated per hctx; the reason for this is that if
tags and requests were only allocated for a single hctx - like hctx0 -
it may break block drivers which expect a request to be associated with
a specific hctx, i.e. not always hctx0. This will introduce extra
memory usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
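Opting in from an LLDD then comes down to one flag at tag set
allocation time (a sketch; the foo_* names are illustrative):

    /* Hostwide tags for an HBA exposing its HW queues to blk-mq. */
    static int foo_init_tag_set(struct foo_host *host)
    {
            struct blk_mq_tag_set *set = &host->tag_set;

            set->ops = &foo_mq_ops;
            set->nr_hw_queues = host->nr_reply_queues;
            set->queue_depth = host->can_queue;  /* shared by all hctxs */
            set->numa_node = NUMA_NO_NODE;
            set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_TAG_HCTX_SHARED;

            return blk_mq_alloc_tag_set(set);
    }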

Submitted by Ming Lei
BLK_MQ_F_TAG_SHARED actually means that the tags are shared among
request queues, all of which should belong to LUNs attached to the same
HBA. So rename it to make the point explicit.

[jpg: rebase a few times, add rnbd-clt.c change]

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- September 2, 2020 (1 commit)

Submitted by Baolin Wang
Move blk_mq_bio_list_merge() into blk-merge.c and rename it to a
generic name.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- July 29, 2020 (1 commit)

Submitted by Daniel Wagner
No need to define typedefs for the callbacks, because there is not a
single user except blk_mq_ops.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- July 1, 2020 (1 commit)

Submitted by Christoph Hellwig
The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with a
block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
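
For a bio-based driver the conversion looks roughly like this (foo_*
names are illustrative; note the queue is now derived from the bio):

    static blk_qc_t foo_submit_bio(struct bio *bio)
    {
            struct foo_dev *dev = bio->bi_disk->private_data;

            foo_handle_bio(dev, bio);
            return BLK_QC_T_NONE;
    }

    static const struct block_device_operations foo_fops = {
            .owner          = THIS_MODULE,
            .submit_bio     = foo_submit_bio,
    };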

- June 30, 2020 (1 commit)

Submitted by Ming Lei
The blk-mq budget is abstracted from scsi's device queue depth, and it
is always per-request-queue instead of per-hctx. It can be quite absurd
to get a budget from one hctx, then dequeue a request from the
scheduler queue, and find that this request may not belong to this
hctx, at least for bfq and deadline. So fix the mess and always pass
the request queue to the get/put budget callbacks.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
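
The new hook signatures in blk_mq_ops are then (abridged sketch):

    struct blk_mq_ops {
            /* ... other callbacks elided ... */

            /* Reserve and release budget, now keyed on the request
             * queue rather than on a possibly-mismatched hctx. */
            bool (*get_budget)(struct request_queue *q);
            void (*put_budget)(struct request_queue *q);
    };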

- June 29, 2020 (1 commit)

Submitted by Christoph Hellwig
Just check for a non-NULL elevator directly to make the code more
clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- June 24, 2020 (2 commits)

Submitted by Christoph Hellwig
blk_mq_complete_request_remote() is a variant of
blk_mq_complete_request() that only completes the request if it needs
to be bounced to another CPU or a softirq. If the request can be
completed locally, the function returns false and lets the driver
complete it without requiring an indirect function call.

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
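
The driver-side pattern the helper enables (a sketch; the foo_* names
are illustrative):

    /* Try to bounce the completion to the submitting CPU or a softirq;
     * on false, finish it right here with a direct call. */
    static void foo_handle_cqe(struct request *rq)
    {
            if (!blk_mq_complete_request_remote(rq))
                    foo_complete_rq(rq);    /* same routine ->complete uses */
    }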

Submitted by Christoph Hellwig
Move the call to blk_should_fake_timeout out of blk_mq_complete_request
and into the drivers, skipping call sites that are obvious error
handlers, and remove the now superfluous blk_mq_force_complete_rq
helper. This ensures we don't keep injecting errors into completions
that just terminate the Linux request after the hardware has been reset
or the command has been aborted.

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- May 30, 2020 (2 commits)

Submitted by Ming Lei
Most blk-mq drivers depend on managed IRQs' auto-affinity to set up
queue mapping. Thomas mentioned the following point [1]:

    "That was the constraint of managed interrupts from the very
    beginning: The driver/subsystem has to quiesce the interrupt line
    and the associated queue _before_ it gets shutdown in CPU unplug
    and not fiddle with it until it's restarted by the core when the
    CPU is plugged in again."

However, the current blk-mq implementation doesn't quiesce the hw queue
before the last CPU in the hctx is shut down. Even worse,
CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so
there isn't any chance to quiesce the hctx before shutting down the
CPU.

Add a new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq
hctxs where the last CPU goes away, and wait for completion of
in-flight requests. This guarantees that there is no inflight I/O
before shutting down the managed IRQ.

Add a BLK_MQ_F_STACKING flag and set it for dm-rq and loop, so we don't
need to wait for completion of in-flight requests from these drivers to
avoid a potential dead-lock. It is safe to do this for stacking drivers
as those do not use interrupts at all and their I/O completions are
triggered by the underlying devices' I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
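
For a stacking driver, opting out of the new hotplug drain is a single
flag in its tag set setup, e.g. loop-style (abridged sketch; field
values are illustrative):

    lo->tag_set.ops = &loop_mq_ops;
    lo->tag_set.nr_hw_queues = 1;
    lo->tag_set.queue_depth = 128;
    lo->tag_set.numa_node = NUMA_NO_NODE;
    /* no managed IRQs here: skip the CPU-hotplug quiesce */
    lo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;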

Submitted by Keith Busch
Drivers may need to bypass error injection for error recovery. Rename
__blk_mq_complete_request() to blk_mq_force_complete_rq() and export
that function so drivers may skip potential fake timeouts after they've
reclaimed lost requests.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- April 25, 2020 (1 commit)

Submitted by Christoph Hellwig
Call blk_mq_make_request when no ->make_request_fn is set. This is safe
now that blk_alloc_queue always sets up the pointer for make_request
based drivers. This avoids an indirect call in the blk-mq driver I/O
fast path, which is rather expensive due to spectre mitigations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- April 21, 2020 (1 commit)

Submitted by Douglas Anderson
We have:

* blk_mq_run_hw_queue()
* blk_mq_delay_run_hw_queue()
* blk_mq_run_hw_queues()

...but not blk_mq_delay_run_hw_queues(), presumably because nobody
needed it before now. Since we need it for a later patch in this
series, add it.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
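
The helper is the plural counterpart of the existing delayed run, along
these lines (a sketch):

    /* Kick every hardware queue after a delay, skipping stopped ones. */
    void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
    {
            struct blk_mq_hw_ctx *hctx;
            int i;

            queue_for_each_hw_ctx(q, hctx, i) {
                    if (blk_mq_hctx_stopped(hctx))
                            continue;
                    blk_mq_delay_run_hw_queue(hctx, msecs);
            }
    }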

- April 19, 2020 (1 commit)

Submitted by Gustavo A. R. Silva
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array
member [1][2], introduced in C99:

    struct foo {
            int stuff;
            struct boo array[];
    };

By making use of the mechanism above, we will get a compiler warning in
case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced [3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by this
change:

    "Flexible array members have incomplete type, and so the sizeof
    operator may not be applied. As a quirk of the original
    implementation of zero-length arrays, sizeof evaluates to zero." [1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

- March 28, 2020 (1 commit)

Submitted by Christoph Hellwig
This allows a driver to pass a queuedata member before ->init_hctx is
called. null_blk currently open codes this logic, but I'd rather have
it in the core to ease future maintenance.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
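
A sketch of a driver using the new helper (the foo_* names are
illustrative):

    static int foo_init_queue(struct foo_dev *dev)
    {
            struct request_queue *q;

            /* queuedata is visible by the time ->init_hctx runs */
            q = blk_mq_init_queue_data(&dev->tag_set, dev);
            if (IS_ERR(q))
                    return PTR_ERR(q);

            dev->queue = q;
            return 0;
    }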

- March 10, 2020 (1 commit)

Submitted by Bart Van Assche
The 'hctx_list' member of struct blk_mq_hw_ctx is not a list head but
instead an entry in q->unused_hctx_list. Fix the comment above this
struct member.

Fixes: d386732b ("blk-mq: fill header with kernel-doc")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- November 14, 2019 (1 commit)

Submitted by John Garry
These functions are not referenced, so delete them.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- November 1, 2019 (1 commit)

Submitted by John Garry
Since commit 97889f9a ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()"), the return value of blk_mq_run_hw_queue()
is never checked, so make it return void, which very marginally
simplifies the code.

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- October 26, 2019 (1 commit)

Submitted by André Almeida
Insert documentation for structs, enums and functions in the header
file. Format existing and new comments in struct blk_mq_ops as
kernel-doc comments.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- October 7, 2019 (2 commits)

Submitted by Pavel Begunkov
blk_mq_request_completed() and blk_mq_request_started() are short, so
inline them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
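
After the change the checkers live in the header as static inlines,
along these lines (a sketch; the open-coded state reads stand in for
whatever helper the header uses):

    static inline bool blk_mq_request_started(struct request *rq)
    {
            return READ_ONCE(rq->state) != MQ_RQ_IDLE;
    }

    static inline bool blk_mq_request_completed(struct request *rq)
    {
            return READ_ONCE(rq->state) == MQ_RQ_COMPLETE;
    }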

Submitted by Bart Van Assche
The meaning of several member variables of these two data structures is
nontrivial. Hence document all member variables using the kernel-doc
syntax.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- September 6, 2019 (1 commit)

Submitted by Damien Le Moal
When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues, as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, preventing correct selection of the default elevator
most suitable for the device.

This currently affects all multi-queue zoned block devices, which
default to the "none" elevator instead of the required "mq-deadline"
elevator. These drives currently include host-managed SMR disks
connected to a smartpqi HBA and null_blk block devices with zoned mode
enabled. Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:

1) elevator_init = false is used for calls to
   blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
   case, a call to elevator_init_mq() is added to __device_add_disk(),
   resulting in the delayed initialization of the queue elevator after
   the device driver has finished probing the device information. This
   effectively allows elevator_init_mq() access to more information
   about the device.

2) elevator_init = true preserves the current behavior of initializing
   the elevator directly from blk_mq_init_allocated_queue(). This case
   is used for the special request-based DM devices where the device
   gendisk is created before the queue initialization and device
   information (e.g. queue limits) is already known when the queue
   initialization is executed.

Additionally, to make sure that the elevator initialization is never
done while requests are in flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
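
The two cases reduce to the following call patterns (a sketch per the
commit text):

    /* 1) generic probe path: defer elevator setup to device_add_disk(),
     *    after the driver has finished probing the device */
    q = blk_mq_init_allocated_queue(set, q, false);

    /* 2) request-based DM: the gendisk already exists, so set up the
     *    elevator right away */
    q = blk_mq_init_allocated_queue(set, q, true);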