- 02 11月, 2022 11 次提交
-
-
由 Christoph Hellwig 提交于
None of the callers of nvme_kill_queues needs it to unquiesce the admin queues, as all of them already do it themselves: 1) nvme_reset_work explicit call nvme_start_admin_queue toward the beginning of the function. The extra call to nvme_start_admin_queue in nvme_reset_work this won't do anything as NVME_CTRL_ADMIN_Q_STOPPED will already be cleared. 2) nvme_remove calls nvme_dev_disable with shutdown flag set to true at the very beginning of the function if the PCIe device was not present, which is the precondition for the call to nvme_kill_queues. nvme_dev_disable already calls nvme_start_admin_queue toward the end of the function when the shutdown flag is set to true, so the admin queue is already enabled at this point. 3) nvme_remove_dead_ctrl schedules a workqueue to unbind the driver, which will end up in nvme_remove, which calls nvme_dev_disable with the shutdown flag. This case will call nvme_start_admin_queue a bit later than before. 4) apple_nvme_remove uses the same sequence as nvme_remove_dead_ctrl above. 5) nvme_remove_namespaces only calls nvme_kill_queues when the controller is in the DEAD state. That can only happen in the PCIe driver, and only from nvme_remove. See item 2) above for the conditions there. So it is safe to just remove the call to nvme_start_admin_queue in nvme_kill_queues without replacement. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
At the point where namespaces are marked dead, the controller is in a non-live state and we won't get pass the identify commands. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The NVME_NS_DEAD check only made sense when we revalidated namespaces in nvme_passthrough_end for commands that affected the namespace inventory. These days NVME_NS_DEAD is only set during reset or when tearing down namespaces, and we always remove all namespaces right after that. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The call to nvme_remove_invalid_namespaces made sense when nvme_passthru_end revalidated all namespaces and had to remove those that didn't exist any more. Since we don't revalidate from nvme_passthru_end now, this call is entirely spurious. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The code to create, update or delete a tagset and namespaces in nvme_reset_work is a bit convoluted. Refactor it with a two high-level conditionals for first probe vs reset and I/O queues vs no I/O queues to make the code flow more clear. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-3-hch@lst.de [axboe: fix whitespace issue] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
nvme and xen-blkfront are already doing this to stop buffered writes from creating dirty pages that can't be written out later. Move it to the common code. This also removes the comment about the ordering from nvme, as bd_mutex not only is gone entirely, but also hasn't been used for locking updates to the disk size long before that, and thus the ordering requirement documented there doesn't apply any more. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChao Leng <lengchao@huawei.com> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Prevent unnecessary format conversion for bfqg->bfqd in multiple places. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20221102022542.3621219-6-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Such code are not even compiled since they are inside marco "#if 0". Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20221102022542.3621219-5-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Just make the code a litter cleaner by removing the unnecessary variable 'sd'. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20221102022542.3621219-4-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Current code is a bit ugly and hard to read. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20221102022542.3621219-3-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
After the patch "block, bfq: cleanup bfq_weights_tree add/remove apis"), the local variable 'bfqd' is not used anymore, thus remove it. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221102022542.3621219-2-yukuai1@huaweicloud.com Fixes: afdba146 ("block, bfq: cleanup bfq_weights_tree add/remove apis") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 11月, 2022 15 次提交
-
-
由 Kemeng Shi 提交于
We only need a max queue depth for every iolatency to limit the inflight io number. Replace struct rq_depth with unsigned int to simplfy "struct iolatency_grp" and save memory. Signed-off-by: NKemeng Shi <shikemeng@huawei.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20221018111240.22612-4-shikemeng@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Kemeng Shi 提交于
Default queue depth of iolatency_grp is unlimited, so we scale down quickly(once by half) in scale_cookie_change. Remove the "subtract 1/16th" part which is not the truth and add the actual way we scale down. Signed-off-by: NKemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221018111240.22612-3-shikemeng@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Kemeng Shi 提交于
Function blkcg_iolatency_throttle will make sure blkg->parent is not NULL before calls check_scale_change. And function check_scale_change is only called in blkcg_iolatency_throttle. Signed-off-by: NKemeng Shi <shikemeng@huawei.com> Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20221018111240.22612-2-shikemeng@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Split an elevator_disable helper from elevator_switch for the case where we want to switch to no scheduler at all. This includes removing the pointless elevator_switch_mq helper and removing the switch to no schedule logic from blk_mq_init_sched. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Checking for the required features in the callers simplifies the code quite a bit, so do that. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-7-hch@lst.de [axboe: adjust for dropping patch 1, use __elevator_find()] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just compare the pointers instead of using the string based elevator_match. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Use eq for the elevator_queue as done elsewhere. This frees e to be used for the loop iterator instead of the odd __ prefix. In addition rename elv to cur to make it more clear it is the currently selected elevator. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
If the tag_set has BLK_MQ_F_NO_SCHED flag set we will never show any scheduler, so exit early. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Do the request_module and repeated lookup in the only caller that cares, pick a saner name that explains where are actually doing a lookup and use a sane calling conventions that passes the queue first. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030100714.876891-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
It's the same with bfq_weights_tree_remove() now. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-7-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
The 'bfq_data' and 'rb_root_cached' can both be accessed through 'bfq_queue', thus only pass 'bfq_queue' as parameter. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-6-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Now that root group is counted into 'num_groups_with_pending_reqs', 'num_groups_with_pending_reqs > 0' is always true in bfq_asymmetric_scenario(). Thus change the condition to '> 1'. On the other hand, this change can enable concurrent sync io if only one group is activated. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-5-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Currently, bfq can't handle sync io concurrently as long as they are not issued from root group. This is because 'bfqd->num_groups_with_pending_reqs > 0' is always true in bfq_asymmetric_scenario(). The way that bfqg is counted into 'num_groups_with_pending_reqs': Before this patch: 1) root group will never be counted. 2) Count if bfqg or it's child bfqgs have pending requests. 3) Don't count if bfqg and it's child bfqgs complete all the requests. After this patch: 1) root group is counted. 2) Count if bfqg have pending requests. 3) Don't count if bfqg complete all the requests. With this change, the occasion that only one group is activated can be detected, and next patch will support concurrent sync io in the occasion. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-4-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
Prepare to refactor the counting of 'num_groups_with_pending_reqs'. Add a counter in bfq_group, update it while tracking if bfqq have pending requests and when bfq_bfqq_move() is called. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-3-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Yu Kuai 提交于
If entity belongs to bfqq, then entity->in_groups_with_pending_reqs is not used currently. This patch use it to track if bfqq has pending requests through callers of weights_tree insertion and removal. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-2-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 31 10月, 2022 4 次提交
-
-
由 Jinlong Chen 提交于
The calling relationship in blk_mq_destroy_queue() is as follows: blk_mq_destroy_queue() ... -> blk_queue_start_drain() -> blk_freeze_queue_start() <- called ... -> blk_freeze_queue() -> blk_freeze_queue_start() <- called again -> blk_mq_freeze_queue_wait() ... So there is a redundant call to blk_freeze_queue_start(). Replace blk_freeze_queue() with blk_mq_freeze_queue_wait() to avoid the redundant call. Signed-off-by: NJinlong Chen <nickyc975@zju.edu.cn> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030083212.1251255-1-nickyc975@zju.edu.cnSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jinlong Chen 提交于
The only caller that needs queue_is_mq check is del_gendisk, so move the check into it. Signed-off-by: NJinlong Chen <nickyc975@zju.edu.cn> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030094730.1275463-1-nickyc975@zju.edu.cnSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Dawei Li 提交于
Convert current looping-based implementation into bit operation, which can bring improvement for: 1) bitops is more efficient for its arch-level optimization. 2) Given that blksize_bits() is inline, _if_ @size is compile-time constant, it's possible that order_base_2() _may_ make output compile-time evaluated, depending on code context and compiler behavior. Signed-off-by: NDawei Li <set_pte_at@outlook.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/TYCP286MB23238842958D7C083D6B67CECA349@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COMSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 David Jeffery 提交于
David Jeffery found one double ->queue_rq() issue, so far it can be triggered in VM use case because of long vmexit latency or preempt latency of vCPU pthread or long page fault in vCPU pthread, then block IO req could be timed out before queuing the request to hardware but after calling blk_mq_start_request() during ->queue_rq(), then timeout handler may handle it by requeue, then double ->queue_rq() is caused, and kernel panic. So far, it is driver's responsibility to cover the race between timeout and completion, so it seems supposed to be solved in driver in theory, given driver has enough knowledge. But it is really one common problem, lots of driver could have similar issue, and could be hard to fix all affected drivers, even it isn't easy for driver to handle the race. So David suggests this patch by draining in-progress ->queue_rq() for solving this issue. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: virtualization@lists.linux-foundation.org Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221026051957.358818-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 10月, 2022 4 次提交
-
-
由 Bart Van Assche 提交于
This patch removes a conditional jump from get_max_segment_size(). The x86-64 assembler code for this function without this patch is as follows: 206 return min_not_zero(mask - offset + 1, 0x0000000000000118 <+72>: not %rax 0x000000000000011b <+75>: and 0x8(%r10),%rax 0x000000000000011f <+79>: add $0x1,%rax 0x0000000000000123 <+83>: je 0x138 <bvec_split_segs+104> 0x0000000000000125 <+85>: cmp %rdx,%rax 0x0000000000000128 <+88>: mov %rdx,%r12 0x000000000000012b <+91>: cmovbe %rax,%r12 0x000000000000012f <+95>: test %rdx,%rdx 0x0000000000000132 <+98>: mov %eax,%edx 0x0000000000000134 <+100>: cmovne %r12d,%edx With this patch applied: 206 return min(mask - offset, (unsigned long)lim->max_segment_size - 1) + 1; 0x000000000000003f <+63>: mov 0x28(%rdi),%ebp 0x0000000000000042 <+66>: not %rax 0x0000000000000045 <+69>: and 0x8(%rdi),%rax 0x0000000000000049 <+73>: sub $0x1,%rbp 0x000000000000004d <+77>: cmp %rbp,%rax 0x0000000000000050 <+80>: cmova %rbp,%rax 0x0000000000000054 <+84>: add $0x1,%eax Reviewed-by: NMing Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221025191755.1711437-4-bvanassche@acm.orgReviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Document which functions do not modify the queue limits. Reviewed-by: NMing Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221025191755.1711437-3-bvanassche@acm.orgReviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Commit c75e707f ("block: remove the per-bio/request write hint") removed all code that uses the struct request write_hint member. Hence also remove 'write_hint' itself. Reviewed-by: NMing Lei <ming.lei@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221025191755.1711437-2-bvanassche@acm.orgReviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
bio_start_io_acct_time is not actually used anywhere, so remove it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221025155916.270303-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 10月, 2022 4 次提交
-
-
由 Christoph Hellwig 提交于
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second admin queue reference to be held by the apple_nvme structure. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NSven Peter <sven@svenpeter.dev> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second admin queue reference to be held by the nvme_dev. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Now that blk_mq_destroy_queue does not release the queue reference, there is no need for a second queue reference to be held by the scsi_device. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The fact that blk_mq_destroy_queue also drops a queue reference leads to various places having to grab an extra reference. Move the call to blk_put_queue into the callers to allow removing the extra references. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20221018135720.670094-2-hch@lst.de [axboe: fix fabrics_q vs admin_q conflict in nvme core.c] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 10月, 2022 2 次提交
-
-
由 Jinlong Chen 提交于
The current reference management logic of io scheduler modules contains refcnt problems. For example, blk_mq_init_sched may fail before or after the calling of e->ops.init_sched. If it fails before the calling, it does nothing to the reference to the io scheduler module. But if it fails after the calling, it releases the reference by calling kobject_put(&eq->kobj). As the callers of blk_mq_init_sched can't know exactly where the failure happens, they can't handle the reference to the io scheduler module properly: releasing the reference on failure results in double-release if blk_mq_init_sched has released it, and not releasing the reference results in ghost reference if blk_mq_init_sched did not release it either. The same problem also exists in io schedulers' init_sched implementations. We can address the problem by adding releasing statements to the error handling procedures of blk_mq_init_sched and init_sched implementations. But that is counterintuitive and requires modifications to existing io schedulers. Instead, We make elevator_alloc get the io scheduler module references that will be released by elevator_release. And then, we match each elevator_get with an elevator_put. Therefore, each reference to an io scheduler module explicitly has its own getter and releaser, and we no longer need to worry about the refcnt problems. The bugs and the patch can be validated with tools here: https://github.com/nickyc975/linux_elv_refcnt_bug.git [hch: split out a few bits into separate patches, use a non-try module_get in elevator_alloc] Signed-off-by: NJinlong Chen <nickyc975@zju.edu.cn> Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221020064819.1469928-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jinlong Chen 提交于
No need to find the actual elevator_type struct for this comparism, the name is all that is needed. Signed-off-by: NJinlong Chen <nickyc975@zju.edu.cn> [hch: split from a larger patch] Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221020064819.1469928-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-