- 18 Oct 2021, 12 commits
-
-
Committed by John Garry

Now that we use shared tags for shared sbitmap support, we don't require the tags sbitmap pointers, so drop them. This essentially reverts commit 222a5ae0 ("blk-mq: Use pointers for blk_mq_tags bitmap tags"). Function blk_mq_init_bitmap_tags() is also removed, since it would only be a wrapper for blk_mq_init_bitmaps().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1633429419-228500-14-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
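For context, a minimal sketch of the struct shape this change implies, assuming the field names used elsewhere in this series rather than quoting the tree:

	struct blk_mq_tags {
		unsigned int nr_tags;
		unsigned int nr_reserved_tags;

		/* before: struct sbitmap_queue *bitmap_tags, *breserved_tags; */
		struct sbitmap_queue bitmap_tags;
		struct sbitmap_queue breserved_tags;

		struct request **rqs;
		struct request **static_rqs;
		/* ... */
	};

With shared tags held in a single blk_mq_tags instance, the extra level of indirection no longer buys anything, so the sbitmap queues can be embedded directly.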
-
Committed by John Garry

Currently we use separate sbitmap pairs and an active_queues atomic_t for shared sbitmap support. However, a full set of static requests is allocated per HW queue, which is quite wasteful, considering that the total number of requests usable at any given time across all HW queues is limited by the shared sbitmap depth. As such, it is considerably more memory efficient in the case of shared sbitmap to allocate a set of static rqs per tag set or request queue, and not per HW queue. So replace the sbitmap pairs and active_queues atomic_t with shared tags per tag set and request queue, which will hold a set of shared static rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops .init and .exit_request callbacks, pass an invalid index token. This changes the semantics of the APIs, such that the callback would need to validate the HW queue index before using it. Currently no user of shared sbitmap actually uses the HW queue index (as would be expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
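A hedged sketch of what the new callback semantics look like from the driver's side; the sentinel and foo_* names here are illustrative assumptions, not quotes from the tree:

	struct foo_cmd {			/* hypothetical per-request PDU */
		unsigned int hw_queue_hint;
	};

	static int foo_init_request(struct blk_mq_tag_set *set,
				    struct request *rq,
				    unsigned int hctx_idx,
				    unsigned int numa_node)
	{
		struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);

		/*
		 * With shared tags, hctx_idx may be an invalid token
		 * (assumed name: BLK_MQ_NO_HCTX_IDX), so validate it
		 * before indexing any per-hctx state.
		 */
		if (hctx_idx != BLK_MQ_NO_HCTX_IDX)
			cmd->hw_queue_hint = hctx_idx;

		return 0;
	}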
-
Committed by John Garry

Refactor blk_mq_free_map_and_requests() such that it can be used at the many sites at which the tag map and rqs are freed. Also rename it to blk_mq_free_map_and_rqs(), which is shorter and matches the alloc equivalent.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-12-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

Add a function to combine allocating tags and the associated requests, and factor out common patterns to use this new function. Some functions only call blk_mq_alloc_map_and_rqs() now, but more functionality will be added later. A sketch of the new helper's shape follows below.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static, since they are only used in blk-mq.c, and finally rename some functions for conciseness and consistency with other function names:

- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-11-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
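The combined helper plausibly looks like the following; argument lists are assumed from the description (allocate the tag map, then the static requests, backing out cleanly on failure):

	static struct blk_mq_tags *
	blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
				 unsigned int hctx_idx, unsigned int depth)
	{
		struct blk_mq_tags *tags;
		int ret;

		tags = blk_mq_alloc_rq_map(set, hctx_idx, depth,
					   set->reserved_tags);
		if (!tags)
			return NULL;

		ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
		if (ret) {
			blk_mq_free_rq_map(tags, set->flags);
			return NULL;
		}
		return tags;
	}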
-
Committed by John Garry

Put the functionality to update the sched shared sbitmap size in a common function. Since the same formula is always used to resize, and its inputs can be derived from the request queue argument, just pass the request queue pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-10-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

Function blk_mq_clear_rq_mapping() is required to clear the sched tags mappings in the driver tags rqs[]. But there is no need for driver tags to clear their own mapping, so skip clearing the mapping in this scenario.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-9-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags in future, so pass a driver tags pointer instead of the tagset container and HW queue index.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-8-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

It's easier to read:

	if (x)
		X;
	else
		Y;

over:

	if (!x)
		Y;
	else
		X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

For shared sbitmap, if the call to blk_mq_tag_update_depth() was successful for any hctx when hctx->sched_tags is not set, then it would be successful for all (due to the manner in which blk_mq_tag_update_depth() fails). As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for each hctx. So relocate the call to after the hctx iteration, under the !q->elevator check, which is equivalent (to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

The original code in commit 24d2f903 ("blk-mq: split out tag initialization, support shared tags") would check that tags->rqs is non-NULL and then dereference tags->rqs[]. Then in commit 2af8cbe3 ("blk-mq: split tag ->rqs[] into two"), we started to dereference tags->static_rqs[], but continued to check that tags->rqs is non-NULL. Check tags->static_rqs for non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/1633429419-228500-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
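A sketch of the corrected pattern, with the surrounding loop assumed from the description:

	/* was: if (tags->rqs && set->ops->exit_request) */
	if (tags->static_rqs && set->ops->exit_request) {
		int i;

		for (i = 0; i < tags->nr_tags; i++) {
			struct request *rq = tags->static_rqs[i];

			if (!rq)
				continue;
			set->ops->exit_request(set, rq, hctx_idx);
			tags->static_rqs[i] = NULL;
		}
	}

The guard now checks the array that is actually dereferenced in the loop body.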
-
Committed by Christoph Hellwig

Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

There is no need to pull in blk-cgroup.h, and thus blkdev.h, here, so break the include chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 Oct 2021, 1 commit
-
-
Committed by Christoph Hellwig

Don't switch back to percpu mode, to avoid the double RCU grace period when tearing down SCSI devices. After removing the disk, only passthrough commands can be sent anyway.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Sep 2021, 1 commit
-
-
Committed by Song Liu

Limiting the number of requests in a blk_plug to BLK_MAX_REQUEST_COUNT hurts performance for large md arrays. [1] shows that the resync speed of an md array drops for arrays with more than 16 HDDs. Fix this by allowing more requests in the plug queue. The multiple_queues flag is used to apply the higher limit only to multiple-queue cases.

[1] https://lore.kernel.org/linux-raid/CAFDAVznS71BXW8Jxv6k9dXc2iR3ysX3iZRBww_rzA8WifBFxGg@mail.gmail.com/

Tested-by: Marcin Wanat <marcin.wanat@gmail.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
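A sketch of the shape of the fix; the helper name and the 4x factor are assumptions here, not quotes from the patch:

	static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug)
	{
		if (plug->multiple_queues)
			return BLK_MAX_REQUEST_COUNT * 4;	/* assumed factor */
		return BLK_MAX_REQUEST_COUNT;
	}

	/* ... and at plug time, instead of the fixed constant: */
	if (plug->rq_count >= blk_plug_max_rq_count(plug))
		blk_flush_plug_list(plug, false);

Single-queue devices keep the old threshold, so only setups such as md over many HDDs see the larger plug.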
-
- 24 Aug 2021, 4 commits
-
-
Committed by Christoph Hellwig

Replace the magic lookup through the kobject tree with an explicit backpointer, given that the device model links are set up and torn down at times when I/O is still possible, leading to potential NULL or invalid pointer dereferences.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: syzbot <syzbot+aa0801b6b32dca9dda82@syzkaller.appspotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20210816134624.GA24234@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Pass in a request_queue and assign disk->queue in __blk_alloc_disk to ensure struct gendisk always has a valid ->queue pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816131910.615153-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

This was a leftover from the legacy alloc_disk interface. Switch the SCSI ULPs and dasd to set ->minors directly, like all other drivers, and remove the argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd]
Link: https://lore.kernel.org/r/20210816131910.615153-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Pass the lockdep name to the low-level __blk_alloc_disk helper and hardcode the name for it, given that the number of minors or node_id are not very useful information. While this passes a pointless argument for non-lockdep builds, that is not really an issue, as disk allocation is a probe-time-only slow path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210816131910.615153-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Aug 2021, 1 commit
-
-
Committed by Ming Lei

is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the following check:

	hctx->fq->flush_rq == req

but the hctx passed from bt_iter()/bt_tags_iter() may be NULL because of:

1) memory re-ordering in blk_mq_rq_ctx_init():

	rq->mq_hctx = data->hctx;
	...
	refcount_set(&rq->ref, 1);

OR

2) tag re-use, where ->rqs[] isn't updated with the new request.

Fix the issue by re-writing is_flush_rq() as:

	return rq->end_io == flush_end_io;

which turns out simpler to follow and immune to the data race, since we have ordered the WRITE of rq->end_io and refcount_set(&rq->ref, 1).

Fixes: 2e315dc0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
Cc: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
Cc: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210818010925.607383-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Aug 2021, 1 commit
-
-
Committed by Ming Lei

Inside blk_mq_queue_tag_busy_iter() we have already grabbed the request's refcount before calling ->fn(), so there is no need to grab it one more time in blk_mq_check_expired(). Meanwhile, remove the extra request expiry check in blk_mq_check_expired().

Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/20210811155202.629575-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Aug 2021, 1 commit
-
-
Committed by Yu Kuai

We run a test that deletes and recovers devices frequently (two devices on the same host), and we found that 'active_queues' grows very large after a period of time.

If device a and device b share a tag set, and a is deleted, then blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because only one queue is still using the tag set. However, if b is still active, the active_queues of b might never be cleared, even once b is deleted.

Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 Aug 2021, 1 commit
-
-
Committed by Tanner Love

With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads could incur a cache line miss in invoke_softirq() and other places. Replace the test with a static key to avoid the potential cache miss.

[ tglx: Dropped the IDE part, removed the export and updated blk-mq ]

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tanner Love <tannerlove@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210602180338.3324213-1-tannerlove.kernel@gmail.com
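A generic sketch of the static-key pattern being applied here; the key name is assumed, while the jump-label API itself is standard:

	#include <linux/jump_label.h>

	/* Defaults to false; the decision lives in patched text, not data. */
	DEFINE_STATIC_KEY_FALSE(force_irqthreads_key);

	static int __init setup_forced_irqthreads(char *arg)
	{
		static_branch_enable(&force_irqthreads_key);	/* flip once at boot */
		return 0;
	}
	early_param("threadirqs", setup_forced_irqthreads);

	/* Hot path: the branch is patched in or out at runtime, avoiding a
	 * load (and possible cache miss) on a global boolean. */
	static inline bool force_irqthreads(void)
	{
		return static_branch_unlikely(&force_irqthreads_key);
	}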
-
- 10 Aug 2021, 1 commit
-
-
Committed by Christoph Hellwig

The backing device information only makes sense for file system I/O, and thus belongs in the gendisk and not the lower-level request_queue structure. Move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 31 Jul 2021, 1 commit
-
-
Committed by Christoph Hellwig

Move the sg_timeout and sg_reserved_size fields into the bsg_device and scsi_device structures, as they have nothing to do with generic block I/O. Note that these values are now separate for bsg vs. SCSI device node access, but that just matches how /dev/sg vs. the other nodes has always behaved.

Link: https://lore.kernel.org/r/20210729064845.1044147-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 01 Jul 2021, 1 commit
-
-
Committed by Christoph Hellwig

All driver uses are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 25 Jun 2021, 1 commit
-
-
Committed by Ming Lei

Commit 6e6fcbc2 ("blk-mq: support batching dispatch in case of io") started to support batched IO dispatch by using hctx->dispatch_busy. However, blk_mq_update_dispatch_busy() wasn't changed to update hctx->dispatch_busy in that commit, so fix the issue by updating hctx->dispatch_busy in the case of a real scheduler.

Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Fixes: 6e6fcbc2 ("blk-mq: support batching dispatch in case of io")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210625020248.1630497-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Jun 2021, 3 commits
-
-
Committed by Peter Zijlstra

Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses, such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
-
Committed by Peter Zijlstra

Remove yet another few p->state accesses.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.347475156@infradead.org
-
Committed by Peter Zijlstra

Replace a bunch of 'p->state == TASK_RUNNING' checks with a new helper: task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
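A sketch of the helper being introduced; the __state field name comes from the rename done elsewhere in this same series (under the old name it would read p->state):

	/* One name for the idiom, so callers stop open-coding the compare. */
	#define task_is_running(task) \
		(READ_ONCE((task)->__state) == TASK_RUNNING)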
-
- 12 Jun 2021, 4 commits
-
-
Committed by Christoph Hellwig

All users are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Add a new API to allocate a gendisk including the request_queue for use with blk-mq based drivers. This is to avoid boilerplate code in drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
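A hedged usage sketch from a driver's perspective; the foo_* names are hypothetical:

	struct foo_dev {			/* hypothetical driver state */
		struct blk_mq_tag_set tag_set;
		struct gendisk *disk;
	};

	static int foo_probe(struct foo_dev *foo)
	{
		int ret;

		ret = blk_mq_alloc_tag_set(&foo->tag_set);
		if (ret)
			return ret;

		/* One call allocates both the gendisk and its request_queue;
		 * the second argument ends up as queue->queuedata. */
		foo->disk = blk_mq_alloc_disk(&foo->tag_set, foo);
		if (IS_ERR(foo->disk)) {
			blk_mq_free_tag_set(&foo->tag_set);
			return PTR_ERR(foo->disk);
		}
		return 0;
	}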
-
Committed by Christoph Hellwig

Don't return the passed-in request_queue but a normal error code, and drop the elevator_init argument in favor of just calling elevator_init_mq directly from dm-rq.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Factor out a helper to initialize a simple single-hw-queue tag_set from blk_mq_init_sq_queue. This will allow phasing out blk_mq_init_sq_queue in favor of a more symmetric and general API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210602065345.355274-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 04 Jun 2021, 1 commit
-
-
Committed by Jan Kara

Provided the device driver does not implement dispatch budget accounting (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls requests from the IO scheduler as long as it is willing to give out any. That defeats the scheduling heuristics inside the scheduler by creating the false impression that the device can take more IO when it in fact cannot.

For example, with the BFQ IO scheduler on top of a virtio-blk device, setting the blkio cgroup weight has barely any impact on the observed throughput of async IO, because __blk_mq_do_dispatch_sched() always sucks out all the IO queued in BFQ. BFQ first submits IO from higher-weight cgroups, but when that is all dispatched, it will give out IO of lower-weight cgroups as well. And then we have to wait for all this IO to be dispatched to the disk (which means a lot of it actually has to complete) before the IO scheduler is queried again for dispatching more requests. This completely destroys any service differentiation.

So grab the request tag for a request pulled out of the IO scheduler already in __blk_mq_do_dispatch_sched(), and do not pull any more requests if we cannot get it, because we are unlikely to be able to dispatch them. That way only a single request is going to wait in the dispatch list for some tag to free up.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
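A simplified sketch of the reordered loop (blk-mq-internal helpers; error paths and budget handling elided):

	static void dispatch_sched_sketch(struct blk_mq_hw_ctx *hctx,
					  struct list_head *rq_list)
	{
		struct elevator_queue *e = hctx->queue->elevator;

		for (;;) {
			struct request *rq = e->type->ops.dispatch_request(hctx);

			if (!rq)
				break;

			/*
			 * New in this change: take the driver tag right away.
			 * If none is free, park just this one request and stop
			 * draining the scheduler, preserving its ordering and
			 * fairness decisions for the IO still queued there.
			 */
			if (!blk_mq_get_driver_tag(rq)) {
				list_add(&rq->queuelist, rq_list);
				break;
			}

			list_add_tail(&rq->queuelist, rq_list);
		}
	}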
-
- 24 May 2021, 5 commits
-
-
Committed by John Garry

The tags used for an IO scheduler are currently per hctx. As such, when q->nr_hw_queues grows, so does the request queue's total IO scheduler tag depth. This may cause problems for SCSI MQ HBAs whose total driver depth is fixed.

Ming and Yanhui report higher CPU usage and lower throughput in scenarios where the fixed total driver tag depth is appreciably lower than the total scheduler tag depth: https://lore.kernel.org/linux-block/440dfcfc-1a2c-bd98-1161-cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b

In that scenario, since the scheduler tag is acquired first, much contention is introduced, since a driver tag may not be available after we have got the sched tag.

Improve this scenario by introducing request queue-wide tags for when a tagset-wide sbitmap is used. The static sched requests are still allocated per hctx, as requests are initialised per hctx, as in blk_mq_init_request(..., hctx_idx, ...) -> set->ops->init_request(..., hctx_idx, ...).

For simplicity of resizing the request queue sbitmap when updating the request queue depth, just init it at the max possible size, so we don't need to deal with swapping in a new sbitmap for the old one if we need to grow.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

The tag allocation code that allocates the sbitmap pairs is common to regular bitmap tags and shared sbitmap, so refactor it into a common function. Also remove the superfluous "flags" argument from blk_mq_init_shared_sbitmap().

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-2-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

Before we free the request queue, clear the flush request reference in tags->rqs[], so that a potential use-after-free (UAF) can be avoided.

Based on a patch written by David Jeffery.

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

refcount_inc_not_zero() in bt_tags_iter() may still read a freed request. Fix the issue with the following approach:

1) hold a per-tags spinlock when reading ->rqs[tag] and calling refcount_inc_not_zero() in bt_tags_iter()

2) clear stale requests referred to via ->rqs[tag] before freeing the request pool, holding the per-tags spinlock while clearing the stale ->rqs[tag]

So after we have cleared the stale requests, bt_tags_iter() won't observe a freed request any more; the clearing will also wait for pending request references.

The idea of clearing ->rqs[] is borrowed from John Garry's previous patch and a recent patch from David.

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
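A sketch of the locked lookup (the lock field name is assumed): the per-tags lock makes "read ->rqs[tag], then take a reference" atomic with respect to the clearing done before the request pool is freed:

	static struct request *
	blk_mq_find_and_get_req(struct blk_mq_tags *tags, unsigned int bitnr)
	{
		struct request *rq;
		unsigned long flags;

		spin_lock_irqsave(&tags->lock, flags);
		rq = tags->rqs[bitnr];
		if (!rq || !refcount_inc_not_zero(&rq->ref))
			rq = NULL;	/* freed, being freed, or stale slot */
		spin_unlock_irqrestore(&tags->lock, flags);
		return rq;
	}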
-
Committed by Ming Lei

Grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter(); this prevents the request from being re-used while ->fn is running. The approach is the same as what we do when handling timeouts.

Fix request use-after-free (UAF) issues related to completion races or queue releasing:

- If one rq is referred to before rq->q is frozen, then the queue won't be frozen before the request is released during iteration.

- If one rq is referred to after rq->q is frozen, refcount_inc_not_zero() will return false, and we won't iterate over this request.

However, one request UAF is still not covered: refcount_inc_not_zero() may read one freed request, and it will be handled in the next patch.

Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210511152236.763464-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
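A simplified sketch of the resulting iteration pattern; blk_mq_put_rq_ref as the matching release helper is an assumption from this series:

	static bool bt_tags_iter_sketch(struct blk_mq_tags *tags,
					unsigned int bitnr,
					busy_tag_iter_fn *fn, void *data)
	{
		struct request *rq = tags->rqs[bitnr];
		bool ret;

		/* Skip requests that are freed or about to be freed. */
		if (!rq || !refcount_inc_not_zero(&rq->ref))
			return true;

		ret = fn(rq, data, true);	/* rq can't be re-used under us */
		blk_mq_put_rq_ref(rq);		/* drop ref; frees if last */
		return ret;
	}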
-
- 14 May 2021, 1 commit
-
-
Committed by Bart Van Assche

If a tag set is shared across request queues (e.g. SCSI LUNs), then the block layer core keeps track of the number of active request queues in tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is cleared by blk_mq_del_queue_tag_set().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Fixes: 0d2602ca ("blk-mq: improve support for shared tags maps")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
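A sketch of the corrected teardown ordering, with the call names taken from the message and the comments paraphrasing its reasoning:

	void blk_mq_exit_queue(struct request_queue *q)
	{
		struct blk_mq_tag_set *set = q->tag_set;

		/* Runs blk_mq_tag_idle(), which only decrements
		 * tags->active_queues while BLK_MQ_F_TAG_QUEUE_SHARED
		 * is still set ... */
		blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);

		/* ... so it must happen before this call, which may
		 * clear that flag. */
		blk_mq_del_queue_tag_set(q);
	}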
-