- 26 1月, 2021 5 次提交
-
-
由 Paolo Valente 提交于
To prevent injection information from being lost on bfq_queue merging, also the amount of service that a bfq_queue receives must be saved and restored when the bfq_queue is merged and split, respectively. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
To prevent weight-raising information from being lost on bfq_queue merging, also the amount of service that a bfq_queue receives must be saved and restored when the bfq_queue is merged and split, respectively. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
A bfq_queue may happen to be deemed as soft real-time while it is still enjoying interactive weight-raising. If this happens because of a false positive, then the bfq_queue is likely to loose its soft real-time status soon. Upon losing such a status, the bfq_queue must get back its interactive weight-raising, if its interactive period is not over yet. But this case is not handled. This commit corrects this error. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Upon an I/O-dispatch attempt, BFQ may detect that it was better to plug I/O dispatch, and to wait for a new request to arrive for the currently in-service queue. But the arrival of a new request for an empty bfq_queue, and thus the switch from idle to busy of the bfq_queue, may cause the scenario to change, and make plugging no longer needed for service guarantees, or more convenient for throughput. In this case, keeping I/O-dispatch plugged would certainly lower throughput. To address this issue, this commit makes such a check, and stops plugging I/O if it is better to stop plugging I/O. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Some BFQ mechanisms make their decisions on a bfq_queue basing also on whether the bfq_queue is I/O bound. In this respect, the current logic for evaluating whether a bfq_queue is I/O bound is rather rough. This commits replaces this logic with a more effective one. The new logic measures the percentage of time during which a bfq_queue is active, and marks the bfq_queue as I/O bound if the latter if this percentage is above a fixed threshold. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 1月, 2021 7 次提交
-
-
由 Jan Kara 提交于
This reverts commit b445547e. Since both mq-deadline and BFQ completely ignore hctx they are passed to their dispatch function and dispatch whatever request they deem fit checking whether any request for a particular hctx is queued is just pointless since we'll very likely get a request from a different hctx anyway. In the following commit we'll deal with lock contention in these IO schedulers in presence of multiple HW queues in a different way. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
This commits preserves I/O-dispatch plugging for a special symmetric case that may suddenly turn into asymmetric: the case where only one bfq_queue, say bfqq, is busy. In this case, not expiring bfqq does not cause any harm to any other queues in terms of service guarantees. In contrast, it avoids the following unlucky sequence of events: (1) bfqq is expired, (2) a new queue with a lower weight than bfqq becomes busy (or more queues), (3) the new queue is served until a new request arrives for bfqq, (4) when bfqq is finally served, there are so many requests of the new queue in the drive that the pending requests for bfqq take a lot of time to be served. In particular, event (2) may case even already dispatched requests of bfqq to be delayed, inside the drive. So, to avoid this series of events, the scenario is preventively declared as asymmetric also if bfqq is the only busy queues. By doing so, I/O-dispatch plugging is performed for bfqq. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
BFQ tags some bfq_queues as interactive or soft_rt if it deems that these bfq_queues contain the I/O of, respectively, interactive or soft real-time applications. BFQ privileges both these special types of bfq_queues over normal bfq_queues. To privilege a bfq_queue, BFQ mainly raises the weight of the bfq_queue. In particular, soft_rt bfq_queues get a higher weight than interactive bfq_queues. A bfq_queue may turn from interactive to soft_rt. And this leads to a tricky issue. Soft real-time applications usually start with an I/O-bound, interactive phase, in which they load themselves into main memory. BFQ correctly detects this phase, and keeps the bfq_queues associated with the application in interactive mode for a while. Problems arise when the I/O pattern of the application finally switches to soft real-time. One of the conditions for a bfq_queue to be deemed as soft_rt is that the bfq_queue does not consume too much bandwidth. But the bfq_queues associated with a soft real-time application consume as much bandwidth as they can in the loading phase of the application. So, after the application becomes truly soft real-time, a lot of time should pass before the average bandwidth consumed by its bfq_queues finally drops to a value acceptable for soft_rt bfq_queues. As a consequence, there might be a time gap during which the application is not privileged at all, because its bfq_queues are not interactive any longer, but cannot be deemed as soft_rt yet. To avoid this problem, BFQ pretends that an interactive bfq_queue consumes zero bandwidth, and allows an interactive bfq_queue to switch to soft_rt. Yet, this fake zero-bandwidth consumption easily causes the bfq_queue to often switch to soft_rt deceptively, during its loading phase. As in soft_rt mode, the bfq_queue gets its bandwidth correctly computed, and therefore soon switches back to interactive. Then it switches again to soft_rt, and so on. These spurious fluctuations usually cause losses of throughput, because they deceive BFQ's mechanisms for boosting throughput (injection, I/O-plugging avoidance, ...). This commit addresses this issue as follows: 1) It does compute actual bandwidth consumption also for interactive bfq_queues. This avoids the above false positives. 2) When a bfq_queue switches from interactive to normal mode, the consumed bandwidth is reset (forgotten). This allows the bfq_queue to enjoy soft_rt very quickly. In particular, two alternatives are possible in this switch: - the bfq_queue still has backlog, and therefore there is a budget already scheduled to serve the bfq_queue; in this case, the scheduling of the current budget of the bfq_queue is not hindered, because only the scheduling of the next budget will be affected by the weight drop. After that, if the bfq_queue is actually in a soft_rt phase, and becomes empty during the service of its current budget, which is the natural behavior of a soft_rt bfq_queue, then the bfq_queue will be considered as soft_rt when its next I/O arrives. If, in contrast, the bfq_queue remains constantly non-empty, then its next budget will be scheduled with a low weight, which is the natural treatment for an I/O-bound (non soft_rt) bfq_queue. - the bfq_queue is empty; in this case, the bfq_queue may be considered unjustly soft_rt when its new I/O arrives. Yet the problem is now much smaller than before, because it is unlikely that more than one spurious fluctuation occurs. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
BFQ heuristics try to detect interactive I/O, and raise the weight of the queues containing such an I/O. Yet, if also the user changes the weight of a queue (i.e., the user changes the ioprio of the process associated with that queue), then it is most likely better to prevent BFQ heuristics from silently changing the same weight. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Tests on slower machines showed current window to be way too small. This commit increases it. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jia Cheng Hu 提交于
Since commit c5089591c3ba ("block, bfq: detect wakers and unconditionally inject their I/O"), when the in-service bfq_queue, say Q, is temporarily empty, BFQ checks whether there are I/O requests to inject (also) from the waker bfq_queue for Q. To this goal, the value pointed by bfqq->waker_bfqq->next_rq must be controlled. However, the current implementation mistakenly looks at bfqq->next_rq, which instead points to the next request of the currently served queue. This mistake evidently causes losses of throughput in scenarios with waker bfq_queues. This commit corrects this mistake. Fixes: c5089591c3ba ("block, bfq: detect wakers and unconditionally inject their I/O") Signed-off-by: NJia Cheng Hu <jia.jiachenghu@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
The value of the I/O plugging (idling) timeout is used also as the think-time threshold to decide whether a process has a short think time. In this respect, a good value of this timeout for rotational drives is un the order of several ms. Yet, this is often too long a time interval to be effective as a think-time threshold. This commit mitigates this problem (by a lot, according to tests), by halving the threshold. Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 1月, 2021 1 次提交
-
-
由 Jan Kara 提交于
BFQ computes number of tags it allows to be allocated for each request type based on tag bitmap. However it uses 1 << bitmap.shift as number of available tags which is wrong. 'shift' is just an internal bitmap value containing logarithm of how many bits bitmap uses in each bitmap word. Thus number of tags allowed for some request types can be far to low. Use proper bitmap.depth which has the number of tags instead. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 9月, 2020 1 次提交
-
-
由 Omar Sandoval 提交于
Yang Yang reported the following crash caused by requeueing a flush request in Kyber: [ 2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00 ... [ 2.517468] pc : clear_bit+0x18/0x2c [ 2.517502] lr : sbitmap_queue_clear+0x40/0x228 [ 2.517503] sp : ffffff800832bc60 pstate : 00c00145 ... [ 2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000) [ 2.517602] Call trace: [ 2.517606] clear_bit+0x18/0x2c [ 2.517619] kyber_finish_request+0x74/0x80 [ 2.517627] blk_mq_requeue_request+0x3c/0xc0 [ 2.517637] __scsi_queue_insert+0x11c/0x148 [ 2.517640] scsi_softirq_done+0x114/0x130 [ 2.517643] blk_done_softirq+0x7c/0xb0 [ 2.517651] __do_softirq+0x208/0x3bc [ 2.517657] run_ksoftirqd+0x34/0x60 [ 2.517663] smpboot_thread_fn+0x1c4/0x2c0 [ 2.517667] kthread+0x110/0x120 [ 2.517669] ret_from_fork+0x10/0x18 This happens because Kyber doesn't track flush requests, so kyber_finish_request() reads a garbage domain token. Only call the scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for the finish_request() hook in blk_mq_free_request()). Now that we're handling it in blk-mq, also remove the check from BFQ. Reported-by: NYang Yang <yang.yang@vivo.com> Signed-off-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 9月, 2020 2 次提交
-
-
由 Kashyap Desai 提交于
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock contention is possible for mq-deadline and bfq IO schedulers when nr_hw_queues is more than one. It is because kblockd work queue can submit IO from all online CPUs (through blk_mq_run_hw_queues()) even though only one hctx has pending commands. The elevator callback .has_work for mq-deadline and bfq scheduler considers pending work if there are any IOs on request queue but it does not account hctx context. Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering the elevator even though there are no requests queued. [jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap] Signed-off-by: NKashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
Introduce pointers for the blk_mq_tags regular and reserved bitmap tags, with the goal of later being able to use a common shared tag bitmap across all HW contexts in a set. Signed-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 8月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 01 8月, 2020 1 次提交
-
-
由 Randy Dunlap 提交于
Change "at at" to "at a". Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 5月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
None of the I/O schedulers actually needs it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 5月, 2020 1 次提交
-
-
由 Yufen Yu 提交于
Use the common interface bdi_dev_name() to get device name. Signed-off-by: NYufen Yu <yuyufen@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Add missing <linux/backing-dev.h> include BFQ Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 3月, 2020 2 次提交
-
-
由 Paolo Valente 提交于
A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The goal of this put is to release a process reference to a bfq_queue. But process-reference releases may trigger also some extra operation, and, to this goal, are handled through bfq_release_process_ref(). So, turn the invocation of bfq_put_queue() into an invocation of bfq_release_process_ref(). Tested-by: cki-project@redhat.com Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Zhiqiang Liu 提交于
In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is not in bfqd-lock critical section. The bfqq, which is not equal to NULL in bfq_idle_slice_timer, may be freed after passing to bfq_idle_slice_timer_body. So we will access the freed memory. In addition, considering the bfqq may be in race, we should firstly check whether bfqq is in service before doing something on it in bfq_idle_slice_timer_body func. If the bfqq in race is not in service, it means the bfqq has been expired through __bfq_bfqq_expire func, and wait_request flags has been cleared in __bfq_bfqd_reset_in_service func. So we do not need to re-clear the wait_request of bfqq which is not in service. KASAN log is given as follows: [13058.354613] ================================================================== [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290 [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767 [13058.354646] [13058.354655] CPU: 96 PID: 19767 Comm: fork13 [13058.354661] Call trace: [13058.354667] dump_backtrace+0x0/0x310 [13058.354672] show_stack+0x28/0x38 [13058.354681] dump_stack+0xd8/0x108 [13058.354687] print_address_description+0x68/0x2d0 [13058.354690] kasan_report+0x124/0x2e0 [13058.354697] __asan_load8+0x88/0xb0 [13058.354702] bfq_idle_slice_timer+0xac/0x290 [13058.354707] __hrtimer_run_queues+0x298/0x8b8 [13058.354710] hrtimer_interrupt+0x1b8/0x678 [13058.354716] arch_timer_handler_phys+0x4c/0x78 [13058.354722] handle_percpu_devid_irq+0xf0/0x558 [13058.354731] generic_handle_irq+0x50/0x70 [13058.354735] __handle_domain_irq+0x94/0x110 [13058.354739] gic_handle_irq+0x8c/0x1b0 [13058.354742] el1_irq+0xb8/0x140 [13058.354748] do_wp_page+0x260/0xe28 [13058.354752] __handle_mm_fault+0x8ec/0x9b0 [13058.354756] handle_mm_fault+0x280/0x460 [13058.354762] do_page_fault+0x3ec/0x890 [13058.354765] do_mem_abort+0xc0/0x1b0 [13058.354768] el0_da+0x24/0x28 [13058.354770] [13058.354773] Allocated by task 19731: [13058.354780] kasan_kmalloc+0xe0/0x190 [13058.354784] kasan_slab_alloc+0x14/0x20 [13058.354788] kmem_cache_alloc_node+0x130/0x440 [13058.354793] bfq_get_queue+0x138/0x858 [13058.354797] bfq_get_bfqq_handle_split+0xd4/0x328 [13058.354801] bfq_init_rq+0x1f4/0x1180 [13058.354806] bfq_insert_requests+0x264/0x1c98 [13058.354811] blk_mq_sched_insert_requests+0x1c4/0x488 [13058.354818] blk_mq_flush_plug_list+0x2d4/0x6e0 [13058.354826] blk_flush_plug_list+0x230/0x548 [13058.354830] blk_finish_plug+0x60/0x80 [13058.354838] read_pages+0xec/0x2c0 [13058.354842] __do_page_cache_readahead+0x374/0x438 [13058.354846] ondemand_readahead+0x24c/0x6b0 [13058.354851] page_cache_sync_readahead+0x17c/0x2f8 [13058.354858] generic_file_buffered_read+0x588/0xc58 [13058.354862] generic_file_read_iter+0x1b4/0x278 [13058.354965] ext4_file_read_iter+0xa8/0x1d8 [ext4] [13058.354972] __vfs_read+0x238/0x320 [13058.354976] vfs_read+0xbc/0x1c0 [13058.354980] ksys_read+0xdc/0x1b8 [13058.354984] __arm64_sys_read+0x50/0x60 [13058.354990] el0_svc_common+0xb4/0x1d8 [13058.354994] el0_svc_handler+0x50/0xa8 [13058.354998] el0_svc+0x8/0xc [13058.354999] [13058.355001] Freed by task 19731: [13058.355007] __kasan_slab_free+0x120/0x228 [13058.355010] kasan_slab_free+0x10/0x18 [13058.355014] kmem_cache_free+0x288/0x3f0 [13058.355018] bfq_put_queue+0x134/0x208 [13058.355022] bfq_exit_icq_bfqq+0x164/0x348 [13058.355026] bfq_exit_icq+0x28/0x40 [13058.355030] ioc_exit_icq+0xa0/0x150 [13058.355035] put_io_context_active+0x250/0x438 [13058.355038] exit_io_context+0xd0/0x138 [13058.355045] do_exit+0x734/0xc58 [13058.355050] do_group_exit+0x78/0x220 [13058.355054] __wake_up_parent+0x0/0x50 [13058.355058] el0_svc_common+0xb4/0x1d8 [13058.355062] el0_svc_handler+0x50/0xa8 [13058.355066] el0_svc+0x8/0xc [13058.355067] [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464 [13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040) [13058.355077] The buggy address belongs to the page: [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0 [13058.366175] flags: 0x2ffffe0000008100(slab|head) [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780 [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000 [13058.370789] page dumped because: kasan: bad access detected [13058.370791] [13058.370792] Memory state around the buggy address: [13058.370797] ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb [13058.370801] ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [13058.370808] ^ [13058.370811] ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [13058.370815] ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [13058.370817] ================================================================== [13058.370820] Disabling lock debugging due to kernel taint Here, we directly pass the bfqd to bfq_idle_slice_timer_body func. -- V2->V3: rewrite the comment as suggested by Paolo Valente V1->V2: add one comment, and add Fixes and Reported-by tag. Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Acked-by: NPaolo Valente <paolo.valente@linaro.org> Reported-by: NWang Wang <wangwang2@huawei.com> Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: NFeilong Lin <linfeilong@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 2月, 2020 5 次提交
-
-
由 Paolo Valente 提交于
The exact, general goal of the function bfq_split_bfqq() is not that apparent. Add a comment to make it clear. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
ifdefs around gets and puts of bfq groups reduce readability, remove them. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
The flag on_st in the bfq_entity data structure is true if the entity is on a service tree or is in service. Yet the name of the field, confusingly, does not mention the second, very important case. Extend the name to mention the second case too. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
BFQ maintains an ordered list, implemented with an RB tree, of head-request positions of non-empty bfq_queues. This position tree, inherited from CFQ, is used to find bfq_queues that contain I/O close to each other. BFQ merges these bfq_queues into a single shared queue, if this boosts throughput on the device at hand. There is however a special-purpose bfq_queue that does not participate in queue merging, the oom bfq_queue. Yet, also this bfq_queue could be wrongly added to the position tree. So bfqq_find_close() could return the oom bfq_queue, which is a source of further troubles in an out-of-memory situation. This commit prevents the oom bfq_queue from being inserted into the position tree. Tested-by: NPatrick Dung <patdung100@gmail.com> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Commit 478de338 ("block, bfq: deschedule empty bfq_queues not referred by any process") fixed commit 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging") by descheduling an empty bfq_queue when it remains with not process reference. Yet, this still left a case uncovered: an empty bfq_queue with not process reference that remains in service. This happens for an in-service sync bfq_queue that is deemed to deserve I/O-dispatch plugging when it remains empty. Yet no new requests will arrive for such a bfq_queue if no process sends requests to it any longer. Even worse, the bfq_queue may happen to be prematurely freed while still in service (because there may remain no reference to it any longer). This commit solves this problem by preventing I/O dispatch from being plugged for the in-service bfq_queue, if the latter has no process reference (the bfq_queue is then prevented from remaining in service). Fixes: 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging") Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Reported-by: NPatrick Dung <patdung100@gmail.com> Tested-by: NPatrick Dung <patdung100@gmail.com> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 1月, 2020 1 次提交
-
-
由 Alex Shi 提交于
This macro is never used after introduced from commit aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler") Better to remove it. Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 11月, 2019 1 次提交
-
-
由 Paolo Valente 提交于
Since commit 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging"), to prevent the service guarantees of a bfq_queue from being violated, the bfq_queue may be left busy, i.e., scheduled for service, even if empty (see comments in __bfq_bfqq_expire() for details). But, if no process will send requests to the bfq_queue any longer, then there is no point in keeping the bfq_queue scheduled for service. In addition, keeping the bfq_queue scheduled for service, but with no process reference any longer, may cause the bfq_queue to be freed when descheduled from service. But this is assumed to never happen, and causes a UAF if it happens. This, in turn, caused crashes [1, 2]. This commit fixes this issue by descheduling an empty bfq_queue when it remains with not process reference. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1767539 [2] https://bugzilla.kernel.org/show_bug.cgi?id=205447 Fixes: 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging") Reported-by: NChris Evich <cevich@redhat.com> Reported-by: NPatrick Dung <patdung100@gmail.com> Reported-by: NThorsten Schubert <tschubert@bafh.org> Tested-by: NThorsten Schubert <tschubert@bafh.org> Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 11月, 2019 1 次提交
-
-
由 Tejun Heo 提交于
When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios from blk-cgroup core to populate six stat knobs. blk-cgroup core is moving away from blkg_rwstat to improve scalability and won't be able to support this usage. It isn't like the sharing gains all that much. Let's break it out to dedicated rwstat counters which are updated when on cgroup1. This makes use of bfqg_*rwstat*() helpers outside of CONFIG_BFQ_CGROUP_DEBUG. Move them out. v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 9月, 2019 4 次提交
-
-
由 Paolo Valente 提交于
If equal to 0, the injection limit for a bfq_queue is pushed to 1 after a first sample of the total service time of the I/O requests of the queue is computed (to allow injection to start). Yet, because of a mistake in the branch that performs this action, the push may happen also in some other case. This commit fixes this issue. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
The update period of the injection limit has been tentatively set to 100 ms, to reduce fluctuations. This value however proved to cause, occasionally, the limit to be decremented for some bfq_queue only after the queue underwent excessive injection for a lot of time. This commit reduces the period to 10 ms. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Upon an increment attempt of the injection limit, the latter is constrained not to become higher than twice the maximum number max_rq_in_driver of I/O requests that have happened to be in service in the drive. This high bound allows the injection limit to grow beyond max_rq_in_driver, which may then cause max_rq_in_driver itself to grow. However, since the limit is incremented by only one unit at a time, there is no need for such a high bound, and just max_rq_in_driver+1 is enough. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
BFQ updates the injection limit of each bfq_queue as a function of how much the limit inflates the service times experienced by the I/O requests of the queue. So only service times affected by injection must be taken into account. Unfortunately, in the current implementation of this update scheme, the service time of an I/O request rq not affected by injection may happen to be considered in the following case: there is no I/O request in service when rq arrives. This commit fixes this issue by making sure that only service times affected by injection are considered for updating the injection limit. In particular, the service time of an I/O request rq is now considered only if at least one of the following two conditions holds: - the destination bfq_queue for rq underwent injection before rq arrival, and there is still I/O in service in the drive on rq arrival (the service of such unfinished I/O may delay the service of rq); - injection occurs between the arrival and the completion time of rq. Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 8月, 2019 3 次提交
-
-
由 Paolo Valente 提交于
As reported in [1], the call bfq_init_rq(rq) may return NULL in case of OOM (in particular, if rq->elv.icq is NULL because memory allocation failed in failed in ioc_create_icq()). This commit handles this circumstance. [1] https://lkml.org/lkml/2019/7/22/824 Cc: Hsin-Yi Wang <hsinyi@google.com> Cc: Nicolas Boichat <drinkcat@chromium.org> Cc: Doug Anderson <dianders@chromium.org> Reported-by: NGuenter Roeck <linux@roeck-us.net> Reported-by: NHsin-Yi Wang <hsinyi@google.com> Reviewed-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Since commit 13a857a4 ("block, bfq: detect wakers and unconditionally inject their I/O"), every bfq_queue has a pointer to a waker bfq_queue and a list of the bfq_queues it may wake. In this respect, when a bfq_queue, say Q, remains with no I/O source attached to it, Q cannot be woken by any other bfq_queue, and cannot wake any other bfq_queue. Then Q must be removed from the woken list of its possible waker bfq_queue, and all bfq_queues in the woken list of Q must stop having a waker bfq_queue. Q remains with no I/O source in two cases: when the last process associated with Q exits or when such a process gets associated with a different bfq_queue. Unfortunately, commit 13a857a4 ("block, bfq: detect wakers and unconditionally inject their I/O") performed the above updates only in the first case. This commit fixes this bug by moving these updates to when Q gets freed. This is a simple and safe way to handle all cases, as both the above events, process exit and re-association, lead to Q being freed soon, and because dangling references would come out only after Q gets freed (if no update were performed). Fixes: 13a857a4 ("block, bfq: detect wakers and unconditionally inject their I/O") Reported-by: NDouglas Anderson <dianders@chromium.org> Tested-by: NDouglas Anderson <dianders@chromium.org> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Paolo Valente 提交于
Since commit 13a857a4 ("block, bfq: detect wakers and unconditionally inject their I/O"), BFQ stores, in a per-device pointer last_completed_rq_bfqq, the last bfq_queue that had an I/O request completed. If some bfq_queue receives new I/O right after the last request of last_completed_rq_bfqq has been completed, then last_completed_rq_bfqq may be a waker bfq_queue. But if the bfq_queue last_completed_rq_bfqq points to is freed, then last_completed_rq_bfqq becomes a dangling reference. This commit resets last_completed_rq_bfqq if the pointed bfq_queue is freed. Fixes: 13a857a4 ("block, bfq: detect wakers and unconditionally inject their I/O") Reported-by: NDouglas Anderson <dianders@chromium.org> Tested-by: NDouglas Anderson <dianders@chromium.org> Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 7月, 2019 1 次提交
-
-
由 Paolo Valente 提交于
Consider a sync bfq_queue Q that remains empty while in service, and suppose that, when this happens, there is a fair amount of already in-flight I/O not belonging to Q. In such a situation, I/O dispatching may need to be plugged (until new I/O arrives for Q), for the following reason. The drive may decide to serve in-flight non-Q's I/O requests before Q's ones, thereby delaying the arrival of new I/O requests for Q (recall that Q is sync). If I/O-dispatching is not plugged, then, while Q remains empty, a basically uncontrolled amount of I/O from other queues may be dispatched too, possibly causing the service of Q's I/O to be delayed even longer in the drive. This problem gets more and more serious as the speed and the queue depth of the drive grow, because, as these two quantities grow, the probability to find no queue busy but many requests in flight grows too. If Q has the same weight and priority as the other queues, then the above delay is unlikely to cause any issue, because all queues tend to undergo the same treatment. So, since not plugging I/O dispatching is convenient for throughput, it is better not to plug. Things change in case Q has a higher weight or priority than some other queue, because Q's service guarantees may simply be violated. For this reason, commit 1de0c4cd ("block, bfq: reduce idling only in symmetric scenarios") does plug I/O in such an asymmetric scenario. Plugging minimizes the delay induced by already in-flight I/O, and enables Q to recover the bandwidth it may lose because of this delay. Yet the above commit does not cover the case of weight-raised queues, for efficiency concerns. For weight-raised queues, I/O-dispatch plugging is activated simply if not all bfq_queues are weight-raised. But this check does not handle the case of in-flight requests, because a bfq_queue may become non busy *before* all its in-flight requests are completed. This commit performs I/O-dispatch plugging for weight-raised queues if there are some in-flight requests. As a practical example of the resulting recover of control, under write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5 seconds after this fix, against 15 seconds before the fix (as a reference, gnome-terminal takes about 35 seconds to start with any of the other I/O schedulers). Fixes: 1de0c4cd ("block, bfq: reduce idling only in symmetric scenarios") Signed-off-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 7月, 2019 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Rename the block documentation files to ReST, add an index for them and adjust in order to produce a nice html output via the Sphinx build system. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
-
- 28 6月, 2019 1 次提交
-
-
由 Douglas Anderson 提交于
In reboot tests on several devices we were seeing a "use after free" when slub_debug or KASAN was enabled. The kernel complained about: Unable to handle kernel paging request at virtual address 6b6b6c2b ...which is a classic sign of use after free under slub_debug. The stack crawl in kgdb looked like: 0 test_bit (addr=<optimized out>, nr=<optimized out>) 1 bfq_bfqq_busy (bfqq=<optimized out>) 2 bfq_select_queue (bfqd=<optimized out>) 3 __bfq_dispatch_request (hctx=<optimized out>) 4 bfq_dispatch_request (hctx=<optimized out>) 5 0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440) 6 0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440) 7 0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440) 8 0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>) 9 0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480) 10 0xc024cff4 in worker_thread (__worker=0xec6d4640) Digging in kgdb, it could be found that, though bfqq looked fine, bfqq->bic had been freed. Through further digging, I postulated that perhaps it is illegal to access a "bic" (AKA an "icq") after bfq_exit_icq() had been called because the "bic" can be freed at some point in time after this call is made. I confirmed that there certainly were cases where the exact crashing code path would access the "bic" after bfq_exit_icq() had been called. Sspecifically I set the "bfqq->bic" to (void *)0x7 and saw that the bic was 0x7 at the time of the crash. To understand a bit more about why this crash was fairly uncommon (I saw it only once in a few hundred reboots), you can see that much of the time bfq_exit_icq_fbqq() fully frees the bfqq and thus it can't access the ->bic anymore. The only case it doesn't is if bfq_put_queue() sees a reference still held. However, even in the case when bfqq isn't freed, the crash is still rare. Why? I tracked what happened to the "bic" after the exit routine. It doesn't get freed right away. Rather, put_io_context_active() eventually called put_io_context() which queued up freeing on a workqueue. The freeing then actually happened later than that through call_rcu(). Despite all these delays, some extra debugging showed that all the hoops could be jumped through in time and the memory could be freed causing the original crash. Phew! To make a long story short, assuming it truly is illegal to access an icq after the "exit_icq" callback is finished, this patch is needed. Cc: stable@vger.kernel.org Reviewed-by: NPaolo Valente <paolo.valente@unimore.it> Signed-off-by: NDouglas Anderson <dianders@chromium.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-