alinux: block-throttle: only do io statistics if needed

task #29063222

Current blk-throttle code always collects io statistics, even when users
have not specified any valid throttle rules. This introduces significant
overhead for applications that do not use the blk-throttle function, and
the overhead is worse on arm. See the perf data below, captured on arm:

  sudo taskset -c 66 fio -ioengine=io_uring -sqthread_poll=1 -hipri=1 \
      -sqthread_poll_cpu=65 -registerfiles=1 -fixedbufs=1 -direct=1 \
      -filename=/dev/nvme0n1 -bs=4k -iodepth=8 -rw=randwrite -time_based \
      -ramp_time=30 -runtime=60 -name="test"

  Samples: 25K of event 'cycles', Event count (approx.): 16586974662
  Overhead  Command      Shared Object      Symbol
     3.54%  io_uring-sq  [kernel.kallsyms]  [k] throtl_stats_update_completion
     0.89%  io_uring-sq  [kernel.kallsyms]  [k] throtl_bio_end_io
     0.66%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_bio
     0.05%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_stat_add
     0.05%  io_uring-sq  [kernel.kallsyms]  [k] throtl_track_latency
     0.01%  io_uring-sq  [kernel.kallsyms]  [k] blk_throtl_bio_endio

  Samples: 25K of event 'cycles', Event count (approx.): 16586974662
  Overhead  Command      Shared Object      Symbol
     1.62%  io_uring-sq  [kernel.kallsyms]  [k] io_submit_sqes
     1.06%  io_uring-sq  [kernel.kallsyms]  [k] io_issue_sqe
     0.32%  io_uring-sq  [kernel.kallsyms]  [k] __io_queue_sqe
     0.06%  io_uring-sq  [kernel.kallsyms]  [k] io_queue_sqe

The test above does not set any valid blk-throttle rules, yet the overhead
introduced by blk-throttle is larger than that of many io_uring framework
functions, which is not acceptable.

To fix this, only do io statistics if users have specified valid
blk-throttle rules. This also improves performance.

Before this patch:
  clat (usec): min=5, max=6871, avg=18.70, stdev=17.89
   lat (usec): min=9, max=6871, avg=18.84, stdev=17.89
  WRITE: bw=1618MiB/s (1697MB/s), 1618MiB/s-1618MiB/s (1697MB/s-1697MB/s), io=94.8GiB (102GB), run=60001-60001msec

With this patch:
  clat (usec): min=5, max=7554, avg=17.49, stdev=18.24
   lat (usec): min=9, max=7554, avg=17.62, stdev=18.24
  WRITE: bw=1727MiB/s (1810MB/s), 1727MiB/s-1727MiB/s (1810MB/s-1810MB/s), io=101GiB (109GB), run=60001-60001msec

About 6.6% bps improvement and 6.4% latency reduction.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
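The improvement figures quoted in the commit message can be re-derived from the fio numbers. The helpers below are a minimal sketch of that arithmetic; the function names are hypothetical and are not part of the patch. The results land close to the quoted figures, with the exact decimals depending on rounding and on whether the MiB/s or MB/s bandwidth values are used.

```c
#include <assert.h>

/* Relative gain of `after` over `before`, in percent (e.g. bandwidth). */
static double pct_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Relative reduction from `before` to `after`, in percent (e.g. latency). */
static double pct_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

For example, pct_gain(1618.0, 1727.0) gives the bandwidth gain in MiB/s terms, and pct_reduction(18.70, 17.49) gives the drop in average completion latency.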

b8a94ed8 · Xiaoguang Wang · Caspar Zhang · 897fa8eb · b8a94ed8

Showing 1 changed file with 5 additions and 4 deletions

block/blk-throttle.c +5 -4

--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -2324,6 +2324,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 	struct throtl_service_queue *sq;
 	bool rw = bio_data_dir(bio);
 	bool throttled = false;
+	bool has_rules = tg->has_rules[rw];
 	struct throtl_data *td = tg->td;
 	WARN_ON_ONCE(!rcu_read_lock_held());
@@ -2332,11 +2333,11 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 	if (bio_flagged(bio, BIO_THROTTLED))
 		goto out;
-	throtl_bio_stats_start(bio, tg);
+	if (!has_rules)
-	if (!tg->has_rules[rw])
 		goto out;
+	throtl_bio_stats_start(bio, tg);
 	spin_lock_irq(q->queue_lock);
 	throtl_update_latency_buckets(td);
@@ -2435,7 +2436,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg,
 out_unlock:
 	spin_unlock_irq(q->queue_lock);
 out:
-	if (!throttled)
+	if (!throttled && has_rules)
 		bio_set_io_start_time_ns(bio);
 	bio_set_flag(bio, BIO_THROTTLED);