alinux: block-throttle: only do io statistics if needed
task #29063222 Current blk throttle codes always do io statistics even though users don't specify valid throttle rules, which will introduce significant overheads for applications that don't use blk throttle function and is wrose in arm, see below perf data captured in arm: sudo taskset -c 66 fio -ioengine=io_uring -sqthread_poll=1 -hipri=1 -sqthread_poll_cpu=65 -registerfiles=1 -fixedbufs=1 -direct=1 -filename=/dev/nvme0n1 -bs=4k -iodepth=8 -rw=randwrite -time_based -ramp_time=30 -runtime=60 -name="test" Samples: 25K of event 'cycles', Event count (approx.): 16586974662 Overhead Command Shared Object Symbol 3.54% io_uring-sq [kernel.kallsyms] [k] throtl_stats_update_completion 0.89% io_uring-sq [kernel.kallsyms] [k] throtl_bio_end_io 0.66% io_uring-sq [kernel.kallsyms] [k] blk_throtl_bio 0.05% io_uring-sq [kernel.kallsyms] [k] blk_throtl_stat_add 0.05% io_uring-sq [kernel.kallsyms] [k] throtl_track_latency 0.01% io_uring-sq [kernel.kallsyms] [k] blk_throtl_bio_endio Samples: 25K of event 'cycles', Event count (approx.): 16586974662 Overhead Command Shared Object Symbol 1.62% io_uring-sq [kernel.kallsyms] [k] io_submit_sqes 1.06% io_uring-sq [kernel.kallsyms] [k] io_issue_sqe 0.32% io_uring-sq [kernel.kallsyms] [k] __io_queue_sqe 0.06% io_uring-sq [kernel.kallsyms] [k] io_queue_sqe Above test doesn't set valid blk throttle rules, but the overhead introduced by blk throttle is even bigger than many io_uring framework functions, which is not acceptable. To improve this issue, only do do io statistics if users specify valid blk throttle rules, and this will also improve performance. Before this patch: clat (usec): min=5, max=6871, avg=18.70, stdev=17.89 lat (usec): min=9, max=6871, avg=18.84, stdev=17.89 WRITE: bw=1618MiB/s (1697MB/s), 1618MiB/s-1618MiB/s (1697MB/s-1697MB/s), io=94.8GiB (102GB), run=60001-60001msec With this patch: clat (usec): min=5, max=7554, avg=17.49, stdev=18.24 lat (usec): min=9, max=7554, avg=17.62, stdev=18.24 WRITE: bw=1727MiB/s (1810MB/s), 1727MiB/s-1727MiB/s (1810MB/s-1810MB/s), io=101GiB (109GB), run=60001-60001msec About 6.6% bps improvement and 6.4% latency reduction. Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Showing
想要评论请 注册 或 登录