alinux: blk: add iohang check function
Background:
We do not have a dependable block layer interface to determine whether
a block device has I/O requests that have not completed for a long
time. The existing 'in_flight' interface counts the I/O requests that
have been issued to the device driver but have not yet completed. It
does not include requests that are queued but not yet issued to the
device driver, so it does not count I/O requests that are stuck in the
block layer.
Also, if I/O requests are steadily issued to the device driver,
'in_flight' may always be non-zero, and it cannot tell whether any
single I/O request has been outstanding for too long.
Solution:
To find I/O requests that have not completed for too long, add 3 new
interfaces:
/sys/block/vdb/queue/hang_threshold
If an I/O request has been running for longer than this value, it is
counted as a hang.
/sys/block/vdb/hang
Shows the hang counters for read and write I/O requests.
/sys/kernel/debug/block/vdb/rq_hang
Shows detailed info for all hung I/O requests, like below:
ffff97db96301200 {.op=WRITE, .cmd_flags=SYNC, .rq_flags=STARTED|
ELVPRIV|IO_STAT|STATS, .state=in_flight, .tag=30, .internal_tag=169,
.start_time_ns=140634088407, .io_start_time_ns=140634102958,
.current_time=146497371953, .bio = ffff97db91e8e000,
.bio_pages = { ffffd096a0602540 }, .bio = ffff97db91e8ec00,
.bio_pages = { ffffd096a070eec0 }, .bio = ffff97db91e8f600,
.bio_pages = { ffffd096a0424cc0 }, .bio = ffff97db91e8f300,
.bio_pages = { ffffd096a0600a80 }}
With the above info, we can easily see this request's latency
distribution. See the next patch for the usage of bio_pages.
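
The latency split can be worked out from the three timestamps in the
dump. A minimal userspace sketch, assuming that start_time_ns is when
the request entered the block layer, io_start_time_ns is when it was
issued to the device driver, and current_time uses the same nanosecond
clock. The struct and function names are hypothetical and are not part
of this patch:

```c
#include <stdint.h>

/* Hypothetical holder for the three timestamps printed by rq_hang. */
struct rq_times {
	uint64_t start_time_ns;    /* request entered the block layer (assumed) */
	uint64_t io_start_time_ns; /* request issued to the driver (assumed) */
	uint64_t current_time;     /* time of the dump, same clock (assumed) */
};

/* Time the request waited in the block layer before being issued. */
static uint64_t rq_queue_ns(const struct rq_times *t)
{
	return t->io_start_time_ns - t->start_time_ns;
}

/* Time the request has been outstanding in the driver/device so far. */
static uint64_t rq_device_ns(const struct rq_times *t)
{
	return t->current_time - t->io_start_time_ns;
}
```

For the request shown above this gives 14551 ns in the block layer and
5863268995 ns (about 5.86 s) outstanding after issue, so under these
assumptions the delay is on the driver/device side.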
Note that /sys/kernel/debug/block/vdb/rq_hang only exists for blk-mq
device drivers and requires CONFIG_BLK_DEBUG_FS to be enabled.
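
The threshold check and the per-direction hang counter described for
hang_threshold and hang can be modelled as below. This is only a
userspace sketch: this message does not state which timestamp the patch
compares or the unit of hang_threshold, so the sketch assumes the
request's start time and a threshold already converted to nanoseconds.
All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction index for the read/write hang counters. */
enum rq_dir { RQ_READ = 0, RQ_WRITE = 1 };

/* Hypothetical per-device counters, one per direction. */
struct hang_count {
	uint64_t n[2];
};

/* A request is counted as a hang once its running time exceeds the threshold. */
static bool rq_is_hang(uint64_t start_ns, uint64_t now_ns, uint64_t threshold_ns)
{
	return now_ns > start_ns && now_ns - start_ns > threshold_ns;
}

/* Bump the counter for this direction if the request is a hang. */
static void hang_account(struct hang_count *c, enum rq_dir dir,
			 uint64_t start_ns, uint64_t now_ns,
			 uint64_t threshold_ns)
{
	if (rq_is_hang(start_ns, now_ns, threshold_ns))
		c->n[dir]++;
}
```

With an illustrative threshold of 5000000000 ns, the WRITE request in
the dump above (start_time_ns=140634088407, current_time=146497371953)
would be counted once in the write counter.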
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>