提交 · b131f2011115f3c18a49e17762486501496fea3c · openeuler / Kernel

12 11月, 2021 1 次提交

blk-mq: rename blk_attempt_bio_merge · b131f201

由 Ming Lei 提交于 11月 11, 2021

It is very annoying to have two block layer functions which share same
name, so rename blk_attempt_bio_merge in blk-mq.c as
blk_mq_attempt_bio_merge.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b131f201

09 11月, 2021 1 次提交

blk-mq: add one API for waiting until quiesce is done · 9ef4d020

由 Ming Lei 提交于 11月 09, 2021

Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it
is hard to switch to this style, so these drivers need one atomic flag for
helping to balance quiesce and unquiesce.

When quiesce is in-progress, the driver still needs to wait until
the quiesce is done, so add API of blk_mq_wait_quiesce_done() for
these drivers.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9ef4d020

08 11月, 2021 1 次提交

blk-mq: don't free tags if the tag_set is used by other device in queue initialztion · a846a8e6

由 Ye Bin 提交于 11月 08, 2021

We got UAF report on v5.10 as follows:
[ 1446.674930] ==================================================================
[ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348
[ 1446.677851]
[ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2
[ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn
[ 1446.681448] Call Trace:
[ 1446.681800]  dump_stack+0x9b/0xce
[ 1446.682916]  print_address_description.constprop.6+0x3e/0x60
[ 1446.685999]  kasan_report.cold.9+0x22/0x3a
[ 1446.687186]  blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.687785]  blk_mq_dispatch_rq_list+0x21a/0x1d40
[ 1446.692576]  __blk_mq_do_dispatch_sched+0x394/0x830
[ 1446.695758]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[ 1446.698279]  blk_mq_sched_dispatch_requests+0xdf/0x140
[ 1446.698967]  __blk_mq_run_hw_queue+0xc0/0x270
[ 1446.699561]  __blk_mq_delay_run_hw_queue+0x4cc/0x550
[ 1446.701407]  blk_mq_run_hw_queue+0x13b/0x2b0
[ 1446.702593]  blk_mq_sched_insert_requests+0x1de/0x390
[ 1446.703309]  blk_mq_flush_plug_list+0x4b4/0x760
[ 1446.705408]  blk_flush_plug_list+0x2c5/0x480
[ 1446.708471]  blk_finish_plug+0x55/0xa0
[ 1446.708980]  blk_throtl_dispatch_work_fn+0x23b/0x2e0
[ 1446.711236]  process_one_work+0x6d4/0xfe0
[ 1446.711778]  worker_thread+0x91/0xc80
[ 1446.713400]  kthread+0x32d/0x3f0
[ 1446.714362]  ret_from_fork+0x1f/0x30
[ 1446.714846]
[ 1446.715062] Allocated by task 1:
[ 1446.715509]  kasan_save_stack+0x19/0x40
[ 1446.716026]  __kasan_kmalloc.constprop.1+0xc1/0xd0
[ 1446.716673]  blk_mq_init_tags+0x6d/0x330
[ 1446.717207]  blk_mq_alloc_rq_map+0x50/0x1c0
[ 1446.717769]  __blk_mq_alloc_map_and_request+0xe5/0x320
[ 1446.718459]  blk_mq_alloc_tag_set+0x679/0xdc0
[ 1446.719050]  scsi_add_host_with_dma.cold.3+0xa0/0x5db
[ 1446.719736]  virtscsi_probe+0x7bf/0xbd0
[ 1446.720265]  virtio_dev_probe+0x402/0x6c0
[ 1446.720808]  really_probe+0x276/0xde0
[ 1446.721320]  driver_probe_device+0x267/0x3d0
[ 1446.721892]  device_driver_attach+0xfe/0x140
[ 1446.722491]  __driver_attach+0x13a/0x2c0
[ 1446.723037]  bus_for_each_dev+0x146/0x1c0
[ 1446.723603]  bus_add_driver+0x3fc/0x680
[ 1446.724145]  driver_register+0x1c0/0x400
[ 1446.724693]  init+0xa2/0xe8
[ 1446.725091]  do_one_initcall+0x9e/0x310
[ 1446.725626]  kernel_init_freeable+0xc56/0xcb9
[ 1446.726231]  kernel_init+0x11/0x198
[ 1446.726714]  ret_from_fork+0x1f/0x30
[ 1446.727212]
[ 1446.727433] Freed by task 26992:
[ 1446.727882]  kasan_save_stack+0x19/0x40
[ 1446.728420]  kasan_set_track+0x1c/0x30
[ 1446.728943]  kasan_set_free_info+0x1b/0x30
[ 1446.729517]  __kasan_slab_free+0x111/0x160
[ 1446.730084]  kfree+0xb8/0x520
[ 1446.730507]  blk_mq_free_map_and_requests+0x10b/0x1b0
[ 1446.731206]  blk_mq_realloc_hw_ctxs+0x8cb/0x15b0
[ 1446.731844]  blk_mq_init_allocated_queue+0x374/0x1380
[ 1446.732540]  blk_mq_init_queue_data+0x7f/0xd0
[ 1446.733155]  scsi_mq_alloc_queue+0x45/0x170
[ 1446.733730]  scsi_alloc_sdev+0x73c/0xb20
[ 1446.734281]  scsi_probe_and_add_lun+0x9a6/0x2d90
[ 1446.734916]  __scsi_scan_target+0x208/0xc50
[ 1446.735500]  scsi_scan_channel.part.3+0x113/0x170
[ 1446.736149]  scsi_scan_host_selected+0x25a/0x360
[ 1446.736783]  store_scan+0x290/0x2d0
[ 1446.737275]  dev_attr_store+0x55/0x80
[ 1446.737782]  sysfs_kf_write+0x132/0x190
[ 1446.738313]  kernfs_fop_write_iter+0x319/0x4b0
[ 1446.738921]  new_sync_write+0x40e/0x5c0
[ 1446.739429]  vfs_write+0x519/0x720
[ 1446.739877]  ksys_write+0xf8/0x1f0
[ 1446.740332]  do_syscall_64+0x2d/0x40
[ 1446.740802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1446.741462]
[ 1446.741670] The buggy address belongs to the object at ffff8880185afd00
[ 1446.741670]  which belongs to the cache kmalloc-256 of size 256
[ 1446.743276] The buggy address is located 16 bytes inside of
[ 1446.743276]  256-byte region [ffff8880185afd00, ffff8880185afe00)
[ 1446.744765] The buggy address belongs to the page:
[ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac
[ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0
[ 1446.747719] flags: 0x1fffff80010200(slab|head)
[ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240
[ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 1446.750455] page dumped because: kasan: bad access detected
[ 1446.751227]
[ 1446.751445] Memory state around the buggy address:
[ 1446.752102]  ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.753090]  ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.755065]                          ^
[ 1446.755589]  ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.756574]  ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.757566] ==================================================================

Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the
same host initializes it's queue successfully. However, if the second
device failed to allocate memory in blk_mq_alloc_and_init_hctx() from
blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(),
__blk_mq_free_map_and_rqs() will be called on error path, and if
'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed
while it's still used by the first device.

To fix this issue we move release newly allocated hardware context from
blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to
release hardware context in blk_mq_init_allocated_queue.

Fixes: 868f2f0b ("blk-mq: dynamic h/w context count")
Signed-off-by: NYe Bin <yebin10@huawei.com>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a846a8e6

05 11月, 2021 4 次提交

block: ensure cached plug request matches the current queue · 10c47870

由 Jens Axboe 提交于 11月 04, 2021

If we're driving multiple devices, we could have pre-populated the cache
for a different device. Ensure that the empty request matches the current
queue.

Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

10c47870

block: move queue enter logic into blk_mq_submit_bio() · 900e0807

由 Jens Axboe 提交于 11月 03, 2021

Retain the old logic for the fops based submit, but for our internal
blk_mq_submit_bio(), move the queue entering logic into the core
function itself.

We need to be a bit careful if going into the scheduler, as a scheduler
or queue mappings can arbitrarily change before we have entered the queue.
Have the bio scheduler mapping do that separately, it's a very cheap
operation compared to actually doing merging locking and lookups.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
[axboe: update to check merge post submit_bio_checks() doing remap...]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

900e0807

block: split request allocation components into helpers · 71539717

由 Jens Axboe 提交于 11月 03, 2021

This is in preparation for a fix, but serves as a cleanup as well moving
the cached vs regular alloc logic out of blk_mq_submit_bio().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

71539717

block: have plug stored requests hold references to the queue · c5fc7b93

由 Jens Axboe 提交于 11月 03, 2021

Requests that were stored in the cache deliberately didn't hold an enter
reference to the queue, instead we grabbed one every time we pulled a
request out of there. That made for awkward logic on freeing the remainder
of the cached list, if needed, where we had to artificially raise the
queue usage count before each free.

Grab references up front for cached plug requests. That's safer, and also
more efficient.

Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c5fc7b93

03 11月, 2021 2 次提交

blk-mq: update hctx->nr_active in blk_mq_end_request_batch() · 3b87c6ea

由 Ming Lei 提交于 11月 02, 2021

In case of shared tags and none io sched, batched completion still may
be run into, and hctx->nr_active is accounted when getting driver tag,
so it has to be updated in blk_mq_end_request_batch().

Otherwise, hctx->nr_active may become same with queue depth, then
hctx_may_queue() always return false, then io hang is caused.

Fixes the issue by updating the counter in batched way.
Reported-by: NShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()")
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211102153619.3627505-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3b87c6ea

block: move RQF_ELV setting into allocators · 781dd830

由 Jens Axboe 提交于 11月 02, 2021

It's not safe to do this before blk_queue_enter(), as the scheduler state
could have changed in between. Hence move the RQF_ELV setting into the
allocators, where we know the queue is already entered.
Suggested-by: NMing Lei <ming.lei@redhat.com>
Reported-by: NYi Zhang <yi.zhang@redhat.com>
Reported-by: NSteffen Maier <maier@linux.ibm.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

781dd830

02 11月, 2021 2 次提交

block: replace always false argument with 'false' · b2280909

由 Jens Axboe 提交于 11月 01, 2021

A previous commit fixed up the condition for doing direct issue, but that
left the 'from_schedule' argument dead inside the branch. Replace it with
'false'.

Fixes: ff155223 ("blk-mq: don't issue request directly in case that current is to be blocked")
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b2280909

block: assign correct tag before doing prefetch of request · a22c00be

由 Jens Axboe 提交于 11月 01, 2021

Ensure that current tag is correctly assigned before attempting
to prefetch the first cacheline of the request.

Fixes: 92aff191 ("block: prefetch request to be initialized")
Reported-and-tested-by: syzbot+cd20829ac44b92bf6ed0@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a22c00be

29 10月, 2021 1 次提交

block: improve readability of blk_mq_end_request_batch() · 02f7eab0

由 Jens Axboe 提交于 10月 28, 2021

It's faster and easier to read if we tolerate cur_hctx being NULL in
the "when to flush" condition. Rename last_hctx to cur_hctx while at it,
as it better describes the role of that variable.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

02f7eab0

27 10月, 2021 5 次提交

block: re-flow blk_mq_rq_ctx_init() · c7b84d42

由 Jens Axboe 提交于 10月 19, 2021

Now that we have flags passed in, we can do a final re-arrange of the
flow of blk_mq_rq_ctx_init() so we're always writing request in the
order in which it is laid out.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20211019153300.623322-5-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>

c7b84d42

block: prefetch request to be initialized · 92aff191

由 Jens Axboe 提交于 10月 19, 2021

Now we have the tags available in __blk_mq_alloc_requests_batch(), we
can start fetching the first request cacheline before calling into the
request initialization.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20211019153300.623322-4-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>

92aff191

block: pass in blk_mq_tags to blk_mq_rq_ctx_init() · fe6134f6

由 Jens Axboe 提交于 10月 19, 2021

Instead of getting this from data for every invocation of request
initialization, pass it in as an argument instead.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20211019153300.623322-3-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>

fe6134f6

block: add rq_flags to struct blk_mq_alloc_data · 56f8da64

由 Jens Axboe 提交于 10月 19, 2021

There's a hole here we can use, and it's faster to set this earlier
rather than need to check q->elevator multiple times.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20211019153300.623322-2-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>

56f8da64

block: schedule queue restart after BLK_STS_ZONE_RESOURCE · 9586e67b

由 Naohiro Aota 提交于 10月 27, 2021

When dispatching a zone append write request to a SCSI zoned block device,
if the target zone of the request is already locked, the device driver will
return BLK_STS_ZONE_RESOURCE and the request will be pushed back to the
hctx dipatch queue. The queue will be marked as RESTART in
dd_finish_request() and restarted in __blk_mq_free_request(). However, this
restart applies to the hctx of the completed request. If the requeued
request is on a different hctx, dispatch will no be retried until another
request is submitted or the next periodic queue run triggers, leading to up
to 30 seconds latency for the requeued request.

Fix this problem by scheduling a queue restart similarly to the
BLK_STS_RESOURCE case or when we cannot get the budget.

Also, consolidate the checks into the "need_resource" variable to simplify
the condition.
Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Link: https://lore.kernel.org/r/20211026165127.4151055-1-naohiro.aota@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9586e67b

26 10月, 2021 1 次提交

blk-mq: don't issue request directly in case that current is to be blocked · ff155223

由 Ming Lei 提交于 10月 26, 2021

When flushing plug list in case that current will be blocked, we can't
issue request directly because ->queue_rq() may sleep, otherwise scheduler
may complain.

Fixes: dc5fc361 ("block: attempt direct issue of plug list")
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211026082257.2889890-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

ff155223

22 10月, 2021 1 次提交

block: fix req_bio_endio append error handling · 297db731

由 Pavel Begunkov 提交于 10月 22, 2021

Shinichiro Kawasaki reports that there is a bug in a recent
req_bio_endio() patch causing problems with zonefs. As Shinichiro
suggested, inverse the condition in zone append path to resemble how it
was before: fail when it's not fully completed.

Fixes: 478eb72b ("block: optimise req_bio_endio()")
Reported-by: NShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/344ea4e334aace9148b41af5f2426da38c8aa65a.1634914228.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

297db731

21 10月, 2021 1 次提交

block: clean up blk_mq_submit_bio() merging · 179ae84f

由 Pavel Begunkov 提交于 10月 20, 2021

Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a
common if, so we don't check it twice.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/daedc90d4029a5d1d73344771632b1faca3aaf81.1634755800.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

179ae84f

20 10月, 2021 6 次提交

blk-mq: only flush requests from the plug in blk_mq_submit_bio · a214b949

由 Christoph Hellwig 提交于 10月 20, 2021

Replace the call to blk_flush_plug_list in blk_mq_submit_bio with a
direct call to blk_mq_flush_plug_list. This means we do not flush
plug callback from stackable devices, which doesn't really help with
the accumulated requests anyway, and it also means the cached requests
aren't freed here as they can still be used later on.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

a214b949

block: remove inaccurate requeue check · 037057a5

由 Jens Axboe 提交于 10月 20, 2021

This check is meant to catch cases where a requeue is attempted on a
request that is still inserted. It's never really been useful to catch any
misuse, and now it's actively wrong. Outside of that, this should not be a
BUG_ON() to begin with.

Remove the check as it's now causing active harm, as requeue off the plug
path will trigger it even though the request state is just fine.
Reported-by: NYi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/linux-block/CAHj4cs80zAUc2grnCZ015-2Rvd-=gXRfB_dFKy=RTm+wRo09HQ@mail.gmail.com/Signed-off-by: NJens Axboe <axboe@kernel.dk>

037057a5

block: optimise req_bio_endio() · 478eb72b

由 Pavel Begunkov 提交于 10月 19, 2021

First, get rid of an extra branch and chain error checks. Also reshuffle
it with bio_advance(), so it goes closer to the final check, with that
the compiler loads rq->rq_flags only once, and also doesn't reload
bio->bi_iter.bi_size if bio_advance() didn't actually advanced the iter.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

478eb72b

blk-mq: support concurrent queue quiesce/unquiesce · e70feb8b

由 Ming Lei 提交于 10月 14, 2021

blk_mq_quiesce_queue() has been used a bit wide now, so far we don't support
concurrent/nested quiesce. One biggest issue is that unquiesce can happen
unexpectedly in case that quiesce/unquiesce are run concurrently from
more than one context.

This patch introduces q->mq_quiesce_depth to deal concurrent quiesce,
and we only unquiesce queue when it is the last/outer-most one of all
contexts.

Several kernel panic issue has been reported[1][2][3] when running stress
quiesce test. And this patch has been verified in these reports.

[1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787
[2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722
[3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.htmlSigned-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e70feb8b

block: inline fast path of driver tag allocation · a808a9d5

由 Jens Axboe 提交于 10月 13, 2021

If we don't use an IO scheduler or have shared tags, then we don't need
to call into this external function at all. This saves ~2% for such
a setup.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a808a9d5

blk-mq: don't handle non-flush requests in blk_insert_flush · d92ca9d8

由 Christoph Hellwig 提交于 10月 19, 2021

Return to the normal blk_mq_submit_bio flow if the bio did not end up
actually being a flush because the device didn't support it. Note that
this is basically impossible to hit without special instrumentation given
that submit_bio_checks already clears these flags usually, so we'd need a
tight race to actually hit this code path.

With this the call to blk_mq_run_hw_queue for the flush requests can be
removed given that the actual flush requests are always issued via the
requeue workqueue which runs the queue unconditionally.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

d92ca9d8

19 10月, 2021 14 次提交

block: attempt direct issue of plug list · dc5fc361

由 Jens Axboe 提交于 10月 19, 2021

If we have just one queue type in the plug list, then we can extend our
direct issue to cover a full plug list as well. This allows sending a
batch of requests for direct issue, which is more efficient than doing
one-at-a-time kind of issue.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dc5fc361

block: change plugging to use a singly linked list · bc490f81

由 Jens Axboe 提交于 10月 18, 2021

Use a singly linked list for the blk_plug. This saves 8 bytes in the
blk_plug struct, and makes for faster list manipulations than doubly
linked lists. As we don't use the doubly linked lists for anything,
singly linked is just fine.

This yields a bump in default (merging enabled) performance from 7.0
to 7.1M IOPS, and ~7.5M IOPS with merging disabled.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bc490f81

block: move blk_mq_tag_to_rq() inline · e028f167

由 Jens Axboe 提交于 10月 16, 2021

This is in the fast path of driver issue or completion, and it's a single
array index operation. Move it inline to avoid a function call for it.

This does mean making struct blk_mq_tags block layer public, but there's
not really much in there.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e028f167

block: get rid of plug list sorting · df87eb0f

由 Jens Axboe 提交于 10月 18, 2021

Even if we have multiple queues in the plug list, chances that they
are very interspersed is minimal. Don't bother spending CPU cycles
sorting the list.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

df87eb0f

block: return whether or not to unplug through boolean · 87c037d1

由 Jens Axboe 提交于 10月 18, 2021

Instead of returning the same queue request through a request pointer,
use a boolean to accomplish the same.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

87c037d1

block: don't call blk_status_to_errno in blk_update_request · 8a7d267b

由 Christoph Hellwig 提交于 10月 18, 2021

We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8a7d267b

block: fix too broad elevator check in blk_mq_free_request() · e0d78afe

由 Jens Axboe 提交于 10月 18, 2021

We added RQF_ELV to tell whether there's an IO scheduler attached, and
RQF_ELVPRIV tells us whether there's an IO scheduler with private data
attached. Don't check RQF_ELV in blk_mq_free_request(), what we care
about here is just if we have scheduler private data attached.

This fixes a boot crash

Fixes: 2ff0682d ("block: store elevator state in request")
Reported-by: NYi Zhang <yi.zhang@redhat.com>
Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e0d78afe

block: add support for blk_mq_end_request_batch() · f794f335

由 Jens Axboe 提交于 10月 08, 2021

Instead of calling blk_mq_end_request() on a single request, add a helper
that takes the new struct io_comp_batch and completes any request stored
in there.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f794f335

block: add a struct io_comp_batch argument to fops->iopoll() · 5a72e899

由 Jens Axboe 提交于 10月 12, 2021

struct io_comp_batch contains a list head and a completion handler, which
will allow completions to more effciently completed batches of IO.

For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5a72e899

block: provide helpers for rq_list manipulation · 013a7f95

由 Jens Axboe 提交于 10月 13, 2021

Instead of open-coding the list additions, traversal, and removal,
provide a basic set of helpers.
Suggested-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

013a7f95

block: remove some blk_mq_hw_ctx debugfs entries · afd7de03

由 Jens Axboe 提交于 10月 18, 2021

Just like the blk_mq_ctx counterparts, we've got a bunch of counters
in here that are only for debugfs and are of questionnable value. They
are:

- dispatched, index of how many requests were dispatched in one go

- poll_{considered,invoked,success}, which track poll sucess rates. We're
  confident in the iopoll implementation at this point, don't bother
  tracking these.

As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes,
dropping a whole cacheline.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

afd7de03

block: remove debugfs blk_mq_ctx dispatched/merged/completed attributes · 9a14d6ce

由 Jens Axboe 提交于 10月 16, 2021

These were added as part of early days debugging for blk-mq, and they
are not really useful anymore. Rather than spend cycles updating them,
just get rid of them.

As a bonus, this shrinks the per-cpu software queue size from 256b
to 192b. That's a whole cacheline less.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9a14d6ce

block: cache rq_flags inside blk_mq_rq_ctx_init() · 12845906

由 Pavel Begunkov 提交于 10月 18, 2021

Add a local variable for rq_flags, it helps to compile out some of
rq_flags reloads.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

12845906

block: blk_mq_rq_ctx_init cache ctx/q/hctx · 605f784e

由 Pavel Begunkov 提交于 10月 18, 2021

We should have enough of registers in blk_mq_rq_ctx_init(), store them
in local vars, so we don't keep reloading them.

note: keeping q->elevator may look unnecessary, but it's also used
inside inlined blk_mq_tags_from_data().
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

605f784e

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功