- 22 June 2022, 1 commit
-
-
Committed by Jens Axboe
If rq_qos_throttle() ends up blocking, then we will have invalidated and flushed our current plug. Since blk_mq_get_cached_request() hasn't popped the cached request off the plug list just yet, we end up holding a pointer to a request that is no longer valid. This insta-crashes with rq->mq_hctx being NULL in the validity checks just after. Pop the request off the cached list before doing rq_qos_throttle() to avoid using a potentially stale request. Fixes: 0a5aa8d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection") Reported-by: Dylan Yudaken <dylany@fb.com> Tested-by: Dylan Yudaken <dylany@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
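A minimal kernel-style sketch of the reordering described above (helper names follow the 5.19-era plug/rq_list API; the other checks in the real blk_mq_get_cached_request() are omitted):

```c
/*
 * Simplified sketch, not the exact upstream diff: detach the cached
 * request from the plug before calling rq_qos_throttle(), which may
 * sleep and cause the plug (and its cached list) to be flushed.
 */
static struct request *get_cached_rq_sketch(struct request_queue *q,
					    struct blk_plug *plug,
					    struct bio *bio)
{
	struct request *rq = rq_list_peek(&plug->cached_rq);

	if (!rq || rq->q != q)
		return NULL;

	/* Pop first, so a later plug flush cannot leave 'rq' stale. */
	plug->cached_rq = rq_list_next(rq);

	rq_qos_throttle(q, bio);	/* may block and flush the plug */
	return rq;
}
```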
-
- 17 June 2022, 4 commits
-
-
Committed by Ming Lei
Commit 364b6181 ("blk-mq: clearing flush request reference in tags->rqs[]") was added to clear the to-be-freed flush request from tags->rqs[] to avoid a use-after-free on the flush rq. Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot by ~8s, because it runs during SCSI probing, which may create and remove lots of non-present LUNs on megaraid-sas, which uses BLK_MQ_F_TAG_HCTX_SHARED and whose request queues each have lots of hw queues. Improve the situation by not running blk_mq_clear_flush_rq_mapping() if the disk hasn't been added, since no flush request can have been issued in that case. Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
q->elevator is referenced in blk_mq_has_sqsched() without any protection: no q_usage_counter is held and no queue SRCU or RCU read lock is taken, so a potential use-after-free may be triggered. Fix the issue by adding a queue flag that records whether the elevator uses single-queue-style dispatch. With that in place, the ELEVATOR_F_MQ_AWARE elevator feature flag isn't needed any more. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
The elevator can be torn down via the sysfs switch interface or on disk release, so hold ->sysfs_lock before referring to q->elevator; that way a potential use-after-free is avoided. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
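A kernel-style sketch of the locking pattern (the function and the data being read are illustrative, not the exact code the patch touches):

```c
/*
 * q->elevator may be freed by a scheduler switch or disk release,
 * so only dereference it while holding q->sysfs_lock.
 */
static void show_sched_sketch(struct request_queue *q)
{
	mutex_lock(&q->sysfs_lock);
	if (q->elevator)
		pr_info("scheduler: %s\n", q->elevator->type->elevator_name);
	mutex_unlock(&q->sysfs_lock);
}
```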
-
Committed by Bart Van Assche
This patch prevents test nvme/004 from triggering the following: UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9 index 512 is out of range for type 'long unsigned int [512]' Call Trace: show_stack+0x52/0x58 dump_stack_lvl+0x49/0x5e dump_stack+0x10/0x12 ubsan_epilogue+0x9/0x3b __ubsan_handle_out_of_bounds.cold+0x44/0x49 blk_mq_alloc_request_hctx+0x304/0x310 __nvme_submit_sync_cmd+0x70/0x200 [nvme_core] nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics] nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop] nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop] nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics] nvmf_dev_write+0xae/0x111 [nvme_fabrics] vfs_write+0x144/0x560 ksys_write+0xb7/0x140 __x64_sys_write+0x42/0x50 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: 20e4d813 ("blk-mq: simplify queue mapping & schedule with each possisble CPU") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
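The shape of the fix, as a hedged sketch: cpumask_first_and() returns nr_cpu_ids when the intersection is empty, so that value must be rejected before it is used as a per-CPU index (the label and surrounding context are illustrative):

```c
/* Sketch of the guard in blk_mq_alloc_request_hctx(), not the exact diff. */
cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
if (cpu >= nr_cpu_ids)		/* no online CPU maps to this hctx */
	goto out_queue_exit;
data.ctx = __blk_mq_get_ctx(q, cpu);
```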
-
- 30 May 2022, 1 commit
-
-
Committed by Haisu Wang
Flush and passthrough requests are not accounted as normal I/O at completion. To reflect slow I/O in iostat, io_ticks is updated from the inflight count when the stats are read, which can yield inconsistent io_ticks results. So skip requests that are not accounted for I/O statistics (flush and passthrough requests) when counting inflight. Fixes: 86d73312 ("block: update io_ticks when io hang") Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: samuelliao <samuelliao@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
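An illustrative sketch of the resulting inflight filter, assuming the internal mq_inflight counting helper; blk_do_io_stat() is false for requests that are not accounted, so they no longer feed the count that io_ticks is derived from:

```c
/* Sketch only; field and helper names assume the in-tree mq_inflight code. */
static bool check_inflight_sketch(struct request *rq, void *priv)
{
	struct mq_inflight *mi = priv;

	if (rq->part && blk_do_io_stat(rq) &&
	    (!mi->part->bd_partno || rq->part == mi->part) &&
	    blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
		mi->inflight[rq_data_dir(rq)]++;

	return true;
}
```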
-
- 28 May 2022, 3 commits
-
-
Committed by Christoph Hellwig
Let the caller set it together with the end_io_data instead of passing a pointless argument. Note that the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Instead of trying to cast a __bitwise 32-bit integer to a larger integer and then to a pointer, just put a struct with the blk_status_t and the completion on the stack and set the end_io_data to that. Use the opportunity to move the code to where it belongs and drop the rather confusing comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
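A hedged sketch of the pattern: an on-stack struct pairs the completion with the status, the end_io handler fills it in through end_io_data, and the caller simply waits (names mirror the idea; the insertion/dispatch step is abbreviated):

```c
struct rq_wait_sketch {
	struct completion done;
	blk_status_t ret;
};

/* end_io callback: record the status and wake the waiter. */
static void end_sync_rq_sketch(struct request *rq, blk_status_t ret)
{
	struct rq_wait_sketch *wait = rq->end_io_data;

	wait->ret = ret;
	complete(&wait->done);
}

/* Caller side: everything lives on the stack, no casts needed. */
static blk_status_t execute_and_wait_sketch(struct request *rq)
{
	struct rq_wait_sketch wait = {
		.done = COMPLETION_INITIALIZER_ONSTACK(wait.done),
	};

	rq->end_io = end_sync_rq_sketch;
	rq->end_io_data = &wait;

	/* ... insert the request and run the hardware queue here ... */

	wait_for_completion_io(&wait.done);
	return wait.ret;
}
```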
-
Committed by Christoph Hellwig
We don't want to plug for synchronous execution, where we immediately wait for the request. Once that is done not a whole lot of code is shared, so just remove __blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 May 2022, 1 commit
-
-
Committed by Ming Lei
blk_mq_run_hw_queues() can run when there is no queued request and after the queue has been cleaned up; at that point the tagset may already be freed, since the tagset's lifetime is owned by the driver and it is often freed after blk_cleanup_queue() returns. So don't touch ->tagset when figuring out the current default hctx; use the mapping built into the request queue instead, which avoids the use-after-free on the tagset. This should also be faster than retrieving the mapping from the tagset. Cc: "yukuai (C)" <yukuai3@huawei.com> Cc: Jan Kara <jack@suse.cz> Fixes: b6e68ee8 ("blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220522122350.743103-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 May 2022, 1 commit
-
-
Committed by Julia Lawall
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20220521111145.81697-29-Julia.Lawall@inria.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 12 May 2022, 1 commit
-
-
Committed by Ming Lei
First, we can't add a request to the plug list in blk_mq_request_bypass_insert(), which may itself be called while flushing the plug list, so nested plugging would result. Second, if a polled passthrough request is inserted via blk_execute_rq(), it can't be added to the plug list either, since I/O polling needs the request to be issued to the driver. Fix both by moving the plugging into blk_execute_rq_nowait(). Cc: Christoph Hellwig <hch@lst.de> Fixes: 1c2d2fff ("block: wire-up support for passthrough plugging") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
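A kernel-style sketch of where the plugging ends up (the helper names are internal ones and the non-plugged dispatch call is simplified):

```c
/*
 * Sketch: with a plug present, queue the passthrough request on it so a
 * later plug flush can hand the whole batch to ->queue_rqs(); otherwise
 * insert it for immediate dispatch.
 */
static void execute_rq_nowait_sketch(struct request *rq, bool at_head)
{
	struct blk_plug *plug = current->plug;

	if (plug) {
		blk_add_rq_to_plug(plug, rq);	/* batched dispatch later */
		return;
	}
	blk_mq_sched_insert_request(rq, at_head, true, false);
}
```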
-
- 11 May 2022, 1 commit
-
-
Committed by Jens Axboe
Add support for plugging in the passthrough path. When plugging is enabled, requests are added to a plug instead of being dispatched to the driver, and when the plug is finished the whole batch gets dispatched via ->queue_rqs, which turns out to be more efficient. Previously, dispatch happened via ->queue_rq, one request at a time. Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 April 2022, 1 commit
-
-
Committed by Tejun Heo
This reverts commit 00067077. It has a couple of problems: * bio_issue_time() is stored in bio->bi_issue truncated to 51 bits. This overflows in slightly over 26 days. Setting rq->io_start_time_ns with it means that the I/O duration calculation would yield > 26 days after 26 days of uptime. This, for example, confuses kyber, making it cause high I/O latencies. * rq->io_start_time_ns should record the time that the I/O is issued to the device so that on-device latency can be measured. However, bio_issue_time() is set before the bio goes through the rq-qos controllers (wbt, iolatency, iocost), so when the bio gets throttled in any of those mechanisms, the measured latencies make no sense - on-device latencies end up higher than request-alloc-to-completion latencies. We'll need a smarter way to avoid calling ktime_get_ns() repeatedly back-to-back. For now, let's revert the commit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/YmmeOLfo5lzc+8yI@slm.duckdns.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
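For reference, the 51-bit window works out to roughly 26 days; a standalone back-of-the-envelope check (not kernel code):

```c
#include <stdio.h>

int main(void)
{
	/* bio->bi_issue keeps the issue time in 51 bits of nanoseconds. */
	unsigned long long window_ns = 1ULL << 51;

	/* ~2.25e15 ns, i.e. ~26.06 days until the stored time wraps. */
	printf("wrap after %.2f days\n", window_ns / 1e9 / 86400.0);
	return 0;
}
```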
-
- 15 April 2022, 1 commit
-
-
Committed by Christoph Hellwig
When a disk has been marked dead, don't print warnings for I/O errors as they are very much expected. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220323163815.1526998-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 31 March 2022, 1 commit
-
-
Committed by Jakob Koschel
In order to move the list iterator variable into the list_for_each_entry_*() macros in the future, use of the list iterator variable after the loop body should be avoided. To *never* use the list iterator variable after the loop, it was concluded that a separate variable should be used instead of a found boolean [1]. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Link: https://lore.kernel.org/r/20220331091218.641532-1-jakobkoschel@gmail.com [axboe: move lookup to where return value is checked] Signed-off-by: Jens Axboe <axboe@kernel.dk>
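The pattern being adopted, sketched with a made-up structure (the real patch applies the same idea to a lookup loop in blk-mq):

```c
/* Hypothetical lookup, only to illustrate the 'found' variable pattern. */
struct item {
	struct list_head node;
	int id;
};

static struct item *find_item(struct list_head *head, int id)
{
	struct item *found = NULL, *it;

	list_for_each_entry(it, head, node) {
		if (it->id == id) {
			found = it;	/* remember the match ... */
			break;
		}
	}
	/* ... and never touch the iterator 'it' after the loop. */
	return found;
}
```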
-
- 12 March 2022, 1 commit
-
-
Committed by Jens Axboe
We used to sort the plug list if we had multiple queues before dispatching requests to the IO scheduler. This usually isn't needed, but for certain workloads that interleave requests to disks, it's less efficient to process the plug list one-by-one if everything is interleaved. Don't sort the list, but skip through it and flush out entries that have the same target at the same time. Fixes: df87eb0f ("block: get rid of plug list sorting") Reported-and-tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 March 2022, 9 commits
-
-
由 Ming Lei 提交于
The queue's top-level debugfs dir is removed in blk_release_queue(), and all hctx debugfs dirs are removed from there as well. Given blk_mq_exit_queue() is only called from blk_cleanup_queue(), it isn't necessary to remove the hctx debugfs dirs from blk_mq_exit_queue(), so drop that. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220308055200.735835-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
To simplify further changes, allow blk_mq_free_rqs to be called twice on a queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> [hch: split out from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220308055200.735835-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
I/O accounting buckets I/O into the read/write/discard categories, into which passthrough I/O does not fit at all. It also accounts to the block_device, which may not even exist for passthrough I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220308055200.735835-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
First, the code becomes cleaner by switching from a plain array to an xarray. Second, the use-after-free on q->queue_hw_ctx can be fixed, since queue_for_each_hw_ctx() may run while an nr_hw_queues update is in progress. With this patch, q->hctx_table is defined as an xarray that shares the request queue's lifetime, so queue_for_each_hw_ctx() can use q->hctx_table to look up an hctx reliably. Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220308073219.91173-7-ming.lei@redhat.com [axboe: fix blk_mq_hw_ctx forward declaration] Signed-off-by: Jens Axboe <axboe@kernel.dk>
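A hedged sketch of what iteration looks like once the table is an xarray owned by the request queue (the macro mirrors the existing queue_for_each_hw_ctx() convention; names are illustrative):

```c
/* Sketch: the iterator becomes a thin wrapper around xa_for_each(). */
#define queue_for_each_hw_ctx_sketch(q, hctx, i)	\
	xa_for_each(&(q)->hctx_table, (i), (hctx))

static void dump_hctxs_sketch(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned long i;	/* xa_for_each() requires unsigned long */

	queue_for_each_hw_ctx_sketch(q, hctx, i)
		pr_debug("hctx %lu: %u sw ctxs\n", i, hctx->nr_ctx);
}
```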
-
Committed by Ming Lei
A use-after-free on q->queue_hw_ctx between queue_for_each_hw_ctx() and blk_mq_update_nr_hw_queues() is otherwise unavoidable, and converting to an xarray fixes the UAF while making the code cleaner. To prepare for converting q->queue_hw_ctx into an xarray, and because xa_for_each() only accepts 'unsigned long' as an index, change the type of the hctx index used by queue_for_each_hw_ctx() to 'unsigned long'. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220308073219.91173-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
The queue map can change when updating nr_hw_queues, so we need to reconfigure the queue's poll capability afterwards. Add a helper for doing this job. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220308073219.91173-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
blk_mq_alloc_and_init_hctx() already takes reuse into account, so there is no need to do it at the call site; this lets us simplify blk_mq_realloc_hw_ctxs(). Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220308073219.91173-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei
The current code always uses the default queue map and hw queue index to figure out the NUMA node for a hw queue. This isn't correct, because blk-mq supports three queue maps, and the correct queue map should be used for the given hw queue. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220308073219.91173-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Shin'ichiro Kawasaki
Commit 9d497e29 ("block: don't protect submit_bio_checks by q_usage_counter") moved the blk_mq_attempt_bio_merge and rq_qos_throttle calls out of q_usage_counter protection. However, these functions require q_usage_counter protection. The blk_mq_attempt_bio_merge call without the protection resulted in a blktests block/005 failure with a KASAN null-ptr-deref or use-after-free at bio merge. The rq_qos_throttle call without the protection caused a kernel hang at qos throttle. To fix the failures, move the blk_mq_attempt_bio_merge and rq_qos_throttle calls back under q_usage_counter protection. Fixes: 9d497e29 ("block: don't protect submit_bio_checks by q_usage_counter") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20220308080915.3473689-1-shinichiro.kawasaki@wdc.com Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 March 2022, 1 commit
-
-
Committed by Christoph Hellwig
With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 February 2022, 5 commits
-
-
Committed by David Jeffery
When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can reset the delay length of an already pending delayed run_work. This creates a scenario where multiple hctxs may have their queues set to run, but if one runs first and finds nothing to do, it can reset the delay of another hctx and stall the other hctx's ability to run requests. To avoid this I/O stall, when an hctx's run_work is already pending, leave it untouched to run at its currently designated time rather than extending its delay. The work will still run, which keeps closed the race that blk_mq_delay_run_hw_queues is needed for, while also avoiding the I/O stall. Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220131203337.GA17666@redhat Signed-off-by: Jens Axboe <axboe@kernel.dk>
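A sketch of the guard, assuming the standard workqueue helpers: if the delayed run_work is already queued, its timer is left alone instead of being re-armed:

```c
/* Illustrative loop, not the exact upstream diff. */
static void delay_run_hw_queues_sketch(struct request_queue *q,
				       unsigned long msecs)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned long i;

	queue_for_each_hw_ctx(q, hctx, i) {
		if (blk_mq_hctx_stopped(hctx))
			continue;
		/* An already-pending run_work keeps its current deadline. */
		if (delayed_work_pending(&hctx->run_work))
			continue;
		blk_mq_delay_run_hw_queue(hctx, msecs);
	}
}
```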
-
Committed by Ming Lei
blk_crypto_bio_prep() is called for both bio-based and blk-mq drivers, so move it out of blk-mq.c; that way this kind of handling can be unified. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220216044514.2903784-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
The request must be submitted to the queue it was allocated for, so remove the extra request_queue argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Fold blk_cloned_rq_check_limits into its only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
The code to stack blk-mq drivers is only used by dm-multipath, and will preferably stay that way. Make it optional and only selected by device mapper, so that the buildbots more easily catch abuses like the one that slipped into the ufs driver in the last merge window. Another positive side effect is that kernel builds without device mapper shrink a little bit as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220215100540.3892965-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 12 February 2022, 2 commits
-
-
Committed by Yang Shi
Currently, rasdaemon uses the existing tracepoint block_rq_complete and filters out non-error cases in order to capture block disk errors. But there are a few problems with this approach: 1. Even though the kernel trace filter can do the filtering work, there is still some overhead once this tracepoint is enabled. 2. The filter is merely based on errno, which does not align with the kernel logic for checking errors in print_req_error(). 3. block_rq_complete only provides the dev major and minor to identify the block device, which is not convenient to use in user-space. So introduce a new tracepoint block_rq_error just for the error case. With this patch, rasdaemon can switch to block_rq_error. Since the new tracepoint has a similar implementation to block_rq_complete, move the existing code from TRACE_EVENT block_rq_complete() into a new event class block_rq_completion(), then add events for block_rq_complete and block_rq_error from the newly created event class, per the suggestion from Chaitanya Kulkarni. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220210225222.260069-1-shy828301@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
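In TRACE_EVENT terms the restructuring looks roughly like this; the prototype is assumed from the existing block_rq_complete tracepoint, and the shared TP_STRUCT__entry/TP_fast_assign/TP_printk body is assumed to live in DECLARE_EVENT_CLASS(block_rq_completion, ...):

```c
/* Sketch: both tracepoints are stamped out from the shared event class. */
DEFINE_EVENT(block_rq_completion, block_rq_complete,
	TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
	TP_ARGS(rq, error, nr_bytes)
);

DEFINE_EVENT(block_rq_completion, block_rq_error,
	TP_PROTO(struct request *rq, blk_status_t error, unsigned int nr_bytes),
	TP_ARGS(rq, error, nr_bytes)
);
```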
-
Committed by Pankaj Raghav
The zone append command needs special handling to update the bi_sector field in the bio struct with the actual position of the data on the device, which is stored in the __sector field of the request struct. Fixes: 5581a5dd ("block: add completion handler for fast path") Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220211093425.43262-2-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
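As a hedged sketch, the completion side propagates the on-device position back into the bio roughly like this:

```c
/* Illustrative helper: report the actual write position for zone append. */
static inline void zone_append_complete_sketch(struct request *rq)
{
	struct bio *bio = rq->bio;

	if (req_op(rq) == REQ_OP_ZONE_APPEND)
		bio->bi_iter.bi_sector = rq->__sector;
}
```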
-
- 04 February 2022, 1 commit
-
-
Committed by Christoph Hellwig
Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 26 January 2022, 1 commit
-
-
Committed by Yu Kuai
If blk_mq_request_issue_directly() fails in blk_insert_cloned_request(), the request will already have been accounted as started. Currently, blk_insert_cloned_request() is only called by dm, and such a request won't be accounted as done by dm. In the normal path, I/O is accounted as started in blk_mq_bio_to_request(), when the request is allocated, and accounted as done in __blk_mq_end_request_acct() whether it succeeded or failed. Thus add a blk_account_io_done() call to fix the problem. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220126012132.3111551-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
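The shape of the fix, sketched under the assumption of the internal accounting helpers (blk_account_io_done() balances the start accounting done when the request was set up):

```c
/* Illustrative error leg of blk_insert_cloned_request(), not the exact diff. */
static blk_status_t insert_cloned_rq_sketch(struct request *rq)
{
	blk_status_t ret;

	ret = blk_mq_request_issue_directly(rq, true);
	if (ret != BLK_STS_OK)
		blk_account_io_done(rq, ktime_get_ns());
	return ret;
}
```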
-
- 18 January 2022, 1 commit
-
-
Committed by Christoph Hellwig
bio_clone_fast() sets the cloned bio to have the same ->bi_bdev as the source bio. This means that when request-based dm called setup_clone(), the cloned bio had its ->bi_bdev pointing to the dm device. After commit 0b6e522c ("blk-mq: use ->bi_bdev for I/O accounting"), __blk_account_io_start() started using the request's ->bio->bi_bdev for I/O accounting, if it was set. This caused I/O going to the underlying devices to use the dm device for their I/O accounting. Set up the proper ->bi_bdev in blk_rq_prep_clone based on the whole-device bdev for the queue the request is cloned onto. Fixes: 0b6e522c ("blk-mq: use ->bi_bdev for I/O accounting") Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [hch: the commit message is mostly from a different patch from Benjamin] Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Link: https://lore.kernel.org/r/20220118070444.1241739-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 January 2022, 1 commit
-
-
Committed by Yury Norov
cpumask_first() is a more efficient analogue of the 'next' version when n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial. There's no cpumask_first_zero() function, so create it. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
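A sketch of the new helper, mirroring how cpumask_first() is built on the find_*_bit primitives (placement and kerneldoc wording are assumptions):

```c
/**
 * cpumask_first_zero - get the first unset cpu in a cpumask
 * @srcp: the cpumask pointer
 *
 * Returns >= nr_cpu_ids if all cpus are set.
 */
static inline unsigned int cpumask_first_zero(const struct cpumask *srcp)
{
	return find_first_zero_bit(cpumask_bits(srcp), nr_cpumask_bits);
}
```

A call site such as cpumask_next_zero(-1, mask) can then become cpumask_first_zero(mask).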
-
- 10 January 2022, 1 commit
-
-
Committed by Ming Lei
Commit cc9c884d ("block: call submit_bio_checks under q_usage_counter") uses q_usage_counter to protect submit_bio_checks, to avoid I/O after the disk has been deleted by del_gendisk(). It turns out the protection isn't necessary, because once blk_mq_freeze_queue_wait() in del_gendisk() returns: 1) all in-flight I/O has completed; 2) all new I/O will fail in __bio_queue_enter(), because q_usage_counter is dead and GD_DEAD is set; 3) both the disk and the request queue instance are safe, since the caller of submit_bio() guarantees that the disk can't be closed. Once submit_bio_checks() doesn't need the protection of q_usage_counter, we can move submit_bio_checks before calling blk_mq_submit_bio() and ->submit_bio(). With this change, we no longer need to throttle the queue while holding an allocated request, so no precious driver tag or request is wasted on throttling. Meanwhile, we can unify the bio checks for both bio-based and request-based drivers. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220104134223.590803-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 December 2021, 1 commit
-
-
Committed by Keith Busch
The low level drivers don't expect to see new requests after a successful quiesce completes. Check the queue quiesce state within the rcu protected area prior to calling the driver's queue_rqs(). Fixes: 3c67d44d ("block: add mq_ops->queue_rqs hook") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211220205919.180191-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
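A hedged sketch of the check, assuming the plug's request list is handed to ->queue_rqs() from inside the dispatch RCU/SRCU section:

```c
/* Sketch: bail out if the queue was quiesced before calling the driver. */
static void flush_plug_batch_sketch(struct request_queue *q,
				    struct blk_plug *plug)
{
	if (blk_queue_quiesced(q))
		return;
	q->mq_ops->queue_rqs(&plug->mq_list);
}
```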
-