提交 · 2a904d00855f94cb85751e45fa494f225d44ae0d · openeuler / Kernel

04 12月, 2021 2 次提交

blk-mq: remove hctx_lock and hctx_unlock · 2a904d00

由 Ming Lei 提交于 12月 03, 2021

Remove hctx_lock and hctx_unlock, and add one helper of
blk_mq_run_dispatch_ops() to run code block defined in dispatch_ops
with rcu/srcu read held.

Compared with hctx_lock()/hctx_unlock():

1) remove 2 branch to 1, so we just need to check
(hctx->flags & BLK_MQ_F_BLOCKING) once when running one dispatch_ops

2) srcu_idx needn't to be touched in case of non-blocking

3) might_sleep_if() can be moved to the blocking branch

Also put the added blk_mq_run_dispatch_ops() in private header, so that
the following patch can use it out of blk-mq.c.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203131534.3668411-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2a904d00

block: switch to atomic_t for request references · 0a467d0f

由 Jens Axboe 提交于 10月 14, 2021

refcount_t is not as expensive as it used to be, but it's still more
expensive than the io_uring method of using atomic_t and just checking
for potential over/underflow.

This borrows that same implementation, which in turn is based on the
mm implementation from Linus.
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0a467d0f

03 12月, 2021 2 次提交

block: fix double bio queue when merging in cached request path · a08ed9aa

由 Jens Axboe 提交于 12月 02, 2021

When we attempt to merge off the cached request path, we return NULL
if successful. This makes the caller believe that it's should allocate
a new request, and hence we end up with the bio both merged and associated
with a new request. This, predictably, leads to all sorts of crashes.

Pass in a pointer to the bio pointer, and clear it for the merge case.
Then the caller knows that the bio is already queued, and no new requests
need to get allocated.

Fixes: 5b13bc8a ("blk-mq: cleanup request allocation")
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a08ed9aa

block: get rid of useless goto and label in blk_mq_get_new_requests() · 373b5416

由 Jens Axboe 提交于 12月 02, 2021

Expected case is returning a request, just check for success and return
the request rather than having an error label.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

373b5416

29 11月, 2021 19 次提交

block: remove the gendisk argument to blk_execute_rq · b84ba30b

由 Christoph Hellwig 提交于 11月 26, 2021

Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b84ba30b

block: remove the ->rq_disk field in struct request · f3fa33ac

由 Christoph Hellwig 提交于 11月 26, 2021

Just use the disk attached to the request_queue instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

f3fa33ac

block: move the remaining elv.icq handling to the I/O scheduler · 222ee581

由 Christoph Hellwig 提交于 11月 26, 2021

After the prepare side has been moved to the only I/O scheduler that
cares, do the same for the cleanup and the NULL initialization.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211126115817.2087431-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

222ee581

blk-mq: use bio->bi_opf after bio is checked · 5f480b1a

由 Ming Lei 提交于 11月 27, 2021

bio->bi_opf isn't finalized before checking the bio, so use it after
submit_bio_checks() returns.

Fixes: 5b13bc8a ("blk-mq: cleanup request allocation")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5f480b1a

blk-mq: cleanup request allocation · 5b13bc8a

由 Christoph Hellwig 提交于 11月 24, 2021

Refactor the request alloction so that blk_mq_get_cached_request tries
to find a cached request first, and the entirely separate and now
self contained blk_mq_get_new_requests allocates one or more requests
if that is not possible.

There is a small change in behavior as submit_bio_checks is called
twice now if a cached request is present but can't be used, but that
is a small price to pay for unwinding this code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211124062856.1444266-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

5b13bc8a

block: don't include <linux/part_stat.h> in blk.h · 82d981d4

由 Christoph Hellwig 提交于 11月 23, 2021

Not needed, shift it into the source files that need it instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

82d981d4

blk_mq: remove repeated includes · 35c90e6e

由 Guo Zhengkui 提交于 11月 23, 2021

Remove a repeated "#include<linux/sched/sysctl.h>".
Signed-off-by: NGuo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211123063340.25882-1-guozhengkui@vivo.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

35c90e6e

block: move io_context creation into where it's needed · 5a9d041b

由 Jens Axboe 提交于 11月 13, 2021

The only user of the io_context for IO is BFQ, yet we put the checking
and logic of it into the normal IO path.

Put the creation into blk_mq_sched_assign_ioc(), and have BFQ use that
helper.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5a9d041b

block: only allocate poll_stats if there's a user of them · 48b5c1fb

由 Jens Axboe 提交于 11月 13, 2021

This is essentially never used, yet it's about 1/3rd of the total
queue size. Allocate it when needed, and don't embed it in the queue.

Kill the queue flag for this while at it, since we can just check the
assigned pointer now.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

48b5c1fb

blk-mq: move more plug handling from blk_mq_submit_bio into blk_add_rq_to_plug · 1e9c2303

由 Christoph Hellwig 提交于 11月 23, 2021

Keep all the functionality for adding a request to a plug in a single place.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

1e9c2303

blk-mq: simplify the plug handling in blk_mq_submit_bio · 0c5bcc92

由 Christoph Hellwig 提交于 11月 23, 2021

blk_mq_submit_bio has two different plug cases, one that uses full
plugging and a limited plugging one.

The limited plugging case is only used for a corner case that does
not matter in real life:

 - no ->commit_rqs (so not NVMe)
 - no shared tags (so not SCSI)
 - not rotational (so no old disk or floppy driver)
 - must have multiple queues (so no eMMC)

Remove the limited merging case and all the related junk to simplify
blk_mq_submit_bio and the functions called from it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

0c5bcc92

block: move blk_print_req_error to blk-mq.c · 0d7a29a2

由 Christoph Hellwig 提交于 11月 17, 2021

This function is only used by the request completion path. Factor out
a blk_status_to_str to keep blk_errors private in blk-core.c.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

0d7a29a2

block: move blk_dump_rq_flags to blk-mq.c · 22350ad7

由 Christoph Hellwig 提交于 11月 17, 2021

blk_dump_rq_flags deals with a request, so move it to blk-mq.c.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

22350ad7

block: move blk_account_io_{start,done} to blk-mq.c · 450b7879

由 Christoph Hellwig 提交于 11月 17, 2021

These are only used for request based I/O, so move them where they are
used.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

450b7879

block: move blk_steal_bios to blk-mq.c · f2b8f3ce

由 Christoph Hellwig 提交于 11月 17, 2021

Keep all the request based code together.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

f2b8f3ce

block: move blk_rq_init to blk-mq.c · 52fdbbcc

由 Christoph Hellwig 提交于 11月 17, 2021

blk_rq_init deals with a request structure, so move it to blk-mq.c
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

52fdbbcc

block: move request based cloning helpers to blk-mq.c · 06c8c691

由 Christoph Hellwig 提交于 11月 17, 2021

Keep all the request based code together.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

06c8c691

blk-mq: move blk_mq_flush_plug_list · b84c5b50

由 Christoph Hellwig 提交于 11月 17, 2021

Move blk_mq_flush_plug_list and blk_mq_plug_issue_direct down in blk-mq.c
to prepare for marking blk_mq_request_issue_directly static without the
need of a forward declaration.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b84c5b50

block: remove blk-exec.c · 4054cff9

由 Christoph Hellwig 提交于 11月 17, 2021

All this code is tightly coupled to the blk-mq core, so move it
there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-4-hch@lst.de
[axboe: remove doc generation for blk-exec.c]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4054cff9

27 11月, 2021 1 次提交

block: call rq_qos_done() before ref check in batch completions · 98b26a0e

由 Jens Axboe 提交于 11月 26, 2021

We need to call rq_qos_done() regardless of whether or not we're freeing
the request or not, as the reference count doesn't cover the IO completion
tracking.

Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()")
Reported-by: NShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reported-by: NKenneth R. Crudup <kenny@panix.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

98b26a0e

19 11月, 2021 1 次提交

blk-mq: don't insert FUA request with data into scheduler queue · 2b504bd4

由 Ming Lei 提交于 11月 18, 2021

We never insert flush request into scheduler queue before.

Recently commit d92ca9d8 ("blk-mq: don't handle non-flush requests in
blk_insert_flush") tries to handle FUA data request as normal request.
This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in
case of kyber since RQF_ELVPRIV isn't set for flush request, then
->finish_request won't be called.

Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert()
when the device supports FUA, just like what we did before.

[1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/Reported-by: NYi Zhang <yi.zhang@redhat.com>
Fixes: d92ca9d8 ("blk-mq: don't handle non-flush requests in blk_insert_flush")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2b504bd4

16 11月, 2021 2 次提交

blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() · 2a19b28f

由 Ming Lei 提交于 11月 16, 2021

For avoiding to slow down queue destroy, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(), instead of delaying to
cancel dispatch work in blk_release_queue().

However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.

Fixes the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():

1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.

2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.

[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS:  0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055]  <TASK>
[12622.918394]  scsi_mq_get_budget+0x1a/0x110
[12622.922969]  __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404]  ? pick_next_task_fair+0x39/0x390
[12622.933268]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194]  blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829]  __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593]  process_one_work+0x1e8/0x3c0
[12622.954059]  worker_thread+0x50/0x3b0
[12622.958144]  ? rescuer_thread+0x370/0x370
[12622.962616]  kthread+0x158/0x180
[12622.966218]  ? set_kthread_struct+0x40/0x40
[12622.970884]  ret_from_fork+0x22/0x30
[12622.974875]  </TASK>
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
Reported-by: NChanghuiZhong <czhong@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2a19b28f

block: fix missing queue put in error path · 95febeb6

由 Jens Axboe 提交于 11月 15, 2021

If we fail the submission queue checks, we don't put the queue afterwards.
This can cause various issues like stalls on scheduler switch or failure
to remove the device, or like in the original bug report, timeout waiting
for the device on reboot/restart.

While in there, fix a few whitespace discrepancies in the surrounding
code.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215039
Fixes: b637108a ("blk-mq: fix filesystem I/O request allocation")
Reported-and-tested-by: NStephen Smith <stephenmsmith@blueyonder.co.uk>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

95febeb6

13 11月, 2021 1 次提交

blk-mq: fix filesystem I/O request allocation · b637108a

由 Ming Lei 提交于 11月 12, 2021

submit_bio_checks() may update bio->bi_opf, so we have to initialize
blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks()
returns when allocating new request.

In case of using cached request, fallback to allocate new request if
cached rq isn't compatible with the incoming bio, otherwise change
rq->cmd_flags with incoming bio->bi_opf.

Fixes: 900e0807 ("block: move queue enter logic into blk_mq_submit_bio()")
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b637108a

12 11月, 2021 1 次提交

blk-mq: rename blk_attempt_bio_merge · b131f201

由 Ming Lei 提交于 11月 11, 2021

It is very annoying to have two block layer functions which share same
name, so rename blk_attempt_bio_merge in blk-mq.c as
blk_mq_attempt_bio_merge.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b131f201

09 11月, 2021 1 次提交

blk-mq: add one API for waiting until quiesce is done · 9ef4d020

由 Ming Lei 提交于 11月 09, 2021

Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it
is hard to switch to this style, so these drivers need one atomic flag for
helping to balance quiesce and unquiesce.

When quiesce is in-progress, the driver still needs to wait until
the quiesce is done, so add API of blk_mq_wait_quiesce_done() for
these drivers.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9ef4d020

08 11月, 2021 1 次提交

blk-mq: don't free tags if the tag_set is used by other device in queue initialztion · a846a8e6

由 Ye Bin 提交于 11月 08, 2021

We got UAF report on v5.10 as follows:
[ 1446.674930] ==================================================================
[ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348
[ 1446.677851]
[ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2
[ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn
[ 1446.681448] Call Trace:
[ 1446.681800]  dump_stack+0x9b/0xce
[ 1446.682916]  print_address_description.constprop.6+0x3e/0x60
[ 1446.685999]  kasan_report.cold.9+0x22/0x3a
[ 1446.687186]  blk_mq_get_driver_tag+0x9a4/0xa90
[ 1446.687785]  blk_mq_dispatch_rq_list+0x21a/0x1d40
[ 1446.692576]  __blk_mq_do_dispatch_sched+0x394/0x830
[ 1446.695758]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[ 1446.698279]  blk_mq_sched_dispatch_requests+0xdf/0x140
[ 1446.698967]  __blk_mq_run_hw_queue+0xc0/0x270
[ 1446.699561]  __blk_mq_delay_run_hw_queue+0x4cc/0x550
[ 1446.701407]  blk_mq_run_hw_queue+0x13b/0x2b0
[ 1446.702593]  blk_mq_sched_insert_requests+0x1de/0x390
[ 1446.703309]  blk_mq_flush_plug_list+0x4b4/0x760
[ 1446.705408]  blk_flush_plug_list+0x2c5/0x480
[ 1446.708471]  blk_finish_plug+0x55/0xa0
[ 1446.708980]  blk_throtl_dispatch_work_fn+0x23b/0x2e0
[ 1446.711236]  process_one_work+0x6d4/0xfe0
[ 1446.711778]  worker_thread+0x91/0xc80
[ 1446.713400]  kthread+0x32d/0x3f0
[ 1446.714362]  ret_from_fork+0x1f/0x30
[ 1446.714846]
[ 1446.715062] Allocated by task 1:
[ 1446.715509]  kasan_save_stack+0x19/0x40
[ 1446.716026]  __kasan_kmalloc.constprop.1+0xc1/0xd0
[ 1446.716673]  blk_mq_init_tags+0x6d/0x330
[ 1446.717207]  blk_mq_alloc_rq_map+0x50/0x1c0
[ 1446.717769]  __blk_mq_alloc_map_and_request+0xe5/0x320
[ 1446.718459]  blk_mq_alloc_tag_set+0x679/0xdc0
[ 1446.719050]  scsi_add_host_with_dma.cold.3+0xa0/0x5db
[ 1446.719736]  virtscsi_probe+0x7bf/0xbd0
[ 1446.720265]  virtio_dev_probe+0x402/0x6c0
[ 1446.720808]  really_probe+0x276/0xde0
[ 1446.721320]  driver_probe_device+0x267/0x3d0
[ 1446.721892]  device_driver_attach+0xfe/0x140
[ 1446.722491]  __driver_attach+0x13a/0x2c0
[ 1446.723037]  bus_for_each_dev+0x146/0x1c0
[ 1446.723603]  bus_add_driver+0x3fc/0x680
[ 1446.724145]  driver_register+0x1c0/0x400
[ 1446.724693]  init+0xa2/0xe8
[ 1446.725091]  do_one_initcall+0x9e/0x310
[ 1446.725626]  kernel_init_freeable+0xc56/0xcb9
[ 1446.726231]  kernel_init+0x11/0x198
[ 1446.726714]  ret_from_fork+0x1f/0x30
[ 1446.727212]
[ 1446.727433] Freed by task 26992:
[ 1446.727882]  kasan_save_stack+0x19/0x40
[ 1446.728420]  kasan_set_track+0x1c/0x30
[ 1446.728943]  kasan_set_free_info+0x1b/0x30
[ 1446.729517]  __kasan_slab_free+0x111/0x160
[ 1446.730084]  kfree+0xb8/0x520
[ 1446.730507]  blk_mq_free_map_and_requests+0x10b/0x1b0
[ 1446.731206]  blk_mq_realloc_hw_ctxs+0x8cb/0x15b0
[ 1446.731844]  blk_mq_init_allocated_queue+0x374/0x1380
[ 1446.732540]  blk_mq_init_queue_data+0x7f/0xd0
[ 1446.733155]  scsi_mq_alloc_queue+0x45/0x170
[ 1446.733730]  scsi_alloc_sdev+0x73c/0xb20
[ 1446.734281]  scsi_probe_and_add_lun+0x9a6/0x2d90
[ 1446.734916]  __scsi_scan_target+0x208/0xc50
[ 1446.735500]  scsi_scan_channel.part.3+0x113/0x170
[ 1446.736149]  scsi_scan_host_selected+0x25a/0x360
[ 1446.736783]  store_scan+0x290/0x2d0
[ 1446.737275]  dev_attr_store+0x55/0x80
[ 1446.737782]  sysfs_kf_write+0x132/0x190
[ 1446.738313]  kernfs_fop_write_iter+0x319/0x4b0
[ 1446.738921]  new_sync_write+0x40e/0x5c0
[ 1446.739429]  vfs_write+0x519/0x720
[ 1446.739877]  ksys_write+0xf8/0x1f0
[ 1446.740332]  do_syscall_64+0x2d/0x40
[ 1446.740802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1446.741462]
[ 1446.741670] The buggy address belongs to the object at ffff8880185afd00
[ 1446.741670]  which belongs to the cache kmalloc-256 of size 256
[ 1446.743276] The buggy address is located 16 bytes inside of
[ 1446.743276]  256-byte region [ffff8880185afd00, ffff8880185afe00)
[ 1446.744765] The buggy address belongs to the page:
[ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac
[ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0
[ 1446.747719] flags: 0x1fffff80010200(slab|head)
[ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240
[ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 1446.750455] page dumped because: kasan: bad access detected
[ 1446.751227]
[ 1446.751445] Memory state around the buggy address:
[ 1446.752102]  ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.753090]  ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.755065]                          ^
[ 1446.755589]  ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 1446.756574]  ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 1446.757566] ==================================================================

Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the
same host initializes it's queue successfully. However, if the second
device failed to allocate memory in blk_mq_alloc_and_init_hctx() from
blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(),
__blk_mq_free_map_and_rqs() will be called on error path, and if
'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed
while it's still used by the first device.

To fix this issue we move release newly allocated hardware context from
blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to
release hardware context in blk_mq_init_allocated_queue.

Fixes: 868f2f0b ("blk-mq: dynamic h/w context count")
Signed-off-by: NYe Bin <yebin10@huawei.com>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a846a8e6

05 11月, 2021 4 次提交

block: ensure cached plug request matches the current queue · 10c47870

由 Jens Axboe 提交于 11月 04, 2021

If we're driving multiple devices, we could have pre-populated the cache
for a different device. Ensure that the empty request matches the current
queue.

Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

10c47870

block: move queue enter logic into blk_mq_submit_bio() · 900e0807

由 Jens Axboe 提交于 11月 03, 2021

Retain the old logic for the fops based submit, but for our internal
blk_mq_submit_bio(), move the queue entering logic into the core
function itself.

We need to be a bit careful if going into the scheduler, as a scheduler
or queue mappings can arbitrarily change before we have entered the queue.
Have the bio scheduler mapping do that separately, it's a very cheap
operation compared to actually doing merging locking and lookups.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
[axboe: update to check merge post submit_bio_checks() doing remap...]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

900e0807

block: split request allocation components into helpers · 71539717

由 Jens Axboe 提交于 11月 03, 2021

This is in preparation for a fix, but serves as a cleanup as well moving
the cached vs regular alloc logic out of blk_mq_submit_bio().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

71539717

block: have plug stored requests hold references to the queue · c5fc7b93

由 Jens Axboe 提交于 11月 03, 2021

Requests that were stored in the cache deliberately didn't hold an enter
reference to the queue, instead we grabbed one every time we pulled a
request out of there. That made for awkward logic on freeing the remainder
of the cached list, if needed, where we had to artificially raise the
queue usage count before each free.

Grab references up front for cached plug requests. That's safer, and also
more efficient.

Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c5fc7b93

03 11月, 2021 2 次提交

blk-mq: update hctx->nr_active in blk_mq_end_request_batch() · 3b87c6ea

由 Ming Lei 提交于 11月 02, 2021

In case of shared tags and none io sched, batched completion still may
be run into, and hctx->nr_active is accounted when getting driver tag,
so it has to be updated in blk_mq_end_request_batch().

Otherwise, hctx->nr_active may become same with queue depth, then
hctx_may_queue() always return false, then io hang is caused.

Fixes the issue by updating the counter in batched way.
Reported-by: NShinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()")
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211102153619.3627505-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3b87c6ea

block: move RQF_ELV setting into allocators · 781dd830

由 Jens Axboe 提交于 11月 02, 2021

It's not safe to do this before blk_queue_enter(), as the scheduler state
could have changed in between. Hence move the RQF_ELV setting into the
allocators, where we know the queue is already entered.
Suggested-by: NMing Lei <ming.lei@redhat.com>
Reported-by: NYi Zhang <yi.zhang@redhat.com>
Reported-by: NSteffen Maier <maier@linux.ibm.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

781dd830

02 11月, 2021 2 次提交

block: replace always false argument with 'false' · b2280909

由 Jens Axboe 提交于 11月 01, 2021

A previous commit fixed up the condition for doing direct issue, but that
left the 'from_schedule' argument dead inside the branch. Replace it with
'false'.

Fixes: ff155223 ("blk-mq: don't issue request directly in case that current is to be blocked")
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b2280909

block: assign correct tag before doing prefetch of request · a22c00be

由 Jens Axboe 提交于 11月 01, 2021

Ensure that current tag is correctly assigned before attempting
to prefetch the first cacheline of the request.

Fixes: 92aff191 ("block: prefetch request to be initialized")
Reported-and-tested-by: syzbot+cd20829ac44b92bf6ed0@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a22c00be

29 10月, 2021 1 次提交

block: improve readability of blk_mq_end_request_batch() · 02f7eab0

由 Jens Axboe 提交于 10月 28, 2021

It's faster and easier to read if we tolerate cur_hctx being NULL in
the "when to flush" condition. Rename last_hctx to cur_hctx while at it,
as it better describes the role of that variable.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

02f7eab0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功