提交 · ad740780bbc2fe37856f944dbbaff07aac9db9e3 · openeuler / Kernel

07 3月, 2022 2 次提交

block: remove handle_bad_sector · ad740780

由 Christoph Hellwig 提交于 3月 04, 2022

Use the %pg format specifier instead of the stack hungry bdevname
function, and remove handle_bad_sector given that it is not pointless.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

ad740780

block: fix and cleanup bio_check_ro · 57e95e46

由 Christoph Hellwig 提交于 3月 04, 2022

Don't use a WARN_ON when printing a potentially user triggered
condition.  Also don't print the partno when the block device name
already includes it, and use the %pg specifier to simplify printing
the block device name.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

57e95e46

17 2月, 2022 5 次提交

block: merge submit_bio_checks() into submit_bio_noacct · d24c670e

由 Ming Lei 提交于 2月 16, 2022

Now submit_bio_checks() is only called by submit_bio_noacct(), so merge
it into submit_bio_noacct().
Suggested-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-6-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d24c670e

block: don't check bio in blk_throtl_dispatch_work_fn · 3f98c753

由 Ming Lei 提交于 2月 16, 2022

The bio has been checked already before throttling, so no need to check
it again before dispatching it from throttle queue.

Add a helper of submit_bio_noacct_nocheck() for this purpose.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220216044514.2903784-5-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3f98c753

block: don't declare submit_bio_checks in local header · 29ff2362

由 Ming Lei 提交于 2月 16, 2022

submit_bio_checks() won't be called outside of block/blk-core.c any more
since commit 9d497e29 ("block: don't protect submit_bio_checks by
q_usage_counter"), so mark it as one local helper.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

29ff2362

block: move blk_crypto_bio_prep() out of blk-mq.c · 7f36b7d0

由 Ming Lei 提交于 2月 16, 2022

blk_crypto_bio_prep() is called for both bio based and blk-mq drivers,
so move it out of blk-mq.c, then we can unify this kind of handling.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

7f36b7d0

block: move submit_bio_checks() into submit_bio_noacct · a650628b

由 Ming Lei 提交于 2月 16, 2022

It is more clean & readable to check bio when starting to submit it,
instead of just before calling ->submit_bio() or blk_mq_submit_bio().

Also it provides us chance to optimize bio submission without checking
bio.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a650628b

12 2月, 2022 2 次提交

block: partition include/linux/blk-cgroup.h · 672fdcf0

由 Ming Lei 提交于 2月 11, 2022

Partition include/linux/blk-cgroup.h into two parts: one is public part,
the other is block layer private part.

Suggested by Christoph Hellwig.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211101149.2368042-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

672fdcf0

block: move initialization of q->blkg_list into blkcg_init_queue · 472e4314

由 Ming Lei 提交于 2月 11, 2022

q->blkg_list is only used by blkcg code, so move it into
blkcg_init_queue.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220211101149.2368042-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

472e4314

04 2月, 2022 2 次提交

block: return -ENODEV for BLK_STS_OFFLINE · 7d32c027

由 Song Liu 提交于 2月 03, 2022

Change the user visible return value for BLK_STS_OFFLINE to -ENODEV, which
is more descriptive than existing -EIO.
Signed-off-by: NSong Liu <song@kernel.org>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220203192827.1370270-3-song@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

7d32c027

block: introduce BLK_STS_OFFLINE · 2651bf68

由 Song Liu 提交于 2月 03, 2022

Currently, drivers reports BLK_STS_IOERR for devices that are not full
online or being removed. This behavior could cause confusion for users,
as they are not really I/O errors from the device.

Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
offline error" in dmesg instead of "I/O error".

EIO is intentionally kept to not change user visible return value.
Signed-off-by: NSong Liu <song@kernel.org>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220203192827.1370270-2-song@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

2651bf68

02 2月, 2022 1 次提交

block: check that there is a plug in blk_flush_plug · aa8dccca

由 Christoph Hellwig 提交于 1月 27, 2022

Rename blk_flush_plug to __blk_flush_plug and add a wrapper that includes
the NULL check instead of open coding that check everywhere.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220127070549.1377856-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

aa8dccca

29 1月, 2022 1 次提交

block: add bio_start_io_acct_time() to control start_time · e45c47d1

由 Mike Snitzer 提交于 1月 28, 2022

bio_start_io_acct_time() interface is like bio_start_io_acct() that
allows start_time to be passed in. This gives drivers the ability to
defer starting accounting until after IO is issued (but possibily not
entirely due to bio splitting).
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-2-snitzer@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e45c47d1

10 1月, 2022 1 次提交

block: don't protect submit_bio_checks by q_usage_counter · 9d497e29

由 Ming Lei 提交于 1月 04, 2022

Commit cc9c884d ("block: call submit_bio_checks under q_usage_counter")
uses q_usage_counter to protect submit_bio_checks for avoiding IO after
disk is deleted by del_gendisk().

Turns out the protection isn't necessary, because once
blk_mq_freeze_queue_wait() in del_gendisk() returns:

1) all in-flight IO has been done

2) all new IO will be failed in __bio_queue_enter() because
   q_usage_counter is dead, and GD_DEAD is set

3) both disk and request queue instance are safe since caller of
submit_bio() guarantees that the disk can't be closed.

Once submit_bio_checks() needn't the protection of q_usage_counter, we can
move submit_bio_checks before calling blk_mq_submit_bio() and
->submit_bio(). With this change, we needn't to throttle queue with
holding one allocated request, then precise driver tag or request won't be
wasted in throttling. Meantime we can unify the bio check for both bio
based and request based driver.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220104134223.590803-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

9d497e29

19 12月, 2021 1 次提交

Revert "block: reduce kblockd_mod_delayed_work_on() CPU consumption" · 87959fa1

由 Jens Axboe 提交于 12月 19, 2021

This reverts commit cb2ac291.

Alex and the kernel test robot report that this causes a significant
performance regression with BFQ. I can reproduce that result, so let's
revert this one as we're close to -rc6 and we there's no point in trying
to rush a fix.

Link: https://lore.kernel.org/linux-block/1639853092.524jxfaem2.none@localhost/
Link: https://lore.kernel.org/lkml/20211219141852.GH14057@xsang-OptiPlex-9020/Reported-by: NAlex Xu (Hello71) <alex_y_xu@yahoo.ca>
Reported-by: Nkernel test robot <oliver.sang@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

87959fa1

15 12月, 2021 1 次提交

block: reduce kblockd_mod_delayed_work_on() CPU consumption · cb2ac291

由 Jens Axboe 提交于 12月 14, 2021

Dexuan reports that he's seeing spikes of very heavy CPU utilization when
running 24 disks and using the 'none' scheduler. This happens off the
sched restart path, because SCSI requires the queue to be restarted async,
and hence we're hammering on mod_delayed_work_on() to ensure that the work
item gets run appropriately.

Avoid hammering on the timer and just use queue_work_on() if no delay
has been specified.
Reported-and-tested-by: NDexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@BYAPR21MB1270.namprd21.prod.outlook.com/Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cb2ac291

04 12月, 2021 1 次提交

blk-mq: move srcu from blk_mq_hw_ctx to request_queue · 704b914f

由 Ming Lei 提交于 12月 03, 2021

In case of BLK_MQ_F_BLOCKING, per-hctx srcu is used to protect dispatch
critical area. However, this srcu instance stays at the end of hctx, and
it often takes standalone cacheline, often cold.

Inside srcu_read_lock() and srcu_read_unlock(), WRITE is always done on
the indirect percpu variable which is allocated from heap instead of
being embedded, srcu->srcu_idx is read only in srcu_read_lock(). It
doesn't matter if srcu structure stays in hctx or request queue.

So switch to per-request-queue srcu for protecting dispatch, and this
way simplifies quiesce a lot, not mention quiesce is always done on the
request queue wide.
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203131534.3668411-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

704b914f

29 11月, 2021 12 次提交

block: don't include <linux/part_stat.h> in blk.h · 82d981d4

由 Christoph Hellwig 提交于 11月 23, 2021

Not needed, shift it into the source files that need it instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

82d981d4

block: don't include blk-mq-sched.h in blk.h · 2aa7745b

由 Christoph Hellwig 提交于 11月 23, 2021

No needed, shift it into the source files that need it instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

2aa7745b

block: move io_context creation into where it's needed · 5a9d041b

由 Jens Axboe 提交于 11月 13, 2021

The only user of the io_context for IO is BFQ, yet we put the checking
and logic of it into the normal IO path.

Put the creation into blk_mq_sched_assign_ioc(), and have BFQ use that
helper.
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5a9d041b

block: don't include blk-mq headers in blk-core.c · d9337a42

由 Christoph Hellwig 提交于 11月 17, 2021

All request based code is in the blk-mq files now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-12-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

d9337a42

block: move blk_print_req_error to blk-mq.c · 0d7a29a2

由 Christoph Hellwig 提交于 11月 17, 2021

This function is only used by the request completion path. Factor out
a blk_status_to_str to keep blk_errors private in blk-core.c.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-11-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

0d7a29a2

block: move blk_dump_rq_flags to blk-mq.c · 22350ad7

由 Christoph Hellwig 提交于 11月 17, 2021

blk_dump_rq_flags deals with a request, so move it to blk-mq.c.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

22350ad7

block: move blk_account_io_{start,done} to blk-mq.c · 450b7879

由 Christoph Hellwig 提交于 11月 17, 2021

These are only used for request based I/O, so move them where they are
used.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-9-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

450b7879

block: move blk_steal_bios to blk-mq.c · f2b8f3ce

由 Christoph Hellwig 提交于 11月 17, 2021

Keep all the request based code together.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-8-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

f2b8f3ce

block: move blk_rq_init to blk-mq.c · 52fdbbcc

由 Christoph Hellwig 提交于 11月 17, 2021

blk_rq_init deals with a request structure, so move it to blk-mq.c
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

52fdbbcc

block: move request based cloning helpers to blk-mq.c · 06c8c691

由 Christoph Hellwig 提交于 11月 17, 2021

Keep all the request based code together.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-6-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

06c8c691

block: remove rq_flush_dcache_pages · 786d4e01

由 Christoph Hellwig 提交于 11月 17, 2021

This function is trivial, and flush_dcache_page is always defined, so
just open code it in the 2.5 callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

786d4e01

block: move blk_rq_err_bytes to scsi · 79478bf9

由 Christoph Hellwig 提交于 11月 17, 2021

blk_rq_err_bytes is only used by the scsi midlayer, so move it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20211117061404.331732-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

79478bf9

26 11月, 2021 1 次提交

block: fix parameter not described warning · e30028ac

由 Yang Guang 提交于 11月 26, 2021

The build warning:
block/blk-core.c:968: warning: Function parameter or member 'iob'
not described in 'bio_poll'.

Fixes: 5a72e899 ("block: add a struct io_comp_batch argument to fops->iopoll()")
Reported-by: NZeal Robot <zealci@zte.com.cn>
Signed-off-by: NYang Guang <yang.guang5@zte.com.cn>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e30028ac

16 11月, 2021 1 次提交

blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() · 2a19b28f

由 Ming Lei 提交于 11月 16, 2021

For avoiding to slow down queue destroy, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(), instead of delaying to
cancel dispatch work in blk_release_queue().

However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.

Fixes the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():

1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.

2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.

[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS:  0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055]  <TASK>
[12622.918394]  scsi_mq_get_budget+0x1a/0x110
[12622.922969]  __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404]  ? pick_next_task_fair+0x39/0x390
[12622.933268]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194]  blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829]  __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593]  process_one_work+0x1e8/0x3c0
[12622.954059]  worker_thread+0x50/0x3b0
[12622.958144]  ? rescuer_thread+0x370/0x370
[12622.962616]  kthread+0x158/0x180
[12622.966218]  ? set_kthread_struct+0x40/0x40
[12622.970884]  ret_from_fork+0x22/0x30
[12622.974875]  </TASK>
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
Reported-by: NChanghuiZhong <czhong@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2a19b28f

12 11月, 2021 1 次提交

blkcg: Remove extra blkcg_bio_issue_init · b781d8db

由 Laibin Qiu 提交于 11月 12, 2021

KASAN reports a use-after-free report when doing block test:

==================================================================
[10050.967049] BUG: KASAN: use-after-free in
submit_bio_checks+0x1539/0x1550

[10050.977638] Call Trace:
[10050.978190]  dump_stack+0x9b/0xce
[10050.979674]  print_address_description.constprop.6+0x3e/0x60
[10050.983510]  kasan_report.cold.9+0x22/0x3a
[10050.986089]  submit_bio_checks+0x1539/0x1550
[10050.989576]  submit_bio_noacct+0x83/0xc80
[10050.993714]  submit_bio+0xa7/0x330
[10050.994435]  mpage_readahead+0x380/0x500
[10050.998009]  read_pages+0x1c1/0xbf0
[10051.002057]  page_cache_ra_unbounded+0x4c2/0x6f0
[10051.007413]  do_page_cache_ra+0xda/0x110
[10051.008207]  force_page_cache_ra+0x23d/0x3d0
[10051.009087]  page_cache_sync_ra+0xca/0x300
[10051.009970]  generic_file_buffered_read+0xbea/0x2130
[10051.012685]  generic_file_read_iter+0x315/0x490
[10051.014472]  blkdev_read_iter+0x113/0x1b0
[10051.015300]  aio_read+0x2ad/0x450
[10051.023786]  io_submit_one+0xc8e/0x1d60
[10051.029855]  __se_sys_io_submit+0x125/0x350
[10051.033442]  do_syscall_64+0x2d/0x40
[10051.034156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.048733] Allocated by task 18598:
[10051.049482]  kasan_save_stack+0x19/0x40
[10051.050263]  __kasan_kmalloc.constprop.1+0xc1/0xd0
[10051.051230]  kmem_cache_alloc+0x146/0x440
[10051.052060]  mempool_alloc+0x125/0x2f0
[10051.052818]  bio_alloc_bioset+0x353/0x590
[10051.053658]  mpage_alloc+0x3b/0x240
[10051.054382]  do_mpage_readpage+0xddf/0x1ef0
[10051.055250]  mpage_readahead+0x264/0x500
[10051.056060]  read_pages+0x1c1/0xbf0
[10051.056758]  page_cache_ra_unbounded+0x4c2/0x6f0
[10051.057702]  do_page_cache_ra+0xda/0x110
[10051.058511]  force_page_cache_ra+0x23d/0x3d0
[10051.059373]  page_cache_sync_ra+0xca/0x300
[10051.060198]  generic_file_buffered_read+0xbea/0x2130
[10051.061195]  generic_file_read_iter+0x315/0x490
[10051.062189]  blkdev_read_iter+0x113/0x1b0
[10051.063015]  aio_read+0x2ad/0x450
[10051.063686]  io_submit_one+0xc8e/0x1d60
[10051.064467]  __se_sys_io_submit+0x125/0x350
[10051.065318]  do_syscall_64+0x2d/0x40
[10051.066082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[10051.067455] Freed by task 13307:
[10051.068136]  kasan_save_stack+0x19/0x40
[10051.068931]  kasan_set_track+0x1c/0x30
[10051.069726]  kasan_set_free_info+0x1b/0x30
[10051.070621]  __kasan_slab_free+0x111/0x160
[10051.071480]  kmem_cache_free+0x94/0x460
[10051.072256]  mempool_free+0xd6/0x320
[10051.072985]  bio_free+0xe0/0x130
[10051.073630]  bio_put+0xab/0xe0
[10051.074252]  bio_endio+0x3a6/0x5d0
[10051.074984]  blk_update_request+0x590/0x1370
[10051.075870]  scsi_end_request+0x7d/0x400
[10051.076667]  scsi_io_completion+0x1aa/0xe50
[10051.077503]  scsi_softirq_done+0x11b/0x240
[10051.078344]  blk_mq_complete_request+0xd4/0x120
[10051.079275]  scsi_mq_done+0xf0/0x200
[10051.080036]  virtscsi_vq_done+0xbc/0x150
[10051.080850]  vring_interrupt+0x179/0x390
[10051.081650]  __handle_irq_event_percpu+0xf7/0x490
[10051.082626]  handle_irq_event_percpu+0x7b/0x160
[10051.083527]  handle_irq_event+0xcc/0x170
[10051.084297]  handle_edge_irq+0x215/0xb20
[10051.085122]  asm_call_irq_on_stack+0xf/0x20
[10051.085986]  common_interrupt+0xae/0x120
[10051.086830]  asm_common_interrupt+0x1e/0x40

==================================================================

Bio will be checked at beginning of submit_bio_noacct(). If bio needs
to be throttled, it will start the timer and stop submit bio directly.
Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires.
But in the current process, if bio is throttled, it will still set bio
issue->value by blkcg_bio_issue_init(). This is redundant and may cause
the above use-after-free.

CPU0                                   CPU1
submit_bio
submit_bio_noacct
  submit_bio_checks
    blk_throtl_bio()
      <=mod_timer(&sq->pending_timer
                                      blk_throtl_dispatch_work_fn
                                        submit_bio_noacct() <= bio have
                                        throttle tag, will throw directly
                                        and bio issue->value will be set
                                        here

                                      bio_endio()
                                      bio_put()
                                      bio_free() <= free this bio

    blkcg_bio_issue_init(bio)
      <= bio has been freed and
      will lead to UAF
  return BLK_QC_T_NONE

Fix this by remove extra blkcg_bio_issue_init.

Fixes: e439bedf (blkcg: consolidate bio_issue_init() to be a part of core)
Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com>
Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.comReviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b781d8db

05 11月, 2021 3 次提交

block: move queue enter logic into blk_mq_submit_bio() · 900e0807

由 Jens Axboe 提交于 11月 03, 2021

Retain the old logic for the fops based submit, but for our internal
blk_mq_submit_bio(), move the queue entering logic into the core
function itself.

We need to be a bit careful if going into the scheduler, as a scheduler
or queue mappings can arbitrarily change before we have entered the queue.
Have the bio scheduler mapping do that separately, it's a very cheap
operation compared to actually doing merging locking and lookups.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
[axboe: update to check merge post submit_bio_checks() doing remap...]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

900e0807

block: make bio_queue_enter() fast-path available inline · c98cb5bb

由 Jens Axboe 提交于 11月 04, 2021

Just a prep patch for shifting the queue enter logic. This moves the
expected fast path inline, and leaves __bio_queue_enter() as an
out-of-line function call. We don't want to inline the latter, as it's
mostly slow path code.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c98cb5bb

block: have plug stored requests hold references to the queue · c5fc7b93

由 Jens Axboe 提交于 11月 03, 2021

Requests that were stored in the cache deliberately didn't hold an enter
reference to the queue, instead we grabbed one every time we pulled a
request out of there. That made for awkward logic on freeing the remainder
of the cached list, if needed, where we had to artificially raise the
queue usage count before each free.

Grab references up front for cached plug requests. That's safer, and also
more efficient.

Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c5fc7b93

29 10月, 2021 1 次提交

block: remove blk_{get,put}_request · 0bf6d96c

由 Christoph Hellwig 提交于 10月 25, 2021

These are now pointless wrappers around blk_mq_{alloc,free}_request,
so remove them.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

0bf6d96c

22 10月, 2021 1 次提交

block: remove the initialize_rq_fn blk_mq_ops method · 4abafdc4

由 Christoph Hellwig 提交于 10月 21, 2021

Entirely unused now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-7-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

4abafdc4

21 10月, 2021 1 次提交

block: kill extra rcu lock/unlock in queue enter · e94f6852

由 Pavel Begunkov 提交于 10月 21, 2021

blk_try_enter_queue() already takes rcu_read_lock/unlock, so we can
avoid the second pair in percpu_ref_tryget_live(), use a newly added
percpu_ref_tryget_live_rcu().

As rcu_read_lock/unlock imply barrier()s, it's pretty noticeable,
especially for for !CONFIG_PREEMPT_RCU (default for some distributions),
where __rcu_read_lock/unlock() are not inlined.

3.20% io_uring [kernel.vmlinux] [k] __rcu_read_unlock
3.05% io_uring [kernel.vmlinux] [k] __rcu_read_lock

2.52% io_uring [kernel.vmlinux] [k] __rcu_read_unlock
2.28% io_uring [kernel.vmlinux] [k] __rcu_read_lock
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b11c67ea495ed9d44f067622d852de4a510ce65.1634822969.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

e94f6852

20 10月, 2021 2 次提交

block: cleanup the flush plug helpers · 008f75a2

由 Christoph Hellwig 提交于 10月 20, 2021

Consolidate the various helpers into a single blk_flush_plug helper that
takes a plk_plug and the from_scheduler bool and switch all callsites to
call it directly. Checks that the plug is non-NULL must be performed by
the caller, something that most already do anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

008f75a2

block: optimise blk_flush_plug_list · b600455d

由 Pavel Begunkov 提交于 10月 20, 2021

Don't call flush_plug_callbacks if there are no plug callbacks.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
[hch: split from a larger patch]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

b600455d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功