- 30 September 2022, 1 commit
-
-
Submitted by Jens Axboe

The filesystem IO path can take advantage of allocating batches of requests, if the underlying submitter tells the block layer about it through the blk_plug. For passthrough IO, the exported API is the blk_mq_alloc_request() helper, and that one does not allow for request caching. Wire up request caching for blk_mq_alloc_request(), which is generally done without having a bio available upfront.

Tested-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
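A minimal sketch of how a passthrough submitter can now benefit; `q`, `nr_cmds`, and the command-setup step are assumptions for illustration, not part of the patch:

```c
struct blk_plug plug;
struct request *rq;
int i;

blk_start_plug(&plug);
for (i = 0; i < nr_cmds; i++) {
	/* With a plug active, allocations can be served from the plug's
	 * cached batch instead of hitting the tag allocator every time. */
	rq = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
	if (IS_ERR(rq))
		break;
	/* fill in the passthrough command, rq->end_io, ... */
	blk_execute_rq_nowait(rq, false);
}
blk_finish_plug(&plug);
```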
-
- 29 September 2022, 1 commit
-
-
Submitted by Pankaj Raghav

There are two places in the block layer at the moment where the blk_mq_plug() helper could be used instead of directly accessing the plug from struct current. In both these cases, directly accessing the plug should not have any consequences for zoned devices. Make the intent explicit by adding comments instead of introducing unwanted checks with the blk_mq_plug() helper. [1]

[1] https://lore.kernel.org/linux-block/f6e54907-1035-2b2c-6387-ed178be05ccb@kernel.dk/

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Suggested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220929144141.140077-1-p.raghav@samsung.com
[axboe: fixup multi-line comment style]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
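For reference, the helper the comments point to looks roughly like this (paraphrased sketch, not a quote of the patch):

```c
static inline struct blk_plug *blk_mq_plug(struct bio *bio)
{
	/* Writes to zoned devices must not be plugged, to preserve
	 * per-zone sequential write ordering. */
	if (op_is_write(bio_op(bio)) && bdev_is_zoned(bio->bi_bdev))
		return NULL;

	/* Otherwise use the submitting task's plug, if any. */
	return current->plug;
}
```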
-
- 27 September 2022, 1 commit
-
-
Submitted by Keith Busch

The hctx's run_work may be racing with the elevator switch when reinitializing hardware queues. The queue is merely frozen in this context, but that only prevents requests from allocating and doesn't stop the hctx work from running. The work may get an elevator pointer that's being torn down, which can result in use-after-free errors and kernel panics (example below). Use the quiesced elevator switch instead, and make the previous one static since it is now only used locally.

nvme nvme0: resetting controller
nvme nvme0: 32/0/0 default/read/poll queues
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 80000020c8861067 P4D 80000020c8861067 PUD 250f8c8067 PMD 0
Oops: 0000 [#1] SMP PTI
Workqueue: kblockd blk_mq_run_work_fn
RIP: 0010:kyber_has_work+0x29/0x70
...
Call Trace:
 __blk_mq_do_dispatch_sched+0x83/0x2b0
 __blk_mq_sched_dispatch_requests+0x12e/0x170
 blk_mq_sched_dispatch_requests+0x30/0x60
 __blk_mq_run_hw_queue+0x2b/0x50
 process_one_work+0x1ef/0x380
 worker_thread+0x2d/0x3e0

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220927155652.3260724-1-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
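The shape of the quiesced switch is roughly the following illustrative sketch; the actual patch routes the hardware-queue reinitialization through the quiescing elevator switch path:

```c
blk_mq_freeze_queue(q);    /* stops new requests from being allocated */
blk_mq_quiesce_queue(q);   /* also stops hctx run_work from dispatching */

/* safe to tear down the old elevator and attach the new one here */

blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
```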
-
- 24 September 2022, 1 commit
-
-
Submitted by Liu Song

High-performance NVMe devices usually support a large number of hw queues, which ensures a 1:1 mapping of hctx and ctx. In this case there will be no remote requests, so we don't need to care about them.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1663731123-81536-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 September 2022, 1 commit
-
-
Submitted by Kanchan Joshi

This is in preparation for supporting iopoll for nvme passthrough.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220823161443.49436-4-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 06 September 2022, 1 commit
-
-
Submitted by Miaohe Lin

If the code reaches this point, needs_restart must be true. Remove the unneeded needs_restart check. No functional change intended.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/20220905101950.4606-1-linmiaohe@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
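A reconstructed sketch of the dispatch tail in question; variable names are assumptions based on the description:

```c
/* Before the patch the second branch read:
 *     else if (needs_restart && needs_resource)
 * but needs_restart is necessarily true once the first test has
 * failed, so the conjunct was dead weight. */
if (!needs_restart ||
    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
	blk_mq_run_hw_queue(hctx, true);
else if (needs_resource)
	blk_mq_delay_run_hw_queue(hctx, BLK_MQ_RESOURCE_DELAY);
```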
-
- 23 August 2022, 1 commit
-
-
Submitted by Bart Van Assche

Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
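The signature change is mechanical (sketch):

```c
/* Before: a return value that was always 0 */
int (*map_queues)(struct blk_mq_tag_set *set);

/* After: callers no longer pretend the mapping can fail */
void (*map_queues)(struct blk_mq_tag_set *set);
```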
-
- 20 August 2022, 1 commit
-
-
Submitted by Yu Kuai

Currently, in virtio_scsi, if 'bd->last' is not set to true while dispatching a request, such io will stay in the driver's queue, and the driver will wait for the block layer to dispatch more rqs. However, if the block layer fails to dispatch more rqs, it should trigger commit_rqs to inform the driver. There is a problem in blk_mq_try_issue_list_directly() that commit_rqs won't be called:

// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
 blk_mq_request_issue_directly
 // dispatch first rq
 // last is false
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // succeed to get first budget
   __blk_mq_issue_directly
    scsi_queue_rq
     cmd->flags |= SCMD_LAST
      virtscsi_queuecommand
       kick = (sc->flags & SCMD_LAST) != 0
       // kick is false, first rq won't issue to disk
 queued++

 blk_mq_request_issue_directly
 // dispatch second rq
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // failed to get second budget
 ret == BLK_STS_RESOURCE
  blk_mq_request_bypass_insert
 // errors is still 0

 if (!list_empty(list) || errors && ...)
 // won't pass, commit_rqs won't be called

In this situation, the first rq relies on the second rq to be dispatched, while the second rq relies on the first rq to complete, thus they will both hang.

Fix the problem by also treating 'BLK_STS_*RESOURCE' as 'errors', since it means that the request was not queued successfully. The same problem exists in blk_mq_dispatch_rq_list(); 'BLK_STS_*RESOURCE' can't be treated as 'errors' there, so fix it by calling commit_rqs if queue_rq returns 'BLK_STS_*RESOURCE'.

Fixes: d666ba98 ("blk-mq: add mq_ops->commit_rqs()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
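A simplified sketch of the fixed issue loop, reconstructed from the trace above; details of the real function differ:

```c
struct request *rq;
blk_status_t ret;
int queued = 0, errors = 0;

while (!list_empty(list)) {
	rq = list_first_entry(list, struct request, queuelist);
	list_del_init(&rq->queuelist);
	ret = blk_mq_request_issue_directly(rq, list_empty(list));
	switch (ret) {
	case BLK_STS_OK:
		queued++;
		break;
	case BLK_STS_RESOURCE:
	case BLK_STS_DEV_RESOURCE:
		/* Not queued to the driver: park it on the dispatch list
		 * and, crucially, count it so commit_rqs() fires below. */
		errors++;
		blk_mq_request_bypass_insert(rq, false, list_empty(list));
		goto out;
	default:
		errors++;
		blk_mq_end_request(rq, ret);
		break;
	}
}
out:
/* kick the driver if the list was cut short for any reason */
if ((!list_empty(list) || errors) && hctx->queue->mq_ops->commit_rqs && queued)
	hctx->queue->mq_ops->commit_rqs(hctx);
```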
-
- 18 August 2022, 2 commits
-
-
Submitted by Yufen Yu

We ran a test on a virtio scsi device (/dev/sda) with the default mq scheduler 'none', and found an IO hang as follows:

blk_finish_plug
 blk_mq_plug_issue_direct
  scsi_mq_get_budget
  // get budget_token fail and sdev->restarts=1

        scsi_end_request
         scsi_run_queue_async
         // sdev->restarts=0 and run queue

  blk_mq_request_bypass_insert
  // add request to hctx->dispatch list

 // continue to dispatch plug list
 blk_mq_dispatch_plug_list
  blk_mq_try_issue_list_directly
  // successfully issue all requests from plug list

After .get_budget fails, scsi_mq_get_budget will increase 'restarts'. Normally, it will run the hw queue when io completes and set 'restarts' back to 0. But if we run the queue before adding the request to the dispatch list, and blk_mq_dispatch_plug_list also successfully issues all requests, then no one will run the queue, and the request will stall in the dispatch list and never complete.

It is wrong to use the last request of the plug list to decide whether running the queue is needed, since all the remaining requests in the plug list may be from other hctxs. To fix the bug, pass run_queue as true always to blk_mq_request_bypass_insert().

Fix-suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Fixes: dc5fc361 ("block: attempt direct issue of plug list")
Link: https://lore.kernel.org/r/20220803023355.3687360-1-yuyufen@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
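The fix itself is a one-argument change at the bypass-insert call site (sketch):

```c
/* Before: only run the queue if this was the last request of the
 * plug list, which is wrong when the rest of the list belongs to
 * other hctxs that will never run this one. */
blk_mq_request_bypass_insert(rq, false, last);

/* After: always run the queue when punting to the dispatch list. */
blk_mq_request_bypass_insert(rq, false, true);
```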
-
Submitted by Yu Kuai

blk_mq_queue_stopped() doesn't have any caller, which was found by a code coverage test, so remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220818063555.3741222-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 August 2022, 3 commits
-
-
Submitted by Christoph Hellwig

Allow using the splitting helpers on just a queue_limits instead of a full request_queue structure. This will eventually allow file systems or remapping drivers to split REQ_OP_ZONE_APPEND bios based on limits calculated as the minimum common capabilities over multiple devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue argument to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220727162300.3089193-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue argument to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
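Callers follow the return-the-bio convention after this change; a hypothetical remapping-driver call site would look like this sketch, assuming the post-rename bio_split_to_limits() helper:

```c
	/* Returns the bio to continue with: the original if no split was
	 * needed, the front half if one was, or NULL if the bio was
	 * consumed (e.g. terminated on error). */
	bio = bio_split_to_limits(bio);
	if (!bio)
		return;
```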
-
- 22 July 2022, 1 commit
-
-
Submitted by Christoph Hellwig

To fully clean up the queue if the disk allocation fails, we need to call blk_mq_destroy_queue and not just blk_put_queue.

Fixes: 6f8191fd ("block: simplify disk shutdown")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220720130541.1323531-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 July 2022, 1 commit
-
-
Submitted by Bart Van Assche

Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments, and also a structure member, that hold a request operation and flags from 'rw' into 'opf'. This patch does not change any functionality.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
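The type makes the op-plus-flags combination explicit at API boundaries; a small usage sketch (the typedef shape follows the series, the call site is illustrative):

```c
typedef __u32 __bitwise blk_opf_t;

/* An operation and its flags now travel as one typed value: */
blk_opf_t opf = REQ_OP_WRITE | REQ_SYNC;
struct request *rq = blk_mq_alloc_request(q, opf, 0);
```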
-
- 06 July 2022, 4 commits
-
-
Submitted by Christoph Hellwig

Drop the unused q argument, and invert the check to move the exception into a branch and make the regular path the normal return.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220706070350.1703384-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by John Garry

We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function, so it may be dropped.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by John Garry

With the new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing the 'reserved' arg. There is actually only a single user of that arg across all the callback implementations, and it can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by John Garry

Add a flag for reserved requests so that drivers may know this for any special handling.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/1657109034-206040-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
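The accessor drivers use is tiny (paraphrased from the series; RQF_RESV is set at allocation time when the tag came from the reserved pool):

```c
static inline bool blk_mq_is_reserved_rq(struct request *rq)
{
	return rq->rq_flags & RQF_RESV;
}
```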
-
- 29 June 2022, 1 commit
-
-
Submitted by Christoph Hellwig

Add a _hctx postfix to better describe what the functions do, match the debugfs equivalents, and release the old names for functions that should be using them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 June 2022, 1 commit
-
-
Submitted by Christoph Hellwig

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 June 2022, 5 commits
-
-
Submitted by Jan Kara

Currently, the IO priority set in the task's IO context is not reflected in bio->bi_ioprio for most IO (only io_uring and direct IO set it). This leads to odd situations where a process submits some bios with one priority and other bios with a different (unset) priority, and due to the differing priorities the bios cannot be merged. Make sure bio->bi_ioprio is always set on bio submission.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
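The hook added on the submission path is small (paraphrased sketch):

```c
static void bio_set_ioprio(struct bio *bio)
{
	/* Nobody set an ioprio so far? Derive one from the task. */
	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
		bio->bi_ioprio = get_current_ioprio();
	blkcg_set_ioprio(bio);
}
```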
-
Submitted by Jan Kara

A bio's IO priority needs to be initialized before we try to merge the bio with other bios. Otherwise we could merge bios which would otherwise receive different IO priorities, leading to possible QoS issues.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-8-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jan Kara

Convert blk-ioprio handling from an rqos policy to a direct call from blk_mq_submit_bio(). Firstly, blk-ioprio is not much of an rqos policy anyway; it just needs a hook in the bio submission path to set the bio's IO priority. Secondly, the rqos .track hook actually gets called too late for blk-ioprio purposes, and introducing a special rqos hook just for blk-ioprio looks even weirder.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-7-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Sebastian Andrzej Siewior

__blk_mq_delay_run_hw_queue() disables preemption to get a stable current CPU number and then invokes __blk_mq_run_hw_queue() if the CPU number is part of the mask. __blk_mq_run_hw_queue() acquires a spin_lock_t, which is a sleeping lock on PREEMPT_RT and can't be acquired with disabled preemption. It is not required for correctness to invoke __blk_mq_run_hw_queue() on a CPU matching hctx->cpumask; both async and direct requests can run on a CPU not matching hctx->cpumask. Check the CPU mask without disabling preemption and invoke __blk_mq_run_hw_queue().

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/YrLSEiNvagKJaDs5@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bart Van Assche

Since the introduction of blk_mq_get_hctx_type(), the operation type in the second argument of blk_mq_get_hctx_type() matters. Its introduction caused blk_mq_get_sq_hctx() to select a hardware queue of type HCTX_TYPE_READ instead of HCTX_TYPE_DEFAULT. Switch to hardware queue type HCTX_TYPE_DEFAULT, since HCTX_TYPE_READ should only be used for read requests.

Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220615225549.1054905-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 June 2022, 1 commit
-
-
Submitted by Jens Axboe

If rq_qos_throttle() ends up blocking, then we will have invalidated and flushed our current plug. Since blk_mq_get_cached_request() hasn't popped the cached request off the plug list just yet, we end up holding a pointer to a request that is no longer valid. This insta-crashes with rq->mq_hctx being NULL in the validity checks just after. Pop the request off the cached list before doing rq_qos_throttle() to avoid using a potentially stale request.

Fixes: 0a5aa8d1 ("block: fix blk_mq_attempt_bio_merge and rq_qos_throttle protection")
Reported-by: Dylan Yudaken <dylany@fb.com>
Tested-by: Dylan Yudaken <dylany@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
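The fix amounts to a reordering (paraphrased sketch):

```c
	/* If any qos ->throttle() ends up blocking, the plug gets flushed
	 * and the cached_rq list with it. Pop this entry off the cached
	 * list before throttling, so we never hold a pointer into a list
	 * that may already be gone. */
	plug->cached_rq = rq_list_next(rq);
	rq_qos_throttle(q, *bio);
```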
-
- 17 June 2022, 4 commits
-
-
Submitted by Ming Lei

commit 364b6181 ("blk-mq: clearing flush request reference in tags->rqs[]") was added to clear the to-be-freed flush request from tags->rqs[] to avoid use-after-free on the flush rq. Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time by ~8s, because a scsi probe run may create and remove lots of unpresent LUNs on megaraid-sas, which uses BLK_MQ_F_TAG_HCTX_SHARED, and each request queue has lots of hw queues. Improve the situation by not running blk_mq_clear_flush_rq_mapping if the disk isn't added, since in that case no flush request can have been issued.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Ming Lei

q->elevator is referred to in blk_mq_has_sqsched() without any protection: no .q_usage_counter is held, and no queue srcu or rcu read lock is held, so a potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag ELEVATOR_F_MQ_AWARE isn't needed any more.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
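A sketch of the resulting pattern; the flag name follows the patch, the helper shape is an assumption:

```c
/* The single-queue scheduler marks the queue at init time: */
blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);

/* blk-mq then tests the flag lock-free, instead of dereferencing
 * q->elevator, which may already be under teardown: */
#define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
```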
-
Submitted by Ming Lei

The elevator can be torn down by the sysfs switch interface or by disk release, so hold ->sysfs_lock before referring to q->elevator; that way a potential use-after-free can be avoided.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Bart Van Assche

This patch prevents test nvme/004 from triggering the following:

UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9
index 512 is out of range for type 'long unsigned int [512]'
Call Trace:
 show_stack+0x52/0x58
 dump_stack_lvl+0x49/0x5e
 dump_stack+0x10/0x12
 ubsan_epilogue+0x9/0x3b
 __ubsan_handle_out_of_bounds.cold+0x44/0x49
 blk_mq_alloc_request_hctx+0x304/0x310
 __nvme_submit_sync_cmd+0x70/0x200 [nvme_core]
 nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics]
 nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop]
 nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop]
 nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics]
 nvmf_dev_write+0xae/0x111 [nvme_fabrics]
 vfs_write+0x144/0x560
 ksys_write+0xb7/0x140
 __x64_sys_write+0x42/0x50
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 20e4d813 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 30 May 2022, 1 commit
-
-
Submitted by Haisu Wang

Flush or passthrough requests are not accounted as normal IO in completion. To reflect iostat for slow IO, io_ticks is updated based on inflight numbers when the stats are shown. This may cause inconsistent io_ticks calculation results. So do not account passthrough requests when checking inflight.

Fixes: 86d73312 ("block: update io_ticks when io hang")
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: samuelliao <samuelliao@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220530064059.1120058-1-haisuwang@tencent.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 May 2022, 3 commits
-
-
Submitted by Christoph Hellwig

Let the caller set it together with the end_io_data instead of passing a pointless argument. Note that the target code did in fact already set it, and then just overrode it again by calling blk_execute_rq_nowait.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
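After the change a caller wires up its own completion before issuing (sketch; my_end_io and my_ctx are hypothetical driver names):

```c
	rq->end_io = my_end_io;        /* hypothetical callback */
	rq->end_io_data = my_ctx;      /* hypothetical context */
	blk_execute_rq_nowait(rq, false /* at_head */);
```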
-
Submitted by Christoph Hellwig

Instead of trying to cast a __bitwise 32-bit integer to a larger integer and then a pointer, just allocate a struct with the blk_status_t and the completion on the stack and set the end_io_data to that. Use the opportunity to move the code to where it belongs and drop the rather confusing comments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220524121530.943123-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
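Paraphrased shape of the on-stack wait structure and its end_io handler:

```c
struct blk_rq_wait {
	struct completion done;
	blk_status_t ret;
};

static void blk_end_sync_rq(struct request *rq, blk_status_t ret)
{
	struct blk_rq_wait *wait = rq->end_io_data;

	wait->ret = ret;        /* no casting the status through a pointer */
	complete(&wait->done);
}
```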
-
Submitted by Christoph Hellwig

We don't want to plug for synchronous execution where we immediately wait for the request. Once that is done, not a whole lot of code is shared, so just remove __blk_execute_rq_nowait.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220524121530.943123-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 May 2022, 1 commit
-
-
Submitted by Ming Lei

blk_mq_run_hw_queues() could be run when there isn't any queued request and after the queue is cleaned up; at that time the tagset is freed, because the tagset lifetime is covered by the driver, and it is often freed after blk_cleanup_queue() returns. So don't touch ->tagset for figuring out the current default hctx; use the mapping built in the request queue instead, so that use-after-free on the tagset can be avoided. Meantime this way should be faster than retrieving the mapping from the tagset.

Cc: "yukuai (C)" <yukuai3@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Fixes: b6e68ee8 ("blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220522122350.743103-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 May 2022, 1 commit
-
-
Submitted by Julia Lawall

Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220521111145.81697-29-Julia.Lawall@inria.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 12 May 2022, 1 commit
-
-
Submitted by Ming Lei

First, we can't add a request to the plug list in blk_mq_request_bypass_insert, which may be called when flushing the plug list, since a nested plug would result. Second, if a polled passthrough request is inserted via blk_execute_rq(), it can't be added to the plug list either, since io polling needs the request to be issued to the driver. Fix the two issues by moving plugging into blk_execute_rq_nowait().

Cc: Christoph Hellwig <hch@lst.de>
Fixes: 1c2d2fff ("block: wire-up support for passthrough plugging")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220512140010.1458645-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 May 2022, 1 commit
-
-
Submitted by Jens Axboe

Add support for plugging in the passthrough path. When plugging is enabled, the requests are added to a plug instead of getting dispatched to the driver. And when the plug is finished, the whole batch gets dispatched via ->queue_rqs, which turns out to be more efficient. Otherwise dispatching used to happen via ->queue_rq, one request at a time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220511054750.20432-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
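A sketch of the submitter's view; rq1, rq2, and my_end_io are hypothetical, and note that at this point in the series blk_execute_rq_nowait() still took the done callback, which a later entry above removes:

```c
struct blk_plug plug;

blk_start_plug(&plug);
/* each call adds the request to the plug instead of the driver */
blk_execute_rq_nowait(rq1, false, my_end_io);
blk_execute_rq_nowait(rq2, false, my_end_io);
/* the whole batch is handed to the driver's ->queue_rqs here */
blk_finish_plug(&plug);
```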
-
- 28 April 2022, 1 commit
-
-
Submitted by Tejun Heo

This reverts commit 00067077. It has a couple of problems:

* bio_issue_time() is stored in bio->bi_issue truncated to 51 bits. This overflows in slightly over 26 days. Setting rq->io_start_time_ns with it means that io duration calculation would yield >26 days after 26 days of uptime. This, for example, confuses kyber, making it cause high IO latencies.

* rq->io_start_time_ns should record the time that the IO is issued to the device so that on-device latency can be measured. However, bio_issue_time() is set before the bio goes through the rq-qos controllers (wbt, iolatency, iocost), so when the bio gets throttled in any of those mechanisms, the measured latencies make no sense - on-device latencies end up higher than request-alloc-to-completion latencies.

We'll need a smarter way to avoid calling ktime_get_ns() repeatedly back-to-back. For now, let's revert the commit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v5.16+
Link: https://lore.kernel.org/r/YmmeOLfo5lzc+8yI@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-