- 19 October 2021, 1 commit
-
-
Committed by Pavel Begunkov

Don't init rq->hash and rq->rb_node in blk_mq_rq_ctx_init() if there is no elevator. Also, move some other initialisers that imply barriers to the end, so the compiler is free to rearrange and optimise the rest of them.

Note: this folds in a change from Jens that leaves the queue_list initialisation unconditional, as skipping it might lead to problems otherwise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 October 2021, 39 commits
-
-
Committed by Jens Axboe

Add a request-private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for a faster check in the fast path, rather than having to dereference rq->q later on. Elevator switching does a full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
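As a rough sketch of the resulting fast path (the flag value, struct layout, and the helper name rq_uses_elevator are all illustrative, not the upstream definitions):

    #include <stdbool.h>
    #include <stdint.h>

    #define RQF_ELV (1u << 12)      /* illustrative bit, not the real value */

    struct request {
            uint32_t rq_flags;      /* set at allocation when q->elevator != NULL */
            /* ... */
    };

    /* Fast path: one load from the request itself, no rq->q dereference. */
    static inline bool rq_uses_elevator(const struct request *rq)
    {
            return rq->rq_flags & RQF_ELV;
    }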
-
Committed by Jens Axboe

We set BIO_TRACKED unconditionally when rq_qos_throttle() is called, even though we may not even have an rq_qos handler. Only mark it as TRACKED if it really is potentially tracked. This saves considerable time for the case where the bio isn't tracked:

    2.64%  -1.65%  [kernel.vmlinux]  [k] bio_endio

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
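A minimal sketch of the conditional marking, assuming heavily simplified types (the flag bit and struct layouts are illustrative):

    #define BIO_TRACKED (1u << 0)   /* illustrative flag bit */

    struct bio { unsigned int bi_flags; };
    struct rq_qos { int dummy; };
    struct request_queue { struct rq_qos *rq_qos; };

    static void __rq_qos_throttle(struct rq_qos *rqos, struct bio *bio)
    {
            (void)rqos; (void)bio;  /* real throttling logic lives here */
    }

    /* Only pay for tracking when a handler is actually installed. */
    static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
    {
            if (q->rq_qos) {
                    bio->bi_flags |= BIO_TRACKED;
                    __rq_qos_throttle(q->rq_qos, bio);
            }
    }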
-
Committed by Jens Axboe

It's been a while since this layout was analyzed; move some members around to better match the flow of the use case: initial state up top, and queued state after that. This improves my peak case by about 1.5%, from 7750K to 7900K IOPS.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

For some reason we still have these in blk-core, while the rest of the request completion code lives in blk-mq. That causes an out-of-line call for each completion. Move them into blk-mq.c instead, where they belong.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

We have exactly one caller of this; just get rid of adding the useless function name to the output.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

If we're completing nbytes and nbytes is the size of the bio, don't bother calling into the iterator increment helpers. Just clear the bio size and we're done.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
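A hedged sketch of that fast path; bio_complete_bytes is a name invented here, and the structs are trimmed to the relevant field:

    struct bvec_iter { unsigned int bi_size; /* ... */ };
    struct bio { struct bvec_iter bi_iter; };

    static void bio_advance(struct bio *bio, unsigned int nbytes)
    {
            (void)bio; (void)nbytes;    /* per-vector iterator walk (slow path) */
    }

    static inline void bio_complete_bytes(struct bio *bio, unsigned int nbytes)
    {
            if (nbytes == bio->bi_iter.bi_size)
                    bio->bi_iter.bi_size = 0;   /* whole bio done: skip the walk */
            else
                    bio_advance(bio, nbytes);   /* partial completion: advance */
    }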
-
Committed by Pavel Begunkov

Convert bdev->bd_disk->queue to bdev_get_queue(), which uses a cached queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/addf6ea988c04213697ba3684c853e4ed7642a39.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

Convert bdev->bd_disk->queue to bdev_get_queue(), which uses a cached queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/efc41f880262517c8dc32f932f1b23112f21b255.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

Convert bdev->bd_disk->queue to bdev_get_queue(), which uses a cached queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/85c36ea784d285a5075baa10049e6b59e15fb484.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

Convert bdev->bd_disk->queue to bdev_get_queue(), which uses a cached queue pointer and so is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a352936ce5d9ac719645b1e29b173d931ebcdc02.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

There are tons of places where we need to get a request_queue while only having a bdev, which turns into bdev->bd_disk->queue. There are probably a hundred such places counting inline helpers, and enough of them are in hot paths. Cache the queue pointer in struct block_device and make use of it in bdev_get_queue().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a3bfaecdd28956f03629d0ca5c63ebc096e1c809.1634219547.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
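A sketch of the caching idea (field layout simplified): the cached pointer is filled in when the block device is set up, and the helper just reads it:

    struct request_queue;
    struct gendisk { struct request_queue *queue; };

    struct block_device {
            struct gendisk *bd_disk;
            struct request_queue *bd_queue;     /* cached copy of bd_disk->queue */
    };

    /* One pointer load instead of two on every call site. */
    static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
    {
            return bdev->bd_queue;
    }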
-
Committed by Jens Axboe

The fast path is the one where no splitting is needed. Separate the handling into a check part that we can inline, and an out-of-line handling path for when we do need to split.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
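One way such a split can be structured, sketched with invented helper names and a single size field standing in for the real queue limits:

    struct request_queue { unsigned int max_sectors; };
    struct bio { unsigned int bi_sectors; };

    /* Out of line: the rare, expensive path that actually splits. */
    static void __bio_split_slowpath(struct request_queue *q, struct bio **bio)
    {
            (void)q; (void)bio;     /* allocate a new bio, shuffle vectors, chain */
    }

    /* Inline: the cheap check every submission performs. */
    static inline void bio_check_split(struct request_queue *q, struct bio **bio)
    {
            if ((*bio)->bi_sectors > q->max_sectors)
                    __bio_split_slowpath(q, bio);
    }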
-
Committed by Jens Axboe

This generates much better code for me, and bumps performance from 7650K IOPS to 7750K IOPS. Looking at profiles for the run and running perf diff confirms that we're now spending a lot less time there:

    6.38%  -2.80%  [kernel.vmlinux]  [k] blkdev_direct_IO

That takes it from the 2nd-biggest cycle consumer down to only the 9th, at 3.35% of the CPU time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Pavel Begunkov

    bdev = &BDEV_I(file->f_mapping->host)->bdev

Getting struct block_device from a file requires two memory dereferences as illustrated above, which takes a toll on performance, so cache it in the as-yet unused file->private_data. That gives a noticeable peak performance improvement.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8415f9fe12e544b9da89593dfbca8de2b52efe03.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
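The caching pattern itself, as a userspace-flavoured sketch with invented helper names:

    #include <stddef.h>

    struct block_device { int dummy; };
    struct file { void *private_data; };

    /* Stub for the expensive inode-based lookup (two dereferences upstream). */
    static struct block_device *bdev_from_inode(struct file *f)
    {
            (void)f;
            return NULL;
    }

    static void blkdev_open_cache(struct file *f)
    {
            f->private_data = bdev_from_inode(f);   /* pay the lookup once, at open */
    }

    static inline struct block_device *file_bdev(struct file *f)
    {
            return f->private_data;                 /* hot path: a single load */
    }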
-
Committed by Christoph Hellwig

Set the poll queue flag to enable polling, given that the multipath node just dispatches the bios to a lower queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

The poll attribute is a historic artefact from before we had explicit poll queues, which require driver-specific configuration. Just print a warning when writing to the attribute.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Replace the blk_poll interface, which requires the caller to keep a queue and cookie from the submission, with polling based on the bio. Polling on the bio itself brings a few advantages:

- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie separately, and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to support polling of BIOs remapped by stacking drivers
- a lot of code to propagate the cookie back up the submission path can be removed entirely

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
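A sketch contrasting the two calling conventions (argument lists abbreviated, so treat the exact signatures as assumptions):

    /*
     * Before: the submitter had to carry a (request_queue, cookie) pair.
     *
     *     blk_qc_t cookie = submit_bio(bio);
     *     ...
     *     blk_poll(q, cookie, flags);
     *
     * After: the bio records its device and cookie at submission time,
     * so polling needs nothing but the bio itself.
     *
     *     submit_bio(bio);
     *     ...
     *     bio_poll(bio, flags);
     */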
-
Committed by Ming Lei

'struct bvec_iter' is embedded into 'struct bio'; define it as packed so that we gain an extra 4 bytes for other uses without expanding bio. 'struct bvec_iter' is often allocated on the stack, so making it packed doesn't affect performance. I have also run io_uring on both nvme and null_blk and did not observe any performance effect from this change.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
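The 4-byte saving can be reproduced in plain C; this standalone sketch models sector_t as a 64-bit integer:

    #include <stdint.h>
    #include <stdio.h>

    /* The same four fields as the kernel's struct bvec_iter. */
    struct bvec_iter_packed {
            uint64_t     bi_sector;     /* sector_t, modeled as u64 */
            unsigned int bi_size;
            unsigned int bi_idx;
            unsigned int bi_bvec_done;
    } __attribute__((packed));

    int main(void)
    {
            /* Prints 20: 8 + 4 + 4 + 4 bytes with no tail padding, versus
             * 24 with natural 8-byte alignment. Those 4 bytes go back to
             * struct bio. */
            printf("%zu\n", sizeof(struct bvec_iter_packed));
            return 0;
    }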
-
Committed by Christoph Hellwig

This flag ensures that the pages will not be reused for non-bio allocations before the end of an RCU grace period. With that we can safely use an RCU lookup for bio polling, as long as we are fine with occasionally polling the wrong device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Unlike the RWF_HIPRI userspace ABI, which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

There is no point in sleeping for the expected I/O completion timeout in the io_uring async polling model, as we never poll for a specific I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Switch the boolean spin argument to blk_poll to a set of flags instead. This will allow controlling polling behavior in a more fine-grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
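Illustrative flag definitions along the lines of what this series introduces (the names match my reading of the merged code; treat the exact values and comments as assumptions):

    /* Poll only once instead of spinning until a completion is found. */
    #define BLK_POLL_ONESHOT        (1U << 0)
    /* Do not sleep while waiting for the expected completion time. */
    #define BLK_POLL_NOSLEEP        (1U << 1)

A flags word lets callers combine such behaviors freely, where the old boolean spin argument could express only one.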
-
Committed by Christoph Hellwig

Move the trivial check into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Merge both functions into their only caller to keep the blk-mq tag to blk_qc_t mapping as private as possible in blk-mq.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Factor the code that does the classic full-metal polling out of blk_poll into a separate blk_mq_poll_classic helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Add a helper to get the hctx from a request_queue and cookie, and fold the blk_qc_t_to_queue_num helper into it, as no other callers are left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Syscall-level code can't just poke into the details of the poll cookie, which is private information of the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

If an iocb is split into multiple bios, we can't poll for both. So don't even bother to try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

If an iocb is split into multiple bios, we can't poll for both. So don't even bother to try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012111226.760968-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

The polling support in the legacy direct-io code is a little crufty. It already doesn't support the asynchronous polling needed for io_uring polling, and is hard to adapt to upcoming changes in the polling interfaces. Given that all the major file systems already use the iomap direct I/O code, just drop the polling support.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

Currently we scan the entire plug list, which is potentially very expensive. In an IOPS-bound workload, we can drive about 5.6M IOPS with merging enabled, and profiling shows that the plug merge check is by far the most expensive thing we're doing:

    Overhead  Command   Shared Object     Symbol
    + 20.89%  io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
    +  4.98%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
    +  4.78%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
    +  4.61%  io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio

Instead of browsing the whole list, just check the previously inserted entry. That is enough for a naive merge check and will catch most cases, and for devices that need full merging, the IO scheduler attached to such devices will do that anyway. The plug merge is meant to be an inexpensive check to avoid getting a request, but if we repeatedly scan the list for every single insert, it is very much not a cheap check. With this patch, the workload instead runs at ~7.0M IOPS, a 25% improvement. Disabling merging entirely yields another 5% improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
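A sketch of the resulting O(1) check, with trimmed-down types and invented helper names:

    #include <stdbool.h>
    #include <stddef.h>

    struct bio;
    struct request;
    struct blk_plug { struct request *last; /* most recently queued request */ };

    /* Stub: the real check compares sectors, flags, and merge limits. */
    static bool attempt_merge(struct request *rq, struct bio *bio)
    {
            (void)rq; (void)bio;
            return false;
    }

    /* O(1) naive merge: only consider the previously inserted request;
     * devices that need full merging rely on their IO scheduler instead. */
    static bool plug_attempt_merge(struct blk_plug *plug, struct bio *bio)
    {
            return plug->last != NULL && attempt_merge(plug->last, bio);
    }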
-
Committed by Masahiro Yamada

Every object under block/ depends on CONFIG_BLOCK. Move the guard to the top Makefile, since there is no point in descending into block/ if CONFIG_BLOCK=n.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-5-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Masahiro Yamada

Move the menu to the relevant place.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-4-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Masahiro Yamada

Everything under block/ depends on BLOCK. BLOCK_HOLDER_DEPRECATED is selected from drivers/md/Kconfig, which is entirely dependent on BLOCK. Extend the 'if BLOCK' ... 'endif' so it covers the whole of block/Kconfig. Also, clean up the definitions of BLOCK_COMPAT and BLK_MQ_PCI, because COMPAT and PCI are boolean.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-3-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Masahiro Yamada

CONFIG_BLK_CGROUP is a boolean option; that is, its value is 'y' or 'n'. The comparison to 'y' is redundant.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210927140000.866249-2-masahiroy@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

Add a blk_mq_get_tags() helper, which uses the new sbitmap API for allocating a batch of tags all at once. This both simplifies the block code for batched allocation and is more efficient than doing repeated calls into __sbitmap_queue_get(). It reduces the sbitmap overhead in peak runs from ~3% to ~1% and yields a performance increase from 6.6M IOPS to 6.8M IOPS for a single CPU core.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe

The block layer tag allocation batching still calls into sbitmap to get each tag individually, but we can improve on that. Add __sbitmap_queue_get_batch(), which returns a mask of tags all at once, along with an offset for those tags. An example return would be 0xff, where bits 0..7 are set, with tag_offset == 128; the valid tags in this case would be 128..135. A batch is specific to an individual sbitmap_map (a single word of the bitmap), hence it cannot be larger than that. The requested number of tags is automatically reduced to the maximum that can be satisfied with a single map. On failure, 0 is returned; the caller should fall back to single tag allocation at that point.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
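A worked decoding of such a return value, using the exact numbers from the description above:

    #include <stdio.h>

    int main(void)
    {
            unsigned long mask = 0xff;          /* bits 0..7 set */
            unsigned int tag_offset = 128;

            /* A set bit i in the mask means tag (tag_offset + i) was allocated. */
            for (unsigned int i = 0; i < 8 * sizeof(mask); i++)
                    if (mask & (1UL << i))
                            printf("tag %u\n", tag_offset + i);   /* prints 128..135 */
            return 0;
    }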
-
Committed by Pavel Begunkov

We already have a blk_mq_need_time_stamp() check in __blk_mq_end_request() to decide whether to get a timestamp; hide all the statistics accounting under it. This cuts some cycles for requests that don't need stats, and is free otherwise.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e0f2ea812e93a8adcd07101212e7d7e70ca304e7.1634115360.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
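A simplified model of the change (all names here are stand-ins for the real blk-mq helpers):

    #include <stdbool.h>
    #include <stdint.h>

    struct request { unsigned int rq_flags; };

    static bool need_time_stamp(const struct request *rq)
    {
            return rq->rq_flags & 1;    /* stand-in for blk_mq_need_time_stamp() */
    }

    static uint64_t now_ns(void) { return 0; }              /* clock stub */
    static void account_io_done(struct request *rq, uint64_t now)
    {
            (void)rq; (void)now;        /* stats + accounting stub */
    }

    /* One branch guards both the clock read and all the accounting, so
     * requests that don't need a timestamp skip everything. */
    static void end_request_acct(struct request *rq)
    {
            if (need_time_stamp(rq)) {
                    uint64_t now = now_ns();
                    account_io_done(rq, now);
            }
    }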
-
Committed by Christoph Hellwig

bio_truncate is only used in bio.c, so mark it static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012161804.991559-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-