Commit · 1eb17f5e15b73669df635fb07df2853cb1244a69 · openeuler / Kernel

29 Nov, 2021 40 commits

Committed by Jan Kara on Nov 25, 2021

Waker-wakee relationships are important in deciding whether one queue
can preempt the other one. Print information about detected waker-wakee
relationships so that scheduling decisions can be better understood from
block traces.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-7-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1eb17f5e

bfq: Provide helper to generate bfqq name · 582f04e1

Committed by Jan Kara on Nov 25, 2021

Instead of having a helper formatting the bfqq pid, provide a helper to
generate the full bfqq name as used in the traces. It saves some code
duplication and will save more in the coming tracepoints.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-6-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>

582f04e1

bfq: Limit waker detection in time · 1f18b700

Committed by Jan Kara on Nov 25, 2021

Currently, when process A starts issuing requests shortly after process
B has completed some IO three times in a row, we decide that B is a
"waker" of A meaning that completing IO of B is needed for A to make
progress and generally stop separating A's and B's IO much. This logic
is useful to avoid unnecessary idling and thus throughput loss for cases
where workload needs to switch e.g. between the process and the
journaling thread doing IO. However, the detection heuristic tends to
frequently give false positives when A and B are fighting for IO bandwidth
and other processes aren't doing much IO, as we are basically bound to
eventually accumulate three occurrences of a situation where one process
starts issuing requests after the other has completed some IO. To reduce
these false positives, also cancel the waker detection if we didn't
accumulate three detected wakeups within a given timeout. The rationale is
that if wakeups are really rare, the pointless idling doesn't hurt
throughput that much anyway.

This significantly reduces false waker detection for a workload like:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
fsync=200

[fastwriter]
numjobs=1
fsync=200
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-5-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
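
The time-limited detection described in this commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the BFQ code: the struct, the function name and the window length (DETECTION_WINDOW_MS) are hypothetical, and only the "three wakeups" threshold comes from the message itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* "Three" comes from the commit message; the window length is hypothetical. */
#define WAKEUPS_NEEDED      3
#define DETECTION_WINDOW_MS 4000

/* Hypothetical stand-in for the per-queue waker detection state. */
struct queue_model {
	const struct queue_model *tentative_waker; /* candidate being counted */
	const struct queue_model *waker;           /* confirmed waker, or NULL */
	unsigned int num_wakeups;                  /* wakeups seen for candidate */
	uint64_t detection_started_ms;             /* when counting began */
};

/*
 * Record that `q` started issuing requests shortly after `candidate`
 * completed some IO at time `now_ms` (assumed to be monotonic).
 * Returns true once `candidate` is confirmed as the waker of `q`.
 */
static bool note_wakeup(struct queue_model *q,
			const struct queue_model *candidate, uint64_t now_ms)
{
	assert(q != NULL && candidate != NULL && q != candidate);

	if (q->waker == candidate)
		return true;

	/* Restart counting on a new candidate or once the window expired. */
	if (q->tentative_waker != candidate ||
	    now_ms - q->detection_started_ms > DETECTION_WINDOW_MS) {
		q->tentative_waker = candidate;
		q->num_wakeups = 1;
		q->detection_started_ms = now_ms;
		return false;
	}

	if (++q->num_wakeups < WAKEUPS_NEEDED)
		return false;

	/* Enough wakeups inside the window: confirm the waker. */
	q->waker = candidate;
	q->tentative_waker = NULL;
	q->num_wakeups = 0;
	return true;
}
```

In this model, three wakeups spread over more than the window never confirm a waker, which is the false-positive case the commit aims to suppress.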

1f18b700

bfq: Limit number of requests consumed by each cgroup · 76f1df88

Committed by Jan Kara on Nov 25, 2021

When cgroup IO scheduling is used with BFQ it does not really provide
service differentiation if the cgroup drives a big IO depth. That for
example happens with writeback which asynchronously submits lots of IO
but it can happen with AIO as well. The problem is that if we have two
cgroups that submit IO with different weights, the cgroup with higher
weight properly gets more IO time and is able to dispatch more IO.
However this causes lower weight cgroup to accumulate more requests
inside BFQ and eventually lower weight cgroup consumes most of IO
scheduler tags. At that point higher weight cgroup stops getting better
service as it is mostly blocked waiting for a scheduler tag while its
queues inside BFQ are empty and thus lower weight cgroup gets served.

Check how many requests the submitting cgroup has allocated in
bfq_limit_depth() and, if it consumes more requests than what would
correspond to its weight, limit the available depth to 1 so that the cgroup
cannot consume many more requests. With this limitation the higher
weight cgroup gets proper service even with writeback.
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-4-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
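
The check described above can be sketched as a simplified, flat model. It assumes a single level of cgroups sharing one tag space, whereas the real scheduler works on a hierarchy; the struct and function names are hypothetical and are not the bfq_limit_depth() implementation.

```c
#include <assert.h>

/* Hypothetical flat model of one cgroup competing for scheduler tags. */
struct cgroup_model {
	unsigned int weight;    /* this cgroup's weight */
	unsigned int allocated; /* requests it currently holds */
};

/*
 * Return the depth the submitter may use: the full depth while the cgroup
 * stays within its weight-proportional share of the tags, otherwise 1.
 */
static unsigned int limit_depth(const struct cgroup_model *cg,
				unsigned int total_weight,
				unsigned int full_depth)
{
	assert(total_weight > 0 && cg->weight <= total_weight);

	/* Share of the tags that corresponds to the cgroup's weight. */
	unsigned long long share =
		(unsigned long long)full_depth * cg->weight / total_weight;

	return cg->allocated > share ? 1 : full_depth;
}
```

With 256 tags and weights 100 and 900, the low-weight cgroup is throttled to a depth of 1 once it holds more than 25 requests, so it can no longer crowd out the high-weight cgroup.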

76f1df88

bfq: Store full bitmap depth in bfq_data · 44dfa279

Committed by Jan Kara on Nov 25, 2021

Store bitmap depth shift inside bfq_data so that we can use it in
bfq_limit_depth() for proportioning when limiting the number of available
request tags for a cgroup.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-3-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>

44dfa279

bfq: Track number of allocated requests in bfq_entity · 98f04499

Committed by Jan Kara on Nov 25, 2021

When we want to limit the number of requests used by each bfqq and also
by each cgroup, we also need to track the number of requests used by each
cgroup. So track the number of allocated requests for each bfq_entity.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
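
One way to picture per-entity accounting is a counter that is updated on every level from the queue up to the root, so that each group sees the total held below it. The following is a minimal sketch under that assumption; entity_model and both helpers are hypothetical names, not the bfq_entity API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduling entity (queue or group). */
struct entity_model {
	struct entity_model *parent; /* NULL at the root */
	int allocated;               /* requests held by this entity and below */
};

/* Account one newly allocated request on the entity and all ancestors. */
static void request_allocated(struct entity_model *e)
{
	for (; e != NULL; e = e->parent)
		e->allocated++;
}

/* Undo the accounting when the request is freed. */
static void request_freed(struct entity_model *e)
{
	for (; e != NULL; e = e->parent) {
		assert(e->allocated > 0);
		e->allocated--;
	}
}
```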

98f04499

block: Provide blk_mq_sched_get_icq() · 790cf9c8

Committed by Jan Kara on Nov 25, 2021

Currently we look up the ICQ only after the request is allocated. However, BFQ
will want to decide how many scheduler tags it allows a given bfq queue
(effectively a process) to consume based on cgroup weight. So provide a
function blk_mq_sched_get_icq() so that BFQ can look up the ICQ earlier.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211125133645.27483-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>

790cf9c8

mmc: core: Use blk_mq_complete_request_direct(). · 639d3531

Committed by Sebastian Andrzej Siewior on Oct 25, 2021

The completion callback for the sdhci-pci device is invoked from a
kworker.
I couldn't identify in which context mmc_blk_mq_req_done() is invoked, but
the remaining callers are invoked from preemptible context. Here it
would make sense to complete the request directly instead of scheduling
ksoftirqd for its completion.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20211025070658.1565848-3-bigeasy@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

639d3531

blk-mq: Add blk_mq_complete_request_direct() · e8dc17e2

Committed by Sebastian Andrzej Siewior on Oct 25, 2021

Add blk_mq_complete_request_direct() which completes the block request
directly instead of deferring it to softirq for single queue devices.

This is useful for devices which complete the requests in preemptible
context, where raising a softirq means scheduling ksoftirqd.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211025070658.1565848-2-bigeasy@linutronix.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8dc17e2

blk-crypto: remove blk_crypto_unregister() · 72cd9df2

Committed by Eric Biggers on Nov 23, 2021

This function is trivial and is only used in one place. Having this
function is misleading because it implies that blk_crypto_register()
needs to be paired with blk_crypto_unregister(), which is not the case.
Just set disk->queue->crypto_profile to NULL directly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211124013733.347612-1-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

72cd9df2

blk-mq: cleanup request allocation · 5b13bc8a

Committed by Christoph Hellwig on Nov 24, 2021

Refactor the request allocation so that blk_mq_get_cached_request tries
to find a cached request first, and the entirely separate and now
self-contained blk_mq_get_new_requests allocates one or more requests
if that is not possible.

There is a small change in behavior as submit_bio_checks is called
twice now if a cached request is present but can't be used, but that
is a small price to pay for unwinding this code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211124062856.1444266-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
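
The resulting control flow (try the cache, otherwise allocate) can be sketched with a toy model. All types and functions below are hypothetical, and treating "belongs to a different queue" as the reason a cached request can't be used is an assumption made for illustration; the kernel functions named in the commit message have different signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a request and a per-task plug. */
struct request_model {
	int queue_id;
	bool from_cache;
};

struct plug_model {
	struct request_model *cached; /* one pre-allocated request, or NULL */
};

/* Step 1: return the cached request only if it is usable for this queue. */
static struct request_model *get_cached_request(struct plug_model *plug,
						int queue_id)
{
	struct request_model *rq = plug ? plug->cached : NULL;

	if (rq == NULL || rq->queue_id != queue_id)
		return NULL;
	plug->cached = NULL; /* the cache entry is consumed */
	rq->from_cache = true;
	return rq;
}

/* Step 2: self-contained slow path; `storage` models a fresh allocation. */
static struct request_model *get_new_request(struct request_model *storage,
					     int queue_id)
{
	assert(storage != NULL);
	storage->queue_id = queue_id;
	storage->from_cache = false;
	return storage;
}

/* Caller: the two steps stay separate and are simply tried in order. */
static struct request_model *get_request(struct plug_model *plug,
					 struct request_model *storage,
					 int queue_id)
{
	struct request_model *rq = get_cached_request(plug, queue_id);

	return rq ? rq : get_new_request(storage, queue_id);
}
```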

5b13bc8a

block: don't include <linux/part_stat.h> in blk.h · 82d981d4

Committed by Christoph Hellwig on Nov 23, 2021

Not needed, shift it into the source files that need it instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

82d981d4

block: don't include <linux/idr.h> in blk.h · ca5b304c

Committed by Christoph Hellwig on Nov 23, 2021

Not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ca5b304c

block: don't include <linux/blk-mq.h> in blk.h · a2ff7781

Committed by Christoph Hellwig on Nov 23, 2021

Not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a2ff7781

block: don't include blk-mq.h in blk.h · e4a19f72

Committed by Christoph Hellwig on Nov 23, 2021

Not needed, shift a blk-stat.h include into the source file that needs it
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e4a19f72

block: don't include blk-mq-sched.h in blk.h · 2aa7745b

Committed by Christoph Hellwig on Nov 23, 2021

Not needed, shift it into the source files that need it instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2aa7745b

block: remove the e argument to elevator_exit · 0c6cb3a2

Committed by Christoph Hellwig on Nov 23, 2021

All callers pass q->elevator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0c6cb3a2

block: remove elevator_exit · f46b81c5

Committed by Christoph Hellwig on Nov 23, 2021

Open code elevator_exit in its only caller, and rename __elevator_exit to
elevator_exit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f46b81c5

block: move blk_get_flush_queue to blk-flush.c · 0281ed3c

Committed by Christoph Hellwig on Nov 23, 2021

blk_get_flush_queue is only used in blk-flush.c, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123185312.1432157-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0281ed3c

blk_mq: remove repeated includes · 35c90e6e

Committed by Guo Zhengkui on Nov 23, 2021

Remove a repeated "#include <linux/sched/sysctl.h>".
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20211123063340.25882-1-guozhengkui@vivo.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

35c90e6e

block: move io_context creation into where it's needed · 5a9d041b

Committed by Jens Axboe on Nov 13, 2021

The only user of the io_context for IO is BFQ, yet we put the checking
and logic of it into the normal IO path.

Put the creation into blk_mq_sched_assign_ioc(), and have BFQ use that
helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5a9d041b

block: only allocate poll_stats if there's a user of them · 48b5c1fb

Committed by Jens Axboe on Nov 13, 2021

This is essentially never used, yet it's about 1/3rd of the total
queue size. Allocate it when needed, and don't embed it in the queue.

Kill the queue flag for this while at it, since we can just check the
assigned pointer now.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
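
The pattern described here, allocating rarely used state on first use and letting the pointer double as the "enabled" flag, can be sketched in user space as follows. The struct layout, the bucket count and the function names are hypothetical and do not mirror the kernel's request_queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the bucket count is arbitrary. */
struct poll_stats_model {
	unsigned long long samples[16];
};

struct queue_model {
	struct poll_stats_model *poll_stats; /* NULL until someone needs it */
};

/* No separate flag: a non-NULL pointer means the stats are in use. */
static bool poll_stats_enabled(const struct queue_model *q)
{
	assert(q != NULL);
	return q->poll_stats != NULL;
}

/* Allocate on first use; returns 0 on success, -1 if allocation failed. */
static int enable_poll_stats(struct queue_model *q)
{
	assert(q != NULL);
	if (q->poll_stats != NULL)
		return 0;
	q->poll_stats = calloc(1, sizeof(*q->poll_stats));
	return q->poll_stats != NULL ? 0 : -1;
}

/* Release the stats; the queue is back to its small default footprint. */
static void free_poll_stats(struct queue_model *q)
{
	assert(q != NULL);
	free(q->poll_stats);
	q->poll_stats = NULL;
}
```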

48b5c1fb

blk-ioprio: don't set bio priority if not needed · 25c4b5e0

Committed by Jens Axboe on Nov 13, 2021

We don't need to write to the bio if:

1) No ioprio value has ever been assigned to the blkcg
2) We wouldn't anyway, depending on bio and blkcg IO priority
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
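
The two conditions listed above amount to "skip the store unless it would change something". A minimal sketch, assuming the policy only ever raises the numeric priority value and using a hypothetical POLICY_UNSET marker for "never assigned"; neither assumption is taken from the commit itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POLICY_UNSET (-1) /* hypothetical "no ioprio ever assigned" marker */

/*
 * Apply the cgroup policy priority to a bio priority value.
 * Returns true only if the bio priority was actually written.
 */
static bool maybe_set_prio(uint16_t *bio_prio, int cgroup_policy_prio)
{
	/* Condition 1: nothing was ever assigned to the cgroup. */
	if (cgroup_policy_prio == POLICY_UNSET)
		return false;

	assert(cgroup_policy_prio >= 0 && cgroup_policy_prio <= UINT16_MAX);

	/* Condition 2: the policy would not change the current value. */
	if ((uint16_t)cgroup_policy_prio <= *bio_prio)
		return false;

	*bio_prio = (uint16_t)cgroup_policy_prio;
	return true;
}
```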

25c4b5e0

blk-mq: move more plug handling from blk_mq_submit_bio into blk_add_rq_to_plug · 1e9c2303

Committed by Christoph Hellwig on Nov 23, 2021

Keep all the functionality for adding a request to a plug in a single place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1e9c2303

blk-mq: simplify the plug handling in blk_mq_submit_bio · 0c5bcc92

Committed by Christoph Hellwig on Nov 23, 2021

blk_mq_submit_bio has two different plug cases, one that uses full
plugging and a limited plugging one.

The limited plugging case is only used for a corner case that does
not matter in real life:

 - no ->commit_rqs (so not NVMe)
 - no shared tags (so not SCSI)
 - not rotational (so no old disk or floppy driver)
 - must have multiple queues (so no eMMC)

Remove the limited merging case and all the related junk to simplify
blk_mq_submit_bio and the functions called from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
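
The four conditions listed in this commit message can be written as a single predicate, which makes it easier to see how narrow the removed case was. This is an illustrative model only; the struct and its field names are hypothetical and do not correspond to request_queue fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the queue properties the message refers to. */
struct queue_caps {
	bool has_commit_rqs;       /* driver implements ->commit_rqs */
	bool shared_tags;          /* tags shared across hardware queues */
	bool rotational;           /* rotational media */
	unsigned int nr_hw_queues; /* number of hardware queues */
};

/* True only for the corner case that used the limited plugging path. */
static bool used_limited_plugging(const struct queue_caps *q)
{
	assert(q->nr_hw_queues >= 1);
	return !q->has_commit_rqs && !q->shared_tags && !q->rotational &&
	       q->nr_hw_queues > 1;
}
```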

0c5bcc92

sr: set GENHD_FL_REMOVABLE earlier · a4561f9f

Committed by Christoph Hellwig on Nov 22, 2021

Set up GENHD_FL_REMOVABLE together with the rest of the gendisk fields.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a4561f9f

block: cleanup the GENHD_FL_* definitions · 430cc5d3

Committed by Christoph Hellwig on Nov 22, 2021

Switch to an enum and tidy up the documentation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

430cc5d3

block: don't set GENHD_FL_NO_PART for hidden gendisks · 9f18db57

Committed by Christoph Hellwig on Nov 22, 2021

Hidden gendisks can't be opened using blkdev_get_*, so we can't really
reach any of the partition scanning paths or partitioning ioctls except
for the initial partition scan from add_disk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9f18db57

block: remove GENHD_FL_EXT_DEVT · 1ebe2e5f

Committed by Christoph Hellwig on Nov 22, 2021

All modern drivers can support extra partitions using the extended
dev_t.  In fact, except for the ioctl method, drivers never even see
partitions in normal operation.

So remove GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicitly disallow them using
GENHD_FL_NO_PART.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1ebe2e5f

block: remove GENHD_FL_SUPPRESS_PARTITION_INFO · 3b5149ac

Committed by Christoph Hellwig on Nov 22, 2021

This flag is not set directly anywhere and only inherited from
GENHD_FL_HIDDEN. Just check for GENHD_FL_HIDDEN instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3b5149ac

mmc: don't set GENHD_FL_SUPPRESS_PARTITION_INFO · 79b0f79a

Committed by Christoph Hellwig on Nov 22, 2021

This manually reverts 07b652cdbec3 ("mmc: card: Don't show eMMC RPMB and
BOOT areas in /proc/partitions"). Based on the commit description that
change was purely cosmetic. mmc is the last driver that sets this
flag and thus prevents it from being removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

79b0f79a

null_blk: don't suppress partitioning information · 94b49c3d

Committed by Christoph Hellwig on Nov 22, 2021

This manually reverts commit 27290b469051 ("null_blk: suppress invalid
partition info"). The message in that commit log can't appear as
the flag is never checked during probing, and there is no good reason
to treat null_blk specially in /proc/partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

94b49c3d

block: remove the GENHD_FL_HIDDEN check in blkdev_get_no_open · 14086280

Committed by Christoph Hellwig on Nov 29, 2021

Hidden gendisks never hash the block device inode, so this can't happen.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14086280

block: rename GENHD_FL_NO_PART_SCAN to GENHD_FL_NO_PART · 46e7eac6

Committed by Christoph Hellwig on Nov 22, 2021

GENHD_FL_NO_PART_SCAN controls more than just partition scanning,
so rename it to GENHD_FL_NO_PART.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

46e7eac6

block: merge disk_scan_partitions and blkdev_reread_part · e16e506c

Committed by Christoph Hellwig on Nov 22, 2021

Unify the functionality that implements a partition rescan for a
gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e16e506c

block: remove a dead check in show_partition · e3b3bad3

Committed by Christoph Hellwig on Nov 22, 2021

disk_max_parts never returns 0 given that ->minors for devices not using
the extended dev_t must be non-zero, and disk_max_parts always returns
DISK_MAX_PARTS for the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
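
The argument above can be checked against a small model of the two cases. This is a sketch, not the kernel helper: gendisk_model and disk_max_parts_model are hypothetical names, and the value 256 used for DISK_MAX_PARTS is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define DISK_MAX_PARTS 256 /* assumed value, for illustration only */

/* Hypothetical stand-in for the two gendisk fields that matter here. */
struct gendisk_model {
	bool ext_devt; /* disk uses the extended dev_t */
	int minors;    /* minor numbers reserved by the driver */
};

/* Model of the two cases the commit message distinguishes. */
static int disk_max_parts_model(const struct gendisk_model *disk)
{
	if (disk->ext_devt)
		return DISK_MAX_PARTS;

	/* Without the extended dev_t, ->minors must be non-zero. */
	assert(disk->minors > 0);
	return disk->minors;
}
```

Both branches return a positive value, which is why a caller's check for a zero result is dead code.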

e3b3bad3

block: remove GENHD_FL_CD · 1a827ce1

Committed by Christoph Hellwig on Nov 22, 2021

GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device.
Besides being used internally inside of sunvdc.c and xen-blkfront, it
is used by xen-blkback as a hint to claim a device exported to a
guest is a CD-ROM like device. Just check for disk->cdi instead,
which is the right indicator for "real" CD-ROM or DVD drivers. This
will miss the paravirtualized guest drivers, but those make little
sense to report anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1a827ce1