- 29 Nov 2021, 40 commits
-
Submitted by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
This reverts commit 4896c4e64ba5d5d5acdbcf68c5910dd4f6d8fa62. The helper is not needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
No need to create a new I/O context if there is none present yet in ->limit_depth. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Remove the unused bfqd argument, and hardcode ioc to current->io_context. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Move the copying of the I/O context to the block layer, as that is where we can use the proper low-level interfaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Add the proper module prefix to avoid conflicts with a function in the scheduler. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211126115817.2087431-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Ming Lei
bio->bi_opf isn't finalized before checking the bio, so use it only after submit_bio_checks() returns. Fixes: 5b13bc8a ("blk-mq: cleanup request allocation") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jan Kara
Commit 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list") added a condition to bfq_insert_request() which added the waker's requests directly to the dispatch list. The rationale was that completing the waker's IO is needed to get more IO for the current queue. Although this rationale is valid, there is a hole in it. The waker does not necessarily serve IO only for the current queue, and maybe its current IO is not needed for the current queue to make progress. Furthermore, injecting IO like this completely bypasses any service accounting within bfq, so we do not properly track how much service the waker's queue is getting, or even whether the waker is doing any IO at all. Depending on the conditions this can result in the waker getting too much or too little service. Consider for example the following job file:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
prioclass=2
prio=7
fsync=200

[fastwriter]
numjobs=1
prioclass=2
prio=0
fsync=200

Despite the processes having very different IO priorities, they get the same amount of service. The reason is that bfq identifies these processes as having a waker-wakee relationship, and once that happens, IO from fastwriter gets injected during slowwriter's time slice. As a result bfq is not aware that fastwriter has any IO to do and constantly schedules only slowwriter's queue. Thus fastwriter is forced to compete with slowwriter's IO all the time instead of getting its share of time based on IO priority. Drop the special injection condition from bfq_insert_request(). As a result, requests will be tracked and queued in the normal way, and on the next dispatch bfq_select_queue() can decide whether the waker's inserted requests should be injected during the current queue's timeslice or not. Fixes: 7cc4ffc5 ("block, bfq: put reqs of waker and woken in dispatch list") Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-8-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jan Kara
Waker-wakee relationships are important in deciding whether one queue can preempt the other. Print information about detected waker-wakee relationships so that scheduling decisions can be better understood from block traces. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-7-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jan Kara
Instead of having a helper for formatting the bfqq pid, provide a helper that generates the full bfqq name as used in the traces. It saves some code duplication and will save more in the coming tracepoints. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-6-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
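A plausible shape for such a name helper (a sketch, not necessarily the exact code added by this patch):

  static inline void bfq_bfqq_name(struct bfq_queue *bfqq, char *str, int len)
  {
          /* Build the name used by the tracepoints, e.g. "bfq1234S". */
          if (bfqq->pid != -1)
                  snprintf(str, len, "bfq%d%c",
                           bfqq->pid, bfq_bfqq_sync(bfqq) ? 'S' : 'A');
          else
                  snprintf(str, len, "bfq%c",
                           bfq_bfqq_sync(bfqq) ? 'S' : 'A');
  }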
-
Submitted by Jan Kara
Currently, when process A starts issuing requests shortly after process B has completed some IO three times in a row, we decide that B is a "waker" of A, meaning that completing B's IO is needed for A to make progress, and we largely stop separating A's and B's IO. This logic is useful to avoid unnecessary idling, and thus throughput loss, for cases where the workload needs to switch e.g. between the process and the journaling thread doing IO. However, the detection heuristic tends to give frequent false positives when A and B are fighting for IO bandwidth and other processes aren't doing much IO, as we are then basically doomed to eventually accumulate three occurrences of a situation where one process starts issuing requests after the other has completed some IO. To reduce these false positives, also cancel waker detection if we didn't accumulate three detected wakeups within a given timeout. The rationale is that if wakeups are really rare, the pointless idling doesn't hurt throughput that much anyway. This significantly reduces false waker detection for workloads like:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
fsync=200

[fastwriter]
numjobs=1
fsync=200

Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-5-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
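A sketch of the timeout-based cancellation (simplified; the field names and the choice of timeout shown here are assumptions, not the verbatim BFQ code):

  /* On request completion: (re)start or advance tentative waker detection. */
  if (!bfqd->last_completed_rq_bfqq ||
      bfqd->last_completed_rq_bfqq != bfqq->tentative_waker_bfqq ||
      now_ns > bfqq->waker_detection_started +
               128 * (u64)bfqd->bfq_slice_idle) {
          /* Too long since the first detection: start over. */
          bfqq->tentative_waker_bfqq = bfqd->last_completed_rq_bfqq;
          bfqq->num_waker_detections = 1;
          bfqq->waker_detection_started = now_ns;
  } else {
          bfqq->num_waker_detections++;
  }

  if (bfqq->num_waker_detections == 3)
          bfqq->waker_bfqq = bfqq->tentative_waker_bfqq;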
-
Submitted by Jan Kara
When cgroup IO scheduling is used with BFQ, it does not really provide service differentiation if the cgroup drives a big IO depth. That happens, for example, with writeback, which asynchronously submits lots of IO, but it can happen with AIO as well. The problem is that if we have two cgroups that submit IO with different weights, the cgroup with the higher weight properly gets more IO time and is able to dispatch more IO. However, this causes the lower-weight cgroup to accumulate more requests inside BFQ, and eventually the lower-weight cgroup consumes most of the IO scheduler tags. At that point the higher-weight cgroup stops getting better service, as it is mostly blocked waiting for a scheduler tag while its queues inside BFQ are empty, and thus the lower-weight cgroup gets served. Check in bfq_limit_depth() how many requests the submitting cgroup has allocated, and if it consumes more requests than would correspond to its weight, limit the available depth to 1 so that the cgroup cannot consume many more requests. With this limitation the higher-weight cgroup gets proper service even with writeback. Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-4-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
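A condensed sketch of the idea (simplified relative to the real bfq_limit_depth(); bfq_bic_lookup() and bfqq_request_over_limit() are assumed helpers here, and the exact proportioning differs in detail):

  static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
  {
          struct bfq_data *bfqd = data->q->elevator->elevator_data;
          struct bfq_io_cq *bic = bfq_bic_lookup(data->q);        /* assumed lookup helper */
          struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(op)) : NULL;
          unsigned int limit = data->q->nr_requests;
          int depth = 0;

          if (!op_is_sync(op) || op_is_write(op)) {
                  /* Async IO and sync writes get a reduced depth, as before. */
                  depth = bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
                  limit = (limit * depth) >> bfqd->full_depth_shift;
          }

          /*
           * If the submitting cgroup (or any ancestor) already holds more
           * requests than its weight-proportional share, cut the depth down
           * to 1 so it cannot hog the scheduler tags.
           */
          if (bfqq && bfqq_request_over_limit(bfqq, limit))
                  depth = 1;

          if (depth)
                  data->shallow_depth = depth;
  }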
-
Submitted by Jan Kara
Store the bitmap depth shift inside bfq_data so that we can use it in bfq_limit_depth() for proportioning when limiting the number of request tags available to a cgroup. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-3-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jan Kara
When we want to limit the number of requests used by each bfqq and also by each cgroup, we need to track the number of requests used by each cgroup as well. So track the number of allocated requests for each bfq_entity. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-2-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
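The bookkeeping can be as simple as walking the entity hierarchy on every request allocation and free (a sketch; the helper names are assumptions):

  static void bfqq_request_allocated(struct bfq_queue *bfqq)
  {
          struct bfq_entity *entity = &bfqq->entity;

          /* Account the request on the bfqq and on every cgroup level above it. */
          for (; entity; entity = entity->parent)
                  entity->allocated++;
  }

  static void bfqq_request_freed(struct bfq_queue *bfqq)
  {
          struct bfq_entity *entity = &bfqq->entity;

          for (; entity; entity = entity->parent)
                  entity->allocated--;
  }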
-
Submitted by Jan Kara
Currently we look up the ICQ only after the request is allocated. However, BFQ will want to decide how many scheduler tags it allows a given bfq queue (effectively a process) to consume based on the cgroup weight. So provide a function blk_mq_sched_get_icq() so that BFQ can look up the ICQ earlier. Acked-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211125133645.27483-1-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
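A sketch of what such a lookup helper looks like (ioc_lookup_icq() and ioc_create_icq() are existing block-layer internals; the locking and GFP details shown here are assumptions):

  struct io_cq *blk_mq_sched_get_icq(struct request_queue *q)
  {
          struct io_context *ioc = current->io_context;
          struct io_cq *icq;

          if (!ioc)
                  return NULL;

          /* Fast path: the ICQ for this queue may already exist. */
          spin_lock_irq(&q->queue_lock);
          icq = ioc_lookup_icq(ioc, q);
          spin_unlock_irq(&q->queue_lock);
          if (icq)
                  return icq;

          return ioc_create_icq(ioc, q, GFP_ATOMIC);
  }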
-
The completion callback for the sdhci-pci device is invoked from a kworker. I couldn't identify in which context mmc_blk_mq_req_done() is invoked, but the remaining callers are invoked from preemptible context. Here it makes sense to complete the request directly instead of scheduling ksoftirqd for its completion. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Link: https://lore.kernel.org/r/20211025070658.1565848-3-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Add blk_mq_complete_request_direct(), which completes the block request directly instead of deferring it to softirq, for single-queue devices. This is useful for devices which complete requests in preemptible context, where raising a softirq means scheduling ksoftirqd. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211025070658.1565848-2-bigeasy@linutronix.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
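The helper itself is tiny; a sketch of the idea (the request-state handling is assumed from the blk-mq state machine):

  static inline void blk_mq_complete_request_direct(struct request *rq,
                  void (*complete)(struct request *rq))
  {
          /*
           * Mark the request complete and run the completion handler right
           * here, in the caller's preemptible context, instead of bouncing
           * the completion through the block softirq (and thus ksoftirqd).
           */
          WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
          complete(rq);
  }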
-
Submitted by Eric Biggers
This function is trivial and is only used in one place. Having this function is misleading because it implies that blk_crypto_register() needs to be paired with blk_crypto_unregister(), which is not the case. Just set disk->queue->crypto_profile to NULL directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211124013733.347612-1-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
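The replacement at the single call site is then just:

  /* was: blk_crypto_unregister(disk->queue); */
  disk->queue->crypto_profile = NULL;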
-
Submitted by Christoph Hellwig
Refactor the request allocation so that blk_mq_get_cached_request tries to find a cached request first, and the entirely separate and now self-contained blk_mq_get_new_requests allocates one or more requests if that is not possible. There is a small change in behavior, as submit_bio_checks is now called twice if a cached request is present but can't be used, but that is a small price to pay for unwinding this code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211124062856.1444266-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
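The resulting control flow in blk_mq_submit_bio() roughly becomes (a condensed sketch; the exact argument lists are assumptions):

  /* Try the plug's cached request first, otherwise allocate fresh ones. */
  rq = blk_mq_get_cached_request(q, plug, &bio, nr_segs);
  if (!rq) {
          if (!bio)               /* the checks consumed or failed the bio */
                  return;
          rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
          if (unlikely(!rq))
                  return;
  }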
-
Submitted by Christoph Hellwig
Not needed; shift it into the source files that need it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Not needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Not needed; shift a blk-stat.h include into the source file that needs it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Not needed; shift it into the source files that need it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
All callers pass q->elevator. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Open code elevator_exit in its only caller, and rename __elevator_exit to elevator_exit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
blk_get_flush_queue is only used in blk-flush.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Guo Zhengkui
Remove a repeated "#include <linux/sched/sysctl.h>". Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211123063340.25882-1-guozhengkui@vivo.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
The only user of the io_context for IO is BFQ, yet we put the checking and logic for it into the normal IO path. Put the creation into blk_mq_sched_assign_ioc(), and have BFQ use that helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
This is essentially never used, yet it's about 1/3rd of the total queue size. Allocate it when needed, and don't embed it in the queue. Kill the queue flag for this while at it, since we can just check the assigned pointer now. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
We don't need to write to the bio if:

1) No ioprio value has ever been assigned to the blkcg
2) We wouldn't change it anyway, depending on the bio and blkcg IO priority

Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
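One way to implement the two early-outs (a sketch; ioprio_blkcg_from_bio() and POLICY_NO_CHANGE are blk-ioprio internals, and the exact comparison shown is an assumption):

  struct ioprio_blkcg *blkcg = ioprio_blkcg_from_bio(bio);
  u16 prio;

  /* 1) no priority policy has ever been assigned to this blkcg */
  if (blkcg->prio_policy == POLICY_NO_CHANGE)
          return;

  /* 2) the policy would not change the bio's effective priority anyway */
  prio = max_t(u16, bio->bi_ioprio,
               IOPRIO_PRIO_VALUE(blkcg->prio_policy, 0));
  if (prio == bio->bi_ioprio)
          return;

  bio->bi_ioprio = prio;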
-
Submitted by Christoph Hellwig
Keep all the functionality for adding a request to a plug in a single place. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123160443.1315598-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
blk_mq_submit_bio has two different plug cases: one that uses full plugging and a limited plugging one. The limited plugging case is only used for a corner case that does not matter in real life:

- no ->commit_rqs (so not NVMe)
- no shared tags (so not SCSI)
- not rotational (so no old disk or floppy driver)
- must have multiple queues (so no eMMC)

Remove the limited merging case and all the related junk to simplify blk_mq_submit_bio and the functions called from it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123160443.1315598-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Set up GENHD_FL_REMOVABLE together with the rest of the gendisk fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Switch to an enum and tidy up the documentation. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Hidden gendisks can't be opened using blkdev_get_*, so we can't really reach any of the partition scanning paths or partitioning ioctls except for the initial partition scan from add_disk. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
All modern drivers can support extra partitions using the extended dev_t. In fact, except for the ioctl method, drivers never even see partitions in normal operation. So remove GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicitly disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
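For a driver whose disk genuinely has no partitions, the opt-out becomes explicit, e.g.:

  /* Illustrative: a driver declaring that its gendisk never has partitions. */
  disk->flags |= GENHD_FL_NO_PART;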
-
Submitted by Christoph Hellwig
This flag is not set directly anywhere and is only inherited from GENHD_FL_HIDDEN. Just check for GENHD_FL_HIDDEN instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
This manually reverts 07b652cdbec3 ("mmc: card: Don't show eMMC RPMB and BOOT areas in /proc/partitions"). Based on the commit description, that change was purely cosmetic. mmc is the last driver that sets this flag and thus prevents it from being removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20211122130625.1136848-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
This manually reverts commit 27290b469051 ("null_blk: suppress invalid partition info"). The message described in that commit log can't actually appear, as the flag is never checked during probing, and there is no good reason to treat null_blk specially in /proc/partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
-