Commits · 9301fe734384990ef9a2463cb7aeb3b00bf5dad5 · openeuler / Kernel

24 Sep, 2020 2 commits

block: cleanup partition scanning in register_disk · 9301fe73

Committed by Christoph Hellwig on Sep 21, 2020

Use blkdev_get_by_dev instead of open coding it using bdget_disk +
blkdev_get, and split the code to read the partition table into a
separate helper to make it a little more obvious.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9301fe73

block: move the NEED_PART_SCAN flag to struct gendisk · 38430f08

Committed by Christoph Hellwig on Sep 21, 2020

We can only scan for partitions on the whole disk, so move the flag
from struct block_device to struct gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

38430f08

10 Sep, 2020 1 commit

block: add a bdev_check_media_change helper · 95f6f3a4

Committed by Christoph Hellwig on Sep 08, 2020

Like check_disk_changed, except that it does not call ->revalidate_disk
but leaves that to the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

95f6f3a4

02 Sep, 2020 5 commits

block: use revalidate_disk_size in set_capacity_revalidate_and_notify · b8086d3f

Committed by Christoph Hellwig on Sep 01, 2020

Only virtio_blk and xen-blkfront set the revalidate argument to true,
and neither implements the ->revalidate_disk method.  So switch
to the helper that just updates the size instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b8086d3f

block: rename bd_invalidated · f4ad06f2

Committed by Christoph Hellwig on Sep 01, 2020

Replace bd_invalidated with a new BDEV_NEED_PART_SCAN flag in a bd_flags
variable to better describe the condition.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f4ad06f2

block: remove the unused q argument to part_in_flight and part_in_flight_rw · 1f06959b

Committed by Christoph Hellwig on Aug 31, 2020

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1f06959b

block: remove the disk argument to delete_partition · 8328eb28

Committed by Christoph Hellwig on Aug 31, 2020

We can trivially derive the gendisk from the hd_struct.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8328eb28

block: cleanup __alloc_disk_node · f93af2a4

Committed by Christoph Hellwig on Aug 31, 2020

Use early returns and goto-based unwinding to simplify the flow a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f93af2a4
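
As an illustration of the "early returns and goto-based unwinding" pattern named in the f93af2a4 message, here is a minimal user-space sketch. It is not the kernel's __alloc_disk_node; `struct toy_disk`, `toy_alloc_disk` and `toy_free_disk` are hypothetical names invented for this example, and the two allocations merely stand in for resources a real allocator would own.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an object that owns two allocations. */
struct toy_disk {
    int *stats;      /* stand-in for per-disk statistics */
    char *part_tbl;  /* stand-in for a partition table */
    int minors;
};

/* Returns NULL on any failure; the error paths free what was allocated. */
struct toy_disk *toy_alloc_disk(int minors)
{
    struct toy_disk *disk;

    if (minors <= 0)            /* early return: nothing to unwind yet */
        return NULL;

    disk = calloc(1, sizeof(*disk));
    if (!disk)                  /* early return: still nothing to unwind */
        return NULL;

    disk->stats = calloc(4, sizeof(*disk->stats));
    if (!disk->stats)
        goto out_free_disk;

    disk->part_tbl = calloc((size_t)minors, 1);
    if (!disk->part_tbl)
        goto out_free_stats;

    disk->minors = minors;
    return disk;

    /* Unwind in reverse order of acquisition. */
out_free_stats:
    free(disk->stats);
out_free_disk:
    free(disk);
    return NULL;
}

/* Counterpart to toy_alloc_disk(); accepts NULL. */
void toy_free_disk(struct toy_disk *disk)
{
    if (!disk)
        return;
    free(disk->part_tbl);
    free(disk->stats);
    free(disk);
}
```

The labels are ordered so that each failure point jumps to the cleanup for exactly what has been acquired so far, which keeps the success path free of nested conditionals.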

01 Aug, 2020 1 commit

block: genhd: delete duplicated words · 0d20dcc2

Committed by Randy Dunlap on Jul 30, 2020

Drop the repeated word "to" in multiple places.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0d20dcc2

18 Jul, 2020 1 commit

blk-cgroup: show global disk stats in root cgroup io.stat · ef45fe47

Committed by Boris Burkov on Jun 01, 2020

In order to improve consistency and usability in cgroup stat accounting,
we would like to support the root cgroup's io.stat.

Since the root cgroup has processes doing io even if the system has no
explicitly created cgroups, we need to be careful to avoid overhead in
that case.  For that reason, the rstat algorithms don't handle the root
cgroup, so just turning the file on wouldn't give correct statistics.

To get around this, we simulate flushing the iostat struct by filling it
out directly from global disk stats. The result is a root cgroup io.stat
file consistent with both /proc/diskstats and io.stat.

Note that in order to collect the disk stats, we needed to iterate over
devices. To facilitate that, we had to change the linkage of a disk_type
to external so that it can be used from blk-cgroup.c to iterate over
disks.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Boris Burkov <boris@bur.io>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ef45fe47

09 Jul, 2020 1 commit

md: switch to ->check_events for media change notifications · a564e23f

Committed by Christoph Hellwig on Jul 08, 2020

md is the last driver using the legacy media_changed method.  Switch
it over to the (not so) new ->check_events approach, which also removes the
need for the ->revalidate_disk method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: remove unused 'bdops' variable in disk_clear_events()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a564e23f

24 Jun, 2020 3 commits

block: revert back to synchronous request_queue removal · e8c7d14a

Committed by Luis Chamberlain on Jun 19, 2020

Commit dc9edc44 ("block: Fix a blk_exit_rl() regression") merged on
v4.12 moved the work behind blk_release_queue() into a workqueue after a
splat floated around which indicated some work on blk_release_queue()
could sleep in blk_exit_rl(). This splat would be possible when a driver
called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
as its final call) from an atomic context.

blk_put_queue() decrements the refcount for the request_queue kobject, and
upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is
now removed through commit db6d9952 ("block: remove request_list code")
on v5.0, we reserve the right to be able to sleep within
blk_release_queue() context.

The last reference to the request_queue must not be dropped from atomic
context. *When* the last reference to the request_queue reaches 0 varies,
so let's take the opportunity to document when that is expected to
happen and also document the context of the related calls as best as
possible so we can avoid future issues, in the hope that the
synchronous request_queue removal sticks.

We revert back to synchronous request_queue removal because asynchronous
removal creates a regression with expected userspace interaction with
several drivers. An example is when removing the loopback driver, one
uses ioctls from userspace to do so, but upon return and if successful,
one expects the device to be removed. Likewise if one races to add another
device the new one may not be added as it is still being removed. This was
expected behavior before and it now fails as the device is still present
and still busy. Moving to asynchronous request_queue removal could have
broken many scripts which relied on the removal to have been completed if
there was no error. Document this expectation as well so that this
doesn't regress userspace again.

Using asynchronous request_queue removal however has helped us find
other bugs. In the future we can test what could break with this
arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.

While at it, update the docs with the context expectations for the
request_queue / gendisk refcount decrement, and make these
expectations explicit by using might_sleep().

Fixes: dc9edc44 ("block: Fix a blk_exit_rl() regression")
Suggested-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Nicolai Stange <nstange@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: yu kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8c7d14a

block: clarify context for refcount increment helpers · 763b5892

Committed by Luis Chamberlain on Jun 19, 2020

Let us clarify the context under which the helpers to increment the
refcount for the gendisk and request_queue can be called. We
make this explicit in the places where we may sleep with might_sleep().

We don't address the decrement context yet, as that needs some extra
work and fixes, but will be addressed in the next patch.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

763b5892

block: add docs for gendisk / request_queue refcount helpers · b5bd357c

Committed by Luis Chamberlain on Jun 19, 2020

This adds documentation for the gendisk / request_queue refcount
helpers.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b5bd357c

27 May, 2020 2 commits

block: remove rcu_read_lock() from part_stat_lock() · 8ab1d40a

Committed by Konstantin Khlebnikov on May 27, 2020

The RCU lock is required only in disk_map_sector_rcu() to look up the
partition.  After that, the request holds a reference to the related hd_struct.

Replace get_cpu() with preempt_disable(), since the returned CPU index is unused.

[hch: rebased]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8ab1d40a

block: always use a percpu variable for disk stats · 58d4f14f

Committed by Christoph Hellwig on May 27, 2020

percpu variables have a perfectly fine working stub implementation
for UP kernels, so use that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

58d4f14f

19 May, 2020 2 commits

block: merge part_{inc,dec}_in_flight into their only callers · 10ec5e86

Committed by Christoph Hellwig on May 13, 2020

part_inc_in_flight and part_dec_in_flight only have one caller each, and
those callers are purely for bio based drivers.  Merge each function into
the only caller, and remove the superfluous blk-mq checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10ec5e86

block: move the blk-mq calls out of part_in_flight{,_rw} · b2f609e1

Committed by Christoph Hellwig on May 13, 2020

Don't bother to call part_in_flight / part_in_flight_rw on blk-mq
devices, just call the blk-mq versions directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b2f609e1

13 May, 2020 3 commits

block: don't hold part0's refcount in IO path · 27eb3af9

Committed by Ming Lei on May 08, 2020

The gendisk can't go away while there is IO activity, so don't hold
part0's refcount in the IO path.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27eb3af9

block: only define 'nr_sects_seq' in hd_part for 32bit SMP · 07c4e1e8

Committed by Ming Lei on May 08, 2020

The seqcount of 'nr_sects_seq' is only needed in case of 32bit SMP,
so define it just for 32bit SMP.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Yufen Yu <yuyufen@huawei.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

07c4e1e8

block: fix use-after-free on cached last_lookup partition · b7d6c303

Committed by Ming Lei on May 08, 2020

delete_partition() clears the cached last_lookup partition. However, the
.last_lookup cache may be overwritten by one IO path after it is cleared
by delete_partition(). Another IO path may then use the cached partition
that is being deleted after hd_struct_free() is called, triggering a
use-after-free on the cached partition.

Fix the issue with the following approach:

1) always get the partition's refcount via hd_struct_try_get() before
setting .last_lookup

2) move clearing .last_lookup from delete_partition() to hd_struct_free(),
which is the release handler of the partition's percpu-refcount, so that no
IO path can cache a partition that is being deleted via .last_lookup.

This is an alternative to Yufen's patch[1], which adds overhead
to the fast path through an indirect lookup that may introduce one extra
cacheline in the IO path. This patch also relies on percpu-refcount's
protection, and it is easier to understand and verify.

[1] https://lore.kernel.org/linux-block/20200109013551.GB9655@ming.t460p/T/#t
Reported-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7d6c303
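
The two-step fix described in b7d6c303 (take a reference before caching, clear the cache in the release handler) can be sketched in user space roughly as follows. This is a single-threaded illustration only: the kernel uses a percpu-refcount and RCU-protected pointers, whereas this sketch assumes a plain C11 atomic counter and a plain pointer. All `toy_*` names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical partition: ref == 0 means it is being freed. */
struct toy_part {
    atomic_int ref;
};

/* Hypothetical disk holding a one-entry lookup cache. */
struct toy_disk {
    struct toy_part *last_lookup;
};

/* Take a reference only if the partition is still live. */
bool toy_part_try_get(struct toy_part *p)
{
    int old = atomic_load(&p->ref);

    while (old > 0) {
        /* On failure, 'old' is reloaded and the liveness check repeats. */
        if (atomic_compare_exchange_weak(&p->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; the final put acts as the release handler. */
void toy_part_put(struct toy_disk *d, struct toy_part *p)
{
    if (atomic_fetch_sub(&p->ref, 1) == 1) {
        /* Step 2: the cache is cleared here, at release time. */
        if (d->last_lookup == p)
            d->last_lookup = NULL;
    }
}

/*
 * Step 1: cache a partition only after a reference was obtained.
 * Returns the partition with a reference held, or NULL if it is dying.
 */
struct toy_part *toy_lookup(struct toy_disk *d, struct toy_part *found)
{
    if (!toy_part_try_get(found))
        return NULL;
    d->last_lookup = found;
    return found;
}
```

Because the cache is written only while a reference is held and cleared only when the count reaches zero, a partition whose count has already dropped to zero can no longer be re-cached by a late lookup.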

10 May, 2020 1 commit

bdi: remove bdi_register_owner · 3c5d202b

Committed by Christoph Hellwig on May 04, 2020

Split out a new bdi_set_owner helper to set the owner, and move the policy
for creating the bdi name back into genhd.c, where it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3c5d202b

21 Apr, 2020 3 commits

block: fold bdev_unhash_inode into invalidate_partition · 9bc5c397

Committed by Christoph Hellwig on Apr 14, 2020

invalidate_partition and bdev_unhash_inode are always paired, and
invalidate_partition already does an icache lookup for the block device
inode.  Piggyback on that to remove the inode from the hash.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9bc5c397

block: mark invalidate_partition static · 02d33b67

Committed by Christoph Hellwig on Apr 14, 2020

invalidate_partition is only used in genhd.c, so mark it static.  Also
drop the return value given that it is always ignored.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02d33b67

block: pass a hd_struct to delete_partition · cddae808

Committed by Christoph Hellwig on Apr 14, 2020

All callers have the hd_struct at hand, so pass it instead of performing
another lookup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cddae808

27 Mar, 2020 1 commit

block: move the ->devnode callback to struct block_device_operations · 348e114b

Committed by Christoph Hellwig on Mar 27, 2020

There really isn't any good reason to stash a method directly into
struct gendisk.  Move it together with the other block device
operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

348e114b

25 Mar, 2020 7 commits

block: unexport get_gendisk · 1b4d4dbd

Committed by Christoph Hellwig on Mar 25, 2020

get_gendisk is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1b4d4dbd

block: unexport disk_map_sector_rcu · a7818aed

Committed by Christoph Hellwig on Mar 25, 2020

disk_map_sector_rcu is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a7818aed

block: unexport disk_get_part · 572e7fc8

Committed by Christoph Hellwig on Mar 25, 2020

disk_get_part is not used by any modular code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

572e7fc8

block: mark part_in_flight and part_in_flight_rw static · 6005771c

Committed by Christoph Hellwig on Mar 25, 2020

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6005771c

block: mark block_depr static · 31eb6186

Committed by Christoph Hellwig on Mar 25, 2020

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

31eb6186

block/diskstats: replace time_in_queue with sum of request times · 8cd5b8fc

Committed by Konstantin Khlebnikov on Mar 25, 2020

Column "time_in_queue" in diskstats is supposed to show the total waiting time
of all requests, i.e. its value should be equal to the sum of the times from
the other columns. But this is not true, because column "time_in_queue" is
counted separately in jiffies rather than in nanoseconds like the other times.

This patch removes the redundant counter for "time_in_queue" and shows the total
time of read, write, discard and flush requests.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8cd5b8fc

block/diskstats: accumulate all per-cpu counters in one pass · ea18e0f0

Committed by Konstantin Khlebnikov on Mar 25, 2020

Reading /proc/diskstats iterates over all cpus for summing each field.
It's faster to sum all fields in one pass.

Hammering /proc/diskstats with fio shows 2x performance improvement:

fio --name=test --numjobs=$JOBS --filename=/proc/diskstats \
    --size=1k --bs=1k --fallocate=none --create_on_open=1 \
    --time_based=1 --runtime=10 --invalidate=0 --group_report

          JOBS=1      JOBS=10
Before:   7k iops     64k iops
After:    18k iops    120k iops

The code is also more compact this way:

add/remove: 1/0 grow/shrink: 0/2 up/down: 194/-1540 (-1346)
Function                                     old     new   delta
part_stat_read_all                             -     194    +194
diskstats_show                              1344     631    -713
part_stat_show                              1219     392    -827
Total: Before=14966947, After=14965601, chg -0.01%
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ea18e0f0
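
To illustrate the difference described in ea18e0f0 between summing each field with its own walk over the CPUs and summing all fields in one pass, here is a small user-space sketch. The struct layout, the fixed `NR_CPUS` value and every `toy_*` name are assumptions made for the example; the kernel's part_stat_read_all operates on real per-cpu data instead of a plain array.

```c
#include <stddef.h>

#define NR_CPUS 4   /* assumed CPU count for this sketch */

/* Hypothetical per-CPU counters; the real structure has more fields. */
struct toy_stats {
    unsigned long ios;
    unsigned long sectors;
    unsigned long nsecs;
};

/* Field-at-a-time style: one full walk over the CPUs per field read. */
unsigned long toy_sum_ios(const struct toy_stats percpu[NR_CPUS])
{
    unsigned long sum = 0;
    size_t cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        sum += percpu[cpu].ios;
    return sum;
}

/* One-pass style: a single walk over the CPUs accumulates every field. */
void toy_stat_read_all(const struct toy_stats percpu[NR_CPUS],
                       struct toy_stats *sum)
{
    size_t cpu;

    sum->ios = 0;
    sum->sectors = 0;
    sum->nsecs = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++) {
        sum->ios += percpu[cpu].ios;
        sum->sectors += percpu[cpu].sectors;
        sum->nsecs += percpu[cpu].nsecs;
    }
}
```

With the field-at-a-time style, printing N fields costs N walks over the CPUs; the one-pass version touches each CPU's counters once and then reads the totals from the local copy.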

24 Mar, 2020 3 commits

block: move sysfs methods shared by disks and partitions to genhd.c · 3ad5cee5

Committed by Christoph Hellwig on Mar 24, 2020

Move the sysfs _show methods that are used both on the full disk and
partition nodes to genhd.c instead of hiding them in the partitioning
code.  Also move the declaration for these methods to block/blk.h so
that we don't expose them to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3ad5cee5

block: move disk_name and related helpers out of partition-generic.c · 5cbd28e3

Committed by Christoph Hellwig on Mar 24, 2020

These functions aren't really related to partition support, so move them
to a more suitable place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5cbd28e3

block: remove the blk_lookup_devt export · d2332c5c

Committed by Christoph Hellwig on Mar 24, 2020

This function is only used by init/do_mounts.c, which can't be modular.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d2332c5c

19 Mar, 2020 1 commit

block/genhd: Notify udev about capacity change · e598a72f

Committed by Balbir Singh on Mar 13, 2020

Allow block/genhd to notify user space (via udev) about disk size changes
using a new helper set_capacity_revalidate_and_notify(), which is a wrapper
on top of set_capacity(). set_capacity_revalidate_and_notify() will only
notify via udev if the current capacity or the target capacity is not zero
and iff the capacity changes.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Someswarudu Sangaraju <ssomesh@amazon.com>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e598a72f

12 Mar, 2020 1 commit

block: Fix partition support for host aware zoned block devices · b53df2e7

Committed by Shin'ichiro Kawasaki on Feb 21, 2020

Commit b7205307 ("block: allow partitions on host aware zone
devices") introduced the helper function disk_has_partitions() to check
if a given disk has valid partitions. However, since this function result
directly depends on the disk partition table length rather than the
actual existence of valid partitions in the table, it returns true even
after all partitions are removed from the disk. For host aware zoned
block devices, this results in zone management support being kept
disabled even after removing all partitions.

Fix this by changing disk_has_partitions() to walk through the partition
table entries and return true if and only if a valid non-zero size
partition is found.

Fixes: b7205307 ("block: allow partitions on host aware zone devices")
Cc: stable@vger.kernel.org # 5.5
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b53df2e7
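
The behavioral change described in b53df2e7 can be illustrated with a small user-space sketch that contrasts a length-based check with a walk over the table entries. The types and `toy_*` functions are hypothetical, and the convention that slot 0 represents the whole disk (so "length greater than 1" stands in for the old check) is an assumption of this example, not a quotation of the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PARTS 8   /* assumed table capacity for this sketch */

/* Hypothetical partition entry: only the size matters here. */
struct toy_part {
    unsigned long long nr_sects;
};

/* Hypothetical partition table; slot 0 is assumed to be the whole disk. */
struct toy_part_tbl {
    size_t len;
    struct toy_part *part[TOY_MAX_PARTS];
};

/* Length-based check: stays true once the table has grown. */
bool toy_has_partitions_by_len(const struct toy_part_tbl *tbl)
{
    return tbl->len > 1;
}

/* Entry-based check: true only if a non-empty partition exists. */
bool toy_has_partitions(const struct toy_part_tbl *tbl)
{
    size_t i;

    for (i = 1; i < tbl->len && i < TOY_MAX_PARTS; i++) {
        if (tbl->part[i] && tbl->part[i]->nr_sects > 0)
            return true;
    }
    return false;
}
```

After the only partition is removed from a two-slot table, the length-based check still reports partitions while the entry-based walk does not, which is the discrepancy the commit message describes.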

22 Nov, 2019 1 commit

block: add iostat counters for flush requests · b6866318

Committed by Konstantin Khlebnikov on Nov 21, 2019

Requests that trigger flushing of the volatile writeback cache to disk (barriers)
have a significant effect on overall performance.

The block layer has a sophisticated engine for combining several flush requests
into one. But there are no statistics for the actual flushes executed by the disk.
Requests which trigger flushes are usually barriers - zero-size writes.

This patch adds two iostat counters to /sys/class/block/$dev/stat and
/proc/diskstats - the count of completed flush requests and their total time.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b6866318

06 Sep, 2019 1 commit

block: Delay default elevator initialization · 737eb78e

Committed by Damien Le Moal on Sep 05, 2019

When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, which prevents correct selection of the default elevator
most suitable for the device.

This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:
1) elevator_init = false is used for calls to
   blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
   case, a call to elevator_init_mq() is added to __device_add_disk(),
   resulting in the delayed initialization of the queue elevator
   after the device driver finished probing the device information. This
   effectively allows elevator_init_mq() access to more information
   about the device.
2) elevator_init = true preserves the current behavior of initializing
   the elevator directly from blk_mq_init_allocated_queue(). This case
   is used for the special request based DM devices where the device
   gendisk is created before the queue initialization and device
   information (e.g. queue limits) is already known when the queue
   initialization is executed.

Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

737eb78e
