1. 22 Nov, 2019 (1 commit)
    • block: add iostat counters for flush requests · b6866318
      Authored by Konstantin Khlebnikov
      Requests that trigger flushing of the volatile writeback cache to disk
      (barriers) have a significant effect on overall performance.
      
      The block layer has a sophisticated engine for combining several flush
      requests into one, but there are no statistics for the actual flushes
      executed by the disk. The requests that trigger flushes are usually
      barriers: zero-size writes.
      
      This patch adds two iostat counters to /sys/class/block/$dev/stat and
      /proc/diskstats: the count of completed flush requests and their total
      time.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b6866318
  2. 21 Nov, 2019 (1 commit)
  3. 19 Nov, 2019 (1 commit)
  4. 18 Nov, 2019 (1 commit)
  5. 14 Nov, 2019 (1 commit)
  6. 08 Nov, 2019 (8 commits)
  7. 07 Nov, 2019 (5 commits)
  8. 06 Nov, 2019 (1 commit)
  9. 05 Nov, 2019 (1 commit)
  10. 04 Nov, 2019 (4 commits)
  11. 02 Nov, 2019 (1 commit)
    • blk-mq: avoid sysfs buffer overflow with too many CPU cores · 8962842c
      Authored by Ming Lei
      It is reported that a sysfs buffer overflow can be triggered if the
      system has too many CPU cores (more than 841 with a 4K PAGE_SIZE) when
      showing the CPUs of a hctx via /sys/block/$DEV/mq/$N/cpu_list.
      
      Use snprintf to avoid the potential buffer overflow.
      
      This version doesn't change the attribute format, and simply stops
      showing CPU numbers if the buffer is going to overflow.
      
      Cc: stable@vger.kernel.org
      Fixes: 676141e4 ("blk-mq: don't dump CPU -> hw queue map on driver load")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8962842c
  12. 01 Nov, 2019 (2 commits)
  13. 26 Oct, 2019 (4 commits)
    • blk-mq: remove needless goto from blk_mq_get_driver_tag · 1fead718
      Authored by André Almeida
      The only user of the label "done" is the (rq->tag != -1) check at the
      beginning of the function. Rather than jumping to the label, we can
      remove it and execute its code directly at the "if". The code after the
      label returns the result of the expression (rq->tag != -1), but since
      we are already inside the if, we know this is true. Remove the label
      and replace the goto with the value the label would have returned.
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1fead718
    • block: Reduce the amount of memory used for tag sets · f7e76dbc
      Authored by Bart Van Assche
      Instead of allocating an array of size nr_cpu_ids for set->tags, allocate
      an array of size set->nr_hw_queues. This patch improves behavior that was
      introduced by commit 868f2f0b ("blk-mq: dynamic h/w context count").
      
      Reallocating tag sets from inside __blk_mq_update_nr_hw_queues() is safe
      because:
      - All request queues that share the tag sets are frozen before the tag sets
        are reallocated.
      - blk_mq_queue_tag_busy_iter() holds q->q_usage_counter while active and
        hence is serialized against __blk_mq_update_nr_hw_queues().
      
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f7e76dbc
    • block: Reduce the amount of memory required per request queue · ac0d6b92
      Authored by Bart Van Assche
      Instead of always allocating at least nr_cpu_ids hardware queues per request
      queue, reallocate q->queue_hw_ctx if it has to grow. This patch improves
      behavior that was introduced by commit 868f2f0b ("blk-mq: dynamic h/w
      context count").
      
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ac0d6b92
    • block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() · a9a80808
      Authored by Bart Van Assche
      Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues()
      already serialize __blk_mq_update_nr_hw_queues() against
      blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in
      __blk_mq_update_nr_hw_queues() is not necessary. Hence remove it.
      
      Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was
      introduced by commit f5bbbbe4 ("blk-mq: sync the update nr_hw_queues with
      blk_mq_queue_tag_busy_iter"). Commit 530ca2c9 ("blk-mq: Allow blocking
      queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that
      correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues().
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a9a80808
  14. 16 Oct, 2019 (2 commits)
    • blk-rq-qos: fix first node deletion of rq_qos_del() · 307f4065
      Authored by Tejun Heo
      rq_qos_del() incorrectly assigns the node being deleted to the head if
      it was the first on the list in the !prev path.  Fix it by iterating
      with ** instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      307f4065
    • blkcg: Fix multiple bugs in blkcg_activate_policy() · 9d179b86
      Authored by Tejun Heo
      blkcg_activate_policy() has the following bugs.
      
      * cf09a8ee ("blkcg: pass @q and @blkcg into
        blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
        blkcg_activate_policy() ends up using pd's allocated for the root
        blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
        can be passed in pd's which are allocated for the root blkcg.
      
        For blk-iocost, this means that ->pd_init_fn() can write beyond the
        end of the allocated object as it determines the length of the flex
        array at the end based on the blkcg's nesting level.
      
      * Each pd is initialized as it is allocated.  If an allocation fails,
        the policy gets freed with pds already initialized on it.
      
      * After the above partial failure, the partial pds are not freed.
      
      This patch fixes all the above issues by
      
      * Restructuring blkcg_activate_policy() so that alloc and init passes
        are separate.  Init takes place only after all allocs succeeded and
        on failure all allocated pds are freed.
      
      * Unifying and fixing the cleanup of the remaining pd_prealloc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: cf09a8ee ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9d179b86
  15. 15 Oct, 2019 (1 commit)
  16. 11 Oct, 2019 (1 commit)
  17. 08 Oct, 2019 (1 commit)
  18. 07 Oct, 2019 (4 commits)