Commits · 69e55430e9eb2f6df76f2709a744b388d69d567b · openeuler / Kernel

09 Mar, 2022 1 commit

block: add a switch for precise iostat accounting · 69e55430

Committed by Zhang Wensheng on Mar 09, 2022

hulk inclusion
category: bugfix
bugzilla: 39265, https://gitee.com/openeuler/kernel/issues/I4WC06
CVE: NA

-----------------------------------------------

When the inflight IOs are slow and no new IOs are issued, we expect
iostat to reveal the IO hang. However, after
commit 9c6dea45 ("block: delete part_round_stats and switch to less
precise counting"), io_tick and time_in_queue are not updated until
the IO completes, so the avgqu-sz and %util columns of iostat stay at zero.

To fix it, we could fall back to the implementation before commit
9c6dea45, but that may cause a performance regression on NVMe devices
or bio-based devices (due to the overhead of the inflight calculation),
so add a switch to control whether or not to use precise iostat
accounting. It can be enabled by adding "precise_iostat=1" to the kernel
boot cmdline. When precise accounting is enabled, io_tick and time_in_queue
are updated when accessing /proc/diskstats and
/sys/block/sdX/sdXN/stat.

Fixes: 9c6dea45 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
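
As an illustration of why stale counters show up as zeros, the sketch below derives %util and avgqu-sz from two samples of a /proc/diskstats line. It assumes the usual field layout (the 13th field is the time spent doing I/O, the 14th the weighted time in queue, both in milliseconds). The sample lines are made up, and `parse_diskstats_line` and `queue_stats` are hypothetical helper names, not part of iostat or the kernel.

```python
def parse_diskstats_line(line):
    """Return (name, io_ticks_ms, time_in_queue_ms) from one diskstats line."""
    fields = line.split()
    # fields[0..2] are major, minor and device name; the counters follow.
    name = fields[2]
    io_ticks_ms = int(fields[12])       # time spent doing I/O
    time_in_queue_ms = int(fields[13])  # weighted time spent doing I/O
    return name, io_ticks_ms, time_in_queue_ms


def queue_stats(before, after, interval_ms):
    """Compute (%util, avgqu-sz) from two samples taken interval_ms apart."""
    _, ticks0, queue0 = parse_diskstats_line(before)
    _, ticks1, queue1 = parse_diskstats_line(after)
    util_percent = 100.0 * (ticks1 - ticks0) / interval_ms
    avg_queue_size = (queue1 - queue0) / interval_ms
    return util_percent, avg_queue_size


# Made-up samples: two IOs stay in flight for one second, none completes.
sample_t0 = "8 0 sda 100 0 800 50 200 0 1600 90 2 1000 3000"
# Without precise accounting the counters do not move until completion.
stale_t1 = "8 0 sda 100 0 800 50 200 0 1600 90 2 1000 3000"
# With precise accounting they advance while the IOs are still in flight.
precise_t1 = "8 0 sda 100 0 800 50 200 0 1600 90 2 2000 5000"

print(queue_stats(sample_t0, stale_t1, 1000))
print(queue_stats(sample_t0, precise_t1, 1000))
```

With the stale sample both deltas are zero, which is the symptom described in this commit; with the precise sample the same interval reports a fully busy device with two requests queued on average.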

17 Jan, 2022 1 commit

block: Add a helper to validate the block size · 996af2e0

Committed by Xie Yongji on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 570b1cac
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

Several block drivers duplicate the code that validates the block
size. This limitation actually comes from the block layer, so this
patch adds a new block layer helper for it.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
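
The commit message does not spell out the checks, so the following is only a sketch of the kind of validation such a helper performs, modeled in Python rather than kernel C. It assumes the block layer limits are: at least 512 bytes, at most one page, and a power of two. `PAGE_SIZE = 4096` and the name `validate_block_size` are assumptions made for this example.

```python
import errno

PAGE_SIZE = 4096  # assumption: 4 KiB pages; the real value is per-architecture


def validate_block_size(bsize):
    """Return 0 if bsize is acceptable, or -EINVAL otherwise."""
    is_power_of_two = bsize > 0 and (bsize & (bsize - 1)) == 0
    if bsize < 512 or bsize > PAGE_SIZE or not is_power_of_two:
        return -errno.EINVAL
    return 0


for candidate in (256, 512, 1000, 4096, 8192):
    print(candidate, validate_block_size(candidate))
```

Keeping the check in one place means a driver only has to call the helper before applying a user-supplied block size, instead of repeating the three conditions.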

09 Dec, 2021 1 commit

blk-mq: add a new queue flag to quiesce/unquiesce queue · d6784b9f

Committed by Yu Kuai on Dec 09, 2021

hulk inclusion
category: bugfix
bugzilla: 173974
CVE: NA
---------------------------

The queue is quiesced if either the old or the new flag is set, and
it is unquiesced only when both flags are cleared.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
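
The rule can be modeled in a few lines. This is a toy sketch, not the kernel implementation; the flag names and the `QueueModel` class are hypothetical, since the commit message does not name the flags.

```python
# Hypothetical flag names; the commit message does not name the flags.
FLAG_QUIESCED = "quiesced"                    # stands for the old flag
FLAG_QUIESCED_INTERNAL = "quiesced_internal"  # stands for the new flag


class QueueModel:
    """Toy queue whose quiesce state is the OR of two independent flags."""

    def __init__(self):
        self.flags = set()

    def is_quiesced(self):
        # Quiesced while at least one of the two flags is still set.
        return bool(self.flags)

    def quiesce(self, flag):
        self.flags.add(flag)

    def unquiesce(self, flag):
        """Clear one flag; return True only if dispatch may resume."""
        self.flags.discard(flag)
        return not self.is_quiesced()


queue = QueueModel()
queue.quiesce(FLAG_QUIESCED)
queue.quiesce(FLAG_QUIESCED_INTERNAL)
print(queue.unquiesce(FLAG_QUIESCED))           # other flag still set
print(queue.unquiesce(FLAG_QUIESCED_INTERNAL))  # both flags cleared
```

Two independent flags let two callers quiesce the same queue without one caller's unquiesce cancelling the other's request.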

30 Aug, 2021 1 commit

blk-mq: fix divide by zero crash in tg_may_dispatch() · ed9f7876

Committed by Yu Kuai on Aug 30, 2021

hulk inclusion
category: bugfix
bugzilla: 177149, https://gitee.com/openeuler/kernel/issues/I47R8R
CVE: NA

-----------------------------------------------

If blk-throttle is enabled and IO is issued before
blk_throtl_register_queue() is done, a divide-by-zero crash is
triggered in tg_may_dispatch() because 'throtl_slice' is uninitialized.

Thus introduce a new flag, QUEUE_FLAG_THROTL_INIT_DONE. It is set
after blk_throtl_register_queue() is done, and is checked before
applying any config.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

16 Aug, 2021 2 commits

blk-mq: fix kabi broken by "blk-mq: fix hang caused by freeze/unfreeze sequence" · 2fd10d61

Committed by Yu Kuai on Aug 16, 2021

hulk inclusion
category: bugfix
bugzilla: 173119
CVE: NA

-----------------------------------------------

Add struct request_queue_wrapper to avoid breaking the kabi.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blk-mq: fix hang caused by freeze/unfreeze sequence · cbd9fe6e

Committed by Bob Liu on Aug 16, 2021

mainline inclusion
from mainline-v5.2-rc2
commit 7996a8b5
category: bugfix
bugzilla: 173119
CVE: NA

-----------------------------------------------

The following is a description of a hang in blk_mq_freeze_queue_wait().
The hang happens on attempt to freeze a queue while another task does
queue unfreeze.

The root cause is an incorrect sequence of percpu_ref_resurrect() and
percpu_ref_kill() and as a result those two can be swapped:

 CPU#0                         CPU#1
 ----------------              -----------------
 q1 = blk_mq_init_queue(shared_tags)

                                q2 = blk_mq_init_queue(shared_tags):
                                  blk_mq_add_queue_tag_set(shared_tags):
                                    blk_mq_update_tag_set_depth(shared_tags):
                                     list_for_each_entry()
                                      blk_mq_freeze_queue(q1)
                                       > percpu_ref_kill()
                                       > blk_mq_freeze_queue_wait()

 blk_cleanup_queue(q1)
  blk_mq_freeze_queue(q1)
   > percpu_ref_kill()
                 ^^^^^^ freeze_depth can't guarantee the order

                                      blk_mq_unfreeze_queue()
                                        > percpu_ref_resurrect()

   > blk_mq_freeze_queue_wait()
                 ^^^^^^ Hang here!!!!

This wrong sequence raises kernel warning:
percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0

But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
which waits for q_usage_counter to drop to zero. That never happens
because the percpu-ref was reinitialized (instead of being killed) and
stays in PERCPU state forever.

How to reproduce:
 - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
 - cpu0: python Script.py 0; use taskset to pin the process to cpu0
 - cpu1: python Script.py 1; use taskset to pin the process to cpu1

 Script.py:
------
#!/usr/bin/python3

import os
import sys

# Repeatedly power the null_blk device named by argv[1] on and off.
while True:
    on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
    off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
    os.system(on)
    os.system(off)
------

This bug was first reported and fixed by Roman. Previous discussion:
[1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
[3] https://patchwork.kernel.org/patch/9268199/

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

22 Feb, 2021 1 commit

scsi: do quiesce for enclosure driver · c3adeb60

Committed by Yufen Yu on Feb 22, 2021

hulk inclusion
category: bugfix
bugzilla: 46860

--------------------------------

Some drivers (such as the scsi enclosure driver) do not call
blk_register_queue() to initialize the request_queue, and we rely on
the driver itself to deal with the race when cleaning up the queue.

However, some self-developed drivers cannot deal with the race. To avoid
the following null pointer dereference, we quiesce the queue in the kernel.

[67760.308034] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[67760.308069] pc : blk_mq_do_dispatch_sched+0x94/0x130
[67760.308072] lr : blk_mq_sched_dispatch_requests+0x128/0x1f0
[67760.308072] sp : ffff0000b2bb3ca0
[67760.308073] x29: ffff0000b2bb3ca0 x28: 0000000000000000
[67760.308075] x27: 0000000000000000 x26: ffff00008128a000
[67760.308076] x25: ffff8042b13e9700 x24: 0000000000000000
[67760.308077] x23: ffff0000b2bb3cf8 x22: ffff8042b2976808
[67760.308078] x21: ffff0000b2bb3d58 x20: ffff00008128a000
[67760.308079] x19: ffff8042b2976800 x18: 0000000000000000
[67760.308080] x17: 0000000000000000 x16: 0000000000000000
[67760.308081] x15: 0000000000000000 x14: 0000000000000000
[67760.308082] x13: 0000000000000000 x12: 0000000000000000
[67760.308084] x11: ffff800040801004 x10: ffff80004080100c
[67760.308085] x9 : 0000000000000060 x8 : ffff809e58f7e500
[67760.308087] x7 : 0000000000000000 x6 : 00000000ffffffff
[67760.308088] x5 : ffff000080ab7550 x4 : ffff80418e84ec80
[67760.308089] x3 : ffff0000815d7f10 x2 : 94e133dc71839c00
[67760.308090] x1 : 0000000000000000 x0 : ffff00008128a748
[67760.308091] Call trace:
[67760.308093]  blk_mq_do_dispatch_sched+0x94/0x130
[67760.308095]  blk_mq_sched_dispatch_requests+0x128/0x1f0
[67760.308096]  __blk_mq_run_hw_queue+0x98/0x138
[67760.308097]  blk_mq_run_work_fn+0x28/0x38
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>

12 Mar, 2020 1 commit

blktrace: Protect q->blk_trace with RCU · 40b51620

Committed by Jan Kara on Mar 12, 2020

mainline inclusion
from mainline-v5.6-rc4
commit c780e86d
category: bugfix
bugzilla: 13690
CVE: CVE-2019-19768

-------------------------------------------------

KASAN is reporting that __blk_add_trace() has a use-after-free issue
when accessing q->blk_trace. Indeed the switching of block tracing (and
thus eventual freeing of q->blk_trace) is completely unsynchronized with
the currently running tracing and thus it can happen that the blk_trace
structure is being freed just while __blk_add_trace() works on it.
Protect accesses to q->blk_trace by RCU during tracing and make sure we
wait for the end of the RCU grace period when shutting down tracing. Luckily
that is a rare enough event that we can afford it. Note that postponing
the freeing of blk_trace to an RCU callback is better avoided, as
it could have unexpected user-visible side effects: debugfs files
would still exist for a short while after block tracing has been shut
down.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
  kernel/trace/blktrace.c
[yyl: adjust context]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

27 Dec, 2019 4 commits

block, scsi: Change the preempt-only flag into a counter · 99a9d5a0

Committed by Bart Van Assche on Aug 06, 2019

commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream.

The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.

Rename "preempt-only" to "pm-only" because the primary purpose of
this mode is power management. Since the power management core may
but does not have to resume a runtime suspended device before
performing system-wide suspend and since a later patch will set
"pm-only" mode as long as a block device is runtime suspended, make
it possible to set "pm-only" mode from more than one context. Since
with this change scsi_device_quiesce() is no longer idempotent, make
that function return early if it is called for a quiesced queue.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
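
A small model may help show why a counter is needed once more than one context can request "pm-only" mode, and why the quiesce function then has to return early. This is a sketch in Python, not the kernel code; `QueueModel` and `DeviceModel` and their methods are hypothetical names chosen for the example.

```python
class QueueModel:
    """Toy request queue with a pm-only counter instead of a boolean flag."""

    def __init__(self):
        self.pm_only = 0

    def set_pm_only(self):
        self.pm_only += 1

    def clear_pm_only(self):
        """Drop one reference; return True when normal requests may resume."""
        if self.pm_only == 0:
            raise RuntimeError("unbalanced clear_pm_only()")
        self.pm_only -= 1
        return self.pm_only == 0

    def is_pm_only(self):
        return self.pm_only > 0


class DeviceModel:
    """Toy device whose quiesce() is safe to call twice."""

    def __init__(self, queue):
        self.queue = queue
        self.quiesced = False

    def quiesce(self):
        if self.quiesced:
            return  # early return keeps the counter balanced
        self.queue.set_pm_only()
        self.quiesced = True

    def resume(self):
        if not self.quiesced:
            return
        self.quiesced = False
        self.queue.clear_pm_only()


queue = QueueModel()
device = DeviceModel(queue)
device.quiesce()
device.quiesce()         # second call does not bump the counter
queue.set_pm_only()      # a second context, e.g. runtime suspend
device.resume()
print(queue.is_pm_only())     # still held by the second context
print(queue.clear_pm_only())  # last reference dropped
```

With a plain flag, the first `resume()` would have cleared pm-only mode while the second context still relied on it.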

blk-mq: always free hctx after request queue is freed · d1642ed3

Committed by Ming Lei on Jun 28, 2019

mainline inclusion
from mainline-5.2-rc1
commit 2f8f1336
category: bugfix
bugzilla: 14836
CVE: NA
---------------------------

In the normal queue cleanup path, hctx is released after the request
queue is freed, see blk_mq_release().

However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queues shrinking. This can easily cause a use-after-free,
because: one implicit rule is that it is safe to call almost all block
layer APIs if the request queue is alive; and one hctx may be retrieved
by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues();
finally use-after-free is triggered.

Fix this issue by always freeing hctx after releasing the request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to reuse these hctxs if the NUMA
node matches.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K . Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
	block/blk-mq.c
	include/linux/blk-mq.h
	include/linux/blkdev.h
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

kabi: reserve space for blk related structure · b8bcbd99

Committed by Tan Xiaojun on Apr 10, 2019

hulk inclusion
category: feature
bugzilla: 13276
CVE: NA

-------------------------------

Reserve space in the blk related structures.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

blk-mq: not embed .mq_kobj and ctx->kobj into queue instance · d4118616

Committed by Ming Lei on Feb 13, 2019

mainline inclusion
from mainline-5.0-rc1
commit 1db4909e
category: bugfix
bugzilla: 5901
CVE: NA
---------------------------

Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime
from the block layer's view, they actually don't, because userspace may
grab a kobject at any time via sysfs.

This patch fixes the issue by the following approach:

1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing
all ctxs

2) free all allocated ctxs and the 'blk_mq_ctxs' instance in release
handler of .mq_kobj

3) grab one ref of .mq_kobj before initializing each ctx->kobj, so that
.mq_kobj is always released after all ctxs are freed.

This patch fixes a kernel panic during boot when DEBUG_KOBJECT_RELEASE
is enabled.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

12 Sep, 2018 1 commit

blk-cgroup: increase number of supported policies · 01c5f85a

Committed by Jens Axboe on Sep 11, 2018

After merging the iolatency policy, we potentially now have 4 policies
being registered, but only support 3. This causes one of them to fail
loading. Takashi reports that BFQ no longer works for him, because it
fails to load due to policy registration failure.

Bump to 5 policies, and also add a warning for when we have exceeded
the global amount. If we have to touch this again, we should switch
to a dynamic scheme instead.
Reported-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Aug, 2018 1 commit

block: Remove two superfluous #include directives · b1f4267c

Committed by Bart Van Assche on Aug 09, 2018

Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
did not remove the corresponding #include directives. Since these
include directives are no longer needed, remove them.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Jul, 2018 1 commit

block: move bio_integrity_{intervals,bytes} into blkdev.h · 359f6427

Committed by Greg Edwards on Jul 25, 2018

This allows bio_integrity_bytes() to be called from drivers instead of
open coding it.
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
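
For readers unfamiliar with the computation that drivers would otherwise open code, here is a sketch of it in Python. It assumes the integrity profile describes a protection interval of 2^interval_exp bytes and a fixed tuple size per interval, and that sector counts are in 512-byte units. The function and parameter names are chosen for the example and only mirror the kernel helpers loosely.

```python
SECTOR_SHIFT = 9  # sectors are counted in 512-byte units


def integrity_intervals(interval_exp, sectors):
    """Number of protection intervals covered by `sectors` 512-byte sectors."""
    return sectors >> (interval_exp - SECTOR_SHIFT)


def integrity_bytes(interval_exp, tuple_size, sectors):
    """Bytes of integrity metadata needed for `sectors` 512-byte sectors."""
    return integrity_intervals(interval_exp, sectors) * tuple_size


# 4 KiB of data with 8-byte tuples:
print(integrity_bytes(9, 8, 8))   # one tuple per 512-byte interval
print(integrity_bytes(12, 8, 8))  # one tuple per 4096-byte interval
```

The first call covers eight intervals and the second a single one, so the metadata size depends on the interval size of the profile, not only on the amount of data.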

18 Jul, 2018 1 commit

block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb

Committed by Tejun Heo on Jul 18, 2018

c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
read/write") replaced @OP with boolean @is_write, which limited the
amount of information going into ->rw_page() and more importantly
page_endio(), which removed the need to expose block internals to mm.

Unfortunately, we want to track discards separately and @is_write
isn't enough information.  This patch updates bdev_ops->rw_page() to
take REQ_OP instead but leaves page_endio() to take bool @is_write.
This allows the block part of operations to have enough information
while not leaking it to mm.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

13 Jul, 2018 1 commit

block: remove blkdev_entry_to_request() macro · 05814a10

Committed by Vladimir Zapolskiy on Jul 13, 2018

Remove the blkdev_entry_to_request() macro, which has remained unused
throughout the observable history. Also note that it repeats the
list_entry_rq() macro verbatim.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Jul, 2018 5 commits

blk-rq-qos: refactor out common elements of blk-wbt · a7905043

Committed by Josef Bacik on Jul 03, 2018

blkcg-qos is going to do essentially what wbt does, only on a cgroup
basis.  Break out the common code that will be shared between blkcg-qos
and wbt into blk-rq-qos.* so they can both utilize the same
infrastructure.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() · 97889f9a

Committed by Ming Lei on Jun 25, 2018

We have to remove synchronize_rcu() from blk_queue_cleanup(),
otherwise a long delay can be caused during lun probe. To remove
it, we have to avoid iterating over set->tag_list in the IO path, e.g.,
blk_mq_sched_restart().

This patch reverts 5b79413946d (Revert "blk-mq: don't handle
TAG_SHARED in restart"). We have fixed enough IO hang issues,
and there is no longer any reason to restart all queues sharing one
set of tags, for the following reasons:

1) blk-mq core can deal with shared-tags case well via blk_mq_get_driver_tag(),
which can wake up queues waiting for driver tag.

2) SCSI is a bit special because it may return BLK_STS_RESOURCE if the queue,
target or host is not ready, but the SCSI built-in restart covers all these
cases well; see scsi_end_request(): the queue will be rerun after any request
initiated from this host/target is completed.

In my test on scsi_debug(8 luns), this patch may improve IOPS by 20% ~ 30%
when running I/O on these 8 luns concurrently.

Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: Omar Sandoval <osandov@fb.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: Andrew Jones <drjones@redhat.com>
Tested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n · 6a5ac984

Committed by Bart Van Assche on Jun 15, 2018

Exclude zoned block device members from struct request_queue for
CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building
the code that uses these struct request_queue members if
CONFIG_BLK_DEV_ZONED != n.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Inline blk_queue_nr_zones() · 7c8542b7

Committed by Bart Van Assche on Jun 15, 2018

Since the implementation of blk_queue_nr_zones() is trivial and since
it only has a single caller, inline this function.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Remove bdev_nr_zones() · 6b1d83d2

Committed by Bart Van Assche on Jun 15, 2018

Remove this function since it has no callers. This function was
introduced in commit 6cc77e9c ("block: introduce zoned block
devices zone write locking").
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjorling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Jun, 2018 1 commit

block: Fix transfer when chunk sectors exceeds max · 15bfd21f

Committed by Keith Busch on Jun 26, 2018

A device may have boundary restrictions where the number of sectors
between boundaries exceeds its max transfer size. In this case, we need
to cap the max size to the smaller of the two limits.
Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
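
The capping rule described in this commit can be sketched as follows. This is a Python model under stated assumptions, not the kernel function: `chunk_sectors` is assumed to be a power of two (so the distance to the next boundary can be computed with a mask), a value of 0 is taken to mean "no boundary restriction", and the name `max_size_at_offset` is hypothetical.

```python
def max_size_at_offset(max_sectors, chunk_sectors, offset):
    """Largest transfer (in sectors) allowed for an I/O starting at `offset`."""
    if not chunk_sectors:
        # No boundary restriction: only the max transfer size applies.
        return max_sectors
    # Sectors left before the next chunk boundary (power-of-two chunk size).
    to_boundary = chunk_sectors - (offset & (chunk_sectors - 1))
    # Cap to the smaller of the two limits.
    return min(max_sectors, to_boundary)


# Chunk of 1024 sectors but a max transfer of only 256 sectors:
print(max_size_at_offset(256, 1024, 0))     # limited by the max transfer size
print(max_size_at_offset(256, 1024, 1000))  # limited by the chunk boundary
```

Returning only the distance to the boundary would allow a 1024-sector transfer at offset 0 in this example, which exceeds what the device can take in one request.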

15 Jun, 2018 1 commit

block: remov blk_queue_invalidate_tags · be7f99c5

Committed by Christoph Hellwig on Jun 15, 2018

This function is entirely unused, so remove it and the tag_queue_busy
member of struct request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Jun, 2018 1 commit

blk-mq: don't time out requests again that are in the timeout handler · da661267

Committed by Christoph Hellwig on Jun 14, 2018

We can currently call the timeout handler again on a request that has
already been handed over to the timeout handler.  Prevent that with a new
flag.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

31 May, 2018 1 commit

block: convert bounce, q->bio_split to bioset_init()/mempool_init() · 338aa96d

Committed by Kent Overstreet on May 20, 2018

Convert the core block functionality to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 May, 2018 5 commits

block: move ->timeout request member · 0b7576d8

Committed by Jens Axboe on May 29, 2018

After the recent timeout handling changes, we have two holes in
the struct. Move the timeout near the deadline, killing both,
and moving related members closer together. On my config on
x86-64, this shrinks struct request from 312 to 304 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: document the blk_eh_timer_return values · 88b0cfad

Committed by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove BLK_EH_HANDLED · f6e7d48a

Committed by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE · 6600593c

Committed by Christoph Hellwig on May 29, 2018

BLK_EH_NOT_HANDLED implies that nothing happened, but very often that
is not the case: instead, the driver has already completed the
command.  Fix the symbolic name to reflect that a little better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: Remove generation seqeunce · 12f5b931

Committed by Keith Busch on May 29, 2018

This patch simplifies the timeout handling by relying on the request
reference counting to ensure the iterator is operating on an inflight
and truly timed out request. Since the reference counting prevents the
tag from being reallocated, the block layer no longer needs to prevent
drivers from completing their requests while the timeout handler is
operating on it: a driver completing a request is allowed to proceed to
the next state without additional synchronization with the block layer.

This also removes any need for generation sequence numbers since the
request lifetime is prevented from being reallocated as a new sequence
while timeout handling is operating on it.

To enable this, a refcount is added to struct request so that request
users can be sure they're operating on the same request without it
changing while they're processing it.  The request's tag won't be
released for reuse until both the timeout handler and the completion
are done with it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg
 for completions]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 May, 2018 1 commit

block: sanitize blk_get_request calling conventions · ff005a06

Committed by Christoph Hellwig on May 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 May, 2018 3 commits

block: consolidate struct request timestamp fields · 522a7775

Committed by Omar Sandoval on May 09, 2018

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
  used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: use ktime_get_ns() instead of sched_clock() for cfq and bfq · 84c7afce

Committed by Omar Sandoval on May 09, 2018

cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: get rid of struct blk_issue_stat · 544ccc8d

Committed by Omar Sandoval on May 09, 2018

struct blk_issue_stat squashes three things into one u64:

- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling

It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Apr, 2018 1 commit

scsi: sd_zbc: Avoid that resetting a zone fails sporadically · ccce20fc

Committed by Bart Van Assche on Apr 16, 2018

Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
concurrently with blkdev_report_zones() and/or blkdev_reset_zones().  That can
cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
q->nr_zones to zero before restoring it to the actual value, even if no drive
characteristics have changed.  Avoid that this can happen by making the
following changes:

- Protect the code that updates zone information with blk_queue_enter()
  and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
  these functions do not modify struct scsi_disk before all zone
  information has been obtained.

Note: since commit 055f6e18 ("block: Make q_usage_counter also track
legacy requests"; kernel v4.15) the request queue freezing mechanism also
affects legacy request queues.

Fixes: 89d94756 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # v4.16
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

18 Apr, 2018 1 commit

block: add blk_queue_fua() helper function · 0ce91444

Committed by Dave Chinner on Apr 18, 2018

So we can check FUA support status from the iomap direct IO code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Mar, 2018 1 commit

block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21

Committed by Bart Van Assche on Mar 14, 2018

It happens often while I'm preparing a patch for a block driver that
I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
available for this driver? Do I have to introduce definitions of these
constants before I can use these constants? To avoid this confusion,
move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
<linux/blkdev.h> header file such that these become available for all
block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
header file conditional to avoid that including that header file after
<linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
redefinition.

Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
not been removed from uapi header files nor from NAND drivers in
which these constants are used for another purpose than converting
block layer offsets and sizes into a number of sectors.

Cc: David S. Miller <davem@davemloft.net>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
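
As a reminder of what the two constants are used for, the sketch below converts between byte offsets and sector counts. It is written in Python for illustration and assumes the block layer convention of 512-byte sectors (a shift of 9); the helper names are hypothetical.

```python
SECTOR_SHIFT = 9
SECTOR_SIZE = 1 << SECTOR_SHIFT  # 512 bytes


def bytes_to_sectors(nbytes):
    """Convert a byte count to whole 512-byte sectors (rounding down)."""
    return nbytes >> SECTOR_SHIFT


def sectors_to_bytes(sectors):
    """Convert a sector count to bytes."""
    return sectors << SECTOR_SHIFT


print(SECTOR_SIZE)
print(bytes_to_sectors(4096))
print(sectors_to_bytes(8))
```

Sharing one definition avoids each driver defining its own copy of the same shift and size.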

09 Mar, 2018 2 commits

block: Move the queue_flag_*() functions from a public into a private header file · 8a0ac14b

Committed by Bart Van Assche on Mar 07, 2018

This patch helps to avoid that new code gets introduced in block drivers
that manipulates queue flags without holding the queue lock when that
lock should be held.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Complain if queue_flag_(set|clear)_unlocked() is abused · 1db2008b

Committed by Bart Van Assche on Mar 07, 2018

Since it is not safe to use queue_flag_(set|clear)_unlocked()
without holding the queue lock after the sysfs entries for a
queue have been created, complain if this happens.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
