提交 · 671fae5e51297fc76b3758ca2edd514858734a6a · openeuler / Kernel

24 10月, 2022 4 次提交

blk-wbt: don't enable throttling if default elevator is bfq · 671fae5e

由 Yu Kuai 提交于 10月 19, 2022

Commit b5dc5d4d ("block,bfq: Disable writeback throttling") tries to
disable wbt for bfq, it's done by calling wbt_disable_default() in
bfq_init_queue(). However, wbt is still enabled if default elevator is
bfq:

device_add_disk
 elevator_init_mq
  bfq_init_queue
   wbt_disable_default -> done nothing

 blk_register_queue
  wbt_enable_default -> wbt is enabled

Fix the problem by adding a new flag ELEVATOR_FLAG_DISBALE_WBT, bfq
will set the flag in bfq_init_queue, and following wbt_enable_default()
won't enable wbt while the flag is set.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-7-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

671fae5e

blk-wbt: don't show valid wbt_lat_usec in sysfs while wbt is disabled · 3642ef4d

由 Yu Kuai 提交于 10月 19, 2022

Currently, if wbt is initialized and then disabled by
wbt_disable_default(), sysfs will still show valid wbt_lat_usec, which
will confuse users that wbt is still enabled.

This patch shows wbt_lat_usec as zero if it's disabled.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reported-and-tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-5-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

3642ef4d

blk-wbt: make enable_state more accurate · a9a236d2

由 Yu Kuai 提交于 10月 19, 2022

Currently, if user disable wbt through sysfs, 'enable_state' will be
'WBT_STATE_ON_MANUAL', which will be confusing. Add a new state
'WBT_STATE_OFF_MANUAL' to cover that case.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-4-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a9a236d2

blk-wbt: remove unnecessary check in wbt_enable_default() · b11d31ae

由 Yu Kuai 提交于 10月 19, 2022

If CONFIG_BLK_WBT_MQ is disabled, wbt_init() won't do anything.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-3-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b11d31ae

09 10月, 2022 1 次提交

blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() · 285febab

由 Yu Kuai 提交于 10月 09, 2022

commit 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is
initialized") moves wbt_set_write_cache() before rq_qos_add(), which
is wrong because wbt_rq_qos() is still NULL.

Fix the problem by removing wbt_set_write_cache() and setting 'rwb->wc'
directly. Noted that this patch also remove the redundant setting of
'rab->wc'.

Fixes: 8c5035df ("blk-wbt: call rq_qos_add() after wb_normal is initialized")
Reported-by: Nkernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210081045.77ddf59b-yujie.liu@intel.comSigned-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20221009101038.1692875-1-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

285febab

21 9月, 2022 1 次提交

blk-wbt: call rq_qos_add() after wb_normal is initialized · 8c5035df

由 Yu Kuai 提交于 9月 13, 2022

Our test found a problem that wbt inflight counter is negative, which
will cause io hang(noted that this problem doesn't exist in mainline):

t1: device create	t2: issue io
add_disk
 blk_register_queue
  wbt_enable_default
   wbt_init
    rq_qos_add
    // wb_normal is still 0
			/*
			 * in mainline, disk can't be opened before
			 * bdev_add(), however, in old kernels, disk
			 * can be opened before blk_register_queue().
			 */
			blkdev_issue_flush
                        // disk size is 0, however, it's not checked
                         submit_bio_wait
                          submit_bio
                           blk_mq_submit_bio
                            rq_qos_throttle
                             wbt_wait
			      bio_to_wbt_flags
                               rwb_enabled
			       // wb_normal is 0, inflight is not increased

    wbt_queue_depth_changed(&rwb->rqos);
     wbt_update_limits
     // wb_normal is initialized
                            rq_qos_track
                             wbt_track
                              rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
			      // wb_normal is not 0，wbt_flags will be set
t3: io completion
blk_mq_free_request
 rq_qos_done
  wbt_done
   wbt_is_tracked
   // return true
   __wbt_done
    wbt_rqw_done
     atomic_dec_return(&rqw->inflight);
     // inflight is decreased

commit 8235b5c1 ("block: call bdev_add later in device_add_disk") can
avoid this problem, however it's better to fix this problem in wbt:

1) Lower kernel can't backport this patch due to lots of refactor.
2) Root cause is that wbt call rq_qos_add() before wb_normal is
initialized.

Fixes: e34cbd30 ("blk-wbt: add general throttling mechanism")
Cc: <stable@vger.kernel.org>
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

8c5035df

20 7月, 2022 1 次提交

block: don't allow the same type rq_qos add more than once · 14a6e2eb

由 Jinke Han 提交于 7月 20, 2022

In our test of iocost, we encountered some list add/del corruptions of
inner_walk list in ioc_timer_fn.

The reason can be described as follows:

cpu 0					cpu 1
ioc_qos_write				ioc_qos_write

ioc = q_to_ioc(queue);
if (!ioc) {
        ioc = kzalloc();
					ioc = q_to_ioc(queue);
					if (!ioc) {
						ioc = kzalloc();
						...
						rq_qos_add(q, rqos);
					}
        ...
        rq_qos_add(q, rqos);
        ...
}

When the io.cost.qos file is written by two cpus concurrently, rq_qos may
be added to one disk twice. In that case, there will be two iocs enabled
and running on one disk. They own different iocgs on their active list. In
the ioc_timer_fn function, because of the iocgs from two iocs have the
same root iocg, the root iocg's walk_list may be overwritten by each other
and this leads to list add/del corruptions in building or destroying the
inner_walk list.

And so far, the blk-rq-qos framework works in case that one instance for
one type rq_qos per queue by default. This patch make this explicit and
also fix the crash above.
Signed-off-by: NJinke Han <hanjinke.666@bytedance.com>
Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

14a6e2eb

15 7月, 2022 2 次提交

block: Use the new blk_opf_t type · 16458cf3

由 Bart Van Assche 提交于 7月 14, 2022

Use the new blk_opf_t type for arguments and variables that represent
request flags or a bitwise combination of a request operation and
request flags. Rename the function arguments and also a structure member
that hold a request operation and flags from 'rw' into 'opf'.

This patch does not change any functionality.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

16458cf3

block: Use enum req_op where appropriate · 77e7ffd7

由 Bart Van Assche 提交于 7月 14, 2022

Change the type of the arguments that are used to pass a REQ_OP_* value
from int or unsigned int into enum req_op to improve static type
checking.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-3-bvanassche@acm.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

77e7ffd7

19 10月, 2021 1 次提交

blk-wbt: prevent NULL pointer dereference in wb_timer_fn · 480d42dc

由 Andrea Righi 提交于 10月 19, 2021

The timer callback used to evaluate if the latency is exceeded can be
executed after the corresponding disk has been released, causing the
following NULL pointer dereference:

[ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ 119.987617] #PF: supervisor read access in kernel mode
[ 119.987971] #PF: error_code(0x0000) - not-present page
[ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
[ 119.988697] Oops: 0000 [#1] SMP NOPTI
[ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
[ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
[ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
[ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
[ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
[ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
[ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
[ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
[ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
[ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
[ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
[ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
[ 119.995906] Call Trace:
[ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
[ 119.996505] blk_stat_timer_fn+0x138/0x140
[ 119.996830] call_timer_fn+0x2b/0x100
[ 119.997136] __run_timers.part.0+0x1d1/0x240
[ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
[ 119.997826] ? ktime_get+0x3e/0xa0
[ 119.998110] ? native_apic_msr_write+0x2c/0x30
[ 119.998456] ? lapic_next_event+0x20/0x30
[ 119.998779] ? clockevents_program_event+0x94/0xf0
[ 119.999150] run_timer_softirq+0x2a/0x50
[ 119.999465] __do_softirq+0xcb/0x26f
[ 119.999764] irq_exit_rcu+0x8c/0xb0
[ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
[ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20

In this case simply return from the timer callback (no action
required) to prevent the NULL pointer dereference.

BugLink: https://bugs.launchpad.net/bugs/1947557
Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
Signed-off-by: NAndrea Righi <andrea.righi@canonical.com>
Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktopSigned-off-by: NJens Axboe <axboe@kernel.dk>

480d42dc

24 8月, 2021 1 次提交

block: add an explicit ->disk backpointer to the request_queue · d152c682

由 Christoph Hellwig 提交于 8月 16, 2021

Replace the magic lookup through the kobject tree with an explicit
backpointer, given that the device model links are set up and torn
down at times when I/O is still possible, leading to potential
NULL or invalid pointer dereferences.

Fixes: edb0872f ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Nsyzbot <syzbot+aa0801b6b32dca9dda82@syzkaller.appspotmail.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NSven Schnelle <svens@linux.ibm.com>
Link: https://lore.kernel.org/r/20210816134624.GA24234@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

d152c682

10 8月, 2021 1 次提交

block: move the bdi from the request_queue to the gendisk · edb0872f

由 Christoph Hellwig 提交于 8月 09, 2021

The backing device information only makes sense for file system I/O,
and thus belongs into the gendisk and not the lower level request_queue
structure. Move it there.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

edb0872f

22 6月, 2021 2 次提交

blk-wbt: make sure throttle is enabled properly · 76a80408

由 Zhang Yi 提交于 6月 19, 2021

After commit a7905043 ("blk-rq-qos: refactor out common elements of
blk-wbt"), if throttle was disabled by wbt_disable_default(), we could
not enable again, fix this by set enable_state back to
WBT_STATE_ON_DEFAULT.

Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210619093700.920393-3-yi.zhang@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

76a80408

blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() · 1d0903d6

由 Zhang Yi 提交于 6月 19, 2021

Now that we disable wbt by simply zero out rwb->wb_normal in
wbt_disable_default() when switch elevator to bfq, but it's not safe
because it will become false positive if we change queue depth. If it
become false positive between wbt_wait() and wbt_track() when submit
write request, it will lead to drop rqw->inflight to -1 in wbt_done(),
which will end up trigger IO hung. Fix this issue by introduce a new
state which mean the wbt was disabled.

Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210619093700.920393-2-yi.zhang@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1d0903d6

19 6月, 2021 1 次提交

blk-wbt: remove outdated comment · a79da21b

由 lijiazi 提交于 6月 18, 2021

Now wbt_wait() returns void, so remove now outdated comment.
Signed-off-by: Nlijiazi <lijiazi@xiaomi.com>
Link: https://lore.kernel.org/r/1623986240-13878-1-git-send-email-lijiazi@xiaomi.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a79da21b

27 1月, 2021 1 次提交

blk: wbt: remove unused parameter from wbt_should_throttle · 482e302a

由 Lei Chen 提交于 1月 25, 2021

The first parameter rwb is not used for this function.
So just remove it.
Signed-off-by: NLei Chen <lennychen@tencent.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

482e302a

01 12月, 2020 1 次提交

block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init · 5a20d073

由 Lei Chen 提交于 11月 30, 2020

It's unnecessary to call wbt_update_limits explicitly within wbt_init,
because it will be called in the following function wbt_queue_depth_changed.
Signed-off-by: NLei Chen <lennychen@tencent.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5a20d073

24 8月, 2020 1 次提交

treewide: Use fallthrough pseudo-keyword · df561f66

由 Gustavo A. R. Silva 提交于 8月 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>

df561f66

30 5月, 2020 2 次提交

blk-wbt: rename __wbt_update_limits to wbt_update_limits · 4d89e1d1

由 Guoqing Jiang 提交于 5月 09, 2020

Now let's rename __wbt_update_limits to wbt_update_limits after the
previous one is deleted.
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4d89e1d1

blk-wbt: remove wbt_update_limits · 26e0ca12

由 Guoqing Jiang 提交于 5月 09, 2020

No one call this function after commit 2af2783f ("rq-qos: get rid of
redundant wbt_update_limits()"), so remove it.
Signed-off-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

26e0ca12

17 4月, 2020 1 次提交

blk-wbt: Use tracepoint_string() for wbt_step tracepoint string literals · 3a89c25d

由 Tommi Rantala 提交于 4月 17, 2020

Use tracepoint_string() for string literals that are used in the
wbt_step tracepoint, so that userspace tools can display the string
content.
Signed-off-by: NTommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3a89c25d

06 10月, 2019 1 次提交

blk-wbt: fix performance regression in wbt scale_up/scale_down · b84477d3

由 Harshad Shirwadkar 提交于 10月 05, 2019

scale_up wakes up waiters after scaling up. But after scaling max, it
should not wake up more waiters as waiters will not have anything to
do. This patch fixes this by making scale_up (and also scale_down)
return when threshold is reached.

This bug causes increased fdatasync latency when fdatasync and dd
conv=sync are performed in parallel on 4.19 compared to 4.14. This
bug was introduced during refactoring of blk-wbt code.

Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
Cc: stable@vger.kernel.org
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b84477d3

29 8月, 2019 1 次提交

block/rq_qos: implement rq_qos_ops->queue_depth_changed() · 9677a3e0

由 Tejun Heo 提交于 8月 28, 2019

wbt already gets queue depth changed notification through
wbt_set_queue_depth().  Generalize it into
rq_qos_ops->queue_depth_changed() so that other rq_qos policies can
easily hook into the events too.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9677a3e0

28 8月, 2019 1 次提交

block: add helper for checking if queue is registered · 58c898ba

由 Ming Lei 提交于 8月 27, 2019

There are 4 users which check if queue is registered, so add one helper
to check it.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

58c898ba

01 5月, 2019 1 次提交

block: add SPDX tags to block layer files missing licensing information · 3dcf60bc

由 Christoph Hellwig 提交于 4月 30, 2019

Various block layer files do not have any licensing information at all.
Add SPDX tags for the default kernel GPLv2 license to those.
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3dcf60bc

25 1月, 2019 1 次提交

blk-wbt: Declare local functions static · c83f536a

由 Bart Van Assche 提交于 1月 23, 2019

This patch avoids that sparse reports the following warnings:

  CHECK   block/blk-wbt.c
block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
  CC      block/blk-wbt.o
block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
 void wbt_issue(struct rq_qos *rqos, struct request *rq)
      ^~~~~~~~~
block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
 void wbt_requeue(struct rq_qos *rqos, struct request *rq)
      ^~~~~~~~~~~
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c83f536a

17 12月, 2018 1 次提交

blk-wbt: export internal state via debugfs · d19afebc

由 Ming Lei 提交于 12月 17, 2018

This information is helpful to either investigate issues, or understand
wbt's internal behaviour.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d19afebc

12 12月, 2018 1 次提交

block: deactivate blk_stat timer in wbt_disable_default() · 544fbd16

由 Ming Lei 提交于 12月 12, 2018

rwb_enabled() can't be changed when there is any inflight IO.

wbt_disable_default() may set rwb->wb_normal as zero, however the
blk_stat timer may still be pending, and the timer function will update
wrb->wb_normal again.

This patch introduces blk_stat_deactivate() and applies it in
wbt_disable_default(), then the following IO hang triggered when running
parted & switching io scheduler can be fixed:

[  369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
[  369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
[  369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.940768] parted          D    0  3645   3239 0x00000000
[  369.941500] Call Trace:
[  369.941874]  ? __schedule+0x6d9/0x74c
[  369.942392]  ? wbt_done+0x5e/0x5e
[  369.942864]  ? wbt_cleanup_cb+0x16/0x16
[  369.943404]  ? wbt_done+0x5e/0x5e
[  369.943874]  schedule+0x67/0x78
[  369.944298]  io_schedule+0x12/0x33
[  369.944771]  rq_qos_wait+0xb5/0x119
[  369.945193]  ? karma_partition+0x1c2/0x1c2
[  369.945691]  ? wbt_cleanup_cb+0x16/0x16
[  369.946151]  wbt_wait+0x85/0xb6
[  369.946540]  __rq_qos_throttle+0x23/0x2f
[  369.947014]  blk_mq_make_request+0xe6/0x40a
[  369.947518]  generic_make_request+0x192/0x2fe
[  369.948042]  ? submit_bio+0x103/0x11f
[  369.948486]  ? __radix_tree_lookup+0x35/0xb5
[  369.949011]  submit_bio+0x103/0x11f
[  369.949436]  ? blkg_lookup_slowpath+0x25/0x44
[  369.949962]  submit_bio_wait+0x53/0x7f
[  369.950469]  blkdev_issue_flush+0x8a/0xae
[  369.951032]  blkdev_fsync+0x2f/0x3a
[  369.951502]  do_fsync+0x2e/0x47
[  369.951887]  __x64_sys_fsync+0x10/0x13
[  369.952374]  do_syscall_64+0x89/0x149
[  369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  369.953492] RIP: 0033:0x7f95a1e729d4
[  369.953996] Code: Bad RIP value.
[  369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
[  369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
[  369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
[  369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
[  369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008

Cc: stable@vger.kernel.org
Cc: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

544fbd16

08 12月, 2018 1 次提交

block: convert wbt_wait() to use rq_qos_wait() · b6c7b58f

由 Josef Bacik 提交于 12月 04, 2018

Now that we have rq_qos_wait() in place, convert wbt_wait() over to
using it with it's specific callbacks.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b6c7b58f

16 11月, 2018 4 次提交

block: add queue_is_mq() helper · 344e9ffc

由 Jens Axboe 提交于 11月 15, 2018

Various spots check for q->mq_ops being non-NULL, but provide
a helper to do this instead.

Where the ->mq_ops != NULL check is redundant, remove it.

Since mq == rq-based now that legacy is gone, get rid of the
queue_is_rq_based() and just use queue_is_mq() everywhere.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

344e9ffc

block: add wbt_disable_default export for BFQ · e815f404

由 Jens Axboe 提交于 11月 15, 2018

This isn't unused, if BFQ is modular we get into trouble.

Fixes: b6676f65 ("block: remove a few unused exports")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e815f404

block: remove a few unused exports · b6676f65

由 Christoph Hellwig 提交于 11月 14, 2018

Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b6676f65

block: remove the unused lock argument to rq_qos_throttle · d5337560

由 Christoph Hellwig 提交于 11月 14, 2018

Unused now that the legacy request path is gone.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d5337560

08 11月, 2018 1 次提交

blk-wbt: kill check for legacy queue type · 3c774156

由 Jens Axboe 提交于 10月 12, 2018

Everything is blk-mq at this point, so it doesn't make any sense
to have this option available as it does nothing.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3c774156

12 10月, 2018 1 次提交

blk-wbt: wake up all when we scale up, not down · 5e65a203

由 Josef Bacik 提交于 10月 11, 2018

Tetsuo brought to my attention that I screwed up the scale_up/scale_down
helpers when I factored out the rq-qos code.  We need to wake up all the
waiters when we add slots for requests to make, not when we shrink the
slots.  Otherwise we'll end up things waiting forever.  This was a
mistake and simply puts everything back the way it was.

cc: stable@vger.kernel.org
Fixes: a7905043 ("blk-rq-qos: refactor out common elements of blk-wbt")
eported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5e65a203

28 8月, 2018 3 次提交

blk-wbt: remove dead code · b0a84beb

由 Jens Axboe 提交于 8月 27, 2018

We already note and mark discard and swap IO from bio_to_wbt_flags().
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b0a84beb

blk-wbt: improve waking of tasks · 38cfb5a4

由 Jens Axboe 提交于 8月 26, 2018

We have two potential issues:

1) After commit 2887e41b, we only wake one process at the time when
   we finish an IO. We really want to wake up as many tasks as can
   queue IO. Before this commit, we woke up everyone, which could cause
   a thundering herd issue.

2) A task can potentially consume two wakeups, causing us to (in
   practice) miss a wakeup.

Fix both by providing our own wakeup function, which stops
__wake_up_common() from waking up more tasks if we fail to get a
queueing token. With the strict ordering we have on the wait list, this
wakes the right tasks and the right amount of tasks.

Based on a patch from Jianchao Wang <jianchao.w.wang@oracle.com>.
Tested-by: NAgarwal, Anchal <anchalag@amazon.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

38cfb5a4

blk-wbt: abstract out end IO completion handler · 061a5427

由 Jens Axboe 提交于 8月 26, 2018

Prep patch for calling the handler from a different context,
no functional changes in this patch.
Tested-by: NAgarwal, Anchal <anchalag@amazon.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

061a5427

23 8月, 2018 2 次提交

blk-wbt: don't maintain inflight counts if disabled · c125311d

由 Jens Axboe 提交于 8月 23, 2018

A previous commit removed the ability to have per-rq flags. We used
those flags to maintain inflight counts. Since we don't have those
anymore, we have to always maintain inflight counts, even if wbt is
disabled. This is clearly suboptimal.

Add a queue quiesce around changing the wbt latency settings from sysfs
to work around this. With that, we can reliably put the enabled check in
our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be
consistent for the lifetime of the request.

Fixes: c1c80384 ("block: remove external dependency on wbt_flags")
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c125311d

blk-wbt: fix has-sleeper queueing check · c45e6a03

由 Jens Axboe 提交于 8月 20, 2018

We need to do this inside the loop as well, or we can allow new
IO to supersede previous IO.
Tested-by: NAnchal Agarwal <anchalag@amazon.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c45e6a03

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功