- 29 March 2023, 1 commit
Submitted by Zhong Jinghua
hulk inclusion
category: bugfix
bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162

----------------------------------------

This reverts commit 51e35e67. The mainline kernel now carries a proper fix
for this problem, so drop the local workaround and return to the mainline
solution.

mainline patch: d36a9ea5 ("block: fix use-after-free of q->q_usage_counter")

Fixes: 51e35e67 ("block: fix null-deref in percpu_ref_put")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
- 15 March 2023, 1 commit
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6L586
CVE: NA

--------------------------------

Currently, for bio-based devices, 'ios' and 'sectors' are counted when the
io is started, while 'nsecs' is counted when the io is done. This behaviour
is obviously wrong, but we can't fix the existing APIs because that would
require a new parameter and break kabi. Hence this patch adds new APIs that
make io accounting for bio-based devices precise.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
- 04 January 2023, 1 commit
Submitted by Li Nan
hulk inclusion
category: bugfix
bugzilla: 187921, https://gitee.com/openeuler/kernel/issues/I66VDB
CVE: NA

--------------------------------

Enabling CONFIG_BLK_RQ_ALLOC_TIME would break kabi; use the request wrapper
to fix it.

Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
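For readers unfamiliar with the trick, a minimal sketch of the wrapper
approach follows. The layout and the alloc-time field are illustrative
(only the existence of a wrapper around struct request is implied by the
commit); the real openEuler definition lives in block-layer internal
headers.

  /*
   * Illustrative only: kabi forbids growing struct request, so the block
   * layer over-allocates and keeps new state in a wrapper that embeds the
   * request.  Out-of-tree code must never rely on this layout.
   */
  struct request_wrapper {
          struct request rq;              /* kabi-visible part, must stay first */
  #ifdef CONFIG_BLK_RQ_ALLOC_TIME
          u64 alloc_time_ns;              /* new field lives outside the kabi */
  #endif
  };

  static inline struct request_wrapper *request_to_wrapper(struct request *rq)
  {
          return container_of(rq, struct request_wrapper, rq);
  }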
- 12 December 2022, 1 commit
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65K8D
CVE: NA

--------------------------------

request_wrapper is used to fix kabi breakage for struct request and is only
for internal use. This patch makes sure out-of-tree drivers won't access
the request_wrapper if the request is not managed by the block layer.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 02 November 2022, 5 commits
Submitted by Zhang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5X6RT
CVE: NA

--------------------------------

Since commit 5b18b5a7 ("block: delete part_round_stats and switch to less
precise counting"), the '%util' reported by iostat can exceed reality: the
device may be quite idle while iostat still shows a large '%util'
(e.g. 50%).

It can be reproduced with fio:

  fio --name=1 --direct=1 --bs=4k --rw=read --filename=/dev/sda \
      --thinktime=4ms --runtime=180

Fix this by adding a switch (precise_iostat=1) that controls whether
io_ticks is accounted precisely.

Fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by John Garry
mainline inclusion
from mainline-v5.16-rc1
commit e155b0c2
category: performance
bugzilla: 186917, https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e155b0c238b20f0a866f4334d292656665836c8a

--------------------------------

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support. However a full set of static requests is used per
HW queue, which is quite wasteful, considering that the total number of
requests usable at any given time across all HW queues is limited by the
shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with shared tags
per tagset and request queue, which will hold a set of shared static rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict: c6f9c0e2 ("blk-mq: allow hardware queue to get more tag while
sharing a tag set") is merged, which will cause lots of conflicts for this
patch, and in the meantime the functionality will need to be adapted in
that patch.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by Zhang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5N1S5
CVE: NA

--------------------------------

This reverts commit b97f541e.

The related fields will be modified in later patches which are backported
from mainline.

Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by Zhang Wensheng
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5N162
CVE: NA

--------------------------------

When waiting on q_usage_counter of a request_queue, blk_cleanup_queue() uses
"wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter))"
to wait for q_usage_counter to reach zero. However, if q_usage_counter
reaches zero quickly, percpu_ref_exit() runs and ref->data is freed, and
another process may then hit a null-deref like below:

CPU0                                  CPU1
blk_cleanup_queue
  blk_freeze_queue
    blk_mq_freeze_queue_wait
                                      scsi_end_request
                                        percpu_ref_get
                                        ...
                                        percpu_ref_put
                                          atomic_long_sub_and_test
  percpu_ref_exit
    ref->data -> NULL
                                          ref->data->release(ref) -> null-deref

Fix it by adding a synchronization flag (QUEUE_FLAG_USAGE_COUNT_SYNC): the
flag is set when ref->data->release is called, and the "wait_event" in
blk_mq_freeze_queue_wait must also wait for the flag to become true, which
prevents percpu_ref_exit() from executing ahead of time.

Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
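A minimal sketch of the synchronization idea described above; the flag name
comes from the text, while the function bodies are an approximation rather
than the exact openEuler patch.

  /* release() marks the flag and wakes the waiter ... */
  static void blk_queue_usage_counter_release(struct percpu_ref *ref)
  {
          struct request_queue *q =
                  container_of(ref, struct request_queue, q_usage_counter);

          blk_queue_flag_set(QUEUE_FLAG_USAGE_COUNT_SYNC, q);
          wake_up_all(&q->mq_freeze_wq);
  }

  /*
   * ... and the cleanup path may not return (and later call percpu_ref_exit())
   * until release() has actually finished running.
   */
  static void example_wait_frozen(struct request_queue *q)
  {
          wait_event(q->mq_freeze_wq,
                     percpu_ref_is_zero(&q->q_usage_counter) &&
                     test_bit(QUEUE_FLAG_USAGE_COUNT_SYNC, &q->queue_flags));
  }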
- 13 July 2022, 2 commits
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I57S8D
CVE: NA

--------------------------------

Since there are no reserved fields, declare a wrapper to fix the kabi
breakage.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I57S8D
CVE: NA

--------------------------------

commit 7ec2ec68 ("block: update io_ticks when io hang") fixed that '%util'
would be zero in iostat when io hangs; however, avgqu-sz is still zero even
though it is supposed to reflect the number of hung ios. On the other hand,
for some slow devices, if an io is started before and completed after
diskstats is read, avgqu-sz will be miscalculated.

To fix the problem, update 'nsecs[]' when part_stat_show() or
diskstats_show() is called. In order to do that, add 'stat_time' in struct
hd_struct and 'rq_stat_time' in struct request to record the time, and
during iteration update 'nsecs[]' for each inflight request.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 31 May 2022, 2 commits
Submitted by Yufen Yu
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I597XM
CVE: NA

---------------------------

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by John Garry
mainline inclusion
from mainline-v5.14-rc1
commit d97e594c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I597XM
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d97e594c51660bea510a387731637b894651e4b5

--------------------------------

The tags used for an IO scheduler are currently per hctx. As such, when
q->nr_hw_queues grows, so does the request queue total IO scheduler tag
depth. This may cause problems for SCSI MQ HBAs whose total driver depth
is fixed.

Ming and Yanhui report higher CPU usage and lower throughput in scenarios
where the fixed total driver tag depth is appreciably lower than the total
scheduler tag depth:
https://lore.kernel.org/linux-block/440dfcfc-1a2c-bd98-1161-cec4d78c6dfc@huawei.com/T/#mc0d6d4f95275a2743d1c8c3e4dc9ff6c9aa3a76b

In that scenario, since the scheduler tag is got first, much contention is
introduced since a driver tag may not be available after we have got the
sched tag.

Improve this scenario by introducing request queue-wide tags for when a
tagset-wide sbitmap is used.

The static sched requests are still allocated per hctx, as requests are
initialised per hctx, as in blk_mq_init_request(..., hctx_idx, ...) ->
set->ops->init_request(.., hctx_idx, ...).

For simplicity of resizing the request queue sbitmap when updating the
request queue depth, just init at the max possible size, so we don't need
to deal with possibly swapping out a new sbitmap for the old one if we
need to grow.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/1620907258-30910-3-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflict:
  block/blk-mq-sched.c
  block/blk-mq-sched.h
  block/blk-mq-tag.c

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 21 March 2022, 1 commit
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: 186389, https://gitee.com/openeuler/kernel/issues/I4Y43S
CVE: NA

--------------------------------

blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx' (e.g. when updating
submit_queues through configfs for null_blk), while it might still be used
from another context (e.g. switching the elevator to none):

t1                                    t2
elevator_switch
  blk_mq_unquiesce_queue
    blk_mq_run_hw_queues
      queue_for_each_hw_ctx
        // assembly code for hctx = (q)->queue_hw_ctx[i]
        mov 0x48(%rbp),%rdx  -> read old queue_hw_ctx
                                      __blk_mq_update_nr_hw_queues
                                        blk_mq_realloc_hw_ctxs
                                          hctxs = q->queue_hw_ctx
                                          q->queue_hw_ctx = new_hctxs
                                          kfree(hctxs)
        movslq %ebx,%rax
        mov (%rdx,%rax,8),%rdi  -> uaf

Since the queue is frozen in __blk_mq_update_nr_hw_queues(), fix the
problem by protecting 'queue_hw_ctx' through RCU where it can be accessed
without grabbing 'q_usage_counter'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
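A sketch of the RCU pattern the fix relies on; the helper names below
(example_*) are invented for illustration, and only the hctx array, not the
hctx objects themselves, is protected this way.

  /* writer: publish the new array, then wait for lockless readers */
  static void example_replace_hw_ctxs(struct request_queue *q,
                                      struct blk_mq_hw_ctx **new_hctxs)
  {
          struct blk_mq_hw_ctx **old = q->queue_hw_ctx;

          rcu_assign_pointer(q->queue_hw_ctx, new_hctxs);
          synchronize_rcu();              /* readers still using 'old' finish */
          kfree(old);
  }

  /* reader: look up an hctx without holding q_usage_counter */
  static struct blk_mq_hw_ctx *example_get_hctx(struct request_queue *q, int i)
  {
          struct blk_mq_hw_ctx *hctx;

          rcu_read_lock();
          hctx = rcu_dereference(q->queue_hw_ctx)[i];
          rcu_read_unlock();
          return hctx;                    /* hctx lifetime is handled elsewhere */
  }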
- 08 March 2022, 2 commits
Submitted by Yu Kuai
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------

Once pending_queues has been increased, it is only decreased when nr_active
is zero. That leads to under-utilization of host tags: while pending_queues
is non-zero, the tags available to a queue are capped at
max(host tags / active_queues, 4) rather than the number of tags the queue
actually needs.

Fix it by attaching an expiration time to the increase of pending_queues
and decreasing it when the time expires. This way pending_queues drops back
to zero when there are no tag allocation failures, and the queue can then
use the whole set of host tags.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Submitted by Yu Kuai
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------

When sharing a tag set, most disks may issue only a small amount of IO
while a few issue a large amount. The current approach limits the maximum
number of tags a disk can get to the average share of the total tags, so
the few heavily loaded disks can't get enough tags even though many tags
are still free in the tag set.

Add 'pending_queues' to blk_mq_tag_set to count how many queues cannot get
a driver tag. If this value is zero, there is no need to limit the maximum
number of available tags.

On the other hand, if a queue does not issue IO, 'active_queues' is not
decreased for a period of time (the request timeout), so a lot of tags stay
unavailable because the maximum number of available tags is set to
max(total tags / active_queues, 4). Thus we decrease it when 'nr_active'
is 0.

This functionality is enabled by default; to disable it, add
"blk_mq.unfair_dtag=0" to the boot cmdline.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
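A simplified sketch of the fairness cap being bypassed; this is modelled on
the stock hctx_may_queue() arithmetic, with the pending_queues check added
per the description (the function below is illustrative, not the actual
patch).

  /*
   * With a shared tag set, each active queue is normally limited to
   * max(total_tags / active_queues, 4) driver tags.  The change described
   * above skips that cap while no queue has failed a tag allocation.
   */
  static bool example_may_queue(unsigned int total_tags,
                                unsigned int active_queues,
                                unsigned int pending_queues,
                                unsigned int nr_active)
  {
          unsigned int depth;

          if (!active_queues)
                  return true;
          if (!pending_queues)            /* nobody is starving: no cap */
                  return true;

          depth = max((total_tags + active_queues - 1) / active_queues, 4U);
          return nr_active < depth;
  }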
- 31 December 2021, 1 commit
Submitted by Zhihao Cheng
hulk inclusion
category: feature
bugzilla: 185747 https://gitee.com/openeuler/kernel/issues/I4OUFN
CVE: NA

-------------------------------

Introduce kabi reserved fields for the storage module.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
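What such a reservation typically looks like, assuming openEuler's
KABI_RESERVE helper; the structure below is a made-up example, and the
actual members and counts reserved by this commit are not shown here.

  #include <linux/kabi.h>

  struct example_storage_struct {
          unsigned long   flags;

          /*
           * Padding reserved up front so later bugfixes can add members
           * without changing the structure size or existing offsets.
           */
          KABI_RESERVE(1);
          KABI_RESERVE(2);
          KABI_RESERVE(3);
          KABI_RESERVE(4);
  };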
- 10 December 2021, 1 commit
Submitted by Keith Busch
mainline inclusion
from mainline-v5.14-rc1
commit fb9b16e1
category: bugfix
bugzilla: 185778 https://gitee.com/openeuler/kernel/issues/I4LM14
CVE: NA

-----------------------------------------

The synchronous blk_execute_rq() had not provided a way for its callers to
know if its request was successful or not. Return the blk_status_t result
of the request.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-4-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflict:
1. in blkdev.h and blk-exec, blk_execute_rq return value change;
2. input parameter in blk_execute_rq is not the same as mainline;

Signed-off-by: zhangwensheng <zhangwensheng5@huawei.com>
Reviewed-by: qiulaibin <qiulaibin@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 06 December 2021, 2 commits
Submitted by Xie Yongji
stable inclusion
from stable-5.10.81
commit 79ff56c613c193744d6be77d4c50a7ae22d6dd01
bugzilla: 185832 https://gitee.com/openeuler/kernel/issues/I4L9CF

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=79ff56c613c193744d6be77d4c50a7ae22d6dd01

--------------------------------

commit 570b1cac upstream.

There is some duplicated code to validate the block size in block drivers.
This limitation actually comes from the block layer, so this patch adds a
new block layer helper for that.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
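The helper centralizes the check drivers used to duplicate; its logic is
roughly the following (reconstructed from the description, so treat the
exact signature as approximate).

  /* a block size must be a power of two between 512 and PAGE_SIZE */
  static inline int blk_validate_block_size(unsigned long bsize)
  {
          if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
                  return -EINVAL;

          return 0;
  }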
Submitted by Jens Axboe
stable inclusion
from stable-5.10.80
commit b34ea3c91eacdc50c761506cab35b14f67216f76
bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b34ea3c91eacdc50c761506cab35b14f67216f76

--------------------------------

[ Upstream commit ba0ffdd8 ]

Particularly for NVMe with efficient deferred submission for many
requests, there are nice benefits to be seen by bumping the default max
plug count from 16 to 32. This is especially true for virtualized setups,
where the submit part is more expensive. But can be noticed even on native
hardware.

Reduce the multiple queue factor from 4 to 2, since we're changing the
default size.

While changing it, move the defines into the block layer private header.
These aren't values that anyone outside of the block layer uses, or should
use.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
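Roughly what the resulting private defines look like; the values follow the
description above, but the exact names and their placement in block/blk.h
should be treated as an approximation.

  #define BLK_MAX_REQUEST_COUNT   32              /* was 16 */
  #define BLK_PLUG_FLUSH_SIZE     (128 * 1024)

  static inline unsigned short blk_plug_max_rq_count(struct blk_plug *plug)
  {
          if (plug->multiple_queues)
                  return BLK_MAX_REQUEST_COUNT * 2;       /* factor was 4 */
          return BLK_MAX_REQUEST_COUNT;
  }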
- 15 November 2021, 2 commits
Submitted by Ming Lei
mainline inclusion
from mainline-v5.16
commit e70feb8b
category: bugfix
bugzilla: 182378 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e70feb8b3e6886c525c88943b5f1508d02f5a683

---------------------------

blk_mq_quiesce_queue() has been used fairly widely now, but so far we don't
support concurrent/nested quiesce. The biggest issue is that an unquiesce
can happen unexpectedly when quiesce/unquiesce are run concurrently from
more than one context.

This patch introduces q->mq_quiesce_depth to deal with concurrent quiesce,
and we only unquiesce the queue when it is the last/outer-most one of all
contexts.

Several kernel panic issues have been reported [1][2][3] when running the
stress quiesce test, and this patch has been verified in these reports.

[1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787
[2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722
[3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
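A sketch of the depth counting, modelled closely on the mainline change
(the struct member is called quiesce_depth in the mainline code; details
may differ in this backport).

  void blk_mq_quiesce_queue_nowait(struct request_queue *q)
  {
          unsigned long flags;

          spin_lock_irqsave(&q->queue_lock, flags);
          if (!q->quiesce_depth++)
                  blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
          spin_unlock_irqrestore(&q->queue_lock, flags);
  }

  void blk_mq_unquiesce_queue(struct request_queue *q)
  {
          unsigned long flags;
          bool run_queue = false;

          spin_lock_irqsave(&q->queue_lock, flags);
          if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
                  ;
          } else if (!--q->quiesce_depth) {
                  blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
                  run_queue = true;
          }
          spin_unlock_irqrestore(&q->queue_lock, flags);

          /* dispatch requests which were inserted while the queue was quiesced */
          if (run_queue)
                  blk_mq_run_hw_queues(q, true);
  }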
Submitted by yangerkun
hulk inclusion
category: performance
bugzilla: 174005 https://gitee.com/openeuler/kernel/issues/I4DDEL

---------------------------

This reverts commit b2d85dfb8c1de87700afae78df99715d3a0788a5.

That commit was a local patch that tried to remove the useless rcu gap for
loop setup. Now mainline has a solution too, so revert the local patch.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 19 October 2021, 2 commits
Submitted by Ming Lei
stable inclusion
from stable-5.10.65
commit 87aa69aa10b420823174eedcfd16366ad3d7fe93
bugzilla: 182361 https://gitee.com/openeuler/kernel/issues/I4EH3U

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=87aa69aa10b420823174eedcfd16366ad3d7fe93

--------------------------------

[ Upstream commit 866663b7 ]

When merging one bio into a request, if they are discard IO and the queue
supports multi-range discard, we need to return ELEVATOR_DISCARD_MERGE,
because neither the block core nor the related drivers (nvme, virtio-blk)
handle mixed discard io merge (traditional IO merge together with discard
merge) well.

Fix the issue by returning ELEVATOR_DISCARD_MERGE in this situation, so
both blk-mq and drivers just need to handle multi-range discard.

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Fixes: 2705dfb2 ("block: fix discard request merge")
Link: https://lore.kernel.org/r/20210729034226.1591070-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
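A sketch of the merge decision described above; blk_discard_mergable()
mirrors the upstream helper, while example_try_merge() only illustrates
where the check sits.

  static inline bool blk_discard_mergable(struct request *req)
  {
          if (req_op(req) == REQ_OP_DISCARD &&
              queue_max_discard_segments(req->q) > 1)
                  return true;
          return false;
  }

  /* checked before the usual back/front merge logic */
  static enum elv_merge example_try_merge(struct request *rq, struct bio *bio)
  {
          if (blk_discard_mergable(rq))
                  return ELEVATOR_DISCARD_MERGE;
          /* ... fall back to the regular sector-adjacency checks ... */
          return ELEVATOR_NO_MERGE;
  }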
Submitted by Yu Kuai
hulk inclusion
category: bugfix
bugzilla: 177149 https://gitee.com/openeuler/kernel/issues/I4DDEL

-----------------------------------------------

If blk-throttle is enabled and io is issued before
blk_throtl_register_queue() is done, a divide-by-zero crash will be
triggered in tg_may_dispatch() because 'throtl_slice' is uninitialized.

Thus introduce a new flag QUEUE_FLAG_THROTL_INIT_DONE. It will be set after
blk_throtl_register_queue() is done, and will be checked before applying
any config.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 12 October 2021, 1 commit
Submitted by yangerkun
Offering: HULK
hulk inclusion
category: performance
bugzilla: 174005 https://gitee.com/openeuler/kernel/issues/I4DDEL

---------------------------

Commit 737eb78e ("block: Delay default elevator initialization") delayed
elevator init to fix problems for special devices like SMR. The commit also
added logic to ensure no IO can happen while blk_mq_init_sched() runs.
However, blk_mq_freeze_queue/blk_mq_quiesce_queue add RCU grace periods,
which can introduce noticeable overhead (about 36 loop devices trying to
mount, with each grace period around 20ms).

For loop devices, no io can happen during add_disk, so it is safe to skip
this step. Add the flag QUEUE_FLAG_NO_INIT_IO to identify this case.

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
- 27 January 2021, 2 commits
Submitted by Alan Stern
stable inclusion
from stable-5.10.7
commit d55d15a332ec651ccb49c42a8a10c03447fdf418
bugzilla: 47429

--------------------------------

[ Upstream commit 52abca64 ]

blk_queue_enter() accepts BLK_MQ_REQ_PM requests independent of the runtime
power management state. Now that SCSI domain validation no longer depends
on this behavior, modify the behavior of blk_queue_enter() as follows:

- Do not accept any requests while suspended.
- Only process power management requests while suspending or resuming.

Submitting BLK_MQ_REQ_PM requests to a device that is runtime suspended
causes runtime-suspended devices not to resume as they should. The request
which should cause a runtime resume instead gets issued directly, without
resuming the device first. Of course the device can't handle it properly,
the I/O fails, and the device remains suspended.

The problem is fixed by checking that the queue's runtime-PM status isn't
RPM_SUSPENDED before allowing a request to be issued, and queuing a
runtime-resume request if it is. In particular, the inline
blk_pm_request_resume() routine is renamed blk_pm_resume_queue() and the
code is unified by merging the surrounding checks into the routine. If the
queue isn't set up for runtime PM, or there currently is no restriction on
allowed requests, the request is allowed. Likewise if the BLK_MQ_REQ_PM
flag is set and the status isn't RPM_SUSPENDED. Otherwise a runtime resume
is queued and the request is blocked until conditions are more suitable.

[ bvanassche: modified commit message and removed Cc: stable because
  without the previous patches from this series this patch would break
  parallel SCSI domain validation + introduced queue_rpm_status() ]

Link: https://lore.kernel.org/r/20201209052951.16136-9-bvanassche@acm.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-and-tested-by: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
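A sketch of the gating logic described in the message, close to the
upstream blk_pm_resume_queue() but reconstructed from the description
rather than copied.

  static inline int blk_pm_resume_queue(const bool pm, struct request_queue *q)
  {
          if (!q->dev || !blk_queue_pm_only(q))
                  return 1;       /* runtime PM not in use: nothing to do */

          /* BLK_MQ_REQ_PM requests are allowed unless the queue is suspended */
          if (pm && q->rpm_status != RPM_SUSPENDED)
                  return 1;

          /* otherwise queue a runtime resume and make the caller wait */
          pm_request_resume(q->dev);
          return 0;
  }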
Submitted by Bart Van Assche
stable inclusion
from stable-5.10.7
commit 782c9ef2ac059a25d6afbac344319574414258db
bugzilla: 47429

--------------------------------

[ Upstream commit a4d34da7 ]

Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer
used by any kernel code.

Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org
Cc: Can Guo <cang@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
- 05 December 2020, 2 commits
Submitted by Mike Snitzer
If a non-zero 'chunk_sectors' is passed in to blk_max_size_offset(), that
override will be incorrectly ignored. The old blk_max_size_offset()
branching, prior to commit 3ee16db3, must be used only if the passed
'chunk_sectors' override is zero.

Fixes: 3ee16db3 ("dm: fix IO splitting")
Cc: stable@vger.kernel.org # 5.9
Reported-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
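The corrected branching looks roughly like this: the caller's non-zero
'chunk_sectors' now takes precedence, and the pre-3ee16db3 logic only runs
when it is zero (a sketch based on the description, not a verbatim copy).

  static inline unsigned int blk_max_size_offset(struct request_queue *q,
                                                 sector_t offset,
                                                 unsigned int chunk_sectors)
  {
          if (!chunk_sectors) {
                  if (q->limits.chunk_sectors)
                          chunk_sectors = q->limits.chunk_sectors;
                  else
                          return q->limits.max_sectors;
          }

          if (likely(is_power_of_2(chunk_sectors)))
                  chunk_sectors -= offset & (chunk_sectors - 1);
          else
                  chunk_sectors -= sector_div(offset, chunk_sectors);

          return min(q->limits.max_sectors, chunk_sectors);
  }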
Submitted by Mike Snitzer
Commit 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for
target-specific splitting") caused a couple of regressions:

1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
   chunk_sectors must reflect the most limited of all devices in the IO
   stack.

2) DM targets that set max_io_len but that do _not_ provide an
   .iterate_devices method no longer had their IO split properly.

And commit 5091cdec ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower device
(e.g. one that requires 4K IO splitting).

Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(), add
an optional chunk_sectors override argument to blk_max_size_offset() and
update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.

Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO size
based on provided offset and split boundary.

Fixes: 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
- 17 October 2020, 1 commit
Submitted by Andy Shevchenko
kernel.h has been used as a dump for all kinds of stuff for a long time.
Here is an attempt to start cleaning it up by splitting out the min()/max()
et al. helpers.

At the same time, convert users in the header and lib folders to use the
new header. Though for the time being, include the new header back into
kernel.h to avoid twisted indirect includes for other existing users.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Joe Perches <joe@perches.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200910164152.GA1891694@smile.fi.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 07 October 2020, 2 commits
Submitted by Johannes Thumshirn
Martin rightfully noted that for normal filesystem IO we have soft limits
in place to prevent IOs from getting too big and leading to unpredictable
latencies. For zone append we only have the hardware limit in place.

Cap the max sectors we submit via zone-append to the maximal number of
sectors if the second limit is lower.

Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/linux-btrfs/yq1k0w8g3rw.fsf@ca-mkp.ca.oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
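The capping amounts to reporting the smaller of the two limits; a sketch of
the helper, reconstructed from the description, so treat the details as
approximate.

  static inline unsigned int
  queue_max_zone_append_sectors(const struct request_queue *q)
  {
          const struct queue_limits *l = &q->limits;

          /* honour the soft max_sectors limit as well as the hardware one */
          return min(l->max_zone_append_sectors, l->max_sectors);
  }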
Submitted by Christoph Hellwig
Always return BLK_ZONED_NONE if zoned device support is not enabled. This
allows various compiler optimizations including the dead code elimination
that we so like for avoiding ifdefs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
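The resulting helper is essentially a compile-time switch; a sketch, with
the IS_ENABLED() form being an assumption about how the ifdef is avoided.

  static inline enum blk_zoned_model
  blk_queue_zoned_model(struct request_queue *q)
  {
          if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
                  return q->limits.zoned;

          /* dead-code-eliminated when zoned support is compiled out */
          return BLK_ZONED_NONE;
  }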
- 06 October 2020, 4 commits
Submitted by Christoph Hellwig
Also move the definition from the public blkdev.h to the private
block/blk.h header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Submitted by Christoph Hellwig
Also move the definition from the public blkdev.h to the private
block/blk.h header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Submitted by Ming Lei
The 'q_usage_counter' field is fetched in the fast path of every block
driver, so move it to the front of 'request_queue' so that it lands in the
first cacheline of the 'request_queue' instance.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Veronika Kabatova <vkabatov@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Submitted by Christoph Hellwig
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct. Add a
helper just for that and then mark bdget static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- 25 September 2020, 4 commits
Submitted by Mike Snitzer
Add QUEUE_FLAG_NOWAIT to allow a block device to advertise support for
REQ_NOWAIT. Bio-based devices may set QUEUE_FLAG_NOWAIT where applicable.

Update QUEUE_FLAG_MQ_DEFAULT to include QUEUE_FLAG_NOWAIT. Also update
submit_bio_checks() to verify it is set for REQ_NOWAIT bios.

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
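A sketch of the submit_bio_checks() verification mentioned above, wrapped
into a standalone helper here purely for illustration.

  /* fail fast if the bio asked for REQ_NOWAIT but the queue never set
   * QUEUE_FLAG_NOWAIT */
  static bool example_nowait_supported(struct request_queue *q, struct bio *bio)
  {
          if (!(bio->bi_opf & REQ_NOWAIT))
                  return true;
          return blk_queue_nowait(q);
  }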
Submitted by Christoph Hellwig
Add a little helper to make the somewhat arcane bd_contains checks a little
more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Submitted by Christoph Hellwig
The BDI_CAP_STABLE_WRITES flag is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangle the dependency, replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g. a
file system requiring stable writes due to its own checksumming, but not
forcing it on other users of the block device like the swap code.

One downside is that we can't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute which
also is writable for easier testing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Submitted by Christoph Hellwig
Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk. Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors. To ensure the limits work well for stacking drivers, a new
helper is added to update the readahead limits from the block limits, which
is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
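A sketch of the new helper described above, assuming the md-derived rule of
reading ahead at least twice the optimal I/O size; the function name and
exact expressions are approximate.

  void example_update_readahead(struct request_queue *q)
  {
          /* read ahead at least twice the optimal I/O size */
          q->backing_dev_info->ra_pages =
                  max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
          /* cap a single readahead I/O by max_sectors */
          q->backing_dev_info->io_pages =
                  queue_max_sectors(q) >> (PAGE_SHIFT - 9);
  }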