Commits · d5f0ae1b3bb719ccea9507ef5e66184b6fa892e8 · openeuler / Kernel

20 Jul, 2023 2 commits

dm: don't lock fs when the map is NULL during suspend or resume · d5f0ae1b

Authored by Li Lingfeng on Jul 07, 2023

mainline inclusion
from mainline-v6.4-rc8
commit 2760904d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7FI78
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc7&id=2760904d895279f87196f0fa9ec570c79fe6a2e4

----------------------------------------

As described in commit 38d11da5 ("dm: don't lock fs when the map is
NULL in process of resume"), a deadlock may be triggered between
do_resume() and do_mount().

This commit preserves the fix from commit 38d11da5 but moves it to
where it also serves to fix a similar deadlock between do_suspend()
and do_mount().  It does so, if the active map is NULL, by clearing
DM_SUSPEND_LOCKFS_FLAG in dm_suspend() which is called by both
do_suspend() and do_resume().
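
A minimal userspace sketch of the flag handling described above. The helper name and the flag values are assumptions made for illustration only; the real definitions and the real check live in the kernel's DM code.

```c
#include <stdbool.h>

/* Illustrative values only; the kernel defines the real flags. */
#define DM_SUSPEND_LOCKFS_FLAG  (1u << 0)
#define DM_SUSPEND_NOFLUSH_FLAG (1u << 1)

/*
 * Hypothetical helper: returns the flags a suspend would actually use.
 * When there is no active map, the lockfs request is dropped, as the
 * commit message describes.
 */
unsigned int effective_suspend_flags(unsigned int flags, bool has_active_map)
{
	if (!has_active_map)
		flags &= ~DM_SUSPEND_LOCKFS_FLAG;
	return flags;
}
```

Because the adjustment sits in the common suspend path, both callers named in the message get the same behaviour without duplicating the check.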

Fixes: 38d11da5 ("dm: don't lock fs when the map is NULL in process of resume")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

Conflicts:
  drivers/md/dm-ioctl.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
(cherry picked from commit 0ba9dcd9)

dm: requeue IO if mapping table not yet available · bad8061a

Authored by Mike Snitzer on Jul 07, 2023

mainline inclusion
from mainline-v5.18-rc1
commit fa247089
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7FI78
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc7&id=fa247089de9936a46e290d4724cb5f0b845600f5

----------------------------------------

Update both bio-based and request-based DM to requeue IO if the
mapping table is not yet available.

The race of IO being submitted before the DM device is ready is
narrow, yet possible during the initial table load because the DM
device's request_queue is created beforehand, so it is best to requeue
IO to handle this unlikely case.
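
The idea can be sketched in userspace as follows. The status codes, the struct and the function name are hypothetical stand-ins, not the kernel's types.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
enum submit_status { SUBMIT_MAPPED, SUBMIT_REQUEUE };

/* Stand-in for the mapping table. */
struct table_model { int num_targets; };

/* If no table is loaded yet, ask the caller to retry instead of failing the IO. */
enum submit_status submit_io(const struct table_model *map)
{
	if (map == NULL)
		return SUBMIT_REQUEUE;
	return SUBMIT_MAPPED;
}
```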
Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
(cherry picked from commit e50475d3)

08 Jun, 2023 1 commit

dm: add disk before alloc dax · f17dc4a3

Authored by Li Lingfeng on Jun 06, 2023

Offering: HULK
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I78SWJ
CVE: NA

-------------------------------

In dm_create(), alloc_dev() may trigger a panic if alloc_dax() fails,
because the error path calls del_gendisk() even though add_disk() has
not been called yet.

Call add_disk() before alloc_dax() to avoid this.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
(cherry picked from commit 6601d443)

30 Nov, 2022 1 commit

dm: Fix UAF in run_timer_softirq() · dbe740d5

Authored by Luo Meng on Nov 30, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WBID
CVE: NA

--------------------------------

When dm_resume() and dm_destroy() run concurrently, a use-after-free
(UAF) can occur.

One such concurrent UAF is shown below:

        use                                  free
do_resume                           |
  __find_device_hash_cell           |
    dm_get                          |
      atomic_inc(&md->holders)      |
                                    | dm_destroy
                                    |   __dm_destroy
                                    |     if (!dm_suspended_md(md))
                                    |     atomic_read(&md->holders)
                                    |     msleep(1)
  dm_resume                         |
    __dm_resume                     |
      dm_table_resume_targets       |
        pool_resume                 |
          do_waker  #add delay work |
                                    |     dm_table_destroy
                                    |       pool_dtr
                                    |         __pool_dec
                                    |           __pool_destroy
                                    |             destroy_workqueue
                                    |             kfree(pool) # free pool
        time out
__do_softirq
  run_timer_softirq # pool has already been freed

This can be easily reproduced using:
  1. create thin-pool
  2. dmsetup suspend pool
  3. dmsetup resume pool
  4. dmsetup remove_all # Concurrent with 3

The root cause is that dm_resume() adds a timer after dm_destroy() has
skipped cancelling the timer because the device was suspended. When
the timer expires, run_timer_softirq() runs, but the pool has already
been freed, so the UAF occurs.

Therefore, move the timer cancellation to after md->holders has
dropped to zero.
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

18 Nov, 2022 1 commit

dm: return early from dm_pr_call() if DM device is suspended · 225613c6

Authored by Mike Snitzer on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit b7e2d64d673abdecae8b8f3f44ef37820e7e8f6c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b7e2d64d673abdecae8b8f3f44ef37820e7e8f6c

--------------------------------

[ Upstream commit e120a5f1 ]

Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.

Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

21 Sep, 2022 1 commit

dm: switch to rq-based after queue is initialized · c26ea58d

Authored by Yu Kuai on Sep 21, 2022

hulk inclusion
category: bugfix
bugzilla: 187345, https://gitee.com/openeuler/kernel/issues/I5L5ZG
CVE: NA

--------------------------------

Otherwise, a NULL pointer dereference can be triggered when
blk_mq_submit_bio() handles a bio while the queue is not yet initialized.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

04 Aug, 2022 1 commit

dm: interlock pending dm_io and dm_wait_for_bios_completion · db8a1f7a

Authored by Mike Snitzer on Aug 04, 2022

stable inclusion
from stable-v5.10.115
commit 7676a5b99f3da170d42d3e457cd7ff8c82001dd1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7676a5b99f3da170d42d3e457cd7ff8c82001dd1

--------------------------------

commit 9f6dc633 upstream.

Commit d208b894 ("dm: fix mempool NULL pointer race when
completing IO") didn't go far enough.

When bio_end_io_acct ends the count of in-flight I/Os may reach zero
and the DM device may be suspended. There is a possibility that the
suspend races with dm_stats_account_io.

Fix this by adding percpu "pending_io" counters to track outstanding
dm_io. Move kicking of suspend queue to dm_io_dec_pending(). Also,
rename md_in_flight_bios() to dm_in_flight_bios() and update it to
iterate all pending_io counters.
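
A rough userspace model of the per-CPU counting scheme, assuming a fixed, hypothetical CPU count and plain arrays in place of real percpu variables. All names are illustrative.

```c
#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

struct md_model {
	long pending_io[NR_CPUS_SKETCH];	/* one counter per CPU */
};

void io_inc_pending(struct md_model *md, int cpu) { md->pending_io[cpu]++; }
void io_dec_pending(struct md_model *md, int cpu) { md->pending_io[cpu]--; }

/*
 * An IO may be counted up on one CPU and counted down on another, so a
 * single counter can be negative; only the sum over all CPUs is meaningful.
 */
long in_flight(const struct md_model *md)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += md->pending_io[cpu];
	return sum;
}
```

In this model, a waiter would treat the device as idle only when in_flight() returns zero.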

Fixes: d208b894 ("dm: fix mempool NULL pointer race when completing IO")
Cc: stable@vger.kernel.org
Co-developed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

27 Apr, 2022 1 commit

dm: fix alloc_dax error handling in alloc_dev · 3b9f937e

Authored by Christoph Hellwig on Apr 27, 2022

stable inclusion
from stable-v5.10.94
commit dfde7afed7116374074e531dfad9919348bef5ac
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dfde7afed7116374074e531dfad9919348bef5ac

--------------------------------

[ Upstream commit d7519392 ]

Make sure ->dax_dev is NULL on error so that the cleanup path doesn't
trip over an ERR_PTR.
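
A userspace sketch of the pattern, with small stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. The struct, the allocator and setup_dax() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
void *err_ptr(long err) { return (void *)(intptr_t)err; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct dev_model { void *dax_dev; };

/* Hypothetical allocator: returns an encoded error when 'fail' is set. */
void *alloc_dax_model(int fail)
{
	static int dummy;

	return fail ? err_ptr(-ENOMEM) : (void *)&dummy;
}

/* Returns 0 on success; on failure leaves dax_dev NULL for the cleanup path. */
int setup_dax(struct dev_model *d, int fail)
{
	d->dax_dev = alloc_dax_model(fail);
	if (is_err(d->dax_dev)) {
		d->dax_dev = NULL;	/* cleanup can now use a plain NULL check */
		return -1;
	}
	return 0;
}
```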
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

08 Mar, 2022 2 commits

dm rq: don't queue request to blk-mq during DM suspend · 3c04af00

Authored by Ming Lei on Mar 08, 2022

mainline inclusion
from mainline-v5.15-rc6
commit b4459b11
category: panic
bugzilla: 185513 https://gitee.com/openeuler/kernel/issues/I4V82O?from=project-issue
CVE: NA

-------------------------------------------------

DM uses blk-mq's quiesce/unquiesce to stop/start the device mapper queue.

But blk-mq's unquiesce may be triggered by outside events, such as an
elevator switch or an update of nr_requests, so a request may arrive
during suspend; in that case, simply ask blk-mq to requeue it.

Fixes a kernel panic seen when running a stress test that updates
nr_requests while dm-mpath is suspended and resumed.

Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

dm: fix mempool NULL pointer race when completing IO · 6bff3499

Authored by Jiazi Li on Mar 08, 2022

mainline inclusion
from mainline-v5.15-rc6
commit d208b894
category: panic
bugzilla: 185514 https://gitee.com/openeuler/kernel/issues/I4V6FT?from=project-issue
CVE: NA

-------------------------------------------------

dm_io_dec_pending() calls end_io_acct() first and then decrements md's
in-flight pending count. But if a task is swapping the DM table at the
same time, this can result in a crash due to mempool->elements being NULL:

task1                             task2
do_resume
 ->do_suspend
  ->dm_wait_for_completion
                                  bio_endio
                                   ->clone_endio
                                    ->dm_io_dec_pending
                                     ->end_io_acct
                                      ->wakeup task1
 ->dm_swap_table
  ->__bind
   ->__bind_mempools
    ->bioset_exit
     ->mempool_exit
                                     ->free_io

[ 67.330330] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
......
[ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 67.330510] pc : mempool_free+0x70/0xa0
[ 67.330515] lr : mempool_free+0x4c/0xa0
[ 67.330520] sp : ffffff8008013b20
[ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
[ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
[ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
[ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
[ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
[ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
[ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
[ 67.330563] x15: 000000000093b41e x14: 0000000000000010
[ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
[ 67.330574] x11: 0000000000000001 x10: 0000000000000001
[ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
[ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
[ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
[ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
[ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
[ 67.330609] Call trace:
[ 67.330616] mempool_free+0x70/0xa0
[ 67.330627] bio_put+0xf8/0x110
[ 67.330638] dec_pending+0x13c/0x230
[ 67.330644] clone_endio+0x90/0x180
[ 67.330649] bio_endio+0x198/0x1b8
[ 67.330655] dec_pending+0x190/0x230
[ 67.330660] clone_endio+0x90/0x180
[ 67.330665] bio_endio+0x198/0x1b8
[ 67.330673] blk_update_request+0x214/0x428
[ 67.330683] scsi_end_request+0x2c/0x300
[ 67.330688] scsi_io_completion+0xa0/0x710
[ 67.330695] scsi_finish_command+0xd8/0x110
[ 67.330700] scsi_softirq_done+0x114/0x148
[ 67.330708] blk_done_softirq+0x74/0xd0
[ 67.330716] __do_softirq+0x18c/0x374
[ 67.330724] irq_exit+0xb4/0xb8
[ 67.330732] __handle_domain_irq+0x84/0xc0
[ 67.330737] gic_handle_irq+0x148/0x1b0
[ 67.330744] el1_irq+0xe8/0x190
[ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
[ 67.330759] cpuidle_enter_state+0x1fc/0x398
[ 67.330764] cpuidle_enter+0x18/0x20
[ 67.330772] do_idle+0x1b4/0x290
[ 67.330778] cpu_startup_entry+0x20/0x28
[ 67.330786] secondary_start_kernel+0x160/0x170

Fix this by:
1) Establishing pointers to 'struct dm_io' members in
dm_io_dec_pending() so that they may be passed into end_io_acct()
_after_ free_io() is called.
2) Moving end_io_acct() after free_io().
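
The two steps above can be sketched in userspace as follows. The struct fields and function names are assumptions chosen for illustration; they are not the members of the real 'struct dm_io'.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-IO object. */
struct io_model {
	unsigned long start_time;
	int stats_aux;
};

/* Records what the accounting step saw, so the sketch can be checked. */
unsigned long last_accounted_start;

/* Accounting takes plain values, so it never touches the freed object. */
void end_io_acct_model(unsigned long start_time, int stats_aux)
{
	(void)stats_aux;
	last_accounted_start = start_time;
}

void io_complete(struct io_model *io)
{
	/* 1) Copy what accounting needs while 'io' is still valid. */
	unsigned long start_time = io->start_time;
	int stats_aux = io->stats_aux;

	/* 2) Release the object first ... */
	free(io);

	/* ... then account, using only the saved copies. */
	end_io_acct_model(start_time, stats_aux);
}
```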

Cc: stable@vger.kernel.org
Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

15 Nov, 2021 1 commit

dm: don't stop request queue after the dm device is suspended · 9d0e8ada

Authored by Ming Lei on Nov 15, 2021

mainline inclusion
from mainline-v5.16
commit a1c2f7e7
category: bugfix
bugzilla: 182378 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1c2f7e7f25c9d35d3bf046f99682c5373b20fa2

---------------------------

To fix the queue quiesce race between the driver and the block layer
(elevator switch, nr_requests update, ...), we need to support concurrent
quiesce and unquiesce, which requires the two calls to be balanced.

__bind() is only called from dm_swap_table(), where the dm device has
already been suspended, so it is not necessary to stop the queue again.
This way, request queue quiesce and unquiesce can be balanced.
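
Why balance matters can be illustrated with a small model. It assumes quiesce state is tracked as a depth counter, which is an assumption of this sketch; the struct and function names are hypothetical.

```c
/* Hypothetical model of a queue whose quiesce state is a nesting depth. */
struct queue_model { int quiesce_depth; };

void quiesce(struct queue_model *q) { q->quiesce_depth++; }

/* Returns 1 if the queue actually resumes dispatch, 0 if it stays quiesced. */
int unquiesce(struct queue_model *q)
{
	if (q->quiesce_depth > 0)
		q->quiesce_depth--;
	return q->quiesce_depth == 0;
}
```

In this model, an extra quiesce() without a matching unquiesce() leaves the queue stopped, which is why a redundant stop has to be avoided.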
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

15 Oct, 2021 1 commit

dm: Fix dm_accept_partial_bio() relative to zone management commands · 00f90ae5

Authored by Damien Le Moal on Oct 14, 2021

stable inclusion
from stable-5.10.51
commit cc4f0a9d5aa1b5abffb2366a0b37c37806362fe8
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cc4f0a9d5aa1b5abffb2366a0b37c37806362fe8

--------------------------------

[ Upstream commit 6842d264 ]

Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.
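
The rule can be expressed as a predicate. The sketch below uses hypothetical operation codes rather than the kernel's REQ_OP_* values, and it returns a boolean where the kernel code uses BUG_ON() checks.

```c
#include <stdbool.h>

/* Hypothetical operation codes for this sketch only. */
enum op_model {
	OP_READ, OP_WRITE,
	OP_ZONE_OPEN, OP_ZONE_CLOSE, OP_ZONE_FINISH, OP_ZONE_RESET,
	OP_ZONE_APPEND,
};

bool op_is_zone_mgmt_model(enum op_model op)
{
	return op == OP_ZONE_OPEN || op == OP_ZONE_CLOSE ||
	       op == OP_ZONE_FINISH || op == OP_ZONE_RESET;
}

/* True when a target may accept only part of a bio carrying this op. */
bool partial_accept_allowed(enum op_model op)
{
	if (op_is_zone_mgmt_model(op))
		return false;	/* zone management commands must not be passed */
	if (op == OP_ZONE_APPEND)
		return false;	/* a zone append cannot be split */
	return true;
}
```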

White lines are added around the group of BUG_ON() calls to make the
code more legible.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

09 Apr, 2021 2 commits

dm table: fix DAX iterate_devices based device capability checks · 1a204795

Authored by Jeffle Xu on Mar 15, 2021

stable inclusion
from stable-5.10.20
commit bc3f609db369f126631aa0bc2be7e9a4727a1836
bugzilla: 50608

--------------------------------

commit 5b0fab50 upstream.

Fix dm_table_supports_dax() and invert logic of both
iterate_devices_callout_fn so that all devices' DAX capabilities are
properly checked.

Fixes: 545ed20e ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

dm: fix deadlock when swapping to encrypted device · b9743b42

Authored by Mikulas Patocka on Mar 15, 2021

stable inclusion
from stable-5.10.20
commit 1f145073b196073891d2df66d2d011f1c361fd26
bugzilla: 50608

--------------------------------

commit a666e5c0 upstream.

The system would deadlock when swapping to a dm-crypt device. The reason
is that for each incoming write bio, dm-crypt allocates memory that holds
encrypted data. These excessive allocations exhaust all the memory and the
result is either deadlock or OOM trigger.

This patch limits the number of in-flight swap bios, so that the memory
consumed by dm-crypt is limited. The limit is enforced if the target sets
the "limit_swap_bios" variable and if the bio has REQ_SWAP set.

Non-swap bios are not affected because taking the semaphore would cause
performance degradation.

This is similar to request-based drivers - they will also block when the
number of requests is over the limit.
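
A simplified model of the limiting rule. It replaces the kernel semaphore with a plain permit counter and reports "would block" instead of sleeping; the struct and function names are hypothetical.

```c
#include <stdbool.h>

struct swap_limiter {
	int available;	/* permits left for in-flight swap bios */
};

/* Returns true if the bio may proceed now; false means the caller would have to wait. */
bool bio_enter(struct swap_limiter *l, bool limit_swap_bios, bool is_swap)
{
	if (!(limit_swap_bios && is_swap))
		return true;	/* non-swap bios bypass the limiter */
	if (l->available == 0)
		return false;
	l->available--;
	return true;
}

/* Gives the permit back once a limited swap bio completes. */
void bio_exit(struct swap_limiter *l, bool limit_swap_bios, bool is_swap)
{
	if (limit_swap_bios && is_swap)
		l->available++;
}
```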
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

28 Jan, 2021 1 commit

dm: eliminate potential source of excessive kernel log noise · bbcfc32d

Authored by Mike Snitzer on Jan 25, 2021

stable inclusion
from stable-5.10.9
commit 0eb56457d239f5ee555ad9dc0c086a0abd933f1b
bugzilla: 47457

--------------------------------

commit 0378c625 upstream.

There wasn't ever a real need to log an error in the kernel log for
ioctls issued with insufficient permissions. Simply return an error
and if an admin/user is sufficiently motivated they can enable DM's
dynamic debugging to see an explanation for why the ioctls were
disallowed.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: e980f623 ("dm: don't allow ioctls to targets that don't map to whole devices")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

05 Dec, 2020 3 commits

dm: remove invalid sparse __acquires and __releases annotations · bde3808b

Authored by Mike Snitzer on Dec 04, 2020

Fixes sparse warnings:
drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit

Fixes: 971888c4 ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: fix double RCU unlock in dm_dax_zero_page_range() error path · f05c4403

Authored by Mike Snitzer on Dec 04, 2020

Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error
path to fix sparse warning:
drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock

Fixes: cdf6cdcd ("dm,dax: Add dax zero_page_range operation")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: fix IO splitting · 3ee16db3

Authored by Mike Snitzer on Nov 30, 2020

Commit 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
   chunk_sectors must reflect the most limited of all devices in the
   IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
   .iterate_devices method no longer had their IO split properly.

And commit 5091cdec ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).

Coming full circle: Fix all these issues by no longer stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
adding an optional chunk_sectors override argument to
blk_max_size_offset(), and updating DM's max_io_len() to pass
ti->max_io_len to its blk_max_size_offset() call.

Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.
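
A userspace model of the calculation with an optional chunk override, as described above. The struct and function names are hypothetical, and the model uses a plain modulo for every chunk size rather than any optimized path.

```c
#include <stdint.h>

struct limits_model {
	unsigned int max_sectors;
	unsigned int chunk_sectors;	/* 0 means "no chunk boundary" */
};

/*
 * Largest IO, in sectors, that may start at 'offset' without crossing a
 * chunk boundary. A non-zero 'chunk_override' (for example a per-target
 * max_io_len) takes precedence over the queue-wide chunk_sectors.
 */
unsigned int max_size_offset(const struct limits_model *lim, uint64_t offset,
			     unsigned int chunk_override)
{
	unsigned int chunk = chunk_override ? chunk_override : lim->chunk_sectors;
	unsigned int to_boundary;

	if (chunk == 0)
		return lim->max_sectors;

	to_boundary = chunk - (unsigned int)(offset % chunk);
	return to_boundary < lim->max_sectors ? to_boundary : lim->max_sectors;
}
```

Passing the split boundary per call is what lets each target keep its own splitting while sharing one calculation.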

Fixes: 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>

02 Dec, 2020 1 commit

dm: fix bug with RCU locking in dm_blk_report_zones · 89478335

Authored by Sergei Shtepa on Nov 11, 2020

The dm_get_live_table() function takes the RCU read lock, so
dm_put_live_table() must be called even if the dm_table map is not found.
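
The required pairing can be sketched with a counter standing in for the read-side lock. All names here are hypothetical models of the get/put pair, not the kernel functions themselves.

```c
#include <stddef.h>

/* Models the read-side lock taken by the "get" call. */
int read_lock_depth;

struct table_model { int nr_zones; };

const struct table_model *get_live_table(const struct table_model *live)
{
	read_lock_depth++;	/* taken even when 'live' is NULL */
	return live;
}

void put_live_table(void) { read_lock_depth--; }

int report_zones_model(const struct table_model *live)
{
	const struct table_model *map = get_live_table(live);
	int ret;

	if (map == NULL) {
		ret = -1;
		goto out;	/* the error path still has to drop the lock */
	}
	ret = map->nr_zones;
out:
	put_live_table();
	return ret;
}
```

Routing every exit through the single 'out' label keeps the get and put calls paired.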

Fixes: e76239a3 ("block: add a report_zones method")
Cc: stable@vger.kernel.org
Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

08 Oct, 2020 2 commits

dm: fix request-based DM to not bounce through indirect dm_submit_bio · 681cc5e8

Authored by Mike Snitzer on Oct 07, 2020

It is unnecessary to force request-based DM to call into bio-based
dm_submit_bio (via indirect disk->fops->submit_bio) only to have it then
call blk_mq_submit_bio().

Fix this by establishing a request-based DM block_device_operations
(dm_rq_blk_dops, which doesn't have .submit_bio) and update
dm_setup_md_queue() to set md->disk->fops to it for
DM_TYPE_REQUEST_BASED.

Remove DM_TYPE_REQUEST_BASED conditional in dm_submit_bio and unexport
blk_mq_submit_bio.

Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: remove special-casing of bio-based immutable singleton target on NVMe · 9c37de29

Authored by Mike Snitzer on Oct 07, 2020

Since commit 5a6c35f9 ("block: remove direct_make_request") there
is no benefit to DM special-casing NVMe. Remove all code used to
establish DM_TYPE_NVME_BIO_BASED.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

06 Oct, 2020 1 commit

block: make bio_crypt_clone() able to fail · 07560151

Authored by Eric Biggers on Sep 15, 2020

bio_crypt_clone() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.

However, bio_crypt_clone() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c, or with GFP_NOWAIT via
kcryptd_io_read() in drivers/md/dm-crypt.c.

Neither case is currently reachable with a bio that actually has an
encryption context.  However, it's fragile to rely on this.  Just make
bio_crypt_clone() able to fail, analogous to bio_integrity_clone().
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Satya Tangirala <satyat@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Oct, 2020 2 commits

dm: fix comment in __dm_suspend() · 0cede372

Authored by Mike Snitzer on Sep 30, 2020

Fix stale references to functions that have been renamed and fix typo.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: fold dm_process_bio() into dm_submit_bio() · b2abdb1b

Authored by Mike Snitzer on Sep 30, 2020

dm_process_bio() is only called by dm_submit_bio(), so there is no
benefit to keeping dm_process_bio() factored out; fold it.

While at it, clean up dm_submit_bio()'s DMF_BLOCK_IO_FOR_SUSPEND related
branching and expand scope of dm_get_live_table() rcu reference on map
via common 'out' label to dm_put_live_table().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

01 Oct, 2020 1 commit

dm: fix missing imposition of queue_limits from dm_wq_work() thread · 0c2915b8

Authored by Mike Snitzer on Sep 28, 2020

If a DM device was suspended when bios were issued to it, those bios
would be deferred using queue_io(). Once the DM device was resumed
dm_process_bio() could be called by dm_wq_work() for original bio that
still needs splitting. dm_process_bio()'s check for current->bio_list
(meaning call chain is within ->submit_bio) as a prerequisite for
calling blk_queue_split() for "abnormal IO" would result in
dm_process_bio() never imposing corresponding queue_limits
(e.g. discard_granularity, discard_max_bytes, etc).

Fix this by always having dm_wq_work() resubmit deferred bios using
submit_bio_noacct().

Side-effect is blk_queue_split() is always called for "abnormal IO" from
->submit_bio, be it from application thread or dm_wq_work() workqueue,
so proper bio splitting and depth-first bio submission is performed.
For sake of clarity, remove current->bio_list check before call to
blk_queue_split().

Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
longer needed since IO will be reissued in terms of ->submit_bio.
And rename bio variable from 'c' to 'bio'.

Fixes: cf9c3786 ("dm: fix comment in dm_process_bio()")
Reported-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

30 Sep, 2020 7 commits

dm table: make 'struct dm_table' definition accessible to all of DM core · 33bd6f06

Authored by Mike Snitzer on Sep 19, 2020

Move 'struct dm_table' definition from dm-table.c to dm-core.h and
update DM core to access its members directly.

Helps optimize max_io_len() and other methods slightly.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: eliminate need for start_io_acct() forward declaration · 7465d7ac

Authored by Mike Snitzer on Sep 17, 2020

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: simplify __process_abnormal_io() · 9679b5a7

Authored by Mike Snitzer on Sep 15, 2020

Only call bio_op() once in switch statement.  Also remove the
excessive factoring out to one line functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: push use of on-stack flush_bio down to __send_empty_flush() · 828678b8

Authored by Mike Snitzer on Sep 14, 2020

Eliminates duplicate code, no functional change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: optimize max_io_len() by inlining max_io_len_target_boundary() · 3720281d

Authored by Mike Snitzer on Sep 19, 2020

Saves redundant dm_target_offset() math.

Also, reverse argument order for max_io_len() to be consistent with
other similar functions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: push md->immutable_target optimization down to __process_bio() · 094ee64d

Authored by Mike Snitzer on Sep 14, 2020

Also, update associated stale comment in __bind().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: change max_io_len() to use blk_max_size_offset() · 5091cdec

Authored by Mike Snitzer on Sep 18, 2020

Using blk_max_size_offset() enables DM core's splitting to impose
ti->max_io_len (via q->limits.chunk_sectors) and also to fall back to
respecting q->limits.max_sectors if chunk_sectors isn't set.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

25 Sep, 2020 1 commit

dm: add support for REQ_NOWAIT and enable it for linear target · 6abc4946

Authored by Konstantin Khlebnikov on Sep 23, 2020

Add DM target feature flag DM_TARGET_NOWAIT which advertises that
target works with REQ_NOWAIT bios.

Add dm_table_supports_nowait() and update dm_table_set_restrictions()
to set/clear QUEUE_FLAG_NOWAIT accordingly.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22 Sep, 2020 2 commits

dm: fix comment in dm_process_bio() · cf9c3786

Authored by Mike Snitzer on Sep 21, 2020

Refer to the correct function (->submit_bio instead of ->queue_bio).
Also, add details about why using blk_queue_split() isn't needed for
dm_wq_work()'s call to dm_process_bio().

Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: fix bio splitting and its bio completion order for regular IO · ee1dfad5

Authored by Mike Snitzer on Sep 14, 2020

dm_queue_split() is removed because __split_and_process_bio() _must_
handle splitting bios to ensure proper bio submission and completion
ordering as a bio is split.

Otherwise, multiple recursive calls to ->submit_bio will cause multiple
split bios to be allocated from the same ->bio_split mempool at the same
time. This would result in deadlock in low memory conditions because no
progress could be made (only one bio is available in ->bio_split
mempool).

This fix has been verified to still fix the loss of performance, due
to excess splitting, that commit 120c9257 provided.

Fixes: 120c9257 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

20 Sep, 2020 1 commit

dm/dax: Fix table reference counts · 02186d88

Authored by Dan Williams on Sep 18, 2020

A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
dm_get_live_table() fails it is still required to drop the
srcu_read_lock(). Without this change the lvm2 test-suite triggers this
warning:

    # lvm2-testsuite --only pvmove-abort-all.sh

    WARNING: lock held when returning to user space!
    5.9.0-rc5+ #251 Tainted: G           OE
    ------------------------------------------------
    lvm/1318 is leaving the kernel with locks still held!
    1 lock held by lvm/1318:
     #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]

...and later on this hang signature:

    INFO: task lvm:1344 blocked for more than 122 seconds.
          Tainted: G           OE     5.9.0-rc5+ #251
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:lvm             state:D stack:    0 pid: 1344 ppid:     1 flags:0x00004000
    Call Trace:
     __schedule+0x45f/0xa80
     ? finish_task_switch+0x249/0x2c0
     ? wait_for_completion+0x86/0x110
     schedule+0x5f/0xd0
     schedule_timeout+0x212/0x2a0
     ? __schedule+0x467/0xa80
     ? wait_for_completion+0x86/0x110
     wait_for_completion+0xb0/0x110
     __synchronize_srcu+0xd1/0x160
     ? __bpf_trace_rcu_utilization+0x10/0x10
     __dm_suspend+0x6d/0x210 [dm_mod]
     dm_suspend+0xf6/0x140 [dm_mod]

Fixes: 7bf7eac8 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

02 Sep, 2020 1 commit

block: fix locking for struct block_device size updates · c2b4bb8c

Authored by Christoph Hellwig on Aug 23, 2020

Two different callers use two different mutexes for updating the
block device size, which obviously doesn't help to actually protect
against concurrent updates from the different callers.  In addition
one of the locks, bd_mutex is rather prone to deadlocks with other
parts of the block stack that use it for high level synchronization.

Switch to using a new spinlock protecting just the size updates, as
that is all we need, and make sure everyone does the update through
the proper helper.

This fixes a bug reported with the nvme revalidating disks during a
hot removal operation, which can currently deadlock on bd_mutex.
Reported-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

24 Aug, 2020 1 commit

treewide: Use fallthrough pseudo-keyword · df561f66

Authored by Gustavo A. R. Silva on Aug 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

05 Aug, 2020 1 commit

dm: don't call report zones for more than the user requested · a9cb9f41

Authored by Johannes Thumshirn on Aug 04, 2020

Don't call report zones for more zones than the user actually requested,
otherwise this can lead to out-of-bounds accesses in the callback
functions.

Such a situation can happen if the target's ->report_zones() callback
function returns 0 because we've reached the end of the target and then
restart the report zones on the second target.

We then call into ->report_zones() again, and ultimately into the
user-supplied callback function, but unless we subtract the number of
zones already processed, this may lead to out-of-bounds accesses in the
user callbacks.
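
A small model of the bookkeeping, assuming each target simply reports up to the number of zones it is asked for. Function names and the array-of-zone-counts representation are hypothetical.

```c
/* Hypothetical per-target reporter: reports at most 'nr_zones' zones. */
unsigned int target_report(unsigned int zones_in_target, unsigned int nr_zones)
{
	return zones_in_target < nr_zones ? zones_in_target : nr_zones;
}

/* Walks the targets, asking each only for the zones the user has not received yet. */
unsigned int report_zones_all(const unsigned int *targets, int nr_targets,
			      unsigned int nr_zones)
{
	unsigned int reported = 0;

	for (int i = 0; i < nr_targets && reported < nr_zones; i++)
		reported += target_report(targets[i], nr_zones - reported);
	return reported;
}
```

Without the `nr_zones - reported` subtraction, a second target could be asked for the full count again and the total could exceed what the caller's buffer was sized for.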
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Fixes: d4100351 ("block: rework zone reporting")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

24 Jul, 2020 1 commit

dm integrity: fix integrity recalculation that is improperly skipped · 5df96f2b

Authored by Mikulas Patocka on Jul 23, 2020

Commit adc0daad ("dm: report suspended
device during destroy") broke integrity recalculation.

The problem is dm_suspended() returns true not only during suspend,
but also during resume. So this race condition could occur:
1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
2. integrity_recalc (&ic->recalc_work) preempts the current thread
3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
4. integrity_recalc exits and no recalculating is done.

To fix this race condition, add a function dm_post_suspending that is
only true during the postsuspend phase and use it instead of
dm_suspended().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: adc0daad ("dm: report suspended device during destroy")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

openeuler / Kernel: synced successfully more than 1 year ago