Commits · 1d1068cecff70cb8e48c7cb0ba27cc3fd906eb31 · openeuler / Kernel

04 Feb, 2022 6 commits

dm: retun the clone bio from alloc_tio · 1d1068ce

Committed by Christoph Hellwig on Feb 02, 2022

Return the clone bio embedded into the tio as that is what the callers
actually want. Similar for the free side.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: pass the bio instead of tio to __map_bio · 1561b396

Committed by Christoph Hellwig on Feb 02, 2022

This simplifies the callers a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: move cloning the bio into alloc_tio · dc8e2021

Committed by Christoph Hellwig on Feb 02, 2022

Move the call to __bio_clone_fast and the assignment of ->len_ptr from
the callers into alloc_tio to prepare for changes to the bio clone API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: fold __send_duplicate_bios into __clone_and_map_simple_bio · 8eabf5d0

Committed by Christoph Hellwig on Feb 02, 2022

Fold __send_duplicate_bios into its only caller to prepare for
refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: fold clone_bio into __clone_and_map_data_bio · b1bee792

Committed by Christoph Hellwig on Feb 02, 2022

Fold clone_bio into its only caller to prepare for refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: add a clone_to_tio helper · 6c23f0bd

Committed by Christoph Hellwig on Feb 02, 2022

Add a helper to stop open coding the container_of operations to get
from the clone bio to the tio structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
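
The container_of pattern behind this helper can be sketched in userspace C. This is a minimal illustration, not the kernel code: the two structs are reduced stand-ins with assumed fields, and container_of is re-implemented here with offsetof.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): recover a pointer to
 * the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-ins (assumption): the real structures have many more fields. */
struct bio {
	unsigned int size;
};

struct dm_target_io {
	unsigned int target_bio_nr;
	struct bio clone;	/* the clone bio is embedded in the tio */
};

/* A single helper replaces the container_of() expression at every call site. */
static inline struct dm_target_io *clone_to_tio(struct bio *clone)
{
	return container_of(clone, struct dm_target_io, clone);
}
```

Given a pointer to the embedded clone, clone_to_tio() returns the address of the structure that contains it, so callers no longer need to know the member name.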

02 Feb, 2022 3 commits

block: pass a block_device and opf to bio_init · 49add496

Committed by Christoph Hellwig on Jan 24, 2022

Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: pass a block_device and opf to bio_alloc_bioset · 609be106

Committed by Christoph Hellwig on Jan 24, 2022

Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assignment. NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: bio_alloc can't fail if it is allowed to sleep · 53db984e

Committed by Christoph Hellwig on Jan 24, 2022

Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Jan, 2022 2 commits

dm: properly fix redundant bio-based IO accounting · b879f915

Committed by Mike Snitzer on Jan 28, 2022

Record the start_time for a bio but defer starting block core's IO
accounting until after IO is submitted using bio_start_io_acct_time().

This approach avoids the need to mess around with any of the
individual IO stats in response to a bio_split() that follows bio
submission.

Reported-by: Bud Brown <bubrown@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Depends-on: e45c47d1 ("block: add bio_start_io_acct_time() to control start_time")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-4-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: revert partial fix for redundant bio-based IO accounting · f524d9c9

Committed by Mike Snitzer on Jan 28, 2022

Reverts a1e1cb72 ("dm: fix redundant IO accounting for bios that
need splitting") because it was too narrow in scope (only addressed
redundant 'sectors[]' accounting and not ios, nsecs[], etc).

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Dec, 2021 2 commits

dax: remove the copy_from_iter and copy_to_iter methods · 7ac5360c

Committed by Christoph Hellwig on Dec 15, 2021

These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants, fuse and dcssblk use plain ones,
and device mapper redirects to the underlying device.

Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove the indirect call from the read/write fast
path as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dax: remove the DAXDEV_F_SYNC flag · 30c6828a

Committed by Christoph Hellwig on Dec 15, 2021

Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
just let the drivers call set_dax_synchronous directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

05 Dec, 2021 5 commits

dax: return the partition offset from fs_dax_get_by_bdev · cd913c76

Committed by Christoph Hellwig on Nov 29, 2021

Prepare for the removal of the block_device from the DAX I/O path by
returning the partition offset from fs_dax_get_by_bdev so that the file
systems have it at hand for use during I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dax: remove dax_capable · 7b0800d0

Committed by Christoph Hellwig on Nov 29, 2021

Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs]
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dax: simplify the dax_device <-> gendisk association · fb08a190

Committed by Christoph Hellwig on Nov 29, 2021

Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dm: make the DAX support depend on CONFIG_FS_DAX · 5d2a228b

Committed by Christoph Hellwig on Nov 29, 2021

The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax. Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

dm: fix alloc_dax error handling in alloc_dev · d7519392

Committed by Christoph Hellwig on Nov 29, 2021

Make sure ->dax_dev is NULL on error so that the cleanup path doesn't
trip over an ERR_PTR.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
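
The hazard described here, a cleanup path that only tests for NULL being handed an ERR_PTR, can be sketched in userspace C. This is an illustration under stated assumptions: ERR_PTR, PTR_ERR and IS_ERR are re-implemented locally, and fake_alloc_dax(), setup_dax() and cleanup() are hypothetical names, not the dm functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Reduced stand-ins (assumption), not the real structures. */
struct dax_device { int id; };
struct mapped_device { struct dax_device *dax_dev; };

static struct dax_device the_dax = { .id = 1 };

/* Hypothetical allocator that reports failure as an encoded error pointer. */
static struct dax_device *fake_alloc_dax(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &the_dax;
}

static int setup_dax(struct mapped_device *md, int fail)
{
	md->dax_dev = fake_alloc_dax(fail);
	if (IS_ERR(md->dax_dev)) {
		long err = PTR_ERR(md->dax_dev);

		/* Reset to NULL so cleanup never sees the ERR_PTR value. */
		md->dax_dev = NULL;
		return (int)err;
	}
	return 0;
}

/* Cleanup only checks for NULL; returns 1 if it released a device. */
static int cleanup(struct mapped_device *md)
{
	if (md->dax_dev) {
		md->dax_dev = NULL;
		return 1;
	}
	return 0;
}
```

Without the reset to NULL, cleanup() would treat the encoded error value as a valid pointer and try to release it.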

29 Nov, 2021 1 commit

block: remove GENHD_FL_EXT_DEVT · 1ebe2e5f

Committed by Christoph Hellwig on Nov 22, 2021

All modern drivers can support extra partitions using the extended
dev_t.  In fact except for the ioctl method drivers never even see
partitions in normal operation.

So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicitly disallow them using
GENHD_FL_NO_PART.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Nov, 2021 3 commits

dm: don't stop request queue after the dm device is suspended · a1c2f7e7

Committed by Ming Lei on Oct 21, 2021

To fix the queue quiesce race between the driver and the block layer
(elevator switch, update nr_requests, ...), we need to support concurrent
quiesce and unquiesce, which requires the two calls to be balanced.

__bind() is only called from dm_swap_table(), in which the dm device has
already been suspended, so it is not necessary to stop the queue again.
This way, request queue quiesce and unquiesce can be balanced.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20211021145918.2691762-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
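
Why the calls must be balanced can be seen in a small counting model. This is a sketch under assumptions: struct queue and the three functions are hypothetical, single-threaded stand-ins with no locking, not the blk-mq implementation.

```c
#include <assert.h>

/* Hypothetical model: the queue stays stopped while the depth is non-zero. */
struct queue {
	int quiesce_depth;
};

static void quiesce(struct queue *q)
{
	q->quiesce_depth++;
}

static void unquiesce(struct queue *q)
{
	/* An unquiesce without a matching quiesce is a caller bug. */
	assert(q->quiesce_depth > 0);
	q->quiesce_depth--;
}

static int is_quiesced(const struct queue *q)
{
	return q->quiesce_depth > 0;
}
```

With counting semantics, an extra stop issued while the device is already suspended would need its own matching start. If the resume path issues only one, the depth never returns to zero and the queue stays stopped.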

dm: make workqueue names device-specific · c7c879ee

Committed by Michał Mirosław on Oct 21, 2021

Add device number to kdmflush workqueue name to help debugging CPU usage.

Resulting `ps axfu` snippet:

root 3791 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:7]
root 3792 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd_io/253:7]
root 3793 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd/253:7]
root 3794 0.0 0.0 0 0 ? S paź19 0:00 \_ [dmcrypt_write/253:7]
root 3814 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:8]
root 3815 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:9]
root 3816 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:10]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
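
The naming scheme visible in the ps output, a fixed prefix followed by major:minor, can be reproduced with a small formatting helper. This is an illustrative userspace sketch: format_wq_name() is a hypothetical function, and how the kernel builds the string internally is not shown here.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a per-device name such as "kdmflush/253:7".
 * Returns the length snprintf() reports, so callers can detect truncation
 * by comparing it against the buffer size.
 */
static int format_wq_name(char *buf, size_t len, const char *prefix,
			  unsigned int major, unsigned int minor)
{
	return snprintf(buf, len, "%s/%u:%u", prefix, major, minor);
}
```

With the device number in the name, a busy kernel thread seen in ps or top can be traced back to a specific dm device.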

dm: add add_disk() error handling · 08997537

Committed by Luis Chamberlain on Oct 15, 2021

We never checked for errors on add_disk() as this function returned
void. Now that this is fixed, use the shiny new error handling.

There are two calls to dm_setup_md_queue() which can then fail. One is
in dm_early_create(), and we can easily see that the error path there
calls dm_destroy. The other is the ioctl table_load case. If that fails,
userspace needs to call DM_DEV_REMOVE_CMD to clean up the state, similar
to any other failure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

22 Oct, 2021 2 commits

blk-crypto: rename blk_keyslot_manager to blk_crypto_profile · cb77cb5a

Committed by Eric Biggers on Oct 18, 2021

blk_keyslot_manager is misnamed because it doesn't necessarily manage
keyslots.  It actually does several different things:

  - Contains the crypto capabilities of the device.

  - Provides functions to control the inline encryption hardware.
    Originally these were just for programming/evicting keyslots;
    however, new functionality (hardware-wrapped keys) will require new
    functions here which are unrelated to keyslots.  Moreover,
    device-mapper devices already (ab)use "keyslot_evict" to pass key
    eviction requests to their underlying devices even though
    device-mapper devices don't have any keyslots themselves (so it
    really should be "evict_key", not "keyslot_evict").

  - Sometimes (but not always!) it manages keyslots.  Originally it
    always did, but device-mapper devices don't have keyslots
    themselves, so they use a "passthrough keyslot manager" which
    doesn't actually manage keyslots.  This hack works, but the
    terminology is unnatural.  Also, some hardware doesn't have keyslots
    and thus also uses a "passthrough keyslot manager" (support for such
    hardware is yet to be upstreamed, but it will happen eventually).

Let's stop having keyslot managers which don't actually manage keyslots.
Instead, rename blk_keyslot_manager to blk_crypto_profile.

This is a fairly big change, since for consistency it also has to update
keyslot manager-related function names, variable names, and comments --
not just the actual struct name.  However it's still a fairly
straightforward change, as it doesn't change any actual functionality.

Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-crypto: rename keyslot-manager files to blk-crypto-profile · 1e8d44bd

Committed by Eric Biggers on Oct 18, 2021

In preparation for renaming struct blk_keyslot_manager to struct
blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c
source files.  Renaming these files separately before making a lot of
changes to their contents makes it easier for git to understand that
they were renamed.

Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Oct, 2021 1 commit

dm: add add_disk() error handling · e7089f65

Committed by Luis Chamberlain on Oct 15, 2021

We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.

There are two calls to dm_setup_md_queue() which can then fail.
One is in dm_early_create(), and we can easily see that the error
path there calls dm_destroy. The other is the ioctl table_load
case. If that fails, userspace needs to call DM_DEV_REMOVE_CMD to
clean up the state, similar to any other failure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Oct, 2021 1 commit

block: switch polling to be bio based · 3e08773c

Committed by Christoph Hellwig on Oct 12, 2021

Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

 - the cookie construction can be made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie
   separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio makes it trivial to
   support polling of BIOs remapped by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can
   be removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

13 Oct, 2021 1 commit

dm: fix mempool NULL pointer race when completing IO · d208b894

Committed by Jiazi Li on Sep 29, 2021

dm_io_dec_pending() calls end_io_acct() first and then decrements the md
in-flight pending count. But if a task is swapping the DM table at the
same time, this can result in a crash due to mempool->elements being NULL:

task1                             task2
do_resume
 ->do_suspend
  ->dm_wait_for_completion
                                  bio_endio
                                   ->clone_endio
                                    ->dm_io_dec_pending
                                     ->end_io_acct
                                      ->wakeup task1
 ->dm_swap_table
  ->__bind
   ->__bind_mempools
    ->bioset_exit
     ->mempool_exit
                                     ->free_io

[ 67.330330] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
......
[ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 67.330510] pc : mempool_free+0x70/0xa0
[ 67.330515] lr : mempool_free+0x4c/0xa0
[ 67.330520] sp : ffffff8008013b20
[ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
[ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
[ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
[ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
[ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
[ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
[ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
[ 67.330563] x15: 000000000093b41e x14: 0000000000000010
[ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
[ 67.330574] x11: 0000000000000001 x10: 0000000000000001
[ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
[ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
[ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
[ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
[ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
[ 67.330609] Call trace:
[ 67.330616] mempool_free+0x70/0xa0
[ 67.330627] bio_put+0xf8/0x110
[ 67.330638] dec_pending+0x13c/0x230
[ 67.330644] clone_endio+0x90/0x180
[ 67.330649] bio_endio+0x198/0x1b8
[ 67.330655] dec_pending+0x190/0x230
[ 67.330660] clone_endio+0x90/0x180
[ 67.330665] bio_endio+0x198/0x1b8
[ 67.330673] blk_update_request+0x214/0x428
[ 67.330683] scsi_end_request+0x2c/0x300
[ 67.330688] scsi_io_completion+0xa0/0x710
[ 67.330695] scsi_finish_command+0xd8/0x110
[ 67.330700] scsi_softirq_done+0x114/0x148
[ 67.330708] blk_done_softirq+0x74/0xd0
[ 67.330716] __do_softirq+0x18c/0x374
[ 67.330724] irq_exit+0xb4/0xb8
[ 67.330732] __handle_domain_irq+0x84/0xc0
[ 67.330737] gic_handle_irq+0x148/0x1b0
[ 67.330744] el1_irq+0xe8/0x190
[ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
[ 67.330759] cpuidle_enter_state+0x1fc/0x398
[ 67.330764] cpuidle_enter+0x18/0x20
[ 67.330772] do_idle+0x1b4/0x290
[ 67.330778] cpu_startup_entry+0x20/0x28
[ 67.330786] secondary_start_kernel+0x160/0x170

Fix this by:
1) Establishing pointers to 'struct dm_io' members in
dm_io_dec_pending() so that they may be passed into end_io_acct()
_after_ free_io() is called.
2) Moving end_io_acct() after free_io().

Cc: stable@vger.kernel.org
Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
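
The two steps of the fix, copying what the accounting needs and then freeing before accounting, can be sketched in userspace C. This is a simplified model under assumptions: struct dm_io is reduced to two fields, free_io() wraps free() instead of returning to a mempool, and end_io_acct() only records its arguments in hypothetical globals.

```c
#include <stdlib.h>

/* Reduced stand-in (assumption): only the fields the accounting needs. */
struct dm_io {
	unsigned long start_time;
	unsigned int sectors;
};

/* Hypothetical record of what the accounting step was given. */
static unsigned long last_acct_start;
static unsigned int last_acct_sectors;

/* In the scenario above, this step can wake a task that then tears down
 * the pool the io came from. Here it only records its arguments. */
static void end_io_acct(unsigned long start_time, unsigned int sectors)
{
	last_acct_start = start_time;
	last_acct_sectors = sectors;
}

static void free_io(struct dm_io *io)
{
	free(io);
}

static void io_complete(struct dm_io *io)
{
	/* 1) Copy everything accounting needs while io is still valid. */
	unsigned long start_time = io->start_time;
	unsigned int sectors = io->sectors;

	/* 2) Return io to its pool before anything can destroy that pool. */
	free_io(io);

	/* 3) Account using the copies only; io must not be touched here. */
	end_io_acct(start_time, sectors);
}
```

The ordering matters because the free must complete while the pool is guaranteed to exist. Once the waiter is woken, no further access to the io or its pool is safe.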

27 Aug, 2021 1 commit

dm: use fs_dax_get_by_bdev instead of dax_get_by_host · dfa584f6

Committed by Christoph Hellwig on Aug 26, 2021

There is no point in trying to find the dax device if the DAX flag is
not set on the queue as none of the users of the device mapper exported
block devices could make use of the DAX capability.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210826135510.6293-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

21 Aug, 2021 1 commit

dm ima: add a warning in dm_init if duplicate ima events are not measured · f1cd6cb2

Committed by Tushar Sugandhi on Aug 13, 2021

The end-users of DM devices/targets may remove and re-create the same
device multiple times. IMA does not measure such duplicate events if the
configuration CONFIG_IMA_DISABLE_HTABLE is set to 'n'.
To avoid confusion, the end-users need some indication on the client
if that configuration option is disabled.

Add a one-time warning during dm_init() if CONFIG_IMA_DISABLE_HTABLE
is set to 'n', to notify the end-users that duplicate events will not
be measured in the ima log. Also cleanup some whitespace in dm_init().

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

11 Aug, 2021 1 commit

dm ima: measure data on table load · 91ccbbac

Committed by Tushar Sugandhi on Jul 12, 2021

DM configures a block device with various target specific attributes
passed to it as a table. DM loads the table, and calls each target’s
respective constructors with the attributes as input parameters.
Some of these attributes are critical to ensure the device meets
certain security bar. Thus, IMA should measure these attributes, to
ensure they are not tampered with, during the lifetime of the device.
So that the external services can have high confidence in the
configuration of the block-devices on a given system.

Some devices may have large tables. And a given device may change its
state (table-load, suspend, resume, rename, remove, table-clear etc.)
many times. Measuring these attributes each time when the device
changes its state will significantly increase the size of the IMA logs.
Further, once configured, these attributes are not expected to change
unless a new table is loaded, or a device is removed and recreated.
Therefore the clear-text of the attributes should only be measured
during table load, and the hash of the active/inactive table should be
measured for the remaining device state changes.

Export IMA function ima_measure_critical_data() to allow measurement
of DM device parameters, as well as target specific attributes, during
table load. Compute the hash of the inactive table and store it for
measurements during future state change. If a load is called multiple
times, update the inactive table hash with the hash of the latest
populated table. So that the correct inactive table hash is measured
when the device transitions to different states like resume, remove,
rename, etc.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

10 Aug, 2021 3 commits

dm: delay registering the gendisk · 89f871af

Committed by Christoph Hellwig on Aug 04, 2021

Device mapper is currently the only outlier that tries to call
register_disk after add_disk, leading to fairly inconsistent state
of these block layer data structures.  Instead change device-mapper
to just register the gendisk later now that the holder mechanism
can cope with that.

Note that this introduces a user visible change: the dm kobject is
now only visible after the initial table has been loaded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: move setting md->type into dm_setup_md_queue · ba305859

Committed by Christoph Hellwig on Aug 04, 2021

Move setting md->type from both callers into dm_setup_md_queue.
This ensures that md->type is only set to a valid value after the queue
has been fully set up, something we'll rely on in future changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: cleanup cleanup_mapped_device · 74a2b6ec

Committed by Christoph Hellwig on Aug 04, 2021

md->queue is now always set when md->disk is set, so simplify the
conditionals a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Jun, 2021 1 commit

sched: Change task_struct::state · 2f064a59

Committed by Peter Zijlstra on Jun 11, 2021

Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org

05 Jun, 2021 5 commits

dm: introduce zone append emulation · bb37d772

Committed by Damien Le Moal on May 26, 2021

For zoned targets that cannot support zone append operations, implement
an emulation using regular write operations. If the original BIO
submitted by the user is a zone append operation, change its clone into
a regular write operation directed at the target zone write pointer
position.

To do so, an array of write pointer offsets (write pointer position
relative to the start of a zone) is added to struct mapped_device. All
operations that modify a sequential zone write pointer (writes, zone
reset, zone finish and zone append) are intercepted in __map_bio() and
processed using the new function dm_zone_map_bio().

Detection of the target ability to natively support zone append
operations is done from dm_table_set_restrictions() by calling the
function dm_set_zones_restrictions(). A target that does not support
zone append operation, either by explicitly declaring it using the new
struct dm_target field zone_append_not_supported, or because the device
table contains a non-zoned device, has its mapped device marked with the
new flag DMF_ZONE_APPEND_EMULATED. The helper function
dm_emulate_zone_append() is introduced to test a mapped device for this
new flag.

Atomicity of the zones write pointer tracking and updates is done using
a zone write locking mechanism based on a bitmap. This is similar to
the block layer method but based on BIOs rather than struct request.
A zone write lock is taken in dm_zone_map_bio() for any clone BIO with
an operation type that changes the BIO target zone write pointer
position. The zone write lock is released if the clone BIO is failed
before submission or when dm_zone_endio() is called when the clone BIO
completes.

The zone write lock bitmap of the mapped device, together with a bitmap
indicating zone types (conv_zones_bitmap) and the write pointer offset
array (zwp_offset) are allocated and initialized with a full device zone
report in dm_set_zones_restrictions() using the function
dm_revalidate_zones().

For failed operations that may have modified a zone write pointer, the
zone write pointer offset is marked as invalid in dm_zone_endio().
Zones with an invalid write pointer offset are checked and the write
pointer updated using an internal report zone operation when the
faulty zone is accessed again by the user.

All functions added for this emulation have a minimal overhead for
zoned targets natively supporting zone append operations. Regular
device targets are also not affected. The added code also does not
impact builds with CONFIG_BLK_DEV_ZONED disabled by stubbing out all
dm zone related functions.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
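
The core of the emulation, turning a zone append into a regular write at the tracked write pointer, can be sketched in userspace C. This is a heavily simplified model under assumptions: the zone size, zone count, ZWP_INVALID marker, struct names and emulate_zone_append() are all hypothetical, the zone write lock is omitted, and the offset is advanced at map time for brevity.

```c
#include <stdint.h>

#define ZONE_SECTORS	1024u		/* assumed zone size in sectors */
#define NR_ZONES	4u		/* assumed number of zones */
#define ZWP_INVALID	UINT32_MAX	/* assumed "offset unknown" marker */

enum op { OP_WRITE, OP_ZONE_APPEND };

/* Hypothetical clone descriptor: operation, start sector and length. */
struct clone {
	enum op op;
	uint64_t sector;
	uint32_t nr_sectors;
};

/* Per-zone write pointer offsets, relative to the start of each zone. */
struct zoned_dev {
	uint32_t zwp_offset[NR_ZONES];
};

/*
 * Rewrite a zone append clone as a regular write at the zone's tracked
 * write pointer, then advance the tracked offset. Returns 0 on success
 * and -1 if the zone is out of range, its offset is unknown, or the
 * write would not fit in the zone.
 */
static int emulate_zone_append(struct zoned_dev *d, struct clone *c)
{
	uint32_t zno = (uint32_t)(c->sector / ZONE_SECTORS);
	uint32_t wp;

	if (zno >= NR_ZONES)
		return -1;

	wp = d->zwp_offset[zno];
	if (wp == ZWP_INVALID || c->nr_sectors > ZONE_SECTORS - wp)
		return -1;

	c->op = OP_WRITE;
	c->sector = (uint64_t)zno * ZONE_SECTORS + wp;
	d->zwp_offset[zno] = wp + c->nr_sectors;
	return 0;
}
```

In this model a zone append submitted to the start of zone 1, with a tracked offset of 16 sectors, becomes a write at sector 1040. The invalid marker corresponds to the case where a failed operation leaves the offset unknown until the zone is reported again.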

dm: rearrange core declarations for extended use from dm-zone.c · e2118b3c

Committed by Damien Le Moal on May 26, 2021

Move the definitions of struct dm_target_io, struct dm_io and the bits
of the flags field of struct mapped_device from dm.c to dm-core.h to
make them usable from dm-zone.c. For the same reason, declare
dec_pending() in dm-core.h after renaming it to dm_io_dec_pending().
And for symmetry of the function names, introduce the inline helper
dm_io_inc_pending() instead of directly using atomic_inc() calls.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: Forbid requeue of writes to zones · bf14e2b2

Committed by Damien Le Moal on May 26, 2021

A target map method requesting the requeue of a bio with
DM_MAPIO_REQUEUE or completing it with DM_ENDIO_REQUEUE can cause
unaligned write errors if the bio is a write operation targeting a
sequential zone. If a zoned target requests such a requeue, warn about
it and kill the IO.

The function dm_is_zone_write() is introduced to detect write operations
to zoned targets.

This change does not affect the target drivers supporting zoned devices
and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey as
none of these targets ever request a requeue.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
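
The rule described here can be modelled as a small filter on the map result. This is an illustrative sketch only: the enums, struct sketch_bio, is_zone_write() and check_requeue() are hypothetical stand-ins, and the real dm_is_zone_write() inspects the mapped device and the bio rather than plain flags.

```c
#include <stdbool.h>

/* Hypothetical, reduced operation and result codes for illustration. */
enum bio_op { OP_READ, OP_WRITE, OP_ZONE_APPEND };
enum map_result { MAP_REMAPPED, MAP_REQUEUE, MAP_KILL };

struct sketch_bio {
	enum bio_op op;
	bool targets_seq_zone;	/* bio lands in a sequential write zone */
};

/* A "zone write" here: a write or zone append to a sequential zone of a
 * zoned device. */
static bool is_zone_write(bool dev_is_zoned, const struct sketch_bio *bio)
{
	if (!dev_is_zoned)
		return false;
	if (bio->op != OP_WRITE && bio->op != OP_ZONE_APPEND)
		return false;
	return bio->targets_seq_zone;
}

/* Requeueing a zone write could reorder it behind later writes and break
 * write pointer alignment, so the requeue is turned into a kill. */
static enum map_result check_requeue(bool dev_is_zoned,
				     const struct sketch_bio *bio,
				     enum map_result r)
{
	if (r == MAP_REQUEUE && is_zone_write(dev_is_zoned, bio))
		return MAP_KILL;
	return r;
}
```

Reads, and writes to conventional zones or non-zoned devices, pass through with their original result.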

dm: move zone related code to dm-zone.c · 7fc18728

Committed by Damien Le Moal on May 26, 2021

Move core and table code used for zoned targets and conditionally
defined with #ifdef CONFIG_BLK_DEV_ZONED to the new file dm-zone.c.
This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED.
The small helper dm_set_zones_restrictions() is introduced to
initialize a mapped device request queue zone attributes in
dm_table_set_restrictions().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dm: Fix dm_accept_partial_bio() relative to zone management commands · 6842d264

Committed by Damien Le Moal on May 26, 2021

Fix dm_accept_partial_bio() to actually check that zone management
commands are not passed as explained in the function documentation
comment. Also, since a zone append operation cannot be split, add
REQ_OP_ZONE_APPEND as a forbidden command.

Blank lines are added around the group of BUG_ON() calls to make the
code more legible.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

01 Jun, 2021 1 commit

dm: convert to blk_alloc_disk/blk_cleanup_disk · 74fe6ba9

Committed by Christoph Hellwig on May 21, 2021

Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk
helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel: synced successfully nearly 2 years ago