- 05 December 2021, 3 commits
-
-
Committed by Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicit calls from the block drivers that want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
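A minimal sketch of the registration pattern this describes, with the xarray keyed by the gendisk pointer value; dax_get_by_gendisk() is an illustrative name for the lookup side, not necessarily the upstream one:

```c
#include <linux/xarray.h>

static DEFINE_XARRAY(dax_hosts);	/* gendisk pointer value -> dax_device */

int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
{
	/* Block drivers that want DAX now opt in explicitly. */
	return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
}

void dax_remove_host(struct gendisk *disk)
{
	xa_erase(&dax_hosts, (unsigned long)disk);
}

struct dax_device *dax_get_by_gendisk(struct gendisk *disk)
{
	return xa_load(&dax_hosts, (unsigned long)disk);
}
```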
-
Committed by Christoph Hellwig
The device mapper DAX support all hangs off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Committed by Christoph Hellwig
Make sure ->dax_dev is NULL on error so that the cleanup path doesn't trip over an ERR_PTR.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211129102203.2243509-2-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 02 November 2021, 3 commits
-
-
Committed by Ming Lei
To fix the queue quiesce race between driver and block layer (elevator switch, update nr_requests, ...), we need to support concurrent quiesce and unquiesce, which requires the two calls to be balanced. __bind() is only called from dm_swap_table(), at which point the dm device has already been suspended, so there is no need to stop the queue again. This way, request queue quiesce and unquiesce stay balanced.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: e70feb8b ("blk-mq: support concurrent queue quiesce/unquiesce")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Link: https://lore.kernel.org/r/20211021145918.2691762-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
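A sketch of what "balanced" means here: a depth counter under the queue lock, so only the first quiesce and the last unquiesce toggle the flag. Names follow the blk-mq code this builds on, but treat the details as illustrative:

```c
void blk_mq_quiesce_queue_nowait(struct request_queue *q)
{
	unsigned long flags;

	spin_lock_irqsave(&q->queue_lock, flags);
	if (!q->quiesce_depth++)	/* only the first caller quiesces */
		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
	spin_unlock_irqrestore(&q->queue_lock, flags);
}

void blk_mq_unquiesce_queue(struct request_queue *q)
{
	unsigned long flags;
	bool run_queue = false;

	spin_lock_irqsave(&q->queue_lock, flags);
	if (WARN_ON_ONCE(!q->quiesce_depth)) {
		/* unbalanced unquiesce: nothing to undo */
	} else if (!--q->quiesce_depth) {	/* only the last caller resumes */
		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
		run_queue = true;
	}
	spin_unlock_irqrestore(&q->queue_lock, flags);

	if (run_queue)
		blk_mq_run_hw_queues(q, true);
}
```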
-
Committed by Michał Mirosław
Add the device number to the kdmflush workqueue name to help debug CPU usage. Resulting `ps axfu` snippet:

root  3791 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:7]
root  3792 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd_io/253:7]
root  3793 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kcryptd/253:7]
root  3794 0.0 0.0 0 0 ? S  paź19 0:00 \_ [dmcrypt_write/253:7]
root  3814 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:8]
root  3815 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:9]
root  3816 0.0 0.0 0 0 ? I< paź19 0:00 \_ [kdmflush/253:10]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
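The change amounts to a format string carrying the device number; a sketch, with md->name standing in for wherever the major:minor string comes from (an assumption here):

```c
/* Before: all dm flush workqueues shared one name. */
md->wq = alloc_workqueue("kdmflush", WQ_MEM_RECLAIM, 0);

/* After: encode the device number, e.g. "kdmflush/253:7". */
md->wq = alloc_workqueue("kdmflush/%s", WQ_MEM_RECLAIM, 0, md->name);
```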
-
Committed by Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can now fail: one in dm_early_create(), where the error path clearly calls dm_destroy(); the other in the ioctl table_load case. If that fails, userspace needs to call DM_DEV_REMOVE_CMD to clean up the state, as with any other failure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
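In code terms the change is simply to stop discarding add_disk()'s new return value; a sketch of the call site inside dm_setup_md_queue():

```c
int r = add_disk(md->disk);	/* previously returned void */

if (r)
	return r;	/* dm_early_create() calls dm_destroy(); for table_load,
			 * userspace cleans up via DM_DEV_REMOVE_CMD */
```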
-
- 22 October 2021, 2 commits
-
-
Committed by Eric Biggers
blk_keyslot_manager is misnamed because it doesn't necessarily manage keyslots. It actually does several different things:

- It contains the crypto capabilities of the device.

- It provides functions to control the inline encryption hardware. Originally these were just for programming/evicting keyslots; however, new functionality (hardware-wrapped keys) will require new functions here which are unrelated to keyslots. Moreover, device-mapper devices already (ab)use "keyslot_evict" to pass key eviction requests to their underlying devices, even though device-mapper devices don't have any keyslots themselves (so it really should be "evict_key", not "keyslot_evict").

- Sometimes (but not always!) it manages keyslots. Originally it always did, but device-mapper devices don't have keyslots themselves, so they use a "passthrough keyslot manager" which doesn't actually manage keyslots. This hack works, but the terminology is unnatural. Also, some hardware doesn't have keyslots and thus also uses a "passthrough keyslot manager" (support for such hardware is yet to be upstreamed, but it will happen eventually).

Let's stop having keyslot managers which don't actually manage keyslots. Instead, rename blk_keyslot_manager to blk_crypto_profile. This is a fairly big change, since for consistency it also has to update keyslot manager-related function names, variable names, and comments, not just the actual struct name. However, it's still a fairly straightforward change, as it doesn't alter any actual functionality.

Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Eric Biggers
In preparation for renaming struct blk_keyslot_manager to struct blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c source files. Renaming these files separately, before making extensive changes to their contents, makes it easier for git to understand that they were renamed.

Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 October 2021, 1 commit
-
-
Committed by Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. There are two calls to dm_setup_md_queue() which can now fail: one in dm_early_create(), where the error path clearly calls dm_destroy(); the other in the ioctl table_load case. If that fails, userspace needs to call DM_DEV_REMOVE_CMD to clean up the state, as with any other failure.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015233028.2167651-4-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 October 2021, 1 commit
-
-
Committed by Christoph Hellwig
Replace the blk_poll interface, which requires the caller to keep a queue and cookie from the submission, with polling based on the bio. Polling on the bio itself has a few advantages:

- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie separately, and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to support polling of remapped BIOs from stacking drivers
- a lot of code to propagate the cookie back up the submission path can be removed entirely

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
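A before/after sketch of the caller-visible change; the bio_poll() signature is taken from this series, with flags shown as 0 for a plain poll:

```c
/* Before: the caller had to hold on to the queue and an opaque cookie. */
blk_qc_t cookie = submit_bio(bio);
ret = blk_poll(q, cookie, true);

/* After: the polling state travels in the bio itself, which stacking
 * drivers like dm can remap along with the rest of the bio. */
submit_bio(bio);
ret = bio_poll(bio, NULL, 0);
```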
-
- 13 October 2021, 1 commit
-
-
Committed by Jiazi Li
dm_io_dec_pending() calls end_io_acct() first and will then decrement the md in-flight pending count. But if a task is swapping the DM table at the same time, this can result in a crash due to mempool->elements being NULL:

task1                              task2
do_resume
 ->do_suspend
  ->dm_wait_for_completion
                                   bio_endio
                                    ->clone_endio
                                     ->dm_io_dec_pending
                                      ->end_io_acct
                                       ->wakeup task1
 ->dm_swap_table
  ->__bind
   ->__bind_mempools
    ->bioset_exit
     ->mempool_exit
                                      ->free_io

[ 67.330330] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
......
[ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[ 67.330510] pc : mempool_free+0x70/0xa0
[ 67.330515] lr : mempool_free+0x4c/0xa0
[ 67.330520] sp : ffffff8008013b20
[ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
[ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
[ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
[ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
[ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
[ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
[ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
[ 67.330563] x15: 000000000093b41e x14: 0000000000000010
[ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
[ 67.330574] x11: 0000000000000001 x10: 0000000000000001
[ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
[ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
[ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
[ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
[ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
[ 67.330609] Call trace:
[ 67.330616]  mempool_free+0x70/0xa0
[ 67.330627]  bio_put+0xf8/0x110
[ 67.330638]  dec_pending+0x13c/0x230
[ 67.330644]  clone_endio+0x90/0x180
[ 67.330649]  bio_endio+0x198/0x1b8
[ 67.330655]  dec_pending+0x190/0x230
[ 67.330660]  clone_endio+0x90/0x180
[ 67.330665]  bio_endio+0x198/0x1b8
[ 67.330673]  blk_update_request+0x214/0x428
[ 67.330683]  scsi_end_request+0x2c/0x300
[ 67.330688]  scsi_io_completion+0xa0/0x710
[ 67.330695]  scsi_finish_command+0xd8/0x110
[ 67.330700]  scsi_softirq_done+0x114/0x148
[ 67.330708]  blk_done_softirq+0x74/0xd0
[ 67.330716]  __do_softirq+0x18c/0x374
[ 67.330724]  irq_exit+0xb4/0xb8
[ 67.330732]  __handle_domain_irq+0x84/0xc0
[ 67.330737]  gic_handle_irq+0x148/0x1b0
[ 67.330744]  el1_irq+0xe8/0x190
[ 67.330753]  lpm_cpuidle_enter+0x4f8/0x538
[ 67.330759]  cpuidle_enter_state+0x1fc/0x398
[ 67.330764]  cpuidle_enter+0x18/0x20
[ 67.330772]  do_idle+0x1b4/0x290
[ 67.330778]  cpu_startup_entry+0x20/0x28
[ 67.330786]  secondary_start_kernel+0x160/0x170

Fix this by:
1) Establishing pointers to 'struct dm_io' members in dm_io_dec_pending() so that they may be passed into end_io_acct() _after_ free_io() is called.
2) Moving end_io_acct() after free_io().

Cc: stable@vger.kernel.org
Signed-off-by: Jiazi Li <lijiazi@xiaomi.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
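The shape of the fix, per points 1) and 2) above: copy what accounting needs out of the dm_io, free it, and only then account. A sketch; field names follow this description and may differ slightly from the final code:

```c
/* In dm_io_dec_pending(), once the last reference is dropped: */
struct mapped_device *md = io->md;
struct bio *bio = io->orig_bio;
unsigned long start_time = io->start_time;
struct dm_stats_aux stats_aux = io->stats_aux;	/* copied by value */

free_io(md, io);	/* may race with a table swap tearing down the mempool */
end_io_acct(md, bio, start_time, &stats_aux);	/* no dm_io access after free */
```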
-
- 27 August 2021, 1 commit
-
-
Committed by Christoph Hellwig
There is no point in trying to find the dax device if the DAX flag is not set on the queue, as none of the users of the block devices exported by device mapper could make use of the DAX capability.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210826135510.6293-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
- 21 August 2021, 1 commit
-
-
Committed by Tushar Sugandhi
The end users of DM devices/targets may remove and re-create the same device multiple times. IMA does not measure such duplicate events if the configuration option CONFIG_IMA_DISABLE_HTABLE is set to 'n'. To avoid confusion, the end users need some indication on the client when that option is disabled. Add a one-time warning during dm_init() if CONFIG_IMA_DISABLE_HTABLE is set to 'n', to notify end users that duplicate events will not be measured in the IMA log. Also clean up some whitespace in dm_init().

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
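The warning boils down to a compile-time-guarded DMWARN() in dm_init(); a sketch of the guard (the exact condition wiring is an assumption):

```c
#if (IS_ENABLED(CONFIG_IMA) && !IS_ENABLED(CONFIG_IMA_DISABLE_HTABLE))
	DMWARN("CONFIG_IMA_DISABLE_HTABLE is disabled, "
	       "duplicate IMA measurements will not be recorded in the IMA log.");
#endif
```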
-
- 11 August 2021, 1 commit
-
-
Committed by Tushar Sugandhi
DM configures a block device with various target-specific attributes passed to it as a table. DM loads the table and calls each target's respective constructor with the attributes as input parameters. Some of these attributes are critical to ensuring the device meets a certain security bar. Thus, IMA should measure these attributes to ensure they are not tampered with during the lifetime of the device, so that external services can have high confidence in the configuration of the block devices on a given system.

Some devices may have large tables, and a given device may change its state (table-load, suspend, resume, rename, remove, table-clear etc.) many times. Measuring these attributes each time the device changes its state would significantly increase the size of the IMA logs. Further, once configured, these attributes are not expected to change unless a new table is loaded, or a device is removed and recreated. Therefore the clear-text of the attributes should only be measured during table load, and the hash of the active/inactive table should be measured for the remaining device state changes.

Export the IMA function ima_measure_critical_data() to allow measurement of DM device parameters, as well as target-specific attributes, during table load. Compute the hash of the inactive table and store it for measurements during future state changes. If load is called multiple times, update the inactive table hash with the hash of the latest populated table, so that the correct inactive table hash is measured when the device transitions to different states like resume, remove, rename, etc.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # leak fix
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
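A conceptual sketch of the load-time policy described above. Everything here is illustrative: dm_ima_format_table() and dm_ima_hash_buf() are invented helper names, and the ima_measure_critical_data() argument list is an assumption about the exported interface:

```c
static int dm_ima_measure_on_table_load(struct mapped_device *md,
					struct dm_table *t)
{
	size_t len;
	char *buf = dm_ima_format_table(t, &len);	/* clear-text attributes (hypothetical) */

	if (!buf)
		return -ENOMEM;
	/* Clear-text is measured only here, at table load... */
	ima_measure_critical_data("dm", "dm_table_load", buf, len, false, NULL, 0);
	/* ...while later state changes (resume, rename, remove) measure only
	 * the cached hash; a repeated load overwrites the cached value. */
	dm_ima_hash_buf(buf, len, md->ima_inactive_table_hash);
	kfree(buf);
	return 0;
}
```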
-
- 10 August 2021, 3 commits
-
-
Committed by Christoph Hellwig
Device mapper is currently the only outlier that tries to call register_disk after add_disk, leading to a fairly inconsistent state of these block layer data structures. Instead, change device mapper to just register the gendisk later, now that the holder mechanism can cope with that. Note that this introduces a user-visible change: the dm kobject is now only visible after the initial table has been loaded.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
Move setting md->type from both callers into dm_setup_md_queue. This ensures that md->type is only set to a valid value after the queue has been fully set up, something we'll rely on in future changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
md->queue is now always set when md->disk is set, so simplify the conditionals a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20210804094147.459763-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 June 2021, 1 commit
-
-
Committed by Peter Zijlstra
Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses, so that we can use READ_ONCE/WRITE_ONCE as appropriate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org
-
- 05 June 2021, 5 commits
-
-
Committed by Damien Le Moal
For zoned targets that cannot support zone append operations, implement an emulation using regular write operations. If the original BIO submitted by the user is a zone append operation, its clone is changed into a regular write operation directed at the target zone's write pointer position.

To do so, an array of write pointer offsets (write pointer position relative to the start of a zone) is added to struct mapped_device. All operations that modify a sequential zone write pointer (writes, zone reset, zone finish and zone append) are intercepted in __map_bio() and processed using the new function dm_zone_map_bio().

Detection of the target's ability to natively support zone append operations is done from dm_table_set_restrictions() by calling the function dm_set_zones_restrictions(). A target that does not support zone append operations, either because it explicitly declares so using the new struct dm_target field zone_append_not_supported or because the device table contains a non-zoned device, has its mapped device marked with the new flag DMF_ZONE_APPEND_EMULATED. The helper function dm_emulate_zone_append() is introduced to test a mapped device for this new flag.

Atomicity of the zone write pointer tracking and updates is ensured using a zone write locking mechanism based on a bitmap. This is similar to the block layer method, but based on BIOs rather than struct request. A zone write lock is taken in dm_zone_map_bio() for any clone BIO with an operation type that changes the BIO's target zone write pointer position. The zone write lock is released if the clone BIO fails before submission, or in dm_zone_endio() when the clone BIO completes.

The zone write lock bitmap of the mapped device, together with a bitmap indicating zone types (conv_zones_bitmap) and the write pointer offset array (zwp_offset), are allocated and initialized with a full device zone report in dm_set_zones_restrictions() using the function dm_revalidate_zones().

For failed operations that may have modified a zone write pointer, the zone write pointer offset is marked as invalid in dm_zone_endio(). Zones with an invalid write pointer offset are checked, and the write pointer updated using an internal report zones operation, when the faulty zone is next accessed by the user.

All functions added for this emulation have minimal overhead for zoned targets that natively support zone append operations. Regular device targets are also not affected. The added code also does not impact builds with CONFIG_BLK_DEV_ZONED disabled, by stubbing out all dm zone related functions.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
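A condensed sketch of the remap step inside dm_zone_map_bio() for the emulated case. The zwp_offset array follows the description above; the real code's control flow is more involved (zone write locking, invalid-offset handling), so treat this as illustrative:

```c
static blk_status_t dm_zone_map_bio_begin(struct mapped_device *md,
					  unsigned int zno, struct bio *clone)
{
	sector_t zone_sectors = blk_queue_zone_sectors(md->queue);

	switch (bio_op(clone)) {
	case REQ_OP_ZONE_APPEND:
		/* Emulation: rewrite the append as a plain write aimed at
		 * the zone's current write pointer position. */
		clone->bi_opf &= ~REQ_OP_MASK;
		clone->bi_opf |= REQ_OP_WRITE;
		clone->bi_iter.bi_sector =
			zno * zone_sectors + md->zwp_offset[zno];
		fallthrough;
	case REQ_OP_WRITE:
	case REQ_OP_WRITE_ZEROES:
		md->zwp_offset[zno] += bio_sectors(clone);
		break;
	case REQ_OP_ZONE_RESET:
		md->zwp_offset[zno] = 0;
		break;
	case REQ_OP_ZONE_FINISH:
		md->zwp_offset[zno] = zone_sectors;
		break;
	default:
		break;
	}
	return BLK_STS_OK;
}
```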
-
Committed by Damien Le Moal
Move the definitions of struct dm_target_io, struct dm_io, and the bits of the flags field of struct mapped_device from dm.c to dm-core.h to make them usable from dm-zone.c. For the same reason, declare dec_pending() in dm-core.h after renaming it to dm_io_dec_pending(). And for symmetry of the function names, introduce the inline helper dm_io_inc_pending() instead of directly using atomic_inc() calls.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
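Per the description, the renamed and new helpers are thin wrappers over the io reference count; roughly:

```c
/* dm-core.h */
static inline void dm_io_inc_pending(struct dm_io *io)
{
	atomic_inc(&io->io_count);
}

/* dec_pending() renamed and declared here so dm-zone.c can use it too. */
void dm_io_dec_pending(struct dm_io *io, blk_status_t error);
```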
-
Committed by Damien Le Moal
A target map method requesting the requeue of a bio with DM_MAPIO_REQUEUE, or completing it with DM_ENDIO_REQUEUE, can cause unaligned write errors if the bio is a write operation targeting a sequential zone. If a zoned target requests such a requeue, warn about it and kill the IO. The function dm_is_zone_write() is introduced to detect write operations to zoned targets. This change does not affect the target drivers supporting zoned devices and exposing a zoned device, namely dm-crypt, dm-linear and dm-flakey, as none of these targets ever requests a requeue.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Damien Le Moal
Move core and table code used for zoned targets, conditionally defined with #ifdef CONFIG_BLK_DEV_ZONED, to the new file dm-zone.c. This file is conditionally compiled depending on CONFIG_BLK_DEV_ZONED. The small helper dm_set_zones_restrictions() is introduced to initialize a mapped device request queue's zone attributes in dm_table_set_restrictions().

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Damien Le Moal
Fix dm_accept_partial_bio() to actually check that zone management commands are not passed, as explained in the function's documentation comment. Also, since a zone append operation cannot be split, add REQ_OP_ZONE_APPEND as a forbidden command. Blank lines are added around the group of BUG_ON() calls to make the code more legible.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
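With the zone checks and the spacing described, the guard block in dm_accept_partial_bio() ends up roughly like this (a sketch consistent with the documented constraints, not a verbatim copy):

```c
void dm_accept_partial_bio(struct bio *bio, unsigned int n_sectors)
{
	struct dm_target_io *tio = container_of(bio, struct dm_target_io, clone);
	unsigned int bi_size = bio->bi_iter.bi_size >> SECTOR_SHIFT;

	BUG_ON(bio->bi_opf & REQ_PREFLUSH);

	/* Zone management commands cannot be partially completed, and a
	 * zone append cannot be split. */
	BUG_ON(op_is_zone_mgmt(bio_op(bio)));
	BUG_ON(bio_op(bio) == REQ_OP_ZONE_APPEND);

	BUG_ON(bi_size > *tio->len_ptr);
	BUG_ON(n_sectors > bi_size);

	*tio->len_ptr -= bi_size - n_sectors;
	bio->bi_iter.bi_size = n_sectors << SECTOR_SHIFT;
}
```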
-
- 01 June 2021, 1 commit
-
-
Committed by Christoph Hellwig
Convert the dm driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20210521055116.1053587-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 March 2021, 2 commits
-
-
Committed by Christoph Hellwig
These are only used by DM core; DM target modules should only use dm_{get,put}_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mikulas Patocka
Remove a useless "while" loop. If the condition ci.sector_count && !error is true, we go to a branch that ends with "break". If this condition is false, the "while" loop will not be executed again. So the loop can't be executed more than once.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 23 March 2021, 1 commit
-
-
Committed by Mikulas Patocka
When a DM device is first created it doesn't yet have an established capacity, therefore the use of set_capacity_and_notify() should be conditional, given the potential for needless "detected capacity change" pr_info noise even when the capacity is 0. One could argue that the pr_info() in set_capacity_and_notify() is misplaced, but that position is not held uniformly.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: f64d9b2e ("dm: use set_capacity_and_notify")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
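A sketch of the conditional (the exact predicate in the real __bind() may differ): only use the notifying variant when there is a real change to announce, so a freshly created device staying at capacity 0 logs nothing:

```c
if (size || get_capacity(md->disk))
	set_capacity_and_notify(md->disk, size);	/* may pr_info a change */
else
	set_capacity(md->disk, size);			/* silent 0 -> 0 */
```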
-
- 11 February 2021, 2 commits
-
-
Committed by Mikulas Patocka
The system would deadlock when swapping to a dm-crypt device. The reason is that for each incoming write bio, dm-crypt allocates memory that holds the encrypted data. These excessive allocations exhaust all memory and the result is either deadlock or an OOM trigger. This patch limits the number of in-flight swap bios, so that the memory consumed by dm-crypt is bounded. The limit is enforced if the target sets the "limit_swap_bios" variable and if the bio has REQ_SWAP set. Non-swap bios are not affected, because taking the semaphore would cause performance degradation. This is similar to request-based drivers: they will also block when the number of requests is over the limit.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
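The mechanism in sketch form: a counting semaphore on the mapped device bounds in-flight REQ_SWAP bios for targets that set limit_swap_bios. Names follow the description; the real code also latches the limit value, which is omitted here:

```c
/* __map_bio(): */
if (unlikely(ti->limit_swap_bios) && (clone->bi_opf & REQ_SWAP))
	down(&md->swap_bios_semaphore);	/* blocks once the cap is reached */

/* clone_endio(): */
if (unlikely(ti->limit_swap_bios) && (clone->bi_opf & REQ_SWAP))
	up(&md->swap_bios_semaphore);	/* admit the next swap bio */
```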
-
Committed by Satya Tangirala
Update the device-mapper core to support exposing the inline crypto support of the underlying device(s) through the device-mapper device. This works by creating a "passthrough keyslot manager" for the dm device, which declares support for encryption settings that all underlying devices support. When a supported setting is used, the bio cloning code handles cloning the crypto context to the bios for all the underlying devices. When an unsupported setting is used, the blk-crypto fallback is used as usual.

Crypto support on each underlying device is ignored unless the corresponding dm target opts into exposing it. This is needed because for inline crypto to semantically operate on the original bio, the data must not be transformed by the dm target. Thus, targets like dm-linear can expose crypto support of the underlying device, but targets like dm-crypt can't. (dm-crypt could use inline crypto itself, though.)

A DM device's table can only be changed if the "new" inline encryption capabilities are a (*not* necessarily strict) superset of the "old" inline encryption capabilities. Attempts to make changes to the table that result in some inline encryption capability becoming no longer supported will be rejected.

For the sake of clarity, key eviction from underlying devices will be handled in a future patch.

Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 09 February 2021, 1 commit
-
-
Committed by Jeffle Xu
Fix dm_table_supports_dax() and invert the logic of both iterate_devices_callout_fn instances so that all devices' DAX capabilities are properly checked.

Fixes: 545ed20e ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
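Inverting the callout means it now reports devices that are *not* DAX-capable, so a single unsupported device fails the whole check; roughly as follows, where the dax_supported() argument list is an assumption:

```c
/* Returns true for any device that does NOT support DAX... */
static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
				  sector_t start, sector_t len, void *data)
{
	return !dax_supported(dev->dax_dev, dev->bdev, PAGE_SIZE, start, len);
}

/* ...so a target supports DAX only if no underlying device matches. */
supports_dax = !ti->type->iterate_devices(ti, device_not_dax_capable, NULL);
```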
-
- 03 February 2021, 1 commit
-
-
Committed by Jeffle Xu
Add two helper macros calculating the offset of the bio in struct dm_io and struct dm_target_io, respectively. Besides, simplify the front padding calculation in dm_alloc_md_mempools().

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
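The two macros and their use, per this description; a sketch of the shape, where the rounding detail is an assumption:

```c
/* Offset of the embedded clone bio inside each wrapper struct. */
#define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
#define DM_IO_BIO_OFFSET \
	(offsetof(struct dm_io, tio) + offsetof(struct dm_target_io, clone))

/* dm_alloc_md_mempools(): front padding is the per-bio data plus the
 * offset of the bio within the outermost struct the bioset allocates. */
front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) +
	    DM_IO_BIO_OFFSET;
```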
-
- 25 January 2021, 1 commit
-
-
Committed by Christoph Hellwig
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that, the gendisk can be trivially accessed with an extra indirection, but it also allows directly looking up all information related to partition remapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 January 2021, 1 commit
-
-
Committed by Mike Snitzer
There wasn't ever a real need to log an error in the kernel log for ioctls issued with insufficient permissions. Simply return an error; if an admin/user is sufficiently motivated, they can enable DM's dynamic debugging to see an explanation for why the ioctls were disallowed.

Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: e980f623 ("dm: don't allow ioctls to targets that don't map to whole devices")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- 05 December 2020, 6 commits
-
-
Committed by Jeffle Xu
Depth-first splitting was introduced in commit 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk") to fix a potential deadlock caused by the misordered handling of bios in bio_list. There are two paths that submit split bios: dm_wq_work() from the worker thread, and submit_bio() from the application. At that time, dm_wq_work() called __split_and_process_bio() directly and thus could not trigger this issue, since bio_list was not involved there. So the issue could only be triggered from an application calling submit_bio(), and the fix had to check that current->bio_list is non-NULL to distinguish this case.

However, since commit 0c2915b8 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread"), the dm_wq_work() thread calls submit_bio_noacct() and thus also uses bio_list. Since then, all entries into __split_and_process_bio() are under the protection of bio_list, and checking current->bio_list when deciding whether the depth-first principle should be used no longer makes sense: the check now always succeeds.

Fixes: 0c2915b8 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
Fixes sparse warnings:

drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit

Fixes: 971888c4 ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
Remove a redundant dm_put_live_table() in the dm_dax_zero_page_range() error path to fix the sparse warning:

drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock

Fixes: cdf6cdcd ("dm,dax: Add dax zero_page_range operation")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
Committed by Mike Snitzer
Commit 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting") caused a couple of regressions:

1) Using lcm_not_zero() when stacking chunk_sectors was a bug, because chunk_sectors must reflect the most limited of all devices in the IO stack.
2) DM targets that set max_io_len but do _not_ provide an .iterate_devices method no longer had their IO split properly.

And commit 5091cdec ("dm: change max_io_len() to use blk_max_size_offset()") also caused a regression where DM no longer supported varied (per-target) IO splitting. The implication is the potential for severely reduced performance for IO stacks that use a DM target like dm-cache to hide the performance limitations of a slower device (e.g. one that requires 4K IO splitting).

Coming full circle: fix all these issues by discontinuing the stacking of chunk_sectors via ti->max_io_len in dm_calculate_queue_limits(), adding an optional chunk_sectors override argument to blk_max_size_offset(), and updating DM's max_io_len() to pass ti->max_io_len to its blk_max_size_offset() call. Passing an optional chunk_sectors override to blk_max_size_offset() allows reuse of block core's centralized calculation of the max IO size based on the provided offset and split boundary.

Fixes: 882ec4e6 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
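The resulting call shape in max_io_len(), with ti->max_io_len passed as the new optional chunk_sectors override; a sketch close to, but not guaranteed identical to, the final code:

```c
static inline sector_t max_io_len(struct dm_target *ti, sector_t sector)
{
	sector_t target_offset = dm_target_offset(ti, sector);
	sector_t len = max_io_len_target_boundary(ti, target_offset);

	if (!ti->max_io_len)
		return len;
	/* Reuse block core's offset/boundary math with a per-target
	 * chunk_sectors override instead of stacking it into the limits. */
	return min_t(sector_t, len,
		     blk_max_size_offset(ti->table->md->queue, target_offset,
					 ti->max_io_len));
}
```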
-
Committed by Christoph Hellwig
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 December 2020, 1 commit
-
-
Committed by Christoph Hellwig
We can just dereference the pointer in struct gendisk instead. Also remove the now-unused export.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-