Commit 8374cfe6 authored by Linus Torvalds

Merge tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Refactor DM core's mempool allocation so that it is clearer by not
   being split across files.

 - Improve DM core's BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling.

 - Optimize DM core's more common bio splitting by eliminating the use
   of bio cloning with bio_split+bio_chain. Shift that cloning cost to
   the relatively unlikely dm_io requeue case that only occurs during
   error handling. Introduces dm_io_rewind(), which clones a bio that
   reflects the subset of the original bio that must be requeued (a
   minimal model of the rewind bookkeeping follows this list).

 - Remove DM core's dm_table_get_num_targets() wrapper and audit all
   dm_table_get_target() callers.

 - Fix potential for OOM with DM writecache target by setting a default
   MAX_WRITEBACK_JOBS (set to 256MiB or 1/16 of total system memory,
   whichever is smaller).

 - Fix DM writecache target's stats that are reported through
   DM-specific table info.

 - Fix use-after-free crash in dm_sm_register_threshold_callback().

 - Refine DM core's Persistent Reservation handling in preparation for
   broader work Mike Christie is doing to add compatibility with
   Microsoft Windows Failover Cluster.

 - Fix various KASAN reported bugs in the DM raid target.

 - Fix DM raid target crash due to md_handle_request() bio splitting
   that recurses to block core without properly initializing the bio's
   bi_bdev.

 - Fix some code comment typos and fix some Documentation formatting.
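
   For orientation only (not part of the kernel changes themselves), the
   rewind bookkeeping that dm_io_rewind() builds on can be modelled in
   plain userspace C. Everything below (struct seg, iter_rewind, the
   numbers in main) is invented for illustration; the real helper is
   dm_bvec_iter_rewind() in the new drivers/md/dm-io-rewind.c:

	#include <assert.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct seg { unsigned int len; };	/* stands in for struct bio_vec */

	struct seg_iter {			/* stands in for struct bvec_iter */
		unsigned int size;		/* bytes still to be processed */
		unsigned int idx;		/* current segment index */
		unsigned int done;		/* bytes consumed in current segment */
	};

	/* Move @iter backwards by @bytes, crossing segment boundaries as needed. */
	static bool iter_rewind(const struct seg *sv, struct seg_iter *iter,
				unsigned int bytes)
	{
		iter->size += bytes;
		if (bytes <= iter->done) {
			iter->done -= bytes;
			return true;
		}
		bytes -= iter->done;
		for (int idx = (int)iter->idx - 1; idx >= 0; idx--) {
			if (bytes <= sv[idx].len) {
				iter->idx = idx;
				iter->done = sv[idx].len - bytes;
				return true;
			}
			bytes -= sv[idx].len;
		}
		return false;			/* asked to rewind past the start */
	}

	int main(void)
	{
		struct seg sv[] = { {512}, {512}, {1024} };
		/* iterator sits at the start of the last segment */
		struct seg_iter it = { .size = 1024, .idx = 2, .done = 0 };

		assert(iter_rewind(sv, &it, 768));	/* lands 256 bytes into the first segment */
		printf("size=%u idx=%u done=%u\n", it.size, it.idx, it.done);
		return 0;
	}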

* tag 'for-6.0/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (29 commits)
  dm: fix dm-raid crash if md_handle_request() splits bio
  dm raid: fix address sanitizer warning in raid_resume
  dm raid: fix address sanitizer warning in raid_status
  dm: Start pr_preempt from the same starting path
  dm: Fix PR release handling for non All Registrants
  dm: Start pr_reserve from the same starting path
  dm: Allow dm_call_pr to be used for path searches
  dm: return early from dm_pr_call() if DM device is suspended
  dm thin: fix use-after-free crash in dm_sm_register_threshold_callback
  dm writecache: count number of blocks discarded, not number of discard bios
  dm writecache: count number of blocks written, not number of write bios
  dm writecache: count number of blocks read, not number of read bios
  dm writecache: return void from functions
  dm kcopyd: use __GFP_HIGHMEM when allocating pages
  dm writecache: set a default MAX_WRITEBACK_JOBS
  Documentation: dm writecache: Render status list as list
  Documentation: dm writecache: add blank line before optional parameters
  dm snapshot: fix typo in snapshot_map() comment
  dm raid: remove redundant "the" in parse_raid_params() comment
  dm cache: fix typo in 2 comment blocks
  ...
...@@ -20,6 +20,7 @@ Constructor parameters: ...@@ -20,6 +20,7 @@ Constructor parameters:
size) size)
5. the number of optional parameters (the parameters with an argument 5. the number of optional parameters (the parameters with an argument
count as two) count as two)
start_sector n (default: 0) start_sector n (default: 0)
offset from the start of cache device in 512-byte sectors offset from the start of cache device in 512-byte sectors
high_watermark n (default: 50) high_watermark n (default: 50)
...@@ -74,20 +75,21 @@ Constructor parameters: ...@@ -74,20 +75,21 @@ Constructor parameters:
the origin volume in the last n milliseconds the origin volume in the last n milliseconds
Status: Status:
1. error indicator - 0 if there was no error, otherwise error number 1. error indicator - 0 if there was no error, otherwise error number
2. the number of blocks 2. the number of blocks
3. the number of free blocks 3. the number of free blocks
4. the number of blocks under writeback 4. the number of blocks under writeback
5. the number of read requests 5. the number of read blocks
6. the number of read requests that hit the cache 6. the number of read blocks that hit the cache
7. the number of write requests 7. the number of write blocks
8. the number of write requests that hit uncommitted block 8. the number of write blocks that hit uncommitted block
9. the number of write requests that hit committed block 9. the number of write blocks that hit committed block
10. the number of write requests that bypass the cache 10. the number of write blocks that bypass the cache
11. the number of write requests that are allocated in the cache 11. the number of write blocks that are allocated in the cache
12. the number of write requests that are blocked on the freelist 12. the number of write requests that are blocked on the freelist
13. the number of flush requests 13. the number of flush requests
14. the number of discard requests 14. the number of discarded blocks
Messages: Messages:
flush flush
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \ dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \
dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o dm-stats.o \ dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o dm-stats.o \
dm-rq.o dm-rq.o dm-io-rewind.o
dm-multipath-y += dm-path-selector.o dm-mpath.o dm-multipath-y += dm-path-selector.o dm-mpath.o
dm-historical-service-time-y += dm-ps-historical-service-time.o dm-historical-service-time-y += dm-ps-historical-service-time.o
dm-io-affinity-y += dm-ps-io-affinity.o dm-io-affinity-y += dm-ps-io-affinity.o
......
...@@ -131,7 +131,7 @@ void dm_cache_dump(struct dm_cache_metadata *cmd); ...@@ -131,7 +131,7 @@ void dm_cache_dump(struct dm_cache_metadata *cmd);
* hints will be lost. * hints will be lost.
* *
* The hints are indexed by the cblock, but many policies will not * The hints are indexed by the cblock, but many policies will not
* neccessarily have a fast way of accessing efficiently via cblock. So * necessarily have a fast way of accessing efficiently via cblock. So
* rather than querying the policy for each cblock, we let it walk its data * rather than querying the policy for each cblock, we let it walk its data
* structures and fill in the hints in whatever order it wishes. * structures and fill in the hints in whatever order it wishes.
*/ */
......
...@@ -2775,7 +2775,7 @@ static int load_mapping(void *context, dm_oblock_t oblock, dm_cblock_t cblock, ...@@ -2775,7 +2775,7 @@ static int load_mapping(void *context, dm_oblock_t oblock, dm_cblock_t cblock,
/* /*
* The discard block size in the on disk metadata is not * The discard block size in the on disk metadata is not
* neccessarily the same as we're currently using. So we have to * necessarily the same as we're currently using. So we have to
* be careful to only set the discarded attribute if we know it * be careful to only set the discarded attribute if we know it
* covers a complete block of the new size. * covers a complete block of the new size.
*/ */
......
...@@ -22,6 +22,8 @@ ...@@ -22,6 +22,8 @@
#define DM_RESERVED_MAX_IOS 1024 #define DM_RESERVED_MAX_IOS 1024
struct dm_io;
struct dm_kobject_holder { struct dm_kobject_holder {
struct kobject kobj; struct kobject kobj;
struct completion completion; struct completion completion;
...@@ -91,6 +93,14 @@ struct mapped_device { ...@@ -91,6 +93,14 @@ struct mapped_device {
spinlock_t deferred_lock; spinlock_t deferred_lock;
struct bio_list deferred; struct bio_list deferred;
/*
* requeue work context is needed for cloning one new bio
* to represent the dm_io to be requeued, since each
* dm_io may point to the original bio from FS.
*/
struct work_struct requeue_work;
struct dm_io *requeue_list;
void *interface_ptr; void *interface_ptr;
/* /*
...@@ -216,6 +226,13 @@ struct dm_table { ...@@ -216,6 +226,13 @@ struct dm_table {
#endif #endif
}; };
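/*
 * dm_table_get_target() is now an inline accessor; together with reading
 * t->num_targets directly it replaces the removed dm_table_get_num_targets()
 * wrapper.  Out-of-range indexes trap via BUG_ON().
 */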
static inline struct dm_target *dm_table_get_target(struct dm_table *t,
unsigned int index)
{
BUG_ON(index >= t->num_targets);
return t->targets + index;
}
/* /*
* One of these is allocated per clone bio. * One of these is allocated per clone bio.
*/ */
...@@ -230,6 +247,9 @@ struct dm_target_io { ...@@ -230,6 +247,9 @@ struct dm_target_io {
sector_t old_sector; sector_t old_sector;
struct bio clone; struct bio clone;
}; };
#define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
#define DM_IO_BIO_OFFSET \
(offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio))
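/*
 * DM_TARGET_IO_BIO_OFFSET is the offset of the embedded clone bio within
 * struct dm_target_io; DM_IO_BIO_OFFSET is that same clone bio's offset
 * within struct dm_io (which embeds the tio).  Both are added to the
 * per-bio data size when sizing the bio_set front pads, so the space in
 * front of each clone bio can hold its containing dm_target_io / dm_io.
 */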
/* /*
* dm_target_io flags * dm_target_io flags
...@@ -272,7 +292,6 @@ struct dm_io { ...@@ -272,7 +292,6 @@ struct dm_io {
atomic_t io_count; atomic_t io_count;
struct mapped_device *md; struct mapped_device *md;
struct bio *split_bio;
/* The three fields represent mapped part of original bio */ /* The three fields represent mapped part of original bio */
struct bio *orig_bio; struct bio *orig_bio;
unsigned int sector_offset; /* offset to end of orig_bio */ unsigned int sector_offset; /* offset to end of orig_bio */
...@@ -300,6 +319,8 @@ static inline void dm_io_set_flag(struct dm_io *io, unsigned int bit) ...@@ -300,6 +319,8 @@ static inline void dm_io_set_flag(struct dm_io *io, unsigned int bit)
io->flags |= (1U << bit); io->flags |= (1U << bit);
} }
void dm_io_rewind(struct dm_io *io, struct bio_set *bs);
static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj) static inline struct completion *dm_get_completion_from_kobject(struct kobject *kobj)
{ {
return &container_of(kobj, struct dm_kobject_holder, kobj)->completion; return &container_of(kobj, struct dm_kobject_holder, kobj)->completion;
......
...@@ -208,7 +208,7 @@ void dm_ima_measure_on_table_load(struct dm_table *table, unsigned int status_fl ...@@ -208,7 +208,7 @@ void dm_ima_measure_on_table_load(struct dm_table *table, unsigned int status_fl
if (!target_data_buf) if (!target_data_buf)
goto error; goto error;
num_targets = dm_table_get_num_targets(table); num_targets = table->num_targets;
if (dm_ima_alloc_and_copy_device_data(table->md, &device_data_buf, num_targets, noio)) if (dm_ima_alloc_and_copy_device_data(table->md, &device_data_buf, num_targets, noio))
goto error; goto error;
...@@ -237,9 +237,6 @@ void dm_ima_measure_on_table_load(struct dm_table *table, unsigned int status_fl ...@@ -237,9 +237,6 @@ void dm_ima_measure_on_table_load(struct dm_table *table, unsigned int status_fl
for (i = 0; i < num_targets; i++) { for (i = 0; i < num_targets; i++) {
struct dm_target *ti = dm_table_get_target(table, i); struct dm_target *ti = dm_table_get_target(table, i);
if (!ti)
goto error;
last_target_measured = 0; last_target_measured = 0;
/* /*
......
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright 2022 Red Hat, Inc.
*/
#include <linux/bio.h>
#include <linux/blk-crypto.h>
#include <linux/blk-integrity.h>
#include "dm-core.h"
static inline bool dm_bvec_iter_rewind(const struct bio_vec *bv,
struct bvec_iter *iter,
unsigned int bytes)
{
int idx;
iter->bi_size += bytes;
if (bytes <= iter->bi_bvec_done) {
iter->bi_bvec_done -= bytes;
return true;
}
bytes -= iter->bi_bvec_done;
idx = iter->bi_idx - 1;
while (idx >= 0 && bytes && bytes > bv[idx].bv_len) {
bytes -= bv[idx].bv_len;
idx--;
}
if (WARN_ONCE(idx < 0 && bytes,
"Attempted to rewind iter beyond bvec's boundaries\n")) {
iter->bi_size -= bytes;
iter->bi_bvec_done = 0;
iter->bi_idx = 0;
return false;
}
iter->bi_idx = idx;
iter->bi_bvec_done = bv[idx].bv_len - bytes;
return true;
}
#if defined(CONFIG_BLK_DEV_INTEGRITY)
/**
* dm_bio_integrity_rewind - Rewind integrity vector
* @bio: bio whose integrity vector to update
* @bytes_done: number of data bytes to rewind
*
* Description: This function calculates how many integrity bytes the
* number of completed data bytes correspond to and rewind the
* integrity vector accordingly.
*/
static void dm_bio_integrity_rewind(struct bio *bio, unsigned int bytes_done)
{
struct bio_integrity_payload *bip = bio_integrity(bio);
struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);
bip->bip_iter.bi_sector -= bio_integrity_intervals(bi, bytes_done >> 9);
dm_bvec_iter_rewind(bip->bip_vec, &bip->bip_iter, bytes);
}
#else /* CONFIG_BLK_DEV_INTEGRITY */
static inline void dm_bio_integrity_rewind(struct bio *bio,
unsigned int bytes_done)
{
return;
}
#endif
#if defined(CONFIG_BLK_INLINE_ENCRYPTION)
/* Decrements @dun by @dec, treating @dun as a multi-limb integer. */
static void dm_bio_crypt_dun_decrement(u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE],
unsigned int dec)
{
int i;
for (i = 0; dec && i < BLK_CRYPTO_DUN_ARRAY_SIZE; i++) {
u64 prev = dun[i];
dun[i] -= dec;
if (dun[i] > prev)
dec = 1;
else
dec = 0;
}
}
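/*
 * Worked example for dm_bio_crypt_dun_decrement(), using two limbs for
 * brevity: dun = { 0x0, 0x1 } encodes 2^64; decrementing by 3 underflows
 * limb 0 to 0xfffffffffffffffd and borrows one from limb 1, giving
 * { 0xfffffffffffffffd, 0x0 } == 2^64 - 3.
 */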
static void dm_bio_crypt_rewind(struct bio *bio, unsigned int bytes)
{
struct bio_crypt_ctx *bc = bio->bi_crypt_context;
dm_bio_crypt_dun_decrement(bc->bc_dun,
bytes >> bc->bc_key->data_unit_size_bits);
}
#else /* CONFIG_BLK_INLINE_ENCRYPTION */
static inline void dm_bio_crypt_rewind(struct bio *bio, unsigned int bytes)
{
return;
}
#endif
static inline void dm_bio_rewind_iter(const struct bio *bio,
struct bvec_iter *iter, unsigned int bytes)
{
iter->bi_sector -= bytes >> 9;
/* No advance means no rewind */
if (bio_no_advance_iter(bio))
iter->bi_size += bytes;
else
dm_bvec_iter_rewind(bio->bi_io_vec, iter, bytes);
}
/**
* dm_bio_rewind - update ->bi_iter of @bio by rewinding @bytes.
* @bio: bio to rewind
* @bytes: how many bytes to rewind
*
* WARNING:
* Caller must ensure that @bio has a fixed end sector, to allow
* rewinding from end of bio and restoring its original position.
* Caller is also responsible for restoring bio's size.
*/
static void dm_bio_rewind(struct bio *bio, unsigned bytes)
{
if (bio_integrity(bio))
dm_bio_integrity_rewind(bio, bytes);
if (bio_has_crypt_ctx(bio))
dm_bio_crypt_rewind(bio, bytes);
dm_bio_rewind_iter(bio, &bio->bi_iter, bytes);
}
void dm_io_rewind(struct dm_io *io, struct bio_set *bs)
{
struct bio *orig = io->orig_bio;
struct bio *new_orig = bio_alloc_clone(orig->bi_bdev, orig,
GFP_NOIO, bs);
/*
* dm_bio_rewind can restore to previous position since the
* end sector is fixed for original bio, but we still need
* to restore bio's size manually (using io->sectors).
*/
dm_bio_rewind(new_orig, ((io->sector_offset << 9) -
orig->bi_iter.bi_size));
bio_trim(new_orig, 0, io->sectors);
bio_chain(new_orig, orig);
/*
* __bi_remaining was increased (by dm_split_and_process_bio),
* so must drop the one added in bio_chain.
*/
atomic_dec(&orig->__bi_remaining);
io->orig_bio = new_orig;
}
...@@ -832,7 +832,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param) ...@@ -832,7 +832,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
if (!(param->flags & DM_QUERY_INACTIVE_TABLE_FLAG)) { if (!(param->flags & DM_QUERY_INACTIVE_TABLE_FLAG)) {
if (get_disk_ro(disk)) if (get_disk_ro(disk))
param->flags |= DM_READONLY_FLAG; param->flags |= DM_READONLY_FLAG;
param->target_count = dm_table_get_num_targets(table); param->target_count = table->num_targets;
} }
param->flags |= DM_ACTIVE_PRESENT_FLAG; param->flags |= DM_ACTIVE_PRESENT_FLAG;
...@@ -845,7 +845,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param) ...@@ -845,7 +845,7 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
if (table) { if (table) {
if (!(dm_table_get_mode(table) & FMODE_WRITE)) if (!(dm_table_get_mode(table) & FMODE_WRITE))
param->flags |= DM_READONLY_FLAG; param->flags |= DM_READONLY_FLAG;
param->target_count = dm_table_get_num_targets(table); param->target_count = table->num_targets;
} }
dm_put_live_table(md, srcu_idx); dm_put_live_table(md, srcu_idx);
} }
...@@ -1248,7 +1248,7 @@ static void retrieve_status(struct dm_table *table, ...@@ -1248,7 +1248,7 @@ static void retrieve_status(struct dm_table *table,
type = STATUSTYPE_INFO; type = STATUSTYPE_INFO;
/* Get all the target info */ /* Get all the target info */
num_targets = dm_table_get_num_targets(table); num_targets = table->num_targets;
for (i = 0; i < num_targets; i++) { for (i = 0; i < num_targets; i++) {
struct dm_target *ti = dm_table_get_target(table, i); struct dm_target *ti = dm_table_get_target(table, i);
size_t l; size_t l;
......
...@@ -219,7 +219,7 @@ static struct page_list *alloc_pl(gfp_t gfp) ...@@ -219,7 +219,7 @@ static struct page_list *alloc_pl(gfp_t gfp)
if (!pl) if (!pl)
return NULL; return NULL;
pl->page = alloc_page(gfp); pl->page = alloc_page(gfp | __GFP_HIGHMEM);
if (!pl->page) { if (!pl->page) {
kfree(pl); kfree(pl);
return NULL; return NULL;
......
...@@ -1369,7 +1369,7 @@ static int parse_raid_params(struct raid_set *rs, struct dm_arg_set *as, ...@@ -1369,7 +1369,7 @@ static int parse_raid_params(struct raid_set *rs, struct dm_arg_set *as,
} }
rs->md.bitmap_info.daemon_sleep = value; rs->md.bitmap_info.daemon_sleep = value;
} else if (!strcasecmp(key, dm_raid_arg_name_by_flag(CTR_FLAG_DATA_OFFSET))) { } else if (!strcasecmp(key, dm_raid_arg_name_by_flag(CTR_FLAG_DATA_OFFSET))) {
/* Userspace passes new data_offset after having extended the the data image LV */ /* Userspace passes new data_offset after having extended the data image LV */
if (test_and_set_bit(__CTR_FLAG_DATA_OFFSET, &rs->ctr_flags)) { if (test_and_set_bit(__CTR_FLAG_DATA_OFFSET, &rs->ctr_flags)) {
rs->ti->error = "Only one data_offset argument pair allowed"; rs->ti->error = "Only one data_offset argument pair allowed";
return -EINVAL; return -EINVAL;
...@@ -3097,6 +3097,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv) ...@@ -3097,6 +3097,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
INIT_WORK(&rs->md.event_work, do_table_event); INIT_WORK(&rs->md.event_work, do_table_event);
ti->private = rs; ti->private = rs;
ti->num_flush_bios = 1; ti->num_flush_bios = 1;
ti->needs_bio_set_dev = true;
/* Restore any requested new layout for conversion decision */ /* Restore any requested new layout for conversion decision */
rs_config_restore(rs, &rs_layout); rs_config_restore(rs, &rs_layout);
...@@ -3509,7 +3510,7 @@ static void raid_status(struct dm_target *ti, status_type_t type, ...@@ -3509,7 +3510,7 @@ static void raid_status(struct dm_target *ti, status_type_t type,
{ {
struct raid_set *rs = ti->private; struct raid_set *rs = ti->private;
struct mddev *mddev = &rs->md; struct mddev *mddev = &rs->md;
struct r5conf *conf = mddev->private; struct r5conf *conf = rs_is_raid456(rs) ? mddev->private : NULL;
int i, max_nr_stripes = conf ? conf->max_nr_stripes : 0; int i, max_nr_stripes = conf ? conf->max_nr_stripes : 0;
unsigned long recovery; unsigned long recovery;
unsigned int raid_param_cnt = 1; /* at least 1 for chunksize */ unsigned int raid_param_cnt = 1; /* at least 1 for chunksize */
...@@ -3819,7 +3820,7 @@ static void attempt_restore_of_faulty_devices(struct raid_set *rs) ...@@ -3819,7 +3820,7 @@ static void attempt_restore_of_faulty_devices(struct raid_set *rs)
memset(cleared_failed_devices, 0, sizeof(cleared_failed_devices)); memset(cleared_failed_devices, 0, sizeof(cleared_failed_devices));
for (i = 0; i < mddev->raid_disks; i++) { for (i = 0; i < rs->raid_disks; i++) {
r = &rs->dev[i].rdev; r = &rs->dev[i].rdev;
/* HM FIXME: enhance journal device recovery processing */ /* HM FIXME: enhance journal device recovery processing */
if (test_bit(Journal, &r->flags)) if (test_bit(Journal, &r->flags))
......
...@@ -43,7 +43,6 @@ unsigned dm_get_reserved_rq_based_ios(void) ...@@ -43,7 +43,6 @@ unsigned dm_get_reserved_rq_based_ios(void)
return __dm_get_module_param(&reserved_rq_based_ios, return __dm_get_module_param(&reserved_rq_based_ios,
RESERVED_REQUEST_BASED_IOS, DM_RESERVED_MAX_IOS); RESERVED_REQUEST_BASED_IOS, DM_RESERVED_MAX_IOS);
} }
EXPORT_SYMBOL_GPL(dm_get_reserved_rq_based_ios);
static unsigned dm_get_blk_mq_nr_hw_queues(void) static unsigned dm_get_blk_mq_nr_hw_queues(void)
{ {
......
...@@ -2026,7 +2026,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio) ...@@ -2026,7 +2026,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
/* /*
* Write to snapshot - higher level takes care of RW/RO * Write to snapshot - higher level takes care of RW/RO
* flags so we should only get this if we are * flags so we should only get this if we are
* writeable. * writable.
*/ */
if (bio_data_dir(bio) == WRITE) { if (bio_data_dir(bio) == WRITE) {
pe = __lookup_pending_exception(s, chunk); pe = __lookup_pending_exception(s, chunk);
......
(This diff has been collapsed.)
...@@ -2045,10 +2045,13 @@ int dm_pool_register_metadata_threshold(struct dm_pool_metadata *pmd, ...@@ -2045,10 +2045,13 @@ int dm_pool_register_metadata_threshold(struct dm_pool_metadata *pmd,
dm_sm_threshold_fn fn, dm_sm_threshold_fn fn,
void *context) void *context)
{ {
int r; int r = -EINVAL;
pmd_write_lock_in_core(pmd); pmd_write_lock_in_core(pmd);
r = dm_sm_register_threshold_callback(pmd->metadata_sm, threshold, fn, context); if (!pmd->fail_io) {
r = dm_sm_register_threshold_callback(pmd->metadata_sm,
threshold, fn, context);
}
pmd_write_unlock(pmd); pmd_write_unlock(pmd);
return r; return r;
......
...@@ -3375,8 +3375,10 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv) ...@@ -3375,8 +3375,10 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
calc_metadata_threshold(pt), calc_metadata_threshold(pt),
metadata_low_callback, metadata_low_callback,
pool); pool);
if (r) if (r) {
ti->error = "Error registering metadata threshold";
goto out_flags_changed; goto out_flags_changed;
}
dm_pool_register_pre_commit_callback(pool->pmd, dm_pool_register_pre_commit_callback(pool->pmd,
metadata_pre_commit_callback, pool); metadata_pre_commit_callback, pool);
......
...@@ -527,11 +527,10 @@ static int verity_verify_io(struct dm_verity_io *io) ...@@ -527,11 +527,10 @@ static int verity_verify_io(struct dm_verity_io *io)
if (v->validated_blocks) if (v->validated_blocks)
set_bit(cur_block, v->validated_blocks); set_bit(cur_block, v->validated_blocks);
continue; continue;
} } else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA,
else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA, cur_block, NULL, &start) == 0) {
cur_block, NULL, &start) == 0)
continue; continue;
else { } else {
if (bio->bi_status) { if (bio->bi_status) {
/* /*
* Error correction failed; Just return error * Error correction failed; Just return error
......
...@@ -22,7 +22,7 @@ ...@@ -22,7 +22,7 @@
#define HIGH_WATERMARK 50 #define HIGH_WATERMARK 50
#define LOW_WATERMARK 45 #define LOW_WATERMARK 45
#define MAX_WRITEBACK_JOBS 0 #define MAX_WRITEBACK_JOBS min(0x10000000 / PAGE_SIZE, totalram_pages() / 16)
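/* New default: min(0x10000000 / PAGE_SIZE, totalram_pages() / 16), i.e.
 * 256 MiB worth of pages (65536 with 4 KiB pages) or 1/16 of system RAM,
 * whichever is smaller; the previous value of 0 imposed no limit. */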
#define ENDIO_LATENCY 16 #define ENDIO_LATENCY 16
#define WRITEBACK_LATENCY 64 #define WRITEBACK_LATENCY 64
#define AUTOCOMMIT_BLOCKS_SSD 65536 #define AUTOCOMMIT_BLOCKS_SSD 65536
...@@ -1325,8 +1325,8 @@ enum wc_map_op { ...@@ -1325,8 +1325,8 @@ enum wc_map_op {
WC_MAP_ERROR, WC_MAP_ERROR,
}; };
static enum wc_map_op writecache_map_remap_origin(struct dm_writecache *wc, struct bio *bio, static void writecache_map_remap_origin(struct dm_writecache *wc, struct bio *bio,
struct wc_entry *e) struct wc_entry *e)
{ {
if (e) { if (e) {
sector_t next_boundary = sector_t next_boundary =
...@@ -1334,8 +1334,6 @@ static enum wc_map_op writecache_map_remap_origin(struct dm_writecache *wc, stru ...@@ -1334,8 +1334,6 @@ static enum wc_map_op writecache_map_remap_origin(struct dm_writecache *wc, stru
if (next_boundary < bio->bi_iter.bi_size >> SECTOR_SHIFT) if (next_boundary < bio->bi_iter.bi_size >> SECTOR_SHIFT)
dm_accept_partial_bio(bio, next_boundary); dm_accept_partial_bio(bio, next_boundary);
} }
return WC_MAP_REMAP_ORIGIN;
} }
static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio *bio) static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio *bio)
...@@ -1362,14 +1360,16 @@ static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio * ...@@ -1362,14 +1360,16 @@ static enum wc_map_op writecache_map_read(struct dm_writecache *wc, struct bio *
map_op = WC_MAP_REMAP; map_op = WC_MAP_REMAP;
} }
} else { } else {
map_op = writecache_map_remap_origin(wc, bio, e); writecache_map_remap_origin(wc, bio, e);
wc->stats.reads += (bio->bi_iter.bi_size - wc->block_size) >> wc->block_size_bits;
map_op = WC_MAP_REMAP_ORIGIN;
} }
return map_op; return map_op;
} }
static enum wc_map_op writecache_bio_copy_ssd(struct dm_writecache *wc, struct bio *bio, static void writecache_bio_copy_ssd(struct dm_writecache *wc, struct bio *bio,
struct wc_entry *e, bool search_used) struct wc_entry *e, bool search_used)
{ {
unsigned bio_size = wc->block_size; unsigned bio_size = wc->block_size;
sector_t start_cache_sec = cache_sector(wc, e); sector_t start_cache_sec = cache_sector(wc, e);
...@@ -1409,14 +1409,15 @@ static enum wc_map_op writecache_bio_copy_ssd(struct dm_writecache *wc, struct b ...@@ -1409,14 +1409,15 @@ static enum wc_map_op writecache_bio_copy_ssd(struct dm_writecache *wc, struct b
bio->bi_iter.bi_sector = start_cache_sec; bio->bi_iter.bi_sector = start_cache_sec;
dm_accept_partial_bio(bio, bio_size >> SECTOR_SHIFT); dm_accept_partial_bio(bio, bio_size >> SECTOR_SHIFT);
wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
wc->stats.writes_allocate += (bio->bi_iter.bi_size - wc->block_size) >> wc->block_size_bits;
if (unlikely(wc->uncommitted_blocks >= wc->autocommit_blocks)) { if (unlikely(wc->uncommitted_blocks >= wc->autocommit_blocks)) {
wc->uncommitted_blocks = 0; wc->uncommitted_blocks = 0;
queue_work(wc->writeback_wq, &wc->flush_work); queue_work(wc->writeback_wq, &wc->flush_work);
} else { } else {
writecache_schedule_autocommit(wc); writecache_schedule_autocommit(wc);
} }
return WC_MAP_REMAP;
} }
static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio *bio) static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio *bio)
...@@ -1426,9 +1427,10 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio ...@@ -1426,9 +1427,10 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
do { do {
bool found_entry = false; bool found_entry = false;
bool search_used = false; bool search_used = false;
wc->stats.writes++; if (writecache_has_error(wc)) {
if (writecache_has_error(wc)) wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
return WC_MAP_ERROR; return WC_MAP_ERROR;
}
e = writecache_find_entry(wc, bio->bi_iter.bi_sector, 0); e = writecache_find_entry(wc, bio->bi_iter.bi_sector, 0);
if (e) { if (e) {
if (!writecache_entry_is_committed(wc, e)) { if (!writecache_entry_is_committed(wc, e)) {
...@@ -1452,9 +1454,11 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio ...@@ -1452,9 +1454,11 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
if (unlikely(!e)) { if (unlikely(!e)) {
if (!WC_MODE_PMEM(wc) && !found_entry) { if (!WC_MODE_PMEM(wc) && !found_entry) {
direct_write: direct_write:
wc->stats.writes_around++;
e = writecache_find_entry(wc, bio->bi_iter.bi_sector, WFE_RETURN_FOLLOWING); e = writecache_find_entry(wc, bio->bi_iter.bi_sector, WFE_RETURN_FOLLOWING);
return writecache_map_remap_origin(wc, bio, e); writecache_map_remap_origin(wc, bio, e);
wc->stats.writes_around += bio->bi_iter.bi_size >> wc->block_size_bits;
wc->stats.writes += bio->bi_iter.bi_size >> wc->block_size_bits;
return WC_MAP_REMAP_ORIGIN;
} }
wc->stats.writes_blocked_on_freelist++; wc->stats.writes_blocked_on_freelist++;
writecache_wait_on_freelist(wc); writecache_wait_on_freelist(wc);
...@@ -1465,10 +1469,13 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio ...@@ -1465,10 +1469,13 @@ static enum wc_map_op writecache_map_write(struct dm_writecache *wc, struct bio
wc->uncommitted_blocks++; wc->uncommitted_blocks++;
wc->stats.writes_allocate++; wc->stats.writes_allocate++;
bio_copy: bio_copy:
if (WC_MODE_PMEM(wc)) if (WC_MODE_PMEM(wc)) {
bio_copy_block(wc, bio, memory_data(wc, e)); bio_copy_block(wc, bio, memory_data(wc, e));
else wc->stats.writes++;
return writecache_bio_copy_ssd(wc, bio, e, search_used); } else {
writecache_bio_copy_ssd(wc, bio, e, search_used);
return WC_MAP_REMAP;
}
} while (bio->bi_iter.bi_size); } while (bio->bi_iter.bi_size);
if (unlikely(bio->bi_opf & REQ_FUA || wc->uncommitted_blocks >= wc->autocommit_blocks)) if (unlikely(bio->bi_opf & REQ_FUA || wc->uncommitted_blocks >= wc->autocommit_blocks))
...@@ -1503,7 +1510,7 @@ static enum wc_map_op writecache_map_flush(struct dm_writecache *wc, struct bio ...@@ -1503,7 +1510,7 @@ static enum wc_map_op writecache_map_flush(struct dm_writecache *wc, struct bio
static enum wc_map_op writecache_map_discard(struct dm_writecache *wc, struct bio *bio) static enum wc_map_op writecache_map_discard(struct dm_writecache *wc, struct bio *bio)
{ {
wc->stats.discards++; wc->stats.discards += bio->bi_iter.bi_size >> wc->block_size_bits;
if (writecache_has_error(wc)) if (writecache_has_error(wc))
return WC_MAP_ERROR; return WC_MAP_ERROR;
......
...@@ -273,11 +273,8 @@ static int device_not_zone_append_capable(struct dm_target *ti, ...@@ -273,11 +273,8 @@ static int device_not_zone_append_capable(struct dm_target *ti,
static bool dm_table_supports_zone_append(struct dm_table *t) static bool dm_table_supports_zone_append(struct dm_table *t)
{ {
struct dm_target *ti; for (unsigned int i = 0; i < t->num_targets; i++) {
unsigned int i; struct dm_target *ti = dm_table_get_target(t, i);
for (i = 0; i < dm_table_get_num_targets(t); i++) {
ti = dm_table_get_target(t, i);
if (ti->emulate_zone_append) if (ti->emulate_zone_append)
return false; return false;
......
...@@ -88,10 +88,6 @@ struct clone_info { ...@@ -88,10 +88,6 @@ struct clone_info {
bool submit_as_polled:1; bool submit_as_polled:1;
}; };
#define DM_TARGET_IO_BIO_OFFSET (offsetof(struct dm_target_io, clone))
#define DM_IO_BIO_OFFSET \
(offsetof(struct dm_target_io, clone) + offsetof(struct dm_io, tio))
static inline struct dm_target_io *clone_to_tio(struct bio *clone) static inline struct dm_target_io *clone_to_tio(struct bio *clone)
{ {
return container_of(clone, struct dm_target_io, clone); return container_of(clone, struct dm_target_io, clone);
...@@ -415,7 +411,7 @@ static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo) ...@@ -415,7 +411,7 @@ static int dm_blk_getgeo(struct block_device *bdev, struct hd_geometry *geo)
static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx, static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
struct block_device **bdev) struct block_device **bdev)
{ {
struct dm_target *tgt; struct dm_target *ti;
struct dm_table *map; struct dm_table *map;
int r; int r;
...@@ -426,17 +422,17 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx, ...@@ -426,17 +422,17 @@ static int dm_prepare_ioctl(struct mapped_device *md, int *srcu_idx,
return r; return r;
/* We only support devices that have a single target */ /* We only support devices that have a single target */
if (dm_table_get_num_targets(map) != 1) if (map->num_targets != 1)
return r; return r;
tgt = dm_table_get_target(map, 0); ti = dm_table_get_target(map, 0);
if (!tgt->type->prepare_ioctl) if (!ti->type->prepare_ioctl)
return r; return r;
if (dm_suspended_md(md)) if (dm_suspended_md(md))
return -EAGAIN; return -EAGAIN;
r = tgt->type->prepare_ioctl(tgt, bdev); r = ti->type->prepare_ioctl(ti, bdev);
if (r == -ENOTCONN && !fatal_signal_pending(current)) { if (r == -ENOTCONN && !fatal_signal_pending(current)) {
dm_put_live_table(md, *srcu_idx); dm_put_live_table(md, *srcu_idx);
msleep(10); msleep(10);
...@@ -578,9 +574,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) ...@@ -578,9 +574,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
struct bio *clone; struct bio *clone;
clone = bio_alloc_clone(NULL, bio, GFP_NOIO, &md->mempools->io_bs); clone = bio_alloc_clone(NULL, bio, GFP_NOIO, &md->mempools->io_bs);
/* Set default bdev, but target must bio_set_dev() before issuing IO */
clone->bi_bdev = md->disk->part0;
tio = clone_to_tio(clone); tio = clone_to_tio(clone);
tio->flags = 0; tio->flags = 0;
dm_tio_set_flag(tio, DM_TIO_INSIDE_DM_IO); dm_tio_set_flag(tio, DM_TIO_INSIDE_DM_IO);
...@@ -594,7 +587,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) ...@@ -594,7 +587,6 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
atomic_set(&io->io_count, 2); atomic_set(&io->io_count, 2);
this_cpu_inc(*md->pending_io); this_cpu_inc(*md->pending_io);
io->orig_bio = bio; io->orig_bio = bio;
io->split_bio = NULL;
io->md = md; io->md = md;
spin_lock_init(&io->lock); spin_lock_init(&io->lock);
io->start_time = jiffies; io->start_time = jiffies;
...@@ -614,6 +606,7 @@ static void free_io(struct dm_io *io) ...@@ -614,6 +606,7 @@ static void free_io(struct dm_io *io)
static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti,
unsigned target_bio_nr, unsigned *len, gfp_t gfp_mask) unsigned target_bio_nr, unsigned *len, gfp_t gfp_mask)
{ {
struct mapped_device *md = ci->io->md;
struct dm_target_io *tio; struct dm_target_io *tio;
struct bio *clone; struct bio *clone;
...@@ -623,14 +616,10 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, ...@@ -623,14 +616,10 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti,
/* alloc_io() already initialized embedded clone */ /* alloc_io() already initialized embedded clone */
clone = &tio->clone; clone = &tio->clone;
} else { } else {
struct mapped_device *md = ci->io->md;
clone = bio_alloc_clone(NULL, ci->bio, gfp_mask, clone = bio_alloc_clone(NULL, ci->bio, gfp_mask,
&md->mempools->bs); &md->mempools->bs);
if (!clone) if (!clone)
return NULL; return NULL;
/* Set default bdev, but target must bio_set_dev() before issuing IO */
clone->bi_bdev = md->disk->part0;
/* REQ_DM_POLL_LIST shouldn't be inherited */ /* REQ_DM_POLL_LIST shouldn't be inherited */
clone->bi_opf &= ~REQ_DM_POLL_LIST; clone->bi_opf &= ~REQ_DM_POLL_LIST;
...@@ -646,6 +635,11 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti, ...@@ -646,6 +635,11 @@ static struct bio *alloc_tio(struct clone_info *ci, struct dm_target *ti,
tio->len_ptr = len; tio->len_ptr = len;
tio->old_sector = 0; tio->old_sector = 0;
/* Set default bdev, but target must bio_set_dev() before issuing IO */
clone->bi_bdev = md->disk->part0;
if (unlikely(ti->needs_bio_set_dev))
bio_set_dev(clone, md->disk->part0);
if (len) { if (len) {
clone->bi_iter.bi_size = to_bytes(*len); clone->bi_iter.bi_size = to_bytes(*len);
if (bio_integrity(clone)) if (bio_integrity(clone))
...@@ -884,22 +878,63 @@ static int __noflush_suspending(struct mapped_device *md) ...@@ -884,22 +878,63 @@ static int __noflush_suspending(struct mapped_device *md)
return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags); return test_bit(DMF_NOFLUSH_SUSPENDING, &md->flags);
} }
static void dm_io_complete(struct dm_io *io) static void dm_requeue_add_io(struct dm_io *io, bool first_stage)
{ {
blk_status_t io_error;
struct mapped_device *md = io->md; struct mapped_device *md = io->md;
struct bio *bio = io->split_bio ? io->split_bio : io->orig_bio;
if (io->status == BLK_STS_DM_REQUEUE) { if (first_stage) {
struct dm_io *next = md->requeue_list;
md->requeue_list = io;
io->next = next;
} else {
bio_list_add_head(&md->deferred, io->orig_bio);
}
}
static void dm_kick_requeue(struct mapped_device *md, bool first_stage)
{
if (first_stage)
queue_work(md->wq, &md->requeue_work);
else
queue_work(md->wq, &md->work);
}
/*
* Return true if the dm_io's original bio is requeued.
* io->status is updated with error if requeue disallowed.
*/
static bool dm_handle_requeue(struct dm_io *io, bool first_stage)
{
struct bio *bio = io->orig_bio;
bool handle_requeue = (io->status == BLK_STS_DM_REQUEUE);
bool handle_polled_eagain = ((io->status == BLK_STS_AGAIN) &&
(bio->bi_opf & REQ_POLLED));
struct mapped_device *md = io->md;
bool requeued = false;
if (handle_requeue || handle_polled_eagain) {
unsigned long flags; unsigned long flags;
if (bio->bi_opf & REQ_POLLED) {
/*
* Upper layer won't help us poll split bio
* (io->orig_bio may only reflect a subset of the
* pre-split original) so clear REQ_POLLED.
*/
bio_clear_polled(bio);
}
/* /*
* Target requested pushing back the I/O. * Target requested pushing back the I/O or
* polled IO hit BLK_STS_AGAIN.
*/ */
spin_lock_irqsave(&md->deferred_lock, flags); spin_lock_irqsave(&md->deferred_lock, flags);
if (__noflush_suspending(md) && if ((__noflush_suspending(md) &&
!WARN_ON_ONCE(dm_is_zone_write(md, bio))) { !WARN_ON_ONCE(dm_is_zone_write(md, bio))) ||
/* NOTE early return due to BLK_STS_DM_REQUEUE below */ handle_polled_eagain || first_stage) {
bio_list_add_head(&md->deferred, bio); dm_requeue_add_io(io, first_stage);
requeued = true;
} else { } else {
/* /*
* noflush suspend was interrupted or this is * noflush suspend was interrupted or this is
...@@ -910,6 +945,23 @@ static void dm_io_complete(struct dm_io *io) ...@@ -910,6 +945,23 @@ static void dm_io_complete(struct dm_io *io)
spin_unlock_irqrestore(&md->deferred_lock, flags); spin_unlock_irqrestore(&md->deferred_lock, flags);
} }
if (requeued)
dm_kick_requeue(md, first_stage);
return requeued;
}
static void __dm_io_complete(struct dm_io *io, bool first_stage)
{
struct bio *bio = io->orig_bio;
struct mapped_device *md = io->md;
blk_status_t io_error;
bool requeued;
requeued = dm_handle_requeue(io, first_stage);
if (requeued && first_stage)
return;
io_error = io->status; io_error = io->status;
if (dm_io_flagged(io, DM_IO_ACCOUNTED)) if (dm_io_flagged(io, DM_IO_ACCOUNTED))
dm_end_io_acct(io); dm_end_io_acct(io);
...@@ -929,23 +981,9 @@ static void dm_io_complete(struct dm_io *io) ...@@ -929,23 +981,9 @@ static void dm_io_complete(struct dm_io *io)
if (unlikely(wq_has_sleeper(&md->wait))) if (unlikely(wq_has_sleeper(&md->wait)))
wake_up(&md->wait); wake_up(&md->wait);
if (io_error == BLK_STS_DM_REQUEUE || io_error == BLK_STS_AGAIN) { /* Return early if the original bio was requeued */
if (bio->bi_opf & REQ_POLLED) { if (requeued)
/* return;
* Upper layer won't help us poll split bio (io->orig_bio
* may only reflect a subset of the pre-split original)
* so clear REQ_POLLED in case of requeue.
*/
bio_clear_polled(bio);
if (io_error == BLK_STS_AGAIN) {
/* io_uring doesn't handle BLK_STS_AGAIN (yet) */
queue_io(md, bio);
return;
}
}
if (io_error == BLK_STS_DM_REQUEUE)
return;
}
if (bio_is_flush_with_data(bio)) { if (bio_is_flush_with_data(bio)) {
/* /*
...@@ -962,6 +1000,58 @@ static void dm_io_complete(struct dm_io *io) ...@@ -962,6 +1000,58 @@ static void dm_io_complete(struct dm_io *io)
} }
} }
static void dm_wq_requeue_work(struct work_struct *work)
{
struct mapped_device *md = container_of(work, struct mapped_device,
requeue_work);
unsigned long flags;
struct dm_io *io;
/* reuse deferred lock to simplify dm_handle_requeue */
spin_lock_irqsave(&md->deferred_lock, flags);
io = md->requeue_list;
md->requeue_list = NULL;
spin_unlock_irqrestore(&md->deferred_lock, flags);
while (io) {
struct dm_io *next = io->next;
dm_io_rewind(io, &md->queue->bio_split);
io->next = NULL;
__dm_io_complete(io, false);
io = next;
}
}
/*
* Two staged requeue:
*
* 1) io->orig_bio points to the real original bio, and the part mapped to
* this io must be requeued, instead of other parts of the original bio.
*
* 2) io->orig_bio points to new cloned bio which matches the requeued dm_io.
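 *
 * Stage one links the dm_io onto md->requeue_list and kicks
 * dm_wq_requeue_work(), which uses dm_io_rewind() to point io->orig_bio
 * at a clone covering just the part mapped to this dm_io; stage two then
 * treats that clone like any other pushed-back bio (deferring it via
 * md->deferred when requeue is permitted).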
*/
static void dm_io_complete(struct dm_io *io)
{
bool first_requeue;
/*
* Only dm_io that has been split needs two stage requeue, otherwise
* we may run into long bio clone chain during suspend and OOM could
* be triggered.
*
* Also flush data dm_io won't be marked as DM_IO_WAS_SPLIT, so they
* also aren't handled via the first stage requeue.
*/
if (dm_io_flagged(io, DM_IO_WAS_SPLIT))
first_requeue = true;
else
first_requeue = false;
__dm_io_complete(io, first_requeue);
}
/* /*
* Decrements the number of outstanding ios that a bio has been * Decrements the number of outstanding ios that a bio has been
* cloned into, completing the original io if necc. * cloned into, completing the original io if necc.
...@@ -1240,6 +1330,7 @@ static size_t dm_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, ...@@ -1240,6 +1330,7 @@ static size_t dm_dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff,
void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
{ {
struct dm_target_io *tio = clone_to_tio(bio); struct dm_target_io *tio = clone_to_tio(bio);
struct dm_io *io = tio->io;
unsigned bio_sectors = bio_sectors(bio); unsigned bio_sectors = bio_sectors(bio);
BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO)); BUG_ON(dm_tio_flagged(tio, DM_TIO_IS_DUPLICATE_BIO));
...@@ -1255,8 +1346,9 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors) ...@@ -1255,8 +1346,9 @@ void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors)
* __split_and_process_bio() may have already saved mapped part * __split_and_process_bio() may have already saved mapped part
* for accounting but it is being reduced so update accordingly. * for accounting but it is being reduced so update accordingly.
*/ */
dm_io_set_flag(tio->io, DM_IO_WAS_SPLIT); dm_io_set_flag(io, DM_IO_WAS_SPLIT);
tio->io->sectors = n_sectors; io->sectors = n_sectors;
io->sector_offset = bio_sectors(io->orig_bio);
} }
EXPORT_SYMBOL_GPL(dm_accept_partial_bio); EXPORT_SYMBOL_GPL(dm_accept_partial_bio);
...@@ -1379,17 +1471,7 @@ static void setup_split_accounting(struct clone_info *ci, unsigned len) ...@@ -1379,17 +1471,7 @@ static void setup_split_accounting(struct clone_info *ci, unsigned len)
*/ */
dm_io_set_flag(io, DM_IO_WAS_SPLIT); dm_io_set_flag(io, DM_IO_WAS_SPLIT);
io->sectors = len; io->sectors = len;
} io->sector_offset = bio_sectors(ci->bio);
if (static_branch_unlikely(&stats_enabled) &&
unlikely(dm_stats_used(&io->md->stats))) {
/*
* Save bi_sector in terms of its offset from end of
* original bio, only needed for DM-stats' benefit.
* - saved regardless of whether split needed so that
* dm_accept_partial_bio() doesn't need to.
*/
io->sector_offset = bio_end_sector(ci->bio) - ci->sector;
} }
} }
...@@ -1423,11 +1505,11 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci, ...@@ -1423,11 +1505,11 @@ static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
} }
static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti, static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
unsigned num_bios, unsigned *len) unsigned int num_bios, unsigned *len)
{ {
struct bio_list blist = BIO_EMPTY_LIST; struct bio_list blist = BIO_EMPTY_LIST;
struct bio *clone; struct bio *clone;
int ret = 0; unsigned int ret = 0;
switch (num_bios) { switch (num_bios) {
case 0: case 0:
...@@ -1455,8 +1537,7 @@ static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti, ...@@ -1455,8 +1537,7 @@ static int __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
static void __send_empty_flush(struct clone_info *ci) static void __send_empty_flush(struct clone_info *ci)
{ {
unsigned target_nr = 0; struct dm_table *t = ci->map;
struct dm_target *ti;
struct bio flush_bio; struct bio flush_bio;
/* /*
...@@ -1471,8 +1552,9 @@ static void __send_empty_flush(struct clone_info *ci) ...@@ -1471,8 +1552,9 @@ static void __send_empty_flush(struct clone_info *ci)
ci->sector_count = 0; ci->sector_count = 0;
ci->io->tio.clone.bi_iter.bi_size = 0; ci->io->tio.clone.bi_iter.bi_size = 0;
while ((ti = dm_table_get_target(ci->map, target_nr++))) { for (unsigned int i = 0; i < t->num_targets; i++) {
int bios; unsigned int bios;
struct dm_target *ti = dm_table_get_target(t, i);
atomic_add(ti->num_flush_bios, &ci->io->io_count); atomic_add(ti->num_flush_bios, &ci->io->io_count);
bios = __send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL); bios = __send_duplicate_bios(ci, ti, ti->num_flush_bios, NULL);
...@@ -1492,7 +1574,7 @@ static void __send_changing_extent_only(struct clone_info *ci, struct dm_target ...@@ -1492,7 +1574,7 @@ static void __send_changing_extent_only(struct clone_info *ci, struct dm_target
unsigned num_bios) unsigned num_bios)
{ {
unsigned len; unsigned len;
int bios; unsigned int bios;
len = min_t(sector_t, ci->sector_count, len = min_t(sector_t, ci->sector_count,
max_io_len_target_boundary(ti, dm_target_offset(ti, ci->sector))); max_io_len_target_boundary(ti, dm_target_offset(ti, ci->sector)));
...@@ -1691,11 +1773,9 @@ static void dm_split_and_process_bio(struct mapped_device *md, ...@@ -1691,11 +1773,9 @@ static void dm_split_and_process_bio(struct mapped_device *md,
* Remainder must be passed to submit_bio_noacct() so it gets handled * Remainder must be passed to submit_bio_noacct() so it gets handled
* *after* bios already submitted have been completely processed. * *after* bios already submitted have been completely processed.
*/ */
WARN_ON_ONCE(!dm_io_flagged(io, DM_IO_WAS_SPLIT)); bio_trim(bio, io->sectors, ci.sector_count);
io->split_bio = bio_split(bio, io->sectors, GFP_NOIO, trace_block_split(bio, bio->bi_iter.bi_sector);
&md->queue->bio_split); bio_inc_remaining(bio);
bio_chain(io->split_bio, bio);
trace_block_split(io->split_bio, bio->bi_iter.bi_sector);
submit_bio_noacct(bio); submit_bio_noacct(bio);
out: out:
/* /*
...@@ -1971,9 +2051,11 @@ static struct mapped_device *alloc_dev(int minor) ...@@ -1971,9 +2051,11 @@ static struct mapped_device *alloc_dev(int minor)
init_waitqueue_head(&md->wait); init_waitqueue_head(&md->wait);
INIT_WORK(&md->work, dm_wq_work); INIT_WORK(&md->work, dm_wq_work);
INIT_WORK(&md->requeue_work, dm_wq_requeue_work);
init_waitqueue_head(&md->eventq); init_waitqueue_head(&md->eventq);
init_completion(&md->kobj_holder.completion); init_completion(&md->kobj_holder.completion);
md->requeue_list = NULL;
md->swap_bios = get_swap_bios(); md->swap_bios = get_swap_bios();
sema_init(&md->swap_bios_semaphore, md->swap_bios); sema_init(&md->swap_bios_semaphore, md->swap_bios);
mutex_init(&md->swap_bios_lock); mutex_init(&md->swap_bios_lock);
...@@ -2980,54 +3062,6 @@ int dm_noflush_suspending(struct dm_target *ti) ...@@ -2980,54 +3062,6 @@ int dm_noflush_suspending(struct dm_target *ti)
} }
EXPORT_SYMBOL_GPL(dm_noflush_suspending); EXPORT_SYMBOL_GPL(dm_noflush_suspending);
struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
unsigned per_io_data_size, unsigned min_pool_size,
bool integrity, bool poll)
{
struct dm_md_mempools *pools = kzalloc_node(sizeof(*pools), GFP_KERNEL, md->numa_node_id);
unsigned int pool_size = 0;
unsigned int front_pad, io_front_pad;
int ret;
if (!pools)
return NULL;
switch (type) {
case DM_TYPE_BIO_BASED:
case DM_TYPE_DAX_BIO_BASED:
pool_size = max(dm_get_reserved_bio_based_ios(), min_pool_size);
front_pad = roundup(per_io_data_size, __alignof__(struct dm_target_io)) + DM_TARGET_IO_BIO_OFFSET;
io_front_pad = roundup(per_io_data_size, __alignof__(struct dm_io)) + DM_IO_BIO_OFFSET;
ret = bioset_init(&pools->io_bs, pool_size, io_front_pad, poll ? BIOSET_PERCPU_CACHE : 0);
if (ret)
goto out;
if (integrity && bioset_integrity_create(&pools->io_bs, pool_size))
goto out;
break;
case DM_TYPE_REQUEST_BASED:
pool_size = max(dm_get_reserved_rq_based_ios(), min_pool_size);
front_pad = offsetof(struct dm_rq_clone_bio_info, clone);
/* per_io_data_size is used for blk-mq pdu at queue allocation */
break;
default:
BUG();
}
ret = bioset_init(&pools->bs, pool_size, front_pad, 0);
if (ret)
goto out;
if (integrity && bioset_integrity_create(&pools->bs, pool_size))
goto out;
return pools;
out:
dm_free_md_mempools(pools);
return NULL;
}
void dm_free_md_mempools(struct dm_md_mempools *pools) void dm_free_md_mempools(struct dm_md_mempools *pools)
{ {
if (!pools) if (!pools)
...@@ -3043,11 +3077,14 @@ struct dm_pr { ...@@ -3043,11 +3077,14 @@ struct dm_pr {
u64 old_key; u64 old_key;
u64 new_key; u64 new_key;
u32 flags; u32 flags;
bool abort;
bool fail_early; bool fail_early;
int ret;
enum pr_type type;
}; };
static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn, static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
void *data) struct dm_pr *pr)
{ {
struct mapped_device *md = bdev->bd_disk->private_data; struct mapped_device *md = bdev->bd_disk->private_data;
struct dm_table *table; struct dm_table *table;
...@@ -3059,15 +3096,21 @@ static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn, ...@@ -3059,15 +3096,21 @@ static int dm_call_pr(struct block_device *bdev, iterate_devices_callout_fn fn,
goto out; goto out;
/* We only support devices that have a single target */ /* We only support devices that have a single target */
if (dm_table_get_num_targets(table) != 1) if (table->num_targets != 1)
goto out; goto out;
ti = dm_table_get_target(table, 0); ti = dm_table_get_target(table, 0);
if (dm_suspended_md(md)) {
ret = -EAGAIN;
goto out;
}
ret = -EINVAL; ret = -EINVAL;
if (!ti->type->iterate_devices) if (!ti->type->iterate_devices)
goto out; goto out;
ret = ti->type->iterate_devices(ti, fn, data); ti->type->iterate_devices(ti, fn, pr);
ret = 0;
out: out:
dm_put_live_table(md, srcu_idx); dm_put_live_table(md, srcu_idx);
return ret; return ret;
...@@ -3081,10 +3124,24 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev, ...@@ -3081,10 +3124,24 @@ static int __dm_pr_register(struct dm_target *ti, struct dm_dev *dev,
{ {
struct dm_pr *pr = data; struct dm_pr *pr = data;
const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops; const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
int ret;
if (!ops || !ops->pr_register) {
pr->ret = -EOPNOTSUPP;
return -1;
}
if (!ops || !ops->pr_register) ret = ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags);
return -EOPNOTSUPP; if (!ret)
return ops->pr_register(dev->bdev, pr->old_key, pr->new_key, pr->flags); return 0;
if (!pr->ret)
pr->ret = ret;
if (pr->fail_early)
return -1;
return 0;
} }
static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
...@@ -3095,82 +3152,145 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key, ...@@ -3095,82 +3152,145 @@ static int dm_pr_register(struct block_device *bdev, u64 old_key, u64 new_key,
.new_key = new_key, .new_key = new_key,
.flags = flags, .flags = flags,
.fail_early = true, .fail_early = true,
.ret = 0,
}; };
int ret; int ret;
ret = dm_call_pr(bdev, __dm_pr_register, &pr); ret = dm_call_pr(bdev, __dm_pr_register, &pr);
if (ret && new_key) { if (ret) {
/* unregister all paths if we failed to register any path */ /* Didn't even get to register a path */
pr.old_key = new_key; return ret;
pr.new_key = 0;
pr.flags = 0;
pr.fail_early = false;
dm_call_pr(bdev, __dm_pr_register, &pr);
} }
if (!pr.ret)
return 0;
ret = pr.ret;
if (!new_key)
return ret;
/* unregister all paths if we failed to register any path */
pr.old_key = new_key;
pr.new_key = 0;
pr.flags = 0;
pr.fail_early = false;
(void) dm_call_pr(bdev, __dm_pr_register, &pr);
return ret; return ret;
} }
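/*
 * Each __dm_pr_* callback below follows the same convention: store the
 * pr_ops result in dm_pr.ret and return -1 to stop dm_call_pr() from
 * iterating further paths, or 0 to keep going.
 */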
static int __dm_pr_reserve(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
struct dm_pr *pr = data;
const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
if (!ops || !ops->pr_reserve) {
pr->ret = -EOPNOTSUPP;
return -1;
}
pr->ret = ops->pr_reserve(dev->bdev, pr->old_key, pr->type, pr->flags);
if (!pr->ret)
return -1;
return 0;
}
static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type, static int dm_pr_reserve(struct block_device *bdev, u64 key, enum pr_type type,
u32 flags) u32 flags)
{ {
struct mapped_device *md = bdev->bd_disk->private_data; struct dm_pr pr = {
const struct pr_ops *ops; .old_key = key,
int r, srcu_idx; .flags = flags,
.type = type,
.fail_early = false,
.ret = 0,
};
int ret;
r = dm_prepare_ioctl(md, &srcu_idx, &bdev); ret = dm_call_pr(bdev, __dm_pr_reserve, &pr);
if (r < 0) if (ret)
goto out; return ret;
ops = bdev->bd_disk->fops->pr_ops; return pr.ret;
if (ops && ops->pr_reserve) }
r = ops->pr_reserve(bdev, key, type, flags);
else /*
r = -EOPNOTSUPP; * If there is a non-All Registrants type of reservation, the release must be
out: * sent down the holding path. For the cases where there is no reservation or
dm_unprepare_ioctl(md, srcu_idx); * the path is not the holder the device will also return success, so we must
return r; * try each path to make sure we got the correct path.
*/
static int __dm_pr_release(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
struct dm_pr *pr = data;
const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
if (!ops || !ops->pr_release) {
pr->ret = -EOPNOTSUPP;
return -1;
}
pr->ret = ops->pr_release(dev->bdev, pr->old_key, pr->type);
if (pr->ret)
return -1;
return 0;
} }
static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type) static int dm_pr_release(struct block_device *bdev, u64 key, enum pr_type type)
{ {
struct mapped_device *md = bdev->bd_disk->private_data; struct dm_pr pr = {
const struct pr_ops *ops; .old_key = key,
int r, srcu_idx; .type = type,
.fail_early = false,
};
int ret;
r = dm_prepare_ioctl(md, &srcu_idx, &bdev); ret = dm_call_pr(bdev, __dm_pr_release, &pr);
if (r < 0) if (ret)
goto out; return ret;
ops = bdev->bd_disk->fops->pr_ops; return pr.ret;
if (ops && ops->pr_release) }
r = ops->pr_release(bdev, key, type);
else static int __dm_pr_preempt(struct dm_target *ti, struct dm_dev *dev,
r = -EOPNOTSUPP; sector_t start, sector_t len, void *data)
out: {
dm_unprepare_ioctl(md, srcu_idx); struct dm_pr *pr = data;
return r; const struct pr_ops *ops = dev->bdev->bd_disk->fops->pr_ops;
if (!ops || !ops->pr_preempt) {
pr->ret = -EOPNOTSUPP;
return -1;
}
pr->ret = ops->pr_preempt(dev->bdev, pr->old_key, pr->new_key, pr->type,
pr->abort);
if (!pr->ret)
return -1;
return 0;
} }
static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key, static int dm_pr_preempt(struct block_device *bdev, u64 old_key, u64 new_key,
enum pr_type type, bool abort) enum pr_type type, bool abort)
{ {
struct mapped_device *md = bdev->bd_disk->private_data; struct dm_pr pr = {
const struct pr_ops *ops; .new_key = new_key,
int r, srcu_idx; .old_key = old_key,
.type = type,
.fail_early = false,
};
int ret;
r = dm_prepare_ioctl(md, &srcu_idx, &bdev); ret = dm_call_pr(bdev, __dm_pr_preempt, &pr);
if (r < 0) if (ret)
goto out; return ret;
ops = bdev->bd_disk->fops->pr_ops; return pr.ret;
if (ops && ops->pr_preempt)
r = ops->pr_preempt(bdev, old_key, new_key, type, abort);
else
r = -EOPNOTSUPP;
out:
dm_unprepare_ioctl(md, srcu_idx);
return r;
} }
static int dm_pr_clear(struct block_device *bdev, u64 key) static int dm_pr_clear(struct block_device *bdev, u64 key)
......
...@@ -53,7 +53,6 @@ struct dm_io; ...@@ -53,7 +53,6 @@ struct dm_io;
*---------------------------------------------------------------*/ *---------------------------------------------------------------*/
void dm_table_event_callback(struct dm_table *t, void dm_table_event_callback(struct dm_table *t,
void (*fn)(void *), void *context); void (*fn)(void *), void *context);
struct dm_target *dm_table_get_target(struct dm_table *t, unsigned int index);
struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector); struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector);
bool dm_table_has_no_data_devices(struct dm_table *table); bool dm_table_has_no_data_devices(struct dm_table *table);
int dm_calculate_queue_limits(struct dm_table *table, int dm_calculate_queue_limits(struct dm_table *table,
...@@ -218,9 +217,6 @@ void dm_kcopyd_exit(void); ...@@ -218,9 +217,6 @@ void dm_kcopyd_exit(void);
/* /*
* Mempool operations * Mempool operations
*/ */
struct dm_md_mempools *dm_alloc_md_mempools(struct mapped_device *md, enum dm_queue_mode type,
unsigned per_io_data_size, unsigned min_pool_size,
bool integrity, bool poll);
void dm_free_md_mempools(struct dm_md_mempools *pools); void dm_free_md_mempools(struct dm_md_mempools *pools);
/* /*
......
...@@ -373,6 +373,12 @@ struct dm_target { ...@@ -373,6 +373,12 @@ struct dm_target {
* after returning DM_MAPIO_SUBMITTED from its map function. * after returning DM_MAPIO_SUBMITTED from its map function.
*/ */
bool accounts_remapped_io:1; bool accounts_remapped_io:1;
/*
* Set if the target will submit the DM bio without first calling
* bio_set_dev(). NOTE: ideally a target should _not_ need this.
*/
bool needs_bio_set_dev:1;
}; };
void *dm_per_bio_data(struct bio *bio, size_t data_size); void *dm_per_bio_data(struct bio *bio, size_t data_size);
...@@ -561,7 +567,6 @@ void dm_sync_table(struct mapped_device *md); ...@@ -561,7 +567,6 @@ void dm_sync_table(struct mapped_device *md);
* Queries * Queries
*/ */
sector_t dm_table_get_size(struct dm_table *t); sector_t dm_table_get_size(struct dm_table *t);
unsigned int dm_table_get_num_targets(struct dm_table *t);
fmode_t dm_table_get_mode(struct dm_table *t); fmode_t dm_table_get_mode(struct dm_table *t);
struct mapped_device *dm_table_get_md(struct dm_table *t); struct mapped_device *dm_table_get_md(struct dm_table *t);
const char *dm_table_device_name(struct dm_table *t); const char *dm_table_device_name(struct dm_table *t);
......
...@@ -286,9 +286,9 @@ enum { ...@@ -286,9 +286,9 @@ enum {
#define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl) #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
#define DM_VERSION_MAJOR 4 #define DM_VERSION_MAJOR 4
#define DM_VERSION_MINOR 46 #define DM_VERSION_MINOR 47
#define DM_VERSION_PATCHLEVEL 0 #define DM_VERSION_PATCHLEVEL 0
#define DM_VERSION_EXTRA "-ioctl (2022-02-22)" #define DM_VERSION_EXTRA "-ioctl (2022-07-28)"
/* Status bits */ /* Status bits */
#define DM_READONLY_FLAG (1 << 0) /* In/Out */ #define DM_READONLY_FLAG (1 << 0) /* In/Out */
......