Commits · 0355941d8702ed40e5b211134c6ab394f19c4f03 · openanolis / cloud-kernel

24 Jul, 2019 1 commit

dm: fix clone_bio() to trigger blk_recount_segments() · 0355941d

Authored by Mike Snitzer on Jan 16, 2019

commit 57c36519e4b949f89381053f7283f5d605595b42 upstream.

DM's clone_bio() now benefits from using bio_trim() by fixing the fact
that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does;
which triggers blk_recount_segments() via bio_phys_segments().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

0355941d

17 Apr, 2019 1 commit

dm: revert ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") · 4f5c99e0

Authored by Mikulas Patocka on Mar 21, 2019

commit 75ae193626de3238ca5fb895868ec91c94e63b1b upstream.

The limit was already incorporated into dm-crypt with commit 4e870e94
("dm crypt: fix error with too large bios"), so we don't need to apply
it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
wrong anyway because the variable ti->max_io_len is supposed to be in
units of 512-byte sectors, not in bytes.

Reduction of the limit to 1048576 sectors could even cause data
corruption in rare cases. Suppose that we have a dm-striped device with
a stripe size of 768MiB. The target will call dm_set_target_max_io_len
with the value 1572864. The buggy code would reduce it to 1048576. Now
the dm-core will erroneously split the bios on a 1048576-sector boundary
instead of a 1572864-sector boundary and pass these stripe-crossing bios
to the striped target.

Cc: stable@vger.kernel.org # v4.16+
Fixes: 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

4f5c99e0
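
The sector arithmetic in the 4f5c99e0 message can be checked with a small userspace sketch. It assumes 4 KiB pages and a BIO_MAX_PAGES of 256, which reproduces the 1048576 figure quoted in the message; `mib_to_sectors()`, `sectors_to_boundary()` and `crosses_stripe()` are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512ULL

/* Assumed values for illustration only: 256 pages of 4 KiB each. */
#define ASSUMED_BIO_MAX_PAGES 256ULL
#define ASSUMED_PAGE_SIZE 4096ULL

/* Convert a size in MiB to a count of 512-byte sectors. */
static uint64_t mib_to_sectors(uint64_t mib)
{
	return mib * 1024ULL * 1024ULL / SKETCH_SECTOR_SIZE;
}

/* Sectors left from 'sector' up to the next multiple of max_io_len. */
static uint64_t sectors_to_boundary(uint64_t sector, uint64_t max_io_len)
{
	assert(max_io_len > 0);
	return max_io_len - (sector % max_io_len);
}

/* Returns 1 if an I/O of 'len' sectors starting at 'sector' spans two stripes. */
static int crosses_stripe(uint64_t sector, uint64_t len, uint64_t stripe)
{
	return (sector / stripe) != ((sector + len - 1) / stripe);
}
```

With the 768MiB stripe (1572864 sectors), a bio that starts at sector 1048576 and is cut at the bogus 1048576-sector limit ends at sector 2097151 and straddles the stripe boundary at 1572864. Cut at the real limit, it is 524288 sectors long and stays inside one stripe.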

20 Dec, 2018 1 commit

dm: call blk_queue_split() to impose device limits on bios · b543b5c0

Authored by Mike Snitzer on Dec 03, 2018

commit 89f5fa47 upstream.

Otherwise the incoming bios, of various types, won't be shaped based on
the DM device's advertised limits.

Depends-on: af67c31f ("blk: remove bio_set arg from blk_queue_split()")
Fixes: 744889b7 ("block: don't deal with discard limit in blkdev_issue_discard()")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

b543b5c0

10 Oct, 2018 1 commit

dm: fix report zone remapping to account for partition offset · 9864cd5d

Authored by Damien Le Moal on Oct 09, 2018

If dm-linear or dm-flakey are layered on top of a partition of a zoned
block device, remapping of the start sector and write pointer position
of the zones reported by a report zones BIO must be modified to account
for the target table entry mapping (start offset within the device and
entry mapping with the dm device).  If the target's backing device is a
partition of a whole disk, the start sector on the physical device of
the partition must also be accounted for when modifying the zone
information.  However, dm_remap_zone_report() was not considering this
last case, resulting in incorrect zone information remapping with
targets using disk partitions.

Fix this by calculating the target backing device start sector using
the position of the completed report zones BIO and the unchanged
position and size of the original report zone BIO. With this value
calculated, the start sector and write pointer position of the target
zones can be correctly remapped.

Fixes: 10999307 ("dm: introduce dm_remap_zone_report()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9864cd5d

18 Jul, 2018 1 commit

block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3

Authored by Michael Callahan on Jul 18, 2018

Add and use a new op_stat_group() function for indexing partition stat
fields rather than indexing them by rq_data_dir() or bio_data_dir().
This function works similarly to op_is_sync() in that it takes the
request::cmd_flags or bio::bi_opf flags and determines which stats
should get updated.

In addition, the second parameter to generic_start_io_acct() and
generic_end_io_acct() is now a REQ_OP rather than simply a read or
write bit and it uses op_stat_group() on the parameter to determine
the stat group.

Note that the partition in_flight counts are not part of the per-cpu
statistics and as such are not indexed via this function.  They are now
indexed by op_is_write().

tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ddcf35d3
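
The indexing idea in the ddcf35d3 message, picking the stat slot from the operation rather than from a bare read/write direction bit, might look roughly like the userspace sketch below. Every name prefixed with `sketch_` or `SKETCH_` is hypothetical, and the opcode values and the body of `sketch_op_stat_group()` are assumptions made for illustration; the real mapping lives in the block layer headers and may differ.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch; write-like operations have bit 0 set. */
enum sketch_op {
	SKETCH_OP_READ = 0,
	SKETCH_OP_WRITE = 1,
	SKETCH_OP_DISCARD = 3
};

/* Stat groups, used directly as array indexes. */
enum sketch_stat_group {
	SKETCH_STAT_READ = 0,
	SKETCH_STAT_WRITE = 1,
	SKETCH_NR_STAT_GROUPS
};

struct sketch_disk_stats {
	unsigned long ios[SKETCH_NR_STAT_GROUPS];
	unsigned long sectors[SKETCH_NR_STAT_GROUPS];
};

/* Assumed convention: the low bit of the opcode marks a data-out operation. */
static int sketch_op_is_write(unsigned int op)
{
	return op & 1;
}

/* Map an opcode to the stat group it is accounted under. */
static int sketch_op_stat_group(unsigned int op)
{
	return sketch_op_is_write(op) ? SKETCH_STAT_WRITE : SKETCH_STAT_READ;
}

/* Account one I/O by indexing the per-group counters with the stat group. */
static void sketch_account_io(struct sketch_disk_stats *st, unsigned int op,
			      unsigned long nr_sectors)
{
	int sgrp = sketch_op_stat_group(op);

	assert(sgrp >= 0 && sgrp < SKETCH_NR_STAT_GROUPS);
	st->ios[sgrp]++;
	st->sectors[sgrp] += nr_sectors;
}
```

Keeping the mapping in one helper means a later change to the grouping only touches that helper, not every accounting call site.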

29 Jun, 2018 1 commit

dm: prevent DAX mounts if not supported · dbc62659

Authored by Ross Zwisler on Jun 26, 2018

Currently device_supports_dax() just checks to see if the QUEUE_FLAG_DAX
flag is set on the device's request queue to decide whether or not the
device supports filesystem DAX.  Really we should be using
bdev_dax_supported() like filesystems do at mount time.  This performs
other tests like checking to make sure the dax_direct_access() path works.

We also explicitly clear QUEUE_FLAG_DAX on the DM device's request queue if
any of the underlying devices do not support DAX.  This makes the handling
of QUEUE_FLAG_DAX consistent with the setting/clearing of most other flags
in dm_table_set_restrictions().

Now that bdev_dax_supported() explicitly checks for QUEUE_FLAG_DAX, this
will ensure that filesystems built upon DM devices will only be able to
mount with DAX if all underlying devices also support DAX.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: commit 545ed20e ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dbc62659

23 Jun, 2018 1 commit

dm: use bio_split() when splitting out the already processed bio · f21c601a

Authored by Mike Snitzer on Jun 15, 2018

Use of bio_clone_bioset() is inefficient if there is no need to clone
the original bio's bio_vec array.  Best to use the bio_clone_fast()
variant.  Also, just using bio_advance() is only part of what is needed
to properly setup the clone -- it doesn't account for the various
bio_integrity() related work that also needs to be performed (see
bio_split).

Address both of these issues by switching from bio_clone_bioset() to
bio_split().

Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable@vger.kernel.org # 4.15+, requires removal of '&' before md->queue->bio_split
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

f21c601a

08 Jun, 2018 1 commit

dm: use bioset_init_from_src() to copy bio_set · 2a2a4c51

Authored by Jens Axboe on Jun 07, 2018

We can't just copy and clear a bio_set; use the bio helper to
set up a new bio_set with the settings from another one.

Fixes: 6f1c819c ("dm: convert to bioset_init()/mempool_init()")
Reported-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Venkat R.B <vrbagal1@linux.vnet.ibm.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2a2a4c51

31 May, 2018 2 commits

dm: convert to bioset_init()/mempool_init() · 6f1c819c

Authored by Kent Overstreet on May 20, 2018

Convert dm to embedded bio sets.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6f1c819c

block: convert bounce, q->bio_split to bioset_init()/mempool_init() · 338aa96d

Authored by Kent Overstreet on May 20, 2018

Convert the core block functionality to embedded bio sets.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

338aa96d

23 May, 2018 1 commit

dax: Introduce a ->copy_to_iter dax operation · b3a9a0c3

Authored by Dan Williams on May 02, 2018

Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b3a9a0c3

01 May, 2018 1 commit

dm: fix some sparse warnings and whitespace in dax methods · 3d97c829

Authored by Mike Snitzer on Apr 30, 2018

Eliminate these sparse warnings:
drivers/md/dm.c:1062:9: warning: context imbalance in 'dm_dax_direct_access' - unexpected unlock
drivers/md/dm.c:1086:9: warning: context imbalance in 'dm_dax_copy_from_iter' - unexpected unlock

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

3d97c829

05 Apr, 2018 2 commits

dm: remove fmode_t argument from .prepare_ioctl hook · 5bd5e8d8

Authored by Mike Snitzer on Apr 03, 2018

Use the fmode_t that is passed to dm_blk_ioctl() rather than
inconsistently (varies across targets) drop it on the floor by
overriding it with the fmode_t stored in 'struct dm_dev'.

All the persistent reservation functions weren't using the fmode_t they
got back from .prepare_ioctl so remove them.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

5bd5e8d8

dm: hold DM table for duration of ioctl rather than use blkdev_get · 971888c4

Authored by Mike Snitzer on Apr 03, 2018

Commit 519049af ("dm: use blkdev_get rather than bdgrab when issuing
pass-through ioctl") inadvertently introduced a regression relative to
users of device cgroups that issue ioctls (e.g. libvirt).  Using
blkdev_get() in DM's passthrough ioctl support implicitly introduced a
cgroup permissions check that would fail unless care were taken to add
all devices in the IO stack to the device cgroup.  E.g. rather than just
adding the top-level DM multipath device to the cgroup all the
underlying devices would need to be allowed.

Fix this, to no longer require allowing all underlying devices, by
simply holding the live DM table (which includes the table's original
blkdev_get() reference on the blockdevice that the ioctl will be issued
to) for the duration of the ioctl.

Also, bump the DM ioctl version so a user can know that their device
cgroup allow workaround is no longer needed.

Reported-by: Michal Privoznik <mprivozn@redhat.com>
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 519049af ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

971888c4

04 Apr, 2018 2 commits

dm: add support for secure erase forwarding · 00716545

Authored by Denis Semakin on Mar 13, 2018

Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
data devices support secure erase.

Also, add support for secure erase to both the linear and striped
targets.

Signed-off-by: Denis Semakin <d.semakin@omprussia.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

00716545

dm: backfill abnormal IO support to non-splitting IO submission · 0519c71e

Authored by Mike Snitzer on Mar 26, 2018

Otherwise, these abnormal IOs would be sent to the DM target
regardless of whether the target advertised support for them.

Factor out __process_abnormal_io() from __split_and_process_non_flush()
so that discards, write same, etc may be conditionally processed.

Fixes: 978e51ba ("dm: optimize bio-based NVMe IO submission")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0519c71e

03 Apr, 2018 1 commit

dax, dm: allow device-mapper to operate without dax support · 976431b0

Authored by Dan Williams on Mar 29, 2018

Change device-mapper's DAX dependency to require the presence of at
least one DAX_DRIVER. This allows device-mapper to be built without
bringing the DAX core along which is especially wasteful when there are
no DAX drivers, like BLK_DEV_PMEM, configured.

Cc: Alasdair Kergon <agk@redhat.com>
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

976431b0

30 Mar, 2018 1 commit

dm: fix dropped return code from dm_get_bdev_for_ioctl · da5dadb4

Authored by Mike Snitzer on Mar 29, 2018

dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
wasn't).  Unfortunately commit 519049af ("dm: use blkdev_get rather
than bdgrab when issuing pass-through ioctl") reused the variable 'r'
to store the return from blkdev_get() that follows prepare_ioctl()
-- thereby dropping prepare_ioctl()'s result on the floor.

This can lead to an ioctl or persistent reservation being issued to a
partition going unnoticed, which implies the extra permission check for
CAP_SYS_RAWIO is skipped.

Fix this by using a different variable to store blkdev_get()'s return.

Fixes: 519049af ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Reported-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

da5dadb4
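
The variable-reuse bug described in the da5dadb4 message can be illustrated with a userspace sketch. The `sketch_*` functions are hypothetical stand-ins, not the kernel's `prepare_ioctl` hook or `blkdev_get()`; they only mimic the return conventions stated in the message (0 or 1 from the prepare step, 0 or a negative value from the get step, with -5 chosen arbitrarily as the failure code).

```c
#include <assert.h>

/* Stand-in for prepare_ioctl: 1 = ioctl aimed at a partition, 0 = it is not. */
static int sketch_prepare_ioctl(int targets_partition)
{
	return targets_partition ? 1 : 0;
}

/* Stand-in for blkdev_get(): 0 on success, negative value on failure. */
static int sketch_blkdev_get(int fail)
{
	return fail ? -5 : 0;
}

/* Buggy shape: 'r' is reused, so the 0/1 answer is overwritten by 0. */
static int sketch_get_bdev_buggy(int targets_partition)
{
	int r = sketch_prepare_ioctl(targets_partition);

	if (r < 0)
		return r;
	r = sketch_blkdev_get(0);	/* clobbers prepare_ioctl's result */
	return r;
}

/* Fixed shape: a second variable holds the blkdev_get() result. */
static int sketch_get_bdev_fixed(int targets_partition, int get_fails)
{
	int r, r2;

	r = sketch_prepare_ioctl(targets_partition);
	if (r < 0)
		return r;
	r2 = sketch_blkdev_get(get_fails);
	if (r2 < 0)
		return r2;
	assert(r == 0 || r == 1);
	return r;			/* caller still sees the partition answer */
}
```

In the buggy shape a caller can never observe the value 1, so any check keyed on it, such as the CAP_SYS_RAWIO check mentioned in the message, is silently skipped.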

07 Mar, 2018 1 commit

dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl · 519049af

Authored by Mike Snitzer on Feb 22, 2018

Otherwise an underlying device's teardown (e.g. SCSI) may race with the
DM ioctl or persistent reservation and result in dereferencing driver
memory that gets freed when the underlying device's final blkdev_put()
occurs.

bdgrab() only increases the refcount for the block_device's inode to
ensure the block_device struct itself will not be freed, but does not
guarantee the block_device will remain associated with the gendisk or
its storage.

Cc: stable@vger.kernel.org # 4.8+
Reported-by: David Jeffery <djeffery@redhat.com>
Suggested-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ben Marzinski <bmarzins@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

519049af

01 Mar, 2018 1 commit

block: Add 'lock' as third argument to blk_alloc_queue_node() · 5ee0524b

Authored by Bart Van Assche on Feb 28, 2018

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5ee0524b

16 Feb, 2018 1 commit

dm: correctly handle chained bios in dec_pending() · 8dd601fa

Authored by NeilBrown on Feb 15, 2018

dec_pending() is given an error status (possibly 0) to be recorded
against a bio.  It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status.  However, when it then assigns io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.

This can happen when chained bios are in use.  If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.

This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da8 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.

A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.

The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.

Reported-and-tested-by: Milan Broz <gmazyland@gmail.com>
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8dd601fa
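
The 8dd601fa fix comes down to a guarded assignment. The sketch below models it in userspace with hypothetical `sketch_*` structures and plain integers in place of the kernel's status type; it assumes only what the message states, namely that a chained child bio may already have stored an error in the parent bio before the dm_io completes.

```c
#include <assert.h>

/* Simplified stand-ins: 0 means success, non-zero means an error status. */
struct sketch_dec_bio {
	int bi_status;
};

struct sketch_dm_io {
	int status;
	struct sketch_dec_bio *orig_bio;
};

/* Record a completion status; only a non-zero error is ever stored. */
static void sketch_record(struct sketch_dm_io *io, int error)
{
	if (error)
		io->status = error;
}

/* Pre-fix shape: the unconditional copy can wipe an error set by a child. */
static void sketch_finish_buggy(struct sketch_dm_io *io)
{
	io->orig_bio->bi_status = io->status;
}

/* Post-fix shape: only a real error is propagated to the bio. */
static void sketch_finish_fixed(struct sketch_dm_io *io)
{
	assert(io->orig_bio);
	if (io->status)
		io->orig_bio->bi_status = io->status;
}
```

This mirrors the 'error' target case in the message: the failing fragment sets the bio's status first, and the later successful fragment must not reset it to 0.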

30 Jan, 2018 1 commit

dm: various cleanups to md->queue initialization code · c12c9a3c

Authored by Mike Snitzer on Jan 12, 2018

Also, add dm_sysfs_init() error handling to dm_create().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

c12c9a3c

17 Jan, 2018 1 commit

dm: backfill missing calls to mutex_destroy() · d5ffebdd

Authored by Mike Snitzer on Jan 05, 2018

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

d5ffebdd

15 Jan, 2018 1 commit

dm: fix incomplete request_queue initialization · c100ec49

Authored by Mike Snitzer on Jan 08, 2018

DM is no longer prone to having its request_queue be improperly
initialized.

Summary of changes:

- defer DM's blk_register_queue() from add_disk()-time until
  dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().

- dm_setup_md_queue() is updated to fully initialize DM's request_queue
  (_after_ all table loads have occurred and the request_queue's type,
  features and limits are known).

A very welcome side-effect of these changes is DM no longer needs to:
1) backfill the "mq" sysfs entry (because historically DM didn't
initialize the request_queue to use blk-mq until _after_
blk_register_queue() was called via add_disk()).
2) call elv_register_queue() to get .request_fn request-based DM
device's "iosched" exposed in sysfs.

In addition, blk-mq debugfs support is now made available because
request-based DM's blk-mq request_queue is now properly initialized
before dm_setup_md_queue() calls blk_register_queue().

These changes also stave off the need to introduce new DM-specific
workarounds in block core, e.g. this proposal:
https://patchwork.kernel.org/patch/10067961/

In the end DM devices should be less unicorn in nature (relative to
initialization and availability of block core infrastructure provided by
the request_queue).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c100ec49

07 Jan, 2018 1 commit

dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE · 8f50e358

Authored by Ming Lei on Dec 18, 2017

For bio-based DM, some targets, such as the crypt target, aren't ready
to deal with incoming bios larger than 1 MiB.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8f50e358

20 Dec, 2017 2 commits

dm: optimize bio-based NVMe IO submission · 978e51ba

Authored by Mike Snitzer on Dec 09, 2017

Upper level bio-based drivers that stack immediately on top of NVMe can
leverage direct_make_request().  In addition, an NVMe bio-based DM device
will initially only ever have one NVMe device that it submits IO to at a
time.  There is no splitting needed.  Enhance DM core so that
DM_TYPE_NVME_BIO_BASED's IO submission takes advantage of both of these
characteristics.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

978e51ba

dm: introduce DM_TYPE_NVME_BIO_BASED · 22c11858

Authored by Mike Snitzer on Dec 04, 2017

If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
all devices in the DM table do not support partial completions.  Also,
the table has a single immutable target that doesn't require DM core to
split bios.

This will enable adding NVMe optimizations to bio-based DM.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

22c11858

18 Dec, 2017 1 commit

dm: simplify start of block stats accounting for bio-based · f3986374

Authored by Mike Snitzer on Dec 17, 2017

No apparent need to generic_start_io_acct() until before the IO is ready
for submission.  start_io_acct() is the proper place to do this
accounting -- it is also where DM accounts for pending IO and, if
enabled, starts dm-stats accounting.

Replace start_io_acct()'s part_round_stats() with generic_start_io_acct().
This eliminates needing to take part_stat_lock() multiple times when
starting an IO on bio-based devices.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

f3986374

17 Dec, 2017 5 commits

dm: remove redundant mapped_device member from clone_info structure · bc02cdbe

Authored by Mike Snitzer on Dec 14, 2017

'struct dm_io' already has the same pointer.  So update all accesses
from ci->md to ci->io->md.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

bc02cdbe

dm: remove now unused bio-based io_pool and _io_cache · dde1e1ec

Authored by Mike Snitzer on Dec 11, 2017

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

dde1e1ec

dm: improve performance by moving dm_io structure to per-bio-data · 64f52b0e

Authored by Mike Snitzer on Dec 11, 2017

Eliminates need for a separate mempool to allocate 'struct dm_io'
objects from.  As such, it saves an extra mempool allocation for each
original bio that DM core is issued.

This complicates the per-bio-data accessor functions by needing to
conditionally add extra padding to get to a target's per-bio-data.  But
in the end this provides a decent performance improvement for all
bio-based DM devices.

On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
DM linear performance improved by 2% (went from 2665 to 2777 MB/s).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

64f52b0e

dm: rename 'bio' member of dm_io structure to 'orig_bio' · 745dc570

Authored by Mike Snitzer on Dec 11, 2017

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

745dc570

dm: remove stale comment blocks · 2abf1fc9

Authored by Mike Snitzer on Dec 09, 2017

These CRUD comments have worn out their welcome.  The code is what it
is, over time it'll hopefully get better.  But these comments serve no
purpose whatsoever.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

2abf1fc9

14 Dec, 2017 7 commits

dm: set QUEUE_FLAG_DAX accordingly in dm_table_set_restrictions() · ad3793fc

Authored by Mike Snitzer on Dec 04, 2017

Rather than having DAX support be unique by setting it based on table
type in dm_setup_md_queue().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

ad3793fc

dm: fix __send_changing_extent_only() to send first bio and chain remainder · 3d7f4562

Authored by Mike Snitzer on Dec 08, 2017

__send_changing_extent_only() must follow the same pattern that was
established with commit "dm: ensure bio submission follows a depth-first
tree walk".  That is: submit first bio up to split boundary and then
split the remainder to further submissions.

Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

3d7f4562

dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs · 0776aa0e

Authored by Mike Snitzer on Dec 08, 2017

alloc_multiple_bios() assumes it can allocate the requested number of
bios but until now there was no guarantee that the mempools would be
accommodating.

Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

0776aa0e

dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure · 4a3f54d9

Authored by Mike Snitzer on Nov 22, 2017

Now that all of DM has been revised and/or verified to no longer require
the use of BIOSET_NEED_RESCUER the dm_offload code may be removed.

Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

4a3f54d9

dm: safely allocate multiple bioset bios · 318716dd

Authored by Mike Snitzer on Nov 22, 2017

DM targets can request multiple bios be sent to them by DM core (see:
num_{flush,discard,write_same,write_zeroes}_bios).  But until now these
bios were allocated in an unsafe manner that could potentially exhaust
the DM device's bioset -- in the face of multiple threads each trying to
do multiple allocations from the same DM device's bioset.

Fix __send_duplicate_bios() by using the new alloc_multiple_bios().  The
allocation strategy used by alloc_multiple_bios() models that used by
dm-crypt.c:crypt_alloc_buffer().

Neil Brown initially proposed this fix but the implementation has been
revised enough that it is inappropriate to attribute the entirety of it to
him.

Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

318716dd
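
One property that makes multi-item allocation from a shared reserve safe is that a thread never sits on a partial set while others are starved. The sketch below models only that all-or-nothing, non-blocking pass with a toy counter-based pool. All `sketch_*` names are hypothetical, and the blocking retry under a lock that a real implementation would fall back to is an assumption about the pattern and is not modeled here.

```c
#include <assert.h>

/* A toy fixed-size pool standing in for a bioset's reserved entries. */
struct sketch_pool {
	unsigned int free;
};

/* Non-blocking get: returns 1 on success, 0 if the pool is empty. */
static int sketch_pool_try_get(struct sketch_pool *p)
{
	if (p->free == 0)
		return 0;
	p->free--;
	return 1;
}

static void sketch_pool_put(struct sketch_pool *p)
{
	p->free++;
}

/*
 * Try to take 'want' items without waiting.  On any failure, return
 * everything taken so far, so a partial set is never held.
 * Returns 1 if all items were obtained, 0 otherwise.
 */
static int sketch_try_alloc_all(struct sketch_pool *p, unsigned int want)
{
	unsigned int got = 0;

	assert(p);
	while (got < want && sketch_pool_try_get(p))
		got++;
	if (got == want)
		return 1;
	while (got > 0) {
		sketch_pool_put(p);
		got--;
	}
	return 0;
}
```

Two threads that each need four items from a pool of six cannot both end up stuck holding three, because a failed attempt hands its items back before retrying.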

dm: remove unused 'num_write_bios' target interface · f31c21e4

Authored by NeilBrown on Nov 22, 2017

No DM target provides num_write_bios and none has since dm-cache's
brief use in 2013.

Having the possibility of num_write_bios > 1 complicates bio
allocation.  So remove the interface and assume there is only one bio
needed.

If a target ever needs more, it must provide a suitable bioset and
allocate itself based on its particular needs.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

f31c21e4

dm: ensure bio submission follows a depth-first tree walk · 18a25da8

Authored by NeilBrown on Sep 06, 2017

A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.

The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner.  Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.

In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.

This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. blk_queue_split() shows the
canonical approach.

dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available.  It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bio gets stuck behind the other in the queues
managed by generic_make_request().  This requires the 'rescue'
functionality provided by dm_offload_{start,end}.

Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach.  That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately.  This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.

__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work().  When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.

When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder.  Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference to a bio
(result of the first clone) from the same bioset.  This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.

Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio().  This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().

To provide the clone bio when splitting, we use q->bio_split.  This
was previously being freed by bio-based dm to avoid having excess
rescuer threads.  As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

18a25da8
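
The canonical order described in the 18a25da8 message (handle the head up to the split boundary, then resubmit the advanced original as the remainder) can be modeled in userspace. In this sketch a bio is reduced to a sector range, a fixed `boundary` stands in for target boundaries, and each loop pass stands in for one trip through the submission path; all `sketch_*` names are hypothetical and no kernel API is used.

```c
#include <assert.h>
#include <stdint.h>

/* A bio reduced to the range it covers, in sectors. */
struct sketch_range {
	uint64_t sector;
	uint64_t len;
};

/* Sectors from 'sector' up to the next target boundary. */
static uint64_t sketch_len_to_boundary(uint64_t sector, uint64_t boundary)
{
	assert(boundary > 0);
	return boundary - (sector % boundary);
}

/*
 * Carve the head off 'bio' into 'head' and advance 'bio' past it.
 * Returns 1 if a remainder is left to be resubmitted, 0 otherwise.
 */
static int sketch_split_head(struct sketch_range *bio, uint64_t boundary,
			     struct sketch_range *head)
{
	uint64_t n = sketch_len_to_boundary(bio->sector, boundary);

	if (n > bio->len)
		n = bio->len;
	head->sector = bio->sector;
	head->len = n;
	bio->sector += n;	/* the original is advanced ... */
	bio->len -= n;		/* ... and becomes the remainder */
	return bio->len > 0;
}

/* Each pass handles one head completely before the remainder is looked at. */
static unsigned int sketch_submit(struct sketch_range bio, uint64_t boundary,
				  struct sketch_range *out, unsigned int max_out)
{
	unsigned int n = 0;
	int more;

	do {
		assert(n < max_out);
		more = sketch_split_head(&bio, boundary, &out[n]);
		n++;
	} while (more);
	return n;
}
```

For example, a 300-sector range starting at sector 100 with a 128-sector boundary is handled as four fragments of 28, 128, 128 and 16 sectors, each finished before the next split is made, so at most one freshly carved head is outstanding per pass.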

openanolis / cloud-kernel synced successfully about 1 year ago
