Commits · e9f8ca0ae7b7bc9a032b429929431c626a69dd5e · openeuler / Kernel

28 Jan, 2020 1 commit

dm: fix potential for q->make_request_fn NULL pointer · 47ace7e0

Authored by Mike Snitzer on Jan 27, 2020

Move blk_queue_make_request() to dm.c:alloc_dev() so that
q->make_request_fn is never NULL during the lifetime of a DM device
(even one that is created without a DM table).

Otherwise generic_make_request() will crash simply by doing:
  dmsetup create -n test
  mount /dev/dm-N /mnt

While at it, move ->congested_data initialization out of
dm.c:alloc_dev() and into the bio-based specific init method.
Reported-by: Stefan Bader <stefan.bader@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1860231
Fixes: ff36ab34 ("dm: remove request-based logic from make_request_fn wrapper")
Depends-on: c12c9a3c ("dm: various cleanups to md->queue initialization code")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

47ace7e0

13 Nov, 2019 3 commits

block: rework zone reporting · d4100351

Authored by Christoph Hellwig on Nov 11, 2019

Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d4100351
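
The callback model described in this commit can be illustrated with a small user-space sketch. This is not the kernel interface: `Zone`, `report_zones`, `device_zones` and `count_sequential` are hypothetical names chosen for illustration, and Python stands in for C. The only point is that the reporter invokes a caller-supplied callback once per zone and passes private data through, so no zone array has to be allocated up front.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Zone:
    start: int        # first sector of the zone
    length: int       # zone size in sectors
    sequential: bool  # True for sequential-write-required zones


# Hypothetical callback shape: (zone, index, private data) -> 0 or a negative error
ZoneCallback = Callable[[Zone, int, dict], int]


def report_zones(zones: Iterable[Zone], nr_zones: int,
                 cb: ZoneCallback, data: dict) -> int:
    """Run cb on each zone, up to nr_zones; return the count or the first error."""
    reported = 0
    for idx, zone in enumerate(zones):
        if idx >= nr_zones:
            break
        ret = cb(zone, idx, data)
        if ret < 0:
            return ret  # stop at the first callback error
        reported += 1
    return reported


def device_zones(nr: int, zone_sectors: int = 524288):
    """Produce zones lazily (assumed layout: first two zones are conventional)."""
    for i in range(nr):
        yield Zone(i * zone_sectors, zone_sectors, sequential=(i >= 2))


def count_sequential(zone: Zone, idx: int, data: dict) -> int:
    """Example callback: tally sequential zones in the caller's private data."""
    if zone.sequential:
        data["seq"] += 1
    return 0


stats = {"seq": 0}
n = report_zones(device_zones(8), 5, count_sequential, stats)
```

Because the zones are generated lazily and consumed one at a time, the caller decides what to keep (here, a single counter) instead of receiving a potentially large array.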

block: Remove partition support for zoned block devices · 5eac3eb3

Authored by Damien Le Moal on Nov 11, 2019

No known partitioning tool supports zoned block devices, especially the
host managed flavor with strong sequential write constraints.
Furthermore, there are also no known users or use cases for partitioned
zoned block devices.

This patch removes partition device creation for zoned block devices,
which allows simplifying the processing of zone commands for zoned
block devices. A warning is added if a partition table is found on the
device.

For report zones operations no zone sector information remapping is
necessary anymore, simplifying the code. Of note is that remapping of
zone reports for DM targets is still necessary as done by
dm_remap_zone_report().

Similarly, remapping of a zone reset bio is not necessary anymore.
Testing for the applicability of the zone reset all request also becomes
simpler and only needs to check that the number of sectors of the
requested zone range is equal to the disk capacity.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5eac3eb3

block: Simplify report zones execution · ceeb373a

Authored by Damien Le Moal on Nov 11, 2019

All kernel users of blkdev_report_zones(), as well as applications using it
through ioctl(BLKZONEREPORT), expect to potentially get fewer zone
descriptors than requested. As such, the use of the internal report
zones command execution loop implemented by blk_report_zones() is
not necessary and can even be harmful to performance by causing the
execution of an inefficient small zone report command to service the
remainder of a requested zone array.

This patch removes blk_report_zones(), simplifying the code. Also
remove a now incorrect comment in dm_blk_report_zones().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Javier Gonzalez <javier@javigon.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ceeb373a

07 Nov, 2019 1 commit

dm: add zone open, close and finish support · 2e2d6f7e

Authored by Ajay Joshi on Oct 27, 2019

Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH
support to allow explicit control of zone states.

Contains contributions from Matias Bjorling, Hans Holmberg and
Damien Le Moal.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2e2d6f7e

23 Aug, 2019 1 commit

dm: make dm_table_find_target return NULL · 123d87d5

Authored by Mikulas Patocka on Aug 23, 2019

Currently, if we pass a sector number that is too high to dm_table_find_target, it
returns a zeroed dm_target structure, and callers test whether the structure is
zeroed with the macro dm_target_is_valid.

However, returning NULL is common practice to indicate errors.

This patch refactors the dm code, so that dm_table_find_target returns
NULL and its callers test the returned value for NULL. The macro
dm_target_is_valid is deleted. In alloc_targets, we no longer allocate an
extra zeroed target.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

123d87d5
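
The "return NULL for an out-of-range sector" convention can be sketched in user space as follows. This is an illustrative model, not the DM table code: `build_highs` and `find_target` are hypothetical names, a sorted list with binary search stands in for the kernel's lookup structure, and Python's `None` plays the role of NULL.

```python
from bisect import bisect_left
from typing import List, Optional


def build_highs(lengths: List[int]) -> List[int]:
    """Return the last sector (inclusive) covered by each consecutive target."""
    highs, end = [], 0
    for length in lengths:
        end += length
        highs.append(end - 1)
    return highs


def find_target(highs: List[int], sector: int) -> Optional[int]:
    """Return the index of the target containing sector, or None if out of range."""
    idx = bisect_left(highs, sector)  # first target whose last sector >= sector
    return idx if idx < len(highs) else None


# Two targets: target 0 covers sectors 0-99, target 1 covers sectors 100-149.
highs = build_highs([100, 50])

target = find_target(highs, 150)  # past the end of the table
if target is None:
    result = "I/O error: sector beyond end of device"
else:
    result = f"map to target {target}"
```

Callers test the returned value directly, so no extra zeroed sentinel entry and no validity macro are needed.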

12 Jul, 2019 1 commit

block: Kill gfp_t argument of blkdev_report_zones() · bd976e52

Authored by Damien Le Moal on Jul 01, 2019

Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bd976e52

06 Jul, 2019 2 commits

dm: enable synchronous dax · 2e9ee095

Authored by Pankaj Gupta on Jul 05, 2019

This patch sets the dax device 'DAXDEV_SYNC' flag if all the target
devices of the device mapper support synchronous DAX. If the device
mapper consists of both synchronous and asynchronous dax devices,
we don't set the 'DAXDEV_SYNC' flag.

'dm_table_supports_dax' is refactored to pass 'iterate_devices_fn'
as argument so that the callers can pass the appropriate functions.
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

2e9ee095

libnvdimm: add dax_dev sync flag · fefc1d97

Authored by Pankaj Gupta on Jul 05, 2019

This patch adds the 'DAXDEV_SYNC' flag, which is set
for an nd_region doing synchronous flush. This is later
used to disable MAP_SYNC functionality for
ext4 & xfs filesystems on devices that don't support
synchronous flush.
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fefc1d97

22 May, 2019 1 commit

dm: make sure to obey max_io_len_target_boundary · 51b86f9a

Authored by Michael Lass on May 21, 2019

Commit 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM
target interface") incorrectly removed code from
__send_changing_extent_only() that is required to impose a per-target IO
boundary on IO that exceeds max_io_len_target_boundary().  Otherwise
"special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
where allowed.

Fix this by restoring the max_io_len_target_boundary() limit in
__send_changing_extent_only().

Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
Cc: stable@vger.kernel.org # 5.1+
Signed-off-by: Michael Lass <bevan@bi-co.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

51b86f9a
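
The per-target boundary this commit restores can be pictured with a minimal sketch. It assumes, for illustration only, that the boundary is simply the number of sectors left between the IO's start sector and the end of the target; the function names are hypothetical and this is a user-space model, not the DM core code.

```python
def sectors_to_target_end(target_begin: int, target_len: int, sector: int) -> int:
    """Sectors remaining from `sector` to the end of the target (assumed model)."""
    offset = sector - target_begin
    if not 0 <= offset < target_len:
        raise ValueError("sector is outside the target")
    return target_len - offset


def clamp_to_target(target_begin: int, target_len: int,
                    sector: int, nr_sectors: int) -> int:
    """Limit an IO (e.g. a discard) so it never extends past its target."""
    return min(nr_sectors, sectors_to_target_end(target_begin, target_len, sector))


# A 500-sector discard starting 100 sectors before the end of a 1000-sector
# target is cut down to 100 sectors; the rest belongs to the next target.
clamped = clamp_to_target(target_begin=0, target_len=1000, sector=900, nr_sectors=500)
```

Without the clamp, the full 500 sectors would be sent to a target that only owns the first 100 of them, which is the "write beyond where allowed" case the message describes.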

21 May, 2019 1 commit

dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8

Authored by Dan Williams on May 16, 2019

Pankaj reports that starting with commit ad428cdb "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.

Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.

Fixes: ad428cdb ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7bf7eac8

16 May, 2019 1 commit

dm: fix a couple brace coding style issues · 8454fca4

Authored by Sheetal Singala on May 10, 2019

Signed-off-by: Sheetal Singala <2396sheetal@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8454fca4

26 Apr, 2019 1 commit

dm: only initialize md->dax_dev if CONFIG_DAX_DRIVER is enabled · 514cf4f8

Authored by Peng Wang on Apr 18, 2019

md->dax_dev defaults to NULL and there is no need to initialize it
if CONFIG_DAX_DRIVER is disabled.
Signed-off-by: Peng Wang <rocking@whu.edu.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

514cf4f8

05 Apr, 2019 1 commit

dm: disable DISCARD if the underlying storage no longer supports it · bcb44433

Authored by Mike Snitzer on Apr 03, 2019

Some storage devices report support for discard commands like
WRITE_SAME_16 with unmap, but reject discard commands sent to the
storage device. This is a clear storage firmware bug but it doesn't
change the fact that should a program cause discards to be sent to a
multipath device layered on this buggy storage, all paths can end up
failed at the same time from the discards, causing possible I/O loss.

The first discard to a path will fail with Illegal Request, Invalid
field in cdb, e.g.:
kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
kernel: blk_update_request: critical target error, dev sdfn, sector 10487808

The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
device disables its support for discard on this path, and because of the
BLK_STS_TARGET error multipath fails the discard without failing any
path or retrying down a different path. But subsequent discards can
cause path failures. Any discard sent to a path which already failed
a discard ends up failing with EIO from blk_cloned_rq_check_limits with
an "over max size limit" error since the discard limit was set to 0 by
the sd driver for the path. As the error is EIO, this now fails the
path and multipath tries to send the discard down the next path. This
cycle continues as discards are sent until all paths fail.

Fix this by training DM core to disable DISCARD if the underlying
storage already did so.

Also, fix branching in dm_done() and clone_endio() to reflect the
mutually exclusive nature of the IO operations in question.

Cc: stable@vger.kernel.org
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

bcb44433

02 Apr, 2019 1 commit

dm: revert ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE") · 75ae1936

Authored by Mikulas Patocka on Mar 21, 2019

The limit was already incorporated into dm-crypt with commit 4e870e94
("dm crypt: fix error with too large bios"), so we don't need to apply
it globally to all targets. The quantity BIO_MAX_PAGES * PAGE_SIZE is
wrong anyway because the variable ti->max_io_len is supposed to be in
units of 512-byte sectors, not in bytes.

Reduction of the limit to 1048576 sectors could even cause data
corruption in rare cases - suppose that we have a dm-striped device with
stripe size 768MiB. The target will call dm_set_target_max_io_len with
the value 1572864. The buggy code would reduce it to 1048576. Now, the
dm-core will erroneously split the bios on 1048576-sector boundary
instead of 1572864-sector boundary and pass these stripe-crossing bios
to the striped target.

Cc: stable@vger.kernel.org # v4.16+
Fixes: 8f50e358 ("dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

75ae1936
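
The numbers in this message can be checked with a short worked example. The values `PAGE_SIZE = 4096` and `BIO_MAX_PAGES = 256` are assumptions (they are not stated in the commit, but they reproduce its 1048576 figure); the helper names are hypothetical and the code only models the unit mix-up, it is not kernel code.

```python
SECTOR_SIZE = 512      # bytes per sector, as stated in the commit message
PAGE_SIZE = 4096       # assumption: 4 KiB pages
BIO_MAX_PAGES = 256    # assumption: value that reproduces the 1048576 figure


def bytes_to_sectors(nbytes: int) -> int:
    return nbytes // SECTOR_SIZE


def crosses_stripe(start: int, length: int, stripe: int) -> bool:
    """True if the range [start, start + length) spans more than one stripe."""
    return start // stripe != (start + length - 1) // stripe


# The striped target asks for a 768 MiB limit, expressed in sectors.
stripe_sectors = bytes_to_sectors(768 * 1024 * 1024)

# The reverted code computed a byte count but applied it as a sector count.
buggy_limit = BIO_MAX_PAGES * PAGE_SIZE
clamped = min(stripe_sectors, buggy_limit)

# A bio split on the clamped boundary straddles the real stripe boundary.
second_bio_crosses = crosses_stripe(start=clamped, length=clamped,
                                    stripe=stripe_sectors)
```

Under these assumptions `stripe_sectors` is 1572864 and `clamped` is 1048576, so the second 1048576-sector bio covers sectors 1048576 through 2097151 and crosses the stripe boundary at sector 1572864.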

06 Mar, 2019 2 commits

dm: always call blk_queue_split() in dm_process_bio() · effd58c9

Authored by Mike Snitzer on Feb 22, 2019

Do not just call blk_queue_split() if the bio is_abnormal_io().

Fixes: 568c73a3 ("dm: update dm_process_bio() to split bio if in ->make_request_fn()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

effd58c9

dm: remove unused _rq_tio_cache and _rq_cache · e689fbab

Authored by Mike Snitzer on Feb 20, 2019

Also move dm_rq_target_io structure definition from dm-rq.h to dm-rq.c

Fixes: 6a23e05c ("dm: remove legacy request-based IO path")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e689fbab

21 Feb, 2019 1 commit

dm: eliminate 'split_discard_bios' flag from DM target interface · 61697a6a

Authored by Mike Snitzer on Jan 18, 2019

There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits.  A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

61697a6a

20 Feb, 2019 1 commit

dm: update dm_process_bio() to split bio if in ->make_request_fn() · 568c73a3

Authored by Mike Snitzer on Jan 18, 2019

Must call blk_queue_split() otherwise queue_limits for abnormal requests
(e.g. discard, writesame, etc) won't be imposed.

In addition, add dm_queue_split() to simplify DM specific splitting that
is needed for targets that impose ti->max_io_len.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

568c73a3

07 Feb, 2019 2 commits

dm: don't use bio_trim() afterall · fa8db494

Authored by Mike Snitzer on Feb 05, 2019

bio_trim() has an early return, which makes it _not_ idempotent, if the
offset is 0 and the bio's bi_size already matches the requested size.
Prior to DM, all users of bio_trim() were fine with this.  But DM has
exposed the fact that bio_trim()'s early return is incompatible with a
cloned bio whose integrity payload must be trimmed via
bio_integrity_trim().

Fix this by reverting DM back to doing the equivalent of bio_trim() but
in an idempotent manner (so bio_integrity_trim is always performed).

Follow-on work is needed to assess what benefit bio_trim()'s early
return is providing to its existing callers.
Reported-by: Milan Broz <gmazyland@gmail.com>
Fixes: 57c36519 ("dm: fix clone_bio() to trigger blk_recount_segments()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

fa8db494

dm: add memory barrier before waitqueue_active · 645efa84

Authored by Mikulas Patocka on Feb 05, 2019

Block core changes to switch bio-based IO accounting to be percpu had a
side-effect of altering DM core to now rely on calling waitqueue_active
(in both bio-based and request-based) to check if another task is in
dm_wait_for_completion().

A memory barrier is needed before calling waitqueue_active().  DM core
doesn't piggyback on a preceding memory barrier so it must explicitly
use its own.

For more details on why using waitqueue_active() without a preceding
barrier is unsafe, please see the comment before the waitqueue_active()
definition in include/linux/wait.h.

Add the missing memory barrier by switching to using wq_has_sleeper().

Fixes: 6f757231 ("dm: remove the pending IO accounting")
Fixes: c4576aed ("dm: fix request-based dm's use of dm_wait_for_completion")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

645efa84

23 Jan, 2019 2 commits

dm: add missing trace_block_split() to __split_and_process_bio() · 075c18c3

Authored by Mike Snitzer on Jan 18, 2019

Provides useful context about bio splits in blktrace.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

075c18c3

dm: fix dm_wq_work() to only use __split_and_process_bio() if appropriate · 6548c7c5

Authored by Mike Snitzer on Jan 17, 2019

Otherwise targets that don't support/expect IO splitting could resubmit
bios using code paths with unnecessary IO splitting complexity.

Depends-on: 24113d48 ("dm: avoid indirect call in __dm_make_request")
Fixes: 978e51ba ("dm: optimize bio-based NVMe IO submission")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

6548c7c5

22 Jan, 2019 2 commits

dm: fix redundant IO accounting for bios that need splitting · a1e1cb72

Authored by Mike Snitzer on Jan 17, 2019

The risk of redundant IO accounting was not taken into consideration
when commit 18a25da8 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().

Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry.  This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.

Before this fix:

  /dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
  bios are split on 32k boundaries.

  # fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
    	--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers

  with debugging added:
  [103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
  [103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
  [103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
  ...

  16M written yet 136M (278528 * 512b) accounted:
  # cat /sys/block/dm-2/stat | awk '{ print $7 }'
  278528

After this fix:

  16M written and 16M (32768 * 512b) accounted:
  # cat /sys/block/dm-2/stat | awk '{ print $7 }'
  32768

Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable@vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

a1e1cb72
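
The before and after figures quoted in this message are sector counts read from the stat file, and the conversion to mebibytes can be checked with a short sketch. The helper names are hypothetical; only the 512-byte sector size and the two sector counts come from the message.

```python
SECTOR_SIZE = 512          # bytes, per the "* 512b" note in the message
MIB = 1024 * 1024


def sectors_to_mib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to MiB."""
    return sectors * SECTOR_SIZE / MIB


def mib_to_sectors(mib: int) -> int:
    """Convert MiB to a count of 512-byte sectors."""
    return mib * MIB // SECTOR_SIZE


expected_sectors = mib_to_sectors(16)        # what a 16M write should account
before_fix_mib = sectors_to_mib(278528)      # value reported before the fix
after_fix_mib = sectors_to_mib(32768)        # value reported after the fix
overcount_factor = 278528 / expected_sectors
```

The 16M write corresponds to 32768 sectors, so the pre-fix reading of 278528 sectors (136 MiB) overstates the written data by a factor of 8.5.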

dm: fix clone_bio() to trigger blk_recount_segments() · 57c36519

Authored by Mike Snitzer on Jan 16, 2019

DM's clone_bio() now benefits from using bio_trim() by fixing the fact
that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does;
which triggers blk_recount_segments() via bio_phys_segments().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

57c36519

20 Dec, 2018 1 commit

dm: don't reuse bio for flushes · dbe3ece1

Authored by Jens Axboe on Dec 19, 2018

DM currently has a statically allocated bio that it uses to issue empty
flushes. It doesn't submit this bio, it just uses it for maintaining
state while setting up clones. Multiple users can access this bio at the
same time. This wasn't previously an issue, even if it was a bit iffy,
but with the blkg associations it can become one.

We set up the blkg association, then clone bio's and submit, then remove
the blkg association again. But since we can have multiple tasks doing
this at the same time, against multiple blkg's, then we can either lose
references to a blkg, or put it twice. The latter causes complaints on
the percpu ref being <= 0 when released, and can cause use-after-free as
well. Ming reports that xfstest generic/475 triggers this:

------------[ cut here ]------------
percpu ref (blkg_release) <= 0 (0) after switching to atomic
WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0

Switch to just using an on-stack bio for this, and get rid of the
embedded bio.

Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
Reported-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dbe3ece1

18 Dec, 2018 3 commits

dm: remove indirect calls from __send_changing_extent_only() · 53b47168

Authored by Mike Snitzer on Dec 03, 2018

No need to be so fancy.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

53b47168

dm: avoid indirect call in __dm_make_request · 24113d48

Authored by Mikulas Patocka on Nov 06, 2018

Indirect calls are inefficient because of retpolines that are used for
spectre workaround. This patch replaces an indirect call with a condition
(that can be predicted by the branch predictor).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

24113d48

blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c

Authored by Jens Axboe on Dec 17, 2018

There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.

This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3c94d83c

11 Dec, 2018 2 commits

dm: fix request-based dm's use of dm_wait_for_completion · c4576aed

Authored by Mike Snitzer on Dec 11, 2018

The md->wait waitqueue is used by both bio-based and request-based DM.
Commit dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for
outstanding IO") lost sight of the requirement that
dm_wait_for_completion() must work with all types of DM devices.

Fix md_in_flight() to call the blk-mq or bio-based method accordingly.

Fixes: dbd3bbd2 ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c4576aed

dm: fix inflight IO check · b7934ba4

Authored by Jens Axboe on Dec 10, 2018

After switching to percpu inflight counters, the inflight check
is totally buggy. It's perfectly valid for some counters to be
non-zero while having a total inflight IO count of 0, that's how
these kinds of counters work (inc on one CPU, dec on another).
Fix the md_in_flight() check to sum all counters before deciding,
so that it no longer potentially returns a false positive.

While at it, remove the inflight read for IO completion. We don't
need it, just wake anyone that's waiting for the IO count to drop
to zero. The caller needs to re-check that value anyway when woken,
which it does.

Fixes: 6f757231 ("dm: remove the pending IO accounting")
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7934ba4
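
Why individual per-CPU counters can be non-zero while the total is zero is easy to see in a small model. The class below is a hypothetical user-space illustration, not the block layer's data structure; `any_slot_nonzero` stands for a check that inspects slots one at a time, and `total` for the summing check the commit describes.

```python
class PerCpuInflight:
    """Toy model of an in-flight IO counter kept as one slot per CPU."""

    def __init__(self, nr_cpus: int) -> None:
        self.slots = [0] * nr_cpus

    def start_io(self, cpu: int) -> None:
        self.slots[cpu] += 1   # submission is counted on the submitting CPU

    def end_io(self, cpu: int) -> None:
        self.slots[cpu] -= 1   # completion may be counted on a different CPU

    def any_slot_nonzero(self) -> bool:
        """Slot-by-slot check: can report IO in flight when there is none."""
        return any(slot != 0 for slot in self.slots)

    def total(self) -> int:
        """Summing check: only the sum over all CPUs is meaningful."""
        return sum(self.slots)


counters = PerCpuInflight(nr_cpus=2)
counters.start_io(cpu=0)   # IO submitted on CPU 0
counters.end_io(cpu=1)     # the same IO completes on CPU 1

false_positive = counters.any_slot_nonzero()   # slots are [1, -1]
really_in_flight = counters.total() != 0
```

After one submit on CPU 0 and its completion on CPU 1, the slots hold 1 and -1: a slot-by-slot check would still report IO in flight, while the sum correctly reports none.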

10 Dec, 2018 2 commits

dm: remove the pending IO accounting · 6f757231

Authored by Mikulas Patocka on Dec 06, 2018

Remove the "pending" atomic counters, that duplicate block-core's
in_flight counters, and update md_in_flight() to look at percpu
in_flight counters.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6f757231

dm: dont rewrite dm_disk(md)->part0.in_flight · 80a787ba

Authored by Mikulas Patocka on Dec 06, 2018

generic_start_io_acct and generic_end_io_acct already update the variable
in_flight using atomic operations, so we don't have to overwrite them
again.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

80a787ba

08 Dec, 2018 2 commits

dm: set the static flush bio device on demand · 892ad71f

Authored by Dennis Zhou on Dec 05, 2018

The next patch changes the macro bio_set_dev() to associate a bio with a
blkg based on the device set. However, dm creates a static bio to be
used as the basis for cloning empty flush bios on creation. The
bio_set_dev() call in alloc_dev() will cause problems with the next
patch adding association to bio_set_dev() because the call is before the
bdev is associated with a gendisk (bd_disk is %NULL). To get around
this, set the device on the static bio every time and use that to clone
to the other bios.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

892ad71f

dm: call blk_queue_split() to impose device limits on bios · 89f5fa47

Authored by Mike Snitzer on Dec 03, 2018

Otherwise the incoming bios, of various types, won't be shaped based on
the DM device's advertised limits.

Depends-on: af67c31f ("blk: remove bio_set arg from blk_queue_split()")
Fixes: 744889b7 ("block: don't deal with discard limit in blkdev_issue_discard()")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

89f5fa47

16 Nov, 2018 1 commit

block: remove the lock argument to blk_alloc_queue_node · 6d469642

Authored by Christoph Hellwig on Nov 14, 2018

With the legacy request path gone there is no real need to override the
queue_lock.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6d469642

26 Oct, 2018 1 commit

block: add a report_zones method · e76239a3

Authored by Christoph Hellwig on Oct 12, 2018

Dispatching a report zones command through the request queue is a major
pain due to the command reply payload rewriting necessary. Given that
blkdev_report_zones() is executing everything synchronously, implement
report zones as a block device file operation instead, allowing major
simplification of the code in many places.

sd, null-blk, dm-linear and dm-flakey being the only block device
drivers supporting exposing zoned block devices, these drivers are
modified to provide the device side implementation of the
report_zones() block device file operation.

For device mappers, a new report_zones() target type operation is
defined so that the upper block layer calls blkdev_report_zones() can
be propagated down to the underlying devices of the dm targets.
Implementation for this new operation is added to the dm-linear and
dm-flakey targets.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[Damien]
* Changed method block_device argument to gendisk
* Various bug fixes and improvements
* Added support for null_blk, dm-linear and dm-flakey.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e76239a3

17 Oct, 2018 1 commit

dm: remove unnecessary unlikely() around WARN_ON_ONCE() · bab5d988

Authored by Igor Stoppa on Sep 07, 2018

WARN_ON() already contains an unlikely(), so it's not necessary to
wrap it into another.
Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

bab5d988

11 Oct, 2018 2 commits

dm: rename DM_TYPE_MQ_REQUEST_BASED to DM_TYPE_REQUEST_BASED · 953923c0

Authored by Mike Snitzer on Oct 11, 2018

Now that request-based DM is only using blk-mq, there is no need to
differentiate between legacy "rq" and new "mq".  We're back to a single
request-based DM -- and there was much rejoicing!
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

953923c0

dm: remove legacy request-based IO path · 6a23e05c

Authored by Jens Axboe on Oct 10, 2018

dm supports both, and since we're killing off the legacy path in
general, get rid of it in dm.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

6a23e05c

openeuler / Kernel · synced successfully more than 1 year ago