1. 23 Feb 2016, 9 commits
    • dm: remove support for stacking dm-mq on .request_fn device(s) · c5248f79
      Mike Snitzer committed
      Remove all the fiddly code that propped up blk-mq request-queue support
      on top of .request_fn devices.
      
      Testing fault tolerance with DM multipath has proven this niche
      request-based dm-mq mode to be buggy, and there is no point trying to
      preserve it.
      
      Should help improve efficiency of pure dm-mq code and make code
      maintenance less delicate.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix a couple locking issues with use of block interfaces · 818c5f3b
      Mike Snitzer committed
      old_stop_queue() was checking blk_queue_stopped() without holding the
      q->queue_lock.
      
      dm_requeue_original_request() needed to check blk_queue_stopped(), with
      q->queue_lock held, before calling blk_mq_kick_requeue_list().  And a
      side-effect of that change is start_queue() must also call
      blk_mq_kick_requeue_list().
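
      A minimal sketch of the described pattern, assuming the v4.5-era block
      API (the function name is illustrative, not the verbatim patch):

        static void dm_mq_requeue_sketch(struct request_queue *q,
                                         struct request *rq)
        {
                unsigned long flags;

                blk_mq_requeue_request(rq);

                /* blk_queue_stopped() is only stable under q->queue_lock */
                spin_lock_irqsave(q->queue_lock, flags);
                if (!blk_queue_stopped(q))
                        blk_mq_kick_requeue_list(q);
                spin_unlock_irqrestore(q->queue_lock, flags);
        }

      By the same logic, start_queue() must kick the requeue list, since
      requests parked while the queue was stopped would otherwise never be
      resubmitted.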
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: allocate blk_mq_tag_set rather than embed in mapped_device · 1c357a1e
      Mike Snitzer committed
      The blk_mq_tag_set is only needed for dm-mq support.  There is no point
      wasting space in 'struct mapped_device' for non-dm-mq devices.
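
      A hedged sketch of the allocate-on-demand pattern (the field name is
      taken from the commit subject; the kzalloc return check is the part
      Dan Carpenter added):

        /* only dm-mq devices pay for a tag set now */
        md->tag_set = kzalloc(sizeof(struct blk_mq_tag_set), GFP_KERNEL);
        if (!md->tag_set)
                return -ENOMEM;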
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
    • dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module params · faad87df
      Mike Snitzer committed
      Allow user to change these values via module params or sysfs.
      
      'dm_mq_nr_hw_queues' defaults to 1 (max 32).
      
      'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
      small under moderate sized workloads -- the dm-multipath device would
      continuously block waiting for tags (requests) to become available).
      The maximum is BLK_MQ_MAX_DEPTH (currently 10240).
      
      Keep in mind the total number of pre-allocated requests per
      request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
      (currently 2048).
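
      A sketch of how such parameters are commonly declared (the variable
      names mirror the parameter names; this is an approximation, not the
      verbatim patch):

        static unsigned dm_mq_nr_hw_queues = 1;   /* default 1, capped at 32 */
        module_param(dm_mq_nr_hw_queues, uint, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(dm_mq_nr_hw_queues,
                         "Number of hardware queues for request-based dm-mq devices");

        static unsigned dm_mq_queue_depth = 2048; /* capped at BLK_MQ_MAX_DEPTH */
        module_param(dm_mq_queue_depth, uint, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(dm_mq_queue_depth,
                         "Queue depth for request-based dm-mq devices");

      With writable permissions like these, the values can also be adjusted
      at runtime under /sys/module/dm_mod/parameters/.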
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: optimize dm_request_fn() · c91852ff
      Mike Snitzer committed
      DM multipath is the only request-based DM target -- which only supports
      tables with a single target that is immutable.  Leverage this fact in
      dm_request_fn().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: optimize dm_mq_queue_rq() · 16f12266
      Mike Snitzer committed
      DM multipath is the only dm-mq target.  But that aside, request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
      the mapped_device when the table was made active.  This saves the need
      to even take the read-side of the SRCU via dm_{get,put}_live_table.
      
      If the active DM table does not have an immutable target (e.g. "error"
      target was swapped in) then fallback to the slow-path where the target
      is looked up from the live table.
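
      A hedged sketch of the fast path and its fallback (helper names follow
      the commit message and the dm.c conventions of that era, not the
      verbatim patch):

        struct dm_target *ti = md->immutable_target; /* cached at table bind */

        if (unlikely(!ti)) {
                /* slow path: no immutable target, e.g. "error" swapped in */
                int srcu_idx;
                struct dm_table *map = dm_get_live_table(md, &srcu_idx);

                ti = dm_table_find_target(map, 0);
                dm_put_live_table(md, srcu_idx);
        }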
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: cleanup dm_any_congested() · e522c039
      Mike Snitzer committed
      The request-based DM support for checking queue congestion doesn't
      require access to the live DM table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove unused dm_get_rq_mapinfo() · ae6ad75e
      Mike Snitzer committed
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix excessive dm-mq context switching · 6acfe68b
      Mike Snitzer committed
      Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
      than if an underlying null_blk device were used directly.  One of the
      reasons for this drop in performance is that blk_insert_clone_request()
      was calling blk_mq_insert_request() with @async=true.  This forced the
      use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
      which ushered in ping-ponging between process context (fio in this case)
      and kblockd's kworker to submit the cloned request.  The ftrace
      function_graph tracer showed:
      
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
      
      Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
      _not_ use kblockd to submit the cloned requests isn't enough to
      eliminate the observed context switches.
      
      In addition to this dm-mq specific blk-core fix, there are 2 DM core
      fixes to dm-mq that (when paired with the blk-core fix) completely
      eliminate the observed context switching:
      
      1)  don't blk_mq_run_hw_queues in blk-mq request completion
      
          Motivated by desire to reduce overhead of dm-mq, punting to kblockd
          just increases context switches.
      
          In my testing against a really fast null_blk device there was no benefit
          to running blk_mq_run_hw_queues() on completion (and no other blk-mq
          driver does this).  So hopefully this change doesn't induce the need for
      yet another revert like commit 621739b0!
      
      2)  use blk_mq_complete_request() in dm_complete_request()
      
          blk_complete_request() doesn't offer the traditional q->mq_ops vs
          .request_fn branching pattern that other historic block interfaces
          do (e.g. blk_get_request).  Using blk_mq_complete_request() for
          blk-mq requests is important for performance.  It should be noted
          that, like blk_complete_request(), blk_mq_complete_request() doesn't
          natively handle partial completions -- but the request-based
          DM-multipath target does provide the required partial completion
          support by dm.c:end_clone_bio() triggering requeueing of the request
          via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.
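
          A hedged sketch of fix #2 (tio_from_request() is the dm.c helper of
          that era that returns the dm_rq_target_io for either queue type; the
          two-argument blk_mq_complete_request() is the v4.5-era signature):

            static void dm_complete_request_sketch(struct request *rq, int error)
            {
                    struct dm_rq_target_io *tio = tio_from_request(rq);

                    tio->error = error;
                    if (!rq->q->mq_ops)
                            blk_complete_request(rq);           /* legacy path */
                    else
                            blk_mq_complete_request(rq, error); /* no kblockd bounce */
            }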
      
      dm-mq fix #2 is _much_ more important than #1 for eliminating the
      context switches.
      Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
      After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472
      
      With these changes multithreaded async read IOPs improved from ~950K
      to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
      IOPs of the underlying null_blk device for the same workload is ~1950K.
      
      Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()")
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Cc: stable@vger.kernel.org # 4.1+
      Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
  2. 22 Feb 2016, 3 commits
    • dm: fix sparse "unexpected unlock" warnings in ioctl code · 956a4025
      Mike Snitzer committed
      Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
      do the dm_{get,put}_live_table() rather than split those operations.
      
      The dm_grab_bdev_for_ioctl() callers only care about the block_device
      associated with a singleton DM device so there isn't any need to retain
      a reference to the live DM table.  It is sufficient to:
      1) dm_get_live_table()
      2) bdgrab() the bdev associated with the singleton table's target
      3) dm_put_live_table()
      4) bdput() the bdev
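
      A minimal sketch of steps 1-4, assuming the v4.5-era helpers
      (get_target_bdev() is hypothetical shorthand for however the singleton
      target's bdev is obtained, e.g. via its .prepare_ioctl hook):

        int srcu_idx;
        struct block_device *bdev;
        struct dm_table *map = dm_get_live_table(md, &srcu_idx); /* 1 */

        bdev = get_target_bdev(map);     /* hypothetical lookup helper */
        bdgrab(bdev);                    /* 2: pin the block_device */
        dm_put_live_table(md, srcu_idx); /* 3: table ref no longer needed */

        /* ... issue the ioctl against bdev ... */

        bdput(bdev);                     /* 4: drop the reference */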
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: do not return target from dm_get_live_table_for_ioctl() · 66482026
      Mike Snitzer committed
      None of the callers actually used the returned target.
      Also, just reuse bdev pointer passed to dm_blk_ioctl().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths · 4328daa2
      Mike Snitzer committed
      Using request-based DM mpath configured with the following stacking
      (.request_fn DM mpath on top of scsi-mq paths):
      
      echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
      echo N > /sys/module/dm_mod/parameters/use_blk_mq
      
      'struct dm_rq_target_io' would leak if a request is requeued before a
      blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
      wasn't being called.
      
      kmemleak reported:
      
      unreferenced object 0xffff8800b90b98c0 (size 112):
        comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
        hex dump (first 32 bytes):
          00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
          e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
        backtrace:
          [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
          [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
          [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
          [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
          [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
          [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
          [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
          [<ffffffff812f9585>] blk_delay_work+0x25/0x40
          [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
          [<ffffffff81097715>] worker_thread+0x125/0x4b0
          [<ffffffff8109ce88>] kthread+0xd8/0xf0
          [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      crash> struct -o dm_rq_target_io
      struct dm_rq_target_io {
          ...
      }
      SIZE: 112
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  3. 18 Nov 2015, 2 commits
    • dm: do not reuse dm_blk_ioctl block_device input as local variable · 647a20d5
      Mike Snitzer committed
      (Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for
      targets' .prepare_ioctl to fail if they go on to check the bdev for
      !NULL.
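
      A hedged sketch of the fix: keep the caller's @bdev intact and pass a
      separate local to the lookup (the local's name is illustrative):

        static int dm_blk_ioctl(struct block_device *bdev, fmode_t mode,
                                unsigned int cmd, unsigned long arg)
        {
                struct block_device *tgt_bdev = NULL; /* filled by .prepare_ioctl */

                /* ... pass &tgt_bdev, not &bdev, to the target lookup ... */
        }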
      
      Fixes: e56f81e0 ("dm: refactor ioctl handling")
      Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix ioctl retry termination with signal · 5bbbfdf6
      Junichi Nomura committed
      dm-mpath retries an ioctl when no path is readily available and the
      device is configured to queue I/O in such a case. To stop the retry
      before multipathd decides to turn off queueing mode, you can send a
      signal for the process to exit the loop.
      
      However, the check for a fatal signal was not carried along when commit
      6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the loop
      from dm-mpath to DM core. As a result, such a process cannot be
      terminated while it is in the retry loop.
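
      The restored check, as a hedged sketch (the argument list of the lookup
      is approximated from the dm.c code of that era):

        retry:
                r = dm_get_live_table_for_ioctl(md, &tgt, &bdev, &mode, &srcu_idx);
                if (r == -ENOTCONN && !fatal_signal_pending(current)) {
                        msleep(10);  /* I/O is queued: wait briefly, retry */
                        goto retry;  /* a fatal signal now ends the loop */
                }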
      
      Easy reproducer of the situation is:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # dmsetup message mp 0 'queue_if_no_path'
        # sg_inq /dev/mapper/mp
      
      With this fix applied, you should then be able to terminate sg_inq by
      pressing Ctrl+C.
      
      Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths")
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  4. 08 Nov 2015, 1 commit
  5. 01 Nov 2015, 4 commits
  6. 30 Oct 2015, 1 commit
    • dm: initialize non-blk-mq queue data before queue is used · ad5f498f
      Mikulas Patocka committed
      Commit bfebd1cd ("dm: add full blk-mq
      support to request-based DM") moves the initialization of the fields
      backing_dev_info.congested_fn, backing_dev_info.congested_data and
      queuedata from the function dm_init_md_queue (that is called when the
      device is created) to dm_init_old_md_queue (that is called after the
      device type is determined).
      
      There is no locking when accessing these variables, thus it is possible
      for other parts of the kernel to briefly see this data in a transient
      state (e.g. queue->backing_dev_info.congested_fn initialized and
      md->queue->backing_dev_info.congested_data uninitialized, resulting in
      passing an incorrect parameter to the function dm_any_congested).
      
      This queue data is left initialized even for blk-mq devices, which
      don't use it.
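
      The fix, sketched on the assumption (per the message above) that
      dm_any_congested() is the congested_fn: perform the assignments in
      dm_init_md_queue(), at creation time, so no half-initialized state is
      ever visible:

        /* in dm_init_md_queue(), before the queue can be reached */
        md->queue->queuedata = md;
        md->queue->backing_dev_info.congested_fn = dm_any_congested;
        md->queue->backing_dev_info.congested_data = md;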
      
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v4.1+
  7. 22 Oct 2015, 1 commit
  8. 06 Oct 2015, 1 commit
  9. 01 Oct 2015, 1 commit
    • dm: fix AB-BA deadlock in __dm_destroy() · 2a708cff
      Junichi Nomura committed
      __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
      suspend_lock in reverse order.  Doing so can cause AB-BA deadlock:
      
        __dm_destroy                    dm_swap_table
        ---------------------------------------------------
                                        mutex_lock(suspend_lock)
        dm_get_live_table()
          srcu_read_lock(io_barrier)
                                        dm_sync_table()
                                          synchronize_srcu(io_barrier)
                                            .. waiting for dm_put_live_table()
        mutex_lock(suspend_lock)
          .. waiting for suspend_lock
      
      Fix this by taking the locks in proper order.
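
      A hedged sketch of the corrected ordering in __dm_destroy(), mirroring
      dm_swap_table(): suspend_lock first, then the SRCU read side:

        int srcu_idx;
        struct dm_table *map;

        mutex_lock(&md->suspend_lock);          /* matches dm_swap_table() */
        map = dm_get_live_table(md, &srcu_idx); /* srcu_read_lock(io_barrier) */
        if (!dm_suspended_md(md)) {
                dm_table_presuspend_targets(map);
                dm_table_postsuspend_targets(map);
        }
        dm_put_live_table(md, srcu_idx);        /* srcu_read_unlock */
        mutex_unlock(&md->suspend_lock);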
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Fixes: ab7c7bb6 ("dm: hold suspend_lock while suspending device during device deletion")
      Acked-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
  10. 14 Aug 2015, 2 commits
    • block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet committed
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: make generic_make_request handle arbitrarily sized bios · 54efd50b
      Kent Overstreet committed
      The way the block layer is currently written, it goes to great lengths
      to avoid having to split bios; upper layer code (such as bio_add_page())
      checks what the underlying device can handle and tries to always create
      bios that don't need to be split.
      
      But this approach becomes unwieldy and eventually breaks down with
      stacked devices and devices with dynamic limits, and it adds a lot of
      complexity. If the block layer could split bios as needed, we could
      eliminate a lot of complexity elsewhere - particularly in stacked
      drivers. Code that creates bios can then create whatever size bios are
      convenient, and more importantly stacked drivers don't have to deal with
      both their own bio size limitations and the limitations of the
      (potentially multiple) devices underneath them.  In the future this will
      let us delete merge_bvec_fn and a bunch of other code.
      
      We do this by adding calls to blk_queue_split() to the various
      make_request functions that need it - a few can already handle arbitrary
      size bios. Note that we add the call _after_ any call to
      blk_queue_bounce(); this means that blk_queue_split() and
      blk_recalc_rq_segments() don't need to be concerned with bouncing
      affecting segment merging.
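
      A hedged sketch of the pattern added to the affected drivers (using the
      v4.2-era make_request_fn prototype; the driver body is elided):

        static void example_make_request(struct request_queue *q, struct bio *bio)
        {
                blk_queue_bounce(q, &bio);              /* bounce first ...    */
                blk_queue_split(q, &bio, q->bio_split); /* ... then split, so
                                                           splitting never sees
                                                           bounce effects */

                /* ... driver-specific submission of the (possibly split) bio ... */
        }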
      
      Some make_request_fn() callbacks were simple enough to audit and verify
      they don't need blk_queue_split() calls. The skipped ones are:
      
       * nfhd_make_request (arch/m68k/emu/nfblock.c)
       * axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
       * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
       * brd_make_request (ramdisk - drivers/block/brd.c)
       * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
       * loop_make_request
       * null_queue_bio
       * bcache's make_request fns
      
      Some others are almost certainly safe to remove now, but will be left
      for future patches.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Jim Paris <jim@jtan.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 12 Aug 2015, 1 commit
  12. 04 Aug 2015, 1 commit
  13. 29 Jul 2015, 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
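
      A hedged sketch of the new convention from a driver's point of view
      (the names here are illustrative):

        static void example_end_io(struct bio *bio)
        {
                if (bio->bi_error)  /* one persistent errno, chained to parents */
                        pr_err("I/O failed: %d\n", bio->bi_error);
                bio_put(bio);
        }

        /* completion side: record the errno, then end the bio */
        bio->bi_error = -EIO;
        bio_endio(bio);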
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  14. 13 Jul 2015, 1 commit
    • dm: fix use after free crash due to incorrect cleanup sequence · b06075a9
      Mikulas Patocka committed
      Commit 0f20972f ("dm: factor out a common cleanup_mapped_device()") in
      Linux 4.2-rc1 moved common cleanup code to a separate function.
      Unfortunately, that commit incorrectly changed the order of cleanup, so
      that it destroys the mapped_device's SRCU structure 'io_barrier' before
      destroying its workqueue.
      
      The function that is executed on the workqueue (dm_wq_work) uses the srcu
      structure, thus it may use it after being freed.  That results in a
      crash in the LVM test suite's mirror-vgreduce-removemissing.sh test.
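
      The corrected order, as a hedged sketch of cleanup_mapped_device():
      tear down the workqueue that runs dm_wq_work() before freeing the SRCU
      structure it reads:

        if (md->wq)
                destroy_workqueue(md->wq);    /* flushes dm_wq_work() first */
        /* ... release other resources ... */
        cleanup_srcu_struct(&md->io_barrier); /* now safe: no readers remain */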
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: 0f20972f ("dm: factor out a common cleanup_mapped_device()")
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  15. 09 Jul 2015, 1 commit
  16. 26 Jun 2015, 2 commits
  17. 18 Jun 2015, 1 commit
  18. 02 Jun 2015, 1 commit
    • writeback: move backing_dev_info->state into bdi_writeback · 4452226e
      Tejun Heo committed
      Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
      and the role of the separation is unclear.  For cgroup support for
      writeback IOs, a bdi will be updated to host multiple wb's where each
      wb serves writeback IOs of a different cgroup on the bdi.  To achieve
      that, a wb should carry all states necessary for servicing writeback
      IOs for a cgroup independently.
      
      This patch moves bdi->state into wb.
      
      * enum bdi_state is renamed to wb_state and the prefix of all enums is
        changed from BDI_ to WB_.
      
      * Explicit zeroing of bdi->state is removed without adding zeroing of
        wb->state as the whole data structure is zeroed on init anyway.
      
      * As there's still only one bdi_writeback per backing_dev_info, all
        uses of bdi->state are mechanically replaced with bdi->wb.state
        introducing no behavior changes.
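
      A before/after sketch of the mechanical replacement, using one
      representative flag (enum values per the rename described above):

        /* before */ if (test_bit(BDI_writeback_running, &bdi->state)) { }
        /* after  */ if (test_bit(WB_writeback_running, &bdi->wb.state)) { }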
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: drbd-dev@lists.linbit.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  19. 30 May 2015, 4 commits
    • dm: factor out a common cleanup_mapped_device() · 0f20972f
      Mike Snitzer committed
      Introduce a single common method for cleaning up a DM device's
      mapped_device.  No functional change, just eliminates duplication of
      delicate mapped_device cleanup code.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: cleanup methods that requeue requests · 2d76fff1
      Mike Snitzer committed
      More often than not a request that is requeued _is_ mapped (meaning the
      clone request is allocated and clone->q is initialized).  Rename
      dm_requeue_unmapped_original_request() to avoid potential confusion due
      to function name containing "unmapped".
      
      Also, remove dm_requeue_unmapped_request() since callers can easily call
      dm_requeue_original_request() directly.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: do not allocate any mempools for blk-mq request-based DM · cbc4e3c1
      Mike Snitzer committed
      Do not allocate the io_pool mempool for blk-mq request-based DM
      (DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().
      
      Also refine __bind_mempools() to have more precise awareness of which
      mempools each type of DM device uses -- avoids mempool churn when
      reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).
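
      A hedged sketch of the resulting split in dm_alloc_rq_mempools()
      (MIN_IOS and _rq_tio_cache are the dm.c names of that era; treat the
      exact shape as an approximation):

        if (type == DM_TYPE_REQUEST_BASED) {
                /* only .request_fn DM needs io_pool for dm_rq_target_io */
                pools->io_pool = mempool_create_slab_pool(MIN_IOS, _rq_tio_cache);
                if (!pools->io_pool)
                        goto out;
        }
        /* DM_TYPE_MQ_REQUEST_BASED: no io_pool; the tio lives in the
         * blk-mq per-request payload instead */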
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix casting bug in dm_merge_bvec() · 1c220c69
      Joe Thornber committed
      dm_merge_bvec() was originally added in f6fccb ("dm: introduce
      merge_bvec_fn").  In that commit a value in sectors is converted to
      bytes using << 9, and then assigned to an int.  This code made
      assumptions about the value of BIO_MAX_SECTORS.
      
      A later commit 148e51 ("dm: improve documentation and code clarity in
      dm_merge_bvec") was meant to have no functional change but it removed
      the use of BIO_MAX_SECTORS in favor of using queue_max_sectors().  At
      this point the cast from sector_t to int resulted in a zero value.  The
      fallout being dm_merge_bvec() would only allow a single page to be added
      to a bio.
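
      A hedged illustration of the truncation (types per the commit message;
      the exact dm.c expression is omitted):

        sector_t max_sectors = queue_max_sectors(q); /* 64-bit sector count */
        int max_size = max_sectors << 9;  /* bytes, silently truncated to 32
                                             bits; when the low 32 bits are
                                             zero (e.g. 1 << 23 sectors) the
                                             result is exactly 0 */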
      
      This interim fix is minimal for the benefit of stable@ because the more
      comprehensive cleanup of passing a sector_t to all DM targets' merge
      function will impact quite a few DM targets.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.19+
  20. 29 May 2015, 1 commit
    • dm: fix false warning in free_rq_clone() for unmapped requests · e5d8de32
      Mike Snitzer committed
      When stacking a request-based DM device on a non-blk-mq device and the
      device-mapper target cannot map the request (an error target is used, a
      multipath target has all paths down, etc.), the WARN_ON_ONCE() in
      free_rq_clone() triggers when it shouldn't.
      
      The warning was added by commit aa6df8dd ("dm: fix free_rq_clone() NULL
      pointer when requeueing unmapped request").  But free_rq_clone() with
      clone->q == NULL is valid usage for the case where
      dm_kill_unmapped_request() initiates request cleanup.
      
      Fix this false warning by just removing the WARN_ON -- it only generated
      false positives and was never useful in catching the intended case
      (a completed clone request that was never mapped, i.e. clone->q being
      NULL).
      
      Fixes: aa6df8dd ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  21. 28 May 2015, 1 commit