提交 · dfcfac3e4cd94abef779297fab6adfd2dbcf52fa · openeuler / raspberrypi-kernel

30 5月, 2015 4 次提交

dm: factor out a common cleanup_mapped_device() · 0f20972f

由 Mike Snitzer 提交于 4月 28, 2015

Introduce a single common method for cleaning up a DM device's
mapped_device.  No functional change, just eliminates duplication of
delicate mapped_device cleanup code.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

0f20972f

dm: cleanup methods that requeue requests · 2d76fff1

由 Mike Snitzer 提交于 4月 29, 2015

More often than not a request that is requeued _is_ mapped (meaning the
clone request is allocated and clone->q is initialized).  Rename
dm_requeue_unmapped_original_request() to avoid potential confusion due
to function name containing "unmapped".

Also, remove dm_requeue_unmapped_request() since callers can easily call
the dm_requeue_original_request() directly.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2d76fff1

dm: do not allocate any mempools for blk-mq request-based DM · cbc4e3c1

由 Mike Snitzer 提交于 4月 27, 2015

Do not allocate the io_pool mempool for blk-mq request-based DM
(DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().

Also refine __bind_mempools() to have more precise awareness of which
mempools each type of DM device uses -- avoids mempool churn when
reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

cbc4e3c1

dm: fix casting bug in dm_merge_bvec() · 1c220c69

由 Joe Thornber 提交于 5月 29, 2015

dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn").  In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int.  This code made
assumptions about the value of BIO_MAX_SECTORS.

A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors().  At
this point the cast from sector_t to int resulted in a zero value.  The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.

This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.
Signed-off-by: NJoe Thornber <ejt@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+

1c220c69

29 5月, 2015 1 次提交

dm: fix false warning in free_rq_clone() for unmapped requests · e5d8de32

由 Mike Snitzer 提交于 5月 28, 2015

When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.

The warning was added by commit aa6df8dd ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request").  But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.

Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).

Fixes: aa6df8dd ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reported-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e5d8de32

28 5月, 2015 1 次提交

dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY · 45714fbe

由 Mike Snitzer 提交于 5月 27, 2015

Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
DM blk-mq device's .queue_rq.  This cleans up the previous convoluted
handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
(even though it wasn't) and then run blk_mq_requeue_request() followed
by blk_mq_kick_requeue_list().

Also, document that DM blk-mq ontop of old request_fn devices cannot
fail in clone_rq() since the clone request is preallocated as part of
the pdu.
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

45714fbe

27 5月, 2015 1 次提交

dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED · 3a140755

由 Junichi Nomura 提交于 5月 27, 2015

When stacking request-based DM on blk_mq device, request cloning and
remapping are done in a single call to target's clone_and_map_rq().
The clone is allocated and valid only if clone_and_map_rq() returns
DM_MAPIO_REMAPPED.

The "IS_ERR(clone)" check in map_request() does not cover all the
!DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
are not ready or unavailable, clone_and_map_rq() may return
DM_MAPIO_REQUEUE without ever having established an ERR_PTR).  Fix this
by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
map_request().

Without this fix, DM core may call setup_clone() for a NULL clone
and oops like this:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
   IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
   ...
   CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
   ...
   Call Trace:
    [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
    [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071fdd>] kthread+0xe6/0xee
    [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
    [<ffffffff814c2d98>] ret_from_fork+0x58/0x90
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b

Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+

3a140755

26 5月, 2015 1 次提交

dm: run queue on re-queue · 4ae9944d

由 Junichi Nomura 提交于 5月 26, 2015

Without kicking queue, requeued request may stay forever in
the queue if there are no other I/O activities to the device.

The original error had been in v2.6.39 with commit 7eaceacc
("block: remove per-queue plugging"), which replaced conditional
plugging by periodic runqueue.

Commit 9d1deb83 in v4.1-rc1 removed the periodic runqueue
and the problem started to manifest.

Fixes: 9d1deb83 ("dm: don't schedule delayed run of the queue if nothing to do")
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

4ae9944d

22 5月, 2015 1 次提交

block, dm: don't copy bios for request clones · 5f1b670d

由 Christoph Hellwig 提交于 5月 22, 2015

Currently dm-multipath has to clone the bios for every request sent
to the lower devices, which wastes cpu cycles and ties down memory.

This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
to not complete bios attached to a request, which we set on clone
requests similar to bios in a flush sequence.  With this change I/O
errors on a path failure only get propagated to dm-multipath, which
can then either resubmit the I/O or complete the bios on the original
request.

I've done some basic testing of this on a Linux target with ALUA support,
and it survives path failures during I/O nicely.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

5f1b670d

30 4月, 2015 2 次提交

dm: fix free_rq_clone() NULL pointer when requeueing unmapped request · aa6df8dd

由 Mike Snitzer 提交于 4月 29, 2015

Commit 02233342 ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops. It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2. However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

aa6df8dd

dm: only initialize the request_queue once · 3e6180f0

由 Christoph Hellwig 提交于 4月 30, 2015

Commit bfebd1cd ("dm: add full blk-mq support to request-based DM")
didn't properly account for the need to short-circuit re-initializing
DM's blk-mq request_queue if it was already initialized.

Otherwise, reloading a blk-mq request-based DM table (either manually
or via multipathd) resulted in errors, see:
 https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html

Fix is to only initialize the request_queue on the initial table load
(when the mapped_device type is assigned).

This is better than having dm_init_request_based_blk_mq_queue() return
early if the queue was already initialized because it elevates the
constraint to a more meaningful location in DM core.  As such the
pre-existing early return in dm_init_request_based_queue() can now be
removed.

Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3e6180f0

16 4月, 2015 10 次提交

dm verity: add error handling modes for corrupted blocks · 65ff5b7d

由 Sami Tolvanen 提交于 3月 18, 2015

Add device specific modes to dm-verity to specify how corrupted
blocks should be handled.  The following modes are defined:

  - DM_VERITY_MODE_EIO is the default behavior, where reading a
    corrupted block results in -EIO.

  - DM_VERITY_MODE_LOGGING only logs corrupted blocks, but does
    not block the read.

  - DM_VERITY_MODE_RESTART calls kernel_restart when a corrupted
    block is discovered.

In addition, each mode sends a uevent to notify userspace of
corruption and to allow further recovery actions.

The driver defaults to previous behavior (DM_VERITY_MODE_EIO)
and other modes can be enabled with an additional parameter to
the verity table.
Signed-off-by: NSami Tolvanen <samitolvanen@google.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

65ff5b7d

dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8

由 Mike Snitzer 提交于 3月 11, 2015

Request-based DM's blk-mq support defaults to off; but a user can easily
change the default using the dm_mod.use_blk_mq module/boot option.

Also, you can check what mode a given request-based DM device is using
with: cat /sys/block/dm-X/dm/use_blk_mq

This change enabled further cleanup and reduced work (e.g. the
md->io_pool and md->rq_pool isn't created if using blk-mq).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

17e149b8

dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342

由 Mike Snitzer 提交于 3月 10, 2015

dm_mq_queue_rq() is in atomic context so care must be taken to not
sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
and dm-mpath's call to blk_get_request().  In the future the bioset
allocations will hopefully go away (by removing support for partial
completions of bios in a cloned request).

Also prepare for supporting DM blk-mq ontop of old-style request_fn
device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
will still be used to queue work if blk-mq is used ontop of old-style
request_fn device(s).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

02233342

dm: add full blk-mq support to request-based DM · bfebd1cd

由 Mike Snitzer 提交于 3月 08, 2015

Commit e5863d9a ("dm: allocate requests in target when stacking on
blk-mq devices") served as the first step toward fully utilizing blk-mq
in request-based DM -- it enabled stacking an old-style (request_fn)
request_queue ontop of the underlying blk-mq device(s).  That first step
didn't improve performance of DM multipath ontop of fast blk-mq devices
(e.g. NVMe) because the top-level old-style request_queue was severely
limited by the queue_lock.

The second step offered here enables stacking a blk-mq request_queue
ontop of the underlying blk-mq device(s).  This unlocks significant
performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
testbed and offered this really positive news:

 "Just providing a performance update. All my fio tests are getting
  roughly equal performance whether accessed through the raw block
  device or the multipath device mapper (~470k IOPS). I could only push
  ~20% of the raw iops through dm before this conversion, so this latest
  tree is looking really solid from a performance standpoint."
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Tested-by: NKeith Busch <keith.busch@intel.com>

bfebd1cd

dm: impose configurable deadline for dm_request_fn's merge heuristic · 0ce65797

由 Mike Snitzer 提交于 2月 26, 2015

Otherwise, for sequential workloads, the dm_request_fn can allow
excessive request merging at the expense of increased service time.

Add a per-device sysfs attribute to allow the user to control how long a
request, that is a reasonable merge candidate, can be queued on the
request queue. The resolution of this request dispatch deadline is in
microseconds (ranging from 1 to 100000 usecs), to set a 20us deadline:
echo 20 > /sys/block/dm-7/dm/rq_based_seq_io_merge_deadline

The dm_request_fn's merge heuristic and associated extra accounting is
disabled by default (rq_based_seq_io_merge_deadline is 0).

This sysfs attribute is not applicable to bio-based DM devices so it
will only ever report 0 for them.

By allowing a request to remain on the queue it will block others
requests on the queue. But introducing a short dequeue delay has proven
very effective at enabling certain sequential IO workloads on really
fast, yet IOPS constrained, devices to build up slightly larger IOs --
yielding 90+% throughput improvements. Having precise control over the
time taken to wait for larger requests to build affords control beyond
that of waiting for certain IO sizes to accumulate (which would require
a deadline anyway). This knob will only ever make sense with sequential
IO workloads and the particular value used is storage configuration
specific.

Given the expected niche use-case for when this knob is useful it has
been deemed acceptable to expose this relatively crude method for
crafting optimal IO on specific storage -- especially given the solution
is simple yet effective. In the context of DM multipath, it is
advisable to tune this sysfs attribute to a value that offers the best
performance for the common case (e.g. if 4 paths are expected active,
tune for that; if paths fail then performance may be slightly reduced).

Alternatives were explored to have request-based DM autotune this value
(e.g. if/when paths fail) but they were quickly deemed too fragile and
complex to warrant further design and development time. If this problem
proves more common as faster storage emerges we'll have to look at
elevating a generic solution into the block core.
Tested-by: NShiva Krishna Merla <shivakrishna.merla@netapp.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

0ce65797

dm: don't start current request if it would've merged with the previous · de3ec86d

由 Mike Snitzer 提交于 2月 24, 2015

Request-based DM's dm_request_fn() is so fast to pull requests off the
queue that steps need to be taken to promote merging by avoiding request
processing if it makes sense.

If the current request would've merged with previous request let the
current request stay on the queue longer.
Suggested-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

de3ec86d

dm: reduce the queue delay used in dm_request_fn from 100ms to 10ms · d548b34b

由 Mike Snitzer 提交于 3月 05, 2015

Commit 7eaceacc ("block: remove per-queue plugging") didn't justify
DM's use of a 100ms delay; such an extended delay is a liability when
there is reason to re-kick the queue.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d548b34b

dm: don't schedule delayed run of the queue if nothing to do · 9d1deb83

由 Mike Snitzer 提交于 2月 24, 2015

In request-based DM's dm_request_fn(), if blk_peek_request() returns
NULL just return.  Avoids unnecessary blk_delay_queue().
Reported-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9d1deb83

dm: only run the queue on completion if congested or no requests pending · 9a0e609e

由 Mike Snitzer 提交于 2月 24, 2015

On really fast storage it can be beneficial to delay running the
request_queue to allow the elevator more opportunity to merge requests.

Otherwise, it has been observed that requests are being sent to
q->request_fn much quicker than is ideal on IOPS-bound backends.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9a0e609e

dm: remove request-based logic from make_request_fn wrapper · ff36ab34

由 Mike Snitzer 提交于 2月 23, 2015

The old dm_request() method used for q->make_request_fn had a branch for
request-based DM support but it isn't needed given that
dm_init_request_based_queue() sets it to the standard blk_queue_bio()
anyway.

Cleanup dm_init_md_queue() to be DM device-type agnostic and have
dm_setup_md_queue() properly finish queue setup based on DM device-type
(bio-based vs request-based).

A followup block patch can be made to remove the export for
blk_queue_bio() now that DM no longer calls it directly.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

ff36ab34

01 4月, 2015 3 次提交

dm: remove request-based DM queue's lld_busy_fn hook · d56b9b28

由 Mike Snitzer 提交于 2月 23, 2015

DM multipath is the only caller of blk_lld_busy() -- which calls a
queue's lld_busy_fn hook. Request-based DM doesn't support stacking
multipath devices so there is no reason to register the lld_busy_fn hook
on a multipath device's queue using blk_queue_lld_busy().

As such, remove functions dm_lld_busy and dm_table_any_busy_target.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

d56b9b28

dm: remove unnecessary wrapper around blk_lld_busy · 52b09914

由 Mike Snitzer 提交于 2月 23, 2015

There is no need for DM to export a wrapper around the already exported
blk_lld_busy().
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

52b09914

dm: rename __dm_get_reserved_ios() helper to __dm_get_module_param() · 09c2d531

由 Mike Snitzer 提交于 2月 27, 2015

__dm_get_module_param() could be useful for future DM module parameters
besides those related to "reserved_ios".
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

09c2d531

24 3月, 2015 1 次提交

dm: fix add_disk() NULL pointer due to race with free_dev() · 63a4f065

由 Mike Snitzer 提交于 3月 23, 2015

Commit c4db59d3 ("fs: don't reassign dirty inodes to
default_backing_dev_info") exposed DM to a latent race in free_dev() vs
add_disk() in relation to management of the device's minor number.

Fix this by refactoring free_dev() to match cleanup order of the
alloc_dev() error path.  Move cleanup of the gendisk, queue, and bdev
to _before_ the cleanup of the idr managed minor number.

Also, purely due to cleanup that fell out during the free_dev() audit:
- adjust dm_blk_close() to access the gendisk's private_data under
  the _minor_lock spinlock.
- move __dm_destroy()'s dm_get_live_table() call out from under the
  _minor_lock spinlock.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1202449Reported-by: NZdenek Kabelac <zkabelac@redhat.com>
Reported-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

63a4f065

28 2月, 2015 3 次提交

dm snapshot: suspend merging snapshot when doing exception handover · 09ee96b2

由 Mikulas Patocka 提交于 2月 26, 2015

The "dm snapshot: suspend origin when doing exception handover" commit
fixed a exception store handover bug associated with pending exceptions
to the "snapshot-origin" target.

However, a similar problem exists in snapshot merging.  When snapshot
merging is in progress, we use the target "snapshot-merge" instead of
"snapshot-origin".  Consequently, during exception store handover, we
must find the snapshot-merge target and suspend its associated
mapped_device.

To avoid lockdep warnings, the target must be suspended and resumed
without holding _origins_lock.

Introduce a dm_hold() function that grabs a reference on a
mapped_device, but unlike dm_get(), it doesn't crash if the device has
the DMF_FREEING flag set, it returns an error in this case.

In snapshot_resume() we grab the reference to the origin device using
dm_hold() while holding _origins_lock (_origins_lock guarantees that the
device won't disappear).  Then we release _origins_lock, suspend the
device and grab _origins_lock again.

NOTE to stable@ people:
When backporting to kernels 3.18 and older, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

09ee96b2

dm snapshot: suspend origin when doing exception handover · b735fede

由 Mikulas Patocka 提交于 2月 26, 2015

In the function snapshot_resume we perform exception store handover.  If
there is another active snapshot target, the exception store is moved
from this target to the target that is being resumed.

The problem is that if there is some pending exception, it will point to
an incorrect exception store after that handover, causing a crash due to
dm-snap-persistent.c:get_exception()'s BUG_ON.

This bug can be triggered by repeatedly changing snapshot permissions
with "lvchange -p r" and "lvchange -p rw" while there are writes on the
associated origin device.

To fix this bug, we must suspend the origin device when doing the
exception store handover to make sure that there are no pending
exceptions:
- introduce _origin_hash that keeps track of dm_origin structures.
- introduce functions __lookup_dm_origin, __insert_dm_origin and
  __remove_dm_origin that manipulate the origin hash.
- modify snapshot_resume so that it calls dm_internal_suspend_fast() and
  dm_internal_resume_fast() on the origin device.

NOTE to stable@ people:

When backporting to kernels 3.12-3.18, use dm_internal_suspend and
dm_internal_resume instead of dm_internal_suspend_fast and
dm_internal_resume_fast.

When backporting to kernels older than 3.12, you need to pick functions
dm_internal_suspend and dm_internal_resume from the commit
fd2ed4d2.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

b735fede

dm: hold suspend_lock while suspending device during device deletion · ab7c7bb6

由 Mikulas Patocka 提交于 2月 27, 2015

__dm_destroy() must take the suspend_lock so that its presuspend and
postsuspend calls do not race with an internal suspend.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

ab7c7bb6

18 2月, 2015 1 次提交

dm: fix a race condition in dm_get_md · 2bec1f4a

由 Mikulas Patocka 提交于 2月 17, 2015

The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.

dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.

There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.

To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

2bec1f4a

10 2月, 2015 6 次提交

dm: allocate requests in target when stacking on blk-mq devices · e5863d9a

由 Mike Snitzer 提交于 12月 17, 2014

For blk-mq request-based DM the responsibility of allocating a cloned
request is transfered from DM core to the target type.  Doing so
enables the cloned request to be allocated from the appropriate
blk-mq request_queue's pool (only the DM target, e.g. multipath, can
know which block device to send a given cloned request to).

Care was taken to preserve compatibility with old-style block request
completion that requires request-based DM _not_ acquire the clone
request's queue lock in the completion path.  As such, there are now 2
different request-based DM target_type interfaces:
1) the original .map_rq() interface will continue to be used for
   non-blk-mq devices -- the preallocated clone request is passed in
   from DM core.
2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
   blk-mq devices -- blk_get_request() and blk_put_request() are used
   respectively from these hooks.

dm_table_set_type() was updated to detect if the request-based target is
being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
DM core disallows switching the DM table's type after it is set.  This
means that there is no mixing of non-blk-mq and blk-mq devices within
the same request-based DM table.

[This patch was started by Keith and later heavily modified by Mike]
Tested-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e5863d9a

dm: prepare for allocating blk-mq clone requests in target · 466d89a6

由 Keith Busch 提交于 10月 17, 2014

For blk-mq request-based DM the responsibility of allocating a cloned
request will be transfered from DM core to the target type.

To prepare for conditionally using this new model the original
request's 'special' now points to the dm_rq_target_io because the
clone is allocated later in the block layer rather than in DM core.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

466d89a6

dm: submit stacked requests in irq enabled context · 2eb6e1e3

由 Keith Busch 提交于 10月 17, 2014

Switch to having request-based DM enqueue all prep'ed requests into work
processed by another thread. This allows request-based DM to invoke
block APIs that assume interrupt enabled context (e.g. blk_get_request)
and is a prerequisite for adding blk-mq support to request-based DM.

The new kernel thread is only initialized for request-based DM devices.

multipath_map() is now always in irq enabled context so change multipath
spinlock (m->lock) locking to always disable interrupts.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2eb6e1e3

dm: split request structure out from dm_rq_target_io structure · 1ae49ea2

由 Mike Snitzer 提交于 12月 05, 2014

Request-based DM support for blk-mq devices requires that
dm_rq_target_io structures not be allocated with an embedded request
structure.  The request-based DM target (e.g. dm-multipath) must
allocate the request from the blk-mq devices' request_queue using
blk_get_request().

The unfortunate side-effect of this change is old-style request-based DM
support will no longer use contiguous memory for the dm_rq_target_io and
request structures for each clone.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1ae49ea2

dm: remove exports for request-based interfaces without external callers · dbf9782c

由 Mike Snitzer 提交于 12月 16, 2014

Remove exports for dm_dispatch_request, dm_requeue_unmapped_request,
and dm_kill_unmapped_request.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

dbf9782c

dm: fix multipath regression due to initializing wrong request · db507b3f

由 Mike Snitzer 提交于 2月 09, 2015

Commit febf7158 ("block: require blk_rq_prep_clone() be given an
initialized clone request") introduced a regression by calling
blk_rq_init() on the original request rather than the clone
request that is passed to setup_clone().
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Fixes: febf7158 ("block: require blk_rq_prep_clone() be given an initialized clone request")
Signed-off-by: NJens Axboe <axboe@fb.com>

db507b3f

29 1月, 2015 1 次提交

block: require blk_rq_prep_clone() be given an initialized clone request · febf7158

由 Keith Busch 提交于 10月 17, 2014

Prepare to allow blk_rq_prep_clone() to accept clone requests that were
allocated from blk-mq request queues.  As such the blk_rq_prep_clone()
caller must first initialize the clone request.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

febf7158

25 1月, 2015 1 次提交

dm: fix handling of multiple internal suspends · 96b26c8c

由 Mikulas Patocka 提交于 1月 08, 2015

Commit ffcc3936 ("dm: enhance internal suspend and resume interface")
attempted to handle multiple internal suspends on the same device, but
it did that incorrectly.  When these functions are called in this order
on the same device the device is no longer suspended, but it should be:
	dm_internal_suspend_noflush
	dm_internal_suspend_noflush
	dm_internal_resume

Fix this bug by maintaining an 'internal_suspend_count' and resuming
the device when this count drops to zero.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

96b26c8c

18 12月, 2014 1 次提交

dm: fix missed error code if .end_io isn't implemented by target_type · 5164bece

由 zhendong chen 提交于 12月 17, 2014

In bio-based DM's clone_endio(), when target_type doesn't implement
.end_io (e.g. linear) r will be always be initialized 0.  So if a
WRITE SAME bio fails WRITE SAME will not be disabled as intended.

Fix this by initializing r to error, rather than 0, in clone_endio().
Signed-off-by: NAlex Chen <alex.chen@huawei.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Fixes: 7eee4ae2 ("dm: disable WRITE SAME if it fails")
Cc: stable@vger.kernel.org

5164bece

24 11月, 2014 2 次提交

md: use generic io stats accounting functions to simplify io stat accounting · 18c0b223

由 Gu Zheng 提交于 11月 24, 2014

Use generic io stats accounting help functions (generic_{start,end}_io_acct)
to simplify io stat accounting.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

18c0b223

dm: use rcu_dereference_protected instead of rcu_dereference · a12f5d48

由 Eric Dumazet 提交于 11月 23, 2014

rcu_dereference() should be used in sections protected by rcu_read_lock.

For writers, holding some kind of mutex or lock,
rcu_dereference_protected() is the way to go, adding explicit lockdep
bits.

In __unbind(), we are the last user of this mapped device, so can use
the constant '1' instead of a lockdep_is_held(), not consistent with
other uses of rcu_dereference_protected() which use md->suspend_lock
mutex.
Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 33423974 ("dm: Use rcu_dereference() for accessing rcu pointer")
Cc: Pranith Kumar <bobby.prani@gmail.com>
[snitzer: allow lines longer than 80 columns, refine subject]
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

a12f5d48