Commits · 115485e83f497fdf9b4bf779038cfe4e141292a9 · openeuler / Kernel

11 Mar, 2016 3 commits

dm: add 'dm_numa_node' module parameter · 115485e8

Authored by Mike Snitzer on Feb 22, 2016

Allows user to control which NUMA node the memory for DM device
structures (e.g. mapped_device, request_queue, gendisk, blk_mq_tag_set)
is allocated from.

Defaults to NUMA_NO_NODE (-1).  The allowable range is -1 through the
last online NUMA node id.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

115485e8

dm thin metadata: remove needless newline from subtree_dec() DMERR message · 29f929b5

Authored by Mike Snitzer on Jan 21, 2016

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

29f929b5

dm mpath: cleanup reinstate_path() et al based on code review · ec31f3f7

Authored by Mike Snitzer on Feb 20, 2016

fail_path() will print a "Failing path ..." message but reinstate_path()
doesn't print a "Reinstating path ...".  Add that message to
reinstate_path() to add symmetry and aid system debugging.

Remove reinstate_path()'s check for the path_selector providing
.reinstate_path hook.  All path selectors provide this and any future
ones must too.

activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but
pg_init_done() doesn't explicitly handle it in its switch statement.  Add
SCSI_DH_DEV_OFFLINED to the default case.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

ec31f3f7

23 Feb, 2016 23 commits

dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy · 9f54cec5

Authored by Mike Snitzer on Feb 11, 2016

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9f54cec5

dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate · be7d31cc

Authored by Mike Snitzer on Feb 10, 2016

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

be7d31cc

dm round robin: use percpu 'repeat_count' and 'current_path' · b0b477c7

Authored by Mike Snitzer on Feb 17, 2016

Now that dm-mpath core is lockless in the per-IO fast path it is
critical, for performance, to have the .select_path hook
(rr_select_path) also be as lockless as possible.

The new percpu members of 'struct selector' allow for lockless support
of 'repeat_count' governed repeat use of a previously selected path.  If
a path fails while it is 'current_path' the worst case is concurrent IO
might be mapped to the failed path until the .fail_path hook
(rr_fail_path) is called.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

b0b477c7
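
The per-CPU idea described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel's dm-round-robin code: the class name `RoundRobinSelector`, the `remaining` array, and the list-indexed "CPU" slots are all hypothetical stand-ins for percpu variables.

```python
from collections import deque


class RoundRobinSelector:
    """Toy model: every CPU caches its own current path and repeat budget."""

    def __init__(self, paths, repeat_count, nr_cpus):
        self.valid_paths = deque(paths)   # shared list, only touched on the slow path
        self.repeat_count = repeat_count
        # Hypothetical per-CPU slots: a CPU only ever uses its own index.
        self.current_path = [None] * nr_cpus
        self.remaining = [0] * nr_cpus

    def select_path(self, cpu):
        # Fast path: reuse this CPU's cached path without touching shared state.
        path = self.current_path[cpu]
        if path is not None and self.remaining[cpu] > 0:
            self.remaining[cpu] -= 1
            return path
        # Slow path: a real selector would take its own lock around this.
        if not self.valid_paths:
            self.current_path[cpu] = None
            return None
        path = self.valid_paths[0]
        self.valid_paths.rotate(-1)       # move the chosen path to the tail
        self.current_path[cpu] = path
        self.remaining[cpu] = self.repeat_count - 1
        return path

    def fail_path(self, path):
        # Drop the path from the shared list and from every CPU's cache.
        if path in self.valid_paths:
            self.valid_paths.remove(path)
        for cpu, cur in enumerate(self.current_path):
            if cur == path:
                self.current_path[cpu] = None
                self.remaining[cpu] = 0
```

Until `fail_path()` runs, a CPU that still caches the failed path keeps returning it, which mirrors the worst case the commit message describes.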

dm path selector: remove 'repeat_count' return from .select_path hook · 90a4323c

Authored by Mike Snitzer on Feb 17, 2016

If a path selector has any use for a repeat_count it should be handled
locally and not depend on the dm-mpath core to be concerned with it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

90a4323c

dm mpath: push path selector locking down to path selectors · 9659f811

Authored by Mike Snitzer on Feb 15, 2016

Proper locking of the lists used by the path selectors should be handled
within the selectors (relying on dm-mpath.c code's use of the m->lock
spinlock was reckless).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

9659f811

dm mpath: remove repeat_count support from multipath core · 21136f89

Authored by Mike Snitzer on Feb 10, 2016

Preparation for making __multipath_map() avoid taking the m->lock
spinlock -- in favor of using RCU locking.

repeat_count was primarily for bio-based DM multipath's benefit.  There
is really no need for it anymore now that DM multipath is request-based.
As such, repeat_count > 1 is no longer honored and a warning is
displayed if the user attempts to use a value > 1.  This is a temporary
change for the round-robin path-selector (as a later commit will restore
its support for repeat_count > 1).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

21136f89

dm mpath: remove unnecessary casts in front of ti->private · 7943bd6d

Authored by Mike Snitzer on Feb 02, 2016

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

7943bd6d

dm mpath: use blk_mq_alloc_request() and blk_mq_free_request() directly · 78ce23b5

Authored by Mike Snitzer on Jan 31, 2016

There isn't any need to support both old .request_fn and blk-mq paths
in the blk-mq specific portion of __multipath_map().  Call
blk_mq_alloc_request() directly rather than use blk_get_request().

Similarly, call blk_mq_free_request(), rather than blk_put_request(), in
multipath_release_clone().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

78ce23b5

dm mpath: cleanup 'struct dm_mpath_io' management code · 2eff1924

Authored by Mike Snitzer on Feb 03, 2016

Refactor and rename existing interfaces to be more specific and
self-documenting.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

2eff1924

dm mpath: use blk-mq pdu for per-request 'struct dm_mpath_io' · 8637a6bf

Authored by Mike Snitzer on Jan 31, 2016

Allow the multipath target to avoid making small allocations for each
'struct dm_mpath_io' that is needed for each request.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

8637a6bf

dm: allow immutable request-based targets to use blk-mq pdu · 591ddcfc

Authored by Mike Snitzer on Jan 31, 2016

This will allow DM multipath to use a portion of the blk-mq pdu space
for target data (e.g. struct dm_mpath_io).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

591ddcfc

dm: rename target's per_bio_data_size to per_io_data_size · 30187e1d

Authored by Mike Snitzer on Jan 31, 2016

Request-based DM will also make use of per_bio_data_size.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

30187e1d

dm: distinquish old .request_fn (dm-old) vs dm-mq request-based DM · eca7ee6d

Authored by Mike Snitzer on Feb 20, 2016

Rename various methods to have either a "dm_old" or "dm_mq" prefix.
Improve code comments to assist with understanding the duality of code
that handles both "dm_old" and "dm_mq" cases.

It is now much easier to quickly look at the code and _know_ that a given
method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

eca7ee6d

dm: remove support for stacking dm-mq on .request_fn device(s) · c5248f79

Authored by Mike Snitzer on Feb 20, 2016

Remove all fiddly code that propped up this support for a blk-mq
request-queue on top of all .request_fn devices.

Testing has proven this niche request-based dm-mq mode to be buggy, when
testing fault tolerance with DM multipath, and there is no point trying
to preserve it.

Should help improve efficiency of pure dm-mq code and make code
maintenance less delicate.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

c5248f79

dm: fix a couple locking issues with use of block interfaces · 818c5f3b

Authored by Mike Snitzer on Feb 20, 2016

old_stop_queue() was checking blk_queue_stopped() without holding the
q->queue_lock.

dm_requeue_original_request() needed to check blk_queue_stopped(), with
q->queue_lock held, before calling blk_mq_kick_requeue_list().  And a
side-effect of that change is start_queue() must also call
blk_mq_kick_requeue_list().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

818c5f3b

dm: allocate blk_mq_tag_set rather than embed in mapped_device · 1c357a1e

Authored by Mike Snitzer on Feb 06, 2016

The blk_mq_tag_set is only needed for dm-mq support. There is no point
wasting space in 'struct mapped_device' for non-dm-mq devices.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return

1c357a1e

dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module params · faad87df

Authored by Mike Snitzer on Jan 28, 2016

Allow user to change these values via module params or sysfs.

'dm_mq_nr_hw_queues' defaults to 1 (max 32).

'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
small under moderate sized workloads -- the dm-multipath device would
continuously block waiting for tags (requests) to become available).
The maximum is BLK_MQ_MAX_DEPTH (currently 10240).

Keep in mind the total number of pre-allocated requests per
request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
(currently 2048).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

faad87df
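
The arithmetic in this commit message (pre-allocated requests = 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth') can be checked with a short calculation. The sketch below uses only the defaults and limits quoted above; the function name `preallocated_requests` is hypothetical, and rejecting out-of-range values with an exception is an assumption made for illustration, since the kernel may treat such values differently.

```python
# Defaults and limits quoted in the commit message above.
NR_HW_QUEUES_DEFAULT, NR_HW_QUEUES_MAX = 1, 32
QUEUE_DEPTH_DEFAULT, QUEUE_DEPTH_MAX = 2048, 10240  # max is BLK_MQ_MAX_DEPTH


def preallocated_requests(nr_hw_queues=NR_HW_QUEUES_DEFAULT,
                          queue_depth=QUEUE_DEPTH_DEFAULT):
    """Requests pre-allocated for one request-based dm-mq device."""
    # Assumption: out-of-range values are rejected here for clarity.
    if not 1 <= nr_hw_queues <= NR_HW_QUEUES_MAX:
        raise ValueError("nr_hw_queues out of range")
    if not 1 <= queue_depth <= QUEUE_DEPTH_MAX:
        raise ValueError("queue_depth out of range")
    return nr_hw_queues * queue_depth
```

With the defaults this gives 1 * 2048 = 2048 requests per device; raising 'dm_mq_nr_hw_queues' to 4 at the default depth would pre-allocate 8192.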

dm: optimize dm_request_fn() · c91852ff

Authored by Mike Snitzer on Jan 31, 2016

DM multipath is the only request-based DM target -- which only supports
tables with a single target that is immutable.  Leverage this fact in
dm_request_fn().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

c91852ff

dm: optimize dm_mq_queue_rq() · 16f12266

Authored by Mike Snitzer on Jan 31, 2016

DM multipath is the only dm-mq target. But that aside, request-based DM
only supports tables with a single target that is immutable. Leverage
this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
the mapped_device when the table was made active. This saves the need
to even take the read-side of the SRCU via dm_{get,put}_live_table.

If the active DM table does not have an immutable target (e.g. "error"
target was swapped in) then fallback to the slow-path where the target
is looked up from the live table.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

16f12266
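
The fast-path/slow-path split described in this commit can be modelled in a few lines. This is a userspace sketch only: `Target`, `MappedDevice`, `activate_table`, `target_for` and the `table_lookups` counter are hypothetical names, and the counter merely stands in for the cost of taking the SRCU read side and walking the live table.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    start: int
    length: int
    immutable: bool = False


class MappedDevice:
    def __init__(self):
        self.live_table = []
        self.immutable_target = None   # cached when the table is made active
        self.table_lookups = 0         # counts slow-path lookups (model only)

    def activate_table(self, targets):
        self.live_table = list(targets)
        only = targets[0] if len(targets) == 1 else None
        # Cache the target only for a single-target, immutable table.
        self.immutable_target = only if only is not None and only.immutable else None

    def target_for(self, sector):
        # Fast path: no table lookup when an immutable target is cached.
        if self.immutable_target is not None:
            return self.immutable_target
        # Slow path: find the target covering this sector in the live table.
        self.table_lookups += 1
        for t in self.live_table:
            if t.start <= sector < t.start + t.length:
                return t
        return None
```

Swapping in a non-immutable target (for example a stand-in for "error") clears the cache, so later requests fall back to the lookup.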

dm: set DM_TARGET_WILDCARD feature on "error" target · f083b09b

Authored by Mike Snitzer on Feb 06, 2016

The DM_TARGET_WILDCARD feature indicates that the "error" target may
replace any target; even immutable targets.  This feature will be useful
to preserve the ability to replace the "multipath" target even once it
is formally converted over to having the DM_TARGET_IMMUTABLE feature.

Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
.map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
in the target_type.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

f083b09b

dm: cleanup dm_any_congested() · e522c039

Authored by Mike Snitzer on Feb 02, 2016

The request-based DM support for checking queue congestion doesn't
require access to the live DM table.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

e522c039

dm: remove unused dm_get_rq_mapinfo() · ae6ad75e

Authored by Mike Snitzer on Jan 30, 2016

Signed-off-by: Mike Snitzer <snitzer@redhat.com>

ae6ad75e

dm: fix excessive dm-mq context switching · 6acfe68b

Authored by Mike Snitzer on Feb 05, 2016

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...
  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b0 !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q->mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.

Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>

6acfe68b
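
As a quick worked check of the figures quoted in this commit message, the ratios can be computed directly. The numbers below are the approximate values from the text (~950K, ~1350K, ~1950K IOPs and the two `ctx=` counters); the variable names are illustrative only.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Approximate figures quoted in the commit message.
iops_before, iops_after, iops_raw = 950_000, 1_350_000, 1_950_000
ctx_before, ctx_after = 7_905_181, 2_008

gain_pct = percent(iops_after - iops_before, iops_before)   # IOPs improvement
before_of_raw_pct = percent(iops_before, iops_raw)          # dm-mq vs raw, before
after_of_raw_pct = percent(iops_after, iops_raw)            # dm-mq vs raw, after
ctx_ratio = ctx_before / ctx_after                          # context-switch reduction

print(f"IOPs gain: {gain_pct:.1f}%")
print(f"Share of raw null_blk IOPs: {before_of_raw_pct:.1f}% -> {after_of_raw_pct:.1f}%")
print(f"Context switches reduced by a factor of about {ctx_ratio:.0f}")
```

The "before" share of roughly 49% of raw throughput is consistent with the reported "50% slower" figure.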

22 Feb, 2016 3 commits

dm: fix sparse "unexpected unlock" warnings in ioctl code · 956a4025

Authored by Mike Snitzer on Feb 18, 2016

Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
do the dm_{get,put}_live_table() rather than split those operations.

The dm_grab_bdev_for_ioctl() callers only care about the block_device
associated with a singleton DM device so there isn't any need to retain
a reference to the live DM table.  It is sufficient to:
1) dm_get_live_table()
2) bdgrab() the bdev associated with the singleton table's target
3) dm_put_live_table()
4) bdput() the bdev
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

956a4025
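
The four-step sequence in this commit can be sketched as a reference-counting model. This is a hedged illustration, not the kernel code: `RefCounted` and `ioctl_on_singleton` are hypothetical names, and plain integer counters stand in for the live-table SRCU reference and the block_device reference.

```python
class RefCounted:
    """Minimal stand-in for an object with get/put reference counting."""

    def __init__(self, name):
        self.name = name
        self.refs = 0

    def get(self):
        self.refs += 1
        return self

    def put(self):
        if self.refs <= 0:
            raise RuntimeError(f"unbalanced put on {self.name}")
        self.refs -= 1


def ioctl_on_singleton(table, bdev, do_ioctl):
    table.get()          # 1) dm_get_live_table()
    bdev.get()           # 2) bdgrab() the bdev of the singleton table's target
    table.put()          # 3) dm_put_live_table(): table reference no longer held
    try:
        # The ioctl runs while only the bdev reference is held.
        return do_ioctl(table.refs, bdev.refs)
    finally:
        bdev.put()       # 4) bdput() the bdev
```

Because the get and put on the table happen inside one function, the locking is balanced at a single site, which is the kind of pairing sparse can verify.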

dm: do not return target from dm_get_live_table_for_ioctl() · 66482026

Authored by Mike Snitzer on Feb 18, 2016

None of the callers actually used the returned target.
Also, just reuse bdev pointer passed to dm_blk_ioctl().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

66482026

dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths · 4328daa2

Authored by Mike Snitzer on Feb 21, 2016

Using request-based DM mpath configured with the following stacking
(.request_fn DM mpath on top of scsi-mq paths):

echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
echo N > /sys/module/dm_mod/parameters/use_blk_mq

'struct dm_rq_target_io' would leak if a request is requeued before a
blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
wasn't being called.

kmemleak reported:

unreferenced object 0xffff8800b90b98c0 (size 112):
  comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
  hex dump (first 32 bytes):
    00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
    e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
  backtrace:
    [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
    [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
    [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
    [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
    [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
    [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
    [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
    [<ffffffff812f9585>] blk_delay_work+0x25/0x40
    [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
    [<ffffffff81097715>] worker_thread+0x125/0x4b0
    [<ffffffff8109ce88>] kthread+0xd8/0xf0
    [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff

crash> struct -o dm_rq_target_io
struct dm_rq_target_io {
    ...
}
SIZE: 112

Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

4328daa2

25 Jan, 2016 2 commits

md-cluster: delete useless code · fc2561ec

Authored by Shaohua Li on Jan 22, 2016

page->index already considers node offset. The node_offset calculation
in write_sb_page is useless and confusing.

Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: NeilBrown <neilb@suse.com>
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

fc2561ec

md-cluster: fix missing memory free · 4ac7a65f

Authored by Shaohua Li on Jan 22, 2016

There are several places where we allocate a dlm_lock_resource but do not free it.

leave() needs to free a lock resource too (from Guoqing)
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

4ac7a65f

21 Jan, 2016 1 commit

MD: rename some functions · 849674e4

Authored by Shaohua Li on Jan 20, 2016

These short function names are hard to search. Rename them to make vim happy.
Signed-off-by: Shaohua Li <shli@fb.com>

849674e4

14 Jan, 2016 4 commits

md/raid: only permit hot-add of compatible integrity profiles · 1501efad

Authored by Dan Williams on Jan 13, 2016

It is not safe for an integrity profile to be changed while i/o is
in-flight in the queue.  Prevent adding new disks or otherwise online
spares to an array if the device has an incompatible integrity profile.

The original change to the blk_integrity_unregister implementation in
md, commit c7bfced9 "md: suspend i/o during runtime
blk_integrity_unregister" introduced an immediate hang regression.

This policy of disallowing changes to the integrity profile once one has
been established is shared with DM.

Here is an abbreviated log from a test run that:
1/ Creates a degraded raid1 with an integrity-enabled device (pmem0s) [   59.076127]
2/ Tries to add an integrity-disabled device (pmem1m) [   90.489209]
3/ Retries with an integrity-enabled device (pmem1s) [  205.671277]

[   59.076127] md/raid1:md0: active with 1 out of 2 mirrors
[   59.078302] md: data integrity enabled on md0
[..]
[   90.489209] md0: incompatible integrity profile for pmem1m
[..]
[  205.671277] md: super_written gets error=-5
[  205.677386] md/raid1:md0: Disk failure on pmem1m, disabling device.
[  205.677386] md/raid1:md0: Operation continuing on 1 devices.
[  205.683037] RAID1 conf printout:
[  205.684699]  --- wd:1 rd:2
[  205.685972]  disk 0, wo:0, o:1, dev:pmem0s
[  205.687562]  disk 1, wo:1, o:1, dev:pmem1s
[  205.691717] md: recovery of RAID array md0

Fixes: c7bfced9 ("md: suspend i/o during runtime blk_integrity_unregister")
Cc: <stable@vger.kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reported-by: NeilBrown <neilb@suse.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>

1501efad

raid5-cache: handle journal hotadd in quiesce · 16a43f6a

Authored by Shaohua Li on Jan 06, 2016

Handle journal hotadd in quiesce to avoid creating duplicated threads.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

16a43f6a

MD: add journal with array suspended · 87d4d916

Authored by Shaohua Li on Jan 06, 2016

Hot-adding a journal disk in recovery thread context brings a lot of trouble
as IO could be running. Unlike spare disk hot add, adding a journal disk
with the array suspended makes more sense and the implementation is much easier.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

87d4d916

md: set MD_HAS_JOURNAL in correct places · a62ab49e

Authored by Shaohua Li on Jan 06, 2016

Set MD_HAS_JOURNAL when an array is loaded or the journal is initialized.
This avoids the flag being set too early in journal disk hotadd.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>

a62ab49e

10 Jan, 2016 2 commits

badblocks: rename badblocks_free to badblocks_exit · d3b407fb

Authored by Dan Williams on Jan 06, 2016

For symmetry with badblocks_init() make it clear that this path only
destroys incremental allocations of a badblocks instance, and does not
free the badblocks instance itself.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d3b407fb

md: convert to use the generic badblocks code · fc974ee2

Authored by Vishal Verma on Dec 24, 2015

Retain badblocks as part of rdev, but use the accessor functions from
include/linux/badblocks for all manipulation.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fc974ee2

09 Jan, 2016 1 commit

dm snapshot: fix hung bios when copy error occurs · 385277bf

Authored by Mikulas Patocka on Jan 08, 2016

When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was an error copying the
chunk, and then calls complete_exception.  complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps->pending_count and
persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not.  This results in the variable
ps->pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful.  A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store.  Also, remove commit_callback now that
it is merely a wrapper around pending_complete.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

385277bf
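
The pairing problem described in this commit can be reproduced with a small model. This is a sketch under assumptions, not the dm-snapshot code: `PersistentStore`, `complete_copy` and `run` are hypothetical names, and the model assumes completion callbacks are only released once `pending_count` drops to zero.

```python
class PersistentStore:
    """Toy exception store: callbacks are released when pending_count hits 0."""

    def __init__(self):
        self.pending_count = 0
        self.valid = True
        self.committed = []
        self.callbacks = []

    def prepare_exception(self, chunk):
        self.pending_count += 1

    def commit_exception(self, chunk, valid, callback):
        if not valid:
            self.valid = False            # failed chunk and all later ones are not recorded
        if self.valid:
            self.committed.append(chunk)
        self.callbacks.append(callback)
        self.pending_count -= 1
        if self.pending_count == 0:       # nothing in flight: release held callbacks
            held, self.callbacks = self.callbacks, []
            for cb in held:
                cb()


def complete_copy(store, chunk, copy_ok, done, always_commit):
    if copy_ok or always_commit:
        # Fixed behaviour: commit unconditionally and pass the copy result as 'valid'.
        store.commit_exception(chunk, copy_ok, lambda: done.append(chunk))
    else:
        # Old behaviour: the error path completes directly and skips the commit.
        done.append(chunk)


def run(always_commit):
    store, done = PersistentStore(), []
    for chunk in (1, 2):
        store.prepare_exception(chunk)
    complete_copy(store, 1, False, done, always_commit)   # chunk 1 fails to copy
    complete_copy(store, 2, True, done, always_commit)    # chunk 2 copies fine
    return done, store.pending_count, store.committed
```

In this model `run(False)` leaves `pending_count` at 1 with chunk 2's callback never released, while `run(True)` completes both chunks and records neither in the store.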

07 Jan, 2016 1 commit

dm thin: bump thin and thin-pool target versions · 1c2e54e1

Authored by Mike Snitzer on Jan 06, 2016

Commit 3d5f6733 ("dm thin metadata: speed up discard of partially mapped
volumes"), or some other dm-thinp change during the Linux 4.5
development window, really should've bumped these target versions.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

1c2e54e1

openeuler / Kernel · synced successfully almost 2 years ago