提交 · 4813577f932050f33c29e2a35a01431814437700 · openanolis / cloud-kernel

21 11月, 2016 3 次提交

dm mpath: change return type of pg_init_all_paths() from int to void · 4813577f

由 Bart Van Assche 提交于 11月 15, 2016

None of the callers of pg_init_all_paths() check its return value.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

4813577f

dm mpath: add checks for priority group count to avoid invalid memory access · cc5bd925

由 tang.junhui 提交于 11月 04, 2016

This avoids the potential for invalid memory access, if/when there are
no priority groups, in response to invalid arguments being sent by the
user via DM message (e.g. "switch_group", "disable_group" or
"enable_group").
Signed-off-by: Ntang.junhui <tang.junhui@zte.com.cn>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

cc5bd925

dm mpath: add m->hw_handler_name NULL pointer check in parse_hw_handler() · f97dc421

由 tang.junhui 提交于 10月 28, 2016

Avoids false positive of no hardware handler being specified (which is
implied by a NULL m->hw_handler_name).
Signed-off-by: Ntang.junhui <tang.junhui@zte.com.cn>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

f97dc421

29 9月, 2016 1 次提交

dm mpath: always return reservation conflict without failing over · 8ff232c1

由 Hannes Reinecke 提交于 7月 15, 2015

If dm-mpath encounters an reservation conflict it should not fail the
path (as communication with the target is not affected) but should
rather retry on another path.  However, in doing so we might be inducing
a ping-pong between paths, with no guarantee of any forward progress.
And arguably a reservation conflict is an unexpected error, so we should
be passing it upwards to allow the application to take appropriate
steps.

This change resolves a show-stopper problem seen with the pNFS SCSI
layout because it is trivial to hit reservation conflict based failover
loops without it.

Doubts were raised about the implications of this change relative to
products like IBM's SVC.  But there is little point withholding a fix
for Linux because a proprietary product may or may not have some issues
in its implementation of how it interfaces with Linux.  In the future,
if there is glaring evidence that this change is certainly problematic
we can revisit it.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Acked-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com> # tweaked header

8ff232c1

15 9月, 2016 4 次提交

dm mpath: delay the requeue of blk-mq requests while all paths down · b88efd43

由 Mike Snitzer 提交于 9月 09, 2016

Return DM_MAPIO_DELAY_REQUEUE from .clone_and_map_rq.  Also, return
false from .busy, if all paths are down, so that blk-mq requests get
mapped via .clone_and_map_rq -- which results in DM_MAPIO_DELAY_REQUEUE
being returned to dm-rq.

This change allows for a noticeable reduction in cpu utilization
(reduced kworker load) while all paths are down, e.g.:

system CPU idleness (as measured by fio's --idle-prof=system):
before: system: 86.58%
after:  system: 98.60%
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>

b88efd43

dm mpath: use dm_mq_kick_requeue_list() · 7e48c768

由 Mike Snitzer 提交于 9月 14, 2016

When reinstating a path the blk-mq request_queue's requeue_list should
get kicked.  It makes sense to kick the requeue_list as part of the
existing hook (previously only used by bio-based support).

Rename process_queued_bios_list to process_queued_io_list.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>

7e48c768

dm: convert wait loops to use autoremove_wake_function() · 9f4c3f87

由 Bart Van Assche 提交于 8月 31, 2016

Use autoremove_wake_function() instead of default_wake_function()
to make the dm wait loops more similar to other wait loops in the
kernel.  This patch does not change any functionality.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9f4c3f87

dm mpath: check if path's request_queue is dying in activate_path() · f10e06b7

由 Mike Snitzer 提交于 9月 01, 2016

If pg_init_retries is set and a request is queued against a multipath
device with all underlying block device request_queues in the "dying"
state then an infinite loop is triggered because activate_path() never
succeeds and hence never calls pg_init_done().

This change avoids that device removal triggers an infinite loop by
failing the activate_path() which causes the "dying" path to be failed.
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

f10e06b7

08 8月, 2016 1 次提交

block: rename bio bi_rw to bi_opf · 1eff9d32

由 Jens Axboe 提交于 8月 05, 2016

Since commit 63a4cc24, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: NJens Axboe <axboe@fb.com>

1eff9d32

03 8月, 2016 1 次提交

dm mpath: add locking to multipath_resume and must_push_back · 1814f2e3

由 Mike Snitzer 提交于 7月 25, 2016

Multiple flags were being tested without locking.  Protect against
non-atomic bit changes in m->flags by holding m->lock (while testing or
setting the queue_if_no_path related flags).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1814f2e3

11 6月, 2016 4 次提交

dm mpath: add optional "queue_mode" feature · e83068a5

由 Mike Snitzer 提交于 5月 24, 2016

Allow a user to specify an optional feature 'queue_mode <mode>' where
<mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based,
request_fn rq-based, and blk-mq rq-based respectively.

If the queue_mode feature isn't specified the default for the
"multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
it'll default to mode "mq".

This new queue_mode feature introduces the ability for each multipath
device to have its own queue_mode (whereas before this feature all
multipath devices effectively had to have the same queue_mode).

This commit also goes a long way to eliminate the awkward (ab)use of
DM_TYPE_*, the associated filter_md_type() and other relatively fragile
and difficult to maintain code.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e83068a5

M
dm mpath: remove bio-based bloat from struct dm_mpath_io · bf661be1
由 Mike Snitzer 提交于 5月 24, 2016
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
bf661be1

dm mpath: reinstate bio-based support · 76e33fe4

由 Mike Snitzer 提交于 5月 19, 2016

Add "multipath-bio" target that offers a bio-based multipath target as
an alternative to the request-based "multipath" target -- but in a
following commit "multipath-bio" will immediately be replaced by a new
"queue_mode" feature for the "multipath" target which will allow
bio-based mode to be selected.

When DM multipath was originally converted from bio-based to
request-based the motivation for the change was better dynamic load
balancing (by leveraging block core's request-based IO schedulers, for
merging and sorting, _before_ DM multipath would make the decision on
where to steer the IO -- based on path load and/or availability).

More background is available in this "Request-based Device-mapper
multipath and Dynamic load balancing" paper:
https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf

But we've now come full circle where significantly faster storage
devices no longer need IOs to be made larger to drive optimal IO
performance. And even if they do there have been changes to the block
and filesystem layers that help ensure upper layers are constructing
larger IOs. In addition, SCSI's differentiated IO errors will propagate
through to bio-based IO completion hooks -- so that eliminates another
historic justiciation for request-based DM multipath. Lastly, the block
layer's immutable biovec changes have made bio cloning cheaper than it
has ever been; whereas request cloning is still relatively expensive
(both on a CPU usage and memory footprint level).

As such, bio-based DM multipath offers the promise of a more efficient
IO path for high IOPs devices that are, or will be, emerging.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

76e33fe4

dm: move request-based code out to dm-rq.[hc] · 4cc96131

由 Mike Snitzer 提交于 5月 12, 2016

Add some seperation between bio-based and request-based DM core code.

'struct mapped_device' and other DM core only structures and functions
have been moved to dm-core.h and all relevant DM core .c files have been
updated to include dm-core.h rather than dm.h

DM targets should _never_ include dm-core.h!

[block core merge conflict resolution from Stephen Rothwell]
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>

4cc96131

06 5月, 2016 4 次提交

dm mpath: eliminate use of spinlock in IO fast-paths · 2da1610a

由 Mike Snitzer 提交于 3月 17, 2016

The primary motivation of this commit is to improve the scalability of
DM multipath on large NUMA systems where m->lock spinlock contention has
been proven to be a serious bottleneck on really fast storage.

The ability to atomically read a pointer, using lockless_dereference(),
is leveraged in this commit.  But all pointer writes are still protected
by the m->lock spinlock (which is fine since these all now occur in the
slow-path).

The following functions no longer require the m->lock spinlock in their
fast-path: multipath_busy(), __multipath_map(), and do_end_io()

And choose_pgpath() is modified to _not_ update m->current_pgpath unless
it also switches the path-group.  This is done to avoid needing to take
the m->lock everytime __multipath_map() calls choose_pgpath().
But m->current_pgpath will be reset if it is failed via fail_path().
Suggested-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2da1610a

dm mpath: move trigger_event member to the end of 'struct multipath' · 20800cb3

由 Mike Snitzer 提交于 3月 17, 2016

Allows the 'work_mutex' member to no longer cross a cacheline.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

20800cb3

dm mpath: use atomic_t for counting members of 'struct multipath' · 91e968aa

由 Mike Snitzer 提交于 3月 17, 2016

The use of atomic_t for nr_valid_paths, pg_init_in_progress and
pg_init_count will allow relaxing the use of the m->lock spinlock.
Suggested-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

91e968aa

dm mpath: switch to using bitops for state flags · 518257b1

由 Mike Snitzer 提交于 3月 17, 2016

Mechanical change that doesn't make any real effort to reduce the use of
m->lock; that will come later (once atomics are used for counters, etc).
Suggested-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Tested-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

518257b1

11 3月, 2016 1 次提交

dm mpath: cleanup reinstate_path() et al based on code review · ec31f3f7

由 Mike Snitzer 提交于 2月 20, 2016

fail_path() will print a "Failing path ..." message but reinstate_path()
doesn't print a "Reinstating path ...".  Add that message to
reinstate_path() to add symmetry and aid system debugging.

Remove reinstate_path()'s check for the path_selector providing
.reinstate_path hook.  All path selectors provide this and any future
ones must too.

activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but
pg_init_done() doesn't expicitly handle it in its swicth statement.  Add
SCSI_DH_DEV_OFFLINED to the default case.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

ec31f3f7

23 2月, 2016 11 次提交

M
dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy · 9f54cec5
由 Mike Snitzer 提交于 2月 11, 2016
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
9f54cec5
M
dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate · be7d31cc
由 Mike Snitzer 提交于 2月 10, 2016
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
be7d31cc

dm path selector: remove 'repeat_count' return from .select_path hook · 90a4323c

由 Mike Snitzer 提交于 2月 17, 2016

If a path selector has any use for a repeat_count it should be handled
locally and not depend on the dm-mpath core to be concerned with it.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

90a4323c

dm mpath: remove repeat_count support from multipath core · 21136f89

由 Mike Snitzer 提交于 2月 10, 2016

Preparation for making __multipath_map() avoid taking the m->lock
spinlock -- in favor of using RCU locking.

repeat_count was primarily for bio-based DM multipath's benefit.  There
is really no need for it anymore now that DM multipath is request-based.
As such, repeat_count > 1 is no longer honored and a warning is
displayed if the user attempts to use a value > 1.  This is a temporary
change for the round-robin path-selector (as a later commit will restore
its support for repeat_count > 1).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

21136f89

M
dm mpath: remove unnecessary casts in front of ti->private · 7943bd6d
由 Mike Snitzer 提交于 2月 02, 2016
```
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
7943bd6d

dm mpath: use blk_mq_alloc_request() and blk_mq_free_request() directly · 78ce23b5

由 Mike Snitzer 提交于 1月 31, 2016

There isn't any need to support both old .request_fn and blk-mq paths
in the blk-mq specific portion of __multipath_map().  Call
blk_mq_alloc_request() directly rather than use blk_get_request().

Similarly, call blk_mq_free_request(), rather than blk_put_request(), in
multipath_release_clone().
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

78ce23b5

dm mpath: cleanup 'struct dm_mpath_io' management code · 2eff1924

由 Mike Snitzer 提交于 2月 03, 2016

Refactor and rename existing interfaces to be more specific and
self-documenting.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

2eff1924

dm mpath: use blk-mq pdu for per-request 'struct dm_mpath_io' · 8637a6bf

由 Mike Snitzer 提交于 1月 31, 2016

Allow the multipath target to avoid making small allocations for each
'struct dm_mpath_io' that is needed for each request.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

8637a6bf

dm: distinquish old .request_fn (dm-old) vs dm-mq request-based DM · eca7ee6d

由 Mike Snitzer 提交于 2月 20, 2016

Rename various methods to have either a "dm_old" or "dm_mq" prefix.
Improve code comments to assist with understanding the duality of code
that handles both "dm_old" and "dm_mq" cases.

It is no much easier to quickly look at the code and _know_ that a given
method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

eca7ee6d

dm: remove support for stacking dm-mq on .request_fn device(s) · c5248f79

由 Mike Snitzer 提交于 2月 20, 2016

Remove all fiddley code that propped up this support for a blk-mq
request-queue ontop of all .request_fn devices.

Testing has proven this niche request-based dm-mq mode to be buggy, when
testing fault tolerance with DM multipath, and there is no point trying
to preserve it.

Should help improve efficiency of pure dm-mq code and make code
maintenance less delicate.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c5248f79

dm: optimize dm_mq_queue_rq() · 16f12266

由 Mike Snitzer 提交于 1月 31, 2016

DM multipath is the only dm-mq target. But that aside, request-based DM
only supports tables with a single target that is immutable. Leverage
this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
the mapped_device when the table was made active. This saves the need
to even take the read-side of the SRCU via dm_{get,put}_live_table.

If the active DM table does not have an immutable target (e.g. "error"
target was swapped in) then fallback to the slow-path where the target
is looked up from the live table.
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

16f12266

18 11月, 2015 2 次提交

dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path · 43e43c9e

由 Junichi Nomura 提交于 11月 17, 2015

In multipath_prepare_ioctl(),
  - pgpath is a path selected from available paths
  - m->queue_io is true if we cannot send a request immediately to
    paths, either because:
      * there is no available path
      * the path group needs activation (pg_init)
          - pg_init is not started
          - pg_init is still running
  - m->queue_if_no_path is true if the device is configured to queue
    I/O if there are no available paths

If !pgpath && !m->queue_if_no_path, the handler should return -EIO.
However in the course of refactoring the condition check has broken
and returns success in that case.  Since bdev points to the dm device
itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and
recurses until crash.

You could reproduce the problem like this:

  # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
  # sg_inq /dev/mapper/mp
  <crash>
  [  172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268
  [  172.662843] PGD 19dd067 PUD 0
  [  172.666269] Thread overran stack, or stack corrupted
  [  172.671808] Oops: 0000 [#1] SMP
  ...

Fix the condition check with some clarifications.

Fixes: e56f81e0 ("dm: refactor ioctl handling")
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

43e43c9e

dm: fix ioctl retry termination with signal · 5bbbfdf6

由 Junichi Nomura 提交于 11月 17, 2015

dm-mpath retries ioctl, when no path is readily available and the device
is configured to queue I/O in such a case. If you want to stop the retry
before multipathd decides to turn off queueing mode, you could send
signal for the process to exit from the loop.

However the check of fatal signal has not carried along when commit
6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the
loop from dm-mpath to dm core. As a result, we can't terminate such
a process in the retry loop.

Easy reproducer of the situation is:

  # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
  # dmsetup message mp 0 'queue_if_no_path'
  # sg_inq /dev/mapper/mp

then you should be able to terminate sg_inq by pressing Ctrl+C.

Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths")
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

5bbbfdf6

01 11月, 2015 3 次提交

dm: add support for passing through persistent reservations · 71cdb697

由 Christoph Hellwig 提交于 10月 15, 2015

This adds support to pass through persistent reservation requests
similar to the existing ioctl handling, and with the same limitations,
e.g. devices may only have a single target attached.

This is mostly intended for multipathing.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

71cdb697

dm: refactor ioctl handling · e56f81e0

由 Christoph Hellwig 提交于 10月 15, 2015

This moves the call to blkdev_ioctl and the argument checking to DM core
code, and only leaves a callout to find the block device to operate on
in the targets.  This simplifies the code and allows us to pass through
ioctl-like command using other methods in the next patch.

Also split out a helper around calling the prepare_ioctl method that
will be reused for persistent reservation handling.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

e56f81e0

Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938

由 Mauricio Faria de Oliveira 提交于 10月 29, 2015

This reverts commit a1989b33.

That commit introduced a regression at least for the case of the SG_IO ioctl()
running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
than blocking due to queue_if_no_path until a path becomes active, for example.

That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
(qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
from multipath devices; which leads to SCSI/filesystem errors in such a guest.

More general scenarios can hit that regression too. The following demonstration
employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
(some output & user changes omitted for brevity and comments added for clarity).

Reverting that commit restores normal operation (queueing) in failing scenarios;
tested on linux-next (next-20151022).

1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)

    $ cat sg_simple0.c
    ... see [3] ...
    $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
    $ gcc sgio_inquiry.c -o sgio_inquiry

2) The ioctl() works fine with active paths present.

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=active
    | |- 8:0:11:0  sdz  65:144  active undef running
    | `- 9:0:9:0   sdbf 67:144  active undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  active undef running
      `- 9:0:12:0  sdbo 68:32   active undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    Some of the INQUIRY command's response:
        IBM       2145              0000
    INQUIRY duration=0 millisecs, resid=0

3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
   for unprivileged users (rather than blocking due to queue_if_no_path).

    # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
          do multipathd -k"fail path $path"; done

    # multipath -l 85ag56
    85ag56 (...) dm-19 IBM     ,2145
    size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
    |-+- policy='service-time 0' prio=0 status=enabled
    | |- 8:0:11:0  sdz  65:144  failed undef running
    | `- 9:0:9:0   sdbf 67:144  failed undef running
    `-+- policy='service-time 0' prio=0 status=enabled
      |- 8:0:12:0  sdae 65:224  failed undef running
      `- 9:0:12:0  sdbo 68:32   failed undef running

    $ ./sgio_inquiry /dev/mapper/85ag56
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
   it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().

    $ dmesg
    <...>
    [] device-mapper: multipath: Failing path 65:144.
    [] device-mapper: multipath: Failing path 67:144.
    [] device-mapper: multipath: Failing path 65:224.
    [] device-mapper: multipath: Failing path 68:32.
    [] sgio_inquiry: sending ioctl 2285 to a partition!

5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
   (then queueing happens -- in this example, queue_if_no_path is set);
   this is due to a conditional check in scsi_verify_blk_ioctl().

    # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
    sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device

    # ./sgio_inquiry /dev/mapper/85ag56 &
    [1] 72830

    # cat /proc/72830/stack
    [<c00000171c0df700>] 0xc00000171c0df700
    [<c000000000015934>] __switch_to+0x204/0x350
    [<c000000000152d4c>] msleep+0x5c/0x80
    [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
    [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
    [<c0000000003128e4>] block_ioctl+0x64/0xd0
    [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
    [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
    [<c000000000009358>] system_call+0x38/0xd0

6) This is the function call chain exercised in this analysis:

SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
    -> do_vfs_ioctl()
        -> vfs_ioctl()
            ...
            error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
            ...
                -> dm_blk_ioctl() @ drivers/md/dm.c
                    -> multipath_ioctl() @ drivers/md/dm-mpath.c
                        ...
                        (bdev = NULL, due to no active paths)
                        ...
                        if (!bdev || <...>) {
                            int err = scsi_verify_blk_ioctl(NULL, cmd);
                            if (err)
                                r = err;
                        }
                        ...
                            -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                ...
                                if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                    return 0;
                                ...
                                if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                    return 0;
                                ...
                                printk_ratelimited(KERN_WARNING
                                           "%s: sending ioctl %x to a partition!\n" <...>);

                                return -ENOIOCTLCMD;
                            <-
                        ...
                        return r ? : <...>
                    <-
            ...
            if (error == -ENOIOCTLCMD)
                error = -ENOTTY;
             out:
                return error;
            ...

Links:
[1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
[2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
[3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org

47796938

29 8月, 2015 2 次提交

dm-mpath, scsi_dh: request scsi_dh modules in scsi_dh, not dm-mpath · 566079c8

由 Christoph Hellwig 提交于 8月 27, 2015

This way we can reused the same code any attachment method, not just those
requested from dm-mpath.

[jejb: fixup checkpatch error]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Acked-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJames Bottomley <JBottomley@Odin.com>

566079c8

dm-mpath, scsi_dh: don't let dm detach device handlers · 1bab0de0

由 Christoph Hellwig 提交于 8月 27, 2015

While allowing dm-mpath to attach device handlers is a functionality we need
for backwards compatibility reason there is no reason to reference count
them and detach them if dm-mpath stops using the device for some reason.

If the device handler works for the given device it can just stay attached,
and we can take the retain_hw_handler codepath.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NMike Snitzer <snitzer@redhat.com>
Acked-by: NHannes Reinecke <hare@Suse.de>
Signed-off-by: NJames Bottomley <JBottomley@Odin.com>

1bab0de0

28 5月, 2015 1 次提交

dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d

由 Mike Snitzer 提交于 5月 27, 2015

Otherwise kmemleak reported:

unreferenced object 0xffff88009b14e2b0 (size 16):
  comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
  hex dump (first 16 bytes):
    40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
  backtrace:
    [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
    [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
    [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
    [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
    [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
    [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
    [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
    [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
    [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
    [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
    [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
    [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
    [<ffffffff812c0ff2>] submit_bio+0x72/0x150
    [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
    [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
    [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390

Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+

4c6dd53d

16 4月, 2015 2 次提交

dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342

由 Mike Snitzer 提交于 3月 10, 2015

dm_mq_queue_rq() is in atomic context so care must be taken to not
sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
and dm-mpath's call to blk_get_request().  In the future the bioset
allocations will hopefully go away (by removing support for partial
completions of bios in a cloned request).

Also prepare for supporting DM blk-mq ontop of old-style request_fn
device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
will still be used to queue work if blk-mq is used ontop of old-style
request_fn device(s).
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

02233342

dm: add full blk-mq support to request-based DM · bfebd1cd

由 Mike Snitzer 提交于 3月 08, 2015

Commit e5863d9a ("dm: allocate requests in target when stacking on
blk-mq devices") served as the first step toward fully utilizing blk-mq
in request-based DM -- it enabled stacking an old-style (request_fn)
request_queue ontop of the underlying blk-mq device(s).  That first step
didn't improve performance of DM multipath ontop of fast blk-mq devices
(e.g. NVMe) because the top-level old-style request_queue was severely
limited by the queue_lock.

The second step offered here enables stacking a blk-mq request_queue
ontop of the underlying blk-mq device(s).  This unlocks significant
performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
testbed and offered this really positive news:

 "Just providing a performance update. All my fio tests are getting
  roughly equal performance whether accessed through the raw block
  device or the multipath device mapper (~470k IOPS). I could only push
  ~20% of the raw iops through dm before this conversion, so this latest
  tree is looking really solid from a performance standpoint."
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Tested-by: NKeith Busch <keith.busch@intel.com>

bfebd1cd

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功