Commits · 843e8ddb250b5e8ff157b4096277c0c54102905c · openeuler / raspberrypi-kernel

30 May, 2015 2 commits

NVMe: End sync requests immediately on failure · 75619bfa

Committed by Keith Busch on May 28, 2015

Do not retry failed sync commands so the original status may be seen
without issuing unnecessary retries.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

75619bfa

NVMe: Use requested sync command timeout · f4ff414a

Committed by Keith Busch on May 28, 2015

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f4ff414a

23 May, 2015 1 commit

NVMe: Fix obtaining command result · a0a931d6

Committed by Keith Busch on May 22, 2015

Replaces req->sense_len usage, which is not owned by the LLD, with
req->special to contain the command result for driver created commands,
and sets the result unconditionally on completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Fixes: d29ec824 ("nvme: submit internal commands through the block layer")
Signed-off-by: Jens Axboe <axboe@fb.com>

a0a931d6

22 May, 2015 11 commits

block, dm: don't copy bios for request clones · 5f1b670d

Committed by Christoph Hellwig on May 22, 2015

Currently dm-multipath has to clone the bios for every request sent
to the lower devices, which wastes cpu cycles and ties down memory.

This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
to not complete bios attached to a request, which we set on clone
requests similar to bios in a flush sequence.  With this change I/O
errors on a path failure only get propagated to dm-multipath, which
can then either resubmit the I/O or complete the bios on the original
request.

I've done some basic testing of this on a Linux target with ALUA support,
and it survives path failures during I/O nicely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5f1b670d

block: remove management of bi_remaining when restoring original bi_end_io · 326e1dbb

Committed by Mike Snitzer on May 22, 2015

Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for
non-chains") regressed all existing callers that followed this pattern:
 1) saving a bio's original bi_end_io
 2) wiring up an intermediate bi_end_io
 3) restoring the original bi_end_io from intermediate bi_end_io
 4) calling bio_endio() to execute the restored original bi_end_io

The regression was due to BIO_CHAIN only ever getting set if
bio_inc_remaining() is called.  For the above pattern it isn't set until
step 3 above (step 2 would've needed to establish BIO_CHAIN).  As such
the first bio_endio(), in step 2 above, never decremented __bi_remaining
before calling the intermediate bi_end_io -- leaving __bi_remaining with
the value 1 instead of 0.  When bio_inc_remaining() occurred during step
3 it brought it to a value of 2.  When the second bio_endio() was
called, in step 4 above, it should've called the original bi_end_io but
it didn't because there was an extra reference that wasn't dropped (due
to atomic operations being optimized away since BIO_CHAIN wasn't set
upfront).

Fix this issue by removing the __bi_remaining management complexity for
all callers that use the above pattern -- bio_chain() is the only
interface that _needs_ to be concerned with __bi_remaining.  For the
above pattern callers just expect the bi_end_io they set to get called!
Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls
that aren't associated with the bio_chain() interface.

Also, the bio_inc_remaining() interface has been moved local to bio.c.
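
The regression described above can be reproduced in a small userspace model. This is only a sketch: `struct bio_model`, `model_endio()` and the other names are hypothetical stand-ins, not the kernel's definitions, and plain integers replace the kernel's atomics. It follows the four-step pattern and shows that the old caller style (incrementing the count before the second end-io call) leaves the original callback uncalled, while simply restoring the callback and ending the bio again works.

```c
#include <assert.h>
#include <stdbool.h>

struct bio_model;
typedef void (*end_io_fn)(struct bio_model *);

/* Hypothetical, simplified model of the fields involved. */
struct bio_model {
	int remaining;          /* models __bi_remaining, starts at 1 */
	bool chain;             /* models the BIO_CHAIN flag */
	end_io_fn end_io;       /* currently installed completion callback */
	end_io_fn saved_end_io; /* step 1: the saved original callback */
	int original_calls;     /* how often the original callback ran */
};

/* Models bio_inc_remaining(): sets the flag and takes a reference. */
void model_inc_remaining(struct bio_model *bio)
{
	bio->chain = true;
	bio->remaining++;
}

/* Models bio_endio(): the decrement only happens when the flag is set. */
void model_endio(struct bio_model *bio)
{
	if (bio->chain) {
		assert(bio->remaining > 0);
		if (--bio->remaining > 0)
			return; /* references outstanding, do not complete yet */
	}
	if (bio->end_io)
		bio->end_io(bio);
}

void original_end_io(struct bio_model *bio)
{
	bio->original_calls++;
}

/* Old caller style: restore, bump the count, end the bio again. */
void intermediate_old(struct bio_model *bio)
{
	bio->end_io = bio->saved_end_io; /* step 3 */
	model_inc_remaining(bio);        /* 1 -> 2, flag set too late */
	model_endio(bio);                /* step 4: 2 -> 1, never completes */
}

/* Fixed caller style: restore and end the bio again, nothing else. */
void intermediate_fixed(struct bio_model *bio)
{
	bio->end_io = bio->saved_end_io;
	model_endio(bio);
}

/* Runs steps 1-4 and returns how often the original callback was called. */
int run_pattern(end_io_fn intermediate)
{
	struct bio_model bio = {
		.remaining = 1,
		.chain = false,
		.end_io = intermediate,          /* step 2 */
		.saved_end_io = original_end_io, /* step 1 */
		.original_calls = 0,
	};

	model_endio(&bio);
	return bio.original_calls;
}
```

In this model `run_pattern(intermediate_old)` returns 0, matching the lost completion described in the commit message, and `run_pattern(intermediate_fixed)` returns 1.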

Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

326e1dbb

nvme: submit internal commands through the block layer · d29ec824

Committed by Christoph Hellwig on May 22, 2015

Use block layer queues with an internal cmd_type to submit internally
generated NVMe commands. This both simplifies the code a lot and allows
for a better structure. For example, the LightNVM code can now construct
commands without knowing the details of the underlying I/O descriptors.
Or a future NVMe over network target could inject commands, and the
SCSI translation and ioctl code could be reused for such a beast.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

d29ec824

nvme: fail SCSI read/write command with unsupported protection bit · 772ce435

Committed by Christoph Hellwig on May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

772ce435

nvme: report the DPOFUA in MODE_SENSE · 90851768

Committed by Christoph Hellwig on May 22, 2015

NVMe devices always support the FUA bit, and the SCSI translation
accepts the DPO bit, which doesn't have much of a meaning for us.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

90851768

nvme: simplify and cleanup the READ/WRITE SCSI CDB parsing code · cbbb7a2e

Committed by Christoph Hellwig on May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

cbbb7a2e

nvme: first round at deobsfucating the SCSI translation code · 3726897e

Committed by Christoph Hellwig on May 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

3726897e

nvme: fix scsi translation error handling · e61b0a86

Committed by Christoph Hellwig on May 22, 2015

Error handling for the scsi translation was completely broken, as there
were two different positive error number spaces overlapping.  Fix this
up by removing one of them, and centralizing the generation of the other
positive values in a single place.  Also fix up a few places that didn't
handle the NVMe error codes properly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e61b0a86

nvme: split nvme_trans_send_fw_cmd · b90c48d0

Committed by Christoph Hellwig on May 22, 2015

This function handles two totally different opcodes, so split it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

b90c48d0

nvme: store a struct device pointer in struct nvme_dev · e75ec752

Committed by Christoph Hellwig on May 22, 2015

Most users want the generic device, so store that in struct nvme_dev
instead of the pci_dev.  This also happens to be a nice step towards
making some code reusable for non-PCI transports.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

e75ec752

nvme: consolidate synchronous command submission helpers · f705f837

Committed by Christoph Hellwig on May 22, 2015

Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default.  This will allow
adding a timeout to the pass through path later on.
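
The "pass 0 to get the default" convention can be sketched as below. The helper name `effective_timeout()` and the 60-second default are hypothetical and chosen only for illustration; the driver's actual helper and default value are not shown in this log.

```c
#include <assert.h>

/* Hypothetical default in seconds; not taken from the driver. */
#define DEFAULT_SYNC_TIMEOUT 60u

/* Returns the caller's timeout, or the default when the caller passes 0. */
unsigned int effective_timeout(unsigned int requested)
{
	return requested ? requested : DEFAULT_SYNC_TIMEOUT;
}

/* Usage: 0 selects the default, any other value is used as given. */
void effective_timeout_example(void)
{
	assert(effective_timeout(0) == DEFAULT_SYNC_TIMEOUT);
	assert(effective_timeout(5) == 5u);
}
```
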
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

f705f837

20 May, 2015 6 commits

loop: remove (now) unused 'out' label · 6a927007

Committed by Jens Axboe on May 20, 2015

gcc, rightfully, complains:

drivers/block/loop.c:1369:1: warning: label 'out' defined but not used [-Wunused-label]

Kill it.
Signed-off-by: Jens Axboe <axboe@fb.com>

6a927007

s390/block/dasd: remove obsolete while -EBUSY loop · a05e5780

Committed by Jarod Wilson on May 06, 2015

With the mutex_trylock bit gone from blkdev_reread_part(), the retry logic
in dasd_scan_partitions() shouldn't be necessary.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Tejun Heo <tj@kernel.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Markus Pargmann <mpa@pengutronix.de>
CC: Stefan Weinhuber <wein@de.ibm.com>
CC: Stefan Haberland <stefan.haberland@de.ibm.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Fabian Frederick <fabf@skynet.be>
CC: Ming Lei <ming.lei@canonical.com>
CC: David Herrmann <dh.herrmann@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: nbd-general@lists.sourceforge.net
CC: linux-s390@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

a05e5780

block: dasd_genhd: convert to blkdev_reread_part · 6029a06c

Committed by Ming Lei on May 06, 2015

Also remove the obsolete comment.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6029a06c

block: nbd: convert to blkdev_reread_part() · 9dcd1379

Committed by Ming Lei on May 06, 2015

Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9dcd1379

block: loop: fix another reread part failure · 06f0e9e6

Committed by Ming Lei on May 06, 2015

loop_clr_fd() can be run piggyback with lo_release(), and
in this situation rereading partitions may always fail because
bd_mutex is already held.

This patch detects the situation by the reference count, and
calls __blkdev_reread_part() to avoid acquiring the lock again.

In the meantime, this patch switches to the new kernel APIs
blkdev_reread_part() and __blkdev_reread_part().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

06f0e9e6

block: loop: don't hold lo_ctl_mutex in lo_open · f8933667

Committed by Ming Lei on May 06, 2015

The lo_ctl_mutex is held for running all ioctl handlers, and
in some ioctl handlers, ioctl_by_bdev(BLKRRPART) is called for
rereading partitions, which requires bd_mutex.

So rereading partitions can easily fail, because trylock(bd_mutex) may
fail inside blkdev_reread_part(). The lock contexts are as follows:

blkid or other application:
	->open()
		->mutex_lock(bd_mutex)
		->lo_open()
			->mutex_lock(lo_ctl_mutex)

losetup(set fd ioctl):
	->mutex_lock(lo_ctl_mutex)
	->ioctl_by_bdev(BLKRRPART)
		->trylock(bd_mutex)

This patch tries to eliminate the ABBA lock dependency by removing
lo_ctl_mutex in lo_open() with the following approach:

1) make lo_refcnt an atomic_t and avoid acquiring lo_ctl_mutex in lo_open():
	- for open vs. add/del loop, there is no problem because of loop_index_mutex
	- freeze request queue during clr_fd, so I/O can't come until
	  clearing fd is completed, like the effect of holding lo_ctl_mutex
	  in lo_open
	- both open() and release() have been serialized by bd_mutex already

2) don't hold lo_ctl_mutex for decreasing/checking lo_refcnt in
lo_release(), then lo_ctl_mutex is only required for the last release.
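
The reference counting scheme in points 1) and 2) can be sketched in userspace with C11 atomics. This is a simplified model, not the driver code: `struct loop_model` and the `model_*` functions are hypothetical, and a counter stands in for actually taking lo_ctl_mutex. It shows that open never touches the mutex and that only the last release does.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the loop device state relevant here. */
struct loop_model {
	atomic_int refcnt;          /* models an atomic lo_refcnt */
	int ctl_mutex_acquisitions; /* counts where lo_ctl_mutex would be taken */
};

void model_loop_init(struct loop_model *lo)
{
	atomic_init(&lo->refcnt, 0);
	lo->ctl_mutex_acquisitions = 0;
}

/* Open path: only an atomic increment, no lo_ctl_mutex. */
void model_lo_open(struct loop_model *lo)
{
	atomic_fetch_add(&lo->refcnt, 1);
}

/* Release path: the mutex is only needed when the last reference drops. */
void model_lo_release(struct loop_model *lo)
{
	assert(atomic_load(&lo->refcnt) > 0);

	/* atomic_fetch_sub returns the previous value; 1 means we were last. */
	if (atomic_fetch_sub(&lo->refcnt, 1) != 1)
		return;

	/* Stand-in for: lock lo_ctl_mutex, do final cleanup, unlock. */
	lo->ctl_mutex_acquisitions++;
}
```

With two opens followed by two releases, the counter stays at 0 after the first release and becomes 1 after the second.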
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f8933667

19 May, 2015 3 commits

nvme: disable irqs in nvme_freeze_queues · cddcd72b

Committed by Christoph Hellwig on May 07, 2015

The queue_lock needs to be taken with irqs disabled.  This is mostly
due to the old pre blk-mq usage pattern, but we've also picked it up
in most of the few places where we use the queue_lock with blk-mq.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

cddcd72b

cciss: correct the non-resettable board list · 8a0ee3b5

Committed by Tomas Henzl on February 17, 2015

The hpsa driver carries a more recent version,
copy the table from there.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

8a0ee3b5

cciss: remove duplicate entries from board_type struct · 5aea3288

Committed by Tomas Henzl on February 17, 2015

and devices not supported by this driver from unresettable list
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5aea3288

06 May, 2015 8 commits

block: loop: avoiding too many pending per work I/O · 4d4e41ae

Committed by Ming Lei on May 05, 2015

If there is too much pending per-work I/O, too many
high-priority worker threads can be generated, so that
system performance can be affected.

This patch limits the max_active parameter of the workqueue to 16.

This patch fixes a Fedora 22 live booting performance
regression when it is booted from squashfs over dm
based on loop. The following factors appear to be
related to the problem:

- unlike other filesystems (such as ext4), squashfs
is a bit special, and I observed that increasing I/O jobs
to access files in squashfs only improves I/O performance a
little, but it can make a big difference for ext4

- nested loop: both squashfs.img and ext3fs.img are mounted
as loop block, and ext3fs.img is inside the squashfs

- during booting, lots of tasks may run concurrently

Fixes: b5dd2f60
Cc: stable@vger.kernel.org (v4.0)
Cc: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

4d4e41ae

block: loop: convert to per-device workqueue · f4aa4c7b

Committed by Ming Lei on May 05, 2015

Documentation/workqueue.txt:
	If there is dependency among multiple work items used
	during memory reclaim, they should be queued to separate
	wq each with WQ_MEM_RECLAIM.

Loop devices can be stacked, so we have to convert to per-device
workqueue. One example is Fedora live CD.

Fixes: b5dd2f60
Cc: stable@vger.kernel.org (v4.0)
Cc: Justin M. Forbes <jforbes@fedoraproject.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

f4aa4c7b

nbd: stop using req->cmd · 9dc6c806

Committed by Christoph Hellwig on April 17, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

9dc6c806

block: move PM request support to IDE · a7928c15

Committed by Christoph Hellwig on April 17, 2015

This moves the request types and hacks out of the block code and into the
old IDE driver.  There is a small amount of code duplication due to this,
but it's not too bad.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

a7928c15

block: move REQ_TYPE_SENSE to the ide driver · b0b93b48

Committed by Christoph Hellwig on April 17, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

b0b93b48

block: rename REQ_TYPE_SPECIAL to REQ_TYPE_DRV_PRIV · 4f8c9510

Committed by Christoph Hellwig on April 17, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

4f8c9510

bio: skip atomic inc/dec of ->bi_cnt for most use cases · dac56212

Committed by Jens Axboe on April 17, 2015

Struct bio has a reference count that controls when it can be freed.
The most common use case is allocating the bio, which then returns with a
single reference to it, doing IO, and then dropping that single
reference. We can remove this atomic_dec_and_test() in the completion
path, if nobody else is holding a reference to the bio.

If someone does call bio_get() on the bio, then we flag the bio as
now having valid count and that we must properly honor the reference
count when it's being put.
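
The idea can be sketched with a small userspace model. The names below (`struct bio_ref_model`, `model_bio_get()`, `model_bio_put()`) are hypothetical, plain integers replace the kernel's atomics, and a counter records how many operations would have been atomic. The single-reference path frees without touching the count; once a get has flagged the bio, every put honors the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bio's reference-count state. */
struct bio_ref_model {
	int cnt;        /* reference count, starts at 1 on allocation */
	bool reffed;    /* set once someone takes an extra reference */
	bool freed;     /* set when the bio would be freed */
	int atomic_ops; /* number of operations that would be atomic */
};

void model_bio_init(struct bio_ref_model *bio)
{
	bio->cnt = 1;
	bio->reffed = false;
	bio->freed = false;
	bio->atomic_ops = 0;
}

/* Extra reference: flag the count as valid, then increment it. */
void model_bio_get(struct bio_ref_model *bio)
{
	bio->reffed = true;
	bio->cnt++;
	bio->atomic_ops++;
}

/* Drop a reference: skip the count entirely if it was never flagged. */
void model_bio_put(struct bio_ref_model *bio)
{
	assert(!bio->freed);

	if (!bio->reffed) {
		bio->freed = true; /* common case: sole owner, no atomic op */
		return;
	}

	bio->atomic_ops++;
	if (--bio->cnt == 0)
		bio->freed = true;
}
```

In the common allocate, submit, put sequence the model frees the bio with zero counted atomic operations; after one get, two puts are needed and three are counted.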
Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

dac56212

bio: skip atomic inc/dec of ->bi_remaining for non-chains · c4cf5261

Committed by Jens Axboe on April 17, 2015

Struct bio has an atomic ref count for chained bio's, and we use this
to know when to end IO on the bio. However, most bio's are not chained,
so we don't need to always introduce this atomic operation as part of
ending IO.

Add a helper to elevate the bi_remaining count, and flag the bio as
now actually needing the decrement at end_io time. Rename the field
to __bi_remaining to catch any current users of this doing the
incrementing manually.

For high IOPS workloads, this reduces the overhead of bio_endio()
substantially.
Tested-by: Robert Elliott <elliott@hp.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

c4cf5261

04 May, 2015 1 commit

hwrng: bcm63xx - Fix driver compilation · f440c4ee

Committed by Álvaro Fernández Rojas on May 02, 2015

- s/clk_didsable_unprepare/clk_disable_unprepare
- s/prov/priv
- s/error/ret (bcm63xx_rng_probe)

Fixes: 6229c160 ("hwrng: bcm63xx - make use of devm_hwrng_register")
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

f440c4ee

02 May, 2015 1 commit

rbd: end I/O the entire obj_request on error · 082a75da

Committed by Ilya Dryomov on April 25, 2015

When we end I/O struct request with error, we need to pass
obj_request->length as @nr_bytes so that the entire obj_request worth
of bytes is completed.  Otherwise block layer ends up confused and we
trip on

    rbd_assert(more ^ (which == img_request->obj_request_count));

in rbd_img_obj_callback() due to more being true no matter what.  We
already do it in most cases but we are missing some, in particular
those where we don't even get a chance to submit any obj_requests, due
to an early -ENOMEM for example.

A number of obj_request->xferred assignments seem to be redundant but
I haven't touched any of obj_request->xferred stuff to keep this small
and isolated.

Cc: Alex Elder <elder@linaro.org>
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

082a75da

01 May, 2015 5 commits

net: fec: Fix RGMII-ID mode · e813bb2b

Committed by Markus Pargmann on April 30, 2015

RGMII-ID uses an internal delay within the transmitter or receiver. This
feature is phy specific. The rest of the communication is normal RGMII.

So the fec driver has to check for all RGMII modes, not only
'PHY_INTERFACE_MODE_RGMII'.
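
A check that covers every RGMII variant can be sketched as follows. The enum here is a local stand-in that mirrors the relevant kernel PHY_INTERFACE_MODE_* names (plain RGMII plus the ID, RXID and TXID delay variants); the helper name `mode_is_rgmii()` is hypothetical and is not the function the fec driver uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel's interface mode enum; illustration only. */
enum phy_mode {
	MODE_MII,
	MODE_RMII,
	MODE_RGMII,      /* no internal delay */
	MODE_RGMII_ID,   /* internal delay on both RX and TX */
	MODE_RGMII_RXID, /* internal delay on RX only */
	MODE_RGMII_TXID, /* internal delay on TX only */
};

/* True for every RGMII variant, since the MAC side treats them alike. */
bool mode_is_rgmii(enum phy_mode mode)
{
	switch (mode) {
	case MODE_RGMII:
	case MODE_RGMII_ID:
	case MODE_RGMII_RXID:
	case MODE_RGMII_TXID:
		return true;
	default:
		return false;
	}
}

/* Usage: the delay variants match, unrelated modes do not. */
void mode_is_rgmii_example(void)
{
	assert(mode_is_rgmii(MODE_RGMII_ID));
	assert(!mode_is_rgmii(MODE_RMII));
}
```

Comparing only against the plain RGMII value would return false for the three delay variants, which is the gap the commit message describes.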
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

e813bb2b

net/mlx4_en: Schedule napi when RX buffers allocation fails · 07841f9d

Committed by Ido Shamay on April 30, 2015

When the system is out of memory, refilling of RX buffers fails while
the driver continues to pass the received packets to the kernel stack.
At some point, when all RX buffers are depleted, the driver may fall
asleep and not recover when memory for new RX buffers is once again
available. This is because the hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by scheduling the napi poll function from the
stats_task delayed workqueue, as long as the allocations fail.
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07841f9d

netxen_nic: use spin_[un]lock_bh around tx_clean_lock · c232d8a8

Committed by Tony Camuso on April 30, 2015

While testing this driver with DEBUG_LOCKDEP and DEBUG_SPINLOCK
enabled did not produce any traces, it would be more prudent in the
case of tx_clean_lock to use spin_[un]lock_bh, since this lock is
manipulated in both the process and softirq contexts.

This patch was tested for functionality and regressions with netperf
and DEBUG_LOCKDEP and DEBUG_SPINLOCK enabled.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c232d8a8

net/mlx4_core: Fix unaligned accesses · 17d5ceb6

Committed by David Ahern on April 29, 2015

Addresses the following kernel logs seen during boot:

Kernel unaligned access at TPC[100ee150] mlx4_QUERY_HCA+0x80/0x248 [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17d5ceb6

mlx4_en: Use correct loop cursor in error path. · f94813f3

Committed by Benjamin Poirier on April 29, 2015

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Fixes: 9e311e77 ("net/mlx4_en: Use affinity hint")
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f94813f3

30 April, 2015 2 commits

dm: fix free_rq_clone() NULL pointer when requeueing unmapped request · aa6df8dd

Committed by Mike Snitzer on April 29, 2015

Commit 02233342 ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops. It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2. However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops. This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

aa6df8dd

dm: only initialize the request_queue once · 3e6180f0

Committed by Christoph Hellwig on April 30, 2015

Commit bfebd1cd ("dm: add full blk-mq support to request-based DM")
didn't properly account for the need to short-circuit re-initializing
DM's blk-mq request_queue if it was already initialized.

Otherwise, reloading a blk-mq request-based DM table (either manually
or via multipathd) resulted in errors, see:
 https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html

Fix is to only initialize the request_queue on the initial table load
(when the mapped_device type is assigned).

This is better than having dm_init_request_based_blk_mq_queue() return
early if the queue was already initialized because it elevates the
constraint to a more meaningful location in DM core.  As such the
pre-existing early return in dm_init_request_based_queue() can now be
removed.

Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

3e6180f0