Commits · bccf5e26d99c28980bd6ced474422a1b18402263 · openeuler / Kernel

04 Sep, 2020 2 commits

blk-mq: Facilitate a shared sbitmap per tagset · 32bc15af

Authored by John Garry on Aug 19, 2020

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec0
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queues must
be mapped to the blk-mq hctxs; to do that, the HBA HW queues need to be
exposed to the upper layer.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However, another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e0 ("scsi:
core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.
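
As an illustration only, the following user-space sketch (not kernel code) shows the idea of several hardware queues drawing tags from one shared bitmap, so that the total number of in-flight tags never exceeds the host-wide depth. All names (`demo_tagset`, `demo_get_tag`, `demo_put_tag`, `DEMO_DEPTH`, `DEMO_NR_HCTX`) are hypothetical, and a single 64-bit word stands in for the kernel's sbitmap.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DEPTH   8 /* hypothetical host-wide depth, akin to can_queue */
#define DEMO_NR_HCTX 4 /* hypothetical number of hardware queues */

/* One bitmap shared by every hardware queue in the tag set. */
struct demo_tagset {
	uint64_t shared_bits;            /* bit i set => tag i is in use */
	unsigned int busy[DEMO_NR_HCTX]; /* per-queue accounting only */
};

/* Allocate a host-wide unique tag on behalf of queue hctx; -1 if full. */
static int demo_get_tag(struct demo_tagset *set, unsigned int hctx)
{
	assert(hctx < DEMO_NR_HCTX);
	for (int tag = 0; tag < DEMO_DEPTH; tag++) {
		if (!(set->shared_bits & (UINT64_C(1) << tag))) {
			set->shared_bits |= UINT64_C(1) << tag;
			set->busy[hctx]++;
			return tag;
		}
	}
	return -1;
}

/* Release a tag that was allocated on behalf of queue hctx. */
static void demo_put_tag(struct demo_tagset *set, unsigned int hctx, int tag)
{
	assert(hctx < DEMO_NR_HCTX && tag >= 0 && tag < DEMO_DEPTH);
	assert(set->shared_bits & (UINT64_C(1) << tag));
	set->shared_bits &= ~(UINT64_C(1) << tag);
	set->busy[hctx]--;
}
```

In this sketch, whichever queue asks, allocation fails once DEMO_DEPTH tags are outstanding, which mirrors the requirement that the block layer not issue more than .can_queue commands to the host.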

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> # SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED · 51db1c37

Authored by Ming Lei on Aug 19, 2020

BLK_MQ_F_TAG_SHARED actually means that tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA.

So rename it to make the point explicit.

[jpg: rebase a few times, add rnbd-clt.c change]
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

02 Sep, 2020 1 commit

block: Move blk_mq_bio_list_merge() into blk-merge.c · bdc6a287

Authored by Baolin Wang on Aug 28, 2020

Move blk_mq_bio_list_merge() into blk-merge.c and give it a generic
name.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Jul, 2020 1 commit

block: Remove callback typedefs for blk_mq_ops · 0516c2f6

Authored by Daniel Wagner on Jul 28, 2020

No need to define typedefs for the callbacks, because there is not a
single user except blk_mq_ops.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Jul, 2020 1 commit

block: move ->make_request_fn to struct block_device_operations · c62b37d9

Authored by Christoph Hellwig on Jul 01, 2020

The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector.  Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does).  Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Jun, 2020 1 commit

blk-mq: pass request queue into get/put budget callback · 65c76369

Authored by Ming Lei on Jun 30, 2020

The blk-mq budget is abstracted from SCSI's device queue depth, and it is
always per-request-queue rather than per-hctx.

It can be quite absurd to get a budget from one hctx and then dequeue a
request from the scheduler queue when that request may not belong to this
hctx, at least for bfq and deadline.

So fix the mess and always pass request queue to get/put budget
callback.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Jun, 2020 1 commit

blk-mq: remove the BLK_MQ_REQ_INTERNAL flag · 42fdc5e4

Authored by Christoph Hellwig on Jun 29, 2020

Just check for a non-NULL elevator directly to make the code more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

24 Jun, 2020 2 commits

blk-mq: add a new blk_mq_complete_request_remote API · 40d09b53

Authored by Christoph Hellwig on Jun 11, 2020

This is a variant of blk_mq_complete_request that only completes
the request if it needs to be bounced to another CPU or a softirq.  If
the request can be completed locally, the function returns false and lets
the driver complete it without requiring an indirect function call.
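
The calling pattern this enables can be sketched in user space as follows. This is a mock under stated assumptions, not the kernel implementation: `demo_request`, `demo_complete_request_remote()` and `demo_irq_handler()` are hypothetical names, and the "needs bouncing" decision is reduced to a simple CPU comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block request; not the kernel's struct request. */
struct demo_request {
	int submit_cpu;    /* CPU that issued the request */
	bool bounced;      /* completion was handed to another CPU */
	bool done_locally; /* driver completed it with a direct call */
};

/*
 * Mock of the "remote" variant: returns true only when it took over the
 * completion by bouncing it elsewhere, false when the caller should finish.
 */
static bool demo_complete_request_remote(struct demo_request *rq, int this_cpu)
{
	assert(rq != NULL);
	if (rq->submit_cpu != this_cpu) {
		rq->bounced = true;
		return true;
	}
	return false;
}

/* Driver-side pattern: complete directly when nothing was bounced. */
static void demo_irq_handler(struct demo_request *rq, int this_cpu)
{
	if (!demo_complete_request_remote(rq, this_cpu))
		rq->done_locally = true; /* direct call, no function pointer */
}
```

The point of the design is that the local case stays a plain function call in the driver, which avoids the indirect call the message mentions.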
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: move failure injection out of blk_mq_complete_request · 15f73f5b

Authored by Christoph Hellwig on Jun 11, 2020

Move the call to blk_should_fake_timeout out of blk_mq_complete_request
and into the drivers, skipping call sites that are obvious error
handlers, and remove the now superfluous blk_mq_force_complete_rq helper.
This ensures we don't keep injecting errors into completions that just
terminate the Linux request after the hardware has been reset or the
command has been aborted.
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 May, 2020 2 commits

blk-mq: drain I/O when all CPUs in a hctx are offline · bf0beec0

Authored by Ming Lei on May 29, 2020

Most blk-mq drivers depend on managed IRQ's auto-affinity to set up
queue mapping. Thomas mentioned the following point[1]:

"That was the constraint of managed interrupts from the very beginning:

 The driver/subsystem has to quiesce the interrupt line and the associated
 queue _before_ it gets shutdown in CPU unplug and not fiddle with it
 until it's restarted by the core when the CPU is plugged in again."

However, the current blk-mq implementation doesn't quiesce the hw queue before
the last CPU in the hctx is shut down.  Even worse, CPUHP_BLK_MQ_DEAD is a
cpuhp state handled after the CPU is down, so there isn't any chance to
quiesce the hctx before shutting down the CPU.

Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs
where the last CPU goes away, and wait for completion of in-flight
requests.  This guarantees that there is no inflight I/O before shutting
down the managed IRQ.

Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need
to wait for completion of in-flight requests from these drivers to avoid
a potential dead-lock. It is safe to do this for stacking drivers as those
do not use interrupts at all and their I/O completions are triggered by
underlying devices I/O completion.

[1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/

[hch: different retry mechanism, merged two patches, minor cleanups]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: blk-mq: provide forced completion method · 7b11eab0

Authored by Keith Busch on May 29, 2020

Drivers may need to bypass error injection for error recovery. Rename
__blk_mq_complete_request() to blk_mq_force_complete_rq() and export
that function so drivers may skip potential fake timeouts after they've
reclaimed lost requests.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Apr, 2020 1 commit

block: bypass ->make_request_fn for blk-mq drivers · 8cf7961d

Authored by Christoph Hellwig on Apr 25, 2020

Call blk_mq_make_request when no ->make_request_fn is set.  This is
safe now that blk_alloc_queue always sets up the pointer for make_request
based drivers.  This avoids an indirect call in the blk-mq driver I/O
fast path, which is rather expensive due to spectre mitigations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Apr, 2020 1 commit

blk-mq: Add blk_mq_delay_run_hw_queues() API call · b9151e7b

Authored by Douglas Anderson on Apr 20, 2020

We have:
* blk_mq_run_hw_queue()
* blk_mq_delay_run_hw_queue()
* blk_mq_run_hw_queues()

...but not blk_mq_delay_run_hw_queues(), presumably because nobody
needed it before now.  Since we need it for a later patch in this
series, add it.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Apr, 2020 1 commit

blk-mq: Replace zero-length array with flexible-array member · f36aaf8b

Authored by Gustavo A. R. Silva on Mar 23, 2020

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
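
As a small user-space illustration of why allocations keep working, the sketch below allocates a structure with a flexible array member by adding the trailing element storage to sizeof of the header. The `struct boo` layout and the helper `foo_alloc()` are hypothetical and exist only for this example; in-kernel code would normally size such allocations with the struct_size() helper rather than open-coded arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[]; /* flexible array member: must be the last member */
};

/* Allocate a struct foo with room for n trailing elements. */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f;

	/* Guard the size computation against overflow. */
	assert(n <= (SIZE_MAX - sizeof(struct foo)) / sizeof(struct boo));

	/* sizeof(*f) covers only the header; the array adds n elements. */
	f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
	if (!f)
		return NULL;

	f->stuff = (int)n;
	for (size_t i = 0; i < n; i++)
		f->array[i].value = (int)i * 10;
	return f;
}
```

The size expression is the same one that would be written for a zero-length array, which is why existing allocation sites are unaffected by the conversion.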

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

28 Mar, 2020 1 commit

block: add a blk_mq_init_queue_data helper · 2f227bb9

Authored by Christoph Hellwig on Mar 27, 2020

This allows a driver to pass a queuedata member before ->init_hctx is
called.  null_blk currently open codes this logic, but I'd rather have
it in the core to ease future maintenance.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Mar, 2020 1 commit

blk-mq: Fix a comment in include/linux/blk-mq.h · 2dd209f0

Authored by Bart Van Assche on Mar 09, 2020

The 'hctx_list' member of struct blk_mq_hw_ctx is not a list head but
instead an entry in q->unused_hctx_list. Fix the comment above this
struct member.

Fixes: d386732b ("blk-mq: fill header with kernel-doc")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 Nov, 2019 1 commit

blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue() · cb711b91

Authored by John Garry on Nov 14, 2019

These functions are not referenced, so delete them.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Nov, 2019 1 commit

blk-mq: Make blk_mq_run_hw_queue() return void · 626fb735

Authored by John Garry on Oct 30, 2019

Since commit 97889f9a ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()"), the return value of blk_mq_run_hw_queue()
is never checked, so make it return void, which very marginally simplifies
the code.
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Oct, 2019 1 commit

blk-mq: fill header with kernel-doc · d386732b

Authored by André Almeida on Oct 21, 2019

Insert documentation for structs, enums and functions in the header file.
Format existing and new comments in struct blk_mq_ops as
kernel-doc comments.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

07 Oct, 2019 2 commits

blk-mq: Inline status checkers · 27a46989

Authored by Pavel Begunkov on Sep 30, 2019

blk_mq_request_completed() and blk_mq_request_started() are
short, so inline them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: Document all members of blk_mq_tag_set and bkl_mq_queue_map · 7a18312c

Authored by Bart Van Assche on Sep 30, 2019

The meaning of several member variables of these two data structures is
nontrivial. Hence document all member variables using the kernel-doc
syntax.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Sep, 2019 1 commit

block: Delay default elevator initialization · 737eb78e

Authored by Damien Le Moal on Sep 05, 2019

When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues, as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, which prevents correct selection of the default elevator
most suitable for the device.

This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.

Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:
1) elevator_init = false is used for calls to
   blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
   case, a call to elevator_init_mq() is added to __device_add_disk(),
   resulting in the delayed initialization of the queue elevator
   after the device driver finished probing the device information. This
   effectively allows elevator_init_mq() access to more information
   about the device.
2) elevator_init = true preserves the current behavior of initializing
   the elevator directly from blk_mq_init_allocated_queue(). This case
   is used for the special request based DM devices where the device
   gendisk is created before the queue initialization and device
   information (e.g. queue limits) is already known when the queue
   initialization is executed.

Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Aug, 2019 1 commit

block: Remove blk_mq_register_dev() · 9685b227

Authored by Bart Van Assche on Aug 27, 2019

This function has no callers. Hence remove it.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05 Aug, 2019 4 commits

blk-mq: add callback of .cleanup_rq · 226b4fc7

Authored by Ming Lei on Jul 25, 2019

SCSI maintains its own driver private data hooked off of each SCSI
request, and the private data won't be freed after scsi_queue_rq()
returns BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE. An upper layer driver
(e.g. dm-rq) may need to retry these SCSI requests, before SCSI has
fully dispatched them, due to a lower level SCSI driver's resource
limitation identified in scsi_queue_rq(). Currently SCSI's per-request
private data is leaked when the upper layer driver (dm-rq) frees and
then retries these requests in response to BLK_STS_RESOURCE or
BLK_STS_DEV_RESOURCE returns from scsi_queue_rq().

This usecase is so specialized that it doesn't warrant training an
existing blk-mq interface (e.g. blk_mq_free_request) to allow SCSI to
account for freeing its driver private data -- doing so would add an
extra branch for handling a special case that all other consumers of
SCSI (and blk-mq) won't ever need to worry about.

So the most pragmatic way forward is to delegate freeing SCSI driver
private data to the upper layer driver (dm-rq).  Do so by adding
new .cleanup_rq callback and calling a new blk_mq_cleanup_rq() method
from dm-rq.  A following commit will implement the .cleanup_rq() hook
in scsi_mq_ops.

Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: remove blk_mq_complete_request_sync · a87ccce0

Authored by Ming Lei on Jul 24, 2019

blk_mq_tagset_wait_completed_request() is now used to wait for the
complete fn of completed requests, so blk_mq_complete_request_sync()
is no longer needed.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: introduce blk_mq_tagset_wait_completed_request() · f9934a80

Authored by Ming Lei on Jul 24, 2019

blk-mq may schedule a call to the queue's complete function on a remote CPU
via IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchronization
because aborted requests stay in blk-mq queues during EH.

In some drivers' EH (such as NVMe), a hardware queue's resources may be freed
and re-allocated. If the completed request's complete fn finally runs after
the hardware queue's resources are released, a kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: introduce blk_mq_request_completed() · aa306ab7

Authored by Ming Lei on Jul 24, 2019

NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Jun, 2019 1 commit

block: remove the bi_phys_segments field in struct bio · 14ccb66b

Authored by Christoph Hellwig on Jun 06, 2019

We only need the number of segments in the blk-mq submission path.
Remove the field from struct bio, and return it from a variant of
blk_queue_split instead, so that it can be passed as an argument to
those functions that need the value.

This also means we stop recounting segments except for cloning
and partial segments.

To keep the number of arguments in this hot path down, remove
pointless struct request_queue arguments from any of the functions
that had it and grew a nr_segs argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 May, 2019 1 commit

blk-mq: always free hctx after request queue is freed · 2f8f1336

Authored by Ming Lei on Apr 30, 2019

In the normal queue cleanup path, hctx is released after the request queue
is freed, see blk_mq_release().

However, in __blk_mq_update_nr_hw_queues(), hctx may be freed because
of hw queues shrinking. This can easily cause a use-after-free,
because one implicit rule is that it is safe to call almost all block
layer APIs if the request queue is alive; one hctx may be retrieved
by one API, then the hctx can be freed by blk_mq_update_nr_hw_queues(),
and finally a use-after-free is triggered.

Fix this issue by always freeing hctx after releasing the request queue.
If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce
a per-queue list to hold them, then try to reuse these hctxs if the NUMA
node matches.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K . Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Apr, 2019 1 commit

blk-mq: introduce blk_mq_complete_request_sync() · 1b8f21b7

Authored by Ming Lei on Apr 09, 2019

NVMe's error handler follows these typical steps when tearing down
hardware to recover the controller:

1) stop blk_mq hw queues
2) stop the real hw queues
3) cancel in-flight requests via
	blk_mq_tagset_busy_iter(tags, cancel_request, ...)
cancel_request():
	mark the request as aborted
	blk_mq_complete_request(req);
4) destroy real hw queues

However, there may be a race between #3 and #4, because blk_mq_complete_request()
may run q->mq_ops->complete(rq) remotely and asynchronously, and
->complete(rq) may be run after #4.

This patch introduces blk_mq_complete_request_sync() for fixing the
above race.

Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: James Smart <james.smart@broadcom.com>
Cc: linux-nvme@lists.infradead.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Mar, 2019 1 commit

block: Unexport blk_mq_add_to_requeue_list() · e6c98712

Authored by Bart Van Assche on Mar 20, 2019

This function is not used outside the block layer core. Hence unexport it.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Mar, 2019 1 commit

blk-mq: remove unused 'nr_expired' from blk_mq_hw_ctx · 9496c015

Authored by Dongli Zhang on Mar 19, 2019

There is no usage of 'nr_expired'.

The 'nr_expired' was introduced by commit 1d9bd516 ("blk-mq: replace
timeout synchronization with a RCU and generation based scheme"). Its usage
was removed since commit 12f5b931 ("blk-mq: Remove generation
seqeunce").
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 Feb, 2019 1 commit

block: kill BLK_MQ_F_SG_MERGE · 56d18f62

Authored by Ming Lei on Feb 15, 2019

QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

19 Dec, 2018 1 commit

block: make request_to_qc_t public · 7b7ab780

Authored by Sagi Grimberg on Dec 14, 2018

Block consumers will need it for polling requests that
are sent with blk_execute_rq_nowait. Also, get rid of
blk_tag_to_qc_t and open-code it instead.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

18 Dec, 2018 1 commit

blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c

Authored by Jens Axboe on Dec 17, 2018

There's a single user of this function, dm, and dm just wants
to check if IO is inflight, not that it's just allocated.

This fixes a hang with srp/002 in blktests with dm, where it tries
to suspend but waits for inflight IO to finish first. As it checks
for just allocated requests, this fails.
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05 Dec, 2018 1 commit

block: move queues types to the block layer · e20ba6e1

Authored by Christoph Hellwig on Dec 02, 2018

Having another indirect call in the fast path doesn't really help
in our post-spectre world.  Also having too many queue types is just
going to create confusion, so I'd rather manage them centrally.

Note that the queue type naming and ordering changes a bit - the
first index now is the default queue for everything not explicitly
marked, the optional ones are read and poll queues.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Nov, 2018 1 commit

blk-mq: add mq_ops->commit_rqs() · d666ba98

Authored by Jens Axboe on Nov 27, 2018

blk-mq passes information to the hardware about any given request being
the last that we will issue in this sequence. The point is that hardware
can defer costly doorbell type writes to the last request. But if we run
into errors issuing a sequence of requests, we may never send the request
with bd->last == true set. For that case, we need a hook that tells the
hardware that nothing else is coming right now.

For failures returned by the drivers ->queue_rq() hook, the driver is
responsible for flushing pending requests, if it uses bd->last to
optimize that part. This works like before, no changes there.
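
The interplay between the "last" hint and the new hook can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: `demo_dev`, `demo_queue_rq()`, `demo_commit_rqs()` and `demo_issue_batch()` are hypothetical names, and the mid-sequence failure is modelled as the issuing side running out of a budget before the final request is handed to the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model: requests are staged, one doorbell submits them. */
struct demo_dev {
	int staged;    /* queued but not yet announced to the hardware */
	int submitted; /* announced via a doorbell write */
	int doorbells; /* number of (costly) doorbell writes */
};

static void demo_ring_doorbell(struct demo_dev *dev)
{
	dev->submitted += dev->staged;
	dev->staged = 0;
	dev->doorbells++;
}

/* Models ->queue_rq(): stage a request, ring only for the last one. */
static void demo_queue_rq(struct demo_dev *dev, bool last)
{
	assert(dev != NULL);
	dev->staged++;
	if (last)
		demo_ring_doorbell(dev);
}

/* Models ->commit_rqs(): flush whatever was staged without a 'last' hint. */
static void demo_commit_rqs(struct demo_dev *dev)
{
	if (dev->staged)
		demo_ring_doorbell(dev);
}

/* Models the issuing loop; returns how many requests were handed over. */
static int demo_issue_batch(struct demo_dev *dev, int nr, int budget)
{
	int queued = 0;

	for (int i = 0; i < nr && budget > 0; i++, budget--) {
		demo_queue_rq(dev, i == nr - 1);
		queued++;
	}
	if (queued < nr)
		demo_commit_rqs(dev); /* the 'last' request never arrived */
	return queued;
}
```

In this model, a full batch costs one doorbell write, and a truncated batch is still flushed by the commit hook instead of leaving staged requests unannounced.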
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Nov, 2018 2 commits

blk-mq: Simplify request completion state · af78ff7c

Authored by Keith Busch on Nov 26, 2018

There are no more users relying on blk-mq request states to prevent
double completions, so replace the relatively expensive cmpxchg operation
with WRITE_ONCE.
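
The difference between the two update styles can be illustrated with C11 atomics in user space. This is only an analogy under stated assumptions: the names below are hypothetical, a relaxed atomic store stands in for the kernel's WRITE_ONCE(), and atomic_compare_exchange_strong() stands in for cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_state { DEMO_IDLE, DEMO_IN_FLIGHT, DEMO_COMPLETE };

struct demo_request {
	_Atomic int state;
};

/* Old style: compare-and-exchange rejects a second completion. */
static bool demo_complete_cmpxchg(struct demo_request *rq)
{
	int expected = DEMO_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      DEMO_COMPLETE);
}

/*
 * New style: an unconditional store. Callers must already guarantee that
 * a request is completed only once; the assert documents that invariant.
 */
static void demo_complete_store(struct demo_request *rq)
{
	assert(atomic_load_explicit(&rq->state, memory_order_relaxed) ==
	       DEMO_IN_FLIGHT);
	atomic_store_explicit(&rq->state, DEMO_COMPLETE, memory_order_relaxed);
}
```

The store variant is cheaper because it does not need a read-modify-write cycle, at the cost of no longer detecting a double completion by itself.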
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: Return true if request was completed · 16c15eb1

Authored by Keith Busch on Nov 26, 2018

A driver may have internal state to clean up if we're pretending a request
didn't complete. Return 'false' if the command wasn't actually completed
due to the timeout error injection, and 'true' otherwise.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Nov, 2018 1 commit

blk-mq: remove 'tag' parameter from mq_ops->poll() · 9743139c

Authored by Jens Axboe on Nov 16, 2018

We always pass in -1 now and none of the callers use the tag value,
so remove the parameter.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
