1. 18 Oct, 2021 1 commit
  2. 13 Sep, 2021 1 commit
    • blk-mq: avoid to iterate over stale request · 67f3b2f8
      Authored by Ming Lei
      blk-mq can't allocate a driver tag and update ->rqs[tag] atomically;
      meanwhile, blk-mq doesn't clear ->rqs[tag] after the driver tag is
      released.

      So there is a chance of iterating over a stale request just after the
      tag is allocated and before ->rqs[tag] is updated.

      scsi_host_busy_iter() calls scsi_host_check_in_flight() to count SCSI
      in-flight requests after the SCSI host is blocked, so no new SCSI command
      can be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation can
      still be run by the blk-mq core. A request marked as SCMD_STATE_INFLIGHT
      may still be kept in another slot of ->rqs[]; that slot's tag can be
      allocated out again before ->rqs[] is updated, so the in-flight request
      is counted twice as SCMD_STATE_INFLIGHT. This causes trouble in SCSI
      error handling.

      Fix the issue by not iterating over stale requests (see the sketch after
      this entry).
      
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Reported-by: luojiaxing <luojiaxing@huawei.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
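      A minimal sketch of the idea, loosely modeled on the tag iterator
      (bt_iter()) in block/blk-mq-tag.c of that era: before handing
      tags->rqs[bitnr] to the callback, check that the request really belongs
      to the hctx/queue being iterated, so a slot whose tag was re-allocated
      but whose ->rqs[] entry is still stale gets skipped. Field and helper
      names follow the upstream code only approximately, and request reference
      counting details are omitted.

          static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
          {
                  struct bt_iter_data *iter_data = data;
                  struct blk_mq_hw_ctx *hctx = iter_data->hctx;
                  struct blk_mq_tags *tags = hctx->tags;
                  bool reserved = iter_data->reserved;
                  struct request *rq;

                  if (!reserved)
                          bitnr += tags->nr_reserved_tags;
                  rq = tags->rqs[bitnr];

                  /*
                   * ->rqs[bitnr] may still point at a request from a previous user
                   * of this tag: the tag can be (re)allocated before ->rqs[] is
                   * updated. Only invoke the callback when the request belongs to
                   * the hctx being iterated; otherwise treat the slot as stale.
                   */
                  if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
                          return iter_data->fn(hctx, rq, iter_data->data, reserved);
                  return true;
          }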
  3. 24 May, 2021 4 commits
  4. 06 Apr, 2021 1 commit
  5. 26 Mar, 2021 1 commit
  6. 29 Sep, 2020 1 commit
  7. 11 Sep, 2020 1 commit
    • blk-mq: always allow reserved allocation in hctx_may_queue · 28500850
      Authored by Ming Lei
      NVMe shares a tagset between the fabric queue and the admin queue, or
      between connect_q and the NS queue, so hctx_may_queue() can be called
      when allocating requests for these queues.

      Tags can be reserved in these tagsets. Before error recovery there are
      often lots of in-flight requests which can't be completed, and a new
      reserved request may be needed in the error recovery path. However,
      hctx_may_queue() can keep returning false because there are too many
      in-flight requests which can't be completed during error handling.
      In the end, nothing can proceed.

      Fix this issue by always allowing reserved tag allocation in
      hctx_may_queue(). This is reasonable because reserved tags are supposed
      to always be available (see the sketch after this entry).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Cc: David Milburn <dmilburn@redhat.com>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
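      A hedged sketch of how the tag-allocation fast path can skip the
      fairness throttle for reserved requests, loosely modeled on
      __blk_mq_get_tag() in block/blk-mq-tag.c around this kernel version;
      the exact conditions and helper names may differ in the backported tree.

          static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
                                      struct sbitmap_queue *bt)
          {
                  /*
                   * hctx_may_queue() throttles shared-tagset users based on the
                   * number of in-flight requests. Reserved tags are supposed to
                   * always be available (e.g. for error recovery), so requests
                   * flagged BLK_MQ_REQ_RESERVED bypass the check entirely.
                   */
                  if (!data->q->elevator && !(data->flags & BLK_MQ_REQ_RESERVED) &&
                      !hctx_may_queue(data->hctx, bt))
                          return BLK_MQ_NO_TAG;

                  if (data->shallow_depth)
                          return __sbitmap_queue_get_shallow(bt, data->shallow_depth);
                  return __sbitmap_queue_get(bt);
          }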
  8. 04 Sep, 2020 5 commits
  9. 01 Jul, 2020 1 commit
  10. 29 Jun, 2020 1 commit
  11. 15 Jun, 2020 1 commit
  12. 07 Jun, 2020 2 commits
  13. 30 May, 2020 4 commits
  14. 27 Feb, 2020 1 commit
  15. 14 Nov, 2019 1 commit
  16. 05 Aug, 2019 1 commit
    • blk-mq: introduce blk_mq_tagset_wait_completed_request() · f9934a80
      Authored by Ming Lei
      blk-mq may schedule the queue's complete function to run on a remote CPU
      via IPI, but it doesn't provide any way to synchronize against the
      request's complete fn. The current queue freeze interface can't provide
      the synchronization because aborted requests stay in blk-mq queues
      during EH.

      In some drivers' EH (such as NVMe), the hardware queue's resources may be
      freed and re-allocated. If a completed request's complete fn finally runs
      after the hardware queue's resources have been released, a kernel crash
      is triggered.

      Prepare for fixing this kind of issue by introducing
      blk_mq_tagset_wait_completed_request() (sketched after this entry).
      
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
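      A sketch of the new helper, modeled on the shape of the function added
      in block/blk-mq-tag.c by this commit (trimmed; the poll interval and
      iterator callback signature are those of this era and may differ in
      other trees): poll the tagset until no request remains in the
      "completion scheduled but complete fn not yet finished" state.

          static bool blk_mq_tagset_count_completed_rqs(struct request *rq,
                          void *data, bool reserved)
          {
                  unsigned *count = data;

                  /* Count requests whose completion has started but not finished. */
                  if (blk_mq_request_completed(rq))
                          (*count)++;
                  return true;
          }

          /*
           * Callers (e.g. NVMe error handling) invoke this after aborting
           * requests and before freeing hardware queue resources, so that no
           * remote ->complete() is still running against resources about to
           * go away.
           */
          void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset)
          {
                  while (true) {
                          unsigned count = 0;

                          blk_mq_tagset_busy_iter(tagset,
                                          blk_mq_tagset_count_completed_rqs, &count);
                          if (!count)
                                  break;
                          msleep(5);
                  }
          }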
  17. 03 Jul, 2019 1 commit
  18. 01 May, 2019 1 commit
  19. 01 Feb, 2019 1 commit
  20. 01 Dec, 2018 1 commit
    • sbitmap: optimize wakeup check · 5d2ee712
      Authored by Jens Axboe
      Even if we have no waiters on any of the sbitmap_queue wait states, we
      still have to loop over every entry to check. We do this for every IO,
      so the cost adds up.

      Shift a bit of the cost to the slow path, when we actually have waiters.
      Wrap prepare_to_wait_exclusive() and finish_wait() so we can maintain an
      internal count of how many waiters are currently active. Then we can
      simply check this count in sbq_wake_ptr() and not have to loop if we
      don't have any sleepers.

      Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI
      (see the sketch after this entry).
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
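      A hedged sketch of the waiter-count idea, loosely modeled on lib/sbitmap.c
      after this change (struct layout, locking and the wait-state scan itself
      are trimmed): the wait/finish wrappers keep an atomic count of active
      waiters, and the per-IO wakeup path bails out early when that count is
      zero.

          void sbitmap_prepare_to_wait(struct sbitmap_queue *sbq,
                                       struct sbq_wait_state *ws,
                                       struct sbq_wait *sbq_wait, int state)
          {
                  if (!sbq_wait->sbq) {
                          atomic_inc(&sbq->ws_active);    /* one more active waiter */
                          sbq_wait->sbq = sbq;
                  }
                  prepare_to_wait_exclusive(&ws->wait, &sbq_wait->wait, state);
          }

          void sbitmap_finish_wait(struct sbitmap_queue *sbq,
                                   struct sbq_wait_state *ws,
                                   struct sbq_wait *sbq_wait)
          {
                  finish_wait(&ws->wait, &sbq_wait->wait);
                  if (sbq_wait->sbq) {
                          atomic_dec(&sbq->ws_active);    /* waiter is gone */
                          sbq_wait->sbq = NULL;
                  }
          }

          static struct sbq_wait_state *sbq_wake_ptr(struct sbitmap_queue *sbq)
          {
                  /* Fast path: nobody is waiting, skip scanning the wait states. */
                  if (!atomic_read(&sbq->ws_active))
                          return NULL;

                  /* ... otherwise scan sbq->ws[] for a state with waiters, as before ... */
          }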
  21. 09 Nov, 2018 2 commits
  22. 08 Nov, 2018 3 commits
  23. 26 Sep, 2018 1 commit
  24. 22 Sep, 2018 1 commit
  25. 21 Aug, 2018 1 commit
    • blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter · f5bbbbe4
      Authored by Jianchao Wang
      For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to account
      for the in-flight requests. It accesses queue_hw_ctx and nr_hw_queues
      without any protection. When an update of nr_hw_queues and
      blk_mq_in_flight/rw occur concurrently, a panic comes up.

      Before nr_hw_queues is updated, the queue will be frozen, so we can use
      q_usage_counter to avoid the race. percpu_ref_is_zero is used here so
      that we will not miss any in-flight request. The accesses to nr_hw_queues
      and queue_hw_ctx in blk_mq_queue_tag_busy_iter are under an RCU critical
      section, and __blk_mq_update_nr_hw_queues can use synchronize_rcu to
      ensure the zeroed q_usage_counter is globally visible (see the sketch
      after this entry).
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
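      A hedged sketch of the two sides of the synchronization, loosely modeled
      on block/blk-mq-tag.c and block/blk-mq.c of this era (the iteration body
      and queue-remapping code are trimmed, and helper placement is
      approximate): the iterator bails out under RCU when the queue is frozen,
      and the update path uses synchronize_rcu() so every iterator either sees
      the zeroed q_usage_counter or has already left its read-side section.

          void blk_mq_queue_tag_busy_iter(struct request_queue *q,
                                          busy_iter_fn *fn, void *priv)
          {
                  struct blk_mq_hw_ctx *hctx;
                  int i;

                  /*
                   * __blk_mq_update_nr_hw_queues() freezes the queue before it
                   * changes nr_hw_queues/queue_hw_ctx. If q_usage_counter has
                   * dropped to zero, bail out instead of touching data that may
                   * be reallocated under us.
                   */
                  rcu_read_lock();
                  if (percpu_ref_is_zero(&q->q_usage_counter)) {
                          rcu_read_unlock();
                          return;
                  }

                  queue_for_each_hw_ctx(q, hctx, i) {
                          /* ... iterate this hctx's busy tags as before ... */
                  }
                  rcu_read_unlock();
          }

          static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
                                                   int nr_hw_queues)
          {
                  struct request_queue *q;

                  list_for_each_entry(q, &set->tag_list, tag_set_list)
                          blk_mq_freeze_queue(q);
                  /*
                   * Make the zeroed q_usage_counter globally visible before
                   * nr_hw_queues/queue_hw_ctx are modified.
                   */
                  synchronize_rcu();

                  /* ... update nr_hw_queues and remap the hardware queues ... */

                  list_for_each_entry(q, &set->tag_list, tag_set_list)
                          blk_mq_unfreeze_queue(q);
          }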
  26. 09 Aug, 2018 1 commit
    • blk-mq: count the hctx as active before allocating tag · d263ed99
      Authored by Jianchao Wang
      Currently, we count the hctx as active after a driver tag has been
      allocated successfully. If a previously inactive hctx tries to get a tag
      for the first time, it may fail and need to wait. However, due to the
      stale tags->active_queues count, the other shared-tags users are still
      able to occupy all driver tags while someone is waiting for a tag.
      Consequently, even if the previously inactive hctx is woken up, it still
      may not be able to get a tag and could be starved.

      To fix it, count the hctx as active before trying to allocate a driver
      tag; then, while it is waiting for a tag, the other shared-tag users
      will reserve budget for it (see the sketch after this entry).
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
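      A hedged sketch of the reordering; the helper name and surrounding
      structure are illustrative rather than a literal copy of the patch to
      blk_mq_get_driver_tag() in block/blk-mq.c. The point is that
      blk_mq_tag_busy() (which bumps tags->active_queues for shared tagsets)
      now runs before the allocation attempt, so hctx_may_queue() on the other
      queues already sets aside a share of the tag space for this hctx while
      it waits.

          static bool get_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq,
                                     struct blk_mq_alloc_data *data)
          {
                  bool shared;

                  /*
                   * Old order: allocate the tag first and call blk_mq_tag_busy()
                   * only on success. New order: mark the hctx active up front so
                   * the depth seen by other shared-tag users already accounts for
                   * this (possibly still waiting) hctx.
                   */
                  shared = blk_mq_tag_busy(hctx);

                  rq->tag = blk_mq_get_tag(data);
                  if (rq->tag >= 0 && shared) {
                          rq->rq_flags |= RQF_MQ_INFLIGHT;
                          atomic_inc(&hctx->nr_active);
                  }
                  return rq->tag >= 0;
          }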