1. 02 Mar 2017, 2 commits
  2. 24 Feb 2017, 1 commit
    • blk-mq: use sbq wait queues instead of restart for driver tags · da55f2cc
      Authored by Omar Sandoval
      Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware
      queues and shared tags") fixed one starvation issue for shared tags.
      However, we can still get into a situation where we fail to allocate a
      tag because all tags are allocated but we don't have any pending
      requests on any hardware queue.
      
      One solution for this would be to restart all queues that share a tag
      map, but that really sucks. Ideally, we could just block and wait for a
      tag, but that isn't always possible from blk_mq_dispatch_rq_list().
      
      However, we can still use the struct sbitmap_queue wait queues with a
      custom callback instead of blocking. This has a few benefits:
      
      1. It avoids iterating over all hardware queues when completing an I/O,
         which the current restart code has to do.
      2. It benefits from the existing rolling wakeup code.
      3. It avoids punting to another thread just to have it block.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
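
      A condensed sketch of the mechanism (modeled on the blk-mq code of this
      era; names such as dispatch_wait, wait_index, and the TAG_WAITING state
      bit follow that code, but treat this as an approximation of the patch,
      not the patch itself). The hardware queue embeds a wait queue entry with
      a custom wake callback and hooks it onto the tag map's sbitmap_queue
      wait queue when tag allocation fails; freeing a tag fires the callback,
      which reruns the hardware queue instead of waking a blocked thread:

      /* Wake callback: invoked by the sbitmap_queue wakeup when a shared
       * tag is freed. Rather than waking a sleeping task, rerun the hw
       * queue so the stalled dispatch can retry its tag allocation. */
      static int blk_mq_dispatch_wake(wait_queue_t *wait, unsigned mode,
                                      int flags, void *key)
      {
              struct blk_mq_hw_ctx *hctx;

              hctx = container_of(wait, struct blk_mq_hw_ctx, dispatch_wait);

              list_del(&wait->task_list);
              clear_bit_unlock(BLK_MQ_S_TAG_WAITING, &hctx->state);
              blk_mq_run_hw_queue(hctx, true);
              return 1;
      }

      /* Called when blk_mq_dispatch_rq_list() fails to get a driver tag:
       * register the callback on the sbitmap_queue wait queue instead of
       * blocking, or restarting every queue that shares the tag map. */
      static bool blk_mq_dispatch_wait_add(struct blk_mq_hw_ctx *hctx)
      {
              struct sbq_wait_state *ws;

              /* The state bit acts as a lock on hctx->dispatch_wait. */
              if (test_and_set_bit_lock(BLK_MQ_S_TAG_WAITING, &hctx->state))
                      return false;

              init_waitqueue_func_entry(&hctx->dispatch_wait,
                                        blk_mq_dispatch_wake);
              ws = sbq_wait_ptr(&hctx->tags->bitmap_tags, &hctx->wait_index);
              add_wait_queue(&ws->wait, &hctx->dispatch_wait);
              return true;
      }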
  3. 18 Jan 2017, 3 commits
  4. 12 Jan 2017, 1 commit
  5. 10 Dec 2016, 1 commit
  6. 09 Nov 2016, 1 commit
  7. 03 Nov 2016, 4 commits
  8. 23 Sep 2016, 1 commit
  9. 22 Sep 2016, 1 commit
  10. 21 Sep 2016, 1 commit
  11. 17 Sep 2016, 1 commit
  12. 15 Sep 2016, 5 commits
  13. 14 Sep 2016, 2 commits
  14. 29 Aug 2016, 2 commits
    • blk-mq: improve layout of blk_mq_hw_ctx · 8d354f13
      Authored by Jens Axboe
      Various cache line optimizations:
      
      - Move delay_work towards the end. It's huge, and we don't use it
        a lot (only SCSI).
      
      - Move the atomic state into the same cacheline as the dispatch
        list and lock.
      
      - Rearrange a few members to pack it better.
      
      - Shrink the max-order for dispatch accounting from 10 to 7. This
        means that ->dispatched[] and ->run now take up their own
        cacheline.
      
      This shrinks struct blk_mq_hw_ctx down to 8 cachelines.
      Signed-off-by: Jens Axboe <axboe@fb.com>
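
      The general technique, as an illustrative sketch (the struct and its
      members here are hypothetical, not the real blk_mq_hw_ctx): group
      members that are touched together on the hot path into one cache line,
      and push large, rarely used members to the end of the struct.

      struct example_hw_ctx {
              /* Hot on every queue run: keep the dispatch lock, the
               * dispatch list, and the atomic state flags on the same
               * cache line so one line services the common path. */
              spinlock_t              lock;
              struct list_head        dispatch;
              unsigned long           state;

              /* ... colder, read-mostly members packed in between ... */

              /* Huge and rarely used (SCSI only), so it goes last and
               * stays off the hot lines. */
              struct delayed_work     delay_work;
      } ____cacheline_aligned_in_smp;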
    • blk-mq: turn hctx->run_work into a regular work struct · 27489a3c
      Authored by Jens Axboe
      We don't need the larger delayed work struct, since we always run it
      immediately.
      Signed-off-by: Jens Axboe <axboe@fb.com>
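
      The underlying trade-off, sketched with the generic workqueue API
      (illustrative; run_fn and the scheduling calls stand in for the blk-mq
      specifics): a delayed_work embeds a work_struct plus a timer, so when
      the delay is always zero the timer is pure size and indirection
      overhead.

      #include <linux/workqueue.h>

      static void run_fn(struct work_struct *work)
      {
              /* ... run the hardware queue ... */
      }

      static struct delayed_work dwork;       /* work_struct + a timer */
      static struct work_struct work;         /* just the work item */

      static void before(void)
      {
              INIT_DELAYED_WORK(&dwork, run_fn);
              schedule_delayed_work(&dwork, 0); /* zero delay: the timer
                                                   buys nothing */
      }

      static void after(void)
      {
              INIT_WORK(&work, run_fn);
              schedule_work(&work);             /* same effect, smaller
                                                   struct, no timer hop */
      }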
  15. 08 Jul 2016, 1 commit
  16. 06 Jul 2016, 1 commit
  17. 13 Apr 2016, 2 commits
  18. 20 Mar 2016, 1 commit
  19. 10 Feb 2016, 1 commit
    • blk-mq: dynamic h/w context count · 868f2f0b
      Authored by Keith Busch
      The hardware's provided queue count may change at runtime with resource
      provisioning. This patch allows a block driver to alter the number of
      h/w queues available when its resource count changes.
      
      The main part is a new blk-mq API to request a new number of h/w queues
      for a given live tag set. The new API freezes all queues using that set,
      then adjusts the allocated count prior to remapping these to CPUs.
      
      The bulk of the rest just shifts where h/w contexts and all their
      artifacts are allocated and freed.
      
      The maximum number of h/w contexts is capped at the number of possible
      CPUs, since there is no use for more than that. As such, all
      pre-allocated pointer memory needs to account for the maximum possible
      rather than the initial number of queues.
      
      A side effect of this is that blk-mq will now proceed successfully as
      long as it can allocate at least one h/w context. Previously, request
      queue initialization would fail if fewer than the requested number
      were allocated.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
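
      The new API is blk_mq_update_nr_hw_queues(). A hedged sketch of
      driver-side use (the controller type and callback are hypothetical;
      only the blk-mq call is from the patch):

      #include <linux/blk-mq.h>

      struct mydrv_ctrl {                     /* hypothetical driver state */
              struct blk_mq_tag_set tag_set;
      };

      /* Called when the device renegotiates its queue resources. */
      static void mydrv_queue_count_changed(struct mydrv_ctrl *ctrl,
                                            int new_count)
      {
              /* Freezes every request queue using this tag set, adjusts
               * the allocated hctx count, then remaps hctxs to CPUs. */
              blk_mq_update_nr_hw_queues(&ctrl->tag_set, new_count);
      }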
  20. 02 Dec 2015, 1 commit
  21. 08 Nov 2015, 1 commit
    • block: add block polling support · 05229bee
      Authored by Jens Axboe
      Add basic support for polling for a specific IO to complete. This uses
      the cookie that blk-mq passes back at submission, which lets the block
      layer hand the cookie to the driver to spin for that specific request.
      
      This will be combined with request latency tracking, so we can make
      qualified decisions about when to poll and when not to. For now, for
      benchmark purposes, we add a sysfs file that controls whether polling
      is enabled or not.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
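
      A hedged sketch of the consumer side, modeled on the synchronous
      direct-I/O path of this era (the 'done' flag and its signalling are
      illustrative; the signatures follow the 4.4-era kernel): submit the
      bio, keep the returned cookie, and spin in blk_poll() instead of
      sleeping while the request is in flight.

      static bool done;   /* set by the bio's bi_end_io (not shown) */

      static void poll_for_completion(struct block_device *bdev,
                                      struct bio *bio)
      {
              blk_qc_t qc = generic_make_request(bio); /* blk-mq cookie */

              for (;;) {
                      set_current_state(TASK_UNINTERRUPTIBLE);
                      if (READ_ONCE(done))
                              break;
                      /* Poll the driver for this specific request; fall
                       * back to sleeping if nothing completed. */
                      if (!blk_poll(bdev_get_queue(bdev), qc))
                              io_schedule();
              }
              __set_current_state(TASK_RUNNING);
      }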
  22. 22 Oct 2015, 1 commit
    • block: generic request_queue reference counting · 3ef28e83
      Authored by Dan Williams
      Allow pmem, and other synchronous/bio-based block drivers, to fall back
      on a per-cpu reference count managed by the core for tracking queue
      live/dead state.
      
      The existing per-cpu reference count for the blk_mq case is promoted to
      be used in all block i/o scenarios.  This involves initializing it by
      default, waiting for it to drop to zero at exit, and holding a live
      reference over the invocation of q->make_request_fn() in
      generic_make_request().  The blk_mq code continues to take its own
      reference per blk_mq request and retains the ability to freeze the
      queue, but the check that the queue is frozen is moved to
      generic_make_request().
      
      This fixes crash signatures like the following:
      
       BUG: unable to handle kernel paging request at ffff880140000000
       [..]
       Call Trace:
        [<ffffffff8145e8bf>] ? copy_user_handle_tail+0x5f/0x70
        [<ffffffffa004e1e0>] pmem_do_bvec.isra.11+0x70/0xf0 [nd_pmem]
        [<ffffffffa004e331>] pmem_make_request+0xd1/0x200 [nd_pmem]
        [<ffffffff811c3162>] ? mempool_alloc+0x72/0x1a0
        [<ffffffff8141f8b6>] generic_make_request+0xd6/0x110
        [<ffffffff8141f966>] submit_bio+0x76/0x170
        [<ffffffff81286dff>] submit_bh_wbc+0x12f/0x160
        [<ffffffff81286e62>] submit_bh+0x12/0x20
        [<ffffffff813395bd>] jbd2_write_superblock+0x8d/0x170
        [<ffffffff8133974d>] jbd2_mark_journal_empty+0x5d/0x90
        [<ffffffff813399cb>] jbd2_journal_destroy+0x24b/0x270
        [<ffffffff810bc4ca>] ? put_pwq_unlocked+0x2a/0x30
        [<ffffffff810bc6f5>] ? destroy_workqueue+0x225/0x250
        [<ffffffff81303494>] ext4_put_super+0x64/0x360
        [<ffffffff8124ab1a>] generic_shutdown_super+0x6a/0xf0
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
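
      The pattern, sketched with the interfaces this commit promotes
      (blk_queue_enter()/blk_queue_exit() over the per-cpu q_usage_counter;
      the wrapper function is hypothetical, and the gfp-based enter signature
      is the one from this era, later changed):

      static void my_submit(struct request_queue *q, struct bio *bio)
      {
              /* Takes a percpu reference on q->q_usage_counter; fails
               * once the queue has been marked dying, and may wait for
               * a queue freeze to end. */
              if (blk_queue_enter(q, GFP_NOIO)) {
                      bio_io_error(bio);
                      return;
              }
              q->make_request_fn(q, bio);  /* queue stays live here */
              blk_queue_exit(q);           /* drop the reference */
      }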
  23. 01 Oct 2015, 2 commits
  24. 30 Sep 2015, 1 commit
    • blk-mq: fix sysfs registration/unregistration race · 4593fdbe
      Authored by Akinobu Mita
      There is a race between cpu hotplug handling and adding/deleting
      gendisk for blk-mq, where both are trying to register and unregister
      the same sysfs entries.
      
      null_add_dev
          --> blk_mq_init_queue
              --> blk_mq_init_allocated_queue
                  --> add to 'all_q_list' (*)
          --> add_disk
              --> blk_register_queue
                  --> blk_mq_register_disk (++)
      
      null_del_dev
          --> del_gendisk
              --> blk_unregister_queue
                  --> blk_mq_unregister_disk (--)
          --> blk_cleanup_queue
              --> blk_mq_free_queue
                  --> del from 'all_q_list' (*)
      
      blk_mq_queue_reinit
          --> blk_mq_sysfs_unregister (-)
          --> blk_mq_sysfs_register (+)
      
      While the request queue is on 'all_q_list' (*),
      blk_mq_queue_reinit() can be called for the queue at any time by the
      CPU hotplug callback.  But blk_mq_sysfs_unregister (-) and
      blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not run before
      blk_mq_register_disk (++) has finished or after
      blk_mq_unregister_disk (--) has finished, because '/sys/block/*/mq/'
      does not exist during those windows.
      
      There is already a BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
      be used to track this sysfs state, but it only fixes the issue
      partially.
      
      To fix it completely, we need a per-queue flag instead of a per-hctx
      flag, with appropriate locking.  This introduces
      q->mq_sysfs_init_done, which is properly protected by all_q_mutex.
      
      We also need to ensure that blk_mq_map_swqueue() is called with
      all_q_mutex held.  Since hctx->nr_ctx is temporarily reset and then
      updated in blk_mq_map_swqueue(), blk_mq_register_hctx() must not
      observe the temporary hctx->nr_ctx value during CPU hotplug handling
      or while adding/deleting a gendisk.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
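
      A condensed sketch of the fix (function bodies trimmed; names follow
      the commit message): the per-queue flag is flipped under all_q_mutex
      once the gendisk's mq sysfs tree actually exists, and the
      hotplug-driven reinit paths bail out while it is false.

      void blk_mq_sysfs_unregister(struct request_queue *q)
      {
              /* Hotplug reinit must not touch /sys/block/<dev>/mq/
               * before add_disk() has created it or after del_gendisk()
               * has removed it. */
              if (!q->mq_sysfs_init_done)
                      return;
              /* ... unregister hctx/ctx kobjects ... */
      }

      int blk_mq_register_disk(struct gendisk *disk)
      {
              struct request_queue *q = disk->queue;

              /* ... register hctx/ctx kobjects ... */

              q->mq_sysfs_init_done = true;  /* written under all_q_mutex;
                                                reinit may now proceed */
              return 0;
      }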
  25. 02 Jun 2015, 1 commit
    • blk-mq: Shared tag enhancements · f26cdc85
      Authored by Keith Busch
      Storage controllers may expose multiple block devices that share hardware
      resources managed by blk-mq. This patch enhances the shared tags so a
      low-level driver can access the shared resources not tied to the unshared
      h/w contexts. This way the LLD can dynamically add and delete disks and
      request queues without having to track all the request_queue hctxs in
      order to iterate outstanding tags.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
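
      A hedged sketch of what this enables (per the kernel history this
      commit adds blk_mq_all_tag_busy_iter(), which walks the outstanding
      requests of one shared tag map directly; the driver types and the
      callback are hypothetical):

      #include <linux/blk-mq.h>

      struct mydrv_ctrl {                     /* hypothetical driver state */
              struct blk_mq_tag_set tag_set;
      };

      /* Runs once per outstanding request on the tag map. */
      static void mydrv_cancel_rq(struct request *rq, void *data,
                                  bool reserved)
      {
              /* terminate or requeue rq; no request_queue needed */
      }

      static void mydrv_teardown(struct mydrv_ctrl *ctrl)
      {
              int i;

              /* Iterate busy tags on the shared maps themselves, so disks
               * and request queues can come and go without the LLD
               * tracking every hctx. */
              for (i = 0; i < ctrl->tag_set.nr_hw_queues; i++)
                      blk_mq_all_tag_busy_iter(ctrl->tag_set.tags[i],
                                               mydrv_cancel_rq, ctrl);
      }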
  26. 17 Apr 2015, 1 commit