1. 11 Nov 2017, 2 commits
    • blk-mq: only run the hardware queue if IO is pending · 79f720a7
      Authored by Jens Axboe
      Currently we are inconsistent in when we decide to run the queue. Using
      blk_mq_run_hw_queues() we check whether the hctx has pending IO before
      running it, but we don't do that from the individual queue run function,
      blk_mq_run_hw_queue(). This potentially results in a lot of extra and
      pointless queue runs on flush requests and (much worse) in tag
      starvation situations. This is observable just by looking at top output,
      with lots of kworkers active. For the !async runs, it just adds to the
      CPU overhead of blk-mq.
      
      Move the has-pending check into the run function instead of having
      callers do it.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
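      Roughly, the resulting run path has the shape sketched below (illustrative, not the literal upstream diff): the run function itself checks whether the hctx has anything pending and skips the run otherwise, so callers no longer need their own check.

      /* Sketch: push the has-pending check into the run function itself. */
      static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
      {
              return !list_empty_careful(&hctx->dispatch) ||
                      sbitmap_any_bit_set(&hctx->ctx_map) ||
                      blk_mq_sched_has_work(hctx);
      }

      bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
      {
              if (blk_mq_hctx_has_pending(hctx)) {
                      __blk_mq_delay_run_hw_queue(hctx, async, 0);
                      return true;
              }
              return false;   /* nothing pending: skip the queue run */
      }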
    • block, nvme: Introduce blk_mq_req_flags_t · 9a95e4ef
      Authored by Bart Van Assche
      Several block layer and NVMe core functions accept a combination
      of BLK_MQ_REQ_* flags through the 'flags' argument, but there is
      no compile-time verification that the right type of block layer
      flags is passed. Make it possible for sparse to verify this.
      This patch does not change any functionality.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: linux-nvme@lists.infradead.org
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
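      The sparse-checked flag type follows the usual __bitwise pattern, roughly as sketched below (a paraphrase of the technique, not the exact hunks of this commit; blk_mq_alloc_request() is shown as one example of an affected prototype).

      /* sparse treats a __bitwise typedef as a distinct type, so passing a
       * plain int or an unrelated flag set produces a warning. */
      typedef __u32 __bitwise blk_mq_req_flags_t;

      #define BLK_MQ_REQ_NOWAIT       ((__force blk_mq_req_flags_t)(1 << 0))
      #define BLK_MQ_REQ_RESERVED     ((__force blk_mq_req_flags_t)(1 << 1))

      struct request *blk_mq_alloc_request(struct request_queue *q,
                                           unsigned int op,
                                           blk_mq_req_flags_t flags);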
  2. 05 Nov 2017, 3 commits
  3. 01 Nov 2017, 2 commits
    • blk-mq-sched: improve dispatching from sw queue · b347689f
      Authored by Ming Lei
      SCSI devices use a host-wide tagset, and the shared driver tag space is
      often quite big. However, there is also a per-LUN queue depth
      (.cmd_per_lun), which is often small; for example, on both lpfc and
      qla2xxx, .cmd_per_lun is just 3.
      
      So lots of requests may sit in the sw queue, and we always flush all
      of those belonging to the same hw queue and dispatch them all to the
      driver. Unfortunately this easily makes the queue busy because of the
      small .cmd_per_lun. Once these requests are flushed out, they have to
      stay on hctx->dispatch, where no bio merging can happen on them, and
      sequential IO performance is harmed.
      
      This patch introduces blk_mq_dequeue_from_ctx() for dequeuing a request
      from a sw queue, so that we can dispatch requests in the scheduler's way.
      We can then avoid dequeuing too many requests from the sw queue, since we
      don't flush ->dispatch completely.
      
      This patch improves dispatching from sw queue by using the .get_budget
      and .put_budget callbacks.
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
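      A simplified sketch of the resulting dispatch-from-ctx loop (illustrative only; my_do_dispatch_ctx is a made-up name, and the budget helpers are shown with simplified boolean return values, which may differ from the exact interface of this series):

      static void my_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
      {
              struct request_queue *q = hctx->queue;
              struct blk_mq_ctx *ctx = NULL;
              LIST_HEAD(rq_list);

              do {
                      struct request *rq;

                      if (!blk_mq_get_dispatch_budget(hctx))
                              break;          /* device depth exhausted */

                      rq = blk_mq_dequeue_from_ctx(hctx, ctx);
                      if (!rq) {
                              blk_mq_put_dispatch_budget(hctx);
                              break;          /* sw queues are empty */
                      }

                      list_add(&rq->queuelist, &rq_list);
                      /* remember where to continue (the real code advances to
                       * the next ctx for round-robin fairness) */
                      ctx = rq->mq_ctx;
              } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
      }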
    • blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297
      Authored by Ming Lei
      For SCSI devices, there is often a per-request-queue depth, which needs
      to be respected before queueing a request.
      
      Currently blk-mq always dequeues the request first, then calls
      .queue_rq() to dispatch the request to the LLD. One obvious issue with
      this approach is that I/O merging may not be successful, because when the
      per-request-queue depth can't be respected, .queue_rq() has to return
      BLK_STS_RESOURCE, and the request then has to stay on the hctx->dispatch
      list, where it never gets a chance to be merged with other IO.
      
      This patch introduces the .get_budget and .put_budget callbacks in
      blk_mq_ops, so that we can try to get the reserved budget first, before
      dequeuing a request. If the budget for queueing I/O can't be satisfied,
      we don't dequeue the request at all, and it can be left in the IO
      scheduler queue for more merging opportunities.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
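      A hypothetical driver-side sketch of such a budget pair (my_device, my_get_budget and my_put_budget are made-up names; the callback prototypes are simplified to booleans and changed across later kernel versions):

      static bool my_get_budget(struct blk_mq_hw_ctx *hctx)
      {
              struct my_device *dev = hctx->queue->queuedata;

              if (atomic_inc_return(&dev->inflight) > dev->queue_depth) {
                      atomic_dec(&dev->inflight);
                      return false;   /* no budget: request stays in the scheduler */
              }
              return true;
      }

      static void my_put_budget(struct blk_mq_hw_ctx *hctx)
      {
              struct my_device *dev = hctx->queue->queuedata;

              atomic_dec(&dev->inflight);     /* budget released on dispatch failure */
      }

      These would be wired up as .get_budget/.put_budget in the driver's blk_mq_ops, next to .queue_rq.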
  4. 12 Sep 2017, 1 commit
    • block: directly insert blk-mq request from blk_insert_cloned_request() · 157f377b
      Authored by Jens Axboe
      A NULL pointer crash was reported for the case of having the BFQ IO
      scheduler attached to the underlying blk-mq paths of a DM multipath
      device.  The crash occurred in blk_mq_sched_insert_request()'s call to
      e->type->ops.mq.insert_requests().
      
      Paolo Valente correctly summarized why the crash occurred:
      "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
      blk_rq_prep_clone) creates a cloned request without invoking
      e->type->ops.mq.prepare_request for the target elevator e.  The cloned
      request is therefore not initialized for the scheduler, but it is
      however inserted into the scheduler by blk_mq_sched_insert_request."
      
      All said, a request-based DM multipath device's IO scheduler should be
      the only one used -- when the original requests are issued to the
      underlying paths as cloned requests they are inserted directly in the
      underlying dispatch queue(s) rather than through an additional elevator.
      
      But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO
      schedulers") switched blk_insert_cloned_request() from using
      blk_mq_insert_request() to blk_mq_sched_insert_request(), which
      incorrectly added elevator machinery to a call chain that isn't
      supposed to have any.
      
      To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
      that blk_insert_cloned_request() calls to insert the request without
      involving any elevator that may be attached to the cloned request's
      request_queue.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
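      The bypass helper is, in essence, along the lines of the sketch below (a simplified rendering, not necessarily the exact upstream code): the cloned request goes straight onto hctx->dispatch, and any elevator attached to the bottom-level queue is never consulted.

      void blk_mq_request_bypass_insert(struct request *rq)
      {
              struct blk_mq_ctx *ctx = rq->mq_ctx;
              struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);

              /* Bypass any attached IO scheduler: queue directly for dispatch. */
              spin_lock(&hctx->lock);
              list_add_tail(&rq->queuelist, &hctx->dispatch);
              spin_unlock(&hctx->lock);

              blk_mq_run_hw_queue(hctx, false);
      }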
  5. 10 Aug 2017, 1 commit
  6. 29 Jun 2017, 1 commit
  7. 19 Jun 2017, 3 commits
  8. 04 May 2017, 1 commit
  9. 27 Apr 2017, 3 commits
  10. 15 Apr 2017, 1 commit
  11. 07 Apr 2017, 1 commit
    • blk-mq: use the right hctx when getting a driver tag fails · 81380ca1
      Authored by Omar Sandoval
      While dispatching requests, if we fail to get a driver tag, we mark the
      hardware queue as waiting for a tag and put the requests on a
      hctx->dispatch list to be run later when a driver tag is freed. However,
      blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
      queues if using a single-queue scheduler with a multiqueue device. If
      blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
      are processing. This means we end up using the hardware queue of the
      previous request, which may or may not be the same as that of the
      current request. If it isn't, the wrong hardware queue will end up
      waiting for a tag, and the requests will be on the wrong dispatch list,
      leading to a hang.
      
      The fix is twofold:
      
      1. Make sure we save which hardware queue we were trying to get a
         request for in blk_mq_get_driver_tag() regardless of whether it
         succeeds or not.
      2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
         blk_mq_hw_queue to make it clear that it must handle multiple
         hardware queues, since I've already messed this up on a couple of
         occasions.
      
      This didn't appear in testing with nvme and mq-deadline because nvme has
      more driver tags than the default number of scheduler tags. However,
      with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
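      The first half of the fix amounts to always reporting back which hctx the tag attempt was made against, roughly as in this sketch (simplified; the real function also handles reserved tags and shared-tag accounting):

      bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
                                 bool wait)
      {
              struct blk_mq_alloc_data data = {
                      .q     = rq->q,
                      .hctx  = blk_mq_map_queue(rq->q, rq->mq_ctx->cpu),
                      .flags = wait ? 0 : BLK_MQ_REQ_NOWAIT,
              };

              if (rq->tag == -1)
                      rq->tag = blk_mq_get_tag(&data);

              /* Tell the caller which hctx was used, even on failure, so the
               * dispatch loop waits on (and requeues to) the right queue. */
              if (hctx)
                      *hctx = data.hctx;
              return rq->tag != -1;
      }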
  12. 22 Mar 2017, 1 commit
    • blk-stat: convert to callback-based statistics reporting · 34dbad5d
      Authored by Omar Sandoval
      Currently, statistics are gathered in ~0.13s windows, and users grab the
      statistics whenever they need them. This is not ideal for either of the
      in-tree users:
      
      1. Writeback throttling wants its own dynamically sized window of
         statistics. Since the blk-stats statistics are reset after every
         window and the wbt windows don't line up with the blk-stats windows,
         wbt doesn't see every I/O.
      2. Polling currently grabs the statistics on every I/O. Again, depending
         on how the window lines up, we may miss some I/Os. It's also
         unnecessary overhead to get the statistics on every I/O; the hybrid
         polling heuristic would be just as happy with the statistics from the
         previous full window.
      
      This reworks the blk-stats infrastructure to be callback-based: users
      register a callback that they want called at a given time with all of
      the statistics from the window during which the callback was active.
      Users can dynamically bucketize the statistics. wbt and polling both
      currently use read vs. write, but polling can be extended to further
      subdivide based on request size.
      
      The callbacks are kept on an RCU list, and each callback has percpu
      stats buffers. There will only be a few users, so the overhead on the
      I/O completion side is low. The stats flushing is also simplified
      considerably: since the timer function is responsible for clearing the
      statistics, we don't have to worry about stale statistics.
      
      wbt is a trivial conversion. After the conversion, the windowing problem
      mentioned above is fixed.
      
      For polling, we register an extra callback that caches the previous
      window's statistics in the struct request_queue for the hybrid polling
      heuristic to use.
      
      Since we no longer have a single stats buffer for the request queue,
      this also removes the sysfs and debugfs stats entries. To replace those,
      we add a debugfs entry for the poll statistics.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
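      Usage of the callback API looks roughly like the sketch below (prototypes paraphrased from the blk-stat interface of that era and may differ in detail; my_bucket, my_timer_fn and my_register_stats are made-up names):

      /* Two buckets: 0 = reads, 1 = writes. */
      static unsigned int my_bucket(const struct request *rq)
      {
              return op_is_write(req_op(rq)) ? 1 : 0;
      }

      static void my_timer_fn(struct blk_stat_callback *cb)
      {
              /* cb->stat[] holds the per-bucket stats collected while the
               * callback was active; consume them and optionally re-arm. */
      }

      static int my_register_stats(struct request_queue *q, void *data)
      {
              struct blk_stat_callback *cb;

              cb = blk_stat_alloc_callback(my_timer_fn, my_bucket, 2, data);
              if (!cb)
                      return -ENOMEM;
              blk_stat_add_callback(q, cb);
              blk_stat_activate_msecs(cb, 100);   /* gather ~100ms, then fire */
              return 0;
      }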
  13. 09 Mar 2017, 2 commits
    • blk-mq: make lifetime consistent between q/ctx and its kobject · 7ea5fe31
      Authored by Ming Lei
      Currently, from the kobject point of view, both q->mq_kobj and ctx->kobj
      can be released during one cycle of blk_mq_register_dev() and
      blk_mq_unregister_dev(). However, a sw queue's lifetime is actually the
      same as its request queue's, which is covered by request_queue->kobj.
      
      So we don't need to call kobject_put() for these two kinds of
      kobject in __blk_mq_unregister_dev(); instead we do that in the
      release handler of the request queue.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
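      Conceptually, the final kobject_put() calls move into the queue release path, along the lines of this sketch (my_mq_sysfs_deinit is a stand-in name for the real teardown helper):

      static void my_mq_sysfs_deinit(struct request_queue *q)
      {
              struct blk_mq_ctx *ctx;
              int cpu;

              /* Drop the final references only when the request queue itself
               * is released, not from blk_mq_unregister_dev(). */
              for_each_possible_cpu(cpu) {
                      ctx = per_cpu_ptr(q->queue_ctx, cpu);
                      kobject_put(&ctx->kobj);
              }
              kobject_put(&q->mq_kobj);
      }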
    • blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() · 737f98cf
      Authored by Ming Lei
      Both q->mq_kobj and the sw queues' kobjects should be initialized only
      once, instead of in each add_disk context.
      
      This patch also removes the clearing of ctx in blk_mq_init_cpu_queues(),
      because the percpu allocator zero-fills the allocated memory.
      
      This patch fixes one issue [1] reported by Omar.
      
      [1] kernel warning when doing unbind/bind on a scsi-mq device
      
      [   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
      [   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
      [   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
      [   19.350920] Workqueue: events_unbound async_run_entry_fn
      [   19.350920] Call Trace:
      [   19.350920]  dump_stack+0x63/0x83
      [   19.350920]  kobject_init+0x77/0x90
      [   19.350920]  blk_mq_register_dev+0x40/0x130
      [   19.350920]  blk_register_queue+0xb6/0x190
      [   19.350920]  device_add_disk+0x1ec/0x4b0
      [   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
      [   19.350920]  async_run_entry_fn+0x48/0x150
      [   19.350920]  process_one_work+0x1d0/0x480
      [   19.350920]  worker_thread+0x48/0x4e0
      [   19.350920]  kthread+0x101/0x140
      [   19.350920]  ? process_one_work+0x480/0x480
      [   19.350920]  ? kthread_create_on_node+0x60/0x60
      [   19.350920]  ret_from_fork+0x2c/0x40
      
      Cc: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
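      The init-once counterpart then lives next to queue allocation, roughly as sketched here (my_mq_sysfs_init, blk_mq_ktype and blk_mq_ctx_ktype are assumed names for the init helper and the corresponding kobj_type objects):

      static void my_mq_sysfs_init(struct request_queue *q)
      {
              struct blk_mq_ctx *ctx;
              int cpu;

              /* Called once from blk_mq_init_allocated_queue(), so a later
               * unbind/bind (unregister + re-register) never re-initializes
               * a live kobject. */
              kobject_init(&q->mq_kobj, &blk_mq_ktype);
              for_each_possible_cpu(cpu) {
                      ctx = per_cpu_ptr(q->queue_ctx, cpu);
                      kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
              }
      }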
  14. 02 Mar 2017, 1 commit
  15. 03 Feb 2017, 1 commit
  16. 28 Jan 2017, 2 commits
  17. 27 Jan 2017, 2 commits
  18. 18 Jan 2017, 4 commits
  19. 10 Dec 2016, 1 commit
  20. 11 Nov 2016, 1 commit
    • block: add scalable completion tracking of requests · cf43e6be
      Authored by Jens Axboe
      For legacy block, we simply track the request completion statistics in
      the request queue. For blk-mq, we track them on a per-sw-queue basis,
      which we can then sum up through the hardware queues and finally to a
      per-device state.
      
      The stats are tracked in, roughly, 0.1s interval windows.
      
      Add sysfs files to display the stats.
      
      The feature is off by default, to avoid any extra overhead. In-kernel
      users can turn it on by setting QUEUE_FLAG_STATS in the queue
      flags. We currently don't turn it on if someone just reads any of
      the stats files; that is something we could add as well.
      Signed-off-by: Jens Axboe <axboe@fb.com>
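      Opting in from inside the kernel is just a matter of setting the queue flag, e.g. (a sketch; queue_flag_set_unlocked() is assumed to be the era-appropriate helper for setting queue flags):

      /* Enable completion-time tracking for this request queue. */
      queue_flag_set_unlocked(QUEUE_FLAG_STATS, q);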
  21. 09 Nov 2016, 1 commit
  22. 03 Nov 2016, 1 commit
  23. 22 Sep 2016, 1 commit
  24. 17 Sep 2016, 2 commits
  25. 15 Sep 2016, 1 commit