提交 · 6c8b232efea1ad3d263ff8b9c16b7e8767a77488 · openeuler / raspberrypi-kernel

09 3月, 2017 3 次提交

blk-mq: make lifetime consistent between hctx and its kobject · 6c8b232e

由 Ming Lei 提交于 2月 22, 2017

This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(),
and trys to keep lifetime consistent between hctx and hctx's kobject.

Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become
totally symmetrical, and kobject's refcounter drops to zero just
when the hctx is freed.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

6c8b232e

blk-mq: make lifetime consitent between q/ctx and its kobject · 7ea5fe31

由 Ming Lei 提交于 2月 22, 2017

Currently from kobject view, both q->mq_kobj and ctx->kobj can
be released during one cycle of blk_mq_register_dev() and
blk_mq_unregister_dev(). Actually, sw queue's lifetime is
same with its request queue's, which is covered by request_queue->kobj.

So we don't need to call kobject_put() for the two kinds of
kobject in __blk_mq_unregister_dev(), instead we do that
in release handler of request queue.
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

7ea5fe31

blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() · 737f98cf

由 Ming Lei 提交于 2月 22, 2017

Both q->mq_kobj and sw queues' kobjects should have been initialized
once, instead of doing that each add_disk context.

Also this patch removes clearing of ctx in blk_mq_init_cpu_queues()
because percpu allocator fills zero to allocated variable.

This patch fixes one issue[1] reported from Omar.

[1] kernel wearning when doing unbind/bind on one scsi-mq device

[   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
[   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
[   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[   19.350920] Workqueue: events_unbound async_run_entry_fn
[   19.350920] Call Trace:
[   19.350920]  dump_stack+0x63/0x83
[   19.350920]  kobject_init+0x77/0x90
[   19.350920]  blk_mq_register_dev+0x40/0x130
[   19.350920]  blk_register_queue+0xb6/0x190
[   19.350920]  device_add_disk+0x1ec/0x4b0
[   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
[   19.350920]  async_run_entry_fn+0x48/0x150
[   19.350920]  process_one_work+0x1d0/0x480
[   19.350920]  worker_thread+0x48/0x4e0
[   19.350920]  kthread+0x101/0x140
[   19.350920]  ? process_one_work+0x480/0x480
[   19.350920]  ? kthread_create_on_node+0x60/0x60
[   19.350920]  ret_from_fork+0x2c/0x40

Cc: Omar Sandoval <osandov@osandov.com>
Signed-off-by: NMing Lei <tom.leiming@gmail.com>
Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

737f98cf

03 3月, 2017 1 次提交

blk-mq: ensure that bd->last is always set correctly · 113285b4

由 Jens Axboe 提交于 3月 02, 2017

When drivers are called with a request in blk-mq, blk-mq flags the
state such that the driver knows if this is the last request in
this call chain or not. The driver can then use that information
to defer kicking off IO until bd->last is true. However, with blk-mq
and scheduling, we need to allocate a driver tag for a request before
it can be issued. If we fail to allocate such a tag, we could end up
in the situation where the last request issued did not have
bd->last == true set. This can then cause a driver hang.

This fixes a hang with virtio-blk, which uses bd->last as a hint
on whether to kick the queue or not.
Reported-by: NChris Mason <clm@fb.com>
Tested-by: NChris Mason <clm@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

113285b4

02 3月, 2017 8 次提交

blk-mq: Provide freeze queue timeout · f91328c4

由 Keith Busch 提交于 3月 01, 2017

A driver may wish to take corrective action if queued requests do not
complete within a set time.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f91328c4

blk-mq: Export blk_mq_freeze_queue_wait · 6bae363e

由 Keith Busch 提交于 3月 01, 2017

Drivers can start a freeze, so this provides a way to wait for frozen.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

6bae363e

blk-mq: move update of tags->rqs to __blk_mq_alloc_request() · 562bef42

由 Omar Sandoval 提交于 2月 27, 2017

No functional difference, it just makes a little more sense to update
the tag map where we actually allocate the tag.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>
Tested-by: NSagi Grimberg <sagi@grimberg.me>

562bef42

blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request · 6d2809d5

由 Omar Sandoval 提交于 2月 27, 2017

blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
its blk_mq_alloc_request() counterpart. It also crashes because it
doesn't update the tags->rqs map.

Fix it by making it allocate a scheduler request.
Reported-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>
Tested-by: NSagi Grimberg <sagi@grimberg.me>

6d2809d5

blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset · 415b806d

由 Sagi Grimberg 提交于 2月 27, 2017

Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the
original request was reserved, so we can be sure to allocate a
properly reserved tag at that point in time, too.
Signed-off-by: NJens Axboe <axboe@fb.com>

415b806d

blk-mq: allocate blk_mq_tags and requests in correct node · 59f082e4

由 Shaohua Li 提交于 2月 01, 2017

blk_mq_tags/requests of specific hardware queue are mostly used in
specific cpus, which might not be in the same numa node as disk. For
example, a nvme card is in node 0. half hardware queue will be used by
node 0, the other node 1.
Signed-off-by: NShaohua Li <shli@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

59f082e4

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/topology.h> · 105ab3d8

由 Ingo Molnar 提交于 2月 01, 2017

We are going to split <linux/sched/topology.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/topology.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

105ab3d8

24 2月, 2017 1 次提交

blk-mq: use sbq wait queues instead of restart for driver tags · da55f2cc

由 Omar Sandoval 提交于 2月 22, 2017

Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware
queues and shared tags") fixed one starvation issue for shared tags.
However, we can still get into a situation where we fail to allocate a
tag because all tags are allocated but we don't have any pending
requests on any hardware queue.

One solution for this would be to restart all queues that share a tag
map, but that really sucks. Ideally, we could just block and wait for a
tag, but that isn't always possible from blk_mq_dispatch_rq_list().

However, we can still use the struct sbitmap_queue wait queues with a
custom callback instead of blocking. This has a few benefits:

1. It avoids iterating over all hardware queues when completing an I/O,
   which the current restart code has to do.
2. It benefits from the existing rolling wakeup code.
3. It avoids punting to another thread just to have it block.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

da55f2cc

18 2月, 2017 2 次提交

blk-mq: don't special case flush inserts for blk-mq-sched · 0c2a6fe4

由 Jens Axboe 提交于 2月 17, 2017

The current request insertion machinery works just fine for
directly inserting flushes, so no need to special case
this anymore.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

0c2a6fe4

blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not · 2aa0f21d

由 Jens Axboe 提交于 2月 17, 2017

Currently we're almost there, but if we dispatch nothing, then we
still return success.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

2aa0f21d

11 2月, 2017 1 次提交

block: set make_request_fn manually in blk_mq_update_nr_hw_queues · f6f94300

由 Josef Bacik 提交于 2月 10, 2017

Calling blk_queue_make_request resets a bunch of settings on the
request_queue, but all we really want is to update the make_request_fn,
so do this directly so we don't lose things like the logical and
physical block sizes.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f6f94300

09 2月, 2017 2 次提交

block: optionally merge discontiguous discard bios into a single request · 1e739730

由 Christoph Hellwig 提交于 2月 08, 2017

Add a new merge strategy that merges discard bios into a request until the
maximum number of discard ranges (or the maximum discard size) is reached
from the plug merging code.  I/O scheduler merging is not wired up yet
but might also be useful, although not for fast devices like NVMe which
are the only user for now.

Note that for now we don't support limiting the size of each discard range,
but if needed that can be added later.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

1e739730

block: enumify ELEVATOR_*_MERGE · 34fe7c05

由 Christoph Hellwig 提交于 2月 08, 2017

Switch these constants to an enum, and make let the compiler ensure that
all callers of blk_try_merge and elv_merge handle all potential values.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

34fe7c05

03 2月, 2017 1 次提交

block: use same block debugfs directory for blk-mq and blktrace · 18fbda91

由 Omar Sandoval 提交于 1月 31, 2017

When I added the blk-mq debugging information to debugfs, I didn't
notice that blktrace also creates a "block" directory in debugfs. Make
them use the same dentry, now created in the core block code. Based on a
patch from Jens.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

18fbda91

01 2月, 2017 1 次提交

blk-mq: don't fail allocating driver tag for stopped hw queue · 12d70958

由 Jens Axboe 提交于 1月 31, 2017

We rely on blk_mq_get_driver_tag() not failing if 'wait' is true,
but it currently fails in that case if the queue happens to be
stopped at the time of the call.

We don't need to check for stopped here, it's just assigning
the tag. If the queue is stopped, we'll handle it when
attempting to run the queue.

This fixes a stall/crash on flush intensive workloads, where
we proceed to process a flush that doesn't have a valid tag
assigned.
Signed-off-by: NJens Axboe <axboe@fb.com>

12d70958

28 1月, 2017 3 次提交

block: split scsi_request out of struct request · 82ed4db4

由 Christoph Hellwig 提交于 1月 27, 2017

And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

82ed4db4

blk-mq-sched: add flush insertion into blk_mq_sched_insert_request() · bd6737f1

由 Jens Axboe 提交于 1月 27, 2017

Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().
Signed-off-by: NJens Axboe <axboe@fb.com>

bd6737f1

block: add a op_is_flush helper · f73f44eb

由 Christoph Hellwig 提交于 1月 27, 2017

This centralizes the checks for bios that needs to be go into the flush
state machine.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f73f44eb

27 1月, 2017 7 次提交

blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() · c13660a0

由 Jens Axboe 提交于 1月 26, 2017

When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch single requests at the time. If
we do that, we can backoff exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Tested-by: NHannes Reinecke <hare@suse.com>

c13660a0

blk-mq-sched: fix starvation for multiple hardware queues and shared tags · 50e1dab8

由 Jens Axboe 提交于 1月 26, 2017

If we have both multiple hardware queues and shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart, we need to bubble
that up to the higher level queue as well.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Tested-by: NHannes Reinecke <hare@suse.com>

50e1dab8

blk-mq: release driver tag on a requeue event · 99cf1dc5

由 Jens Axboe 提交于 1月 26, 2017

We don't want to hold on to this resource when we have a scheduler
attached.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Tested-by: NHannes Reinecke <hare@suse.com>

99cf1dc5

blk-mq: fix potential race in queue restart and driver tag allocation · 3c782d67

由 Jens Axboe 提交于 1月 26, 2017

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Tested-by: NHannes Reinecke <hare@suse.com>

3c782d67

blk-mq: improve scheduler queue sync/async running · 0abad774

由 Jens Axboe 提交于 1月 26, 2017

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Tested-by: NHannes Reinecke <hare@suse.com>

0abad774

blk-mq: create debugfs directory tree · 07e4fead

由 Omar Sandoval 提交于 1月 25, 2017

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

07e4fead

blk-mq: don't lose flags passed in to blk_mq_alloc_request() · 5a797e00

由 Jens Axboe 提交于 1月 26, 2017

If we come in from blk_mq_alloc_requst() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.
Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

5a797e00

25 1月, 2017 1 次提交

blk-mq: only apply active queue tag throttling for driver tags · 200e86b3

由 Jens Axboe 提交于 1月 25, 2017

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.
Signed-off-by: NJens Axboe <axboe@fb.com>

200e86b3

21 1月, 2017 1 次提交

blk-mq: allow resize of scheduler requests · 70f36b60

由 Jens Axboe 提交于 1月 19, 2017

Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth, change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

70f36b60

19 1月, 2017 2 次提交

blk-mq: stop hardware queue in blk_mq_delay_queue() · 7e79dadc

由 Jens Axboe 提交于 1月 19, 2017

The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.

This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.
Reported-by: NHannes Reinecke <hare@suse.com>
Fixes: 70f4db63 ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: NJens Axboe <axboe@fb.com>

7e79dadc

blk-mq: Remove unused variable · 88a75033

由 Keith Busch 提交于 1月 18, 2017

Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

88a75033

18 1月, 2017 6 次提交

blk-mq-sched: allow setting of default IO scheduler · d3484991

由 Jens Axboe 提交于 1月 13, 2017

Add Kconfig entries to manage what devices get assigned an MQ
scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
The latter is useful for admin type queues that still allocate a blk-mq
queue and tag set, but aren't use for normal IO.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

d3484991

blk-mq-sched: add framework for MQ capable IO schedulers · bd166ef1

由 Jens Axboe 提交于 1月 17, 2017

This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interfce, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

bd166ef1

blk-mq: split tag ->rqs[] into two · 2af8cbe3

由 Jens Axboe 提交于 1月 13, 2017

This is in preparation for having two sets of tags available. For
that we need a static index, and a dynamically assignable one.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

2af8cbe3

blk-mq: add support for carrying internal tag information in blk_qc_t · fd2d3326

由 Jens Axboe 提交于 1月 12, 2017

No functional change in this patch, just in preparation for having
two types of tags available to the block layer for a single request.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

fd2d3326

blk-mq: abstract out helpers for allocating/freeing tag maps · cc71a6f4

由 Jens Axboe 提交于 1月 11, 2017

Prep patch for adding an extra tag map for scheduler requests.
Signed-off-by: NJens Axboe <axboe@fb.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

cc71a6f4

blk-mq-tag: cleanup the normal/reserved tag allocation · 4941115b

由 Jens Axboe 提交于 1月 13, 2017

This is in preparation for having another tag set available. Cleanup
the parameters, and allow passing in of tags for blk_mq_put_tag().
Signed-off-by: NJens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NOmar Sandoval <osandov@fb.com>

4941115b