Commits · 82ed4db499b8598f16f8871261bff088d6b0597f · openanolis / cloud-kernel

28 Jan, 2017 · 9 commits

block: split scsi_request out of struct request · 82ed4db4

Authored by Christoph Hellwig on Jan 27, 2017

Also require all drivers that want to support BLOCK_PC to allocate a
scsi_request as the first member of their private data.  To support this,
the legacy IDE and BSG code is switched to set cmd_size on their queues to
let the block layer allocate the additional space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

82ed4db4

block/bsg: move queue creation into bsg_setup_queue · 8ae94eb6

Authored by Christoph Hellwig on Jan 03, 2017

Simplify the boilerplate code needed for bsg nodes a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

8ae94eb6

block: allow specifying size for extra command data · 6d247d7f

Authored by Christoph Hellwig on Jan 27, 2017

This mirrors the blk-mq capability to allocate extra driver-specific
data behind struct request by setting a cmd_size field, as well as having
a constructor / destructor for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

6d247d7f
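
As a rough userspace illustration of the cmd_size idea described above, the sketch below allocates a request and a driver-private payload as one block and returns a pointer to the payload that sits directly behind the request. Every name in it (`struct request_sketch`, `struct queue_sketch`, `struct my_cmd`, `alloc_request`, `rq_to_pdu`) is a hypothetical stand-in, not the kernel's definition, and the constructor / destructor hooks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct request; not the kernel layout. */
struct request_sketch {
	unsigned long flags;
	void *end_io_data;
};

/* Hypothetical queue that records how much extra space each request needs. */
struct queue_sketch {
	size_t cmd_size;
};

/* Example driver payload placed directly behind the request. */
struct my_cmd {
	unsigned char cdb[16];
	int result;
};

/* Allocate the request and its zeroed per-driver payload as one block. */
static struct request_sketch *alloc_request(const struct queue_sketch *q)
{
	assert(q != NULL);
	return calloc(1, sizeof(struct request_sketch) + q->cmd_size);
}

/* The payload starts immediately after the request structure. */
static void *rq_to_pdu(struct request_sketch *rq)
{
	return rq + 1;
}
```

A driver in this sketch would set `cmd_size = sizeof(struct my_cmd)` once on its queue and then call `rq_to_pdu()` on every request instead of allocating its command data separately.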

block: simplify blk_init_allocated_queue · 5ea708d1

Authored by Christoph Hellwig on Jan 03, 2017

Return an errno value instead of the passed-in queue so that the callers
don't have to keep track of two queues, and move the assignment of the
request_fn and lock to the caller, as passing them as arguments doesn't
simplify anything.  While we're at it, also remove two pointless NULL
assignments, given that the request structure is zeroed on allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5ea708d1

block: fix elevator init check · e6f7f93d

Authored by Christoph Hellwig on Jan 25, 2017

We can't initialize the elevator fields for flushes, as flushes share space
in struct request with the elevator data.  But currently we can't
communicate that a request is a flush through blk_get_request, as we
can only pass READ or WRITE, and the low-level code looks at the
possibly NULL bio to check for a flush.

Fix this by allowing callers to pass any block op and flags, and by
checking for the flush flags in __get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

e6f7f93d

blk-mq: fix debugfs compilation issues · 400f73b2

Authored by Omar Sandoval on Jan 27, 2017

This fixes a couple of problems:

1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
   all.

Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.

Fixes: 07e4fead ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>

Augment Kconfig description.

Signed-off-by: Jens Axboe <axboe@fb.com>

400f73b2
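
The stub pattern this commit refers to can be sketched roughly as follows. The prototypes below are assumptions made for illustration (they are not copied from the kernel headers), `struct request_queue` is reduced to a placeholder, and `register_disk` is a hypothetical caller. The point is only that the stubs in the disabled branch must have the same signatures as the real functions, so callers need no `#ifdef` of their own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real struct request_queue is far larger. */
struct request_queue { int id; };

/* CONFIG_BLK_DEBUG_FS is deliberately left undefined in this sketch. */
#ifdef CONFIG_BLK_DEBUG_FS
int blk_mq_debugfs_register(struct request_queue *q, const char *name);
void blk_mq_debugfs_unregister(struct request_queue *q);
#else
/* Stubs keep callers free of #ifdefs and must match the real prototypes. */
static inline int blk_mq_debugfs_register(struct request_queue *q,
					  const char *name)
{
	(void)q;
	(void)name;
	return 0;
}

static inline void blk_mq_debugfs_unregister(struct request_queue *q)
{
	(void)q;
}
#endif

/* A caller compiles the same way whether or not the option is enabled. */
static int register_disk(struct request_queue *q, const char *name)
{
	assert(q != NULL && name != NULL);
	return blk_mq_debugfs_register(q, name);
}
```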

block: cleanup remaining manual checks for PREFLUSH|FUA · f3a8ab7d

Authored by Jens Axboe on Jan 27, 2017

Use op_is_flush() where applicable.

Signed-off-by: Jens Axboe <axboe@fb.com>

f3a8ab7d

blk-mq-sched: add flush insertion into blk_mq_sched_insert_request() · bd6737f1

Authored by Jens Axboe on Jan 27, 2017

Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().

Signed-off-by: Jens Axboe <axboe@fb.com>

bd6737f1

block: add a op_is_flush helper · f73f44eb

Authored by Christoph Hellwig on Jan 27, 2017

This centralizes the checks for bios that need to go into the flush
state machine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

f73f44eb
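
A helper of this kind can be sketched in a few lines. The flag names mirror the PREFLUSH and FUA flags mentioned in the neighbouring commits, but the bit positions below are made up for the example and `REQ_SYNC` is included only as a non-flush flag to test against; treat this as an illustration of the check rather than the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit positions only; the kernel defines its own values. */
#define REQ_SYNC     (1u << 0)
#define REQ_FUA      (1u << 1)
#define REQ_PREFLUSH (1u << 2)

static_assert((REQ_FUA & REQ_PREFLUSH) == 0,
	      "flush flags must be distinct bits");

/* True when the op carries either flag that routes it to the flush machinery. */
static inline bool op_is_flush(unsigned int op)
{
	return (op & (REQ_FUA | REQ_PREFLUSH)) != 0;
}
```

Callers that previously open-coded `op & (REQ_FUA | REQ_PREFLUSH)` can then use the single predicate, which keeps the definition of "flush" in one place.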

27 Jan, 2017 · 16 commits

blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() · c13660a0

Authored by Jens Axboe on Jan 26, 2017

When we invoke dispatch_requests(), the scheduler empties everything
into the passed-in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch a single request at a time. If
we do that, we can back off exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

c13660a0
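
The effect of dispatching one request at a time can be shown with a toy model. In the sketch below the scheduler and device are reduced to counters, and all names (`struct sched_sketch`, `struct device_sketch`, `run_queue`) are hypothetical; it only demonstrates that the loop stops as soon as the device is full and leaves the remaining requests with the scheduler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduler: just counts the requests it is holding. */
struct sched_sketch { int pending; };

/* Hypothetical device: counts how many more requests it can accept. */
struct device_sketch { int free_slots; };

/* Hand out at most one request; returns 1 if one was dispatched, else 0. */
static int dispatch_request(struct sched_sketch *s)
{
	if (s->pending == 0)
		return 0;
	s->pending--;
	return 1;
}

/* Pull requests one at a time and stop as soon as the device is full. */
static int run_queue(struct sched_sketch *s, struct device_sketch *d)
{
	int issued = 0;

	assert(s != NULL && d != NULL);
	while (d->free_slots > 0 && dispatch_request(s)) {
		d->free_slots--;
		issued++;
	}
	return issued;
}
```

With five pending requests and two free device slots, two requests are issued and three stay queued, where they remain candidates for merging.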

blk-mq-sched: fix starvation for multiple hardware queues and shared tags · 50e1dab8

Authored by Jens Axboe on Jan 26, 2017

If we have both multiple hardware queues and a shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart; we need to bubble
that up to the higher level queue as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

50e1dab8

blk-mq: release driver tag on a requeue event · 99cf1dc5

Authored by Jens Axboe on Jan 26, 2017

We don't want to hold on to this resource when we have a scheduler
attached.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

99cf1dc5

blk-mq: fix potential race in queue restart and driver tag allocation · 3c782d67

Authored by Jens Axboe on Jan 26, 2017

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

3c782d67
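
The "mark, then re-check" ordering described here is a common way to close a lost-wakeup window. The following is a minimal userspace sketch using C11 atomics; `struct hctx_sketch`, `get_driver_tag` and `get_tag_or_mark_restart` are hypothetical names, and the tag pool is modelled as a simple counter rather than the kernel's tag bitmap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware-queue state; not the kernel's struct blk_mq_hw_ctx. */
struct hctx_sketch {
	atomic_int free_tags;
	atomic_bool need_restart;
};

/* Try to take one driver tag; returns false if none is free. */
static bool get_driver_tag(struct hctx_sketch *h)
{
	int n = atomic_load(&h->free_tags);

	while (n > 0) {
		/* On failure n is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&h->free_tags, &n, n - 1))
			return true;
	}
	return false;
}

/* Mark the queue for restart, then re-check so a completion is not missed. */
static bool get_tag_or_mark_restart(struct hctx_sketch *h)
{
	assert(h != NULL);
	if (get_driver_tag(h))
		return true;
	atomic_store(&h->need_restart, true);
	/* A tag freed between the failed attempt and the store is caught here. */
	return get_driver_tag(h);
}
```

Without the second attempt, a completion that frees a tag after the first failure but before the restart flag is set would see no flag, so nothing would rerun the queue.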

blk-mq: improve scheduler queue sync/async running · 0abad774

Authored by Jens Axboe on Jan 26, 2017

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

0abad774

blk-mq: move hctx and ctx counters from sysfs to debugfs · 4a46f05e

Authored by Omar Sandoval on Jan 25, 2017

These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

4a46f05e

blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs · be215473

Authored by Omar Sandoval on Jan 25, 2017

These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

be215473

blk-mq: add tags and sched_tags bitmaps to debugfs · d7e3621a

Authored by Omar Sandoval on Jan 25, 2017

These can be used to debug issues like tag leaks and stuck requests.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

d7e3621a

blk-mq: move tags and sched_tags info from sysfs to debugfs · d96b37c0

Authored by Omar Sandoval on Jan 25, 2017

These are closely tied to the blk-mq tag implementation, so exposing them
in sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

d96b37c0

blk-mq: export software queue pending map to debugfs · 0bfa5288

Authored by Omar Sandoval on Jan 25, 2017

This is useful for debugging problems where we've gotten stuck with
requests in the software queues.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

0bfa5288

blk-mq: add extra request information to debugfs · 7b393852

Authored by Omar Sandoval on Jan 25, 2017

The request pointers by themselves aren't super useful.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

7b393852

blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs · 950cd7e9

Authored by Omar Sandoval on Jan 25, 2017

These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

950cd7e9

blk-mq: add hctx->{state,flags} to debugfs · 9abb2ad2

Authored by Omar Sandoval on Jan 25, 2017

hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

9abb2ad2

blk-mq: create debugfs directory tree · 07e4fead

Authored by Omar Sandoval on Jan 25, 2017

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

07e4fead

blk-mq-sched: check for successful allocation before assigning tag · b48fda09

Authored by Jens Axboe on Jan 26, 2017

We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it while testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

b48fda09

blk-mq: don't lose flags passed in to blk_mq_alloc_request() · 5a797e00

Authored by Jens Axboe on Jan 26, 2017

If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

5a797e00

25 Jan, 2017 · 1 commit

blk-mq: only apply active queue tag throttling for driver tags · 200e86b3

Authored by Jens Axboe on Jan 25, 2017

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling to the scheduler side
of tags; it only applies to driver tags, since that's the resource
we need to dispatch an IO.

Signed-off-by: Jens Axboe <axboe@fb.com>

200e86b3

23 Jan, 2017 · 3 commits

cfq-iosched: Adjust one function call together with a variable assignment · 1cf41753

Authored by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following:

ERROR: do not use assignment in if condition

Fix the affected place in the source code.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jens Axboe <axboe@fb.com>

1cf41753

blk-throttle: Adjust two function calls together with a variable assignment · d609af3a

Authored by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following:

ERROR: do not use assignment in if condition

Fix the affected places in the source code.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

d609af3a
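
For readers unfamiliar with this checkpatch.pl error, the generic shape of the change in the two commits above is to move the assignment out of the condition. The function below is a made-up example (it is not taken from cfq-iosched or blk-throttle); the flagged form is kept as a comment next to the accepted form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the part of name before the first '-', or of the whole name. */
static size_t prefix_length(const char *name)
{
	const char *dash;

	assert(name != NULL);

	/* Flagged form: if ((dash = strchr(name, '-')) != NULL) ... */
	dash = strchr(name, '-');
	if (dash)
		return (size_t)(dash - name);

	return strlen(name);
}
```

Both forms behave identically; the split version simply keeps the side effect and the test on separate lines.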

block: Initialize cfqq->ioprio_class in cfq_get_queue() · 4d608baa

Authored by Alexander Potapenko on Jan 23, 2017

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():

==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
 [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
 [<     inline     >] cfq_init_cfqq block/cfq-iosched.c:3754
 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
 [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
 [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
 [<     inline     >] allocate_slab mm/slub.c:1627
 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
 [<     inline     >] new_slab_objects mm/slub.c:2407
 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
 [<     inline     >] __slab_alloc mm/slub.c:2606
 [<     inline     >] slab_alloc_node mm/slub.c:2669
 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

4d608baa
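
The bug class reported here (an init helper reading a field of a freshly allocated, non-zeroed object) can be reproduced in miniature. The sketch below is a userspace analogy under stated assumptions: `malloc` stands in for kmem_cache_alloc_node(), the struct and function names are hypothetical, and assigning a "none" class before the init call is shown as one plausible shape of the fix rather than a copy of the actual patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Priority classes, named after the kernel's IOPRIO_CLASS_* constants. */
enum {
	IOPRIO_CLASS_NONE,
	IOPRIO_CLASS_RT,
	IOPRIO_CLASS_BE,
	IOPRIO_CLASS_IDLE
};

/* Hypothetical, heavily reduced stand-in for struct cfq_queue. */
struct cfq_queue_sketch {
	int ioprio_class;
	int is_idle;
};

/* Reads ioprio_class, so the caller must have set it beforehand. */
static void init_queue(struct cfq_queue_sketch *q)
{
	assert(q != NULL);
	q->is_idle = (q->ioprio_class == IOPRIO_CLASS_IDLE);
}

static struct cfq_queue_sketch *get_queue(void)
{
	/* Like a slab object, malloc'ed memory has indeterminate contents. */
	struct cfq_queue_sketch *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	/* Initialize the field before init_queue() looks at it. */
	q->ioprio_class = IOPRIO_CLASS_NONE;
	init_queue(q);
	return q;
}
```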

21 Jan, 2017 · 1 commit

blk-mq: allow resize of scheduler requests · 70f36b60

Authored by Jens Axboe on Jan 19, 2017

Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth; change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

70f36b60

19 Jan, 2017 · 5 commits

blk-mq: stop hardware queue in blk_mq_delay_queue() · 7e79dadc

Authored by Jens Axboe on Jan 19, 2017

The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.

This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.

Reported-by: Hannes Reinecke <hare@suse.com>
Fixes: 70f4db63 ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: Jens Axboe <axboe@fb.com>

7e79dadc

blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL · 8cecb07d

Authored by Jens Axboe on Jan 19, 2017

We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant; kill it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe <axboe@fb.com>

8cecb07d

elevator: fix unnecessary put of elevator in failure case · 610d886c

Authored by Jens Axboe on Jan 19, 2017

We already checked that e is NULL, so no point in calling
elevator_put() to free it.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Jens Axboe <axboe@fb.com>

610d886c

blk-cgroup: don't quiesce the queue on policy activate/deactivate · 38dbb7dd

Authored by Jens Axboe on Jan 18, 2017

There's no potential harm in quiescing the queue, but it also doesn't
buy us anything. And we can't run the queue async for policy
deactivate, since we could be in the path of tearing the queue down.
If we schedule an async run of the queue at that time, we're racing
with queue teardown AFTER we've already torn most of it down.

Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: 4d199c6f ("blk-cgroup: ensure that we clear the stop bit on quiesced queues")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

38dbb7dd

blk-mq: Remove unused variable · 88a75033

Authored by Keith Busch on Jan 18, 2017

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

88a75033

18 Jan, 2017 · 5 commits

blk-cgroup: ensure that we clear the stop bit on quiesced queues · 4d199c6f

Authored by Jens Axboe on Jan 18, 2017

If we call blk_mq_quiesce_queue() on a queue, we must remember to
pair that with something that clears the stopped bit on the
queues later on.

Signed-off-by: Jens Axboe <axboe@fb.com>

4d199c6f

blk-mq-sched: allow setting of default IO scheduler · d3484991

Authored by Jens Axboe on Jan 13, 2017

Add Kconfig entries to manage what devices get assigned an MQ
scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
The latter is useful for admin type queues that still allocate a blk-mq
queue and tag set, but aren't used for normal IO.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

d3484991

mq-deadline: add blk-mq adaptation of the deadline IO scheduler · 945ffb60

Authored by Jens Axboe on Jan 14, 2017

This is basically identical to deadline-iosched, except it registers
as an MQ-capable scheduler. This is still a single queue design.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

945ffb60

blk-mq-sched: add framework for MQ capable IO schedulers · bd166ef1

Authored by Jens Axboe on Jan 17, 2017

This adds a set of hooks that intercept the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interface, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

bd166ef1

blk-mq: split tag ->rqs[] into two · 2af8cbe3

Authored by Jens Axboe on Jan 13, 2017

This is in preparation for having two sets of tags available. For
that we need a static index, and a dynamically assignable one.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

2af8cbe3

openanolis / cloud-kernel · synced successfully about 1 year ago