Commits · 0abad774124351ba211b9053786ebd5a5a722d53 · openanolis / cloud-kernel

27 Jan, 2017 13 commits

blk-mq: improve scheduler queue sync/async running · 0abad774

Committed by Jens Axboe on Jan 26, 2017

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

blk-mq: move hctx and ctx counters from sysfs to debugfs · 4a46f05e

Committed by Omar Sandoval on Jan 25, 2017

These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs · be215473

Committed by Omar Sandoval on Jan 25, 2017

These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add tags and sched_tags bitmaps to debugfs · d7e3621a

Committed by Omar Sandoval on Jan 25, 2017

These can be used to debug issues like tag leaks and stuck requests.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move tags and sched_tags info from sysfs to debugfs · d96b37c0

Committed by Omar Sandoval on Jan 25, 2017

These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: export software queue pending map to debugfs · 0bfa5288

Committed by Omar Sandoval on Jan 25, 2017

This is useful for debugging problems where we've gotten stuck with
requests in the software queues.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

sbitmap: add helpers for dumping to a seq_file · 24af1ccf

Committed by Omar Sandoval on Jan 25, 2017

This is useful debugging information that will be used in the blk-mq
debugfs directory.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>

Changed 'weight' to 'busy'.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add extra request information to debugfs · 7b393852

Committed by Omar Sandoval on Jan 25, 2017

The request pointers by themselves aren't super useful.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs · 950cd7e9

Committed by Omar Sandoval on Jan 25, 2017

These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add hctx->{state,flags} to debugfs · 9abb2ad2

Committed by Omar Sandoval on Jan 25, 2017

hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: create debugfs directory tree · 07e4fead

Committed by Omar Sandoval on Jan 25, 2017

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-sched: check for successful allocation before assigning tag · b48fda09

Committed by Jens Axboe on Jan 26, 2017

We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: don't lose flags passed in to blk_mq_alloc_request() · 5a797e00

Committed by Jens Axboe on Jan 26, 2017

If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

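The pattern this commit message describes can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel code: the struct, the flag value, and the function names below are hypothetical stand-ins. It contrasts a setup helper that resets the flags with one that leaves the caller's flags alone.

```c
#include <stddef.h>

/* Hypothetical flag value, standing in for a "do not block" request flag. */
#define ALLOC_NOWAIT 0x1u

/* Hypothetical, simplified allocation context. */
struct alloc_data {
    unsigned int flags;
    void *queue;
};

/* Problem pattern: the helper resets flags, losing what the caller set. */
static void setup_overwriting(struct alloc_data *data, void *queue)
{
    data->queue = queue;
    data->flags = 0;
}

/* Fixed pattern: the caller initializes flags first; the helper keeps them. */
static void setup_preserving(struct alloc_data *data, void *queue)
{
    data->queue = queue;
    /* data->flags is intentionally left as the caller initialized it. */
}

/* Returns nonzero when the allocation would be allowed to block. */
static int may_block(const struct alloc_data *data)
{
    return !(data->flags & ALLOC_NOWAIT);
}
```

With the overwriting helper, a caller that asked for a non-blocking allocation ends up with a context that may block; with the preserving helper, the request stays non-blocking.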

25 Jan, 2017 1 commit

blk-mq: only apply active queue tag throttling for driver tags · 200e86b3

Committed by Jens Axboe on Jan 25, 2017

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.
Signed-off-by: Jens Axboe <axboe@fb.com>

23 Jan, 2017 3 commits

cfq-iosched: Adjust one function call together with a variable assignment · 1cf41753

Committed by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following.

ERROR: do not use assignment in if condition

Thus fix the affected source code place.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jens Axboe <axboe@fb.com>

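For readers unfamiliar with this checkpatch.pl error, the following is a minimal userspace sketch of the style it flags and the usual rewrite. The function names are hypothetical and malloc() stands in for whatever call is involved; this is not the code changed in cfq-iosched.c.

```c
#include <stdlib.h>

/* Flagged style: the assignment happens inside the if condition. */
static int alloc_flagged(void **out, size_t n)
{
    if (!(*out = malloc(n)))
        return -1;
    return 0;
}

/* Preferred style: assign first, then test the result. */
static int alloc_preferred(void **out, size_t n)
{
    *out = malloc(n);
    if (!*out)
        return -1;
    return 0;
}
```

Both functions behave the same; the second form only separates the side effect from the test, which is what the error message asks for.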

blk-throttle: Adjust two function calls together with a variable assignment · d609af3a

Committed by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following.

ERROR: do not use assignment in if condition

Thus fix the affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Initialize cfqq->ioprio_class in cfq_get_queue() · 4d608baa

Committed by Alexander Potapenko on Jan 23, 2017

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():

==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
 [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
 [<     inline     >] cfq_init_cfqq block/cfq-iosched.c:3754
 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
 [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
 [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
 [<     inline     >] allocate_slab mm/slub.c:1627
 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
 [<     inline     >] new_slab_objects mm/slub.c:2407
 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
 [<     inline     >] __slab_alloc mm/slub.c:2606
 [<     inline     >] slab_alloc_node mm/slub.c:2669
 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

21 Jan, 2017 1 commit

blk-mq: allow resize of scheduler requests · 70f36b60

Committed by Jens Axboe on Jan 19, 2017

Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth, change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

19 Jan, 2017 6 commits

blk-mq: stop hardware queue in blk_mq_delay_queue() · 7e79dadc

Committed by Jens Axboe on Jan 19, 2017

The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.

This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.
Reported-by: Hannes Reinecke <hare@suse.com>
Fixes: 70f4db63 ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL · 8cecb07d

Committed by Jens Axboe on Jan 19, 2017

We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant, kill it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe <axboe@fb.com>

elevator: fix unnecessary put of elevator in failure case · 610d886c

Committed by Jens Axboe on Jan 19, 2017

We already checked that e is NULL, so no point in calling
elevator_put() to free it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-cgroup: don't quiesce the queue on policy activate/deactivate · 38dbb7dd

Committed by Jens Axboe on Jan 18, 2017

There's no potential harm in quiescing the queue, but it also doesn't
buy us anything. And we can't run the queue async for policy
deactivate, since we could be in the path of tearing the queue down.
If we schedule an async run of the queue at that time, we're racing
with queue teardown AFTER we've already torn most of it down.
Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: 4d199c6f ("blk-cgroup: ensure that we clear the stop bit on quiesced queues")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

sbitmap: fix wakeup hang after sbq resize · 6c0ca7ae

Committed by Omar Sandoval on Jan 18, 2017

When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).

This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.
Reported-by: Martin Raiber <martin@urbackup.org>
Fixes: e3a2b3f9 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

sbitmap: use smp_mb__after_atomic() in sbq_wake_up() · f66227de

Committed by Omar Sandoval on Jan 18, 2017

We always do an atomic clear_bit() right before we call sbq_wake_up(),
so we can use smp_mb__after_atomic(). While we're here, comment the
memory barriers in here a little more.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

18 Jan, 2017 13 commits

blk-cgroup: ensure that we clear the stop bit on quiesced queues · 4d199c6f

Committed by Jens Axboe on Jan 18, 2017

If we call blk_mq_quiesce_queue() on a queue, we must remember to
pair that with something that clears the stopped bit on the
queues later on.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-sched: allow setting of default IO scheduler · d3484991

Committed by Jens Axboe on Jan 13, 2017

Add Kconfig entries to manage what devices get assigned an MQ
scheduler, and add a blk-mq flag for drivers to opt out of scheduling.
The latter is useful for admin type queues that still allocate a blk-mq
queue and tag set, but aren't used for normal IO.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

mq-deadline: add blk-mq adaptation of the deadline IO scheduler · 945ffb60

Committed by Jens Axboe on Jan 14, 2017

This is basically identical to deadline-iosched, except it registers
as a MQ capable scheduler. This is still a single queue design.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq-sched: add framework for MQ capable IO schedulers · bd166ef1

Committed by Jens Axboe on Jan 17, 2017

This adds a set of hooks that intercepts the blk-mq path of
allocating/inserting/issuing/completing requests, allowing
us to develop a scheduler within that framework.

We reuse the existing elevator scheduler API on the registration
side, but augment that with the scheduler flagging support for
the blk-mq interface, and with a separate set of ops hooks for MQ
devices.

We split driver and scheduler tags, so we can run the scheduling
independently of device queue depth.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq: split tag ->rqs[] into two · 2af8cbe3

Committed by Jens Axboe on Jan 13, 2017

This is in preparation for having two sets of tags available. For
that we need a static index, and a dynamically assignable one.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq: add support for carrying internal tag information in blk_qc_t · fd2d3326

Committed by Jens Axboe on Jan 12, 2017

No functional change in this patch, just in preparation for having
two types of tags available to the block layer for a single request.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq: abstract out helpers for allocating/freeing tag maps · cc71a6f4

Committed by Jens Axboe on Jan 11, 2017

Prep patch for adding an extra tag map for scheduler requests.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq-tag: cleanup the normal/reserved tag allocation · 4941115b

Committed by Jens Axboe on Jan 13, 2017

This is in preparation for having another tag set available. Cleanup
the parameters, and allow passing in of tags for blk_mq_put_tag().
Signed-off-by: Jens Axboe <axboe@fb.com>
[hch: even more cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq: export some helpers we need to the scheduling framework · 2c3ad667

Committed by Jens Axboe on Dec 14, 2016

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>

blk-mq: un-export blk_mq_free_hctx_request() · 16a3c2a7

Committed by Jens Axboe on Dec 15, 2016

It's only used in blk-mq, kill it from the main exported header
and kill the symbol export as well.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>

block: move rq_ioc() to blk.h · c23ecb42

Committed by Jens Axboe on Dec 14, 2016

We want to use it outside of blk-core.c.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>

block: move existing elevator ops to union · c51ca6cf

Committed by Jens Axboe on Dec 10, 2016

Prep patch for adding MQ ops as well, since doing anon unions with
named initializers doesn't work on older compilers.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

partitions/efi: Fix integer overflow in GPT size calculation · c5082b70

Committed by Alden Tondettar on Jan 15, 2017

If a GUID Partition Table claims to have more than 2**25 entries, the
calculation of the partition table size in alloc_read_gpt_entries() will
overflow a 32-bit integer and not enough space will be allocated for the
table.

Nothing seems to get written out of bounds, but later efi_partition() will
read up to 32768 bytes from a 128 byte buffer, possibly OOPSing or exposing
information to /proc/partitions and uevents.

The problem exists on both 64-bit and 32-bit platforms.

Fix the overflow and also print a meaningful debug message if the table
size is too large.
Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

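The arithmetic behind this overflow can be checked with a small sketch. It assumes 128-byte partition entries (the usual GPT entry size) and uses hypothetical helper names; it illustrates the wraparound only and is not the kernel's alloc_read_gpt_entries(). With 2**25 + 1 entries, the true size is 2**32 + 128 bytes, which wraps to 128 when the product is computed in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: the product is computed in 32 bits and wraps mod 2^32. */
static uint32_t table_size_wrapping(uint32_t num_entries, uint32_t entry_size)
{
    return num_entries * entry_size;
}

/* Hypothetical helper: widen one operand first so the full product is kept. */
static uint64_t table_size_wide(uint32_t num_entries, uint32_t entry_size)
{
    return (uint64_t)num_entries * entry_size;
}

/* Hypothetical check: reject tables whose full size exceeds a chosen limit. */
static int table_size_ok(uint32_t num_entries, uint32_t entry_size,
                         uint64_t max_bytes)
{
    return table_size_wide(num_entries, entry_size) <= max_bytes;
}
```

Checking the widened product against an upper bound before allocating is one way to turn the silent wraparound into an explicit rejection; the limit to use is a policy choice and is left as a parameter here.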

12 Jan, 2017 3 commits

MAINTAINERS: Update maintainer entry for NBD · 1e668f4e

Committed by Josef Bacik on Jan 11, 2017

The old maintainer's email is bouncing and I've rewritten most of this
driver in recent months.  Also add linux-block to the mailing list
and remove the old tree, I will send patches through the linux-block
tree.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: make mq_ops a const pointer · f8a5b122

Committed by Jens Axboe on Dec 13, 2016

We never change it, make that clear.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

block: relax check on sg gap · 729204ef

Committed by Ming Lei on Dec 17, 2016

If the last bvec of the 1st bio and the 1st bvec of the next
bio are physically contiguous, and the latter can be merged
into the last segment of the 1st bio, we should consider that they
don't violate the sg gap (or virt boundary) limit.

Both Vitaly and Dexuan reported lots of unmergeable small bios
are observed when running mkfs on Hyper-V virtual storage, and
performance becomes quite low. This patch fixes that performance
issue.

The same issue should exist on NVMe, since it sets virt boundary too.
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
