Commits · 84d4add793c65b5bda802dcefcf0d7ab1a8e22ed · openeuler / Kernel

31 Jan, 2017 · 8 commits

lightnvm: add ioctls for vector I/Os · 84d4add7

Committed by Matias Bjørling on Jan 31, 2017

Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.

For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.

The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: reduce number of nvm_id groups to one · 19bd6fe7

Committed by Matias Bjørling on Jan 31, 2017

The number of configuration groups has been limited to one in current
code, even if there is support for up to four. With the introduction
of the open-channel SSD 1.3 specification, only a single
group is exposed onwards. Reflect this in the nvm_id structure.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: cleanup nvm transformation functions · dab8ee9e

Committed by Matias Bjørling on Jan 31, 2017

Going from target specific ppa addresses to device was accomplished by
first converting target to generic ppa addresses and generic to device
addresses. The conversion was either open-coded or used the built-in
nvm_trans_* and nvm_map_* functions for conversion. Simplify the
interface and clean up the calls to provide clean functions that now
either take a list of ppas or a nvm_rq, and are exposed through:

void nvm_ppa_* - target to/from device with a list of PPAs,
void nvm_rq_* - target to/from device with a nvm_rq.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: make nvm_map_* return void · 61a561d8

Committed by Matias Bjørling on Jan 31, 2017

The only check that was done was a debugging check. Remove it and
replace the return value with void to reduce error checking.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: remove nvm_get_bb_tbl and nvm_set_bb_tbl · 8f4fe008

Committed by Matias Bjørling on Jan 31, 2017

Since the merge of gennvm and core, there is no longer a need for the
device specific bad block functions.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: remove nvm_submit_ppa* functions · 583b7058

Committed by Matias Bjørling on Jan 31, 2017

The nvm_submit_ppa* functions are no longer needed after gennvm and core
have been merged.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: collapse nvm_erase_ppa and nvm_erase_blk · 10995c3d

Committed by Matias Bjørling on Jan 31, 2017

After gennvm and core have been merged, there are no more callers to
nvm_erase_ppa. Therefore collapse the device specific and target
specific erase functions.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: merge gennvm with core · ade69e24

Committed by Matias Bjørling on Jan 31, 2017

For the first iteration of Open-Channel SSDs, it was anticipated that
there could be various media managers on top of an open-channel SSD,
so as to allow vendors to plug in their own host-side FTLs, without the
media manager in between.

Now that an Open-Channel SSD is exposed as a traditional block device,
there is no longer a need for this. Therefore let's merge the gennvm code
with core and simplify the stack.
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

28 Jan, 2017 · 4 commits

blk-mq: fix debugfs compilation issues · 400f73b2

Committed by Omar Sandoval on Jan 27, 2017

This fixes a couple of problems:

1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
   all.

Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.

Fixes: 07e4fead ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>

Augment Kconfig description.
Signed-off-by: Jens Axboe <axboe@fb.com>
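The stub fix described in this commit follows a common kernel pattern: when an optional feature is compiled out, the header supplies static inline no-op functions with the same prototypes as the real ones, so callers need no #ifdef of their own. A minimal sketch of that pattern follows. Only CONFIG_BLK_DEBUG_FS is taken from the commit text; the structure and function names are hypothetical stand-ins, not the kernel's actual declarations.

```c
/* Hypothetical stand-in for the kernel's queue structure. */
struct request_queue {
    const char *name;
};

#ifdef CONFIG_BLK_DEBUG_FS
/* Feature enabled: the real implementations live in a separate file. */
int blk_debugfs_register(struct request_queue *q);
void blk_debugfs_unregister(struct request_queue *q);
#else
/*
 * Feature disabled: stubs keep the same signatures and report success,
 * so the calling code compiles and behaves the same either way.
 */
static inline int blk_debugfs_register(struct request_queue *q)
{
    (void)q;
    return 0;
}

static inline void blk_debugfs_unregister(struct request_queue *q)
{
    (void)q;
}
#endif

/* Caller is identical in both configurations. */
static int queue_setup(struct request_queue *q)
{
    return blk_debugfs_register(q);
}
```

Built without CONFIG_BLK_DEBUG_FS defined, queue_setup() compiles against the stubs and returns 0; a stub with a mismatched prototype would instead break every caller in that configuration.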

block: cleanup remaining manual checks for PREFLUSH|FUA · f3a8ab7d

Committed by Jens Axboe on Jan 27, 2017

Use op_is_flush() where applicable.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-sched: add flush insertion into blk_mq_sched_insert_request() · bd6737f1

Committed by Jens Axboe on Jan 27, 2017

Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().
Signed-off-by: Jens Axboe <axboe@fb.com>

block: add a op_is_flush helper · f73f44eb

Committed by Christoph Hellwig on Jan 27, 2017

This centralizes the checks for bios that need to go into the flush
state machine.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
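A helper of this kind can be sketched as a single mask test on the request flags. The commit text names op_is_flush(); the body and the flag bit positions below are assumptions made for illustration and do not reproduce the kernel's actual REQ_* values.

```c
#include <stdbool.h>

/* Hypothetical bit layout for illustration; the kernel's real values differ. */
#define REQ_OP_BITS   8u
#define REQ_FUA       (1u << (REQ_OP_BITS + 0))
#define REQ_PREFLUSH  (1u << (REQ_OP_BITS + 1))
#define REQ_SYNC      (1u << (REQ_OP_BITS + 2))

/*
 * True if a bio or request carrying these flags must go through the
 * flush state machine, i.e. it asks for a preflush or forced unit access.
 */
static inline bool op_is_flush(unsigned int op)
{
    return (op & (REQ_FUA | REQ_PREFLUSH)) != 0;
}
```

Callers that previously open-coded the PREFLUSH|FUA test can then write op_is_flush(flags), which keeps the definition of "needs the flush machinery" in one place.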

27 Jan, 2017 · 17 commits

blk-mq-sched: change ->dispatch_requests() to ->dispatch_request() · c13660a0

Committed by Jens Axboe on Jan 26, 2017

When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch a single request at a time. If
we do that, we can back off exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
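The difference between draining the scheduler and pulling one request at a time can be illustrated with a small model. Everything below is a hypothetical simplification (a fixed array as the scheduler queue, a counter as the device's capacity); it is not the blk-mq implementation, only a sketch of the back-off behaviour the commit describes.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's data structures. */
struct sched_queue {
    int pending[16];   /* request ids held by the scheduler, in order */
    size_t head;       /* index of the next request to hand out */
    size_t count;      /* requests still held by the scheduler */
};

struct device {
    int free_tags;     /* how many more requests the device can accept */
};

/* Hand out exactly one request id, or -1 if the scheduler is empty. */
static int dispatch_request(struct sched_queue *sq)
{
    if (sq->count == 0)
        return -1;
    sq->count--;
    return sq->pending[sq->head++];
}

/* Issue requests until the device is full; returns how many were issued. */
static size_t run_queue(struct sched_queue *sq, struct device *dev)
{
    size_t issued = 0;

    /*
     * Capacity is checked before each pull, so a request only leaves
     * the scheduler when the device can actually take it.
     */
    while (dev->free_tags > 0) {
        int rq = dispatch_request(sq);

        if (rq < 0)
            break;
        dev->free_tags--;
        issued++;
    }
    return issued;
}
```

With five queued requests and room for two on the device, run_queue() issues two and leaves three in the scheduler, where they remain candidates for merging with later IO.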

blk-mq-sched: fix starvation for multiple hardware queues and shared tags · 50e1dab8

Committed by Jens Axboe on Jan 26, 2017

If we have both multiple hardware queues and shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart, we need to bubble
that up to the higher level queue as well.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

blk-mq: release driver tag on a requeue event · 99cf1dc5

Committed by Jens Axboe on Jan 26, 2017

We don't want to hold on to this resource when we have a scheduler
attached.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

blk-mq: fix potential race in queue restart and driver tag allocation · 3c782d67

Committed by Jens Axboe on Jan 26, 2017

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

blk-mq: improve scheduler queue sync/async running · 0abad774

Committed by Jens Axboe on Jan 26, 2017

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>

blk-mq: move hctx and ctx counters from sysfs to debugfs · 4a46f05e

Committed by Omar Sandoval on Jan 25, 2017

These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs · be215473

Committed by Omar Sandoval on Jan 25, 2017

These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add tags and sched_tags bitmaps to debugfs · d7e3621a

Committed by Omar Sandoval on Jan 25, 2017

These can be used to debug issues like tag leaks and stuck requests.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move tags and sched_tags info from sysfs to debugfs · d96b37c0

Committed by Omar Sandoval on Jan 25, 2017

These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: export software queue pending map to debugfs · 0bfa5288

Committed by Omar Sandoval on Jan 25, 2017

This is useful for debugging problems where we've gotten stuck with
requests in the software queues.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

sbitmap: add helpers for dumping to a seq_file · 24af1ccf

Committed by Omar Sandoval on Jan 25, 2017

This is useful debugging information that will be used in the blk-mq
debugfs directory.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>

Changed 'weight' to 'busy'.
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add extra request information to debugfs · 7b393852

Committed by Omar Sandoval on Jan 25, 2017

The request pointers by themselves aren't super useful.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs · 950cd7e9

Committed by Omar Sandoval on Jan 25, 2017

These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: add hctx->{state,flags} to debugfs · 9abb2ad2

Committed by Omar Sandoval on Jan 25, 2017

hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: create debugfs directory tree · 07e4fead

Committed by Omar Sandoval on Jan 25, 2017

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-sched: check for successful allocation before assigning tag · b48fda09

Committed by Jens Axboe on Jan 26, 2017

We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: don't lose flags passed in to blk_mq_alloc_request() · 5a797e00

Committed by Jens Axboe on Jan 26, 2017

If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

25 Jan, 2017 · 1 commit

blk-mq: only apply active queue tag throttling for driver tags · 200e86b3

Committed by Jens Axboe on Jan 25, 2017

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags; that only applies to driver tags since that's the resource
we need to dispatch an IO.
Signed-off-by: Jens Axboe <axboe@fb.com>

23 Jan, 2017 · 3 commits

cfq-iosched: Adjust one function call together with a variable assignment · 1cf41753

Committed by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following.

ERROR: do not use assignment in if condition

Thus fix the affected source code place.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
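The checkpatch.pl error quoted in this commit concerns assignments embedded in an if condition. The usual fix is to hoist the assignment onto its own line and test the variable afterwards. The helper below is a hypothetical user-space example of that transformation, not the code touched in cfq-iosched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, used only to illustrate the style change. */
static char *dup_name(const char *name)
{
    char *copy;

    /*
     * Form that checkpatch.pl flags:
     *     if (!(copy = malloc(strlen(name) + 1)))
     *         return NULL;
     *
     * Preferred form: assign first, then test.
     */
    copy = malloc(strlen(name) + 1);
    if (!copy)
        return NULL;

    strcpy(copy, name);
    return copy;
}
```

Behaviour is unchanged; the assignment and the check are simply no longer fused into one expression, which makes an accidental `=` versus `==` mix-up easier to spot.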

blk-throttle: Adjust two function calls together with a variable assignment · d609af3a

Committed by Markus Elfring on Jan 21, 2017

The script "checkpatch.pl" reported the following.

ERROR: do not use assignment in if condition

Thus fix the affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: Initialize cfqq->ioprio_class in cfq_get_queue() · 4d608baa

Committed by Alexander Potapenko on Jan 23, 2017

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():

==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
 [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
 [<     inline     >] cfq_init_cfqq block/cfq-iosched.c:3754
 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
 [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
 [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
 [<     inline     >] allocate_slab mm/slub.c:1627
 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
 [<     inline     >] new_slab_objects mm/slub.c:2407
 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
 [<     inline     >] __slab_alloc mm/slub.c:2606
 [<     inline     >] slab_alloc_node mm/slub.c:2669
 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

21 Jan, 2017 · 1 commit

blk-mq: allow resize of scheduler requests · 70f36b60

Committed by Jens Axboe on Jan 19, 2017

Add support for growing the tags associated with a hardware queue, for
the scheduler tags. Currently we only support resizing within the
limits of the original depth; change that so we can grow it as well by
allocating and replacing the existing scheduler tag set.

This is similar to how we could increase the software queue depth with
the legacy IO stack and schedulers.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>

19 Jan, 2017 · 6 commits

blk-mq: stop hardware queue in blk_mq_delay_queue() · 7e79dadc

Committed by Jens Axboe on Jan 19, 2017

The run handler we register for the delayed work requires that the
queue be stopped, yet we leave that up to the caller. Let's move
it into blk_mq_delay_queue() itself, so that the API is sane.

This fixes a stall with SCSI, where it calls blk_mq_delay_queue()
without having stopped the queue. Hence the queue is never run.
Reported-by: Hannes Reinecke <hare@suse.com>
Fixes: 70f4db63 ("blk-mq: add blk_mq_delay_queue")
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq-tag: remove redundant check for 'data->hctx' being non-NULL · 8cecb07d

Committed by Jens Axboe on Jan 19, 2017

We used to pass in NULL for hctx for reserved tags, but we don't
do that anymore. Hence the check for whether hctx is NULL or not
is now redundant, kill it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: a642a158aec6 ("blk-mq-tag: cleanup the normal/reserved tag allocation")
Signed-off-by: Jens Axboe <axboe@fb.com>

elevator: fix unnecessary put of elevator in failure case · 610d886c

Committed by Jens Axboe on Jan 19, 2017

We already checked that e is NULL, so no point in calling
elevator_put() to free it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: dc877dbd088f ("blk-mq-sched: add framework for MQ capable IO schedulers")
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-cgroup: don't quiesce the queue on policy activate/deactivate · 38dbb7dd

Committed by Jens Axboe on Jan 18, 2017

There's no potential harm in quiescing the queue, but it also doesn't
buy us anything. And we can't run the queue async for policy
deactivate, since we could be in the path of tearing the queue down.
If we schedule an async run of the queue at that time, we're racing
with queue teardown AFTER we've already torn most of it down.
Reported-by: Omar Sandoval <osandov@fb.com>
Fixes: 4d199c6f ("blk-cgroup: ensure that we clear the stop bit on quiesced queues")
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

sbitmap: fix wakeup hang after sbq resize · 6c0ca7ae

Committed by Omar Sandoval on Jan 18, 2017

When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).

This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.
Reported-by: Martin Raiber <martin@urbackup.org>
Fixes: e3a2b3f9 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
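The missed-wakeup scenario in this commit can be reproduced in a toy model. The sketch below collapses the queue to a single wait state and ignores concurrency and memory ordering entirely; the constants, the batch formula, and all names are assumptions chosen for illustration rather than a copy of the sbitmap code.

```c
/* Assumed constants for this model. */
#define MODEL_WAIT_QUEUES 8u
#define MODEL_WAKE_BATCH  8u

struct wait_model {
    unsigned int depth;       /* number of tags */
    unsigned int wake_batch;  /* frees required per wakeup */
    int wait_cnt;             /* frees remaining until the next wakeup */
};

/* Batch shrinks with depth so a small queue still produces wakeups. */
static unsigned int calc_wake_batch(unsigned int depth)
{
    unsigned int batch = MODEL_WAKE_BATCH;

    if (batch > depth / MODEL_WAIT_QUEUES) {
        batch = depth / MODEL_WAIT_QUEUES;
        if (batch < 1)
            batch = 1;
    }
    return batch;
}

static void model_init(struct wait_model *q, unsigned int depth)
{
    q->depth = depth;
    q->wake_batch = calc_wake_batch(depth);
    q->wait_cnt = (int)q->wake_batch;
}

/* refresh_wait_cnt == 0 models the behaviour before the fix. */
static void model_resize(struct wait_model *q, unsigned int depth,
                         int refresh_wait_cnt)
{
    q->depth = depth;
    q->wake_batch = calc_wake_batch(depth);
    if (refresh_wait_cnt)
        q->wait_cnt = (int)q->wake_batch;
}

/* One tag is freed; returns 1 if sleeping waiters would be woken. */
static int model_clear(struct wait_model *q)
{
    if (--q->wait_cnt <= 0) {
        q->wait_cnt = (int)q->wake_batch;
        return 1;
    }
    return 0;
}

/* Free every tag once (depth frees) and count the wakeups issued. */
static unsigned int wakeups_after_draining(struct wait_model *q)
{
    unsigned int i, wakeups = 0;

    for (i = 0; i < q->depth; i++)
        wakeups += (unsigned int)model_clear(q);
    return wakeups;
}
```

In this model, shrinking from depth 64 to depth 4 without refreshing leaves wait_cnt at the old batch size, so freeing all four tags issues no wakeup and any sleeper would wait indefinitely. Refreshing wait_cnt on resize makes each free issue a wakeup.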

sbitmap: use smp_mb__after_atomic() in sbq_wake_up() · f66227de

Committed by Omar Sandoval on Jan 18, 2017

We always do an atomic clear_bit() right before we call sbq_wake_up(),
so we can use smp_mb__after_atomic(). While we're here, comment the
memory barriers in here a little more.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

openeuler / Kernel · synced successfully almost 2 years ago
