- 22 Nov 2017, 1 commit
-
-
Committed by Kees Cook
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Matias Bjorling <mb@lightnvm.io> Cc: linux-block@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
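For reference, the conversion follows the standard timer_setup()/from_timer() pattern; a minimal sketch assuming a hypothetical struct foo that embeds its timer:

```c
#include <linux/timer.h>
#include <linux/jiffies.h>

struct foo {
	struct timer_list timer;
	int data;
};

static void foo_timer_fn(struct timer_list *t)
{
	/* from_timer() recovers the containing struct from the timer pointer */
	struct foo *f = from_timer(f, t, timer);

	f->data++;
}

static void foo_start(struct foo *f)
{
	/* timer_setup() replaces setup_timer(&f->timer, fn, (unsigned long)f) */
	timer_setup(&f->timer, foo_timer_fn, 0);
	mod_timer(&f->timer, jiffies + HZ);
}
```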
-
- 13 Oct 2017, 22 commits
-
-
Committed by Javier González
Implement a generic path for sending sync I/O on LightNVM. This allows reusing the standard synchronous path through blk_execute_rq(), instead of implementing a wait_for_completion on the target side (e.g., pblk). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
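The target-side pattern this replaces looks roughly like the sketch below (a sketch only, with pblk internals elided); blk_execute_rq() performs the equivalent blocking inside the block layer:

```c
#include <linux/completion.h>
#include <linux/lightnvm.h>

struct pblk;					/* pblk internals, elided */
int pblk_submit_io(struct pblk *pblk, struct nvm_rq *rqd);

/* async end_io callback that signals the waiter */
static void pblk_end_io_sync(struct nvm_rq *rqd)
{
	complete((struct completion *)rqd->private);
}

static int submit_io_sync_old_style(struct pblk *pblk, struct nvm_rq *rqd)
{
	DECLARE_COMPLETION_ONSTACK(wait);

	rqd->private = &wait;
	rqd->end_io = pblk_end_io_sync;

	pblk_submit_io(pblk, rqd);	/* asynchronous submission */
	wait_for_completion_io(&wait);	/* block until end_io fires */

	return rqd->error;
}
```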
-
Committed by Javier González
Clean up unused and static functions across the whole codebase. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hans Holmberg
Finish garbage collection of the lines that are in the gc pipeline before exiting. Ensure that all lines already in the pipeline go through, from read to write. Do this by keeping track of how many lines are in the pipeline and waiting for that number to reach zero before exiting the gc reader task. Since we're adding a new gc line counter, change the name of inflight_gc to read_inflight_gc to make the distinction clear. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
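A minimal sketch of the counting scheme, with names assumed for illustration rather than taken from the patch:

```c
#include <linux/atomic.h>
#include <linux/delay.h>

/* incremented when a line enters the gc pipeline,
 * decremented once its valid data has been rewritten */
static atomic_t pipeline_gc = ATOMIC_INIT(0);

static void gc_line_enter(void) { atomic_inc(&pipeline_gc); }
static void gc_line_done(void)  { atomic_dec(&pipeline_gc); }

/* reader task exit path: drain the pipeline before stopping */
static void gc_reader_stop(void)
{
	while (atomic_read(&pipeline_gc))
		msleep(10);	/* lines still in flight on the write side */
}
```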
-
Committed by Rakesh Pandit
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Metadata I/Os are scheduled to minimize their impact on user data I/Os. When there are enough LUNs instantiated (i.e., enough bandwidth), it is easy to interleave metadata and data one after the other so that metadata I/Os are the ones being blocked and not vice versa. We do this by calculating the distance between the I/Os in terms of the LUNs that are not in use, and selecting a free LUN that satisfies the simple heuristic that metadata is scheduled behind. The per-LUN semaphores guarantee consistency. This works fine in configurations with more than one LUN. However, when a single LUN is instantiated, this design leads to a deadlock, where metadata waits to be scheduled on a free LUN. This patch implements the 1 LUN case by simply scheduling the metadata I/O after the data I/O. In the process, we refactor the way a line is replaced to ensure that metadata writes are submitted after data writes in order to guarantee block sequentiality. Note that, since there is only one LUN, both I/Os will block each other by design. However, such a configuration only pursues tight read latencies, not write bandwidth. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
When a line is recycled during garbage collection, reads can still be issued to the line. If the line is freed in the middle of this process, data corruption might occur. This patch guarantees that lines are not freed in the middle of reads that target them. Specifically, we use the existing line reference to decide when a line is eligible for being freed after the recycle process. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
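The line reference described here follows the usual kref pattern; a sketch under assumed struct and field names:

```c
#include <linux/kref.h>

struct line {
	struct kref ref;	/* kref_init()'d when the line is activated */
	/* ... bitmaps, state, ... */
};

static void line_release(struct kref *ref)
{
	struct line *l = container_of(ref, struct line, ref);

	/* last reference dropped: the line may now join the free list */
	(void)l;
}

/* read path: pin the line so GC cannot free it mid-read */
static void read_from_line(struct line *l)
{
	kref_get(&l->ref);
	/* ... issue the read; on completion: */
	kref_put(&l->ref, line_release);
}
```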
-
Committed by Javier González
For consistency with the rest of pblk, use rqd->end_io to point to the function taking care of ending the request on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Each request type sent to the LightNVM subsystem requires different metadata. Until now, we have tailored this metadata based on the write, read and erase commands. However, pblk uses different metadata for internal writes that do not hit the write buffer. Instead of abusing the read metadata, create a new request type (internal write) to improve code readability. In the process, create internal values for each I/O type instead of abusing the READ/WRITE macros, as suggested by Christoph. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
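One plausible shape for such per-type constants; the names and values below are assumptions for illustration, not necessarily the exact ones the patch introduced:

```c
/* internal I/O types, replacing direct use of the READ/WRITE macros */
enum {
	PBLK_READ = 0,
	PBLK_WRITE,	/* user write, served through the write buffer */
	PBLK_WRITE_INT,	/* internal write, bypasses the write buffer */
	PBLK_ERASE,
};
```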
-
Committed by Javier González
Simplify bio puts by doing them in the bio's end_io callback instead of manually putting the bio on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
When a line is selected for recycling by the garbage collector (GC), the line state changes and the invalid bitmap is frozen, preventing invalidations from happening. Throughout the GC, the L2P map is checked to verify that the data being recycled has not been updated. The last check is done before the new map is stored on the L2P table. Though this algorithm works, it requires a number of corner cases to be checked each time the L2P table is updated. This complicates readability and is error prone in case the recycling algorithm is modified. Instead, this patch makes the invalid bitmap accessible even while the line is being recycled. When recycled data is being remapped, it is enough to check the invalid bitmap for the line before updating the L2P table. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Normalize the way we name ppa variables to improve code readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
As part of the mempool audit on pblk, remove unnecessary allocation checks on mempools (mempool allocations that are allowed to sleep, e.g. GFP_KERNEL, are guaranteed to succeed). Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
pblk holds two sector bitmaps: one to keep track of the mapped sectors while the line is active and another one to keep track of the invalid sectors. The latter is kept during the whole life of the line, until it is recycled. Since we cannot guarantee forward progress for the mempool in this case, get rid of the mempool and simply allocate memory through kmalloc. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Since the read and erase paths offer different guarantees for inflight I/Os, separate the mempools to set the right min_nr for each on creation. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
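A sketch of the split, using the real mempool_create_slab_pool() helper but with pool names and reserve sizes assumed for illustration:

```c
#include <linux/mempool.h>
#include <linux/slab.h>

struct rq_pools {
	mempool_t *r_rq_pool;	/* read requests */
	mempool_t *e_rq_pool;	/* erase requests */
};

static struct kmem_cache *g_rq_cache;

static int create_rq_pools(struct rq_pools *p)
{
	/* reads are on the fast path: reserve a deeper emergency pool */
	p->r_rq_pool = mempool_create_slab_pool(64, g_rq_cache);
	if (!p->r_rq_pool)
		return -ENOMEM;

	/* erases are rare and tolerate a shallower reserve */
	p->e_rq_pool = mempool_create_slab_pool(16, g_rq_cache);
	if (!p->e_rq_pool) {
		mempool_destroy(p->r_rq_pool);
		return -ENOMEM;
	}

	return 0;
}
```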
-
Committed by Javier González
In pblk, we have a mempool to allocate a generic structure that we pass along workqueues. This is heavily used in the GC path in order to have enough inflight reads and fully utilize the GC bandwidth. However, the current GC path copies data to the host memory and puts it back into the write buffer. This requires a vmalloc allocation for the data and a memory copy. Thus, guaranteeing the allocation with a mempool for the structure itself does not buy us much. Until we implement support for vector copy to avoid moving data through the host, just allocate the workqueue structure using kmalloc. This allows us to have a much smaller mempool. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing valid data (and with a large enough bioset allocated). In both cases, the maximum number of pages per bio is defined by the maximum number of physical sectors supported by the underlying device. This patch fixes a bad mempool allocation, where the min_nr of elements on the pool was fixed (to 16), which is lower than the maximum number of sectors supported by NVMe (as of the time of this patch). Instead, use the maximum number of allowed sectors reported by the device. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
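A sketch of the sizing change; nvm_max_phys_sects() is assumed here to be the lightnvm helper of that era reporting the per-command sector limit, and the function name is illustrative:

```c
#include <linux/mempool.h>
#include <linux/lightnvm.h>

static mempool_t *create_page_pool(struct nvm_tgt_dev *dev)
{
	/* was: mempool_create_page_pool(16, 0), a fixed reserve smaller
	 * than the number of sectors a single NVMe command may carry */
	return mempool_create_page_pool(nvm_max_phys_sects(dev), 0);
}
```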
-
Committed by Javier González
When a REQ_FLUSH reaches pblk, the bio cannot be directly completed. Instead, data on the write buffer is flushed and the bio is completed on the completion path. This might require some sectors to be padded in order to guarantee a successful write. This patch fixes a memory leak on the padded pages. A consequence of this bad free was that internal bios not containing data (only a flush) were not being completed. Fixes: a4bd217b ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Rakesh Pandit
This is a trivial change which reuses pblk_gc_should_kick instead of repeating it again in pblk_rl_free_lines_inc. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Made it apply to the common case. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Rakesh Pandit
Correct it by converting little endian to CPU endian, and also define a macro for the line version so that maintenance is easy. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
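The shape of such a fix, with struct and macro names assumed: on-media fields stay typed __le16/__le32 and are converted to CPU byte order before comparison against the macro.

```c
#include <linux/types.h>

#define LINE_SMETA_VERSION 1	/* single definition, easy to maintain */

struct line_header {
	__le32 crc;
	__le16 version;		/* stored little-endian on media */
};

static bool line_version_ok(const struct line_header *hdr)
{
	/* convert at the boundary: correct on big-endian hosts too */
	return le16_to_cpu(hdr->version) == LINE_SMETA_VERSION;
}
```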
-
Committed by Rakesh Pandit
The two pr_err messages are useless as they do not differentiate the error code. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Rakesh Pandit
It seems pblk_dealloc_page would race against pblk_alloc_pages for the line bitmap used for sector allocation. The chances are very low, but we might as well protect the bitmap properly. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
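A sketch of the protection, assuming a per-line spinlock already taken on the allocation side (field names illustrative):

```c
#include <linux/spinlock.h>
#include <linux/bitops.h>
#include <linux/bug.h>

struct line {
	spinlock_t lock;		/* already guards pblk_alloc_pages */
	unsigned long *map_bitmap;
	u64 cur_sec;
};

static void dealloc_page(struct line *line, int nr_secs)
{
	int i;

	/* take the same lock as the allocation side to close the race */
	spin_lock(&line->lock);
	for (i = 0; i < nr_secs; i++)
		WARN_ON(!test_and_clear_bit(--line->cur_sec,
					    line->map_bitmap));
	spin_unlock(&line->lock);
}
```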
-
- 08 Jul 2017, 1 commit
-
-
Committed by Javier González
When removing a pblk instance, control the write I/O flow to the controller as we do in the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Jul 2017, 4 commits
-
-
Committed by Javier González
Do bitmap checks only when debug mode is enabled. The line bitmap used for mapping to physical addresses is fairly large (~512KB) and it is expensive to do these checks on the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
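The usual way to compile such checks out of the fast path; CONFIG_NVM_DEBUG was the lightnvm debug switch of that era, and the struct here is a stand-in:

```c
#include <linux/bitops.h>
#include <linux/bug.h>

struct line {
	unsigned long *map_bitmap;	/* ~512KB for a full line */
};

static void map_sector(struct line *line, unsigned int paddr)
{
#ifdef CONFIG_NVM_DEBUG
	/* expensive sanity check: only compiled into debug builds */
	WARN_ON(test_and_set_bit(paddr, line->map_bitmap));
#else
	set_bit(paddr, line->map_bitmap);
#endif
}
```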
-
Committed by Javier González
When removing a pblk instance, pad the current line using asynchronous I/O. This reduces the removal time from ~1 minute in the worst case to a couple of seconds. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
For now, we allocate a per-I/O buffer for GC data. Since the potential size of the buffer is 256KB and GC is not in the fast path, do this allocation with vmalloc. This puts less pressure on the memory allocator at no performance cost. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
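A sketch of the allocation choice (sizes and names assumed); vmalloc gives virtually contiguous memory without high-order page allocations, which is enough for a host-side staging buffer:

```c
#include <linux/vmalloc.h>

/* up to 256KB per GC I/O: virtual contiguity suffices here */
static void *gc_alloc_data(size_t nr_secs, size_t sec_size)
{
	return vmalloc(nr_secs * sec_size);
}

static void gc_free_data(void *data)
{
	vfree(data);
}
```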
-
Committed by Javier González
Use the right types and conversions on le64 variables, as reported by sparse. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
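The pattern sparse expects; ADDR_EMPTY and the list name below are assumptions for illustration:

```c
#include <linux/types.h>

#define ADDR_EMPTY (~0ULL)

/* keep on-media values typed __le64; convert only at the boundary */
static u64 lba_at(const __le64 *lba_list, int i)
{
	__le64 addr_empty = cpu_to_le64(ADDR_EMPTY);

	if (lba_list[i] == addr_empty)	/* compare in media byte order */
		return ADDR_EMPTY;

	return le64_to_cpu(lba_list[i]);
}
```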
-
- 27 Jun 2017, 12 commits
-
-
Committed by Javier González
Due to user writes being decoupled from media writes because of the need for an intermediate write buffer, irrecoverable media write errors lead to pblk stalling; user writes fill up the buffer and end up in an infinite retry loop. In order to let user writes fail gracefully, it is necessary for pblk to keep track of its own internal state and prevent further writes from being placed into the write buffer. This patch implements a state machine to keep track of internal errors and, in case of failure, fail further user writes in a standard way. Depending on the type of error, pblk will do its best to persist buffered writes (which are already acknowledged) and shut down in a graceful manner. This way, data might be recovered by re-instantiating pblk. Such a state machine paves the way for a state-based FTL log. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
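A plausible shape for the states and the fast-path gate; the names are assumptions, not necessarily the constants the patch introduced:

```c
#include <linux/errno.h>

enum pblk_state {
	PBLK_STATE_RUNNING,
	PBLK_STATE_STOPPING,	/* error seen: persist what we can */
	PBLK_STATE_RECOVERING,	/* draining acknowledged buffer entries */
	PBLK_STATE_STOPPED,	/* no further media writes */
};

/* gate at write-buffer entry: fail new user writes once degraded */
static int pblk_may_buffer_write(enum pblk_state state)
{
	return state == PBLK_STATE_RUNNING ? 0 : -EIO;
}
```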
-
Committed by Javier González
Introduce constants to define sizes for internal mempools and workqueues. In the process, adjust the values to be more meaningful given the internal constraints of the FTL. In order to do this for workqueues, separate the current auxiliary workqueue into two dedicated workqueues to manage lines being closed and bad blocks. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
At the moment, in order to get enough read parallelism, we recycle several lines at the same time. This approach has proven not to work well when reaching capacity, since we end up mixing valid data from all lines, thus not maintaining a sustainable free/recycled line ratio. The new design relies on a two-level workqueue mechanism. In the first level, we read the metadata for a number of lines based on the GC list they reside on (this is governed by the number of valid sectors in each line). In the second level, we recycle a single line at a time. Here, we issue reads in parallel, while a single GC write thread places data in the write buffer. This design allows us to (i) only move data from one line at a time, thus maintaining a sane free/recycled ratio, and (ii) keep the GC writer busy with recycled data. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Add lockdep assertions on helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
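The standard idiom for documenting a locking contract on a helper; the function and lock names here are stand-ins:

```c
#include <linux/lockdep.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(trans_lock);

/* must be called with trans_lock held; lockdep verifies this at
 * runtime on CONFIG_PROVE_LOCKING kernels */
static void __update_map_entry(void)
{
	lockdep_assert_held(&trans_lock);
	/* ... modify the L2P entry ... */
}
```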
-
Committed by Javier González
Clean up unnecessary headers and code lines. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Set a DMA area for all I/Os in order to read/write metadata from/to the per-sector out-of-band area. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
smeta's size will always be suitable for a kmalloc allocation. Simplify the code and leave the vmalloc fallback only for emeta, where the pblk configuration has an impact. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
If a read request is sequential and its size aligns with a multi-plane page size, use the multi-plane hint to process the I/O in parallel in the controller. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
After refactoring the metadata path, the backpointer controlling synced I/Os in a line becomes unnecessary; metadata is scheduled on the write thread, thus we know when the end of the line is reached and can act on it directly. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
At the moment, line metadata is persisted on a separate work queue that is kicked each time a line is closed. The assumption when designing this was that freeing the write thread from creating a new write request was better than the potential impact of writes colliding on the media (user I/O and metadata I/O). Experimentation has proven this assumption wrong; collisions can cost up to 25% of the bandwidth and introduce long tail latencies on the write thread, which potentially cause user write threads to spend more time spinning to get a free entry on the write buffer. This patch moves the metadata logic to the write thread. When a line is closed, the remaining metadata is written in memory and placed on a metadata queue. The write thread then takes the metadata corresponding to the previous line, creates the write request and schedules it so as to minimize collisions on the media. Using this approach, we see that we can saturate the media's bandwidth, which helps reduce both write latencies and the spinning time for user writer threads. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Read requests allocate some extra memory to store their per-I/O context. Instead of requiring yet another memory pool for other types of requests, generalize this context allocation (and change the naming accordingly). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Javier González
Erase I/Os are scheduled with the following goals in mind: (i) minimize LUN collisions with write I/Os, and (ii) even out the price of erasing on every write, instead of putting all the burden on when garbage collection runs. This works well in the current design, but is specific to the default mapping algorithm. This patch generalizes the erase path so that other mapping algorithms can select an arbitrary line to be erased instead. It also gets rid of the erase semaphore, since it creates jitter for user writes. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-