Commits · cc9c9a00b10eaf33abe1cece2c05ea34601af21b · OpenHarmony / kernel_linux

01 Jun, 2018 6 commits

lightnvm: pblk: kick writer on new flush points · cc9c9a00

Authored by Hans Holmberg on Jun 01, 2018

Unless we kick the writer directly when setting a new flush point, the
user risks having to wait for up to one second (the default timeout for
the write thread to be kicked) for the IO to complete.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
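
The effect of kicking the writer can be illustrated with a small user-space model. This is only a sketch and not the pblk code: the `writer` function, the `kick` event and the one-second timeout passed to `wait()` are stand-ins chosen for illustration.

```python
import threading
import time

kick = threading.Event()   # hypothetical stand-in for the writer wake-up
done = threading.Event()   # set once the pending entries are "written"

def writer():
    # Sleep until kicked, or until the default one-second timeout expires.
    kick.wait(timeout=1.0)
    done.set()

thread = threading.Thread(target=writer)
thread.start()

start = time.monotonic()
kick.set()                 # a new flush point was set: wake the writer now
done.wait()
elapsed = time.monotonic() - start
thread.join()

print(f"flush completed after {elapsed:.3f}s")
```

Without the `kick.set()` call, the wait in this model would only return once the full timeout has elapsed.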

lightnvm: pblk: garbage collect lines with failed writes · 48b8d208

Authored by Hans Holmberg on Jun 01, 2018

Write failures should not happen under normal circumstances,
so in order to bring the chunk back into a known state as soon
as possible, evacuate all the valid data out of the line and let the
fw judge if the block can be written to in the next reset cycle.

Do this by introducing a new gc list for lines with failed writes,
and ensure that the rate limiter allocates a small portion of
the write bandwidth to get the job done.

The lba list is saved in memory for use during gc as we
cannot guarantee that the emeta data is readable if a write
error occurred.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: rework write error recovery path · 6a3abf5b

Authored by Hans Holmberg on Jun 01, 2018

The write error recovery path is incomplete, so rework
the write error recovery handling to do resubmits directly
from the write buffer.

When a write error occurs, the remaining sectors in the chunk are
mapped out and invalidated and the request inserted in a resubmit list.

The writer thread checks if there are any requests to resubmit,
scans and invalidates any lbas that have been overwritten by later
writes and resubmits the failed entries.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
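
The resubmit step described in this commit can be sketched roughly as follows. This is a simplified model under stated assumptions, not the kernel implementation: the `resubmit_failed` helper, the dict-based L2P table and the string addresses are all hypothetical.

```python
from collections import deque

def resubmit_failed(resubmit_list, l2p):
    """Return the lbas of failed entries that still need to be rewritten.

    resubmit_list: deque of (lba, address) pairs taken from failed requests.
    l2p: dict mapping each lba to the address that currently holds its data.
    """
    to_write = []
    while resubmit_list:
        lba, failed_addr = resubmit_list.popleft()
        # A later write remapped this lba, so the failed entry is stale:
        # drop (invalidate) it instead of writing it again.
        if l2p.get(lba) != failed_addr:
            continue
        to_write.append(lba)
    return to_write

# lba 11 was overwritten by a later write while the failed request was pending.
l2p = {10: "cache:0", 11: "cache:5", 12: "cache:2"}
failed = deque([(10, "cache:0"), (11, "cache:1"), (12, "cache:2")])
print(resubmit_failed(failed, l2p))
```

Only the entries whose mapping still points at the failed write are resubmitted; the overwritten one is skipped.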

lightnvm: pblk: remove dead function · 72b6cdbb

Authored by Javier González on Jun 01, 2018

Remove dead function for manual sync I/O.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pass flag on graceful teardown to targets · a7c9e910

Authored by Javier González on Jun 01, 2018

If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug) it is too late for the target to store any metadata
on the device - any attempt to write to the device will fail. In this
case, pass on a "graceful teardown" flag to the target to let it know
when this happens.

In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove unnecessary argument · 8e55c07b

Authored by Javier González on Jun 01, 2018

Remove unnecessary argument on pblk_line_free().
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

31 May, 2018 1 commit

lightnvm: convert to bioset_init()/mempool_init() · b906bbb6

Authored by Kent Overstreet on May 20, 2018

Convert lightnvm to embedded bio sets.
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Mar, 2018 10 commits

lightnvm: pblk: implement 2.0 support · 3b2a3ad1

Authored by Javier González on Mar 30, 2018

Implement 2.0 support in pblk. This includes the address formatting and
mapping paths, as well as the sysfs entries for them.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: implement get log report chunk · 32ef9412

Authored by Javier González on Mar 30, 2018

In preparation for pblk supporting 2.0, implement the get log report
chunk in pblk. Also, define the chunk states as given in the 2.0 spec.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: rename ppaf* to addrf* · bb845ae4

Authored by Javier González on Mar 30, 2018

In preparation for 2.0 support in pblk, rename variables referring to
the address format to addrf and reserve ppaf for the 1.2 path.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add support for 2.0 address format · 69471513

Authored by Javier González on Mar 30, 2018

Add support for 2.0 address format. Also, align address bits for 1.2 and
2.0 to be able to operate on channel and luns without requiring a format
conversion. Use a generic address format for this purpose.

Also, convert the generic operations to the generic format in pblk.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: normalize geometry nomenclature · a40afad9

Authored by Javier González on Mar 30, 2018

Normalize nomenclature for naming channels, luns, chunks, planes and
sectors as well as derivations in order to improve readability.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: simplify geometry structure · e46f4e48

Authored by Javier González on Mar 30, 2018

Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance channel and LUN partitioning.

Instead, create a generic geometry around nvm_geo, which can be used by
(i) the underlying device to describe the geometry of the whole device,
and (ii) instances to describe their geometry independently.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor bad block identification · e411b331

Authored by Javier González on Mar 30, 2018

In preparation for the OCSSD 2.0 spec. bad block identification,
refactor the current code to generalize bad block get/set functions and
structures.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: add padding distribution sysfs attribute · 5d149bfa

Authored by Hans Holmberg on Mar 30, 2018

When pblk receives a sync, all data up to that point in the write buffer
must be committed to persistent storage, and as flash memory comes with a
minimal write size there is a significant cost involved, both in terms
of time for completing the sync and in terms of write amplification from
the sectors padded to fill up to the minimal write size.

In order to get a better understanding of the costs involved for syncs,
add a sysfs attribute to pblk: padded_dist, showing a normalized
distribution of sectors padded. In order to facilitate measurements of
specific workloads during the lifetime of the pblk instance, the
distribution can be reset by writing 0 to the attribute.

Do this by introducing counters for each possible padding:
{0..(minimal write size - 1)} and calculate the normalized distribution
when showing the attribute.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Rearranged total_buckets statement in pblk_sysfs_get_padding_dist
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
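
A minimal sketch of the counting scheme described above, with one counter per possible padding and normalization done only when the distribution is shown. The `PaddingDist` class, its method names and the percentage output are assumptions made for illustration; they do not mirror the sysfs code or its output format.

```python
class PaddingDist:
    """Hypothetical model of per-padding counters and their distribution."""

    def __init__(self, min_write_size):
        # One counter per possible padding: 0 .. min_write_size - 1 sectors.
        self.counters = [0] * min_write_size

    def record(self, padded_sectors):
        self.counters[padded_sectors] += 1

    def show(self):
        """Return the share of syncs per padding size, in percent."""
        total = sum(self.counters)
        if total == 0:
            return [0.0] * len(self.counters)
        return [100.0 * count / total for count in self.counters]

    def reset(self):
        # Models writing 0 to the attribute.
        self.counters = [0] * len(self.counters)

dist = PaddingDist(min_write_size=8)
for padded in (0, 3, 3, 7):
    dist.record(padded)
print(dist.show())
```

Keeping raw counters and normalizing on read means the hot path only increments an integer.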

lightnvm: pblk: export write amplification counters to sysfs · 76758390

Authored by Hans Holmberg on Mar 30, 2018

In an SSD, write amplification, WA, is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In pblk's case, the total number of sectors written is the sum of:

    (1) number of user writes
    (2) number of sectors written by the garbage collector
    (3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

    write_amp_mileage: the accumulated write amplification stats
                      for the lifetime of the pblk instance

    write_amp_trip: resettable stats to facilitate delta measurements,
                    values reset at creation and if 0 is written
                    to the attribute.

64-bit counters are used as a 32-bit counter would wrap around
already after about 17 TB worth of user data. It will take a
long, long time before the 64-bit sector counters wrap around.

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
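
The arithmetic in this commit message can be worked through in a few lines. This is a sketch only: the `write_amp` helper is hypothetical, the zero-user-writes convention is chosen for this example, and the 4 KB sector size used for the wrap-around estimate is an assumption that the message does not state.

```python
def write_amp(user, gc, pad):
    """Return write amplification as a string with five decimals."""
    if user == 0:
        return "0.00000"  # convention chosen for this sketch only
    return f"{(user + gc + pad) / user:.5f}"

# 1000 user sectors, 250 sectors moved by GC, 50 sectors of padding.
print(write_amp(user=1000, gc=250, pad=50))

# Data volume at which a 32-bit sector counter wraps, assuming 4 KB sectors.
wrap_bytes = 2**32 * 4096
print(f"{wrap_bytes / 1e12:.1f} TB")
```

Under the 4 KB assumption the wrap-around point lands at roughly 17.6 TB, in line with the "about 17 TB" quoted in the message.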

lightnvm: pblk: check data lines version on recovery · d0ab0b1a

Authored by Hans Holmberg on Mar 30, 2018

As a preparation for future bumps of data line persistent storage
versions, we need to start checking the emeta line version during
recovery. Also split up the current emeta/smeta version into two
bytes (major,minor).

Recovering lines with the same major number as the current pblk data
line version must succeed. This means that any changes in the
persistent format must be:

 (1) Backward compatible: if we switch back to an older
     kernel, recovery of lines stored with major == current_major
     and minor > current_minor must succeed.

 (2) Forward compatible: switching to a newer kernel,
     recovery of lines stored with major == current_major and
     minor < current_minor must handle the data format differences
     gracefully (i.e., initialize new data structures to default values).

If we detect lines that have a different major number than
the current we must abort recovery. The user must manually
migrate the data in this case.

Previously the version stored in the emeta header was copied
from smeta, which has version 1, so we need to set the minor
version to 1.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
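
The compatibility rules above amount to a small decision function. The sketch below is one way to express them; the version constants, the function name and the returned strings are hypothetical and not taken from pblk.

```python
CUR_MAJOR, CUR_MINOR = 0, 2  # hypothetical current data line version

def check_line_version(major, minor):
    """Decide how recovery should treat a line with the given version."""
    if major != CUR_MAJOR:
        # Different major: the persistent format is incompatible.
        return "abort"
    if minor < CUR_MINOR:
        # Line written by an older kernel: fill new fields with defaults.
        return "recover-with-defaults"
    # Same minor, or a newer minor that must stay readable by this kernel.
    return "recover"

for version in ((0, 1), (0, 2), (0, 3), (1, 0)):
    print(version, check_line_version(*version))
```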

05 Jan, 2018 10 commits

lightnvm: pblk: refactor pblk_ppa_comp function · 8b7bc849

Authored by Matias Bjørling on Jan 05, 2018

Shorten function to simply return the value of the if statement.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: add iostat support · 998ba629

Authored by Javier González on Jan 05, 2018

Since pblk registers its own block device, the iostat accounting is
not automatically done for us. Therefore, add the necessary
accounting logic to satisfy the iostat interface.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: do not log recovery read errors · 8f554597

Authored by Javier González on Jan 05, 2018

On scan recovery, reads can fail. This happens because the first page
for each line is read in order to determine if the line has been used
(and thus needs to be recovered), or not. This can lead to "empty page"
read errors.

Since these errors are normal, do not log them, as they are confusing
when reviewing the logs.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: set target over-provision on create ioctl · e5392739

Authored by Javier González on Jan 05, 2018

Allow setting the over-provision percentage on target creation. If the
value is not provided, fall back to the default value set by
the target.

In pblk, set the default OP to 11% of the total size of the device.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: use exact free block counter in RL · a7689938

Authored by Javier González on Jan 05, 2018

Until now, pblk's rate-limiter has used a heuristic to reserve space for
GC I/O given that the over-provision area was fixed.

In preparation for allowing to define the over-provision area on target
creation, define a dedicated free_block counter in the rate-limiter to
track the number of blocks being used for user data.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: rename sync_point to flush_point · 8154d296

Authored by Hans Holmberg on Jan 05, 2018

Sync point is a really confusing name for keeping track of
the last entry that needs to be flushed, so change the name
to flush_point instead.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor emeta consistency check · 06bc072b

Authored by Hans Holmberg on Jan 05, 2018

Currently pblk_recov_get_lba_list does two separate things:
it checks the consistency of the emeta and extracts the lba list.

This patch separates the consistency check to make the code easier
to read and to prepare for version checks of the line emeta
persistent data format version.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove pblk_for_each_lun helper · d6d3ec2a

Authored by Javier González on Jan 05, 2018

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: compress and reorder helper functions · b1bcfda1

Authored by Javier González on Jan 05, 2018

Over time, we have generated some redundant helper functions.
Refactor them to eliminate redundant and unnecessary code. Also, reorder
them to improve readability.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: make geometry structures 2.0 ready · fae7fae4

Authored by Matias Bjørling on Jan 05, 2018

Prepare for the 2.0 revision by adapting the geometry
structures to coexist with the 1.2 revision.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22 Nov, 2017 1 commit

lightnvm: Convert timers to use timer_setup() · 87c1d2d3

Authored by Kees Cook on Oct 17, 2017

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Matias Bjorling <mb@lightnvm.io>
Cc: linux-block@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>

24 Oct, 2017 1 commit

lightnvm: pblk: remove leftover testing function · 75bc5f06

Authored by Javier González on Oct 24, 2017

A previous patch inadvertently left an unused test function in the
header, kill it.

Fixes: 8bd40020 ("lightnvm: pblk: cleanup unused and static functions")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

13 Oct, 2017 11 commits

lightnvm: implement generic path for sync I/O · 1a94b2d4

Authored by Javier González on Oct 13, 2017

Implement a generic path for sending sync I/O on LightNVM. This allows
reusing the standard synchronous path through blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: cleanup unused and static functions · 8bd40020

Authored by Javier González on Oct 13, 2017

Clean up unused and static functions across the whole codebase.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: gc all lines in the pipeline before exit · d6b992f7

Authored by Hans Holmberg on Oct 13, 2017

Finish garbage collection of the lines that are in the gc pipeline
before exiting. Ensure that all lines already in the pipeline
go through, from read to write.

Do this by keeping track of how many lines are in the pipeline
and waiting for that number to reach zero before exiting the gc
reader task.

Since we're adding a new gc line counter, change the name of
inflight_gc to read_inflight_gc to make the distinction clear.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: start gc if needed during init · 03661b5f

Authored by Hans Holmberg on Oct 13, 2017

Start GC if needed, directly after init, as we might
need to garbage collect in order to make room for user writes.

Create a helper function that allows kicking GC without exposing the
internals of the GC/rate-limiter interaction.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: free full lines during recovery · 37ce33d5

Authored by Hans Holmberg on Oct 13, 2017

When rebuilding the L2P table, any full lines (lines without any
valid sectors) will be identified. If these lines are not freed,
we risk not being able to allocate the first data line.

This patch refactors the part of GC that frees empty lines
into a separate function and adds a call to this after the
L2P table has been rebuilt.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
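
As a rough illustration of the step added after the L2P rebuild, the sketch below frees every closed line that holds no valid sectors. The dict-based line records and the `free_full_lines` name are assumptions made for this example, not pblk structures.

```python
def free_full_lines(lines, free_list):
    """Move closed lines that hold no valid sectors to the free list."""
    remaining = []
    for line in lines:
        if line["closed"] and line["valid_sectors"] == 0:
            free_list.append(line["id"])   # nothing left to recover here
        else:
            remaining.append(line)
    return remaining

lines = [
    {"id": 0, "closed": True, "valid_sectors": 0},
    {"id": 1, "closed": True, "valid_sectors": 128},
    {"id": 2, "closed": True, "valid_sectors": 0},
]
free_list = []
lines = free_full_lines(lines, free_list)
print(free_list, [line["id"] for line in lines])
```

After this pass the free list is no longer empty, so the first data line can be allocated.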

lightnvm: pblk: enable 1 LUN configuration · 21d22871

Authored by Javier González on Oct 13, 2017

Metadata I/Os are scheduled to minimize their impact on user data I/Os.
When there are enough LUNs instantiated (i.e., enough bandwidth), it is
easy to interleave metadata and data one after the other so that
metadata I/Os are the ones being blocked and not vice-versa.

We do this by calculating the distance between the I/Os in terms of the
LUNs that are not in use, and selecting a free LUN that satisfies
the simple heuristic that metadata is scheduled behind. The per-LUN
semaphores guarantee consistency. This works fine on >1 LUN
configuration. However, when a single LUN is instantiated, this design
leads to a deadlock, where metadata waits to be scheduled on a free LUN.

This patch implements the 1 LUN case by simply scheduling the metadata
I/O after the data I/O. In the process, we refactor the way a line is
replaced to ensure that metadata writes are submitted after data writes
in order to guarantee block sequentiality. Note that, since there is
only one LUN, both I/Os will block each other by design. However, such
configuration only pursues tight read latencies, not write bandwidth.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: guarantee line integrity on reads · 7bd4d370

Authored by Javier González on Oct 13, 2017

When a line is recycled during garbage collection, reads can still be
issued to the line. If the line is freed in the middle of this process,
data corruption might occur.

This patch guarantees that lines are not freed in the middle of reads
that target them (lines). Specifically, we use the existing line
reference to decide when a line is eligible for being freed after the
recycle process.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
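
The idea of tying a line's release to its reference count can be shown with a single-threaded model. This is a sketch only: the `Line` class and its `get`/`put` methods are hypothetical, and kernel code would need an atomic counter rather than a plain integer.

```python
class Line:
    """Hypothetical line whose release is deferred until all users are done."""

    def __init__(self, line_id):
        self.id = line_id
        self.refs = 1          # reference held by the owner (here: GC)
        self.freed = False

    def get(self):
        self.refs += 1         # a reader starts using the line

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True  # last user gone: safe to free the line

line = Line(7)
line.get()          # a read targeting the line is issued
line.put()          # GC finishes recycling and drops its reference
print(line.freed)   # the read is still in flight, so the line survives
line.put()          # the read completes
print(line.freed)
```

The line is only freed by whichever user drops the last reference, so an in-flight read can never see a freed line.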

lightnvm: pblk: check lba sanity on read path · a4809fee

Authored by Javier González on Oct 13, 2017

As part of pblk's recovery scheme, we store the lba mapped to each
physical sector on the device's out-of-band (OOB) area.

On the read path, we can use this information to validate that the data
being delivered to the upper layers corresponds to the lba being
requested. The cost of this check is an extra copy on the DMA region on
the device and an extra comparison in the host, given that (i) the OOB
area is being read together with the data in the media, and (ii) the DMA
region allocated for the ppa list can be reused for the metadata stored
on the OOB area.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
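
The read-path check boils down to comparing the lba stored next to each sector with the lba that was requested. A minimal sketch, assuming the OOB metadata has already been read into a list alongside the data; the `check_read` helper is hypothetical.

```python
def check_read(requested_lbas, oob_lbas):
    """Return the positions where the stored lba differs from the request."""
    return [
        i
        for i, (want, got) in enumerate(zip(requested_lbas, oob_lbas))
        if want != got
    ]

# The second sector carries metadata for a different lba than requested.
print(check_read([10, 11, 12], [10, 99, 12]))
```

An empty result means every sector returned to the upper layers matches the lba it was requested for.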

lightnvm: pblk: refactor rqd alloc/free · 67bf26a3

Authored by Javier González on Oct 13, 2017

Refactor the rqd allocation and free functions so that all I/O types can
use these helper functions.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: improve naming for internal req. · e2cddf20

Authored by Javier González on Oct 13, 2017

Each request type sent to the LightNVM subsystem requires different
metadata. Until now, we have tailored this metadata based on write, read
and erase commands. However, pblk uses different metadata for internal
writes that do not hit the write buffer. Instead of abusing the metadata
for reads, create a new request type, internal write, to improve
code readability.

In the process, create internal values for each I/O type instead of
abusing the READ/WRITE macros, as suggested by Christoph.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: allocate bio size more accurately · 875d94f3

Authored by Javier González on Oct 13, 2017

Wait until we know the exact number of ppas to be sent to the device,
before allocating the bio.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

OpenHarmony / kernel_linux · last synced more than 4 years ago
