Commits · 9d7aa4a484872cb2b4dc81bd6f058cb8351ca9ed · openanolis / cloud-kernel

30 Mar, 2018 17 commits

lightnvm: Avoid validation of default op value · 9d7aa4a4

Authored by Heiner Litz on Mar 30, 2018

Fixes: 38401d231de65 ("lightnvm: set target over-provision on create ioctl")
Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: centralize permission check for lightnvm ioctl · 40f962d7

Authored by Johannes Thumshirn on Mar 30, 2018

Currently all functions for handling the lightnvm core ioctl commands
do a check for CAP_SYS_ADMIN.

Change this to fail early in nvm_ctl_ioctl(), so we don't have to
duplicate the permission checks all over.
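
The fail-early pattern described above can be sketched as a small Python model. This is an illustration only, not the kernel code: the command table, the handler names and the `is_admin` flag (standing in for the CAP_SYS_ADMIN check) are hypothetical, and the error codes are assumptions.

```python
import errno

# Hypothetical handlers; in this model none of them checks permissions itself.
def _info(arg):
    return "info"

def _create(arg):
    return "created " + arg

HANDLERS = {"info": _info, "create": _create}

def ctl_ioctl(cmd, arg, is_admin):
    """Dispatch a control command, rejecting unprivileged callers up front."""
    # One permission check for every command (stands in for CAP_SYS_ADMIN).
    if not is_admin:
        return -errno.EPERM
    handler = HANDLERS.get(cmd)
    if handler is None:
        # Assumed error code for an unknown command.
        return -errno.ENOTTY
    return handler(arg)
```

With the check in the dispatcher, a new command added to the table is covered automatically.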
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: fix bad block initialization · a38c78d8

Authored by Heiner Litz on Mar 30, 2018

Fix reading of bad block device information to correctly set up the
per-line blk_bitmap during lightnvm initialization.
Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: lightnvm: add late setup of block size and metadata · 96257a8a

Authored by Matias Bjørling on Mar 30, 2018

The nvme driver sets up the size of the nvme namespace in two steps.
First it initializes the device with standard logical block and
metadata sizes, and then sets the correct logical block and metadata
size. Because the OCSSD 2.0 specification relies on the namespace to
expose these sizes for correct initialization, let them be updated
appropriately on the LightNVM side as well.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove nvm_dev_ops->max_phys_sect · 89a09c56

Authored by Matias Bjørling on Mar 30, 2018

The value of max_phys_sect is always static. Instead of
defining it in the nvm_dev_ops structure, declare it as a global
value.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove max_rq_size · af569398

Authored by Matias Bjørling on Mar 30, 2018

The field is no longer used.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add 2.0 geometry identification · 62771fe0

Authored by Matias Bjørling on Mar 30, 2018

Implement the geometry data structures for 2.0 and enable a drive
to be identified as one, including exposing the appropriate 2.0
sysfs entries.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: flatten nvm_id_group into nvm_id · c6ac3f35

Authored by Matias Bjørling on Mar 30, 2018

There are no groups in the 2.0 specification, so make sure that the
nvm_id structure is flattened before 2.0 data structures are added.
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor bad block identification · e411b331

Authored by Javier González on Mar 30, 2018

In preparation for the OCSSD 2.0 spec. bad block identification,
refactor the current code to generalize bad block get/set functions and
structures.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: prevent race in pblk_rb_flush_point_set · 3c05ef11

Authored by Hans Holmberg on Mar 30, 2018

Make sure that we are not advancing the sync pointer while
we're adding bios to the write buffer entry completion list.

This race condition results in bios not completing and was identified
by a hang when running xfstest generic/113.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: allow allocation of new lines during shutdown · b966c50b

Authored by Hans Holmberg on Mar 30, 2018

When shutting down pblk, the write buffer is flushed, and if the
current line can't fit the data in the write buffer we need
to allocate a new line, so remove the check that prevents this.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: delete writer kick timer before stopping thread · 7be970b2

Authored by Hans Holmberg on Mar 30, 2018

Unless we delete the timer that wakes up the write thread
before we stop the thread, we risk restarting the thread, so
delete the timer first.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: add padding distribution sysfs attribute · 5d149bfa

Authored by Hans Holmberg on Mar 30, 2018

When pblk receives a sync, all data up to that point in the write buffer
must be committed to persistent storage, and as flash memory comes with a
minimal write size there is a significant cost involved both in terms
of time for completing the sync and in terms of write amplification
(sectors padded to fill up to the minimal write size).

In order to get a better understanding of the costs involved for syncs,
add a sysfs attribute to pblk: padded_dist, showing a normalized
distribution of sectors padded. In order to facilitate measurements of
specific workloads during the lifetime of the pblk instance, the
distribution can be reset by writing 0 to the attribute.

Do this by introducing counters for each possible padding:
{0..(minimal write size - 1)} and calculate the normalized distribution
when showing the attribute.
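
As a rough illustration of the bookkeeping described above, the following Python sketch keeps one counter per possible padding and normalizes only when the distribution is read. It is a model, not the pblk code: the minimal write size of 8 sectors, the percentage scale and the function names are assumptions.

```python
MIN_WRITE_SIZE = 8  # assumed minimal write size in sectors

def new_counters():
    """One counter per possible padding: 0 .. MIN_WRITE_SIZE - 1."""
    return [0] * MIN_WRITE_SIZE

def record_sync(counters, padded_sectors):
    """Count one sync that needed `padded_sectors` sectors of padding."""
    counters[padded_sectors] += 1

def padding_distribution(counters):
    """Normalize at read time: share of syncs per bucket, in percent."""
    total = sum(counters)
    if total == 0:
        return [0.0] * len(counters)
    return [100.0 * c / total for c in counters]

counters = new_counters()
for padded in (0, 0, 3, 7):      # made-up workload
    record_sync(counters, padded)
dist = padding_distribution(counters)

# Writing 0 to the attribute corresponds to starting over with fresh counters.
counters = new_counters()
```

Storing raw counts and normalizing on read keeps the write path to a single increment.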
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Rearranged total_buckets statement in pblk_sysfs_get_padding_dist
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: export write amplification counters to sysfs · 76758390

Authored by Hans Holmberg on Mar 30, 2018

In an SSD, write amplification, WA, is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In pblk's case, the number of writes per user sector is the sum of:

    (1) number of user writes
    (2) number of sectors written by the garbage collector
    (3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

    write_amp_mileage: the accumulated write amplification stats
                      for the lifetime of the pblk instance

    write_amp_trip: resettable stats to facilitate delta measurements,
                    values reset at creation and if 0 is written
                    to the attribute.
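
To make the calculation concrete, here is a small Python sketch of the ratio and of the mileage/trip distinction. It is a model under assumptions: it uses floating point for brevity, the zero-user case and the function names are hypothetical, and the sample numbers are made up.

```python
def write_amp(user, gc, pad):
    """WA = (user + gc + pad) / user, formatted with five decimals."""
    if user == 0:
        # Assumption: report zero until some user data has been written.
        return "0.00000"
    return f"{(user + gc + pad) / user:.5f}"

def trip(mileage, at_reset):
    """Resettable view: lifetime counters minus the values saved at reset."""
    return tuple(m - r for m, r in zip(mileage, at_reset))

# (user, gc, padded) sector counters; made-up numbers.
mileage = (1000, 250, 50)
at_reset = (400, 100, 20)

lifetime_wa = write_amp(*mileage)
trip_wa = write_amp(*trip(mileage, at_reset))
```

Keeping only monotonically increasing lifetime counters and deriving the trip values by subtraction means a reset never loses lifetime data.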

64-bit counters are used as a 32-bit counter would wrap around
already after about 17 TB worth of user data. It will take a
long long time before the 64-bit sector counters wrap around.
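
The 17 TB figure can be checked with a line of arithmetic. The sketch below assumes a 4 KB (4096-byte) sector, which the message does not state explicitly.

```python
SECTOR_BYTES = 4096  # assumed sector size

def wrap_bytes(counter_bits):
    """Bytes of user data written before a sector counter of this width wraps."""
    return (2 ** counter_bits) * SECTOR_BYTES

tb_32 = wrap_bytes(32) / 1e12   # decimal terabytes for a 32-bit counter
ratio = wrap_bytes(64) // wrap_bytes(32)  # how much further a 64-bit counter reaches
```

Under that assumption a 32-bit counter covers roughly 17.6 TB, and a 64-bit counter reaches 2^32 times further.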

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: check data lines version on recovery · d0ab0b1a

Authored by Hans Holmberg on Mar 30, 2018

As a preparation for future bumps of data line persistent storage
versions, we need to start checking the emeta line version during
recovery. Also split up the current emeta/smeta version into two
bytes (major, minor).

Recovering lines with the same major number as the current pblk data
line version must succeed. This means that any changes in the
persistent format must be:

 (1) Backward compatible: if we switch back to an older
     kernel, recovery of lines stored with major == current_major
     and minor > current_minor must succeed.

 (2) Forward compatible: switching to a newer kernel,
     recovery of lines stored with major == current_major and
     minor < current_minor must handle the data format differences
     gracefully (i.e. initialize new data structures to default values).

If we detect lines that have a different major number than
the current we must abort recovery. The user must manually
migrate the data in this case.
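
The compatibility rules above amount to a three-way decision on (major, minor). A minimal Python sketch, with hypothetical version numbers and return labels:

```python
CURRENT_MAJOR = 0  # hypothetical current data line version
CURRENT_MINOR = 2

def check_line_version(major, minor):
    """Decide how recovery should treat a line with the stored version."""
    if major != CURRENT_MAJOR:
        # Different major: incompatible format, recovery must abort.
        return "abort"
    if minor < CURRENT_MINOR:
        # Written by an older kernel: recover, defaulting newer fields.
        return "recover-with-defaults"
    # Same minor, or a newer minor that must stay backward compatible.
    return "recover"
```

Only the major number can block recovery; the minor number merely selects how missing fields are handled.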

Previously the version stored in the emeta header was copied
from smeta, which has version 1, so we need to set the minor
version to 1.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: handle bad sectors in the emeta area correctly · cfe1c9e2

Authored by Hans Holmberg on Mar 30, 2018

Unless we check if there are bad sectors in the entire emeta area,
we risk ending up with a valid bitmap / available sector count inconsistency.
This results in lines with a bad chunk at the last LUN marked as bad,
so go through the whole emeta area and mark up the invalid sectors.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm/pblk-gc: Delete an error message for a failed memory allocation in... · 5da84cf6

Authored by Markus Elfring on Mar 30, 2018

lightnvm/pblk-gc: Delete an error message for a failed memory allocation in pblk_gc_line_prepare_ws()

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

09 Mar, 2018 1 commit

block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() · 8b904b5b

Authored by Bart Van Assche on Mar 07, 2018

This patch has been generated as follows:

for verb in set_unlocked clear_unlocked set clear; do
  replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
    $(git grep -lw queue_flag_${verb} drivers block/bsg*)
done

Except for protecting all queue flag changes with the queue lock
this patch does not change any functionality.
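
The renaming rule in the loop above relies on the shell expansion `${verb%_unlocked}`, which strips a trailing `_unlocked` from the verb. The Python sketch below reproduces only that name mapping; the helper name is hypothetical and the sketch does not touch any files.

```python
def new_name(verb):
    """Map a queue_flag_<verb> suffix to its blk_queue_flag_* replacement."""
    suffix = "_unlocked"
    # Equivalent of ${verb%_unlocked}: drop the suffix if present.
    base = verb[:-len(suffix)] if verb.endswith(suffix) else verb
    return "blk_queue_flag_" + base

# Old name -> new name for the four verbs the loop iterates over.
mapping = {
    "queue_flag_" + v: new_name(v)
    for v in ("set_unlocked", "clear_unlocked", "set", "clear")
}
```

Both the locked and the unlocked variants collapse onto the same two new helpers, which is why every flag change ends up protected by the queue lock.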

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Mar, 2018 1 commit

block: Add 'lock' as third argument to blk_alloc_queue_node() · 5ee0524b

Authored by Bart Van Assche on Feb 28, 2018

This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05 Jan, 2018 21 commits

lightnvm: pblk: refactor pblk_ppa_comp function · 8b7bc849

Authored by Matias Bjørling on Jan 05, 2018

Shorten the function to simply return the value of the if statement.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: add iostat support · 998ba629

Authored by Javier González on Jan 05, 2018

Since pblk registers its own block device, the iostat accounting is
not automatically done for us. Therefore, add the necessary
accounting logic to satisfy the iostat interface.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: print instance name on instance info · 30d82a86

Authored by Javier González on Jan 05, 2018

Add the instance name to the information printed out on target creation.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: free write buffer on init failure · c6847e4e

Authored by Javier González on Jan 05, 2018

Refactor the way we free the write buffer to ensure that all entries get
freed in case of an error in the init sequence.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: ensure kthread alloc. before kicking it · cc4f5ba1

Authored by Javier González on Jan 05, 2018

When creating the write thread, ensure that the kthread has been created
before initializing the timer responsible for kicking it. Otherwise, if
the kthread creation fails or gets killed from user space, we risk
kicking an empty thread structure.

Also, since the kthread creation can be interrupted from user space,
adapt the error path to not report an error when this happens, since it
is intentional that the instance creation is aborted.
Signed-off-by: Javier González <javier@cnexlabs.com>
Updated source to reflect the new timer_setup API.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: do not log recovery read errors · 8f554597

Authored by Javier González on Jan 05, 2018

On scan recovery, reads can fail. This happens because the first page
of each line is read in order to determine whether the line has been used
(and thus needs to be recovered) or not. This can lead to "empty page"
read errors.

Since these errors are normal, do not log them, as they are confusing
when reviewing the logs.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: ignore high ecc errors on recovery · 5d201f07

Authored by Javier González on Jan 05, 2018

On recovery, do not stop L2P recovery if reads report a high ECC error,
as the data is still available.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: set target over-provision on create ioctl · e5392739

Authored by Javier González on Jan 05, 2018

Allow setting the over-provision percentage on target creation. If
the value is not provided, fall back to the default value set by
the target.

In pblk, set the default OP to 11% of the total size of the device.
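
The fallback behaviour can be sketched as follows. This is a Python model under assumptions: using `None` to mean "not provided", the accepted range, the integer rounding and the block-based units are all hypothetical; only the 11% default comes from the message above.

```python
DEFAULT_OP = 11  # percent, the pblk default stated above

def over_provision(total_blocks, op_percent=None):
    """Return (reserved_blocks, user_blocks) for a target of `total_blocks`."""
    if op_percent is None:
        # Value not provided on creation: fall back to the target's default.
        op_percent = DEFAULT_OP
    if not 0 <= op_percent < 100:
        # Assumed validity range for a user-supplied percentage.
        raise ValueError("over-provision percentage out of range")
    reserved = total_blocks * op_percent // 100
    return reserved, total_blocks - reserved
```

For example, `over_provision(1000)` reserves 110 blocks, while `over_provision(1000, 20)` reserves 200.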
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: use exact free block counter in RL · a7689938

Authored by Javier González on Jan 05, 2018

Until now, pblk's rate-limiter has used a heuristic to reserve space for
GC I/O given that the over-provision area was fixed.

In preparation for allowing the over-provision area to be defined on
target creation, define a dedicated free_block counter in the rate-limiter
to track the number of blocks being used for user data.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove pblk_gc_stop · aed49e19

Authored by Hans Holmberg on Jan 05, 2018

pblk_gc_stop just sets pblk->gc->gc_active to zero, ignoring
the flush parameter. This is plain confusing, so remove the
function and set the gc active flag at the call points instead.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: prevent premature sync point resets · b36bbf9d

Authored by Hans Holmberg on Jan 05, 2018

Unless we protect flush pointer updates with a lock, we risk
resetting new flush points before we've synced all sectors
up to that point.

This patch protects new flush points with the same spin lock
that is being held when advancing the sync pointer and
resetting completed flush points.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: clear flush point on completed writes · 533657c1

Authored by Hans Holmberg on Jan 05, 2018

Move completion of syncs and clearing of flush points to the
write completion path. This ensures that the data has been
committed to the media before completing bios containing syncs.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: rename sync_point to flush_point · 8154d296

Authored by Hans Holmberg on Jan 05, 2018

Sync point is a really confusing name for keeping track of
the last entry that needs to be flushed, so change the name
to flush_point instead.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor emeta consistency check · 06bc072b

Authored by Hans Holmberg on Jan 05, 2018

Currently pblk_recov_get_lba_list does two separate things:
it checks the consistency of the emeta and extracts the lba list.

This patch separates the consistency check to make the code easier
to read and to prepare for version checks of the line emeta
persistent data format.
Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove pblk_for_each_lun helper · d6d3ec2a

Authored by Javier González on Jan 05, 2018

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: compress and reorder helper functions · b1bcfda1

Authored by Javier González on Jan 05, 2018

Over time, we have accumulated some redundant helper functions.
Refactor them to eliminate redundant and unnecessary code. Also, reorder
them to improve readability.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: guarantee target unique name across devs. · bd77b23b

Authored by Javier González on Jan 05, 2018

Until now, target unique naming is only guaranteed per device. This is
ok from a lightnvm perspective, but not from a sysfs one, since groups
will collide regardless of the underlying device.

Check that names are unique across all lightnvm-capable devices.
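
The difference between per-device and global uniqueness can be shown with a toy registry. This Python sketch is purely illustrative: the dictionary layout, the device names and the function names are hypothetical and do not mirror the kernel data structures.

```python
def name_in_use(devices, name):
    """True if any device, not just the requested one, has a target `name`."""
    return any(name in targets for targets in devices.values())

def create_target(devices, dev, name):
    """Register target `name` on `dev`, enforcing uniqueness across devices."""
    if name_in_use(devices, name):
        raise ValueError("target name already exists: " + name)
    devices.setdefault(dev, set()).add(name)

devices = {}
create_target(devices, "nvme0n1", "pblk0")
create_target(devices, "nvme1n1", "pblk1")
```

A second `pblk0` on `nvme1n1` would pass a per-device check but is rejected here, which matches the sysfs constraint described above.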
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: refactor target type lookup · e29c80e6

Authored by Javier González on Jan 05, 2018

Refactor target type lookup to take or skip the lock explicitly instead
of using a hidden parameter to control whether the function locks.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: make geometry structures 2.0 ready · fae7fae4

Authored by Matias Bjørling on Jan 05, 2018

Prepare for the 2.0 revision by adapting the geometry
structures to coexist with the 1.2 revision.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove lower page tables · bb27aa9e

Authored by Matias Bjørling on Jan 05, 2018

The lower page table is unused. All page tables reported by 1.2
devices report a sequential 1:1 page mapping. It is
also not used going forward with the 2.0 revision.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove hybrid ocssd 1.2 support · e3e13bcc

Authored by Matias Bjørling on Jan 05, 2018

Now that rrpc has been removed, also remove the hybrid 1.2 support
from the core.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openanolis / cloud-kernel synced successfully more than 1 year ago