Commits · a294c199455187d124b0760fa8f86c13cdaa4b25 · openeuler / Kernel

30 Mar, 2018 · 30 commits

lightnvm: implement get log report chunk helpers · a294c199

Committed by Javier González on Mar 30, 2018

The 2.0 spec provides a report chunk log page that can be retrieved
using the standard nvme get log page. This replaces the dedicated
get/put bad block table in 1.2.

This patch implements the helper functions to allow targets to retrieve
the chunk metadata using get log page. It makes nvme_get_log_ext available
outside of nvme core so that we can use it from lightnvm.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: make address conversions depend on generic device · 7100d50a

Committed by Javier González on Mar 30, 2018

On address conversions, use the generic device instead of the target
device. This allows conversions to be used outside of the target's realm.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add support for 2.0 address format · 69471513

Committed by Javier González on Mar 30, 2018

Add support for 2.0 address format. Also, align address bits for 1.2 and
2.0 to be able to operate on channels and luns without requiring a format
conversion. Use a generic address format for this purpose.

Also, convert the generic operations to the generic format in pblk.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: normalize geometry nomenclature · a40afad9

Committed by Javier González on Mar 30, 2018

Normalize nomenclature for naming channels, luns, chunks, planes and
sectors as well as derivations in order to improve readability.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: complete geo structure with maxoc* · 3f48021b

Committed by Javier González on Mar 30, 2018

Complete the generic geometry structure with the maxoc and maxocpu
fields, present in the 2.0 spec. Also, expose them through sysfs.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add shorten OCSSD version in geo · f1d4e812

Committed by Javier González on Mar 30, 2018

Create a shortened version to use in the generic geometry.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add minor version to generic geometry · 3cb98f84

Committed by Javier González on Mar 30, 2018

Separate the version between major and minor on the generic geometry and
represent it through sysfs in the 2.0 path. The 1.2 path only shows the
major version to preserve the existing user space interface.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: simplify geometry structure · e46f4e48

Committed by Javier González on Mar 30, 2018

Currently, the device geometry is stored redundantly in the nvm_id and
nvm_geo structures at a device level. Moreover, when instantiating
targets on a specific number of LUNs, these structures are replicated
and manually modified to fit the instance channel and LUN partitioning.

Instead, create a generic geometry around nvm_geo, which can be used by
(i) the underlying device to describe the geometry of the whole device,
and (ii) instances to describe their geometry independently.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor init/exit sequences · 43d47127

Committed by Javier González on Mar 30, 2018

Refactor init and exit sequences to eliminate dependencies among init
modules and improve readability.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: Avoid validation of default op value · 9d7aa4a4

Committed by Heiner Litz on Mar 30, 2018

Fixes: 38401d231de65 ("lightnvm: set target over-provision on create ioctl")
Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: centralize permission check for lightnvm ioctl · 40f962d7

Committed by Johannes Thumshirn on Mar 30, 2018

Currently all functions for handling the lightnvm core ioctl commands
do a check for CAP_SYS_ADMIN.

Change this to fail early in nvm_ctl_ioctl(), so we don't have to
duplicate the permission checks all over.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
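
The fail-early pattern described in this commit can be sketched in plain user-space C. This is only an illustration, not the kernel code: the command numbers, the handler names and the is_admin flag (standing in for the CAP_SYS_ADMIN check) are all hypothetical, and the -EPERM return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command set; these are not the real lightnvm ioctl numbers. */
enum demo_cmd { DEMO_CMD_INFO, DEMO_CMD_CREATE, DEMO_CMD_REMOVE };

/* The handlers no longer repeat the privilege check themselves. */
static int demo_info(void)   { return 0; }
static int demo_create(void) { return 0; }
static int demo_remove(void) { return 0; }

/*
 * Single entry point: reject unprivileged callers once, before
 * dispatching. is_admin stands in for the capability check.
 */
static int demo_ctl_ioctl(bool is_admin, int cmd)
{
    if (!is_admin)
        return -EPERM;

    switch (cmd) {
    case DEMO_CMD_INFO:
        return demo_info();
    case DEMO_CMD_CREATE:
        return demo_create();
    case DEMO_CMD_REMOVE:
        return demo_remove();
    default:
        return -ENOTTY; /* unknown command */
    }
}
```

With the check in one place, a new handler cannot forget it, and an unprivileged caller gets the same error regardless of the command.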

lightnvm: fix bad block initialization · a38c78d8

Committed by Heiner Litz on Mar 30, 2018

Fix reading of bad block device information to correctly set up the
per-line blk_bitmap during lightnvm initialization.

Signed-off-by: Heiner Litz <hlitz@ucsc.edu>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: lightnvm: add late setup of block size and metadata · 96257a8a

Committed by Matias Bjørling on Mar 30, 2018

The nvme driver sets up the size of the nvme namespace in two steps.
First it initializes the device with standard logical block and
metadata sizes, and then sets the correct logical block and metadata
size. Because the OCSSD 2.0 specification relies on the namespace to
expose these sizes for correct initialization, let them be updated
appropriately on the LightNVM side as well.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove nvm_dev_ops->max_phys_sect · 89a09c56

Committed by Matias Bjørling on Mar 30, 2018

The value of max_phys_sect is always static. Instead of
defining it in the nvm_dev_ops structure, declare it as a global
value.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove max_rq_size · af569398

Committed by Matias Bjørling on Mar 30, 2018

The field is no longer used.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: add 2.0 geometry identification · 62771fe0

Committed by Matias Bjørling on Mar 30, 2018

Implement the geometry data structures for 2.0 and enable a drive
to be identified as one, including exposing the appropriate 2.0
sysfs entries.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: flatten nvm_id_group into nvm_id · c6ac3f35

Committed by Matias Bjørling on Mar 30, 2018

There are no groups in the 2.0 specification, so make sure that the
nvm_id structure is flattened before 2.0 data structures are added.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: make 1.2 data structures explicit · a04e0cf9

Committed by Matias Bjørling on Mar 30, 2018

Make the 1.2 data structures explicit, so it will be easy to identify
the 2.0 data structures. Also fix the order in which the nvme_nvm_*
structures are declared, such that they follow the nvme_nvm_command order.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: refactor bad block identification · e411b331

Committed by Javier González on Mar 30, 2018

In preparation for the OCSSD 2.0 spec's bad block identification,
refactor the current code to generalize bad block get/set functions and
structures.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: prevent race in pblk_rb_flush_point_set · 3c05ef11

Committed by Hans Holmberg on Mar 30, 2018

Make sure that we are not advancing the sync pointer while
we're adding bios to the write buffer entry completion list.

This race condition results in bios not completing and was identified
by a hang when running xfstest generic/113.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: allow allocation of new lines during shutdown · b966c50b

Committed by Hans Holmberg on Mar 30, 2018

When shutting down pblk, the write buffer is flushed, and if the
current line can't fit the data in the write buffer we need
to allocate a new line, so remove the check that prevents this.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: delete writer kick timer before stopping thread · 7be970b2

Committed by Hans Holmberg on Mar 30, 2018

Unless we delete the timer that wakes up the write thread
before we stop the thread, we risk restarting the thread, so
delete the timer first.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: add padding distribution sysfs attribute · 5d149bfa

Committed by Hans Holmberg on Mar 30, 2018

When pblk receives a sync, all data up to that point in the write buffer
must be committed to persistent storage, and as flash memory comes with a
minimal write size there is a significant cost involved both in terms
of time for completing the sync and in terms of write amplification
(padded sectors for filling up to the minimal write size).

In order to get a better understanding of the costs involved for syncs,
add a sysfs attribute to pblk: padded_dist, showing a normalized
distribution of sectors padded. In order to facilitate measurements of
specific workloads during the lifetime of the pblk instance, the
distribution can be reset by writing 0 to the attribute.

Do this by introducing counters for each possible padding:
{0..(minimal write size - 1)} and calculate the normalized distribution
when showing the attribute.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Rearranged total_buckets statement in pblk_sysfs_get_padding_dist
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
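
The counter-per-padding scheme in this commit message can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not the pblk code: the minimal write size of 8 sectors, the struct and function names, and the choice of whole percentages as the normalized form are all hypothetical.

```c
#include <stdint.h>

#define MIN_WRITE_SECTORS 8 /* assumed minimal write size, in sectors */

/* bucket[n] counts how many times exactly n sectors were padded. */
struct pad_dist {
    uint64_t bucket[MIN_WRITE_SECTORS];
};

/* Record one padding event; values outside 0..(min - 1) are ignored. */
static void pad_dist_record(struct pad_dist *d, unsigned int padded)
{
    if (padded < MIN_WRITE_SECTORS)
        d->bucket[padded]++;
}

/* Reset all counters, as when 0 is written to the attribute. */
static void pad_dist_reset(struct pad_dist *d)
{
    for (unsigned int i = 0; i < MIN_WRITE_SECTORS; i++)
        d->bucket[i] = 0;
}

/*
 * Normalize only when the distribution is shown: fill pct[] with each
 * bucket's share in whole percent and return the total event count.
 */
static uint64_t pad_dist_normalize(const struct pad_dist *d,
                                   unsigned int pct[MIN_WRITE_SECTORS])
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < MIN_WRITE_SECTORS; i++)
        total += d->bucket[i];

    for (unsigned int i = 0; i < MIN_WRITE_SECTORS; i++)
        pct[i] = total ? (unsigned int)(d->bucket[i] * 100 / total) : 0;

    return total;
}
```

Keeping raw counters and normalizing on read keeps the write path to a single increment; the division only happens when the attribute is read.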

lightnvm: remove multiple groups in 1.2 data structure · ff12581e

Committed by Matias Bjørling on Mar 30, 2018

Only one id group from the 1.2 specification is supported. Make
sure that only the first group is accessible.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove mlc pairs structure · d8a39cae

Committed by Matias Bjørling on Mar 30, 2018

The known implementations of the 1.2 specification, and the upcoming 2.0
implementation, all expose a sequential list of pages to write.
Remove the data structure, as it is no longer needed.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: export write amplification counters to sysfs · 76758390

Committed by Hans Holmberg on Mar 30, 2018

In an SSD, write amplification, WA, is defined as the average
number of page writes per user page write. Write amplification
negatively affects write performance and decreases the lifetime
of the disk, so it's a useful metric to add to sysfs.

In pblk's case, the number of writes per user sector is the sum of:

    (1) number of user writes
    (2) number of sectors written by the garbage collector
    (3) number of sectors padded (i.e. due to syncs)

This patch adds persistent counters for 1-3 and two sysfs attributes
to export these along with WA calculated with five decimals:

    write_amp_mileage: the accumulated write amplification stats
                      for the lifetime of the pblk instance

    write_amp_trip: resettable stats to facilitate delta measurements,
                    values reset at creation and if 0 is written
                    to the attribute.

64-bit counters are used as a 32-bit counter would wrap around
already after about 17 TB worth of user data. It will take a
long long time before the 64-bit sector counters wrap around.

The counters are stored after the bad block bitmap in the first
emeta sector of each written line. There is plenty of space in the
first emeta sector, so we don't need to bump the major version of
the line data format.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
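
The arithmetic in this commit message can be illustrated with a small user-space C sketch. It is not the pblk implementation: the struct and function names are hypothetical, WA is taken as (user + gc + pad) / user per the list above, and the 4096-byte sector size used to reproduce the "about 17 TB" figure is an assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three sector counters (1)-(3). */
struct wa_counters {
    uint64_t user; /* sectors written by the user */
    uint64_t gc;   /* sectors written by the garbage collector */
    uint64_t pad;  /* sectors padded, e.g. due to syncs */
};

/*
 * Return WA scaled by 100000 (five decimals) using integer math only,
 * or 0 when no user sectors have been written yet. The multiplication
 * can overflow for extremely large counters; this sketch ignores that.
 */
static uint64_t wa_scaled(const struct wa_counters *c)
{
    if (c->user == 0)
        return 0;
    return (c->user + c->gc + c->pad) * 100000ULL / c->user;
}

/* Format WA as "I.FFFFF" into buf; returns what snprintf returns. */
static int wa_format(const struct wa_counters *c, char *buf, size_t len)
{
    uint64_t wa = wa_scaled(c);

    return snprintf(buf, len, "%" PRIu64 ".%05" PRIu64,
                    wa / 100000, wa % 100000);
}

/* Bytes of user data after which a 32-bit sector counter wraps. */
static uint64_t wrap_bytes_32bit(uint32_t sector_size)
{
    return ((uint64_t)1 << 32) * sector_size;
}
```

For example, 1000 user sectors, 250 garbage-collected sectors and 50 padded sectors give a scaled value of 130000, which wa_format renders as 1.30000. With 4096-byte sectors, wrap_bytes_32bit returns 2^44 bytes, roughly 17.6 TB, which is consistent with the figure quoted in the message.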

lightnvm: pblk: check data lines version on recovery · d0ab0b1a

Committed by Hans Holmberg on Mar 30, 2018

As a preparation for future bumps of data line persistent storage
versions, we need to start checking the emeta line version during
recovery. Also split up the current emeta/smeta version into two
bytes (major, minor).

Recovering lines with the same major number as the current pblk data
line version must succeed. This means that any changes in the
persistent format must be:

 (1) Backward compatible: if we switch back to an older
     kernel, recovery of lines stored with major == current_major
     and minor > current_minor must succeed.

 (2) Forward compatible: switching to a newer kernel,
     recovery of lines stored with major == current_major and
     minor < current_minor must handle the data format differences
     gracefully (i.e. initialize new data structures to default values).

If we detect lines that have a different major number than
the current one, we must abort recovery. The user must manually
migrate the data in this case.

Previously the version stored in the emeta header was copied
from smeta, which has version 1, so we need to set the minor
version to 1.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
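
The compatibility rules above amount to a small decision function. The sketch below is one way to express them in user-space C; the enum, the function name and the CUR_MAJOR / CUR_MINOR values are hypothetical placeholders, not the constants used by pblk.

```c
#include <stdint.h>

/* Hypothetical "current" line format version of the running kernel. */
#define CUR_MAJOR 0
#define CUR_MINOR 1

enum line_compat {
    LINE_OK,             /* same major and minor */
    LINE_OK_NEWER_MINOR, /* written by a newer kernel: rule (1) */
    LINE_OK_OLDER_MINOR, /* written by an older kernel: rule (2) */
    LINE_ABORT,          /* different major: abort recovery */
};

/* Classify a stored line version against the current version. */
static enum line_compat check_line_version(uint8_t major, uint8_t minor)
{
    if (major != CUR_MAJOR)
        return LINE_ABORT;
    if (minor > CUR_MINOR)
        return LINE_OK_NEWER_MINOR;
    if (minor < CUR_MINOR)
        return LINE_OK_OLDER_MINOR; /* caller fills in default values */
    return LINE_OK;
}
```

Only a major mismatch stops recovery; both minor mismatches are recoverable, and the older-minor case is where a caller would initialize any new data structures to their defaults.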

lightnvm: pblk: handle bad sectors in the emeta area correctly · cfe1c9e2

Committed by Hans Holmberg on Mar 30, 2018

Unless we check if there are bad sectors in the entire emeta area,
we risk ending up with an inconsistency between the valid bitmap and
the available sector count.
This results in lines with a bad chunk at the last LUN marked as bad,
so go through the whole emeta area and mark up the invalid sectors.

Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove chnl_offset in nvme_nvm_identity · 8f37d191

Committed by Matias Bjørling on Mar 30, 2018

The identity structure is initialized to zero in the beginning of
the nvme_nvm_identity function. The chnl_offset is separately set to
zero. Since both the variable and the assignment are never changed,
remove them.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm/pblk-gc: Delete an error message for a failed memory allocation in... · 5da84cf6

Committed by Markus Elfring on Mar 30, 2018

lightnvm/pblk-gc: Delete an error message for a failed memory allocation in pblk_gc_line_prepare_ws()

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Mar, 2018 · 3 commits

blk-mq: Allow PCI vector offset for mapping queues · f23f5bec

Committed by Keith Busch on Mar 27, 2018

The PCI interrupt vectors intended to be associated with a queue may
not start at 0; a driver may allocate pre_vectors for special use. This
patch adds an offset parameter so blk-mq may find the intended affinity
mask and updates all drivers using this API accordingly.

Cc: Don Brace <don.brace@microsemi.com>
Cc: <qla2xxx-upstream@qlogic.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
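
The effect of the offset can be shown with a simplified user-space model. Nothing here is the blk-mq API: the function name, the 8-CPU limit and the representation of an affinity mask as a plain bitmask are assumptions made for the illustration.

```c
#include <stdint.h>

#define NR_CPUS_DEMO 8 /* assumed CPU count for the illustration */

/*
 * vec_affinity[v] is a bitmask of the CPUs that vector v is bound to.
 * Queue q uses vector (q + offset), so that vectors reserved by the
 * driver in front of the queue vectors are skipped. On return,
 * map[cpu] holds the queue that serves that CPU.
 */
static void map_queues(const uint32_t *vec_affinity, unsigned int nr_queues,
                       unsigned int offset, unsigned int *map)
{
    for (unsigned int q = 0; q < nr_queues; q++) {
        uint32_t mask = vec_affinity[q + offset];

        for (unsigned int cpu = 0; cpu < NR_CPUS_DEMO; cpu++)
            if (mask & (1u << cpu))
                map[cpu] = q;
    }
}
```

With one reserved vector at index 0 and two queue vectors bound to CPUs 0-3 and 4-7, an offset of 1 maps CPUs 0-3 to queue 0 and CPUs 4-7 to queue 1. An offset of 0 would instead read the reserved vector's mask for queue 0, which is the mismatch the commit describes.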

loop: use killable lock in ioctls · 3148ffbd

Committed by Omar Sandoval on Mar 26, 2018

Even after the previous patch to drop lo_ctl_mutex while calling
vfs_getattr(), there are other cases where we can end up sleeping for a
long time while holding lo_ctl_mutex. Let's avoid the uninterruptible
sleep from the ioctls.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

loop: don't call into filesystem while holding lo_ctl_mutex · 2d1d4c1e

Committed by Omar Sandoval on Mar 26, 2018

We hit an issue where a loop device on NFS was stuck in
loop_get_status() doing vfs_getattr() after the NFS server died, which
caused a pile-up of uninterruptible processes waiting on lo_ctl_mutex.
There's no reason to hold this lock while we wait on the filesystem;
let's drop it so that other processes can do their thing. We need to
grab a reference on lo_backing_file while we use it, and we can get rid
of the check on lo_device, which has been unnecessary since commit
a34c0ae9ebd6 ("[PATCH] loop: remove the bio remapping capability") in
the linux-history tree.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Mar, 2018 · 1 commit

block, bfq: lower-bound the estimated peak rate to 1 · bc56e2ca

Committed by Paolo Valente on Mar 26, 2018

If a storage device handled by BFQ happens to be slower than 7.5 KB/s
for a certain amount of time (in the order of a second), then the
estimated peak rate of the device, maintained in BFQ, becomes equal to
0. The reason is the limited precision with which the rate is
represented (details on the range of representable values in the
comments introduced by this commit). This leads to a division-by-zero
error where the estimated peak rate is used as divisor. Such a type of
failure has been reported in [1].

This commit addresses this issue by:
1. Lower-bounding the estimated peak rate to 1
2. Adding and improving comments on the range of rates representable

[1] https://www.spinics.net/lists/kernel/msg2739205.html

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
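
The failure mode and the fix can be reproduced with a small fixed-point sketch in user-space C. This is not the BFQ code: the function names, the units (sectors per microsecond) and the 16-bit shift are assumptions chosen only to show how a slow device rounds down to 0 and how the lower bound keeps later divisions safe.

```c
#include <stdint.h>

#define RATE_SHIFT 16 /* assumed fixed-point shift for the illustration */

/*
 * Estimate a rate in (sectors << RATE_SHIFT) per microsecond.
 * For a slow enough device the integer division yields 0, so the
 * result is lower-bounded to 1. A zero interval is treated as 1 us.
 */
static uint64_t estimate_rate(uint64_t sectors, uint64_t usecs)
{
    uint64_t rate;

    if (usecs == 0)
        usecs = 1;

    rate = (sectors << RATE_SHIFT) / usecs;
    if (rate < 1)
        rate = 1; /* lower bound: never hand out a zero divisor */

    return rate;
}

/* Time expected to serve the given sectors; rate is used as divisor. */
static uint64_t expected_usecs(uint64_t sectors, uint64_t rate)
{
    return (sectors << RATE_SHIFT) / rate;
}
```

Ten sectors in one second gives (10 << 16) / 1000000 = 0 before clamping, so without the bound the next call to expected_usecs would divide by zero; with it, the rate is simply the smallest representable value.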

26 Mar, 2018 · 6 commits

nvme: make nvme_get_log_ext non-static · d558fb51

Committed by Matias Bjørling on Mar 21, 2018

Enable the lightnvm integration to use the nvme_get_log_ext()
function.

Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet: constify struct nvmet_fabrics_ops · e929f06d

Committed by Christoph Hellwig on Mar 20, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet: refactor configfs transport type handling · a5d18612

Committed by Christoph Hellwig on Mar 20, 2018

Have a common table of mappings from numerical transport ids to names, and
zero the transport specific area in common code in nvmet_addr_trtype_store.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
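
A shared id-to-name table of the kind described here could look like the following user-space C sketch. The struct, the function names and the table contents are illustrative assumptions rather than the nvmet definitions, and the zeroing of the transport-specific area is left out.

```c
#include <stddef.h>
#include <string.h>

/* One row of the shared table: numerical transport id and its name. */
struct trtype_entry {
    int type;
    const char *name;
};

/* Illustrative contents; the real table lives in the nvmet code. */
static const struct trtype_entry trtype_table[] = {
    { 1,   "rdma" },
    { 2,   "fc"   },
    { 254, "loop" },
};

#define TRTYPE_COUNT (sizeof(trtype_table) / sizeof(trtype_table[0]))

/* Used when showing the attribute: id to name, NULL if unknown. */
static const char *trtype_name(int type)
{
    for (size_t i = 0; i < TRTYPE_COUNT; i++)
        if (trtype_table[i].type == type)
            return trtype_table[i].name;
    return NULL;
}

/* Used when storing the attribute: name to id, -1 if unknown. */
static int trtype_from_name(const char *name)
{
    for (size_t i = 0; i < TRTYPE_COUNT; i++)
        if (strcmp(trtype_table[i].name, name) == 0)
            return trtype_table[i].type;
    return -1;
}
```

Because both directions walk the same table, adding a transport means adding one row instead of touching separate show and store code paths.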

nvmet: move device_uuid configfs attr definition to suitable place · f871749a

Committed by Max Gurtovoy on Mar 20, 2018

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: Add .stop_ctrl to nvme ctrl ops · b435ecea

Committed by Nitzan Carmi on Mar 20, 2018

For consistency reasons, any fabric-specific work
(e.g. error recovery/reconnect) should be canceled in
nvme_stop_ctrl, as for all other NVMe pending work
(e.g. scan, keep alive).

The patch aims to simplify the logic of the code, as
we now only rely on a vague demand from any fabric
to flush its private workqueues at the beginning of
.delete_ctrl op.

Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-rdma: Allow DELETING state change failure in error_recovery · 187c0832

Committed by Nitzan Carmi on Mar 20, 2018

While error recovery is ongoing, it is OK to move
ctrl to DELETING state (from concurrent delete_work).
Thus we don't need a warning for that case.

Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
