Commits · 0e5c3246dbb96b6870634e7d51b2490f05c976cf · openeuler / Kernel

30 Nov, 2016 · 10 commits

lightnvm: make address conversion functions global · 0e5c3246

Authored by Javier González on Nov 28, 2016

Targets are assumed to use the same generic ppa format, where the
address is partitioned into ch:lun:block:pg:pl:sec. Thus, make the
function in charge of transforming the ppa address from a linear format
to the generic one available to all targets.

This function will be needed by the media manager in order to do target
mapping translations when targets are divided on different physical
partitions.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
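
To illustrate the linear-to-generic conversion described above, here is a minimal userspace sketch. It is not the kernel's code: the `struct geo` and `struct generic_ppa` types, their field names, and the assumption that `sec` varies fastest and `ch` slowest are all hypothetical, chosen only to mirror the ch:lun:block:pg:pl:sec partitioning named in the commit message.

```c
#include <stdint.h>

/* Hypothetical device geometry; every count is assumed to be non-zero. */
struct geo {
    uint32_t nr_luns;       /* LUNs per channel */
    uint32_t blks_per_lun;  /* blocks per LUN */
    uint32_t pgs_per_blk;   /* pages per block */
    uint32_t nr_planes;     /* planes per page */
    uint32_t sec_per_pl;    /* sectors per plane */
};

/* Hypothetical generic address, one field per level of the hierarchy. */
struct generic_ppa {
    uint32_t ch, lun, blk, pg, pl, sec;
};

/*
 * Peel one level off the linear address at a time, starting with the
 * fastest-varying field (assumed here to be the sector).
 */
static struct generic_ppa linear_to_generic(const struct geo *g, uint64_t lin)
{
    struct generic_ppa p;

    p.sec = (uint32_t)(lin % g->sec_per_pl);   lin /= g->sec_per_pl;
    p.pl  = (uint32_t)(lin % g->nr_planes);    lin /= g->nr_planes;
    p.pg  = (uint32_t)(lin % g->pgs_per_blk);  lin /= g->pgs_per_blk;
    p.blk = (uint32_t)(lin % g->blks_per_lun); lin /= g->blks_per_lun;
    p.lun = (uint32_t)(lin % g->nr_luns);      lin /= g->nr_luns;
    p.ch  = (uint32_t)lin;                     /* whatever remains is the channel */
    return p;
}
```

Because the helper depends only on the geometry, any target or the media manager could call it, which is the motivation the commit gives for making the function global.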

lightnvm: cleanup unused target operations · 7e4f64a9

Authored by Javier González on Nov 28, 2016

Clean up definition leftovers from the old gennvm interface.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: remove sysfs configuration interface · 17b25cfc

Authored by Javier González on Nov 28, 2016

LightNVM used to be managed and configured through sysfs. Since the
introduction of management ioctls this interface is redundant and
outdated. Get rid of it.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: rrpc: split bios of size > 256kb · f0b01b6a

Authored by Javier González on Nov 28, 2016

rrpc cannot handle bios of size > 256kb due to NVMe using a 64 bit
bitmap to signal I/O completion. If a larger bio comes, split it
explicitly.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
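
The 256kb limit follows from the bitmap width: 64 completion bits times one sector each. The sketch below works through that arithmetic under the assumption of 4 KB sectors (64 × 4 KB = 256 KB); the macro and function names are hypothetical and do not come from rrpc.

```c
#include <stddef.h>

#define MAX_SECS_PER_IO 64u    /* one completion bit per sector in a 64-bit bitmap */
#define SEC_SIZE        4096u  /* assumed sector size in bytes */
#define MAX_IO_BYTES    (MAX_SECS_PER_IO * SEC_SIZE)  /* 262144 bytes = 256 KB */

/* Number of requests needed to cover `bytes` without exceeding the limit. */
static size_t nr_chunks(size_t bytes)
{
    return (bytes + MAX_IO_BYTES - 1) / MAX_IO_BYTES;
}

/* Length in bytes of chunk `idx` (0-based); 0 if `idx` is past the end. */
static size_t chunk_len(size_t bytes, size_t idx)
{
    size_t off = idx * MAX_IO_BYTES;
    size_t left;

    if (off >= bytes)
        return 0;
    left = bytes - off;
    return left < MAX_IO_BYTES ? left : MAX_IO_BYTES;
}
```

For instance, a request of 1 MB plus one extra sector would be covered by four full 256 KB chunks followed by a single 4 KB chunk.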

lightnvm: add ECC error codes · 402ab9a8

Authored by Javier González on Nov 28, 2016

Add ECC error codes to enable the appropriate handling in the target.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: export set bad block table · a24ba464

Authored by Javier González on Nov 28, 2016

Bad blocks should be managed by block owners. This would be either
targets for data blocks or sysblk for system blocks.

In order to support this, export two functions: one to mark a block as
a specific type (e.g., bad block) and another to update the bad block
table on the device.

Move bad block management to rrpc.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: do not protect block 0 · 8a3c95ab

Authored by Javier González on Nov 28, 2016

Device blocks should be marked by the device and considered as bad
blocks by the media manager. Thus, do not make assumptions on which
blocks are going to be used by the device. In doing so we might lose
valid blocks from the free list.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

lightnvm: enable to send hint to erase command · bb314979

Authored by Javier González on Nov 28, 2016

Erases might be subject to host hints. An example is multi-plane
programming to erase blocks in parallel. Enable targets to specify this
hint.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

nvme: lightnvm: attach lightnvm sysfs to nvme block device · 3dc87dd0

Authored by Matias Bjørling on Nov 28, 2016

Previously, LBA read and write were not supported in the lightnvm
specification. Now that it supports them, let's use the traditional
NVMe gendisk, and attach the lightnvm sysfs geometry export.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

nvme: lightnvm: frees wrong cmd structure · 7498e99f

Authored by Matias Bjørling on Nov 28, 2016

When struct nvme_request was introduced, the nvme_nvm_submit_io was
converted to the new interface. The interface moves nvme_nvm_command
data structure into the struct request pdu. On io completion, rq->cmd is
freed, which should have been the dereferenced pdu nvme_request->cmd.

Fixes: d49187e9 "nvme: introduce struct nvme_request"
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

29 Nov, 2016 · 4 commits

blk-mq: Drop explicit timeout sync in hotplug · 415d3dab

Authored by Gabriel Krisman Bertazi on Nov 28, 2016

After commit 287922eb ("block: defer timeouts to a workqueue"),
deleting the timeout work after freezing the queue shouldn't be
necessary, since the synchronization is already enforced by the
acquisition of a q_usage_counter reference in blk_mq_timeout_work.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blk-wbt: allow wbt to be enabled always through sysfs · d62118b6

Authored by Jens Axboe on Nov 28, 2016

Currently there's no way to enable wbt if it's not enabled in the
kernel config by default for a device. Allow a write to the
'wbt_lat_usec' queue sysfs file to enable wbt.

This is useful both for the kernel config case and when the
device is CFQ managed and wbt was turned off by default.

Signed-off-by: Jens Axboe <axboe@fb.com>

blk-wbt: cleanup disable-by-default for CFQ · fa224eed

Authored by Jens Axboe on Nov 28, 2016

Make it clear that we are disabling wbt for the specified queue,
if it was enabled by default. This is in preparation for allowing
users to re-enable wbt, and not have it disabled automatically
again.

Signed-off-by: Jens Axboe <axboe@fb.com>

blk-wbt: allow reset of default latency through sysfs · 80e091d1

Authored by Jens Axboe on Nov 28, 2016

Allow a write of '-1' to reset the default latency target for
a given device. This removes knowledge of the different default
settings for rotational vs non-rotational from user space.

Signed-off-by: Jens Axboe <axboe@fb.com>

23 Nov, 2016 · 3 commits

nbd: fix setting of 'error' in NBD_DO_IT ioctl · feffa5cc

Authored by Jens Axboe on Nov 22, 2016

Multiple paths don't set it properly; ensure that we do.

Fixes: 9561a7ad ("nbd: add multi-connection support")
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: move multi-connection bit to unused value · 63db89ea

Authored by Jens Axboe on Nov 22, 2016

Bit #7 is already used; move to bit #8, which is the first unused
one.

Fixes: 9561a7ad ("nbd: add multi-connection support")
Signed-off-by: Jens Axboe <axboe@fb.com>

nbd: add multi-connection support · 9561a7ad

Authored by Josef Bacik on Nov 22, 2016

NBD can become contended on its single connection.  We have to serialize all
writes and we can only process one read response at a time.  Fix this by
allowing userspace to provide multiple connections to a single nbd device.  This
coupled with block-mq drastically increases performance in multi-process cases.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

22 Nov, 2016 · 15 commits

block,blkcg: use __GFP_NOWARN for best-effort allocations in blkcg · e00f4f4d

Authored by Tejun Heo on Nov 21, 2016

blkcg allocates some per-cgroup data structures with GFP_NOWAIT and
when that fails falls back to operations which aren't specific to the
cgroup.  Occasional failures are expected under pressure and falling
back to non-cgroup operation is the right thing to do.

Unfortunately, I forgot to add __GFP_NOWARN to these allocations and
these expected failures end up creating a lot of noise.  Add
__GFP_NOWARN.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Marc MERLIN <marc@merlins.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

fs: logfs: remove unnecesary check · 05aea81b

Authored by Ming Lei on Nov 11, 2016

The check on bio->bi_vcnt doesn't make sense in erase_end_io().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

fs: logfs: use bio_add_page() in do_erase() · c1248436

Authored by Ming Lei on Nov 11, 2016

The code also gets simplified a bit.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

fs: logfs: use bio_add_page() in __bdev_writeseg() · d4f98a89

Authored by Ming Lei on Nov 11, 2016

This patch also simplifies the code a bit.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

fs: logfs: convert to bio_add_page() in sync_request() · 739a9975

Authored by Ming Lei on Nov 11, 2016

bio_add_page() is always the standard and preferred way to
do the task.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

bcache: debug: avoid accessing .bi_io_vec directly · 4113b88a

Authored by Ming Lei on Nov 11, 2016

Instead, use the standard iterator to do that.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

target: avoid accessing .bi_vcnt directly · 84c85906

Authored by Ming Lei on Nov 11, 2016

When the bio is full, bio_add_pc_page() will return zero,
so use this information to tell when the bio is full.

Also replace access to .bi_vcnt for pr_debug() with bio_segments().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: floppy: use bio_add_page() · 2c73a603

Authored by Ming Lei on Nov 11, 2016

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: drbd: remove impossible failure handling · 06efffda

Authored by Ming Lei on Nov 11, 2016

For a non-cloned bio, bio_add_page() only returns failure when
the io vec table is full, but in that case, bio->bi_vcnt can't
be zero at all.

So remove the impossible failure handling.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: bio: pass bvec table to bio_init() · 3a83f467

Authored by Ming Lei on Nov 22, 2016

Some drivers often use an external bvec table, so introduce
this helper for this case. It is always safe to access
bio->bi_io_vec in this way for this case.

After converting to this usage, it becomes a bit easier
to evaluate the remaining direct accesses to bio->bi_io_vec,
which helps prepare for the following multipage bvec
support.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Fixed up the new O_DIRECT cases.
Signed-off-by: Jens Axboe <axboe@fb.com>

block_dev: get rid of blksize bits calculation · 9a794fb9

Authored by Jens Axboe on Nov 22, 2016

We store the bits in the bdev sector size locally, but we don't use
the calculation anymore. All we do with it is shift it back up to
the bdev sector size. So let's just use that directly and kill the
variable and bits calculation.

Signed-off-by: Jens Axboe <axboe@fb.com>

block_dev: Fixed direct I/O bio sector calculation · 4d1a4765

Authored by Damien Le Moal on Nov 22, 2016

A direct I/O alignment must always be checked against the device block size,
but the I/O offset (bio->bi_iter.bi_sector) must always use 512B sector units,
and not the actual logical block size.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
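
The distinction between the two units can be sketched as follows. This is an illustrative userspace example, not the block_dev code: the function names are hypothetical, and the logical block size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9  /* a bio sector is always 512 bytes */

/*
 * Alignment is checked against the device's logical block size
 * (assumed to be a power of two, e.g. 512 or 4096).
 */
static bool dio_aligned(uint64_t pos, uint64_t len, uint32_t logical_block_size)
{
    uint64_t mask = (uint64_t)logical_block_size - 1;

    return ((pos | len) & mask) == 0;
}

/* The starting sector is derived from 512-byte units, whatever the block size. */
static uint64_t dio_start_sector(uint64_t pos)
{
    return pos >> SECTOR_SHIFT;
}
```

On a device with 4096-byte logical blocks, a byte offset of 4096 passes the alignment check and maps to sector 8, not sector 1.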

block: apply blk_partition_remap to REQ_OP_ZONE_RESET · 778889d8

Authored by Shaun Tancheff on Nov 21, 2016

If a ZBC device is partitioned and operations are performed on the partition,
the zone information is rebased to the partition. However, the zone reset
is not mapped from the partition to the device as other operations are.

This causes the API (report zones / reset zone) to be unbalanced in this
regard. Checking for the zone reset op code explicitly will balance the
API.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
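
A rough sketch of the idea: the remap adds the partition's start offset to the request's sector, and the zone reset op is checked explicitly so that it is remapped too. The types and names here are hypothetical, and the sketch assumes the remap was otherwise gated on a non-zero sector count, which a zone reset would not have; the commit message itself does not state why the reset was skipped.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the block layer's types. */
enum req_op { OP_READ, OP_WRITE, OP_ZONE_REPORT, OP_ZONE_RESET };

struct part {
    uint64_t start_sect;  /* partition start, in sectors from the device start */
};

struct io {
    enum req_op op;
    uint64_t sector;      /* partition-relative on entry */
    uint32_t nr_sectors;  /* assumed to be 0 for a zone reset */
};

/* Translate a partition-relative sector to a device-relative one. */
static void partition_remap(struct io *io, const struct part *p)
{
    /* Check the zone reset op explicitly so it is remapped like other I/O. */
    if (io->nr_sectors > 0 || io->op == OP_ZONE_RESET)
        io->sector += p->start_sect;
}
```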

block: clear all of bi_opf in bio_set_op_attrs · 93c5bdf7

Authored by Christoph Hellwig on Nov 21, 2016

Since commit 87374179 ("block: add a proper block layer data direction
encoding") we only OR the new op and flags into bi_opf in bio_set_op_attrs
instead of clearing the old value.  I've not seen any breakage with the
new behavior, but it seems dangerous.

Also convert it to an inline function to make the argument passing
safer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
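
The difference between OR-ing into the field and clearing it first can be shown with a small sketch. `struct bio_like` and both helper names are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one bio field that matters here. */
struct bio_like {
    uint32_t bi_opf;  /* op and flags packed into one word */
};

/* OR-only behavior: stale bits from an earlier setting survive. */
static inline void set_op_attrs_or(struct bio_like *bio, uint32_t op, uint32_t flags)
{
    bio->bi_opf |= op | flags;
}

/* Clearing behavior: plain assignment replaces the old value entirely. */
static inline void set_op_attrs(struct bio_like *bio, uint32_t op, uint32_t flags)
{
    bio->bi_opf = op | flags;
}
```

Writing the helper as an inline function rather than a macro also means each argument is evaluated once and type-checked, which is one way to read the "safer argument passing" remark in the commit message.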

pktcdvd: mark as unmaintained and deprecated · 5a8b187c

Authored by Jens Axboe on Nov 21, 2016

This driver is both orphaned, and not really useful anymore. Mark
it as such, and remove it in a future kernel after a release or
two.

Signed-off-by: Jens Axboe <axboe@fb.com>

18 Nov, 2016 · 8 commits

block: Change extern inline to static inline · 9a05e754

Authored by Tobias Klauser on Nov 18, 2016

With compilers which follow the C99 standard (like modern versions of
gcc and clang), "extern inline" does the opposite thing from older
versions of gcc (emits code for an externally linkable version of the
inline function).

"static inline" has the intended behavior in all cases instead.

Description taken from commit 6d91857d ("staging, rtl8192e,
LLVMLinux: Change extern inline to static inline").

This also fixes the following GCC warning when building with CONFIG_PM
disabled:

  ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes]

Fixes: d07ab6d1 ("block: Add blk_set_runtime_active()")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jens Axboe <axboe@fb.com>

skd_main: drop duplicate header scatterlist.h · 55f958cc

Authored by Geliang Tang on Nov 18, 2016

Drop duplicate header scatterlist.h from skd_main.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: document the 'io_poll_delay' queue sysfs file · 10e6246e

Authored by Jens Axboe on Nov 17, 2016

This was documented in the original commit, 64f1c21e, but it
never made it into the proper location for queue sysfs files.

Signed-off-by: Jens Axboe <axboe@fb.com>

block: new direct I/O implementation · 542ff7bf

Authored by Christoph Hellwig on Nov 16, 2016

Similar to the simple fast path, but we now need a dio structure to
track multiple-bio completions.  It's basically a cut-down version
of the new iomap-based direct I/O code for filesystems, but without
all the logic to call into the filesystem for extent lookup or
allocation, and without the complex I/O completion workqueue handler
for AIO - instead we just use the FUA bit on the bios to ensure
data is flushed to stable storage.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: make __blkdev_direct_IO_sync() support O_SYNC/DSYNC · 78250c02

Authored by Jens Axboe on Nov 17, 2016

Split the op setting code into a helper, use it in both places.

Signed-off-by: Jens Axboe <axboe@fb.com>

block: support a full bio worth of IO for simplified bdev direct-io · 72ecad22

Authored by Jens Axboe on Nov 16, 2016

Just alloc the bio_vec array if we exceed the inline limit.

Signed-off-by: Jens Axboe <axboe@fb.com>

blk-mq: make the polling code adaptive · 64f1c21e

Authored by Jens Axboe on Nov 14, 2016

The previous commit introduced the hybrid sleep/poll mode. Take
that one step further, and use the completion latencies to
automatically sleep for half the mean completion time. This is
a good approximation.

This changes the 'io_poll_delay' sysfs file a bit to expose the
various options. Depending on the value, the polling code will
behave differently:

-1	Never enter hybrid sleep mode
 0	Use half of the completion mean for the sleep delay
>0	Use this specific value as the sleep delay

Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
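
The three 'io_poll_delay' cases listed above can be sketched as a small selection function. The function name is hypothetical, and the units are an assumption: the setting is taken to be in microseconds and the mean completion time in nanoseconds, neither of which the commit message states.

```c
#include <stdint.h>

/*
 * Pick how long to sleep before polling starts.
 * io_poll_delay:   value of the sysfs setting (microseconds assumed)
 * mean_completion: mean completion time (nanoseconds assumed)
 * Returns the sleep time in nanoseconds; 0 means poll immediately.
 */
static uint64_t hybrid_sleep_nsec(int64_t io_poll_delay, uint64_t mean_completion)
{
    if (io_poll_delay < 0)
        return 0;                        /* -1: never enter hybrid sleep mode */
    if (io_poll_delay == 0)
        return mean_completion / 2;      /*  0: half of the completion mean */
    return (uint64_t)io_poll_delay * 1000;  /* >0: use this specific delay */
}
```

With a mean completion time of 8 usecs and the setting at 0, the sketch returns a 4 usec sleep, matching the 8 usecs / 4 usecs example given for the hybrid poll mode.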

blk-mq: implement hybrid poll mode for sync O_DIRECT · 06426adf

Authored by Jens Axboe on Nov 14, 2016

This patch enables a hybrid polling mode. Instead of polling after IO
submission, we can induce an artificial delay, and then poll after that.
For example, if the IO is presumed to complete in 8 usecs from now, we
can sleep for 4 usecs, wake up, and then do our polling. This still puts
a sleep/wakeup cycle in the IO path, but instead of the wakeup happening
after the IO has completed, it'll happen before. With this hybrid
scheme, we can achieve big latency reductions while still using the same
(or less) amount of CPU.

Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>

openeuler / Kernel · synced successfully more than 1 year ago