提交 · b9c54f5660e7eff10dd2ddd1eae554573105b15d · openeuler / Kernel

27 5月, 2020 12 次提交

block: account merge of two requests · b9c54f56

由 Konstantin Khlebnikov 提交于 5月 27, 2020

Also rename blk_account_io_merge() into blk_account_io_merge_request() to
distinguish it from merging request and bio.

[hch: rebased]
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b9c54f56

block: always use a percpu variable for disk stats · 58d4f14f

由 Christoph Hellwig 提交于 5月 27, 2020

percpu variables have a perfectly fine working stub implementation
for UP kernels, so use that.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

58d4f14f

block: move update_io_ticks to blk-core.c · 9123bf6f

由 Christoph Hellwig 提交于 5月 27, 2020

All callers are in blk-core.c, so move update_io_ticks over.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9123bf6f

block: remove generic_{start,end}_io_acct · e722fff2

由 Christoph Hellwig 提交于 5月 27, 2020

Remove these now unused functions.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e722fff2

zram: nvdimm: use bio_{start,end}_io_acct and disk_{start,end}_io_acct · d7614e44

由 Christoph Hellwig 提交于 5月 27, 2020

Switch zram to use the nicer bio accounting helpers, and as part of that
ensure each bio is counted as a single I/O request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d7614e44

nvdimm: use bio_{start,end}_io_acct · 0fd92f89

由 Christoph Hellwig 提交于 5月 27, 2020

Switch dm to use the nicer bio accounting helpers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0fd92f89

dm: use bio_{start,end}_io_acct · 86240d5b

由 Christoph Hellwig 提交于 5月 27, 2020

Switch dm to use the nicer bio accounting helpers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

86240d5b

bcache: use bio_{start,end}_io_acct · 85750aeb

由 Christoph Hellwig 提交于 5月 27, 2020

Switch bcache to use the nicer bio accounting helpers, and call the
routines where we also sample the start time to give coherent accounting
results.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

85750aeb

lightnvm/pblk: use bio_{start,end}_io_acct · a8e45650

由 Christoph Hellwig 提交于 5月 27, 2020

Switch rsxx to use the nicer bio accounting helpers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a8e45650

rsxx: use bio_{start,end}_io_acct · 421716bc

由 Christoph Hellwig 提交于 5月 27, 2020

Switch rsxx to use the nicer bio accounting helpers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

421716bc

drbd: use bio_{start,end}_io_acct · 24d69293

由 Christoph Hellwig 提交于 5月 27, 2020

Switch drbd to use the nicer bio accounting helpers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

24d69293

block: add disk/bio-based accounting helpers · 956d510e

由 Christoph Hellwig 提交于 5月 27, 2020

Add two new helpers to simplify I/O accounting for bio based drivers.
Currently these drivers use the generic_start_io_acct and
generic_end_io_acct helpers which have very cumbersome calling
conventions, don't actually return the time they started accounting,
and try to deal with accounting for partitions, which can't happen
for bio based drivers.  The new helpers will be used to subsequently
replace uses of the old helpers.

The main API is the bio based wrappes in blkdev.h, but for zram
which wants to account rw_page based I/O lower level routines are
provided as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

956d510e

22 5月, 2020 2 次提交

block: remove the disk and queue NULL checks in blkdev_issue_flush · c81b49d4

由 Christoph Hellwig 提交于 5月 13, 2020

Both of these never can be NULL for a live block device.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c81b49d4

block: remove the error_sector argument to blkdev_issue_flush · 9398554f

由 Christoph Hellwig 提交于 5月 13, 2020

The argument isn't used by any caller, and drivers don't fill out
bi_sector for flush requests either.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9398554f

19 5月, 2020 13 次提交

block: Remove unused flush_queue_delayed in struct blk_flush_queue · 172ce41d

由 Baolin Wang 提交于 5月 17, 2020

The flush_queue_delayed was introdued to hold queue if flush is
running for non-queueable flush drive by commit 3ac0cc45
("hold queue if flush is running for non-queueable flush drive"),
but the non mq parts of the flush code had been removed by
commit 7e992f84 ("block: remove non mq parts from the flush code"),
as well as removing the usage of the flush_queue_delayed flag.
Thus remove the unused flush_queue_delayed flag.
Signed-off-by: NBaolin Wang <baolin.wang7@gmail.com>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

172ce41d

null_blk: Zero-initialize read buffers in non-memory-backed mode · cecbc9ce

由 Bart Van Assche 提交于 5月 18, 2020

This patch suppresses an uninteresting KMSAN complaint without affecting
performance of the null_blk driver if CONFIG_KMSAN is disabled.
Reported-by: NAlexander Potapenko <glider@google.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Tested-by: NAlexander Potapenko <glider@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cecbc9ce

block: Document the bio_vec properties · 854b5f01

由 Bart Van Assche 提交于 5月 18, 2020

Since it is nontrivial that nth_page() does not have to be used for a
bio_vec, document this.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
CC: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

854b5f01

bio.h: Declare the arguments of the bio iteration functions const · c1527c0e

由 Bart Van Assche 提交于 5月 18, 2020

This change makes it possible to pass 'const struct bio *' arguments to
these functions.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c1527c0e

block: Fix type of first compat_put_{,u}long() argument · c8210a57

由 Bart Van Assche 提交于 5月 18, 2020

This patch fixes the following sparse warnings:

block/ioctl.c:209:16: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:209:16: expected void const volatile [noderef] <asn:1> *
block/ioctl.c:209:16: got signed int [usertype] *argp
block/ioctl.c:214:16: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:214:16: expected void const volatile [noderef] <asn:1> *
block/ioctl.c:214:16: got unsigned int [usertype] *argp
block/ioctl.c:666:40: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:666:40: expected signed int [usertype] *argp
block/ioctl.c:666:40: got void [noderef] <asn:1> *argp
block/ioctl.c:672:41: warning: incorrect type in argument 1 (different address spaces)
block/ioctl.c:672:41: expected unsigned int [usertype] *argp
block/ioctl.c:672:41: got void [noderef] <asn:1> *argp

Fixes: 9b81648c ("compat_ioctl: simplify up block/ioctl.c")
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c8210a57

block: merge part_{inc,dev}_in_flight into their only callers · 10ec5e86

由 Christoph Hellwig 提交于 5月 13, 2020

part_inc_in_flight and part_dec_in_flight only have one caller each, and
those callers are purely for bio based drivers.  Merge each function into
the only caller, and remove the superflous blk-mq checks.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

10ec5e86

block: don't call part_{inc,dec}_in_flight for blk-mq devices · 76268f3a

由 Christoph Hellwig 提交于 5月 13, 2020

part_inc_in_flight and part_dec_in_flight are no-ops for blk-mq queues,
so remove the calls in purely blk-mq callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

76268f3a

block: move the blk-mq calls out of part_in_flight{,_rw} · b2f609e1

由 Christoph Hellwig 提交于 5月 13, 2020

Don't bother to call part_in_flight / part_in_flight_rw on blk-mq
devices, just call the blk-mq versions directly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b2f609e1

C
block: mark blk_account_io_completion static · f1394b79
由 Christoph Hellwig 提交于 5月 13, 2020
```
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
```
f1394b79

blk-mq: allow blk_mq_make_request to consume the q_usage_counter reference · ac7c5675

由 Christoph Hellwig 提交于 5月 16, 2020

blk_mq_make_request currently needs to grab an q_usage_counter
reference when allocating a request.  This is because the block layer
grabs one before calling blk_mq_make_request, but also releases it as
soon as blk_mq_make_request returns.  Remove the blk_queue_exit call
after blk_mq_make_request returns, and instead let it consume the
reference.  This works perfectly fine for the block layer caller, just
device mapper needs an extra reference as the old problem still
persists there.  Open code blk_queue_enter_live in device mapper,
as there should be no other callers and this allows better documenting
why we do a non-try get.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ac7c5675

blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request_hctx · 35b371ff

由 Christoph Hellwig 提交于 5月 16, 2020

No need for two queue references.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

35b371ff

blk-mq: remove a pointless queue enter pair in blk_mq_alloc_request · 22fa792c

由 Christoph Hellwig 提交于 5月 16, 2020

No need for two queue references.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

22fa792c

blk-mq: move the call to blk_queue_enter_live out of blk_mq_get_request · a5ea5811

由 Christoph Hellwig 提交于 5月 16, 2020

Move the blk_queue_enter_live calls into the callers, where they can
successively be cleaned up.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a5ea5811

17 5月, 2020 2 次提交

blktrace: Report pid with note messages · 870c153c

由 Jan Kara 提交于 5月 13, 2020

Currently informational messages within block trace do not have PID
information of the process reporting the message included. With BFQ it
is sometimes useful to have the information and there's no good reason
to omit the information from the trace. So just fill in pid information
when generating note message.
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

870c153c

block: remove the REQ_NOWAIT_INLINE flag · 2771cefe

由 Christoph Hellwig 提交于 5月 04, 2020

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2771cefe

14 5月, 2020 7 次提交

block: blk-crypto-fallback for Inline Encryption · 488f6682

由 Satya Tangirala 提交于 5月 14, 2020

Blk-crypto delegates crypto operations to inline encryption hardware
when available. The separately configurable blk-crypto-fallback contains
a software fallback to the kernel crypto API - when enabled, blk-crypto
will use this fallback for en/decryption when inline encryption hardware
is not available.

This lets upper layers not have to worry about whether or not the
underlying device has support for inline encryption before deciding to
specify an encryption context for a bio. It also allows for testing
without actual inline encryption hardware - in particular, it makes it
possible to test the inline encryption code in ext4 and f2fs simply by
running xfstests with the inlinecrypt mount option, which in turn allows
for things like the regular upstream regression testing of ext4 to cover
the inline encryption code paths.

For more details, refer to Documentation/block/inline-encryption.rst.
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

488f6682

block: Make blk-integrity preclude hardware inline encryption · d145dc23

由 Satya Tangirala 提交于 5月 14, 2020

Whenever a device supports blk-integrity, make the kernel pretend that
the device doesn't support inline encryption (essentially by setting the
keyslot manager in the request queue to NULL).

There's no hardware currently that supports both integrity and inline
encryption. However, it seems possible that there will be such hardware
in the near future (like the NVMe key per I/O support that might support
both inline encryption and PI).

But properly integrating both features is not trivial, and without
real hardware that implements both, it is difficult to tell if it will
be done correctly by the majority of hardware that support both.
So it seems best not to support both features together right now, and
to decide what to do at probe time.
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d145dc23

block: Inline encryption support for blk-mq · a892c8d5

由 Satya Tangirala 提交于 5月 14, 2020

We must have some way of letting a storage device driver know what
encryption context it should use for en/decrypting a request. However,
it's the upper layers (like the filesystem/fscrypt) that know about and
manages encryption contexts. As such, when the upper layer submits a bio
to the block layer, and this bio eventually reaches a device driver with
support for inline encryption, the device driver will need to have been
told the encryption context for that bio.

We want to communicate the encryption context from the upper layer to the
storage device along with the bio, when the bio is submitted to the block
layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
represent an encryption context (note that we can't use the bi_private
field in struct bio to do this because that field does not function to pass
information across layers in the storage stack). We also introduce various
functions to manipulate the bio_crypt_ctx and make the bio/request merging
logic aware of the bio_crypt_ctx.

We also make changes to blk-mq to make it handle bios with encryption
contexts. blk-mq can merge many bios into the same request. These bios need
to have contiguous data unit numbers (the necessary changes to blk-merge
are also made to ensure this) - as such, it suffices to keep the data unit
number of just the first bio, since that's all a storage driver needs to
infer the data unit number to use for each data block in each bio in a
request. blk-mq keeps track of the encryption context to be used for all
the bios in a request with the request's rq_crypt_ctx. When the first bio
is added to an empty request, blk-mq will program the encryption context
of that bio into the request_queue's keyslot manager, and store the
returned keyslot in the request's rq_crypt_ctx. All the functions to
operate on encryption contexts are in blk-crypto.c.

Upper layers only need to call bio_crypt_set_ctx with the encryption key,
algorithm and data_unit_num; they don't have to worry about getting a
keyslot for each encryption context, as blk-mq/blk-crypto handles that.
Blk-crypto also makes it possible for request-based layered devices like
dm-rq to make use of inline encryption hardware by cloning the
rq_crypt_ctx and programming a keyslot in the new request_queue when
necessary.

Note that any user of the block layer can submit bios with an
encryption context, such as filesystems, device-mapper targets, etc.
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a892c8d5

block: Keyslot Manager for Inline Encryption · 1b262839

由 Satya Tangirala 提交于 5月 14, 2020

Inline Encryption hardware allows software to specify an encryption context
(an encryption key, crypto algorithm, data unit num, data unit size) along
with a data transfer request to a storage device, and the inline encryption
hardware will use that context to en/decrypt the data. The inline
encryption hardware is part of the storage device, and it conceptually sits
on the data path between system memory and the storage device.

Inline Encryption hardware implementations often function around the
concept of "keyslots". These implementations often have a limited number
of "keyslots", each of which can hold a key (we say that a key can be
"programmed" into a keyslot). Requests made to the storage device may have
a keyslot and a data unit number associated with them, and the inline
encryption hardware will en/decrypt the data in the requests using the key
programmed into that associated keyslot and the data unit number specified
with the request.

As keyslots are limited, and programming keys may be expensive in many
implementations, and multiple requests may use exactly the same encryption
contexts, we introduce a Keyslot Manager to efficiently manage keyslots.

We also introduce a blk_crypto_key, which will represent the key that's
programmed into keyslots managed by keyslot managers. The keyslot manager
also functions as the interface that upper layers will use to program keys
into inline encryption hardware. For more information on the Keyslot
Manager, refer to documentation found in block/keyslot-manager.c and
linux/keyslot-manager.h.
Co-developed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1b262839

Documentation: Document the blk-crypto framework · 54b259f6

由 Satya Tangirala 提交于 5月 14, 2020

The blk-crypto framework adds support for inline encryption. There are
numerous changes throughout the storage stack. This patch documents the
main design choices in the block layer, the API presented to users of
the block layer (like fscrypt or layered devices) and the API presented
to drivers for adding support for inline encryption.
Signed-off-by: NSatya Tangirala <satyat@google.com>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

54b259f6

iocost: don't let vrate run wild while there's no saturation signal · 81ca627a

由 Tejun Heo 提交于 10月 14, 2019

When the QoS targets are met and nothing is being throttled, there's
no way to tell how saturated the underlying device is - it could be
almost entirely idle, at the cusp of saturation or anywhere inbetween.
Given that there's no information, it's best to keep vrate as-is in
this state.  Before 7cd806a9 ("iocost: improve nr_lagging
handling"), this was the case - if the device isn't missing QoS
targets and nothing is being throttled, busy_level was reset to zero.

While fixing nr_lagging handling, 7cd806a9 ("iocost: improve
nr_lagging handling") broke this.  Now, while the device is hitting
QoS targets and nothing is being throttled, vrate keeps getting
adjusted according to the existing busy_level.

This led to vrate keeping climing till it hits max when there's an IO
issuer with limited request concurrency if the vrate started low.
vrate starts getting adjusted upwards until the issuer can issue IOs
w/o being throttled.  From then on, QoS targets keeps getting met and
nothing on the system needs throttling and vrate keeps getting
increased due to the existing busy_level.

This patch makes the following changes to the busy_level logic.

* Reset busy_level if nr_shortages is zero to avoid the above
  scenario.

* Make non-zero nr_lagging block lowering nr_level but still clear
  positive busy_level if there's clear non-saturation signal - QoS
  targets are met and nr_shortages is non-zero.  nr_lagging's role is
  preventing adjusting vrate upwards while there are long-running
  commands and it shouldn't keep busy_level positive while there's
  clear non-saturation signal.

* Restructure code for clarity and add comments.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NAndy Newell <newella@fb.com>
Fixes: 7cd806a9 ("iocost: improve nr_lagging handling")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

81ca627a

block: move blk_io_schedule() out of header file · 71ac860a

由 Ming Lei 提交于 5月 14, 2020

blk_io_schedule() isn't called from performance sensitive code path, and
it is easier to maintain by exporting it as symbol.

Also blk_io_schedule() is only called by CONFIG_BLOCK code, so it is safe
to do this way. Meantime fixes build failure when CONFIG_BLOCK is off.

Cc: Christoph Hellwig <hch@infradead.org>
Fixes: e6249cdd ("block: add blk_io_schedule() for avoiding task hung in sync dio")
Reported-by: NSatya Tangirala <satyat@google.com>
Tested-by: NSatya Tangirala <satyat@google.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

71ac860a

13 5月, 2020 4 次提交

zonefs: use REQ_OP_ZONE_APPEND for sync DIO · 02ef12a6

由 Johannes Thumshirn 提交于 5月 12, 2020

Synchronous direct I/O to a sequential write only zone can be issued using
the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple
BIOs can potentially result in reordering, we cannot support asynchronous
IO via this interface.

We also can only dispatch up to queue_max_zone_append_sectors() via the
new zone-append method and have to return a short write back to user-space
in case an IO larger than queue_max_zone_append_sectors() has been issued.
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

02ef12a6

block: export bio_release_pages and bio_iov_iter_get_pages · 29b2a3aa

由 Johannes Thumshirn 提交于 5月 12, 2020

Export bio_release_pages and bio_iov_iter_get_pages, so they can be used
from modular code.
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

29b2a3aa

null_blk: Support REQ_OP_ZONE_APPEND · e0489ed5

由 Damien Le Moal 提交于 5月 12, 2020

Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned
mode enabled. Use the internally tracked zone write pointer position
as the actual write position and return it using the command request
__sector field in the case of an mq device and using the command BIO
sector in the case of a BIO device.
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e0489ed5

scsi: sd_zbc: emulate ZONE_APPEND commands · 5795eb44

由 Johannes Thumshirn 提交于 5月 12, 2020

Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command
with a start LBA set to the target zone write pointer position.

In order to always know the write pointer position of a sequential write
zone, the write pointer of all zones is tracked using an array of 32bits
zone write pointer offset attached to the scsi disk structure. Each
entry of the array indicate a zone write pointer position relative to
the zone start sector. The write pointer offsets are maintained in sync
with the device as follows:
1) the write pointer offset of a zone is reset to 0 when a
   REQ_OP_ZONE_RESET command completes.
2) the write pointer offset of a zone is set to the zone size when a
   REQ_OP_ZONE_FINISH command completes.
3) the write pointer offset of a zone is incremented by the number of
   512B sectors written when a write, write same or a zone append
   command completes.
4) the write pointer offset of all zones is reset to 0 when a
   REQ_OP_ZONE_RESET_ALL command completes.

Since the block layer does not write lock zones for zone append
commands, to ensure a sequential ordering of the regular write commands
used for the emulation, the target zone of a zone append command is
locked when the function sd_zbc_prepare_zone_append() is called from
sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained
(e.g. a zone append is in-flight or a regular write has already locked
the zone), the zone append command dispatching is delayed by returning
BLK_STS_ZONE_RESOURCE.

To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL
requests, use a spinlock to protect accesses and modifications of the
zone write pointer offsets. This spinlock is initialized from sd_probe()
using the new function sd_zbc_init().
Co-developed-by: NDamien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5795eb44

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功