- 27 5月, 2020 12 次提交
-
-
由 Konstantin Khlebnikov 提交于
Also rename blk_account_io_merge() into blk_account_io_merge_request() to distinguish it from merging request and bio. [hch: rebased] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
percpu variables have a perfectly fine working stub implementation for UP kernels, so use that. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
All callers are in blk-core.c, so move update_io_ticks over. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Remove these now unused functions. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch zram to use the nicer bio accounting helpers, and as part of that ensure each bio is counted as a single I/O request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch dm to use the nicer bio accounting helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch dm to use the nicer bio accounting helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch bcache to use the nicer bio accounting helpers, and call the routines where we also sample the start time to give coherent accounting results. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch rsxx to use the nicer bio accounting helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch rsxx to use the nicer bio accounting helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Switch drbd to use the nicer bio accounting helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Add two new helpers to simplify I/O accounting for bio based drivers. Currently these drivers use the generic_start_io_acct and generic_end_io_acct helpers which have very cumbersome calling conventions, don't actually return the time they started accounting, and try to deal with accounting for partitions, which can't happen for bio based drivers. The new helpers will be used to subsequently replace uses of the old helpers. The main API is the bio based wrappes in blkdev.h, but for zram which wants to account rw_page based I/O lower level routines are provided as well. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 5月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
Both of these never can be NULL for a live block device. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 5月, 2020 13 次提交
-
-
由 Baolin Wang 提交于
The flush_queue_delayed was introdued to hold queue if flush is running for non-queueable flush drive by commit 3ac0cc45 ("hold queue if flush is running for non-queueable flush drive"), but the non mq parts of the flush code had been removed by commit 7e992f84 ("block: remove non mq parts from the flush code"), as well as removing the usage of the flush_queue_delayed flag. Thus remove the unused flush_queue_delayed flag. Signed-off-by: NBaolin Wang <baolin.wang7@gmail.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
This patch suppresses an uninteresting KMSAN complaint without affecting performance of the null_blk driver if CONFIG_KMSAN is disabled. Reported-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Tested-by: NAlexander Potapenko <glider@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Since it is nontrivial that nth_page() does not have to be used for a bio_vec, document this. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> CC: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
This change makes it possible to pass 'const struct bio *' arguments to these functions. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
This patch fixes the following sparse warnings: block/ioctl.c:209:16: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:209:16: expected void const volatile [noderef] <asn:1> * block/ioctl.c:209:16: got signed int [usertype] *argp block/ioctl.c:214:16: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:214:16: expected void const volatile [noderef] <asn:1> * block/ioctl.c:214:16: got unsigned int [usertype] *argp block/ioctl.c:666:40: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:666:40: expected signed int [usertype] *argp block/ioctl.c:666:40: got void [noderef] <asn:1> *argp block/ioctl.c:672:41: warning: incorrect type in argument 1 (different address spaces) block/ioctl.c:672:41: expected unsigned int [usertype] *argp block/ioctl.c:672:41: got void [noderef] <asn:1> *argp Fixes: 9b81648c ("compat_ioctl: simplify up block/ioctl.c") Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
part_inc_in_flight and part_dec_in_flight only have one caller each, and those callers are purely for bio based drivers. Merge each function into the only caller, and remove the superflous blk-mq checks. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
part_inc_in_flight and part_dec_in_flight are no-ops for blk-mq queues, so remove the calls in purely blk-mq callers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Don't bother to call part_in_flight / part_in_flight_rw on blk-mq devices, just call the blk-mq versions directly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
blk_mq_make_request currently needs to grab an q_usage_counter reference when allocating a request. This is because the block layer grabs one before calling blk_mq_make_request, but also releases it as soon as blk_mq_make_request returns. Remove the blk_queue_exit call after blk_mq_make_request returns, and instead let it consume the reference. This works perfectly fine for the block layer caller, just device mapper needs an extra reference as the old problem still persists there. Open code blk_queue_enter_live in device mapper, as there should be no other callers and this allows better documenting why we do a non-try get. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
No need for two queue references. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
No need for two queue references. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Move the blk_queue_enter_live calls into the callers, where they can successively be cleaned up. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 5月, 2020 2 次提交
-
-
由 Jan Kara 提交于
Currently informational messages within block trace do not have PID information of the process reporting the message included. With BFQ it is sometimes useful to have the information and there's no good reason to omit the information from the trace. So just fill in pid information when generating note message. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: NPaolo Valente <paolo.valente@linaro.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 5月, 2020 7 次提交
-
-
由 Satya Tangirala 提交于
Blk-crypto delegates crypto operations to inline encryption hardware when available. The separately configurable blk-crypto-fallback contains a software fallback to the kernel crypto API - when enabled, blk-crypto will use this fallback for en/decryption when inline encryption hardware is not available. This lets upper layers not have to worry about whether or not the underlying device has support for inline encryption before deciding to specify an encryption context for a bio. It also allows for testing without actual inline encryption hardware - in particular, it makes it possible to test the inline encryption code in ext4 and f2fs simply by running xfstests with the inlinecrypt mount option, which in turn allows for things like the regular upstream regression testing of ext4 to cover the inline encryption code paths. For more details, refer to Documentation/block/inline-encryption.rst. Signed-off-by: NSatya Tangirala <satyat@google.com> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Satya Tangirala 提交于
Whenever a device supports blk-integrity, make the kernel pretend that the device doesn't support inline encryption (essentially by setting the keyslot manager in the request queue to NULL). There's no hardware currently that supports both integrity and inline encryption. However, it seems possible that there will be such hardware in the near future (like the NVMe key per I/O support that might support both inline encryption and PI). But properly integrating both features is not trivial, and without real hardware that implements both, it is difficult to tell if it will be done correctly by the majority of hardware that support both. So it seems best not to support both features together right now, and to decide what to do at probe time. Signed-off-by: NSatya Tangirala <satyat@google.com> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Satya Tangirala 提交于
We must have some way of letting a storage device driver know what encryption context it should use for en/decrypting a request. However, it's the upper layers (like the filesystem/fscrypt) that know about and manages encryption contexts. As such, when the upper layer submits a bio to the block layer, and this bio eventually reaches a device driver with support for inline encryption, the device driver will need to have been told the encryption context for that bio. We want to communicate the encryption context from the upper layer to the storage device along with the bio, when the bio is submitted to the block layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can represent an encryption context (note that we can't use the bi_private field in struct bio to do this because that field does not function to pass information across layers in the storage stack). We also introduce various functions to manipulate the bio_crypt_ctx and make the bio/request merging logic aware of the bio_crypt_ctx. We also make changes to blk-mq to make it handle bios with encryption contexts. blk-mq can merge many bios into the same request. These bios need to have contiguous data unit numbers (the necessary changes to blk-merge are also made to ensure this) - as such, it suffices to keep the data unit number of just the first bio, since that's all a storage driver needs to infer the data unit number to use for each data block in each bio in a request. blk-mq keeps track of the encryption context to be used for all the bios in a request with the request's rq_crypt_ctx. When the first bio is added to an empty request, blk-mq will program the encryption context of that bio into the request_queue's keyslot manager, and store the returned keyslot in the request's rq_crypt_ctx. All the functions to operate on encryption contexts are in blk-crypto.c. Upper layers only need to call bio_crypt_set_ctx with the encryption key, algorithm and data_unit_num; they don't have to worry about getting a keyslot for each encryption context, as blk-mq/blk-crypto handles that. Blk-crypto also makes it possible for request-based layered devices like dm-rq to make use of inline encryption hardware by cloning the rq_crypt_ctx and programming a keyslot in the new request_queue when necessary. Note that any user of the block layer can submit bios with an encryption context, such as filesystems, device-mapper targets, etc. Signed-off-by: NSatya Tangirala <satyat@google.com> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Satya Tangirala 提交于
Inline Encryption hardware allows software to specify an encryption context (an encryption key, crypto algorithm, data unit num, data unit size) along with a data transfer request to a storage device, and the inline encryption hardware will use that context to en/decrypt the data. The inline encryption hardware is part of the storage device, and it conceptually sits on the data path between system memory and the storage device. Inline Encryption hardware implementations often function around the concept of "keyslots". These implementations often have a limited number of "keyslots", each of which can hold a key (we say that a key can be "programmed" into a keyslot). Requests made to the storage device may have a keyslot and a data unit number associated with them, and the inline encryption hardware will en/decrypt the data in the requests using the key programmed into that associated keyslot and the data unit number specified with the request. As keyslots are limited, and programming keys may be expensive in many implementations, and multiple requests may use exactly the same encryption contexts, we introduce a Keyslot Manager to efficiently manage keyslots. We also introduce a blk_crypto_key, which will represent the key that's programmed into keyslots managed by keyslot managers. The keyslot manager also functions as the interface that upper layers will use to program keys into inline encryption hardware. For more information on the Keyslot Manager, refer to documentation found in block/keyslot-manager.c and linux/keyslot-manager.h. Co-developed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NSatya Tangirala <satyat@google.com> Reviewed-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Satya Tangirala 提交于
The blk-crypto framework adds support for inline encryption. There are numerous changes throughout the storage stack. This patch documents the main design choices in the block layer, the API presented to users of the block layer (like fscrypt or layered devices) and the API presented to drivers for adding support for inline encryption. Signed-off-by: NSatya Tangirala <satyat@google.com> Reviewed-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Tejun Heo 提交于
When the QoS targets are met and nothing is being throttled, there's no way to tell how saturated the underlying device is - it could be almost entirely idle, at the cusp of saturation or anywhere inbetween. Given that there's no information, it's best to keep vrate as-is in this state. Before 7cd806a9 ("iocost: improve nr_lagging handling"), this was the case - if the device isn't missing QoS targets and nothing is being throttled, busy_level was reset to zero. While fixing nr_lagging handling, 7cd806a9 ("iocost: improve nr_lagging handling") broke this. Now, while the device is hitting QoS targets and nothing is being throttled, vrate keeps getting adjusted according to the existing busy_level. This led to vrate keeping climing till it hits max when there's an IO issuer with limited request concurrency if the vrate started low. vrate starts getting adjusted upwards until the issuer can issue IOs w/o being throttled. From then on, QoS targets keeps getting met and nothing on the system needs throttling and vrate keeps getting increased due to the existing busy_level. This patch makes the following changes to the busy_level logic. * Reset busy_level if nr_shortages is zero to avoid the above scenario. * Make non-zero nr_lagging block lowering nr_level but still clear positive busy_level if there's clear non-saturation signal - QoS targets are met and nr_shortages is non-zero. nr_lagging's role is preventing adjusting vrate upwards while there are long-running commands and it shouldn't keep busy_level positive while there's clear non-saturation signal. * Restructure code for clarity and add comments. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NAndy Newell <newella@fb.com> Fixes: 7cd806a9 ("iocost: improve nr_lagging handling") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
blk_io_schedule() isn't called from performance sensitive code path, and it is easier to maintain by exporting it as symbol. Also blk_io_schedule() is only called by CONFIG_BLOCK code, so it is safe to do this way. Meantime fixes build failure when CONFIG_BLOCK is off. Cc: Christoph Hellwig <hch@infradead.org> Fixes: e6249cdd ("block: add blk_io_schedule() for avoiding task hung in sync dio") Reported-by: NSatya Tangirala <satyat@google.com> Tested-by: NSatya Tangirala <satyat@google.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 13 5月, 2020 4 次提交
-
-
由 Johannes Thumshirn 提交于
Synchronous direct I/O to a sequential write only zone can be issued using the new REQ_OP_ZONE_APPEND request operation. As dispatching multiple BIOs can potentially result in reordering, we cannot support asynchronous IO via this interface. We also can only dispatch up to queue_max_zone_append_sectors() via the new zone-append method and have to return a short write back to user-space in case an IO larger than queue_max_zone_append_sectors() has been issued. Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Johannes Thumshirn 提交于
Export bio_release_pages and bio_iov_iter_get_pages, so they can be used from modular code. Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
Support REQ_OP_ZONE_APPEND requests for null_blk devices with zoned mode enabled. Use the internally tracked zone write pointer position as the actual write position and return it using the command request __sector field in the case of an mq device and using the command BIO sector in the case of a BIO device. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Johannes Thumshirn 提交于
Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command with a start LBA set to the target zone write pointer position. In order to always know the write pointer position of a sequential write zone, the write pointer of all zones is tracked using an array of 32bits zone write pointer offset attached to the scsi disk structure. Each entry of the array indicate a zone write pointer position relative to the zone start sector. The write pointer offsets are maintained in sync with the device as follows: 1) the write pointer offset of a zone is reset to 0 when a REQ_OP_ZONE_RESET command completes. 2) the write pointer offset of a zone is set to the zone size when a REQ_OP_ZONE_FINISH command completes. 3) the write pointer offset of a zone is incremented by the number of 512B sectors written when a write, write same or a zone append command completes. 4) the write pointer offset of all zones is reset to 0 when a REQ_OP_ZONE_RESET_ALL command completes. Since the block layer does not write lock zones for zone append commands, to ensure a sequential ordering of the regular write commands used for the emulation, the target zone of a zone append command is locked when the function sd_zbc_prepare_zone_append() is called from sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained (e.g. a zone append is in-flight or a regular write has already locked the zone), the zone append command dispatching is delayed by returning BLK_STS_ZONE_RESOURCE. To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and modifications of the zone write pointer offsets. This spinlock is initialized from sd_probe() using the new function sd_zbc_init(). Co-developed-by: NDamien Le Moal <Damien.LeMoal@wdc.com> Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-