- 30 12月, 2019 1 次提交
-
-
由 Ming Lei 提交于
We ran into a problem with a mpt3sas based controller, where we would see random (and hard to reproduce) file corruption). The issue seemed specific to this controller, but wasn't specific to the file system. After a lot of debugging, we find out that it's caused by segments spanning a 4G memory boundary. This shouldn't happen, as the default setting for segment boundary masks is 4G. Turns out there are two issues in get_max_segment_size(): 1) The default segment boundary mask is bypassed 2) The segment start address isn't taken into account when checking segment boundary limit Fix these two issues by removing the bypass of the segment boundary check even if the mask is set to the default value, and taking into account the actual start address of the request when checking if a segment needs splitting. Cc: stable@vger.kernel.org # v5.1+ Reviewed-by: NChris Mason <clm@fb.com> Tested-by: NChris Mason <clm@fb.com> Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: NMing Lei <ming.lei@redhat.com> Dropped const on the page pointer, ppc page_to_phys() doesn't mark the page as const... Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 11月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
We really don't need this, as the slow path will do the right thing anyway. This reverts commit 6952a7f8. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 11月, 2019 2 次提交
-
-
由 Ming Lei 提交于
64K PAGE_SIZE is popular on ARM64 or other ARCHs, and 64K has been big enough to break some devices probably, so change the logic to split bio if the only bvec's length is > SZ_4K instead of PAGE_SIZE. Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations) Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Some device may set segment boundary as PAGE_SIZE - 1. If the bvec crosses pages, and meantime its length is <= PAGE_SIZE, we still need to split the bvec into 2 segments. Fixes this issue by still splitting bio if the single bvec crosses pages. Reported-by: Nkernel test robot <lkp@intel.com> Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations) Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 11月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
__blk_queue_split() adds significant overhead for small I/O operations. Add a shortcut to avoid it for cases where we know we never need to split. Based on a patch from Ming Lei. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 8月, 2019 5 次提交
-
-
由 Bart Van Assche 提交于
Consider the following example: * The logical block size is 4 KB. * The physical block size is 8 KB. * max_sectors equals (16 KB >> 9) sectors. * A non-aligned 4 KB and an aligned 64 KB bio are merged into a single non-aligned 68 KB bio. The current behavior is to split such a bio into (16 KB + 16 KB + 16 KB + 16 KB + 4 KB). The start of none of these five bio's is aligned to a physical block boundary. This patch ensures that such a bio is split into four aligned and one non-aligned bio instead of being split into five non-aligned bios. This improves performance because most block devices can handle aligned requests faster than non-aligned requests. Since the physical block size is larger than or equal to the logical block size, this patch preserves the guarantee that the returned value is a multiple of the logical block size. Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Move the max_sectors check into bvec_split_segs() such that a single call to that function can do all the necessary checks. This patch optimizes the fast path further, namely if a bvec fits in a page. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Simplify this function by by removing two if-tests. Other than requiring that the @sectors pointer is not NULL, this patch does not change the behavior of bvec_split_segs(). Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Since what the bio splitting functions do is nontrivial, document these functions. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Bart Van Assche 提交于
Make it clear to the compiler and also to humans that the functions that query request queue properties do not modify any member of the request_queue data structure. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 7月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Fix a regression introduced when removing bi_phys_segments for Write Zeroes requests, which need to have a segment count of zero, as they don't have a payload. Fixes: 14ccb66b ("block: remove the bi_phys_segments field in struct bio") Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 6月, 2019 3 次提交
-
-
由 Christoph Hellwig 提交于
Now that we don't need to assign the front/back segment sizes, we can duplicating the segs assignment for the split vs no-split case and remove a whole chunk of boilerplate code. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Return the segement and let the callers assign them, which makes the code a littler more obvious. Also pass the request instead of q plus bio chain, allowing for the use of rq_for_each_bvec. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
We only need the number of segments in the blk-mq submission path. Remove the field from struct bio, and return it from a variant of blk_queue_split instead of that it can passed as an argument to those functions that need the value. This also means we stop recounting segments except for cloning and partial segments. To keep the number of arguments in this how path down remove pointless struct request_queue arguments from any of the functions that had it and grew a nr_segs argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 5月, 2019 3 次提交
-
-
由 Christoph Hellwig 提交于
At this point these fields aren't used for anything, so we can remove them. Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
We fundamentally do not have a maximum segement size for devices with a virt boundary. So don't bother checking it, especially given that the existing checks didn't properly work to start with as we never fully update the front/back segment size and miss the bi_seg_front_size that wuld have been required for some cases. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Currently ll_merge_requests_fn, unlike all other merge functions, reduces nr_phys_segments by one if the last segment of the previous, and the first segment of the next segement are contigous. While this seems like a nice solution to avoid building smaller than possible requests it causes a mismatch between the segments actually present in the request and those iterated over by the bvec iterators, including __rq_for_each_bio. This can for example mistrigger the single segment optimization in the nvme-pci driver, and might lead to mismatching nr_phys_segments number when recalculating the number of request when inserting a cloned request. We could possibly work around this by making the bvec iterators take the front and back segment size into account, but that would require moving them from the bio to the bio_iter and spreading this mess over all users of bvecs. Or we could simply remove this optimization under the assumption that most users already build good enough bvecs, and that the bio merge patch never cared about this optimization either. The latter is what this patch does. dff824b2 ("nvme-pci: optimize mapping of small single segment requests"). Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 4月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
While we generally allow scatterlists to have offsets larger than page size for an entry, and other subsystems like the crypto code make use of that, the block layer isn't quite ready for that. Flip the switch back to avoid them for now, and revisit that decision early in a merge window once the known offenders are fixed. Fixes: 8a96a0e4 ("block: rewrite blk_bvec_map_sg to avoid a nth_page call") Reviewed-by: NMing Lei <ming.lei@redhat.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 4月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
The offset in scatterlists is allowed to be larger than the page size, so don't go to great length to avoid that case and simplify the arithmetics. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 4月, 2019 1 次提交
-
-
由 Ming Lei 提交于
Commit f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable") changes bvec merge by only considering two bvecs from different bios. However, if the former bio doesn't inlcude any io bvec, then the following warning may be triggered: warning: ‘bvec.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized] In practice, it shouldn't be triggered. Fixes it by adding check on former bio, the check shouldn't add any cost given 'bio->bi_iter' can be hit in cache. Reported-by: NJens Axboe <axboe@kernel.dk> Fixes: f6970f83 ("block: don't check if adjacent bvecs in one bio can be mergeable") Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 4月, 2019 4 次提交
-
-
由 Ming Lei 提交于
Now both passthrough and FS IO have supported multi-page bvec, and bvec merging has been handled actually when adding page to bio, then adjacent bvecs won't be mergeable any more if they belong to same bio. So only try to merge bvecs if they are from different bios. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Inside __blk_segment_map_sg(), page sized bvec mapping is optimized a bit with one standalone branch. So reuse __blk_bvec_map_sg() to do that. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
The argument of 'request_queue' isn't used by __blk_bvec_map_sg(), so remove it. Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
For normal filesystem IO, each page is added via blk_add_page(), in which bvec(page) merge has been handled already, and basically not possible to merge two adjacent bvecs in one bio. So not try to merge two adjacent bvecs in blk_queue_split(). Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 3月, 2019 1 次提交
-
-
由 Ming Lei 提交于
blk_recount_segments() can be called in bio_add_pc_page() for calculating how many segments this bio will has after one page is added to this bio. If the resulted segment number is beyond the queue limit, the added page will be removed. The try-and-fix policy requires blk_recount_segments(__blk_recalc_rq_segments) to not consider the segment number limit. Unfortunately bvec_split_segs() does check this limit, and causes small segment number returned to bio_add_pc_page(), then page still may be added to the bio even though segment number limit becomes broken. Fixes this issue by not considering segment number limit when calcualting bio's segment number. Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count") Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 3月, 2019 1 次提交
-
-
由 Ming Lei 提交于
When the current bvec can be merged to the 1st segment, the bio's front segment size has to be updated. However, dcebd755 doesn't consider that case, then bio's front segment size may not be correct. This patch fixes this issue. Cc: Christoph Hellwig <hch@lst.de> Cc: Omar Sandoval <osandov@fb.com> Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 2月, 2019 3 次提交
-
-
由 Ming Lei 提交于
Introduce a fast path for single-page bvec IO, then we can avoid to call bvec_split_segs() unnecessarily. Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Introduce a fast path for single-page bvec IO, then blk_bvec_map_sg() can be avoided. Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Single-page bvec can often be seen in small BS workloads, so introduce bvec_nth_page() for avoiding to call nth_page() unnecessarily, which looks not cheap. Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 2月, 2019 1 次提交
-
-
由 Ming Lei 提交于
rq->bio can be NULL sometimes, such as flush request, so don't read bio->bi_seg_front_size until this 'bio' is checked as valid. Cc: Bart Van Assche <bvanassche@acm.org> Reported-by: NBart Van Assche <bvanassche@acm.org> Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count") Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 2月, 2019 4 次提交
-
-
由 Ming Lei 提交于
Since bdced438 ("block: setup bi_phys_segments after splitting"), physical segment number is mainly figured out in blk_queue_split() for fast path, and the flag of BIO_SEG_VALID is set there too. Now only blk_recount_segments() and blk_recalc_rq_segments() use this flag. Basically blk_recount_segments() is bypassed in fast path given BIO_SEG_VALID is set in blk_queue_split(). For another user of blk_recalc_rq_segments(): - run in partial completion branch of blk_update_request, which is an unusual case - run in blk_cloned_rq_check_limits(), still not a big problem if the flag is killed since dm-rq is the only user. Multi-page bvec is enabled now, not doing S/G merging is rather pointless with the current setup of the I/O path, as it isn't going to save you a significant amount of cycles. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
It is more efficient to use bio_for_each_bvec() to map sg, meantime we have to consider splitting multipage bvec as done in blk_bio_segment_split(). Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
First it is more efficient to use bio_for_each_bvec() in both blk_bio_segment_split() and __blk_recalc_rq_segments() to compute how many multi-page bvecs there are in the bio. Secondly once bio_for_each_bvec() is used, the bvec may need to be splitted because its length can be very longer than max segment size, so we have to split the big bvec into several segments. Thirdly when splitting multi-page bvec into segments, the max segment limit may be reached, so the bio split need to be considered under this situation too. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
It is wrong to use bio->bi_vcnt to figure out how many segments there are in the bio even though CLONED flag isn't set on this bio, because this bio may be splitted or advanced. So always use bio_segments() in blk_recount_segments(), and it shouldn't cause any performance loss now because the physical segment number is figured out in blk_queue_split() and BIO_SEG_VALID is set meantime since bdced438 ("block: setup bi_phys_segments after splitting"). Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Fixes: 76d8137a ("blk-merge: recaculate segment if it isn't less than max segments") Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 1月, 2019 1 次提交
-
-
由 Jens Axboe 提交于
We can't touch a bio after ->make_request_fn(), for all we know it could already have been completed by the time this function returns. This reverts commit 698cef17. Reported-by: syzbot+4df6ca820108fd248943@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 1月, 2019 1 次提交
-
-
由 Ming Lei 提交于
Except for blk_queue_split(), bio_split() is used for splitting bio too, then the remained bio is often resubmit to queue via generic_make_request(). So the same queue enter recursion exits in this case too. Unfortunatley commit cd4a4ae4 doesn't help this case. This patch covers the above case by setting BIO_QUEUE_ENTERED before calling q->make_request_fn. In theory the per-bio flag is used to simulate one stack variable, it is just fine to clear it after q->make_request_fn is returned. Especially the same bio can't be submitted from another context. Fixes: cd4a4ae4 ("block: don't use blocking queue entered for recursive bio submits") Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: NeilBrown <neilb@suse.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that the the SCSI layer replaced the use of the cluster flag with segment size limits and the DMA boundary we can remove the cluster flag from the block layer. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 14 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 12月, 2018 2 次提交
-
-
由 Mikulas Patocka 提交于
We want to convert to per-cpu in_flight counters. The function part_round_stats needs the in_flight counter every jiffy, it would be too costly to sum all the percpu variables every jiffy, so it must be deleted. part_round_stats is used to calculate two counters - time_in_queue and io_ticks. time_in_queue can be calculated without part_round_stats, by adding the duration of the I/O when the I/O ends (the value is almost as exact as the previously calculated value, except that time for in-progress I/Os is not counted). io_ticks can be approximated by increasing the value when I/O is started or ended and the jiffies value has changed. If the I/Os take less than a jiffy, the value is as exact as the previously calculated value. If the I/Os take more than a jiffy, io_ticks can drift behind the previously calculated value. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
All of part_stat_* and related methods are used with preempt disabled, so there is no need to pass cpu around to allow of them. Just call smp_processor_id() as needed. Suggested-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-