- 13 11月, 2021 1 次提交
-
-
由 Ming Lei 提交于
submit_bio_checks() may update bio->bi_opf, so we have to initialize blk_mq_alloc_data.cmd_flags with bio->bi_opf after submit_bio_checks() returns when allocating new request. In case of using cached request, fallback to allocate new request if cached rq isn't compatible with the incoming bio, otherwise change rq->cmd_flags with incoming bio->bi_opf. Fixes: 900e0807 ("block: move queue enter logic into blk_mq_submit_bio()") Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 11月, 2021 5 次提交
-
-
由 Laibin Qiu 提交于
KASAN reports a use-after-free report when doing block test: ================================================================== [10050.967049] BUG: KASAN: use-after-free in submit_bio_checks+0x1539/0x1550 [10050.977638] Call Trace: [10050.978190] dump_stack+0x9b/0xce [10050.979674] print_address_description.constprop.6+0x3e/0x60 [10050.983510] kasan_report.cold.9+0x22/0x3a [10050.986089] submit_bio_checks+0x1539/0x1550 [10050.989576] submit_bio_noacct+0x83/0xc80 [10050.993714] submit_bio+0xa7/0x330 [10050.994435] mpage_readahead+0x380/0x500 [10050.998009] read_pages+0x1c1/0xbf0 [10051.002057] page_cache_ra_unbounded+0x4c2/0x6f0 [10051.007413] do_page_cache_ra+0xda/0x110 [10051.008207] force_page_cache_ra+0x23d/0x3d0 [10051.009087] page_cache_sync_ra+0xca/0x300 [10051.009970] generic_file_buffered_read+0xbea/0x2130 [10051.012685] generic_file_read_iter+0x315/0x490 [10051.014472] blkdev_read_iter+0x113/0x1b0 [10051.015300] aio_read+0x2ad/0x450 [10051.023786] io_submit_one+0xc8e/0x1d60 [10051.029855] __se_sys_io_submit+0x125/0x350 [10051.033442] do_syscall_64+0x2d/0x40 [10051.034156] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [10051.048733] Allocated by task 18598: [10051.049482] kasan_save_stack+0x19/0x40 [10051.050263] __kasan_kmalloc.constprop.1+0xc1/0xd0 [10051.051230] kmem_cache_alloc+0x146/0x440 [10051.052060] mempool_alloc+0x125/0x2f0 [10051.052818] bio_alloc_bioset+0x353/0x590 [10051.053658] mpage_alloc+0x3b/0x240 [10051.054382] do_mpage_readpage+0xddf/0x1ef0 [10051.055250] mpage_readahead+0x264/0x500 [10051.056060] read_pages+0x1c1/0xbf0 [10051.056758] page_cache_ra_unbounded+0x4c2/0x6f0 [10051.057702] do_page_cache_ra+0xda/0x110 [10051.058511] force_page_cache_ra+0x23d/0x3d0 [10051.059373] page_cache_sync_ra+0xca/0x300 [10051.060198] generic_file_buffered_read+0xbea/0x2130 [10051.061195] generic_file_read_iter+0x315/0x490 [10051.062189] blkdev_read_iter+0x113/0x1b0 [10051.063015] aio_read+0x2ad/0x450 [10051.063686] io_submit_one+0xc8e/0x1d60 [10051.064467] __se_sys_io_submit+0x125/0x350 [10051.065318] do_syscall_64+0x2d/0x40 [10051.066082] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [10051.067455] Freed by task 13307: [10051.068136] kasan_save_stack+0x19/0x40 [10051.068931] kasan_set_track+0x1c/0x30 [10051.069726] kasan_set_free_info+0x1b/0x30 [10051.070621] __kasan_slab_free+0x111/0x160 [10051.071480] kmem_cache_free+0x94/0x460 [10051.072256] mempool_free+0xd6/0x320 [10051.072985] bio_free+0xe0/0x130 [10051.073630] bio_put+0xab/0xe0 [10051.074252] bio_endio+0x3a6/0x5d0 [10051.074984] blk_update_request+0x590/0x1370 [10051.075870] scsi_end_request+0x7d/0x400 [10051.076667] scsi_io_completion+0x1aa/0xe50 [10051.077503] scsi_softirq_done+0x11b/0x240 [10051.078344] blk_mq_complete_request+0xd4/0x120 [10051.079275] scsi_mq_done+0xf0/0x200 [10051.080036] virtscsi_vq_done+0xbc/0x150 [10051.080850] vring_interrupt+0x179/0x390 [10051.081650] __handle_irq_event_percpu+0xf7/0x490 [10051.082626] handle_irq_event_percpu+0x7b/0x160 [10051.083527] handle_irq_event+0xcc/0x170 [10051.084297] handle_edge_irq+0x215/0xb20 [10051.085122] asm_call_irq_on_stack+0xf/0x20 [10051.085986] common_interrupt+0xae/0x120 [10051.086830] asm_common_interrupt+0x1e/0x40 ================================================================== Bio will be checked at beginning of submit_bio_noacct(). If bio needs to be throttled, it will start the timer and stop submit bio directly. Bio will submit in blk_throtl_dispatch_work_fn() when the timer expires. But in the current process, if bio is throttled, it will still set bio issue->value by blkcg_bio_issue_init(). This is redundant and may cause the above use-after-free. CPU0 CPU1 submit_bio submit_bio_noacct submit_bio_checks blk_throtl_bio() <=mod_timer(&sq->pending_timer blk_throtl_dispatch_work_fn submit_bio_noacct() <= bio have throttle tag, will throw directly and bio issue->value will be set here bio_endio() bio_put() bio_free() <= free this bio blkcg_bio_issue_init(bio) <= bio has been freed and will lead to UAF return BLK_QC_T_NONE Fix this by remove extra blkcg_bio_issue_init. Fixes: e439bedf (blkcg: consolidate bio_issue_init() to be a part of core) Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com> Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.comReviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shin'ichiro Kawasaki 提交于
When BLKRESETZONE ioctl and data read race, the data read leaves stale page cache. The commit e5113505 ("block: Discard page cache of zone reset target range") added page cache truncation to avoid stale page cache after the ioctl. However, the stale page cache still can be read during the reset zone operation for the ioctl. To avoid the stale page cache completely, hold invalidate_lock of the block device file mapping. Fixes: e5113505 ("block: Discard page cache of zone reset target range") Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
It is very annoying to have two block layer functions which share same name, so rename blk_attempt_bio_merge in blk-mq.c as blk_mq_attempt_bio_merge. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(), which is called when queue usage counter is grabbed already: 1) blk_mq_get_new_requests() 2) blk_mq_get_request() - cached request in current plug owns one queue usage counter So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(), and more importantly this nest way causes hang in blk_mq_freeze_queue_wait(). Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The naming got changed as part of a revision of the patchset, but the kerneldoc apparently never got updated. Fix it. Reported-by: Nkernel test robot <lkp@intel.com> Fixes: a2247f19 ("block: Add independent access ranges support") Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 11月, 2021 4 次提交
-
-
由 Luis Chamberlain 提交于
Now that we have done a spring cleaning on all drivers and added error checking / handling, let's keep it that way and ensure no new drivers fail to stick with it. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20211110002949.999380-1-mcgrof@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
kernel test robot reports that we now trigger some sparse warnings: block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer block/blk-mq.h:169:32: sparse: sparse: restricted req_flags_t degrades to integer which is due to ->rq_flags being an unsigned int, rather than the stronger type req_flags_t enum. Change the type to req_flags_t to silence this warning. Fixes: 56f8da64 ("block: add rq_flags to struct blk_mq_alloc_data") Reported-by: Nkernel test robot <lkp@intel.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shin'ichiro Kawasaki 提交于
When BLKZEROOUT ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is modified to call "blkdiscard -z" command and repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y. Rework is required for older stable kernels. Fixes: 22dd6d35 ("block: invalidate the page cache when issuing BLKZEROOUT") Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shin'ichiro Kawasaki 提交于
When BLKDISCARD ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y with slight patch edit. Rework is required for older stable kernels. Fixes: 351499a1 ("block: Invalidate cache on discard v2") Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 11月, 2021 1 次提交
-
-
由 Ming Lei 提交于
Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it is hard to switch to this style, so these drivers need one atomic flag for helping to balance quiesce and unquiesce. When quiesce is in-progress, the driver still needs to wait until the quiesce is done, so add API of blk_mq_wait_quiesce_done() for these drivers. Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 11月, 2021 1 次提交
-
-
由 Ye Bin 提交于
We got UAF report on v5.10 as follows: [ 1446.674930] ================================================================== [ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348 [ 1446.677851] [ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2 [ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn [ 1446.681448] Call Trace: [ 1446.681800] dump_stack+0x9b/0xce [ 1446.682916] print_address_description.constprop.6+0x3e/0x60 [ 1446.685999] kasan_report.cold.9+0x22/0x3a [ 1446.687186] blk_mq_get_driver_tag+0x9a4/0xa90 [ 1446.687785] blk_mq_dispatch_rq_list+0x21a/0x1d40 [ 1446.692576] __blk_mq_do_dispatch_sched+0x394/0x830 [ 1446.695758] __blk_mq_sched_dispatch_requests+0x398/0x4f0 [ 1446.698279] blk_mq_sched_dispatch_requests+0xdf/0x140 [ 1446.698967] __blk_mq_run_hw_queue+0xc0/0x270 [ 1446.699561] __blk_mq_delay_run_hw_queue+0x4cc/0x550 [ 1446.701407] blk_mq_run_hw_queue+0x13b/0x2b0 [ 1446.702593] blk_mq_sched_insert_requests+0x1de/0x390 [ 1446.703309] blk_mq_flush_plug_list+0x4b4/0x760 [ 1446.705408] blk_flush_plug_list+0x2c5/0x480 [ 1446.708471] blk_finish_plug+0x55/0xa0 [ 1446.708980] blk_throtl_dispatch_work_fn+0x23b/0x2e0 [ 1446.711236] process_one_work+0x6d4/0xfe0 [ 1446.711778] worker_thread+0x91/0xc80 [ 1446.713400] kthread+0x32d/0x3f0 [ 1446.714362] ret_from_fork+0x1f/0x30 [ 1446.714846] [ 1446.715062] Allocated by task 1: [ 1446.715509] kasan_save_stack+0x19/0x40 [ 1446.716026] __kasan_kmalloc.constprop.1+0xc1/0xd0 [ 1446.716673] blk_mq_init_tags+0x6d/0x330 [ 1446.717207] blk_mq_alloc_rq_map+0x50/0x1c0 [ 1446.717769] __blk_mq_alloc_map_and_request+0xe5/0x320 [ 1446.718459] blk_mq_alloc_tag_set+0x679/0xdc0 [ 1446.719050] scsi_add_host_with_dma.cold.3+0xa0/0x5db [ 1446.719736] virtscsi_probe+0x7bf/0xbd0 [ 1446.720265] virtio_dev_probe+0x402/0x6c0 [ 1446.720808] really_probe+0x276/0xde0 [ 1446.721320] driver_probe_device+0x267/0x3d0 [ 1446.721892] device_driver_attach+0xfe/0x140 [ 1446.722491] __driver_attach+0x13a/0x2c0 [ 1446.723037] bus_for_each_dev+0x146/0x1c0 [ 1446.723603] bus_add_driver+0x3fc/0x680 [ 1446.724145] driver_register+0x1c0/0x400 [ 1446.724693] init+0xa2/0xe8 [ 1446.725091] do_one_initcall+0x9e/0x310 [ 1446.725626] kernel_init_freeable+0xc56/0xcb9 [ 1446.726231] kernel_init+0x11/0x198 [ 1446.726714] ret_from_fork+0x1f/0x30 [ 1446.727212] [ 1446.727433] Freed by task 26992: [ 1446.727882] kasan_save_stack+0x19/0x40 [ 1446.728420] kasan_set_track+0x1c/0x30 [ 1446.728943] kasan_set_free_info+0x1b/0x30 [ 1446.729517] __kasan_slab_free+0x111/0x160 [ 1446.730084] kfree+0xb8/0x520 [ 1446.730507] blk_mq_free_map_and_requests+0x10b/0x1b0 [ 1446.731206] blk_mq_realloc_hw_ctxs+0x8cb/0x15b0 [ 1446.731844] blk_mq_init_allocated_queue+0x374/0x1380 [ 1446.732540] blk_mq_init_queue_data+0x7f/0xd0 [ 1446.733155] scsi_mq_alloc_queue+0x45/0x170 [ 1446.733730] scsi_alloc_sdev+0x73c/0xb20 [ 1446.734281] scsi_probe_and_add_lun+0x9a6/0x2d90 [ 1446.734916] __scsi_scan_target+0x208/0xc50 [ 1446.735500] scsi_scan_channel.part.3+0x113/0x170 [ 1446.736149] scsi_scan_host_selected+0x25a/0x360 [ 1446.736783] store_scan+0x290/0x2d0 [ 1446.737275] dev_attr_store+0x55/0x80 [ 1446.737782] sysfs_kf_write+0x132/0x190 [ 1446.738313] kernfs_fop_write_iter+0x319/0x4b0 [ 1446.738921] new_sync_write+0x40e/0x5c0 [ 1446.739429] vfs_write+0x519/0x720 [ 1446.739877] ksys_write+0xf8/0x1f0 [ 1446.740332] do_syscall_64+0x2d/0x40 [ 1446.740802] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1446.741462] [ 1446.741670] The buggy address belongs to the object at ffff8880185afd00 [ 1446.741670] which belongs to the cache kmalloc-256 of size 256 [ 1446.743276] The buggy address is located 16 bytes inside of [ 1446.743276] 256-byte region [ffff8880185afd00, ffff8880185afe00) [ 1446.744765] The buggy address belongs to the page: [ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac [ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0 [ 1446.747719] flags: 0x1fffff80010200(slab|head) [ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240 [ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 [ 1446.750455] page dumped because: kasan: bad access detected [ 1446.751227] [ 1446.751445] Memory state around the buggy address: [ 1446.752102] ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.753090] ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.755065] ^ [ 1446.755589] ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1446.756574] ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1446.757566] ================================================================== Flag 'BLK_MQ_F_TAG_QUEUE_SHARED' will be set if the second device on the same host initializes it's queue successfully. However, if the second device failed to allocate memory in blk_mq_alloc_and_init_hctx() from blk_mq_realloc_hw_ctxs() from blk_mq_init_allocated_queue(), __blk_mq_free_map_and_rqs() will be called on error path, and if 'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' will be freed while it's still used by the first device. To fix this issue we move release newly allocated hardware context from blk_mq_realloc_hw_ctxs to __blk_mq_update_nr_hw_queues. As there is needn't to release hardware context in blk_mq_init_allocated_queue. Fixes: 868f2f0b ("blk-mq: dynamic h/w context count") Signed-off-by: NYe Bin <yebin10@huawei.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 11月, 2021 7 次提交
-
-
由 Jens Axboe 提交于
We have new helpers for this, use them rather than the slower inode size reads. This makes the read/write path consistent with most of the rest of block as well. Signed-off-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/a72767cd-3c6d-47f7-80f4-aa025a17b2cb@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Luis Chamberlain 提交于
Commit 83cbce95 ("block: add error handling for device_add_disk / add_disk") added error handling to device_add_disk(), however the goto label for the kobject_create_and_add() failure did not set the return value correctly, and so we can end up in a situation where kobject_create_and_add() fails but we report success. Fixes: 83cbce95 ("block: add error handling for device_add_disk / add_disk") Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211103164023.1384821-1-mcgrof@kernel.org [axboe: fold in followup fix from Wu Bo <wubo40@huawei.com>] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
If we're driving multiple devices, we could have pre-populated the cache for a different device. Ensure that the empty request matches the current queue. Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch") Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Retain the old logic for the fops based submit, but for our internal blk_mq_submit_bio(), move the queue entering logic into the core function itself. We need to be a bit careful if going into the scheduler, as a scheduler or queue mappings can arbitrarily change before we have entered the queue. Have the bio scheduler mapping do that separately, it's a very cheap operation compared to actually doing merging locking and lookups. Reviewed-by: NChristoph Hellwig <hch@lst.de> [axboe: update to check merge post submit_bio_checks() doing remap...] Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Just a prep patch for shifting the queue enter logic. This moves the expected fast path inline, and leaves __bio_queue_enter() as an out-of-line function call. We don't want to inline the latter, as it's mostly slow path code. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
This is in preparation for a fix, but serves as a cleanup as well moving the cached vs regular alloc logic out of blk_mq_submit_bio(). Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Requests that were stored in the cache deliberately didn't hold an enter reference to the queue, instead we grabbed one every time we pulled a request out of there. That made for awkward logic on freeing the remainder of the cached list, if needed, where we had to artificially raise the queue usage count before each free. Grab references up front for cached plug requests. That's safer, and also more efficient. Fixes: 47c122e3 ("block: pre-allocate requests if plug is started and is a batch") Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 04 11月, 2021 1 次提交
-
-
由 Luis Chamberlain 提交于
__register_blkdev() is used to register a probe callback, and that callback is typically used to call add_disk(). Now that we are able to capture errors for add_disk(), we need to fix those probe calls where add_disk() fails and clean up resources. We don't extend the probe call to return the error given: 1) we'd have to always special-case the case where the disk was already present, as otherwise concurrent requests to open an existing block device would fail, and this would be a userspace visible change 2) the error from ilookup() on blkdev_get_no_open() is sufficient 3) The only thing the probe call is used for is to support pre-devtmpfs, pre-udev semantics that want to create disks when their pre-created device node is accessed, and so we don't care for failures on probe there. Expand documentation for the probe callback to ensure users cleanup resources if add_disk() is used and to clarify this interface may be removed in the future. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20211103230437.1639990-12-mcgrof@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 03 11月, 2021 4 次提交
-
-
由 Ming Lei 提交于
In case of shared tags and none io sched, batched completion still may be run into, and hctx->nr_active is accounted when getting driver tag, so it has to be updated in blk_mq_end_request_batch(). Otherwise, hctx->nr_active may become same with queue depth, then hctx_may_queue() always return false, then io hang is caused. Fixes the issue by updating the counter in batched way. Reported-by: NShinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: f794f335 ("block: add support for blk_mq_end_request_batch()") Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102153619.3627505-4-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
Looks it is missed so add it. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-3-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
It is obvious that io merge can't be done between two different queues, so just try to run io merge in case of same queue. Signed-off-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It's not safe to do this before blk_queue_enter(), as the scheduler state could have changed in between. Hence move the RQF_ELV setting into the allocators, where we know the queue is already entered. Suggested-by: NMing Lei <ming.lei@redhat.com> Reported-by: NYi Zhang <yi.zhang@redhat.com> Reported-by: NSteffen Maier <maier@linux.ibm.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 11月, 2021 2 次提交
-
-
由 Jens Axboe 提交于
A previous commit fixed up the condition for doing direct issue, but that left the 'from_schedule' argument dead inside the branch. Replace it with 'false'. Fixes: ff155223 ("blk-mq: don't issue request directly in case that current is to be blocked") Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Ensure that current tag is correctly assigned before attempting to prefetch the first cacheline of the request. Fixes: 92aff191 ("block: prefetch request to be initialized") Reported-and-tested-by: syzbot+cd20829ac44b92bf6ed0@syzkaller.appspotmail.com Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 10月, 2021 1 次提交
-
-
由 Jean Sacren 提交于
In the if branch, e is checked. In the else branch, ->dispatch_busy is merely a number and has no effect on !e. We should remove the check of !e since it is always true. Signed-off-by: NJean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211029202945.3052-1-sakiwit@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 10月, 2021 3 次提交
-
-
由 John Garry 提交于
Currently we show the hctx.active value for the per-hctx "active" file. However this is not maintained for shared tags, and we instead keep a record of the number active requests per request queue - see commit f1b49fdc ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap). Change for the case of shared tags to show the active requests per request queue by using __blk_mq_active_requests() helper. Signed-off-by: NJohn Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1635496823-33515-1-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
These are now pointless wrappers around blk_mq_{alloc,free}_request, so remove them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It's faster and easier to read if we tolerate cur_hctx being NULL in the "when to flush" condition. Rename last_hctx to cur_hctx while at it, as it better describes the role of that variable. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 10月, 2021 10 次提交
-
-
由 Jens Axboe 提交于
Now that we have flags passed in, we can do a final re-arrange of the flow of blk_mq_rq_ctx_init() so we're always writing request in the order in which it is laid out. Signed-off-by: NJens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-5-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Now we have the tags available in __blk_mq_alloc_requests_batch(), we can start fetching the first request cacheline before calling into the request initialization. Signed-off-by: NJens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-4-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Instead of getting this from data for every invocation of request initialization, pass it in as an argument instead. Signed-off-by: NJens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-3-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
There's a hole here we can use, and it's faster to set this earlier rather than need to check q->elevator multiple times. Signed-off-by: NJens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-2-axboe@kernel.dkSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Shin'ichiro Kawasaki 提交于
Commit a33df75c ("block: use an xarray for disk->part_tbl") modified the method to check partition existence in host-aware zoned block devices from disk_has_partitions() helper function call to empty check of xarray disk->part_tbl. However, disk->part_tbl always has single entry for disk->part0 and never becomes empty. This resulted in the host-aware zoned devices always judged to have partitions, and it made the sysfs queue/zoned attribute to be "none" instead of "host-aware" regardless of partition existence in the devices. This also caused DEBUG_LOCKS_WARN_ON(lock->magic != lock) for sdkp->rev_mutex in scsi layer when the kernel detects host-aware zoned device. Since block layer handled the host-aware zoned devices as non- zoned devices, scsi layer did not have chance to initialize the mutex for zone revalidation. Therefore, the warning was triggered. To fix the issues, call the helper function disk_has_partitions() in place of disk->part_tbl empty check. Since the function was removed with the commit a33df75c, reimplement it to walk through entries in the xarray disk->part_tbl. Fixes: a33df75c ("block: use an xarray for disk->part_tbl") Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.14+ Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211026060115.753746-1-shinichiro.kawasaki@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
If we know that a iocb is async we can optimise bio_set_polled() a bit, add a new helper bio_set_polled_async(). Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8fa137885164a5d05fadcff4c3521da8d5a83d00.1635337135.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Now __blkdev_direct_IO() serves only multi-bio I/O, thus remove not used anymore single bio refcounting optimisations. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/88eb488aae9ed4852a30f3a7132f296f56e43b80.1635337135.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
With addition of __blkdev_direct_IO_async(), __blkdev_direct_IO() now serves only multio-bio I/O, which we don't poll. Now we can remove anything related to I/O polling from it. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8c597a6b7ee612df394853bfd24726aee5b898e.1635337135.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Pavel Begunkov 提交于
Nobody cares about iov iterators state if we return -EIOCBQUEUED, so as the we now have __blkdev_direct_IO_async(), which gets pages only once, we can skip expensive iov_iter_advance(). It's around 1-2% of all CPU spent. Signed-off-by: NPavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6158edfbfa2ae3bc24aed29a72f035df18fad2f.1635337135.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard-disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel. A dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device independent access ranges to the user through sysfs to allow optimizing device accesses to increase performance. To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDDs or table entries of a dm-linear device), The type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity. The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure. struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files. The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual actuator HDD, the user sees: $ tree /sys/block/sdk/queue/independent_access_ranges/ /sys/block/sdk/queue/independent_access_ranges/ |-- 0 | |-- nr_sectors | `-- sector `-- 1 |-- nr_sectors `-- sector For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c. Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-