- 08 March 2023, 2 commits
-
-
Committed by Li Lingfeng

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HOKY
CVE: NA

--------------------------------

commit 92567b1d6f5a ("block: split .sysfs_lock into two locks") introduced the new member 'sysfs_dir_lock' in struct request_queue; move it to request_queue_wrapper to avoid breaking KABI.

Fixes: 92567b1d6f5a ("block: split .sysfs_lock into two locks")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
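Editor's note: the KABI-preserving pattern referenced here is to embed the frozen public struct at the start of a private wrapper and keep any new members behind it. A minimal sketch of that pattern follows; apart from struct request_queue itself, the names and layout are illustrative assumptions, not the actual openEuler code.

------
#include <linux/blkdev.h>
#include <linux/kernel.h>
#include <linux/mutex.h>

/* Hypothetical sketch of the wrapper pattern used to preserve KABI. */
struct request_queue_wrapper {
	struct request_queue q;		/* KABI-frozen public layout stays first */

	/* New members live only in the wrapper, invisible to old modules. */
	struct mutex sysfs_dir_lock;
};

#define queue_to_wrapper(queue) \
	container_of(queue, struct request_queue_wrapper, q)
------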
-
Committed by Ming Lei

mainline inclusion
from mainline-5.4-rc1
commit cecf5d87
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6HOKY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2&id=cecf5d87ff2035127bb5a9ee054d0023a4a7cad3

---------------------------

The kernfs built-in lock of 'kn->count' is held in the sysfs .show/.store path. Meanwhile, inside block's .show/.store callbacks, q->sysfs_lock is required. However, when the mq and iosched kobjects are removed via blk_mq_unregister_dev() and elv_unregister_queue(), q->sysfs_lock is held too. This causes an AB-BA deadlock because the kernfs built-in lock of 'kn->count' is also required inside kobject_del(); see the lockdep warning [1].

On the other hand, it isn't necessary to acquire q->sysfs_lock for either blk_mq_unregister_dev() or elv_unregister_queue(), because clearing the REGISTERED flag prevents stores to 'queue/scheduler' from happening. Also, sysfs write (store) is exclusive, so it is not necessary to hold the lock for elv_unregister_queue() when it is called in the elevator-switching path.

So split .sysfs_lock into two: one is still named .sysfs_lock and covers synchronizing .store, the other is named .sysfs_dir_lock and covers kobjects and related status changes.

sysfs itself can handle the race between adding/removing kobjects and showing/storing attributes under kobjects. For switching the scheduler via a store to 'queue/scheduler', we use the queue flag QUEUE_FLAG_REGISTERED together with .sysfs_lock to avoid the race, so we no longer need to hold .sysfs_lock while removing/adding kobjects.

[1] lockdep warning

======================================================
WARNING: possible circular locking dependency detected
5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
------------------------------------------------------
rmmod/777 is trying to acquire lock:
00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72

but task is already holding lock:
00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&q->sysfs_lock){+.+.}:
       __lock_acquire+0x95f/0xa2f
       lock_acquire+0x1b4/0x1e8
       __mutex_lock+0x14a/0xa9b
       blk_mq_hw_sysfs_show+0x63/0xb6
       sysfs_kf_seq_show+0x11f/0x196
       seq_read+0x2cd/0x5f2
       vfs_read+0xc7/0x18c
       ksys_read+0xc4/0x13e
       do_syscall_64+0xa7/0x295
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (kn->count#202){++++}:
       check_prev_add+0x5d2/0xc45
       validate_chain+0xed3/0xf94
       __lock_acquire+0x95f/0xa2f
       lock_acquire+0x1b4/0x1e8
       __kernfs_remove+0x237/0x40b
       kernfs_remove_by_name_ns+0x59/0x72
       remove_files+0x61/0x96
       sysfs_remove_group+0x81/0xa4
       sysfs_remove_groups+0x3b/0x44
       kobject_del+0x44/0x94
       blk_mq_unregister_dev+0x83/0xdd
       blk_unregister_queue+0xa0/0x10b
       del_gendisk+0x259/0x3fa
       null_del_dev+0x8b/0x1c3 [null_blk]
       null_exit+0x5c/0x95 [null_blk]
       __se_sys_delete_module+0x204/0x337
       do_syscall_64+0xa7/0x295
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->sysfs_lock);
                               lock(kn->count#202);
                               lock(&q->sysfs_lock);
  lock(kn->count#202);

 *** DEADLOCK ***

2 locks held by rmmod/777:
 #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
 #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b

stack backtrace:
CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
Call Trace:
 dump_stack+0x9a/0xe6
 check_noncircular+0x207/0x251
 ? print_circular_bug+0x32a/0x32a
 ? find_usage_backwards+0x84/0xb0
 check_prev_add+0x5d2/0xc45
 validate_chain+0xed3/0xf94
 ? check_prev_add+0xc45/0xc45
 ? mark_lock+0x11b/0x804
 ? check_usage_forwards+0x1ca/0x1ca
 __lock_acquire+0x95f/0xa2f
 lock_acquire+0x1b4/0x1e8
 ? kernfs_remove_by_name_ns+0x59/0x72
 __kernfs_remove+0x237/0x40b
 ? kernfs_remove_by_name_ns+0x59/0x72
 ? kernfs_next_descendant_post+0x7d/0x7d
 ? strlen+0x10/0x23
 ? strcmp+0x22/0x44
 kernfs_remove_by_name_ns+0x59/0x72
 remove_files+0x61/0x96
 sysfs_remove_group+0x81/0xa4
 sysfs_remove_groups+0x3b/0x44
 kobject_del+0x44/0x94
 blk_mq_unregister_dev+0x83/0xdd
 blk_unregister_queue+0xa0/0x10b
 del_gendisk+0x259/0x3fa
 ? disk_events_poll_msecs_store+0x12b/0x12b
 ? check_flags+0x1ea/0x204
 ? mark_held_locks+0x1f/0x7a
 null_del_dev+0x8b/0x1c3 [null_blk]
 null_exit+0x5c/0x95 [null_blk]
 __se_sys_delete_module+0x204/0x337
 ? free_module+0x39f/0x39f
 ? blkcg_maybe_throttle_current+0x8a/0x718
 ? rwlock_bug+0x62/0x62
 ? __blkcg_punt_bio_submit+0xd0/0xd0
 ? trace_hardirqs_on_thunk+0x1a/0x20
 ? mark_held_locks+0x1f/0x7a
 ? do_syscall_64+0x4c/0x295
 do_syscall_64+0xa7/0x295
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fb696cdbe6b
Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	block/blk.h
	block/elevator.c
	block/blk-sysfs.c
	block/blk-core.c

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
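Editor's note: the split described above amounts to taking a dedicated lock around kobject add/remove while the original lock keeps covering sysfs stores. A compressed sketch of that division of labour follows; the structure and function names are illustrative, not the exact upstream hunks.

------
#include <linux/kobject.h>
#include <linux/mutex.h>
#include <linux/errno.h>

struct queue_sysfs_locks {
	struct mutex sysfs_lock;	/* serializes .store handlers */
	struct mutex sysfs_dir_lock;	/* serializes kobject add/remove */
};

/* Directory (kobject) side: no longer needs sysfs_lock. */
static void unregister_queue_kobject(struct queue_sysfs_locks *l,
				     struct kobject *kobj)
{
	mutex_lock(&l->sysfs_dir_lock);
	kobject_del(kobj);	/* takes the kernfs 'kn->count' lock internally */
	mutex_unlock(&l->sysfs_dir_lock);
}

/* Store side: still under sysfs_lock, gated on the REGISTERED flag. */
static int scheduler_store_sketch(struct queue_sysfs_locks *l, bool registered)
{
	int ret = -ENOENT;

	mutex_lock(&l->sysfs_lock);
	if (registered) {
		/* ... switch elevator here ... */
		ret = 0;
	}
	mutex_unlock(&l->sysfs_lock);
	return ret;
}
------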
-
- 09 October 2022, 3 commits
-
-
Committed by Yu Kuai

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M
CVE: NA

--------------------------------

If CONFIG_BLK_BIO_DISPATCH_ASYNC is enabled and the driver supports QUEUE_FLAG_DISPATCH_ASYNC, bios will be dispatched asynchronously to specific CPUs to avoid cross-node memory access in the driver.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
-
Committed by Yu Kuai

hulk inclusion
category: performance
bugzilla: 187597, https://gitee.com/openeuler/kernel/issues/I5QK5M
CVE: NA

--------------------------------

request_queue_wrapper is currently not accessible from drivers, so introduce a new helper to initialize async dispatch without breaking KABI.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
-
Committed by Yu Kuai

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I5QK5M
CVE: NA

--------------------------------

Add a new flag QUEUE_FLAG_DISPATCH_ASYNC and two new fields, 'dispatch_cpumask' and 'last_dispatch_cpu', to request_queue, preparing to support dispatching bios asynchronously on specified CPUs. This patch also adds the sysfs APIs.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
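Editor's note: a plausible shape for the CPU selection behind 'dispatch_cpumask' and 'last_dispatch_cpu' is a round-robin walk over the configured mask. The sketch below is only an illustration under that assumption; the struct and helper names are not the actual openEuler implementation.

------
#include <linux/cpumask.h>

/* Hypothetical container; the real patch keeps this state in the
 * request_queue wrapper, with names that may differ. */
struct async_dispatch_state {
	struct cpumask dispatch_cpumask;	/* CPUs allowed to run dispatch */
	int last_dispatch_cpu;			/* last CPU used, for round-robin */
};

static int pick_dispatch_cpu(struct async_dispatch_state *s)
{
	int cpu = cpumask_next(s->last_dispatch_cpu, &s->dispatch_cpumask);

	if (cpu >= nr_cpu_ids)			/* wrap around the mask */
		cpu = cpumask_first(&s->dispatch_cpumask);

	s->last_dispatch_cpu = cpu;
	return cpu;
}
------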
-
- 09 March 2022, 2 commits
-
-
Committed by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: 186143, https://gitee.com/openeuler/kernel/issues/I4WC06
CVE: NA

-----------------------------------------------

If precise_iostat is enabled, inflight counts are recorded and cleared with atomic operations. However, for dm multipath, inflight is recorded by dm_requeue_original_request(), and if an error happens, multipath_release_clone() won't clear it, which causes the inflight count to leak. Furthermore, %util will always read 100 in iostat. Fix the problem by calling __blk_mq_end_request() in multipath_release_clone() instead.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Committed by Zhang Wensheng

hulk inclusion
category: bugfix
bugzilla: 39265, https://gitee.com/openeuler/kernel/issues/I4WC06
CVE: NA

-----------------------------------------------

When inflight IOs are slow and no new IOs are issued, we expect iostat to reveal the IO hang. However, after commit 9c6dea45 ("block: delete part_round_stats and switch to less precise counting"), io_tick and time_in_queue are not updated until the end of an IO, so the avgqu-sz and %util columns of iostat read zero.

To fix it, we could fall back to the implementation before commit 9c6dea45, but that may cause a performance regression on NVMe or bio-based devices (due to the overhead of the inflight calculation), so add a switch to control whether precise iostat accounting is used. It can be enabled by adding "precise_iostat=1" to the kernel boot cmdline. When precise accounting is enabled, io_tick and time_in_queue are updated when /proc/diskstats or /sys/block/sdX/sdXN/stat is accessed.

Fixes: 9c6dea45 ("block: delete part_round_stats and switch to less precise counting")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
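Editor's note: a boot-time switch like "precise_iostat=1" is typically wired up with __setup(). The sketch below shows that general idiom; the variable and parser names are assumptions for illustration, not the actual patch.

------
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/types.h>

/* Hypothetical global consulted by the accounting code. */
bool precise_iostat;

static int __init precise_iostat_setup(char *str)
{
	bool precise;

	if (kstrtobool(str, &precise))
		return 0;	/* malformed value: keep the default */

	precise_iostat = precise;
	return 1;		/* option consumed */
}
__setup("precise_iostat=", precise_iostat_setup);
------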
-
- 17 January 2022, 1 commit
-
-
Committed by Xie Yongji

mainline inclusion
from mainline-5.16
commit 570b1cac
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

There is some duplicated code to validate the block size in block drivers. This limitation actually comes from the block layer, so this patch adds a new block layer helper for it.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
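Editor's note: such a helper enforces the usual constraints (a power of two, at least 512 bytes, no larger than a page). A sketch along those lines follows; it may differ slightly from the committed version.

------
#include <linux/errno.h>
#include <linux/log2.h>
#include <linux/mm.h>

/* Sketch of a block-size validation helper:
 * 512 <= bsize <= PAGE_SIZE and bsize must be a power of two. */
static inline int blk_validate_block_size(unsigned long bsize)
{
	if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize))
		return -EINVAL;

	return 0;
}
------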
-
- 09 December 2021, 1 commit
-
-
Committed by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: 173974
CVE: NA

---------------------------

The queue will be quiesced if either the old or the new flag is set, and will be unquiesced only when both flags are cleared.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 30 August 2021, 1 commit
-
-
Committed by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: 177149, https://gitee.com/openeuler/kernel/issues/I47R8R
CVE: NA

-----------------------------------------------

If blk-throttle is enabled and IO is issued before blk_throtl_register_queue() is done, a divide-by-zero crash is triggered in tg_may_dispatch() because 'throtl_slice' is uninitialized. Thus introduce a new flag QUEUE_FLAG_THROTL_INIT_DONE. It is set after blk_throtl_register_queue() is done, and is checked before applying any config.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
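Editor's note: the guard this flag provides can be illustrated with a short sketch. The flag name comes from the commit above, but the bit value, helper name and locking here are assumptions, not the real blk-throttle code.

------
#include <linux/blkdev.h>
#include <linux/bitops.h>
#include <linux/errno.h>

/* Illustrative bit value only; the real patch defines it next to the
 * other QUEUE_FLAG_* constants. */
#define QUEUE_FLAG_THROTL_INIT_DONE	30

static int throtl_apply_config_sketch(struct request_queue *q)
{
	/* Refuse to touch throtl_slice until registration has finished. */
	if (!test_bit(QUEUE_FLAG_THROTL_INIT_DONE, &q->queue_flags))
		return -EBUSY;

	/* ... apply the new throttling limits here ... */
	return 0;
}
------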
-
- 16 August 2021, 2 commits
-
-
Committed by Yu Kuai

hulk inclusion
category: bugfix
bugzilla: 173119
CVE: NA

-----------------------------------------------

Add struct request_queue_wrapper to avoid breaking KABI.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Committed by Bob Liu

mainline inclusion
from mainline-v5.2-rc2
commit 7996a8b5
category: bugfix
bugzilla: 173119
CVE: NA

-----------------------------------------------

The following is a description of a hang in blk_mq_freeze_queue_wait(). The hang happens on an attempt to freeze a queue while another task does a queue unfreeze.

The root cause is an incorrect sequence of percpu_ref_resurrect() and percpu_ref_kill(), and as a result those two can be swapped:

 CPU#0                               CPU#1
 ----------------                    -----------------
 q1 = blk_mq_init_queue(shared_tags)
                                     q2 = blk_mq_init_queue(shared_tags):
                                       blk_mq_add_queue_tag_set(shared_tags):
                                         blk_mq_update_tag_set_depth(shared_tags):
                                           list_for_each_entry()
                                             blk_mq_freeze_queue(q1)
                                              > percpu_ref_kill()
                                              > blk_mq_freeze_queue_wait()
 blk_cleanup_queue(q1)
  blk_mq_freeze_queue(q1)
   > percpu_ref_kill()
     ^^^^^^ freeze_depth can't guarantee the order
                                     blk_mq_unfreeze_queue()
                                      > percpu_ref_resurrect()
  > blk_mq_freeze_queue_wait()
    ^^^^^^ Hang here!!!!

This wrong sequence raises a kernel warning:

percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0

But the most unpleasant effect is a hang of blk_mq_freeze_queue_wait(), which waits for q_usage_counter to reach zero; that never happens because the percpu-ref was reinited (instead of being killed) and stays in PERCPU state forever.

How to reproduce:
- "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
- cpu0: python Script.py 0; taskset the corresponding process running on cpu0
- cpu1: python Script.py 1; taskset the corresponding process running on cpu1

Script.py:
------
#!/usr/bin/python3

import os
import sys

while True:
    on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
    off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
    os.system(on)
    os.system(off)
------

This bug was first reported and fixed by Roman, previous discussion:
[1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
[2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
[3] https://patchwork.kernel.org/patch/9268199/

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
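Editor's note: the fix referenced here serializes the freeze-depth bookkeeping with the percpu-ref kill/resurrect calls, so the two can no longer be reordered across CPUs. Below is a simplified sketch of that idea; the struct is invented for illustration and the real code operates on q->q_usage_counter and q->mq_freeze_depth in blk-mq.c.

------
#include <linux/mutex.h>
#include <linux/percpu-refcount.h>

/* Simplified stand-in for the relevant request_queue fields. */
struct freeze_ctl {
	struct mutex mq_freeze_lock;	/* orders kill() vs resurrect() */
	int mq_freeze_depth;
	struct percpu_ref q_usage_counter;
};

static void queue_freeze_start(struct freeze_ctl *c)
{
	mutex_lock(&c->mq_freeze_lock);
	if (++c->mq_freeze_depth == 1)
		percpu_ref_kill(&c->q_usage_counter);
	mutex_unlock(&c->mq_freeze_lock);
	/* caller then waits for the ref to drain, outside the lock */
}

static void queue_unfreeze(struct freeze_ctl *c)
{
	mutex_lock(&c->mq_freeze_lock);
	if (--c->mq_freeze_depth == 0)
		percpu_ref_resurrect(&c->q_usage_counter);
	mutex_unlock(&c->mq_freeze_lock);
}
------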
-
- 22 February 2021, 1 commit
-
-
Committed by Yufen Yu

hulk inclusion
category: bugfix
bugzilla: 46860

--------------------------------

Drivers (such as scsi enclosure) do not call blk_register_queue() to initialize the request_queue, and we rely on the driver itself to deal with the race in that case when cleaning up the queue. However, some self-developed drivers cannot handle that race. To avoid a null pointer dereference like the following, do the quiesce in the kernel.

[67760.308034] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[67760.308069] pc : blk_mq_do_dispatch_sched+0x94/0x130
[67760.308072] lr : blk_mq_sched_dispatch_requests+0x128/0x1f0
[67760.308072] sp : ffff0000b2bb3ca0
[67760.308073] x29: ffff0000b2bb3ca0 x28: 0000000000000000
[67760.308075] x27: 0000000000000000 x26: ffff00008128a000
[67760.308076] x25: ffff8042b13e9700 x24: 0000000000000000
[67760.308077] x23: ffff0000b2bb3cf8 x22: ffff8042b2976808
[67760.308078] x21: ffff0000b2bb3d58 x20: ffff00008128a000
[67760.308079] x19: ffff8042b2976800 x18: 0000000000000000
[67760.308080] x17: 0000000000000000 x16: 0000000000000000
[67760.308081] x15: 0000000000000000 x14: 0000000000000000
[67760.308082] x13: 0000000000000000 x12: 0000000000000000
[67760.308084] x11: ffff800040801004 x10: ffff80004080100c
[67760.308085] x9 : 0000000000000060 x8 : ffff809e58f7e500
[67760.308087] x7 : 0000000000000000 x6 : 00000000ffffffff
[67760.308088] x5 : ffff000080ab7550 x4 : ffff80418e84ec80
[67760.308089] x3 : ffff0000815d7f10 x2 : 94e133dc71839c00
[67760.308090] x1 : 0000000000000000 x0 : ffff00008128a748
[67760.308091] Call trace:
[67760.308093]  blk_mq_do_dispatch_sched+0x94/0x130
[67760.308095]  blk_mq_sched_dispatch_requests+0x128/0x1f0
[67760.308096]  __blk_mq_run_hw_queue+0x98/0x138
[67760.308097]  blk_mq_run_work_fn+0x28/0x38

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
- 12 March 2020, 1 commit
-
-
Committed by Jan Kara

mainline inclusion
from mainline-v5.6-rc4
commit c780e86d
category: bugfix
bugzilla: 13690
CVE: CVE-2019-19768

-------------------------------------------------

KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed, the switching of block tracing (and thus the eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing, so it can happen that the blk_trace structure is freed while __blk_add_trace() is still working on it. Protect accesses to q->blk_trace with RCU during tracing and make sure we wait for the end of the RCU grace period when shutting tracing down. Luckily that is a rare enough event that we can afford it. Note that postponing the freeing of blk_trace to an RCU callback should better be avoided, as it could have unexpected user-visible side effects: the debugfs files would still exist for a short while after block tracing has been shut down.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
	kernel/trace/blktrace.c
[yyl: adjust context]

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
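Editor's note: the protection described above follows the standard RCU reader/updater pattern. A condensed sketch of that pattern is shown here; it is simplified relative to the real blktrace fix (which also uses rcu_dereference_protected() and more state), and the struct names are illustrative.

------
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct blk_trace;		/* opaque in this sketch */

struct queue_like {
	struct blk_trace __rcu *blk_trace;
};

/* Tracing side: read the pointer only under rcu_read_lock(). */
static void add_trace_sketch(struct queue_like *q)
{
	struct blk_trace *bt;

	rcu_read_lock();
	bt = rcu_dereference(q->blk_trace);
	if (bt) {
		/* ... record the trace event using bt ... */
	}
	rcu_read_unlock();
}

/* Teardown side: unpublish, wait out readers, then free. */
static void remove_trace_sketch(struct queue_like *q, struct blk_trace *bt)
{
	rcu_assign_pointer(q->blk_trace, NULL);
	synchronize_rcu();	/* no reader can still see bt after this */
	kfree(bt);
}
------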
-
- 27 December 2019, 4 commits
-
-
Committed by Bart Van Assche

commit cd84a62e0078dce09f4ed349bec84f86c9d54b30 upstream.

The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel port.
- In the IDE driver, for IDE preempt requests.

Rename "preempt-only" into "pm-only" because the primary purpose of this mode is power management. Since the power management core may, but does not have to, resume a runtime-suspended device before performing a system-wide suspend, and since a later patch will set "pm-only" mode for as long as a block device is runtime suspended, make it possible to set "pm-only" mode from more than one context. Since with this change scsi_device_quiesce() is no longer idempotent, make that function return early if it is called for an already quiesced queue.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Committed by Ming Lei

mainline inclusion
from mainline-5.2-rc1
commit 2f8f1336
category: bugfix
bugzilla: 14836
CVE: NA

---------------------------

In the normal queue cleanup path, hctx is released after the request queue is freed, see blk_mq_release(). However, in __blk_mq_update_nr_hw_queues(), an hctx may be freed because the number of hw queues is shrinking. This easily causes use-after-free, because one implicit rule is that it is safe to call almost all block layer APIs while the request queue is alive: an hctx may be retrieved by one API, then that hctx can be freed by blk_mq_update_nr_hw_queues(), and finally use-after-free is triggered.

Fix this issue by always freeing hctx after releasing the request queue. If some hctxs are removed in blk_mq_update_nr_hw_queues(), introduce a per-queue list to hold them, then try to reuse these hctxs if the numa node matches.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E. J. Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
	block/blk-mq.c
	include/linux/blk-mq.h
	include/linux/blkdev.h

Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Committed by Tan Xiaojun

hulk inclusion
category: feature
bugzilla: 13276
CVE: NA

-------------------------------

Reserve space for the structure in blk.

Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Committed by Ming Lei

mainline inclusion
from mainline-5.0-rc1
commit 1db4909e
category: bugfix
bugzilla: 5901
CVE: NA

---------------------------

Even though .mq_kobj, ctx->kobj and q->kobj share the same lifetime from the block layer's view, they actually don't, because userspace may grab a kobject at any time via sysfs.

This patch fixes the issue with the following approach:

1) introduce 'struct blk_mq_ctxs' for holding .mq_kobj and managing all ctxs

2) free all allocated ctxs and the 'blk_mq_ctxs' instance in the release handler of .mq_kobj

3) grab one ref of .mq_kobj before initializing each ctx->kobj, so that .mq_kobj is always released after all ctxs are freed (see the sketch after this message).

This patch fixes a kernel panic during boot when DEBUG_KOBJECT_RELEASE is enabled.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
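Editor's note: step 3 above relies on the standard kobject parent-pinning idiom. The sketch below shows that idiom with illustrative names; it is not the actual blk-mq code (where the extra put happens in the ctx kobject's release callback).

------
#include <linux/kobject.h>

/* Hypothetical container mirroring the idea of blk_mq_ctxs. */
struct ctx_set {
	struct kobject mq_kobj;	/* released only after every ctx kobj is gone */
};

static void ctx_kobj_init_sketch(struct ctx_set *set, struct kobject *ctx_kobj,
				 struct kobj_type *ktype)
{
	/* Pin the parent: its release handler frees the ctx array, so it
	 * must not run before the last ctx kobject is released. */
	kobject_get(&set->mq_kobj);
	kobject_init(ctx_kobj, ktype);
}

static void ctx_kobj_release_sketch(struct ctx_set *set, struct kobject *ctx_kobj)
{
	kobject_put(ctx_kobj);
	kobject_put(&set->mq_kobj);	/* drop the pin taken at init time */
}
------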
-
- 12 September 2018, 1 commit
-
-
Committed by Jens Axboe

After merging the iolatency policy, we potentially now have 4 policies being registered, but only support 3. This causes one of them to fail loading. Takashi reports that BFQ no longer works for him, because it fails to load due to the policy registration failure.

Bump to 5 policies, and also add a warning for when we have exceeded the global amount. If we have to touch this again, we should switch to a dynamic scheme instead.

Reported-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 August 2018, 1 commit
-
-
Committed by Bart Van Assche

Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but did not remove the corresponding #include directives. Since these include directives are no longer needed, remove them.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 July 2018, 1 commit
-
-
Committed by Greg Edwards

This allows bio_integrity_bytes() to be called from drivers instead of open coding it.

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Edwards <gedwards@ddn.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 July 2018, 1 commit
-
-
Committed by Tejun Heo

c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @op with a boolean @is_write, which limited the amount of information going into ->rw_page() and, more importantly, page_endio(), and removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information.

This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() taking bool @is_write. This allows the block part of the operation to carry enough information while not leaking it to mm.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 July 2018, 1 commit
-
-
Committed by Vladimir Zapolskiy

Remove the blkdev_entry_to_request() macro, which has remained unused throughout the observable history; note also that it repeats the list_entry_rq() macro verbatim.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 July 2018, 5 commits
-
-
Committed by Josef Bacik

blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

We have to remove synchronize_rcu() from blk_queue_cleanup(), otherwise a long delay can be caused during lun probe. To remove it, we have to avoid iterating set->tag_list in the IO path, e.g. in blk_mq_sched_restart().

This patch reverts 5b79413946d (Revert "blk-mq: don't handle TAG_SHARED in restart"). Given we have fixed enough IO hang issues, there isn't any reason to restart all queues in one tag set any more, for the following reasons:

1) the blk-mq core can deal with the shared-tags case well via blk_mq_get_driver_tag(), which can wake up queues waiting for a driver tag.

2) SCSI is a bit special because it may return BLK_STS_RESOURCE even if the queue, target or host is ready, but SCSI's built-in restart covers this well, see scsi_end_request(): the queue will be rerun after any request initiated from this host/target is completed.

In my test on scsi_debug (8 luns), this patch may improve IOPS by 20% ~ 30% when running I/O on these 8 luns concurrently.

Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
Cc: Omar Sandoval <osandov@fb.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Reported-by: Andrew Jones <drjones@redhat.com>
Tested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche

Exclude zoned block device members from struct request_queue for CONFIG_BLK_DEV_ZONED == n. Avoid breaking the build by only building the code that uses these struct request_queue members if CONFIG_BLK_DEV_ZONED != n.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche

Since the implementation of blk_queue_nr_zones() is trivial and since it only has a single caller, inline this function.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Bart Van Assche

Remove this function since it has no callers. This function was introduced in commit 6cc77e9c ("block: introduce zoned block devices zone write locking").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matias Bjorling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 June 2018, 1 commit
-
-
Committed by Keith Busch

A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits.

Reported-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Tested-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
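Editor's note: the capping itself is just taking the minimum of the two limits when computing how many sectors may be issued. A rough sketch under the assumption of a chunk_sectors-style, power-of-two boundary; the helper name is illustrative, not the actual block layer function.

------
#include <linux/kernel.h>
#include <linux/types.h>

/*
 * Illustrative only: given the boundary size (chunk_sectors), the request
 * offset, and the queue's max transfer size, return how many sectors may
 * be issued without crossing either limit.
 */
static unsigned int capped_max_sectors(unsigned int chunk_sectors,
				       sector_t offset,
				       unsigned int max_sectors)
{
	/* sectors left until the next boundary (power-of-two assumed) */
	unsigned int left = chunk_sectors -
		(unsigned int)(offset & (chunk_sectors - 1));

	return min(left, max_sectors);
}
------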
-
- 15 June 2018, 1 commit
-
-
Committed by Christoph Hellwig

This function is entirely unused, so remove it and the tag_queue_busy member of struct request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 June 2018, 1 commit
-
-
Committed by Christoph Hellwig

We can currently call the timeout handler again on a request that has already been handed over to the timeout handler. Prevent that with a new flag.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 31 May 2018, 1 commit
-
-
Committed by Kent Overstreet

Convert the core block functionality to embedded bio sets.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 May 2018, 5 commits
-
-
Committed by Jens Axboe

After the recent timeout handling changes, we have two holes in the struct. Move the timeout near the deadline, killing both holes and moving related members closer together. On my config on x86-64, this shrinks struct request from 312 to 304 bytes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

-
Committed by Christoph Hellwig

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

The name BLK_EH_NOT_HANDLED implies nothing happened, but very often that is not the case: the driver has already completed the command. Fix the symbolic name to reflect that a little better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Keith Busch

This patch simplifies the timeout handling by relying on the request reference counting to ensure the iterator is operating on an inflight and truly timed-out request. Since the reference counting prevents the tag from being reallocated, the block layer no longer needs to prevent drivers from completing their requests while the timeout handler is operating on them: a driver completing a request is allowed to proceed to the next state without additional synchronization with the block layer.

This also removes any need for generation sequence numbers, since the request lifetime is prevented from being reallocated as a new sequence while timeout handling is operating on it.

To enable this, a refcount is added to struct request so that request users can be sure they're operating on the same request without it changing while they're processing it. The request's tag won't be released for reuse until both the timeout handler and the completion are done with it.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg for completions]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
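Editor's note: the reference-counting idea described above is the usual "take a reference before inspecting, drop it when done" pattern applied during tag iteration. A simplified sketch of that pattern follows; the struct and function names are illustrative (the real code puts a refcount_t inside struct request and iterates via the blk-mq tag iterators).

------
#include <linux/refcount.h>

struct req_like {
	refcount_t ref;		/* pins the tag against reallocation */
	/* ... */
};

/* Timeout-side iterator body: only act on a request we could pin. */
static void check_expired_sketch(struct req_like *rq)
{
	if (!refcount_inc_not_zero(&rq->ref))
		return;		/* already being freed/completed, skip it */

	/* ... inspect the deadline and fire the driver's timeout handler ... */

	if (refcount_dec_and_test(&rq->ref)) {
		/* last reference dropped: only now may the tag be reused */
	}
}
------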
-
- 14 May 2018, 1 commit
-
-
Committed by Christoph Hellwig

Switch everyone to blk_get_request_flags, and then rename blk_get_request_flags to blk_get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 May 2018, 2 commits
-
-
Committed by Omar Sandoval

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds, used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start time, both in ktime nanoseconds, shaving off up to 16 bytes from struct request depending on the kernel config.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Omar Sandoval

cfq and bfq have some internal fields that use sched_clock() and can trivially use ktime_get_ns() instead. Their timestamp fields in struct request can also use ktime_get_ns(), which resolves the 8 year old comment added by commit 28f4197e ("block: disable preemption before using sched_clock()").

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-