- 27 9月, 2019 1 次提交
-
-
由 Yufen Yu 提交于
We have updated limits after calling wbt_set_min_lat(). No need to update again. Reviewed-by: NBob Liu <bob.liu@oracle.com> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 9月, 2019 1 次提交
-
-
由 Ming Lei 提交于
cecf5d87 ("block: split .sysfs_lock into two locks") starts to release & acquire sysfs_lock before registering/un-registering elevator queue during switching elevator for avoiding potential deadlock from showing & storing 'queue/iosched' attributes and removing elevator's kobject. Turns out there isn't such deadlock because 'q->sysfs_lock' isn't required in .show & .store of queue/iosched's attributes, and just elevator's sysfs lock is acquired in elv_iosched_store() and elv_iosched_show(). So it is safe to hold queue's sysfs lock when registering/un-registering elevator queue. The biggest issue is that commit cecf5d87 assumes that concurrent write on 'queue/scheduler' can't happen. However, this assumption isn't true, because kernfs_fop_write() only guarantees that concurrent write aren't called on the same open file, but the write could be from different open on the file. So we can't release & re-acquire queue's sysfs lock during switching elevator, otherwise use-after-free on elevator could be triggered. Fixes the issue by not releasing queue's sysfs lock during switching elevator. Fixes: cecf5d87 ("block: split .sysfs_lock into two locks") Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 9月, 2019 1 次提交
-
-
由 Ming Lei 提交于
cecf5d87 ("block: split .sysfs_lock into two locks") starts to release & actuire sysfs_lock again during switching elevator. So it isn't enough to prevent switching elevator from happening by simply clearing QUEUE_FLAG_REGISTERED with holding sysfs_lock, because in-progress switch still can move on after re-acquiring the lock, meantime the flag of QUEUE_FLAG_REGISTERED won't get checked. Fixes this issue by checking 'q->elevator' directly & locklessly after q->kobj is removed in blk_unregister_queue(), this way is safe because q->elevator can't be changed at that time. Fixes: cecf5d87 ("block: split .sysfs_lock into two locks") Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 8月, 2019 2 次提交
-
-
由 Ming Lei 提交于
The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store path. Meantime, inside block's .show/.store callback, q->sysfs_lock is required. However, when mq & iosched kobjects are removed via blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held too. This way causes AB-BA lock because the kernfs built-in lock of 'kn-count' is required inside kobject_del() too, see the lockdep warning[1]. On the other hand, it isn't necessary to acquire q->sysfs_lock for both blk_mq_unregister_dev() & elv_unregister_queue() because clearing REGISTERED flag prevents storing to 'queue/scheduler' from being happened. Also sysfs write(store) is exclusive, so no necessary to hold the lock for elv_unregister_queue() when it is called in switching elevator path. So split .sysfs_lock into two: one is still named as .sysfs_lock for covering sync .store, the other one is named as .sysfs_dir_lock for covering kobjects and related status change. sysfs itself can handle the race between add/remove kobjects and showing/storing attributes under kobjects. For switching scheduler via storing to 'queue/scheduler', we use the queue flag of QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then we can avoid to hold .sysfs_lock during removing/adding kobjects. [1] lockdep warning ====================================================== WARNING: possible circular locking dependency detected 5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted ------------------------------------------------------ rmmod/777 is trying to acquire lock: 00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72 but task is already holding lock: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&q->sysfs_lock){+.+.}: __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 __mutex_lock+0x14a/0xa9b blk_mq_hw_sysfs_show+0x63/0xb6 sysfs_kf_seq_show+0x11f/0x196 seq_read+0x2cd/0x5f2 vfs_read+0xc7/0x18c ksys_read+0xc4/0x13e do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (kn->count#202){++++}: check_prev_add+0x5d2/0xc45 validate_chain+0xed3/0xf94 __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 __kernfs_remove+0x237/0x40b kernfs_remove_by_name_ns+0x59/0x72 remove_files+0x61/0x96 sysfs_remove_group+0x81/0xa4 sysfs_remove_groups+0x3b/0x44 kobject_del+0x44/0x94 blk_mq_unregister_dev+0x83/0xdd blk_unregister_queue+0xa0/0x10b del_gendisk+0x259/0x3fa null_del_dev+0x8b/0x1c3 [null_blk] null_exit+0x5c/0x95 [null_blk] __se_sys_delete_module+0x204/0x337 do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&q->sysfs_lock); lock(kn->count#202); lock(&q->sysfs_lock); lock(kn->count#202); *** DEADLOCK *** 2 locks held by rmmod/777: #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk] #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b stack backtrace: CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4 Call Trace: dump_stack+0x9a/0xe6 check_noncircular+0x207/0x251 ? print_circular_bug+0x32a/0x32a ? find_usage_backwards+0x84/0xb0 check_prev_add+0x5d2/0xc45 validate_chain+0xed3/0xf94 ? check_prev_add+0xc45/0xc45 ? mark_lock+0x11b/0x804 ? check_usage_forwards+0x1ca/0x1ca __lock_acquire+0x95f/0xa2f lock_acquire+0x1b4/0x1e8 ? kernfs_remove_by_name_ns+0x59/0x72 __kernfs_remove+0x237/0x40b ? kernfs_remove_by_name_ns+0x59/0x72 ? kernfs_next_descendant_post+0x7d/0x7d ? strlen+0x10/0x23 ? strcmp+0x22/0x44 kernfs_remove_by_name_ns+0x59/0x72 remove_files+0x61/0x96 sysfs_remove_group+0x81/0xa4 sysfs_remove_groups+0x3b/0x44 kobject_del+0x44/0x94 blk_mq_unregister_dev+0x83/0xdd blk_unregister_queue+0xa0/0x10b del_gendisk+0x259/0x3fa ? disk_events_poll_msecs_store+0x12b/0x12b ? check_flags+0x1ea/0x204 ? mark_held_locks+0x1f/0x7a null_del_dev+0x8b/0x1c3 [null_blk] null_exit+0x5c/0x95 [null_blk] __se_sys_delete_module+0x204/0x337 ? free_module+0x39f/0x39f ? blkcg_maybe_throttle_current+0x8a/0x718 ? rwlock_bug+0x62/0x62 ? __blkcg_punt_bio_submit+0xd0/0xd0 ? trace_hardirqs_on_thunk+0x1a/0x20 ? mark_held_locks+0x1f/0x7a ? do_syscall_64+0x4c/0x295 do_syscall_64+0xa7/0x295 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fb696cdbe6b Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008 RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828 RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000 R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0 R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0 Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
There are 4 users which check if queue is registered, so add one helper to check it. Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 8月, 2019 1 次提交
-
-
由 zhengbin 提交于
blk_exit_queue will free elevator_data, while blk_mq_requeue_work will access it. Move cancel of requeue_work to the front of blk_exit_queue to avoid use-after-free. blk_exit_queue blk_mq_requeue_work __elevator_exit blk_mq_run_hw_queues blk_mq_exit_sched blk_mq_run_hw_queue dd_exit_queue blk_mq_hctx_has_pending kfree(elevator_data) blk_mq_sched_has_work dd_has_work Fixes: fbc2a15e ("blk-mq: move cancel of requeue_work into blk_mq_release") Cc: stable@vger.kernel.org Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: Nzhengbin <zhengbin13@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 07 6月, 2019 1 次提交
-
-
由 Ming Lei 提交于
In theory, IO scheduler belongs to request queue, and the request pool of sched tags belongs to the request queue too. However, the current tags allocation interfaces are re-used for both driver tags and sched tags, and driver tags is definitely host wide, and doesn't belong to any request queue, same with its request pool. So we need tagset instance for freeing request of sched tags. Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched tags to be freed before calling blk_mq_free_tag_set(). Commit 47cdee29 ("block: move blk_exit_queue into __blk_release_queue") moves blk_exit_queue into __blk_release_queue for simplying the fast path in generic_make_request(), then causes oops during freeing requests of sched tags in __blk_release_queue(). Fix the above issue by move freeing request pool of sched tags into blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any in-queue requests at that time. Freeing sched tags has to be kept in queue's release handler becasue there might be un-completed dispatch activity which might refer to sched tags. Cc: Bart Van Assche <bvanassche@acm.org> Cc: Christoph Hellwig <hch@lst.de> Fixes: 47cdee29 ("block: move blk_exit_queue into __blk_release_queue") Tested-by: NYi Zhang <yi.zhang@redhat.com> Reported-by: Nkernel test robot <rong.a.chen@intel.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 5月, 2019 1 次提交
-
-
由 Ming Lei 提交于
Commit 498f6650 ("block: Fix a race between the cgroup code and request queue initialization") moves what blk_exit_queue does into blk_cleanup_queue() for fixing issue caused by changing back queue lock. However, after legacy request IO path is killed, driver queue lock won't be used at all, and there isn't story for changing back queue lock. Then the issue addressed by Commit 498f6650 doesn't exist any more. So move move blk_exit_queue into __blk_release_queue. This patch basically reverts the following two commits: 498f6650 block: Fix a race between the cgroup code and request queue initialization 24ecc358 block: Ensure that a request queue is dissociated from the cgroup controller Cc: Bart Van Assche <bvanassche@acm.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 4月, 2019 1 次提交
-
-
由 Kimberly Brown 提交于
The kobj_type default_attrs field is being replaced by the default_groups field. Replace all of the ktype default_attrs fields in the block subsystem with default_groups and use the ATTRIBUTE_GROUPS macro to create the default groups. Remove default_ctx_attrs[] because it doesn't contain any attributes. This patch was tested by verifying that the sysfs files for the attributes in the default groups were created. Signed-off-by: NKimberly Brown <kimbrownkd@gmail.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 4月, 2019 1 次提交
-
-
由 Weiping Zhang 提交于
If the low level driver has no timeout handler, the /sys/block/<disk>/queue/io_timeout will not be displayed. Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NWeiping Zhang <zhangweiping@didiglobal.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 3月, 2019 1 次提交
-
-
由 Yufen Yu 提交于
For q->poll_nsec == -1, means doing classic poll, not hybrid poll. We introduce a new flag BLK_MQ_POLL_CLASSIC to replace -1, which may make code much easier to read. Additionally, since val is an int obtained with kstrtoint(), val can be a negative value other than -1, so return -EINVAL for that case. Thanks to Damien Le Moal for some good suggestion. Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NYufen Yu <yuyufen@huawei.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 11 2月, 2019 2 次提交
-
-
由 Aleksei Zakharov 提交于
There's no reason to set wbt min lat and freeze request queue if current value is the same. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NAleksei Zakharov <zakharov.a.g@yandex.ru> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Marcos Paulo de Souza 提交于
The Notes section of the comment was removed, because now blk_release_queue can only be executed from blk_cleanup_queue (being called when the q->kobj reaches zero), and also blk_init_queue was removed in a1ce35fa. Signed-off-by: NMarcos Paulo de Souza <marcos.souza.org@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that the the SCSI layer replaced the use of the cluster flag with segment size limits and the DMA boundary we can remove the cluster flag from the block layer. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 18 12月, 2018 1 次提交
-
-
由 Ming Lei 提交于
The queue mapping of type poll only exists when set->map[HCTX_TYPE_POLL].nr_queues is bigger than zero, so enhance the constraint by checking .nr_queues of type poll before enabling IO poll. Otherwise IO race & timeout can be observed when running block/007. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 05 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
This avoids having to have differnet mq_ops for different setups with or without poll queues. Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 29 11月, 2018 1 次提交
-
-
由 Weiping Zhang 提交于
Give a interface to adjust io timeout(ms) by device. Signed-off-by: NWeiping Zhang <zhangweiping@didiglobal.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 16 11月, 2018 3 次提交
-
-
由 Jens Axboe 提交于
Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
With the legacy request path gone there is no good reason to keep queue_lock as a pointer, we can always use the embedded lock now. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Fixed floppy and blk-cgroup missing conversions and half done edits. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
->queue_flags is generally not set or cleared in the fast path, and also generally set or cleared one flag at a time. Make use of the normal atomic bitops for it so that we don't need to take the queue_lock, which is otherwise mostly unused in the core block layer now. Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 11月, 2018 2 次提交
-
-
由 Jens Axboe 提交于
This removes a bunch of core and elevator related code. On the core front, we remove anything related to queue running, draining, initialization, plugging, and congestions. We also kill anything related to request allocation, merging, retrieval, and completion. Remove any checking for single queue IO schedulers, as they no longer exist. This means we can also delete a bunch of code related to request issue, adding, completion, etc - and all the SQ related ops and helpers. Also kill the load_default_modules(), as all that did was provide for a way to load the default single queue elevator. Tested-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It's now unused, kill it. Reviewed-by: NHannes Reinecke <hare@suse.com> Tested-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 31 10月, 2018 1 次提交
-
-
由 Ming Lei 提交于
rq_qos_exit() removes the current q->rq_qos, this action has to be done after queue is frozen, otherwise the IO queue path may never be waken up, then IO hang is caused. So fixes this issue by moving rq_qos_exit() after queue is frozen. Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 10月, 2018 2 次提交
-
-
由 Damien Le Moal 提交于
Drivers exposing zoned block devices have to initialize and maintain correctness (i.e. revalidate) of the device zone bitmaps attached to the device request queue (seq_zones_bitmap and seq_zones_wlock). To simplify coding this, introduce a generic helper function blk_revalidate_disk_zones() suitable for most (and likely all) cases. This new function always update the seq_zones_bitmap and seq_zones_wlock bitmaps as well as the queue nr_zones field when called for a disk using a request based queue. For a disk using a BIO based queue, only the number of zones is updated since these queues do not have schedulers and so do not need the zone bitmaps. With this change, the zone bitmap initialization code in sd_zbc.c can be replaced with a call to this function in sd_zbc_read_zones(), which is called from the disk revalidate block operation method. A call to blk_revalidate_disk_zones() is also added to the null_blk driver for devices created with the zoned mode enabled. Finally, to ensure that zoned devices created with dm-linear or dm-flakey expose the correct number of zones through sysfs, a call to blk_revalidate_disk_zones() is added to dm_table_set_restrictions(). The zone bitmaps allocated and initialized with blk_revalidate_disk_zones() are freed automatically from __blk_release_queue() using the block internal function blk_queue_free_zone_bitmaps(). Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Damien Le Moal 提交于
Expose through sysfs the nr_zones field of struct request_queue. Exposing this value helps in debugging disk issues as well as facilitating scripts based use of the disk (e.g. blktests). For zoned block devices, the nr_zones field indicates the total number of zones of the device calculated using the known disk capacity and zone size. This number of zones is always 0 for regular block devices. Since nr_zones is defined conditionally with CONFIG_BLK_DEV_ZONED, introduce the blk_queue_nr_zones() function to return the correct value for any device, regardless if CONFIG_BLK_DEV_ZONED is set. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 23 8月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
A previous commit removed the ability to have per-rq flags. We used those flags to maintain inflight counts. Since we don't have those anymore, we have to always maintain inflight counts, even if wbt is disabled. This is clearly suboptimal. Add a queue quiesce around changing the wbt latency settings from sysfs to work around this. With that, we can reliably put the enabled check in our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be consistent for the lifetime of the request. Fixes: c1c80384 ("block: remove external dependency on wbt_flags") Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 8月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
For legacy queues the only call of blkg_root_lookup() happens after bypass mode has been enabled. Since blkg_lookup() returns NULL for queues in bypass mode, modify the blkg_root_lookup() such that it no longer depends on bypass mode. Rename the function into blk_queue_root_blkg() as suggested by Tejun. Suggested-by: NTejun Heo <tj@kernel.org> Fixes: 6bad9b21 ("blkcg: Introduce blkg_root_lookup()") Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 8月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
Several block drivers call alloc_disk() followed by put_disk() if something fails before device_add_disk() is called without calling blk_cleanup_queue(). Make sure that also for this scenario a request queue is dissociated from the cgroup controller. This patch avoids that loading the parport_pc, paride and pf drivers triggers the following kernel crash: BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride] Read of size 4 at addr 0000000000000008 by task modprobe/744 Call Trace: dump_stack+0x9a/0xeb kasan_report+0x139/0x350 pi_init+0x42e/0x580 [paride] pf_init+0x2bb/0x1000 [pf] do_one_initcall+0x8e/0x405 do_init_module+0xd9/0x2f2 load_module+0x3ab4/0x4700 SYSC_finit_module+0x176/0x1a0 do_syscall_64+0xee/0x2b0 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Reported-by: NAlexandru Moise <00moses.alexander00@gmail.com> Fixes: a063057d ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17 Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Tested-by: NAlexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Alexandru Moise <00moses.alexander00@gmail.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 7月, 2018 1 次提交
-
-
由 Josef Bacik 提交于
blkcg-qos is going to do essentially what wbt does, only on a cgroup basis. Break out the common code that will be shared between blkcg-qos and wbt into blk-rq-qos.* so they can both utilize the same infrastructure. Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 31 5月, 2018 1 次提交
-
-
由 Kent Overstreet 提交于
Convert the core block functionality to embedded bio sets. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 5月, 2018 1 次提交
-
-
由 Joe Perches 提交于
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 5月, 2018 1 次提交
-
-
由 Kent Overstreet 提交于
Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 3月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
Introduce functions that modify the queue flags and that protect these modifications with the request queue lock. Except for moving one wake_up_all() call from inside to outside a critical section, this patch does not change any functionality. Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 3月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
Avoid that the following race can occur: blk_cleanup_queue() blkcg_print_blkgs() spin_lock_irq(lock) (1) spin_lock_irq(blkg->q->queue_lock) (2,5) q->queue_lock = &q->__queue_lock (3) spin_unlock_irq(lock) (4) spin_unlock_irq(blkg->q->queue_lock) (6) (1) take driver lock; (2) busy loop for driver lock; (3) override driver lock with internal lock; (4) unlock driver lock; (5) can take driver lock now; (6) but unlock internal lock. This change is safe because only the SCSI core and the NVME core keep a reference on a request queue after having called blk_cleanup_queue(). Neither driver accesses any of the removed data structures between its blk_cleanup_queue() and blk_put_queue() calls. Reported-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Cc: Jan Kara <jack@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 1月, 2018 1 次提交
-
-
由 Bart Van Assche 提交于
The __blk_mq_register_dev(), blk_mq_unregister_dev(), elv_register_queue() and elv_unregister_queue() calls need to be protected with sysfs_lock but other code in these functions not. Hence protect only this code with sysfs_lock. This patch fixes a locking inversion issue in blk_unregister_queue() and also in an error path of blk_register_queue(): it is not allowed to hold sysfs_lock around the kobject_del(&q->kobj) call. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 15 1月, 2018 2 次提交
-
-
由 Mike Snitzer 提交于
Since I can remember DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. Reason for this is block/genhd.c:add_disk() has requires that the request_queue (and associated bdi) be tied to the gendisk before add_disk() is called -- because add_disk() also deals with exposing the request_queue via blk_register_queue(). DM's dynamic creation of arbitrary device types (and associated request_queue types) requires the DM device's gendisk be available so that DM table loads can establish a master/slave relationship with subordinate devices that are referenced by loaded DM tables -- using bd_link_disk_holder(). But until these DM tables, and their associated subordinate devices, are known DM cannot know what type of request_queue it needs -- nor what its queue_limits should be. This chicken and egg scenario has created all manner of problems for DM and, at times, the block layer. Summary of changes: - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant that drivers may use to add a disk without also calling blk_register_queue(). Driver must call blk_register_queue() once its request_queue is fully initialized. - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED is not set. It won't be set if driver used add_disk_no_queue_reg() but driver encounters an error and must del_gendisk() before calling blk_register_queue(). - Export blk_register_queue(). These changes allow DM to use add_disk_no_queue_reg() to anchor its gendisk as the "master" for master/slave relationships DM must establish with subordinate devices referenced in DM tables that get loaded. Once all "slave" devices for a DM device are known its request_queue can be properly initialized and then advertised via sysfs -- important improvement being that no request_queue resource initialization performed by blk_register_queue() is missed for DM devices anymore. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Mike Snitzer 提交于
The original commit e9a823fb (block: fix warning when I/O elevator is changed as request_queue is being removed) is pretty conflated. "conflated" because the resource being protected by q->sysfs_lock isn't the queue_flags (it is the 'queue' kobj). q->sysfs_lock serializes __elevator_change() (via elv_iosched_store) from racing with blk_unregister_queue(): 1) By holding q->sysfs_lock first, __elevator_change() can complete before a racing blk_unregister_queue(). 2) Conversely, __elevator_change() is testing for QUEUE_FLAG_REGISTERED in case elv_iosched_store() loses the race with blk_unregister_queue(), it needs a way to know the 'queue' kobj isn't there. Expand the scope of blk_unregister_queue()'s q->sysfs_lock use so it is held until after the 'queue' kobj is removed. To do so blk_mq_unregister_dev() must not also take q->sysfs_lock. So rename __blk_mq_unregister_dev() to blk_mq_unregister_dev(). Also, blk_unregister_queue() should use q->queue_lock to protect against any concurrent writes to q->queue_flags -- even though chances are the queue is being cleaned up so no concurrent writes are likely. Fixes: e9a823fb ("block: fix warning when I/O elevator is changed as request_queue is being removed") Signed-off-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 11月, 2017 1 次提交
-
-
由 weiping zhang 提交于
wbt_init doesn't set q->rq_wb to NULL, if wbt_init return 0, so check return value is enough, remove NULL checking. Signed-off-by: Nweiping zhang <zhangweiping@didichuxing.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 11月, 2017 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 29 8月, 2017 1 次提交
-
-
由 David Jeffery 提交于
There is a race between changing I/O elevator and request_queue removal which can trigger the warning in kobject_add_internal. A program can use sysfs to request a change of elevator at the same time another task is unregistering the request_queue the elevator would be attached to. The elevator's kobject will then attempt to be connected to the request_queue in the object tree when the request_queue has just been removed from sysfs. This triggers the warning in kobject_add_internal as the request_queue no longer has a sysfs directory: kobject_add_internal failed for iosched (error: -2 parent: queue) ------------[ cut here ]------------ WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0 To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when changing the elevator and use the request_queue's sysfs_lock to serialize between clearing the flag and the elevator testing the flag. Signed-off-by: NDavid Jeffery <djeffery@redhat.com> Tested-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-