1. 24 Oct, 2022 4 commits
  2. 09 Oct, 2022 1 commit
  3. 21 Sep, 2022 1 commit
    • blk-wbt: call rq_qos_add() after wb_normal is initialized · 8c5035df
      Yu Kuai committed
      Our test found that the wbt inflight counter can go negative, which
      causes an io hang (note that this problem doesn't exist in mainline):
      
      t1: device create       t2: issue io
      add_disk
       blk_register_queue
        wbt_enable_default
         wbt_init
          rq_qos_add
          // wb_normal is still 0
                              /*
                               * in mainline, disk can't be opened before
                               * bdev_add(), however, in old kernels, disk
                               * can be opened before blk_register_queue().
                               */
                              blkdev_issue_flush
                              // disk size is 0, however, it's not checked
                               submit_bio_wait
                                submit_bio
                                 blk_mq_submit_bio
                                  rq_qos_throttle
                                   wbt_wait
                                    bio_to_wbt_flags
                                     rwb_enabled
                                     // wb_normal is 0, inflight is not increased

          wbt_queue_depth_changed(&rwb->rqos);
           wbt_update_limits
           // wb_normal is initialized
                                  rq_qos_track
                                   wbt_track
                                    rq->wbt_flags |= bio_to_wbt_flags(rwb, bio);
                                    // wb_normal is not 0, wbt_flags will be set
      t3: io completion
      blk_mq_free_request
       rq_qos_done
        wbt_done
         wbt_is_tracked
         // return true
         __wbt_done
          wbt_rqw_done
           atomic_dec_return(&rqw->inflight);
           // inflight is decreased
      
      commit 8235b5c1 ("block: call bdev_add later in device_add_disk") can
      avoid this problem, however it's better to fix this problem in wbt:
      
      1) Older kernels can't backport this patch because of extensive
      refactoring.
      2) The root cause is that wbt calls rq_qos_add() before wb_normal is
      initialized.
      
      Fixes: e34cbd30 ("blk-wbt: add general throttling mechanism")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20220913105749.3086243-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
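
The interleaving in the commit message above can be replayed in a small userspace model. This is only a sketch: the `toy_*` types and functions are hypothetical stand-ins, not kernel code. They model just the three steps the message names (throttle, track, done) and the rule that wbt acts only when `wb_normal` is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the wbt state; not the kernel type. */
struct toy_rwb {
    unsigned int wb_normal;  /* 0 means throttling is not enabled yet */
    int inflight;            /* models rqw->inflight */
};

struct toy_rq {
    bool tracked;            /* models the "tracked" bit in rq->wbt_flags */
};

/* Models rwb_enabled(): wbt only acts once wb_normal is non-zero. */
static bool toy_rwb_enabled(const struct toy_rwb *rwb)
{
    return rwb->wb_normal != 0;
}

/* Models the throttle step: inflight is only incremented if enabled. */
static void toy_throttle(struct toy_rwb *rwb)
{
    if (toy_rwb_enabled(rwb))
        rwb->inflight++;
}

/* Models the track step: the request is flagged if wbt is enabled now. */
static void toy_track(const struct toy_rwb *rwb, struct toy_rq *rq)
{
    rq->tracked = toy_rwb_enabled(rwb);
}

/* Models completion: every tracked request decrements inflight. */
static void toy_done(struct toy_rwb *rwb, const struct toy_rq *rq)
{
    if (rq->tracked)
        rwb->inflight--;
}

/*
 * Replays one request. With init_first == false, wb_normal is set
 * between throttle and track (the buggy ordering); with true, it is
 * set before the request can reach wbt at all (the fixed ordering).
 * Returns the final inflight count.
 */
int toy_replay(bool init_first)
{
    struct toy_rwb rwb = { .wb_normal = 0, .inflight = 0 };
    struct toy_rq rq = { .tracked = false };

    if (init_first)
        rwb.wb_normal = 1;
    toy_throttle(&rwb);
    if (!init_first)
        rwb.wb_normal = 1;
    toy_track(&rwb, &rq);
    toy_done(&rwb, &rq);
    return rwb.inflight;
}
```

In this model the buggy ordering leaves the counter below zero, while initializing first leaves it balanced at zero, which matches the reasoning for moving rq_qos_add() after the initialization.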
  4. 20 Jul, 2022 1 commit
    • block: don't allow the same type rq_qos add more than once · 14a6e2eb
      Jinke Han committed
      In our test of iocost, we encountered some list add/del corruptions of
      inner_walk list in ioc_timer_fn.
      
      The reason can be described as follows:
      
      cpu 0					cpu 1
      ioc_qos_write				ioc_qos_write
      
      ioc = q_to_ioc(queue);
      if (!ioc) {
              ioc = kzalloc();
      					ioc = q_to_ioc(queue);
      					if (!ioc) {
      						ioc = kzalloc();
      						...
      						rq_qos_add(q, rqos);
      					}
              ...
              rq_qos_add(q, rqos);
              ...
      }
      
      When the io.cost.qos file is written by two cpus concurrently, rq_qos may
      be added to one disk twice. In that case, there will be two iocs enabled
      and running on one disk. They own different iocgs on their active lists.
      In the ioc_timer_fn function, because the iocgs from the two iocs share
      the same root iocg, each ioc may overwrite the root iocg's walk_list,
      which leads to list add/del corruptions when building or destroying the
      inner_walk list.
      
      So far, the blk-rq-qos framework has implicitly assumed one instance of
      each rq_qos type per queue. This patch makes that explicit and also
      fixes the crash above.
      Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
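
One way to enforce "one instance per rq_qos type per queue" is to do the duplicate check and the list insertion under the same lock, and to return an error when the type is already present. The following is a userspace sketch under that assumption. All `toy_*` names are hypothetical, the C11 `atomic_flag` spinlock merely stands in for whatever lock protects the queue, and the `-EBUSY` return value is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
enum toy_rq_qos_id { TOY_QOS_WBT, TOY_QOS_LATENCY, TOY_QOS_COST };

struct toy_rq_qos {
    enum toy_rq_qos_id id;
    struct toy_rq_qos *next;
};

struct toy_queue {
    atomic_flag lock;            /* stands in for the lock guarding the list */
    struct toy_rq_qos *rq_qos;   /* singly linked list of attached policies */
};

/*
 * Attach rqos to q unless a policy of the same type is already attached.
 * The lookup and the insertion happen in one critical section, so two
 * concurrent callers cannot both pass the check.
 * Returns 0 on success, -EBUSY if the type is already present.
 */
int toy_rq_qos_add(struct toy_queue *q, struct toy_rq_qos *rqos)
{
    struct toy_rq_qos *cur;
    int ret = 0;

    assert(q != NULL && rqos != NULL);

    while (atomic_flag_test_and_set(&q->lock))
        ;                        /* spin until the lock is acquired */

    for (cur = q->rq_qos; cur; cur = cur->next) {
        if (cur->id == rqos->id) {
            ret = -EBUSY;
            goto out;
        }
    }
    rqos->next = q->rq_qos;      /* push onto the head of the list */
    q->rq_qos = rqos;
out:
    atomic_flag_clear(&q->lock);
    return ret;
}
```

With this shape, the losing writer in the race above gets an error back and can free its freshly allocated instance instead of running a second one on the same disk.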
  5. 15 Jul, 2022 2 commits
  6. 19 Oct, 2021 1 commit
    • blk-wbt: prevent NULL pointer dereference in wb_timer_fn · 480d42dc
      Andrea Righi committed
      The timer callback used to evaluate if the latency is exceeded can be
      executed after the corresponding disk has been released, causing the
      following NULL pointer dereference:
      
      [ 119.987108] BUG: kernel NULL pointer dereference, address: 0000000000000098
      [ 119.987617] #PF: supervisor read access in kernel mode
      [ 119.987971] #PF: error_code(0x0000) - not-present page
      [ 119.988325] PGD 7c4a4067 P4D 7c4a4067 PUD 7bf63067 PMD 0
      [ 119.988697] Oops: 0000 [#1] SMP NOPTI
      [ 119.988959] CPU: 1 PID: 9353 Comm: cloud-init Not tainted 5.15-rc5+arighi #rc5+arighi
      [ 119.989520] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      [ 119.990055] RIP: 0010:wb_timer_fn+0x44/0x3c0
      [ 119.990376] Code: 41 8b 9c 24 98 00 00 00 41 8b 94 24 b8 00 00 00 41 8b 84 24 d8 00 00 00 4d 8b 74 24 28 01 d3 01 c3 49 8b 44 24 60 48 8b 40 78 <4c> 8b b8 98 00 00 00 4d 85 f6 0f 84 c4 00 00 00 49 83 7c 24 30 00
      [ 119.991578] RSP: 0000:ffffb5f580957da8 EFLAGS: 00010246
      [ 119.991937] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      [ 119.992412] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88f476d7f780
      [ 119.992895] RBP: ffffb5f580957dd0 R08: 0000000000000000 R09: 0000000000000000
      [ 119.993371] R10: 0000000000000004 R11: 0000000000000002 R12: ffff88f476c84500
      [ 119.993847] R13: ffff88f4434390c0 R14: 0000000000000000 R15: ffff88f4bdc98c00
      [ 119.994323] FS: 00007fb90bcd9c00(0000) GS:ffff88f4bdc80000(0000) knlGS:0000000000000000
      [ 119.994952] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 119.995380] CR2: 0000000000000098 CR3: 000000007c0d6000 CR4: 00000000000006e0
      [ 119.995906] Call Trace:
      [ 119.996130] ? blk_stat_free_callback_rcu+0x30/0x30
      [ 119.996505] blk_stat_timer_fn+0x138/0x140
      [ 119.996830] call_timer_fn+0x2b/0x100
      [ 119.997136] __run_timers.part.0+0x1d1/0x240
      [ 119.997470] ? kvm_clock_get_cycles+0x11/0x20
      [ 119.997826] ? ktime_get+0x3e/0xa0
      [ 119.998110] ? native_apic_msr_write+0x2c/0x30
      [ 119.998456] ? lapic_next_event+0x20/0x30
      [ 119.998779] ? clockevents_program_event+0x94/0xf0
      [ 119.999150] run_timer_softirq+0x2a/0x50
      [ 119.999465] __do_softirq+0xcb/0x26f
      [ 119.999764] irq_exit_rcu+0x8c/0xb0
      [ 120.000057] sysvec_apic_timer_interrupt+0x43/0x90
      [ 120.000429] ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [ 120.000836] asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      In this case simply return from the timer callback (no action
      required) to prevent the NULL pointer dereference.
      
      BugLink: https://bugs.launchpad.net/bugs/1947557
      Link: https://lore.kernel.org/linux-mm/YWRNVTk9N8K0RMst@arighi-desktop/
      Fixes: 34dbad5d ("blk-stat: convert to callback-based statistics reporting")
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Link: https://lore.kernel.org/r/YW6N2qXpBU3oc50q@arighi-desktop
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
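
The "simply return from the timer callback" approach can be illustrated with a guard at the top of the callback. This is a hedged userspace sketch: the `toy3_*` structs and the `disk` field are assumptions made for illustration, modelling only the idea that the pointer the callback needs becomes NULL once the disk has been released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct toy3_disk {
    int congested;               /* some state the callback would read */
};

struct toy3_queue {
    struct toy3_disk *disk;      /* assumed to be NULL after release */
};

struct toy3_rwb {
    struct toy3_queue *q;
    int evaluations;             /* how many times latency was evaluated */
};

/*
 * Timer callback sketch. If the disk is already gone there is nothing
 * to evaluate, so return before touching anything behind the pointer.
 * Returns 1 if the latency window was evaluated, 0 if it was skipped.
 */
int toy3_wb_timer_fn(struct toy3_rwb *rwb)
{
    assert(rwb != NULL && rwb->q != NULL);

    if (!rwb->q->disk)
        return 0;                /* late timer after release: no action */

    (void)rwb->q->disk->congested;   /* safe: disk is known to be valid */
    rwb->evaluations++;
    return 1;
}
```

The guard only covers a pointer that has been cleared; it does not by itself protect against a pointer that is freed but still non-NULL.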
  7. 24 Aug, 2021 1 commit
  8. 10 Aug, 2021 1 commit
  9. 22 Jun, 2021 2 commits
  10. 19 Jun, 2021 1 commit
  11. 27 Jan, 2021 1 commit
  12. 01 Dec, 2020 1 commit
  13. 24 Aug, 2020 1 commit
  14. 30 May, 2020 2 commits
  15. 17 Apr, 2020 1 commit
  16. 06 Oct, 2019 1 commit
  17. 29 Aug, 2019 1 commit
  18. 28 Aug, 2019 1 commit
  19. 01 May, 2019 1 commit
  20. 25 Jan, 2019 1 commit
    • blk-wbt: Declare local functions static · c83f536a
      Bart Van Assche committed
      This patch avoids that sparse reports the following warnings:
      
        CHECK   block/blk-wbt.c
      block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
      block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
        CC      block/blk-wbt.o
      block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
       void wbt_issue(struct rq_qos *rqos, struct request *rq)
            ^~~~~~~~~
      block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
       void wbt_requeue(struct rq_qos *rqos, struct request *rq)
            ^~~~~~~~~~~
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 17 Dec, 2018 1 commit
  22. 12 Dec, 2018 1 commit
    • block: deactivate blk_stat timer in wbt_disable_default() · 544fbd16
      Ming Lei committed
      rwb_enabled() can't be changed when there is any inflight IO.
      
      wbt_disable_default() may set rwb->wb_normal as zero, however the
      blk_stat timer may still be pending, and the timer function will update
      rwb->wb_normal again.
      
      This patch introduces blk_stat_deactivate() and applies it in
      wbt_disable_default(), then the following IO hang triggered when running
      parted & switching io scheduler can be fixed:
      
      [  369.937806] INFO: task parted:3645 blocked for more than 120 seconds.
      [  369.938941]       Not tainted 4.20.0-rc6-00284-g906c801e5248 #498
      [  369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  369.940768] parted          D    0  3645   3239 0x00000000
      [  369.941500] Call Trace:
      [  369.941874]  ? __schedule+0x6d9/0x74c
      [  369.942392]  ? wbt_done+0x5e/0x5e
      [  369.942864]  ? wbt_cleanup_cb+0x16/0x16
      [  369.943404]  ? wbt_done+0x5e/0x5e
      [  369.943874]  schedule+0x67/0x78
      [  369.944298]  io_schedule+0x12/0x33
      [  369.944771]  rq_qos_wait+0xb5/0x119
      [  369.945193]  ? karma_partition+0x1c2/0x1c2
      [  369.945691]  ? wbt_cleanup_cb+0x16/0x16
      [  369.946151]  wbt_wait+0x85/0xb6
      [  369.946540]  __rq_qos_throttle+0x23/0x2f
      [  369.947014]  blk_mq_make_request+0xe6/0x40a
      [  369.947518]  generic_make_request+0x192/0x2fe
      [  369.948042]  ? submit_bio+0x103/0x11f
      [  369.948486]  ? __radix_tree_lookup+0x35/0xb5
      [  369.949011]  submit_bio+0x103/0x11f
      [  369.949436]  ? blkg_lookup_slowpath+0x25/0x44
      [  369.949962]  submit_bio_wait+0x53/0x7f
      [  369.950469]  blkdev_issue_flush+0x8a/0xae
      [  369.951032]  blkdev_fsync+0x2f/0x3a
      [  369.951502]  do_fsync+0x2e/0x47
      [  369.951887]  __x64_sys_fsync+0x10/0x13
      [  369.952374]  do_syscall_64+0x89/0x149
      [  369.952819]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  369.953492] RIP: 0033:0x7f95a1e729d4
      [  369.953996] Code: Bad RIP value.
      [  369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
      [  369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4
      [  369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004
      [  369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0
      [  369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380
      [  369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008
      
      Cc: stable@vger.kernel.org
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
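
The ordering problem described in this commit (a pending stat timer re-enabling wbt after wb_normal was zeroed) can be modelled with a single "timer pending" flag. This is a sketch, not kernel code. Every `toy4_*` name is hypothetical, the value 8 is an arbitrary non-zero limit, and the simulated timer expiry is called inline to stand for an expiry that would arrive some time after the disable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the wbt state; not the kernel type. */
struct toy4_wbt {
    unsigned int wb_normal;      /* 0 means wbt is disabled */
    bool timer_pending;          /* a stat timer is armed */
};

/* Models the timer function: if still armed, it recomputes the limits. */
static void toy4_timer_fire(struct toy4_wbt *w)
{
    if (!w->timer_pending)
        return;
    w->timer_pending = false;
    w->wb_normal = 8;            /* arbitrary non-zero limit for the model */
}

/* Models deactivating the stat timer so it can no longer fire. */
static void toy4_stat_deactivate(struct toy4_wbt *w)
{
    w->timer_pending = false;
}

/*
 * Models disabling wbt. When deactivate_timer is false, a pending timer
 * can still fire afterwards and overwrite wb_normal. Returns the value
 * of wb_normal once the (possibly cancelled) timer expiry has passed.
 */
unsigned int toy4_disable(struct toy4_wbt *w, bool deactivate_timer)
{
    assert(w != NULL);

    if (deactivate_timer)
        toy4_stat_deactivate(w);
    w->wb_normal = 0;

    toy4_timer_fire(w);          /* the expiry that arrives after disable */
    return w->wb_normal;
}
```

In the model, disabling without deactivating leaves wb_normal non-zero again, which is the state change with IO in flight that the commit message says must not happen.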
  23. 08 Dec, 2018 1 commit
  24. 16 Nov, 2018 4 commits
  25. 08 Nov, 2018 1 commit
  26. 12 Oct, 2018 1 commit
  27. 28 Aug, 2018 3 commits
  28. 23 Aug, 2018 2 commits