1. 09 6月, 2020 1 次提交
  2. 07 5月, 2020 1 次提交
  3. 25 3月, 2020 1 次提交
    • X
      alinux: blk-mq: fix broken io_ticks & time_in_queue update · a9ee8ebe
      Xiaoguang Wang 提交于
      fix #25369772
      
      In blk-mq device, we observed a issue that though iops is low, but iostat
      shows a very high svctm & util value, which is counter-intuitive.
      
      The root cause is that blk_account_io_start() calls part_round_stats()
      before "rq->part = part" statement, so part_round_stats() will count
      an inflight request to the whole device, but not for the specific
      partition, then it'll update whole device's io_ticks and time_in_queue
      with a stale part->stamp.
      
      To fix this issue, if a request's part is NULL, we just don't count
      it as an inflight request to the whole device.
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      a9ee8ebe
  4. 18 3月, 2020 17 次提交
  5. 17 1月, 2020 12 次提交
  6. 15 1月, 2020 8 次提交
    • J
    • J
      alinux: iocost: add ioc_gq stat · 0670363c
      Jiufei Xue 提交于
      Add a stat file to monitor the ioc_gq stat.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      0670363c
    • X
      alinux: blk-throttle: limit bios to fix amount of pages entering writeback prematurely · 0fd4aa6d
      Xiaoguang Wang 提交于
      Currently in blk_throtl_bio(), if one bio exceeds its throtl_grp's bps
      or iops limit, this bio will be queued throtl_grp's throtl_service_queue,
      then obviously mm subsys will submit more pages, even underlying device
      can not handle these io requests, also this will make large amount of pages
      entering writeback prematurely, later if some process writes some of these
      pages, it will wait for long time.
      
      I have done some tests: one process does buffered writes on a 1GB file,
      and make this process's blkcg max bps limit be 10MB/s, I observe this:
      	#cat /proc/meminfo  | grep -i back
      	Writeback:        900024 kB
      	WritebackTmp:          0 kB
      
      I think this Writeback value is just too big, indeed many bios have been
      queued in throtl_grp's throtl_service_queue, if one process try to write
      the last bio's page in this queue, it will call wait_on_page_writeback(page),
      which must wait the previous bios to finish and will take long time, we
      have also see 120s hung task warning in our server.
      
       INFO: task kworker/u128:0:30072 blocked for more than 120 seconds.
             Tainted: G            E 4.9.147-013.ali3000_015_test.alios7.x86_64 #1
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       kworker/u128:0  D    0 30072      2 0x00000000
       Workqueue: writeback wb_workfn (flush-8:16)
        ffff882ddd066b40 0000000000000000 ffff882e5cad3400 ffff882fbe959e80
        ffff882fa50b1a00 ffffc9003a5a3768 ffffffff8173325d ffffc9003a5a3780
        00ff882e5cad3400 ffff882fbe959e80 ffffffff81360b49 ffff882e5cad3400
       Call Trace:
        [<ffffffff8173325d>] ? __schedule+0x23d/0x6d0
        [<ffffffff81360b49>] ? alloc_request_struct+0x19/0x20
        [<ffffffff81733726>] schedule+0x36/0x80
        [<ffffffff81736c56>] schedule_timeout+0x206/0x4b0
        [<ffffffff81036c69>] ? sched_clock+0x9/0x10
        [<ffffffff81363073>] ? get_request+0x403/0x810
        [<ffffffff8110ca10>] ? ktime_get+0x40/0xb0
        [<ffffffff81732f8a>] io_schedule_timeout+0xda/0x170
        [<ffffffff81733f90>] ? bit_wait+0x60/0x60
        [<ffffffff81733fab>] bit_wait_io+0x1b/0x60
        [<ffffffff81733b28>] __wait_on_bit+0x58/0x90
        [<ffffffff811b0d91>] ? find_get_pages_tag+0x161/0x2e0
        [<ffffffff811aff62>] wait_on_page_bit+0x82/0xa0
        [<ffffffff810d47f0>] ? wake_atomic_t_function+0x60/0x60
        [<ffffffffa02fc181>] mpage_prepare_extent_to_map+0x2d1/0x310 [ext4]
        [<ffffffff8121ff65>] ? kmem_cache_alloc+0x185/0x1a0
        [<ffffffffa0305a2f>] ? ext4_init_io_end+0x1f/0x40 [ext4]
        [<ffffffffa0300294>] ext4_writepages+0x404/0xef0 [ext4]
        [<ffffffff81508c64>] ? scsi_init_io+0x44/0x200
        [<ffffffff81398a0f>] ? fprop_fraction_percpu+0x2f/0x80
        [<ffffffff811c139e>] do_writepages+0x1e/0x30
        [<ffffffff8127c0f5>] __writeback_single_inode+0x45/0x320
        [<ffffffff8127c942>] writeback_sb_inodes+0x272/0x600
        [<ffffffff8127cf6b>] wb_writeback+0x10b/0x300
        [<ffffffff8127d884>] wb_workfn+0xb4/0x380
        [<ffffffff810b85e9>] ? try_to_wake_up+0x59/0x3e0
        [<ffffffff810a5759>] process_one_work+0x189/0x420
        [<ffffffff810a5a3e>] worker_thread+0x4e/0x4b0
        [<ffffffff810a59f0>] ? process_one_work+0x420/0x420
        [<ffffffff810ac026>] kthread+0xe6/0x100
        [<ffffffff810abf40>] ? kthread_park+0x60/0x60
        [<ffffffff81738499>] ret_from_fork+0x39/0x50
      
      To fix this issue, we can simply limit throtl_service_queue's max queued
      bios, currently we limit it to throtl_grp's bps_limit or iops limit, if it
      still exteeds, we just sleep for a while.
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      0fd4aa6d
    • J
      alinux: block-throttle: add counters for completed io · 33ed5f09
      Jiufei Xue 提交于
      Now we have counters for wait_time and service_time, but no completed
      ios, so the average latency can not be measured.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      33ed5f09
    • J
      alinux: block-throttle: code cleanup · b03ba65b
      Jiufei Xue 提交于
      This patch does the code cleanup because the seq_show handlers for tg
      counters are the same. No functional changes.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      b03ba65b
    • J
      alinux: blk-throttle: add throttled io/bytes counter · 766cfe98
      Joseph Qi 提交于
      Add 2 interfaces to stat io throttle information:
        blkio.throttle.total_io_queued
        blkio.throttle.total_bytes_queued
      
      These interfaces are used for monitoring throttled io/bytes and
      analyzing if delay has relation with io throttle.
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NGavin Shan <shan.gavin@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      766cfe98
    • J
      alinux: blk-throttle: fix tg NULL pointer dereference · bc0cc360
      Joseph Qi 提交于
      io throtl stats will blkg_get at the beginning of throttle and then
      blkg_put at the new introduced bi_tg_end_io. This will cause blkg to be
      freed if end_io is called twice like dm-thin, which will save origin
      end_io first, and call its overwrite end_io and then the saved end_io.
      After that, access blkg is invalid and finally BUG:
      
      [ 4417.235048] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
      [ 4417.236475] IP: [<ffffffff812e7c71>] throtl_update_dispatch_stats+0x21/0xb0
      [ 4417.237865] PGD 98395067 PUD 362e1067 PMD 0
      [ 4417.239232] Oops: 0000 [#1] SMP
      ......
      [ 4417.274070] Call Trace:
      [ 4417.275407]  [<ffffffff812ea93d>] blk_throtl_bio+0xfd/0x630
      [ 4417.276760]  [<ffffffff810b3613>] ? wake_up_process+0x23/0x40
      [ 4417.278079]  [<ffffffff81094c04>] ? wake_up_worker+0x24/0x30
      [ 4417.279387]  [<ffffffff81095772>] ? insert_work+0x62/0xa0
      [ 4417.280697]  [<ffffffff8116c2c7>] ? mempool_free_slab+0x17/0x20
      [ 4417.282019]  [<ffffffff8116c6c9>] ? mempool_free+0x49/0x90
      [ 4417.283326]  [<ffffffff812c9acf>] generic_make_request_checks+0x16f/0x360
      [ 4417.284637]  [<ffffffffa0340d97>] ? thin_map+0x227/0x2c0 [dm_thin_pool]
      [ 4417.285951]  [<ffffffff812c9ce7>] generic_make_request+0x27/0x130
      [ 4417.287240]  [<ffffffffa0230b3d>] __map_bio+0xad/0x100 [dm_mod]
      [ 4417.288503]  [<ffffffffa023257e>] __clone_and_map_data_bio+0x15e/0x240 [dm_mod]
      [ 4417.289778]  [<ffffffffa02329ea>] __split_and_process_bio+0x38a/0x500 [dm_mod]
      [ 4417.291062]  [<ffffffffa0232c91>] dm_make_request+0x131/0x1a0 [dm_mod]
      [ 4417.292344]  [<ffffffff812c9da2>] generic_make_request+0xe2/0x130
      [ 4417.293626]  [<ffffffff812c9e61>] submit_bio+0x71/0x150
      [ 4417.294909]  [<ffffffff8121ab1d>] ? bio_alloc_bioset+0x20d/0x360
      [ 4417.296195]  [<ffffffff81215acb>] _submit_bh+0x14b/0x220
      [ 4417.297484]  [<ffffffff81215bb0>] submit_bh+0x10/0x20
      [ 4417.298744]  [<ffffffffa016d8d8>] jbd2_journal_commit_transaction+0x6c8/0x19a0 [jbd2]
      [ 4417.300014]  [<ffffffff810135b8>] ? __switch_to+0xf8/0x4c0
      [ 4417.301268]  [<ffffffffa01731e9>] kjournald2+0xc9/0x270 [jbd2]
      [ 4417.302524]  [<ffffffff810a0fd0>] ? wake_up_atomic_t+0x30/0x30
      [ 4417.303753]  [<ffffffffa0173120>] ? commit_timeout+0x10/0x10 [jbd2]
      [ 4417.304950]  [<ffffffff8109ffef>] kthread+0xcf/0xe0
      [ 4417.306107]  [<ffffffff8109ff20>] ? kthread_create_on_node+0x140/0x140
      [ 4417.307255]  [<ffffffff81647f18>] ret_from_fork+0x58/0x90
      [ 4417.308349]  [<ffffffff8109ff20>] ? kthread_create_on_node+0x140/0x140
      ......
      
      Now we introduce a new bio flag BIO_THROTL_STATED to make sure
      blkg_get/put only get called once for the same bio.
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      bc0cc360
    • J
      alinux: blk-throttle: support io delay stats · dc61ad52
      Joseph Qi 提交于
      Add blkio.throttle.io_service_time and blkio.throttle.io_wait_time to
      get per-cgroup io delay statistics.
      io_service_time represents the time spent after io throttle to io
      completion, while io_wait_time represents the time spent on throttle
      queue.
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
      dc61ad52