    blk-iolatency: Fix inflight count imbalances and IO hangs on offline · 8a177a36
    Tejun Heo committed
    iolatency needs to track the number of inflight IOs per cgroup. As this
    tracking can be expensive, it is disabled when no cgroup has iolatency
    configured for the device. To ensure that the inflight counters stay
    balanced, iolatency_set_limit() freezes the request_queue while manipulating
    the enabled counter, which ensures that no IO is in flight and thus all
    counters are zero.
    
    Unfortunately, iolatency_set_limit() isn't the only place where the enabled
    counter is manipulated. iolatency_pd_offline() can also dec the counter and
    trigger disabling. As this disabling happens without freezing the q, this
    can easily happen while some IOs are in flight and thus leak the counts.
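    
    The leak can be illustrated with a toy Python model. This is a sketch,
    not kernel code: it assumes that both the issue and the completion paths
    skip accounting while tracking is disabled, and ToyIolatencyGrp and
    flip_off_and_on() are hypothetical names used only for illustration.

```python
class ToyIolatencyGrp:
    """Toy stand-in for one cgroup's inflight accounting (not kernel code)."""

    def __init__(self):
        self.enabled = False  # models the device-wide "tracking enabled" state
        self.inflight = 0     # models rq_wait.inflight

    def issue(self):
        # An IO is only counted if tracking is enabled when it is issued.
        if self.enabled:
            self.inflight += 1

    def complete(self):
        # Assumption: completion also skips accounting while disabled, so an
        # IO counted at issue time is never uncounted if tracking was turned
        # off in between.
        if self.enabled:
            self.inflight -= 1


def flip_off_and_on(n_ios, drain_first):
    """Disable and re-enable tracking around n_ios IOs; return the count."""
    grp = ToyIolatencyGrp()
    grp.enabled = True
    for _ in range(n_ios):
        grp.issue()
    if drain_first:
        # What freezing the q guarantees: every IO completes before the flip.
        for _ in range(n_ios):
            grp.complete()
        grp.enabled = False
    else:
        # What the offline path did: flip while IOs are still in flight.
        grp.enabled = False
        for _ in range(n_ios):
            grp.complete()
    grp.enabled = True
    return grp.inflight
```

    In this model, flip_off_and_on(19, drain_first=False) leaves the counter
    at 19 even though every IO has completed, while drain_first=True leaves
    it at 0.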
    
    This can be easily demonstrated by turning on iolatency on an empty
    cgroup while IOs are in flight in other cgroups and then removing the
    cgroup. Note that iolatency must not be enabled anywhere else in the
    system, so that removing the cgroup disables iolatency for the whole
    device.
    
    The following keeps flipping on and off iolatency on sda:
    
      echo +io > /sys/fs/cgroup/cgroup.subtree_control
      while true; do
          mkdir -p /sys/fs/cgroup/test
          echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
          sleep 1
          rmdir /sys/fs/cgroup/test
          sleep 1
      done
    
    and there's concurrent fio generating direct rand reads:
    
      fio --name test --filename=/dev/sda --direct=1 --rw=randread \
          --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
    
    while monitoring with the following drgn script:
    
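      # Run under the drgn CLI, which provides `prog`.
      import time
      from drgn import container_of
      from drgn.helpers.linux import *
    
      while True: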
        for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
            for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
                blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
                pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
                if pd.value_() == 0:
                    continue
                iolat = container_of(pd, 'struct iolatency_grp', 'pd')
                inflight = iolat.rq_wait.inflight.counter.value_()
                if inflight:
                    print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                          f'{cgroup_path(css.cgroup).decode("utf-8")}')
        time.sleep(1)
    
    The monitoring output looks like the following:
    
      inflight=1 sda /user.slice
      inflight=1 sda /user.slice
      ...
      inflight=14 sda /user.slice
      inflight=13 sda /user.slice
      inflight=17 sda /user.slice
      inflight=15 sda /user.slice
      inflight=18 sda /user.slice
      inflight=17 sda /user.slice
      inflight=20 sda /user.slice
      inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
      inflight=19 sda /user.slice
      inflight=19 sda /user.slice
    
    If a cgroup with a stuck inflight count ends up getting throttled, the
    throttled IOs will never be issued because no completion event arrives to
    wake them up, leading to an indefinite hang.
    
    This patch fixes the bug by unifying enable handling into a work item that
    is automatically kicked off from iolatency_set_min_lat_nsec(), which is
    called from both the iolatency_set_limit() and iolatency_pd_offline()
    paths. Punting to a work item is necessary because iolatency_pd_offline()
    is called under spinlocks, while freezing a request_queue requires a
    sleepable context.
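    
    The deferral can be sketched with another toy Python model. It is only an
    illustration under simplifying assumptions: a single counter stands in
    for the per-cgroup inflight counts, the "freeze" is modeled by completing
    every outstanding IO, and ToyBlkIolatency, run_enable_work() and
    offline_with_ios_in_flight() are hypothetical names, not the functions
    added by this patch.

```python
class ToyBlkIolatency:
    """Toy model of deferring the enable/disable flip to a work item."""

    def __init__(self):
        self.enable_cnt = 0        # cgroups with a latency target configured
        self.enabled = False       # whether inflight tracking is active
        self.work_pending = False  # models a scheduled work item
        self.inflight = 0          # tracked count (collapsed to one counter)
        self.outstanding = 0       # IOs really in flight on the device

    def issue(self):
        self.outstanding += 1
        if self.enabled:
            self.inflight += 1

    def complete(self):
        self.outstanding -= 1
        if self.enabled:
            self.inflight -= 1

    def set_min_lat_nsec(self, old, new):
        # May run in atomic context: only adjust the counter and schedule
        # the work item, never freeze here.
        if not old and new:
            self.enable_cnt += 1
            self.work_pending = True
        elif old and not new:
            self.enable_cnt -= 1
            self.work_pending = True

    def run_enable_work(self):
        # Runs later in a sleepable context.
        if not self.work_pending:
            return
        self.work_pending = False
        want = self.enable_cnt > 0
        if want != self.enabled:
            # Modeled freeze: wait until all outstanding IOs have completed,
            # so the flip happens with every counter balanced.
            while self.outstanding:
                self.complete()
            self.enabled = want


def offline_with_ios_in_flight(n_ios):
    dev = ToyBlkIolatency()
    dev.set_min_lat_nsec(0, 100000)  # configure a target (set_limit path)
    dev.run_enable_work()
    for _ in range(n_ios):
        dev.issue()
    dev.set_min_lat_nsec(100000, 0)  # cgroup removed (pd_offline path)
    dev.run_enable_work()            # the flip happens only here
    return dev.enabled, dev.inflight
```

    In this model, tracking stays enabled between the offline call and the
    work item, so completions in that window are still accounted and the
    counter is back at zero when tracking is finally turned off.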
    
    This also simplifies the code, reducing LOC not counting the comments, and
    avoids the unnecessary freezes that happened whenever a cgroup's latency
    target was newly set or cleared.
    
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Liu Bo <bo.liu@linux.alibaba.com>
    Fixes: 8c772a9b ("blk-iolatency: fix IO hang due to negative inflight counter")
    Cc: stable@vger.kernel.org # v5.0+
    Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>