    blkcg: delay blkg destruction until after writeback has finished · 59b57717
    Committed by Dennis Zhou (Facebook)
    Currently, blkcg destruction relies on a sequence of events:
      1. Destruction starts. blkcg_css_offline() is called and blkgs
         release their reference to the blkcg. This immediately destroys
         the cgwbs (writeback).
      2. With blkgs giving up their reference, the blkcg ref count should
         become zero and eventually call blkcg_css_free() which finally
         frees the blkcg.
    
    Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
    and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
    on the completion of all writeback associated with the blkcg. A count of
    the number of cgwbs is maintained and once that goes to zero, blkg
    destruction can follow. This should prevent premature blkg destruction
    related to writeback.
    
    The new process for blkcg cleanup is as follows:
      1. Destruction starts. blkcg_css_offline() is called which offlines
         writeback. Blkg destruction is delayed on the cgwb_refcnt count to
         avoid punting potentially large amounts of outstanding writeback
         to root while maintaining any ongoing policies. Here, the base
         cgwb_refcnt is put back.
      2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
         and handles destruction of blkgs. This is where the css reference
         held by each blkg is released.
      3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
         This finally frees the blkcg.
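
    The three steps above can be sketched as a small user-space model. This
    is only an illustration of the ordering, not the kernel implementation:
    the struct, its fields, and every model_* function are hypothetical
    names invented for this sketch, and plain ints stand in for the
    kernel's reference counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a blkcg; not the kernel structure. */
struct model_blkcg {
    int cgwb_refcnt;  /* base reference plus one per live cgwb */
    int css_refcnt;   /* base css reference plus one per blkg */
    int nr_blkgs;     /* blkgs still attached to this blkcg */
    bool online;
    bool freed;
};

static void model_init(struct model_blkcg *b, int nr_blkgs)
{
    b->cgwb_refcnt = 1;            /* base cgwb reference */
    b->css_refcnt = 1 + nr_blkgs;  /* base css ref + one per blkg */
    b->nr_blkgs = nr_blkgs;
    b->online = true;
    b->freed = false;
}

/* Step 3: runs only when the last css reference is dropped. */
static void model_css_put(struct model_blkcg *b)
{
    assert(b->css_refcnt > 0);
    if (--b->css_refcnt == 0)
        b->freed = true;           /* stands in for blkcg_css_free() */
}

/* Step 2: each destroyed blkg releases the css reference it held. */
static void model_destroy_blkgs(struct model_blkcg *b)
{
    while (b->nr_blkgs > 0) {
        b->nr_blkgs--;
        model_css_put(b);
    }
}

/* A new cgwb pins the blkgs by taking a cgwb reference. */
static void model_cgwb_get(struct model_blkcg *b)
{
    b->cgwb_refcnt++;
}

/* Dropping the last cgwb reference triggers blkg destruction. */
static void model_cgwb_put(struct model_blkcg *b)
{
    assert(b->cgwb_refcnt > 0);
    if (--b->cgwb_refcnt == 0)
        model_destroy_blkgs(b);
}

/* Step 1: offlining only puts back the base cgwb reference. */
static void model_css_offline(struct model_blkcg *b)
{
    b->online = false;
    model_cgwb_put(b);
}

/* Walks the whole sequence; returns 0 if the ordering held. */
int model_demo(void)
{
    struct model_blkcg b;

    model_init(&b, 2);
    model_cgwb_get(&b);            /* writeback in flight */
    model_css_offline(&b);         /* step 1 */
    if (b.nr_blkgs != 2)
        return 1;                  /* blkgs must survive offline */
    model_css_put(&b);             /* owner drops the base css ref */
    model_cgwb_put(&b);            /* writeback done: steps 2 and 3 */
    return (b.nr_blkgs == 0 && b.freed) ? 0 : 1;
}
```

    In this model, offlining with writeback still in flight leaves both
    blkgs in place. They are destroyed, and the blkcg freed, only once the
    last cgwb reference is put.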
    
    It seems in the past blk-throttle didn't do the most understandable
    things with taking data from a blkg while associating with current. So,
    the simplification and unification of what blk-throttle is doing caused
    this.
    
    Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
    Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
    Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Josef Bacik <josef@toxicpanda.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk-cgroup.c 47.5 KB