1. 08 Nov 2019, 3 commits
    • blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT · 1d156646
      Authored by Tejun Heo
      blkg_rwstat is now only used by bfq-iosched and blk-throtl when on
      cgroup1.  Let's move it into its own files and gate it behind a config
      option.
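
      As a sketch of the gating pattern this describes (the Kconfig and
      Makefile wiring below is an assumption inferred from the commit
      message, not copied from the tree):

          # block/Kconfig: hidden symbol, selected by the cgroup1 users
          config BLK_CGROUP_RWSTAT
                  bool

          # block/Makefile: build the split-out code only when selected
          obj-$(CONFIG_BLK_CGROUP_RWSTAT) += blk-cgroup-rwstat.o

      bfq-iosched and blk-throtl would then select BLK_CGROUP_RWSTAT, so
      kernels that use neither carry no rwstat code.
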
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-cgroup: reimplement basic IO stats using cgroup rstat · f7331648
      Authored by Tejun Heo
      blk-cgroup has been using blkg_rwstat to track basic IO stats.
      Unfortunately, reading recursive stats scales badly as it involves
      walking all descendants.  On systems with a huge number of cgroups
      (dead or alive), this can lead to substantial CPU cost when reading IO
      stats.
      
      This patch reimplements basic IO stats using cgroup rstat which uses
      more memory but makes recursive stat reading O(# descendants which
      have been active since last reading) instead of O(# descendants).
      
      * blk-cgroup core no longer uses sync/async stats.  Introduce new stat
        enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
      
      * Add blkg_iostat[_set] which encapsulates byte and io stats, last
        values for propagation delta calculation and u64_stats_sync for
        correctness on 32bit archs (see the sketch after this list).
      
      * Update the new percpu stat counters directly and implement
        blkcg_rstat_flush() to handle propagation.
      
      * blkg_print_stat() can now bring the stats up to date by calling
        cgroup_rstat_flush() and print them instead of directly summing up
        all descendants.
      
      * It now allocates 96 bytes per cpu.  It used to be 40 bytes.
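
      A sketch of the structures these bullets describe, reconstructed
      from the message alone; the quoted identifiers come from the
      commit, while the field layout and the helper at the end are
      assumptions:

          #include <linux/types.h>
          #include <linux/u64_stats_sync.h>

          enum blkg_iostat_type {
                  BLKG_IOSTAT_READ,
                  BLKG_IOSTAT_WRITE,
                  BLKG_IOSTAT_DISCARD,
                  BLKG_IOSTAT_NR,
          };

          struct blkg_iostat {
                  u64     bytes[BLKG_IOSTAT_NR];
                  u64     ios[BLKG_IOSTAT_NR];
          };

          struct blkg_iostat_set {
                  struct u64_stats_sync   sync;   /* 32bit read consistency */
                  struct blkg_iostat      cur;    /* updated per-cpu on each IO */
                  struct blkg_iostat      last;   /* snapshot for the flush delta */
          };

          /* hot path sketch: bump local percpu counters, no tree walk */
          static void blkg_iostat_account(struct blkg_iostat_set *bis,
                                          enum blkg_iostat_type op, u64 bytes)
          {
                  u64_stats_update_begin(&bis->sync);
                  bis->cur.bytes[op] += bytes;
                  bis->cur.ios[op]++;
                  u64_stats_update_end(&bis->sync);
          }

      blkcg_rstat_flush() then only needs to fold cur minus last into the
      parent and advance last, which is what turns recursive reads into
      O(# descendants active since the last read).
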
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Dan Schatzberg <dschatzberg@fb.com>
      Cc: Daniel Xu <dlxu@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive() · 8a80d5d6
      Authored by Tejun Heo
      These don't have users anymore.  Remove them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 29 Aug 2019, 2 commits
  3. 05 Aug 2019, 1 commit
  4. 17 Jul 2019, 1 commit
  5. 10 Jul 2019, 1 commit
    • blkcg: implement REQ_CGROUP_PUNT · d3f77dfd
      Authored by Tejun Heo
      When a shared kthread needs to issue a bio for a cgroup, doing so
      synchronously can lead to priority inversions as the kthread can be
      trapped waiting for that cgroup.  This patch implements
      REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
      to a dedicated per-blkcg work item to avoid such priority inversions.
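
      A minimal sketch of such a punt path; the per-blkcg list, lock,
      work item and workqueue names here are illustrative assumptions,
      not taken from the patch:

          /* called from submit_bio() when REQ_CGROUP_PUNT is set */
          static bool blkcg_punt_bio(struct bio *bio)
          {
                  struct blkcg_gq *blkg = bio->bi_blkg;

                  bio->bi_opf &= ~REQ_CGROUP_PUNT;        /* consume the flag */
                  if (!blkg->parent)                      /* root issues inline */
                          return false;

                  spin_lock_bh(&blkg->async_bio_lock);
                  bio_list_add(&blkg->async_bios, bio);
                  spin_unlock_bh(&blkg->async_bio_lock);
                  queue_work(blkcg_punt_bio_wq, &blkg->async_bio_work);
                  return true;    /* a worker re-submits from blkg context */
          }

      The shared kthread returns immediately instead of blocking, so it
      can no longer be held hostage by one cgroup's IO limits.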
      
      This will be used to fix priority inversions in btrfs compression and
      should be generally useful as we grow filesystem support for
      comprehensive IO control.
      
      Cc: Chris Mason <clm@fb.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 21 Jun 2019, 4 commits
  7. 21 Dec 2018, 1 commit
  8. 13 Dec 2018, 1 commit
  9. 08 Dec 2018, 10 commits
  10. 16 Nov 2018, 2 commits
  11. 08 Nov 2018, 1 commit
  12. 02 Nov 2018, 1 commit
  13. 21 Oct 2018, 1 commit
  14. 22 Sep 2018, 10 commits
  15. 01 Sep 2018, 1 commit
    • blkcg: delay blkg destruction until after writeback has finished · 59b57717
      Authored by Dennis Zhou (Facebook)
      Currently, blkcg destruction relies on a sequence of events:
        1. Destruction starts. blkcg_css_offline() is called and blkgs
           release their reference to the blkcg. This immediately destroys
           the cgwbs (writeback).
        2. With blkgs giving up their reference, the blkcg ref count should
           become zero and eventually call blkcg_css_free() which finally
           frees the blkcg.
      
      Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
      and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
      on the completion of all writeback associated with the blkcg. A count of
      the number of cgwbs is maintained and once that goes to zero, blkg
      destruction can follow. This should prevent premature blkg destruction
      related to writeback.
      
      The new process for blkcg cleanup is as follows:
        1. Destruction starts. blkcg_css_offline() is called which offlines
           writeback. Blkg destruction is delayed on the cgwb_refcnt count to
           avoid punting potentially large amounts of outstanding writeback
           to root while maintaining any ongoing policies. Here, the base
           cgwb_refcnt is put back.
        2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
           and handles destruction of blkgs (see the sketch after this list).
           This is where the css reference held by each blkg is released.
        3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
           This finally frees the blkg.
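
      A sketch of the refcount gate steps 1 and 2 describe; cgwb_refcnt
      and blkcg_destroy_blkgs() are named in the message, while the
      helper signatures are assumptions:

          #include <linux/refcount.h>

          /* each cgwb (writeback context) pins the blkcg's blkgs */
          static inline void blkcg_cgwb_get(struct blkcg *blkcg)
          {
                  refcount_inc(&blkcg->cgwb_refcnt);
          }

          /* the final put, i.e. all writeback done, releases the blkgs */
          static inline void blkcg_cgwb_put(struct blkcg *blkcg)
          {
                  if (refcount_dec_and_test(&blkcg->cgwb_refcnt))
                          blkcg_destroy_blkgs(blkcg);     /* step 2 above */
          }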
      
      It seems that in the past blk-throttle did some hard-to-follow
      things when taking data from a blkg while associating with current,
      and the later simplification and unification of blk-throttle's
      behavior is what exposed this race.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>