- 29 Jun 2020, 3 commits
-
-
Submitted by Christoph Hellwig

The only thing in blkcg_bio_issue_check that needs to be under rcu_read_lock is blk_throtl_bio, so move the locking there.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
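A minimal sketch of the resulting shape; `__blk_throtl_bio()` is a hypothetical inner helper used for illustration, not the function's real internals:

```c
#include <linux/rcupdate.h>

/* sketch only: the rcu_read_lock now lives inside blk_throtl_bio
 * instead of being held by blkcg_bio_issue_check around everything */
static bool blk_throtl_bio(struct bio *bio)
{
	bool throttled;

	rcu_read_lock();	/* protects the blkg lookups inside */
	throttled = __blk_throtl_bio(bio);	/* hypothetical helper */
	rcu_read_unlock();

	return throttled;
}
```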
-
Submitted by Christoph Hellwig

This is purely a sanity check for grave programming errors. Remove it to simplify further work in this area.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

Keep the cgroup code together.

Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 May 2020, 1 commit
-
-
Submitted by Tejun Heo

The use_delay mechanism was introduced by blk-iolatency to hold memory allocators accountable for the reclaim and other shared IOs they cause. The duration of the delay is dynamically balanced between iolatency increasing the value on each target miss and it auto-decaying as time passes and threads get delayed on it.

While this works well for iolatency, iocost's control model isn't compatible with it. There are no repeated "violation" events which can be balanced against auto-decaying. iocost instead knows how much a given cgroup is over budget and wants to prevent that cgroup from issuing IOs while over budget. Until now, iocost has been adding the cost of force-issued IOs. However, this doesn't reflect the amount which is already over budget and is simply not enough to counter the auto-decaying, allowing an anon-memory-leaking low-priority cgroup to go over its allotted share of IOs.

As auto-decaying doesn't make much sense for iocost, this patch introduces a different mode of operation for use_delay - when blkcg_set_delay() is used instead of blkcg_add/use_delay(), the delay duration is not auto-decayed until it is explicitly cleared with blkcg_clear_delay(). iocost is updated to keep the delay duration synchronized to the budget overage amount.

With this change, iocost can effectively police cgroups which generate a significant amount of force-issued IOs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
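A rough model of the two delay modes; the struct and helper names here are illustrative, not the kernel's exact implementation:

```c
#include <linux/types.h>

/* illustrative model of the two use_delay modes */
struct delay_state {
	u64	delay_nsec;	/* delay currently imposed on issuers */
	bool	auto_decay;	/* false once set_delay() has been used */
};

/* iolatency-style: each target miss adds delay, time decays it away */
static void add_delay(struct delay_state *s, u64 nsec)
{
	s->delay_nsec += nsec;
	s->auto_decay = true;
}

/* iocost-style: the delay tracks the budget overage and never decays */
static void set_delay(struct delay_state *s, u64 overage_nsec)
{
	s->delay_nsec = overage_nsec;
	s->auto_decay = false;	/* only an explicit clear resets it */
}

static void clear_delay(struct delay_state *s)
{
	s->delay_nsec = 0;
	s->auto_decay = true;
}
```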
-
- 29 Apr 2020, 1 commit
-
-
Submitted by Christoph Hellwig

BIO_QUEUE_ENTERED is only used for cgroup accounting now, so rename the flag and move setting it into the cgroup code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Apr 2020, 2 commits
-
-
Submitted by Tejun Heo

blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs don't get offlined while there are active cgwbs on them. However, it ends up making offlining unordered, sometimes causing parents to be offlined before children. Let's fix this by making child blkcgs pin the parents' online states. Note that pin/unpin names are chosen over get/put intentionally, because css uses get/put online for something different.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Tejun Heo

blkcg->cgwb_refcnt is used to delay blkcg offlining so that blkgs don't get offlined while there are active cgwbs on them. However, it ends up making offlining unordered, sometimes causing parents to be offlined before children. To fix it, we want child blkcgs to pin the parents' online states, turning the refcnt into a more generic online pinning mechanism. In preparation:

* blkcg->cgwb_refcnt -> blkcg->online_pin
* blkcg_cgwb_get/put() -> blkcg_pin/unpin_online()
* Take them out of CONFIG_CGROUP_WRITEBACK

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
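The pinning idea behind these two commits, sketched with a plain refcount; the helper names follow the commit text, but the bodies are illustrative:

```c
#include <linux/refcount.h>

/* sketch: a child pins its parent's online state so parents cannot
 * be offlined before their children */
static void blkcg_pin_online(struct blkcg *blkcg)
{
	refcount_inc(&blkcg->online_pin);
}

static void blkcg_unpin_online(struct blkcg *blkcg)
{
	/* walk up: dropping the last pin may release the parent's pin */
	while (blkcg) {
		if (!refcount_dec_and_test(&blkcg->online_pin))
			break;
		blkcg_destroy_blkgs(blkcg);	/* teardown hook, per the series */
		blkcg = blkcg_parent(blkcg);
	}
}
```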
-
- 13 Dec 2019, 1 commit
-
-
Submitted by Guoqing Jiang

Since blk_drain_queue() has already been removed, this function is no longer needed.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 Nov 2019, 1 commit
-
-
Submitted by Tejun Heo

Currently, cgroup rstat is supported only on the cgroup2 hierarchy and rstat functions shouldn't be called on cgroup1 cgroups. While converting blk-cgroup core statistics to rstat, f7331648 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") accidentally ended up calling cgroup_rstat_updated() on cgroup1 cgroups, causing crashes. Longer term, we probably should add cgroup1 support to rstat, but for now let's mask the call directly.

Fixes: f7331648 ("blk-cgroup: reimplement basic IO stats using cgroup rstat")
Tested-by: Faiz Abbas <faiz_abbas@ti.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
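The masking amounts to guarding the rstat call with a default-hierarchy check, roughly like this (sketch, not the exact fix):

```c
/* sketch: rstat is cgroup2-only, so skip the call on cgroup1 */
if (cgroup_subsys_on_dfl(io_cgrp_subsys))
	cgroup_rstat_updated(blkcg->css.cgroup, smp_processor_id());
```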
-
- 08 Nov 2019, 3 commits
-
-
Submitted by Tejun Heo

blkg_rwstat is now only used by bfq-iosched and blk-throtl when on cgroup1. Let's move it into its own files and gate it behind a config option.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Tejun Heo

blk-cgroup has been using blkg_rwstat to track basic IO stats. Unfortunately, reading recursive stats scales badly as it involves walking all descendants. On systems with a huge number of cgroups (dead or alive), this can lead to substantial CPU cost when reading IO stats.

This patch reimplements basic IO stats using cgroup rstat, which uses more memory but makes recursive stat reading O(# descendants which have been active since last reading) instead of O(# descendants).

* blk-cgroup core no longer uses sync/async stats. Introduce new stat enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}.
* Add blkg_iostat[_set] which encapsulates byte and io stats, last values for propagation delta calculation, and u64_stats_sync for correctness on 32bit archs.
* Update the new percpu stat counters directly and implement blkcg_rstat_flush() to implement propagation.
* blkg_print_stat() can now bring the stats up to date by calling cgroup_rstat_flush() and print them instead of directly summing up all descendants.
* It now allocates 96 bytes per cpu. It used to be 40 bytes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Dan Schatzberg <dschatzberg@fb.com>
Cc: Daniel Xu <dlxu@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
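The counter layout described above looks roughly like this; the shape follows the commit message, and the update helper is an illustrative sketch:

```c
#include <linux/u64_stats_sync.h>

enum {
	BLKG_IOSTAT_READ,
	BLKG_IOSTAT_WRITE,
	BLKG_IOSTAT_DISCARD,
	BLKG_IOSTAT_NR,
};

struct blkg_iostat {
	u64	bytes[BLKG_IOSTAT_NR];
	u64	ios[BLKG_IOSTAT_NR];
};

struct blkg_iostat_set {
	struct u64_stats_sync	sync;	/* correctness on 32bit archs */
	struct blkg_iostat	cur;	/* live percpu counters */
	struct blkg_iostat	last;	/* snapshot for delta propagation */
};

/* sketch of the hot path: bump the percpu counters, then mark the
 * cgroup dirty so a later cgroup_rstat_flush() picks up the delta */
static void blkg_iostat_update(struct blkg_iostat_set *bis,
			       int rwd, u64 bytes)
{
	u64_stats_update_begin(&bis->sync);
	bis->cur.bytes[rwd] += bytes;
	bis->cur.ios[rwd]++;
	u64_stats_update_end(&bis->sync);
}
```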
-
Submitted by Tejun Heo

These don't have users anymore. Remove them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 29 Aug 2019, 2 commits
-
-
Submitted by Tejun Heo

Separate out blkcg_conf_get_disk() so that it can be used by blkcg policy interface file input parsers before the policy is actually enabled. This doesn't introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Tejun Heo

Instead of @node, pass in @q and @blkcg so that the alloc function has more context. This doesn't cause any behavior change and will be used by the io.weight implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 Aug 2019, 1 commit
-
-
Submitted by Bart Van Assche

See also commit 8f4236d9 ("block: remove QUEUE_FLAG_BYPASS and ->bypass") # v5.0.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Jul 2019, 1 commit
-
-
Submitted by Tejun Heo

Currently, ->pd_stat() is called only when the module parameter blkcg_debug_stats is set, which prevents it from printing non-debug policy-specific statistics. Let's move the debug testing down so that ->pd_stat() can print non-debug stats too. This patch doesn't cause any visible behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Jul 2019, 1 commit
-
-
Submitted by Tejun Heo

When a shared kthread needs to issue a bio for a cgroup, doing so synchronously can lead to priority inversions as the kthread can be trapped waiting for that cgroup. This patch implements the REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing to a dedicated per-blkcg work item to avoid such priority inversions. This will be used to fix priority inversions in btrfs compression and should be generally useful as we grow filesystem support for comprehensive IO control.

Cc: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
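A sketch of the punt mechanism, assuming an illustrative per-blkcg work item and bio list (the struct and function names here are not the kernel's):

```c
#include <linux/bio.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

/* illustrative per-blkcg punt state */
struct blkcg_punt {
	spinlock_t		lock;
	struct bio_list		bios;
	struct work_struct	work;
};

static void blkcg_punt_bio_work(struct work_struct *work)
{
	struct blkcg_punt *p = container_of(work, struct blkcg_punt, work);
	struct bio_list bios = BIO_EMPTY_LIST;
	struct bio *bio;

	/* grab the punted bios under the lock, issue them outside it */
	spin_lock_irq(&p->lock);
	bio_list_merge(&bios, &p->bios);
	bio_list_init(&p->bios);
	spin_unlock_irq(&p->lock);

	/* issued from the dedicated worker, so a shared kthread can
	 * never be trapped waiting on this cgroup's IO */
	while ((bio = bio_list_pop(&bios)))
		submit_bio(bio);
}
```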
-
- 21 Jun 2019, 4 commits
-
-
Submitted by Christoph Hellwig

This structure and assorted infrastructure is only used by the bfq I/O scheduler. Move it there instead of bloating the common code.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

When sampling the blkcg counts we don't need atomics or per-cpu variables. Introduce a new structure just containing plain u64 counters.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

Returning a structure generates rather bad code, so switch to passing by reference. Also don't require the structure to be zeroed and added to the 0-initialized counters, but actually set the counters to the calculated value.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

Break up the crazy statements into something readable. Also switch to an unsigned counter as it can't ever turn negative.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Dec 2018, 1 commit
-
-
Submitted by Dennis Zhou

The implementation of blkg_tryget_closest() wasn't super obvious and became a point of suspicion when debugging [1]. So let's clean it up so it's obviously not the problem. Also add missing RCU read locking to bio_clone_blkg_association(), which got exposed by adding the RCU read lock held check in blkg_tryget_closest().

[1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 13 Dec 2018, 1 commit
-
-
Submitted by Dennis Zhou

Between v3 [1] and v4 [2] of the blkg association series, the association point moved from generic_make_request_checks(), which is called after the request enters the queue, to bio_set_dev(), which is when the bio is formed before submit_bio(). When the request_queue goes away, the blkgs supporting the request_queue are destroyed and then the q->root_blkg is set to %NULL.

This patch adds a %NULL check to blkg_tryget_closest() to prevent the NPE caused by the above. It also adds a guard to see if the request_queue is dying when creating a blkg, to prevent creating a blkg for a dead request_queue.

[1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/

Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
Reported-and-tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Dec 2018, 10 commits
-
-
Submitted by Dennis Zhou

I was a little overzealous in removing the rcu_read_lock() call from blkcg_bio_issue_check() and it broke blk-throttle. Put it back.

Fixes: e35403a034bf ("blkcg: associate blkg when associating a device")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou

blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or %NULL.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou

Every bio is now associated with a blkg, putting blkg_get, blkg_try_get, and blkg_put on the hot path. Switch over the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
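Taken together with the rename above, blkg refcounting ends up looking roughly like css_tryget; a minimal sketch of the resulting helpers:

```c
#include <linux/percpu-refcount.h>

/* sketch: refcnt is now a percpu_ref, and tryget returns a bool */
static inline bool blkg_tryget(struct blkcg_gq *blkg)
{
	return percpu_ref_tryget(&blkg->refcnt);
}

static inline void blkg_put(struct blkcg_gq *blkg)
{
	percpu_ref_put(&blkg->refcnt);
}
```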
-
Submitted by Dennis Zhou

The previous patch in this series removed carrying around a pointer to the css in blkg. However, the blkg association logic still relied on taking a reference on the css to ensure we wouldn't fail in getting a reference for the blkg. Here the implicit dependency on the css is removed. The association continues to rely on the tryget logic walking up the blkg tree. This streamlines the three ways that association can happen: normal, swap, and writeback.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou

Prior patches ensured that any bio that interacts with a request_queue is properly associated with a blkg. This makes bio->bi_css unnecessary as blkg maintains a reference to blkcg already. This removes the bio field bi_css and transfers corresponding uses to access via bi_blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou

bio_issue_init, among other things, initializes the timestamp for an IO. Rather than have this logic handled by policies, this consolidates it to be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou

Previously, blkg association was handled by controller-specific code in blk-throttle and blk-iolatency. However, because a blkg represents a relationship between a blkcg and a request_queue, it makes sense to keep the blkg->q and bio->bi_disk->queue consistent. This patch moves association into the bio_set_dev() macro. This should cover the majority of cases where the device is set/changed, keeping the two pointers consistent. Fallback code is added to blkcg_bio_issue_check() to catch any missing paths.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
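Conceptually the macro gains an association step, something like the sketch below; the real macro handles more fields and the association helper's exact name and signature varied across this series:

```c
/* sketch: setting the device also refreshes the blkg association so
 * blkg->q and bio->bi_disk->queue stay consistent */
#define bio_set_dev(bio, bdev)				\
do {							\
	(bio)->bi_disk = (bdev)->bd_disk;		\
	bio_associate_blkg((bio));	/* association helper, per the series */ \
} while (0)
```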
-
Submitted by Dennis Zhou

There are several scenarios where blkg_lookup_create() can fail, such as the blkcg dying, the request_queue dying, or simply being OOM. Most callers handle this by simply falling back to the q->root_blkg and calling it a day.

This patch implements the notion of the closest blkg. During blkg_lookup_create(), if it fails to create, return the closest blkg found or the q->root_blkg. blkg_try_get_closest() is introduced and used during association so a bio is always attached to a blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
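The closest-blkg walk is roughly the loop below (a sketch; the real function adds RCU assertions, and the tryget primitive changed names later in the series):

```c
/* sketch: walk toward the root until a reference can be taken;
 * callers fall back to q->root_blkg if everything fails */
static struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
{
	while (blkg && !blkg_tryget(blkg))
		blkg = blkg->parent;

	return blkg;
}
```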
-
Submitted by Dennis Zhou

To know when to create a blkg, the general pattern is to do a blkg_lookup() and if that fails, lock and do the lookup again, and if that fails finally create. It doesn't make much sense for everyone who wants to do creation to write this themselves. This changes blkg_lookup_create() to do the locking and implement this pattern. The old blkg_lookup_create() is renamed to __blkg_lookup_create(). If a call site wants to do its own error handling or already owns the queue lock, it can use __blkg_lookup_create(). This will be used in upcoming patches.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
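The pattern the new blkg_lookup_create() wraps is a classic double-checked lookup; a sketch under the assumption of a plain spin_lock_irq (the real code's locking details vary by kernel version):

```c
/* sketch of the lookup-then-create pattern, now done in one place */
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
				    struct request_queue *q)
{
	struct blkcg_gq *blkg = blkg_lookup(blkcg, q);

	if (likely(blkg))
		return blkg;

	spin_lock_irq(&q->queue_lock);
	blkg = __blkg_lookup_create(blkcg, q);	/* re-looks-up, then creates */
	spin_unlock_irq(&q->queue_lock);

	return blkg;
}
```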
-
Submitted by Dennis Zhou

The bio_blkcg() function turns out to be inconsistent and consequently dangerous to use. The first part returns a blkcg where a reference is owned by the bio, meaning it does not need to be rcu protected. However, the third case, the last line, is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

This can race against task migration and the cgroup dying. It is also semantically different as it must be called rcu protected and is susceptible to failure when trying to get a reference to it.

This patch adds association ahead of calling bio_blkcg() rather than after. This makes association a required and explicit step along the code paths for calling bio_blkcg(). In blk-iolatency, association is moved above the bio_blkcg() call to ensure it will not return %NULL.

BFQ uses the old bio_blkcg() function, but I do not want to address it in this series due to the complexity. I have created a private version documenting the inconsistency and noting not to use it.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
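The difference in lifetimes, sketched side by side; the explicit-association helper is illustrative of the approach this series takes, not its exact call sequence:

```c
/* racy: resolves the submitter's cgroup at call time; must be done
 * under rcu_read_lock() and can race task migration / cgroup death */
blkcg = css_to_blkcg(task_css(current, io_cgrp_id));

/* preferred: associate the bio up front, then use the reference the
 * bio itself holds, which needs no RCU at the use site */
bio_associate_blkg(bio);		/* illustrative helper name */
blkcg = bio->bi_blkg->blkcg;
```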
-
- 16 Nov 2018, 2 commits
-
-
Submitted by Christoph Hellwig

With the legacy request path gone there is no good reason to keep queue_lock as a pointer, we can always use the embedded lock now.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Fixed floppy and blk-cgroup missing conversions and half done edits.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
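The conversion is mechanical at call sites:

```c
/* before: queue_lock was a pointer that could alias a driver's lock */
spin_lock_irq(q->queue_lock);

/* after: always the lock embedded in the request_queue itself */
spin_lock_irq(&q->queue_lock);
```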
-
Submitted by Christoph Hellwig

Unused since the removal of the legacy request code.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Nov 2018, 1 commit
-
-
Submitted by Jens Axboe

It's now dead code, nobody uses it.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 Nov 2018, 1 commit
-
-
Submitted by Dennis Zhou

This reverts a series committed earlier due to a null pointer exception bug report in [1]. It seems there are edge case interactions that I did not consider and will need some time to understand what causes the adverse interactions. The original series can be found in [2] with a follow-up series in [3].

[1] https://www.spinics.net/lists/cgroups/msg20719.html
[2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

This reverts the following commits: d459d853, b2c3fa54, 101246ec, b3b9f24f, e2b09899, f0fcb3ec, c839e7a0, bdc24917, 74b7c02a, 5bf9a1f3, a7b39b4e, 07b05bcc, 49f4c2dc, 27e6fa99

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 Oct 2018, 1 commit
-
-
Submitted by Dennis Zhou

It is possible for blkg creation to fail when in blk_get_rl(). In this situation, the fallback logic returns the nearest created blkg. There is, however, special handling for the request_list of the root blkcg. This fixes the missing edge case from the earlier series changing blk_get_rl().

Fixes: e2b09899 ("blkcg: cleanup and make blk_get_rl use blkg_lookup_create")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Sep 2018, 2 commits
-
-
Submitted by Dennis Zhou (Facebook)

blkg reference counting now uses percpu_ref rather than atomic_t. Let's make this consistent with css_tryget. This renames blkg_try_get to blkg_tryget and now returns a bool rather than the blkg or NULL.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Dennis Zhou (Facebook)

Now that every bio is associated with a blkg, this puts the use of blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-