Commits · d3f77dfdc71835f8db71ca57d272b1fbec9dfc18 · openeuler / Kernel

10 Jul, 2019 1 commit

blkcg: implement REQ_CGROUP_PUNT · d3f77dfd

Committed by Tejun Heo on Jun 27, 2019

When a shared kthread needs to issue a bio for a cgroup, doing so
synchronously can lead to priority inversions as the kthread can be
trapped waiting for that cgroup.  This patch implements
REQ_CGROUP_PUNT flag which makes submit_bio() punt the actual issuing
to a dedicated per-blkcg work item to avoid such priority inversions.

This will be used to fix priority inversions in btrfs compression and
should be generally useful as we grow filesystem support for
comprehensive IO control.

Cc: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d3f77dfd
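The punting idea described in this commit can be sketched in plain userspace C. This is a toy model, not the kernel implementation: `struct toy_bio`, `struct toy_blkcg`, `TOY_REQ_CGROUP_PUNT`, `toy_submit_bio()` and `toy_punt_work()` are hypothetical names, there is no locking, and "issuing" a bio only sets a marker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for REQ_CGROUP_PUNT. */
#define TOY_REQ_CGROUP_PUNT (1u << 0)

struct toy_bio {
    unsigned int flags;
    int issued;              /* set to 1 once the bio is actually issued */
    struct toy_bio *next;    /* link on the per-cgroup punt list */
};

struct toy_blkcg {
    struct toy_bio *punt_head;  /* bios waiting for the work item */
    struct toy_bio *punt_tail;
};

/* Issue the bio for real; in this toy model that only sets a marker. */
static void toy_issue(struct toy_bio *bio)
{
    bio->issued = 1;
}

/*
 * Submit path: a bio flagged for punting is queued on its cgroup and the
 * caller returns immediately, so a shared thread never waits on the cgroup.
 * Returns 1 if the bio was punted, 0 if it was issued synchronously.
 */
static int toy_submit_bio(struct toy_blkcg *cg, struct toy_bio *bio)
{
    assert(cg != NULL && bio != NULL);
    if (!(bio->flags & TOY_REQ_CGROUP_PUNT)) {
        toy_issue(bio);
        return 0;
    }
    bio->next = NULL;
    if (cg->punt_tail)
        cg->punt_tail->next = bio;
    else
        cg->punt_head = bio;
    cg->punt_tail = bio;
    return 1;
}

/*
 * Work item body: drains the cgroup's punt list and issues each bio.
 * The punt flag is cleared first so a bio is not punted a second time.
 * Returns the number of bios issued.
 */
static int toy_punt_work(struct toy_blkcg *cg)
{
    int n = 0;
    struct toy_bio *bio = cg->punt_head;

    cg->punt_head = cg->punt_tail = NULL;
    while (bio) {
        struct toy_bio *next = bio->next;

        bio->flags &= ~TOY_REQ_CGROUP_PUNT;
        toy_issue(bio);
        bio = next;
        n++;
    }
    return n;
}
```

A caller would flag a bio, hand it to `toy_submit_bio()`, and continue; the bio is only issued once `toy_punt_work()` runs on behalf of the owning cgroup.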

21 Jun, 2019 4 commits

blk-cgroup: move struct blkg_stat to bfq · c0ce79dc

Committed by Christoph Hellwig on Jun 06, 2019

This structure and assorted infrastructure is only used by the bfq I/O
scheduler.  Move it there instead of bloating the common code.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c0ce79dc

blk-cgroup: introduce a new struct blkg_rwstat_sample · 7af6fd91

Committed by Christoph Hellwig on Jun 06, 2019

When sampling the blkcg counts we don't need atomics or per-cpu
variables.  Introduce a new structure just containing plain u64
counters.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7af6fd91

blk-cgroup: pass blkg_rwstat structures by reference · 5d0b6e48

Committed by Christoph Hellwig on Jun 06, 2019

Returning a structure generates rather bad code, so switch to passing
by reference.  Also don't require the structure to be zeroed and add
to the 0-initialized counters, but actually set the counters to the
calculated value.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5d0b6e48
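A minimal sketch of the calling convention this commit describes: the result is written through a pointer instead of being returned by value, and the counters are assigned rather than added to, so the caller need not zero the output first. The names `struct toy_rwstat_sample` and `toy_rwstat_total()` are hypothetical and do not mirror the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_READ, TOY_WRITE, TOY_NR };

/* Plain u64 counters, with no atomics or per-cpu storage. */
struct toy_rwstat_sample {
    uint64_t cnt[TOY_NR];
};

/*
 * Fill *sum with the element-wise total of n input samples.
 * Each counter is set to the computed value, not accumulated into,
 * so whatever *sum held before the call is overwritten.
 */
static void toy_rwstat_total(const struct toy_rwstat_sample *in, int n,
                             struct toy_rwstat_sample *sum)
{
    assert(in != NULL && sum != NULL);
    for (int i = 0; i < TOY_NR; i++) {
        uint64_t total = 0;

        for (int j = 0; j < n; j++)
            total += in[j].cnt[i];
        sum->cnt[i] = total;
    }
}
```

Because the function assigns, a caller can pass an uninitialized or reused output structure without getting stale values mixed into the totals.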

blk-cgroup: factor out a helper to read rwstat counter · 239eeb08

Committed by Christoph Hellwig on Jun 06, 2019

Trying to break up the crazy statements to something readable.
Also switch to an unsigned counter as it can't ever turn negative.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

239eeb08

21 Dec, 2018 1 commit

blkcg: clean up blkg_tryget_closest() · 6ab21879

Committed by Dennis Zhou on Dec 19, 2018

The implementation of blkg_tryget_closest() wasn't super obvious and
became a point of suspicion when debugging [1]. So let's clean it up so
it's obviously not the problem.

Also add missing RCU read locking to bio_clone_blkg_association(), which
got exposed by adding the RCU read lock held check in
blkg_tryget_closest().

[1] https://lore.kernel.org/linux-block/a7e97e4b-0dd8-3a54-23b7-a0f27b17fde8@kernel.dk/

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6ab21879

13 Dec, 2018 1 commit

blkcg: handle dying request_queue when associating a blkg · 0273ac34

Committed by Dennis Zhou on Dec 11, 2018

Between v3 [1] and v4 [2] of the blkg association series, the
association point moved from generic_make_request_checks(), which is
called after the request enters the queue, to bio_set_dev(), which is when
the bio is formed before submit_bio(). When the request_queue goes away,
the blkgs supporting the request_queue are destroyed and then the
q->root_blkg is set to %NULL.

This patch adds a %NULL check to blkg_tryget_closest() to prevent the
NPE caused by the above. It also adds a guard to see if the
request_queue is dying when creating a blkg to prevent creating a blkg
for a dead request_queue.

[1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/

Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
Reported-and-tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0273ac34

08 Dec, 2018 10 commits

blkcg: put back rcu lock in blkcg_bio_issue_check() · 4705de73

Committed by Dennis Zhou on Dec 06, 2018

I was a little overzealous in removing the rcu_read_lock() call from
blkcg_bio_issue_check() and it broke blk-throttle. Put it back.

Fixes: e35403a034bf ("blkcg: associate blkg when associating a device")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4705de73

blkcg: rename blkg_try_get() to blkg_tryget() · 7754f669

Committed by Dennis Zhou on Dec 05, 2018

blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or %NULL.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7754f669

blkcg: change blkg reference counting to use percpu_ref · 7fcf2b03

Committed by Dennis Zhou on Dec 05, 2018

Every bio is now associated with a blkg putting blkg_get, blkg_try_get,
and blkg_put on the hot path. Switch over the refcnt in blkg to use
percpu_ref.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7fcf2b03

blkcg: remove additional reference to the css · fc5a828b

Committed by Dennis Zhou on Dec 05, 2018

The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fc5a828b

blkcg: remove bio->bi_css and instead use bio->bi_blkg · db6638d7

Committed by Dennis Zhou on Dec 05, 2018

Prior patches ensured that any bio that interacts with a request_queue
is properly associated with a blkg. This makes bio->bi_css unnecessary
as blkg maintains a reference to blkcg already.

This removes the bio field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

db6638d7

blkcg: consolidate bio_issue_init() to be a part of core · e439bedf

Committed by Dennis Zhou on Dec 05, 2018

bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e439bedf

blkcg: associate blkg when associating a device · 5cdf2e3f

Committed by Dennis Zhou on Dec 05, 2018

Previously, blkg association was handled by controller specific code in
blk-throttle and blk-iolatency. However, because a blkg represents a
relationship between a blkcg and a request_queue, it makes sense to keep
the blkg->q and bio->bi_disk->queue consistent.

This patch moves association into the bio_set_dev() macro. This should
cover the majority of cases where the device is set/changed keeping the
two pointers consistent. Fallback code is added to
blkcg_bio_issue_check() to catch any missing paths.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5cdf2e3f

blkcg: convert blkg_lookup_create() to find closest blkg · beea9da0

Committed by Dennis Zhou on Dec 05, 2018

There are several scenarios where blkg_lookup_create() can fail such as
the blkcg dying, request_queue is dying, or simply being OOM. Most
handle this by simply falling back to the q->root_blkg and calling it a
day.

This patch implements the notion of closest blkg. During
blkg_lookup_create(), if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest() is introduced and used
during association so a bio is always attached to a blkg.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

beea9da0
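The "closest blkg" notion can be illustrated with a small userspace sketch. It assumes a plain integer reference count and a parent pointer; the kernel uses percpu_ref and RCU, which are not modelled here. `struct toy_blkg`, `toy_blkg_tryget()` and `toy_blkg_tryget_closest()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_blkg {
    struct toy_blkg *parent;  /* NULL for the root */
    int refcnt;               /* 0 means the group is being torn down */
};

/* Take a reference only if the group is still alive. */
static bool toy_blkg_tryget(struct toy_blkg *blkg)
{
    if (blkg == NULL || blkg->refcnt == 0)
        return false;
    blkg->refcnt++;
    return true;
}

/*
 * Walk from blkg toward the root and return the first group on which a
 * reference could be taken, or NULL if even the root is gone.
 */
static struct toy_blkg *toy_blkg_tryget_closest(struct toy_blkg *blkg)
{
    while (blkg && !toy_blkg_tryget(blkg))
        blkg = blkg->parent;
    return blkg;
}
```

With a dying leaf and intermediate group, the walk ends at the root, which is the fallback behaviour the commit message describes; the NULL return covers the case where no live ancestor remains.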

blkcg: update blkg_lookup_create() to do locking · b978962a

Committed by Dennis Zhou on Dec 05, 2018

To know when to create a blkg, the general pattern is to do a
blkg_lookup() and if that fails, lock and do the lookup again, and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create() to do locking and implement this
pattern. The old blkg_lookup_create() is renamed to
__blkg_lookup_create().  If a call site wants to do its own error
handling or already owns the queue lock, they can use
__blkg_lookup_create(). This will be used in upcoming patches.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b978962a
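The lookup, lock, lookup again, create pattern can be sketched as below. This is only an illustration of the control flow under loud assumptions: the table, `toy_lookup()`, `toy_lookup_create_locked()` and `toy_lookup_create()` are hypothetical, a pthread mutex stands in for the queue lock, and the unlocked fast path is not actually safe against concurrent writers here because the RCU protection the kernel relies on is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_MAX_GROUPS 8

struct toy_group {
    int id;
    int in_use;
};

struct toy_queue {
    pthread_mutex_t lock;                     /* stands in for the queue lock */
    struct toy_group groups[TOY_MAX_GROUPS];
};

/* Plain lookup; returns NULL on a miss. */
static struct toy_group *toy_lookup(struct toy_queue *q, int id)
{
    for (int i = 0; i < TOY_MAX_GROUPS; i++)
        if (q->groups[i].in_use && q->groups[i].id == id)
            return &q->groups[i];
    return NULL;
}

/* Lookup-or-create for callers that already hold q->lock. */
static struct toy_group *toy_lookup_create_locked(struct toy_queue *q, int id)
{
    struct toy_group *g = toy_lookup(q, id);

    if (g)
        return g;   /* created by someone else before we took the lock */
    for (int i = 0; i < TOY_MAX_GROUPS; i++) {
        if (!q->groups[i].in_use) {
            q->groups[i].id = id;
            q->groups[i].in_use = 1;
            return &q->groups[i];
        }
    }
    return NULL;    /* table full; stands in for an allocation failure */
}

/* Fast path without the lock, then the locked variant on a miss. */
static struct toy_group *toy_lookup_create(struct toy_queue *q, int id)
{
    struct toy_group *g = toy_lookup(q, id);

    assert(q != NULL);
    if (g)
        return g;
    pthread_mutex_lock(&q->lock);
    g = toy_lookup_create_locked(q, id);
    pthread_mutex_unlock(&q->lock);
    return g;
}
```

Splitting the function in two mirrors the design choice in the commit: most callers use the wrapper that handles locking, while a caller that already owns the lock or wants its own error handling calls the locked variant directly.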

blkcg: fix ref count issue with bio_blkcg() using task_css · 0fe061b9

Committed by Dennis Zhou on Dec 05, 2018

The bio_blkcg() function turns out to be inconsistent and consequently
dangerous to use. The first part returns a blkcg where a reference is
owned by the bio meaning it does not need to be rcu protected. However,
the third case, the last line, is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

This can race against task migration and the cgroup dying. It is also
semantically different as it must be called rcu protected and is
susceptible to failure when trying to get a reference to it.

This patch adds association ahead of calling bio_blkcg() rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg(). In blk-iolatency, association is
moved above the bio_blkcg() call to ensure it will not return %NULL.

BFQ uses the old bio_blkcg() function, but I do not want to address it
in this series due to the complexity. I have created a private version
documenting the inconsistency and noting not to use it.

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0fe061b9

16 Nov, 2018 2 commits

block: remove the queue_lock indirection · 0d945c1f

Committed by Christoph Hellwig on Nov 15, 2018

With the legacy request path gone there is no good reason to keep
queue_lock as a pointer, we can always use the embedded lock now.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Fixed floppy and blk-cgroup missing conversions and half done edits.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0d945c1f

block: remove QUEUE_FLAG_BYPASS and ->bypass · 8f4236d9

Committed by Christoph Hellwig on Nov 14, 2018

Unused since the removal of the legacy request code.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8f4236d9

08 Nov, 2018 1 commit

block: remove request_list code · db6d9952

Committed by Jens Axboe on Nov 02, 2018

It's now dead code, nobody uses it.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

db6d9952

02 Nov, 2018 1 commit

blkcg: revert blkcg cleanups series · b5f2954d

Committed by Dennis Zhou on Nov 01, 2018

This reverts a series committed earlier due to null pointer exception
bug report in [1]. It seems there are edge case interactions that I did
not consider and will need some time to understand what causes the
adverse interactions.

The original series can be found in [2] with a follow up series in [3].

[1] https://www.spinics.net/lists/cgroups/msg20719.html
[2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
[3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/

This reverts the following commits:
d459d853, b2c3fa54, 101246ec, b3b9f24f, e2b09899,
f0fcb3ec, c839e7a0, bdc24917, 74b7c02a, 5bf9a1f3,
a7b39b4e, 07b05bcc, 49f4c2dc, 27e6fa99

Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b5f2954d

21 Oct, 2018 1 commit

blkcg: fix edge case for blk_get_rl() under memory pressure · b2c3fa54

Committed by Dennis Zhou on Oct 20, 2018

It is possible for blkg creation to fail when in blk_get_rl(). In this
situation, the fallback logic returns the nearest created blkg. There is
however special handling for the request_list for the root blkcg. This
fixes the missing edge case from the earlier series changing
blk_get_rl().

Fixes: e2b09899 ("blkcg: cleanup and make blk_get_rl use blkg_lookup_create")
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b2c3fa54

22 Sep, 2018 10 commits

blkcg: rename blkg_try_get to blkg_tryget · 101246ec

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

101246ec

blkcg: change blkg reference counting to use percpu_ref · b3b9f24f

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b3b9f24f

blkcg: cleanup and make blk_get_rl use blkg_lookup_create · e2b09899

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

blk_get_rl is responsible for identifying which request_list a request
should be allocated to. Try get logic was added earlier, but
semantically the logic was not changed.

This patch makes better use of the bio already having a reference to the
blkg in the hot path. The cold path uses a better fallback of
blkg_lookup_create rather than just blkg_lookup and then falling back to
the q->root_rl. If lookup_create fails with anything but -ENODEV, it
falls back to q->root_rl.

A clarifying comment is added to explain why q->root_rl is used rather
than the root blkg's rl.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e2b09899

blkcg: remove additional reference to the css · f0fcb3ec

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f0fcb3ec

blkcg: remove bio->bi_css and instead use bio->bi_blkg · c839e7a0

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c839e7a0

blkcg: consolidate bio_issue_init to be a part of core · 5bf9a1f3

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5bf9a1f3

blkcg: always associate a bio with a blkg · a7b39b4e

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a7b39b4e

blkcg: convert blkg_lookup_create to find closest blkg · 07b05bcc

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

07b05bcc

blkcg: update blkg_lookup_create to do locking · 49f4c2dc

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

49f4c2dc

blkcg: fix ref count issue with bio_blkcg using task_css · 27e6fa99

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27e6fa99

01 Sep, 2018 2 commits

blkcg: delay blkg destruction until after writeback has finished · 59b57717

Committed by Dennis Zhou (Facebook) on Aug 31, 2018

Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkcg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

59b57717
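The new cleanup ordering can be sketched with a single counter. This is a simplified model, assuming hypothetical names (`struct toy_blkcg`, `toy_cgwb_get()`, `toy_cgwb_put()`, `toy_css_offline()`) and a plain integer where the kernel would use an atomic reference count.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_blkcg {
    int cgwb_refcnt;        /* base reference plus one per live cgwb */
    bool blkgs_destroyed;   /* set when blkg destruction has run */
};

static void toy_blkcg_init(struct toy_blkcg *cg)
{
    cg->cgwb_refcnt = 1;    /* base reference, dropped at offline time */
    cg->blkgs_destroyed = false;
}

/* A writeback structure attaches to the cgroup. */
static void toy_cgwb_get(struct toy_blkcg *cg)
{
    cg->cgwb_refcnt++;
}

/* Drop one reference; the last put triggers blkg destruction. */
static void toy_cgwb_put(struct toy_blkcg *cg)
{
    assert(cg->cgwb_refcnt > 0);
    if (--cg->cgwb_refcnt == 0)
        cg->blkgs_destroyed = true;
}

/*
 * Offline only puts back the base reference, so blkgs survive until
 * every outstanding writeback structure has also been released.
 */
static void toy_css_offline(struct toy_blkcg *cg)
{
    toy_cgwb_put(cg);
}
```

If a cgroup is offlined while one writeback structure is still attached, `blkgs_destroyed` stays false until that structure's final put, which is the delay the commit introduces.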

Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462

Committed by Dennis Zhou (Facebook) on Aug 31, 2018

This reverts commit 4c699480.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c699480) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6b065462

12 Aug, 2018 1 commit

blkcg: Make blkg_root_lookup() work for queues in bypass mode · b86d865c

Committed by Bart Van Assche on Aug 10, 2018

For legacy queues the only call of blkg_root_lookup() happens after
bypass mode has been enabled. Since blkg_lookup() returns NULL for
queues in bypass mode, modify the blkg_root_lookup() such that it
no longer depends on bypass mode. Rename the function into
blk_queue_root_blkg() as suggested by Tejun.

Suggested-by: Tejun Heo <tj@kernel.org>
Fixes: 6bad9b21 ("blkcg: Introduce blkg_root_lookup()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b86d865c

09 Aug, 2018 1 commit

blkcg: Introduce blkg_root_lookup() · 6bad9b21

Committed by Bart Van Assche on Aug 09, 2018

This new function will be used in a later patch to verify whether a
queue has been dissociated from the cgroup controller before being
released.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alexandru Moise <00moses.alexander00@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6bad9b21

30 Jul, 2018 1 commit

block: don't account for split bio's size in cgroup stats · c454edc2

Committed by Josef Bacik on Jul 30, 2018

We need to check in blkcg_bio_issue_check if the bio is flagged as
QUEUE_ENTERED, because if it is then we've already accounted for the
size of the IO in the cgroup stats.  We can still however account for
the extra IO since it'll be another request.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c454edc2

18 Jul, 2018 1 commit

blkcg: Track DISCARD statistics and output them in cgroup io.stat · 636620b6

Committed by Tejun Heo on Jul 18, 2018

Add tracking of REQ_OP_DISCARD ios to the per-cgroup io.stat.  Two
fields, dbytes and dios, to respectively count the total bytes and
number of discards are added.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Newell <newella@fb.com>
Cc: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

636620b6
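As an illustration of the two new fields, the sketch below keeps discard counters next to the read and write ones and formats them as key=value pairs. The `rbytes`, `wbytes`, `rios` and `wios` keys are the existing io.stat fields; the device-number prefix that io.stat prints per line is left out, and `struct toy_iostat`, `toy_account_discard()` and `toy_format_iostat()` are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct toy_iostat {
    uint64_t rbytes, wbytes, rios, wios;
    uint64_t dbytes;  /* total bytes discarded */
    uint64_t dios;    /* number of discard IOs */
};

/* Account one discard of the given size. */
static void toy_account_discard(struct toy_iostat *s, uint64_t bytes)
{
    assert(s != NULL);
    s->dbytes += bytes;
    s->dios++;
}

/* Format the counters as space-separated key=value pairs. */
static int toy_format_iostat(char *buf, size_t len, const struct toy_iostat *s)
{
    return snprintf(buf, len,
                    "rbytes=%" PRIu64 " wbytes=%" PRIu64
                    " rios=%" PRIu64 " wios=%" PRIu64
                    " dbytes=%" PRIu64 " dios=%" PRIu64,
                    s->rbytes, s->wbytes, s->rios, s->wios,
                    s->dbytes, s->dios);
}
```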

09 Jul, 2018 2 commits

blkcg: add generic throttling mechanism · d09d8df3

Committed by Josef Bacik on Jul 03, 2018

Since IO can be issued from literally anywhere it's almost impossible to
do throttling without having some sort of adverse effect somewhere else
in the system because of locking or other dependencies.  The best way to
solve this is to do the throttling when we know we aren't holding any
other kernel resources.  Do this by tracking throttling in a per-blkg
basis, and if we require throttling flag the task that it needs to check
before it returns to user space and possibly sleep there.

This is to address the case where a process is doing work that is
generating IO that can't be throttled, whether that is directly with a
lot of REQ_META IO, or indirectly by allocating so much memory that it
is swamping the disk with REQ_SWAP.  We can't use task_work_add() as we
don't want to induce a memory allocation in the IO path, so simply
saving the request queue in the task and flagging it to do the
notify_resume thing achieves the same result without the overhead of a
memory allocation.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d09d8df3
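The deferred throttling described above can be modelled roughly as follows. This is a userspace sketch with hypothetical names (`struct toy_task`, `toy_schedule_throttle()`, `toy_return_to_user()`); the "sleep" is only recorded as a number so the flow can be checked, and none of this is the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_queue {
    int delay_units;              /* how long offenders should sleep */
};

struct toy_task {
    struct toy_queue *throttle_queue;  /* queue that asked for throttling */
    bool notify_resume;                /* check before returning to user space */
    int slept_units;                   /* total simulated sleep so far */
};

/*
 * IO path: only record the queue and set a flag. Nothing is allocated
 * and nothing sleeps here, since locks or other resources may be held.
 */
static void toy_schedule_throttle(struct toy_task *t, struct toy_queue *q)
{
    assert(t != NULL && q != NULL);
    t->throttle_queue = q;
    t->notify_resume = true;
}

/*
 * Return-to-user path: no kernel resources are held at this point, so
 * this is where the task pays its delay. Returns the units slept.
 */
static int toy_return_to_user(struct toy_task *t)
{
    int units = 0;

    if (t->notify_resume && t->throttle_queue) {
        units = t->throttle_queue->delay_units;
        t->slept_units += units;
    }
    t->throttle_queue = NULL;
    t->notify_resume = false;
    return units;
}
```

Storing the queue pointer in the task itself is what lets the IO path avoid a memory allocation, which is the trade-off the commit message explains.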

blk: introduce REQ_SWAP · 0d1e0c7c

Committed by Josef Bacik on Jul 03, 2018

Just like REQ_META, it's important to know the IO coming down is swap
in order to guard against potential IO priority inversion issues with
cgroups.  Add REQ_SWAP and use it for all swap IO, and add it to our
bio_issue_as_root_blkg helper.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0d1e0c7c

openeuler / Kernel: synced successfully more than 1 year ago
