Commits · 3dccdae54fe836a22cee9dc6df9fd1708ae075ce · openeuler / Kernel

25 Sep, 2018 6 commits

block: merge BIOVEC_SEG_BOUNDARY into biovec_phys_mergeable · 3dccdae5

Committed by Christoph Hellwig on Sep 24, 2018

These two checks should always be performed together, so merge them into
a single helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

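As a rough illustration of why the two checks belong together, the following userspace sketch folds the physical-contiguity test and the segment-boundary test into one predicate. The `struct seg` type, the name `segs_phys_mergeable`, and the flat 64-bit addresses are assumptions made for this example; the kernel helper works on `struct bio_vec` and the queue limits instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bio_vec: physical start address and length. */
struct seg {
	uint64_t phys;
	uint32_t len;	/* assumed to be non-zero */
};

/*
 * Two segments can share one hardware segment only if
 *  1) b starts exactly where a ends (physically contiguous), and
 *  2) the merged range stays inside one boundary window, where
 *     boundary_mask is (window size - 1), e.g. 0xfff for 4 KiB windows.
 */
static bool segs_phys_mergeable(uint64_t boundary_mask,
				const struct seg *a, const struct seg *b)
{
	uint64_t first = a->phys;
	uint64_t last = b->phys + b->len - 1;

	if (a->phys + a->len != b->phys)
		return false;
	/* Same window iff first and last byte agree on every bit above the mask. */
	return (first | boundary_mask) == (last | boundary_mask);
}
```

With a single predicate, a caller cannot perform the contiguity test and forget the boundary test, which is the situation the next commit in this list repairs.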

block: add a missing BIOVEC_SEG_BOUNDARY check in bio_add_pc_page · 0e253391

Committed by Christoph Hellwig on Sep 24, 2018

The actual recalculation of segments in __blk_recalc_rq_segments will
do this check, so there is no point in forcing it if we know it won't
succeed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: simplify BIOVEC_PHYS_MERGEABLE · 6a9f5f24

Committed by Christoph Hellwig on Sep 24, 2018

Turn the macro into an inline, move it to blk.h and simplify the
arch hooks a bit.

Also rename the function to biovec_phys_mergeable as there is no need
to shout.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move req_gap_back_merge to blk.h · 27ca1d4e

Committed by Christoph Hellwig on Sep 24, 2018

No need to expose these helpers outside the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move req_gap_{back,front}_merge to blk-merge.c · e9907009

Committed by Christoph Hellwig on Sep 24, 2018

Keep it close to the actual users instead of exposing the function to all
drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move integrity_req_gap_{back,front}_merge to blk.h · 43b729bf

Committed by Christoph Hellwig on Sep 24, 2018

No need to expose these to drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

22 Sep, 2018 11 commits

blk-mq: Document the functions that iterate over requests · c7b1bf5c

Committed by Bart Van Assche on Sep 21, 2018

Make it easier to understand the purpose of the functions that iterate
over requests by documenting their purpose. Fix several minor spelling
and grammar mistakes in comments in these functions.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: rename blkg_try_get to blkg_tryget · 101246ec

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

blkg reference counting now uses percpu_ref rather than atomic_t. Let's
make this consistent with css_tryget. This renames blkg_try_get to
blkg_tryget and now returns a bool rather than the blkg or NULL.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: change blkg reference counting to use percpu_ref · b3b9f24f

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Now that every bio is associated with a blkg, this puts the use of
blkg_get, blkg_try_get, and blkg_put on the hot path. This switches over
the refcnt in blkg to use percpu_ref.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: remove additional reference to the css · f0fcb3ec

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

The previous patch in this series removed carrying around a pointer to
the css in blkg. However, the blkg association logic still relied on
taking a reference on the css to ensure we wouldn't fail in getting a
reference for the blkg.

Here the implicit dependency on the css is removed. The association
continues to rely on the tryget logic walking up the blkg tree. This
streamlines the three ways that association can happen: normal, swap,
and writeback.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: remove bio->bi_css and instead use bio->bi_blkg · c839e7a0

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Prior patches ensured that all bios are now associated with some blkg.
This now makes bio->bi_css unnecessary as blkg maintains a reference to
the blkcg already.

This patch removes the field bi_css and transfers corresponding uses to
access via bi_blkg.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: associate a blkg for pages being evicted by swap · 74b7c02a

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

A prior patch in this series added blkg association to bios issued by
cgroups. There are two other paths that we want to attribute work back
to the appropriate cgroup: swap and writeback. Here we modify the way
swap tags bios to include the blkg. Writeback will be tackled in the next
patch.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: consolidate bio_issue_init to be a part of core · 5bf9a1f3

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

bio_issue_init among other things initializes the timestamp for an IO.
Rather than have this logic handled by policies, this consolidates it to
be on the init paths (normal, clone, bounce clone).
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: always associate a bio with a blkg · a7b39b4e

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

Previously, blkg's were only assigned as needed by blk-iolatency and
blk-throttle. bio->css was also always being associated while blkg was
being looked up and then thrown away in blkcg_bio_issue_check.

This patch begins the cleanup of bio->css and bio->bi_blkg by always
associating a blkg in blkcg_bio_issue_check. This tries to create the
blkg, but if it is not possible, falls back to using the root_blkg of
the request_queue. Therefore, a bio will always be associated with a
blkg. The duplicate association logic is removed from blk-throttle and
blk-iolatency.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: convert blkg_lookup_create to find closest blkg · 07b05bcc

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

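The "closest blkg" idea can be sketched in userspace as a walk towards the root that stops at the first node whose reference can still be taken. This is a simplified, single-threaded model: `struct node`, `node_tryget`, and `node_tryget_closest` are hypothetical names, and a plain counter stands in for the kernel's reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blkg in a cgroup hierarchy. */
struct node {
	struct node *parent;	/* NULL for the root */
	unsigned int refs;	/* 0 means the node is already being torn down */
};

/* Take a reference only if the node is still alive. */
static bool node_tryget(struct node *n)
{
	if (n->refs == 0)
		return false;
	n->refs++;
	return true;
}

/*
 * Walk towards the root until a reference can be taken. The root is
 * assumed to outlive all users, so NULL is returned only if even the
 * root is gone, which this model treats as a caller bug.
 */
static struct node *node_tryget_closest(struct node *n)
{
	while (!node_tryget(n)) {
		if (!n->parent)
			return NULL;
		n = n->parent;
	}
	return n;
}
```

A caller that wants to attach work to `leaf` would use whatever `node_tryget_closest(leaf)` returns, so the work always lands on a live node.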

blkcg: update blkg_lookup_create to do locking · 49f4c2dc

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

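The lookup, lock, look up again, then create pattern described above can be sketched in portable C11. Everything here is an assumption made for illustration: `struct table`, `table_lookup`, and `table_lookup_create` are hypothetical, a spinning `atomic_flag` stands in for the queue lock, and the unlocked lookup is safe only because this model never removes entries (the kernel relies on RCU instead).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct group {
	int key;
	struct group *next;	/* never changes once the group is published */
};

struct table {
	atomic_flag lock;		/* stands in for the queue lock */
	struct group *_Atomic head;	/* groups are only ever added */
	int creations;			/* how often the slow path created a group */
};

static void table_init(struct table *t)
{
	atomic_flag_clear(&t->lock);
	atomic_init(&t->head, NULL);
	t->creations = 0;
}

/* Lockless lookup: safe here only because groups are never removed. */
static struct group *table_lookup(struct table *t, int key)
{
	struct group *g = atomic_load_explicit(&t->head, memory_order_acquire);

	while (g && g->key != key)
		g = g->next;
	return g;
}

/* Look up, then lock, look up again, and create only if still missing. */
static struct group *table_lookup_create(struct table *t, int key)
{
	struct group *g = table_lookup(t, key);

	if (g)
		return g;	/* fast path: no lock taken */

	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;		/* spin until the lock is ours */

	g = table_lookup(t, key);	/* someone may have created it meanwhile */
	if (!g) {
		g = malloc(sizeof(*g));
		if (g) {
			g->key = key;
			g->next = atomic_load_explicit(&t->head, memory_order_relaxed);
			atomic_store_explicit(&t->head, g, memory_order_release);
			t->creations++;
		}
	}

	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return g;	/* NULL only if allocation failed */
}
```

Putting the whole sequence behind one function means call sites no longer repeat the double lookup themselves, which is the motivation the commit message gives.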

blkcg: fix ref count issue with bio_blkcg using task_css · 27e6fa99

Committed by Dennis Zhou (Facebook) on Sep 11, 2018

The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Sep, 2018 1 commit

Blk-throttle: update to use rbtree with leftmost node cached · 9ff01255

Committed by Liu Bo on Aug 21, 2018

As rbtree has native support of caching leftmost node,
i.e. rb_root_cached, no need to do the caching by ourselves.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

20 Sep, 2018 1 commit

block: use bio_add_page in bio_iov_iter_get_pages · 576ed913

Committed by Christoph Hellwig on Sep 20, 2018

Replace a nasty hack with a different nasty hack to prepare for multipage
bio_vecs.  By moving the temporary page array as far up as possible in
the space allocated for the bio_vec array we can iterate forward over it
and thus use bio_add_page.  Using bio_add_page means we'll be able to
merge physically contiguous pages once support for multipage bio_vecs is
merged.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

15 Sep, 2018 3 commits

block, bfq: do not plug I/O if all queues are weight-raised · c8765de0

Committed by Paolo Valente on Sep 14, 2018

To reduce latency for interactive and soft real-time applications, bfq
privileges the bfq_queues containing the I/O of these
applications. These privileged queues, referred to as weight-raised
queues, get a much higher share of the device throughput
w.r.t. non-privileged queues. To preserve this higher share, the I/O
of any non-weight-raised queue must be plugged whenever a sync
weight-raised queue, while being served, remains temporarily empty. To
attain this goal, bfq simply plugs any I/O (from any queue), if a sync
weight-raised queue remains empty while in service.

Unfortunately, this plugging typically lowers throughput with random
I/O, on devices with internal queueing (because it reduces the filling
level of the internal queues of the device).

This commit addresses this issue by restricting the cases where
plugging is performed: if a sync weight-raised queue remains empty
while in service, then I/O plugging is performed only if some of the
active bfq_queues are *not* weight-raised (which is actually the only
circumstance where plugging is needed to preserve the higher share of
the throughput of weight-raised queues). This restriction proved able
to boost throughput in many use cases that need only maximum
throughput.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

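The restricted plugging rule can be written down as a small predicate. This is only a model of the rule as stated in the commit message, not BFQ's actual decision function, which takes further conditions into account; the struct and field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scheduler-wide counters. */
struct demo_sched {
	unsigned int busy_queues;	/* all active queues */
	unsigned int wr_busy_queues;	/* active queues that are weight-raised */
};

/* Hypothetical state of the queue currently in service. */
struct demo_queue {
	bool sync;
	bool weight_raised;
	bool empty;
};

/*
 * Plug I/O only when a sync weight-raised queue in service has run dry
 * AND at least one active queue is not weight-raised. If every active
 * queue is weight-raised, there is no privileged share to protect.
 */
static bool demo_should_plug(const struct demo_sched *s,
			     const struct demo_queue *in_service)
{
	if (!in_service->sync || !in_service->weight_raised || !in_service->empty)
		return false;
	return s->wr_busy_queues < s->busy_queues;
}
```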

block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash · d0edc247

Committed by Paolo Valente on Sep 14, 2018

The Achilles' heel of BFQ is its failing to reach a high throughput
with sync random I/O on flash storage with internal queueing, in case
the processes doing I/O have differentiated weights.

The cause of this failure is as follows. If at least two processes do
sync I/O, and have a different weight from each other, then BFQ plugs
I/O dispatching every time one of these processes, while it is being
served, remains temporarily without pending I/O requests. This
plugging is necessary to guarantee that every process enjoys a
bandwidth proportional to its weight; but it empties the internal
queue(s) of the drive. And this kills throughput with random I/O. So,
if some processes have differentiated weights and do both sync and
random I/O, the end result is a throughput collapse.

This commit tries to counter this problem by injecting the service of
other processes, in a controlled way, while the process in service
happens to have no I/O. This injection is performed only if the medium
is non-rotational and performs internal queueing, and the process in
service does random I/O (service injection might be beneficial for
sequential I/O too, we'll work on that).

As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
SSD, and with five processes having differentiated weights and doing
sync random 4KB I/O, this commit makes the throughput with bfq grow
fourfold, from 25 to 100MB/s. This higher throughput is 10MB/s lower than
that reached with none. As some less random I/O is added to the mix,
the throughput becomes equal to or higher than that with none.

This commit is a very first attempt to recover throughput without
losing control, and certainly has many limitations. One is, e.g., that
the processes whose service is injected are not chosen so as to
distribute the extra bandwidth they receive in accordance with their
weights. Thus there might be loss of weighted fairness in some
cases. Anyway, this loss concerns extra service, which would not have
been received at all without this commit. Other limitations and issues
will probably show up with usage.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block, bfq: correctly charge and reset entity service in all cases · cbeb869a

Committed by Paolo Valente on Sep 14, 2018

BFQ schedules entities (which represent either per-process queues or
groups of queues) as a function of their timestamps. In particular, as
a function of their (virtual) finish times. The finish time of an
entity is computed as a function of the budget assigned to the entity,
assuming, tentatively, that the entity, once in service, will receive
an amount of service equal to its budget. Then, when the entity is
expired because it has finished being served, this finish time is updated
as a function of the actual service received by the entity. This
allows the entity to be correctly charged with only the service
received, and then to be correctly re-scheduled.

Yet an entity may receive service also while not being the entity in
service (in the scheduling environment of its parent entity), for
several reasons. If the entity remains with no backlog while receiving
this 'unofficial' service, then it is expired. Also on such an
expiration, the finish time of the entity should be updated to account
for only the service actually received by the entity. Unfortunately,
such an update is not performed for an entity expiring without being
the entity in service.

In a similar vein, the service counter of the entity in service is
reset when the entity is expired, to be ready to be used for next
service cycle. This reset too should be performed also in case an
entity is expired because it remains empty after receiving service
while not being the entity in service. But in this case the reset is
not performed.

This commit performs the above update of the finish time and reset of
the service received, also for an entity expiring while not being the
entity in service.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

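The charging rule described above can be illustrated with a toy model: the finish time is first computed from the budget, then recomputed from the service actually received when the entity expires, and the service counter is reset. The names, the fixed-point shift of 10, and the weighted-service formula are assumptions for this sketch and are not taken from the BFQ sources.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SERVICE_SHIFT 10	/* hypothetical fixed-point scaling */

/* Hypothetical scheduling entity with virtual start and finish times. */
struct demo_entity {
	uint64_t start;
	uint64_t finish;
	uint32_t weight;	/* assumed to be non-zero */
	uint32_t budget;	/* service the entity is expected to receive */
	uint32_t service;	/* service actually received so far */
};

/* Virtual time consumed by a given amount of service at a given weight. */
static uint64_t demo_delta(uint32_t service, uint32_t weight)
{
	return ((uint64_t)service << DEMO_SERVICE_SHIFT) / weight;
}

/* On activation: tentatively assume the whole budget will be used. */
static void demo_schedule(struct demo_entity *e)
{
	e->finish = e->start + demo_delta(e->budget, e->weight);
}

/*
 * On expiration, whether or not the entity was the one in service:
 * charge only the service received, then reset the counter.
 */
static void demo_expire(struct demo_entity *e)
{
	e->finish = e->start + demo_delta(e->service, e->weight);
	e->service = 0;
}
```

Calling `demo_expire` on every expiration path is the point of the fix: an entity that expires outside of service otherwise keeps a finish time based on its full budget and a stale service counter.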

14 Sep, 2018 1 commit

blk-iolatency: remove set but not used variables 'changed' and 'blkiolat' · f8c0d7b1

Committed by YueHaibing on Sep 14, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

block/blk-iolatency.c: In function 'scale_change':
block/blk-iolatency.c:301:7: warning:
 variable 'changed' set but not used [-Wunused-but-set-variable]

block/blk-iolatency.c: In function 'iolatency_set_limit':
block/blk-iolatency.c:765:24: warning:
 variable 'blkiolat' set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

07 Sep, 2018 2 commits

block: remove bio_rewind_iter() · 7759eb23

Committed by Ming Lei on Sep 05, 2018

It has been pointed out that bio_rewind_iter() is a very bad API [1]:

1) bio size may not be restored after rewinding

2) it causes some bogus change, such as 5151842b (block: reset
bi_iter.bi_done after splitting bio)

3) rewinding really makes things complicated wrt. bio splitting

4) unnecessary updating of .bi_done in fast path

[1] https://marc.info/?t=153549924200005&r=1&w=2

So this patch takes Kent's suggestion to restore a bio to its original
state by saving the bio iterator (struct bvec_iter) in bio_integrity_prep(),
given that bio_rewind_iter() is now only used by the bio integrity code.

Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Hannes Reinecke <hare@suse.com>
Suggested-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

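Saving the iterator and copying it back is simpler than undoing each advance step. The sketch below shows that approach on a hypothetical `struct demo_iter`; its fields and the 512-byte sector size are assumptions for illustration and do not reproduce `struct bvec_iter` exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical iterator: current sector and bytes still to process. */
struct demo_iter {
	uint64_t sector;
	uint32_t size;
};

/* Consume up to `bytes` bytes; the amount is clamped to what is left. */
static void demo_iter_advance(struct demo_iter *it, uint32_t bytes)
{
	if (bytes > it->size)
		bytes = it->size;
	it->sector += bytes >> 9;	/* 512-byte sectors assumed */
	it->size -= bytes;
}

/*
 * Do some work that advances the iterator, then put it back exactly as
 * it was by restoring the saved copy instead of "rewinding".
 */
static void demo_process_and_restore(struct demo_iter *it, uint32_t bytes)
{
	struct demo_iter saved = *it;

	demo_iter_advance(it, bytes);
	/* ... work that needed the advanced position would go here ... */
	*it = saved;
}
```

Because the restore is a plain struct copy, it cannot get the size wrong after clamping, which is one of the problems listed for the rewind approach.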

block: bfq: swap puts in bfqg_and_blkg_put · d5274b3c

Committed by Konstantin Khlebnikov on Sep 06, 2018

Fix trivial use-after-free. This could be the last reference to bfqg.

Fixes: 8f9bebc3 ("block, bfq: access and cache blkg data only when safe")
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Sep, 2018 1 commit

block: don't warn when doing fsync on read-only devices · 8b2ded1c

Committed by Mikulas Patocka on Sep 05, 2018

It is possible to call fsync on a read-only handle (for example, fsck.ext2
does it when doing a read-only check), and this call results in a kernel
warning.

The patch b089cfd9 ("block: don't warn for flush on read-only device")
attempted to disable the warning, but it is buggy and does not do so
(op_is_flush tests flags, but bio_op strips off the flags).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions")
Cc: stable@vger.kernel.org	# 4.18
Signed-off-by: Jens Axboe <axboe@kernel.dk>

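The bug described in parentheses is easy to reproduce in miniature: a flag test applied to a value whose flags have already been masked off can never succeed. The bit layout and the `DEMO_*` names below are illustrative assumptions and are not copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout: the low 8 bits hold the operation, higher bits hold flags. */
#define DEMO_OP_BITS	8
#define DEMO_OP_MASK	((1u << DEMO_OP_BITS) - 1)
#define DEMO_OP_WRITE	1u
#define DEMO_FUA	(1u << (DEMO_OP_BITS + 0))
#define DEMO_PREFLUSH	(1u << (DEMO_OP_BITS + 1))

/* Same idea as bio_op(): keep the operation, drop all flags. */
static unsigned int demo_op(unsigned int opf)
{
	return opf & DEMO_OP_MASK;
}

/* Same idea as op_is_flush(): looks only at flag bits. */
static bool demo_is_flush(unsigned int opf)
{
	return (opf & (DEMO_FUA | DEMO_PREFLUSH)) != 0;
}
```

For a write with the preflush flag set, `demo_is_flush(opf)` is true while `demo_is_flush(demo_op(opf))` is always false, so a check written the second way never fires.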

01 Sep, 2018 3 commits

blkcg: use tryget logic when associating a blkg with a bio · 31118850

Committed by Dennis Zhou (Facebook) on Aug 31, 2018

There is a very small chance a bio gets caught up in a really
unfortunate race between a task migration, cgroup exiting, and itself
trying to associate with a blkg. This is due to css offlining being
performed after the css->refcnt is killed which triggers removal of
blkgs that reach their blkg->refcnt of 0.

To avoid this, association with a blkg should use tryget and fallback to
using the root_blkg.

Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: delay blkg destruction until after writeback has finished · 59b57717

Committed by Dennis Zhou (Facebook) on Aug 31, 2018

Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkcg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

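The new cleanup order hinges on a counter that starts with one base reference and gains one reference per writeback structure. A minimal single-threaded model of that counter might look as follows; `struct demo_blkcg` and the `demo_*` functions are hypothetical, and a boolean stands in for the call to blkcg_destroy_blkgs().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the part of a blkcg that matters here. */
struct demo_blkcg {
	unsigned int cgwb_refcnt;	/* base reference + one per cgwb */
	bool blkgs_destroyed;
};

static void demo_blkcg_init(struct demo_blkcg *c)
{
	c->cgwb_refcnt = 1;		/* the base reference */
	c->blkgs_destroyed = false;
}

/* A writeback structure starts using the blkcg. */
static void demo_cgwb_get(struct demo_blkcg *c)
{
	c->cgwb_refcnt++;
}

/* Dropping the last reference is what triggers blkg destruction. */
static void demo_cgwb_put(struct demo_blkcg *c)
{
	if (--c->cgwb_refcnt == 0)
		c->blkgs_destroyed = true;	/* stands in for blkcg_destroy_blkgs() */
}

/* Offlining only puts back the base reference; it destroys nothing itself. */
static void demo_css_offline(struct demo_blkcg *c)
{
	demo_cgwb_put(c);
}
```

With outstanding writeback, offlining leaves the blkgs in place until the last `demo_cgwb_put`; with none, the base reference is the last one and destruction follows immediately.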

Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462

Committed by Dennis Zhou (Facebook) on Aug 31, 2018

This reverts commit 4c699480.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c699480) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Aug, 2018 5 commits

block: bsg: move atomic_t ref_count variable to refcount API · db193954

Committed by John Pittman on Aug 27, 2018

Currently, variable ref_count within the bsg_device struct is of
type atomic_t.  For variables being used as reference counters,
the refcount API should be used instead of atomic.  The newer
refcount API works to prevent counter overflows and use-after-free
bugs.  So, move this variable from the atomic API to refcount,
potentially avoiding the issues mentioned.
Signed-off-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

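To show what a dedicated reference-count type adds over a bare atomic counter, here is a C11 sketch of two guard rails: an increment that refuses to revive a count of zero, and saturation at the maximum instead of wrapping around. The functions `ref_inc_not_zero` and `ref_dec_and_test` are written for this example and only approximate the behavior the commit message attributes to the refcount API.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference unless the object is already released (count 0).
 * A saturated counter stays pinned at UINT_MAX instead of overflowing.
 */
static bool ref_inc_not_zero(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	do {
		if (old == 0)
			return false;	/* taking a ref now would be use-after-free */
		if (old == UINT_MAX)
			return true;	/* saturated: leak rather than wrap */
	} while (!atomic_compare_exchange_weak(r, &old, old + 1));
	return true;
}

/* Drop a reference; returns true only for the caller that released the last one. */
static bool ref_dec_and_test(atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;	/* underflow or saturated: never free */
	} while (!atomic_compare_exchange_weak(r, &old, old - 1));
	return old == 1;
}
```

A plain atomic increment has neither check, so an overflowed or revived counter can lead to an object being freed while still in use.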

block: remove unnecessary condition check · 62d2a194

Committed by Chengguang Xu on Aug 28, 2018

kmem_cache_destroy() can handle NULL pointer correctly, so there is
no need to check e->icq_cache before calling kmem_cache_destroy().
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-wbt: remove dead code · b0a84beb

Committed by Jens Axboe on Aug 27, 2018

We already note and mark discard and swap IO from bio_to_wbt_flags().
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-wbt: improve waking of tasks · 38cfb5a4

Committed by Jens Axboe on Aug 26, 2018

We have two potential issues:

1) After commit 2887e41b, we only wake one process at a time when
   we finish an IO. We really want to wake up as many tasks as can
   queue IO. Before this commit, we woke up everyone, which could cause
   a thundering herd issue.

2) A task can potentially consume two wakeups, causing us to (in
   practice) miss a wakeup.

Fix both by providing our own wakeup function, which stops
__wake_up_common() from waking up more tasks if we fail to get a
queueing token. With the strict ordering we have on the wait list, this
wakes the right tasks and the right number of tasks.

Based on a patch from Jianchao Wang <jianchao.w.wang@oracle.com>.
Tested-by: Agarwal, Anchal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

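The wakeup policy described above, wake in order and stop as soon as a waiter cannot get a queueing token, can be modelled without any scheduler machinery. In this sketch an array stands in for the ordered wait list; `struct demo_waiter` and `demo_wake` are hypothetical and do not correspond to kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter record; the array order is the FIFO wait order. */
struct demo_waiter {
	bool woken;
};

/*
 * Wake waiters in order, handing each one a queueing token. Stop at the
 * first waiter that cannot get a token, so nobody is woken only to go
 * back to sleep. Returns the number of waiters woken by this call.
 */
static size_t demo_wake(struct demo_waiter *w, size_t n,
			unsigned int *inflight, unsigned int limit)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (w[i].woken)
			continue;	/* already holds its token */
		if (*inflight >= limit)
			break;		/* no token left: stop the walk */
		(*inflight)++;
		w[i].woken = true;
		woken++;
	}
	return woken;
}
```

This sits between the two extremes the commit message mentions: waking a single task per completion, and waking everyone at once.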

blk-wbt: abstract out end IO completion handler · 061a5427

Committed by Jens Axboe on Aug 26, 2018

Prep patch for calling the handler from a different context,
no functional changes in this patch.
Tested-by: Agarwal, Anchal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

23 Aug, 2018 4 commits

blk-wbt: don't maintain inflight counts if disabled · c125311d

Committed by Jens Axboe on Aug 23, 2018

A previous commit removed the ability to have per-rq flags. We used
those flags to maintain inflight counts. Since we don't have those
anymore, we have to always maintain inflight counts, even if wbt is
disabled. This is clearly suboptimal.

Add a queue quiesce around changing the wbt latency settings from sysfs
to work around this. With that, we can reliably put the enabled check in
our bio_to_wbt_flags(), since we know the WBT_TRACKED flag will be
consistent for the lifetime of the request.

Fixes: c1c80384 ("block: remove external dependency on wbt_flags")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-wbt: fix has-sleeper queueing check · c45e6a03

Committed by Jens Axboe on Aug 20, 2018

We need to do this inside the loop as well, or we can allow new
IO to supersede previous IO.
Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-wbt: use wq_has_sleeper() for wq active check · b7882093

Committed by Jens Axboe on Aug 20, 2018

We need the memory barrier before checking the list head,
use the appropriate helper for this. The matching queue
side memory barrier is provided by set_current_state().
Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-wbt: move disable check into get_limit() · ffa358dc

Committed by Jens Axboe on Aug 20, 2018

Check it in one place, instead of in multiple places.
Tested-by: Anchal Agarwal <anchalag@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Aug, 2018 2 commits

blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter · f5bbbbe4

Committed by Jianchao Wang on Aug 21, 2018

For blk-mq, part_in_flight/rw invokes blk_mq_in_flight/rw to
account for the in-flight requests. It accesses queue_hw_ctx and
nr_hw_queues without any protection. When an update of nr_hw_queues
and blk_mq_in_flight/rw run concurrently, a panic occurs.

Before nr_hw_queues is updated, the queue is frozen, so we can use
q_usage_counter to avoid the race. percpu_ref_is_zero is used here
so that we do not miss any in-flight request. The accesses to
nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
inside an RCU critical section, so __blk_mq_update_nr_hw_queues can
use synchronize_rcu to ensure that the zeroed q_usage_counter is
globally visible.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: init hctx sched after update ctx and hctx mapping · d48ece20

Committed by Jianchao Wang on Aug 21, 2018

Currently, when nr_hw_queues is updated, the IO scheduler's init_hctx
is invoked before the mapping between ctx and hctx has been adapted
correctly by blk_mq_map_swqueue. The IO scheduler's init_hctx (kyber)
may depend on this mapping, get a wrong result, and finally panic.
A simple way to fix this is to switch the IO scheduler to 'none'
before updating nr_hw_queues, and then switch it back after the
update. blk_mq_sched_init_/exit_hctx are removed because nobody
uses them any more.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel synced successfully nearly 2 years ago
