Commits · a8a275c9c2fb6bc9b45ad3e4187469726e2af7d1 · openanolis / cloud-kernel

01 Jun, 2018 2 commits

block: unexport elevator_init/exit · a8a275c9

Authored by Christoph Hellwig on May 31, 2018

These are only used by the block core.  Also move the declarations to
block/blk.h.
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a8a275c9

block: move initialization of elevator-related fields to blk_alloc_queue_node · cbf62af3

Authored by Christoph Hellwig on May 31, 2018

No point in doing this in elevator_init.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Damien Le Moal <Damien.LeMoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cbf62af3

31 May, 2018 26 commits

block, bfq: prevent soft_rt_next_start from being stuck at infinity · f6c3ca0e

Authored by Davide Sapienza on May 31, 2018

BFQ can deem a bfq_queue as soft real-time only if the queue
- periodically becomes completely idle, i.e., empty and with
  no still-outstanding I/O request;
- after becoming idle, gets new I/O only after a special reference
  time soft_rt_next_start.

In this respect, after commit "block, bfq: consider also past I/O in
soft real-time detection", the value of soft_rt_next_start can never
decrease. This causes a problem with the following special updating
case for soft_rt_next_start: to prevent queues that are not completely
idle from being wrongly detected as soft real-time (when they become
non-empty again), soft_rt_next_start is temporarily set to infinity
for empty queues with still outstanding I/O requests. But, if such an
update is actually performed, then, because of the above commit,
soft_rt_next_start will be stuck at infinity forever, and the queue
will have no more chance to be considered soft real-time.

On slow systems, this problem does cause actual soft real-time
applications to be occasionally not detected as such.

This commit addresses this issue by eliminating the pushing of
soft_rt_next_start to infinity, and by changing the way non-empty
queues are prevented from being wrongly detected as soft
real-time. Simply, a queue that becomes non-empty again can now be
detected as soft real-time only if it has no outstanding I/O request.
Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6c3ca0e
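
The rule described in this entry can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not BFQ's code: `toy_queue`, its fields, and `toy_is_soft_rt` are hypothetical names, and any further conditions that BFQ applies are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bfq_queue. */
struct toy_queue {
	unsigned int outstanding;    /* requests dispatched but not yet completed */
	uint64_t soft_rt_next_start; /* reference time; never pushed to infinity */
};

/*
 * Called when a queue becomes non-empty again at time 'now'.
 * The queue may be treated as soft real-time only if it was completely
 * idle (no outstanding I/O) and the new I/O arrives no earlier than the
 * reference time.
 */
static bool toy_is_soft_rt(const struct toy_queue *q, uint64_t now)
{
	assert(q != NULL);
	return q->outstanding == 0 && now >= q->soft_rt_next_start;
}
```

Because the outstanding-I/O test is made at detection time, the reference time itself never has to be set to a sentinel value that it could not later leave.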

block, bfq: increase weight-raising duration for interactive apps · d450542e

Authored by Davide Sapienza on May 31, 2018

The maximum possible duration of the weight-raising period for
interactive applications is limited to 13 seconds, as this is the time
needed to load the largest application that we considered when tuning
weight raising. Unfortunately, in such an evaluation, we did not
consider the case of very slow virtual machines.

For example, on a QEMU/KVM virtual machine
- running on a slow PC;
- with a virtual disk stacked on a slow low-end 5400rpm HDD;
- serving a heavy I/O workload, such as the sequential reading of
  several files;
mplayer takes 23 seconds to start, if constantly weight-raised.

To address this issue, this commit conservatively sets the upper limit
for weight-raising duration to 25 seconds.
Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d450542e

block, bfq: remove slow-system class · e24f1c24

Authored by Paolo Valente on May 31, 2018

BFQ computes the duration of weight raising for interactive
applications automatically, using some reference parameters. In
particular, BFQ uses the best durations (see comments in the code for
how these durations have been assessed) for two classes of systems:
slow and fast ones. Examples of slow systems are old phones or systems
using micro HDDs. Fast systems are all the remaining ones. Using these
parameters, BFQ computes the actual duration of the weight raising,
for the system at hand, as a function of the relative speed of the
system w.r.t. the speed of a reference system, belonging to the same
class of systems as the system at hand.

This slow vs fast differentiation proved to be useful in the past, but
has little meaning with current hardware. Even worse, it causes
problems in virtual systems, where the speed of the system can vary
frequently, and so widely as to confuse the class-detection mechanism
and, as we have verified experimentally, to cause BFQ to compute
nonsensical weight-raising durations.

This commit addresses this issue by removing the slow class and the
class-detection mechanism.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e24f1c24
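
As a rough illustration of computing a duration from a single reference system, the sketch below scales a reference duration by the relative speed of the system at hand and clamps the result. It is a sketch under stated assumptions: the function name, the parameters, and the inverse-proportional formula are hypothetical, and only the 25-second upper limit is taken from the weight-raising duration entry in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Upper limit for the weight-raising duration, in milliseconds. */
#define TOY_WR_MAX_MS 25000ULL

/*
 * Hypothetical helper: a system that is slower than the reference
 * (peak_rate < ref_rate) gets a proportionally longer duration,
 * up to the upper limit.
 */
static uint64_t toy_wr_duration_ms(uint64_t ref_duration_ms,
				   uint64_t ref_rate, uint64_t peak_rate)
{
	uint64_t dur;

	assert(peak_rate > 0);
	dur = ref_duration_ms * ref_rate / peak_rate;
	return dur > TOY_WR_MAX_MS ? TOY_WR_MAX_MS : dur;
}
```

With one reference system there is no class to detect, so a fluctuating speed estimate can only move the result along one curve rather than switch between two.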

block, bfq: add description of weight-raising heuristics · 4029eef1

Authored by Paolo Valente on May 31, 2018

A description of how weight raising works is missing in BFQ
sources. In addition, the code for handling weight raising is
scattered across a few functions. This makes it rather hard to
understand the mechanism and its rationale. This commit adds such a
description at the beginning of the main source file.
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4029eef1

block, bfq: remove the removal of 'next' rq in bfq_requests_merged · ac857e0d

Authored by Filippo Muzzini on May 31, 2018

Since bfq_finish_request() is always called on the request 'next',
after bfq_requests_merged() is finished, and bfq_finish_request()
removes 'next' from its bfq_queue if needed, it isn't necessary to do
such a removal in advance in bfq_requests_merged().

This commit removes such a useless 'next' removal.
Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ac857e0d

block, bfq: remove wrong check in bfq_requests_merged · 8abfa4d6

Authored by Paolo Valente on May 31, 2018

The request rq passed to the function bfq_requests_merged is always in
a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the
beginning of bfq_requests_merged always succeeds, and the control
flow systematically skips to the end of the function.  This implies
that the body of the function is never executed, i.e., the
repositioning of rq is never performed.

Conversely, a check is missing in the body of the function:
'next' must be removed only if it is inside a bfq_queue.

This commit removes the wrong check on rq, and adds the missing check
on 'next'. In addition, this commit adds comments on
bfq_requests_merged.
Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8abfa4d6

block, bfq: remove wrong lock in bfq_requests_merged · a12bffeb

Authored by Filippo Muzzini on May 31, 2018

In bfq_requests_merged(), there is a deadlock because
bfqq->bfqd->lock is held by the calling function, but the code of
this function tries to grab the lock again.

This deadlock is currently hidden by another bug (fixed by the next
commit for this source file), which causes the body of
bfq_requests_merged() never to be executed.

This commit removes the deadlock by removing the lock/unlock pair.
Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a12bffeb
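
The self-deadlock pattern described in this entry can be modelled with a toy non-recursive lock. This is only a sketch: `toy_lock` and the `toy_merged_*` functions are hypothetical, and the toy lock reports failure at the point where a real non-recursive lock would wait forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock: the holder cannot take it twice. */
struct toy_lock {
	bool held;
};

/* Returns false where a real non-recursive lock would never return. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	assert(l != NULL);
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

/* Before the fix: the callee re-takes a lock its caller already holds. */
static bool toy_merged_buggy(struct toy_lock *l)
{
	if (!toy_lock_acquire(l))
		return false; /* would be a deadlock in real code */
	/* ... body of the function ... */
	toy_lock_release(l);
	return true;
}

/* After the fix: no lock/unlock pair; the caller's lock is relied upon. */
static bool toy_merged_fixed(struct toy_lock *l)
{
	assert(l->held); /* the caller must already hold the lock */
	/* ... body of the function ... */
	return true;
}
```

Dropping the inner lock/unlock pair is safe only because every caller is known to hold the lock, which the assertion in `toy_merged_fixed` makes explicit in this model.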

block: fixup bioset_integrity_create() call · 04c4950d

Authored by Jens Axboe on May 30, 2018

Missed converting the bioset_integrity_create() bounce bio set
call.

Fixes: 338aa96d ("block: convert bounce, q->bio_split to bioset_init()/mempool_init()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04c4950d

block: Drop bioset_create() · dad08527

Authored by Kent Overstreet on May 20, 2018

All users have been converted to bioset_init(), kill off the
old API.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dad08527
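
The conversion series in this log replaces a create-style API with an init-style one that works on embedded objects. The general shape of that change can be sketched as follows. All names here (`toy_pool`, `toy_pool_create`, `toy_pool_init`, `toy_driver`) are hypothetical and do not reproduce the signatures of bioset_create() or bioset_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_pool {
	size_t reserved;
	int ready;
};

/* Old style: the pool is heap-allocated and handed back by pointer. */
static struct toy_pool *toy_pool_create(size_t reserved)
{
	struct toy_pool *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->reserved = reserved;
	p->ready = 1;
	return p;
}

static void toy_pool_free(struct toy_pool *p)
{
	free(p);
}

/* New style: the caller supplies the storage, usually embedded in a struct. */
static int toy_pool_init(struct toy_pool *p, size_t reserved)
{
	assert(p != NULL);
	p->reserved = reserved;
	p->ready = 1;
	return 0;
}

static void toy_pool_exit(struct toy_pool *p)
{
	p->ready = 0;
}

/* The owner embeds the pool instead of holding a pointer to it. */
struct toy_driver {
	struct toy_pool pool;
};
```

In this model the embedded form saves one allocation and one pointer dereference per owner, and the pool's lifetime is tied to the structure that contains it.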

xfs: convert to bioset_init()/mempool_init() · e292d7bc

Authored by Kent Overstreet on May 20, 2018

Convert XFS to embedded bio sets.
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e292d7bc

btrfs: convert to bioset_init()/mempool_init() · 8ac9f7c1

Authored by Kent Overstreet on May 20, 2018

Convert btrfs to embedded bio sets.
Acked-by: Chris Mason <clm@fb.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8ac9f7c1

fs: convert block_dev.c to bioset_init() · 52190f8a

Authored by Kent Overstreet on May 20, 2018

Convert block DIO code to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

52190f8a

target: convert to bioset_init()/mempool_init() · a47a28b7

Authored by Kent Overstreet on May 20, 2018

Convert the target code to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a47a28b7

dm: convert to bioset_init()/mempool_init() · 6f1c819c

Authored by Kent Overstreet on May 20, 2018

Convert dm to embedded bio sets.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6f1c819c

md: convert to bioset_init()/mempool_init() · afeee514

Authored by Kent Overstreet on May 20, 2018

Convert md to embedded bio sets.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

afeee514

bcache: convert to bioset_init()/mempool_init() · d19936a2

Authored by Kent Overstreet on May 20, 2018

Convert bcache to embedded bio sets.
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d19936a2

lightnvm: convert to bioset_init()/mempool_init() · b906bbb6

Authored by Kent Overstreet on May 20, 2018

Convert lightnvm to embedded bio sets.
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b906bbb6

pktcdvd: convert to bioset_init()/mempool_init() · 64c4bc4d

Authored by Kent Overstreet on May 20, 2018

Convert pktcdvd to embedded bio sets.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

64c4bc4d

drbd: convert to bioset_init()/mempool_init() · 0892fac8

Authored by Kent Overstreet on May 20, 2018

Convert drbd to embedded bio sets and mempools.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0892fac8

block: convert bounce, q->bio_split to bioset_init()/mempool_init() · 338aa96d

Authored by Kent Overstreet on May 20, 2018

Convert the core block functionality to embedded bio sets.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

338aa96d

blk-throttle: return proper bool type to caller instead of 0/1 · 0b6bad7d

Authored by Chengguang Xu on May 29, 2018

Change to return true/false only for bool type return code.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0b6bad7d

blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter · d250bf4e

Authored by Christoph Hellwig on May 30, 2018

We already check for started commands in all callbacks, but we should
also protect against already completed commands.  Do this by taking
the checks to common code.
Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d250bf4e
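
Moving the state check into the common iterator, so that callbacks no longer need their own checks, can be sketched like this. The enum, the structure, and the function names are hypothetical and stand in for the real request states and iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request states. */
enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_COMPLETE };

struct toy_rq {
	int tag;
	enum toy_rq_state state;
};

typedef void (*toy_iter_fn)(struct toy_rq *rq, void *data);

/*
 * Common code: only in-flight requests reach the callback, which
 * excludes both not-yet-started and already-completed requests.
 */
static void toy_busy_iter(struct toy_rq *rqs, size_t n,
			  toy_iter_fn fn, void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++)
		if (rqs[i].state == TOY_RQ_IN_FLIGHT)
			fn(&rqs[i], data);
}

/* Example callback: it needs no state check of its own. */
static void toy_count(struct toy_rq *rq, void *data)
{
	(void)rq;
	(*(int *)data)++;
}
```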

nbd: clear DISCONNECT_REQUESTED flag once disconnection occurs. · 5e3c3a7e

Authored by Kevin Vigor on May 30, 2018

When a userspace client requests an NBD device be disconnected, the
DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
will not inform userspace when a connection is closed.

Unfortunately the flag was never cleared, so once a disconnect was
requested the driver would thereafter never tell userspace about a
closed connection. Thus when connections failed due to timeout, no
attempt to reconnect was made and eventually the device would fail.

Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
DISCONNECTED flag) once all connections are closed.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5e3c3a7e
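
The flag handling described in this entry can be modelled with two bits and a connection counter. This is a sketch with hypothetical names (`toy_nbd`, `TOY_DISCONNECT_REQUESTED`, `TOY_DISCONNECTED`); it also assumes the flags change only when a disconnect had actually been requested.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DISCONNECT_REQUESTED (1u << 0)
#define TOY_DISCONNECTED         (1u << 1)

/* Hypothetical, simplified device state. */
struct toy_nbd {
	unsigned int flags;
	int live_connections;
};

/*
 * Called each time one connection closes. Once the last connection is
 * gone, a pending disconnect request is cleared and the device is
 * marked as disconnected, so later closures are reported again.
 */
static void toy_connection_closed(struct toy_nbd *nbd)
{
	assert(nbd != NULL && nbd->live_connections > 0);
	if (--nbd->live_connections == 0 &&
	    (nbd->flags & TOY_DISCONNECT_REQUESTED)) {
		nbd->flags &= ~TOY_DISCONNECT_REQUESTED;
		nbd->flags |= TOY_DISCONNECTED;
	}
}
```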

blk-throttle: fix potential NULL pointer dereference in throtl_select_dispatch · 2ab74cd2

Authored by Liu Bo on May 29, 2018

tg in throtl_select_dispatch is dereferenced before it is checked.
Since tg may be NULL, this is a potential NULL pointer dereference.
So fix it.
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2ab74cd2
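
The general "check, then dereference" ordering can be shown on a toy dispatch loop. The names below are hypothetical and the loop is not the body of throtl_select_dispatch; it only illustrates the ordering.

```c
#include <assert.h>
#include <stddef.h>

struct toy_group {
	unsigned int nr_queued;
};

/* Hypothetical lookup: returns NULL when no group is left. */
static struct toy_group *toy_first_group(struct toy_group **groups,
					 size_t n, size_t *pos)
{
	return *pos < n ? groups[(*pos)++] : NULL;
}

static unsigned int toy_select_dispatch(struct toy_group **groups, size_t n)
{
	unsigned int nr_disp = 0;
	size_t pos = 0;

	assert(groups != NULL || n == 0);
	for (;;) {
		struct toy_group *tg = toy_first_group(groups, n, &pos);

		if (!tg)                  /* check first ... */
			break;
		nr_disp += tg->nr_queued; /* ... dereference afterwards */
	}
	return nr_disp;
}
```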

block: kyber: make kyber more friendly with merging · a6088845

Authored by Jianchao Wang on May 30, 2018

Currently, kyber is very unfriendly to merging. kyber depends on the
ctx rq_list to do merging; however, most of the time it does not
leave any requests in the ctx rq_list. This is because even if the
tokens of one domain are used up, kyber will try to dispatch requests
from another domain and flush the rq_list there.

To improve this, we set up kyber_ctx_queue (kcq), which is similar
to ctx but has an rq_list for each domain, and build the same
mapping between kcq and khd as between ctx and hctx. Then we can
merge, insert and dispatch for different domains separately. At the
same time, we only flush the rq_list of a kcq when we get a domain
token successfully. Then, if the tokens of one domain are used up,
the requests can be left in the rq_list of that domain and may be
merged with following I/O.

The following is my test result on a machine with 8 cores and an NVMe
card, INTEL SSDPEKKR128G7:

fio size=256m ioengine=libaio iodepth=64 direct=1 numjobs=8
(each cell is seq/random)
+--------+----------+-----------+------------+-----------------+---------+
| patch? | bw(MB/s) |   iops    | slat(usec) |   clat(usec)    |  merge  |
+--------+----------+-----------+------------+-----------------+---------+
| w/o    |  606/612 | 151k/153k |  6.89/7.03 | 3349.21/3305.40 |   0/0   |
+--------+----------+-----------+------------+-----------------+---------+
| w/     | 1083/616 | 277k/154k |  4.93/6.95 | 1830.62/3279.95 | 223k/3k |
+--------+----------+-----------+------------+-----------------+---------+
When numjobs is set to 16, the bw and iops can reach 1662MB/s and 425k
on my platform.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a6088845
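
The per-domain queueing idea in this entry can be sketched with counters in place of request lists. Everything here is hypothetical (`toy_kcq`, `toy_sched`, the domain names, and one token per flush); the point is only that a domain without tokens keeps its requests queued, where they remain candidates for merging.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scheduling domains. */
enum { TOY_READ, TOY_WRITE, TOY_OTHER, TOY_NUM_DOMAINS };

/* Stand-in for a per-context queue with one list per domain. */
struct toy_kcq {
	unsigned int pending[TOY_NUM_DOMAINS];
};

struct toy_sched {
	struct toy_kcq kcq;
	unsigned int tokens[TOY_NUM_DOMAINS];
};

static void toy_insert(struct toy_sched *s, int domain)
{
	assert(s != NULL && domain >= 0 && domain < TOY_NUM_DOMAINS);
	s->kcq.pending[domain]++;
}

/*
 * Flush a domain's list only if a token for that domain is available.
 * Returns the number of requests moved to dispatch.
 */
static unsigned int toy_dispatch(struct toy_sched *s, int domain)
{
	unsigned int flushed;

	assert(s != NULL && domain >= 0 && domain < TOY_NUM_DOMAINS);
	if (s->tokens[domain] == 0)
		return 0; /* requests stay queued and can still be merged */
	s->tokens[domain]--;
	flushed = s->kcq.pending[domain];
	s->kcq.pending[domain] = 0;
	return flushed;
}
```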

blk-mq: abstract out blk-mq-sched rq list iteration bio merge helper · 9c558734

Authored by Jens Axboe on May 30, 2018

No functional changes in this patch, just a prep patch for utilizing
this in an IO scheduler.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Omar Sandoval <osandov@fb.com>

9c558734

30 May, 2018 2 commits

block: remove parent device reference from struct bsg_class_device · 5de815a7

Authored by Christoph Hellwig on May 29, 2018

Bsg holding a reference to the parent device may result in a crash if a
bsg file handle is closed after the parent device driver has unloaded.

Holding a reference is not really needed: the parent device must exist
between bsg_register_queue and bsg_unregister_queue.  Before the device
goes away the caller does blk_cleanup_queue so that all in-flight
requests to the device are gone and all new requests cannot pass beyond
the queue.  The queue itself is a refcounted object and it will stay
alive with a bsg file.

Based on analysis, previous patch and changelog from Anatoliy Glagolev.
Reported-by: Anatoliy Glagolev <glagolig@gmail.com>
Reviewed-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5de815a7

Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/block · b7405176

Authored by Jens Axboe on May 29, 2018

Pull NVMe changes from Christoph:

"Here is the current batch of nvme updates for 4.18, we have a few more
 patches in the queue, but I'd like to get this pile into your tree
 and linux-next ASAP.

 The biggest item is support for file-backed namespaces in the NVMe
 target from Chaitanya; in addition to that we mostly have small fixes from
 all the usual suspects."

* 'nvme-4.18-2' of git://git.infradead.org/nvme:
  nvme: fixup memory leak in nvme_init_identify()
  nvme: fix KASAN warning when parsing host nqn
  nvmet-loop: use nr_phys_segments when map rq to sgl
  nvmet-fc: increase LS buffer count per fc port
  nvmet: add simple file backed ns support
  nvmet: remove duplicate NULL initialization for req->ns
  nvmet: make a few error messages more generic
  nvme-fabrics: allow duplicate connections to the discovery controller
  nvme-fabrics: centralize discovery controller defaults
  nvme-fabrics: remove unnecessary controller subnqn validation
  nvme-fc: remove setting DNR on exception conditions
  nvme-rdma: stop admin queue before freeing it
  nvme-pci: Fix AER reset handling
  nvme-pci: set nvmeq->cq_vector after alloc cq/sq
  nvme: host: core: fix precedence of ternary operator
  nvme: fix lockdep warning in nvme_mpath_clear_current_path

b7405176

29 May, 2018 10 commits

block: don't print a message when the device went away · 5afb7835

Authored by Christoph Hellwig on May 29, 2018

The information about a size change in this case just creates confusion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5afb7835

block: unexport check_disk_size_change · 4163a039

Authored by Christoph Hellwig on May 29, 2018

Only used in block_dev.c and the partitions code, and it should remain
that way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4163a039

block: move ->timeout request member · 0b7576d8

Authored by Jens Axboe on May 29, 2018

After the recent timeout handling changes, we have two holes in
the struct. Move the timeout near the deadline, killing both,
and moving related members closer together. On my config on
x86-64, this shrinks struct request from 312 to 304 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0b7576d8
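
How reordering members removes padding holes can be seen on a toy structure. The two structures below are hypothetical and far smaller than struct request; the sizes in the comments assume a typical 64-bit ABI where a 64-bit integer is 8-byte aligned, and they may differ on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit member on each side of a 64-bit one leaves two holes. */
struct toy_with_holes {
	uint32_t flags;    /* typically followed by 4 bytes of padding */
	uint64_t deadline;
	uint32_t timeout;  /* typically followed by 4 bytes of tail padding */
};

/* Placing the two 32-bit members side by side fills both holes. */
struct toy_repacked {
	uint64_t deadline;
	uint32_t flags;
	uint32_t timeout;
};

/* Reordering never makes this pair larger, whatever the ABI. */
_Static_assert(sizeof(struct toy_repacked) <= sizeof(struct toy_with_holes),
	       "repacked layout must not grow");

/* Bytes saved by the reordering on the current target. */
static size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_with_holes) - sizeof(struct toy_repacked);
}
```

On a real kernel build, a tool such as pahole reports holes of this kind directly from the debug information.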

blk-mq: simplify blk_mq_rq_timed_out · d1210d5a

Authored by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d1210d5a

block: document the blk_eh_timer_return values · 88b0cfad

Authored by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

88b0cfad

block: remove BLK_EH_HANDLED · f6e7d48a

Authored by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6e7d48a

libiscsi: don't try to bypass SCSI EH · adb2b769

Authored by Christoph Hellwig on May 29, 2018

libiscsi is the only SCSI code that returns BLK_EH_HANDLED, thus trying to
bypass the normal SCSI EH code.  We are going to remove this return value
at the block layer, and at least from a quick look it doesn't look too
harmful to try to send an abort for these cases, especially as the first
one should not actually be possible.  If this doesn't work out iscsi
will probably need its own eh_strategy_handler instead to just do the
right thing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

adb2b769

mmc: complete requests from ->timeout · ad73d6fe

Authored by Christoph Hellwig on May 29, 2018

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

[While this keeps existing behavior it seems to mismatch the comment,
 maintainers please chime in!]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad73d6fe
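
The idea of completing a request entirely inside the driver's timeout handler can be sketched as follows. The enum values, the request structure, and the handler are hypothetical; the block layer's own return values and completion helpers are deliberately not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return values for a timeout handler. */
enum toy_eh_return { TOY_EH_DONE, TOY_EH_RESET_TIMER };

struct toy_request {
	bool completed;
	int status;
};

/* Hypothetical completion helper owned by the driver. */
static void toy_complete(struct toy_request *rq, int status)
{
	rq->status = status;
	rq->completed = true;
}

/*
 * Driver timeout handler: it finishes the request itself, so the
 * caller has nothing left to complete and needs no "handled" value
 * that would split the work between the two layers.
 */
static enum toy_eh_return toy_timeout(struct toy_request *rq)
{
	assert(rq != NULL);
	if (!rq->completed)
		toy_complete(rq, -1);
	return TOY_EH_DONE;
}
```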

scsi_transport_fc: complete requests from ->timeout · 1fc2b62e

Authored by Christoph Hellwig on May 29, 2018

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1fc2b62e

null_blk: complete requests from ->timeout · 0df0bb08

Authored by Christoph Hellwig on May 29, 2018

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0df0bb08

openanolis / cloud-kernel · synced successfully more than 1 year ago
