Commits · 9c5587346490ad4355e8de6ae402b76e55c411d5 · openanolis / cloud-kernel

31 May, 2018 · 1 commit

blk-mq: abstract out blk-mq-sched rq list iteration bio merge helper · 9c558734

Committed by Jens Axboe on May 30, 2018

No functional changes in this patch, just a prep patch for utilizing
this in an IO scheduler.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Omar Sandoval <osandov@fb.com>

9c558734

30 May, 2018 · 1 commit

block: remove parent device reference from struct bsg_class_device · 5de815a7

Committed by Christoph Hellwig on May 29, 2018

Bsg holding a reference to the parent device may result in a crash if a
bsg file handle is closed after the parent device driver has unloaded.

Holding a reference is not really needed: the parent device must exist
between bsg_register_queue and bsg_unregister_queue.  Before the device
goes away the caller does blk_cleanup_queue so that all in-flight
requests to the device are gone and all new requests cannot pass beyond
the queue.  The queue itself is a refcounted object and it will stay
alive with a bsg file.

Based on analysis, previous patch and changelog from Anatoliy Glagolev.
Reported-by: Anatoliy Glagolev <glagolig@gmail.com>
Reviewed-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5de815a7

29 May, 2018 · 6 commits

block: don't print a message when the device went away · 5afb7835

Committed by Christoph Hellwig on May 29, 2018

The information about a size change in this case just creates confusion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5afb7835

blk-mq: simplify blk_mq_rq_timed_out · d1210d5a

Committed by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d1210d5a

block: remove BLK_EH_HANDLED · f6e7d48a

Committed by Christoph Hellwig on May 29, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f6e7d48a

block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE · 6600593c

Committed by Christoph Hellwig on May 29, 2018

The name BLK_EH_NOT_HANDLED implies that nothing happened, but very
often that is not the case: the driver has already completed the
command.  Fix the symbolic name to reflect that a little better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6600593c

blk-mq: Remove generation seqeunce · 12f5b931

Committed by Keith Busch on May 29, 2018

This patch simplifies the timeout handling by relying on the request
reference counting to ensure the iterator is operating on an inflight
and truly timed out request. Since the reference counting prevents the
tag from being reallocated, the block layer no longer needs to prevent
drivers from completing their requests while the timeout handler is
operating on it: a driver completing a request is allowed to proceed to
the next state without additional synchronization with the block layer.

This also removes any need for generation sequence numbers, since the
request cannot be reallocated as a new sequence while timeout handling
is operating on it.

To enable this, a refcount is added to struct request so that request
users can be sure they're operating on the same request without it
changing while they're processing it.  The request's tag won't be
released for reuse until both the timeout handler and the completion
are done with it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: slight cleanups, added back submission side hctx lock, use cmpxchg
 for completions]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

12f5b931
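The reference-counting idea in the commit above can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions: `struct ref_request`, `ref_get_unless_zero()` and `ref_put()` are hypothetical names, not the kernel's, and the tag release is reduced to a counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request that owns a tag. */
struct ref_request {
    atomic_int ref;     /* 0 means nobody holds the request any more */
    int tag_released;   /* counts how often the tag was handed back */
};

/* Take a reference only if the request is still live (ref > 0). */
static bool ref_get_unless_zero(struct ref_request *rq)
{
    int old = atomic_load(&rq->ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&rq->ref, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; whoever drops the last one releases the tag. */
static void ref_put(struct ref_request *rq)
{
    if (atomic_fetch_sub(&rq->ref, 1) == 1)
        rq->tag_released++;
}
```

In this sketch a timeout scan would call `ref_get_unless_zero()` before inspecting a request and `ref_put()` afterwards, so a concurrent completion can drop its own reference without the tag being reused underneath the scan.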

blk-mq: Fix timeout and state order · ad103e79

Committed by Keith Busch on May 29, 2018

The block layer had been setting the state to in-flight prior to updating
the timer. This is the wrong order since the timeout handler could observe
the in-flight state with the older timeout, believing the request had
expired when in fact it is just getting started.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad103e79
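The ordering requirement described in this commit can be illustrated with a small user-space sketch using C11 atomics. It is not the kernel code: `struct order_request`, `order_start_request()` and `order_expired()` are hypothetical names, and time is an arbitrary counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum order_state { ORDER_IDLE, ORDER_IN_FLIGHT };

/* Hypothetical stand-in for a request; not the kernel's struct request. */
struct order_request {
    _Atomic uint64_t deadline;  /* absolute expiry time, arbitrary units */
    _Atomic int state;          /* enum order_state */
};

/* Publish the new deadline first, then mark the request in flight. */
static void order_start_request(struct order_request *rq,
                                uint64_t now, uint64_t timeout)
{
    atomic_store_explicit(&rq->deadline, now + timeout,
                          memory_order_relaxed);
    /* Release: a reader that sees IN_FLIGHT also sees the new deadline. */
    atomic_store_explicit(&rq->state, ORDER_IN_FLIGHT,
                          memory_order_release);
}

/* Timeout scan: only in-flight requests past their deadline are expired. */
static bool order_expired(struct order_request *rq, uint64_t now)
{
    if (atomic_load_explicit(&rq->state, memory_order_acquire)
        != ORDER_IN_FLIGHT)
        return false;
    return now >= atomic_load_explicit(&rq->deadline,
                                       memory_order_relaxed);
}
```

If the two stores in `order_start_request()` were swapped, a scan could see the in-flight state paired with the deadline left over from the request's previous use, which is the stale-timeout case the commit message describes.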

25 May, 2018 · 2 commits

block drivers/block: Use octal not symbolic permissions · 5657a819

Committed by Joe Perches on May 24, 2018

Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5657a819
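As a small illustration of the conversion in this commit, the sketch below compares one symbolic permission expression with its octal equivalent. `TOY_S_IRUGO` is a local redefinition (an assumption mirroring the kernel-only `S_IRUGO` shorthand, which user-space headers do not provide), and the two helper functions are hypothetical.

```c
#include <sys/stat.h>

/* Kernel-only shorthand rebuilt from the POSIX bits (assumed to mirror
 * the kernel's definition: read permission for user, group and other). */
#define TOY_S_IRUGO (S_IRUSR | S_IRGRP | S_IROTH)

/* Symbolic spelling: readable by everyone, writable by the owner. */
static unsigned int symbolic_mode(void)
{
    return TOY_S_IRUGO | S_IWUSR;
}

/* Octal spelling of the same permission bits. */
static unsigned int octal_mode(void)
{
    return 0644;
}
```

Both functions return the same value on systems that use the conventional POSIX bit values, which is why the rewrite is purely cosmetic.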

blk-mq: avoid starving tag allocation after allocating process migrates · e6fc4649

Committed by Ming Lei on May 24, 2018

When the allocating process is scheduled back in and its mapped hw queue
has changed, fake one extra wakeup on the previous queue to compensate
for the missed wakeup, so other allocations on the previous queue won't
be starved.

This patch fixes a request allocation hang, which can be triggered
easily with a very low nr_requests.

The race is as follows:

1) there are 2 hw queues, nr_requests is 2, and wake_batch is one

2) there are 3 waiters on hw queue 0

3) the two in-flight requests in hw queue 0 complete, and only two of
   the 3 waiters are woken up because of wake_batch, but both of them
   can be scheduled to another CPU and so switch to hw queue 1

4) the 3rd waiter then waits forever, since no in-flight request is
   left in hw queue 0

5) this patch fixes it with a fake wakeup when a waiter is scheduled to
   another hw queue

Cc: <stable@vger.kernel.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Modified commit message to make it clearer, and make it apply on
top of the 4.18 branch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e6fc4649
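The race listed in this commit can be replayed with a deliberately simplified toy model. It is an assumption-laden sketch, not the sbitmap code: `stranded_waiters()` is a hypothetical function, every woken waiter is assumed to migrate to hw queue 1, and one completion wakes exactly one waiter.

```c
#include <stdbool.h>

/*
 * Toy model: hw queue 0 has 2 tags, both in flight, and 3 sleeping
 * waiters. Each completion frees one tag and wakes one waiter
 * (wake_batch == 1). Returns how many waiters are still asleep on
 * queue 0 after both completions.
 */
static int stranded_waiters(bool fake_wakeup)
{
    int free_tags = 0;
    int sleeping = 3;

    for (int completion = 0; completion < 2; completion++) {
        free_tags++;    /* the completed request returns its tag */
        sleeping--;     /* one waiter is woken for this completion */

        /* The woken waiter migrated: it allocates from hw queue 1
         * and never consumes the tag freed on queue 0. */
        if (fake_wakeup && sleeping > 0 && free_tags > 0) {
            /* Pass the wakeup on to another waiter on queue 0. */
            sleeping--;
            free_tags--;    /* that waiter takes the free tag */
        }
    }
    return sleeping;
}
```

Without the compensating wakeup the model leaves one waiter asleep with no in-flight request left to wake it; with it, every waiter makes progress.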

23 May, 2018 · 1 commit

blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers · 327ea4ad

Committed by Bart Van Assche on May 22, 2018

Avoid that complaints similar to the following appear in the kernel log
if the number of zones is sufficiently large:

  fio: page allocation failure: order:9, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
  Call Trace:
  dump_stack+0x63/0x88
  warn_alloc+0xf5/0x190
  __alloc_pages_slowpath+0x8f0/0xb0d
  __alloc_pages_nodemask+0x242/0x260
  alloc_pages_current+0x6a/0xb0
  kmalloc_order+0x18/0x50
  kmalloc_order_trace+0x26/0xb0
  __kmalloc+0x20e/0x220
  blkdev_report_zones_ioctl+0xa5/0x1a0
  blkdev_ioctl+0x1ba/0x930
  block_ioctl+0x41/0x50
  do_vfs_ioctl+0xaa/0x610
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x79/0x1b0
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaun Tancheff <shaun.tancheff@seagate.com>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

327ea4ad
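The `order:9` in the trace above means the allocator was asked for 2^9 physically contiguous pages. The sketch below reproduces that arithmetic in user space. The 4 KiB page size and the 64-byte zone descriptor size are assumptions made for illustration, and `toy_get_order()` is a hypothetical helper, not the kernel's function.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE      4096u  /* assumption: 4 KiB pages */
#define TOY_ZONE_DESC_SIZE 64u    /* assumption: bytes per zone descriptor */

/* Smallest order such that (TOY_PAGE_SIZE << order) >= size. */
static unsigned int toy_get_order(size_t size)
{
    unsigned int order = 0;
    size_t span = TOY_PAGE_SIZE;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under these assumptions a report covering 32768 zones needs 2 MiB, which is an order-9 request. Contiguous allocations of that size fail easily on a fragmented system, whereas a vmalloc()-style allocation only needs individual pages.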

22 May, 2018 · 1 commit

blk-mq: remove wrong 'unlikely' check · b4f6f38d

Committed by huhai on May 22, 2018

When dispatch_rq_from_ctx is called, in the vast majority of cases
the ctx->rq_list is not empty.
Signed-off-by: huhai <huhai@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b4f6f38d

18 May, 2018 · 1 commit

blk-mq: clear hctx->dispatch_from when mappings change · d416c92c

Committed by huhai on May 18, 2018

When the number of hardware queues is changed, the drivers will call
blk_mq_update_nr_hw_queues() to remap hardware queues. This changes
the ctx mappings, but the current code doesn't clear the
->dispatch_from hint. This can result in dispatch_from pointing to
a ctx that isn't mapped to the hctx anymore.

Fixes: b347689f ("blk-mq-sched: improve dispatching from sw queue")
Signed-off-by: huhai <huhai@kylinos.cn>
Reviewed-by: Ming Lei <ming.lei@redhat.com>

Moved the placement of the clearing to where we clear other items
pertaining to the existing mapping, added Fixes line, and reworded
the commit message.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d416c92c
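The stale-hint problem in this commit can be sketched with a tiny user-space model. All names here (`struct toy_ctx`, `struct toy_hctx`, `toy_remap()`, `toy_hint_is_mapped()`) are hypothetical and only mimic the relationship between a hardware queue, its mapped software queues and the `dispatch_from` hint.

```c
#include <stddef.h>

#define TOY_MAX_CTX 4

struct toy_ctx { int cpu; };

struct toy_hctx {
    struct toy_ctx *ctxs[TOY_MAX_CTX];  /* software queues mapped here */
    unsigned int nr_ctx;
    struct toy_ctx *dispatch_from;      /* hint: where to dispatch next */
};

/* Rebuild the mapping and drop the hint, which may name an old ctx. */
static void toy_remap(struct toy_hctx *h, struct toy_ctx **ctxs,
                      unsigned int n)
{
    h->nr_ctx = 0;
    h->dispatch_from = NULL;
    for (unsigned int i = 0; i < n && i < TOY_MAX_CTX; i++)
        h->ctxs[h->nr_ctx++] = ctxs[i];
}

/* Returns 1 if the hint is unset or points at a currently mapped ctx. */
static int toy_hint_is_mapped(const struct toy_hctx *h)
{
    if (!h->dispatch_from)
        return 1;
    for (unsigned int i = 0; i < h->nr_ctx; i++)
        if (h->ctxs[i] == h->dispatch_from)
            return 1;
    return 0;
}
```

Leaving out the `h->dispatch_from = NULL` line in `toy_remap()` reproduces the bug in this model: the hint keeps pointing at a ctx that is no longer in `ctxs[]`.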

16 May, 2018 · 1 commit

blk-mq: remove redundant insert case in blk_mq_make_request() · 8fa9f556

Committed by huhai on May 16, 2018

We can use blk_mq_sched_insert_request() even if we don't have
an IO scheduler attached, since that case will end up being
exactly the same as what blk_mq_queue_io() is doing now.
Signed-off-by: huhai <huhai@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8fa9f556

15 May, 2018 · 9 commits

block: Add sysfs entry for fua support · 6fcefbe5

Committed by Kent Overstreet on May 08, 2018

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6fcefbe5

block: Export bio check/set pages_dirty · 1900fcc4

Committed by Kent Overstreet on May 08, 2018

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

1900fcc4

block: Add warning for bi_next not NULL in bio_endio() · 0ba99ca4

Committed by Kent Overstreet on May 08, 2018

Recently found a bug where a driver left bi_next not NULL and then
called bio_endio(), and then the submitter of the bio used
bio_copy_data() which was treating src and dst as lists of bios.

Fixed that bug by splitting out bio_list_copy_data(), but in case other
things are depending on bi_next in weird ways, add a warning to help
avoid more bugs like that in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0ba99ca4

block: Add missing flush_dcache_page() call · 6e6e811d

Committed by Kent Overstreet on May 08, 2018

Since a bio can point to userspace pages (e.g. direct IO), this is
generally necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6e6e811d

block: Split out bio_list_copy_data() · 45db54d5

Committed by Kent Overstreet on May 08, 2018

Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().

Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

45db54d5
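The split between a single-item copy and a list-walking copy can be sketched as follows. This is a simplified user-space analogy, not the bio code: `struct toy_buf` and both functions are hypothetical, `next` plays the role of `bi_next`, and the list variant copies pairwise rather than handling differently sized elements.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer node; `next` plays the role of bi_next. */
struct toy_buf {
    struct toy_buf *next;
    size_t len;
    unsigned char data[16];
};

/* Copy a single buffer and ignore `next` entirely. */
static size_t toy_copy_data(struct toy_buf *dst, const struct toy_buf *src)
{
    size_t n = src->len < sizeof(dst->data) ? src->len : sizeof(dst->data);

    memcpy(dst->data, src->data, n);
    dst->len = n;
    return n;
}

/* Copy pairwise along both chains until either one ends. */
static size_t toy_list_copy_data(struct toy_buf *dst,
                                 const struct toy_buf *src)
{
    size_t total = 0;

    for (; dst && src; dst = dst->next, src = src->next)
        total += toy_copy_data(dst, src);
    return total;
}
```

With the two entry points separated, a caller that only ever means to copy one buffer cannot be affected by a stale `next` pointer, which is the class of bug the commit message describes.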

block: Add bio_copy_data_iter(), zero_fill_bio_iter() · 38a72dac

Committed by Kent Overstreet on May 08, 2018

Add versions that take bvec_iter args instead of using bio->bi_iter - to
be used by bcachefs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

38a72dac

block: Use bioset_init() for fs_bio_set · f4f8154a

Committed by Kent Overstreet on May 08, 2018

Minor optimization - remove a pointer indirection when using fs_bio_set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f4f8154a

block: Add bioset_init()/bioset_exit() · 917a38c7

Committed by Kent Overstreet on May 08, 2018

Similarly to mempool_init()/mempool_exit(), take a pointer indirection
out of allocation/freeing by allowing biosets to be embedded in other
structs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

917a38c7
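The difference between a create/destroy interface and an init/exit interface can be sketched in user space. The names below (`struct toy_pool`, `toy_pool_create()`, `toy_pool_init()` and so on) are hypothetical and do not reproduce the bioset or mempool signatures; they only show why the init/exit form lets the object be embedded.

```c
#include <stdlib.h>

/* Hypothetical pool type standing in for a bio_set or mempool. */
struct toy_pool {
    size_t elem_size;
    unsigned int min_nr;
};

/* Old style: the pool lives on the heap and is reached via a pointer. */
static struct toy_pool *toy_pool_create(unsigned int min_nr,
                                        size_t elem_size)
{
    struct toy_pool *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->elem_size = elem_size;
    p->min_nr = min_nr;
    return p;
}

static void toy_pool_destroy(struct toy_pool *p)
{
    free(p);
}

/* New style: the caller provides storage, so the pool can be embedded. */
static int toy_pool_init(struct toy_pool *p, unsigned int min_nr,
                         size_t elem_size)
{
    p->elem_size = elem_size;
    p->min_nr = min_nr;
    return 0;
}

static void toy_pool_exit(struct toy_pool *p)
{
    p->elem_size = 0;
    p->min_nr = 0;
}

struct toy_driver_old { struct toy_pool *pool; }; /* extra dereference */
struct toy_driver_new { struct toy_pool pool; };  /* embedded in place */
```

In the embedded layout, reaching the pool from the owning structure is a fixed offset rather than a pointer load, which is the indirection the commit removes from the allocation and freeing paths.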

block: Convert bio_set to mempool_init() · 8aa6ba2f

Committed by Kent Overstreet on May 08, 2018

Minor performance improvement by getting rid of pointer indirections
from allocation/freeing fastpaths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8aa6ba2f

14 May, 2018 · 5 commits

block: consistently use GFP_NOIO instead of __GFP_NORECLAIM · 0eb0b63c

Committed by Christoph Hellwig on May 09, 2018

Same numerical value (for now at least), but a much better documentation
of intent.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0eb0b63c

block: use GFP_NOIO instead of __GFP_DIRECT_RECLAIM · c3036021

Committed by Christoph Hellwig on May 09, 2018

We just can't do I/O when doing block layer requests allocations,
so use GFP_NOIO instead of the even more limited __GFP_DIRECT_RECLAIM.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c3036021

block: pass an explicit gfp_t to get_request · 4accf5fc

Committed by Christoph Hellwig on May 09, 2018

blk_old_get_request already has it at hand, and in blk_queue_bio, which
is the fast path, it is constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4accf5fc

block: sanitize blk_get_request calling conventions · ff005a06

Committed by Christoph Hellwig on May 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ff005a06

block: fix __get_request documentation · a9a14d36

Committed by Christoph Hellwig on May 09, 2018

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a9a14d36

11 May, 2018 · 7 commits

kyber-iosched: update shallow depth when setting up hardware queue · 28820640

Committed by Jens Axboe on May 09, 2018

We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28820640

bfq-iosched: update shallow depth to smallest one used · 483b7bf2

Committed by Jens Axboe on May 09, 2018

If our shallow depth is smaller than the wake batching of sbitmap,
we can introduce hangs. Ensure that sbitmap knows how low we'll go.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

483b7bf2

bfq-iosched: remove unused variable · bd7d4ef6

Committed by Jens Axboe on May 09, 2018

bfqd->sb_shift was meant to be used as a cache for the sbitmap queue
shift, but we don't need it, as it never changes. Kill it with fire.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bd7d4ef6

bfq: calculate shallow depths at init time · f0635b8a

Committed by Jens Axboe on May 09, 2018

It doesn't change, so don't put it in the per-IO hot path.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f0635b8a

bfq-iosched: don't worry about reserved tags in limit_depth · 55141366

Committed by Jens Axboe on May 09, 2018

Reserved tags are used for error handling, we don't need to
care about them for regular IO. The core won't call us for these
anyway.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

55141366

blk-mq: don't call into depth limiting for reserved tags · 17a51199

Committed by Jens Axboe on May 09, 2018

It's not useful, they are internal and/or error handling recovery
commands.
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

17a51199

block, bfq: postpone rq preparation to insert or merge · 18e5a57d

Committed by Paolo Valente on May 04, 2018

When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq. In
this respect, after this hook has been invoked, rq may still be
transformed into a request with no icq attached, i.e., for bfq, a
request not associated with any bfq_queue. No further hook is invoked
to signal this transformation to bfq (in general, to the destination
elevator for rq). This leads bfq into an inconsistent state, because
bfq has no chance to correctly lower these counters back. This
inconsistency may in turn cause incorrect scheduling and hangs. It
certainly causes memory leaks, by making it impossible for bfq to free
the involved bfq_queue.

On the bright side, no transformation can still happen for rq after rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.

This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging, no
lock needs to be grabbed any longer in the prepare_request hook, while
the lock already taken to perform insertion or merging is used to
prepare the request as well.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18e5a57d

09 May, 2018 · 5 commits

block: consolidate struct request timestamp fields · 522a7775

Committed by Omar Sandoval on May 09, 2018

Currently, struct request has four timestamp fields:

- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
  used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq

These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

522a7775
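The space saving from this consolidation can be illustrated with two hypothetical structs. The field names and the use of fixed-width 64-bit types are assumptions made so the arithmetic is the same on every platform; the real layout of struct request depends on the kernel config.

```c
#include <stdint.h>

/* Before: four timestamps (illustrative layout, not struct request). */
struct toy_request_before {
    uint64_t start_time;            /* jiffies, for iostats */
    uint64_t issue_time_ns;         /* ktime ns, for blk-stats */
    uint64_t sched_start_time_ns;   /* for cfq and bfq */
    uint64_t sched_io_start_ns;     /* for cfq and bfq */
};

/* After: one start time and one I/O start time, both in ktime ns. */
struct toy_request_after {
    uint64_t start_time_ns;
    uint64_t io_start_time_ns;
};

/* Time spent queued before the driver started on the request. */
static uint64_t toy_queue_time_ns(const struct toy_request_after *rq)
{
    return rq->io_start_time_ns - rq->start_time_ns;
}
```

In this model, dropping two of the four 8-byte fields accounts for the 16 bytes mentioned in the commit message, and every consumer reads the same pair of nanosecond timestamps.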

block: move blk_stat_add() to __blk_mq_end_request() · 4bc6339a

Committed by Omar Sandoval on May 09, 2018

We want this next to blk_account_io_done() for the next change so that
we can call ktime_get() only once for both.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4bc6339a

block: use ktime_get_ns() instead of sched_clock() for cfq and bfq · 84c7afce

Committed by Omar Sandoval on May 09, 2018

cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e ("block: disable preemption before
using sched_clock()").
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

84c7afce

block: get rid of struct blk_issue_stat · 544ccc8d

Committed by Omar Sandoval on May 09, 2018

struct blk_issue_stat squashes three things into one u64:

- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling

It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

544ccc8d
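Squashing a timestamp, a size and a few flags into one 64-bit word can be sketched as below. The bit widths and all `toy_` names are assumptions chosen for illustration; the commit message does not state the actual split.

```c
#include <stdint.h>

/* Illustrative bit split; the widths are assumptions. */
#define TOY_FLAG_BITS 3
#define TOY_SIZE_BITS 12
#define TOY_TIME_BITS (64 - TOY_FLAG_BITS - TOY_SIZE_BITS)

#define TOY_TIME_MASK ((UINT64_C(1) << TOY_TIME_BITS) - 1)
#define TOY_SIZE_MASK ((UINT64_C(1) << TOY_SIZE_BITS) - 1)
#define TOY_FLAG_MASK ((UINT64_C(1) << TOY_FLAG_BITS) - 1)

/* Layout, high to low: flags | size | time. Oversized inputs are masked. */
static uint64_t toy_pack(uint64_t time_ns, uint64_t size, uint64_t flags)
{
    return ((flags & TOY_FLAG_MASK) << (TOY_SIZE_BITS + TOY_TIME_BITS)) |
           ((size & TOY_SIZE_MASK) << TOY_TIME_BITS) |
           (time_ns & TOY_TIME_MASK);
}

static uint64_t toy_time(uint64_t v)
{
    return v & TOY_TIME_MASK;
}

static uint64_t toy_size(uint64_t v)
{
    return (v >> TOY_TIME_BITS) & TOY_SIZE_MASK;
}

static uint64_t toy_flags(uint64_t v)
{
    return v >> (TOY_SIZE_BITS + TOY_TIME_BITS);
}
```

Every access needs a shift and a mask, and the timestamp loses its top bits. Keeping the timestamp as a plain 64-bit field and moving the small fields into otherwise unused padding avoids both costs.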

block: replace bio->bi_issue_stat with bio-specific type · 5238dcf4

Committed by Omar Sandoval on May 09, 2018

struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific for bios. The helpers work the same way as the blk-stats
helpers.
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5238dcf4
