1. 10 Aug 2017 (5 commits)
  2. 02 Aug 2017 (1 commit)
  3. 01 Aug 2017 (1 commit)
    • blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled · b7a71e66
      Committed by Jens Axboe
      We recently had a bug in the IPR SCSI driver, where it would end up
      making the SCSI mid layer run the mq hardware queue with interrupts
      disabled. This isn't legal, since the software queue locking relies on
      the locks never being grabbed from interrupt context. Additionally,
      drivers that set BLK_MQ_F_BLOCKING may schedule from this context.
      
      Add a WARN_ON_ONCE() to catch bad users up front; a minimal sketch of
      such a guard follows this entry.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7a71e66
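      A minimal sketch of the kind of guard described above, in kernel-style C;
      the placement and the surrounding code are assumptions for illustration,
      not the literal patch:

      static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
      {
              /*
               * The software queue locks must never be taken from interrupt
               * context, and BLK_MQ_F_BLOCKING drivers may sleep here, so
               * catch callers that run the queue with interrupts disabled
               * or from interrupt context early.
               */
              WARN_ON_ONCE(in_interrupt());

              /* ... dispatch requests from the software queues ... */
      }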
  4. 29 Jul 2017 (3 commits)
  5. 25 Jul 2017 (1 commit)
  6. 24 Jul 2017 (1 commit)
  7. 12 Jul 2017 (2 commits)
    • bfq: dispatch request to prevent queue stalling after the request completion · 3f7cb4f4
      Committed by Hou Tao
      There are mq devices (e.g., virtio-blk, nbd and loopback) which don't
      invoke blk_mq_run_hw_queues() after the completion of a request. If bfq
      is enabled on these devices and the slice_idle attribute or the
      strict_guarantees attribute is set to zero, it is possible that, after
      a request completes, the remaining requests of the busy bfq queue will
      stall in the bfq scheduler until a new request arrives.
      
      To fix this scheduler latency problem, check whether or not all issued
      requests have completed, and dispatch more requests to the driver if
      none are in flight; a minimal sketch of the check follows this entry.
      
      The problem can be reproduced by running the following script on a
      virtio-blk device with nr_hw_queues set to 1:
      
      #!/bin/sh
      
      dev=vdb
      # mount point for dev
      mp=/tmp/mnt
      cd $mp
      
      job=strict.job
      cat <<EOF > $job
      [global]
      direct=1
      bs=4k
      size=256M
      rw=write
      ioengine=libaio
      iodepth=128
      runtime=5
      time_based
      
      [1]
      filename=1.data
      
      [2]
      new_group
      filename=2.data
      EOF
      
      echo bfq > /sys/block/$dev/queue/scheduler
      echo 1 > /sys/block/$dev/queue/iosched/strict_guarantees
      fio $job
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f7cb4f4
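      A minimal sketch of the completion-side check described above;
      bfqd->rq_in_driver, bfqd->queued and bfq_schedule_dispatch() are taken
      from the bfq-iosched context, and the function name is illustrative
      rather than the literal patch:

      static void bfq_completion_kick_sketch(struct bfq_data *bfqd)
      {
              bfqd->rq_in_driver--;

              /*
               * If nothing is left in the driver but the scheduler still
               * holds requests, re-run the dispatch path so the queue does
               * not stall until the next request arrival.
               */
              if (bfqd->rq_in_driver == 0 && bfqd->queued > 0)
                      bfq_schedule_dispatch(bfqd);
      }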
    • bfq: fix typos in comments about B-WF2Q+ algorithm · 38c91407
      Committed by Hou Tao
      The start time of an eligible entity should be less than or equal to
      the current virtual time, and an entity in the idle tree has a finish
      time greater than the current virtual time; the two conditions are
      sketched after this entry.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      38c91407
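      The corrected statements boil down to two comparisons against the
      scheduler's current virtual time; a minimal sketch that ignores the
      wraparound-safe comparison helpers the real bfq code uses:

      /* Eligible: virtual start time is at or before the current virtual time V. */
      static inline bool entity_is_eligible(u64 start, u64 vtime)
      {
              return start <= vtime;
      }

      /* Idle tree: virtual finish time is strictly after the current virtual time V. */
      static inline bool entity_fits_idle_tree(u64 finish, u64 vtime)
      {
              return finish > vtime;
      }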
  8. 11 Jul 2017 (1 commit)
    • block: call bio_uninit in bio_endio · b222dd2f
      Committed by Shaohua Li
      bio_free isn't a good place to free cgroup info. There are a lot of
      cases where a bio is allocated in a special way (for example, on the
      stack) and bio_put, and hence bio_free, is never called on it, so we
      are leaking memory. This patch moves the free to bio_endio, which
      should be called anyway; a minimal sketch of the change follows this
      entry. The bio_uninit call in bio_free is kept, in case bio_endio is
      never called on the bio.
      
      This assumes ->bi_end_io() doesn't access cgroup info, which seems true
      in my audit.
      
      This, along with Christoph's integrity patch, should fix the memory
      leak issue.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b222dd2f
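      A minimal sketch of the shape of the change, with the chaining and
      remapping handling of the real bio_endio() elided:

      void bio_endio(struct bio *bio)
      {
              /* ... bio chaining / remapping handling elided ... */

              bio_uninit(bio);                /* release cgroup and similar per-bio state */

              if (bio->bi_end_io)
                      bio->bi_end_io(bio);    /* assumed not to touch cgroup info */
      }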
  9. 06 Jul 2017 (1 commit)
    • block: Fix __blkdev_issue_zeroout loop · 615d22a5
      Committed by Damien Le Moal
      The BIO issuing loop in __blkdev_issue_zeroout() allocates BIOs with a
      maximum number of bvecs (pages) equal to
      
      min(nr_sects, (sector_t)BIO_MAX_PAGES)
      
      This works, since the requested number of bvecs will always be limited
      to the absolute maximum supported (BIO_MAX_PAGES), but it is
      inefficient, as too many bvec entries may be requested due to the
      different units used in the min() operation (number of sectors vs.
      number of pages).
      To fix this, introduce the helper __blkdev_sectors_to_bio_pages() to
      correctly calculate the number of bvecs for zeroout BIOs as the issuing
      loop progresses. The calculation is done using consistent units and
      makes sure that the number of pages returned is at least 1 (for cases
      where the number of sectors is less than the number of sectors in a
      page); a sketch of such a helper follows this entry.
      
      Also remove a trailing space after the bit shift in the internal loop
      min() call.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      615d22a5
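      A sketch of a helper with the described behaviour (the in-tree version
      may differ in detail): convert 512-byte sectors to pages, round up so
      the result is at least one page, and cap it at BIO_MAX_PAGES:

      static unsigned int __blkdev_sectors_to_bio_pages(sector_t nr_sects)
      {
              /* Round sectors up to whole pages so small zeroouts still get one bvec. */
              sector_t bytes = (nr_sects << 9) + PAGE_SIZE - 1;

              return min(bytes >> PAGE_SHIFT, (sector_t)BIO_MAX_PAGES);
      }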
  10. 05 Jul 2017 (1 commit)
  11. 04 Jul 2017 (9 commits)
  12. 30 Jun 2017 (2 commits)
  13. 29 Jun 2017 (4 commits)
    • blk-mq: map all HWQ also in hyperthreaded system · fe631457
      Committed by Max Gurtovoy
      This patch performs sequential mapping between CPUs and queues. In case
      the system has more CPUs than HWQs, there are still CPUs left to map to
      HWQs; in a hyperthreaded system, map the unmapped CPUs and their
      siblings to the same HWQ (a simplified model of this mapping follows
      this entry). This actually fixes a bug where unmapped HWQs were found
      on a system with 2 sockets, 18 cores per socket and 2 threads per core
      (72 CPUs total) running NVMEoF (which opens up to a maximum of 64 HWQs).
      
      Performance results running fio (72 jobs, iodepth 128) using null_blk;
      each cell below shows IOPS with the patch / without the patch:
      
      bs     read submit_queues=72   write submit_queues=72   read submit_queues=24   write submit_queues=24
      -----  ----------------------  -----------------------  ----------------------  -----------------------
      512    4890.4K/4723.5K         4524.7K/4324.2K          4280.2K/4264.3K         3902.4K/3909.5K
      1k     4910.1K/4715.2K         4535.8K/4309.6K          4296.7K/4269.1K         3906.8K/3914.9K
      2k     4906.3K/4739.7K         4526.7K/4330.6K          4301.1K/4262.4K         3890.8K/3900.1K
      4k     4918.6K/4730.7K         4556.1K/4343.6K          4297.6K/4264.5K         3886.9K/3893.9K
      8k     4906.4K/4748.9K         4550.9K/4346.7K          4283.2K/4268.8K         3863.4K/3858.2K
      16k    4903.8K/4782.6K         4501.5K/4233.9K          4292.3K/4282.3K         3773.1K/3773.5K
      32k    4885.8K/4782.4K         4365.9K/4184.2K          4307.5K/4289.4K         3780.3K/3687.3K
      64k    4822.5K/4762.7K         2752.8K/2675.1K          4308.8K/4312.3K         2651.5K/2655.7K
      128k   2388.5K/2313.8K         1391.9K/1375.7K          2142.8K/2152.2K         1395.5K/1374.2K
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      fe631457
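      The mapping idea can be modelled in plain user-space C; the CPU count,
      queue count and sibling layout below are made-up parameters for
      illustration, not the kernel topology API:

      #include <stdio.h>

      int main(void)
      {
              enum { NR_CORES = 4, NR_CPUS = 2 * NR_CORES, NR_QUEUES = 3 };
              int map[NR_CPUS];

              /* Assume CPU c and CPU c + NR_CORES are hyperthread siblings. */
              for (int core = 0; core < NR_CORES; core++) {
                      int q = core % NR_QUEUES;       /* sequential spread over HWQs */

                      map[core] = q;                  /* first thread of the core */
                      map[core + NR_CORES] = q;       /* its sibling shares the same HWQ */
              }

              for (int cpu = 0; cpu < NR_CPUS; cpu++)
                      printf("cpu %d -> hwq %d\n", cpu, map[cpu]);
              return 0;
      }

      Mapping core by core, rather than CPU by CPU, is what keeps a core and
      its hyperthread sibling on the same hardware queue while still leaving
      no CPU unmapped.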
    • block: provide bio_uninit() for freeing integrity/task associations · 9ae3b3f5
      Committed by Jens Axboe
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With an nvme device + T10 enabled, on a system with 256GB we started
      logging /proc/meminfo & /proc/slabinfo every minute, and in an hour it
      increased by 15968128 kB, or ~15+GB. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      the code doesn't free the memory allocated by bio_integrity_alloc. The
      patch fixes the issue. HTX has been run for 60+ hours without failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and exporting it, so that we can
      use it to free this data; a usage sketch follows this entry. Note that
      this is a minimal fix for this issue. Any current user of bios that are
      allocated outside of bio_alloc_bioset() suffers from this issue, most
      notably some drivers. We will fix those in a more comprehensive patch
      for 4.13. This also means that the commit marked as being fixed by this
      one isn't the real culprit; it's just the most obvious one out there.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9ae3b3f5
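      A usage sketch of the exported helper for an on-stack bio; the function
      and its argument are illustrative (4.12-era bio_init() signature), not
      the actual __blkdev_direct_IO_simple():

      static void stack_bio_example(struct page *page)
      {
              struct bio_vec bvec;
              struct bio bio;

              bio_init(&bio, &bvec, 1);       /* on-stack bio, never goes through bio_put()/bio_free() */

              /* ... point the bio at the device, add the page, submit and wait ... */

              bio_uninit(&bio);               /* free the integrity/task associations */
      }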
    • blk-mq: Create hctx for each present CPU · 4b855ad3
      Committed by Christoph Hellwig
      Currently we only create an hctx for online CPUs, which can lead to a
      lot of churn due to frequent soft offline / online operations. Instead,
      allocate one for each present CPU to avoid this and dramatically
      simplify the code; a minimal sketch of the direction follows this entry.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4b855ad3
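      A minimal sketch of the direction of the change, using the standard
      for_each_present_cpu() iterator; the function name is illustrative and
      the per-CPU allocation is elided, so this is not the real blk-mq
      initialization path:

      static int init_ctx_for_present_cpus_sketch(void)
      {
              unsigned int cpu;

              /*
               * Iterate over present CPUs, not just online ones, so later
               * soft offline / online events do not force hctx churn.
               */
              for_each_present_cpu(cpu) {
                      /* ... allocate and initialize the per-CPU context for @cpu ... */
              }
              return 0;
      }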
    • blk-mq: Include all present CPUs in the default queue mapping · 5f042e7c
      Committed by Christoph Hellwig
      This way we get a nice distribution that is independent of the current
      CPU online / offline state.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170626102058.10200-2-hch@lst.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      5f042e7c
  14. 28 Jun 2017 (8 commits)