1. 29 Aug 2017, 3 commits
  2. 26 Aug 2017, 2 commits
  3. 24 Aug 2017, 8 commits
    • compat_hdio_ioctl: Fix a declaration · 6a934bb8
      Authored by Bart Van Assche
      This patch prevents sparse from reporting the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6a934bb8
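      The warnings above are the classic symptom of annotating the pointer
      variable itself instead of the data it points to. A minimal, hypothetical
      sketch of the kind of declaration fix involved (illustrative only, not the
      exact patch hunk; the _sketch name is made up):

      #include <linux/blkdev.h>
      #include <linux/compat.h>
      #include <linux/uaccess.h>

      /* Sketch only: shows the __user placement that sparse expects. */
      static int compat_hdio_ioctl_sketch(struct block_device *bdev, fmode_t mode,
                                          unsigned int cmd, unsigned long arg)
      {
              unsigned long __user *p;        /* fixed: p points to __user memory */
              /* before: "unsigned long *__user p;" marked p itself as living in
               * user space, which produces the "different address spaces" and
               * "dereference of noderef expression" warnings quoted above */
              unsigned long kval;
              int error;

              p = compat_alloc_user_space(sizeof(*p));
              error = __blkdev_driver_ioctl(bdev, mode, cmd, (unsigned long)p);
              if (error)
                      return error;

              if (get_user(kval, p) ||
                  put_user(kval, (compat_ulong_t __user *)compat_ptr(arg)))
                      return -EFAULT;
              return 0;
      }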
    • block: remove blk_free_devt in add_partition · 47570848
      Authored by weiping zhang
      put_device(pdev) eventually calls pdev->type->release, and blk_free_devt()
      is already called in part_release(), so remove the redundant call.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      47570848
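      A hedged sketch of the release chain this relies on (the _sketch functions
      are simplified stand-ins for part_release() and the add_partition() error
      path; blk_free_devt() and put_device() are the real interfaces, with
      blk_free_devt() declared in the block layer's private blk.h):

      #include <linux/device.h>
      #include <linux/genhd.h>

      static void part_release_sketch(struct device *pdev)
      {
              blk_free_devt(pdev->devt);      /* the devt is already freed here */
              /* ... free the hd_struct itself ... */
      }

      static int add_partition_error_path_sketch(struct device *pdev)
      {
              int err = device_add(pdev);

              if (err) {
                      /* blk_free_devt(pdev->devt);  <- the removed, redundant call */
                      put_device(pdev);       /* last ref dropped: pdev->type->release
                                               * ends up in part_release(), which
                                               * frees the devt anyway */
                      return err;
              }
              return 0;
      }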
    • bio-integrity: Fix regression if profile verify_fn is NULL · 97e05463
      Authored by Milan Broz
      The dm-integrity target registers an integrity profile that has both the
      generate_fn and verify_fn callbacks set to NULL.
      
      This is used if dm-integrity is stacked under a dm-crypt device
      for authenticated encryption (integrity payload contains authentication
      tag and IV seed).
      
      In this case the verification is done through dm-crypt's own crypto API
      processing; the integrity profile is only the holder of this data.
      (The memory is owned by dm-crypt as well.)
      
      After the commit (and previous changes)
        Commit 7c20f116
        Author: Christoph Hellwig <hch@lst.de>
        Date:   Mon Jul 3 16:58:43 2017 -0600
      
          bio-integrity: stop abusing bi_end_io
      
      we get this crash:
      
      : BUG: unable to handle kernel NULL pointer dereference at   (null)
      : IP:   (null)
      : *pde = 00000000
      ...
      :
      : Workqueue: kintegrityd bio_integrity_verify_fn
      : task: f48ae180 task.stack: f4b5c000
      : EIP:   (null)
      : EFLAGS: 00210286 CPU: 0
      : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
      : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
      :  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
      : Call Trace:
      :  ? bio_integrity_process+0xe3/0x1e0
      :  bio_integrity_verify_fn+0xea/0x150
      :  process_one_work+0x1c7/0x5c0
      :  worker_thread+0x39/0x380
      :  kthread+0xd6/0x110
      :  ? process_one_work+0x5c0/0x5c0
      :  ? kthread_worker_fn+0x100/0x100
      :  ? kthread_worker_fn+0x100/0x100
      :  ret_from_fork+0x19/0x24
      : Code:  Bad EIP value.
      : EIP:   (null) SS:ESP: 0068:f4b5dea4
      : CR2: 0000000000000000
      
      The patch simply skips the whole verify workqueue if verify_fn is set to NULL.
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      [hch: trivial whitespace fix]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      97e05463
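      The shape of the fix, as a hedged sketch (simplified from the end_io path
      in bio-integrity.c; the exact in-tree condition may differ): only queue
      the kintegrityd verify work when the profile actually provides a verify_fn.

      #include <linux/bio.h>
      #include <linux/blkdev.h>
      #include <linux/workqueue.h>

      static bool bio_integrity_endio_sketch(struct bio *bio)
      {
              struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
              struct bio_integrity_payload *bip = bio_integrity(bio);

              if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
                  (bip->bip_flags & BIP_BLOCK_INTEGRITY) && bi->profile->verify_fn) {
                      /* verify_fn != NULL: defer verification to kintegrityd */
                      INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
                      queue_work(kintegrityd_wq, &bip->bip_work);
                      return false;
              }

              /* verify_fn == NULL (e.g. dm-integrity under dm-crypt): skip it */
              bio_integrity_free(bio);
              return true;
      }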
    • block, bfq: fix error handle in bfq_init · 37dcd657
      Authored by weiping zhang
      If elv_register() fails, bfq_pool should be freed.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      37dcd657
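      A hedged sketch of the corrected error handling in a module init of this
      shape (the label and the surrounding steps are illustrative; elv_register(),
      KMEM_CACHE() and kmem_cache_destroy() are the real interfaces):

      #include <linux/elevator.h>
      #include <linux/slab.h>

      static int __init bfq_init_sketch(void)
      {
              int ret;

              bfq_pool = KMEM_CACHE(bfq_queue, 0);
              if (!bfq_pool)
                      return -ENOMEM;

              ret = elv_register(&iosched_bfq_mq);
              if (ret)
                      goto err_free_pool;

              return 0;

      err_free_pool:
              kmem_cache_destroy(bfq_pool);   /* previously leaked when
                                               * elv_register() failed */
              return ret;
      }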
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
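      In concrete terms, the bio gains a gendisk pointer plus a partition index
      in place of bi_bdev. A hedged sketch of the resulting fields and of what
      the helper that sets them boils down to (the _sketch names are illustrative;
      the field and helper names follow the mainline change):

      #include <linux/genhd.h>

      struct bio_sketch {
              struct gendisk  *bi_disk;       /* one gendisk always exists per device */
              u8              bi_partno;      /* consumed by the partition remapping
                                               * in generic_make_request()            */
              /* ... rest of struct bio ... */
      };

      /* roughly what the bio_set_dev() helper expands to */
      #define bio_set_dev_sketch(bio, bdev)                   \
      do {                                                    \
              (bio)->bi_disk   = (bdev)->bd_disk;             \
              (bio)->bi_partno = (bdev)->bd_partno;           \
      } while (0)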
    • block: add a __disk_get_part helper · 807d4af2
      Authored by Christoph Hellwig
      This helper allows looking up a partition under RCU protection without
      grabbing a reference to it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      807d4af2
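      A hedged sketch of what such a helper looks like (close to the mainline
      version, but treat it as illustrative): the caller must hold rcu_read_lock()
      and must not use the result after dropping it, since no reference is taken.

      #include <linux/genhd.h>
      #include <linux/rcupdate.h>

      static struct hd_struct *__disk_get_part_sketch(struct gendisk *disk, int partno)
      {
              struct disk_part_tbl *ptbl = rcu_dereference(disk->part_tbl);

              if (unlikely(partno < 0 || partno >= ptbl->len))
                      return NULL;
              return rcu_dereference(ptbl->part[partno]);
      }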
    • block: Warn if blk_queue_rq_timed_out() is called for a blk-mq queue · 130d733a
      Authored by Bart Van Assche
      The timeout handler set by blk_queue_rq_timed_out() is only used
      in single queue mode. Calling this function for blk-mq drivers is
      wrong. Hence issue a warning if this function is called by a blk-mq
      driver.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      130d733a
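      A hedged sketch of the check (the _sketch wrapper is illustrative;
      q->mq_ops being non-NULL is how the block layer tells a blk-mq queue
      apart from a legacy single-queue one):

      #include <linux/blkdev.h>

      static void blk_queue_rq_timed_out_sketch(struct request_queue *q,
                                                rq_timed_out_fn *fn)
      {
              WARN_ON_ONCE(q->mq_ops);        /* blk-mq drivers must not call this */
              q->rq_timed_out_fn = fn;        /* only used on the legacy path */
      }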
  4. 18 Aug 2017, 6 commits
  5. 11 Aug 2017, 3 commits
    • cfq: Give a chance for arming slice idle timer in case of group_idle · b3193bc0
      Authored by Ritesh Harjani
      In the scenario below, blkio cgroups do not work as per their assigned
      weights:
      1. When the underlying device is non-rotational, with a single HW queue
      of depth >= CFQ_HW_QUEUE_MIN.
      2. When the use case is forming two blkio cgroups, cg1 (weight 1000)
      and cg2 (weight 100), and two processes (file1 and file2) doing sync
      IO in their respective blkio cgroups.

      For the above use case, the result of fio (without this patch):
      file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan  1 19:41:49 1970
        write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
      <...>
      file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan  1 19:41:49 1970
        write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
      <...>
      // Both processes get equal BW even though they belong to different
      // cgroups with weights of 1000 (cg1) and 100 (cg2)
      
      In the above case (for non-rotational NCQ devices), as soon as the
      request from cg1 completes, and even though cg1 is provided with the
      higher set_slice=10, the CFQ algorithm expires this group when the
      driver tries to fetch the next request, without providing any idle
      time or weight priority, and schedules another cfq group (in this
      case cg2). Thus both cfq groups (cg1 and cg2) keep alternating for
      the disk time, and cgroup weight based scheduling is lost.
      
      The patch below gives the CFQ algorithm (cfq_arm_slice_timer) a chance
      to arm the slice timer when group_idle is enabled.
      If group_idle is not required either (including for non-rotational
      NCQ drives), group_idle must be explicitly set to 0 via sysfs for
      such cases.
      
      With this patch, the result of fio (for the above use case):
      file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan  1 00:06:08 1970
        write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
      <..>
      file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan  1 00:06:08 1970
        write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
      <..>
      // Here each process's BW matches its respective cgroup weight.
      Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b3193bc0
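      A hedged sketch of the kind of change this describes inside
      cfq_arm_slice_timer() (illustrative, not the literal hunk; struct cfq_data
      and its fields are internal to cfq-iosched.c): the early bail-out for
      non-rotational NCQ devices no longer fires when group_idle is in use, so
      the idle timer can still be armed and group weights are honoured.

      #include <linux/blkdev.h>
      #include <linux/hrtimer.h>

      static void cfq_arm_slice_timer_sketch(struct cfq_data *cfqd)
      {
              u64 sl;

              /* new: do not skip idling outright if group_idle is enabled */
              if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag &&
                  !cfqd->cfq_group_idle)
                      return;

              /* ... choose between slice_idle and group_idle, then arm ... */
              sl = cfqd->cfq_slice_idle ?: cfqd->cfq_group_idle;
              hrtimer_start(&cfqd->idle_slice_timer, ns_to_ktime(sl),
                            HRTIMER_MODE_REL);
      }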
    • block, bfq: boost throughput with flash-based non-queueing devices · edaf9428
      Authored by Paolo Valente
      When a queue associated with a process remains empty, there are cases
      where throughput gets boosted if the device is idled to await the
      arrival of a new I/O request for that queue. Currently, BFQ assumes
      that one of these cases is when the device has no internal queueing
      (regardless of the properties of the I/O being served). Unfortunately,
      this condition has proved to be too general. So, this commit refines it
      as "the device has no internal queueing and is rotational".
      
      This refinement provides a significant throughput boost with random
      I/O, on flash-based storage without internal queueing. For example, on
      a HiKey board, throughput increases by up to 125%, growing, e.g., from
      6.9MB/s to 15.6MB/s with two or three random readers in parallel.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      edaf9428
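      A hedged, heavily simplified sketch of the refined condition (the _sketch
      function and variable names are illustrative; the real decision in BFQ
      folds in several more factors):

      #include <linux/blkdev.h>

      static bool idling_boosts_thr_sketch(struct bfq_data *bfqd)
      {
              bool has_internal_queueing = bfqd->hw_tag;
              bool rotational = !blk_queue_nonrot(bfqd->queue);

              /* old, too general: return !has_internal_queueing;          */
              /* new: also require a rotational device, so flash without
               * internal queueing is no longer idled on random I/O        */
              return !has_internal_queueing && rotational;
      }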
    • block,bfq: refactor device-idling logic · d5be3fef
      Authored by Paolo Valente
      The logic that decides whether to idle the device is scattered across
      three functions. Almost all of the logic is in the function
      bfq_bfqq_may_idle, but (1) part of the decision is made in
      bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
      switch off idling regardless of the output of bfq_bfqq_may_idle. In
      addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
      their decisions as a function of parameters that are used, for similar
      purposes, also in bfq_bfqq_may_idle. This commit addresses these
      issues by moving all the logic into bfq_bfqq_may_idle.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d5be3fef
  6. 10 Aug 2017, 7 commits
  7. 02 Aug 2017, 1 commit
  8. 01 Aug 2017, 1 commit
    • blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled · b7a71e66
      Authored by Jens Axboe
      We recently had a bug in the IPR SCSI driver, where it would end up
      making the SCSI mid layer run the mq hardware queue with interrupts
      disabled. This isn't legal, since the software queue locking relies
      on never being grabbed from interrupt context. Additionally, drivers
      that set BLK_MQ_F_BLOCKING may schedule from this context.
      
      Add a WARN_ON_ONCE() to catch bad users up front.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7a71e66
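      A hedged sketch of where the check sits (the _sketch wrapper is
      illustrative and the actual dispatch logic is elided):

      #include <linux/blk-mq.h>
      #include <linux/interrupt.h>

      static void __blk_mq_run_hw_queue_sketch(struct blk_mq_hw_ctx *hctx)
      {
              /*
               * The software-queue locking assumes it is never taken from
               * interrupt context, and BLK_MQ_F_BLOCKING drivers may sleep
               * here, so catch bad callers up front.
               */
              WARN_ON_ONCE(in_interrupt());

              /* ... flush the software queues / scheduler into the driver ... */
      }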
  9. 29 Jul 2017, 3 commits
  10. 25 Jul 2017, 1 commit
  11. 24 Jul 2017, 1 commit
  12. 12 Jul 2017, 2 commits
    • bfq: dispatch request to prevent queue stalling after the request completion · 3f7cb4f4
      Authored by Hou Tao
      There are mq devices (e.g., virtio-blk, nbd and loopback) which don't
      invoke blk_mq_run_hw_queues() after the completion of a request.
      If bfq is enabled on these devices and the slice_idle attribute or
      the strict_guarantees attribute is set to zero, it is possible that,
      after a request completes, the remaining requests of a busy bfq queue
      will stall in the bfq scheduler until a new request arrives.
      
      To fix the scheduler latency problem, we need to check whether or not
      all issued requests have completed, and dispatch more requests to the
      driver if there is no request in the driver.
      
      The problem can be reproduced by running the following script
      on a virtio-blk device with nr_hw_queues set to 1:
      
      #!/bin/sh
      
      dev=vdb
      # mount point for dev
      mp=/tmp/mnt
      cd $mp
      
      job=strict.job
      cat <<EOF > $job
      [global]
      direct=1
      bs=4k
      size=256M
      rw=write
      ioengine=libaio
      iodepth=128
      runtime=5
      time_based
      
      [1]
      filename=1.data
      
      [2]
      new_group
      filename=2.data
      EOF
      
      echo bfq > /sys/block/$dev/queue/scheduler
      echo 1 > /sys/block/$dev/queue/iosched/strict_guarantees
      fio $job
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f7cb4f4
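      A hedged sketch of the mechanism described above (not the literal hunk;
      the _sketch function and the exact fields used are illustrative): after a
      completion, if nothing is left in the driver but bfq still holds queued
      requests, re-run the hardware queues so devices that never call
      blk_mq_run_hw_queues() on their own cannot stall.

      #include <linux/blk-mq.h>

      static void bfq_completed_request_sketch(struct bfq_data *bfqd,
                                               struct request_queue *q)
      {
              bfqd->rq_in_driver--;

              if (bfqd->rq_in_driver == 0 && bfqd->queued > 0)
                      blk_mq_run_hw_queues(q, true);  /* async re-run */
      }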
    • bfq: fix typos in comments about B-WF2Q+ algorithm · 38c91407
      Authored by Hou Tao
      The start time of an eligible entity should be less than or equal to
      the current virtual time, and an entity in the idle tree has a finish
      time greater than the current virtual time.
      Signed-off-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      38c91407
  13. 11 Jul 2017, 1 commit
    • block: call bio_uninit in bio_endio · b222dd2f
      Authored by Shaohua Li
      bio_free isn't a good place to free cgroup info. There are a lot of
      cases where a bio is allocated in a special way (for example, on the
      stack) and bio_put, and hence bio_free, is never called on it, so we
      leak memory. This patch moves the free to bio_endio, which should be
      called anyway. The bio_uninit call in bio_free is kept, in case
      bio_endio is never called on the bio.
      
      This assumes ->bi_end_io() doesn't access cgroup info, which seems true
      in my audit.
      
      This along with Christoph's integrity patch should fix the memory leak
      issue.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b222dd2f
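      A hedged sketch of the new shape of bio_endio() (chaining, integrity and
      tracing are elided; the _sketch name is illustrative, bio_uninit() is the
      real helper):

      #include <linux/bio.h>

      static void bio_endio_sketch(struct bio *bio)
      {
              /* ... bio_remaining_done() / integrity / chain handling ... */

              bio_uninit(bio);        /* frees the cgroup association even for
                                       * bios that never reach bio_put() and
                                       * therefore never hit bio_free()       */
              if (bio->bi_end_io)
                      bio->bi_end_io(bio);
      }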
  14. 06 Jul 2017, 1 commit
    • block: Fix __blkdev_issue_zeroout loop · 615d22a5
      Authored by Damien Le Moal
      The BIO issuing loop in __blkdev_issue_zeroout() is allocating BIOs
      with a maximum number of bvec (pages) equal to
      
      min(nr_sects, (sector_t)BIO_MAX_PAGES)
      
      This works since the requested number of bvecs will always be limited
      to the absolute maximum number supported (BIO_MAX_PAGES), but it is
      inefficient because too many bvec entries may be requested due to the
      different units used in the min() operation (number of sectors vs
      number of pages).
      To fix this, introduce the helper __blkdev_sectors_to_bio_pages() to
      correctly calculate the number of bvecs for zeroout BIOs as the issuing
      loop progresses. The calculation is done using consistent units and
      makes sure that the number of pages returned is at least 1 (for cases
      where the number of sectors is less than the number of sectors in
      a page).
      
      Also remove a trailing space after the bit shift in the internal loop
      min() call.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      615d22a5
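      A hedged sketch of the unit-consistent helper described above (close to
      the mainline version, but treat it as illustrative):

      #include <linux/bio.h>
      #include <linux/kernel.h>

      static unsigned int __blkdev_sectors_to_bio_pages_sketch(sector_t nr_sects)
      {
              /* convert 512-byte sectors to pages, rounding up so that a
               * partial page still requests one bvec */
              sector_t pages = DIV_ROUND_UP_SECTOR_T(nr_sects, PAGE_SIZE / 512);

              return min(pages, (sector_t)BIO_MAX_PAGES);
      }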