1. 05 Oct, 2017 (2 commits)
    • blk-mq: document the need to have STARTED and COMPLETED share a byte · fc13457f
      Jens Axboe authored
      For memory ordering guarantees on stores, we need to ensure that
      these two bits share the same byte of storage in the unsigned
      long. Add a comment as to why, and a BUILD_BUG_ON() to ensure that
      we don't violate this requirement.
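
      A minimal sketch of the check being described (flag names as used by
      blk-mq at the time; the exact hunk may differ):

          /*
           * STARTED and COMPLETE must share a byte of rq->atomic_flags:
           * on architectures that implement bitops with byte-wide stores,
           * placing them in different bytes would void the ordering
           * guarantees relied upon here.
           */
          BUILD_BUG_ON(REQ_ATOM_STARTED / BITS_PER_BYTE !=
                       REQ_ATOM_COMPLETE / BITS_PER_BYTE);
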
      Suggested-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: attempt to fix atomic flag memory ordering · a7af0af3
      Peter Zijlstra authored
      Attempt to untangle the ordering in blk-mq. The patch introducing the
      single smp_mb__before_atomic() is obviously broken in that it doesn't
      clearly specify a pairing barrier and an obtained guarantee.
      
      The comment is further misleading in that it hints that the
      deadline store and the COMPLETE store also need to be ordered, but
      AFAICT there is no such dependency. However, what does appear to be
      important is the clear happening _after_ the store, and that worked by
      pure accident.
      
      This clarifies blk_mq_start_request() -- we should not get there with
      STARTING set -- this simplifies the code and makes the barrier usage
      sane (the old code could be read to allow not having _any_ atomic after
      the barrier, in which case the barrier hasn't got anything to order). We
      then also introduce the missing pairing barrier for it.
      
      Also down-grade the barrier to smp_wmb(), this is cheaper for
      PowerPC/ARM and doesn't cost anything extra on x86.
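
      A hedged sketch of the resulting pairing (field and flag names
      assumed, not the exact patch):

          /* CPU0, blk_mq_start_request(): publish deadline, then STARTED */
          rq->deadline = jiffies + timeout;
          smp_wmb();      /* order the deadline store before the flag store */
          set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);

          /* CPU1, timeout path: the pairing read side */
          if (test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
                  smp_rmb();              /* pairs with the smp_wmb() above */
                  deadline = READ_ONCE(rq->deadline);
          }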
      
      And it documents the STARTING vs COMPLETE ordering. That said, I've not
      been entirely successful in reverse engineering the blk-mq state
      machine, so there might still be more funnies around timeout vs
      requeue.
      
      If I got anything wrong, feel free to educate me by adding comments to
      clarify things ;-)
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Fixes: 538b7534 ("blk-mq: request deadline must be visible before marking rq as started")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 03 Oct, 2017 (6 commits)
    • block: move __elv_next_request to blk-core.c · 9c988374
      Christoph Hellwig authored
      No need to have this helper inline in a header.  Also drop the __ prefix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: decrease burst size when queues in burst exit · 7cb04004
      Paolo Valente authored
      If many queues belonging to the same group happen to be created
      shortly after each other, then the concurrent processes associated
      with these queues typically have a common goal, and they get it done
      as soon as possible if not hampered by device idling.  Examples are
      processes spawned by git grep, or by systemd during boot. As for
      device idling, this mechanism is currently necessary for weight
      raising to succeed in its goal: privileging I/O.  In view of these
      facts, BFQ does not provide the above queues with either weight
      raising or device idling.
      
      On the other hand, a burst of queue creations may be caused also by
      the start-up of a complex application. In this case, these queues need
      usually to be served one after the other, and as quickly as possible,
      to maximise responsiveness. Therefore, in this case the best strategy
      is to weight-raise all the queues created during the burst, i.e., the
      exact opposite of the strategy for the above case.
      
      To distinguish between the two cases, BFQ uses an empirical burst-size
      threshold, found through extensive tests and monitoring of daily
      usage. Only large bursts, i.e., bursts with a size above this
      threshold, are considered as generated by a high number of parallel
      processes. In this respect, upstart-based boot proved to be rather
      hard to detect as generating a large burst of queue creations, because
      with upstart most of the queues created in a burst exit *before* the
      next queues in the same burst are created. To address this issue, I
      changed the burst-detection mechanism so as to not decrease the size
      of the current burst even if one of the queues in the burst is
      eliminated.
      
      Unfortunately, this missing decrease causes false positives on very
      fast systems: on the start-up of a complex application, such as
      libreoffice writer, so many queues are created, served and exited
      shortly after each other, that a large burst of queue creations is
      wrongly detected as occurring. These false positives just disappear if
      the size of a burst is decreased when one of the queues in the burst
      exits. This commit restores the missing burst-size decrease, relying
      on the fact that upstart is apparently unlikely to be used on systems
      running this and future versions of the kernel.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
      Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: let early-merged queues be weight-raised on split too · 894df937
      Paolo Valente authored
      A just-created bfq_queue, say Q, may happen to be merged with another
      bfq_queue on the very first invocation of the function
      __bfq_insert_request. In such a case, even if Q would clearly deserve
      interactive weight raising (as it has just been created),
      bfq_add_request is never invoked for Q, and thus weight raising is
      never activated for Q. As a consequence, when the state of Q
      is saved for a possible future restore, after a split of Q from the
      other bfq_queue(s), such a state happens to be (unjustly)
      non-weight-raised. Then the bfq_queue will not enjoy any weight
      raising on the split, even if it should still be in an interactive
      weight-raising period when the split occurs.
      
      This commit solves this problem as follows, for a just-created
      bfq_queue that is being early-merged: it stores directly, in the saved
      state of the bfq_queue, the weight-raising state that would have been
      assigned to the bfq_queue if not early-merged.
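
      A sketch of the saved state being filled in (field and helper names
      assumed):

          if (unlikely(bfq_bfqq_just_created(bfqq))) {
                  /* Q has no wr state of its own yet: save the state it
                   * would have been given if not early-merged */
                  bic->saved_wr_coeff = bfqd->bfq_wr_coeff;
                  bic->saved_wr_cur_max_time = bfq_wr_duration(bfqd);
                  bic->saved_last_wr_start_finish = jiffies;
          } else {
                  bic->saved_wr_coeff = bfqq->wr_coeff;
                  bic->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
                  bic->saved_last_wr_start_finish = bfqq->last_wr_start_finish;
          }
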
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: check and switch back to interactive wr also on queue split · 3e2bdd6d
      Paolo Valente authored
      As already explained in the message of commit "block, bfq: fix
      wrong init of saved start time for weight raising", if a soft
      real-time weight-raising period happens to be nested in a larger
      interactive weight-raising period, then BFQ restores the interactive
      weight raising at the end of the soft real-time weight raising. In
      particular, BFQ checks whether the latter has ended only on request
      dispatches.
      
      Unfortunately, the above scheme fails to restore interactive weight
      raising in the following corner case: if a bfq_queue, say Q,
      1) Is merged with another bfq_queue while it is in a nested soft
      real-time weight-raising period. The weight-raising state of Q is
      then saved, and not considered any longer until a split occurs.
      2) Is split from the other bfq_queue(s) at a time instant when its
      soft real-time weight raising is already finished.
      On the split, while resuming the previous, soft real-time
      weight-raised state of the bfq_queue Q, BFQ checks whether the
      current soft real-time weight-raising period is actually over. If so,
      BFQ switches weight raising off for Q, *without* checking whether the
      soft real-time period was actually nested in a not-yet-finished
      interactive weight-raising period.
      
      This commit addresses this issue by adding the above missing check in
      bfq_queue splits, and restoring interactive weight raising if needed.
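
      A sketch of the added check on split (helper and field names assumed):

          /* is the saved soft-rt weight-raising period already over? */
          if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time &&
              time_is_before_jiffies(bfqq->last_wr_start_finish +
                                     bfqq->wr_cur_max_time)) {
                  if (time_is_after_jiffies(bfqq->wr_start_at_switch_to_srt +
                                            bfq_wr_duration(bfqd))) {
                          /* nested interactive period still running:
                           * switch back to interactive weight raising */
                          bfqq->wr_coeff = bfqd->bfq_wr_coeff;
                          bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
                          bfqq->last_wr_start_finish =
                                  bfqq->wr_start_at_switch_to_srt;
                  } else {
                          bfqq->wr_coeff = 1;     /* wr is really over */
                  }
          }
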
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix wrong init of saved start time for weight raising · 4baa8bb1
      Paolo Valente authored
      This commit fixes a bug that causes bfq to fail to guarantee a high
      responsiveness on some drives, if there is heavy random read+write I/O
      in the background. More precisely, such a failure allowed this bug to
      be found [1], but the bug may well cause other yet unreported
      anomalies.
      
      BFQ raises the weight of the bfq_queues associated with soft real-time
      applications, to privilege the I/O, and thus reduce latency, for these
      applications. This mechanism is named soft-real-time weight raising in
      BFQ. A soft real-time period may happen to be nested into an
      interactive weight raising period, i.e., it may happen that, when a
      bfq_queue switches to a soft real-time weight-raised state, the
      bfq_queue is already being weight-raised because deemed interactive
      too. In this case, BFQ saves, in a special variable
      wr_start_at_switch_to_srt, the time instant when the interactive
      weight-raising period started for the bfq_queue, i.e., the time
      instant when BFQ started to deem the bfq_queue interactive. This value
      is then used to check whether the interactive weight-raising period
      would still be in progress when the soft real-time weight-raising
      period ends.  If so, interactive weight raising is restored for the
      bfq_queue. This restore is useful, in particular, because it prevents
      bfq_queues from losing their interactive weight raising prematurely,
      as a consequence of spurious, short-lived soft real-time
      weight-raising periods caused by wrong detections as soft real-time.
      
      If, instead, a bfq_queue switches to soft-real-time weight raising
      while it *is not* already in an interactive weight-raising period,
      then the variable wr_start_at_switch_to_srt has no meaning during the
      following soft real-time weight-raising period. Unfortunately, the
      handling of this case is wrong in BFQ: not only is the variable not
      flagged as meaningless, but it is also set to the time when
      the switch to soft real-time weight-raising occurs. This may cause an
      interactive weight-raising period to be considered mistakenly as still
      in progress, and thus a spurious interactive weight-raising period to
      start for the bfq_queue, at the end of the soft-real-time
      weight-raising period. In particular the spurious interactive
      weight-raising period will be considered as still in progress, if the
      soft-real-time weight-raising period does not last very long. The
      bfq_queue will then be wrongly privileged and, if I/O bound, will
      unjustly steal bandwidth from truly interactive or soft real-time
      bfq_queues, harming responsiveness and low latency.
      
      This commit fixes this issue by just setting wr_start_at_switch_to_srt
      to minus infinity (the farthest past time instant according to the
      jiffies macros): when the soft-real-time weight-raising period ends,
      certainly no interactive weight-raising period will be considered as
      still in progress.
      
      [1] Background I/O Type: Random - Background I/O mix: Reads and writes
      - Application to start: LibreOffice Writer in
      http://www.phoronix.com/scan.php?page=news_item&px=Linux-4.13-IO-Laptop
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Angelo Ruocco <angeloruocco90@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Mirko Montanari <mirkomontanari91@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: wire up completion notifier for laptop mode · 7beb2f84
      Jens Axboe authored
      For some reason, the laptop mode IO completion notifier was never wired
      up for blk-mq. Ensure that we trigger the callback appropriately, to arm
      the laptop mode flush timer.
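
      A sketch of the wiring (exact call site in the blk-mq completion path
      assumed), mirroring what the legacy request path already does:

          if (unlikely(laptop_mode) && !blk_rq_is_passthrough(rq))
                  laptop_io_completion(rq->q->backing_dev_info);
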
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 01 Oct, 2017 (1 commit)
  4. 30 Sep, 2017 (1 commit)
  5. 26 Sep, 2017 (1 commit)
  6. 25 Sep, 2017 (3 commits)
    • block: fix a crash caused by wrong API · f5c156c4
      Shaohua Li authored
      part_stat_show takes a partition device, not a disk, so we should use
      part_to_disk.
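
      A sketch of the corrected lookup (surrounding declarations assumed):

          struct hd_struct *p = dev_to_part(dev);
          /* dev is a partition device, so resolve the disk through the
           * part; dev_to_disk(dev) would assume a whole-disk device */
          struct request_queue *q = part_to_disk(p)->queue;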
      
      Fixes: d62e26b3 ("block: pass in queue to inflight accounting")
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: Fix potential deadlock between delete & sysfs ops · 5acb3cc2
      Waiman Long authored
      The lockdep code had reported the following unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(s_active#228);
                                     lock(&bdev->bd_mutex/1);
                                     lock(s_active#228);
        lock(&bdev->bd_mutex);
      
       *** DEADLOCK ***
      
      The deadlock may happen when one task (CPU1) is trying to delete a
      partition in a block device and another task (CPU0) is accessing
      tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
      partition.
      
      The s_active isn't an actual lock. It is a reference count (kn->count)
      on the sysfs (kernfs) file. Removal of a sysfs file, however, requires
      waiting until all the references are gone. The reference count is
      treated like a rwsem by the lockdep instrumentation code.
      
      The fact that a thread is in the sysfs callback method or in the
      ioctl call means there is a reference to the opened sysfs or device
      file. That should prevent the underlying block structure from being
      removed.
      
      Instead of using bd_mutex in the block_device structure, a new
      blk_trace_mutex is now added to the request_queue structure to protect
      access to the blk_trace structure.
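
      A sketch of the new locking (placement assumed):

          /* in struct request_queue */
          struct mutex blk_trace_mutex;   /* protects q->blk_trace */

          /* in the blktrace ioctl path, instead of bdev->bd_mutex */
          mutex_lock(&q->blk_trace_mutex);
          ret = blk_trace_ioctl(bdev, cmd, arg);
          mutex_unlock(&q->blk_trace_mutex);
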
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      
      Fix typo in patch subject line, and prune a comment detailing how
      the code used to work.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bsg-lib: don't free job in bsg_prepare_job · f507b54d
      Christoph Hellwig authored
      The job structure is allocated as part of the request, so we should not
      free it in the error path of bsg_prepare_job.
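
      A sketch of the corrected error path (labels assumed):

          failjob_rls_rqst_payload:
                  kfree(job->request_payload.sg_list);
          failjob_rls_job:
                  /* no kfree(job): the job is allocated as part of the
                   * request and is freed together with it */
                  return -ENOMEM;
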
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 12 Sep, 2017 (1 commit)
    • block: directly insert blk-mq request from blk_insert_cloned_request() · 157f377b
      Jens Axboe authored
      A NULL pointer crash was reported for the case of having the BFQ IO
      scheduler attached to the underlying blk-mq paths of a DM multipath
      device.  The crash occurred in blk_mq_sched_insert_request()'s call to
      e->type->ops.mq.insert_requests().
      
      Paolo Valente correctly summarized why the crash occurred with:
      "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
      blk_rq_prep_clone) creates a cloned request without invoking
      e->type->ops.mq.prepare_request for the target elevator e.  The cloned
      request is therefore not initialized for the scheduler, but it is
      however inserted into the scheduler by blk_mq_sched_insert_request."
      
      All said, a request-based DM multipath device's IO scheduler should be
      the only one used -- when the original requests are issued to the
      underlying paths as cloned requests they are inserted directly in the
      underlying dispatch queue(s) rather than through an additional elevator.
      
      But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO
      schedulers") switched blk_insert_cloned_request() from using
      blk_mq_insert_request() to blk_mq_sched_insert_request(), which
      incorrectly added elevator machinery to a call chain that isn't
      supposed to have any.
      
      To fix this introduce a blk-mq private blk_mq_request_bypass_insert()
      that blk_insert_cloned_request() calls to insert the request without
      involving any elevator that may be attached to the cloned request's
      request_queue.
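
      A sketch of the bypass helper's shape (assumed from the description,
      not the exact patch):

          void blk_mq_request_bypass_insert(struct request *rq)
          {
                  struct blk_mq_ctx *ctx = rq->mq_ctx;
                  struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q,
                                                                ctx->cpu);

                  /* straight to the dispatch list: no elevator involved */
                  spin_lock(&hctx->lock);
                  list_add_tail(&rq->queuelist, &hctx->dispatch);
                  spin_unlock(&hctx->lock);

                  blk_mq_run_hw_queue(hctx, false);
          }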
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 11 Sep, 2017 (2 commits)
  9. 09 Sep, 2017 (2 commits)
  10. 02 Sep, 2017 (5 commits)
  11. 01 Sep, 2017 (1 commit)
    • compat_hdio_ioctl: Fix a declaration · 8363dae2
      Bart Van Assche authored
      This patch prevents sparse from reporting the following warnings:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
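
      A sketch of the declaration fix:

          /* was: unsigned long *p; the missing __user annotation is what
           * made sparse flag the compat_alloc_user_space() assignment and
           * the put_user() call */
          unsigned long __user *p;

          p = compat_alloc_user_space(sizeof(unsigned long));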
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. 31 Aug, 2017 (3 commits)
    • block, bfq: guarantee update_next_in_service always returns an eligible entity · 24d90bb2
      Paolo Valente authored
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, then it doesn't
      invoke bfq_lookup_next_entity to get the next-in-service entity. In
      contrast, it follows a shorter path: if E happens to be eligible (see
      commit "bfq-sq-mq: make lookup_next_entity push up vtime on
      expirations" for details on eligibility) and to have a lower virtual
      finish time than the current candidate as next-in-service entity, then
      E directly becomes the next-in-service entity. Unfortunately, there is
      a corner case for which this shorter path makes
      bfq_update_next_in_service choose a non-eligible entity: it occurs if
      both E and the current next-in-service entity happen to be
      non-eligible when bfq_update_next_in_service is invoked. In this case, E
      is not set as next-in-service, and, since bfq_lookup_next_entity is
      not invoked, the state of the parent entity is not updated so as to
      end up with an eligible entity as the proper next-in-service entity.
      
      In this respect, next-in-service is actually allowed to be
      non-eligible while some queue is in service: since no system-virtual-time
      push-up can be performed in that case (see again commit "bfq-sq-mq:
      make lookup_next_entity push up vtime on expirations" for details),
      next-in-service is chosen, speculatively, as a function of the
      possible value that the system virtual time may get after a push
      up. But the correctness of the schedule breaks if next-in-service is
      still a non-eligible entity when it is time to set in service the next
      entity. Unfortunately, this may happen in the above corner case.
      
      This commit fixes this problem by making bfq_update_next_in_service
      invoke bfq_lookup_next_entity not only if the above shorter path
      cannot be taken, but also if the shorter path is taken but fails to
      yield an eligible next-in-service entity.
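
      A sketch of the resulting control flow (helper names assumed):

          /* shorter path: accept E only if it is actually eligible */
          if (new_entity && bfq_entity_eligible(new_entity, vtime) &&
              bfq_gt(next_in_service->finish, new_entity->finish))
                  next_in_service = new_entity;
          else
                  /* shorter path not taken, or it yielded a non-eligible
                   * entity: fall back to the full lookup */
                  next_in_service = bfq_lookup_next_entity(sd, expiration);
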
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: remove direct switch to an entity in higher class · a02195ce
      Paolo Valente authored
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, and finds out
      that E belongs to a higher-priority class than that of the current
      next-in-service entity, then it sets next_in_service directly to
      E. But this may lead to anomalous schedules, because E may happen not
      to be eligible for service: its virtual start time may be higher than
      the system virtual time for its service tree.
      
      This commit addresses this issue by simply removing this direct
      switch.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: make lookup_next_entity push up vtime on expirations · 80294c3b
      Paolo Valente authored
      To provide a very smooth service, bfq starts to serve a bfq_queue
      only if the queue is 'eligible', i.e., if the same queue would
      have started to be served in the ideal, perfectly fair system that
      bfq simulates internally. This is obtained by associating each
      queue with a virtual start time, and by computing a special system
      virtual time quantity: a queue is eligible only if the system
      virtual time has reached the virtual start time of the
      queue. Finally, bfq guarantees that, when a new queue must be set
      in service, there is always at least one eligible entity for each
      active parent entity in the scheduler. To provide this guarantee,
      the function __bfq_lookup_next_entity pushes up, for each parent
      entity on which it is invoked, the system virtual time to the
      minimum among the virtual start times of the entities in the
      active tree for the parent entity (more precisely, the push up
      occurs if the system virtual time happens to be lower than all
      such virtual start times).
      
      There is however a circumstance in which __bfq_lookup_next_entity
      cannot push up the system virtual time for a parent entity, even
      if the system virtual time is lower than the virtual start times
      of all the child entities in the active tree. It happens if one of
      the child entities is in service. In fact, in such a case, there
      is already an eligible entity, the in-service one, even if it may
      not be present in the active tree (because in-service entities
      may be removed from the active tree).
      
      Unfortunately, in the last re-design of the
      hierarchical-scheduling engine, the reset of the pointer to the
      in-service entity for a given parent entity--reset to be done as a
      consequence of the expiration of the in-service entity--always
      happens after the function __bfq_lookup_next_entity has been
      invoked. This causes the function to think that there is still an
      entity in service for the parent entity, and then that the system
      virtual time cannot be pushed up, even if actually such a
      no-more-in-service entity has already been properly reinserted
      into the active tree (or in some other tree if no more
      active). Yet, the system virtual time *had* to be pushed up, to be
      ready to correctly choose the next queue to serve. Because of the
      lack of this push up, bfq may wrongly set in service a queue that
      had been speculatively pre-computed as the possible
      next-in-service queue, but that would no more be the one to serve
      after the expiration and the reinsertion into the active trees of
      the previously in-service entities.
      
      This commit addresses this issue by making
      __bfq_lookup_next_entity properly push up the system virtual time
      if an expiration is occurring.
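
      A sketch of the change (parameter name assumed): the lookup is told
      whether an expiration is in progress, so it can push up the virtual
      time even though the in-service pointer has not been reset yet:

          static struct bfq_entity *
          __bfq_lookup_next_entity(struct bfq_service_tree *st,
                                   bool in_service)
          {
                  /* push up the vtime unless a child entity is truly
                   * still in service; on expiration the caller passes
                   * in_service == false */
                  if (!in_service)
                          bfq_update_vtime(st, bfq_calc_vtime_jump(st));
                  return bfq_first_active_entity(st, st->vtime);
          }
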
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 30 Aug, 2017 (4 commits)
  14. 29 Aug, 2017 (4 commits)
    • block: Make blk_dequeue_request() static · 5034435c
      Damien Le Moal authored
      The only caller of this function is blk_start_request() in the same
      file. Fix blk_start_request() description accordingly.
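
      The change itself is essentially a one-line scope demotion:

          -void blk_dequeue_request(struct request *rq)
          +static void blk_dequeue_request(struct request *rq)
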
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • smp: Avoid using two cache lines for struct call_single_data · 966a9671
      Ying Huang authored
      struct call_single_data is used in IPIs to transfer information between
      CPUs.  Its size is bigger than sizeof(unsigned long) and smaller than
      the cache line size.  Currently it is not allocated with any explicit
      alignment requirements.  This makes it possible for an allocated
      call_single_data to cross two cache lines, which doubles the number of
      cache lines that need to be transferred among CPUs.
      
      This can be fixed by requiring call_single_data to be aligned with the
      size of call_single_data. Currently the size of call_single_data is a
      power of 2.  If we add new fields to call_single_data, we may need to
      add padding to make sure the size of the new definition is a power of 2
      as well.
      
      Fortunately, this is enforced by GCC, which will report bad sizes.
      
      To set the alignment requirement of call_single_data to the size of
      call_single_data, a struct definition and a typedef are used.
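
      A sketch of the resulting definition (field set abridged):

          struct __call_single_data {
                  struct llist_node llist;
                  smp_call_func_t func;
                  void *info;
                  unsigned int flags;
          };

          /* aligning the type to its own size keeps any instance within a
           * single cache line, as long as the size stays a power of 2 */
          typedef struct __call_single_data call_single_data_t
                  __aligned(sizeof(struct __call_single_data));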
      
      To test the effect of the patch, I used the vm-scalability multiple
      thread swap test case (swap-w-seq-mt).  The test will create multiple
      threads and each thread will eat memory until all RAM and part of swap
      is used, so that a huge number of IPIs are triggered when unmapping
      memory.  In the test, the throughput of memory writing improves ~5%
      compared with misaligned call_single_data, because of faster IPIs.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Huang, Ying <ying.huang@intel.com>
      [ Add call_single_data_t and align with size of call_single_data. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • block: fix warning when I/O elevator is changed as request_queue is being removed · e9a823fb
      David Jeffery authored
      There is a race between changing I/O elevator and request_queue removal
      which can trigger the warning in kobject_add_internal.  A program can
      use sysfs to request a change of elevator at the same time another task
      is unregistering the request_queue the elevator would be attached to.
      The elevator's kobject will then attempt to be connected to the
      request_queue in the object tree when the request_queue has just been
      removed from sysfs.  This triggers the warning in kobject_add_internal
      as the request_queue no longer has a sysfs directory:
      
      kobject_add_internal failed for iosched (error: -2 parent: queue)
      ------------[ cut here ]------------
      WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0
      
      To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
      changing the elevator and use the request_queue's sysfs_lock to
      serialize between clearing the flag and the elevator testing the flag.
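
      A sketch of the guard (exact placement in the elevator-switch path
      assumed), executed with q->sysfs_lock held:

          /* the flag is cleared under sysfs_lock on unregister, so this
           * check cannot race with queue removal */
          if (!test_bit(QUEUE_FLAG_REGISTERED, &q->queue_flags))
                  return -ENOENT;
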
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, scheduler: convert xxx_var_store to void · 235f8da1
      weiping zhang authored
      The last parameter "count" is never used in xxx_var_store, so
      convert these functions to void.
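
      A representative conversion (one of the xxx_var_store helpers, shape
      assumed):

          -static ssize_t deadline_var_store(int *var, const char *page,
          -                                  size_t count)
          +static void deadline_var_store(int *var, const char *page)
           {
                   char *p = (char *) page;

                   *var = simple_strtol(p, &p, 10);
          -        return count;
           }
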
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 26 Aug, 2017 (3 commits)
  16. 25 Aug, 2017 (1 commit)