- 02 March 2017, 3 commits
-
-
Submitted by Omar Sandoval
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes because it doesn't update the tags->rqs map. Fix it by making it allocate a scheduler request.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Sagi Grimberg
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the original request was reserved, so we can be sure to allocate a properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Shaohua Li
The blk_mq_tags/requests of a specific hardware queue are mostly used by specific CPUs, which might not be in the same NUMA node as the disk. For example, for an NVMe card in node 0, half of the hardware queues will be used by node 0 and the other half by node 1.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
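A minimal sketch of the node-aware allocation idea this describes, assuming the hardware queue's home node is available as hctx->numa_node (illustrative only, not the patch itself):

```c
#include <linux/blk-mq.h>
#include <linux/slab.h>

/* Illustrative only: allocate per-hctx data on the NUMA node local to the
 * CPUs mapped to that hardware queue (assumed here to be hctx->numa_node),
 * instead of the node of whichever CPU happens to run the setup code. */
static void *alloc_hctx_local_data(struct blk_mq_hw_ctx *hctx, size_t size)
{
	return kzalloc_node(size, GFP_KERNEL, hctx->numa_node);
}
```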
-
- 28 February 2017, 2 commits
-
-
Submitted by Alexey Dobriyan
Now that %z is standardised in C99 there is no reason to support %Z. Unlike %L it doesn't even make format strings smaller. Use BUILD_BUG_ON in a couple of ATM drivers. In case anyone didn't notice, lib/vsprintf.o is about half of SLUB, which in my opinion is quite an achievement. Hopefully this patch inspires someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
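As a quick illustration of the format-specifier change described above (a made-up example, not code from the patch), the standard %zu/%zd length modifiers cover what the kernel-specific %Z variants used to do:

```c
#include <linux/kernel.h>

/* Illustrative only: C99's %z length modifier handles size_t/ssize_t,
 * so the old kernel-specific %Z form is no longer needed. */
static void print_buffer_length(size_t len, ssize_t delta)
{
	pr_info("buffer is %zu bytes (changed by %zd)\n", len, delta);
}
```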
-
Submitted by Masahiro Yamada
Fix typos and add the following to the scripts/spelling.txt: embeded||embedded

Link: http://lkml.kernel.org/r/1481573103-11329-12-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 February 2017, 4 commits
-
-
Submitted by Omar Sandoval
In blk_mq_sched_dispatch_requests(), we call blk_mq_sched_mark_restart() after we dispatch requests left over on our hardware queue dispatch list. This is so we'll go back and dispatch requests from the scheduler. In this case, it's only necessary to restart the hardware queue that we are running; there's no reason to run other hardware queues just because we are using shared tags.

So, split out blk_mq_sched_mark_restart() into two operations, one for just the hardware queue and one for the whole request queue. The core code only needs the hctx variant, but I/O schedulers will want to use both. This also requires adjusting blk_mq_sched_restart_queues() to always check the queue restart flag, not just when using shared tags.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Omar Sandoval
Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue.

One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits:

1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do.
2. It benefits from the existing rolling wakeup code.
3. It avoids punting to another thread just to have it block.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Scott Bauer
During an error on a command, e.g. the user provides the wrong password to unlock a range, we will gracefully terminate the opal session. We want to propagate the original error to userland instead of the result of the session termination, which is almost always a success.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
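The error-propagation rule being described reduces to a pattern like the following sketch (illustrative only, not the sed-opal code):

```c
/*
 * Pattern sketch only: keep the command's original error and do not let
 * the usually-successful session teardown overwrite it.
 */
static int opal_step_status(int cmd_error, int end_session_error)
{
	/* the command's own error wins; otherwise report the teardown result */
	return cmd_error ? cmd_error : end_session_error;
}
```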
-
Submitted by Scott Bauer
Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 23 February 2017, 6 commits
-
-
Submitted by Tetsuo Handa
The IOPRIO_WHO_USER case in sys_ioprio_set()/sys_ioprio_get() uses while_each_thread(), which is unsafe under the RCU lock according to commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Use for_each_thread() (via for_each_process_thread()), which is safe under the RCU lock.

Link: http://lkml.kernel.org/r/201702011947.DBD56740.OMVHOLOtSJFFFQ@I-love.SAKURA.ne.jp
Link: http://lkml.kernel.org/r/1486041779-4401-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
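For reference, the safe iteration pattern referred to above looks roughly like this minimal sketch (not the ioprio code itself; the thread counter is only there to give the loop a body):

```c
#include <linux/sched/signal.h>
#include <linux/rcupdate.h>
#include <linux/printk.h>

/* Minimal sketch of the safe pattern: walk every thread of every process
 * with for_each_process_thread() under rcu_read_lock(), instead of the
 * RCU-unsafe while_each_thread() loop. */
static void count_all_threads(void)
{
	struct task_struct *p, *t;
	unsigned int count = 0;

	rcu_read_lock();
	for_each_process_thread(p, t)
		count++;
	rcu_read_unlock();

	pr_info("%u threads in the system\n", count);
}
```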
-
Submitted by Jens Axboe
The wording in the entries was poor and not understandable by even deities. Kill the selection for the default block scheduler, and impose a policy with sane defaults.

Architected-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
By embedding the function data with the function sequence, we can eliminate the external function data and state variable code. It also made some other small cleanups obvious.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
Add a buffer size check against discovery and response header lengths before we loop over their buffers.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
Add a helper which verifies that the response token is valid and matches the expected value. Merges token_type and response_get_token.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jon Derrick
The short atom parser can return an errno from decoding but does not currently return the error as a signed value. Convert all of the parsers to ssize_t.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
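The general shape of such a signed-return decoder is sketched below (an illustrative toy parser, not the actual sed-opal one):

```c
#include <linux/errno.h>
#include <linux/types.h>

/* Toy decoder, for illustration only: a ssize_t return carries either the
 * number of bytes consumed or a negative errno, so decode failures can
 * propagate to the caller as real error codes. */
static ssize_t decode_one_byte(const u8 *buf, size_t len, u8 *out)
{
	if (len < 1)
		return -EINVAL;

	*out = buf[0];
	return 1;	/* bytes consumed */
}
```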
-
- 22 February 2017, 2 commits
-
-
Submitted by Jan Kara
Iteration over partitions in del_gendisk() omits part0. Add a bdev_unhash_inode() call for the whole device. Otherwise, if the device number gets reused, the bdev inode will still be associated with the old (stale) bdi.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jan Kara
Move bdev_unhash_inode() after invalidate_partition(), as invalidate_partition() looks up the bdev and cannot find the right bdev inode once bdev_unhash_inode() has been called. Thus invalidate_partition() would not invalidate the page cache of the previously used bdev. Also use part_devt() when calling bdev_unhash_inode() instead of manually creating the device number.

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 18 February 2017, 7 commits
-
-
Submitted by Christoph Hellwig
Instead of bloating the containing structure with it all the time, this allocates struct opal_dev dynamically. Additionally this allows moving the definition of struct opal_dev into sed-opal.c. For this a new private data field is added to it that is passed to the send/receive callback. After that a lot of internals can be made private as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Scott Bauer <scott.bauer@intel.com>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Not having OPAL or a sub-feature supported is an entirely normal condition for many drives, so don't warn about it. Keep the messages, but tone them down to debug only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Jens Axboe
For blk-mq with scheduling, we can potentially end up with ALL driver tags assigned and sitting on the flush queues. If we defer because of an in-flight data request, then we can deadlock if that data request doesn't already have a tag assigned. This fixes a deadlock when running the xfs/297 xfstest, where thousands of syncs can cause the drive queue to stall.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
Usually we don't ask the scheduler for work, if we already have leftovers on the dispatch list. This is done to leave work on the scheduler side for as long as possible, for proper merging. But if we do have work leftover but didn't dispatch anything, then we should ask the scheduler since we could potentially issue requests from that.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
The current request insertion machinery works just fine for directly inserting flushes, so no need to special case this anymore.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
If we are currently out of driver tags, we don't want to add a new flush (without a tag) to the head of the requeue list. We want to add it to the back, behind the others that are potentially also waiting for a tag.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
Currently we're almost there, but if we dispatch nothing, then we still return success.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
- 16 February 2017, 2 commits
-
-
Submitted by Jens Axboe
wbt_disable_default() calls del_timer_sync() to wait for the wbt timer to finish before disabling throttling. We can't do this with IRQs disabled. This fixes a lockdep splat on boot, if non-root cgroups are used.

Reported-by: Gabriel C <nix.or.die@gmail.com>
Fixes: 87760e5e ("block: hook up writeback throttling")
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Ming Lei
md still needs a bio clone (not the fast version) for behind writes, and it is more efficient to use bio_clone_bioset_partial(). The idea is simple: just copy the bvec range specified by the parameters.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
-
- 15 February 2017, 2 commits
-
-
Submitted by Tahsin Erdogan
When a new disk shows up, the sysfs queue directory is created before the elevator is registered. This allows a user to attempt a scheduler switch even though the initial registration hasn't completed yet.

In one scenario, blk_register_queue() calls elv_register_queue(), and right before cfq_registered_queue() is called, another process executes elevator_switch() and replaces q->elevator with the deadline scheduler. When cfq_registered_queue() executes, it interprets e->elevator_data as struct cfq_data even though it is actually struct deadline_data.

Grab q->sysfs_lock in blk_register_queue() to synchronize with sysfs callers.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
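A stripped-down sketch of the locking idea (simplified, assuming the 2017-era elv_register_queue() signature; not the full blk_register_queue() body):

```c
#include <linux/blkdev.h>

int elv_register_queue(struct request_queue *q);	/* block-internal; declared here only for the sketch */

/* Simplified sketch: hold q->sysfs_lock around elevator registration so a
 * concurrent elevator_switch() (which also takes q->sysfs_lock) cannot run
 * in the middle of it. */
static int register_queue_elevator(struct request_queue *q)
{
	int ret;

	mutex_lock(&q->sysfs_lock);
	ret = elv_register_queue(q);
	mutex_unlock(&q->sysfs_lock);

	return ret;
}
```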
-
Submitted by Scott Bauer
When CONFIG_KASAN is enabled, compilation fails:

  block/sed-opal.c: In function 'sed_ioctl':
  block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Move all the ioctl structures off the stack and dynamically allocate them using _IOC_SIZE().

Fixes: 455a7b23 ("block: Add Sed-opal library")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
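A minimal sketch of the general pattern (illustrative; handle_one_ioctl() is a hypothetical placeholder, not the real sed_ioctl() dispatcher):

```c
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/err.h>
#include <linux/ioctl.h>
#include <linux/uaccess.h>

long handle_one_ioctl(unsigned int cmd, void *buf);	/* hypothetical dispatcher */

/* Sketch of the pattern: instead of a large on-stack union of every ioctl
 * argument structure, copy the user buffer into a heap allocation sized by
 * _IOC_SIZE(cmd) and free it after dispatching. */
static long example_sed_ioctl(unsigned int cmd, void __user *arg)
{
	void *buf;
	long ret;

	buf = memdup_user(arg, _IOC_SIZE(cmd));
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	ret = handle_one_ioctl(cmd, buf);
	kfree(buf);
	return ret;
}
```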
-
- 14 February 2017, 1 commit
-
-
Submitted by Jens Axboe
The old elevator= boot parameter blindly attempts to load the same scheduler for mq and !mq devices, leading to a crash if we specify the wrong one. Ensure that we only apply this boot parameter to old !mq devices.

Signed-off-by: Jens Axboe <axboe@fb.com>
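A hedged sketch of the kind of guard this implies (illustrative only; chosen_name is a made-up stand-in for however the "elevator=" value is stored, not the actual elevator.c internals):

```c
#include <linux/blkdev.h>

/* Illustrative guard only: honor a boot-time scheduler choice solely for
 * legacy (!mq) queues; q->mq_ops being set means the queue uses blk-mq. */
static bool use_boot_elevator(struct request_queue *q, const char *chosen_name)
{
	return !q->mq_ops && chosen_name && chosen_name[0];
}
```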
-
- 11 February 2017, 3 commits
-
-
Submitted by Omar Sandoval
None of the other blk-mq elevator hooks are called with this lock held. Additionally, it can lead to circular locking dependencies between queue_lock and the private scheduler lock.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Josef Bacik
Calling blk_queue_make_request resets a bunch of settings on the request_queue, but all we really want is to update the make_request_fn, so do this directly so we don't lose things like the logical and physical block sizes.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Paolo Valente
bio is used in bfq-mq's get_rq_priv to get the request group. We could pass the group directly here, but I thought that passing the bio was more general, giving the possibility to get other pieces of information if needed.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 09 February 2017, 3 commits
-
-
Submitted by Christoph Hellwig
Add a new merge strategy that merges discard bios into a request until the maximum number of discard ranges (or the maximum discard size) is reached from the plug merging code. I/O scheduler merging is not wired up yet but might also be useful, although not for fast devices like NVMe which are the only user for now.

Note that for now we don't support limiting the size of each discard range, but if needed that can be added later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Switch these constants to an enum, and let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
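The compiler-checked pattern this relies on is sketched below with made-up names (the block layer's real enum differs):

```c
/* Illustrative only: returning an enum and switching on it without a
 * default case lets -Wswitch warn in every caller that forgets to handle
 * a newly added value. Names here are made up, not the block layer's. */
enum demo_merge_type {
	DEMO_NO_MERGE,
	DEMO_FRONT_MERGE,
	DEMO_BACK_MERGE,
};

static const char *demo_merge_name(enum demo_merge_type type)
{
	switch (type) {
	case DEMO_NO_MERGE:
		return "none";
	case DEMO_FRONT_MERGE:
		return "front";
	case DEMO_BACK_MERGE:
		return "back";
	}
	return "unknown";
}
```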
-
Submitted by Christoph Hellwig
This makes it available outside of blk-merge.c, and inlining such a trivial helper seems pretty useful to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 07 February 2017, 3 commits
-
-
Submitted by Omar Sandoval
I noticed that when booting with a default blk-mq I/O scheduler, the /sys/block/*/queue/iosched directory was missing. However, switching after boot did create the directory. This is because we skip the initial elevator register/unregister when we don't have a ->request_fn(), but we should still do it for the ->mq_ops case.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Scott Bauer
This patch implements the necessary logic to bring an Opal enabled drive out of a factory-enabled state into a working Opal state. This patch set also enables logic to save a password to be replayed during a resume from suspend.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Rafael Antognolli <Rafael.Antognolli@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Christoph Hellwig
Write Same can return an error asynchronously if it turns out the underlying SCSI device does not support Write Same, which makes a proper fallback to other methods in __blkdev_issue_zeroout impossible. Thus only issue a Write Same from blkdev_issue_zeroout and don't try it at all from __blkdev_issue_zeroout, as a non-invasive workaround.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout")
Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 04 February 2017, 2 commits
-
-
Submitted by Jens Axboe
If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sched, the merge path has to hold the appropriate lock, but we don't need it for freeing the request. And in fact holding the lock is problematic, since we are now calling the mq sched put_rq_private() hook with the lock held. Other call paths do not hold this lock.

Fix this inconsistency by ensuring that the caller frees a merged request. Then we can do it outside of the lock, making it both more efficient and fixing the blk-mq-sched problem of invoking parts of the scheduler with an unknown lock state.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
-
Submitted by Jens Axboe
When we attempt to merge request-to-request, we return a 0/1 if we ended up merging or not. Change that to return the pointer to the request that we freed. We will use this to move the freeing of that request out of the merge logic, so that callers can drop locks before freeing the request. There should be no functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
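The calling-convention change described above follows a common pattern, sketched here with hypothetical helpers (not the actual blk-mq merge code):

```c
#include <linux/blkdev.h>

bool can_merge_requests(struct request *rq, struct request *next);	/* hypothetical check */
void absorb_request(struct request *rq, struct request *next);		/* hypothetical merge step */

/* Pattern sketch: instead of returning 0/1 and freeing the victim while a
 * lock is held, return the request that became redundant so the caller can
 * drop its locks before freeing it. */
static struct request *try_back_merge(struct request *rq, struct request *next)
{
	if (!can_merge_requests(rq, next))
		return NULL;

	absorb_request(rq, next);
	return next;		/* caller frees this one outside the lock */
}
```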
-