Commits · 88022d7201e96b43f1754b0358fc6bcd8dbdcde1 · openanolis / cloud-kernel

05 Nov, 2017 1 commit

blk-mq: don't handle failure in .get_budget · 88022d72

Authored by Ming Lei on Nov 05, 2017

It is enough to just check if we can get the budget via .get_budget().
And we don't need to deal with device state change in .get_budget().

For SCSI, one issue to be fixed is that we have to call
scsi_mq_uninit_cmd() to free allocated resources if SCSI device fails
to handle the request. And it isn't enough to simply call
blk_mq_end_request() to do that if this request is marked as
RQF_DONTPREP.

Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Nov, 2017 7 commits

block: add a poll_fn callback to struct request_queue · ea435e1b

Authored by Christoph Hellwig on Nov 02, 2017

That way we can also poll non blk-mq queues.  Mostly needed for
the NVMe multipath code, but could also be useful elsewhere.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: introduce GENHD_FL_HIDDEN · 8ddcd653

Authored by Christoph Hellwig on Nov 02, 2017

With this flag a driver can create a gendisk that can be used for I/O
submission inside the kernel, but which is not registered as user
facing block device.  This will be useful for the NVMe multipath
implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: don't look at the struct device dev_t in disk_devt · 517bf3c3

Authored by Christoph Hellwig on Nov 02, 2017

The hidden gendisks introduced in the next patch need to keep the dev
field in their struct device empty so that udev won't try to create
block device nodes for them.  To support that rewrite disk_devt to
look at the major and first_minor fields in the gendisk itself instead
of looking into the struct device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
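To illustrate the disk_devt change described above, here is a minimal user-space sketch of building a device number from the major and first_minor fields rather than reading it from a struct device. The `sketch_*` names and the struct are hypothetical stand-ins, not kernel code; the 20-bit minor width is assumed to mirror the kernel's internal dev_t layout.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's internal layout (12-bit major, 20-bit minor). */
#define SKETCH_MINORBITS 20u
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1u)

typedef uint32_t sketch_dev_t;

/* Hypothetical stand-in for the two gendisk fields named in the commit. */
struct sketch_gendisk {
    unsigned int major;
    unsigned int first_minor;
};

/* Pack a major/minor pair into one device number. */
static sketch_dev_t sketch_mkdev(unsigned int major, unsigned int minor)
{
    assert(minor <= SKETCH_MINORMASK);
    return (major << SKETCH_MINORBITS) | minor;
}

/* Derive the device number from the disk itself, not from a struct device. */
static sketch_dev_t sketch_disk_devt(const struct sketch_gendisk *disk)
{
    return sketch_mkdev(disk->major, disk->first_minor);
}

/* Unpack helpers, used only to check the round trip. */
static unsigned int sketch_major(sketch_dev_t dev)
{
    return dev >> SKETCH_MINORBITS;
}

static unsigned int sketch_minor(sketch_dev_t dev)
{
    return dev & SKETCH_MINORMASK;
}
```

Because the value is computed from the gendisk alone, the dev field of the struct device can stay empty without affecting callers that need the number.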

block: add a blk_steal_bios helper · ef71de8b

Authored by Christoph Hellwig on Nov 02, 2017

This helper allows stealing the uncompleted bios from a request so
that they can be reissued on another path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: provide a direct_make_request helper · f421e1d9

Authored by Christoph Hellwig on Nov 02, 2017

This helper allows reinserting a bio into a new queue without much
overhead, but requires all queue limits to be the same for the upper
and lower queues, and it does not provide any recursion preventions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Javier González <javier@cnexlabs.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add REQ_DRV bit · 96222bcc

Authored by Christoph Hellwig on Nov 02, 2017

Set aside a bit in the request/bio flags for driver use.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: move REQ_NOWAIT · 8977f563

Authored by Christoph Hellwig on Nov 02, 2017

This flag should be before the operation-specific REQ_NOUNMAP bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Nov, 2017 4 commits

nvme-fc: add a dev_loss_tmo field to the remoteport · ac7fe82b

Authored by James Smart on Oct 25, 2017

Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.

The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is set.

Add a new routine to the API, nvme_fc_set_remoteport_devloss(),
which allows the LLDD to dynamically update the value on an existing
remoteport.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

blk-mq-sched: improve dispatching from sw queue · b347689f

Authored by Ming Lei on Oct 14, 2017

SCSI devices use host-wide tagset, and the shared driver tag space is
often quite big. However, there is also a queue depth for each LUN
(.cmd_per_lun), which is often small, for example, on both lpfc and
qla2xxx, .cmd_per_lun is just 3.

So lots of requests may stay in sw queue, and we always flush all
belonging to same hw queue and dispatch them all to driver.
Unfortunately it is easy to cause queue busy because of the small
.cmd_per_lun.  Once these requests are flushed out, they have to stay in
hctx->dispatch, and no bio merge can happen on these requests, and
sequential IO performance is harmed.

This patch introduces blk_mq_dequeue_from_ctx for dequeuing a request
from a sw queue, so that we can dispatch them in scheduler's way. We can
then avoid dequeueing too many requests from sw queue, since we don't
flush ->dispatch completely.

This patch improves dispatching from sw queue by using the .get_budget
and .put_budget callbacks.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297

Authored by Ming Lei on Oct 14, 2017

For SCSI devices, there is often a per-request-queue depth, which needs
to be respected before queuing one request.

Currently blk-mq always dequeues the request first, then calls
.queue_rq() to dispatch the request to lld. One obvious issue with this
approach is that I/O merging may not be successful, because when the
per-request-queue depth can't be respected, .queue_rq() has to return
BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch
list. This means it never gets a chance to be merged with other IO.

This patch introduces .get_budget and .put_budget callback in blk_mq_ops,
then we can try to get reserved budget first before dequeuing request.
If the budget for queueing I/O can't be satisfied, we don't need to
dequeue request at all. Hence the request can be left in the IO
scheduler queue, for more merging opportunities.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
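The budget-before-dequeue ordering described in this commit can be sketched in user space as follows. This is only an illustration under stated assumptions: the `sketch_*` types and functions are hypothetical, a plain counter stands in for the scheduler queue, and none of this is the actual blk_mq_ops interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a small per-queue depth (think .cmd_per_lun). */
struct sketch_dev {
    int depth;     /* maximum requests allowed in flight */
    int inflight;  /* requests currently holding budget */
};

/* Hypothetical scheduler queue: requests here remain eligible for merging. */
struct sketch_queue {
    int pending;
};

/* Try to reserve one slot of budget; fails when the device is at depth. */
static bool sketch_get_budget(struct sketch_dev *dev)
{
    if (dev->inflight >= dev->depth)
        return false;
    dev->inflight++;
    return true;
}

/* Return one slot of budget (on completion, or when nothing was dequeued). */
static void sketch_put_budget(struct sketch_dev *dev)
{
    assert(dev->inflight > 0);
    dev->inflight--;
}

/* Dispatch as many requests as the budget allows; returns how many. */
static int sketch_dispatch(struct sketch_queue *q, struct sketch_dev *dev)
{
    int dispatched = 0;

    for (;;) {
        if (!sketch_get_budget(dev))
            break;                  /* no budget: leave requests queued */
        if (q->pending == 0) {
            sketch_put_budget(dev); /* nothing to send: give budget back */
            break;
        }
        q->pending--;               /* dequeue only after budget is held */
        dispatched++;
    }
    return dispatched;
}
```

With a depth of 3 and five pending requests, the first pass dispatches three and leaves two in the queue, where they can still be merged with incoming I/O.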

sbitmap: introduce __sbitmap_for_each_set() · 7930d0a0

Authored by Ming Lei on Oct 14, 2017

For blk-mq, we need to be able to iterate software queues starting
from any queue in a round robin fashion, so introduce this helper.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
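As a rough illustration of iterating set bits round robin from an arbitrary starting point, the sketch below walks a single 64-bit word with wraparound. It is a simplification under stated assumptions: the real sbitmap spans multiple words, and the `sketch_*` names and callback type are hypothetical rather than the helper's actual signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Callback invoked for each set bit; returning false stops the walk. */
typedef bool (*sketch_bit_fn)(unsigned int bit, void *data);

/*
 * Visit every set bit of a 'depth'-bit map once, beginning at 'start'
 * and wrapping around to bit 0 after the last bit.
 */
static void sketch_for_each_set_from(uint64_t map, unsigned int depth,
                                     unsigned int start,
                                     sketch_bit_fn fn, void *data)
{
    assert(depth > 0 && depth <= 64 && start < depth);

    for (unsigned int i = 0; i < depth; i++) {
        unsigned int bit = (start + i) % depth;

        if ((map >> bit) & 1u) {
            if (!fn(bit, data))
                return;
        }
    }
}

/* Records the visiting order so the wraparound can be observed. */
struct sketch_trace {
    unsigned int seen[64];
    unsigned int count;
};

static bool sketch_record(unsigned int bit, void *data)
{
    struct sketch_trace *t = data;

    t->seen[t->count++] = bit;
    return true;
}
```

Starting each pass from a different offset keeps one busy software queue from always being served first.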

27 Oct, 2017 1 commit

nvme-fc: remove NVME_FC_MAX_SEGMENTS · ecad0d2c

Authored by James Smart on Oct 23, 2017

The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.

Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

26 Oct, 2017 1 commit

elevator: allow name aliases · 8ac0d9a8

Authored by Jens Axboe on Oct 25, 2017

Since we now lookup elevator types with the appropriate multiqueue
capability, allow schedulers to register with an alias alongside
the real name. This is in preparation for allowing 'mq-deadline'
to register an alias of 'deadline' as well.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
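The alias idea can be sketched as a lookup that accepts either the real name or the alias. This is a simplified illustration: the `sketch_*` struct and functions are hypothetical, the table entries are examples only, and the multiqueue-capability check mentioned in the commit message is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scheduler descriptor with an optional alias. */
struct sketch_elevator_type {
    const char *name;   /* real name */
    const char *alias;  /* alternative name, or NULL if none */
};

/* A type matches when the query equals its real name or its alias. */
static int sketch_elevator_match(const struct sketch_elevator_type *e,
                                 const char *name)
{
    if (strcmp(e->name, name) == 0)
        return 1;
    return e->alias != NULL && strcmp(e->alias, name) == 0;
}

/* Linear search over a registered table; returns NULL when nothing matches. */
static const struct sketch_elevator_type *
sketch_elevator_find(const struct sketch_elevator_type *types, size_t n,
                     const char *name)
{
    assert(types != NULL && name != NULL);

    for (size_t i = 0; i < n; i++) {
        if (sketch_elevator_match(&types[i], name))
            return &types[i];
    }
    return NULL;
}
```

A request for 'deadline' then resolves to the entry registered as 'mq-deadline', which is the use case the commit message prepares for.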

19 Oct, 2017 2 commits

block: remove blk_mq_reinit_tagset · dab7487b

Authored by Sagi Grimberg on Oct 11, 2017

No callers left.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

block: introduce blk_mq_tagset_iter · 149e10f8

Authored by Sagi Grimberg on Oct 11, 2017

Iterator helper to apply a function on all the
tags in a given tagset. Export it as it will be used
outside the block layer later on.
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

13 Oct, 2017 5 commits

lightnvm: implement generic path for sync I/O · 1a94b2d4

Authored by Javier González on Oct 13, 2017

Implement a generic path for sending sync I/O on LightNVM. This allows
reusing the standard synchronous path through blk_execute_rq(), instead
of implementing a wait_for_completion on the target side (e.g., pblk).
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove stale extern and unused exported symbols · eb6f168f

Authored by Rakesh Pandit on Oct 13, 2017

Not all exported symbols are being used outside core and there were
some stale entries in lightnvm.h
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: remove unused argument from nvm_set_tgt_bb_tbl · ef56b9ce

Authored by Rakesh Pandit on Oct 13, 2017

vblk isn't being used anyway and if we ever have a usecase we can
introduce this again.  This makes the logic easier and removes
unnecessary checks.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: prevent target type module removal when in use · 90014829

Authored by Rakesh Pandit on Oct 13, 2017

If a target type module (e.g. pblk) is unloaded (rmmod) while the module
is in use (after creating a target), the system crashes.  We fix this by
using the module API refcount.
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs/block_dev: remove vfs_msg() interface · 7f66721a

Authored by Rakesh Pandit on Oct 12, 2017

Replaced by pr_err usage in commit ef510424 ("block, dax: move
"select DAX" from BLOCK to FS_DAX")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

11 Oct, 2017 1 commit

blk-stat: delete useless code · eca8b53a

Authored by Shaohua Li on Oct 06, 2017

Fix two issues:
- the per-cpu stat flush is unnecessary, nobody uses per-cpu stat except
  sum it to global stat. We can do the calculation there. The flush just
  wastes cpu time.
- some fields are signed int/s64. I don't see the point.
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

10 Oct, 2017 1 commit

writeback: merge try_to_writeback_inodes_sb_nr() into caller · 8264c321

Authored by Rakesh Pandit on Oct 09, 2017

Since commit 925a6efb ("Btrfs: stop using
try_to_writeback_inodes_sb_nr to flush delalloc") this function hasn't
been used outside so stop exporting it.

In addition we merge it into try_to_writeback_inodes_sb() which is the
only caller.  Also change return type of try_to_writeback_inodes_sb to
void as the only user ext4 doesn't care.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

06 Oct, 2017 2 commits

backing-dev: kill unused pdflush_proc_obsolete() · 775d3a35

Authored by Jens Axboe on Oct 06, 2017

After commit b35bd0d9, pdflush_proc_obsolete() is no longer
used. Kill the function and declaration.
Reported-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: remove QUEUE_FLAG_STACKABLE · 5fdee212

Authored by Christoph Hellwig on Oct 05, 2017

We already have a queue_is_rq_based helper to check if a request_queue
is request based, so we can remove the flag for it.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

05 Oct, 2017 1 commit

writeback: eliminate work item allocation in bd_start_writeback() · 85009b4f

Authored by Jens Axboe on Sep 30, 2017

Handle start-all writeback like we do periodic or kupdate
style writeback - by marking the bdi_writeback as needing a full
flush, and simply waking the thread. This eliminates the need to
allocate and queue a specific work item just for this purpose.

After this change, we truly only ever have one of them running at
any point in time. We mark the need to start all flushes, and the
writeback thread will clear it once it has processed the request.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Oct, 2017 1 commit

nvme-fc: add uevent for auto-connect · eaefd5ab

Authored by James Smart on Sep 14, 2017

To support auto-connecting to FC-NVME devices upon their dynamic
appearance, add a uevent that can kick off connection scripts.
The uevent is posted against the fc_udev device.

The patch set was tested with the following rule to kick an nvme-cli connect-all
for the FC initiator and FC target ports. This is just an example for
testing and not intended for real life use.

ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"

I will post proposed udev/systemd scripts for possible kernel support.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

03 Oct, 2017 6 commits

writeback: only allow one inflight and pending full flush · aac8d41c

Authored by Jens Axboe on Sep 28, 2017

When someone calls wakeup_flusher_threads() or
wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
pages in the system (or on that bdi). If we are tight on memory, we
can get tons of these queued from kswapd/vmscan. This causes (at
least) two problems:

1) We consume a ton of memory just allocating writeback work items.
   We've seen as much as 600 million of these writeback work items
   pending. That's a lot of memory to pointlessly hold hostage,
   while the box is under memory pressure.

2) We spend so much time processing these work items, that we
   introduce a softlockup in writeback processing. This is because
   each of the writeback work items doesn't end up doing any work (it's
   hard when you have millions of identical ones coming in to the
   flush machinery), so we just sit in a tight loop pulling work
   items and deleting/freeing them.

Fix this by adding a 'start_all' bit to the writeback structure, and
set that when someone attempts to flush all dirty pages. The bit is
cleared when we start writeback on that work item. If the bit is
already set when we attempt to queue !nr_pages writeback, then we
simply ignore it.

This provides us one full flush in flight, with one pending as well,
and makes for more efficient handling of this type of writeback.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
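The 'start_all' bit described in this commit behaves like a test-and-set gate in front of the work queue. The following is a minimal single-threaded sketch of that idea, assuming C11 atomics; the `sketch_*` names and the `queued` counter are hypothetical and stand in for the real writeback structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device writeback state. */
struct sketch_wb {
    atomic_flag start_all;  /* set while a full flush is queued */
    int queued;             /* queued work items (illustration only) */
};

/* Returns true if a work item was queued, false if the request was ignored. */
static bool sketch_request_full_flush(struct sketch_wb *wb)
{
    if (atomic_flag_test_and_set(&wb->start_all))
        return false;   /* a full flush is already pending: drop this one */
    wb->queued++;
    return true;
}

/* Called when the flusher begins working on the queued item. */
static void sketch_start_writeback(struct sketch_wb *wb)
{
    assert(wb->queued > 0);
    wb->queued--;
    /* Clearing here lets one more full flush queue up behind this one. */
    atomic_flag_clear(&wb->start_all);
}
```

Since the bit is cleared when work starts rather than when it finishes, at most one full flush is in flight and one more is pending, instead of an unbounded pile of identical work items.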

writeback: make wb_start_writeback() static · 9dfb176f

Authored by Jens Axboe on Sep 28, 2017

We don't have any callers outside of fs-writeback.c anymore,
make it private.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: provide a wakeup_flusher_threads_bdi() · 595043e5

Authored by Jens Axboe on Sep 28, 2017

Similar to wakeup_flusher_threads(), except that we only wake
up the flusher threads on the specified backing device.

No functional changes in this patch.
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

writeback: remove 'range_cyclic' argument for wb_start_writeback() · 47410d88

Authored by Jens Axboe on Sep 28, 2017

All the callers pass in 'true' for range_cyclic, so kill the
argument.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs: kill 'nr_pages' argument from wakeup_flusher_threads() · 9ba4b2df

Authored by Jens Axboe on Sep 20, 2017

Everybody is passing in 0 now, let's get rid of the argument.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

buffer: have alloc_page_buffers() use __GFP_NOFAIL · 640ab98f

Authored by Jens Axboe on Sep 27, 2017

Instead of adding weird retry logic in that function, utilize
__GFP_NOFAIL to ensure that the vm takes care of handling any
potential retries appropriately. This means we don't have to
call free_more_memory() from here.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Sep, 2017 1 commit

block: fix a build error · 0b508bc9

Authored by Shaohua Li on Sep 26, 2017

The code is only for blkcg not for all cgroups

Fixes: d4478e92 ("block/loop: make loop cgroup aware")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

26 Sep, 2017 4 commits

block: make blkcg aware of kthread stored original cgroup info · 902ec5b6

Authored by Shaohua Li on Sep 14, 2017

bio_blkcg is the only API to get cgroup info for a bio right now. If
bio_blkcg finds current task is a kthread and has original blkcg
associated, it will use the css instead of associating the bio to
current task. This makes it possible that kthread dispatches bios on
behalf of other threads.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blkcg: delete unused APIs · af551fb3

Authored by Shaohua Li on Sep 14, 2017

Nobody uses the APIs right now.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

kthread: add a mechanism to store cgroup info · 05e3db95

Authored by Shaohua Li on Sep 14, 2017

kthread usually runs jobs on behalf of other threads. The jobs should be
charged to cgroup of original threads. But the jobs run in a kthread,
where we lose the cgroup context of original threads. The patch adds a
mechanism to record cgroup info of original threads in kthread context.
Later we can retrieve the cgroup info and attach the cgroup info to jobs.

Since this mechanism is only required by kthread, we store the cgroup
info in kthread data instead of generic task_struct.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: sync header templates with comments · 6b71f9e1

Authored by James Smart on Sep 20, 2017

Comments were incorrect:
- defer_rcv was in host port template. Moved to target port template
- Added Mandatory statements for target port template items
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

25 Sep, 2017 2 commits

nvme: add transport SGL definitions · d85cf207

Authored by James Smart on Sep 07, 2017

Add transport SGL definitions from NVMe TP 4008, required for
the final NVMe-FC standard.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme.h: remove FC transport-specific error values · c98cb3bd

Authored by James Smart on Sep 07, 2017

The NVM Express group rescinded the reserved range for the transport.
Remove the FC-centric values that had been defined.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openanolis / cloud-kernel synced successfully more than 1 year ago