Commits · 20d0dfe65afd3fb59d14720570a6921eb6bf5c1f · openanolis / cloud-kernel

02 Jul, 2017 3 commits

nvme: move ctrl cap to struct nvme_ctrl · 20d0dfe6

Authored by Sagi Grimberg on Jun 27, 2017

All transports use either a private cache of the controller cap or an
on-stack copy, so move it to the generic struct nvme_ctrl. In the future it
will also be maintained by the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: move queue_count to the nvme_ctrl · d858e5f0

Authored by Sagi Grimberg on Apr 24, 2017

All transports use the queue_count in exactly the same way, so move it to
the generic struct nvme_ctrl. In the future it will also be maintained by
the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: Quirks for PM1725 controllers · d554b5e1

Authored by Martin K. Petersen on Jun 27, 2017

PM1725 controllers have a couple of quirks that need to be handled in
the driver:

 - I/O queue depth must be limited to 64 entries on controllers that do
   not report MQES.

 - The host interface registers go offline briefly while resetting the
   chip. Thus a delay is needed before checking whether the controller
   is ready.

Note that the admin queue depth is also limited to 64 on older versions
of this board. Since our NVME_AQ_DEPTH is now 32, that is no longer an
issue.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
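
The first quirk above can be pictured as a clamp on the I/O queue depth. The following is a minimal user-space sketch, not the driver code: `pick_io_queue_depth`, `is_pm1725` and `PM1725_MAX_IO_QDEPTH` are hypothetical names, and treating "does not report MQES" as "the MQES field reads zero" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the depth limit named in the commit message. */
#define PM1725_MAX_IO_QDEPTH 64u

/* CAP.MQES lives in bits 15:0 of the capabilities register and is 0's based. */
static unsigned int cap_mqes(uint64_t cap)
{
    return (unsigned int)(cap & 0xffffu);
}

/*
 * Pick an I/O queue depth. Assumption: a quirked controller that "does not
 * report MQES" shows up as MQES == 0, in which case the depth is capped at
 * 64 instead of trusting the register.
 */
static unsigned int pick_io_queue_depth(uint64_t cap, unsigned int requested,
                                        int is_pm1725)
{
    unsigned int mqes = cap_mqes(cap);
    unsigned int hw_max;

    if (is_pm1725 && mqes == 0)
        hw_max = PM1725_MAX_IO_QDEPTH;
    else
        hw_max = mqes + 1; /* 0's based field: max entries = MQES + 1 */

    return requested < hw_max ? requested : hw_max;
}
```

With these assumptions, a quirked controller whose MQES field is zero gets a depth of at most 64, while a controller that reports MQES normally is limited only by the reported value.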

01 Jul, 2017 10 commits

lightnvm: pblk: set line bitmap check under debug · a84ebb83

Authored by Javier González on Jun 30, 2017

Do bitmap checks only when debug mode is enabled. The line bitmap used
for mapping to physical addresses is fairly large (~512KB) and it is
expensive to do these checks on the fast path.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: verify that cache read is still valid · 07698466

Authored by Javier González on Jun 30, 2017

When a read is directed to the cache, we risk that the lba has been
updated between the time we made the L2P table lookup and the time we
actually read from the cache. We intentionally do not hold the L2P lock,
so as not to block other threads.

While strict ordering is not a guarantee at this level (unless REQ_FLUSH
has been previously issued), we have experienced that some databases that
have recently implemented direct I/O support issue metadata reads very
close to the writes, without issuing an fsync in the middle. An easy way
to support them is to make an extra effort and check the L2P map right
before reading the cache.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
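
The "check the L2P map right before reading the cache" idea can be sketched as follows. This is a single-threaded user-space model under stated assumptions, not pblk code: `l2p_cache_model`, `read_from_cache` and the slot layout are hypothetical, and the comparison that a real driver would presumably do under the L2P lock is a plain array read here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_LBAS   8
#define NR_SLOTS  4
#define SLOT_SIZE 16

/* Hypothetical model: an L2P table whose entries point at cache slots. */
struct l2p_cache_model {
    uint32_t l2p[NR_LBAS];               /* lba -> cache slot index */
    char     slots[NR_SLOTS][SLOT_SIZE]; /* cached data */
};

/*
 * Copy the cached data for lba into buf, but only if the mapping still
 * matches what the caller saw during its earlier, unlocked lookup.
 * Returns 1 on success, 0 if the lba was remapped in the meantime and the
 * caller has to repeat the lookup.
 */
static int read_from_cache(const struct l2p_cache_model *m, uint32_t lba,
                           uint32_t looked_up_slot, char *buf)
{
    /* In a real driver this re-check would be done under the L2P lock. */
    if (m->l2p[lba] != looked_up_slot)
        return 0;

    memcpy(buf, m->slots[looked_up_slot], SLOT_SIZE);
    return 1;
}
```

A caller would look up the slot, call `read_from_cache`, and redo the lookup whenever it returns 0, so a write that lands between lookup and copy is noticed instead of returning stale data.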

lightnvm: pblk: add initialization check · b5e063a2

Authored by Javier González on Jun 30, 2017

Add a sanity check to the pblk initialization sequence in order to
ensure that enough LUNs have been allocated to store the line metadata.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove target using async. I/Os · ee8d5c1a

Authored by Javier González on Jun 30, 2017

When removing a pblk instance, pad the current line using asynchronous
I/O. This reduces the removal time from ~1 minute in the worst case to a
couple of seconds.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: use vmalloc for GC data buffer · de54e703

Authored by Javier González on Jun 30, 2017

For now, we allocate a per I/O buffer for GC data. Since the potential
size of the buffer is 256KB and GC is not in the fast path, do this
allocation with vmalloc. This puts less pressure on the memory
allocator at no performance cost.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: use right metadata buffer for recovery · 8224cbd8

Authored by Javier González on Jun 30, 2017

Fix bad metadata buffer assignments introduced when refactoring the
metadata write path.

Fixes: dd2a4343 ("lightnvm: pblk: sched. metadata on write thread")
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: schedule if data is not ready · 10888129

Authored by Javier González on Jun 30, 2017

When user threads place data into the write buffer, they reserve space
and do the memory copy outside the lock. As a consequence, when the write
thread starts persisting data, there is a chance that the data has not
been copied yet. In this case, avoid polling, and schedule before retrying.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: remove unused return variable · 653cbb84

Authored by Javier González on Jun 30, 2017

Remove unused variable.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: fix double-free on pblk init · 2950e7e6

Authored by Javier González on Jun 30, 2017

Prevent pblk->lines from being double freed in case of an error during
pblk initialization.

Fixes: dd2a4343 ("lightnvm: pblk: sched. metadata on write thread")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

lightnvm: pblk: fix bad le64 assignations · f417aa0b

Authored by Javier González on Jun 30, 2017

Use the right types and conversions on le64 variables. Reported by
sparse.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29 Jun, 2017 2 commits

nvme: Makefile: remove dead build rule · a2b93775

Authored by Valentin Rothberg on Jun 29, 2017

Remove dead build rule for drivers/nvme/host/scsi.c which has been
removed by commit ("nvme: Remove SCSI translations").

Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

blk-mq: map all HWQ also in hyperthreaded system · fe631457

Authored by Max Gurtovoy on Jun 29, 2017

This patch performs sequential mapping between CPUs and queues.
If the system has more CPUs than HWQs, there are still CPUs left to
map to HWQs. In a hyperthreaded system, map the unmapped CPUs and
their siblings to the same HWQ.
This fixes a bug in which HWQs were left unmapped on a system with
2 sockets, 18 cores per socket, 2 threads per core (total 72 CPUs)
running NVMEoF (which opens up to a maximum of 64 HWQs).

Performance results running fio (72 jobs, 128 iodepth)
using null_blk (with/without patch):

bs     IOPS(read submit_queues=72)  IOPS(write submit_queues=72)  IOPS(read submit_queues=24)  IOPS(write submit_queues=24)
-----  ---------------------------  ----------------------------  ---------------------------  ----------------------------
512    4890.4K/4723.5K              4524.7K/4324.2K               4280.2K/4264.3K              3902.4K/3909.5K
1k     4910.1K/4715.2K              4535.8K/4309.6K               4296.7K/4269.1K              3906.8K/3914.9K
2k     4906.3K/4739.7K              4526.7K/4330.6K               4301.1K/4262.4K              3890.8K/3900.1K
4k     4918.6K/4730.7K              4556.1K/4343.6K               4297.6K/4264.5K              3886.9K/3893.9K
8k     4906.4K/4748.9K              4550.9K/4346.7K               4283.2K/4268.8K              3863.4K/3858.2K
16k    4903.8K/4782.6K              4501.5K/4233.9K               4292.3K/4282.3K              3773.1K/3773.5K
32k    4885.8K/4782.4K              4365.9K/4184.2K               4307.5K/4289.4K              3780.3K/3687.3K
64k    4822.5K/4762.7K              2752.8K/2675.1K               4308.8K/4312.3K              2651.5K/2655.7K
128k   2388.5K/2313.8K              1391.9K/1375.7K               2142.8K/2152.2K              1395.5K/1374.2K

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
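
The mapping policy described in this commit (sequential mapping first, then hyperthread siblings sharing a queue) might look roughly like the sketch below. It is a user-space model under stated assumptions rather than the blk-mq code: `map_cpus_to_queues` and the `first_sibling` array are hypothetical, with `first_sibling[cpu]` assumed to hold the lowest-numbered hardware thread that shares a core with `cpu`.

```c
#include <assert.h>

/*
 * Fill map[cpu] with a hardware queue index for every CPU.
 *
 * Assumption of this sketch: first_sibling[cpu] is the lowest-numbered
 * hardware thread on the same core as cpu (equal to cpu itself when cpu is
 * that first thread). A kernel would derive this from the CPU topology.
 */
static void map_cpus_to_queues(unsigned int *map,
                               const unsigned int *first_sibling,
                               unsigned int nr_cpus, unsigned int nr_queues)
{
    unsigned int cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (cpu < nr_queues || first_sibling[cpu] == cpu) {
            /* Sequential mapping: the first nr_queues CPUs cover every queue. */
            map[cpu] = cpu % nr_queues;
        } else {
            /* Remaining sibling threads reuse the queue of their first sibling,
             * which has a lower index and is therefore already mapped. */
            map[cpu] = map[first_sibling[cpu]];
        }
    }
}
```

Because the first `nr_queues` CPUs are mapped one-to-one, no queue is left without a CPU as long as there are at least as many CPUs as queues; the sibling rule only decides where the extra CPUs go.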

28 Jun, 2017 25 commits

nvmet-rdma: register ib_client to not deadlock in device removal · f1d4ef7d

Authored by Sagi Grimberg on Jun 27, 2017

We can deadlock if we get a device removal event on a queue
which is already in the process of destroying the cm_id, as
this blocks until all events on this cm_id have drained.
On the other hand, we cannot guarantee that rdma_destroy_id
was invoked, as we only have an indication that the queue
disconnect flow has been queued (the queue state is updated
before the release work has been queued).

So, we leave all the queue removal to a separate ib_client
to avoid this deadlock, as ib_client device removal is in
a different context than the cm_id itself.

Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme_fc: fix error recovery on link down. · 69fa9646

Authored by James Smart on Jun 21, 2017

Currently, the fc transport invokes nvme_fc_error_recovery() on every
io in which the transport detects an error. This means:
a) it's really noisy on large io loads that all get hit by a link down.
b) we repeatedly call nvme_stop_queues() even though queues are
 stopped upon the first error or as first steps of reset_work.

Correct by:
 - Errors are only meaningful if the controller is in the LIVE state.
   Thus, enact the reset_work only if LIVE. If called repeatedly, state
   will have already transitioned.
 - There's no need to stop the queues here. Let the first steps of
   reset_work do the queue stopping.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
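
The "only act if LIVE" rule can be illustrated with a small state gate. This is a simplified, single-threaded sketch: `ctrl_model`, `error_recovery` and the two state names are hypothetical, and a real driver would need the state change to be atomic with respect to concurrent callers.

```c
#include <assert.h>

/* Hypothetical, reduced set of controller states for this sketch. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

struct ctrl_model {
    enum ctrl_state state;
    int resets_scheduled; /* stands in for queueing the reset work */
};

/*
 * Called for every io that completes with a transport error.
 * Returns 1 if this call scheduled a reset, 0 if the error was ignored
 * because the controller had already left the LIVE state.
 */
static int error_recovery(struct ctrl_model *c)
{
    if (c->state != CTRL_LIVE)
        return 0;

    /* First error: transition out of LIVE and schedule exactly one reset.
     * Stopping the queues is left to the reset work itself. */
    c->state = CTRL_RESETTING;
    c->resets_scheduled++;
    return 1;
}
```

Under this model, a burst of errors from a single link-down event results in one scheduled reset, however many ios report the failure.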

nvmet_fc: fix crashes on bad opcodes · 188f7e8a

Authored by James Smart on Jun 15, 2017

If an nvme command is issued with an opcode that is not supported by
the target (example: opcode 21 - detach namespace), the target
crashes due to a null pointer.

nvmet_req_init() detects the bad opcode and immediately calls the nvme
command done routine with an error status, allowing the transport to
send the response. However, the FC transport was aborting the command
on error, so the abort freed the lldd pointer, but the rsp transmit path
referenced it after the free.

Fix by removing the abort call on nvmet_req_init() failure.
The completion response will be sent with an error status code.

As the completion path will terminate the io, ensure the data_sg
lists show an unused state so that teardown paths are successful.

Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme_fc: Fix crash when nvme controller connection fails. · 0b5a7669

Authored by James Smart on Jun 15, 2017

If a controller connection is attempted (say to a subsystem that
does not exist), the first attempt errors out. If another connect
is attempted, it crashes.

The issue is that the prior controller has yet to execute its final put,
thus it's still on lists. However, the opts pointers on it have been
cleared, thus causing the crash if they are referenced.

The fix is to add the missing put after the nvme_uninit_ctrl() call on
the attachment failure.

Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme_fc: replace ioabort msleep loop with completion · 36715cf4

Authored by James Smart on May 22, 2017

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

The wait for io aborts to complete is converted from an msleep loop to
using a struct completion.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme_fc: fix double calls to nvme_cleanup_cmd() · b4dfd6ee

Authored by James Smart on Jun 21, 2017

Current fc transport code, on io termination, is calling
nvme_cleanup_cmd() followed by the transport dma unmap routine
which also calls nvme_cleanup_cmd(). This means two kfrees occur
on the same address, wreaking havoc. This resulted in odd data errors,
effectively corruption.

Fix by removing the extraneous double calls. The call now occurs only in
teardown paths and as part of the dma unmap routine.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-fabrics: verify that a controller returns the correct NQN · b1465c63

Authored by Christoph Hellwig on Jun 26, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: simplify nvme_dev_attrs_are_visible · 49d3d50b

Authored by Christoph Hellwig on Jun 26, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: read the subsystem NQN from Identify Controller · 180de007

Authored by Christoph Hellwig on Jun 26, 2017

NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures.  Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it.  For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: remove a misleading comment on struct nvme_ns · 942fbab4

Authored by Christoph Hellwig on Jun 26, 2017

While a NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not
a Logical Unit Number anyway) there are subtle differences.  Remove the
misleading comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: explicitly disable APST on quirked devices · 76a5af84

Authored by Kai-Heng Feng on Jun 26, 2017

A user reports APST is enabled, even when the NVMe is quirked or with
option "default_ps_max_latency_us=0".

The current logic will not set APST if the device is quirked. But the
NVMe in question will enable APST automatically.

Separate the logic "apst is supported" and "to enable apst", so we can
use the latter one to explicitly disable APST at initialization.

BugLink: https://bugs.launchpad.net/bugs/1699004
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: use a single NVME_AQ_DEPTH and relax it to 32 · 7aa1f427

Authored by Sagi Grimberg on Jun 18, 2017

No need to differentiate fabrics from pci/loop. Also lower
it to 32, as we don't really need 256 inflight admin commands.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: add hostid token to fabric options · 6bfe0425

Authored by Johannes Thumshirn on Jun 20, 2017

Currently we have no way to define a stable host-id but always use the one
which is randomly generated when we add the host or use the default host.

Provide a "hostid=%s" for user-space to pass in a persistent host-id which
overrides the randomly generated one.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: Remove SCSI translations · 3f7f25a9

Authored by Keith Busch on Jun 20, 2017

The SCSI-to-NVMe translations were added to assist storage applications
utilizing SG_IO transitioning to NVMe. It was always recommended,
however, to use native NVMe for device management as too much is lost
in translation and the maintenance burden in keeping this kludgey
layer around has been neglected such that much of the translations are
completely broken.

This patch removes SG_IO handling from NVMe to avoid any confusion
regarding maintenance support for this interface. The config option for
NVMe SCSI emulation has been disabled by default since 4.5. The driver
has supported native nvme user commands since the beginning, and native
tooling is publicly available for use or as reference for anyone writing
their own tools, so there's no excuse for hanging onto a broken crutch.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: open-code polling logic in nvme_poll · 442e19b7

Authored by Sagi Grimberg on Jun 18, 2017

Given that the code is simple enough, it seems better
than passing a tag by reference for each call site. Also,
we can now get rid of __nvme_process_cq.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: factor out the cqe reading mechanics from __nvme_process_cq · 920d13a8

Authored by Sagi Grimberg on Jun 18, 2017

Also, maintain a consumed counter to rely on for doorbell and
cqe_seen update instead of directly relying on the cq head and phase.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
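
One way to picture a separate "read one cqe" step plus a consumed counter is the model below. It is a user-space sketch under stated assumptions, not the nvme-pci code: `cq_model`, `read_cqe` and `process_cq` are hypothetical names, the doorbell is a plain field rather than a device register, and bit 0 of `status` is used as the phase tag.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_DEPTH 4

/* Reduced completion entry: bit 0 of status is treated as the phase tag. */
struct cqe_model {
    uint16_t command_id;
    uint16_t status;
};

struct cq_model {
    struct cqe_model cqes[CQ_DEPTH];
    uint16_t head;
    uint8_t  phase;           /* phase value expected for new entries */
    uint16_t doorbell;        /* stand-in for the CQ head doorbell register */
    unsigned int doorbell_writes;
};

/* Copy out the entry at head if its phase tag marks it as new, then advance
 * head, flipping the expected phase on wrap-around. Returns 1 if an entry
 * was read, 0 if the queue has nothing new. */
static int read_cqe(struct cq_model *q, struct cqe_model *out)
{
    if ((q->cqes[q->head].status & 1) != q->phase)
        return 0;

    *out = q->cqes[q->head];
    if (++q->head == CQ_DEPTH) {
        q->head = 0;
        q->phase = !q->phase;
    }
    return 1;
}

/* Drain the queue and ring the doorbell once if anything was consumed. */
static int process_cq(struct cq_model *q)
{
    struct cqe_model cqe;
    int consumed = 0;

    while (read_cqe(q, &cqe))
        consumed++; /* a real driver would complete cqe.command_id here */

    if (consumed) {
        q->doorbell = q->head;
        q->doorbell_writes++;
    }
    return consumed;
}
```

The decision to ring the doorbell depends only on `consumed`, so the caller does not have to compare the head and phase before and after the loop.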

nvme-pci: factor out cqe handling into a dedicated routine · 83a12fb7

Authored by Sagi Grimberg on Jun 18, 2017

Makes the code slightly more readable.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: Introduce nvme_ring_cq_doorbell · eb281c82

Authored by Sagi Grimberg on Jun 18, 2017

Nice abstraction of the actual mechanics of how to do it.
Note the change that we call it after we assign nvmeq->cq_head
to avoid passing it.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fs/fcntl: use copy_to/from_user() for u64 types · 5657cb07

Authored by Jens Axboe on Jun 28, 2017

Some architectures (at least PPC) don't like get/put_user with
64-bit types on a 32-bit system. Use the variably sized copy
to/from user variants instead.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: c75b1d94 ("fs: add fcntl() interface for setting/getting write life time hints")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drbd: Drop unnecessary static · e9d5d4a0

Authored by Julia Lawall on Jun 27, 2017

Drop static on a local variable, when the variable is initialized before
any use, on every possible execution path through the function.  The
static has no benefit, and dropping it reduces the code size.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@bad exists@
position p;
identifier x;
type T;
@@

static T x@p;
...
x = <+...x...+>

@@
identifier x;
expression e;
type T;
position p != bad.p;
@@

-static
 T x@p;
 ... when != x
     when strict
?x = e;
// </smpl>

The change in code size is indicated by the following output from the size
command.

before:
   text    data     bss     dec     hex filename
  67299    2291    1056   70646   113f6 drivers/block/drbd/drbd_nl.o

after:
   text    data     bss     dec     hex filename
  67283    2291    1056   70630   113e6 drivers/block/drbd/drbd_nl.o

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
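
The pattern targeted by the semantic patch can be shown with a pair of toy functions. Both names are hypothetical and unrelated to drbd; the point is only that a local variable assigned before every use behaves the same with or without `static`.

```c
#include <assert.h>

/* Before: the local is static, yet it is overwritten on every call before
 * being read, so its value never carries over between calls. */
static int scale_with_static(int v)
{
    static int factor;

    factor = 3;
    return v * factor;
}

/* After: an ordinary automatic variable gives the same result and needs no
 * permanent storage in the object file. */
static int scale_plain(int v)
{
    int factor;

    factor = 3;
    return v * factor;
}
```

Since the stored value is never observed across calls, dropping `static` changes nothing functionally, which is why only the code size differs in the `size` output above.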

block, bfq: update wr_busy_queues if needed on a queue split · 13c931bd

Authored by Paolo Valente on Jun 27, 2017

This commit fixes a bug triggered by a non-trivial sequence of
events. These events are briefly described in the next two
paragraphs. The impatient, or those who are familiar with queue
merging and splitting, can jump directly to the last paragraph.

On each I/O-request arrival for a shared bfq_queue, i.e., for a
bfq_queue that is the result of the merge of two or more bfq_queues,
BFQ checks whether the shared bfq_queue has become seeky (i.e., if too
many random I/O requests have arrived for the bfq_queue; if the device
is non rotational, then random requests must be also small for the
bfq_queue to be tagged as seeky). If the shared bfq_queue is actually
detected as seeky, then a split occurs: the bfq I/O context of the
process that has issued the request is redirected from the shared
bfq_queue to a new non-shared bfq_queue. As a degenerate case, if the
shared bfq_queue actually happens to be shared only by one process
(because of previous splits), then no new bfq_queue is created: the
state of the shared bfq_queue is just changed from shared to non
shared.

Regardless of whether a brand new non-shared bfq_queue is created, or
the pre-existing shared bfq_queue is just turned into a non-shared
bfq_queue, several parameters of the non-shared bfq_queue are set
(restored) to the original values they had when the bfq_queue
associated with the bfq I/O context of the process (that has just
issued an I/O request) was merged with the shared bfq_queue. One of
these parameters is the weight-raising state.

If, on the split of a shared bfq_queue,
1) a pre-existing shared bfq_queue is turned into a non-shared
bfq_queue;
2) the previously shared bfq_queue happens to be busy;
3) the weight-raising state of the previously shared bfq_queue happens
to change;
the number of weight-raised busy queues changes. The field
wr_busy_queues must then be updated accordingly, but such an update
was missing. This commit adds the missing update.

Reported-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
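
The three conditions in the last paragraph can be condensed into a small counter update. The sketch below is a user-space model, not BFQ code: `sched_model`, `queue_model` and `resume_state_on_split` are hypothetical, and it assumes the convention that a coefficient of 1 means "not weight-raised" and anything larger means "weight-raised".

```c
#include <assert.h>

struct sched_model {
    int wr_busy_queues; /* number of busy, weight-raised queues */
};

struct queue_model {
    struct sched_model *sched;
    int busy;                    /* non-zero if the queue is busy */
    unsigned int wr_coeff;       /* assumed: 1 = not raised, >1 = raised */
    unsigned int saved_wr_coeff; /* value saved when the queue was merged */
};

/* Restore the weight-raising state on a split and keep the scheduler-wide
 * counter consistent when a busy queue changes category. */
static void resume_state_on_split(struct queue_model *q)
{
    unsigned int old_wr_coeff = q->wr_coeff;

    q->wr_coeff = q->saved_wr_coeff;

    if (!q->busy)
        return; /* idle queues are not counted */

    if (old_wr_coeff == 1 && q->wr_coeff > 1)
        q->sched->wr_busy_queues++;
    else if (old_wr_coeff > 1 && q->wr_coeff == 1)
        q->sched->wr_busy_queues--;
}
```

Without the final adjustment, a busy queue that switches between raised and non-raised on a split would leave the counter out of step with the actual number of weight-raised busy queues.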

mmc/block: remove a call to blk_queue_bounce_limit · 8298912b

Authored by Christoph Hellwig on Jun 19, 2017

BLK_BOUNCE_ANY is the default now, so the call is superfluous.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

dm: don't set bounce limit · 41341afa

Authored by Christoph Hellwig on Jun 19, 2017

Now that all queue allocators come without a bounce limit by default,
dm doesn't have to override this anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: don't set bounce limit in blk_init_queue · 8fc45044

Authored by Christoph Hellwig on Jun 19, 2017

Instead move it to the callers.  Those that either don't use bio_data() or
page_address() or are specific to architectures that do not support highmem
are skipped.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: don't set bounce limit in blk_init_allocated_queue · 0bf6595e

Authored by Christoph Hellwig on Jun 19, 2017

And just move it into scsi_transport_sas which needs it due to low-level
drivers directly dereferencing bio_data, and into blk_init_queue_node,
which will need a further push into the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
