Commits · a0a931d6a2c1fbc5d5966ebf0e7a043748692c22 · openanolis / cloud-kernel

May 23, 2015: 1 commit

NVMe: Fix obtaining command result · a0a931d6

Submitted by Keith Busch on May 22, 2015

Replaces the use of req->sense_len, which is not owned by the LLD, with
req->special to hold the command result for driver-created commands,
and sets the result unconditionally on completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Fixes: d29ec824 ("nvme: submit internal commands through the block layer")
Signed-off-by: Jens Axboe <axboe@fb.com>
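
As a rough illustration of the idea above, here is a minimal userspace sketch. It assumes the 32-bit result is stored directly in the pointer-sized special field; fake_request, fake_complete() and fake_result() are hypothetical stand-ins, not the driver's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the block layer's struct request. */
struct fake_request {
    void *special;   /* field the driver owns for driver-created commands */
    int errors;
};

/* Completion side: store the 32-bit result on every completion,
 * whatever the status (assumption: the value lives in the pointer field). */
static void fake_complete(struct fake_request *req, uint32_t result, int status)
{
    req->special = (void *)(uintptr_t)result;
    req->errors = status;
}

/* Submitter side: read the result back once the request has finished. */
static uint32_t fake_result(const struct fake_request *req)
{
    return (uint32_t)(uintptr_t)req->special;
}
```

Because the value is written on every completion, a caller that wants the result never reads a stale field, even when the command failed.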

May 22, 2015: 3 commits

nvme: submit internal commands through the block layer · d29ec824

Submitted by Christoph Hellwig on May 22, 2015

Use block layer queues with an internal cmd_type to submit internally
generated NVMe commands. This both simplifies the code a lot and allows
for a better structure. For example, the LightNVM code can now construct
commands without knowing the details of the underlying I/O descriptors.
Or a future NVMe over network target could inject commands, and the
SCSI translation and ioctl code could be reused for such a beast.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

nvme: store a struct device pointer in struct nvme_dev · e75ec752

Submitted by Christoph Hellwig on May 22, 2015

Most users want the generic device, so store that in struct nvme_dev
instead of the pci_dev. This also happens to be a nice step towards
making some code reusable for non-PCI transports.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

nvme: consolidate synchronous command submission helpers · f705f837

Submitted by Christoph Hellwig on May 22, 2015

Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default. This will allow
adding a timeout to the pass-through path later on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
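
The "pass 0 for the default" convention can be sketched in a few lines. The constant and function names below are hypothetical and the default value is made up; the real helpers may apply the default in a different place.

```c
/* Hypothetical default; the driver has its own admin timeout constant. */
#define EXAMPLE_DEFAULT_TIMEOUT_MS 60000u

/* A caller passing 0 gets the default; any other value is used as-is. */
static unsigned int effective_timeout(unsigned int timeout_ms)
{
    return timeout_ms ? timeout_ms : EXAMPLE_DEFAULT_TIMEOUT_MS;
}
```

Using 0 as the sentinel keeps existing call sites simple while leaving room for a caller-supplied timeout later.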

May 19, 2015: 1 commit

nvme: disable irqs in nvme_freeze_queues · cddcd72b

Submitted by Christoph Hellwig on May 07, 2015

The queue_lock needs to be taken with irqs disabled. This is mostly
due to the old pre-blk-mq usage pattern, but we've also picked it up
in most of the few places where we use the queue_lock with blk-mq.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

April 08, 2015: 3 commits

NVMe: Meta data handling through submit io ioctl · a67a9513

Submitted by Keith Busch on April 07, 2015

This adds support for the extended metadata formats through the submit
IO ioctl, and simplifies the rest when using a separate metadata format.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Remove check for null · 44722802

Submitted by Keith Busch on April 07, 2015

The check fails static analysis due to additional arithmetic prior to
the NULL check. Mapping doesn't return NULL here anyway, so remove
the check.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Fix error handling of class_create("nvme") · c727040b

Submitted by Alexey Khoroshilov on March 07, 2015

class_create() returns ERR_PTR on failure,
so IS_ERR() should be used instead of a check for NULL.

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
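
To show why a NULL check misses the failure, here is a simplified userspace copy of the error-pointer encoding from the kernel's <linux/err.h> (ERR_PTR(), IS_ERR(), PTR_ERR()). fake_class_create() and example_init() are hypothetical; like class_create(), the fake constructor reports failure through an error pointer and never returns NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace copies of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_class { int id; };
static struct fake_class the_class;

/* Hypothetical stand-in for class_create(): fails with an error pointer. */
static struct fake_class *fake_class_create(bool fail)
{
    return fail ? ERR_PTR(-ENOMEM) : &the_class;
}

static int example_init(bool fail)
{
    struct fake_class *cls = fake_class_create(fail);

    if (IS_ERR(cls))          /* "if (!cls)" would never catch this */
        return (int)PTR_ERR(cls);
    return 0;
}
```

The failed pointer is non-NULL, so the old check would have carried on using an invalid address.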

April 01, 2015: 5 commits

NVMe: increase depth of admin queue · d31af0a3

Submitted by Jens Axboe on March 06, 2015

Usually the admin queue depth of 64 is plenty, but for some use cases we
really need it larger. Examples are use cases like MAT, where you have
to touch all of NAND for init/format-like purposes. In those cases, we
see a good 2x increase with an increased queue depth.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Keith Busch <keith.busch@intel.com>

nvme: Fix PRP list calculation for non-4k system page size · f137e0f1

Submitted by Murali Iyer on March 26, 2015

PRP list calculation is supposed to be based on the device's page size.
Without this fix, systems with a page size larger than the device's page
size corrupt the namespace as well as system memory.
Systems like x86 might not experience this issue because they use a
PAGE_SIZE of 4K, whereas powerpc uses a PAGE_SIZE of 64K, while the NVMe
device's page size varies depending upon the vendor.
Signed-off-by: Murali Iyer <mniyer@us.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
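
The arithmetic behind this fix can be sketched as counting device-sized pages, since each PRP entry covers at most one device page. This assumes a power-of-two device page size; prp_entries() is a hypothetical helper, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of device-sized pages (hence PRP entries) a DMA buffer spans.
 * dev_page_size must be a power of two. */
static size_t prp_entries(uint64_t dma_addr, size_t len, size_t dev_page_size)
{
    size_t offset = (size_t)(dma_addr & (dev_page_size - 1));

    return (offset + len + dev_page_size - 1) / dev_page_size;
}
```

With a 64K system PAGE_SIZE and a 4K device page, a page-aligned 64K buffer spans 16 device pages; sizing the list by PAGE_SIZE instead would suggest a single entry.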

NVMe: Fix blk-mq hot cpu notification · 1efccc9d

Submitted by Keith Busch on March 31, 2015

The driver may issue commands to a device that may never return, so its
request_queue could always have active requests while the controller is
running. Waiting for the queue to freeze could block forever, which is
what blk-mq's hot cpu notification handler was doing when nvme drives
were in use.

This has the nvme driver make the asynchronous event command's tag
reserved and not keep the request active. We can't have more than
one since the request is released back to the request_queue before the
command is completed. Having only one avoids potential tag collisions,
and reserving the tag for this purpose prevents other admin tasks from
reusing the tag.

I also couldn't think of a scenario where issuing AEN requests at single
depth is worse than issuing them in batches, so I don't think we lose
anything with this change.

As an added bonus, doing it this way removes "Cancelling I/O" warnings
observed when unbinding the nvme driver from a device.
Reported-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
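
The effect of a reserved tag can be illustrated with a toy allocator. This is a hypothetical sketch, not blk-mq's implementation: tags below EXAMPLE_RESERVED_TAGS are handed out only to callers that explicitly ask for a reserved tag, so ordinary admin commands can never take the AEN slot.

```c
#include <stdbool.h>

#define EXAMPLE_QUEUE_DEPTH   8   /* hypothetical queue depth */
#define EXAMPLE_RESERVED_TAGS 1   /* one slot kept for the AEN command */

static bool tag_busy[EXAMPLE_QUEUE_DEPTH];

/* Allocate a tag from the reserved or the normal range; -1 if none free. */
static int alloc_tag(bool reserved)
{
    int start = reserved ? 0 : EXAMPLE_RESERVED_TAGS;
    int end = reserved ? EXAMPLE_RESERVED_TAGS : EXAMPLE_QUEUE_DEPTH;

    for (int t = start; t < end; t++) {
        if (!tag_busy[t]) {
            tag_busy[t] = true;
            return t;
        }
    }
    return -1;
}

static void free_tag(int tag)
{
    tag_busy[tag] = false;
}
```

With a single reserved slot, a second reserved allocation fails until the first is freed, which matches the "only one" reasoning above.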

NVMe: embedded iod mask cleanup · fda631ff

Submitted by Chong Yuan on March 27, 2015

Remove unused mask in nvme_alloc_iod
Signed-off-by: Chong Yuan <chong.yuan@memblaze.com>
Reviewed-by: Wenbo Wang <wenbo.wang@memblaze.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Freeze admin queue on device failure · 6df3dbc8

Submitted by Keith Busch on March 26, 2015

This fixes a race accessing an invalid address when a controller's admin
queue is in use while a reset for failure or a hot removal occurs. The
admin queue will be frozen to prevent new users from entering prior to
the doorbell queue being unmapped.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

March 23, 2015: 1 commit

NVMe: Initialize device list head before starting · e6e96d73

Submitted by Keith Busch on March 23, 2015

Driver recovery requires the device's list node to have been initialized.

Fixes: https://lkml.org/lkml/2015/3/22/262
Reported-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

February 24, 2015: 1 commit

NVMe: Fix for BLK_DEV_INTEGRITY not set · 52b68d7e

Submitted by Keith Busch on February 23, 2015

We need to define and use appropriate functions for when BLK_DEV_INTEGRITY
is not set.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

February 20, 2015: 6 commits

NVMe: Fix potential corruption on sync commands · 0c0f9b95

Submitted by Keith Busch on February 19, 2015

This makes all sync commands uninterruptible and schedules without timeout
so the controller either has to post a completion or the timeout recovery
fails the command. This fixes potential memory or data corruption from
a command timing out too early or being woken by a signal. Previously any
DMA buffers mapped for that command would have been released even though
we don't know what the controller is planning to do with those addresses.
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Remove unused variables · 48328518

Submitted by Keith Busch on February 19, 2015

We don't track queues in a llist, subscribe to hot-cpu notifications,
or internally retry commands. Delete the unused artifacts.
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Fix potential corruption during shutdown · 07836e65

Submitted by Keith Busch on February 19, 2015

The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but an IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.

This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is no longer inline with the pci probe.
Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Asynchronous controller probe · 2e1d8448

Submitted by Keith Busch on February 12, 2015

This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Register management handle under nvme class · b3fffdef

Submitted by Keith Busch on February 03, 2015

This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices as some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.

Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.
Signed-off-by: Keith Busch <keith.busch@intel.com>

NVMe: Metadata format support · e1e5e564

Submitted by Keith Busch on February 19, 2015

Adds support for NVMe metadata formats and exposes block devices for
all namespaces regardless of their format. Namespace formats that are
unusable will have disk capacity set to 0, but a handle to the block
device is created to simplify device management. A namespace is not
usable when the format requires the host to interleave block and metadata
in a single buffer, has no provisioned storage, or has better data but
failed to register with blk integrity.

The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarily set to 0 until it can be revalidated and registered with
the integrity extensions after add_disk completes.

The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this to be a single physically contiguous region,
so only one integrity segment is allowed per command. If the metadata
is used for T10 PI, the driver provides mappings to save and restore
the reftag physical block translation. The driver provides no-op
functions for generate and verify if metadata is not used for protection
information. This way the setup is always provided by the block layer.

If a request does not supply a required metadata buffer, the command
is failed with bad address. This could only happen if a user manually
disables verify/generate on such a disk. The only case where this is
okay is if the controller is capable of stripping/generating the
metadata, which is possible on some types of formats.

The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IODs, but we don't do that
anymore, so the field was unused.
Signed-off-by: Keith Busch <keith.busch@intel.com>

January 30, 2015: 1 commit

NVMe: avoid kmalloc/kfree for smaller IO · ac3dd5bd

Submitted by Jens Axboe on January 22, 2015

Currently we allocate an nvme_iod for each IO, which holds the
sg list, prps, and other IO related info. Set a threshold of
2 pages and/or 8KB of data, below which we can just embed this
in the per-command pdu in blk-mq. For any IO at or below
NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree.

For higher IOPS, this saves up to 1% of CPU time.
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
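
The size test described above can be sketched as a single predicate. The constants take their values from the message (2 pages, 8KB) and only stand in for NVME_INT_PAGES and NVME_INT_BYTES; the in-tree definitions may be expressed differently, and can_embed_iod() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values from the message above; names are hypothetical stand-ins. */
#define EXAMPLE_INT_PAGES 2
#define EXAMPLE_INT_BYTES (8 * 1024)

/* True if the per-IO descriptor fits in the preallocated per-command
 * area, so no kmalloc()/kfree() pair is needed for this IO. */
static bool can_embed_iod(unsigned int nr_pages, size_t nbytes)
{
    return nr_pages <= EXAMPLE_INT_PAGES && nbytes <= EXAMPLE_INT_BYTES;
}
```

Both limits must hold, so an IO that is small in bytes but scattered across many pages still takes the allocation path.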

January 22, 2015: 1 commit

NVMe: within nvme_free_queues(), delete RCU synchro/deferred free · 121c7ad4

Submitted by kaoudis on January 14, 2015

Converting to blk-queue got rid of the driver's RCU
locking-on-queue, so remove the unnecessary RCU locking-on-queue
artefacts.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kelly Nicole Kaoudis <kaoudis@colorado.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>

January 16, 2015: 1 commit

NVMe: cq_vector should be signed · 6222d172

Submitted by Jens Axboe on January 15, 2015

This was inadvertently dropped from an earlier commit; otherwise
the check against cq_vector == -1 to prevent double free doesn't
make any sense.

Fixes: 2b25d981
Signed-off-by: Jens Axboe <axboe@fb.com>
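
Assuming the field was a 16-bit unsigned integer before this fix, C's integer promotion shows why the check could never fire. The two functions are hypothetical illustrations, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned field: storing -1 yields 65535, which is promoted to int
 * before the comparison, so "== -1" is always false. */
static bool unsigned_sentinel_seen(void)
{
    uint16_t cq_vector = (uint16_t)-1;
    return cq_vector == -1;
}

/* Signed field: -1 survives the round trip and the check works. */
static bool signed_sentinel_seen(void)
{
    int16_t cq_vector = -1;
    return cq_vector == -1;
}
```

Compilers typically warn that the unsigned comparison is always false, which is a useful hint that a sentinel value has been lost.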

January 09, 2015: 6 commits

NVMe: Fix locking on abort handling · 7a509a6b

Submitted by Keith Busch on January 07, 2015

The queues and device need to be locked when messing with them.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Start and stop h/w queues on reset · c9d3bf88

Submitted by Keith Busch on January 07, 2015

This freezes and stops all the queues on device shutdown and restarts
them on resume. This fixes hotplug and reset issues when the controller
is actively being used.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Command abort handling fixes · cef6a948

Submitted by Keith Busch on January 07, 2015

Aborts all requeued commands prior to killing the request_queue. For
commands that time out on a dying request queue, set the "Do Not Retry"
bit on the command status so the command cannot be requeued. Finally, if
the driver is requested to abort a command it did not start, do nothing.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Admin queue removal handling · 0fb59cbc

Submitted by Keith Busch on January 07, 2015

This protects admin queue access on shutdown. When the controller is
disabled, the queue is frozen to prevent new entry and unfrozen on
resume. This also fixes cq_vector signedness so a queue is not suspended
twice.

Since unfreezing the queue makes it available for commands, it requires
the queue to be initialized, so this moves that step after initialization.

Special handling is done when the device is unresponsive during
shutdown. This can be optimized to not require subsequent commands to
time out, but saving that fix for later.

This patch also removes the kill signals in this path that were left-over
artifacts from the blk-mq conversion and are no longer necessary.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Reference count admin queue usage · ea191d2f

Submitted by Keith Busch on January 07, 2015

Since there is no gendisk associated with the admin queue, the driver
needs to hold a reference to it until all open references to the
controller are closed.

This also combines queue cleanup with freeing the tag set since these
should not be separate.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: Start all requests · c917dfe5

Submitted by Keith Busch on January 07, 2015

Once the nvme callback is set for a request, the driver can start it
and make it available for timeout handling. For timed out commands on a
device that is not initialized, this fixes potential deadlocks that can
occur on startup and shutdown when a device is unresponsive since they
can now be cancelled.

Asynchronous requests do not have any expected timeout, so these are
using the new "REQ_NO_TIMEOUT" request flag.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

January 03, 2015: 1 commit

block: fix checking return value of blk_mq_init_queue · 35b489d3

Submitted by Ming Lei on January 02, 2015

Check IS_ERR_OR_NULL(return value) instead of just the return value.
Signed-off-by: Ming Lei <ming.lei@canonical.com>

Reduced to IS_ERR() by me, we never return NULL.
Signed-off-by: Jens Axboe <axboe@fb.com>

December 23, 2014: 1 commit

NVMe: Fix double free irq · 2b25d981

Submitted by Keith Busch on December 22, 2014

Sets the vector to an invalid value after it's freed so we don't free
it twice.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
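
The sentinel pattern described above can be sketched as follows. The struct, the counter, and example_free_irq() are hypothetical stand-ins for the driver's queue and the kernel's IRQ release; the point is only that teardown becomes safe to call twice.

```c
/* Hypothetical queue; -1 in cq_vector means "no vector assigned". */
struct example_queue {
    short cq_vector;
};

static int irqs_freed;   /* counts calls to the stand-in release routine */

static void example_free_irq(int vector)
{
    (void)vector;
    irqs_freed++;
}

/* Idempotent teardown: a second call sees the sentinel and returns. */
static void example_suspend_queue(struct example_queue *q)
{
    if (q->cq_vector == -1)
        return;
    example_free_irq(q->cq_vector);
    q->cq_vector = -1;
}
```

The field has to be signed for the -1 comparison to work; an unsigned 16-bit field would silently defeat the guard.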

December 12, 2014: 2 commits

NVMe: fix race condition in nvme_submit_sync_cmd() · 849c6e77

Submitted by Jens Axboe on December 12, 2014

If we have a race between the schedule timing out and the command
completing, we could have the task issuing the command exit
nvme_submit_sync_cmd() while the irq is running sync_completion().
If that happens, we could be corrupting memory, since the stack
that held 'cmdinfo' is no longer valid.

Fix this by always calling nvme_abort_cmd_info(). Once that call
completes, we know that we have either run sync_completion() if
the completion came in, or that we will never run it since we now
have special_completion() as the command callback handler.
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: fix retry/error logic in nvme_queue_rq() · fe54303e

Submitted by Jens Axboe on December 11, 2014

The logic around retrying and erroring IO in nvme_queue_rq() is broken
in a few ways:

- If we fail allocating dma memory for a discard, we return retry. We
  have the 'iod' stored in ->special, but we free the 'iod'.

- For a normal request, if we fail dma mapping or setting up prps, we
  have the same iod situation. Additionally, we haven't set the callback
  for the request yet, so we also potentially leak IOMMU resources.

Get rid of the ->special 'iod' store. The retry is uncommon enough that
it's not worth optimizing for or holding on to resources to attempt to
speed it up. Additionally, it's usually best practice to free any
request related resources when doing retries.
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

December 11, 2014: 3 commits

NVMe: Fix FS mount issue (hot-remove followed by hot-add) · 285dffc9

Submitted by Indraneel M on December 11, 2014

After hot-removal of a device with a mounted partition,
when the device is hot-added again, the new node reappears
as nvme0n1. Mounting this new node fails with the error:

mount: mount /dev/nvme0n1p1 on /mnt failed: File exists.

The old node's FS entries still exist and the kernel can't re-create
procfs and sysfs entries for the new node with the same name.
The patch fixes this issue.
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Indraneel M <indraneel.m@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: fix error return checking from blk_mq_alloc_request() · 97fe3832

Submitted by Jens Axboe on December 10, 2014

We return an error pointer or the request, not NULL. Half
the call paths got it right, the others didn't. Fix those up.
Signed-off-by: Jens Axboe <axboe@fb.com>

NVMe: fix freeing of wrong request in abort path · c87fd540

Submitted by Sam Bradshaw on December 10, 2014

We allocate 'abort_req', but free 'req' in case of an error
submitting the IO.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

December 04, 2014: 1 commit

NVMe: Fix command setup on IO retry · 9af8785a

Submitted by Keith Busch on December 03, 2014

On retry, req->special points to an already set-up IOD, but we
still need to set up the command context and callback; otherwise you'll
see false "twice completed" errors and leaked requests.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

November 22, 2014: 1 commit

NVMe: Update module version major number · c78b4713

Submitted by Keith Busch on November 21, 2014

It's already near impossible to tell what bits someone is running based on
a 'modinfo nvme', and I don't want to try guessing if someone is running
blk-mq or bio-based. Let's make it obvious with the module version that
the blk-mq conversion is a major change. Future bio-based versions can
increment to 0.10 in a fork if revisions occur.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

November 21, 2014: 1 commit

NVMe: fail pci initialization if the device doesn't have any BARs · be7837e8

Submitted by Jens Axboe on November 14, 2014

The PCI init of NVMe doesn't check for valid BARs before proceeding
to map and use BAR 0. If the device is hosed (or its firmware is), then
we should catch this case and give up early.

This fixes a:

[ 1662.035778] WARNING: CPU: 0 PID: 4 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xa7/0xc0()

and later badness on such a device.
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
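
The early bail-out can be sketched in userspace. In the kernel, a bitmask of BARs with given resource flags is what pci_select_bars() returns; here the per-BAR flags array, EXAMPLE_RES_MEM, select_mem_bars() and example_probe() are all hypothetical stand-ins.

```c
#include <errno.h>

#define EXAMPLE_NUM_BARS 6
#define EXAMPLE_RES_MEM  0x1u   /* hypothetical "memory BAR" flag */

/* Build a bitmask with bit i set when BAR i is a memory BAR. */
static int select_mem_bars(const unsigned int flags[EXAMPLE_NUM_BARS])
{
    int bars = 0;

    for (int i = 0; i < EXAMPLE_NUM_BARS; i++)
        if (flags[i] & EXAMPLE_RES_MEM)
            bars |= 1 << i;
    return bars;
}

/* Give up before any mapping is attempted if no memory BAR exists. */
static int example_probe(const unsigned int flags[EXAMPLE_NUM_BARS])
{
    if (!select_mem_bars(flags))
        return -ENODEV;
    return 0;   /* a real probe would go on to map BAR 0 here */
}
```

Failing with -ENODEV at this point keeps a broken device from ever reaching the ioremap() path that produced the warning.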
