提交 · 1a6fe74dfd1bb10afb41cbbbdc14890604be42a6 · openeuler / raspberrypi-kernel

25 9月, 2016 2 次提交

nvme: Pass pointers, not dma addresses, to nvme_get/set_features() · 1a6fe74d

由 Andy Lutomirski 提交于 9月 16, 2016

Any user I can imagine that needs a buffer at all will want to pass
a pointer directly.  There are no currently callers that use
buffers, so this change is painless, and it will make it much easier
to start using features that use buffers (e.g. APST).
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Tested-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

1a6fe74d

nvme/scsi: Remove power management support · 26501db8

由 Andy Lutomirski 提交于 9月 16, 2016

As far as I can tell, there is basically nothing correct about this
code.  It misinterprets npss (off-by-one).  It hardcodes a bunch of
power states, which is nonsense, because they're all just indices
into a table that software needs to parse.  It completely ignores
the distinction between operational and non-operational states.
And, until 4.8, if all of the above magically succeeded, it would
dereference a NULL pointer and OOPS.

Since this code appears to be useless, just delete it.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Tested-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

26501db8

24 9月, 2016 6 次提交

nvmet: Make dsm number of ranges zero based · 2e5d0baa

由 Alexander Solganik 提交于 9月 21, 2016

This caused the nvmet request data length to be
incorrect.
Signed-off-by: NAlexander Solganik <sashas@lightbitslabs.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

2e5d0baa

nvmet: Use direct IO for writes · 9b349b08

由 Sagi Grimberg 提交于 9月 21, 2016

We're designed to work with high-end devices where
direct IO makes perfect sense. We noticed that we
context switch by scheduling kblockd instead of going
directly to the device without REQ_SYNC for writes.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJens Axboe <axboe@kernel.dk>

9b349b08

admin-cmd: Added smart-log command support. · 2d79c7dc

由 Chaitanya Kulkarni 提交于 9月 01, 2016

This patch implements the support for smart-log command
(NVM Express 1.2.1-section 5.10.1.2 SMART / Health Information
(Log Identifier 02h)) on the target for NVMe over Fabric.

In current implementation host can retrieve following statistics:-
1. Data Units Read.
2. Data Units Written.
3. Host Read Commands.
4. Host Write Commands.
Signed-off-by: NChaitanya Kulkarni <ckulkarnilinux@gmail.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

2d79c7dc

nvme-fabrics: Add host_traddr options field to host infrastructure · 478bcb93

由 James Smart 提交于 8月 02, 2016

Add the host_traddr field to allow specification of the host-port
connection info for the transport. Will be used by FC transport.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Acked-by: NJohannes Thumshirn <jth@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

478bcb93

nvme-fabrics: revise host transport option descriptions · 4a9f05c5

由 James Smart 提交于 8月 02, 2016

Revise some of the comments so not so ethernet-network centric
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Acked-by: NJohannes Thumshirn <jth@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

4a9f05c5

nvme-fabrics: rework nvmf_get_address() for variable options · 0fe51ff2

由 James Smart 提交于 8月 02, 2016

Revise nvmf_get_address() string to account for not all options being
present.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Acked-by: NJohannes Thumshirn <jth@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

0fe51ff2

21 9月, 2016 3 次提交

lightnvm: expose device geometry through sysfs · 40267efd

由 Simon A. F. Lund 提交于 9月 16, 2016

For a host to access an Open-Channel SSD, it has to know its geometry,
so that it writes and reads at the appropriate device bounds.

Currently, the geometry information is kept within the kernel, and not
exported to user-space for consumption. This patch exposes the
configuration through sysfs and enables user-space libraries, such as
liblightnvm, to use the sysfs implementation to get the geometry of an
Open-Channel SSD.

The sysfs entries are stored within the device hierarchy, and can be
found using the "lightnvm" device type.

An example configuration looks like this:

/sys/class/nvme/
└── nvme0n1
   ├── capabilities: 3
   ├── device_mode: 1
   ├── erase_max: 1000000
   ├── erase_typ: 1000000
   ├── flash_media_type: 0
   ├── media_capabilities: 0x00000001
   ├── media_type: 0
   ├── multiplane: 0x00010101
   ├── num_blocks: 1022
   ├── num_channels: 1
   ├── num_luns: 4
   ├── num_pages: 64
   ├── num_planes: 1
   ├── page_size: 4096
   ├── prog_max: 100000
   ├── prog_typ: 100000
   ├── read_max: 10000
   ├── read_typ: 10000
   ├── sector_oob_size: 0
   ├── sector_size: 4096
   ├── media_manager: gennvm
   ├── ppa_format: 0x380830082808001010102008
   ├── vendor_opcode: 0
   ├── max_phys_secs: 64
   └── version: 1
Signed-off-by: NSimon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

40267efd

lightnvm: control life of nvm_dev in driver · b0b4e09c

由 Matias Bjørling 提交于 9月 16, 2016

LightNVM compatible device drivers does not have a method to expose
LightNVM specific sysfs entries.

To enable LightNVM sysfs entries to be exposed, lightnvm device
drivers require a struct device to attach it to. To allow both the
actual device driver and lightnvm sysfs entries to coexist, the device
driver tracks the lifetime of the nvm_dev structure.

This patch refactors NVMe and null_blk to handle the lifetime of struct
nvm_dev, which eliminates the need for struct gendisk when a lightnvm
compatible device is provided.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

b0b4e09c

nvme: refactor namespaces to support non-gendisk devices · ac81bfa9

由 Matias Bjørling 提交于 9月 16, 2016

With LightNVM enabled namespaces, the gendisk structure is not exposed
to the user. This prevents LightNVM users from accessing the NVMe device
driver specific sysfs entries, and LightNVM namespace geometry.

Refactor the revalidation process, so that a namespace, instead of a
gendisk, is revalidated. This later allows patches to wire up the
sysfs entries up to a non-gendisk namespace.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

ac81bfa9

24 8月, 2016 1 次提交

nvme: Fix nvme_get/set_features() with a NULL result pointer · 9b47f77a

由 Andy Lutomirski 提交于 8月 24, 2016

nvme_set_features() callers seem to expect that passing NULL as the
result pointer is acceptable.  Teach nvme_set_features() not to try to
write to the NULL address.

For symmetry, make the same change to nvme_get_features(), despite the
fact that all current callers pass a valid result pointer.

I assume that this bug hasn't been reported in practice because
the callers that pass NULL are all in the SCSI translation layer
and no one uses the relevant operations.

Cc: stable@vger.kernel.org
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

9b47f77a

15 8月, 2016 1 次提交

nvme: Prevent controller state invalid transition · f6b6a28e

由 Gabriel Krisman Bertazi 提交于 7月 29, 2016

Acquiring the nvme_ctrl lock before reading ctrl->state in
nvme_change_ctrl_state() should prevent a theoretical invalid state
transition, in the event of two threads racing inside that function.

I haven't been able to observe this happening with the current code, and
the current state machine seems to be simple enough to not be
affected by these invalid transitions, but future modifications could
make it more likely to happen.
Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: NSagi Grimberg <sag@grimberg.me>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f6b6a28e

11 8月, 2016 1 次提交

nvme: Suspend all queues before deletion · c21377f8

由 Gabriel Krisman Bertazi 提交于 8月 11, 2016

When nvme_delete_queue fails in the first pass of the
nvme_disable_io_queues() loop, we return early, failing to suspend all
of the IO queues.  Later, on the nvme_pci_disable path, this causes us
to disable MSI without actually having freed all the IRQs, which
triggers the BUG_ON in free_msi_irqs(), as show below.

This patch refactors nvme_disable_io_queues to suspend all queues before
start submitting delete queue commands.  This way, we ensure that we
have at least returned every IRQ before continuing with the removal
path.

[  487.529200] kernel BUG at ../drivers/pci/msi.c:368!
cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650]
    pc: c000000000627a50: free_msi_irqs+0x90/0x200
    lr: c000000000627a40: free_msi_irqs+0x80/0x200
    sp: c0000078c5b838d0
   msr: 9000000100029033
  current = 0xc0000078c5b40000
  paca    = 0xc000000002bd7600   softe: 0        irq_happened: 0x01
    pid   = 1376, comm = kworker/70:1H
kernel BUG at ../drivers/pci/msi.c:368!
Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413
(Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016
enter ? for help
[c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme]
[c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme]
[c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110
[c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170
[c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110
[c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0
[c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590
[c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660
[c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130
[c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c
Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Signed-off-by: NJens Axboe <axboe@fb.com>

c21377f8

04 8月, 2016 5 次提交

nvme-rdma: Remove unused includes · e3266378

由 Sagi Grimberg 提交于 8月 04, 2016

Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>

e3266378

nvme-rdma: start async event handler after reconnecting to a controller · 3ef1b4b2

由 Sagi Grimberg 提交于 8月 04, 2016

When we reset or reconnect to a controller, we are cancelling the
async event handler so we can safely re-establish resources, but we
need to remember to start it again when we successfully reconnect.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

3ef1b4b2

nvmet: Fix controller serial number inconsistency · 28b89118

由 Sagi Grimberg 提交于 8月 04, 2016

The host is allowed to issue identify as many times
as it wants, we need to stay consistent when reporting
the serial number for a given controller.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

28b89118

nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads · 40e64e07

由 Sagi Grimberg 提交于 7月 28, 2016

Under extreme conditions this might cause data corruptions. By doing that
we we repost the buffer and then post this buffer for the device to send.
If we happen to use shared receive queues the device might write to the
buffer before it sends it (there is no ordering between send and recv
queues). Without SRQs we probably won't get that if the host doesn't
mis-behave and send more than we allowed it, but relying on that is not
really a good idea.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

40e64e07

nvmet-rdma: Correctly handle RDMA device hot removal · d8f7750a

由 Sagi Grimberg 提交于 5月 19, 2016

When configuring a device attached listener, we may
see device removal events. In this case we return a
non-zero return code from the cm event handler which
implicitly destroys the cm_id. It is possible that in
the future the user will remove this listener and by
that trigger a second call to rdma_destroy_id on an
already destroyed cm_id -> BUG.

In addition, when a queue bound (active session) cm_id
generates a DEVICE_REMOVAL event we must guarantee all
resources are cleaned up by the time we return from the
event handler.

Introduce nvmet_rdma_device_removal which addresses
(or at least attempts to) both scenarios.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

d8f7750a

03 8月, 2016 7 次提交

nvme-rdma: Make sure to shutdown the controller if we can · 45862ebc

由 Sagi Grimberg 提交于 7月 24, 2016

Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because
it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it).

Instead, check that the admin queue is connected. Note that it is safe
because we can never see a copmeting thread trying to destroy the admin
queue (reset or delete controller).
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

45862ebc

nvme-loop: Remove duplicate call to nvme_remove_namespaces · a159c64d

由 Sagi Grimberg 提交于 7月 24, 2016

nvme_uninit_ctrl already does that for us. Note that we
reordered nvme_loop_shutdown_ctrl with nvme_uninit_ctrl
but its safe because we want controller uninit to happen
before we shutdown the transport resources.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

a159c64d

nvme-rdma: Free the I/O tags when we delete the controller · a34ca17a

由 Sagi Grimberg 提交于 7月 24, 2016

If we wait until we free the controller (free_ctrl) we might
lose our rdma device without any notification while we still
have open resources (tags mrs and dma mappings).

Instead, destroy the tags with their rdma resources once we
delete the device and not when freeing it.

Note that we don't do that in nvme_rdma_shutdown_ctrl because
controller reset uses it as well and we want to give active I/O
a chance to complete successfully.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

a34ca17a

nvme-rdma: Remove duplicate call to nvme_remove_namespaces · 2461a8dd

由 Sagi Grimberg 提交于 7月 24, 2016

nvme_uninit_ctrl already does that for us. Note that we reordered
nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl, this is perfectly
fine because we actually want ctrl uninit (aen, scan cancellation
and namespaces removal) to happen before we shutdown the rdma
resources.

Also, centralize the deletion work and the dead controller removal
work code duplication into __nvme_rdma_shutdown_ctrl that accepts
a shutdown boolean.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

2461a8dd

nvme-rdma: Fix device removal handling · 57de5a0a

由 Sagi Grimberg 提交于 7月 14, 2016

Device removal sequence may have crashed because the
controller (and admin queue space) was freed before
we destroyed the admin queue resources. Thus we
want to destroy the admin queue and only then queue
controller deletion and wait for it to complete.

More specifically we:
1. own the controller deletion (make sure we are not
   competing with another deletion).
2. get rid of inflight reconnects if exists (which
   also destroy and create queues).
3. destroy the queue.
4. safely queue controller deletion (and wait for it
   to complete).
Reported-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

57de5a0a

nvme-rdma: Queue ns scanning after a sucessful reconnection · 5f372eb3

由 Sagi Grimberg 提交于 7月 31, 2016

On an ordered target shutdown, the target can send a AEN on a namespace
removal, this will trigger the host to queue ns-list query. The shutdown
will trigger error recovery which will attepmt periodic reconnect.

We can hit a race where the ns rescanning fails (error recovery kicked
in and we're not connected) causing removing all the namespaces and when
we reconnect we won't see any namespaces for this controller.

So, queue a namespace rescan after we successfully reconnected to the target.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

5f372eb3

nvme-rdma: Don't leak uninitialized memory in connect request private data · 0b857b44

由 Roland Dreier 提交于 7月 31, 2016

Zero out the full nvme_rdma_cm_req structure before sending it.
Otherwise we end up leaking kernel memory in the reserved field, which
might break forward compatibility in the future.
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

0b857b44

21 7月, 2016 4 次提交

nvme/pci: Provide SR-IOV support · 13880f5b

由 Keith Busch 提交于 6月 20, 2016

This registers an sr-iov callback for nvme.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

13880f5b

nvme: initialize variable before logical OR'ing it · fa9a89fc

由 Jay Freyensee 提交于 7月 20, 2016

It is typically not good coding or secure coding practice
to logical OR a variable without an initialization value first.
Here on this line:

integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;

BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable
never set to an initial value. This patch fixes that.
Signed-off-by: NJay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: NMing Lin <ming.l@samsung.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

fa9a89fc

block: ensure bios return from blk_get_request are properly initialized · 0c4de0f3

由 Christoph Hellwig 提交于 7月 19, 2016

blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API.  Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe to use request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0c4de0f3

block: get rid of bio_rw and READA · 70246286

由 Christoph Hellwig 提交于 7月 19, 2016

These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

70246286

14 7月, 2016 3 次提交

NVMe: don't allocate unused nvme_major · b09dcf58

由 NeilBrown 提交于 7月 13, 2016

When alloc_disk(0) is used, the ->major number is ignored.  All device
numbers are allocated with a major of BLOCK_EXT_MAJOR.

So remove all references to nvme_major.

[akpm@linux-foundation.org: one unregister_blkdev() was missed]
Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@nobleSigned-off-by: NNeilBrown <neilb@suse.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

b09dcf58

nvme: Remove RCU namespace protection · 32f0c4af

由 Keith Busch 提交于 7月 13, 2016

We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.

To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.

List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes required.

List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.

Reported-by: Ming Lin <mlin at kernel.org>
[fixes 0bf77e9d nvme: switch to RCU freeing the namespace]
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

32f0c4af

nvme: avoid crashes when node 0 is memoryless node. · 2fa84351

由 Masayoshi Mizuma 提交于 6月 20, 2016

When CONFIG_NUMA is enabled and node 0 is memoryless, the system
crashes because nvme_probe() sets the device->numa_node to 0 by
set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0.
To avoid the crash, we should change the 0 to first_memory_node.
Signed-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2fa84351

13 7月, 2016 1 次提交

nvme: Limit command retries · f80ec966

由 Keith Busch 提交于 7月 12, 2016

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f80ec966

12 7月, 2016 6 次提交

nvme-loop: fix nvme-loop Kconfig dependencies · 6eae8c45

由 Arnd Bergmann 提交于 7月 12, 2016

I ran into the same problem on NVME_TARGET_RDMA now,
which otherwise needs dependencies on both CONFIG_BLOCK and
CONFIGFS_FS:

    warning: (NVME_TARGET_LOOP && NVME_TARGET_RDMA) selects NVME_TARGET which has unmet direct dependencies (BLOCK && CONFIGFS_FS)
    0xA002B368 Mon Jul 11 18:00:45 CEST 2016 failed
    In file included from ../drivers/nvme/target/core.c:16:0:
    drivers/nvme/target/nvmet.h:222:14: error: field 'inline_bio' has incomplete type
      struct bio  inline_bio;
                  ^~~~~~~~~~
    drivers/nvme/target/core.c: In function 'nvmet_async_event_work':
    drivers/nvme/target/core.c:98:3: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration]
       kfree(aen);
       ^~~~~
    ../drivers/nvme/target/core.c: In function 'nvmet_ns_enable':
    ../drivers/nvme/target/core.c:269:13: error: implicit declaration of function 'blkdev_get_by_path' [-Werror=implicit-function-declaration]
      ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ | FMODE_WRITE,

Folding in my patch below should address that too.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reported-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

6eae8c45

nvmet: fix return value check in nvmet_subsys_alloc() · 69555af2

由 Wei Yongjun 提交于 7月 06, 2016

In case of error, the function kstrndup() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

69555af2

nvme-fabrics: add-remove ctrl repeat fix · e76debd9

由 Ming Lin 提交于 7月 01, 2016

Repeatedly adding then removing the same NVMe-over-Fabrics controller
over and over again (shown below) can cause a kernel crash (also shown
below).  This patch fixes that.

[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
-nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
[nvmf]# ./remove_nvme_connections.sh 2
echo 1 > /sys/class/nvme/nvme0/delete_controller
echo 1 > /sys/class/nvme/nvme1/delete_controller
[nvmf]# ./setup_nvme_connections.sh
traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
Killed

[nvmf]# dmesg
[  313.416908] nvme nvme0: creating 16 I/O queues.
[  313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
192.168.1.100:4420
[  313.524857] BUG: unable to handle kernel NULL pointer dereference at
0000000000000010
[  313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
[  313.525490] PGD 0
[  313.525726] Oops: 0000 [#1] SMP
[  313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
mlx4_ib ib_core mlx4_core
[  313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
4.7.0-rc2+ #2
[  313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
-F/IBQF/IBFF, BIOS 1.0a 10/09/2012
[  313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
ffff88025b980000
[  313.527879] RIP: 0010:[<ffffffff8136c60e>]  [<ffffffff8136c60e>]
strcmp+0xe/0x30
[  313.528232] RSP: 0018:ffff88025b983db0  EFLAGS: 00010206
[  313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
fffffffffffffff1
[  313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
0000000000000011
[  313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
ffff880471879058
[  313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
ffff880471879800
[  313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
fffffffffffffff8
[  313.529303] FS:  00007f778f510700(0000) GS:ffff88047fbc0000(0000)
knlGS:0000000000000000
[  313.529629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
00000000000406e0
[  313.529989] Stack:
[  313.530154]  ffff88025b983e48 ffffffffa0171c74 0000000000000001
0000000000000059
[  313.530621]  ffff880476f32400 ffff88047e8add80 0000010074b33aa0
ffff880471879059
[  313.531162]  ffff88047187904b ffff880471879058 0000000000000000
ffff88047736e000
[  313.531629] Call Trace:
[  313.531797]  [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
[nvme_fabrics]
[  313.531974]  [<ffffffff81180b53>] __vfs_write+0x23/0x120
[  313.532146]  [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
[  313.532316]  [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
[  313.532487]  [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
[  313.532658]  [<ffffffff8117e321>] ? filp_close+0x51/0x70
[  313.532845]  [<ffffffff811824e1>] SyS_write+0x41/0xa0
[  313.533016]  [<ffffffff8183055b>]
entry_SYSCALL_64_fastpath+0x13/0x8f
[  313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
[  313.536563] RIP  [<ffffffff8136c60e>] strcmp+0xe/0x30
[  313.536815]  RSP <ffff88025b983db0>
[  313.536981] CR2: 0000000000000010
[  313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
Reported-and-tested-by: NJay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: NMing Lin <mlin@kernel.org>
Signed-off-by: NJay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e76debd9

nvme-fabrics: Remove tl_retry_count · 6a92967c

由 Sagi Grimberg 提交于 6月 22, 2016

The timeout before error recovery logic kicks in is
dictated by the nvme keep-alive, so we don't really need
a transport layer retry count. transports can retry for
as much as they like.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

6a92967c

nvme-rdma: Don't use tl_retry_count · 2ac17c28

由 Sagi Grimberg 提交于 6月 22, 2016

Always use the maximum qp retry count as the
error recovery timeout is dictated from the nvme
keep-alive.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

2ac17c28

nvme-rdma: fix the return value of nvme_rdma_reinit_request() · 458a9632

由 Wei Yongjun 提交于 7月 12, 2016

PTR_ERR should be applied before its argument is reassigned, otherwise the
return value will be set to 0, not error code.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

458a9632