1. 07 Sep 2016 (1 commit)
  2. 28 Aug 2016 (2 commits)
  3. 24 Aug 2016 (1 commit)
  4. 19 Aug 2016 (3 commits)
  5. 18 Aug 2016 (2 commits)
    • nvme-rdma: fix sqsize/hsqsize per spec · c5af8654
      Committed by Jay Freyensee
      Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as
      a 0-based value.
      
      Also per spec, the RDMA binding values shall be set to sqsize,
      which makes hsqsize a 0-based value as well.
      
      Thus, the sqsize during NVMf connect() is now:
      
      [root@fedora23-fabrics-host1 for-48]# dmesg
      [  318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for
      admin queue: 31
      [  318.720884] nvme nvme0: creating 16 I/O queues.
      [  318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o
      queue: 127
      
      Finally, the current interpretation of the spec implies hrqsize is
      1-based, so set it accordingly. A sketch of all three encodings
      follows below.
      Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
      Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
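      A minimal sketch of the three encodings, assuming the connect
      command and RDMA private data carry the fields below (names
      approximate the upstream driver; treat the exact layout as an
      assumption, not the literal patch):

          /* sqsize is 0-based per NVMe-oF 1.0: N entries encode as N - 1 */
          static void nvme_rdma_encode_queue_sizes(struct nvmf_connect_command *cmd,
                                                   struct nvme_rdma_cm_req *priv,
                                                   u16 queue_size)
          {
                  cmd->sqsize = cpu_to_le16(queue_size - 1);
                  /* the RDMA binding sets hsqsize to sqsize, so 0-based too */
                  priv->hsqsize = cpu_to_le16(queue_size - 1);
                  /* hrqsize is interpreted as 1-based, so no adjustment */
                  priv->hrqsize = cpu_to_le16(queue_size);
          }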
    • fabrics: define admin sqsize min default, per spec · f994d9dc
      Committed by Jay Freyensee
      Upon admin queue connect(), the rdma qp was being set based on
      NVMF_AQ_DEPTH.  However, the fabrics layer was using the sqsize
      field value set for I/O queues for the admin queue as well, which
      threw the nvme layer and the rdma layer out of sync:
      
      [root@fedora23-fabrics-host1 nvmf]# dmesg
      [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize
      being sent is: 128
      [ 3507.798858] nvme nvme0: creating 16 I/O queues.
      [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr
      192.168.1.3:4420
      
      Thus, to give the admin queue its own depth, we use NVMF_AQ_DEPTH,
      the minimum depth specified in the NVMe-over-Fabrics 1.0 spec, for
      both connect() and the RDMA private data (and in that RDMA private
      data we treat hrqsize as a 1-based value, per the current
      understanding of the fabrics spec); this is sketched below.
      Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
      Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
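      A hedged sketch of the admin connect, assuming the same connect
      command structure as above; the function name is made up for
      illustration:

          /* the admin queue connects with the spec's minimum depth
           * (NVMF_AQ_DEPTH) instead of the I/O queue sqsize */
          static void nvmf_encode_admin_connect(struct nvmf_connect_command *cmd)
          {
                  cmd->qid = cpu_to_le16(0);                    /* admin queue */
                  cmd->sqsize = cpu_to_le16(NVMF_AQ_DEPTH - 1); /* 0-based */
          }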
  6. 16 Aug 2016 (1 commit)
  7. 15 Aug 2016 (1 commit)
  8. 11 Aug 2016 (1 commit)
    • nvme: Suspend all queues before deletion · c21377f8
      Committed by Gabriel Krisman Bertazi
      When nvme_delete_queue fails in the first pass of the
      nvme_disable_io_queues() loop, we return early, failing to suspend all
      of the IO queues.  Later, on the nvme_pci_disable path, this causes us
      to disable MSI without actually having freed all the IRQs, which
      triggers the BUG_ON in free_msi_irqs(), as shown below.
      
      This patch refactors nvme_disable_io_queues to suspend all queues
      before it starts submitting delete queue commands (sketched after
      the trace below).  This way, we ensure that we
      have at least returned every IRQ before continuing with the removal
      path.
      
      [  487.529200] kernel BUG at ../drivers/pci/msi.c:368!
      cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650]
          pc: c000000000627a50: free_msi_irqs+0x90/0x200
          lr: c000000000627a40: free_msi_irqs+0x80/0x200
          sp: c0000078c5b838d0
         msr: 9000000100029033
        current = 0xc0000078c5b40000
        paca    = 0xc000000002bd7600   softe: 0        irq_happened: 0x01
          pid   = 1376, comm = kworker/70:1H
      kernel BUG at ../drivers/pci/msi.c:368!
      Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413
      (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016
      enter ? for help
      [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme]
      [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme]
      [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110
      [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170
      [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110
      [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0
      [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590
      [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660
      [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130
      [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c
      Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-nvme@lists.infradead.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
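      An approximation of the refactor (helper names follow the driver,
      but this is a sketch of the ordering, not the verbatim patch):

          static void nvme_disable_io_queues(struct nvme_dev *dev)
          {
                  int i, queues = dev->online_queues - 1;

                  /* pass 1: unconditionally suspend every queue, which
                   * returns each queue's IRQ */
                  for (i = queues; i > 0; i--)
                          nvme_suspend_queue(dev->queues[i]);

                  /* pass 2: deletion may still fail early, but no queue
                   * is left holding an IRQ for free_msi_irqs() to hit */
                  for (i = queues; i > 0; i--)
                          if (nvme_delete_queue(dev->queues[i],
                                                nvme_admin_delete_sq))
                                  break;
          }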
  9. 04 Aug 2016 (2 commits)
  10. 03 Aug 2016 (6 commits)
  11. 21 Jul 2016 (4 commits)
  12. 14 Jul 2016 (3 commits)
  13. 13 Jul 2016 (1 commit)
    • nvme: Limit command retries · f80ec966
      Committed by Keith Busch
      Many controller implementations will return errors for commands
      that cannot succeed, but without the DNR bit set. The driver
      previously retried these commands an unlimited number of times
      until the command timeout was exceeded, which takes an
      unnecessarily long period of time.
      
      This patch limits the number of retries a command may have; the
      limit defaults to 5 and is user tunable at load time or runtime
      (see the sketch below).
      
      The struct request's 'retries' field is used to track the number of
      retries attempted. This is in contrast with scsi's use of this field,
      which indicates how many retries are allowed.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
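      A sketch of the mechanism, assuming the module parameter and the
      simplified retry check below (the real decision also considers
      request timeouts):

          static unsigned char nvme_max_retries = 5;
          module_param_named(max_retries, nvme_max_retries, byte, 0644);
          MODULE_PARM_DESC(max_retries,
                           "max number of retries a command may have");

          static inline bool nvme_req_needs_retry(struct request *req,
                                                  u16 status)
          {
                  /* DNR set means the controller says: do not retry */
                  if (status & NVME_SC_DNR)
                          return false;
                  /* req->retries counts attempts made (unlike SCSI,
                   * where it is the allowance), so retry while under
                   * the cap */
                  return req->retries < nvme_max_retries;
          }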
  14. 12 Jul 2016 (5 commits)
    • nvme-fabrics: add-remove ctrl repeat fix · e76debd9
      Committed by Ming Lin
      Repeatedly adding and then removing the same NVMe-over-Fabrics
      controller over and over again (shown below) can cause a kernel
      crash (also shown below).  This patch fixes that; a defensive
      sketch of the failing comparison follows the trace.
      
      [nvmf]# ./setup_nvme_connections.sh
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
      -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside
      -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
      [nvmf]# ./remove_nvme_connections.sh 2
      echo 1 > /sys/class/nvme/nvme0/delete_controller
      echo 1 > /sys/class/nvme/nvme1/delete_controller
      [nvmf]# ./setup_nvme_connections.sh
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside
      -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
      Killed
      
      [nvmf]# dmesg
      [  313.416908] nvme nvme0: creating 16 I/O queues.
      [  313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
      192.168.1.100:4420
      [  313.524857] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000010
      [  313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
      [  313.525490] PGD 0
      [  313.525726] Oops: 0000 [#1] SMP
      [  313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
      ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
      mlx4_ib ib_core mlx4_core
      [  313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
      4.7.0-rc2+ #2
      [  313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
      -F/IBQF/IBFF, BIOS 1.0a 10/09/2012
      [  313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
      ffff88025b980000
      [  313.527879] RIP: 0010:[<ffffffff8136c60e>]  [<ffffffff8136c60e>]
      strcmp+0xe/0x30
      [  313.528232] RSP: 0018:ffff88025b983db0  EFLAGS: 00010206
      [  313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
      fffffffffffffff1
      [  313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
      0000000000000011
      [  313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
      ffff880471879058
      [  313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
      ffff880471879800
      [  313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
      fffffffffffffff8
      [  313.529303] FS:  00007f778f510700(0000) GS:ffff88047fbc0000(0000)
      knlGS:0000000000000000
      [  313.529629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
      00000000000406e0
      [  313.529989] Stack:
      [  313.530154]  ffff88025b983e48 ffffffffa0171c74 0000000000000001
      0000000000000059
      [  313.530621]  ffff880476f32400 ffff88047e8add80 0000010074b33aa0
      ffff880471879059
      [  313.531162]  ffff88047187904b ffff880471879058 0000000000000000
      ffff88047736e000
      [  313.531629] Call Trace:
      [  313.531797]  [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
      [nvme_fabrics]
      [  313.531974]  [<ffffffff81180b53>] __vfs_write+0x23/0x120
      [  313.532146]  [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
      [  313.532316]  [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
      [  313.532487]  [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
      [  313.532658]  [<ffffffff8117e321>] ? filp_close+0x51/0x70
      [  313.532845]  [<ffffffff811824e1>] SyS_write+0x41/0xa0
      [  313.533016]  [<ffffffff8183055b>]
      entry_SYSCALL_64_fastpath+0x13/0x8f
      [  313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
      84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
      c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
      [  313.536563] RIP  [<ffffffff8136c60e>] strcmp+0xe/0x30
      [  313.536815]  RSP <ffff88025b983db0>
      [  313.536981] CR2: 0000000000000010
      [  313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
      Reported-and-tested-by: Jay Freyensee <james.p.freyensee@intel.com>
      Signed-off-by: Ming Lin <mlin@kernel.org>
      Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
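      The oops above is a NULL pointer dereferenced inside strcmp().  A
      defensive sketch of the pattern (this illustrates the failure mode
      only; the actual fix keeps the compared options valid for the
      lifetime of the lookup):

          /* NULL-safe comparison: a controller torn down between the
           * two setup runs can leave a stale/NULL option string */
          static bool nvmf_opts_match(const char *a, const char *b)
          {
                  if (!a || !b)
                          return false;
                  return strcmp(a, b) == 0;
          }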
    • nvme-fabrics: Remove tl_retry_count · 6a92967c
      Committed by Sagi Grimberg
      The timeout before error-recovery logic kicks in is dictated by
      the nvme keep-alive, so we don't really need a transport-layer
      retry count; transports can retry as many times as they like.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme-rdma: Don't use tl_retry_count · 2ac17c28
      Committed by Sagi Grimberg
      Always use the maximum qp retry count, as the error-recovery
      timeout is dictated by the nvme keep-alive (see the sketch below).
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
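      A sketch using the RDMA CM's connection parameters (the struct and
      fields are the standard kernel API; the surrounding function is
      made up):

          static void nvme_rdma_fill_conn_param(struct rdma_conn_param *param)
          {
                  /* keep-alive drives error recovery, so just use the
                   * maximum 3-bit retry count the CM allows */
                  param->retry_count = 7;
                  param->rnr_retry_count = 7; /* 7 == retry RNR forever */
          }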
    • nvme-rdma: fix the return value of nvme_rdma_reinit_request() · 458a9632
      Committed by Wei Yongjun
      PTR_ERR should be applied before its argument is reassigned;
      otherwise the return value will be 0 rather than the error code
      (illustrated below).
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
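      The bug class, as a before/after sketch (struct foo and
      alloc_foo() are hypothetical stand-ins):

          struct foo;
          struct foo *alloc_foo(void);

          /* broken: the pointer is reassigned before PTR_ERR() reads
           * it, so the caller gets PTR_ERR(NULL) == 0, i.e. success */
          static long broken(void)
          {
                  struct foo *p = alloc_foo();

                  if (IS_ERR(p)) {
                          p = NULL;
                          return PTR_ERR(p);  /* error code lost */
                  }
                  return 0;
          }

          /* fixed: capture the error code first, then reassign */
          static long fixed(void)
          {
                  struct foo *p = alloc_foo();

                  if (IS_ERR(p)) {
                          long ret = PTR_ERR(p);

                          p = NULL;
                          return ret;
                  }
                  return 0;
          }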
    • nvme/quirk: Add a delay before checking for adapter readiness · 54adc010
      Committed by Guilherme G. Piccoli
      When disabling the controller, the specification says the
      NVME_REG_CC register should be written, and then the driver needs
      to wait for the adapter to become ready, which is checked by
      reading another register bit (NVME_CSTS_RDY). This check carries a
      timeout, so if the timeout is reached the driver gives up and
      removes the adapter from the system.
      
      After a firmware activation procedure, the PCI_DEVICE(0x1c58,
      0x0003) (HGST adapter) ends up being removed if we issue a
      reset_controller, because the driver keeps verifying NVME_REG_CSTS
      until the timeout is reached. This patch adds the necessary quirk
      for this adapter by introducing a delay before nvme_wait_ready(),
      so the reset procedure can complete. The quirk is needed because
      merely increasing the timeout is not enough for this adapter: the
      driver must wait before it starts reading the NVME_REG_CSTS
      register on this specific device (sketched below).
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
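      A sketch of the quirk hook (the flag name and the 2-second delay
      follow the commit's intent but should be treated as assumptions):

          /* after NVME_REG_CC is written to clear EN ... */
          if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
                  msleep(2000);  /* let firmware activation settle first */

          /* ... only then start polling CSTS.RDY */
          return nvme_wait_ready(ctrl, cap, false);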
  15. 08 Jul 2016 (2 commits)
  16. 07 Jul 2016 (1 commit)
  17. 06 Jul 2016 (4 commits)