Commits · cd48282cc736377d5abf7c04de8c6ba864ba3794 · openanolis / cloud-kernel

25 Sep, 2017 8 commits

nvme: stop aer posting if controller state not live · cd48282c

Committed by James Smart on Sep 14, 2017

If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.

In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.

It's possible to hit the scenario on rdma as well, although it is much
less likely: an aer would have to complete right as the association is
terminated, and the association teardown reclaims the blk requests via
nvme_cancel_request(), so it's immediate rather than a link-related
action like on FC.

Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior.  The transport, when
it creates a new association with the subsystem, will transition
the state back to live and is already restarting the async event
posting.
Signed-off-by: James Smart <james.smart@broadcom.com>
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cd48282c
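
The two state checks described in this message can be sketched in user-space C. This is only an illustration under stated assumptions: the `ctrl` structure, the state names and both function names are hypothetical stand-ins and are not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical, simplified controller states (names are illustrative). */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING };

struct ctrl {
    enum ctrl_state state;
    bool async_work_queued;  /* stands in for a scheduled work item */
    int  aers_submitted;     /* counts calls into the "transport" */
};

/* Completion path: only re-arm the async event while the controller is live. */
static void complete_async_event(struct ctrl *c)
{
    if (c->state == CTRL_LIVE)
        c->async_work_queued = true;
}

/* Work routine: re-check the state before calling into the transport. */
static void async_event_work(struct ctrl *c)
{
    if (!c->async_work_queued)
        return;
    c->async_work_queued = false;

    if (c->state != CTRL_LIVE)
        return;              /* association is down: do not post */

    c->aers_submitted++;     /* the real code would submit the AER here */
}
```

The second check matters because the state can change between the completion and the moment the work item actually runs.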

nvme-pci: Print invalid SGL only once · d0877473

Committed by Keith Busch on Sep 15, 2017

The WARN_ONCE macro returns true if the condition is true, not if the
warning was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d0877473
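
The pitfall can be reproduced outside the kernel. The sketch below uses a hypothetical `warn_once()` function as a user-space stand-in for the macro, plus a hypothetical `print_sgl()` counter. It shows why gating the dump on the return value prints every time, along with one possible way to print only once. It is not the change the patch makes.

```c
#include <stdbool.h>
#include <stdio.h>

static int sgl_dumps;  /* how many times the scatter list was "printed" */

/* Hypothetical stand-in for the dump routine. */
static void print_sgl(void)
{
    sgl_dumps++;
}

/*
 * User-space stand-in for WARN_ONCE(): the message is emitted at most
 * once, but the return value is the condition itself on every call.
 */
static bool warn_once(bool cond, bool *warned, const char *msg)
{
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Problematic pattern: dumps whenever the SGL is invalid. */
static void check_sgl_noisy(bool invalid)
{
    static bool warned;

    if (warn_once(invalid, &warned, "Invalid SGL"))
        print_sgl();
}

/* One possible fix: give the dump its own once-only flag. */
static void check_sgl_once(bool invalid)
{
    static bool warned, dumped;

    if (warn_once(invalid, &warned, "Invalid SGL") && !dumped) {
        dumped = true;
        print_sgl();
    }
}
```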

nvme-pci: initialize queue memory before interrupts · 161b8be2

Committed by Keith Busch on Sep 14, 2017

A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

161b8be2
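
A minimal sketch of why zeroing the completion queue helps, assuming the usual phase-tag scheme in which bit 0 of the status word marks a fresh entry. The structures and function names here are simplified, hypothetical stand-ins rather than the driver's own.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define Q_DEPTH 4

/* Simplified completion entry: bit 0 of status is the phase tag. */
struct cqe {
    uint16_t command_id;
    uint16_t status;
};

struct queue {
    struct cqe cqes[Q_DEPTH];
    uint16_t cq_head;
    uint8_t  cq_phase;
};

/* Zero the CQ and expect phase 1, so stale memory never looks valid. */
static void init_queue(struct queue *q)
{
    memset(q->cqes, 0, sizeof(q->cqes));
    q->cq_head = 0;
    q->cq_phase = 1;
}

/* An entry counts as new only if its phase bit matches the expected phase. */
static bool cqe_pending(const struct queue *q)
{
    return (q->cqes[q->cq_head].status & 1) == q->cq_phase;
}
```

With uninitialized memory, an interrupt handler running `cqe_pending()` could see a matching phase bit by chance. After `init_queue()` it cannot until the device writes a real entry.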

nvmet-fc: fix failing max io queue connections · deb61742

Committed by James Smart on Sep 11, 2017

fc transport is treating NVMET_NR_QUEUES as maximum queue count, i.e.
admin queue plus NVMET_NR_QUEUES-1 io queues.  But NVMET_NR_QUEUES is
the number of io queues, so maximum queue count is really
NVMET_NR_QUEUES+1.

Fix the handling in the target fc transport.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

deb61742
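
The off-by-one is easy to see in a small sketch. `NR_IO_QUEUES` is a hypothetical constant standing in for NVMET_NR_QUEUES, and its value here is chosen only for illustration.

```c
/* Hypothetical value standing in for NVMET_NR_QUEUES (number of I/O queues). */
#define NR_IO_QUEUES 64

/* Total queues = one admin queue (qid 0) plus the I/O queues (qid 1..N). */
static int max_queue_count(void)
{
    return NR_IO_QUEUES + 1;
}

/* A queue id is valid if it indexes into that total. */
static int qid_is_valid(int qid)
{
    return qid >= 0 && qid < max_queue_count();
}
```

Treating `NR_IO_QUEUES` itself as the total would reject the last I/O queue (qid 64 in this sketch).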

nvme-fc: use transport-specific sgl format · d9d34c0b

Committed by James Smart on Sep 07, 2017

Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the warn-on's on the PRP fields. They are unneeded. They were
meant to catch values from the upper layer that weren't set right, and
for the most part they were fine. But async events reuse the same
structure, and by the second time one is issued the SGL overlay has
converted the fields to the Transport SGL values, so the warn-on's were
firing erroneously.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d9d34c0b
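
As a rough illustration of the type/subtype pair mentioned above, the sketch below packs both into a single SGL identifier byte. The 0x0A subtype comes from the message. The descriptor type value 0x5 and the nibble layout reflect my reading of the NVMe specification and should be treated as assumptions. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values: see the note above. Names are illustrative only. */
#define SGL_TYPE_TRANSPORT_DATA_BLOCK 0x5  /* descriptor type (assumed) */
#define SGL_SUBTYPE_TRANSPORT_A       0xA  /* transport-specific subtype */

/* Type goes in bits 7:4 of the identifier byte, subtype in bits 3:0. */
static uint8_t sgl_identifier(uint8_t type, uint8_t subtype)
{
    return (uint8_t)(((type & 0x0f) << 4) | (subtype & 0x0f));
}
```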

nvmet-fcloop: remove use of FC-specific error codes · fc9608e8

Committed by James Smart on Sep 07, 2017

The FC-NVME transport loopback test module used the FC-specific error
codes in cases where it emulated a transport abort case. Instead of
using the FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

fc9608e8

nvmet-fc: remove use of FC-specific error codes · 29b3d26e

Committed by James Smart on Sep 07, 2017

The FC-NVME target transport used the FC-specific error codes in
return codes when the transport or lldd failed. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

29b3d26e

nvme-fc: remove use of FC-specific error codes · 56b7103a

Committed by James Smart on Sep 07, 2017

The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

56b7103a

12 Sep, 2017 5 commits

nvme-pci: implement the HMB entry number and size limitations · 044a9df1

Committed by Christoph Hellwig on Sep 11, 2017

Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size
and Host Memory Maximum Descriptors Entries fields that were added in
TP 4002 HMB Enhancements.  These allow the controller to advertise a
limit on the number of segments in the host memory buffer, as well as a
minimum usable per-segment size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

044a9df1

nvme-pci: propagate (some) errors from host memory buffer setup · 9620cfba

Committed by Christoph Hellwig on Sep 06, 2017

We want to catch command execution errors when resetting the device, so
propagate errors from the Set Features when setting up the host memory
buffer.  We keep ignoring memory allocation failures, as the spec
clearly says that the controller must work without a host memory buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org

9620cfba

nvme-pci: use appropriate initial chunk size for HMB allocation · 30f92d62

Committed by Akinobu Mita on Sep 06, 2017

The initial chunk size for host memory buffer allocation is currently
PAGE_SIZE << MAX_ORDER.  An allocation of order MAX_ORDER usually fails
without CONFIG_DMA_CMA, so in general the HMB allocation is retried with
a chunk size of PAGE_SIZE << (MAX_ORDER - 1).  That causes no problem as
long as the retry allocation works correctly.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org

30f92d62

nvme-pci: fix host memory buffer allocation fallback · 92dc6895

Committed by Christoph Hellwig on Sep 11, 2017

nvme_alloc_host_mem currently contains two loops that are intertwined,
and the outer retry loop turns out to be broken.  Fix this by untangling
the two.

Based on a report and an initial patch from Akinobu Mita.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org

92dc6895
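
The untangled shape can be sketched as an inner helper that tries one chunk size and an outer loop that retries with smaller chunks. Everything below is a hypothetical user-space model: the allocator is simulated by a size limit, and none of the names are the driver's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulated allocator limit: chunks above this size always fail. */
static size_t largest_ok_chunk = 64 * 1024;

static bool alloc_chunk(size_t size)
{
    return size <= largest_ok_chunk;
}

/* Inner step: try to cover `total` bytes with chunks of one fixed size. */
static bool alloc_with_chunk_size(size_t total, size_t chunk, size_t *nr_chunks)
{
    size_t done = 0, n = 0;

    while (done < total) {
        size_t len = (total - done < chunk) ? total - done : chunk;

        if (!alloc_chunk(len))
            return false;   /* real code would free what it already got */
        done += len;
        n++;
    }
    *nr_chunks = n;
    return true;
}

/*
 * Outer step: retry with progressively smaller chunks down to min_chunk.
 * Returns the chunk size that worked, or 0 if every size failed.
 */
static size_t alloc_host_mem(size_t total, size_t start_chunk,
                             size_t min_chunk, size_t *nr_chunks)
{
    if (min_chunk == 0)
        return 0;           /* guard against a zero-sized chunk loop */

    for (size_t chunk = start_chunk; chunk >= min_chunk; chunk /= 2) {
        if (alloc_with_chunk_size(total, chunk, nr_chunks))
            return chunk;
    }
    return 0;
}
```

Keeping the retry policy in the outer function and the per-size attempt in the helper means a failed attempt always restarts from a clean state with the next chunk size.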

nvme: fix lightnvm check · 608cc4b1

Committed by Christoph Hellwig on Sep 06, 2017

nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to
reading an incorrect field, or possibly even a dereference of unallocated
memory for fabrics controllers.

Fix this by introducing a quirk for lightnvm capable devices instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

608cc4b1

01 Sep, 2017 1 commit

nvme-fabrics: generate spec-compliant UUID NQNs · 40a5fce4

Committed by Daniel Verkamp on Aug 30, 2017

The default host NQN, which is generated based on the host's UUID,
does not follow the UUID-based NQN format laid out in the NVMe 1.3
specification.  Remove the "NVMf:" portion of the NQN to match the spec.
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>

40a5fce4
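
For illustration, the resulting NQN format can be sketched as follows. The prefix string reflects my understanding of the UUID-based format in NVMe 1.3 and should be checked against the specification. `format_uuid_nqn()` is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* Assumed UUID-based NQN prefix, with no "NVMf:" component. */
#define NQN_UUID_PREFIX "nqn.2014-08.org.nvmexpress:uuid:"

/* Writes the UUID-based NQN into buf and returns snprintf()'s result. */
static int format_uuid_nqn(char *buf, size_t len, const char *uuid)
{
    return snprintf(buf, len, NQN_UUID_PREFIX "%s", uuid);
}
```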

30 Aug, 2017 8 commits

nvmet: add support for reporting the host identifier · 28dd5cf7

Committed by Omri Mann on Aug 30, 2017

And fix the Get/Set Features implementation to take all 8 bits of the
feature identifier into account.
Signed-off-by: Omri Mann <omri@excelero.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: used the UUID API, updated changelog]

28dd5cf7

nvme: Use metadata for passthrough commands · 63263d60

Committed by Keith Busch on Aug 29, 2017

The ioctls' struct allows the user to provide a metadata address and
length for a passthrough command. This patch uses these values that were
previously ignored and deletes the now unused wrapper function.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

63263d60

nvme: Make nvme user functions static · 485783ca

Committed by Keith Busch on Aug 29, 2017

These functions are used only locally in the nvme core.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

485783ca

nvme/pci: Use req_op to determine DIF remapping · b5d8af5b

Committed by Keith Busch on Aug 29, 2017

Only read and write commands need DIF remapping. Everything else uses
a passthrough integrity payload.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

b5d8af5b

nvme: factor metadata handling out of __nvme_submit_user_cmd · 1cad6562

Committed by Christoph Hellwig on Aug 29, 2017

Keep the metadata code in a separate helper instead of making the
main function more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

1cad6562

nvme-fabrics: Convert nvmf_transports_mutex to an rwsem · 489beb91

Committed by Roland Dreier on Aug 29, 2017

The mutex protects against the list of transports changing while a
controller is being created, but using a plain old mutex means that it
also serializes controller creation.  This unnecessarily slows down
creating multiple controllers - for example for the RDMA transport,
creating a controller involves establishing one connection for every IO
queue, which involves even more network/software round trips, so the
delay can become significant.

The simplest way to fix this is to change the mutex to an rwsem and only
hold it for writing when the list is being mutated.  Since we can take
the rwsem for reading while creating a controller, we can create multiple
controllers in parallel.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

489beb91

nvme-pci: use dma memory for the host memory buffer descriptors · 4033f35d

Committed by Christoph Hellwig on Aug 28, 2017

The NVMe 1.3 specification says in section 5.21.1.13:

"After a successful completion of a Set Features enabling the host memory
 buffer, the host shall not write to the associated host memory region,
 buffer size, or descriptor list until the host memory buffer has been
 disabled."

While this doesn't state that the descriptor list must remain accessible
to the device, it certainly implies it must remain readable by the device.

So switch to a dma coherent allocation for the descriptor list just to be
safe - it's not like the cost for it matters compared to the actual
memory buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 87ad72a5 ("nvme-pci: implement host memory buffer support")

4033f35d

nvme-rdma: default MR page size to 4k · b925a2dc

Committed by Max Gurtovoy on Aug 28, 2017

Due to various page sizes in the system (IOMMU/device/kernel), we
set the fabrics controller page size to 4k and block layer boundaries
accordingly. In architectures that use a different kernel page size
we'll have a mismatch to the MR page size that may cause a mapping error.
Update the MR page size to correspond to the core ctrl settings.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

b925a2dc

29 Aug, 2017 18 commits

nvme: don't blindly overwrite identifiers on disk revalidate · 1d5df6af

Committed by Christoph Hellwig on Aug 17, 2017

Instead validate that these identifiers do not change, as that is
prohibited by the specification.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

1d5df6af

nvme: remove nvme_revalidate_ns · cdbff4f2

Committed by Christoph Hellwig on Aug 16, 2017

The function is used in two places, and the shared code for those will
diverge later in this series.

Instead factor out a new helper to get the ids for a namespace, simplify
the calling conventions for nvme_identify_ns and just open code the
sequence.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

cdbff4f2

nvme: remove unused struct nvme_ns fields · 57eeaf8e

Committed by Christoph Hellwig on Aug 16, 2017

And move the flags for the flags field near that field while touching
this area.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

57eeaf8e

nvme: allow calling nvme_change_ctrl_state from irq context · 0a72bbba

Committed by Christoph Hellwig on Aug 22, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

0a72bbba

nvme: report more detailed status codes to the block layer · a751da33

Committed by Christoph Hellwig on Aug 22, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

a751da33

nvme: honor RTD3 Entry Latency for shutdowns · 07fbd32a

Committed by Martin K. Petersen on Aug 25, 2017

If an NVMe controller reports RTD3 Entry Latency larger than
shutdown_timeout, up to a maximum of 60 seconds, use that value to set
the shutdown timer. Otherwise fall back to the module parameter which
defaults to 5 seconds.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
[hch: removed do_div, made transition time local scope]
Signed-off-by: Christoph Hellwig <hch@lst.de>

07fbd32a
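
The selection rule in this message amounts to a clamp. The sketch below assumes the controller reports RTD3 Entry Latency in microseconds, with 0 meaning "not reported". The function and parameter names are hypothetical and the code is a model, not the driver's implementation.

```c
#include <stdint.h>

#define SHUTDOWN_TIMEOUT_MAX_S 60u  /* upper bound named in the message */

/*
 * rtd3e_us:         RTD3 Entry Latency in microseconds (assumed unit),
 *                   0 if the controller does not report it.
 * module_timeout_s: the module parameter, 5 seconds by default.
 */
static unsigned int shutdown_timeout_s(uint32_t rtd3e_us,
                                       unsigned int module_timeout_s)
{
    unsigned int t;

    if (rtd3e_us == 0)
        return module_timeout_s;

    t = rtd3e_us / 1000000u;         /* microseconds -> whole seconds */
    if (t < module_timeout_s)
        t = module_timeout_s;        /* never go below the parameter */
    if (t > SHUTDOWN_TIMEOUT_MAX_S)
        t = SHUTDOWN_TIMEOUT_MAX_S;  /* cap very large latencies */
    return t;
}
```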

nvme: fix uninitialized prp2 value on small transfers · 5228b328

Committed by Jan H. Schönherr on Aug 27, 2017

The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.

Comply with the NVMe specification and fill in 0 in that case.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

5228b328
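
A small model of the decision, assuming a power-of-two page size and a transfer that is contiguous in DMA address space and spans at most two pages. Longer transfers use a PRP list and are not modelled here. `prp2_for_transfer()` is a hypothetical name.

```c
#include <stdint.h>

/* Returns the PRP2 value for a transfer that spans at most two pages. */
static uint64_t prp2_for_transfer(uint64_t dma_addr, uint32_t len,
                                  uint32_t page_size)
{
    uint32_t offset = (uint32_t)(dma_addr & (page_size - 1));
    uint32_t first_page_room = page_size - offset;

    if (len <= first_page_room)
        return 0;                          /* fits in PRP1's page: PRP2 is 0 */

    return dma_addr - offset + page_size;  /* start of the following page */
}
```

The first branch is the case the patch addresses: the value is set explicitly to 0 instead of being left as whatever the field happened to contain.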

nvme-rdma: Use unlikely macro in the fast path · a7b7c7a1

Committed by Max Gurtovoy on Aug 14, 2017

This patch slightly improves performance (mainly for small block sizes).
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

a7b7c7a1

nvmet: use memcpy_and_pad for identify sn/fr · 17c39d05

Committed by Martin Wilck on Aug 14, 2017

This changes the earlier patch "nvmet: don't report 0-bytes
in serial number" to use the memcpy_and_pad() helper introduced
in a previous patch.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

17c39d05
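
The behavior such a helper provides can be sketched in user space. `copy_and_pad()` below is a hypothetical stand-in modelled on the helper named in the message. It is not the kernel function and its parameter order is an assumption.

```c
#include <string.h>

/*
 * Copy up to dest_len bytes from src, then fill whatever is left of
 * dest with the pad byte, so a fixed-width field never exposes
 * trailing zero bytes.
 */
static void copy_and_pad(void *dest, size_t dest_len,
                         const void *src, size_t count, int pad)
{
    if (count < dest_len) {
        memcpy(dest, src, count);
        memset((char *)dest + count, pad, dest_len - count);
    } else {
        memcpy(dest, src, dest_len);  /* source fills or exceeds the field */
    }
}
```

For a fixed-width identify field such as a serial number, the pad byte would be a space.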

nvmet-fc: simplify sg list handling · 48fa362b

Committed by James Smart on Jul 31, 2017

The existing nvmet_fc sg list handling has 2 faults:
a) the request between LLDD and transport has too large of an sg
   list (256 elements), which is normally 256k (64 elements).
b) sglist handling doesn't optimize on the fact that each element
   is a page.

This patch removes the static sg list in the request and uses the
dynamic list already present in the nvmet_fc transport. It also
simplifies the handling of the sg list on multiple sequences to
take advantage of the per-page divisions.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

48fa362b

nvme-fc: Reattach to localports on re-registration · 5533d424

Committed by James Smart on Jul 31, 2017

If the LLDD resets or detaches from an fc port, the LLDD will
deregister all remoteports seen by the fc port and deregister the
localport associated with the fc port. The teardown of the localport
structure will be held off due to reference counting until all the
remoteports are removed (and they are held off until all
controllers/associations are terminated). Currently, if the fc port
is reinit/reattached and registered again as a localport it is
treated as an independent entity from the prior localport and all
prior remoteports and controllers cannot be revived. They are
created as new and separate entities.

This patch changes the localport registration to look at the known
localports that are waiting to be torn down. If they are the same port
based on wwn's, the local port is transitioned out of the teardown
state.  This allows the remote ports and controller connections to
be reestablished and resumed as long as the localport can also be
reregistered within the timeout windows.

The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
the functionality and moves the lport get/put routines to avoid
forward references.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

5533d424

nvme: rename AMS symbolic constants to fit specification · 60b43f62

Committed by Max Gurtovoy on Aug 13, 2017

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

60b43f62

nvme: add symbolic constants for CC identifiers · ad4e05b2

Committed by Max Gurtovoy on Aug 13, 2017

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

ad4e05b2

nvme: fix identify namespace logging · caaa15c5

Committed by Sagi Grimberg on Aug 15, 2017

Use ctrl->device and lose the func name.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

caaa15c5

nvme-fabrics: log a warning if hostid is invalid · 9b483da1

Committed by Guan Junxiong on Aug 03, 2017

This helps users quickly spot why a connection fails if the hostid is
not compliant with the uuid format.
Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

9b483da1

nvme-rdma: call ops->reg_read64 instead of nvmf_reg_read64 · 09fdc23b

Committed by Sagi Grimberg on Jul 10, 2017

To make the nvme_rdma_configure_admin_queue generic in preparation of
moving it to common code.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

09fdc23b

nvme-rdma: cleanup error path in controller reset · 370ae6e4

Committed by Sagi Grimberg on Jul 10, 2017

No need to queue an extra work to indirect controller removal, just call the
ctrl remove routine.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

370ae6e4

nvme-rdma: introduce nvme_rdma_start_queue · 68e16fcf

Committed by Sagi Grimberg on Jul 10, 2017

This should pair with nvme_rdma_stop_queue. While this is not a complete
inverse, it still pairs up pretty well because in fabrics we don't have a
disconnect capsule (yet) but we simply tear down the transport association.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

68e16fcf

openanolis / cloud-kernel synced successfully more than 1 year ago
