Commits · 9c9883744dda1cc38339a448dd8435140537027e · openeuler / raspberrypi-kernel

26 Sep, 2017 · 7 commits

nvme-fcloop: fix port deletes and callbacks · fddc9923

Committed by James Smart on Sep 19, 2017

Now that there are potentially long delays between when a remoteport or
targetport delete call is made and when the callback occurs (dev_loss_tmo
timeout), no longer block in the delete routines and move the final nport
puts to the callbacks.

Moved the fcloop_nport_get/put/free routines to avoid forward declarations.

Ensure port_info structs used in registrations are nulled in case fields
are not set (e.g., devloss_tmo values).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: ensure target queue id within range. · 0c319d3a

Committed by James Smart on Sep 19, 2017

When searching for queue ids, ensure they are within the expected range.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: on port remove call put outside lock · 3688feb5

Committed by James Smart on Sep 19, 2017

Avoid calling the put routine while holding the target lock, as it may
traverse to the free routines.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-rdma: don't fully stop the controller in error recovery · e4d753d7

Committed by Sagi Grimberg on Sep 21, 2017

Calling nvme_stop_ctrl on an already failed controller waits for the
scan work to complete, which only happens when the identify timeout
expires (60 seconds). This is unnecessary when we already know that the
controller has failed.
Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-rdma: give up reconnect if state change fails · 0a960afd

Committed by Sagi Grimberg on Sep 21, 2017

If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-core: Use nvme_wq to queue async events and fw activation · 1a40d972

Committed by Sagi Grimberg on Sep 21, 2017

async_event_work might race as it is executed from two different
workqueues at the moment.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: fix sqhd reference when admin queue connect fails · 8cbd96a6

Committed by James Smart on Sep 21, 2017

Fix bug in sqhd patch.

It wasn't the sq that was at risk. In the case where the admin queue
connect command fails, the sq->size field is not set. Therefore, this
becomes a divide by zero error.

Add a quick check to bypass under this failure condition.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
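
A minimal sketch of the guard this commit describes, assuming a simplified, hypothetical struct (`sq_sketch`) rather than the kernel's own queue structure: the head pointer wraps modulo the queue size, and the update is skipped when the size was never set.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the target-side submission queue. */
struct sq_sketch {
    uint16_t size; /* 1's-based queue size; stays 0 if the connect failed */
    uint16_t sqhd; /* submission queue head reported in completions */
};

/* Advance sqhd with wrap-around; bypass the update when size is unset. */
uint16_t sq_advance_head(struct sq_sketch *sq)
{
    if (sq->size) /* avoids a modulo by zero after a failed admin connect */
        sq->sqhd = (uint16_t)((sq->sqhd + 1) % sq->size);
    return sq->sqhd;
}
```

With `size` left at 0 the head simply stays where it was, which is one plausible reading of "bypass under this failure condition".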

25 Sep, 2017 · 11 commits

nvmet: implement valid sqhd values in completions · bb1cc747

Committed by James Smart on Sep 18, 2017

To support sqhd, for initiators that are following the spec and
paying attention to sqhd vs their sqtail values:

- add sqhd to struct nvmet_sq
- initialize sqhd to 0 in nvmet_sq_setup
- rather than propagate the 0's-based qsize value from the connect message,
  which requires a +1 in every sqhd update, and as nothing else references
  it, convert to a 1's-based value in the nvmet_sq/cq_setup() calls.
- validate that the connect message sqsize is non-zero, per spec.
- assign an updated sqhd to every completion that goes back.

Also remove handling the NULL sq case in __nvmet_req_complete, as it can't
happen with the current code.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-fabrics: Allow 0 as KATO value · 8edd11c9

Committed by Guilherme G. Piccoli on Sep 14, 2017

Currently, the driver code allows the user to set 0 as the KATO
(Keep Alive TimeOut) value, but this is not respected.
This patch enforces the expected behavior.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: allow timed-out ios to retry · 0951338d

Committed by James Smart on Sep 07, 2017

Currently, nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. If an io times out,
this check means a retry is never allowed for the io, so applications
see the io failure.

Remove this check and allow the io to time out, as it does on other
protocols, and allow retries to be made.

On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
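
The remaining retry checks might look roughly like the sketch below. The struct, its field names, and the `SC_DNR` constant are illustrative assumptions modeled on the description, not the kernel definitions; the point is only that elapsed time no longer vetoes a retry.

```c
#include <stdbool.h>
#include <stdint.h>

#define SC_DNR 0x4000u /* assumed "do not retry" bit in the status word */

/* Hypothetical, simplified request state. */
struct req_sketch {
    bool no_retry;        /* caller asked for fail-fast behaviour */
    bool timed_out;       /* io exceeded its timeout; deliberately not checked */
    uint16_t status;      /* completion status */
    unsigned int retries; /* retries already attempted */
};

/* Decide whether a failed request may be retried. */
bool req_needs_retry(const struct req_sketch *req, unsigned int max_retries)
{
    if (req->no_retry)
        return false;
    if (req->status & SC_DNR)
        return false;
    /* The removed check sat here: a timed-out io used to return false. */
    if (req->retries >= max_retries)
        return false;
    return true;
}
```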

nvme: stop aer posting if controller state not live · cd48282c

Committed by James Smart on Sep 14, 2017

If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.

In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.

It's possible to hit the scenario on rdma, although it is much less
likely: it requires an aer to complete right as the association is
terminated, and the association teardown reclaims the blk requests via
nvme_cancel_request(), so it's immediate rather than a link-related
action as on FC.

Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior.  The transport, when
it creates a new association with the subsystem will transition
the state back to live and is already restarting the async event
posting.
Signed-off-by: James Smart <james.smart@broadcom.com>
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-pci: Print invalid SGL only once · d0877473

Committed by Keith Busch on Sep 15, 2017

The WARN_ONCE macro returns true if the condition is true, not if the
warn was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
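
The pitfall can be reproduced in userspace with a small analogue. This is only a sketch: `warn_once`, `check_noisy`, and `check_quiet` are hypothetical helpers, not kernel APIs, and the counter stands in for printing the scatter list.

```c
#include <stdbool.h>
#include <stdio.h>

int dumps; /* counts how many times the scatter list would be printed */

/* Warns at most once, but ALWAYS returns the condition itself. */
bool warn_once(bool cond, bool *warned, const char *msg)
{
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Problematic pattern: the dump is keyed on the return value. */
void check_noisy(bool invalid)
{
    static bool warned;

    if (warn_once(invalid, &warned, "Invalid SGL"))
        dumps++; /* runs for every invalid list */
}

/* One-shot pattern: the dump carries its own "already done" flag. */
void check_quiet(bool invalid)
{
    static bool dumped;

    if (invalid && !dumped) {
        dumped = true;
        dumps++; /* runs only for the first invalid list */
    }
}
```

Calling `check_noisy(true)` three times bumps the counter three times, while `check_quiet(true)` bumps it once.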

nvme-pci: initialize queue memory before interrupts · 161b8be2

Committed by Keith Busch on Sep 14, 2017

A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: fix failing max io queue connections · deb61742

Committed by James Smart on Sep 11, 2017

The fc transport is treating NVMET_NR_QUEUES as the maximum queue count,
i.e. the admin queue plus NVMET_NR_QUEUES-1 io queues.  But NVMET_NR_QUEUES
is the number of io queues, so the maximum queue count is really
NVMET_NR_QUEUES+1.

Fix the handling in the target fc transport.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
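
The off-by-one can be illustrated with a short sketch. `NR_IO_QUEUES` is a stand-in for NVMET_NR_QUEUES with an illustrative value, and `assoc_sketch` is a hypothetical structure, not the transport's own.

```c
#include <stdbool.h>

#define NR_IO_QUEUES 64 /* illustrative stand-in for NVMET_NR_QUEUES */

/* Queue id 0 is the admin queue; ids 1..NR_IO_QUEUES are io queues. */
struct assoc_sketch {
    void *queues[NR_IO_QUEUES + 1]; /* one extra slot for the admin queue */
};

/* A queue id is valid when it indexes into the array above. */
bool qid_in_range(unsigned int qid)
{
    return qid <= NR_IO_QUEUES;
}
```

Sizing the array with `NR_IO_QUEUES` alone would reject the last io queue, which matches the failure named in the commit title.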

nvme-fc: use transport-specific sgl format · d9d34c0b

Committed by James Smart on Sep 07, 2017

Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the warn-ons on the PRP fields. They are unneeded. They were
there to check for values from the upper layer that weren't set right,
and for the most part were fine. But with async events, which reuse the
same structure, the SGL overlay had already converted the fields to the
Transport SGL values by the second time the command was issued, so the
warn-ons were errantly firing.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fcloop: remove use of FC-specific error codes · fc9608e8

Committed by James Smart on Sep 07, 2017

The FC-NVME transport loopback test module used the FC-specific error
codes in cases where it emulated a transport abort case. Instead of
using the FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvmet-fc: remove use of FC-specific error codes · 29b3d26e

Committed by James Smart on Sep 07, 2017

The FC-NVME target transport used the FC-specific error codes in
return codes when the transport or lldd failed. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme-fc: remove use of FC-specific error codes · 56b7103a

Committed by James Smart on Sep 07, 2017

The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

12 Sep, 2017 · 5 commits

nvme-pci: implement the HMB entry number and size limitations · 044a9df1

Committed by Christoph Hellwig on Sep 11, 2017

Add support for the new Host Memory Buffer Minimum Descriptor Entry Size
and Host Memory Maximum Descriptors Entries fields that were added in
TP 4002 HMB Enhancements.  These allow the controller to advertise
limits on the number of segments in the host memory buffer, as
well as a minimum usable per-segment size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

nvme-pci: propagate (some) errors from host memory buffer setup · 9620cfba

Committed by Christoph Hellwig on Sep 06, 2017

We want to catch command execution errors when resetting the device, so
propagate errors from the Set Features when setting up the host memory
buffer.  We keep ignoring memory allocation failures, as the spec
clearly says that the controller must work without a host memory buffer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org

nvme-pci: use appropriate initial chunk size for HMB allocation · 30f92d62

Committed by Akinobu Mita on Sep 06, 2017

The initial chunk size for host memory buffer allocation is currently
PAGE_SIZE << MAX_ORDER.  An allocation of order MAX_ORDER usually fails
without CONFIG_DMA_CMA, so the HMB allocation is generally retried with
chunk size PAGE_SIZE << (MAX_ORDER - 1). This is not a problem as long as
the retry allocation works correctly.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org

nvme-pci: fix host memory buffer allocation fallback · 92dc6895

Committed by Christoph Hellwig on Sep 11, 2017

nvme_alloc_host_mem currently contains two loops that are intertwined,
and the outer retry loop turns out to be broken.  Fix this by untangling
the two.

Based on a report and an initial patch from Akinobu Mita.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
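
One way to picture an untangled outer retry loop is the sketch below: start with a large chunk size and halve it until an attempt succeeds. The callback type, `alloc_with_fallback`, and `fake_alloc` are hypothetical names for illustration; the inner per-chunk allocation loop is hidden behind the callback, and this is not the driver's actual code.

```c
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size */

/* Hypothetical inner step: returns the bytes obtained, or 0 on failure. */
typedef uint64_t (*alloc_fn)(uint64_t preferred, uint32_t chunk_size);

/*
 * Outer loop only: try progressively smaller chunk sizes.
 * Returns the chunk size that worked, or 0 if none did.
 */
uint32_t alloc_with_fallback(uint64_t min, uint64_t preferred,
                             uint32_t max_chunk, alloc_fn try_alloc)
{
    uint32_t chunk = preferred < max_chunk ? (uint32_t)preferred : max_chunk;

    for (; chunk >= PAGE_SZ * 2; chunk /= 2) {
        uint64_t got = try_alloc(preferred, chunk);

        if (got && got >= min)
            return chunk;
        /* A real implementation would free any partial allocation here. */
    }
    return 0;
}

/* Stand-in allocator: pretend chunks larger than 64 KiB always fail. */
uint64_t fake_alloc(uint64_t preferred, uint32_t chunk_size)
{
    return chunk_size > 65536u ? 0 : preferred;
}
```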

nvme: fix lightnvm check · 608cc4b1

Committed by Christoph Hellwig on Sep 06, 2017

nvme_nvm_ns_supported assumes every device is a pci_dev, which leads to
reading an incorrect field, or possibly even a dereference of unallocated
memory for fabrics controllers.

Fix this by introducing a quirk for lightnvm capable devices instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matias Bjørling <mb@lightnvm.io>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

01 Sep, 2017 · 1 commit

nvme-fabrics: generate spec-compliant UUID NQNs · 40a5fce4

Committed by Daniel Verkamp on Aug 30, 2017

The default host NQN, which is generated based on the host's UUID,
does not follow the UUID-based NQN format laid out in the NVMe 1.3
specification.  Remove the "NVMf:" portion of the NQN to match the spec.
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
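
As an illustration of the resulting format, the sketch below builds a UUID-based NQN without the "NVMf:" portion. The helper names are hypothetical, and the UUID in the usage note is a placeholder.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define UUID_NQN_PREFIX "nqn.2014-08.org.nvmexpress:uuid:"

/* Write the UUID-based NQN into buf; returns what snprintf reports. */
int format_uuid_nqn(char *buf, size_t len, const char *uuid)
{
    return snprintf(buf, len, UUID_NQN_PREFIX "%s", uuid);
}

/* Check that a string starts with the UUID-based NQN prefix. */
bool is_uuid_nqn(const char *nqn)
{
    return strncmp(nqn, UUID_NQN_PREFIX, strlen(UUID_NQN_PREFIX)) == 0;
}
```

For the placeholder UUID `11111111-2222-3333-4444-555555555555` this yields `nqn.2014-08.org.nvmexpress:uuid:11111111-2222-3333-4444-555555555555`.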

30 Aug, 2017 · 8 commits

nvmet: add support for reporting the host identifier · 28dd5cf7

Committed by Omri Mann on Aug 30, 2017

And fix the Get/Set Features implementation to take all 8 bits of the
feature identifier into account.
Signed-off-by: Omri Mann <omri@excelero.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: used the UUID API, updated changelog]

nvme: Use metadata for passthrough commands · 63263d60

Committed by Keith Busch on Aug 29, 2017

The ioctls' struct allows the user to provide a metadata address and
length for a passthrough command. This patch uses these values that were
previously ignored and deletes the now unused wrapper function.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: Make nvme user functions static · 485783ca

Committed by Keith Busch on Aug 29, 2017

These functions are used only locally in the nvme core.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme/pci: Use req_op to determine DIF remapping · b5d8af5b

Committed by Keith Busch on Aug 29, 2017

Only read and write commands need DIF remapping. Everything else uses
a passthrough integrity payload.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: factor metadata handling out of __nvme_submit_user_cmd · 1cad6562

Committed by Christoph Hellwig on Aug 29, 2017

Keep the metadata code in a separate helper instead of making the
main function more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-fabrics: Convert nvmf_transports_mutex to an rwsem · 489beb91

Committed by Roland Dreier on Aug 29, 2017

The mutex protects against the list of transports changing while a
controller is being created, but using a plain old mutex means that it
also serializes controller creation.  This unnecessarily slows down
creating multiple controllers - for example for the RDMA transport,
creating a controller involves establishing one connection for every IO
queue, which involves even more network/software round trips, so the
delay can become significant.

The simplest way to fix this is to change the mutex to an rwsem and only
hold it for writing when the list is being mutated.  Since we can take
the rwsem for reading while creating a controller, we can create multiple
controllers in parallel.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-pci: use dma memory for the host memory buffer descriptors · 4033f35d

Committed by Christoph Hellwig on Aug 28, 2017

The NVMe 1.3 specification says in section 5.21.1.13:

"After a successful completion of a Set Features enabling the host memory
 buffer, the host shall not write to the associated host memory region,
 buffer size, or descriptor list until the host memory buffer has been
 disabled."

While this doesn't state that the descriptor list must remain accessible
to the device, it certainly implies that it must remain readable by the
device.

So switch to a dma coherent allocation for the descriptor list just to be
safe - it's not like the cost for it matters compared to the actual
memory buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Fixes: 87ad72a5 ("nvme-pci: implement host memory buffer support")

nvme-rdma: default MR page size to 4k · b925a2dc

Committed by Max Gurtovoy on Aug 28, 2017

Due to the various page sizes in the system (IOMMU/device/kernel), we
set the fabrics controller page size to 4k and the block layer boundaries
accordingly. On architectures that use a different kernel page size,
we'll have a mismatch with the MR page size that may cause a mapping
error. Update the MR page size to correspond to the core ctrl settings.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

29 Aug, 2017 · 8 commits

nvme: don't blindly overwrite identifiers on disk revalidate · 1d5df6af

Committed by Christoph Hellwig on Aug 17, 2017

Instead validate that these identifiers do not change, as that is
prohibited by the specification.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

nvme: remove nvme_revalidate_ns · cdbff4f2

Committed by Christoph Hellwig on Aug 16, 2017

The function is used in two places, and the shared code for those will
diverge later in this series.

Instead factor out a new helper to get the ids for a namespace, simplify
the calling conventions for nvme_identify_ns and just open code the
sequence.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: remove unused struct nvme_ns fields · 57eeaf8e

Committed by Christoph Hellwig on Aug 16, 2017

And move the flags for the flags field near that field while touching
this area.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: allow calling nvme_change_ctrl_state from irq context · 0a72bbba

Committed by Christoph Hellwig on Aug 22, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: report more detailed status codes to the block layer · a751da33

Committed by Christoph Hellwig on Aug 22, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: honor RTD3 Entry Latency for shutdowns · 07fbd32a

Committed by Martin K. Petersen on Aug 25, 2017

If an NVMe controller reports RTD3 Entry Latency larger than
shutdown_timeout, up to a maximum of 60 seconds, use that value to set
the shutdown timer. Otherwise fall back to the module parameter which
defaults to 5 seconds.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
[hch: removed do_div, made transition time local scope]
Signed-off-by: Christoph Hellwig <hch@lst.de>
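
The selection rule can be sketched as a clamp, assuming the RTD3 Entry Latency is reported in microseconds. The function and parameter names are hypothetical, and the module default is passed in explicitly rather than read from a module parameter.

```c
#include <stdint.h>

#define MAX_SHUTDOWN_SECS 60u /* upper bound named in the commit message */

/*
 * rtd3e_us:            RTD3 Entry Latency as reported, in microseconds
 * module_default_secs: fallback from the module parameter (5 by default)
 */
unsigned int shutdown_timeout_secs(uint32_t rtd3e_us,
                                   unsigned int module_default_secs)
{
    unsigned int t;

    if (!rtd3e_us) /* controller reports nothing: use the fallback */
        return module_default_secs;

    t = rtd3e_us / 1000000u; /* microseconds to whole seconds */
    if (t < module_default_secs)
        t = module_default_secs;
    if (t > MAX_SHUTDOWN_SECS)
        t = MAX_SHUTDOWN_SECS;
    return t;
}
```

A controller reporting 10 seconds gets 10, one reporting 2 seconds keeps the 5 second default, and one reporting 120 seconds is capped at 60.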

nvme: fix uninitialized prp2 value on small transfers · 5228b328

Committed by Jan H. Schönherr on Aug 27, 2017

The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.

Comply with the NVMe specification and fill in 0 in that case.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
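
A simplified sketch of the rule, assuming a 4 KiB controller page and hypothetical names (`prps`, `setup_prps`): prp2 is filled in only when the transfer extends past the first page, and is 0 otherwise.

```c
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed controller page size */

struct prps {
    uint64_t prp1;
    uint64_t prp2;
};

/*
 * dma:      DMA address where the transfer starts
 * len:      transfer length in bytes
 * next_dma: address of the second page (or PRP list) if one is needed
 */
struct prps setup_prps(uint64_t dma, uint32_t len, uint64_t next_dma)
{
    struct prps p = { .prp1 = dma, .prp2 = 0 }; /* prp2 defaults to 0 */
    uint32_t first = PAGE_SZ - (uint32_t)(dma & (PAGE_SZ - 1));

    if (len > first) /* data crosses the first page boundary */
        p.prp2 = next_dma;
    return p;
}
```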

nvme-rdma: Use unlikely macro in the fast path · a7b7c7a1

Committed by Max Gurtovoy on Aug 14, 2017

This patch slightly improves performance (mainly for small block sizes).
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>