提交 · 0544f5494a03b8846db74e02be5685d1f32b06c9 · openeuler / raspberrypi-kernel

23 5月, 2017 1 次提交

nvme-rdma: support devices with queue size < 32 · 0544f549

由 Marta Rybczynska 提交于 4月 10, 2017

In the case of small NVMe-oF queue size (<32) we may enter a deadlock
caused by the fact that the IB completions aren't sent waiting for 32
and the send queue will fill up.

The error is seen as (using mlx5):
[ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
[ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12

This patch changes the way the signaling is done so that it depends on
the queue depth now. The magic define has been removed completely.

Cc: stable@vger.kernel.org
Signed-off-by: NMarta Rybczynska <marta.rybczynska@kalray.eu>
Signed-off-by: NSamuel Jones <sjones@kalray.eu>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

0544f549

21 5月, 2017 5 次提交

nvmet: release the sq ref on rdma read errors · 549f01ae

由 Vijay Immanuel 提交于 5月 08, 2017

On rdma read errors, release the sq ref that was taken
when the req was initialized. This avoids a hang in
nvmet_sq_destroy() when the queue is being freed.
Signed-off-by: NVijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

549f01ae

nvmet-fc: remove target cpu scheduling flag · 4b8ba5fa

由 James Smart 提交于 4月 25, 2017

Remove NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED. It's unnecessary.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4b8ba5fa

nvme-fc: stop queues on error detection · 2952a879

由 James Smart 提交于 4月 25, 2017

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Rather than waiting for reset work thread to stop queues and abort the ios,
immediately stop the queues on error detection. Reset thread will restop
the queues (as it's called on other paths), but it does not appear to have
a side effect.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

2952a879

nvme-fc: require target or discovery role for fc-nvme targets · 85e6a6ad

由 James Smart 提交于 5月 05, 2017

In order to create an association, the remoteport must be
serving either a target role or a discovery role.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

85e6a6ad

nvme: unmap CMB and remove sysfs file in reset path · f63572df

由 Jon Derrick 提交于 5月 05, 2017

CMB doesn't get unmapped until removal while getting remapped on every
reset. Add the unmapping and sysfs file removal to the reset path in
nvme_pci_disable to match the mapping path in nvme_pci_enable.

Fixes: 202021c1 ("nvme : Add sysfs entry for NVMe CMBs when appropriate")
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Acked-by: NKeith Busch <keith.busch@intel.com>
Reviewed-By: NStephen Bates <sbates@raithlin.com>
Cc: <stable@vger.kernel.org> # 4.9+
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f63572df

10 5月, 2017 1 次提交

nvme: lightnvm: fix memory leak · fba704b4

由 Rakesh Pandit 提交于 5月 09, 2017

Free up kmalloc allocated memory if failure happens while handling L2P
table transfer in nvme_nvm_get_l2p_tbl.

Fixes: 8e79b5cb ("lightnvm: move block provisioning to targets")
Signed-off-by: NRakesh Pandit <rakesh@tuxera.com>
Reviewed-by: NJavier González <javier@cnexlabs.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

fba704b4

08 5月, 2017 1 次提交

lightnvm: remove unused rq parameter of nvme_nvm_rqtocmd() to kill warning · 629b1b2e

由 Geert Uytterhoeven 提交于 5月 07, 2017

With gcc 4.1.2:

    drivers/nvme/host/lightnvm.c: In function ‘nvme_nvm_submit_io’:
    drivers/nvme/host/lightnvm.c:498: warning: ‘rq’ is used uninitialized in this function

Indeed, since commit 2e13f33a ("lightnvm: create cmd before
allocating request"), the request is passed to nvme_nvm_rqtocmd() before
it is allocated.

Fortunately, as of commit 91276162 ("lightnvm: refactor end_io
functions for sync"), nvme_nvm_rqtocmd () no longer uses the passed
request, so this warning is a false positive.

Drop the unused parameter to clean up the code and kill the warning.

Fixes: 2e13f33a ("lightnvm: create cmd before allocating request")
Fixes: 91276162 ("lightnvm: refactor end_io functions for sync")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

629b1b2e

04 5月, 2017 1 次提交

lightnvm: create cmd before allocating request · 2e13f33a

由 Javier González 提交于 5月 03, 2017

Create nvme command before allocating a request using
nvme_alloc_request, which uses the command direction. Up until now, the
command has been zeroized, so all commands have been allocated as a
read operation.
Signed-off-by: NJavier González <javier@cnexlabs.com>
Reviewed-by: NMatias Bjørling <matias@cnexlabs.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

2e13f33a

02 5月, 2017 1 次提交

blk-mq: update ->init_request and ->exit_request prototypes · d6296d39

由 Christoph Hellwig 提交于 5月 01, 2017

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq.  Except for a superflous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

d6296d39

27 4月, 2017 1 次提交

nvme-scsi: remove nvme_trans_security_protocol · 7569b90a

由 Christoph Hellwig 提交于 4月 25, 2017

This function just returns the same error code and sense data as
the default statement in the switch in the caller.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

7569b90a

26 4月, 2017 3 次提交

C
nvme-lightnvm: add missing endianess conversion in nvme_nvm_end_io · 25d9baa4
由 Christoph Hellwig 提交于 4月 21, 2017
```
Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMatias Bjørling <matias@cnexlabs.com>
```
25d9baa4

nvme-scsi: Consider LBA format in IO splitting calculation · 7fad1fd4

由 Jon Derrick 提交于 4月 24, 2017

The current command submission code uses a sector-based value when
considering the maximum number of blocks per command. With a
4k-formatted namespace and a command exceeding max hardware limits, this
calculation doesn't split IOs which should be split and fails in the
nvme layer. This patch fixes that calculation and enables IO splitting
in these circumstances.
Signed-off-by: NJon Derrick <jonathan.derrick@intel.com>
Reviewed-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

7fad1fd4

nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice · de41447a

由 Ewan D. Milne 提交于 4月 24, 2017

Do not call nvmf_free_options() from the nvme_fc_ctlr destructor if
nvme_fc_create_ctrl() returns an error, because nvmf_create_ctrl()
frees the options when an error is returned.
Signed-off-by: NEwan D. Milne <emilne@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

de41447a

25 4月, 2017 3 次提交

nvme: Add nvme_core.force_apst to ignore the NO_APST quirk · c35e30b4

由 Andy Lutomirski 提交于 4月 21, 2017

We're probably going to be stuck quirking APST off on an over-broad
range of devices for 4.11.  Let's make it easy to override the quirk
for testing.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

c35e30b4

nvme: Display raw APST configuration via DYNAMIC_DEBUG · fb0dc399

由 Andy Lutomirski 提交于 4月 21, 2017

Debugging APST is currently a bit of a pain.  This gives optional
simple log messages that describe the APST state.

The easiest way to use this is probably with the nvme_core.dyndbg=+p
module parameter.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

fb0dc399

nvme: Fix APST comment · 76e4ad09

由 Andy Lutomirski 提交于 4月 21, 2017

There was a typo in the description of the timeout heuristic.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

76e4ad09

24 4月, 2017 9 次提交

nvmet-fcloop: mark two symbols static · 36b8890e

由 Christoph Hellwig 提交于 4月 21, 2017

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

36b8890e

nvmet-fc: properly endian swap sq_head · 8ad76cf1

由 Christoph Hellwig 提交于 4月 21, 2017

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

8ad76cf1

nvmet-fc: mark the sqhd field as __le16 · f63688a6

由 Christoph Hellwig 提交于 4月 21, 2017

That's what it's used as.

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

f63688a6

nvmet-fc: fix endianess annoations for nvmet_fc_format_rsp_hdr · 3f5e1188

由 Christoph Hellwig 提交于 4月 21, 2017

The passed in desc_len is a big endian value, so mark it as such.

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

3f5e1188

nvmet-fc: mark nvmet_fc_handle_fcp_rqst static · edba98dd

由 Christoph Hellwig 提交于 4月 21, 2017

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

edba98dd

nvme-fc: mark two symbols static · baee29ac

由 Christoph Hellwig 提交于 4月 21, 2017

Found by sparse.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

baee29ac

nvme_fc: add controller reset support · 61bff8ef

由 James Smart 提交于 4月 23, 2017

This patch actually does quite a few things.  When looking to add
controller reset support, the organization modeled after rdma was
very fragmented. rdma duplicates the reset and teardown paths and does
different things to the block layer on the two paths. The code to build
up the controller is also duplicated between the initial creation and
the reset/error recovery paths. So I decided to make this sane.

I reorganized the controller creation and teardown so that there is a
connect path and a disconnect path.  Initial creation obviously uses
the connect path.  Controller teardown will use the disconnect path,
followed last access code. Controller reset will use the disconnect
path to stop operation, and then the connect path to re-establish
the controller.

Along the way, several things were fixed
- aens were not properly set up. They are allocated differently from
  the per-request structure on the blk queues.
- aens were oddly torn down. the prior patch corrected to abort, but
  we still need to dma unmap and free relative elements.
- missed a few ref counting points: in aen completion and on i/o's
  that fail
- controller initial create failure paths were still confused vs teardown
  before converting to ref counting vs after we convert to refcounting.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

61bff8ef

nvme_fc: add aen abort to teardown · 78a7ac26

由 James Smart 提交于 4月 23, 2017

Add abort support for aens. Commonized the op abort to apply to aen or
real ios (caused some reorg/routine movement). Abort path sets termination
flag in prep for next patch that will be watching i/o abort completion
before proceeding with controller teardown.

Now that we're aborting aens, the "exit" code that simply cleared out
their context no longer applies.

Also clarified how we detect an AEN vs a normal io - by a flag, not
by whether a rq exists or the a rqno is out of range.

Note: saw some interesting cases where if the queues are stopped and
we're waiting for the aborts, the core layer can call the complete_rq
callback for the io. So the io completion synchronizes link side completion
with possible blk layer completion under error.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

78a7ac26

nvme_fc: fix command id check · 458f280d

由 James Smart 提交于 4月 23, 2017

The code validates the command_id in the response to the original
sqe command. But prior code was using the rq->rqno as the sqe command
id. The core layer overwrites what the transport set there originally.

Use the actual sqe content.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

458f280d

21 4月, 2017 14 次提交

nvme: let dm-mpath distinguish nvme error codes · e02ab023

由 Junxiong Guan 提交于 4月 21, 2017

Currently most IOs which return the nvme error codes are retried on
the other path if those IOs returns EIO from NVMe driver. This
patch let Multipath distinguish nvme media error codes and some
generic or cmd-specific nvme error codes so that multipath will
not retry those kinds of IO, to save bandwidth.
Signed-off-by: NJunxiong Guan <guanjunxiong@huawei.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e02ab023

nvme/pci: Poll CQ on timeout · 7776db1c

由 Keith Busch 提交于 2月 24, 2017

If an IO timeout occurs, it's helpful to know if the controller did not
post a completion or the driver missed an interrupt. While we never expect
the latter, this patch will make it possible to tell the difference so
we don't have to guess.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>

7776db1c

nvmet_fc: Change traddr field separator to a colon · 43631357

由 James Smart 提交于 4月 12, 2017

The FC-NVME spec revised syntax to avoid comma separators.
Sync with the change in the parser for traddr on port attachments.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

43631357

nvme_fc: Add ls aborts on remote port teardown · 8d64daf7

由 James Smart 提交于 4月 11, 2017

remoteport teardown never aborted the LS opertions. Add support.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

8d64daf7

nvme_fc: Move LS's to rport · c913a8b0

由 James Smart 提交于 4月 11, 2017

Link LS's on the remoteport rather than the controller. LS's are
between nport's. Makes more sense, especially on async teardown where
the controller is torn down regardless of the LS (LS is more of a notifier
to the target of the teardown), to have them on the remoteport.

While revising ls send/done routines, issues were seen relative to
refcounting and cleanup, especially in async path. Reworked these code
paths.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

c913a8b0

nvmet_fc: add missing reference in add_port · 568ad51e

由 James Smart 提交于 4月 11, 2017

Add missing reference in add_port
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

568ad51e

nvmet_fc: Rework target side abort handling · a97ec51b

由 James Smart 提交于 4月 11, 2017

target transport:
----------------------
There are cases when there is a need to abort in-progress target
operations (writedata) so that controller termination or errors can
clean up. That can't happen currently as the abort is another target
op type, so it can't be used till the running one finishes (and it may
not).  Solve by removing the abort op type and creating a separate
downcall from the transport to the lldd to request an io to be aborted.

The transport will abort ios on queue teardown or io errors. In general
the transport tries to call the lldd abort only when the io state is
idle. Meaning: ops that transmit data (readdata or rsp) will always
finish their transmit (or the lldd will see a state on the
link or initiator port that fails the transmit) and the done call for
the operation will occur. The transport will wait for the op done
upcall before calling the abort function, and as the io is idle, the
io can be cleaned up immediately after the abort call; Similarly, ios
that are not waiting for data or transmitting data must be in the nvmet
layer being processed. The transport will wait for the nvmet layer
completion before calling the abort function, and as the io is idle,
the io can be cleaned up immediately after the abort call; As for ops
that are waiting for data (writedata), they may be outstanding
indefinitely if the lldd doesn't see a condition where the initiatior
port or link is bad. In those cases, the transport will call the abort
function and wait for the lldd's op done upcall for the operation, where
it will then clean up the io.

Additionally, if a lldd receives an ABTS and matches it to an outstanding
request in the transport, A new new transport upcall was created to abort
the outstanding request in the transport. The transport expects any
outstanding op call (readdata or writedata) will completed by the lldd and
the operation upcall made. The transport doesn't act on the reported
abort (e.g. clean up the io) until an op done upcall occurs, a new op is
attempted, or the nvmet layer completes the io processing.

fcloop:
----------------------
Updated to support the new target apis.
On fcp io aborts from the initiator, the loopback context is updated to
NULL out the half that has completed. The initiator side is immediately
called after the abort request with an io completion (abort status).
On fcp io aborts from the target, the io is stopped and the initiator side
sees it as an aborted io. Target side ops, perhaps in progress while the
initiator side is done, continue but noop the data movement as there's no
structure on the initiator side to reference.

patch also contains:
----------------------
Revised lpfc to support the new abort api

commonized rsp buffer syncing and nulling of private data based on
calling paths.

errors in op done calls don't take action on the fod. They're bad
operations which implies the fod may be bad.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

a97ec51b

nvme_fcloop: split job struct from transport for req_release · ce79bfc2

由 James Smart 提交于 4月 11, 2017

Current design has the fcloop job struct, used for both initiator and
target processing, allocated as part of the initiator request structure.
On aborts, the initiator side (based on the request) may terminate, yet
the target side wants to continue processing. the target side can't do
that if the initiator side goes away.
Revise fcloop to allocate an independent target side structure when it
starts an io from the initiator.

Added a lock to the request struct as well to synchronize pointer updates
on abort calls.

Modified target downcalls to recognize conditions where initiator has
aborted the io (thus nulled the pointer between job structs), thus
avoid referencing sgl lists which are gone and no longer making upcalls
to the initiator.

In conditions where the targetport is no longer connected, have the
initiator return an access failure rather than simulating a command
completion.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

ce79bfc2

nvmet_fc: add req_release to lldd api · 19b58d94

由 James Smart 提交于 4月 11, 2017

With the advent of the opdone calls changing context, the lldd can no
longer assume that once the op->done call returns for RSP operations
that the request struct is no longer being accessed.

As such, revise the lldd api for a req_release callback that the
transport will call when the job is complete. This will also be used
with abort cases.

Fixed text in api header for change in io complete semantics.

Revised lpfc to support the new req_release api.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

19b58d94

nvmet_fc: add target feature flags for upcall isr contexts · 39498fae

由 James Smart 提交于 4月 11, 2017

Two new feature flags were added to control whether upcalls to the
transport result in context switches or stay in the calling context.

NVMET_FCTGTFEAT_CMD_IN_ISR:
  By default, if the flag is not set, the transport assumes the
  lldd is in a non-isr context and in the cpu context it should be
  for the io queue. As such, the cmd handler is called directly in the
  calling context.
  If the flag is set, indicating the upcall is an isr context, the
  transport mandates a transition to a workqueue. The workqueue assigned
  to the queue is used for the context.
NVMET_FCTGTFEAT_OPDONE_IN_ISR
  By default, if the flag is not set, the transport assumes the
  lldd is in a non-isr context and in the cpu context it should be
  for the io queue. As such, the fcp operation done callback is called
  directly in the calling context.
  If the flag is set, indicating the upcall is an isr context, the
  transport mandates a transition to a workqueue. The workqueue assigned
  to the queue is used for the context.

Updated lpfc for flags
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

39498fae

nvmet: convert from kmap to nvmet_copy_from_sgl · 1c05cf90

由 Logan Gunthorpe 提交于 4月 18, 2017

This is safer as it doesn't rely on the data being stored in
a single page in an sgl.

It also aids our effort to start phasing out users of sg_page. See [1].

For this we kmalloc some memory, copy to it and free at the end. Note:
we can't allocate this memory on the stack as the kbuild test robot
reports some frame size overflows on i386.

[1] https://lwn.net/Articles/720053/Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

1c05cf90

nvme: improve performance for virtual NVMe devices · f9f38e33

由 Helen Koike 提交于 4月 10, 2017

This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefy here. The main idea for
the patch is provide the device two memory location locations:
 1) to store the doorbell values so they can be lookup without the doorbell
    MMIO write
 2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specification, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.

FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).

The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.

Contributions:
  Eric Northup <digitaleric@google.com>
  Frank Swiderski <fes@google.com>
  Ted Tso <tytso@mit.edu>
  Keith Busch <keith.busch@intel.com>

Just to give an idea on the performance boost with the vendor
extension: Running fio [1], a stock NVMe driver I get about 200K read
IOPs with my vendor patch I get about 1000K read IOPs. This was
running with a null device i.e. the backing device simply returned
success on every read IO request.

[1] Running on a 4 core machine:
  fio --time_based --name=benchmark --runtime=30
  --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
  --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
  --rw=randread --blocksize=4k --randrepeat=false
Signed-off-by: NRob Nelson <rlnelson@google.com>
[mlin: port for upstream]
Signed-off-by: NMing Lin <mlin@kernel.org>
[koike: updated for upstream]
Signed-off-by: NHelen Koike <helen.koike@collabora.co.uk>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NKeith Busch <keith.busch@intel.com>

f9f38e33

nvme/pci: Don't set reserved SQ create flags · 81c1cd98

由 Keith Busch 提交于 4月 04, 2017

The QPRIO field is only valid if weighted round robin arbitration is used,
and this driver doesn't enable that controller configuration option.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>

81c1cd98

nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA" · be56945c

由 Andy Lutomirski 提交于 4月 20, 2017

There's a report that it malfunctions with APST on.

See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184

Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

be56945c