提交 · 1052b8ac5282daf35df331edcbdb645839d17e6a · openeuler / Kernel

26 11月, 2018 1 次提交

blk-mq: when polling for IO, look for any completion · 1052b8ac

由 Jens Axboe 提交于 11月 26, 2018

If we want to support async IO polling, then we have to allow finding
completions that aren't just for the one we are looking for. Always pass
in -1 to the mq_ops->poll() helper, and have that return how many events
were found in this poll loop.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1052b8ac

19 11月, 2018 1 次提交

nvme: default to 0 poll queues · a4668d9b

由 Jens Axboe 提交于 11月 19, 2018

We need a better way of configuring this, and given that polling is
(still) a bit niche, let's default to using 0 poll queues. That way
we'll have the same read/write/poll behavior as 4.20, and users that
want to test/use polling are required to do manual configuration of the
number of poll queues.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a4668d9b

16 11月, 2018 2 次提交

nvme: provide optimized poll function for separate poll queues · dabcefab

由 Jens Axboe 提交于 11月 14, 2018

If we have separate poll queues, we know that they aren't using
interrupts. Hence we don't need to disable interrupts around
finding completions.

Provide a separate set of blk_mq_ops for such devices.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dabcefab

nvme: fix handling of EINVAL on pci_alloc_irq_vectors_affinity() · db29eb05

由 Jens Axboe 提交于 11月 15, 2018

At least on SPARC, if MSI/MSI-X isn't supported, we get EINVAL if
we ask for more than one vector. This isn't covered by our ENOSPC
check.

If we get EINVAL, decrease our ask to just one vector, instead of
bailing out in error.

Fixes: 3b6592f7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

db29eb05

15 11月, 2018 1 次提交

nvme: fix boot hang with only being able to get one IRQ vector · 30e06628

由 Jens Axboe 提交于 11月 14, 2018

NVMe always asks for io_queues + 1 worth of IRQ vectors, which
means that even when we scale all the way down, we still ask
for 2 vectors and get -ENOSPC in return if the system can't
support more than 1.

Getting just 1 vector is fine, it just means that we'll have
1 IO queue and 1 admin queue, with a shared vector between them.
Check for this case and don't add our + 1 if it happens.

Fixes: 3b6592f7 ("nvme: utilize two queue maps, one for reads and one for writes")
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

30e06628

08 11月, 2018 3 次提交

nvme: add separate poll queue map · 4b04cc6a

由 Jens Axboe 提交于 11月 05, 2018

Adds support for defining a variable number of poll queues, currently
configurable with the 'poll_queues' module parameter. Defaults to
a single poll queue.

And now we finally have poll support without triggering interrupts!
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4b04cc6a

nvme: utilize two queue maps, one for reads and one for writes · 3b6592f7

由 Jens Axboe 提交于 10月 31, 2018

NVMe does round-robin between queues by default, which means that
sharing a queue map for both reads and writes can be problematic
in terms of read servicing. It's much easier to flood the queue
with writes and reduce the read servicing.

Implement two queue maps, one for reads and one for writes. The
write queue count is configurable through the 'write_queues'
parameter.

By default, we retain the previous behavior of having a single
queue set, shared between reads and writes. Setting 'write_queues'
to a non-zero value will create two queue sets, one for reads and
one for writes, the latter using the configurable number of
queues (hardware queue counts permitting).
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3b6592f7

blk-mq: abstract out queue map · ed76e329

由 Jens Axboe 提交于 10月 29, 2018

This is in preparation for allowing multiple sets of maps per
queue, if so desired.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ed76e329

02 11月, 2018 1 次提交

nvme-pci: fix conflicting p2p resource adds · 9fe5c59f

由 Keith Busch 提交于 10月 31, 2018

The nvme pci driver had been adding its CMB resource to the P2P DMA
subsystem everytime on on a controller reset. This results in the
following warning:

    ------------[ cut here ]------------
    nvme 0000:00:03.0: Conflicting mapping in same section
    WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
    ...
    Call Trace:
     pci_p2pdma_add_resource+0x153/0x370
     nvme_reset_work+0x28c/0x17b1 [nvme]
     ? add_timer+0x107/0x1e0
     ? dequeue_entity+0x81/0x660
     ? dequeue_entity+0x3b0/0x660
     ? pick_next_task_fair+0xaf/0x610
     ? __switch_to+0xbc/0x410
     process_one_work+0x1cf/0x350
     worker_thread+0x215/0x3d0
     ? process_one_work+0x350/0x350
     kthread+0x107/0x120
     ? kthread_park+0x80/0x80
     ret_from_fork+0x1f/0x30
    ---[ end trace f7ea76ac6ee72727 ]---
    nvme nvme0: failed to register the CMB

This patch fixes this by registering the CMB with P2P only once.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9fe5c59f

18 10月, 2018 3 次提交

nvme-pci: remove duplicate check · 3045c0d0

由 Chaitanya Kulkarni 提交于 10月 17, 2018

This is a cleanup patch doesn't change any functionality. It removes
the duplicate call to the blk_integrity_rq() in the nvme_map_data().
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3045c0d0

nvme-pci: Add support for P2P memory in requests · e0596ab2

由 Logan Gunthorpe 提交于 10月 04, 2018

For P2P requests, we must use the pci_p2pmem_map_sg() function instead of
the dma_map_sg functions.

With that, we can then indicate PCI_P2P support in the request queue.  For
this, we create an NVME_F_PCI_P2P flag which tells the core to set
QUEUE_FLAG_PCI_P2P in the request queue.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

e0596ab2

nvme-pci: Use PCI p2pmem subsystem to manage the CMB · 0f238ff5

由 Logan Gunthorpe 提交于 10月 04, 2018

Register the CMB buffer as p2pmem and use the appropriate allocation
functions to create and destroy the IO submission queues.

If the CMB supports WDS and RDS, publish it for use as P2P memory by other
devices.

Kernels without CONFIG_PCI_P2PDMA will also no longer support NVMe CMB.
However, seeing the main use-cases for the CMB is P2P operations, this
seems like a reasonable dependency.

We drop the __iomem safety on the buffer seeing that, by convention, it's
safe to directly access memory mapped by memremap()/devm_memremap_pages().
Architectures where this is not safe will not be supported by memremap()
and therefore will not support PCI P2P and have no support for CMB.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

0f238ff5

17 10月, 2018 2 次提交

nvme-pci: fix hot removal during error handling · cb4bfda6

由 Keith Busch 提交于 10月 15, 2018

A removal waits for the reset_work to complete. If a surprise removal
occurs around the same time as an error triggered controller reset, and
reset work happened to dispatch a command to the removed controller, the
command won't be recovered since the timeout work doesn't do anything
during error recovery. We wouldn't want to wait for timeout handling
anyway, so this patch fixes this by disabling the controller and killing
admin queues prior to syncing with the reset_work.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

cb4bfda6

nvme-pci: fix nvme_suspend_queue() kernel-doc header · 40581d1a

由 Bart Van Assche 提交于 10月 08, 2018

This patch avoids that the kernel-doc tool complains about the
nvme_suspend_queue() function header when building with W=1.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

40581d1a

03 10月, 2018 1 次提交

PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls · 62b36c3e

由 Oza Pawandeep 提交于 9月 28, 2018

After bfcb79fc ("PCI/ERR: Run error recovery callbacks for all affected
devices"), AER errors are always cleared by the PCI core and drivers don't
need to do it themselves.

Remove calls to pci_cleanup_aer_uncorrect_error_status() from device
driver error recovery functions.
Signed-off-by: NOza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove PCI core changes, remove unused variables]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>

62b36c3e

28 8月, 2018 1 次提交

nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2

由 Michal Wnukowski 提交于 8月 15, 2018

In many architectures loads may be reordered with older stores to
different locations.  In the nvme driver the following two operations
could be reordered:

 - Write shadow doorbell (dbbuf_db) into memory.
 - Read EventIdx (dbbuf_ei) from memory.

This can result in a potential race condition between driver and VM host
processing requests (if given virtual NVMe controller has a support for
shadow doorbell).  If that occurs, then the NVMe controller may decide to
wait for MMIO doorbell from guest operating system, and guest driver may
decide not to issue MMIO doorbell on any of subsequent commands.

This issue is purely timing-dependent one, so there is no easy way to
reproduce it. Currently the easiest known approach is to run "Oracle IO
Numbers" (orion) that is shipped with Oracle DB:

orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
	concat -write 40 -duration 120 -matrix row -testname nvme_test

Where nvme_test is a .lun file that contains a list of NVMe block
devices to run test against. Limiting number of vCPUs assigned to given
VM instance seems to increase chances for this bug to occur. On test
environment with VM that got 4 NVMe drives and 1 vCPU assigned the
virtual NVMe controller hang could be observed within 10-20 minutes.
That correspond to about 400-500k IO operations processed (or about
100GB of IO read/writes).

Orion tool was used as a validation and set to run in a loop for 36
hours (equivalent of pushing 550M IO operations). No issues were
observed. That suggest that the patch fixes the issue.

Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
Signed-off-by: NMichal Wnukowski <wnukowski@google.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
[hch: updated changelog and comment a bit]
Signed-off-by: NChristoph Hellwig <hch@lst.de>

f1ed3df2

30 7月, 2018 1 次提交

nvme: use blk API to remap ref tags for IOs with metadata · f7f1fc36

由 Max Gurtovoy 提交于 7月 30, 2018

Also moved the logic of the remapping to the nvme core driver instead
of implementing it in the nvme pci driver. This way all the other nvme
transport drivers will benefit from it (in case they'll implement metadata
support).
Suggested-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f7f1fc36

23 7月, 2018 1 次提交

nvme: cache struct nvme_ctrl reference to struct nvme_request · 59e29ce6

由 Sagi Grimberg 提交于 6月 29, 2018

We will need to reference the controller in the setup and completion
time for tracing and future traffic based keep alive support.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

59e29ce6

12 7月, 2018 1 次提交

nvme-pci: fix memory leak on probe failure · b6e44b4c

由 Keith Busch 提交于 7月 11, 2018

The nvme driver specific structures need to be initialized prior to
enabling the generic controller so we can unwind on failure with out
using the reference counting callbacks so that 'probe' and 'remove'
can be symmetric.

The newly added iod_mempool is the only resource that was being
allocated out of order, and a failure there would leak the generic
controller memory. This patch just moves that allocation above the
controller initialization.

Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations")
Reported-by: NWeiping Zhang <zwp10758@gmail.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b6e44b4c

22 6月, 2018 1 次提交

nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e

由 Jens Axboe 提交于 6月 21, 2018

nvme requires an sg table allocation for each request. If the request
is large, then the allocation can become quite large. For instance,
with our default software settings of 1280KB IO size, we'll need
10248 bytes of sg table. That turns into a 2nd order allocation,
which we can't always guarantee. If we fail the allocation, blk-mq
will retry it later. But there's no guarantee that we'll EVER be
able to allocate that much contigious memory.

Limit the IO size such that we never need more than a single page
of memory. That's a lot faster and more reliable. Then back that
allocation with a mempool, so that we know we'll always be able
to succeed the allocation at some point.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

943e942e

21 6月, 2018 1 次提交

nvme-pci: move nvme_kill_queues to nvme_remove_dead_ctrl · 9f9cafc1

由 Jianchao Wang 提交于 6月 20, 2018

There is race between nvme_remove and nvme_reset_work that can
lead to io hang.

nvme_remove                    nvme_reset_work
                               -> nvme_remove_dead_ctrl
                                 -> nvme_dev_disable
                                   -> quiesce request_queue
                                 -> queue remove_work
-> cancel_work_sync reset_work
-> nvme_remove_namespaces
  -> splice ctrl->namespaces
                               nvme_remove_dead_ctrl_work
                               -> nvme_kill_queues
  -> nvme_ns_remove               do nothing
    -> blk_cleanup_queue
      -> blk_freeze_queue

Finally, the request_queue is quiesced state when wait freeze,
we will get io hang here. To fix it, move the nvme_kill_queues
from nvme_remove_dead_ctrl_work to nvme_remove_dead_ctrl.
Suggested-by: NKeith Busch <keith.busch@linux.intel.com>
Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

9f9cafc1

09 6月, 2018 6 次提交

nvme-pci: make CMB SQ mod-param read-only · 69f4eb9f

由 Keith Busch 提交于 6月 06, 2018

A controller reset after a run time change of the CMB module parameter
breaks the driver. An 'on -> off' will have the driver use NULL for the
host memory queue, and 'off -> on' will use mismatched queue depth between
the device and the host.

We could fix both, but there isn't really a good reason to change this
at run time anyway, compared to at module load time, so this patch makes
parameter read-only after after modprobe.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

69f4eb9f

nvme-pci: unquiesce dead controller queues · 1d39e692

由 Keith Busch 提交于 6月 06, 2018

This patch ensures the nvme namsepace request queues are not quiesced
on a surprise removal. It's possible the queues were previously killed
in a failed reset, so the queues need to be unquiesced to ensure all
requests are flushed to completion.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

1d39e692

nvme-pci: remove HMB teardown on reset · fe76fcfb

由 Keith Busch 提交于 6月 06, 2018

The controller is required to disable its host memory buffer use on
controller reset. We don't need to submit an admin command to delete it,
so this patch skips sending that command so we don't need to worry about
handling a timeout.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fe76fcfb

nvme-pci: queue creation fixes · ded45505

由 Keith Busch 提交于 6月 06, 2018

We've been ignoring NVMe error status on queue creations. Fortunately they
are uncommon, but we should handle these anyway. This patch adds checks
for the a positive error return value that indicates an NVMe status.

If we do see a negative return, the controller isn't usable, so this
patch returns immediately in since we can't unwind that failure.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ded45505

nvme-pci: remove unnecessary completion doorbell check · 397c699f

由 Keith Busch 提交于 6月 06, 2018

The nvme pci driver never unmaps the doorbell registers while the requests
are active, so we can always safely update the completion queue head.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

397c699f

nvme-pci: remove unnecessary nested locking · 0bc88192

由 Keith Busch 提交于 6月 06, 2018

The nvme pci driver no longer handles completions under the cq lock,
so the nested locking is not necessary.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0bc88192

30 5月, 2018 2 次提交

nvme-pci: simplify __nvme_submit_cmd · 90ea5ca4

由 Christoph Hellwig 提交于 5月 26, 2018

With recent CQ handling improvements we can now move the locking into
__nvme_submit_cmd.  Also remove the local tail variable to make the code
more obvious, remove the __ prefix in the name, and fix the comments
describing the function.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>

90ea5ca4

nvme-pci: Rate limit the nvme timeout warnings · b9cac43c

由 Keith Busch 提交于 5月 24, 2018

The block layer's timeout handling currently prevents drivers from
completing commands outside the timeout callback once blk-mq decides
they've expired. If a device breaks, this could potentially create many
thousands of timed out commands. There's nothing of value to be gleaned
from observing each of those messages, so this patch adds a rate limit
on them.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b9cac43c

29 5月, 2018 1 次提交

nvme: return BLK_EH_DONE from ->timeout · db8c48e4

由 Christoph Hellwig 提交于 5月 29, 2018

NVMe always completes the request before returning from ->timeout, either
by polling for it, or by disabling the controller.  Return BLK_EH_DONE so
that the block layer doesn't even try to complete it again.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

db8c48e4

25 5月, 2018 2 次提交

nvme-pci: Fix AER reset handling · 72cd4cc2

由 Keith Busch 提交于 5月 24, 2018

The nvme timeout handling doesn't do anything if the pci channel is
offline, which is the case when recovering from PCI error event, so it
was a bad idea to sync the controller reset in this state. This patch
flushes the reset work in the error_resume callback instead when the
channel is back to online. This keeps AER handling serialized and
can recover from timeouts.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199757
Fixes: cc1d5e74 ("nvme/pci: Sync controller reset for AER slot_reset")
Reported-by: NAlex Gagniuc <mr.nuke.me@gmail.com>
Tested-by: NAlex Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

72cd4cc2

nvme-pci: set nvmeq->cq_vector after alloc cq/sq · a8e3e0bb

由 Jianchao Wang 提交于 5月 24, 2018

Set cq_vector after alloc cq/sq, otherwise nvme_suspend_queue will invoke
free_irq for it and cause a 'Trying to free already-free IRQ  xxx'
warning if the create CQ/SQ command times out.
Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
[hch: fixed to pass a s16 and clean up the comment]
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a8e3e0bb

21 5月, 2018 1 次提交

nvme-pci: fix race between poll and IRQ completions · 68fa9dbe

由 Jens Axboe 提交于 5月 21, 2018

If polling completions are racing with the IRQ triggered by a
completion, the IRQ handler will find no work and return IRQ_NONE.
This can trigger complaints about spurious interrupts:

[  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
[  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
[  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
[  560.175991] Call Trace:
[  560.175994]  <IRQ>
[  560.176005]  dump_stack+0x5c/0x7b
[  560.176010]  __report_bad_irq+0x30/0xc0
[  560.176013]  note_interrupt+0x235/0x280
[  560.176020]  handle_irq_event_percpu+0x51/0x70
[  560.176023]  handle_irq_event+0x27/0x50
[  560.176026]  handle_edge_irq+0x6d/0x180
[  560.176031]  handle_irq+0xa5/0x110
[  560.176036]  do_IRQ+0x41/0xc0
[  560.176042]  common_interrupt+0xf/0xf
[  560.176043]  </IRQ>
[  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
[  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
[  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
[  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
[  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
[  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
[  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
[  560.176071]  do_idle+0x1f4/0x260
[  560.176075]  cpu_startup_entry+0x6f/0x80
[  560.176080]  start_secondary+0x184/0x1d0
[  560.176085]  secondary_startup_64+0xa5/0xb0
[  560.176088] handlers:
[  560.178387] [<00000000efb612be>] nvme_irq [nvme]
[  560.183019] Disabling IRQ #630

A previous commit removed ->cqe_seen that was handling this case,
but we need to handle this a bit differently due to completions
now running outside the queue lock. Return IRQ_HANDLED from the
IRQ handler, if the completion ring head was moved since we last
saw it.

Fixes: 5cb525c8 ("nvme-pci: handle completions outside of the queue lock")
Reported-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Tested-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

68fa9dbe

19 5月, 2018 6 次提交

nvme-pci: drop IRQ disabling on submission queue lock · 1eae349d

由 Jens Axboe 提交于 5月 17, 2018

Since we aren't sharing the lock for completions now, we don't
have to make it IRQ safe.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1eae349d

nvme-pci: split the nvme queue lock into submission and completion locks · 1ab0cd69

由 Jens Axboe 提交于 5月 17, 2018

This is now feasible. We protect the submission queue ring with
->sq_lock, and the completion side with ->cq_lock.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1ab0cd69

nvme-pci: handle completions outside of the queue lock · 5cb525c8

由 Jens Axboe 提交于 5月 17, 2018

Split the completion of events into a two part process:

1) Reap the events inside the queue lock
2) Complete the events outside the queue lock

Since we never wrap the queue, we can access it locklessly after we've
updated the completion queue head. This patch started off with batching
events on the stack, but with this trick we don't have to. Keith Busch
<keith.busch@intel.com> came up with that idea.

Note that this kills the ->cqe_seen as well. I haven't been able to
trigger any ill effects of this. If we do race with polling every so
often, it should be rare enough NOT to trigger any issues.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
[hch: refactored, restored poll early exit optimization]
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5cb525c8

nvme-pci: move ->cq_vector == -1 check outside of ->q_lock · d1f06f4a

由 Jens Axboe 提交于 5月 17, 2018

We only clear it dynamically in nvme_suspend_queue(). When we do, ensure
to do a full flush so that any nvme_queue_rq() invocation will see it.

Ideally we'd kill this check completely, but we're using it to flush
requests on a dying queue.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

d1f06f4a

nvme-pci: remove cq check after submission · f9dde187

由 Jens Axboe 提交于 5月 17, 2018

We always check the completion queue after submitting, but in my testing
this isn't a win even on DRAM/xpoint devices. In some cases it's
actually worse. Kill it.
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

f9dde187

nvme-pci: simplify nvme_cqe_valid · 750dde44

由 Christoph Hellwig 提交于 5月 18, 2018

We always look at the current CQ head and phase, so don't pass these
as separate arguments, and rename the function to nvme_cqe_pending.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

750dde44

12 5月, 2018 1 次提交

nvme/pci: Sync controller reset for AER slot_reset · cc1d5e74

由 Keith Busch 提交于 5月 10, 2018

AER handling expects a successful return from slot_reset means the
driver made the device functional again. The nvme driver had been using
an asynchronous reset to recover the device, so the device
may still be initializing after control is returned to the
AER handler. This creates problems for subsequent event handling,
causing the initializion to fail.

This patch fixes that by syncing the controller reset before returning
to the AER driver, and reporting the true state of the reset.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657Reported-by: NAlex Gagniuc <mr.nuke.me@gmail.com>
Cc: Sinan Kaya <okaya@codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Tested-by: NAlex Gagniuc <mr.nuke.me@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>

cc1d5e74

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功