1. 02 Sep, 2020: 1 commit
  2. 29 Jun, 2020: 1 commit
    • nvme: implement mq_ops->commit_rqs() hook · a2724bab
      Jens Axboe authored
      fix #28871358
      
      commit 04f3eafda6e05adc56afed4d3ae6e24aaa429058 upstream
      
      Split the command submission and the SQ doorbell ring, and add the
      doorbell ring as our ->commit_rqs() hook. This allows a list of
      requests to be issued, with nvme only writing the SQ update when
      it's necessary. This is more efficient if we have lists of requests
      to issue, particularly on virtualized hardware, where writing the
      SQ doorbell is more expensive than on real hardware. For those cases,
      performance increases of 2-3x have been observed.
      
      The use case for this is plugged IO, where blk-mq flushes a batch of
      requests at a time.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
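To illustrate the batching above, here is a hypothetical userspace sketch (struct and function names are invented, not the driver's symbols): commands are queued without touching the doorbell, and one commit step rings the simulated doorbell once for the whole batch.

```c
#include <assert.h>   /* for the usage example below */
#include <stdint.h>

struct sim_queue {
    uint16_t sq_tail;         /* next free SQ slot */
    uint16_t doorbell;        /* last value written to the doorbell */
    unsigned doorbell_writes; /* how many MMIO writes we "paid" for */
};

/* Queue one command without touching the doorbell. */
static void sim_submit(struct sim_queue *q)
{
    q->sq_tail++;
}

/* Ring the doorbell once for everything queued since the last ring. */
static void sim_commit_rqs(struct sim_queue *q)
{
    if (q->doorbell != q->sq_tail) {
        q->doorbell = q->sq_tail; /* one write covers the batch */
        q->doorbell_writes++;
    }
}
```

A plugged batch of eight submissions followed by one commit costs a single doorbell write, instead of eight with per-request rings; this is where the gain on virtualized hardware comes from.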
  3. 15 Jun, 2020: 1 commit
    • nvme: retain split access workaround for capability reads · c2fe0cbc
      Ard Biesheuvel authored
      task #28557808
      
      [ Upstream commit 3a8ecc935efabdad106b5e06d07b150c394b4465 ]
      
      Commit 7fd8930f
      
        "nvme: add a common helper to read Identify Controller data"
      
      has re-introduced an issue that we have attempted to work around in the
      past, in commit a310acd7 ("NVMe: use split lo_hi_{read,write}q").
      
      The problem is that some PCIe NVMe controllers do not implement 64-bit
      outbound accesses correctly, which is why the commit above switched
      to using lo_hi_[read|write]q for all 64-bit BAR accesses occurring in
      the code.
      
      In the meantime, the NVMe subsystem has been refactored, and now calls
      into the PCIe support layer for NVMe via a .reg_read64() method, which
      fails to use lo_hi_readq(), and thus reintroduces the problem that the
      workaround above aimed to address.
      
      Given that, at the moment, .reg_read64() is only used to read the
      capability register [which is known to tolerate split reads], let's
      switch .reg_read64() to lo_hi_readq() as well.
      
      This fixes a boot issue on some ARM boxes with NVMe behind a Synopsys
      DesignWare PCIe host controller.
      
      Fixes: 7fd8930f ("nvme: add a common helper to read Identify Controller data")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
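The shape of the split-access workaround can be sketched in userspace as follows. The function mirrors the idea behind lo_hi_readq() (low dword first, then high dword, combined into 64 bits); it is a simplified illustration against a plain array standing in for an ioremapped BAR, not the kernel's implementation.

```c
#include <assert.h>   /* for the usage example below */
#include <stdint.h>

/* Read a 64-bit register as two 32-bit accesses, low half first, so a
 * controller that mishandles 64-bit MMIO reads still returns correct
 * data. The volatile pointer stands in for an ioremapped BAR. */
static uint64_t lo_hi_read64(const volatile uint32_t *reg)
{
    uint64_t lo = reg[0]; /* low dword first */
    uint64_t hi = reg[1]; /* then high dword */
    return lo | (hi << 32);
}
```

This is safe for the capability register precisely because, as the changelog notes, that register is known to tolerate split reads.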
  4. 17 Jan, 2020: 1 commit
  5. 15 Jan, 2020: 1 commit
  6. 01 Dec, 2019: 2 commits
    • nvme-pci: fix conflicting p2p resource adds · fa7f1bce
      Keith Busch authored
      [ Upstream commit 9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1 ]
      
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem every time on a controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      This patch fixes this by registering the CMB with P2P only once.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
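The register-once fix boils down to a guard flag. A minimal userspace sketch (names invented; the counter stands in for the pci_p2pdma_add_resource() call):

```c
#include <assert.h>    /* for the usage example below */
#include <stdbool.h>

struct sim_dev {
    bool cmb_p2p_registered; /* has the CMB been handed to P2P DMA? */
    int  p2p_registrations;  /* counts simulated add_resource calls */
};

/* Called on every controller reset; registers the CMB at most once. */
static void sim_map_cmb(struct sim_dev *dev)
{
    if (dev->cmb_p2p_registered)
        return;               /* already registered on a prior reset */
    dev->p2p_registrations++; /* would call pci_p2pdma_add_resource() */
    dev->cmb_p2p_registered = true;
}
```

Repeated resets then no longer re-register the same mapping, which is what triggered the "Conflicting mapping in same section" warning above.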
    • nvme-pci: fix hot removal during error handling · 305c262f
      Keith Busch authored
      [ Upstream commit cb4bfda62afa25b4eee3d635d33fccdd9485dd7c ]
      
      A removal waits for the reset_work to complete. If a surprise removal
      occurs around the same time as an error triggered controller reset, and
      reset work happened to dispatch a command to the removed controller, the
      command won't be recovered since the timeout work doesn't do anything
      during error recovery. We wouldn't want to wait for timeout handling
      anyway, so this patch fixes this by disabling the controller and killing
      admin queues prior to syncing with the reset_work.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  7. 06 Sep, 2019: 1 commit
  8. 26 Jul, 2019: 2 commits
  9. 15 Jun, 2019: 2 commits
  10. 14 Mar, 2019: 2 commits
  11. 20 Feb, 2019: 2 commits
  12. 28 Aug, 2018: 1 commit
    • nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Michal Wnukowski authored
      In many architectures loads may be reordered with older stores to
      different locations.  In the nvme driver the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a potential race condition between the driver and
      the VM host processing requests (if the given virtual NVMe controller
      supports the shadow doorbell). If that occurs, the NVMe controller
      may decide to wait for an MMIO doorbell from the guest operating
      system, and the guest driver may decide not to issue an MMIO doorbell
      on any subsequent commands.
      
      This issue is a purely timing-dependent one, so there is no easy way
      to reproduce it. Currently the easiest known approach is to run
      "Oracle IO Numbers" (orion), which is shipped with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      where nvme_test is a .lun file that contains a list of NVMe block
      devices to run the test against. Limiting the number of vCPUs
      assigned to a given VM instance seems to increase the chances of this
      bug occurring. On a test environment with a VM assigned 4 NVMe drives
      and 1 vCPU, the virtual NVMe controller hang could be observed within
      10-20 minutes. That corresponds to about 400-500k IO operations
      processed (or about 100GB of IO reads/writes).
      
      The Orion tool was used for validation and set to run in a loop for
      36 hours (equivalent to pushing 550M IO operations). No issues were
      observed, which suggests that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: Michal Wnukowski <wnukowski@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
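The fixed sequence — publish the shadow doorbell value, issue a full barrier, then read EventIdx — can be sketched as below. This is modeled on the shape of nvme_dbbuf_update_and_check_event() but simplified; names, types, and the compiler fence used here are illustrative stand-ins for the kernel's mb().

```c
#include <assert.h>  /* for the usage example below */
#include <stdint.h>

/* True if 'event' lies in the half-open range (old_idx, new_idx],
 * using wrap-safe 16-bit arithmetic: the host asked to be notified. */
static inline int need_event(uint16_t event, uint16_t new_idx,
                             uint16_t old_idx)
{
    return (uint16_t)(new_idx - event - 1) < (uint16_t)(new_idx - old_idx);
}

static int update_and_check(volatile uint16_t *dbbuf_db,
                            volatile uint16_t *dbbuf_ei,
                            uint16_t value, uint16_t old_idx)
{
    *dbbuf_db = value;                       /* write shadow doorbell */
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* the added barrier: order
                                                the store above before
                                                the EventIdx load below */
    return need_event(*dbbuf_ei, value, old_idx); /* read EventIdx */
}
```

Without the fence between the store and the load, each side can observe a stale view of the other's index, producing exactly the mutual-wait hang described above.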
  13. 30 Jul, 2018: 1 commit
  14. 23 Jul, 2018: 1 commit
  15. 12 Jul, 2018: 1 commit
    • nvme-pci: fix memory leak on probe failure · b6e44b4c
      Keith Busch authored
      The nvme driver specific structures need to be initialized prior to
      enabling the generic controller, so we can unwind on failure without
      using the reference counting callbacks and keep 'probe' and 'remove'
      symmetric.
      
      The newly added iod_mempool is the only resource that was being
      allocated out of order, and a failure there would leak the generic
      controller memory. This patch just moves that allocation above the
      controller initialization.
      
      Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations")
      Reported-by: Weiping Zhang <zwp10758@gmail.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  16. 22 Jun, 2018: 1 commit
    • nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Jens Axboe authored
      nvme requires an sg table allocation for each request. If the request
      is large, then the allocation can become quite large. For instance,
      with our default software settings of 1280KB IO size, we'll need
      10248 bytes of sg table. That turns into a 2nd order allocation,
      which we can't always guarantee. If we fail the allocation, blk-mq
      will retry it later. But there's no guarantee that we'll EVER be
      able to allocate that much contiguous memory.
      
      Limit the IO size such that we never need more than a single page
      of memory. That's a lot faster and more reliable. Then back that
      allocation with a mempool, so that we know we'll always be able
      to succeed the allocation at some point.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
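The sizing argument can be sketched with illustrative numbers (assuming a 32-byte per-segment descriptor and 4 KiB pages; not necessarily the driver's exact constants): capping the segment count so the sg table always fits in one page keeps the allocation at order 0, which the mempool can then reliably back.

```c
#include <assert.h> /* for the usage example below */

#define SIM_PAGE_SIZE  4096u
#define SIM_SG_DESC_SZ 32u  /* assumed per-segment descriptor size */

/* Largest segment count whose sg table still fits in a single page. */
static unsigned max_segments_for_one_page(void)
{
    return SIM_PAGE_SIZE / SIM_SG_DESC_SZ; /* order-0 allocation */
}

/* With page-sized segments, the largest IO that still needs only a
 * one-page sg table. */
static unsigned max_io_bytes(void)
{
    return max_segments_for_one_page() * SIM_PAGE_SIZE;
}
```

Contrast this with the 1280KB case in the changelog, whose ~10KB sg table forces a 2nd-order allocation that can fail indefinitely.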
  17. 21 Jun, 2018: 1 commit
  18. 09 Jun, 2018: 6 commits
  19. 30 May, 2018: 2 commits
  20. 29 May, 2018: 1 commit
  21. 25 May, 2018: 2 commits
  22. 21 May, 2018: 1 commit
    • nvme-pci: fix race between poll and IRQ completions · 68fa9dbe
      Jens Axboe authored
      If polling completions are racing with the IRQ triggered by a
      completion, the IRQ handler will find no work and return IRQ_NONE.
      This can trigger complaints about spurious interrupts:
      
      [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
      [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
      [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
      [  560.175991] Call Trace:
      [  560.175994]  <IRQ>
      [  560.176005]  dump_stack+0x5c/0x7b
      [  560.176010]  __report_bad_irq+0x30/0xc0
      [  560.176013]  note_interrupt+0x235/0x280
      [  560.176020]  handle_irq_event_percpu+0x51/0x70
      [  560.176023]  handle_irq_event+0x27/0x50
      [  560.176026]  handle_edge_irq+0x6d/0x180
      [  560.176031]  handle_irq+0xa5/0x110
      [  560.176036]  do_IRQ+0x41/0xc0
      [  560.176042]  common_interrupt+0xf/0xf
      [  560.176043]  </IRQ>
      [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
      [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
      [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
      [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
      [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
      [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
      [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
      [  560.176071]  do_idle+0x1f4/0x260
      [  560.176075]  cpu_startup_entry+0x6f/0x80
      [  560.176080]  start_secondary+0x184/0x1d0
      [  560.176085]  secondary_startup_64+0xa5/0xb0
      [  560.176088] handlers:
      [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
      [  560.183019] Disabling IRQ #630
      
      A previous commit removed ->cqe_seen that was handling this case,
      but we need to handle this a bit differently due to completions
      now running outside the queue lock. Return IRQ_HANDLED from the
      IRQ handler, if the completion ring head was moved since we last
      saw it.
      
      Fixes: 5cb525c8 ("nvme-pci: handle completions outside of the queue lock")
      Reported-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Tested-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
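The head-moved check can be sketched as below; types and names are invented stand-ins for the driver's, and the completion processing itself is elided. The handler reports IRQ_HANDLED whenever the CQ head has advanced since the last interrupt, even if a polling context did the actual work.

```c
#include <assert.h> /* for the usage example below */

enum sim_irqreturn { SIM_IRQ_NONE, SIM_IRQ_HANDLED };

struct sim_cq {
    unsigned short head;          /* current completion queue head */
    unsigned short last_irq_head; /* head as of the previous interrupt */
};

static enum sim_irqreturn sim_nvme_irq(struct sim_cq *cq)
{
    enum sim_irqreturn ret = SIM_IRQ_NONE;

    /* ... process any pending completions here, advancing cq->head ... */

    if (cq->head != cq->last_irq_head)
        ret = SIM_IRQ_HANDLED; /* progress was made (by IRQ or by poll) */
    cq->last_irq_head = cq->head;
    return ret;
}
```

Crediting poll-side progress to the interrupt prevents the spurious-IRQ detector from counting these races toward "nobody cared" and disabling the line.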
  23. 19 May, 2018: 6 commits