- 10 January 2019, 5 commits
-
-
Submitted by James Dingwall
If a device provides an NQN it is expected to be globally unique. Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did not satisfy this requirement. In these circumstances, if a system has more than one affected device then only one device is enabled. If this quirk is enabled, the device-supplied subnqn is ignored and we fall back to generating one as if the field were empty. In this case we also suppress the version check so we don't print a warning when the quirk is enabled. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: James Dingwall <james@dingwall.me.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
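A minimal sketch of the quirk's effect when the identify data is parsed, assuming the upstream quirk name NVME_QUIRK_IGNORE_DEV_SUBNQN; the helper shape here is illustrative rather than the driver's exact code:

```c
static void nvme_init_subnqn_sketch(struct nvme_ctrl *ctrl,
				    struct nvme_id_ctrl *id, char *subnqn)
{
	size_t nqnlen;

	if (!(ctrl->quirks & NVME_QUIRK_IGNORE_DEV_SUBNQN)) {
		nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
		if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
			/* trust the device-supplied subnqn */
			strlcpy(subnqn, id->subnqn, NVMF_NQN_SIZE);
			return;
		}
		/* the version warning is suppressed when the quirk is set */
		if (ctrl->vs >= NVME_VS(1, 2, 1))
			dev_warn(ctrl->device, "missing or invalid SUBNQN field.\n");
	}

	/* fall back to a generated NQN, exactly as if the field were empty */
	snprintf(subnqn, NVMF_NQN_SIZE, "nqn.2014.08.org.nvmexpress:%04x%04x",
		 le16_to_cpu(id->vid), le16_to_cpu(id->ssvid));
}
```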
-
Submitted by Hongbo Yao
There is an out-of-bounds array access in nvme_cqe_pending(). When irq_thread is enabled for the nvme interrupt, there is a race between updating and reading nvmeq->cq_head. nvmeq->cq_head is updated in nvme_update_cq_head(); if nvmeq->cq_head equals nvmeq->q_depth and nvme_cqe_pending() uses that value as an array index before it is set back to zero, the access is out of bounds. Signed-off-by: Hongbo Yao <yaohongbo@huawei.com> [hch: slight coding style update] Signed-off-by: Christoph Hellwig <hch@lst.de>
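A hedged sketch of the kind of fix described: the completion-queue head is never stored with the out-of-range value q_depth, so a racing nvme_cqe_pending() can only ever observe a valid index. Names follow the driver; the exact code is illustrative:

```c
static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
{
	/* cq_head is always a valid index with the update below */
	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
		nvmeq->cq_phase;
}

static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
{
	/* wrap without ever publishing cq_head == q_depth */
	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
		nvmeq->cq_head = 0;
		nvmeq->cq_phase = !nvmeq->cq_phase;
	} else {
		nvmeq->cq_head++;
	}
}
```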
-
Submitted by Keith Busch
If the driver is unable to create a subset of IO queues for any reason, the read/write and polled queue sets will not match the actual allocated hardware contexts. This leaves gaps in the CPU affinity mappings and causes the following kernel panic after blk_mq_map_queue_type() returns a NULL hctx.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue: nvme-wq nvme_scan_work [nvme_core]
RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440
RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286
RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820
RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f
R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008
R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 blk_mq_init_queue+0x35/0x60
 nvme_validate_ns+0xc6/0x7c0 [nvme_core]
 ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
 nvme_scan_work+0xc8/0x340 [nvme_core]
 ? __wake_up_common+0x6d/0x120
 ? try_to_wake_up+0x55/0x410
 process_one_work+0x1e9/0x3d0
 worker_thread+0x2d/0x3d0
 ? process_one_work+0x3d0/0x3d0
 kthread+0x111/0x130
 ? kthread_park+0x90/0x90
 ret_from_fork+0x1f/0x30
Modules linked in: nvme nvme_core serio_raw
CR2: 0000000000000198

Fix by re-running the interrupt vector setup from scratch with a reduced count that may be successful, until the created queues match the irq affinity plus polling queue sets. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Liviu Dudau
When using HMB, the PCIe host driver allocates host_mem_desc_bufs using dma_alloc_attrs() but frees them using dma_free_coherent(). Use the correct dma_free_attrs() function to free the buffers. Signed-off-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Christoph Hellwig <hch@lst.de>
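A minimal sketch of the alloc/free pairing, assuming the DMA attrs the driver uses for HMB buffers; the point is only that buffers allocated with dma_alloc_attrs() must be released with dma_free_attrs() carrying the same attrs, not with dma_free_coherent():

```c
#include <linux/dma-mapping.h>

static void *hmb_buf_alloc(struct device *dev, size_t size, dma_addr_t *dma_addr)
{
	return dma_alloc_attrs(dev, size, dma_addr, GFP_KERNEL,
			       DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
}

static void hmb_buf_free(struct device *dev, size_t size, void *cpu_addr,
			 dma_addr_t dma_addr)
{
	/* matching free: same attrs as the allocation above */
	dma_free_attrs(dev, size, cpu_addr, dma_addr,
		       DMA_ATTR_NO_KERNEL_MAPPING | DMA_ATTR_NO_WARN);
}
```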
-
Submitted by Jianchao Wang
We only set nr_maps to 3 if poll queues are supported. Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 19 December 2018, 3 commits
-
-
Submitted by yupeng
Export the disk name, queue id, sq_head, sq_tail to a trace event in completion handling. Usage example:

cd /sys/kernel/debug/tracing/events/nvme/nvme_sq
echo 'disk=="nvme1n1"' > filter
echo 1 > enable
cat /sys/kernel/debug/tracing/trace_pipe

Signed-off-by: yupeng <yupeng0921@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> [hch: slight formatting tweaks, use standard nvme tracepoint conventions] Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig
Duplicating the nvme_process_cq() call in both branches keeps the sparse lock context checking happy, so do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Christoph Hellwig
The block layer now enables polling support on a queue if nr_maps includes the poll map, so we should only set that if we actually support poll queues. Fixes: 6544d229 ("block: enable polling by default if a poll map is initalized") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
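A minimal sketch of the resulting tag-set setup, assuming the driver's io_queues[] accounting and the block layer's HCTX_TYPE_POLL index:

```c
	/* default + read maps are always present */
	dev->tagset.nr_maps = 2;
	/* only advertise a poll map - and thus polling - if poll queues exist */
	if (dev->io_queues[HCTX_TYPE_POLL])
		dev->tagset.nr_maps++;
```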
-
- 17 December 2018, 1 commit
-
-
Submitted by Christoph Hellwig
Now that the block layer checks whether a queue map has any queues inside it, there is no more reason to duplicate the maps for the non-default types. Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 December 2018, 1 commit
-
-
Submitted by Jens Axboe
Guenter reported a boot hang issue on HPPA after we defaulted to 0 poll queues. We have two issues in the queue count calculations: 1) We don't separate the poll queues from the read/write queues. This is important, since the former don't need interrupts. 2) The adjust logic is broken. Adjust the poll queue count before doing nvme_calc_io_queues(). The poll queue count is only limited by the IO queue count we were able to get from the controller, not by failures in the IRQ allocation loop. This leaves nvme_calc_io_queues() just adjusting the read/write queue map. Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 05 December 2018, 9 commits
-
-
Submitted by Christoph Hellwig
This avoids having to have different mq_ops for setups with and without poll queues. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Now that we can't poll regular, interrupt-driven I/O queues, there is almost nothing that can race with an interrupt. The only other possible contexts polling a CQ are the error handler and queue shutdown, and both are so far off in the slow path that we can simply use the big hammer of disabling interrupts. With that we can stop taking the cq_lock for normal queues. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
This is the last place outside of nvme_irq that handles CQEs from interrupt context, and thus it is in the way of removing the cq_lock for normal queues and of avoiding lockdep warnings on the poll queues, for which we already take it without disabling IRQs. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Pass the opcode for the delete SQ/CQ command as an argument instead of using the somewhat confusing pass loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
We have three places that can poll for I/O completions on a normal interrupt-enabled queue. All of them are in slow-path code, so consolidate them into a single helper that uses spin_lock_irqsave and removes the fast-path cqe_pending check. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
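A hedged sketch of the consolidated slow-path helper the commit describes, assuming the driver's nvme_process_cq()/nvme_complete_cqes() pair; the lock name and the exact signature are illustrative:

```c
static int nvme_poll_irqdisable_sketch(struct nvme_queue *nvmeq, unsigned int tag)
{
	unsigned long flags;
	u16 start, end;
	int found;

	/* slow path only: no cqe_pending fast check, just take the lock with IRQs off */
	spin_lock_irqsave(&nvmeq->cq_lock, flags);
	found = nvme_process_cq(nvmeq, &start, &end, tag);
	spin_unlock_irqrestore(&nvmeq->cq_lock, flags);

	nvme_complete_cqes(nvmeq, start, end);
	return found;
}
```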
-
Submitted by Christoph Hellwig
This will allow us to simplify both the regular NVMe interrupt handler and the upcoming aio poll code. In addition, separate queues are generally a good idea for performance reasons. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Use a bit flag to mark whether the SQ was allocated from the CMB, and clean up the surrounding code a bit. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
This gets rid of all the messing with cq_vector and the ->polled field by using an atomic bitop to mark the queue enabled or not. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
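A sketch of the pattern, assuming a per-queue flags word and the upstream bit name NVMEQ_ENABLED; the surrounding fragments are illustrative:

```c
enum {
	NVMEQ_ENABLED = 0,	/* queue may be used for I/O */
};

/* when the queue is brought up */
set_bit(NVMEQ_ENABLED, &nvmeq->flags);

/* when the queue is torn down, instead of poking cq_vector / ->polled */
clear_bit(NVMEQ_ENABLED, &nvmeq->flags);

/* in the submission path */
if (unlikely(!test_bit(NVMEQ_ENABLED, &nvmeq->flags)))
	return BLK_STS_IOERR;
```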
-
Submitted by Christoph Hellwig
Having another indirect call in the fast path doesn't really help in our post-Spectre world. Also, having too many queue types is just going to create confusion, so I'd rather manage them centrally. Note that the queue type naming and ordering changes a bit - the first index now is the default queue for everything not explicitly marked, and the optional ones are the read and poll queues. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
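The centrally managed queue types referred to above are the block layer's map indices; roughly (see include/linux/blk-mq.h, shown here as a sketch):

```c
enum hctx_type {
	HCTX_TYPE_DEFAULT,	/* all I/O not otherwise accounted for */
	HCTX_TYPE_READ,		/* just for READ I/O (optional) */
	HCTX_TYPE_POLL,		/* polled I/O of any kind (optional) */

	HCTX_MAX_TYPES,
};
```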
-
- 30 November 2018, 1 commit
-
-
Submitted by Jens Axboe
Split the command submission and the SQ doorbell ring, and add the doorbell ring as our ->commit_rqs() hook. This allows a list of requests to be issued, with nvme only writing the SQ update when it's necessary. This is more efficient if we have lists of requests to issue, particularly on virtualized hardware, where writing the SQ doorbell is more expensive than on real hardware. For those cases, performance increases of 2-3x have been observed. The use case for this is plugged IO, where blk-mq flushes a batch of requests at a time. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
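A hedged sketch of the split: submission copies the command into the SQ ring but only rings the doorbell when the block layer says the batch is done, while the new ->commit_rqs() hook rings it for batches issued without a "last" marker. Names mirror the driver, but treat the exact shape as illustrative:

```c
static void nvme_write_sq_db(struct nvme_queue *nvmeq)
{
	/* a single MMIO write covers every command queued since the last ring */
	writel(nvmeq->sq_tail, nvmeq->q_db);
}

static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
			    bool write_sq)
{
	spin_lock(&nvmeq->sq_lock);
	memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));
	if (++nvmeq->sq_tail == nvmeq->q_depth)
		nvmeq->sq_tail = 0;
	if (write_sq)		/* last request of the batch */
		nvme_write_sq_db(nvmeq);
	spin_unlock(&nvmeq->sq_lock);
}

/* ->commit_rqs() hook: ring the doorbell for a plugged batch */
static void nvme_commit_rqs(struct blk_mq_hw_ctx *hctx)
{
	struct nvme_queue *nvmeq = hctx->driver_data;

	spin_lock(&nvmeq->sq_lock);
	nvme_write_sq_db(nvmeq);
	spin_unlock(&nvmeq->sq_lock);
}
```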
-
- 26 November 2018, 2 commits
-
-
Submitted by Jens Axboe
We always pass in -1 now and none of the callers use the tag value, so remove the parameter. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
If we want to support async IO polling, then we have to allow finding completions that aren't just for the one we are looking for. Always pass in -1 to the mq_ops->poll() helper, and have it return how many events were found in this poll loop. Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 19 November 2018, 1 commit
-
-
Submitted by Jens Axboe
We need a better way of configuring this, and given that polling is (still) a bit niche, let's default to using 0 poll queues. That way we'll have the same read/write/poll behavior as 4.20, and users that want to test/use polling are required to manually configure the number of poll queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 16 November 2018, 2 commits
-
-
Submitted by Jens Axboe
If we have separate poll queues, we know that they aren't using interrupts. Hence we don't need to disable interrupts around finding completions. Provide a separate set of blk_mq_ops for such devices. Signed-off-by: Jens Axboe <axboe@kernel.dk>
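A sketch of the idea: a poll queue never takes interrupts, so its poll routine can process the CQ under a plain spinlock, and a second blk_mq_ops points at that routine. The names mirror the driver of that era; treat the details as illustrative:

```c
static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx, unsigned int tag)
{
	struct nvme_queue *nvmeq = hctx->driver_data;
	u16 start, end;
	int found;

	if (!nvme_cqe_pending(nvmeq))
		return 0;

	/* no interrupts on this queue, so no irqsave variant is needed */
	spin_lock(&nvmeq->cq_lock);
	found = nvme_process_cq(nvmeq, &start, &end, tag);
	spin_unlock(&nvmeq->cq_lock);

	nvme_complete_cqes(nvmeq, start, end);
	return found;
}

static const struct blk_mq_ops nvme_mq_poll_noirq_ops = {
	.queue_rq	= nvme_queue_rq,
	.complete	= nvme_pci_complete_rq,
	.map_queues	= nvme_pci_map_queues,
	.poll		= nvme_poll_noirq,
	/* ... remaining callbacks shared with the interrupt-driven ops ... */
};
```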
-
Submitted by Jens Axboe
At least on SPARC, if MSI/MSI-X isn't supported, we get EINVAL if we ask for more than one vector. This isn't covered by our ENOSPC check. If we get EINVAL, decrease our ask to just one vector, instead of bailing out with an error. Fixes: 3b6592f7 ("nvme: utilize two queue maps, one for reads and one for writes") Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 15 November 2018, 1 commit
-
-
Submitted by Jens Axboe
NVMe always asks for io_queues + 1 worth of IRQ vectors, which means that even when we scale all the way down, we still ask for 2 vectors and get -ENOSPC in return if the system can't support more than 1. Getting just 1 vector is fine; it just means that we'll have 1 IO queue and 1 admin queue, with a shared vector between them. Check for this case and don't add our + 1 if it happens. Fixes: 3b6592f7 ("nvme: utilize two queue maps, one for reads and one for writes") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
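A minimal sketch of the adjusted vector accounting under the stated assumption: when we are down to a single I/O queue, the admin queue shares that one vector instead of forcing a request for two. The helper name is hypothetical:

```c
static unsigned int nvme_irq_vectors_wanted_sketch(unsigned int nr_io_queues)
{
	/*
	 * Normally ask for one extra vector so the admin queue gets its own
	 * interrupt, but with a single I/O queue just share that vector
	 * rather than failing with -ENOSPC on 1-vector systems.
	 */
	if (nr_io_queues <= 1)
		return 1;
	return nr_io_queues + 1;
}
```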
-
- 08 November 2018, 3 commits
-
-
Submitted by Jens Axboe
Adds support for defining a variable number of poll queues, currently configurable with the 'poll_queues' module parameter. Defaults to a single poll queue. And now we finally have poll support without triggering interrupts! Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
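A sketch of how such a parameter is typically declared; the 'poll_queues' name and the default of 1 come from the commit text (a later commit in this log changes the default to 0):

```c
static unsigned int poll_queues = 1;
module_param(poll_queues, uint, 0644);
MODULE_PARM_DESC(poll_queues, "Number of queues to use for polled IO.");
```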
-
Submitted by Jens Axboe
NVMe does round-robin between queues by default, which means that sharing a queue map for both reads and writes can be problematic in terms of read servicing. It's much easier to flood the queue with writes and reduce the read servicing. Implement two queue maps, one for reads and one for writes. The write queue count is configurable through the 'write_queues' parameter. By default, we retain the previous behavior of having a single queue set, shared between reads and writes. Setting 'write_queues' to a non-zero value will create two queue sets, one for reads and one for writes, the latter using the configurable number of queues (hardware queue counts permitting). Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
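A hedged sketch of the split: a 'write_queues' parameter plus a map_queues callback that carves the hardware queues into a write set (map 0, also the default) and a read set (map 1). The io_queues[] bookkeeping and helper name are illustrative:

```c
static int write_queues;
module_param(write_queues, int, 0644);
MODULE_PARM_DESC(write_queues,
	"Number of queues to use for writes. If not set, reads and writes "
	"will share a queue set.");

static int nvme_pci_map_queues_sketch(struct blk_mq_tag_set *set)
{
	struct nvme_dev *dev = set->driver_data;

	/* map 0: writes (and everything else when no read set is configured) */
	set->map[0].nr_queues = dev->io_queues[0];
	set->map[0].queue_offset = 0;
	blk_mq_map_queues(&set->map[0]);

	/* map 1: dedicated read queues, placed after the write queues */
	set->map[1].nr_queues = dev->io_queues[1];
	set->map[1].queue_offset = dev->io_queues[0];
	blk_mq_map_queues(&set->map[1]);

	return 0;
}
```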
-
Submitted by Jens Axboe
This is in preparation for allowing multiple sets of maps per queue, if so desired. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 02 November 2018, 1 commit
-
-
Submitted by Keith Busch
The nvme pci driver had been adding its CMB resource to the P2P DMA subsystem every time the controller was reset. This results in the following warning:

------------[ cut here ]------------
nvme 0000:00:03.0: Conflicting mapping in same section
WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
...
Call Trace:
 pci_p2pdma_add_resource+0x153/0x370
 nvme_reset_work+0x28c/0x17b1 [nvme]
 ? add_timer+0x107/0x1e0
 ? dequeue_entity+0x81/0x660
 ? dequeue_entity+0x3b0/0x660
 ? pick_next_task_fair+0xaf/0x610
 ? __switch_to+0xbc/0x410
 process_one_work+0x1cf/0x350
 worker_thread+0x215/0x3d0
 ? process_one_work+0x350/0x350
 kthread+0x107/0x120
 ? kthread_park+0x80/0x80
 ret_from_fork+0x1f/0x30
---[ end trace f7ea76ac6ee72727 ]---
nvme nvme0: failed to register the CMB

This patch fixes this by registering the CMB with P2P only once. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 18 October 2018, 3 commits
-
-
Submitted by Chaitanya Kulkarni
This is a cleanup patch that doesn't change any functionality. It removes the duplicate call to blk_integrity_rq() in nvme_map_data(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Logan Gunthorpe
For P2P requests, we must use the pci_p2pmem_map_sg() function instead of the dma_map_sg functions. With that, we can then indicate PCI_P2P support in the request queue. For this, we create an NVME_F_PCI_P2P flag which tells the core to set QUEUE_FLAG_PCI_P2P in the request queue. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
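A minimal sketch of that decision point in the request mapping path; is_pci_p2pdma_page() and pci_p2pdma_map_sg() are the kernel's P2PDMA helpers (the mapping helper appears as pci_p2pmem_map_sg() in the changelog text above), while the surrounding variables follow the driver's nvme_map_data():

```c
	/* once the scatterlist for the request has been built */
	if (is_pci_p2pdma_page(sg_page(iod->sg)))
		nr_mapped = pci_p2pdma_map_sg(dev->dev, iod->sg, iod->nents,
					      dma_dir);
	else
		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
					     dma_dir, DMA_ATTR_NO_WARN);
	if (!nr_mapped)
		goto out;
```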
-
Submitted by Logan Gunthorpe
Register the CMB buffer as p2pmem and use the appropriate allocation functions to create and destroy the IO submission queues. If the CMB supports WDS and RDS, publish it for use as P2P memory by other devices. Kernels without CONFIG_PCI_P2PDMA will also no longer support the NVMe CMB. However, seeing that the main use case for the CMB is P2P operations, this seems like a reasonable dependency. We drop the __iomem safety on the buffer seeing that, by convention, it's safe to directly access memory mapped by memremap()/devm_memremap_pages(). Architectures where this is not safe will not be supported by memremap() and therefore will not support PCI P2P and have no support for the CMB. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 17 October 2018, 2 commits
-
-
Submitted by Keith Busch
A removal waits for the reset_work to complete. If a surprise removal occurs around the same time as an error-triggered controller reset, and the reset work happens to dispatch a command to the removed controller, the command won't be recovered since the timeout work doesn't do anything during error recovery. We wouldn't want to wait for timeout handling anyway, so this patch fixes this by disabling the controller and killing the admin queues prior to syncing with the reset_work. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Bart Van Assche
This patch keeps the kernel-doc tool from complaining about the nvme_suspend_queue() function header when building with W=1. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 03 October 2018, 1 commit
-
-
Submitted by Oza Pawandeep
After bfcb79fc ("PCI/ERR: Run error recovery callbacks for all affected devices"), AER errors are always cleared by the PCI core and drivers don't need to do it themselves. Remove the calls to pci_cleanup_aer_uncorrect_error_status() from device driver error recovery functions. Signed-off-by: Oza Pawandeep <poza@codeaurora.org> [bhelgaas: changelog, remove PCI core changes, remove unused variables] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 28 August 2018, 1 commit
-
-
Submitted by Michal Wnukowski
On many architectures loads may be reordered with older stores to different locations. In the nvme driver the following two operations could be reordered:
- Write shadow doorbell (dbbuf_db) into memory.
- Read EventIdx (dbbuf_ei) from memory.
This can result in a potential race condition between the driver and the VM host processing requests (if the given virtual NVMe controller has support for the shadow doorbell). If that occurs, the NVMe controller may decide to wait for an MMIO doorbell from the guest operating system, and the guest driver may decide not to issue an MMIO doorbell on any of the subsequent commands. This issue is purely timing-dependent, so there is no easy way to reproduce it. Currently the easiest known approach is to run "Oracle IO Numbers" (orion), which is shipped with Oracle DB:

orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
  concat -write 40 -duration 120 -matrix row -testname nvme_test

Where nvme_test is a .lun file that contains a list of NVMe block devices to run the test against. Limiting the number of vCPUs assigned to a given VM instance seems to increase the chances of this bug occurring. On a test environment with a VM that had 4 NVMe drives and 1 vCPU assigned, the virtual NVMe controller hang could be observed within 10-20 minutes. That corresponds to about 400-500k IO operations processed (or about 100GB of IO reads/writes). The orion tool was used as validation and set to run in a loop for 36 hours (equivalent to pushing 550M IO operations). No issues were observed. That suggests the patch fixes the issue. Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices") Signed-off-by: Michal Wnukowski <wnukowski@google.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> [hch: updated changelog and comment a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
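A hedged sketch of the fix described: a full memory barrier between the shadow doorbell write and the EventIdx read, so the load cannot be reordered before the older store. Field and helper names follow the driver's dbbuf support, but treat the exact code as illustrative:

```c
static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
					      volatile u32 *dbbuf_ei)
{
	if (dbbuf_db) {
		u16 old_value;

		/* publish the new doorbell value to the shadow buffer */
		old_value = *dbbuf_db;
		*dbbuf_db = value;

		/*
		 * Ensure the write above is visible before reading EventIdx;
		 * otherwise the load may be reordered with the older store and
		 * driver and host can end up waiting on each other forever.
		 */
		mb();

		if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
			return false;
	}

	return true;	/* ring the MMIO doorbell */
}
```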
-
- 30 July 2018, 1 commit
-
-
Submitted by Max Gurtovoy
Also move the logic of the remapping to the nvme core driver instead of implementing it in the nvme pci driver. This way all the other nvme transport drivers will benefit from it (in case they implement metadata support). Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 July 2018, 1 commit
-
-
Submitted by Sagi Grimberg
We will need to reference the controller at setup and completion time for tracing and future traffic-based keep-alive support. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 12 July 2018, 1 commit
-
-
Submitted by Keith Busch
The nvme driver specific structures need to be initialized prior to enabling the generic controller so we can unwind on failure without using the reference counting callbacks, so that 'probe' and 'remove' can be symmetric. The newly added iod_mempool is the only resource that was being allocated out of order, and a failure there would leak the generic controller memory. This patch just moves that allocation above the controller initialization. Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations") Reported-by: Weiping Zhang <zwp10758@gmail.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
-