1. 17 1月, 2020 1 次提交
  2. 15 1月, 2020 1 次提交
  3. 01 12月, 2019 2 次提交
    • K
      nvme-pci: fix conflicting p2p resource adds · fa7f1bce
      Keith Busch 提交于
      [ Upstream commit 9fe5c59ff6a1e5e26a39b75489a1420e7eaaf0b1 ]
      
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem everytime on on a controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      This patch fixes this by registering the CMB with P2P only once.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fa7f1bce
    • K
      nvme-pci: fix hot removal during error handling · 305c262f
      Keith Busch 提交于
      [ Upstream commit cb4bfda62afa25b4eee3d635d33fccdd9485dd7c ]
      
      A removal waits for the reset_work to complete. If a surprise removal
      occurs around the same time as an error triggered controller reset, and
      reset work happened to dispatch a command to the removed controller, the
      command won't be recovered since the timeout work doesn't do anything
      during error recovery. We wouldn't want to wait for timeout handling
      anyway, so this patch fixes this by disabling the controller and killing
      admin queues prior to syncing with the reset_work.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      305c262f
  4. 06 9月, 2019 1 次提交
  5. 26 7月, 2019 2 次提交
  6. 15 6月, 2019 2 次提交
  7. 14 3月, 2019 2 次提交
  8. 20 2月, 2019 2 次提交
  9. 28 8月, 2018 1 次提交
    • M
      nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Michal Wnukowski 提交于
      In many architectures loads may be reordered with older stores to
      different locations.  In the nvme driver the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a potential race condition between driver and VM host
      processing requests (if given virtual NVMe controller has a support for
      shadow doorbell).  If that occurs, then the NVMe controller may decide to
      wait for MMIO doorbell from guest operating system, and guest driver may
      decide not to issue MMIO doorbell on any of subsequent commands.
      
      This issue is purely timing-dependent one, so there is no easy way to
      reproduce it. Currently the easiest known approach is to run "Oracle IO
      Numbers" (orion) that is shipped with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      Where nvme_test is a .lun file that contains a list of NVMe block
      devices to run test against. Limiting number of vCPUs assigned to given
      VM instance seems to increase chances for this bug to occur. On test
      environment with VM that got 4 NVMe drives and 1 vCPU assigned the
      virtual NVMe controller hang could be observed within 10-20 minutes.
      That correspond to about 400-500k IO operations processed (or about
      100GB of IO read/writes).
      
      Orion tool was used as a validation and set to run in a loop for 36
      hours (equivalent of pushing 550M IO operations). No issues were
      observed. That suggest that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: NMichal Wnukowski <wnukowski@google.com>
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      f1ed3df2
  10. 30 7月, 2018 1 次提交
  11. 23 7月, 2018 1 次提交
  12. 12 7月, 2018 1 次提交
    • K
      nvme-pci: fix memory leak on probe failure · b6e44b4c
      Keith Busch 提交于
      The nvme driver specific structures need to be initialized prior to
      enabling the generic controller so we can unwind on failure with out
      using the reference counting callbacks so that 'probe' and 'remove'
      can be symmetric.
      
      The newly added iod_mempool is the only resource that was being
      allocated out of order, and a failure there would leak the generic
      controller memory. This patch just moves that allocation above the
      controller initialization.
      
      Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations")
      Reported-by: NWeiping Zhang <zwp10758@gmail.com>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      b6e44b4c
  13. 22 6月, 2018 1 次提交
    • J
      nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Jens Axboe 提交于
      nvme requires an sg table allocation for each request. If the request
      is large, then the allocation can become quite large. For instance,
      with our default software settings of 1280KB IO size, we'll need
      10248 bytes of sg table. That turns into a 2nd order allocation,
      which we can't always guarantee. If we fail the allocation, blk-mq
      will retry it later. But there's no guarantee that we'll EVER be
      able to allocate that much contigious memory.
      
      Limit the IO size such that we never need more than a single page
      of memory. That's a lot faster and more reliable. Then back that
      allocation with a mempool, so that we know we'll always be able
      to succeed the allocation at some point.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Acked-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      943e942e
  14. 21 6月, 2018 1 次提交
  15. 09 6月, 2018 6 次提交
  16. 30 5月, 2018 2 次提交
  17. 29 5月, 2018 1 次提交
  18. 25 5月, 2018 2 次提交
  19. 21 5月, 2018 1 次提交
    • J
      nvme-pci: fix race between poll and IRQ completions · 68fa9dbe
      Jens Axboe 提交于
      If polling completions are racing with the IRQ triggered by a
      completion, the IRQ handler will find no work and return IRQ_NONE.
      This can trigger complaints about spurious interrupts:
      
      [  560.169153] irq 630: nobody cared (try booting with the "irqpoll" option)
      [  560.175988] CPU: 40 PID: 0 Comm: swapper/40 Not tainted 4.17.0-rc2+ #65
      [  560.175990] Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.00.01.0010.010920180151 01/09/2018
      [  560.175991] Call Trace:
      [  560.175994]  <IRQ>
      [  560.176005]  dump_stack+0x5c/0x7b
      [  560.176010]  __report_bad_irq+0x30/0xc0
      [  560.176013]  note_interrupt+0x235/0x280
      [  560.176020]  handle_irq_event_percpu+0x51/0x70
      [  560.176023]  handle_irq_event+0x27/0x50
      [  560.176026]  handle_edge_irq+0x6d/0x180
      [  560.176031]  handle_irq+0xa5/0x110
      [  560.176036]  do_IRQ+0x41/0xc0
      [  560.176042]  common_interrupt+0xf/0xf
      [  560.176043]  </IRQ>
      [  560.176050] RIP: 0010:cpuidle_enter_state+0x9b/0x2b0
      [  560.176052] RSP: 0018:ffffa0ed4659fe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
      [  560.176055] RAX: ffff9527beb20a80 RBX: 000000826caee491 RCX: 000000000000001f
      [  560.176056] RDX: 000000826caee491 RSI: 00000000335206ee RDI: 0000000000000000
      [  560.176057] RBP: 0000000000000001 R08: 00000000ffffffff R09: 0000000000000008
      [  560.176059] R10: ffffa0ed4659fe78 R11: 0000000000000001 R12: ffff9527beb29358
      [  560.176060] R13: ffffffffa235d4b8 R14: 0000000000000000 R15: 000000826caed593
      [  560.176065]  ? cpuidle_enter_state+0x8b/0x2b0
      [  560.176071]  do_idle+0x1f4/0x260
      [  560.176075]  cpu_startup_entry+0x6f/0x80
      [  560.176080]  start_secondary+0x184/0x1d0
      [  560.176085]  secondary_startup_64+0xa5/0xb0
      [  560.176088] handlers:
      [  560.178387] [<00000000efb612be>] nvme_irq [nvme]
      [  560.183019] Disabling IRQ #630
      
      A previous commit removed ->cqe_seen that was handling this case,
      but we need to handle this a bit differently due to completions
      now running outside the queue lock. Return IRQ_HANDLED from the
      IRQ handler, if the completion ring head was moved since we last
      saw it.
      
      Fixes: 5cb525c8 ("nvme-pci: handle completions outside of the queue lock")
      Reported-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Tested-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      68fa9dbe
  20. 19 5月, 2018 6 次提交
  21. 12 5月, 2018 2 次提交
  22. 07 5月, 2018 1 次提交