1. 07 December 2018 (1 commit)
    • nvme: validate controller state before rescheduling keep alive · 86880d64
      Committed by James Smart
      Delete operations are seeing NULL pointer references in call_timer_fn.
      Tracking these back, the timer appears to be the keep alive timer.
      
      nvme_keep_alive_work(), which is tied to the timer cancelled by
      nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
      wait for its completion. So nvme_stop_keep_alive() only stops a timer
      when it's pending. When a keep alive is in flight, no timer is
      running, so nvme_stop_keep_alive() has no effect on the keep alive
      io. Thus, if the io completes successfully, the keep alive timer
      will be rescheduled. In the failure case, delete is called, the
      controller state is changed, nvme_stop_keep_alive() is called while
      the io is outstanding, and the delete path continues on. The keep
      alive happens to complete successfully before the delete path marks it
      as aborted as part of queue termination, so the timer is restarted.
      The delete path then tears down the controller, and later the timer
      code fires on the now-corrupt timer entry.
      
      Fix by validating the controller state before rescheduling the keep
      alive. Testing with the fix has confirmed the condition above was hit.
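      The guard can be modeled in a few lines. This is a minimal userspace
      sketch, not the driver source: the state names loosely mirror the
      kernel's enum nvme_ctrl_state, and the struct and helper are
      illustrative stand-ins.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Illustrative model of the fix: the keep-alive completion handler
       * re-arms its timer only while the controller is still live. */
      enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DEAD };

      struct ctrl {
          enum ctrl_state state;
          bool ka_timer_armed;
      };

      /* Completion path of a keep-alive I/O. */
      static void keep_alive_end_io(struct ctrl *c, bool io_succeeded)
      {
          /* The fix: never restart the timer on a controller that is being
           * deleted, or the timer fires after teardown on freed memory. */
          if (io_succeeded && c->state == CTRL_LIVE)
              c->ka_timer_armed = true;   /* models schedule_delayed_work() */
          else
              c->ka_timer_armed = false;
      }
      ```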
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 01 December 2018 (3 commits)
  3. 28 November 2018 (2 commits)
    • nvme-pci: fix surprise removal · 751a0cc0
      Committed by Igor Konopko
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue.  Moments later, on the same path nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on admin queue and tries to access hctx of it,
      which leads to following OOPS:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
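      The essence of the fix is a guard on the unquiesce path. This is an
      illustrative userspace model, not the driver code; the structs and the
      return value are stand-ins for the sketch.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Model: the kill-queues path must tolerate an admin queue that has
       * already been cleaned up by nvme_dev_remove_admin(). */
      struct queue { bool dying; };
      struct ctrl  { struct queue *admin_q; };

      /* Returns true if the unquiesce was actually performed. */
      static bool kill_queues_unquiesce_admin(struct ctrl *c)
      {
          /* Guard against the surprise-removal ordering: after
           * blk_cleanup_queue() the admin queue's hctx is gone, so
           * touching it here would reproduce the OOPS above. */
          if (c->admin_q && !c->admin_q->dying) {
              /* models blk_mq_unquiesce_queue(c->admin_q) */
              return true;
          }
          return false;
      }
      ```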
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() · dfa74422
      Committed by Ewan D. Milne
      __nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which
      NULLed out the nvme_req(rq)->ctrl field previously set by nvme_fc_init_request().
      This apparently was not referenced until commit faf4a44fff ("nvme: support traffic
      based keep-alive"), which now results in a crash in nvme_complete_rq():
      
      [ 8386.897130] RIP: 0010:panic+0x220/0x26c
      [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
      [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
      [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
      [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
      [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
      [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      [ 8386.970613]  oops_end+0xd1/0xe0
      [ 8386.974116]  no_context+0x1b2/0x3c0
      [ 8386.978006]  do_page_fault+0x32/0x140
      [ 8386.982090]  page_fault+0x1e/0x30
      [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
      [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
      [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
      [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
      [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
      [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
      [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
      [ 8387.058782]  ? swiotlb_unmap_page+0x40/0x40
      [ 8387.063448]  nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
      [ 8387.068986]  blk_done_softirq+0xa1/0xd0
      [ 8387.073264]  __do_softirq+0xd6/0x2a9
      [ 8387.077251]  run_ksoftirqd+0x26/0x40
      [ 8387.081238]  smpboot_thread_fn+0x10e/0x160
      [ 8387.085807]  kthread+0xf8/0x130
      [ 8387.089309]  ? sort_range+0x20/0x20
      [ 8387.093198]  ? kthread_stop+0x110/0x110
      [ 8387.097475]  ret_from_fork+0x35/0x40
      [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---
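      The ordering bug can be shown in miniature. This is an illustrative
      userspace model; the structure name and fields are simplified stand-ins
      for the real nvme_fcp_op_w_sgl layout.

      ```c
      #include <assert.h>
      #include <string.h>

      /* Model: __nvme_fc_init_request() memsets the operation structure
       * that embeds the request private data, so any field written before
       * that memset is wiped. */
      struct req_priv { void *ctrl; };

      static void init_request_buggy(struct req_priv *p, void *ctrl)
      {
          p->ctrl = ctrl;             /* set first ... */
          memset(p, 0, sizeof(*p));   /* ... then wiped by the memset */
      }

      static void init_request_fixed(struct req_priv *p, void *ctrl)
      {
          memset(p, 0, sizeof(*p));   /* models __nvme_fc_init_request() */
          p->ctrl = ctrl;             /* the fix: assign after the memset */
      }
      ```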
      
      Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  4. 27 November 2018 (1 commit)
  5. 15 November 2018 (1 commit)
    • nvme-fc: resolve io failures during connect · 4cff280a
      Committed by James Smart
      If an io error occurs on an io issued while connecting, recovery
      of the io falls flat as the state checking ends up nooping the error
      handler.
      
      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected.  The termination of the io will return
      back to the caller, which will then back out of the connection attempt
      and, if possible, reschedule it.
      
      The changes:
      - in case there are several commands hitting the error handler, a
        state flag is kept so that the error work is only scheduled once,
        on the first error. The subsequent errors can be ignored.
      - The calling sequence to stop keep alive and terminate the queues
        and their io is lifted from the reset routine into a small
        service routine used by both reset and err_work.
      - During debugging, found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops.
        The aen_ops weren't initialized yet. Add validation on their
        initialization before calling the teardown routine.
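      The once-only scheduling in the first bullet can be sketched as a flag
      check. This is an illustrative userspace model; in the kernel this
      would be an atomic test-and-set on a controller state flag, and the
      names here are made up for the sketch.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Model: only the first I/O error during connect schedules err_work;
       * subsequent errors are ignored until the work runs. */
      struct ctrl {
          bool err_work_pending;
          int  err_work_scheduled;   /* counts actual schedules */
      };

      static void io_error(struct ctrl *c)
      {
          if (c->err_work_pending)
              return;                  /* later errors are ignored */
          c->err_work_pending = true;
          c->err_work_scheduled++;     /* models schedule_work(&err_work) */
      }
      ```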
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  6. 09 November 2018 (1 commit)
  7. 02 November 2018 (2 commits)
    • nvme-pci: fix conflicting p2p resource adds · 9fe5c59f
      Committed by Keith Busch
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem every time on a controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      This patch fixes this by registering the CMB with P2P only once.
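      The register-once behavior amounts to remembering the first
      registration. This is an illustrative userspace model, not the driver
      source; the struct and field names are stand-ins.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Model: remember that the CMB has been added to the P2P DMA
       * subsystem so repeated controller resets do not re-register it. */
      struct nvme_dev {
          bool cmb_p2p_added;
          int  p2p_registrations;   /* counts actual registrations */
      };

      static void reset_work(struct nvme_dev *d)
      {
          /* The fix: register the CMB resource only on the first reset. */
          if (!d->cmb_p2p_added) {
              d->p2p_registrations++;  /* models pci_p2pdma_add_resource() */
              d->cmb_p2p_added = true;
          }
      }
      ```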
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fc: fix request private initialization · d19b8bc8
      Committed by James Smart
      The patch made to avoid a Coverity report of out-of-bounds access
      on aen_op moved the assignment of a pointer, leaving it NULL when it
      was subsequently used to calculate a private pointer. Thus the private
      pointer was bad.
      
      Move/correct the private pointer initialization to be in sync with the
      patch.
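      The dependency can be shown in miniature. This is an illustrative
      sketch only; the structure names are simplified stand-ins, and the
      layout is an assumption made for the example, not the real
      nvme_fcp_op_w_sgl definition.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Model: the request private area sits behind the operation
       * structure, so it must be derived from the op pointer only after
       * that pointer has been assigned. */
      struct fcp_op   { int opaque; };
      struct op_w_sgl { struct fcp_op op; char priv[32]; };

      static void *init_priv(struct op_w_sgl *wrapper)
      {
          struct fcp_op *op = &wrapper->op;  /* assign the base pointer first */
          return op + 1;                     /* then derive the private area */
      }
      ```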
      
      Fixes: 0d2bdf9f ("nvme-fc: rework the request initialization code")
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 19 October 2018 (2 commits)
  9. 18 October 2018 (3 commits)
  10. 17 October 2018 (10 commits)
  11. 09 October 2018 (3 commits)
    • lightnvm: do no update csecs and sos on 1.2 · 6fd05cad
      Committed by Javier González
      1.2 devices expose their data and metadata sizes through the separate
      identify command. Make sure that the NVMe LBA format does not override
      these values.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lightnvm: use internal allocation for chunk log page · 090ee26f
      Committed by Javier González
      The lightnvm subsystem provides helpers to retrieve chunk metadata,
      where the target needs to provide a buffer to store the metadata. An
      implicit assumption is that this buffer is contiguous and can be used to
      retrieve the data from the device. If the device exposes too many
      chunks, then kmalloc might fail, thus failing instance creation.
      
      This patch removes this assumption by implementing an internal buffer in
      the lightnvm subsystem to retrieve chunk metadata. Targets can then
      use virtual memory allocations. Since this is a target API change, adapt
      pblk accordingly.
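      The approach can be modeled as a bounce-buffer copy loop. This is an
      illustrative userspace sketch: the core fetches metadata in small,
      contiguous (kmalloc-able) pieces and copies into the target's buffer,
      which may then be a virtual (vmalloc) allocation of any size. The
      sizes and function names here are made up for the example.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      enum { BOUNCE_SZ = 8 };   /* small contiguous buffer size (made up) */

      /* Stand-in for a per-piece device read into a contiguous buffer. */
      static void dev_read_meta(unsigned char *dst, size_t off, size_t len)
      {
          for (size_t i = 0; i < len; i++)
              dst[i] = (unsigned char)(off + i);
      }

      /* Copy all chunk metadata into the target's buffer, piece by piece,
       * so the target buffer never needs to be physically contiguous. */
      static void get_chunk_meta(unsigned char *target_buf, size_t total)
      {
          unsigned char bounce[BOUNCE_SZ];

          for (size_t off = 0; off < total; off += BOUNCE_SZ) {
              size_t len = total - off < BOUNCE_SZ ? total - off : BOUNCE_SZ;
              dev_read_meta(bounce, off, len);
              memcpy(target_buf + off, bounce, len);
          }
      }
      ```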
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lightnvm: move bad block and chunk state logic to core · aff3fb18
      Committed by Matias Bjørling
      pblk implements two data paths for recovering line state, one for 1.2
      and another for 2.0. Instead of having pblk implement these, combine
      them in the core to reduce complexity and make them available to other
      targets.
      
      The new interface will adhere to the 2.0 chunk definition,
      including managing open chunks with an active write pointer. To provide
      this interface, a 1.2 device recovers the state of the chunks by
      manually detecting whether a chunk is free/open/closed/offline and, if
      open, scanning the flash pages sequentially to find the next writable
      page. This process takes on average ~10 seconds on a device with 64 dies,
      1024 blocks and 60us read access time. The process can be parallelized
      but is left out for maintenance simplicity, as the 1.2 specification is
      deprecated. For 2.0 devices, the logic is maintained internally in the
      drive and retrieved through the 2.0 interface.
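      The write-pointer recovery for an open chunk can be sketched as a
      sequential scan. This is an illustrative model, not the kernel code;
      the page-written array is a stand-in for reading flash pages.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Model of the 1.2 recovery: scan pages sequentially and stop at the
       * first unwritten page; that index becomes the write pointer. */
      static int recover_write_pointer(const bool *page_written, int npages)
      {
          int wp = 0;
          while (wp < npages && page_written[wp])
              wp++;
          return wp;   /* next writable page (== npages if chunk is full) */
      }
      ```

      The per-chunk scans are independent, so the commit notes they could be
      parallelized across dies; that is left out to keep the deprecated 1.2
      path simple to maintain.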
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 08 October 2018 (1 commit)
  13. 03 October 2018 (1 commit)
  14. 02 October 2018 (6 commits)
  15. 28 September 2018 (2 commits)
  16. 26 September 2018 (1 commit)