1. 07 Dec 2018, 1 commit
    • nvme: validate controller state before rescheduling keep alive · 86880d64
      Authored by James Smart
      Delete operations are seeing NULL pointer references in call_timer_fn.
      Tracking these back, the timer appears to be the keep alive timer.
      
      nvme_keep_alive_work(), which is tied to the timer cancelled by
      nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
      wait for its completion. So nvme_stop_keep_alive() only stops a timer
      when it's pending. When a keep alive is in flight, there is no timer
      running, and nvme_stop_keep_alive() has no effect on the keep alive
      io. Thus, if the io completes successfully, the keep alive timer will
      be rescheduled. In the failure case, delete is called, the controller
      state is changed, nvme_stop_keep_alive() is called while the io is
      outstanding, and the delete path continues on. The keep alive happens
      to complete successfully before the delete path marks it as aborted
      as part of queue termination, so the timer is restarted. The delete
      path then tears down the controller, and later the timer code fires
      on a now-corrupt timer entry.
      
      Fix by validating the controller state before rescheduling the keep
      alive. Testing with the fix has confirmed the condition above was hit.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
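      A minimal sketch of the fix described above (illustrative, not the
      verbatim patch; it assumes the nvme core's ctrl->lock, ctrl->state,
      ka_work and kato fields):

          /* Re-arm the keep alive work from the completion handler only
           * while the controller is in a state that still wants it. */
          static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
          {
                  struct nvme_ctrl *ctrl = rq->end_io_data;
                  unsigned long flags;
                  bool restart = false;

                  blk_mq_free_request(rq);

                  if (status) {
                          dev_err(ctrl->device,
                                  "keep alive failed, status %d\n", status);
                          return;
                  }

                  /* Check the state under ctrl->lock so a concurrent delete,
                   * which changes the state before nvme_stop_keep_alive(),
                   * cannot race with the reschedule below. */
                  spin_lock_irqsave(&ctrl->lock, flags);
                  if (ctrl->state == NVME_CTRL_LIVE ||
                      ctrl->state == NVME_CTRL_CONNECTING)
                          restart = true;
                  spin_unlock_irqrestore(&ctrl->lock, flags);

                  if (restart)
                          schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
          }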
  2. 01 Dec 2018, 1 commit
  3. 28 Nov 2018, 1 commit
    • nvme-pci: fix surprise removal · 751a0cc0
      Authored by Igor Konopko
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue. Moments later, on the same path, nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on the admin queue and tries to access its
      hctx, which leads to the following oops:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
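      One way to close this race, sketched with the standard block-layer
      helpers (a guard of this shape; the helper name below is hypothetical,
      not the exact hunk from the patch):

          /* blk_cleanup_queue() marks the queue dying before it frees the
           * hctxs, so a dying admin queue must not be unquiesced or run. */
          static void nvme_unquiesce_admin_queue_safe(struct nvme_ctrl *ctrl)
          {
                  if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
                          blk_mq_unquiesce_queue(ctrl->admin_q);
          }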
  4. 27 Nov 2018, 1 commit
  5. 09 Nov 2018, 1 commit
  6. 18 Oct 2018, 1 commit
  7. 17 Oct 2018, 3 commits
  8. 08 Oct 2018, 1 commit
  9. 02 Oct 2018, 3 commits
  10. 28 Sep 2018, 2 commits
  11. 08 Aug 2018, 1 commit
  12. 30 Jul 2018, 2 commits
  13. 28 Jul 2018, 3 commits
  14. 23 Jul 2018, 2 commits
  15. 20 Jul 2018, 1 commit
  16. 17 Jul 2018, 2 commits
    • nvme: don't enable AEN if not supported · fa441b71
      Authored by Weiping Zhang
      Avoid executing the Set Features command if there is no supported bit
      set in Optional Asynchronous Events Supported (OAES).
      
      Fixes: c0561f82 ("nvme: submit AEN event configuration on startup")
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
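      A sketch of the guard (assuming the core's ctrl->oaes field and an
      NVME_AEN_SUPPORTED mask of the events the driver handles; illustrative
      only):

          static void nvme_enable_aen(struct nvme_ctrl *ctrl)
          {
                  u32 supported = ctrl->oaes & NVME_AEN_SUPPORTED;
                  u32 result;

                  /* Nothing advertised in OAES: skip Set Features entirely. */
                  if (!supported)
                          return;

                  if (nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT, supported,
                                        NULL, 0, &result))
                          dev_warn(ctrl->device,
                                   "failed to enable asynchronous events\n");
          }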
    • nvme: ensure forward progress during Admin passthru · cf39a6bc
      Authored by Scott Bauer
      If the controller supports command effects and goes down during a
      passthru admin command, we will deadlock during namespace
      revalidation.
      
      [  363.488275] INFO: task kworker/u16:5:231 blocked for more than 120 seconds.
      [  363.488290]       Not tainted 4.17.0+ #2
      [  363.488296] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  363.488303] kworker/u16:5   D    0   231      2 0x80000000
      [  363.488331] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      [  363.488338] Call Trace:
      [  363.488385]  schedule+0x75/0x190
      [  363.488396]  rwsem_down_read_failed+0x1c3/0x2f0
      [  363.488481]  call_rwsem_down_read_failed+0x14/0x30
      [  363.488504]  down_read+0x1d/0x80
      [  363.488523]  nvme_stop_queues+0x1e/0xa0 [nvme_core]
      [  363.488536]  nvme_dev_disable+0xae4/0x1620 [nvme]
      [  363.488614]  nvme_reset_work+0xd1e/0x49d9 [nvme]
      [  363.488911]  process_one_work+0x81a/0x1400
      [  363.488934]  worker_thread+0x87/0xe80
      [  363.488955]  kthread+0x2db/0x390
      [  363.488977]  ret_from_fork+0x35/0x40
      
      Fixes: 84fef62d ("nvme: check admin passthru command effects")
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Reviewed-by: Keith Busch <keith.busch@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
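      For context, the hang in the trace above is, roughly, a lock dependency
      cycle on the namespaces rwsem: the passthru completion path holds it
      while waiting for admin I/O that can only finish once the reset path
      tears the controller down, and the reset path needs the same rwsem in
      nvme_stop_queues(). A conceptual illustration with hypothetical names
      (not driver code):

          static DECLARE_RWSEM(namespaces_sem);   /* stands in for ctrl->namespaces_rwsem */

          static void passthru_end_revalidate(void)       /* passthru completion side */
          {
                  down_write(&namespaces_sem);
                  /* Revalidation issues admin I/O; with the controller gone
                   * it can only complete once the reset/disable path runs,
                   * and that path is stuck in reset_work() below. */
                  up_write(&namespaces_sem);              /* never reached */
          }

          static void reset_work(void)    /* nvme_reset_work -> nvme_stop_queues */
          {
                  down_read(&namespaces_sem);     /* blocks behind the held write lock */
                  up_read(&namespaces_sem);
          }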
  17. 22 Jun 2018, 1 commit
    • nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Authored by Jens Axboe
      nvme requires an sg table allocation for each request. If the request
      is large, then the allocation can become quite large. For instance,
      with our default software settings of 1280KB IO size, we'll need
      10248 bytes of sg table. That turns into a 2nd order allocation,
      which we can't always guarantee. If we fail the allocation, blk-mq
      will retry it later. But there's no guarantee that we'll EVER be
      able to allocate that much contiguous memory.

      Limit the IO size such that we never need more than a single page
      of memory. That's a lot faster and more reliable. Then back that
      allocation with a mempool, so that we know the allocation will
      always succeed eventually.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
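      A sketch of the two pieces described above, with hypothetical names and
      cap values (the real driver derives its numbers from its per-request
      layout). For scale, a 1280KB request split into 4KB segments needs
      roughly 320 scatterlist entries of 32 bytes each, which is where the
      ~10KB figure above comes from:

          #include <linux/mempool.h>

          #define MAX_XFER_KB     4096    /* assumed cap: sg table fits in one page */
          #define MAX_SEGS        127     /* assumed segment cap */

          static mempool_t *iod_mempool;  /* hypothetical; kept per-device in practice */

          static int init_iod_mempool(size_t alloc_size, int node)
          {
                  /* mempool_kmalloc()/mempool_kfree() treat pool_data as the
                   * allocation size, so one buffer of alloc_size bytes is
                   * always held in reserve and the allocation can always
                   * make forward progress. */
                  iod_mempool = mempool_create_node(1, mempool_kmalloc,
                                                    mempool_kfree,
                                                    (void *)alloc_size,
                                                    GFP_KERNEL, node);
                  return iod_mempool ? 0 : -ENOMEM;
          }

          /* Elsewhere, the queue limits would be capped to match, e.g.
           * max_hw_sectors = MAX_XFER_KB << 1 and max_segments = MAX_SEGS. */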
  18. 14 Jun 2018, 1 commit
  19. 13 Jun 2018, 1 commit
  20. 11 Jun 2018, 1 commit
  21. 09 Jun 2018, 1 commit
  22. 01 Jun 2018, 4 commits
  23. 31 May 2018, 1 commit
  24. 30 May 2018, 1 commit
  25. 25 May 2018, 2 commits
  26. 23 May 2018, 1 commit