1. 15 9月, 2016 1 次提交
  2. 11 8月, 2016 1 次提交
    • G
      nvme: Suspend all queues before deletion · c21377f8
      Gabriel Krisman Bertazi 提交于
      When nvme_delete_queue fails in the first pass of the
      nvme_disable_io_queues() loop, we return early, failing to suspend all
      of the IO queues.  Later, on the nvme_pci_disable path, this causes us
      to disable MSI without actually having freed all the IRQs, which
      triggers the BUG_ON in free_msi_irqs(), as show below.
      
      This patch refactors nvme_disable_io_queues to suspend all queues before
      start submitting delete queue commands.  This way, we ensure that we
      have at least returned every IRQ before continuing with the removal
      path.
      
      [  487.529200] kernel BUG at ../drivers/pci/msi.c:368!
      cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650]
          pc: c000000000627a50: free_msi_irqs+0x90/0x200
          lr: c000000000627a40: free_msi_irqs+0x80/0x200
          sp: c0000078c5b838d0
         msr: 9000000100029033
        current = 0xc0000078c5b40000
        paca    = 0xc000000002bd7600   softe: 0        irq_happened: 0x01
          pid   = 1376, comm = kworker/70:1H
      kernel BUG at ../drivers/pci/msi.c:368!
      Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413
      (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016
      enter ? for help
      [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme]
      [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme]
      [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110
      [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170
      [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110
      [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0
      [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590
      [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660
      [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130
      [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c
      Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-nvme@lists.infradead.org
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c21377f8
  3. 21 7月, 2016 1 次提交
  4. 14 7月, 2016 1 次提交
  5. 13 7月, 2016 1 次提交
    • K
      nvme: Limit command retries · f80ec966
      Keith Busch 提交于
      Many controller implementations will return errors to commands that will
      not succeed, but without the DNR bit set. The driver previously retried
      these commands an unlimited number of times until the command timeout
      has exceeded, which takes an unnecessarilly long period of time.
      
      This patch limits the number of retries a command can have, defaulting
      to 5, but is user tunable at load or runtime.
      
      The struct request's 'retries' field is used to track the number of
      retries attempted. This is in contrast with scsi's use of this field,
      which indicates how many retries are allowed.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f80ec966
  6. 12 7月, 2016 1 次提交
    • G
      nvme/quirk: Add a delay before checking for adapter readiness · 54adc010
      Guilherme G. Piccoli 提交于
      When disabling the controller, the specification says the register
      NVME_REG_CC should be written and then driver needs to wait the
      adapter to be ready, which is checked by reading another register
      bit (NVME_CSTS_RDY). There's a timeout validation in this checking,
      so in case this timeout is reached the driver gives up and removes
      the adapter from the system.
      
      After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
      (HGST adapter) end up being removed if we issue a reset_controller,
      because driver keeps verifying the NVME_REG_CSTS until the timeout is
      reached. This patch adds a necessary quirk for this adapter, by
      introducing a delay before nvme_wait_ready(), so the reset procedure
      is able to be completed. This quirk is needed because just increasing
      the timeout is not enough in case of this adapter - the driver must
      wait before start reading NVME_REG_CSTS register on this specific
      device.
      Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      54adc010
  7. 06 7月, 2016 3 次提交
  8. 22 6月, 2016 1 次提交
  9. 12 6月, 2016 1 次提交
  10. 10 6月, 2016 1 次提交
  11. 08 6月, 2016 2 次提交
  12. 18 5月, 2016 6 次提交
  13. 04 5月, 2016 1 次提交
    • K
      NVMe: Fix reset/remove race · 87c32077
      Keith Busch 提交于
      This fixes a scenario where device is present and being reset, but a
      request to unbind the driver occurs.
      
      A previous patch series addressing a device failure removal scenario
      flushed reset_work after controller disable to unblock reset_work waiting
      on a completion that wouldn't occur. This isn't safe as-is. The broken
      scenario can potentially be induced with:
      
        modprobe nvme && modprobe -r nvme
      
      To fix, the reset work is flushed immediately after setting the controller
      removing flag, and any subsequent reset will not proceed with controller
      initialization if the flag is set.
      
      The controller status must be polled while active, so the watchdog timer
      is also left active until the controller is disabled to cleanup requests
      that may be stuck during namespace removal.
      
      [Fixes: ff23a2a1]
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      87c32077
  14. 02 5月, 2016 7 次提交
  15. 15 4月, 2016 1 次提交
    • K
      NVMe: Always use MSI/MSI-x interrupts · a5229050
      Keith Busch 提交于
      Multiple users have reported device initialization failure due the driver
      not receiving legacy PCI interrupts. This is not unique to any particular
      controller, but has been observed on multiple platforms.
      
      There have been no issues reported or observed when with message signaled
      interrupts, so this patch attempts to use MSI-x during initialization,
      falling back to MSI. If that fails, legacy would become the default.
      
      The setup_io_queues error handling had to change as a result: the admin
      queue's msix_entry used to be initialized to the legacy IRQ. The case
      where nr_io_queues is 0 would fail request_irq when setting up the admin
      queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
      the admin queue's msix_entry invalid. Instead, return success immediately.
      Reported-by: NTim Muhlemmer <muhlemmer@gmail.com>
      Reported-by: NJon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      a5229050
  16. 13 4月, 2016 9 次提交
    • G
      nvme: Avoid reset work on watchdog timer function during error recovery · c875a709
      Guilherme G. Piccoli 提交于
      This patch adds a check on nvme_watchdog_timer() function to avoid the
      call to reset_work() when an error recovery process is ongoing on
      controller. The check is made by looking at pci_channel_offline()
      result.
      
      If we don't check for this on nvme_watchdog_timer(), error recovery
      mechanism can't recover well, because reset_work() won't be able to
      do its job (since we're in the middle of an error) and so the
      controller is removed from the system before error recovery mechanism
      can perform slot reset (which would allow the adapter to recover).
      
      In this patch we also have split the huge condition expression on
      nvme_watchdog_timer() by introducing an auxiliary function to help
      make the code more readable.
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c875a709
    • J
      NVMe: silence warning about unused 'dev' · 7e197930
      Jens Axboe 提交于
      Depending on options, we might not be using dev in nvme_cancel_io():
      
      drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’:
      drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable]
        struct nvme_dev *dev = data;
                         ^
      
      So get rid of it, and just cast for the dev_dbg_ratelimited() call.
      
      Fixes: 82b4552b ("nvme: Use blk-mq helper for IO termination")
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7e197930
    • S
      nvme: Use blk-mq helper for IO termination · 82b4552b
      Sagi Grimberg 提交于
      blk-mq offers a tagset iterator so let's use that
      instead of using nvme_clear_queues.
      
      Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io
      as there is no concept of a queue now in this function (we
      also lost the print).
      Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      82b4552b
    • K
      NVMe: Skip async events for degraded controllers · 21f033f7
      Keith Busch 提交于
      If the controller is degraded, the driver should stay out of the way so
      the user can recover the drive. This patch skips driver initiated async
      event requests when the drive is in this state.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      21f033f7
    • M
      nvme: add helper nvme_setup_cmd() · 8093f7ca
      Ming Lin 提交于
      This moves nvme_setup_{flush,discard,rw} calls into a common
      nvme_setup_cmd() helper. So we can eventually hide all the command
      setup in the core module and don't even need to update the fabrics
      drivers for any specific command type.
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8093f7ca
    • M
      nvme: rewrite discard support · 03b5929e
      Ming Lin 提交于
      This rewrites nvme_setup_discard() with blk_add_request_payload().
      It allocates only the necessary amount(16 bytes) for the payload.
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      03b5929e
    • M
      nvme: add helper nvme_map_len() · 58b45602
      Ming Lin 提交于
      The helper returns the number of bytes that need to be mapped
      using PRPs/SGL entries.
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      58b45602
    • M
      nvme: add missing lock nesting notation · 2e39e0f6
      Ming Lin 提交于
      When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue()
      that sends nvme_admin_delete_cq command to admin sq. So when the command
      completed, the lock acquired by nvme_irq() actually belongs to admin queue.
      
      While the lock that nvme_del_cq_end() trying to acquire belongs to io queue.
      So it will not deadlock.
      
      This patch adds lock nesting notation to fix following report.
      
      [  109.840952] =============================================
      [  109.846379] [ INFO: possible recursive locking detected ]
      [  109.851806] 4.5.0+ #180 Tainted: G            E
      [  109.856533] ---------------------------------------------
      [  109.861958] swapper/0/0 is trying to acquire lock:
      [  109.866771]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
      [  109.876535]
      [  109.876535] but task is already holding lock:
      [  109.882398]  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
      [  109.891547]
      [  109.891547] other info that might help us debug this:
      [  109.898107]  Possible unsafe locking scenario:
      [  109.898107]
      [  109.904056]        CPU0
      [  109.906515]        ----
      [  109.908974]   lock(&(&nvmeq->q_lock)->rlock);
      [  109.913381]   lock(&(&nvmeq->q_lock)->rlock);
      [  109.917787]
      [  109.917787]  *** DEADLOCK ***
      [  109.917787]
      [  109.923738]  May be due to missing lock nesting notation
      [  109.923738]
      [  109.930558] 1 lock held by swapper/0/0:
      [  109.934413]  #0:  (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme]
      [  109.944010]
      [  109.944010] stack backtrace:
      [  109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G            E   4.5.0+ #180
      [  109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013
      [  109.962989]  0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540
      [  109.970478]  ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046
      [  109.977964]  0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00
      [  109.985453] Call Trace:
      [  109.987911]  <IRQ>  [<ffffffff81383d9c>] dump_stack+0x85/0xc9
      [  109.993711]  [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60
      [  109.999575]  [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10
      [  110.005524]  [<ffffffff810b386d>] ? complete+0x3d/0x50
      [  110.010688]  [<ffffffff810bb760>] lock_acquire+0x90/0xf0
      [  110.016029]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
      [  110.022418]  [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60
      [  110.028632]  [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme]
      [  110.035019]  [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme]
      [  110.041232]  [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60
      [  110.047095]  [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme]
      [  110.053481]  [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130
      [  110.060043]  [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40
      [  110.066343]  [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme]
      [  110.072818]  [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme]
      [  110.078419]  [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110
      [  110.084804]  [<ffffffff810cdc77>] handle_irq_event+0x37/0x60
      [  110.090491]  [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150
      [  110.096180]  [<ffffffff81012306>] handle_irq+0xa6/0x130
      [  110.101431]  [<ffffffff81011abe>] do_IRQ+0x5e/0x120
      [  110.106333]  [<ffffffff8177384c>] common_interrupt+0x8c/0x8c
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2e39e0f6
    • K
      NVMe: Always use MSI/MSI-x interrupts · 788e15ab
      Keith Busch 提交于
      Multiple users have reported device initialization failure due the driver
      not receiving legacy PCI interrupts. This is not unique to any particular
      controller, but has been observed on multiple platforms.
      
      There have been no issues reported or observed when with message signaled
      interrupts, so this patch attempts to use MSI-x during initialization,
      falling back to MSI. If that fails, legacy would become the default.
      
      The setup_io_queues error handling had to change as a result: the admin
      queue's msix_entry used to be initialized to the legacy IRQ. The case
      where nr_io_queues is 0 would fail request_irq when setting up the admin
      queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving
      the admin queue's msix_entry invalid. Instead, return success immediately.
      Reported-by: NTim Muhlemmer <muhlemmer@gmail.com>
      Reported-by: NJon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      788e15ab
  17. 12 4月, 2016 1 次提交
    • K
      NVMe: Fix reset/remove race · 9bf2b972
      Keith Busch 提交于
      This fixes a scenario where device is present and being reset, but a
      request to unbind the driver occurs.
      
      A previous patch series addressing a device failure removal scenario
      flushed reset_work after controller disable to unblock reset_work waiting
      on a completion that wouldn't occur. This isn't safe as-is. The broken
      scenario can potentially be induced with:
      
        modprobe nvme && modprobe -r nvme
      
      To fix, the reset work is flushed immediately after setting the controller
      removing flag, and any subsequent reset will not proceed with controller
      initialization if the flag is set.
      
      The controller status must be polled while active, so the watchdog timer
      is also left active until the controller is disabled to cleanup requests
      that may be stuck during namespace removal.
      
      [Fixes: ff23a2a1]
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9bf2b972
  18. 23 3月, 2016 1 次提交