1. 31 May, 2022 1 commit
  2. 28 May, 2022 1 commit
  3. 16 May, 2022 3 commits
    • nvme-pci: harden drive presence detect in nvme_dev_disable() · b98235d3
      Stefan Roese committed
      On our ZynqMP system we observe that an NVMe drive that resets itself
      while doing a firmware update causes a kernel crash like this:
      
      [ 67.720772] pcieport 0000:02:02.0: pciehp: Slot(2): Link Down
      [ 67.720783] pcieport 0000:02:02.0: pciehp: Slot(2): Card not present
      [ 67.720795] nvme 0000:04:00.0: PME# disabled
      [ 67.720849] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
      [ 67.720853] nwl-pcie fd0e0000.pcie: Slave error
      
      Analysis: When nvme_dev_disable() is called because of this PCIe hotplug
      event, pci_is_enabled() is still true. Accessing the NVMe drive, which
      is currently unavailable because it is rebooting, causes this
      "synchronous external abort" on this ARM64 platform.
      
      This patch adds the pci_device_is_present() check as well, which returns
      false in this "Card not present" hot-plug case. With this change, the
      NVMe driver does not try to access the NVMe registers any more and the
      FW update finishes without any problems.
      Signed-off-by: Stefan Roese <sr@denx.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
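The guard described in the commit above can be modelled in userspace. This is a minimal sketch, not the kernel code: `struct fake_pci_dev`, `fake_pci_is_enabled()`, `fake_pci_device_is_present()` and `may_touch_registers()` are hypothetical stand-ins that only mimic the roles of `struct pci_dev`, `pci_is_enabled()` and `pci_device_is_present()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; not the kernel definition. */
struct fake_pci_dev {
    bool enabled;   /* what pci_is_enabled() would report */
    bool present;   /* what pci_device_is_present() would report */
};

static bool fake_pci_is_enabled(const struct fake_pci_dev *pdev)
{
    return pdev->enabled;
}

static bool fake_pci_device_is_present(const struct fake_pci_dev *pdev)
{
    return pdev->present;
}

/*
 * Registers may only be read when the device is enabled AND still
 * present on the bus. In the "Card not present" hot-plug case the
 * device is still marked enabled, so the first check alone is not
 * sufficient.
 */
static bool may_touch_registers(const struct fake_pci_dev *pdev)
{
    return fake_pci_is_enabled(pdev) && fake_pci_device_is_present(pdev);
}
```

With a device that is enabled but no longer present, `may_touch_registers()` returns false, so a caller modelled on nvme_dev_disable() would skip the register access that triggered the abort.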
    • nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags · da427611
      Smith, Kyle Miller (Nimble Kernel) committed
      In nvme_alloc_admin_tags, the admin_q can be set to an error (typically
      -ENOMEM) if the blk_mq_init_queue call fails to set up the queue, which
      is checked immediately after the call. However, when we return the error
      up the stack to nvme_reset_work, the error takes us to
      nvme_remove_dead_ctrl()
        nvme_dev_disable()
         nvme_suspend_queue(&dev->queues[0]).
      
      Here, we only check that admin_q is non-NULL, rather than checking that
      it is neither an error pointer nor NULL, and so begin quiescing a queue
      that never existed, leading to a bad / NULL pointer dereference.
      Signed-off-by: Kyle Smith <kyles@hpe.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
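To illustrate why a plain non-NULL check is not enough here, the following userspace sketch imitates the kernel's error-pointer convention. The `sketch_*` helpers and `SKETCH_MAX_ERRNO` are assumptions written for this example; they mirror the idea behind `ERR_PTR()`, `IS_ERR()` and `IS_ERR_OR_NULL()` but are not the kernel macros. The commit message does not say how the patch itself is implemented, so this shows the problem rather than the actual fix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer value lies in the top SKETCH_MAX_ERRNO addresses. */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static bool sketch_is_err_or_null(const void *ptr)
{
    return ptr == NULL || sketch_is_err(ptr);
}

/* A queue is only safe to quiesce if it was really allocated. */
static bool admin_queue_usable(const void *admin_q)
{
    return !sketch_is_err_or_null(admin_q);
}
```

An error pointer such as `sketch_err_ptr(-12)` is not NULL, so a check that only tests for NULL would treat it as a valid queue, while `admin_queue_usable()` rejects it.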
    • nvme: mark internal passthru request RQF_QUIET · 128126a7
      Chaitanya Kulkarni committed
      Most of the internal passthru commands use the __nvme_submit_sync_cmd()
      interface. There are a few places where we open-code the request submission:
      
      1. nvme_keep_alive_work(struct work_struct *work)
      2. nvme_timeout(struct request *req, bool reserved)
      3. nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
      
      Mark the internal passthru request quiet so that we can skip the verbose
      error message from nvme_log_error() in the nvme_end_req() completion path.
      This is consistent with what we have in __nvme_submit_sync_cmd().
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
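The effect of the quiet flag on the completion path can be sketched as follows. This is a simplified userspace model under stated assumptions: `struct sketch_request`, `SKETCH_RQF_QUIET` (including its bit value) and both helpers are hypothetical and only stand in for the request flags and the logging decision the commit describes.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real RQF_QUIET value is defined by the block layer. */
#define SKETCH_RQF_QUIET (1u << 0)

/* Hypothetical, heavily reduced model of a request. */
struct sketch_request {
    unsigned int rq_flags;
    int status;     /* 0 on success, non-zero on error */
};

/* What an internal submitter would do before issuing the request. */
static void sketch_mark_quiet(struct sketch_request *req)
{
    req->rq_flags |= SKETCH_RQF_QUIET;
}

/* Log only failed requests that were not marked quiet. */
static bool sketch_should_log_error(const struct sketch_request *req)
{
    return req->status != 0 && !(req->rq_flags & SKETCH_RQF_QUIET);
}
```

A failed request that was marked with `sketch_mark_quiet()` is not logged, while the same failure on an unmarked request is.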
  4. 15 April, 2022 2 commits
  5. 23 March, 2022 2 commits
  6. 16 March, 2022 1 commit
  7. 04 March, 2022 1 commit
  8. 27 January, 2022 1 commit
  9. 06 January, 2022 1 commit
  10. 17 December, 2021 3 commits
  11. 29 November, 2021 1 commit
  12. 21 October, 2021 1 commit
  13. 20 October, 2021 1 commit
  14. 19 October, 2021 3 commits
    • nvme: wire up completion batching for the IRQ path · 4f502245
      Jens Axboe committed
      Trivial to do now, just need our own io_comp_batch on the stack and pass
      that in to the usual command completion handling.
      
      I pondered making this dependent on how many entries we had to process,
      but even for a single entry there's no discernable difference in
      performance or latency. Running a sync workload over io_uring:
      
      t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
      
      yields the below performance before the patch:
      
      IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      and the following after:
      
      IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
      IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
      
      which definitely isn't slower, about the same if you factor in a bit of
      variance. For peak performance workloads, benchmarking shows a 2%
      improvement.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: add support for batched completion of polled IO · c234a653
      Jens Axboe committed
      Take advantage of struct io_comp_batch, if passed in to the nvme poll
      handler. If it's set, rather than complete each request individually
      inline, store them in the io_comp_batch list. We only do so for requests
      that will complete successfully; anything else is completed inline as
      before.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
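The batching rule described above (defer successful completions to a batch, complete everything else inline) can be sketched in userspace as follows. All types and functions here are hypothetical simplifications: `struct sketch_comp_batch` only imitates the "list plus completion handler" shape of `struct io_comp_batch` and is not the kernel definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a request. */
struct sketch_rq {
    int status;              /* 0 means the request succeeded */
    bool completed;
    struct sketch_rq *next;  /* link used while queued on a batch */
};

/* Simplified model of a completion batch: a list plus a handler. */
struct sketch_comp_batch {
    struct sketch_rq *req_list;
    void (*complete)(struct sketch_comp_batch *iob);
};

static void sketch_complete_one(struct sketch_rq *req)
{
    req->completed = true;
}

/* Batch handler: complete everything queued on the list in one pass. */
static void sketch_complete_batch(struct sketch_comp_batch *iob)
{
    struct sketch_rq *req = iob->req_list;

    while (req) {
        struct sketch_rq *next = req->next;

        req->next = NULL;
        sketch_complete_one(req);
        req = next;
    }
    iob->req_list = NULL;
}

/*
 * Returns true if the request was deferred to the batch, false if it
 * was completed inline (no batch supplied, or the request failed).
 */
static bool sketch_handle_cqe(struct sketch_rq *req,
                              struct sketch_comp_batch *iob)
{
    if (iob && req->status == 0) {
        req->next = iob->req_list;
        iob->req_list = req;
        iob->complete = sketch_complete_batch;
        return true;
    }
    sketch_complete_one(req);
    return false;
}
```

A caller would process all pending entries with `sketch_handle_cqe()` and then invoke `iob->complete(iob)` once, if any requests were queued on the batch.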
    • block: add a struct io_comp_batch argument to fops->iopoll() · 5a72e899
      Jens Axboe committed
      struct io_comp_batch contains a list head and a completion handler, which
      will allow batches of IO to be completed more efficiently.
      
      For now, no functional changes in this patch, we just define the
      io_comp_batch structure and add the argument to the file_operations iopoll
      handler.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 18 October, 2021 1 commit
  16. 07 October, 2021 1 commit
  17. 28 September, 2021 1 commit
  18. 16 August, 2021 6 commits
  19. 15 August, 2021 1 commit
  20. 21 July, 2021 1 commit
    • nvme-pci: don't WARN_ON in nvme_reset_work if ctrl.state is not RESETTING · 7764656b
      Zhihao Cheng committed
      Consider the following sequence:
      nvme_probe
        nvme_reset_ctrl
          nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING)
          queue_work(nvme_reset_wq, &ctrl->reset_work)
      
      -------------->	nvme_remove
      		  nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING)
      worker_thread
        process_one_work
          nvme_reset_work
          WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)
      
      This triggers the WARN_ON in nvme_reset_work():
      [  127.534298] WARNING: CPU: 0 PID: 139 at drivers/nvme/host/pci.c:2594
      [  127.536161] CPU: 0 PID: 139 Comm: kworker/u8:7 Not tainted 5.13.0
      [  127.552518] Call Trace:
      [  127.552840]  ? kvm_sched_clock_read+0x25/0x40
      [  127.553936]  ? native_send_call_func_single_ipi+0x1c/0x30
      [  127.555117]  ? send_call_function_single_ipi+0x9b/0x130
      [  127.556263]  ? __smp_call_single_queue+0x48/0x60
      [  127.557278]  ? ttwu_queue_wakelist+0xfa/0x1c0
      [  127.558231]  ? try_to_wake_up+0x265/0x9d0
      [  127.559120]  ? ext4_end_io_rsv_work+0x160/0x290
      [  127.560118]  process_one_work+0x28c/0x640
      [  127.561002]  worker_thread+0x39a/0x700
      [  127.561833]  ? rescuer_thread+0x580/0x580
      [  127.562714]  kthread+0x18c/0x1e0
      [  127.563444]  ? set_kthread_struct+0x70/0x70
      [  127.564347]  ret_from_fork+0x1f/0x30
      
      The preceding problem can be easily reproduced by executing the following
      script (based on the blktests suite):
      test() {
        pdev="$(_get_pci_dev_from_blkdev)"
        sysfs="/sys/bus/pci/devices/${pdev}"
        for ((i = 0; i < 10; i++)); do
          echo 1 > "$sysfs/remove"
          echo 1 > /sys/bus/pci/rescan
        done
      }
      
      Since the device ctrl state can be changed to a non-RESETTING state by
      repeated probe/remove from userspace (which is a normal situation), we
      can replace the stack-dumping WARN_ON with a warning message.
      
      Fixes: 82b057ca ("nvme-pci: fix multiple ctrl removal schedulin")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
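The change described above, replacing a stack-dumping assertion with a plain warning and an early exit, can be sketched in userspace like this. The enum, the function name and the `-ENODEV` return value are assumptions made for this example and are not taken from the commit message.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of controller states. */
enum sketch_ctrl_state {
    SKETCH_CTRL_NEW,
    SKETCH_CTRL_RESETTING,
    SKETCH_CTRL_DELETING,
};

/*
 * Returns 0 if the reset may proceed. If a concurrent remove already
 * moved the controller out of RESETTING, print a one-line warning and
 * bail out instead of dumping a stack trace.
 */
static int sketch_reset_work(enum sketch_ctrl_state state)
{
    if (state != SKETCH_CTRL_RESETTING) {
        fprintf(stderr, "ctrl state %d is not RESETTING\n", (int)state);
        return -ENODEV;
    }
    return 0;
}
```

Calling `sketch_reset_work(SKETCH_CTRL_DELETING)` models the probe/remove race from the trace: the work item notices the state change and returns an error quietly.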
  21. 13 July, 2021 2 commits
    • nvme-pci: do not call nvme_dev_remove_admin from nvme_remove · 251ef6f7
      Casey Chen committed
      nvme_dev_remove_admin could free dev->admin_q and the admin_tagset
      while they are being accessed by nvme_dev_disable(), which can be called
      by nvme_reset_work via nvme_remove_dead_ctrl.
      
      Commit cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      intended to avoid requests being stuck on a removed controller by killing
      the admin queue. But the later fix c8e9e9b7 ("nvme-pci: unquiesce
      admin queue on shutdown"), together with nvme_dev_disable(dev, true)
      right before nvme_dev_remove_admin() could help dispatch requests and
      fail them early, so we don't need nvme_dev_remove_admin() any more.
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Casey Chen <cachen@purestorage.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: fix multiple races in nvme_setup_io_queues · e4b9852a
      Casey Chen committed
      The two paths below can overlap if we power off a drive quickly
      after powering it on. There are multiple races in nvme_setup_io_queues()
      because shutdown_lock is not taken and the NVMEQ_ENABLED bit is used improperly.
      
      nvme_reset_work()                                nvme_remove()
        nvme_setup_io_queues()                           nvme_dev_disable()
        ...                                              ...
      A1  clear NVMEQ_ENABLED bit for admin queue          lock
          retry:                                       B1  nvme_suspend_io_queues()
      A2    pci_free_irq() admin queue                 B2  nvme_suspend_queue() admin queue
      A3    pci_free_irq_vectors()                         nvme_pci_disable()
      A4    nvme_setup_irqs();                         B3    pci_free_irq_vectors()
            ...                                            unlock
      A5    queue_request_irq() for admin queue
            set NVMEQ_ENABLED bit
            ...
            nvme_create_io_queues()
      A6      result = queue_request_irq();
              set NVMEQ_ENABLED bit
            ...
            fail to allocate enough IO queues:
      A7      nvme_suspend_io_queues()
              goto retry
      
      If B3 runs in between A1 and A2, it will crash if irqaction hasn't
      been freed by A2. B2 is supposed to free the admin queue IRQ, but it
      can't do so because A1 has already cleared the NVMEQ_ENABLED bit.
      
      Fix: combine A1 and A2 so the IRQ gets freed as soon as the NVMEQ_ENABLED
      bit gets cleared.
      
      After #1 is solved, A2 could race with B3 if A2 is freeing the IRQ while
      B3 is checking irqaction. A3 could also race with B2 if B2 is freeing
      the IRQ while A3 is checking irqaction.
      
      Fix: A2 and A3 take lock for mutual exclusion.
      
      A3 could race with B3 since they could run free_msi_irqs() in parallel.
      
      Fix: A3 takes lock for mutual exclusion.
      
      A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
      interrupted by B3.
      
      Fix: A4 takes lock for mutual exclusion.
      
      If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL.
      They are just allocated by A5/A6.
      
      Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit.
      
      A7 could get a chance to call pci_free_irq() for a certain IO queue while
      B3 is checking irqaction.
      
      Fix: A7 takes lock.
      
      nvme_dev->online_queues needs to be protected by shutdown_lock. Since it
      is not atomic, both paths could modify it using their own copy.
      Co-developed-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Signed-off-by: Casey Chen <cachen@purestorage.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
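The first fix in the commit above (combining A1 and A2 so that the IRQ is freed as soon as the NVMEQ_ENABLED bit is cleared) can be illustrated with an atomic test-and-clear. This is a userspace sketch using C11 atomics; `struct sketch_queue`, `SKETCH_NVMEQ_ENABLED` and both helpers are hypothetical, the `irq_frees` counter merely stands in for a call to `pci_free_irq()`, and the shutdown_lock that the commit adds around the remaining steps is not modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit position for the "queue enabled" flag. */
#define SKETCH_NVMEQ_ENABLED (1u << 0)

struct sketch_queue {
    atomic_uint flags;
    int irq_frees;  /* counts how often the IRQ was released */
};

/* Atomically clear the bit; true only for the caller that cleared it. */
static bool sketch_test_and_clear_enabled(struct sketch_queue *q)
{
    unsigned int old = atomic_fetch_and(&q->flags, ~SKETCH_NVMEQ_ENABLED);

    return (old & SKETCH_NVMEQ_ENABLED) != 0;
}

/*
 * Clearing the bit and releasing the IRQ form one step: whichever
 * path clears the bit is also the one that frees the IRQ, so the IRQ
 * is freed exactly once and never left behind with the bit cleared.
 */
static void sketch_disable_queue_irq(struct sketch_queue *q)
{
    if (sketch_test_and_clear_enabled(q))
        q->irq_frees++;  /* stands in for pci_free_irq() */
}
```

If both the reset path and the remove path call `sketch_disable_queue_irq()` on the same queue, only the first call releases the IRQ; the second sees the bit already cleared and does nothing.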
  22. 17 June, 2021 4 commits
  23. 16 June, 2021 1 commit