1. 27 Mar 2019 (1 commit)
      srcu: Remove cleanup_srcu_struct_quiesced() · f5ad3991
      Paul E. McKenney committed
      The cleanup_srcu_struct_quiesced() function was added because NVME
      used WQ_MEM_RECLAIM workqueues and SRCU did not, which meant that
      NVME workqueues waiting on SRCU workqueues could result in deadlocks
      during low-memory conditions.  However, SRCU now also has WQ_MEM_RECLAIM
      workqueues, so there is no longer a potential for deadlock.  Furthermore,
      it turns out to be extremely hard to use cleanup_srcu_struct_quiesced()
      correctly due to the fact that SRCU callback invocation accesses the
      srcu_struct structure's per-CPU data area just after callbacks are
      invoked.  Therefore, the usual practice of using srcu_barrier() to wait
      for callbacks to be invoked before invoking cleanup_srcu_struct_quiesced()
      fails because SRCU's callback-invocation workqueue handler might be
      delayed, which can result in cleanup_srcu_struct_quiesced() being invoked
      (and thus freeing the per-CPU data) before SRCU's callback-invocation
      workqueue handler is finished using that per-CPU data.  Nor is this a
      theoretical problem: KASAN emitted use-after-free warnings because of
      this problem on actual runs.
      
      In short, NVME can now safely invoke cleanup_srcu_struct(), which
      avoids the use-after-free scenario.  And cleanup_srcu_struct_quiesced()
      is quite difficult to use safely.  This commit therefore removes
      cleanup_srcu_struct_quiesced(), switching its sole user back to
      cleanup_srcu_struct().  This effectively reverts the following pair
      of commits:
      
      f7194ac3 ("srcu: Add cleanup_srcu_struct_quiesced()")
      4317228a ("nvme: Avoid flush dependency in delete controller flow")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
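The ordering hazard described in the SRCU commit can be modeled in user space. This is a hypothetical pthreads sketch, not SRCU's implementation: `struct fake_srcu`, `fake_srcu_init()`, `invoke_handler()` and `safe_cleanup()` are invented names, a heap-allocated counter stands in for the per-CPU data area, and `callbacks_invoked` plays the role of what an srcu_barrier()-style wait observes. Teardown is safe only once the handler has set `handler_done`, after its last access to that data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an srcu_struct; not the kernel's data structure. */
struct fake_srcu {
	long *percpu_stats;      /* stands in for the per-CPU data area */
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool callbacks_invoked;  /* all a barrier-style wait guarantees */
	bool handler_done;       /* set only after the last per-CPU access */
};

static int fake_srcu_init(struct fake_srcu *s)
{
	s->percpu_stats = calloc(1, sizeof(*s->percpu_stats));
	s->callbacks_invoked = false;
	s->handler_done = false;
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->cond, NULL);
	return s->percpu_stats ? 0 : -1;
}

static void example_callback(void)
{
	/* user-supplied callback body */
}

/* Models the callback-invocation workqueue handler. */
static void *invoke_handler(void *arg)
{
	struct fake_srcu *s = arg;

	example_callback();
	pthread_mutex_lock(&s->lock);
	s->callbacks_invoked = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);

	/* Window: a waiter that only checks callbacks_invoked may run now,
	 * yet the handler still touches the per-CPU area below. */
	pthread_mutex_lock(&s->lock);
	(*s->percpu_stats)++;
	s->handler_done = true;
	pthread_cond_broadcast(&s->cond);
	pthread_mutex_unlock(&s->lock);
	return NULL;
}

/* Safe teardown: wait until the handler is completely done with the
 * per-CPU area, not merely until the callbacks have been invoked. */
static void safe_cleanup(struct fake_srcu *s)
{
	pthread_mutex_lock(&s->lock);
	while (!s->handler_done)
		pthread_cond_wait(&s->cond, &s->lock);
	assert(s->callbacks_invoked);
	free(s->percpu_stats);
	s->percpu_stats = NULL;
	pthread_mutex_unlock(&s->lock);
}
```

In this model, waiting on `callbacks_invoked` alone would recreate the window in which the per-CPU area could be freed while the handler is still using it.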
  2. 14 Mar 2019 (7 commits)
  3. 20 Feb 2019 (6 commits)
      nvme: convert to SPDX identifiers · bc50ad75
      Christoph Hellwig committed
      Update license to use SPDX-License-Identifier instead of verbose license
      text.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      nvme: return error from nvme_alloc_ns() · ab4ab09c
      Hannes Reinecke committed
      nvme_alloc_ns() might fail, so we should be returning an error code.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      nvme: avoid that deleting a controller triggers a circular locking complaint · b9c77583
      Bart Van Assche committed
      Rework nvme_delete_ctrl_sync() such that it does not have to wait for
      queued work. This patch prevents test nvme/008 from triggering the
      following complaint:
      
      WARNING: possible circular locking dependency detected
      5.0.0-rc6-dbg+ #10 Not tainted
      ------------------------------------------------------
      nvme/7918 is trying to acquire lock:
      000000009a1a7b69 ((work_completion)(&ctrl->delete_work)){+.+.}, at: __flush_work+0x379/0x410
      
      but task is already holding lock:
      00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (kn->count#389){++++}:
             lock_acquire+0xc5/0x1e0
             __kernfs_remove+0x42a/0x4a0
             kernfs_remove_by_name_ns+0x45/0x90
             remove_files.isra.1+0x3a/0x90
             sysfs_remove_group+0x5c/0xc0
             sysfs_remove_groups+0x39/0x60
             device_remove_attrs+0x68/0xb0
             device_del+0x24d/0x570
             cdev_device_del+0x1a/0x50
             nvme_delete_ctrl_work+0xbd/0xe0
             process_one_work+0x4f1/0xa40
             worker_thread+0x67/0x5b0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 ((work_completion)(&ctrl->delete_work)){+.+.}:
             __lock_acquire+0x1323/0x17b0
             lock_acquire+0xc5/0x1e0
             __flush_work+0x399/0x410
             flush_work+0x10/0x20
             nvme_delete_ctrl_sync+0x65/0x70
             nvme_sysfs_delete+0x4f/0x60
             dev_attr_store+0x3e/0x50
             sysfs_kf_write+0x87/0xa0
             kernfs_fop_write+0x186/0x240
             __vfs_write+0xd7/0x430
             vfs_write+0xfa/0x260
             ksys_write+0xab/0x130
             __x64_sys_write+0x43/0x50
             do_syscall_64+0x71/0x210
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(kn->count#389);
                                     lock((work_completion)(&ctrl->delete_work));
                                     lock(kn->count#389);
        lock((work_completion)(&ctrl->delete_work));
      
       *** DEADLOCK ***
      
      3 locks held by nvme/7918:
       #0: 00000000e2223b44 (sb_writers#6){.+.+}, at: vfs_write+0x1eb/0x260
       #1: 000000003404976f (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
       #2: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      stack backtrace:
      CPU: 4 PID: 7918 Comm: nvme Not tainted 5.0.0-rc6-dbg+ #10
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       dump_stack+0x86/0xca
       print_circular_bug.isra.36.cold.54+0x173/0x1d5
       check_prev_add.constprop.45+0x996/0x1110
       __lock_acquire+0x1323/0x17b0
       lock_acquire+0xc5/0x1e0
       __flush_work+0x399/0x410
       flush_work+0x10/0x20
       nvme_delete_ctrl_sync+0x65/0x70
       nvme_sysfs_delete+0x4f/0x60
       dev_attr_store+0x3e/0x50
       sysfs_kf_write+0x87/0xa0
       kernfs_fop_write+0x186/0x240
       __vfs_write+0xd7/0x430
       vfs_write+0xfa/0x260
       ksys_write+0xab/0x130
       __x64_sys_write+0x43/0x50
       do_syscall_64+0x71/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
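The lockdep report shows the sysfs write path holding `kn->count` while flushing `delete_work`, whose handler needs `kn->count` again to remove sysfs attributes. Below is a minimal user-space sketch of the commit's idea, assuming the synchronous path claims the controller by changing its state and then performs the teardown itself in the caller's context, so no work item is ever flushed. `struct ctrl`, `change_state_to_deleting()`, `do_delete_ctrl()` and `delete_ctrl_sync()` are hypothetical stand-ins, not the nvme driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DELETED };

/* Hypothetical controller model. */
struct ctrl {
	pthread_mutex_t lock;
	enum ctrl_state state;
};

/* Only one deleter can win the LIVE -> DELETING transition. */
static bool change_state_to_deleting(struct ctrl *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = (c->state == CTRL_LIVE);
	if (ok)
		c->state = CTRL_DELETING;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Teardown body; in the driver, sysfs attributes are removed at this stage. */
static void do_delete_ctrl(struct ctrl *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->state == CTRL_DELETING);
	c->state = CTRL_DELETED;
	pthread_mutex_unlock(&c->lock);
}

/* Synchronous delete: run the teardown in the caller's context instead of
 * queueing a work item and flushing it, so the caller never waits on work
 * that may need a lock the caller already holds. */
static int delete_ctrl_sync(struct ctrl *c)
{
	if (!change_state_to_deleting(c))
		return -EBUSY;
	do_delete_ctrl(c);
	return 0;
}
```

A second call on the same controller fails the state transition and returns `-EBUSY` instead of running the teardown twice.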
      nvme: introduce a helper function for controller deletion · a686ed75
      Bart Van Assche committed
      This patch does not change any functionality but makes the next patch
      in this series easier to read.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      nvme: unexport nvme_delete_ctrl_sync() · d84c4b02
      Bart Van Assche committed
      Since nvme_delete_ctrl_sync() is not called from any other kernel module,
      unexport it.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      nvme-multipath: round-robin I/O policy · 75c10e73
      Hannes Reinecke committed
      Implement a simple round-robin I/O policy for multipathing.  Path
      selection is done in two rounds: first iterate across all optimized
      paths, and if that doesn't return any valid path, iterate over all
      optimized and non-optimized paths.  If no paths are found, fall back to
      the existing algorithm.  Also add a sysfs attribute 'iopolicy' to switch
      between the current NUMA-aware I/O policy and the 'round-robin' I/O
      policy.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
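The two-round path selection described in the round-robin commit can be sketched as follows. `enum ana_state`, `struct path` and `rr_next_path()` are hypothetical names, and the driver's actual data structures differ; this array-based sketch only shows the search order. A return value of -1 stands for "no path found", where the caller would fall back to the existing NUMA-aware algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path states, loosely modeled on ANA states. */
enum ana_state { ANA_OPTIMIZED, ANA_NONOPTIMIZED, ANA_INACCESSIBLE };

struct path {
	enum ana_state state;
};

/*
 * Return the index of the next usable path after `old` (pass -1 if no
 * path has been used yet), or -1 if no usable path exists.
 */
static int rr_next_path(const struct path *paths, size_t n, int old)
{
	size_t start, i;

	if (n == 0)
		return -1;
	assert(old < (int)n);
	/* Begin just after the previous path; with old == -1, begin at 0. */
	start = (old < 0) ? n - 1 : (size_t)old;

	/* Round 1: optimized paths only. */
	for (i = 1; i <= n; i++) {
		size_t idx = (start + i) % n;

		if (paths[idx].state == ANA_OPTIMIZED)
			return (int)idx;
	}
	/* Round 2: optimized and non-optimized paths. */
	for (i = 1; i <= n; i++) {
		size_t idx = (start + i) % n;

		if (paths[idx].state == ANA_OPTIMIZED ||
		    paths[idx].state == ANA_NONOPTIMIZED)
			return (int)idx;
	}
	return -1;
}
```

With paths `{optimized, non-optimized, optimized}`, successive calls alternate between indices 0 and 2; the non-optimized path is only chosen once no optimized path remains.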
  4. 06 Feb 2019 (1 commit)
  5. 04 Feb 2019 (2 commits)
  6. 10 Jan 2019 (3 commits)
  7. 19 Dec 2018 (1 commit)
  8. 14 Dec 2018 (1 commit)
  9. 13 Dec 2018 (3 commits)
  10. 12 Dec 2018 (1 commit)
  11. 08 Dec 2018 (6 commits)
  12. 07 Dec 2018 (1 commit)
      nvme: validate controller state before rescheduling keep alive · 86880d64
      James Smart committed
      Delete operations are seeing NULL pointer references in call_timer_fn.
      Tracking these back, the timer appears to be the keep alive timer.
      
      nvme_keep_alive_work(), which is tied to the timer that is cancelled
      by nvme_stop_keep_alive(), simply starts the keep alive I/O but doesn't
      wait for its completion. So nvme_stop_keep_alive() only stops the timer
      when it is pending. When a keep alive is in flight, no timer is
      running, and nvme_stop_keep_alive() has no effect on the keep alive
      I/O. Thus, if the I/O completes successfully, the keep alive timer
      will be rescheduled. In the failure case, delete is called, the
      controller state is changed, nvme_stop_keep_alive() is called while
      the I/O is outstanding, and the delete path continues on. The keep
      alive happens to complete successfully before the delete paths mark it
      as aborted as part of queue termination, so the timer is restarted.
      The delete paths then tear down the controller, and later on the timer
      code fires and the timer entry is now corrupt.
      
      Fix by validating the controller state before rescheduling the keep
      alive. Testing with the fix has confirmed the condition above was hit.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
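A minimal user-space sketch of the keep-alive fix's idea: re-arm the keep alive only when the controller is still in a state where that makes sense. The names are hypothetical, `ka_scheduled` stands in for scheduling the delayed keep-alive work, and treating LIVE and CONNECTING as the valid states is an assumption that this log does not spell out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical subset of controller states. */
enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING, CTRL_DEAD };

struct ctrl {
	pthread_mutex_t lock;
	enum ctrl_state state;
	int ka_scheduled;   /* counts re-arms; stands in for delayed work */
};

static void schedule_keep_alive(struct ctrl *c)
{
	c->ka_scheduled++;
}

/* Completion handler for the keep-alive command. */
static void keep_alive_end_io(struct ctrl *c, int status)
{
	bool restart;

	assert(c != NULL);
	if (status != 0)
		return;     /* failed keep alive: do not re-arm */

	/* Read the state under the lock so that a delete which has already
	 * moved the controller out of LIVE is observed here. */
	pthread_mutex_lock(&c->lock);
	restart = (c->state == CTRL_LIVE || c->state == CTRL_CONNECTING);
	pthread_mutex_unlock(&c->lock);

	if (restart)
		schedule_keep_alive(c);
}
```

A keep alive that completes successfully after the controller has entered the deleting state is thus dropped instead of re-arming a timer on a controller that is about to be torn down.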
  13. 01 Dec 2018 (1 commit)
  14. 28 Nov 2018 (1 commit)
      nvme-pci: fix surprise removal · 751a0cc0
      Igor Konopko committed
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue.  Moments later, on the same path, nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on the admin queue and tries to access its
      hctx, which leads to the following oops:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
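The surprise-removal commit message describes the crash but not the fix. One way to express the needed guard, sketched here under that assumption, is to skip unquiescing a queue that has already been cleaned up. `struct queue`, `cleanup_queue()`, `unquiesce_queue()` and `kill_queues()` are hypothetical user-space stand-ins for blk_cleanup_queue(), blk_mq_unquiesce_queue() and nvme_kill_queues(), and `dying` models a queue that has been marked as being torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical queue model; `hctx` stands in for the hardware context. */
struct queue {
	bool dying;
	bool quiesced;
	int *hctx;
};

/* Cleanup step: mark the queue dying and free its hctx. */
static void cleanup_queue(struct queue *q)
{
	q->dying = true;
	free(q->hctx);
	q->hctx = NULL;
}

/* Unquiescing reruns the hardware queue, which dereferences hctx. */
static void unquiesce_queue(struct queue *q)
{
	assert(q->hctx != NULL);
	q->quiesced = false;
	(*q->hctx)++;
}

/* Guarded kill path: leave an absent or dying admin queue alone. */
static void kill_queues(struct queue *admin_q)
{
	if (admin_q && !admin_q->dying)
		unquiesce_queue(admin_q);
}
```

Without the `dying` check, calling `kill_queues()` after `cleanup_queue()` would dereference the freed hctx, which is the user-space analogue of the oops above.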
  15. 27 Nov 2018 (1 commit)
  16. 09 Nov 2018 (2 commits)
  17. 18 Oct 2018 (1 commit)
  18. 17 Oct 2018 (1 commit)