1. 16 Jan 2018, 1 commit
    • nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Authored by Roy Shterman
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes (see the sketch after the lockdep
      report below).
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:
      
      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock("nvme-wq");
      [  178.503716]   lock("nvme-wq");
      [  178.504601]
                      *** DEADLOCK ***
      
      [  178.505441]  May be due to missing lock nesting notation
      
      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0:  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1:  ((&ctrl->delete_work)){+.+.+.}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
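      
      For illustration, a minimal sketch of the split described above — an
      assumed shape, not the verbatim patch: reset_work is hosted on a
      dedicated nvme_reset_wq and delete_work on a dedicated nvme_delete_wq,
      so flushing reset_work from delete_work never flushes a work item on
      the workqueue delete_work itself runs on. Helper bodies are simplified.
      --
      #include <linux/workqueue.h>
      
      /* struct nvme_ctrl as in drivers/nvme/host/nvme.h,
       * carrying reset_work and delete_work members */
      static struct workqueue_struct *nvme_reset_wq;   /* hosts ctrl->reset_work */
      static struct workqueue_struct *nvme_delete_wq;  /* hosts ctrl->delete_work */
      
      static int __init nvme_core_init(void)
      {
              /* two queues, so delete_work can safely flush reset_work */
              nvme_reset_wq = alloc_workqueue("nvme-reset-wq",
                              WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
              if (!nvme_reset_wq)
                      return -ENOMEM;
      
              nvme_delete_wq = alloc_workqueue("nvme-delete-wq",
                              WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
              if (!nvme_delete_wq) {
                      destroy_workqueue(nvme_reset_wq);
                      return -ENOMEM;
              }
              return 0;
      }
      
      int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
      {
              if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
                      return -EBUSY;
              return 0;
      }
      
      int nvme_delete_ctrl(struct nvme_ctrl *ctrl)
      {
              /* delete_work flushes reset_work, which now lives elsewhere */
              if (!queue_work(nvme_delete_wq, &ctrl->delete_work))
                      return -EBUSY;
              return 0;
      }
      --
      Drivers that used to flush nvme_wq when draining queued deletes would
      flush nvme_delete_wq instead.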
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 08 Jan 2018, 1 commit
  3. 29 Nov 2017, 1 commit
    • nvme-rdma: fix memory leak during queue allocation · eb1bd249
      Authored by Max Gurtovoy
      In case the nvme_rdma_wait_for_cm timeout expires before we get
      an established or rejected event (rdma_connect succeeded) from
      rdma_cm, we end up leaking the ib transport resources for the
      dedicated queue. This scenario can easily be reproduced using a
      traffic test during port toggling.
      Also, in order to protect from parallel ib queue destruction, which
      may be invoked from different contexts, introduce a new flag that
      stands for transport readiness. While we're here, also protect
      against receiving rdma_cm events during ib queue destruction.
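      
      For illustration, a rough sketch of that readiness flag — the flag
      name, struct layout, and helper here are assumptions, not the exact
      patch: destruction bails out unless the transport resources were fully
      created, and the atomic test-and-clear lets only one of the
      possibly-parallel contexts perform the teardown.
      --
      #include <linux/bitops.h>
      #include <rdma/ib_verbs.h>
      
      enum nvme_rdma_queue_flags {
              NVME_RDMA_Q_LIVE        = 0,
              NVME_RDMA_Q_TR_READY    = 1,    /* ib transport resources allocated */
      };
      
      struct nvme_rdma_queue {
              unsigned long   flags;
              struct ib_qp    *qp;
              struct ib_cq    *ib_cq;
      };
      
      static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
      {
              /*
               * Bail out if the resources were never fully created (e.g. the
               * cm event never arrived before the timeout) or if another
               * context already tore them down; only the caller that wins
               * the test_and_clear_bit() proceeds.
               */
              if (!test_and_clear_bit(NVME_RDMA_Q_TR_READY, &queue->flags))
                      return;
      
              ib_destroy_qp(queue->qp);
              ib_free_cq(queue->ib_cq);
      }
      --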
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  4. 26 Nov 2017, 5 commits
  5. 20 Nov 2017, 1 commit
  6. 11 Nov 2017, 3 commits
  7. 01 Nov 2017, 4 commits
  8. 27 Oct 2017, 2 commits
    • nvme-rdma: add support for duplicate_connect option · 36e835f2
      Authored by James Smart
      Add support for the duplicate_connect option. Unless duplicate_connect
      is set to true, check whether there is an existing controller with the
      same target address (traddr), target port (trsvcid), and, if specified,
      host address (host_traddr), and fail the connection request if such a
      controller exists.
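      
      For illustration, a sketch of the check this implies — the helper
      names, list, and lock are assumptions following common nvme-rdma
      patterns, not the verbatim patch: walk the existing controllers and
      compare the connect options named above.
      --
      static bool nvme_rdma_same_target(struct nvmf_ctrl_options *a,
                      struct nvmf_ctrl_options *b)
      {
              /* sketch assumes traddr/trsvcid are always populated */
              if (strcmp(a->traddr, b->traddr) ||
                  strcmp(a->trsvcid, b->trsvcid))
                      return false;
              /* host_traddr only participates when it was specified */
              if ((a->mask & NVMF_OPT_HOST_TRADDR) &&
                  strcmp(a->host_traddr, b->host_traddr))
                      return false;
              return true;
      }
      
      static bool nvme_rdma_existing_controller(struct nvmf_ctrl_options *opts)
      {
              struct nvme_rdma_ctrl *ctrl;
              bool found = false;
      
              mutex_lock(&nvme_rdma_ctrl_mutex);
              list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
                      found = nvme_rdma_same_target(opts, ctrl->ctrl.opts);
                      if (found)
                              break;
              }
              mutex_unlock(&nvme_rdma_ctrl_mutex);
      
              return found;
      }
      --
      The create path would then fail early, e.g. returning -EALREADY when
      !opts->duplicate_connect && nvme_rdma_existing_controller(opts).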
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: switch controller refcounting to use struct device · d22524a4
      Authored by Christoph Hellwig
      Instead of allocating a separate struct device for the character device
      handle, embed it into struct nvme_ctrl and use it for the main controller
      refcounting.  This removes double refcounting and gets us an automatic
      reference for the character device operations.  We keep ctrl->device as a
      pointer for now to avoid changing printks all over, but in the future we
      could look into message printing helpers that take a controller structure
      similar to what other subsystems do.
      
      Note that the delete_ctrl operation now always already has a reference
      (either through sysfs due to this change, or because every open file on
      the /dev/nvme-fabrics node has a reference) when it is entered, so we
      don't need to do the unless_zero variant there.
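      
      For illustration, a minimal sketch of the embedding described above —
      field names follow the commit text, the rest is simplified: the
      controller's lifetime rides on the embedded struct device, so a get/put
      on the char-device handle is the controller reference.
      --
      #include <linux/device.h>
      
      struct nvme_ctrl {
              struct device ctrl_device;      /* embedded char-device handle */
              struct device *device;          /* kept as a pointer: &ctrl_device */
      };
      
      static inline void nvme_get_ctrl(struct nvme_ctrl *ctrl)
      {
              get_device(ctrl->device);       /* one refcount, no doubling */
      }
      
      static inline void nvme_put_ctrl(struct nvme_ctrl *ctrl)
      {
              put_device(ctrl->device);       /* final put frees the controller */
      }
      --
      With sysfs and every open file on /dev/nvme-fabrics already holding such
      a reference, delete_ctrl can take a plain nvme_get_ctrl() rather than a
      kref_get_unless_zero()-style variant.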
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
  9. 23 Oct 2017, 3 commits
    • e62a538d
    • nvme-rdma: align nvme_rdma_device structure · f87c89ad
      Authored by Max Gurtovoy
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-rdma: fix possible hang when issuing commands during ctrl removal · 7db81446
      Authored by Sagi Grimberg
      nvme_rdma_queue_is_ready() fails requests in case a queue is not
      LIVE. If the controller is in RECONNECTING state, we might be in
      this state for a long time (until we successfully reconnect) and
      we are better off failing the request fast. Otherwise, we
      fail with BLK_STS_RESOURCE to have the block layer try again
      soon.
      
      In case we are removing the controller when the admin queue
      is not LIVE, we will terminate the request with BLK_STS_RESOURCE
      but it happens before we call blk_mq_start_request() so the
      request timeout never expires, and the queue will never get
      back to LIVE (because we are removing the controller). This
      causes the removal operation to block indefinitely [1].
      
      Thus, if we are removing (state DELETING), and the queue is
      not LIVE, we need to fail the request permanently as there is
      no chance for it to ever complete successfully (a sketch of the
      resulting check follows the trace below).
      
      [1]
      --
      sysrq: SysRq : Show Blocked State
        task                        PC stack   pid father
      kworker/u66:2   D    0   440      2 0x80000000
      Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x221/0x580
       io_schedule_timeout+0x1e/0x50
       wait_for_completion_io_timeout+0x118/0x180
       blk_execute_rq+0x86/0xc0
       __nvme_submit_sync_cmd+0x89/0xf0
       nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
       nvme_shutdown_ctrl+0x41/0xe0
       nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
       nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
       nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
       process_one_work+0x1fd/0x630
       worker_thread+0x1db/0x3b0
       kthread+0x11e/0x150
       ret_from_fork+0x27/0x40
      01              D    0  2868   2862 0x00000000
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x260/0x580
       wait_for_completion+0x108/0x170
       flush_work+0x1e0/0x270
       nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
       nvme_sysfs_delete+0x2a/0x40
       dev_attr_store+0x18/0x30
       sysfs_kf_write+0x45/0x60
       kernfs_fop_write+0x124/0x1c0
       __vfs_write+0x28/0x150
       vfs_write+0xc7/0x1b0
       SyS_write+0x49/0xa0
       entry_SYSCALL_64_fastpath+0x18/0xad
      --
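      
      For illustration, a condensed sketch of the resulting decision — the
      state and status names are the kernel's, but the helper shape is
      simplified, and the real check also lets fabrics connect commands
      through on a non-LIVE queue, which is omitted here:
      --
      static blk_status_t nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue,
                      struct request *rq)
      {
              if (test_bit(NVME_RDMA_Q_LIVE, &queue->flags))
                      return BLK_STS_OK;
      
              switch (queue->ctrl->ctrl.state) {
              case NVME_CTRL_RECONNECTING:    /* may stay here for a long time */
              case NVME_CTRL_DELETING:        /* queue will never be LIVE again */
                      /* fail fast/permanently instead of requeueing forever */
                      return BLK_STS_IOERR;
              default:
                      /* transient condition: let the block layer retry soon */
                      return BLK_STS_RESOURCE;
              }
      }
      --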
      Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  10. 19 Oct 2017, 12 commits
  11. 26 Sep 2017, 2 commits
  12. 30 Aug 2017, 1 commit
  13. 29 Aug 2017, 4 commits