1. 25 Jul, 2018 1 commit
  2. 15 Jun, 2018 1 commit
  3. 11 Jun, 2018 1 commit
  4. 09 Jun, 2018 1 commit
  5. 29 May, 2018 1 commit
  6. 25 May, 2018 1 commit
  7. 12 Apr, 2018 1 commit
    • nvme: expand nvmf_check_if_ready checks · bb06ec31
      James Smart committed
      The nvmf_check_if_ready() checks that were added are very simplistic.
      As such, the routine fails a lot of I/Os during windows of reset or
      re-connection. In cases where no multipath option is present, the
      error goes back to the caller - the filesystem or application. Not
      good.
      
      The common routine was rewritten and calling syntax slightly expanded
      so that per-transport is_ready routines don't need to be present.
      The transports now call the routine directly. The routine is now a
      fabrics routine rather than an inline function.
      
      The routine now looks at controller state to decide the action to
      take. Some states mandate I/O failure; others define the conditions
      under which a command can be accepted. When the decision is unclear,
      a generic queue-or-reject check looks for failfast or multipath I/Os
      and fails the I/O only if it is so marked; otherwise the I/O is
      queued to wait for the controller state to resolve.
      
      Admin commands issued via ioctl share a live admin queue with commands
      from the transport used for controller init, so the ioctls could be
      intermixed with the initialization commands. It is possible for an
      ioctl command to be issued before the controller is enabled. To block
      this, the ioctl admin commands need to be distinguished from the admin
      commands used for controller init. Add a USERCMD
      nvme_req(req)->rq_flags bit to reflect this division and set it on
      ioctl requests. As the nvmf_check_if_ready() routine is called prior
      to nvme_setup_cmd(), ensure that the ioctl path (actually anything in
      core.c) preps nvme_req(req) before starting the I/O. This preserves
      the USERCMD flag during execution and/or retry.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      bb06ec31
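      A minimal sketch of the state-driven queue-or-reject decision described
      above. The enum values, struct fields and the exact per-state policy are
      simplified stand-ins for illustration, not the kernel's actual
      nvmf_check_if_ready() implementation:

      #include <stdbool.h>

      /* Simplified controller states and request attributes (illustrative only). */
      enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING,
                        CTRL_DELETING, CTRL_DEAD };
      enum io_verdict { IO_ACCEPT, IO_QUEUE, IO_REJECT };

      struct io_attrs {
          bool connect;   /* fabrics Connect command used for controller init */
          bool user_cmd;  /* analogue of the new USERCMD flag on ioctl admin commands */
          bool failfast;  /* REQ_FAILFAST_* style marking */
          bool multipath; /* request issued through the multipath device */
      };

      static enum io_verdict check_if_ready(enum ctrl_state state, struct io_attrs a)
      {
          switch (state) {
          case CTRL_LIVE:
              return IO_ACCEPT;               /* normal case */
          case CTRL_DELETING:
          case CTRL_DEAD:
              return IO_REJECT;               /* these states mandate failure */
          case CTRL_NEW:
          case CTRL_CONNECTING:
              if (a.connect)
                  return IO_ACCEPT;           /* init commands may proceed */
              if (a.user_cmd)
                  return IO_REJECT;           /* ioctls must wait for a live ctrl */
              /* fall through to the generic queue-or-reject check */
          case CTRL_RESETTING:
          default:
              /* Only failfast/multipath I/O is failed; everything else is
               * queued until the controller state resolves. */
              return (a.failfast || a.multipath) ? IO_REJECT : IO_QUEUE;
          }
      }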
  8. 26 Mar, 2018 4 commits
  9. 22 Feb, 2018 1 commit
  10. 14 Feb, 2018 1 commit
    • nvme-rdma: fix sysfs invoked reset_ctrl error flow · 8000d1fd
      Nitzan Carmi committed
      When a reset_controller invoked via sysfs fails, it enters an error
      flow that practically removes the nvme ctrl entirely (similar to the
      delete_ctrl flow). This causes the system to hang, since a sysfs
      attribute cannot be unregistered by one of its own methods.
      
      Fix this by scheduling delete_ctrl as a work item rather than calling
      it as sequential code. In addition, give the ctrl a chance to recover
      using the reconnection mechanism (consistent with the FC reset_ctrl
      error flow). Also, while we're here, return a suitable errno in case
      the reset ends with a non-live ctrl.
      Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      8000d1fd
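      A rough sketch of the pattern this fix describes: the sysfs store method
      never tears the controller down itself, it only schedules the delete work
      and reports an errno derived from the resulting controller state. The
      do_reset() helper, the delete_scheduled field, and the -ENETRESET choice
      are hypothetical placeholders, not the driver's actual code:

      #include <errno.h>
      #include <stdbool.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

      struct ctrl {
          enum ctrl_state state;
          bool delete_scheduled;   /* stands in for queueing ctrl->delete_work */
      };

      /* Placeholder for the actual reset; returns 0 on success. */
      static int do_reset(struct ctrl *c)
      {
          c->state = CTRL_RESETTING;
          /* ... teardown and re-establish the association ... */
          c->state = CTRL_LIVE;
          return 0;
      }

      static int reset_store(struct ctrl *c)
      {
          int ret = do_reset(c);

          if (ret) {
              /* Defer removal to a worker: a sysfs attribute cannot be
               * unregistered from within one of its own methods. */
              c->delete_scheduled = true;
              return ret;
          }
          /* Report an error if the controller did not come back live;
           * -ENETRESET is just one plausible choice of errno. */
          return c->state == CTRL_LIVE ? 0 : -ENETRESET;
      }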
  11. 09 Feb, 2018 2 commits
  12. 26 Jan, 2018 1 commit
  13. 16 Jan, 2018 1 commit
    • nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Roy Shterman committed
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes.
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:
      
      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock("nvme-wq");
      [  178.503716]   lock("nvme-wq");
      [  178.504601]
                      *** DEADLOCK ***
      
      [  178.505441]  May be due to missing lock nesting notation
      
      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0:  ("nvme-wq"){++++.+}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1:  ((&ctrl->delete_work)){+.+.+.}, at: [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b227c59b
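      An illustrative sketch of the rule this commit applies: a work item must
      not flush or cancel another work hosted on its own workqueue, so reset
      and delete are placed on separate queues. The types and helpers below are
      simplified stand-ins, not the kernel's alloc_workqueue()/flush_work():

      #include <assert.h>

      struct workqueue { const char *name; };
      struct work_item { struct workqueue *host; };

      static struct workqueue nvme_reset_wq  = { "nvme-reset-wq"  };  /* hosts reset_work  */
      static struct workqueue nvme_delete_wq = { "nvme-delete-wq" };  /* hosts delete_work */

      /* Flushing a work from a work running on the same workqueue is exactly the
       * recursive "nvme-wq" dependency lockdep reported above. */
      static void flush_work_from(struct workqueue *running_on, struct work_item *w)
      {
          assert(w->host != running_on);   /* must live on a different queue */
          /* ... wait for w to finish ... */
      }

      struct ctrl { struct work_item reset_work, delete_work; };

      /* delete_work flushes reset_work, so the two never share a workqueue. */
      static void delete_ctrl_work(struct ctrl *c)
      {
          flush_work_from(&nvme_delete_wq, &c->reset_work);
          /* ... proceed with controller teardown ... */
      }

      static void ctrl_init_works(struct ctrl *c)
      {
          c->reset_work.host  = &nvme_reset_wq;
          c->delete_work.host = &nvme_delete_wq;
      }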
  14. 08 Jan, 2018 1 commit
  15. 29 Dec, 2017 1 commit
    • nvme-rdma: fix concurrent reset and reconnect · d5bf4b7f
      Sagi Grimberg committed
      The ctrl state machine now allows a transition from RESETTING to
      RECONNECTING. In nvme-rdma, when we receive an rdma_cm DISCONNECTED
      event, we trigger nvme_rdma_error_recovery. This also happens when we
      execute a controller reset, issue a cm disconnect request and receive
      the cm disconnect reply; as a result, the reset work and the error
      recovery work can run concurrently.
      
      Until now the state machine prevented the error recovery work from
      running as a result of a controller reset (RESETTING -> RECONNECTING
      was not allowed).
      
      To fix this, we adopt the FC state machine approach: we always
      transition from LIVE to RESETTING and only then to RECONNECTING. We do
      this for both the error recovery work and the controller reset work:
      
       1. transition to RESETTING
       2. teardown the controller association
       3. transition to RECONNECTING
      
      This restores the protection against the reset work and the error
      recovery work running concurrently.
      
      Fixes: 3cec7f9d ("nvme: allow controller RESETTING to RECONNECTING transition")
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      d5bf4b7f
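      A small sketch of the transition ordering described above: whichever of
      the reset work or the error recovery work wins the LIVE -> RESETTING
      transition owns the teardown, and the loser backs off. State names and
      helpers are simplified, not the kernel's nvme_change_ctrl_state():

      #include <stdbool.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_RECONNECTING, CTRL_DELETING };

      struct ctrl { enum ctrl_state state; };

      /* Permit only the transitions this scheme relies on. */
      static bool change_state(struct ctrl *c, enum ctrl_state new_state)
      {
          bool ok = false;

          switch (new_state) {
          case CTRL_RESETTING:
              ok = (c->state == CTRL_LIVE);        /* claim the controller first */
              break;
          case CTRL_RECONNECTING:
              ok = (c->state == CTRL_RESETTING);   /* only after the teardown step */
              break;
          default:
              break;
          }
          if (ok)
              c->state = new_state;
          return ok;
      }

      /* Both the reset work and the error recovery work follow the same steps. */
      static void reset_or_recover(struct ctrl *c)
      {
          if (!change_state(c, CTRL_RESETTING))
              return;                              /* the other work already owns it */
          /* ... teardown the controller association ... */
          change_state(c, CTRL_RECONNECTING);
          /* ... re-establish the association ... */
      }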
  16. 29 Nov, 2017 1 commit
    • nvme-rdma: fix memory leak during queue allocation · eb1bd249
      Max Gurtovoy committed
      If the nvme_rdma_wait_for_cm timeout expires before we get an
      established or rejected event from rdma_cm (after rdma_connect
      succeeded), we end up leaking the IB transport resources for the
      dedicated queue. This scenario is easily reproduced by running a
      traffic test while toggling the port.
      Also, to protect against parallel IB queue destruction, which may be
      invoked from different contexts, introduce a new flag that indicates
      transport readiness. While we're here, also protect against receiving
      rdma_cm events during IB queue destruction.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      eb1bd249
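      A sketch of the readiness-flag idea: the IB resources for a queue are
      torn down exactly once, whether the teardown is triggered by the connect
      timeout, a late rdma_cm event, or normal shutdown. The flag and helpers
      are illustrative assumptions; the driver would use an atomic
      test_and_clear_bit() rather than a plain bool:

      #include <stdbool.h>

      struct rdma_queue {
          bool ib_ready;    /* set once the IB transport resources are allocated */
      };

      /* Stand-in for an atomic test_and_clear_bit(). */
      static bool test_and_clear(bool *flag)
      {
          bool was = *flag;
          *flag = false;
          return was;
      }

      static void destroy_ib_queue(struct rdma_queue *q)
      {
          /* Whoever clears the flag does the teardown; a second caller
           * (e.g. a late rdma_cm event) finds it already cleared and returns. */
          if (!test_and_clear(&q->ib_ready))
              return;
          /* ... release the QP, CQ and device reference here ... */
      }

      /* Timeout path: rdma_connect() succeeded but no established/rejected event
       * arrived in time, so the queue still owns IB resources that must be freed. */
      static void wait_for_cm_timed_out(struct rdma_queue *q)
      {
          destroy_ib_queue(q);
      }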
  17. 26 Nov, 2017 5 commits
  18. 20 Nov, 2017 1 commit
  19. 11 Nov, 2017 3 commits
  20. 01 Nov, 2017 4 commits
  21. 27 Oct, 2017 2 commits
    • nvme-rdma: add support for duplicate_connect option · 36e835f2
      James Smart committed
      Adds support for the duplicate_connect option. Unless the option is
      set to true, the connect request checks whether an existing controller
      already matches the same target address (traddr), target port
      (trsvcid) and, if specified, host address (host_traddr), and fails the
      connection request if such a controller exists.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      36e835f2
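      A hypothetical sketch of the duplicate-controller check: unless
      duplicate_connect is set, a connect request whose traddr, trsvcid and
      (if given) host_traddr match an existing controller is refused. The
      option structure and helpers are simplified stand-ins for the fabrics
      option handling, not the driver's actual code:

      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      struct conn_opts {
          const char *traddr;       /* target address        */
          const char *trsvcid;      /* target port           */
          const char *host_traddr;  /* optional host address */
          bool duplicate_connect;
      };

      static bool field_match(const char *a, const char *b)
      {
          if (!a || !b)
              return a == b;            /* both unspecified counts as equal */
          return strcmp(a, b) == 0;
      }

      static bool opts_match(const struct conn_opts *a, const struct conn_opts *b)
      {
          return field_match(a->traddr, b->traddr) &&
                 field_match(a->trsvcid, b->trsvcid) &&
                 field_match(a->host_traddr, b->host_traddr);
      }

      /* Returns true if the new connect request must be rejected. */
      static bool reject_duplicate(const struct conn_opts *new_opts,
                                   const struct conn_opts *existing, size_t count)
      {
          if (new_opts->duplicate_connect)
              return false;             /* duplicates explicitly allowed */
          for (size_t i = 0; i < count; i++)
              if (opts_match(new_opts, &existing[i]))
                  return true;
          return false;
      }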
    • nvme: switch controller refcounting to use struct device · d22524a4
      Christoph Hellwig committed
      Instead of allocating a separate struct device for the character
      device handle, embed it into struct nvme_ctrl and use it for the main
      controller refcounting. This removes the double refcounting and gets
      us an automatic reference for the character device operations. We keep
      ctrl->device as a pointer for now to avoid changing printks all over,
      but in the future we could look into message printing helpers that
      take a controller structure, similar to what other subsystems do.
      
      Note that the delete_ctrl operation now always already holds a
      reference when it is entered (either through sysfs due to this change,
      or because every open file on the /dev/nvme-fabrics node holds a
      reference), so we don't need the unless_zero variant there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      d22524a4
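      A toy sketch of the refcounting scheme described above: the device is
      embedded in the controller and the controller's lifetime is tied to that
      single refcount. The types and helpers here are simplified userspace
      stand-ins for struct device, get_device() and put_device():

      #include <stddef.h>
      #include <stdlib.h>

      struct device_sketch {
          int refcount;
          void (*release)(struct device_sketch *dev);
      };

      struct ctrl_sketch {
          struct device_sketch ctrl_device;   /* embedded: the one controller refcount */
          /* ... remaining controller state ... */
      };

      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      static void ctrl_release(struct device_sketch *dev)
      {
          struct ctrl_sketch *ctrl = container_of(dev, struct ctrl_sketch, ctrl_device);

          free(ctrl);                          /* the final put frees the controller */
      }

      static struct ctrl_sketch *ctrl_alloc(void)
      {
          struct ctrl_sketch *c = calloc(1, sizeof(*c));

          if (c) {
              c->ctrl_device.refcount = 1;     /* reference held by the creator */
              c->ctrl_device.release = ctrl_release;
          }
          return c;
      }

      static void ctrl_get(struct ctrl_sketch *c)
      {
          c->ctrl_device.refcount++;           /* kernel: get_device(ctrl->device) */
      }

      static void ctrl_put(struct ctrl_sketch *c)
      {
          if (--c->ctrl_device.refcount == 0)
              c->ctrl_device.release(&c->ctrl_device);
      }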
  22. 23 Oct, 2017 3 commits
    • e62a538d
    • nvme-rdma: align nvme_rdma_device structure · f87c89ad
      Max Gurtovoy committed
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f87c89ad
    • nvme-rdma: fix possible hang when issuing commands during ctrl removal · 7db81446
      Sagi Grimberg committed
      nvme_rdma_queue_is_ready() fails requests when a queue is not LIVE.
      If the controller is in the RECONNECTING state, we might stay there
      for a long time (until we successfully reconnect), so we are better
      off failing the request fast. Otherwise, we fail with BLK_STS_RESOURCE
      to have the block layer try again soon.
      
      If we are removing the controller while the admin queue is not LIVE,
      we terminate the request with BLK_STS_RESOURCE, but this happens
      before we call blk_mq_start_request(), so the request timeout never
      expires and the queue never gets back to LIVE (because we are removing
      the controller). This causes the removal operation to block
      indefinitely [1].
      
      Thus, if we are removing the controller (state DELETING) and the queue
      is not LIVE, we need to fail the request permanently, as there is no
      chance for it to ever complete successfully.
      
      [1]
      --
      sysrq: SysRq : Show Blocked State
        task                        PC stack   pid father
      kworker/u66:2   D    0   440      2 0x80000000
      Workqueue: nvme-wq nvme_rdma_del_ctrl_work [nvme_rdma]
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x221/0x580
       io_schedule_timeout+0x1e/0x50
       wait_for_completion_io_timeout+0x118/0x180
       blk_execute_rq+0x86/0xc0
       __nvme_submit_sync_cmd+0x89/0xf0
       nvmf_reg_write32+0x4b/0x90 [nvme_fabrics]
       nvme_shutdown_ctrl+0x41/0xe0
       nvme_rdma_shutdown_ctrl+0xca/0xd0 [nvme_rdma]
       nvme_rdma_remove_ctrl+0x2b/0x40 [nvme_rdma]
       nvme_rdma_del_ctrl_work+0x25/0x30 [nvme_rdma]
       process_one_work+0x1fd/0x630
       worker_thread+0x1db/0x3b0
       kthread+0x11e/0x150
       ret_from_fork+0x27/0x40
      01              D    0  2868   2862 0x00000000
      Call Trace:
       __schedule+0x3e9/0xb00
       schedule+0x40/0x90
       schedule_timeout+0x260/0x580
       wait_for_completion+0x108/0x170
       flush_work+0x1e0/0x270
       nvme_rdma_del_ctrl+0x5a/0x80 [nvme_rdma]
       nvme_sysfs_delete+0x2a/0x40
       dev_attr_store+0x18/0x30
       sysfs_kf_write+0x45/0x60
       kernfs_fop_write+0x124/0x1c0
       __vfs_write+0x28/0x150
       vfs_write+0xc7/0x1b0
       SyS_write+0x49/0xa0
       entry_SYSCALL_64_fastpath+0x18/0xad
      --
      Reported-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7db81446
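      A condensed sketch of the decision described above, with mocked-up
      status values standing in for blk_status_t; the key point is that a
      non-LIVE queue on a DELETING controller fails the request outright
      instead of asking the block layer to retry:

      #include <stdbool.h>

      enum blk_status_sketch { STS_OK, STS_RESOURCE, STS_IOERR };
      enum ctrl_state { CTRL_LIVE, CTRL_RECONNECTING, CTRL_DELETING };

      struct rdma_queue { bool live; };

      static enum blk_status_sketch queue_is_ready(const struct rdma_queue *q,
                                                   enum ctrl_state state,
                                                   bool is_connect_cmd)
      {
          if (q->live || is_connect_cmd)
              return STS_OK;          /* dispatch normally */
          if (state == CTRL_DELETING)
              /* Fail permanently: the controller never comes back, so
               * requeueing would block removal forever. */
              return STS_IOERR;
          return STS_RESOURCE;        /* transient: let the block layer retry soon */
      }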
  23. 19 Oct, 2017 2 commits