1. 04 June 2021, 2 commits
  2. 04 May 2021, 1 commit
  3. 22 April 2021, 1 commit
    • nvme: sanitize KATO setting · a70b81bd
      Authored by Hannes Reinecke
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      a70b81bd
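
      A minimal, self-contained sketch of the rule described above (not the
      kernel code): the next keep-alive is scheduled at half of the negotiated
      KATO so that round-trip time is absorbed, while the keep-alive request's
      own timeout matches the full KATO. The struct and helper names are
      invented for this example.

      #include <stdio.h>

      struct example_ctrl {
              unsigned int kato;      /* negotiated keep-alive timeout, seconds */
      };

      /* Send the next keep-alive at half the KATO interval. */
      static unsigned int keepalive_delay_secs(const struct example_ctrl *ctrl)
      {
              return ctrl->kato ? ctrl->kato / 2 : 0;
      }

      /* Let the keep-alive request time out only after the full KATO. */
      static unsigned int keepalive_rq_timeout_secs(const struct example_ctrl *ctrl)
      {
              return ctrl->kato;
      }

      int main(void)
      {
              struct example_ctrl ctrl = { .kato = 10 };

              printf("KATO %us: send every %us, request timeout %us\n",
                     ctrl.kato, keepalive_delay_secs(&ctrl),
                     keepalive_rq_timeout_secs(&ctrl));
              return 0;
      }
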
  4. 05 March 2021, 1 commit
    • nvme-fabrics: fix kato initialization · 32feb6de
      Authored by Martin George
      Currently kato is initialized to NVME_DEFAULT_KATO for both
      discovery & i/o controllers. This is a problem specifically
      for non-persistent discovery controllers, since they always end
      up with a non-zero kato value. Fix this by initializing kato
      to zero instead, and ensuring the various controllers are assigned
      appropriate kato values as follows:
      
      non-persistent controllers  - kato set to zero
      persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                                    (or any positive int via nvme-cli)
      i/o controllers             - kato set to NVME_DEFAULT_KATO
                                    (or any positive int via nvme-cli)
      Signed-off-by: Martin George <marting@netapp.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      32feb6de
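
      As an illustration only, a small sketch of the selection rule above; the
      constants and the helper are stand-ins (the real values live in the
      kernel's nvme headers), not the actual initialization code.

      #include <stdio.h>

      #define EXAMPLE_DISC_KATO 30    /* stand-in for NVMF_DEV_DISC_TMO */
      #define EXAMPLE_IO_KATO    5    /* stand-in for NVME_DEFAULT_KATO */

      enum ctrl_kind { DISC_NON_PERSISTENT, DISC_PERSISTENT, IO_CTRL };

      /* Pick a kato: an explicit nvme-cli value wins, otherwise use the
       * per-type default, with non-persistent discovery controllers at 0. */
      static unsigned int pick_kato(enum ctrl_kind kind, unsigned int user_kato)
      {
              if (user_kato)
                      return user_kato;
              switch (kind) {
              case DISC_NON_PERSISTENT:
                      return 0;
              case DISC_PERSISTENT:
                      return EXAMPLE_DISC_KATO;
              default:
                      return EXAMPLE_IO_KATO;
              }
      }

      int main(void)
      {
              printf("non-persistent discovery: %u\n", pick_kato(DISC_NON_PERSISTENT, 0));
              printf("persistent discovery:     %u\n", pick_kato(DISC_PERSISTENT, 0));
              printf("i/o controller:           %u\n", pick_kato(IO_CTRL, 0));
              return 0;
      }
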
  5. 10 February 2021, 1 commit
    • nvme-fabrics: avoid double completions in nvmf_fail_nonready_command · ea5e5f42
      Authored by Chao Leng
      When reconnecting, the request may be completed with
      NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently
      sets the state of the request to MQ_RQ_IN_FLIGHT before calling
      nvme_complete_rq.  When this happens for a request that is freed by
      the caller, such as nvme_submit_user_cmd, in the worst case the request
      could be completed again in the teardown process.
      
      Instead of calling blk_mq_start_request from nvmf_fail_nonready_command,
      just use the new nvme_host_path_error helper to complete the command
      without starting it.
      Signed-off-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ea5e5f42
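
      A simplified model of the fix, with invented types: the non-ready path
      no longer marks the request as started (in flight) before completing
      it, so a request that its caller frees cannot be completed a second
      time during teardown.

      #include <stdbool.h>
      #include <stdio.h>

      struct fake_request {
              bool in_flight;         /* models MQ_RQ_IN_FLIGHT */
              bool completed;
              unsigned int status;
      };

      /* New behaviour modelled here: set a host-path error and complete the
       * request without ever starting it, so teardown code that walks
       * in-flight requests can no longer complete it again. */
      static void fail_nonready(struct fake_request *rq)
      {
              rq->status = 0x370;     /* stand-in for NVME_SC_HOST_PATH_ERROR */
              rq->completed = true;   /* models nvme_host_path_error() */
      }

      int main(void)
      {
              struct fake_request rq = { 0 };

              fail_nonready(&rq);
              printf("in_flight=%d completed=%d status=0x%x\n",
                     rq.in_flight, rq.completed, rq.status);
              return 0;
      }
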
  6. 02 December 2020, 1 commit
    • nvme-fabrics: reject I/O to offline device · 8c4dfea9
      Authored by Victor Gladkov
      Commands get stuck while the host NVMe-oF controller is in the reconnect
      state.  The controller enters the reconnect state when it loses the
      connection with the target.  It tries to reconnect every 10 seconds
      (default) until a successful reconnect or until the reconnect timeout
      is reached.  The default reconnect timeout is 10 minutes.
      
      Applications are expecting commands to complete with success or error
      within a certain timeout (30 seconds by default).  The NVMe host is
      enforcing that timeout while it is connected, but during reconnect the
      timeout is not enforced and commands may get stuck for a long period or
      even forever.
      
      To fix this long delay due to the default timeout, introduce a new
      "fast_io_fail_tmo" session parameter.  The timeout is measured in
      seconds from the start of the controller reconnect, and any command
      beyond that timeout is rejected.  The new parameter value may be passed
      during 'connect'.  The default value of -1 means no timeout (similar to
      the current behavior).
      Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8c4dfea9
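
      A minimal sketch of the timeout rule, assuming invented names: once a
      controller has been in the reconnect state for longer than
      fast_io_fail_tmo seconds, new commands are rejected instead of waiting;
      -1 keeps the old wait-forever behaviour.

      #include <stdbool.h>
      #include <stdio.h>
      #include <time.h>

      struct example_ctrl {
              int fast_io_fail_tmo;           /* seconds, -1 = no timeout */
              time_t reconnect_started;       /* when reconnecting began */
      };

      static bool should_fail_fast(const struct example_ctrl *ctrl, time_t now)
      {
              if (ctrl->fast_io_fail_tmo < 0)
                      return false;
              return now - ctrl->reconnect_started > ctrl->fast_io_fail_tmo;
      }

      int main(void)
      {
              struct example_ctrl ctrl = { .fast_io_fail_tmo = 15,
                                           .reconnect_started = 0 };

              printf("t=10s: reject new commands? %d\n", should_fail_fast(&ctrl, 10));
              printf("t=20s: reject new commands? %d\n", should_fail_fast(&ctrl, 20));
              return 0;
      }
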
  7. 09 September 2020, 1 commit
    • nvme-fabrics: allow to queue requests for live queues · 73a53799
      Authored by Sagi Grimberg
      Right now we are failing requests based on the controller state (which
      is checked inline in nvmf_check_ready); however, we should definitely
      accept requests if the queue is live.
      
      When entering controller reset, we transition the controller into
      NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
      requests (which have blk_noretry_request set).
      
      This is also the case for NVME_REQ_USER, for the wrong reason. There
      shouldn't be any reason for us to reject this I/O in a controller reset.
      We do want to prevent passthru commands on the admin queue because we
      need the controller to fully initialize first before we let user
      passthru admin commands be issued.
      
      In a non-mpath setup, this means that the requests will simply be
      requeued over and over forever, not allowing the q_usage_counter to
      drop its final reference, which causes controller reset to hang if it
      runs concurrently with heavy I/O.
      
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      73a53799
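
      A simplified model of the readiness decision described above (the enum
      values and helper are stand-ins, not the kernel's __nvmf_check_ready):
      a live queue keeps accepting normal requests even while the controller
      is resetting; only connect-time commands are gated on the controller
      state.

      #include <stdbool.h>
      #include <stdio.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

      static bool queue_ready(enum ctrl_state state, bool queue_live, bool is_connect)
      {
              if (state == CTRL_LIVE)
                      return true;
              if (queue_live)         /* the fix: live queues keep accepting I/O */
                      return true;
              /* otherwise only a connect command may pass, and only while connecting */
              return is_connect && state == CTRL_CONNECTING;
      }

      int main(void)
      {
              printf("resetting, live queue, normal I/O:  %d\n",
                     queue_ready(CTRL_RESETTING, true, false));
              printf("connecting, dead queue, normal I/O: %d\n",
                     queue_ready(CTRL_CONNECTING, false, false));
              return 0;
      }
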
  8. 29 August 2020, 1 commit
  9. 29 July 2020, 1 commit
    • nvme: fix deadlock in disconnect during scan_work and/or ana_work · ecca390e
      Authored by Sagi Grimberg
      A deadlock happens in the following scenario with multipath:
      1) scan_work(nvme0) detects a new nsid while nvme0
          is an optimized path to it; path nvme1 happens to be
          inaccessible.
      
      2) Before scan_work is complete, nvme0 disconnect is initiated;
          nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING
      
      3) scan_work(1) attempts to submit IO,
          but nvme_path_is_optimized() observes nvme0 is not LIVE.
          Since nvme1 is a possible path, IO is requeued and scan_work hangs.
      
      --
      Workqueue: nvme-wq nvme_scan_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  io_schedule+0x16/0x40
      kernel:  do_read_cache_page+0x438/0x830
      kernel:  read_cache_page+0x12/0x20
      kernel:  read_dev_sector+0x27/0xc0
      kernel:  read_lba+0xc1/0x220
      kernel:  efi_partition+0x1e6/0x708
      kernel:  check_partition+0x154/0x244
      kernel:  rescan_partitions+0xae/0x280
      kernel:  __blkdev_get+0x40f/0x560
      kernel:  blkdev_get+0x3d/0x140
      kernel:  __device_add_disk+0x388/0x480
      kernel:  device_add_disk+0x13/0x20
      kernel:  nvme_mpath_set_live+0x119/0x140 [nvme_core]
      kernel:  nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
      kernel:  nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
      kernel:  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      kernel:  nvme_mpath_add_disk+0x47/0x90 [nvme_core]
      kernel:  nvme_validate_ns+0x396/0x940 [nvme_core]
      kernel:  nvme_scan_work+0x24f/0x380 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x249/0x400
      kernel:  kthread+0x104/0x140
      --
      
      4) Delete also hangs in flush_work(ctrl->scan_work)
          from nvme_remove_namespaces().
      
      Similarly, a deadlock with ana_work may happen: if ana_work has started
      and calls nvme_mpath_set_live and device_add_disk, it will
      trigger I/O. When we trigger disconnect, I/O will block because
      our accessible (optimized) path is disconnecting, but the alternate
      path is inaccessible, so I/O blocks. Then disconnect tries to flush
      the ana_work and hangs.
      
      [  605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
      [  605.552087] Call Trace:
      [  605.552683]  __schedule+0x2b9/0x6c0
      [  605.553507]  schedule+0x42/0xb0
      [  605.554201]  io_schedule+0x16/0x40
      [  605.555012]  do_read_cache_page+0x438/0x830
      [  605.556925]  read_cache_page+0x12/0x20
      [  605.557757]  read_dev_sector+0x27/0xc0
      [  605.558587]  amiga_partition+0x4d/0x4c5
      [  605.561278]  check_partition+0x154/0x244
      [  605.562138]  rescan_partitions+0xae/0x280
      [  605.563076]  __blkdev_get+0x40f/0x560
      [  605.563830]  blkdev_get+0x3d/0x140
      [  605.564500]  __device_add_disk+0x388/0x480
      [  605.565316]  device_add_disk+0x13/0x20
      [  605.566070]  nvme_mpath_set_live+0x5e/0x130 [nvme_core]
      [  605.567114]  nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
      [  605.568197]  nvme_update_ana_state+0xca/0xe0 [nvme_core]
      [  605.569360]  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      [  605.571385]  nvme_read_ana_log+0x76/0x100 [nvme_core]
      [  605.572376]  nvme_ana_work+0x15/0x20 [nvme_core]
      [  605.573330]  process_one_work+0x1db/0x380
      [  605.574144]  worker_thread+0x4d/0x400
      [  605.574896]  kthread+0x104/0x140
      [  605.577205]  ret_from_fork+0x35/0x40
      [  605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
      [  605.579239]       Tainted: G           OE     5.3.5-050305-generic #201910071830
      [  605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  605.582320] nvme            D    0 14044  14043 0x00000000
      [  605.583424] Call Trace:
      [  605.583935]  __schedule+0x2b9/0x6c0
      [  605.584625]  schedule+0x42/0xb0
      [  605.585290]  schedule_timeout+0x203/0x2f0
      [  605.588493]  wait_for_completion+0xb1/0x120
      [  605.590066]  __flush_work+0x123/0x1d0
      [  605.591758]  __cancel_work_timer+0x10e/0x190
      [  605.593542]  cancel_work_sync+0x10/0x20
      [  605.594347]  nvme_mpath_stop+0x2f/0x40 [nvme_core]
      [  605.595328]  nvme_stop_ctrl+0x12/0x50 [nvme_core]
      [  605.596262]  nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
      [  605.597333]  nvme_sysfs_delete+0x5c/0x70 [nvme_core]
      [  605.598320]  dev_attr_store+0x17/0x30
      
      Fix this by introducing a new state, NVME_CTRL_DELETING_NOIO, which
      indicates the phase of controller deletion where I/O cannot be allowed
      to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
      be issued to the bottom device, and only after we flush the ana_work
      and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
      do we change the state to NVME_CTRL_DELETING_NOIO. We also prevent
      ana_work from re-firing by aborting early if we are not LIVE, so we
      should be safe here.
      
      In addition, change the transport drivers to follow the updated state
      machine.
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Reported-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      ecca390e
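
      A small sketch of the resulting state machine, with an invented helper:
      multipath I/O is still allowed in the first DELETING phase (so
      scan_work/ana_work can make forward progress), and only the later
      DELETING_NOIO phase blocks it.

      #include <stdbool.h>
      #include <stdio.h>

      enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DELETING_NOIO, CTRL_DEAD };

      /* Can multipath I/O still reach the bottom device in this state? */
      static bool mpath_io_allowed(enum ctrl_state state)
      {
              return state == CTRL_LIVE || state == CTRL_DELETING;
      }

      int main(void)
      {
              printf("DELETING:      mpath I/O allowed? %d\n",
                     mpath_io_allowed(CTRL_DELETING));
              printf("DELETING_NOIO: mpath I/O allowed? %d\n",
                     mpath_io_allowed(CTRL_DELETING_NOIO));
              return 0;
      }
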
  10. 26 March 2020, 1 commit
  11. 12 September 2019, 1 commit
  12. 30 August 2019, 2 commits
    • nvme: make fabrics command run on a separate request queue · e7832cb4
      Authored by Sagi Grimberg
      We have a fundamental issue in that fabrics commands use the admin_q.
      The reason is that admin-connect, register reads and writes, and
      admin commands cannot be guaranteed ordering while we are running
      controller resets.
      
      For example, when we reset a controller we perform:
      1. disable the controller
      2. teardown the admin queue
      3. re-establish the admin queue
      4. enable the controller
      
      In order to perform (3), we need to unquiesce the admin queue; however,
      we may have some admin commands that are already pending on the
      quiesced admin_q and will immediately execute when we unquiesce it,
      before we execute (4). The host must not send admin commands to the
      controller before enabling the controller.
      
      To fix this, we have the fabric commands (admin connect and property
      get/set, but not I/O queue connect) use a separate fabrics_q and make
      sure to quiesce the admin_q before we disable the controller, and
      unquiesce it only after we enable the controller.
      
      This fixes the error prints from nvmet in a controller reset storm test:
      kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
      which indicate that the host is sending an admin command while the
      controller is not enabled.
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      e7832cb4
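
      A sketch of the ordering described above, using placeholder fields and
      helpers: because connect and property commands now travel through their
      own fabrics queue, the regular admin queue can stay quiesced from before
      the controller is disabled until after it is re-enabled, so no stale
      admin command can slip out early.

      #include <stdbool.h>
      #include <stdio.h>

      struct example_reset {
              bool admin_q_quiesced;  /* regular admin commands held back */
              bool ctrl_enabled;
      };

      static void reset_controller(struct example_reset *r)
      {
              r->admin_q_quiesced = true;     /* quiesce admin_q first */
              r->ctrl_enabled = false;        /* 1. disable the controller */
              /* 2./3. teardown and re-establish the admin queue: connect and
               * property get/set go through the separate fabrics queue, so
               * the quiesced admin_q is never needed here. */
              r->ctrl_enabled = true;         /* 4. enable the controller */
              r->admin_q_quiesced = false;    /* only now release admin commands */
      }

      int main(void)
      {
              struct example_reset r = { 0 };

              reset_controller(&r);
              printf("enabled=%d admin_q_quiesced=%d\n",
                     r.ctrl_enabled, r.admin_q_quiesced);
              return 0;
      }
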
    • nvme-fabrics: Add type of service (TOS) configuration · 52b4451a
      Authored by Israel Rukshin
      TOS is user-defined and needs to be configured via nvme-cli.
      It must be set before initiating any traffic and once set the TOS
      cannot be changed.
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      52b4451a
  13. 21 June 2019, 1 commit
  14. 14 May 2019, 1 commit
  15. 01 May 2019, 1 commit
  16. 20 February 2019, 2 commits
  17. 10 January 2019, 1 commit
  18. 19 December 2018, 3 commits
  19. 13 December 2018, 3 commits
  20. 08 December 2018, 1 commit
  21. 19 October 2018, 1 commit
  22. 02 October 2018, 1 commit
    • nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O · 783f4a44
      Authored by James Smart
      When an I/O is rejected by nvmf_check_ready() due to validation of the
      controller state, nvmf_fail_nonready_command() will normally return
      BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
      dying or the I/O is marked for NVMe multipath, the I/O is failed so that
      the controller can terminate or so that the I/O can be issued on a
      different path.  Unfortunately, as this reject point is before the
      transport has accepted the command, blk-mq ends up completing the I/O
      and never calls nvme_complete_rq(), which is where multipath may preserve
      or re-route the I/O. The end result is, the device user ends up seeing an
      EIO error.
      
      Example: single path connectivity, controller is under load, and a reset
      is induced.  An I/O is received:
      
        a) while the reset state has been set but the queues have yet to be
           stopped; or
        b) after queues are started (at end of reset) but before the reconnect
           has completed.
      
      The I/O finishes with an EIO status.
      
      This patch makes the following changes:
      
        - Adds the HOST_PATH_ERROR pathing status from TP4028
        - Modifies the reject point such that it appears to queue successfully,
          but actually completes the I/O with the new pathing status and calls
          nvme_complete_rq().
        - nvme_complete_rq() recognizes the new status, avoids resetting the
          controller (likely was already done in order to get this new status),
          and calls the multipather to clear the current path that errored.
          This allows the next command (retry or new command) to select a new
          path if there is one.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      783f4a44
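
      A simplified, stand-alone model of the new reject path (all names and
      the status value are stand-ins): the rejected command is finished with a
      host-path-error status so that a multipath-aware completion can clear
      the failed path and let a retry pick another one, instead of surfacing
      EIO to the user.

      #include <stdbool.h>
      #include <stdio.h>

      #define EXAMPLE_HOST_PATH_ERROR 0x370   /* illustrative stand-in for NVME_SC_HOST_PATH_ERROR */

      struct fake_req {
              unsigned int status;
              bool failed_with_eio;
              bool retried_on_other_path;
      };

      static void complete_rq(struct fake_req *rq, bool have_other_path)
      {
              if (rq->status == EXAMPLE_HOST_PATH_ERROR && have_other_path)
                      rq->retried_on_other_path = true;   /* clear path, re-route */
              else if (rq->status)
                      rq->failed_with_eio = true;         /* old behaviour */
      }

      int main(void)
      {
              struct fake_req rq = { .status = EXAMPLE_HOST_PATH_ERROR };

              complete_rq(&rq, true);
              printf("EIO=%d retried on another path=%d\n",
                     rq.failed_with_eio, rq.retried_on_other_path);
              return 0;
      }
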
  23. 08 August 2018, 1 commit
  24. 24 July 2018, 1 commit
    • nvme: if_ready checks to fail io to deleting controller · 6cdefc6e
      Authored by James Smart
      The revised if_ready checks skipped over the case of returning an error
      when the controller is being deleted.  Instead they returned BUSY, which
      caused the I/Os to retry, which in turn caused the ns delete to hang
      waiting for the I/Os to drain.
      
      Stack trace of hang looks like:
       kworker/u64:2   D    0    74      2 0x80000000
       Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
       Call Trace:
        ? __schedule+0x26d/0x820
        schedule+0x32/0x80
        blk_mq_freeze_queue_wait+0x36/0x80
        ? remove_wait_queue+0x60/0x60
        blk_cleanup_queue+0x72/0x160
        nvme_ns_remove+0x106/0x140 [nvme_core]
        nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
        nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
        process_one_work+0x160/0x350
        worker_thread+0x1c3/0x3d0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x1f/0x30
      
      Extend nvmf_fail_nonready_command() to supply the controller pointer so
      that the controller state can be examined.  Fail any I/O to a controller
      that is being deleted.
      
      Fixes: 3bc32bb1 ("nvme-fabrics: refactor queue ready check")
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      6cdefc6e
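
      A simplified model of the extended check, with invented names: the
      non-ready handler now receives the controller so it can fail I/O
      outright when the controller is being deleted, instead of returning a
      busy status that keeps requeueing and stalls namespace removal.

      #include <stdio.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING };
      enum verdict { RETRY_LATER, FAIL_NOW };

      static enum verdict fail_nonready(enum ctrl_state ctrl_state, int is_mpath)
      {
              if (ctrl_state == CTRL_DELETING || is_mpath)
                      return FAIL_NOW;        /* let deletes drain / let mpath re-route */
              return RETRY_LATER;             /* models the BLK_STS_RESOURCE requeue */
      }

      int main(void)
      {
              printf("deleting controller:  %s\n",
                     fail_nonready(CTRL_DELETING, 0) == FAIL_NOW ? "fail" : "retry");
              printf("resetting controller: %s\n",
                     fail_nonready(CTRL_RESETTING, 0) == FAIL_NOW ? "fail" : "retry");
              return 0;
      }
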
  25. 15 June 2018, 2 commits
  26. 09 June 2018, 1 commit
  27. 01 June 2018, 1 commit
  28. 25 May 2018, 4 commits
  29. 03 May 2018, 1 commit