  1. 06 January 2021 (1 commit)
  2. 02 December 2020 (5 commits)
    • nvme: export zoned namespaces without Zone Append support read-only · 2f4c9ba2
      Authored by Javier González
      Allow ZNS NVMe SSDs to present a read-only namespace when append is not
      supported, instead of rejecting the namespace directly.
      
      This allows (i) using the namespace in read-only mode, which is not a
      problem since the append command only affects the write path, and (ii)
      using standard management tools such as nvme-cli to choose a different
      format or firmware slot that is compatible with the Linux zoned block
      device.
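      A minimal sketch of the approach, assuming a ctrl->max_zone_append
      capability field and an NVME_NS_FORCE_RO namespace flag (names and the
      exact call site are illustrative, not the driver's definitive code):

        /*
         * Sketch only: if the controller cannot do Zone Append, keep the
         * namespace but force the block device read-only instead of
         * rejecting it outright.
         */
        static void nvme_zns_check_append(struct nvme_ns *ns)
        {
                if (!ns->ctrl->max_zone_append) {
                        dev_warn(ns->ctrl->device,
                                 "Zone Append unsupported, nsid %u is read-only\n",
                                 ns->head->ns_id);
                        set_bit(NVME_NS_FORCE_RO, &ns->flags);
                        set_disk_ro(ns->disk, true);
                }
        }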
      Signed-off-by: Javier González <javier.gonz@samsung.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: reject I/O to offline device · 8c4dfea9
      Authored by Victor Gladkov
      Commands get stuck while the host NVMe-oF controller is in the reconnect
      state.  The controller enters the reconnect state when it loses the
      connection to the target.  It tries to reconnect every 10 seconds
      (default) until a successful reconnect or until the reconnect timeout is
      reached.  The default reconnect timeout is 10 minutes.
      
      Applications expect commands to complete with success or error within a
      certain timeout (30 seconds by default).  The NVMe host enforces that
      timeout while it is connected, but during reconnect the timeout is not
      enforced, and commands may get stuck for a long period or even forever.
      
      To fix this long delay, introduce a new "fast_io_fail_tmo" session
      parameter.  The timeout is measured in seconds from the start of the
      controller reconnect; any command outstanding beyond that timeout is
      rejected.  The new parameter value may be passed during 'connect'.
      The default value of -1 means no timeout (matching the current behavior).
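      A rough sketch of how such a timeout can be wired up, assuming an
      opts->fast_io_fail_tmo field and a per-controller delayed work (the
      helper and flag names are illustrative):

        /* Arm a fail-fast timer when the controller starts reconnecting. */
        static void nvme_start_failfast_work(struct nvme_ctrl *ctrl)
        {
                if (!ctrl->opts || ctrl->opts->fast_io_fail_tmo == -1)
                        return;
                schedule_delayed_work(&ctrl->failfast_work,
                                      ctrl->opts->fast_io_fail_tmo * HZ);
        }

        /*
         * Once the timer fires, I/O hitting a not-ready queue is failed
         * instead of being requeued until the reconnect succeeds.
         */
        static void nvme_failfast_work(struct work_struct *work)
        {
                struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
                                struct nvme_ctrl, failfast_work);

                set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
                nvme_kick_requeue_lists(ctrl);
        }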
      Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: split nvme_alloc_request() · 39dfe844
      Authored by Chaitanya Kulkarni
      Right now nvme_alloc_request() allocates a request from the block layer
      based on the value of the qid: when qid is set to NVME_QID_ANY it uses
      blk_mq_alloc_request(), otherwise blk_mq_alloc_request_hctx().
      
      The function nvme_alloc_request() is called from several contexts; the
      only place that uses a non-NVME_QID_ANY value is the fabrics connect
      commands:
      
      nvme_submit_sync_cmd()		NVME_QID_ANY
      nvme_features()			NVME_QID_ANY
      nvme_sec_submit()		NVME_QID_ANY
      nvmf_reg_read32()		NVME_QID_ANY
      nvmf_reg_read64()		NVME_QID_ANY
      nvmf_reg_write32()		NVME_QID_ANY
      nvmf_connect_admin_queue()	NVME_QID_ANY
      nvme_submit_user_cmd()		NVME_QID_ANY
      	nvme_alloc_request()
      nvme_keep_alive()		NVME_QID_ANY
      	nvme_alloc_request()
      nvme_timeout()			NVME_QID_ANY
      	nvme_alloc_request()
      nvme_delete_queue()		NVME_QID_ANY
      	nvme_alloc_request()
      nvmet_passthru_execute_cmd()	NVME_QID_ANY
      	nvme_alloc_request()
      nvmf_connect_io_queue() 	QID
      	__nvme_submit_sync_cmd()
      		nvme_alloc_request()
      
      With passthru, nvme_alloc_request() now falls into the I/O fast path,
      where blk_mq_alloc_request_hctx() never gets called, so the qid check
      only adds an extra branch to the fast path.
      
      Split nvme_alloc_request() into nvme_alloc_request() and
      nvme_alloc_request_qid().
      
      Replace each call to nvme_alloc_request() that passed NVME_QID_ANY with a
      call to the new nvme_alloc_request(), which takes no qid parameter.
      
      Replace the call to nvme_alloc_request() that passed a real QID with a
      call to either the new nvme_alloc_request() or nvme_alloc_request_qid(),
      depending on the qid value set in __nvme_submit_sync_cmd().
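      A sketch of what the split can look like (simplified; assumes small
      nvme_req_op()/nvme_init_request() helpers and omits the flag handling of
      the real driver):

        /* Fast path: no qid, plain blk_mq_alloc_request(). */
        struct request *nvme_alloc_request(struct request_queue *q,
                        struct nvme_command *cmd, blk_mq_req_flags_t flags)
        {
                struct request *req;

                req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags);
                if (!IS_ERR(req))
                        nvme_init_request(req, cmd);
                return req;
        }

        /* Slow path: fabrics connect, tied to a specific hardware queue. */
        struct request *nvme_alloc_request_qid(struct request_queue *q,
                        struct nvme_command *cmd, blk_mq_req_flags_t flags,
                        int qid)
        {
                struct request *req;

                req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags,
                                qid ? qid - 1 : 0);
                if (!IS_ERR(req))
                        nvme_init_request(req, cmd);
                return req;
        }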
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: use consistent macro name for timeout · dc96f938
      Authored by Chaitanya Kulkarni
      This is purely a cleanup patch: add the NVME prefix to ADMIN_TIMEOUT to
      make it consistent with NVME_IO_TIMEOUT.
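      In effect the change is just a rename (definition shown for
      illustration only):

        /* Before */
        #define ADMIN_TIMEOUT           (admin_timeout * HZ)
        /* After: prefixed for consistency with NVME_IO_TIMEOUT */
        #define NVME_ADMIN_TIMEOUT      (admin_timeout * HZ)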
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: simplify nvme_req_qid() · 84115d6d
      Authored by Baolin Wang
      Use the request's '->mq_hctx->queue_num' directly to simplify the
      nvme_req_qid() function.
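      The simplified helper can look roughly like this (the queuedata check
      keeps the admin queue reporting qid 0; sketch only):

        static inline u16 nvme_req_qid(struct request *req)
        {
                if (!req->q->queuedata)
                        return 0;       /* admin queue */
                return req->mq_hctx->queue_num + 1;
        }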
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 14 November 2020 (1 commit)
  4. 03 November 2020 (1 commit)
  5. 22 October 2020 (1 commit)
  6. 07 October 2020 (2 commits)
  7. 27 September 2020 (2 commits)
  8. 22 September 2020 (1 commit)
  9. 09 September 2020 (1 commit)
    • nvme: Revert: Fix controller creation races with teardown flow · b63de840
      Authored by James Smart
      The indicated patch introduced a barrier in the sysfs_delete attribute
      for the controller that rejects the request if the controller isn't
      created. "Created" is defined as at least 1 call to nvme_start_ctrl().
      
      This is problematic in error-injection testing.  If an error occurs on
      the initial attempt to create an association and the controller enters
      its reconnect attempts, the admin cannot delete the controller until
      either a successful association is created or ctrl_loss_tmo times out.
      
      This issue is particularly hurtful when the "admin" is nvme-cli
      connecting to a discovery controller via auto-connect scripts.  With the
      FC transport, if the first connection attempt fails, the controller
      enters a normal reconnect state but returns control to the cli thread
      that created the controller.  In this scenario, the cli attempts to read
      the discovery log via ioctl, which fails, causing the cli to see an
      empty log and proceed to delete the discovery controller.  The delete is
      rejected and the controller is left live.  If the discovery controller
      reconnect then succeeds, there is no action to delete it, and it sits
      live doing nothing.
      
      Cc: <stable@vger.kernel.org> # v5.7+
      Fixes: ce151813 ("nvme: Fix controller creation races with teardown flow")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      CC: Israel Rukshin <israelr@mellanox.com>
      CC: Max Gurtovoy <maxg@mellanox.com>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Keith Busch <kbusch@kernel.org>
      CC: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  10. 02 September 2020 (2 commits)
  11. 29 August 2020 (1 commit)
  12. 22 August 2020 (3 commits)
  13. 29 July 2020 (8 commits)
    • nvme: add a Identify Namespace Identification Descriptor list quirk · 5bedd3af
      Authored by Christoph Hellwig
      Add a quirk for a device that does not support the Identify Namespace
      Identification Descriptor list despite claiming 1.3 compliance.
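      The usual pattern for such a quirk is a bit in the controller quirks
      mask that is checked before issuing the Identify; the flag and helper
      names below are illustrative:

        /* Sketch: should we ask for the NS Identification Descriptor list? */
        static bool nvme_ns_descs_supported(struct nvme_ctrl *ctrl)
        {
                if (ctrl->quirks & NVME_QUIRK_NO_NS_DESC_LIST)
                        return false;                   /* quirked device */
                return ctrl->vs >= NVME_VS(1, 3, 0);    /* mandatory since 1.3 */
        }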
      
      Fixes: ea43d970 ("nvme: fix identify error status silent ignore")
      Reported-by: Ingo Brunberg <ingo_brunberg@web.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ingo Brunberg <ingo_brunberg@web.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme: export nvme_find_get_ns() and nvme_put_ns() · 24493b8b
      Authored by Logan Gunthorpe
      nvme_find_get_ns() and nvme_put_ns() are required by the target passthru
      code and are exported under the NVME_TARGET_PASSTHRU namespace.
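      The namespaced export uses the standard EXPORT_SYMBOL_NS_GPL()
      mechanism, and users of the symbols declare the import; roughly:

        EXPORT_SYMBOL_NS_GPL(nvme_find_get_ns, NVME_TARGET_PASSTHRU);
        EXPORT_SYMBOL_NS_GPL(nvme_put_ns, NVME_TARGET_PASSTHRU);

        /* In the nvmet passthru module that consumes them: */
        MODULE_IMPORT_NS(NVME_TARGET_PASSTHRU);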
      Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: introduce nvme_ctrl_get_by_path() · f783f444
      Authored by Logan Gunthorpe
      nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it
      gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0).
      It makes use of filp_open() to open the file and uses the private
      data to obtain a pointer to the struct nvme_ctrl. If the fops of the
      file do not match, -EINVAL is returned.
      
      The purpose of this function is to support NVMe-OF target passthru
      and is exported under the NVME_TARGET_PASSTHRU namespace.
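      A simplified sketch of the helper as described (error handling trimmed;
      assumes the char dev's fops are nvme_dev_fops and that its private_data
      points at the struct nvme_ctrl):

        struct nvme_ctrl *nvme_ctrl_get_by_path(const char *path)
        {
                struct nvme_ctrl *ctrl;
                struct file *f;

                f = filp_open(path, O_RDWR, 0);
                if (IS_ERR(f))
                        return ERR_CAST(f);

                if (f->f_op != &nvme_dev_fops) {
                        ctrl = ERR_PTR(-EINVAL);
                } else {
                        ctrl = f->private_data;
                        nvme_get_ctrl(ctrl);    /* reference for the caller */
                }

                filp_close(f, NULL);
                return ctrl;
        }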
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: introduce nvme_execute_passthru_rq to call nvme_passthru_[start|end]() · 17365ae6
      Authored by Logan Gunthorpe
      Introduce a new nvme_execute_passthru_rq() helper which calls
      nvme_passthru_[start|end]() around blk_execute_rq(). This ensures
      all passthru calls (including nvme_submit_io()) will be wrapped
      appropriately.
      
      nvme_execute_passthru_rq() will also be useful for the nvmet passthru
      code and is exported in the NVME_TARGET_PASSTHRU namespace.
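      A sketch of the wrapper, assuming the nvme_passthru_start() and
      nvme_passthru_end() helpers described in this series (simplified):

        void nvme_execute_passthru_rq(struct request *rq)
        {
                struct nvme_command *cmd = nvme_req(rq)->cmd;
                struct nvme_ctrl *ctrl = nvme_req(rq)->ctrl;
                struct nvme_ns *ns = rq->q->queuedata;
                struct gendisk *disk = ns ? ns->disk : NULL;
                u32 effects;

                effects = nvme_passthru_start(ctrl, ns, cmd->common.opcode);
                blk_execute_rq(rq->q, disk, rq, 0);
                nvme_passthru_end(ctrl, effects);
        }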
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: create helper function to obtain command effects · df21b6b1
      Authored by Logan Gunthorpe
      Separate the code that obtains command effects from the code that
      starts a passthru request, and move the nvme_passthru_start() and
      nvme_passthru_end() functions up above nvme_submit_user_cmd() so that
      they may be used in a new helper in a subsequent patch.
      
      The new helper function will be necessary for nvmet passthru
      code to determine if we need to change out of interrupt context
      to handle the effects. It is exported in the NVME_TARGET_PASSTHRU
      namespace.
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: fix deadlock in disconnect during scan_work and/or ana_work · ecca390e
      Authored by Sagi Grimberg
      A deadlock happens in the following scenario with multipath:
      1) scan_work(nvme0) detects a new nsid while nvme0
          is an optimized path to it, path nvme1 happens to be
          inaccessible.
      
      2) Before scan_work is complete, an nvme0 disconnect is initiated;
          nvme_delete_ctrl_sync() sets the nvme0 state to NVME_CTRL_DELETING.
      
      3) scan_work(1) attempts to submit IO,
          but nvme_path_is_optimized() observes nvme0 is not LIVE.
          Since nvme1 is a possible path, the IO is requeued and scan_work hangs.
      
      --
      Workqueue: nvme-wq nvme_scan_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  io_schedule+0x16/0x40
      kernel:  do_read_cache_page+0x438/0x830
      kernel:  read_cache_page+0x12/0x20
      kernel:  read_dev_sector+0x27/0xc0
      kernel:  read_lba+0xc1/0x220
      kernel:  efi_partition+0x1e6/0x708
      kernel:  check_partition+0x154/0x244
      kernel:  rescan_partitions+0xae/0x280
      kernel:  __blkdev_get+0x40f/0x560
      kernel:  blkdev_get+0x3d/0x140
      kernel:  __device_add_disk+0x388/0x480
      kernel:  device_add_disk+0x13/0x20
      kernel:  nvme_mpath_set_live+0x119/0x140 [nvme_core]
      kernel:  nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
      kernel:  nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
      kernel:  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      kernel:  nvme_mpath_add_disk+0x47/0x90 [nvme_core]
      kernel:  nvme_validate_ns+0x396/0x940 [nvme_core]
      kernel:  nvme_scan_work+0x24f/0x380 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x249/0x400
      kernel:  kthread+0x104/0x140
      --
      
      4) Delete also hangs in flush_work(ctrl->scan_work)
          from nvme_remove_namespaces().
      
      Similarly, a deadlock with ana_work may happen: if ana_work has started
      and calls nvme_mpath_set_live and device_add_disk, it will
      trigger I/O. When we trigger disconnect, I/O will block because
      our accessible (optimized) path is disconnecting, while the alternate
      path is inaccessible, so I/O blocks. Then disconnect tries to flush
      the ana_work and hangs.
      
      [  605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
      [  605.552087] Call Trace:
      [  605.552683]  __schedule+0x2b9/0x6c0
      [  605.553507]  schedule+0x42/0xb0
      [  605.554201]  io_schedule+0x16/0x40
      [  605.555012]  do_read_cache_page+0x438/0x830
      [  605.556925]  read_cache_page+0x12/0x20
      [  605.557757]  read_dev_sector+0x27/0xc0
      [  605.558587]  amiga_partition+0x4d/0x4c5
      [  605.561278]  check_partition+0x154/0x244
      [  605.562138]  rescan_partitions+0xae/0x280
      [  605.563076]  __blkdev_get+0x40f/0x560
      [  605.563830]  blkdev_get+0x3d/0x140
      [  605.564500]  __device_add_disk+0x388/0x480
      [  605.565316]  device_add_disk+0x13/0x20
      [  605.566070]  nvme_mpath_set_live+0x5e/0x130 [nvme_core]
      [  605.567114]  nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
      [  605.568197]  nvme_update_ana_state+0xca/0xe0 [nvme_core]
      [  605.569360]  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      [  605.571385]  nvme_read_ana_log+0x76/0x100 [nvme_core]
      [  605.572376]  nvme_ana_work+0x15/0x20 [nvme_core]
      [  605.573330]  process_one_work+0x1db/0x380
      [  605.574144]  worker_thread+0x4d/0x400
      [  605.574896]  kthread+0x104/0x140
      [  605.577205]  ret_from_fork+0x35/0x40
      [  605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
      [  605.579239]       Tainted: G           OE     5.3.5-050305-generic #201910071830
      [  605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  605.582320] nvme            D    0 14044  14043 0x00000000
      [  605.583424] Call Trace:
      [  605.583935]  __schedule+0x2b9/0x6c0
      [  605.584625]  schedule+0x42/0xb0
      [  605.585290]  schedule_timeout+0x203/0x2f0
      [  605.588493]  wait_for_completion+0xb1/0x120
      [  605.590066]  __flush_work+0x123/0x1d0
      [  605.591758]  __cancel_work_timer+0x10e/0x190
      [  605.593542]  cancel_work_sync+0x10/0x20
      [  605.594347]  nvme_mpath_stop+0x2f/0x40 [nvme_core]
      [  605.595328]  nvme_stop_ctrl+0x12/0x50 [nvme_core]
      [  605.596262]  nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
      [  605.597333]  nvme_sysfs_delete+0x5c/0x70 [nvme_core]
      [  605.598320]  dev_attr_store+0x17/0x30
      
      Fix this by introducing a new state, NVME_CTRL_DELETING_NOIO, which
      indicates the phase of controller deletion where I/O cannot be allowed
      to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
      be issued to the bottom device, and only after we flush the ana_work
      and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
      do we change the state to NVME_CTRL_DELETING_NOIO. We also prevent
      ana_work from re-firing by aborting early if we are not LIVE, so we
      should be safe here.
      
      In addition, change the transport drivers to follow the updated state
      machine.
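      Conceptually the new state slots into the controller state enum between
      DELETING and DEAD (sketch; the allowed transitions live in
      nvme_change_ctrl_state()):

        enum nvme_ctrl_state {
                NVME_CTRL_NEW,
                NVME_CTRL_LIVE,
                NVME_CTRL_RESETTING,
                NVME_CTRL_CONNECTING,
                NVME_CTRL_DELETING,
                NVME_CTRL_DELETING_NOIO,  /* flushes done, no more I/O allowed */
                NVME_CTRL_DEAD,
        };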
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Reported-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: document nvme controller states · 4212f4e9
      Authored by Sagi Grimberg
      We are starting to see some non-trivial states, so let's start
      documenting them.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-core: replace ctrl page size with a macro · 6c3c05b0
      Authored by Chaitanya Kulkarni
      Saving the nvme controller's page size was from a time when the driver
      tried to use different sized pages, but this value is always set to
      a constant, and has been this way for some time. Remove the 'page_size'
      field and replace its usage with the constant value.
      
      This also lets the compiler make some micro-optimizations in the I/O
      path, and that's always a good thing.
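      The constant in question (sketch; the shift reflects the 4 KiB
      controller page size the driver has effectively always used):

        #define NVME_CTRL_PAGE_SHIFT    12
        #define NVME_CTRL_PAGE_SIZE     (1 << NVME_CTRL_PAGE_SHIFT)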
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  14. 16 July 2020 (1 commit)
  15. 08 July 2020 (3 commits)
  16. 01 July 2020 (1 commit)
  17. 25 June 2020 (1 commit)
    • nvme-multipath: fix deadlock due to head->lock · d8a22f85
      Authored by Anton Eidelman
      In the following scenario scan_work and ana_work will deadlock:
      
      When scan_work calls nvme_mpath_add_disk() it holds ana_lock
      and invokes nvme_parse_ana_log(), which may issue IO
      in device_add_disk() and hang waiting for an accessible path.
      
      While nvme_mpath_set_live() is only called when nvme_state_is_live(),
      a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO.
      
      Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on
      ANY ctrl will not be able to complete nvme_mpath_set_live()
      on the same ns->head, which is required in order to update
      the new accessible path and remove NVME_NS_ANA_PENDING.
      Therefore the IO never completes: deadlock [1].
      
      Fix:
      Move device_add_disk out of the head->lock and protect it with an
      atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit.
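      A sketch of the fix as described, assuming an NVME_NS_HEAD_HAS_DISK bit
      in head->flags (simplified; the ANA/path bookkeeping that follows is
      omitted):

        static void nvme_mpath_set_live(struct nvme_ns *ns)
        {
                struct nvme_ns_head *head = ns->head;

                if (!head->disk)
                        return;

                /* Add the disk outside head->lock, exactly once per head. */
                if (!test_and_set_bit(NVME_NS_HEAD_HAS_DISK, &head->flags))
                        device_add_disk(&head->subsys->dev, head->disk,
                                        nvme_ns_id_attr_groups);

                /* ...then update the current path and clear NVME_NS_ANA_PENDING. */
        }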
      
      [1]:
      kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds.
      kernel:       Tainted: G           OE     5.3.5-050305-generic #201910071830
      kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kernel: kworker/u8:2    D    0   160      2 0x80004000
      kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  schedule_preempt_disabled+0xe/0x10
      kernel:  __mutex_lock.isra.0+0x182/0x4f0
      kernel:  __mutex_lock_slowpath+0x13/0x20
      kernel:  mutex_lock+0x2e/0x40
      kernel:  nvme_update_ns_ana_state+0x22/0x60 [nvme_core]
      kernel:  nvme_update_ana_state+0xca/0xe0 [nvme_core]
      kernel:  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      kernel:  nvme_read_ana_log+0x76/0x100 [nvme_core]
      kernel:  nvme_ana_work+0x15/0x20 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x4d/0x400
      kernel:  kthread+0x104/0x140
      kernel:  ret_from_fork+0x35/0x40
      kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds.
      kernel:       Tainted: G           OE     5.3.5-050305-generic #201910071830
      kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kernel: kworker/u8:4    D    0   439      2 0x80004000
      kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  io_schedule+0x16/0x40
      kernel:  do_read_cache_page+0x438/0x830
      kernel:  read_cache_page+0x12/0x20
      kernel:  read_dev_sector+0x27/0xc0
      kernel:  read_lba+0xc1/0x220
      kernel:  efi_partition+0x1e6/0x708
      kernel:  check_partition+0x154/0x244
      kernel:  rescan_partitions+0xae/0x280
      kernel:  __blkdev_get+0x40f/0x560
      kernel:  blkdev_get+0x3d/0x140
      kernel:  __device_add_disk+0x388/0x480
      kernel:  device_add_disk+0x13/0x20
      kernel:  nvme_mpath_set_live+0x119/0x140 [nvme_core]
      kernel:  nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
      kernel:  nvme_mpath_add_disk+0xbe/0x100 [nvme_core]
      kernel:  nvme_validate_ns+0x396/0x940 [nvme_core]
      kernel:  nvme_scan_work+0x256/0x390 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x4d/0x400
      kernel:  kthread+0x104/0x140
      kernel:  ret_from_fork+0x35/0x40
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  18. 24 June 2020 (2 commits)
  19. 05 June 2020 (1 commit)
  20. 27 May 2020 (2 commits)