1. 03 Aug, 2022 (6 commits)
    • nvme: define compat_ioctl again to unbreak 32-bit userspace. · a25d4261
      Nick Bowler authored
      Commit 89b3d6e6 ("nvme: simplify the compat ioctl handling") removed
      the initialization of compat_ioctl from the nvme block_device_operations
      structures.
      
      Presumably the expectation was that 32-bit ioctls would be directed
      through the regular handler but this is not the case: failing to assign
      .compat_ioctl actually means that the compat case is disabled entirely,
      and any attempt to submit nvme ioctls from 32-bit userspace fails
      outright with -ENOTTY.
      
      For example:
      
        % smartctl -x /dev/nvme0n1
        [...]
        Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Inappropriate ioctl for device
      
      The blkdev_compat_ptr_ioctl helper can be used to direct compat calls
      through the main ioctl handler and makes things work again.
      
      Fixes: 89b3d6e6 ("nvme: simplify the compat ioctl handling")
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Reviewed-by: Guixin Liu <kanie@linux.alibaba.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
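The failure mode is easy to model in plain C: when the compat entry point is left NULL, the compat path returns -ENOTTY before the main handler is ever consulted. Below is a simplified userspace sketch of that dispatch logic; the struct and function names mirror the kernel ones but are stand-ins, and nvme_ioctl here stands in for both the main handler and blkdev_compat_ptr_ioctl.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of the kernel's compat ioctl dispatch: if the
 * block_device_operations has no .compat_ioctl, the compat syscall
 * path fails with -ENOTTY, regardless of what .ioctl would do. */
struct block_device_operations {
    int (*ioctl)(unsigned int cmd);
    int (*compat_ioctl)(unsigned int cmd);
};

static int nvme_ioctl(unsigned int cmd)
{
    return 0;  /* pretend the command was handled */
}

static int compat_dispatch(const struct block_device_operations *ops,
                           unsigned int cmd)
{
    if (!ops->compat_ioctl)
        return -ENOTTY;          /* the bug: 32-bit userspace sees this */
    return ops->compat_ioctl(cmd);
}

/* Before the fix: .compat_ioctl was never initialized. */
static const struct block_device_operations broken = {
    .ioctl = nvme_ioctl,
};

/* After the fix: blkdev_compat_ptr_ioctl (modelled here by reusing
 * nvme_ioctl) forwards compat calls to the main ioctl handler. */
static const struct block_device_operations fixed = {
    .ioctl        = nvme_ioctl,
    .compat_ioctl = nvme_ioctl,
};
```

With `broken`, compat_dispatch() returns -ENOTTY, matching the smartctl failure above; with `fixed`, the call reaches the main handler.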
    • nvme-multipath: refactor nvme_mpath_add_disk · c13cf14f
      Joel Granados authored
      Pass anagrpid as the second argument. This is a prep patch that allows
      reusing this function to support unknown command sets.
      Signed-off-by: Joel Granados <j.granados@samsung.com>
      Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: implement In-Band authentication · f50fff73
      Hannes Reinecke authored
      Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006.
      This patch adds two new fabric options 'dhchap_secret' to specify the
      pre-shared key (in ASCII representation according to NVMe 2.0 section
      8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify
      the pre-shared controller key for bi-directional authentication of both
      the host and the controller.
      Re-authentication can be triggered by writing the PSK into the new
      controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [axboe: fold in clang build fix]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: fix qid param blk_mq_alloc_request_hctx · b10907b8
      Chaitanya Kulkarni authored
      The only caller of __nvme_submit_sync_cmd() with a qid value not equal
      to NVME_QID_ANY is nvmf_connect_io_queues(), where the qid value is
      always set to > 0.
      
      [1] __nvme_submit_sync_cmd() callers with a qid parameter:
      
              Caller                  |   qid parameter
      ------------------------------------------------------
      * nvme_fc_connect_io_queues()   |
         nvmf_connect_io_queue()      |      qid > 0
      * nvme_rdma_start_io_queues()   |
         nvme_rdma_start_queue()      |
          nvmf_connect_io_queues()    |      qid > 0
      * nvme_tcp_start_io_queues()    |
         nvme_tcp_start_queue()       |
          nvmf_connect_io_queues()    |      qid > 0
      * nvme_loop_connect_io_queues() |
         nvmf_connect_io_queues()     |      qid > 0
      
      When the qid parameter of __nvme_submit_sync_cmd() is > 0, as in all
      the callers above, we use blk_mq_alloc_request_hctx(), whose last
      argument is computed with a conditional operator that only falls back
      to 0 when qid is 0; see line 1002:
      
      991 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
       992                 union nvme_result *result, void *buffer, unsigned bufflen,
       993                 int qid, int at_head, blk_mq_req_flags_t flags)
       994 {
       995         struct request *req;
       996         int ret;
       997
       998         if (qid == NVME_QID_ANY)
       999                 req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags);
      1000         else
      1001                 req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags,
      1002                                                 qid ? qid - 1 : 0);
      1003
      
      But the qid parameter of __nvme_submit_sync_cmd() will never be 0 in
      the caller list above (see [1]), and all the other callers of
      __nvme_submit_sync_cmd() use NVME_QID_ANY as the qid value:
      1. nvme_submit_sync_cmd()
      2. nvme_features()
      3. nvme_sec_submit()
      4. nvmf_reg_read32()
      5. nvmf_reg_read64()
      6. nvmf_reg_write32()
      7. nvmf_connect_admin_queue()
      
      Remove the conditional operator and pass qid - 1 directly in the call
      to blk_mq_alloc_request_hctx().
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
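Since qid is always > 0 when blk_mq_alloc_request_hctx() is reached, the ternary in line 1002 is equivalent to a plain qid - 1. A minimal standalone check of that equivalence (the helper names are invented for illustration):

```c
/* hctx index computation before and after the patch: for any qid > 0
 * the two expressions agree, and qid == 0 never reaches this path. */
static int hctx_idx_old(int qid)
{
    return qid ? qid - 1 : 0;
}

static int hctx_idx_new(int qid)
{
    return qid - 1;
}
```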
    • nvme: remove unused timeout parameter · 6b46fa02
      Chaitanya Kulkarni authored
      The function __nvme_submit_sync_cmd() has the following list of
      callers, all of which set the timeout value to 0:
      
              Callers               |   Timeout value
      ------------------------------------------------
      nvme_submit_sync_cmd()        |        0
      nvme_features()               |        0
      nvme_sec_submit()             |        0
      nvmf_reg_read32()             |        0
      nvmf_reg_read64()             |        0
      nvmf_reg_write32()            |        0
      nvmf_connect_admin_queue()    |        0
      nvmf_connect_io_queue()       |        0
      
      Remove the timeout parameter from __nvme_submit_sync_cmd() and adjust
      the rest of the code accordingly.
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: handle the persistent internal error AER · 2c61c97f
      Michael Kelley authored
      In the NVM Express Revision 1.4 spec, Figure 145 describes possible
      values for an AER with event type "Error" (value 000b). For a
      Persistent Internal Error (value 03h), the host should perform a
      controller reset.
      
      Add support for this error using code that already exists for
      doing a controller reset. As part of this support, introduce
      two utility functions for parsing the AER type and subtype.
      
      This new support was tested in a lab environment where we can
      generate the persistent internal error on demand, and observe
      both the Linux side and NVMe controller side to see that the
      controller reset has been done.
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
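In the AER completion result, bits 2:0 carry the event type and bits 15:8 the event information (subtype). A userspace sketch of what two such parsing helpers might look like, with constant names modelled on, but not guaranteed to match, the kernel's:

```c
#include <stdint.h>

/* AER completion dword 0 layout (NVMe 1.4):
 *   bits 02:00  Asynchronous Event Type
 *   bits 15:08  Asynchronous Event Information (subtype)
 *   bits 23:16  Log Page Identifier
 */
#define NVME_AER_ERROR                  0x0
#define NVME_AER_ERROR_PERSIST_INT_ERR  0x03

static uint32_t nvme_aer_type(uint32_t result)
{
    return result & 0x7;
}

static uint32_t nvme_aer_subtype(uint32_t result)
{
    return (result >> 8) & 0xff;
}
```

A result of 0x0300 then decodes as an Error event carrying the Persistent Internal Error subtype, which is the case that now triggers a controller reset.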
  2. 14 Jul, 2022 (1 commit)
  3. 06 Jul, 2022 (1 commit)
  4. 29 Jun, 2022 (1 commit)
    • nvme: fix regression when disconnecting a recovering ctrl · f7f70f4a
      Ruozhu Li authored
      We encountered a problem where the disconnect command hangs.
      After analyzing the log and stack, we found that the triggering
      process is as follows:
      CPU0                          CPU1
                                      nvme_rdma_error_recovery_work
                                        nvme_rdma_teardown_io_queues
      nvme_do_delete_ctrl                 nvme_stop_queues
        nvme_remove_namespaces
        --clear ctrl->namespaces
                                          nvme_start_queues
                                          --no ns in ctrl->namespaces
          nvme_ns_remove                  return(because ctrl is deleting)
            blk_freeze_queue
              blk_mq_freeze_queue_wait
      --wait for ns to unquiesce to clean inflight IO, hangs forever
      
      This problem was not seen in older kernels because we flush the err
      work in nvme_stop_ctrl before nvme_remove_namespaces. The change does
      not seem to have been made for functional reasons, so the patch can be
      reverted to solve the problem.
      
      Revert commit 794a4cb3 ("nvme: remove the .stop_ctrl callout")
      Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 28 Jun, 2022 (2 commits)
  6. 23 Jun, 2022 (1 commit)
  7. 14 Jun, 2022 (2 commits)
  8. 31 May, 2022 (1 commit)
    • nvme: set controller enable bit in a separate write · aa41d2fe
      Niklas Cassel authored
      The NVM Express Base Specification 2.0 specifies in the description
      of the CC – Controller Configuration register:
      "Host software shall set the Arbitration Mechanism Selected (CC.AMS),
      the Memory Page Size (CC.MPS), and the I/O Command Set Selected (CC.CSS)
      to valid values prior to enabling the controller by setting CC.EN to
      ‘1’."
      
      While we haven't seen any controller misbehaving while setting all bits
      in a single write, let's do it in the order that it is written in the
      spec, as there could potentially be controllers that are implemented to
      rely on the configuration bits being set before enabling the controller.
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
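The ordering can be illustrated with a small userspace model of the CC register (field positions per NVMe Base 2.0: EN at bit 0, CSS at bits 6:4, MPS at bits 10:7, AMS at bits 13:11). The write_cc() helper and the function name below are stand-ins for the real MMIO write and kernel code:

```c
#include <stdint.h>

/* CC register fields, per the NVMe Base Specification 2.0. */
#define NVME_CC_EN       (1u << 0)
#define NVME_CC_CSS_NVM  (0u << 4)
#define NVME_CC_MPS(v)   ((uint32_t)(v) << 7)
#define NVME_CC_AMS_RR   (0u << 11)

static uint32_t cc_reg;         /* stands in for the controller's CC register */
static unsigned int cc_writes;  /* counts register writes for demonstration */

static void write_cc(uint32_t val)
{
    cc_reg = val;
    cc_writes++;
}

/* Configure AMS, MPS and CSS first, then set EN in a separate write, so
 * the fields are valid before the controller sees CC.EN transition to 1. */
static void nvme_enable_ctrl_model(unsigned int page_shift)
{
    uint32_t cc = NVME_CC_CSS_NVM | NVME_CC_MPS(page_shift - 12) |
                  NVME_CC_AMS_RR;

    write_cc(cc);       /* first write: configuration only, EN == 0 */
    cc |= NVME_CC_EN;
    write_cc(cc);       /* second write: enable the controller */
}
```

The previous behavior corresponds to collapsing both writes into one; controllers that latch the configuration fields only at the EN transition would then see them change simultaneously with the enable.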
  9. 28 May, 2022 (1 commit)
  10. 20 May, 2022 (2 commits)
    • nvme: enable uring-passthrough for admin commands · 58e5bdeb
      Kanchan Joshi authored
      Add two new opcodes that userspace can use for admin commands:
      NVME_URING_CMD_ADMIN : non-vectored
      NVME_URING_CMD_ADMIN_VEC : vectored variant

      Wire up support when these are issued on the controller node
      (/dev/nvmeX).
      Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: set non-mdts limits in nvme_scan_work · 78288665
      Chaitanya Kulkarni authored
      In the current implementation we set the non-mdts limits by calling
      nvme_init_non_mdts_limits() from nvme_init_ctrl_finish(). This also
      tries to set the limits for the discovery controller, which has no
      I/O queues, resulting in the warning message reported by
      nvme_log_error() when running blktests nvme/002:
      
      [ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47
      [ 2005.192223] loop: module loaded
      [ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
      [ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
      
      <------------------------------SNIP---------------------------------->
      
      [ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997
      [ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998
      [ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999
      [ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn.
      *[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR*
      [ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
      [ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      
      Move the call to nvme_init_non_mdts_limits() into nvme_scan_work(),
      after we have verified that the I/O queues are created, since that is
      the converging point for each transport where these limits are
      actually used.
      
      1. FC :
      nvme_fc_create_association()
       ...
       nvme_fc_create_io_queues(ctrl);
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
      2. PCIe:-
      nvme_reset_work()
       ...
       nvme_setup_io_queues()
        nvme_create_io_queues()
         nvme_alloc_queue()
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
      3. RDMA :-
      nvme_rdma_setup_ctrl
       ...
        nvme_rdma_configure_io_queues
        ...
        nvme_start_ctrl()
         nvme_scan_queue()
          nvme_scan_work()
      
      4. TCP :-
      nvme_tcp_setup_ctrl
       ...
        nvme_tcp_configure_io_queues
        ...
        nvme_start_ctrl()
         nvme_scan_queue()
          nvme_scan_work()
      
      * nvme_scan_work()
      ...
      nvme_validate_or_alloc_ns()
        nvme_alloc_ns()
         nvme_update_ns_info()
          nvme_update_disk_info()
           nvme_config_discard() <---
           blk_queue_max_write_zeroes_sectors() <---
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  11. 19 May, 2022 (1 commit)
  12. 16 May, 2022 (3 commits)
  13. 11 May, 2022 (1 commit)
  14. 04 May, 2022 (1 commit)
  15. 18 Apr, 2022 (1 commit)
  16. 15 Apr, 2022 (2 commits)
  17. 29 Mar, 2022 (3 commits)
    • nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8
      Anton Eidelman authored
      nvme_mpath_init_identify(), invoked from nvme_init_identify(), fetches
      a fresh ANA log from the ctrl.  This is essential to have up-to-date
      path states both for existing namespaces and for those scan_work may
      discover once the ctrl is up.
      
      This happens in the following cases:
        1) A new ctrl is being connected.
        2) An existing ctrl is successfully reconnected.
        3) An existing ctrl is being reset.
      
      While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
      nvme_read_ana_log() may call nvme_update_ns_ana_state().
      
      This results in a hang when the ANA state of an existing namespace
      changes and makes the disk live: nvme_mpath_set_live() issues IO to
      the namespace through the ctrl, which does NOT have IO queues yet.
      
      See sample hang below.
      
      Solution:
      - nvme_update_ns_ana_state() calls set_live only if the ctrl is live;
      - the nvme_read_ana_log() call from nvme_mpath_init_identify()
        therefore only fetches and parses the ANA log;
        any errors in this process will fail the ctrl setup as appropriate;
      - a separate function, nvme_mpath_update(), is called in
        nvme_start_ctrl(); this parses the ANA log without fetching it.
        At this point the ctrl is live, so disks can be set live normally.
      
      Sample failure:
          nvme nvme0: starting error recovery
          nvme nvme0: Reconnecting in 10 seconds...
          block nvme0n6: no usable path - requeuing I/O
          INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
                Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
          Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
          Call Trace:
           __schedule+0x2a2/0x7e0
           schedule+0x4e/0xb0
           io_schedule+0x16/0x40
           wait_on_page_bit_common+0x15c/0x3e0
           do_read_cache_page+0x1e0/0x410
           read_cache_page+0x12/0x20
           read_part_sector+0x46/0x100
           read_lba+0x121/0x240
           efi_partition+0x1d2/0x6a0
           bdev_disk_changed.part.0+0x1df/0x430
           bdev_disk_changed+0x18/0x20
           blkdev_get_whole+0x77/0xe0
           blkdev_get_by_dev+0xd2/0x3a0
           __device_add_disk+0x1ed/0x310
           device_add_disk+0x13/0x20
           nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
           nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
           nvme_update_ana_state+0xca/0xe0 [nvme_core]
           nvme_parse_ana_log+0xac/0x170 [nvme_core]
           nvme_read_ana_log+0x7d/0xe0 [nvme_core]
           nvme_mpath_init_identify+0x105/0x150 [nvme_core]
           nvme_init_identify+0x2df/0x4d0 [nvme_core]
           nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
           nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
           nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
           process_one_work+0x1bd/0x360
           worker_thread+0x50/0x3d0
      Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: fix RCU hole that allowed for endless looping in multipath round robin · d6d67427
      Chris Leech authored
      Make nvme_ns_remove match the assumptions elsewhere.
      
      1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
         running in __nvme_find_path or nvme_round_robin_path that will
         re-assign this ns to current_path.
      
      2) Any matching current_path entries need to be cleared before removing
         from the siblings list, to prevent calling nvme_round_robin_path with
         an "old" ns that's off list.
      
      3) Finally the list_del_rcu can happen, and then synchronize again
         before releasing any reference counts.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: allow duplicate NSIDs for private namespaces · 5974ea7c
      Sungup Moon authored
      An NVMe subsystem with multiple controllers can have private
      namespaces that use the same NSID under some conditions:
      
       "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
        NSIDs shall be unique within the NVM subsystem. If the Namespace
        Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
         a) for shared namespace shall be unique; and
         b) for private namespace are not required to be unique."
      
      Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.
      
      Make sure this specific setup is supported in Linux.
      
      Fixes: 9ad1927a ("nvme: always search for namespace head")
      Signed-off-by: Sungup Moon <sungup.moon@samsung.com>
      [hch: refactored and fixed the controller vs subsystem based naming
            conflict]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  18. 23 Mar, 2022 (1 commit)
    • nvme: fix the read-only state for zoned namespaces with unsupported features · 726be2c7
      Pankaj Raghav authored
      Commit 2f4c9ba2 ("nvme: export zoned namespaces without Zone Append
      support read-only") marks zoned namespaces without Zone Append support
      read-only.  It does so by setting NVME_NS_FORCE_RO in ns->flags in
      nvme_update_zone_info and checking for that flag later in
      nvme_update_disk_info to mark the disk as read-only.
      
      But commit 73d90386 ("nvme: cleanup zone information initialization")
      rearranged nvme_update_disk_info to be called before
      nvme_update_zone_info, so the disk is no longer marked read-only.
      The call order cannot be just reverted because nvme_update_zone_info sets
      certain queue parameters such as zone_write_granularity that depend on the
      prior call to nvme_update_disk_info.
      
      Remove the call to set_disk_ro in nvme_update_disk_info, and call
      set_disk_ro after nvme_update_zone_info and nvme_update_disk_info to
      set the permission for ZNS drives correctly. The same applies to the
      multipath disk path.
      
      Fixes: 73d90386 ("nvme: cleanup zone information initialization")
      Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
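The ordering fix can be reduced to a toy model: the zone-info step is what raises NVME_NS_FORCE_RO, so the read-only state must be applied after both update steps rather than inside the disk-info step. The struct and helper names below are simplified stand-ins, not the actual kernel functions:

```c
#include <stdbool.h>

#define NVME_NS_FORCE_RO (1u << 0)

struct model_ns {
    unsigned int flags;
    bool disk_ro;
};

/* nvme_update_disk_info() no longer touches the read-only state ... */
static void update_disk_info(struct model_ns *ns)
{
    (void)ns;
}

/* ... because it runs before the zone-info step, which is what marks
 * zoned namespaces without Zone Append support as forced read-only. */
static void update_zone_info(struct model_ns *ns)
{
    ns->flags |= NVME_NS_FORCE_RO;
}

/* set_disk_ro() is applied last, after both update steps, so the flag
 * set by the zone-info step is always observed. */
static void update_ns_info(struct model_ns *ns)
{
    update_disk_info(ns);
    update_zone_info(ns);
    ns->disk_ro = !!(ns->flags & NVME_NS_FORCE_RO);
}
```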
  19. 16 Mar, 2022 (3 commits)
  20. 08 Mar, 2022 (3 commits)
  21. 28 Feb, 2022 (3 commits)