1. 04 5月, 2022 1 次提交
  2. 18 4月, 2022 1 次提交
  3. 15 4月, 2022 2 次提交
  4. 29 3月, 2022 3 次提交
    • A
      nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8
      Anton Eidelman 提交于
      nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
      fresh ANA log from the ctrl.  This is essential to have an up to date
      path states for both existing namespaces and for those scan_work may
      discover once the ctrl is up.
      
      This happens in the following cases:
        1) A new ctrl is being connected.
        2) An existing ctrl is successfully reconnected.
        3) An existing ctrl is being reset.
      
      While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
      nvme_read_ana_log() may call nvme_update_ns_ana_state().
      
      This result in a hang when the ANA state of an existing namespace changes
      and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
      through the ctrl, which does NOT have IO queues yet.
      
      See sample hang below.
      
      Solution:
      - nvme_update_ns_ana_state() to call set_live only if ctrl is live
      - nvme_read_ana_log() call from nvme_mpath_init_identify()
        therefore only fetches and parses the ANA log;
        any erros in this process will fail the ctrl setup as appropriate;
      - a separate function nvme_mpath_update()
        is called in nvme_start_ctrl();
        this parses the ANA log without fetching it.
        At this point the ctrl is live,
        therefore, disks can be set live normally.
      
      Sample failure:
          nvme nvme0: starting error recovery
          nvme nvme0: Reconnecting in 10 seconds...
          block nvme0n6: no usable path - requeuing I/O
          INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
                Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
          Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
          Call Trace:
           __schedule+0x2a2/0x7e0
           schedule+0x4e/0xb0
           io_schedule+0x16/0x40
           wait_on_page_bit_common+0x15c/0x3e0
           do_read_cache_page+0x1e0/0x410
           read_cache_page+0x12/0x20
           read_part_sector+0x46/0x100
           read_lba+0x121/0x240
           efi_partition+0x1d2/0x6a0
           bdev_disk_changed.part.0+0x1df/0x430
           bdev_disk_changed+0x18/0x20
           blkdev_get_whole+0x77/0xe0
           blkdev_get_by_dev+0xd2/0x3a0
           __device_add_disk+0x1ed/0x310
           device_add_disk+0x13/0x20
           nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
           nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
           nvme_update_ana_state+0xca/0xe0 [nvme_core]
           nvme_parse_ana_log+0xac/0x170 [nvme_core]
           nvme_read_ana_log+0x7d/0xe0 [nvme_core]
           nvme_mpath_init_identify+0x105/0x150 [nvme_core]
           nvme_init_identify+0x2df/0x4d0 [nvme_core]
           nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
           nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
           nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
           process_one_work+0x1bd/0x360
           worker_thread+0x50/0x3d0
      Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      a4a6f3c8
    • C
      nvme: fix RCU hole that allowed for endless looping in multipath round robin · d6d67427
      Chris Leech 提交于
      Make nvme_ns_remove match the assumptions elsewhere.
      
      1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
         running in __nvme_find_path or nvme_round_robin_path that will
         re-assign this ns to current_path.
      
      2) Any matching current_path entries need to be cleared before removing
         from the siblings list, to prevent calling nvme_round_robin_path with
         an "old" ns that's off list.
      
      3) Finally the list_del_rcu can happen, and then synchronize again
         before releasing any reference counts.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      d6d67427
    • S
      nvme: allow duplicate NSIDs for private namespaces · 5974ea7c
      Sungup Moon 提交于
      A NVMe subsystem with multiple controller can have private namespaces
      that use the same NSID under some conditions:
      
       "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
        NSIDs shall be unique within the NVM subsystem. If the Namespace
        Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
         a) for shared namespace shall be unique; and
         b) for private namespace are not required to be unique."
      
      Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.
      
      Make sure this specific setup is supported in Linux.
      
      Fixes: 9ad1927a ("nvme: always search for namespace head")
      Signed-off-by: NSungup Moon <sungup.moon@samsung.com>
      [hch: refactored and fixed the controller vs subsystem based naming
            conflict]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      5974ea7c
  5. 23 3月, 2022 1 次提交
    • P
      nvme: fix the read-only state for zoned namespaces with unsupposed features · 726be2c7
      Pankaj Raghav 提交于
      commit 2f4c9ba2 ("nvme: export zoned namespaces without Zone Append
      support read-only") marks zoned namespaces without append support
      read-only.  It does iso by setting NVME_NS_FORCE_RO in ns->flags in
      nvme_update_zone_info and checking for that flag later in
      nvme_update_disk_info to mark the disk as read-only.
      
      But commit 73d90386 ("nvme: cleanup zone information initialization")
      rearranged nvme_update_disk_info to be called before
      nvme_update_zone_info and thus not marking the disk as read-only.
      The call order cannot be just reverted because nvme_update_zone_info sets
      certain queue parameters such as zone_write_granularity that depend on the
      prior call to nvme_update_disk_info.
      
      Remove the call to set_disk_ro in nvme_update_disk_info. and call
      set_disk_ro after nvme_update_zone_info and nvme_update_disk_info to set
      the permission for ZNS drives correctly. The same applies to the
      multipath disk path.
      
      Fixes: 73d90386 ("nvme: cleanup zone information initialization")
      Signed-off-by: NPankaj Raghav <p.raghav@samsung.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      726be2c7
  6. 16 3月, 2022 3 次提交
  7. 08 3月, 2022 3 次提交
  8. 28 2月, 2022 12 次提交
  9. 23 2月, 2022 2 次提交
  10. 17 2月, 2022 1 次提交
  11. 09 2月, 2022 1 次提交
  12. 02 2月, 2022 1 次提交
    • S
      nvme: fix a possible use-after-free in controller reset during load · 0fa0f99f
      Sagi Grimberg 提交于
      Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
      readiness for AER submission. This may lead to a use-after-free
      condition that was observed with nvme-tcp.
      
      The race condition may happen in the following scenario:
      1. driver executes its reset_ctrl_work
      2. -> nvme_stop_ctrl - flushes ctrl async_event_work
      3. ctrl sends AEN which is received by the host, which in turn
         schedules AEN handling
      4. teardown admin queue (which releases the queue socket)
      5. AEN processed, submits another AER, calling the driver to submit
      6. driver attempts to send the cmd
      ==> use-after-free
      
      In order to fix that, add ctrl state check to validate the ctrl
      is actually able to accept the AER submission.
      
      This addresses the above race in controller resets because the driver
      during teardown should:
      1. change ctrl state to RESETTING
      2. flush async_event_work (as well as other async work elements)
      
      So after 1,2, any other AER command will find the
      ctrl state to be RESETTING and bail out without submitting the AER.
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      0fa0f99f
  13. 23 12月, 2021 3 次提交
  14. 08 12月, 2021 1 次提交
    • R
      nvme: fix use after free when disconnecting a reconnecting ctrl · 8b77fa6f
      Ruozhu Li 提交于
      A crash happens when trying to disconnect a reconnecting ctrl:
      
       1) The network was cut off when the connection was just established,
          scan work hang there waiting for some IOs complete.  Those I/Os were
          retried because we return BLK_STS_RESOURCE to blk in reconnecting.
       2) After a while, I tried to disconnect this connection.  This
          procedure also hangs because it tried to obtain ctrl->scan_lock.
          It should be noted that now we have switched the controller state
          to NVME_CTRL_DELETING.
       3) In nvme_check_ready(), we always return true when ctrl->state is
          NVME_CTRL_DELETING, so those retrying I/Os were issued to the bottom
          device which was already freed.
      
      To fix this, when ctrl->state is NVME_CTRL_DELETING, issue cmd to bottom
      device only when queue state is live.  If not, return host path error to
      the block layer
      Signed-off-by: NRuozhu Li <liruozhu@huawei.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      8b77fa6f
  15. 06 12月, 2021 2 次提交
  16. 29 11月, 2021 1 次提交
  17. 24 11月, 2021 2 次提交