1. 22 Dec 2022, 1 commit
    • nvme: fix multipath crash caused by flush request when blktrace is enabled · 3659fb5a
      Authored by Yanjun Zhang
      The flush request initialized by blk_kick_flush has a NULL bio,
      and it may be handled by nvme_end_req during I/O completion.
      When blktrace is enabled and multipath is active,
      nvme_trace_bio_complete tries to dereference the NULL bio
      pointer of the flush request, resulting in the following crash:
      
      [ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a
      [ 2517.835213] #PF: supervisor read access in kernel mode
      [ 2517.838724] #PF: error_code(0x0000) - not-present page
      [ 2517.842222] PGD 7b2d51067 P4D 0
      [ 2517.845684] Oops: 0000 [#1] SMP NOPTI
      [ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S                5.15.67-0.cl9.x86_64 #1
      [ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022
      [ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
      [ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30
      [ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba
      [ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286
      [ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000
      [ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000
      [ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000
      [ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8
      [ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018
      [ 2517.894434] FS:  0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000
      [ 2517.898299] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0
      [ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2517.913761] PKRU: 55555554
      [ 2517.917558] Call Trace:
      [ 2517.921294]  <TASK>
      [ 2517.924982]  nvme_complete_rq+0x1c3/0x1e0 [nvme_core]
      [ 2517.928715]  nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp]
      [ 2517.932442]  nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp]
      [ 2517.936137]  ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp]
      [ 2517.939830]  tcp_read_sock+0x9c/0x260
      [ 2517.943486]  nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp]
      [ 2517.947173]  nvme_tcp_io_work+0x64/0x90 [nvme_tcp]
      [ 2517.950834]  process_one_work+0x1e8/0x390
      [ 2517.954473]  worker_thread+0x53/0x3c0
      [ 2517.958069]  ? process_one_work+0x390/0x390
      [ 2517.961655]  kthread+0x10c/0x130
      [ 2517.965211]  ? set_kthread_struct+0x40/0x40
      [ 2517.968760]  ret_from_fork+0x1f/0x30
      [ 2517.972285]  </TASK>
      
      To avoid this situation, add a NULL check for req->bio before
      calling trace_block_bio_complete.
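      The guard described above can be sketched as a small self-contained model. The struct layouts, the counter, and the stub trace function are illustrative stand-ins, not the real kernel definitions:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved; the real
 * definitions live in the block and nvme code. */
struct bio { int dummy; };
struct request { struct bio *bio; };

static int trace_calls; /* counts how often the trace hook fires */

/* Stub for the blktrace hook that crashed when handed a NULL bio. */
static void trace_block_bio_complete(struct bio *bio)
{
    trace_calls++;
}

/* Sketch of the fixed completion path: skip tracing when the request
 * carries no bio, as a flush request built by blk_kick_flush does. */
static void nvme_trace_bio_complete(struct request *req)
{
    if (req->bio) /* the NULL check added by the fix */
        trace_block_bio_complete(req->bio);
}
```

      A flush request (bio == NULL) now passes through the completion path without touching the trace hook, while ordinary reads and writes are traced as before.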
      Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 07 Dec 2022, 2 commits
  3. 06 Dec 2022, 3 commits
  4. 18 Nov 2022, 1 commit
  5. 16 Nov 2022, 4 commits
  6. 15 Nov 2022, 3 commits
  7. 02 Nov 2022, 2 commits
  8. 27 Sep 2022, 6 commits
  9. 22 Sep 2022, 2 commits
  10. 03 Aug 2022, 4 commits
  11. 26 Jul 2022, 1 commit
  12. 15 Jul 2022, 1 commit
  13. 06 Jul 2022, 1 commit
  14. 29 Jun 2022, 1 commit
    • nvme: fix regression when disconnect a recovering ctrl · f7f70f4a
      Authored by Ruozhu Li
      We encountered a problem where the disconnect command hangs.
      After analyzing the log and stack, we found the triggering
      sequence is as follows:
      CPU0                          CPU1
                                      nvme_rdma_error_recovery_work
                                        nvme_rdma_teardown_io_queues
      nvme_do_delete_ctrl                 nvme_stop_queues
        nvme_remove_namespaces
        --clear ctrl->namespaces
                                          nvme_start_queues
                                          --no ns in ctrl->namespaces
          nvme_ns_remove                  return(because ctrl is deleting)
            blk_freeze_queue
              blk_mq_freeze_queue_wait
            --wait for ns to unquiesce to clean inflight IO, hangs forever
      
      This problem was not seen on older kernels because the error
      recovery work was flushed in nvme_stop_ctrl before
      nvme_remove_namespaces. The callout does not seem to have been
      removed for functional reasons, so the commit can be reverted
      to solve the problem.
      
      Revert commit 794a4cb3 ("nvme: remove the .stop_ctrl callout")
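      The ordering the revert restores can be modeled with a toy simulation. The names and flags below are illustrative, not the real kernel API: error recovery quiesces every namespace queue and later unquiesces whatever is still on ctrl->namespaces, so if the namespaces are removed in between, their queues stay quiesced and the freeze wait never completes.

```c
#include <stdbool.h>

/* Toy model of one namespace caught in the race described above. */
struct ns { bool quiesced; bool on_ctrl_list; };

/* nvme_rdma_teardown_io_queues: quiesce the queue. */
static void recovery_teardown(struct ns *ns) { ns->quiesced = true; }

/* nvme_start_queues: only walks what is still on ctrl->namespaces. */
static void recovery_restart(struct ns *ns)
{
    if (ns->on_ctrl_list)
        ns->quiesced = false;
}

/* Fixed ordering restored by the revert: delete flushes the recovery
 * work (via the ->stop_ctrl callout) before removing namespaces.
 * Returns true if the freeze wait can complete. */
static bool delete_ctrl_fixed(struct ns *ns)
{
    recovery_teardown(ns);
    recovery_restart(ns);     /* recovery work flushed first */
    ns->on_ctrl_list = false; /* nvme_remove_namespaces runs afterwards */
    return !ns->quiesced;
}

/* Broken ordering: the namespace list is cleared mid-recovery, so the
 * restart skips the namespace and blk_mq_freeze_queue_wait hangs. */
static bool delete_ctrl_broken(struct ns *ns)
{
    recovery_teardown(ns);
    ns->on_ctrl_list = false; /* namespaces removed too early */
    recovery_restart(ns);     /* sees an empty list, skips the ns */
    return !ns->quiesced;     /* still quiesced: hang */
}
```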
      Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  15. 14 Jun 2022, 1 commit
  16. 20 May 2022, 1 commit
  17. 16 May 2022, 1 commit
  18. 11 May 2022, 1 commit
  19. 15 Apr 2022, 1 commit
  20. 29 Mar 2022, 2 commits
    • nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8
      Authored by Anton Eidelman
      nvme_mpath_init_identify(), invoked from nvme_init_identify(), fetches a
      fresh ANA log from the ctrl. This is essential to have up-to-date
      path states both for existing namespaces and for those scan_work may
      discover once the ctrl is up.
      
      This happens in the following cases:
        1) A new ctrl is being connected.
        2) An existing ctrl is successfully reconnected.
        3) An existing ctrl is being reset.
      
      While in (1) ctrl->namespaces is empty, in (2) and (3) it may contain
      namespaces, and nvme_read_ana_log() may call nvme_update_ns_ana_state().
      
      This results in a hang when the ANA state of an existing namespace changes
      and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
      through the ctrl, which does NOT have IO queues yet.
      
      See sample hang below.
      
      Solution:
      - nvme_update_ns_ana_state() calls set_live only if the ctrl is live;
      - the nvme_read_ana_log() call from nvme_mpath_init_identify()
        therefore only fetches and parses the ANA log; any errors in this
        process fail the ctrl setup as appropriate;
      - a separate function, nvme_mpath_update(), is called from
        nvme_start_ctrl(); it parses the ANA log without fetching it.
        At this point the ctrl is live, so disks can be set live normally.
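      The guard at the heart of the solution can be sketched as follows. The enum values, struct layouts, and stub helper are simplified stand-ins for the nvme_core definitions:

```c
#include <stdbool.h>

/* Simplified stand-ins for the nvme_core state machinery. */
enum nvme_ctrl_state { NVME_CTRL_NEW, NVME_CTRL_LIVE, NVME_CTRL_CONNECTING };
enum nvme_ana_state  { NVME_ANA_OPTIMIZED, NVME_ANA_INACCESSIBLE };

struct nvme_ctrl { enum nvme_ctrl_state state; };
struct nvme_ns   { struct nvme_ctrl *ctrl; bool live; };

/* Stub: the real function issues IO through the ctrl, which is exactly
 * what hangs when the ctrl has no IO queues yet. */
static void nvme_mpath_set_live(struct nvme_ns *ns) { ns->live = true; }

/* Sketch of the fix: set_live is skipped while the ctrl is not LIVE;
 * nvme_mpath_update() repeats the ANA parse later, once
 * nvme_start_ctrl() has brought the ctrl up. */
static void nvme_update_ns_ana_state(struct nvme_ns *ns,
                                     enum nvme_ana_state ana)
{
    if (ana == NVME_ANA_OPTIMIZED && ns->ctrl->state == NVME_CTRL_LIVE)
        nvme_mpath_set_live(ns);
}
```

      During a reconnect the ctrl is still in CONNECTING state, so the optimized ANA state is recorded without issuing IO; the disk goes live only on the second parse, after the ctrl is up.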
      
      Sample failure:
          nvme nvme0: starting error recovery
          nvme nvme0: Reconnecting in 10 seconds...
          block nvme0n6: no usable path - requeuing I/O
          INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
                Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
          Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
          Call Trace:
           __schedule+0x2a2/0x7e0
           schedule+0x4e/0xb0
           io_schedule+0x16/0x40
           wait_on_page_bit_common+0x15c/0x3e0
           do_read_cache_page+0x1e0/0x410
           read_cache_page+0x12/0x20
           read_part_sector+0x46/0x100
           read_lba+0x121/0x240
           efi_partition+0x1d2/0x6a0
           bdev_disk_changed.part.0+0x1df/0x430
           bdev_disk_changed+0x18/0x20
           blkdev_get_whole+0x77/0xe0
           blkdev_get_by_dev+0xd2/0x3a0
           __device_add_disk+0x1ed/0x310
           device_add_disk+0x13/0x20
           nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
           nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
           nvme_update_ana_state+0xca/0xe0 [nvme_core]
           nvme_parse_ana_log+0xac/0x170 [nvme_core]
           nvme_read_ana_log+0x7d/0xe0 [nvme_core]
           nvme_mpath_init_identify+0x105/0x150 [nvme_core]
           nvme_init_identify+0x2df/0x4d0 [nvme_core]
           nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
           nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
           nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
           process_one_work+0x1bd/0x360
           worker_thread+0x50/0x3d0
      Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: allow duplicate NSIDs for private namespaces · 5974ea7c
      Authored by Sungup Moon
      An NVMe subsystem with multiple controllers can have private namespaces
      that use the same NSID under some conditions:
      
       "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
        NSIDs shall be unique within the NVM subsystem. If the Namespace
        Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
         a) for shared namespace shall be unique; and
         b) for private namespace are not required to be unique."
      
      Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.
      
      Make sure this specific setup is supported in Linux.
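      The spec rule quoted above reduces to a simple predicate. The function name and flag parameters below are illustrative, not the kernel's:

```c
#include <stdbool.h>

/* Sketch of NVMe 1.4c section 6.1.6: an NSID must be unique
 * subsystem-wide when Namespace Management, ANA Reporting, or NVM Sets
 * is supported, or when the namespace itself is shared; otherwise
 * private namespaces may reuse an NSID. */
static bool nsid_must_be_unique(bool ns_mgmt_supported,
                                bool ana_supported,
                                bool nvm_sets_supported,
                                bool ns_is_shared)
{
    if (ns_mgmt_supported || ana_supported || nvm_sets_supported)
        return true;
    return ns_is_shared;
}
```

      Only in the last case, a private namespace on a subsystem supporting none of the three features, may two controllers legitimately expose the same NSID, which is the setup this commit makes work in Linux.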
      
      Fixes: 9ad1927a ("nvme: always search for namespace head")
      Signed-off-by: Sungup Moon <sungup.moon@samsung.com>
      [hch: refactored and fixed the controller vs subsystem based naming
            conflict]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
  21. 16 Mar 2022, 1 commit