1. 06 Dec 2022, 1 commit
2. 30 Nov 2022, 1 commit
• nvme: fix SRCU protection of nvme_ns_head list · 899d2a05
  Authored by Caleb Sander
      Walking the nvme_ns_head siblings list is protected by the head's srcu
      in nvme_ns_head_submit_bio() but not nvme_mpath_revalidate_paths().
      Removing namespaces from the list also fails to synchronize the srcu.
      Concurrent scan work can therefore cause use-after-frees.
      
      Hold the head's srcu lock in nvme_mpath_revalidate_paths() and
      synchronize with the srcu, not the global RCU, in nvme_ns_remove().
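As a minimal sketch of the shape of the fix (field and helper names follow
the upstream driver; this is not the verbatim patch):

    /* Reader side: walk the siblings list under the head's srcu. */
    void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
    {
            struct nvme_ns_head *head = ns->head;
            sector_t capacity = get_capacity(head->disk);
            int srcu_idx;

            srcu_idx = srcu_read_lock(&head->srcu);
            list_for_each_entry_rcu(ns, &head->list, siblings) {
                    if (capacity != get_capacity(ns->disk))
                            clear_bit(NVME_NS_READY, &ns->flags);
            }
            srcu_read_unlock(&head->srcu, srcu_idx);
    }

    /* Updater side: before freeing a namespace, nvme_ns_remove() must
     * wait for the head's srcu readers, not just global RCU. */
    synchronize_srcu(&ns->head->srcu);      /* was: synchronize_rcu() */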
      
      Observed the following panic when making NVMe/RDMA connections
      with native multipath on the Rocky Linux 8.6 kernel
      (it seems the upstream kernel has the same race condition).
      Disassembly shows the faulting instruction is cmp 0x50(%rdx),%rcx;
      computing capacity != get_capacity(ns->disk).
      Address 0x50 is dereferenced because ns->disk is NULL.
      The NULL disk appears to be the result of concurrent scan work
      freeing the namespace (note the log line in the middle of the panic).
      
      [37314.206036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [37314.206036] nvme0n3: detected capacity change from 0 to 11811160064
      [37314.299753] PGD 0 P4D 0
      [37314.299756] Oops: 0000 [#1] SMP PTI
      [37314.299759] CPU: 29 PID: 322046 Comm: kworker/u98:3 Kdump: loaded Tainted: G        W      X --------- -  - 4.18.0-372.32.1.el8test86.x86_64 #1
      [37314.299762] Hardware name: Dell Inc. PowerEdge R720/0JP31P, BIOS 2.7.0 05/23/2018
      [37314.299763] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [37314.299783] RIP: 0010:nvme_mpath_revalidate_paths+0x26/0xb0 [nvme_core]
      [37314.299790] Code: 1f 44 00 00 66 66 66 66 90 55 53 48 8b 5f 50 48 8b 83 c8 c9 00 00 48 8b 13 48 8b 48 50 48 39 d3 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 50 74 05 f0 80 60 70 ef 48 8b 50 30 48 8d 42 d0 48 39 d3
      [37315.058803] RSP: 0018:ffffabe28f913d10 EFLAGS: 00010202
      [37315.121316] RAX: ffff927a077da800 RBX: ffff92991dd70000 RCX: 0000000001600000
      [37315.206704] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92991b719800
      [37315.292106] RBP: ffff929a6b70c000 R08: 000000010234cd4a R09: c0000000ffff7fff
      [37315.377501] R10: 0000000000000001 R11: ffffabe28f913a30 R12: 0000000000000000
      [37315.462889] R13: ffff92992716600c R14: ffff929964e6e030 R15: ffff92991dd70000
      [37315.548286] FS:  0000000000000000(0000) GS:ffff92b87fb80000(0000) knlGS:0000000000000000
      [37315.645111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [37315.713871] CR2: 0000000000000050 CR3: 0000002208810006 CR4: 00000000000606e0
      [37315.799267] Call Trace:
      [37315.828515]  nvme_update_ns_info+0x1ac/0x250 [nvme_core]
      [37315.892075]  nvme_validate_or_alloc_ns+0x2ff/0xa00 [nvme_core]
      [37315.961871]  ? __blk_mq_free_request+0x6b/0x90
      [37316.015021]  nvme_scan_work+0x151/0x240 [nvme_core]
      [37316.073371]  process_one_work+0x1a7/0x360
      [37316.121318]  ? create_worker+0x1a0/0x1a0
      [37316.168227]  worker_thread+0x30/0x390
      [37316.212024]  ? create_worker+0x1a0/0x1a0
      [37316.258939]  kthread+0x10a/0x120
      [37316.297557]  ? set_kthread_struct+0x50/0x50
      [37316.347590]  ret_from_fork+0x35/0x40
      [37316.390360] Modules linked in: nvme_rdma nvme_tcp(X) nvme_fabrics nvme_core netconsole iscsi_tcp libiscsi_tcp dm_queue_length dm_service_time nf_conntrack_netlink br_netfilter bridge stp llc overlay nft_chain_nat ipt_MASQUERADE nf_nat xt_addrtype xt_CT nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_multiport nft_compat nf_tables libcrc32c nfnetlink dm_multipath tg3 rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm intel_rapl_msr iTCO_wdt iTCO_vendor_support dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel ib_uverbs rapl intel_cstate intel_uncore ib_core ipmi_si joydev mei_me pcspkr ipmi_devintf mei lpc_ich wmi ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 mlx5_core drm_kms_helper syscopyarea
      [37316.390419]  sysfillrect ahci sysimgblt fb_sys_fops libahci drm crc32c_intel libata mlxfw pci_hyperv_intf tls i2c_algo_bit psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core]
      [37317.645908] CR2: 0000000000000050
      
      Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3. 09 Nov 2022, 1 commit
4. 19 Oct 2022, 2 commits
5. 30 Sep 2022, 1 commit
6. 27 Sep 2022, 5 commits
7. 22 Sep 2022, 1 commit
8. 19 Sep 2022, 2 commits
9. 07 Sep 2022, 1 commit
10. 03 Aug 2022, 12 commits
11. 26 Jul 2022, 1 commit
12. 14 Jul 2022, 1 commit
13. 06 Jul 2022, 1 commit
14. 29 Jun 2022, 1 commit
• nvme: fix regression when disconnect a recovering ctrl · f7f70f4a
  Authored by Ruozhu Li
We encountered a problem where the disconnect command hangs.
      After analyzing the log and stack, we found that the triggering
      process is as follows:
      CPU0                          CPU1
                                      nvme_rdma_error_recovery_work
                                        nvme_rdma_teardown_io_queues
      nvme_do_delete_ctrl                 nvme_stop_queues
        nvme_remove_namespaces
        --clear ctrl->namespaces
                                          nvme_start_queues
                                          --no ns in ctrl->namespaces
          nvme_ns_remove                  return(because ctrl is deleting)
            blk_freeze_queue
              blk_mq_freeze_queue_wait
--wait for ns to unquiesce to clean inflight IO, hang forever
      
This problem was not seen on older kernels because they flush the err
work in nvme_stop_ctrl() before nvme_remove_namespaces(). That behavior
does not appear to have been removed for functional reasons, so the
patch can be reverted to solve the problem.
      
      Revert commit 794a4cb3 ("nvme: remove the .stop_ctrl callout")
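
A hedged sketch of what the revert restores (names follow the
pre-794a4cb3 code; not the verbatim patch): nvme_stop_ctrl() again
invokes an optional transport .stop_ctrl hook, which for RDMA flushes
error recovery before nvme_remove_namespaces() runs, so the race shown
above cannot occur:

    /* core: give the transport a chance to quiesce its own work */
    void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
    {
            ...
            if (ctrl->ops->stop_ctrl)
                    ctrl->ops->stop_ctrl(ctrl);
    }

    /* rdma: ensure error recovery is finished before teardown */
    static void nvme_rdma_stop_ctrl(struct nvme_ctrl *nctrl)
    {
            struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);

            cancel_work_sync(&ctrl->err_work);
            cancel_delayed_work_sync(&ctrl->reconnect_work);
    }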
Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
15. 28 Jun 2022, 2 commits
16. 23 Jun 2022, 1 commit
17. 14 Jun 2022, 2 commits
18. 31 May 2022, 1 commit
• nvme: set controller enable bit in a separate write · aa41d2fe
  Authored by Niklas Cassel
The NVM Express Base Specification 2.0 specifies in the description
of the CC (Controller Configuration) register:
"Host software shall set the Arbitration Mechanism Selected (CC.AMS),
the Memory Page Size (CC.MPS), and the I/O Command Set Selected (CC.CSS)
to valid values prior to enabling the controller by setting CC.EN to '1'."
      
While we haven't seen any controller misbehave when all bits are set
in a single write, let's follow the order the spec describes, as there
could be controllers implemented to rely on the configuration bits
being set before the controller is enabled.
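
A minimal sketch of the resulting two-step enable in nvme_enable_ctrl()
(register and bit names from the upstream driver; the CC field setup is
elided):

    /* First write: AMS, MPS, CSS etc., with CC.EN still clear ... */
    ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
    if (ret)
            return ret;

    /* ... second write: only now flip CC.EN to 1. */
    ctrl->ctrl_config |= NVME_CC_ENABLE;
    return ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);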
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
19. 28 May 2022, 1 commit
20. 20 May 2022, 2 commits
• nvme: enable uring-passthrough for admin commands · 58e5bdeb
  Authored by Kanchan Joshi
      Add two new opcodes that userspace can use for admin commands:
NVME_URING_CMD_ADMIN : non-vectored
      NVME_URING_CMD_ADMIN_VEC : vectored variant
      
Wire up support when these are issued on the controller node (/dev/nvmeX).
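
A hedged userspace sketch of issuing an admin Identify Controller
command through the new opcode (assumes liburing with big-SQE support
and the struct nvme_uring_cmd layout from linux/nvme_ioctl.h; error
handling elided, fragment only):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <liburing.h>
    #include <linux/nvme_ioctl.h>

    struct io_uring ring;
    /* NVMe passthrough needs the 128-byte SQE / 32-byte CQE ring. */
    io_uring_queue_init(8, &ring, IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

    int fd = open("/dev/nvme0", O_RDONLY);      /* controller node */
    void *buf = aligned_alloc(4096, 4096);

    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    sqe->opcode = IORING_OP_URING_CMD;
    sqe->fd = fd;
    sqe->cmd_op = NVME_URING_CMD_ADMIN;

    struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;
    memset(cmd, 0, sizeof(*cmd));
    cmd->opcode = 0x06;                         /* Identify */
    cmd->addr = (__u64)(uintptr_t)buf;
    cmd->data_len = 4096;
    cmd->cdw10 = 0x01;                          /* CNS 01h: controller */

    io_uring_submit(&ring);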
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
• nvme: set non-mdts limits in nvme_scan_work · 78288665
  Authored by Chaitanya Kulkarni
In the current implementation we set the non-mdts limits by calling
nvme_init_non_mdts_limits() from nvme_init_ctrl_finish(). This also
tries to set the limits for the discovery controller, which has no I/O
queues, resulting in the warning reported by nvme_log_error() when
running blktests nvme/002:
      
      [ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47
      [ 2005.192223] loop: module loaded
      [ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
      [ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
      
      <------------------------------SNIP---------------------------------->
      
      [ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997
      [ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998
      [ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999
      [ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn.
[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR
      [ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
      [ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      
Move the call to nvme_init_non_mdts_limits() into nvme_scan_work(),
after we have verified that the I/O queues were created, since that is
the converging point for each transport where these limits are actually
used; a sketch of the moved call follows the call chains below.
      
1. FC:
      nvme_fc_create_association()
       ...
       nvme_fc_create_io_queues(ctrl);
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
2. PCIe:
      nvme_reset_work()
       ...
       nvme_setup_io_queues()
        nvme_create_io_queues()
         nvme_alloc_queue()
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
3. RDMA:
nvme_rdma_setup_ctrl()
 ...
 nvme_rdma_configure_io_queues()
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()
      
4. TCP:
nvme_tcp_setup_ctrl()
 ...
 nvme_tcp_configure_io_queues()
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()
      
nvme_scan_work()
      ...
      nvme_validate_or_alloc_ns()
        nvme_alloc_ns()
         nvme_update_ns_info()
          nvme_update_disk_info()
           nvme_config_discard() <---
           blk_queue_max_write_zeroes_sectors() <---
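
The moved call then sits near the top of nvme_scan_work(), roughly (a
sketch, not the verbatim patch):

    static void nvme_scan_work(struct work_struct *work)
    {
            ...
            /* I/O queues exist by now on every transport (see above). */
            ret = nvme_init_non_mdts_limits(ctrl);
            if (ret < 0) {
                    dev_warn(ctrl->device,
                             "reading non-mdts-limits failed: %d\n", ret);
                    return;
            }
            ...
    }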
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>