1. 27 Jan 2022, 1 commit
  2. 23 Dec 2021, 1 commit
  3. 24 Nov 2021, 1 commit
  4. 21 Oct 2021, 1 commit
  5. 16 Aug 2021, 1 commit
  6. 01 Jul 2021, 1 commit
  7. 17 Jun 2021, 4 commits
  8. 10 Jun 2021, 1 commit
  9. 04 Jun 2021, 2 commits
  10. 03 Jun 2021, 5 commits
    • nvme-fabrics: remove extra braces · 97ba6931
      Chaitanya Kulkarni authored
      No need to use braces around the ~ operator.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: remove an extra comment · 6f860c92
      Chaitanya Kulkarni authored
      Remove the comment at the end of the switch; it is not needed since the
      function is small enough.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: remove extra new lines in the switch · 63d20f54
      Chaitanya Kulkarni authored
      Remove the extra blank lines in the switch block; they are not common
      practice in kernel code.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: fix the kerneldoc comment for nvmf_log_connect_error() · 25e1de8c
      Chaitanya Kulkarni authored
      Fix the comment style so that it matches the existing code.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-tcp: allow selecting the network interface for connections · 3ede8f72
      Martin Belanger authored
      In our application, we need a way to force TCP connections to go out a
      specific IP interface instead of letting Linux select the interface
      based on the routing tables.
      
      Add the 'host-iface' option to allow specifying the interface to use.
      When the option host-iface is specified, the driver uses the specified
      interface to set the option SO_BINDTODEVICE on the TCP socket before
      connecting.
      
      This new option is needed in addition to the existing host-traddr for
      the following reasons:
      
      Specifying an IP interface by its associated IP address is less
      intuitive than specifying the actual interface name and, in some cases,
      simply doesn't work. That's because the association between interfaces
      and IP addresses is not predictable. IP addresses can be changed or can
      change by themselves over time (e.g. DHCP). Interface names are
      predictable [1] and will persist over time. Consider the following
      configuration.
      
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 100.0.0.100/24 scope global lo
             valid_lft forever preferred_lft forever
      2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
          inet 100.0.0.100/24 scope global enp0s3
             valid_lft forever preferred_lft forever
      3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
          inet 100.0.0.100/24 scope global enp0s8
             valid_lft forever preferred_lft forever
      
      The above is a VM that I configured with the same IP address
      (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the
      unique interface associated with 100.0.0.100 does not work here. And
      this is why the option host_iface is required. I understand that the
      above config does not represent a standard host system, but I'm using
      this to prove a point: "We can never know how users will configure
      their systems". By the way, the above configuration is perfectly fine
      as far as Linux is concerned.
      
      The current TCP implementation for host_traddr performs a
      bind()-before-connect(). This is a common construct to set the source
      IP address on a TCP socket before connecting. This has no effect on how
      Linux selects the interface for the connection. That's because Linux
      uses the Weak End System model as described in RFC1122 [2]. On the other
      hand, setting the Source IP Address has benefits and should be supported
      by linux-nvme. In fact, setting the Source IP Address is a mandatory
      FedGov requirement (e.g. connection to a RADIUS/TACACS+ server).
      Consider the following configuration.
      
      $ ip addr list dev enp0s8
      3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
          inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
             valid_lft 426sec preferred_lft 426sec
          inet 192.168.56.102/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
          inet 192.168.56.103/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
          inet 192.168.56.104/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
      
      Here we can see that several addresses are associated with interface
      enp0s8. By default, Linux always selects the default IP address,
      192.168.56.101, as the source address when connecting over interface
      enp0s8. Some users, however, want the ability to specify a different
      source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
      host_traddr can be used as-is to perform this function.
      
      In conclusion, I believe that we need 2 options for TCP connections.
      One that can be used to specify an interface (host-iface). And one that
      can be used to set the source address (host-traddr). Users should be
      allowed to use one or the other, or both, or none. Of course, the
      documentation for host_traddr will need some clarification. It should
      state that when used for TCP connection, this option only sets the
      source address. And the documentation for host_iface should say that
      this option is only available for TCP connections.
      
      References:
      [1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
      [2] https://tools.ietf.org/html/rfc1122
      
      Tested both IPv4 and IPv6 connections.
      Signed-off-by: Martin Belanger <martin.belanger@dell.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
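      A minimal userspace sketch of the two mechanisms contrasted above:
      bind()-before-connect() pins only the source address (the Weak End
      System model still chooses the route), while SO_BINDTODEVICE pins the
      egress interface itself. The interface name, addresses and port below
      are placeholders; this is an illustration, not the nvme-tcp driver code.

      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void)
      {
              const char *iface = "enp0s8";     /* placeholder interface name */
              struct sockaddr_in src = { .sin_family = AF_INET };
              struct sockaddr_in dst = { .sin_family = AF_INET,
                                         .sin_port = htons(4420) };
              int fd = socket(AF_INET, SOCK_STREAM, 0);

              if (fd < 0) {
                      perror("socket");
                      return 1;
              }

              /* host-traddr analogue: fix the source IP only. */
              inet_pton(AF_INET, "192.168.56.102", &src.sin_addr);
              if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
                      perror("bind");

              /* host-iface analogue: force the egress interface
               * (may require CAP_NET_RAW on older kernels). */
              if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                             iface, strlen(iface)) < 0)
                      perror("setsockopt(SO_BINDTODEVICE)");

              /* placeholder target address */
              inet_pton(AF_INET, "192.168.56.1", &dst.sin_addr);
              if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                      perror("connect");

              close(fd);
              return 0;
      }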
  11. 25 May 2021, 1 commit
  12. 04 May 2021, 1 commit
  13. 22 Apr 2021, 1 commit
    • nvme: sanitize KATO setting · a70b81bd
      Hannes Reinecke authored
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
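      A back-of-the-envelope sketch of the rule above (not the actual nvme
      core code): derive both the send period and the request timeout from
      the same KATO value carried in the Connect command, so a full round
      trip still fits inside the advertised interval.

      /* Sketch only; the real driver works in jiffies on the controller. */
      struct kato_timers {
              unsigned int send_period;   /* how often to send Keep Alive, seconds */
              unsigned int req_timeout;   /* block-layer timeout for that command */
      };

      static struct kato_timers compute_kato_timers(unsigned int kato_secs)
      {
              return (struct kato_timers){
                      .send_period = kato_secs / 2,  /* half the KATO interval */
                      .req_timeout = kato_secs,      /* same value as in Connect */
              };
      }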
  14. 05 Mar 2021, 1 commit
    • nvme-fabrics: fix kato initialization · 32feb6de
      Martin George authored
      Currently kato is initialized to NVME_DEFAULT_KATO for both
      discovery & i/o controllers. This is a problem specifically
      for non-persistent discovery controllers since it always ends
      up with a non-zero kato value. Fix this by initializing kato
      to zero instead, and ensuring various controllers are assigned
      appropriate kato values as follows:
      
      non-persistent controllers  - kato set to zero
      persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                                    (or any positive int via nvme-cli)
      i/o controllers             - kato set to NVME_DEFAULT_KATO
                                    (or any positive int via nvme-cli)
      Signed-off-by: Martin George <marting@netapp.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
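      The resulting assignment policy, written out as a self-contained sketch.
      The constants stand in for NVME_DEFAULT_KATO / NVMF_DEV_DISC_TMO from
      the nvme headers; the numeric values used here are assumptions, and the
      real decision is spread across nvme-cli and the fabrics code.

      #include <stdbool.h>

      /* Placeholder values for the sketch only. */
      enum { DEFAULT_KATO = 5, DISC_KATO = 30 };

      static unsigned int choose_kato(bool discovery_ctrl, bool persistent,
                                      unsigned int user_kato)
      {
              if (user_kato)                  /* explicit value from nvme-cli */
                      return user_kato;
              if (discovery_ctrl)             /* non-persistent discovery: no keep-alive */
                      return persistent ? DISC_KATO : 0;
              return DEFAULT_KATO;            /* i/o controller default */
      }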
  15. 10 Feb 2021, 1 commit
    • nvme-fabrics: avoid double completions in nvmf_fail_nonready_command · ea5e5f42
      Chao Leng authored
      When reconnecting, the request may be completed with
      NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently
      sets the state of the request to MQ_RQ_IN_FLIGHT before calling
      nvme_complete_rq.  When this happens for a request that is freed by
      the caller, such as nvme_submit_user_cmd, in the worst case the request
      could be completed again in the teardown process.
      
      Instead of calling blk_mq_start_request from nvmf_fail_nonready_command,
      just use the new nvme_host_path_error helper to complete the command
      without starting it.
      Signed-off-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
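      A sketch of the "complete without starting" idea, modelled on the
      nvme_host_path_error helper mentioned above rather than copied from it:
      the request is never started, so a caller that frees it cannot race
      with a second completion during teardown.

      #include <linux/blk-mq.h>
      #include "nvme.h"      /* nvme_req(), nvme_complete_rq(), status codes */

      static blk_status_t fail_nonready_sketch(struct request *rq)
      {
              /* No blk_mq_start_request(): a started request could be
               * completed a second time in the teardown path.
               */
              nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
              blk_mq_set_request_complete(rq);
              nvme_complete_rq(rq);
              return BLK_STS_OK;
      }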
  16. 02 Dec 2020, 1 commit
    • nvme-fabrics: reject I/O to offline device · 8c4dfea9
      Victor Gladkov authored
      Commands get stuck while the host NVMe-oF controller is in reconnect state.
      The controller enters into reconnect state when it loses connection with
      the target.  It tries to reconnect every 10 seconds (default) until
      a successful reconnect or until the reconnect time-out is reached.
      The default reconnect time-out is 10 minutes.
      
      Applications are expecting commands to complete with success or error
      within a certain timeout (30 seconds by default).  The NVMe host is
      enforcing that timeout while it is connected, but during reconnect the
      timeout is not enforced and commands may get stuck for a long period or
      even forever.
      
      To fix this long delay due to the default timeout, introduce a new
      "fast_io_fail_tmo" session parameter.  The timeout is measured in seconds
      from the controller reconnect and any command beyond that timeout is
      rejected.  The new parameter value may be passed during 'connect'.
      The default value of -1 means no timeout (similar to current behavior).
      Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
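      The timeout semantics above, reduced to a standalone sketch. The field
      and function names here are illustrative, not the driver's, which
      tracks the deadline in jiffies on the controller.

      #include <stdbool.h>
      #include <time.h>

      struct reconnect_state {
              time_t started;            /* when the reconnect began */
              int    fast_io_fail_tmo;   /* seconds; -1 means never fail fast */
      };

      /* Reject commands once the reconnect has outlived fast_io_fail_tmo. */
      static bool should_fail_fast(const struct reconnect_state *rs, time_t now)
      {
              if (rs->fast_io_fail_tmo < 0)      /* default: old behaviour */
                      return false;
              return (now - rs->started) > rs->fast_io_fail_tmo;
      }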
  17. 09 Sep 2020, 1 commit
    • nvme-fabrics: allow to queue requests for live queues · 73a53799
      Sagi Grimberg authored
      Right now we are failing requests based on the controller state (which
      is checked inline in nvmf_check_ready); however, we should definitely
      accept requests if the queue is live.
      
      When entering controller reset, we transition the controller into
      NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
      requests (which have blk_noretry_request set).
      
      This is also the case for NVME_REQ_USER for the wrong reason. There
      shouldn't be any reason for us to reject this I/O in a controller reset.
      We do want to prevent passthru commands on the admin queue because we
      need the controller to fully initialize first before we let user passthru
      admin commands be issued.
      
      In a non-mpath setup, this means that the requests will simply be
      requeued over and over forever not allowing the q_usage_counter to drop
      its final reference, causing controller reset to hang if running
      concurrently with heavy I/O.
      
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
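      The readiness rule described above, as an illustrative decision
      function (not __nvmf_check_ready() itself): a live queue accepts I/O
      regardless of controller state; only user passthru on the admin queue
      is held back until initialization finishes.

      #include <stdbool.h>

      static bool sketch_check_ready(bool ctrl_live, bool queue_live,
                                     bool user_admin_passthru)
      {
              if (ctrl_live)
                      return true;               /* normal path */
              if (!queue_live)
                      return false;              /* requeue until the queue comes up */
              /* Queue is live (e.g. during reset): accept everything except
               * user passthru admin commands, which must wait for full init.
               */
              return !user_admin_passthru;
      }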
  18. 29 Aug 2020, 1 commit
  19. 29 Jul 2020, 1 commit
    • nvme: fix deadlock in disconnect during scan_work and/or ana_work · ecca390e
      Sagi Grimberg authored
      A deadlock happens in the following scenario with multipath:
      1) scan_work(nvme0) detects a new nsid while nvme0
          is an optimized path to it, path nvme1 happens to be
          inaccessible.
      
      2) Before scan_work is complete nvme0 disconnect is initiated
          nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING
      
      3) scan_work(1) attempts to submit IO,
          but nvme_path_is_optimized() observes nvme0 is not LIVE.
          Since nvme1 is a possible path IO is requeued and scan_work hangs.
      
      --
      Workqueue: nvme-wq nvme_scan_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  io_schedule+0x16/0x40
      kernel:  do_read_cache_page+0x438/0x830
      kernel:  read_cache_page+0x12/0x20
      kernel:  read_dev_sector+0x27/0xc0
      kernel:  read_lba+0xc1/0x220
      kernel:  efi_partition+0x1e6/0x708
      kernel:  check_partition+0x154/0x244
      kernel:  rescan_partitions+0xae/0x280
      kernel:  __blkdev_get+0x40f/0x560
      kernel:  blkdev_get+0x3d/0x140
      kernel:  __device_add_disk+0x388/0x480
      kernel:  device_add_disk+0x13/0x20
      kernel:  nvme_mpath_set_live+0x119/0x140 [nvme_core]
      kernel:  nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
      kernel:  nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
      kernel:  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      kernel:  nvme_mpath_add_disk+0x47/0x90 [nvme_core]
      kernel:  nvme_validate_ns+0x396/0x940 [nvme_core]
      kernel:  nvme_scan_work+0x24f/0x380 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x249/0x400
      kernel:  kthread+0x104/0x140
      --
      
      4) Delete also hangs in flush_work(ctrl->scan_work)
          from nvme_remove_namespaces().
      
      Similarly, a deadlock with ana_work may happen: if ana_work has started
      and calls nvme_mpath_set_live and device_add_disk, it will
      trigger I/O. When we trigger disconnect I/O will block because
      our accessible (optimized) path is disconnecting, but the alternate
      path is inaccessible, so I/O blocks. Then disconnect tries to flush
      the ana_work and hangs.
      
      [  605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
      [  605.552087] Call Trace:
      [  605.552683]  __schedule+0x2b9/0x6c0
      [  605.553507]  schedule+0x42/0xb0
      [  605.554201]  io_schedule+0x16/0x40
      [  605.555012]  do_read_cache_page+0x438/0x830
      [  605.556925]  read_cache_page+0x12/0x20
      [  605.557757]  read_dev_sector+0x27/0xc0
      [  605.558587]  amiga_partition+0x4d/0x4c5
      [  605.561278]  check_partition+0x154/0x244
      [  605.562138]  rescan_partitions+0xae/0x280
      [  605.563076]  __blkdev_get+0x40f/0x560
      [  605.563830]  blkdev_get+0x3d/0x140
      [  605.564500]  __device_add_disk+0x388/0x480
      [  605.565316]  device_add_disk+0x13/0x20
      [  605.566070]  nvme_mpath_set_live+0x5e/0x130 [nvme_core]
      [  605.567114]  nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
      [  605.568197]  nvme_update_ana_state+0xca/0xe0 [nvme_core]
      [  605.569360]  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      [  605.571385]  nvme_read_ana_log+0x76/0x100 [nvme_core]
      [  605.572376]  nvme_ana_work+0x15/0x20 [nvme_core]
      [  605.573330]  process_one_work+0x1db/0x380
      [  605.574144]  worker_thread+0x4d/0x400
      [  605.574896]  kthread+0x104/0x140
      [  605.577205]  ret_from_fork+0x35/0x40
      [  605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
      [  605.579239]       Tainted: G           OE     5.3.5-050305-generic #201910071830
      [  605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  605.582320] nvme            D    0 14044  14043 0x00000000
      [  605.583424] Call Trace:
      [  605.583935]  __schedule+0x2b9/0x6c0
      [  605.584625]  schedule+0x42/0xb0
      [  605.585290]  schedule_timeout+0x203/0x2f0
      [  605.588493]  wait_for_completion+0xb1/0x120
      [  605.590066]  __flush_work+0x123/0x1d0
      [  605.591758]  __cancel_work_timer+0x10e/0x190
      [  605.593542]  cancel_work_sync+0x10/0x20
      [  605.594347]  nvme_mpath_stop+0x2f/0x40 [nvme_core]
      [  605.595328]  nvme_stop_ctrl+0x12/0x50 [nvme_core]
      [  605.596262]  nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
      [  605.597333]  nvme_sysfs_delete+0x5c/0x70 [nvme_core]
      [  605.598320]  dev_attr_store+0x17/0x30
      
      Fix this by introducing a new state: NVME_CTRL_DELETING_NOIO, which will
      indicate the phase of controller deletion where I/O cannot be allowed
      to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
      be issued to the bottom device, and only after we flush the ana_work
      and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
      we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work
      from re-firing by aborting early if we are not LIVE, so we should be safe
      here.
      
      In addition, change the transport drivers to follow the updated state
      machine.
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Reported-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
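      A sketch of the state split introduced above (the enum values and
      helper are illustrative, not the nvme core definitions): DELETING still
      lets multipath I/O reach the bottom device so scan_work/ana_work can be
      flushed, while DELETING_NOIO does not.

      #include <stdbool.h>

      enum ctrl_state_sketch {
              CTRL_LIVE,
              CTRL_RESETTING,
              CTRL_DELETING,        /* flushing scan_work/ana_work, I/O allowed */
              CTRL_DELETING_NOIO,   /* set after the flush; no new I/O */
              CTRL_DEAD,
      };

      /* Can multipath I/O still be issued to a path in this state? */
      static bool path_usable_for_io(enum ctrl_state_sketch state)
      {
              return state == CTRL_LIVE || state == CTRL_DELETING;
      }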
  20. 26 Mar 2020, 1 commit
  21. 12 Sep 2019, 1 commit
  22. 30 Aug 2019, 2 commits
    • nvme: make fabrics command run on a separate request queue · e7832cb4
      Sagi Grimberg authored
      We have a fundamental issue: fabric commands use the admin_q.
      The reason is that admin-connect, register reads and writes, and
      admin commands cannot be guaranteed ordering while we are running
      controller resets.
      
      For example, when we reset a controller we perform:
      1. disable the controller
      2. teardown the admin queue
      3. re-establish the admin queue
      4. enable the controller
      
      In order to perform (3), we need to unquiesce the admin queue; however,
      we may have some admin commands that are already pending on the
      quiesced admin_q and will immediately execute when we unquiesce it before
      we execute (4). The host must not send admin commands to the controller
      before enabling the controller.
      
      To fix this, we have the fabric commands (admin connect and property
      get/set, but not I/O queue connect) use a separate fabrics_q and make
      sure to quiesce the admin_q before we disable the controller, and
      unquiesce it only after we enable the controller.
      
      This fixes the error prints from nvmet in a controller reset storm test:
      kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
      Which indicate that the host is sending an admin command when the
      controller is not enabled.
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
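      The reset ordering above, written out with the quiesce points that the
      separate fabrics_q makes possible. The function names are placeholders
      for the transport's teardown/setup helpers, not real kernel symbols.

      /* Placeholder prototypes standing in for the transport helpers. */
      void quiesce_admin_q(void);
      void unquiesce_admin_q(void);
      void disable_ctrl(void);
      void enable_ctrl(void);
      void teardown_admin_q(void);
      void setup_admin_q_and_fabrics_q(void);

      static void reset_ctrl_sketch(void)
      {
              quiesce_admin_q();              /* park pending admin commands... */
              disable_ctrl();                 /* 1. disable the controller */
              teardown_admin_q();             /* 2. tear down the admin queue */
              setup_admin_q_and_fabrics_q();  /* 3. re-establish it; connect and
                                               *    property get/set go through
                                               *    fabrics_q, so the still-quiesced
                                               *    admin_q cannot leak commands
                                               *    while CC.EN == 0 */
              enable_ctrl();                  /* 4. enable the controller */
              unquiesce_admin_q();            /* ...and only now release them */
      }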
    • nvme-fabrics: Add type of service (TOS) configuration · 52b4451a
      Israel Rukshin authored
      TOS is user-defined and needs to be configured via nvme-cli.
      It must be set before initiating any traffic, and once set, the TOS
      cannot be changed.
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  23. 21 Jun 2019, 1 commit
  24. 14 May 2019, 1 commit
  25. 01 May 2019, 1 commit
  26. 20 Feb 2019, 2 commits
  27. 10 Jan 2019, 1 commit
  28. 19 Dec 2018, 3 commits