1. 27 Jan 2022, 1 commit
  2. 23 Dec 2021, 1 commit
  3. 24 Nov 2021, 1 commit
  4. 21 Oct 2021, 1 commit
  5. 16 Aug 2021, 1 commit
  6. 01 Jul 2021, 1 commit
  7. 17 Jun 2021, 4 commits
  8. 10 Jun 2021, 1 commit
  9. 04 Jun 2021, 2 commits
  10. 03 Jun 2021, 5 commits
    • nvme-fabrics: remove extra braces · 97ba6931
      Chaitanya Kulkarni authored
      No need to use braces around the ~ operator.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: remove an extra comment · 6f860c92
      Chaitanya Kulkarni authored
      Remove the comment at the end of the switch; it is not needed since the
      function is small enough.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: remove extra new lines in the switch · 63d20f54
      Chaitanya Kulkarni authored
      Remove the extra blank lines in the switch block; they are not common
      practice in kernel code.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fabrics: fix the kerneldoc comment for nvmf_log_connect_error() · 25e1de8c
      Chaitanya Kulkarni authored
      Fix the comment style so that it matches the existing code.
      
      No functionality change in this patch.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-tcp: allow selecting the network interface for connections · 3ede8f72
      Martin Belanger authored
      In our application, we need a way to force TCP connections to go out a
      specific IP interface instead of letting Linux select the interface
      based on the routing tables.
      
      Add the 'host-iface' option to allow specifying the interface to use.
      When the option host-iface is specified, the driver uses the specified
      interface to set the option SO_BINDTODEVICE on the TCP socket before
      connecting.
      
      This new option is needed in addition to the existing host-traddr for
      the following reasons:
      
      Specifying an IP interface by its associated IP address is less
      intuitive than specifying the actual interface name and, in some cases,
      simply doesn't work. That's because the association between interfaces
      and IP addresses is not predictable. IP addresses can be changed or can
      change by themselves over time (e.g. DHCP). Interface names are
      predictable [1] and will persist over time. Consider the following
      configuration.
      
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 100.0.0.100/24 scope global lo
             valid_lft forever preferred_lft forever
      2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
          inet 100.0.0.100/24 scope global enp0s3
             valid_lft forever preferred_lft forever
      3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
          inet 100.0.0.100/24 scope global enp0s8
             valid_lft forever preferred_lft forever
      
      The above is a VM that I configured with the same IP address
      (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the
      unique interface associated with 100.0.0.100 does not work here. And
      this is why the option host_iface is required. I understand that the
      above config does not represent a standard host system, but I'm using
      this to prove a point: "We can never know how users will configure
      their systems". By the way, the above configuration is perfectly fine
      as far as Linux is concerned.
      
      The current TCP implementation for host_traddr performs a
      bind()-before-connect(). This is a common construct to set the source
      IP address on a TCP socket before connecting. This has no effect on how
      Linux selects the interface for the connection. That's because Linux
      uses the Weak End System model as described in RFC1122 [2]. On the other
      hand, setting the Source IP Address has benefits and should be supported
      by linux-nvme. In fact, setting the Source IP Address is a mandatory
      FedGov requirement (e.g. connection to a RADIUS/TACACS+ server).
      Consider the following configuration.
      
      $ ip addr list dev enp0s8
      3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
          link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
          inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
             valid_lft 426sec preferred_lft 426sec
          inet 192.168.56.102/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
          inet 192.168.56.103/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
          inet 192.168.56.104/24 scope global secondary enp0s8
             valid_lft forever preferred_lft forever
      
      Here we can see that several addresses are associated with interface
      enp0s8. By default, Linux always selects the default IP address,
      192.168.56.101, as the source address when connecting over interface
      enp0s8. Some users, however, want the ability to specify a different
      source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
      host_traddr can be used as-is to perform this function.
      
      In conclusion, I believe that we need 2 options for TCP connections.
      One that can be used to specify an interface (host-iface). And one that
      can be used to set the source address (host-traddr). Users should be
      allowed to use one or the other, or both, or none. Of course, the
      documentation for host_traddr will need some clarification. It should
      state that when used for TCP connection, this option only sets the
      source address. And the documentation for host_iface should say that
      this option is only available for TCP connections.
      
      References:
      [1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
      [2] https://tools.ietf.org/html/rfc1122
      
      Tested both IPv4 and IPv6 connections.
      Signed-off-by: Martin Belanger <martin.belanger@dell.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
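      A minimal userspace sketch of the two mechanisms contrasted above:
      bind()-before-connect() pins only the source address (the Weak End
      System model still chooses the route), while SO_BINDTODEVICE pins the
      egress interface itself. The interface name, addresses and port below
      are placeholders; this is an illustration, not the nvme-tcp driver code.

      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(void)
      {
              const char *iface = "enp0s8";     /* placeholder interface name */
              struct sockaddr_in src = { .sin_family = AF_INET };
              struct sockaddr_in dst = { .sin_family = AF_INET,
                                         .sin_port = htons(4420) };
              int fd = socket(AF_INET, SOCK_STREAM, 0);

              if (fd < 0) {
                      perror("socket");
                      return 1;
              }

              /* host-traddr analogue: fix the source IP only. */
              inet_pton(AF_INET, "192.168.56.102", &src.sin_addr);
              if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
                      perror("bind");

              /* host-iface analogue: force the egress interface
               * (may require CAP_NET_RAW on older kernels). */
              if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                             iface, strlen(iface)) < 0)
                      perror("setsockopt(SO_BINDTODEVICE)");

              /* placeholder target address */
              inet_pton(AF_INET, "192.168.56.1", &dst.sin_addr);
              if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                      perror("connect");

              close(fd);
              return 0;
      }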
  11. 25 May 2021, 1 commit
  12. 04 May 2021, 1 commit
  13. 22 Apr 2021, 1 commit
    • nvme: sanitize KATO setting · a70b81bd
      Hannes Reinecke authored
      According to the NVMe base spec the KATO commands should be sent
      at half of the KATO interval, to properly account for round-trip
      times.
      As we now will only ever send one KATO command per connection we
      can easily use the recommended values.
      This also fixes a potential issue where the request timeout for
      the KATO command does not match the value in the connect command,
      which might be causing spurious connection drops from the target.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
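      A back-of-the-envelope sketch of the rule above (not the actual nvme
      core code): derive both the send period and the request timeout from
      the same KATO value carried in the Connect command, so a full round
      trip still fits inside the advertised interval.

      /* Sketch only; the real driver works in jiffies on the controller. */
      struct kato_timers {
              unsigned int send_period;   /* how often to send Keep Alive, seconds */
              unsigned int req_timeout;   /* block-layer timeout for that command */
      };

      static struct kato_timers compute_kato_timers(unsigned int kato_secs)
      {
              return (struct kato_timers){
                      .send_period = kato_secs / 2,  /* half the KATO interval */
                      .req_timeout = kato_secs,      /* same value as in Connect */
              };
      }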
  14. 05 Mar 2021, 1 commit
    • nvme-fabrics: fix kato initialization · 32feb6de
      Martin George authored
      Currently kato is initialized to NVME_DEFAULT_KATO for both
      discovery & i/o controllers. This is a problem specifically
      for non-persistent discovery controllers since it always ends
      up with a non-zero kato value. Fix this by initializing kato
      to zero instead, and ensuring various controllers are assigned
      appropriate kato values as follows:
      
      non-persistent controllers  - kato set to zero
      persistent controllers      - kato set to NVMF_DEV_DISC_TMO
                                    (or any positive int via nvme-cli)
      i/o controllers             - kato set to NVME_DEFAULT_KATO
                                    (or any positive int via nvme-cli)
      Signed-off-by: Martin George <marting@netapp.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
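      The resulting assignment policy, written out as a self-contained sketch.
      The constants stand in for NVME_DEFAULT_KATO / NVMF_DEV_DISC_TMO from
      the nvme headers; the numeric values used here are assumptions, and the
      real decision is spread across nvme-cli and the fabrics code.

      #include <stdbool.h>

      /* Placeholder values for the sketch only. */
      enum { DEFAULT_KATO = 5, DISC_KATO = 30 };

      static unsigned int choose_kato(bool discovery_ctrl, bool persistent,
                                      unsigned int user_kato)
      {
              if (user_kato)                  /* explicit value from nvme-cli */
                      return user_kato;
              if (discovery_ctrl)             /* non-persistent discovery: no keep-alive */
                      return persistent ? DISC_KATO : 0;
              return DEFAULT_KATO;            /* i/o controller default */
      }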
  15. 10 Feb 2021, 1 commit
    • nvme-fabrics: avoid double completions in nvmf_fail_nonready_command · ea5e5f42
      Chao Leng authored
      When reconnecting, the request may be completed with
      NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently
      sets the state of the request to MQ_RQ_IN_FLIGHT before calling
      nvme_complete_rq.  When this happens for a request that is freed by
      the caller, such as nvme_submit_user_cmd, in the worst case the request
      could be completed again in the teardown process.
      
      Instead of calling blk_mq_start_request from nvmf_fail_nonready_command,
      just use the new nvme_host_path_error helper to complete the command
      without starting it.
      Signed-off-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
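      A sketch of the "complete without starting" idea, modelled on the
      nvme_host_path_error helper mentioned above rather than copied from it:
      the request is never started, so a caller that frees it cannot race
      with a second completion during teardown.

      #include <linux/blk-mq.h>
      #include "nvme.h"      /* nvme_req(), nvme_complete_rq(), status codes */

      static blk_status_t fail_nonready_sketch(struct request *rq)
      {
              /* No blk_mq_start_request(): a started request could be
               * completed a second time in the teardown path.
               */
              nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
              blk_mq_set_request_complete(rq);
              nvme_complete_rq(rq);
              return BLK_STS_OK;
      }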
  16. 02 Dec 2020, 1 commit
    • nvme-fabrics: reject I/O to offline device · 8c4dfea9
      Victor Gladkov authored
      Commands get stuck while the host NVMe-oF controller is in reconnect state.
      The controller enters into reconnect state when it loses connection with
      the target.  It tries to reconnect every 10 seconds (default) until
      a successful reconnect or until the reconnect time-out is reached.
      The default reconnect time-out is 10 minutes.
      
      Applications are expecting commands to complete with success or error
      within a certain timeout (30 seconds by default).  The NVMe host is
      enforcing that timeout while it is connected, but during reconnect the
      timeout is not enforced and commands may get stuck for a long period or
      even forever.
      
      To fix this long delay due to the default timeout, introduce a new
      "fast_io_fail_tmo" session parameter.  The timeout is measured in seconds
      from the controller reconnect and any command beyond that timeout is
      rejected.  The new parameter value may be passed during 'connect'.
      The default value of -1 means no timeout (similar to current behavior).
      Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
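      The timeout semantics above, reduced to a standalone sketch. The field
      and function names here are illustrative, not the driver's, which
      tracks the deadline in jiffies on the controller.

      #include <stdbool.h>
      #include <time.h>

      struct reconnect_state {
              time_t started;            /* when the reconnect began */
              int    fast_io_fail_tmo;   /* seconds; -1 means never fail fast */
      };

      /* Reject commands once the reconnect has outlived fast_io_fail_tmo. */
      static bool should_fail_fast(const struct reconnect_state *rs, time_t now)
      {
              if (rs->fast_io_fail_tmo < 0)      /* default: old behaviour */
                      return false;
              return (now - rs->started) > rs->fast_io_fail_tmo;
      }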
  17. 09 Sep 2020, 1 commit
    • nvme-fabrics: allow to queue requests for live queues · 73a53799
      Sagi Grimberg authored
      Right now we are failing requests based on the controller state (which
      is checked inline in nvmf_check_ready); however, we should definitely
      accept requests if the queue is live.
      
      When entering controller reset, we transition the controller into
      NVME_CTRL_RESETTING, and then return BLK_STS_RESOURCE for non-mpath
      requests (which have blk_noretry_request set).
      
      This is also the case for NVME_REQ_USER for the wrong reason. There
      shouldn't be any reason for us to reject this I/O in a controller reset.
      We do want to prevent passthru commands on the admin queue because we
      need the controller to fully initialize first before we let user passthru
      admin commands be issued.
      
      In a non-mpath setup, this means that the requests will simply be
      requeued over and over forever not allowing the q_usage_counter to drop
      its final reference, causing controller reset to hang if running
      concurrently with heavy I/O.
      
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
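      The readiness rule described above, as an illustrative decision
      function (not __nvmf_check_ready() itself): a live queue accepts I/O
      regardless of controller state; only user passthru on the admin queue
      is held back until initialization finishes.

      #include <stdbool.h>

      static bool sketch_check_ready(bool ctrl_live, bool queue_live,
                                     bool user_admin_passthru)
      {
              if (ctrl_live)
                      return true;               /* normal path */
              if (!queue_live)
                      return false;              /* requeue until the queue comes up */
              /* Queue is live (e.g. during reset): accept everything except
               * user passthru admin commands, which must wait for full init.
               */
              return !user_admin_passthru;
      }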
  18. 29 Aug 2020, 1 commit
  19. 29 Jul 2020, 1 commit
    • nvme: fix deadlock in disconnect during scan_work and/or ana_work · ecca390e
      Sagi Grimberg authored
      A deadlock happens in the following scenario with multipath:
      1) scan_work(nvme0) detects a new nsid while nvme0
          is an optimized path to it, path nvme1 happens to be
          inaccessible.
      
      2) Before scan_work is complete nvme0 disconnect is initiated
          nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING
      
      3) scan_work(1) attempts to submit IO,
          but nvme_path_is_optimized() observes nvme0 is not LIVE.
          Since nvme1 is a possible path IO is requeued and scan_work hangs.
      
      --
      Workqueue: nvme-wq nvme_scan_work [nvme_core]
      kernel: Call Trace:
      kernel:  __schedule+0x2b9/0x6c0
      kernel:  schedule+0x42/0xb0
      kernel:  io_schedule+0x16/0x40
      kernel:  do_read_cache_page+0x438/0x830
      kernel:  read_cache_page+0x12/0x20
      kernel:  read_dev_sector+0x27/0xc0
      kernel:  read_lba+0xc1/0x220
      kernel:  efi_partition+0x1e6/0x708
      kernel:  check_partition+0x154/0x244
      kernel:  rescan_partitions+0xae/0x280
      kernel:  __blkdev_get+0x40f/0x560
      kernel:  blkdev_get+0x3d/0x140
      kernel:  __device_add_disk+0x388/0x480
      kernel:  device_add_disk+0x13/0x20
      kernel:  nvme_mpath_set_live+0x119/0x140 [nvme_core]
      kernel:  nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
      kernel:  nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
      kernel:  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      kernel:  nvme_mpath_add_disk+0x47/0x90 [nvme_core]
      kernel:  nvme_validate_ns+0x396/0x940 [nvme_core]
      kernel:  nvme_scan_work+0x24f/0x380 [nvme_core]
      kernel:  process_one_work+0x1db/0x380
      kernel:  worker_thread+0x249/0x400
      kernel:  kthread+0x104/0x140
      --
      
      4) Delete also hangs in flush_work(ctrl->scan_work)
          from nvme_remove_namespaces().
      
      Similarly, a deadlock with ana_work may happen: if ana_work has started
      and calls nvme_mpath_set_live and device_add_disk, it will
      trigger I/O. When we trigger disconnect I/O will block because
      our accessible (optimized) path is disconnecting, but the alternate
      path is inaccessible, so I/O blocks. Then disconnect tries to flush
      the ana_work and hangs.
      
      [  605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
      [  605.552087] Call Trace:
      [  605.552683]  __schedule+0x2b9/0x6c0
      [  605.553507]  schedule+0x42/0xb0
      [  605.554201]  io_schedule+0x16/0x40
      [  605.555012]  do_read_cache_page+0x438/0x830
      [  605.556925]  read_cache_page+0x12/0x20
      [  605.557757]  read_dev_sector+0x27/0xc0
      [  605.558587]  amiga_partition+0x4d/0x4c5
      [  605.561278]  check_partition+0x154/0x244
      [  605.562138]  rescan_partitions+0xae/0x280
      [  605.563076]  __blkdev_get+0x40f/0x560
      [  605.563830]  blkdev_get+0x3d/0x140
      [  605.564500]  __device_add_disk+0x388/0x480
      [  605.565316]  device_add_disk+0x13/0x20
      [  605.566070]  nvme_mpath_set_live+0x5e/0x130 [nvme_core]
      [  605.567114]  nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
      [  605.568197]  nvme_update_ana_state+0xca/0xe0 [nvme_core]
      [  605.569360]  nvme_parse_ana_log+0xa1/0x180 [nvme_core]
      [  605.571385]  nvme_read_ana_log+0x76/0x100 [nvme_core]
      [  605.572376]  nvme_ana_work+0x15/0x20 [nvme_core]
      [  605.573330]  process_one_work+0x1db/0x380
      [  605.574144]  worker_thread+0x4d/0x400
      [  605.574896]  kthread+0x104/0x140
      [  605.577205]  ret_from_fork+0x35/0x40
      [  605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
      [  605.579239]       Tainted: G           OE     5.3.5-050305-generic #201910071830
      [  605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  605.582320] nvme            D    0 14044  14043 0x00000000
      [  605.583424] Call Trace:
      [  605.583935]  __schedule+0x2b9/0x6c0
      [  605.584625]  schedule+0x42/0xb0
      [  605.585290]  schedule_timeout+0x203/0x2f0
      [  605.588493]  wait_for_completion+0xb1/0x120
      [  605.590066]  __flush_work+0x123/0x1d0
      [  605.591758]  __cancel_work_timer+0x10e/0x190
      [  605.593542]  cancel_work_sync+0x10/0x20
      [  605.594347]  nvme_mpath_stop+0x2f/0x40 [nvme_core]
      [  605.595328]  nvme_stop_ctrl+0x12/0x50 [nvme_core]
      [  605.596262]  nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
      [  605.597333]  nvme_sysfs_delete+0x5c/0x70 [nvme_core]
      [  605.598320]  dev_attr_store+0x17/0x30
      
      Fix this by introducing a new state: NVME_CTRL_DELETING_NOIO, which will
      indicate the phase of controller deletion where I/O cannot be allowed
      to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
      be issued to the bottom device, and only after we flush the ana_work
      and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
      we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work
      from re-firing by aborting early if we are not LIVE, so we should be safe
      here.
      
      In addition, change the transport drivers to follow the updated state
      machine.
      
      Fixes: 0d0b660f ("nvme: add ANA support")
      Reported-by: Anton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
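      A sketch of the state split introduced above (the enum values and
      helper are illustrative, not the nvme core definitions): DELETING still
      lets multipath I/O reach the bottom device so scan_work/ana_work can be
      flushed, while DELETING_NOIO does not.

      #include <stdbool.h>

      enum ctrl_state_sketch {
              CTRL_LIVE,
              CTRL_RESETTING,
              CTRL_DELETING,        /* flushing scan_work/ana_work, I/O allowed */
              CTRL_DELETING_NOIO,   /* set after the flush; no new I/O */
              CTRL_DEAD,
      };

      /* Can multipath I/O still be issued to a path in this state? */
      static bool path_usable_for_io(enum ctrl_state_sketch state)
      {
              return state == CTRL_LIVE || state == CTRL_DELETING;
      }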
  20. 26 Mar 2020, 1 commit
  21. 12 Sep 2019, 1 commit
  22. 30 Aug 2019, 2 commits
    • nvme: make fabrics command run on a separate request queue · e7832cb4
      Sagi Grimberg authored
      We have a fundamental issue: fabric commands use the admin_q.
      The reason is that admin-connect, register reads and writes, and
      admin commands cannot be guaranteed ordering while we are running
      controller resets.
      
      For example, when we reset a controller we perform:
      1. disable the controller
      2. teardown the admin queue
      3. re-establish the admin queue
      4. enable the controller
      
      In order to perform (3), we need to unquiesce the admin queue; however,
      we may have some admin commands that are already pending on the
      quiesced admin_q and will immediately execute when we unquiesce it before
      we execute (4). The host must not send admin commands to the controller
      before enabling the controller.
      
      To fix this, we have the fabric commands (admin connect and property
      get/set, but not I/O queue connect) use a separate fabrics_q and make
      sure to quiesce the admin_q before we disable the controller, and
      unquiesce it only after we enable the controller.
      
      This fixes the error prints from nvmet in a controller reset storm test:
      kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
      Which indicate that the host is sending an admin command when the
      controller is not enabled.
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
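      The reset ordering above, written out with the quiesce points that the
      separate fabrics_q makes possible. The function names are placeholders
      for the transport's teardown/setup helpers, not real kernel symbols.

      /* Placeholder prototypes standing in for the transport helpers. */
      void quiesce_admin_q(void);
      void unquiesce_admin_q(void);
      void disable_ctrl(void);
      void enable_ctrl(void);
      void teardown_admin_q(void);
      void setup_admin_q_and_fabrics_q(void);

      static void reset_ctrl_sketch(void)
      {
              quiesce_admin_q();              /* park pending admin commands... */
              disable_ctrl();                 /* 1. disable the controller */
              teardown_admin_q();             /* 2. tear down the admin queue */
              setup_admin_q_and_fabrics_q();  /* 3. re-establish it; connect and
                                               *    property get/set go through
                                               *    fabrics_q, so the still-quiesced
                                               *    admin_q cannot leak commands
                                               *    while CC.EN == 0 */
              enable_ctrl();                  /* 4. enable the controller */
              unquiesce_admin_q();            /* ...and only now release them */
      }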
    • nvme-fabrics: Add type of service (TOS) configuration · 52b4451a
      Israel Rukshin authored
      TOS is user-defined and needs to be configured via nvme-cli.
      It must be set before initiating any traffic, and once set, the TOS
      cannot be changed.
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  23. 21 Jun 2019, 1 commit
  24. 14 May 2019, 1 commit
  25. 01 May 2019, 1 commit
  26. 20 Feb 2019, 2 commits
  27. 10 Jan 2019, 1 commit
  28. 19 Dec 2018, 3 commits