- 23 12月, 2021 4 次提交
-
-
由 Hannes Reinecke 提交于
While the 'iopolicy' sysfs attribute can be set at runtime, most storage arrays prefer to use the 'round-robin' iopolicy per default. We can use udev rules to set this, but is getting rather unwieldy for rebranded arrays as we would have to update the udev rules anytime a new array shows up, leading to the same mess we currently have in multipathd for configuring the RDAC arrays. Hence this patch adds a module parameter 'iopolicy' to allow the admin to switch the default, and to do away with the need for a udev rule here. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Geliang Tang 提交于
The variable 'ctrl' became useless since the code using it was dropped from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment request genctr on completion"). Fix it to get rid of this compilation warning in the nvme-5.17 branch: drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’: drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable] struct nvme_ctrl *ctrl = nvme_req(req)->ctrl; ^~~~ Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion") Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
The nvme request generation counter is intended to catch duplicate completions. Incrementing the counter on submission means duplicates can only be caught if the request tag is reallocated and dispatched prior to the driver observing the corrupted CQE. Incrementing on completion removes this window, making it possible to detect duplicate completions in consecutive entries. Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Currently applications have a hard time figuring out which nvme-over-fabrics arguments are supported for any given kernel; the ioctl will return an error code on failure, and the application has to guess whether this was due to an invalid argument or due to a connection or controller error. With this patch applications can read a list of supported arguments by simply reading from /dev/nvme-fabrics, allowing them to validate the connection string. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 29 11月, 2021 2 次提交
-
-
由 Christoph Hellwig 提交于
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Just use the disk attached to the request_queue instead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 25 11月, 2021 1 次提交
-
-
由 Maurizio Lombardi 提交于
Submit I/O requests with the IOCB_NOWAIT flag set only if the underlying filesystem supports it. Fixes: 50a909db ("nvmet: use IOCB_NOWAIT for file-ns buffered I/O") Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 24 11月, 2021 9 次提交
-
-
由 Klaus Jensen 提交于
Write Zeroes sets PRACT when block integrity is enabled (as it should), but neglects to also set the reftag which is expected by reads. This causes protection errors on reads. Fix this by setting the reftag for type 1 and 2 (for type 3, reads will not check the reftag). Signed-off-by: NKlaus Jensen <k.jensen@samsung.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
Valid fast_io_fail_tmo values are integers >= 0 or -1 (disabled). Prevent userspace from setting arbitrary negative values. Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Enzo Matsumiya 提交于
This particular Kioxia device times out and aborts I/O during any load, but it's more easily observable with discards (fstrim). The device gets to a state that is also not possible to use "nvme set-feature" to disable APST. Booting with nvme_core.default_ps_max_latency=0 solves the issue. We had a dozen or so of these devices behaving this same way in customer environments. Signed-off-by: NEnzo Matsumiya <ematsumiya@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
Release the page frag cache when tearing down the io queues Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Varun Prakash 提交于
If maxh2cdata < r2t_length then driver will form multiple H2CData PDUs, validate R2T PDU in nvme_tcp_handle_r2t() to reuse nvme_tcp_setup_h2c_data_pdu(). Also set req->state to NVME_TCP_SEND_H2C_PDU in nvme_tcp_setup_h2c_data_pdu(). Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Varun Prakash 提交于
Current nvmet_try_send_ddgst() code does not check whether all data digest bytes are transmitted, fix this by returning -EAGAIN if all data digest bytes are not transmitted. Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
If a reset controller is executed while the initiator is performing some I/O the driver may leak the memory allocated for the commands' iovec. Make sure that nvmet_tcp_uninit_data_in_cmds() releases all the memory. Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
Makes the code easier to read and to debug. Sets the freed pointers to NULL, it will be useful when destroying the queues to understand if the commands' buffers have been released already or not. Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
If the initiator executes a reset controller operation while performing I/O, the target kernel will crash because of a race condition between release_queue and io_work; nvmet_tcp_uninit_data_in_cmds() may be executed while io_work is running, calling flush_work() was not sufficient to prevent this because io_work could requeue itself. Fix this bug by using cancel_work_sync() to prevent io_work from requeuing itself and set rcv_state to NVMET_TCP_RECV_ERR to make sure we don't receive any more data from the socket. Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 09 11月, 2021 1 次提交
-
-
由 Ming Lei 提交于
NVMe uses one atomic flag to check if quiesce is needed. If quiesce is started, the helper returns immediately. This way is wrong, since we have to wait until quiesce is done. Fixes: e70feb8b ("blk-mq: support concurrent queue quiesce/unquiesce") Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-5-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 27 10月, 2021 7 次提交
-
-
由 Amit Engel 提交于
Pass the correct length to nvmet_tcp_verify_hdgst, which is the pdu header length. This fixes a wrong behaviour where header digest verification passes although the digest is wrong. Signed-off-by: NAmit Engel <amit.engel@dell.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Len Baker 提交于
In an effort to avoid open-coded arithmetic in the kernel [1], use the flex_array_size() and struct_size() helpers instead of an open-coded calculation. [1] https://github.com/KSPP/linux/issues/160Signed-off-by: NLen Baker <len.baker@gmx.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NGustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Register the discovery subsystem as the 'current' discovery subsystem, and add a new discovery log page entry for it. Signed-off-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Invert the check for discovery subsystem type to allow for additional discovery subsystem types. Signed-off-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Varun Prakash 提交于
exp_ddgst is of type __le32, &cmd->exp_ddgst + cmd->offset increases &cmd->exp_ddgst by 4 * cmd->offset, fix this by type casting &cmd->exp_ddgst to u8 *. Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Varun Prakash 提交于
ddgst is of type __le32, &req->ddgst + req->offset increases &req->ddgst by 4 * req->offset, fix this by type casting &req->ddgst to u8 *. Fixes: 3f2304f8 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Varun Prakash 提交于
With commit db5ad6b7 ("nvme-tcp: try to send request in queue_rq context") r2t and response PDU can get processed while send function is executing. Current data digest send code uses req->offset after kernel_sendmsg(), this creates a race condition where req->offset gets reset before it is used in send function. This can happen in two cases - 1. Target sends r2t PDU which resets req->offset. 2. Target send response PDU which completes the req and then req is used for a new command, nvme_tcp_setup_cmd_pdu() resets req->offset. Fix this by storing req->offset in a local variable and using this local variable after kernel_sendmsg(). Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context") Signed-off-by: NVarun Prakash <varun@chelsio.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 26 10月, 2021 3 次提交
-
-
由 Sagi Grimberg 提交于
We should not access request members after the last send, even to determine if indeed it was the last data payload send. The reason is that a completion could have arrived and trigger a new execution of the request which overridden these members. This was fixed by commit 825619b0 ("nvme-tcp: fix possible use-after-completion"). Commit e371af03 broke that assumption again to address cases where multiple r2t pdus are sent per request. To fix it, we need to record the request data_sent and data_len and after the payload network send we reference these counters to determine weather we should advance the request iterator. Fixes: e371af03 ("nvme-tcp: fix incorrect h2cdata pdu offset accounting") Reported-by: NKeith Busch <kbusch@kernel.org> Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Maurizio Lombardi 提交于
page_frag_free() won't completely release the memory allocated for the commands, the cache page must be explicitly freed by calling __page_frag_cache_drain(). This bug can be easily reproduced by repeatedly executing the following command on the initiator: $echo 1 > /sys/devices/virtual/nvme-fabrics/ctl/nvme0/reset_controller Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 10月, 2021 13 次提交
-
-
由 Len Baker 提交于
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. In this case this is not actually dynamic size: all the operands involved in the calculation are constant values. However it is better to refactor this anyway, just to keep the open-coded math idiom out of code. So, use the struct_size() helper to do the arithmetic instead of the argument "size + count * size" in the kmalloc() function. This code was detected with the help of Coccinelle and audited and fixed manually. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-argumentsSigned-off-by: NLen Baker <len.baker@gmx.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
When reading the partition table on initial scan hits an I/O error the I/O will hang with the scan_mutex held: [<0>] do_read_cache_page+0x49b/0x790 [<0>] read_part_sector+0x39/0xe0 [<0>] read_lba+0xf9/0x1d0 [<0>] efi_partition+0xf1/0x7f0 [<0>] bdev_disk_changed+0x1ee/0x550 [<0>] blkdev_get_whole+0x81/0x90 [<0>] blkdev_get_by_dev+0x128/0x2e0 [<0>] device_add_disk+0x377/0x3c0 [<0>] nvme_mpath_set_live+0x130/0x1b0 [nvme_core] [<0>] nvme_mpath_add_disk+0x150/0x160 [nvme_core] [<0>] nvme_alloc_ns+0x417/0x950 [nvme_core] [<0>] nvme_validate_or_alloc_ns+0xe9/0x1e0 [nvme_core] [<0>] nvme_scan_work+0x168/0x310 [nvme_core] [<0>] process_one_work+0x231/0x420 and trying to delete the controller will deadlock as it tries to grab the scan mutex: [<0>] nvme_mpath_clear_ctrl_paths+0x25/0x80 [nvme_core] [<0>] nvme_remove_namespaces+0x31/0xf0 [nvme_core] [<0>] nvme_do_delete_ctrl+0x4b/0x80 [nvme_core] As we're now properly ordering the namespace list there is no need to hold the scan_mutex in nvme_mpath_clear_ctrl_paths() anymore. And we always need to kick the requeue list as the path will be marked as unusable and I/O will be requeued _without_ a current path. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
The host memory doorbell and event buffers need to be initialized on each reset so the driver doesn't observe stale values from the previous instantiation. Signed-off-by: NKeith Busch <kbusch@kernel.org> Tested-by: NJohn Levon <john.levon@nutanix.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
In case that icdoff is not zero or mandatory keyed sgls are not supported by the NVMe/RDMA target, we'll go to error flow but we'll return 0 to the caller. Fix it by returning an appropriate error code. Fixes: c66e2998 ("nvme-rdma: centralize controller setup sequence") Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Luis Chamberlain 提交于
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. Since we now can tell for sure when a disk was added, move setting the bit NVME_NSHEAD_DISK_LIVE only when we did add the disk successfully. Nothing to do here as the cleanup is done elsewhere. We take care and use test_and_set_bit() because it is protects against two nvme paths simultaneously calling device_add_disk() on the same namespace head. Signed-off-by: NLuis Chamberlain <mcgrof@kernel.org> Reviewed-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
This makes the code more readable. Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
This makes the code more readable. Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
With discovery controllers supporting unique subsystem NQNs the actual subsystem NQN might be different from that one passed in via the connect args. So add a helper to display the resulting subsystem NQN. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Add a connect option 'discovery' to specify that the connection should be made to a discovery controller, not a normal I/O controller. With discovery controllers supporting unique subsystem NQNs we cannot easily distinguish by the subsystem NQN if this should be a discovery connection, but we need this information to blank out options not supported by discovery controllers. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
With unique discovery controller NQNs we cannot distinguish the subsystem type by the NQN alone, but need to check the subsystem type, too. So expose the subsystem type in a new sysfs attribute 'subsystype'. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Set the correct 'CNTRLTYPE' field in the identify controller data. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Add a helper function to determine if a given subsystem is a discovery subsystem. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
TPAR8013 allows for unique discovery NQNs, so make the discovery controller NQN configurable by exposing a subsys attribute 'discovery_nqn'. Signed-off-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-