- 20 October 2021, 1 commit
-
-
Submitted by Ming Lei

Add two APIs for stopping and starting the admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014081710.1871747-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 19 October 2021, 3 commits
-
-
Submitted by Jens Axboe

Trivial to do now, just need our own io_comp_batch on the stack and pass that in to the usual command completion handling. I pondered making this dependent on how many entries we had to process, but even for a single entry there's no discernible difference in performance or latency. Running a sync workload over io_uring:

  t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1

yields the below performance before the patch:

  IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
  IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
  IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

and the following after:

  IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
  IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
  IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

which definitely isn't slower, about the same if you factor in a bit of variance. For peak performance workloads, benchmarking shows a 2% improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe
Take advantage of struct io_comp_batch, if passed in to the nvme poll handler. If it's set, rather than complete each request individually inline, store them in the io_comp_batch list. We only do so for requests that will complete successfully, anything else will be completed inline as before.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Jens Axboe

struct io_comp_batch contains a list head and a completion handler, which allows batches of IO to be completed more efficiently. For now there are no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
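As a rough illustration, here is a userspace sketch of the shape such a batch takes: a list of completed requests plus one handler that finishes them all in a single call. All names and fields are stand-ins for illustration, not the exact kernel definition.

#include <stdio.h>

/* Stand-in request type; the real one is the block layer's struct request. */
struct request {
    int tag;
    struct request *next;   /* links completed requests into a batch */
};

/* Sketch of a completion batch: a list of completed requests plus one
 * handler that finishes all of them in a single call. */
struct io_comp_batch {
    struct request *req_list;
    void (*complete)(struct io_comp_batch *);
};

static void complete_batch(struct io_comp_batch *iob)
{
    /* One invocation tears down every request gathered while polling,
     * amortizing the per-request completion work. */
    for (struct request *rq = iob->req_list; rq; rq = rq->next)
        printf("completing tag %d\n", rq->tag);
    iob->req_list = NULL;
}

int main(void)
{
    struct request r1 = { .tag = 1, .next = NULL };
    struct request r0 = { .tag = 0, .next = &r1 };
    struct io_comp_batch iob = { .req_list = &r0, .complete = complete_batch };

    iob.complete(&iob);
    return 0;
}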
-
- 18 October 2021, 4 commits
-
-
Submitted by Christoph Hellwig
Set the poll queue flag to enable polling, given that the multipath node just dispatches the bios to a lower queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig

Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages:

 - the cookie construction can be made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio makes it trivial to support polling of BIOs remapped by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can be removed entirely

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
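A hedged sketch of the interface shift with simplified stand-in types (bi_queue and bi_cookie here are illustrative; the kernel reaches the queue through the bio's block device):

#include <stdio.h>

typedef unsigned int blk_qc_t;
struct request_queue { int id; };

struct bio {
    struct request_queue *bi_queue; /* device reachable from the bio itself */
    blk_qc_t bi_cookie;             /* cookie kept private inside the bio */
};

/* New-style poll: everything needed travels with the bio, so a stacking
 * driver can remap the bio and polling still finds the right queue. */
static int bio_poll(struct bio *bio, unsigned int flags)
{
    (void)flags;
    printf("polling queue %d, cookie %u\n", bio->bi_queue->id, bio->bi_cookie);
    return 1; /* pretend one completion was found */
}

int main(void)
{
    struct request_queue q = { .id = 3 };
    struct bio b = { .bi_queue = &q, .bi_cookie = 42 };

    /* The old interface was blk_poll(q, cookie, ...): the caller had to
     * keep the queue and cookie alive separately across submission. */
    return !bio_poll(&b, 0);
}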
-
Submitted by Christoph Hellwig
Unlike the RWF_HIPRI userspace ABI which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Christoph Hellwig
Split the integrity/metadata handling definitions out into a new header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 October 2021, 1 commit
-
-
Submitted by Adam Manzanares

Decrease the reference count of the char device during char device deletion in order to fix a memory leak. Add a release callback for the device associated with the chardev and move ida_simple_remove into the release function.

Fixes: 2637baed ("nvme: introduce generic per-namespace chardev")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Javier González <javier@javigon.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 07 October 2021, 1 commit
-
-
Submitted by Keith Busch
The request tag is no longer the only component of the command id.

Fixes: e7006de6 ("nvme: code command_id with a genctr for use-after-free validation")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
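As an illustration of why lookups must now mask, here is a hypothetical sketch of a command id carrying a generation counter in its upper bits; the shift width and mask are invented for the example, not the kernel's actual constants.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Illustrative layout: generation counter in the upper bits of the
 * 16-bit command id, request tag in the lower bits. */
#define CID_GENCTR_SHIFT 12
#define CID_TAG_MASK     ((1u << CID_GENCTR_SHIFT) - 1)

static uint16_t make_cid(uint16_t genctr, uint16_t tag)
{
    return (uint16_t)((genctr << CID_GENCTR_SHIFT) | (tag & CID_TAG_MASK));
}

static uint16_t cid_to_tag(uint16_t cid)
{
    /* The class of bug fixed here: using cid directly as the tag. */
    return cid & CID_TAG_MASK;
}

int main(void)
{
    uint16_t cid = make_cid(2, 0xabc);

    assert(cid != 0xabc);           /* cid is no longer just the tag... */
    assert(cid_to_tag(cid) == 0xabc); /* ...so consumers must mask it */
    printf("cid=%#x tag=%#x\n", cid, cid_to_tag(cid));
    return 0;
}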
-
- 28 September 2021, 1 commit
-
-
Submitted by Keith Busch
Some apple controllers use the command id as an index to implementation specific data structures and will fail if the value is out of bounds. The nvme driver's recently introduced command sequence number breaks this controller. Provide a quirk so these spec incompliant controllers can function as before. The driver will not have the ability to detect bad completions when this quirk is used, but we weren't previously checking this anyway. The quirk bit was selected so that it can readily apply to stable.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214509
Cc: Sven Peter <sven@svenpeter.dev>
Reported-by: Orlando Chamberlain <redecorating@protonmail.com>
Reported-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20210927154306.387437-1-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 September 2021, 5 commits
-
-
Submitted by Christoph Hellwig

Various places in the nvme code rely on ctrl->namespaces being ordered. Ensure that the namespace is inserted into the list at the right position from the start instead of sorting it after the fact.

Fixes: 540c801c ("NVMe: Implement namespace list scanning")
Reported-by: Anton Eidelman <anton.eidelman@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
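A minimal sketch of inserting at the right position, with a plain singly linked list standing in for the kernel's list_head-based ctrl->namespaces:

#include <stdio.h>

struct ns {
    unsigned int nsid;
    struct ns *next;
};

static void ns_insert_sorted(struct ns **head, struct ns *new)
{
    struct ns **pp = head;

    /* Walk until the next entry has a higher NSID, then splice in. */
    while (*pp && (*pp)->nsid < new->nsid)
        pp = &(*pp)->next;
    new->next = *pp;
    *pp = new;
}

int main(void)
{
    struct ns a = { .nsid = 4 }, b = { .nsid = 1 }, c = { .nsid = 3 };
    struct ns *head = NULL;

    ns_insert_sorted(&head, &a);
    ns_insert_sorted(&head, &b);
    ns_insert_sorted(&head, &c);
    for (struct ns *n = head; n; n = n->next)
        printf("nsid %u\n", n->nsid); /* 1 3 4: ordered from the start */
    return 0;
}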
-
Submitted by Sagi Grimberg

When the controller sends us multiple r2t PDUs for a single request, we need to account for it correctly, because our send and recv contexts run concurrently (i.e. we can get a new r2t with r2t_offset before we have updated our iterator and the req->data_sent marker). This can cause wrong offsets to be sent to the controller. To fix that, note that this can happen only in the send sequence of the last page: take r2t_offset into the h2c PDU data_offset, and in the nvme_tcp_try_send_data loop make sure to increment the request markers also when we complete a PDU while still expecting more r2t PDUs, since we have not yet sent the entire data of the request.

Fixes: 825619b0 ("nvme-tcp: fix possible use-after-completion")
Reported-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com>
Tested-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by James Smart

Remove the freeze/unfreeze around changes to the number of hardware queues. Study and retesting have indicated that no I/Os can be active at this point, so there is nothing to freeze: nvme-fc drains the queues in the shutdown and error recovery path in __nvme_fc_abort_outstanding_ios. This patch primarily reverts 88e837ed ("nvme-fc: wait for queues to freeze before calling update_hr_hw_queues"). It's not an exact revert, as it leaves the adjusting of hw queues only if the count changes.

Signed-off-by: James Smart <jsmart2021@gmail.com>
[dwagner: added explanation why no IO is pending]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by James Smart

To avoid a race between timeout and teardown, in the teardown process we first quiesce the queue, and then delete the timer and cancel the timeout work for the queue. This patch merges the admin and io sync ops into the queue teardown logic, as in the RDMA patch 3017013d ("nvme-rdma: avoid race between time out and tear down"). There is no teardown_lock in nvme-fc.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Daniel Wagner

In case the number of hardware queues changes, we need to update the tagset and the mapping of ctx to hctx first. If we try to create and connect the I/O queues first, this operation will fail (the target will reject the connect call due to the wrong number of queues) and hence we bail out of the recreate function. Then we will try the very same operation again, thus making no progress.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 15 September 2021, 1 commit
-
-
Submitted by Christoph Hellwig
There is no need to explicitly unregister the integrity profile when deleting the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 14 September 2021, 3 commits
-
-
Submitted by Keith Busch
Dispatching requests inline with the .queue_rq() call may block while holding the send_mutex. If the tcp io_work also happens to schedule, it may see the req_list is non-empty, leaving "pending" true and remaining in TASK_RUNNING. Since io_work is of higher scheduling priority, the .queue_rq task may not get a chance to run, blocking forward progress and leading to io timeouts. Instead of checking for pending requests within io_work, let the queueing restart io_work outside the send_mutex lock if there is more work to be done.

Fixes: a0fdd141 ("nvme-tcp: rerun io_work if req_list is not empty")
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Ruozhu Li

We should always destroy the cm_id before destroying the qp, to avoid getting a cma event after the qp has been destroyed, which may lead to a use after free. In the RDMA connection establishment error flow, don't destroy the qp in the cm event handler. Just report cm_error to the upper level; the qp will be destroyed in nvme_rdma_alloc_queue() after the cm id is destroyed.

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Anton Eidelman

nvme_update_ana_state() has a deficiency that results in a failure to properly update the ana state for a namespace in the following case:

  NSIDs in ctrl->namespaces: 1, 3, 4
  NSIDs in desc->nsids:      1, 2, 3, 4

  Loop iteration 0: ns index = 0, n = 0, ns->head->ns_id = 1, nsid = 1, MATCH.
  Loop iteration 1: ns index = 1, n = 1, ns->head->ns_id = 3, nsid = 2, NO MATCH.
  Loop iteration 2: ns index = 2, n = 2, ns->head->ns_id = 4, nsid = 4, MATCH.

Here the update to the ANA state of NSID 3 is missed. To fix this, increment n and retry the update with the same ns when ns->head->ns_id is higher than nsid.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
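The fixed matching logic can be sketched with arrays standing in for the two sorted lists; the essential change is advancing n and retrying the same namespace rather than skipping it:

#include <stdio.h>

int main(void)
{
    unsigned int ns_ids[] = { 1, 3, 4 };        /* ctrl->namespaces */
    unsigned int desc_nsids[] = { 1, 2, 3, 4 }; /* desc->nsids */
    size_t nr_ns = 3, nr_desc = 4, n = 0;

    for (size_t i = 0; i < nr_ns; i++) {
        /* Both lists are sorted: skip descriptor nsids below this
         * namespace, retrying the SAME namespace each time. */
        while (n < nr_desc && desc_nsids[n] < ns_ids[i])
            n++;
        if (n < nr_desc && desc_nsids[n] == ns_ids[i]) {
            printf("update ANA state of nsid %u\n", ns_ids[i]);
            n++;
        }
    }
    return 0; /* prints 1, 3 and 4: NSID 3 is no longer missed */
}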
-
- 13 September 2021, 2 commits
-
-
Submitted by Daniel Wagner
When we remove the siblings entry, we update ns->head->list, hence we can't separate the removal and test for being empty. They have to be in the same critical section to avoid a race. To avoid breaking the refcounting imbalance again, add a list empty check to nvme_find_ns_head.

Fixes: 5396fdac ("nvme: fix refcounting imbalance when all paths are down")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
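A small pthread sketch of the rule the fix enforces, with stand-in types: the removal and the emptiness test share one critical section, so no other thread can act on the list in between.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct node { struct node *next; };

static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *siblings; /* stands in for ns->head->list */

static bool remove_and_test_empty(struct node *n)
{
    bool last;

    pthread_mutex_lock(&head_lock);
    /* Remove n (simplified: assume it is the first entry) and test
     * emptiness in the SAME critical section. */
    siblings = n->next;
    last = (siblings == NULL);
    pthread_mutex_unlock(&head_lock);
    return last;
}

int main(void)
{
    struct node n2 = { .next = NULL }, n1 = { .next = &n2 };

    siblings = &n1;
    printf("last path? %d\n", remove_and_test_empty(&n1)); /* 0 */
    printf("last path? %d\n", remove_and_test_empty(&n2)); /* 1 */
    return 0;
}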
-
Submitted by Dan Carpenter

This was intended to limit the number of characters printed from "subsys->serial" to NVMET_SN_MAX_SIZE. But the width specifier was accidentally used instead of the precision specifier, so it only affects the alignment and not the number of characters printed.

Fixes: f0406481 ("nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
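The width/precision distinction is easy to demonstrate in plain C: with a precision specifier ("%.*s") the length argument bounds how many bytes are read, while a width specifier ("%*s") only pads:

#include <stdio.h>

int main(void)
{
    /* A serial-number-style buffer with no NUL terminator. */
    char serial[8] = { 'S', 'N', '1', '2', '3', '4', '5', '6' };
    int max = (int)sizeof(serial);

    /* Safe: precision stops printf after exactly 8 characters. */
    printf("precision: [%.*s]\n", max, serial);

    /* printf("width: [%*s]\n", max, serial);
     * BUG: width only pads; printf keeps reading past the buffer
     * looking for a NUL terminator. */
    return 0;
}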
-
- 06 September 2021, 10 commits
-
-
Submitted by Luis Chamberlain
We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Daniel Wagner

The function nvme_mpath_clear_current_path returns true if the current path has changed. In this case we have to wait for all concurrent submissions to finish. But if we didn't change the current path, there is no point in waiting for another RCU period to finish.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Tatsuya Sasaki

Currently the connection between host and NVMe-oF target gets disconnected by keep-alive timeout when a user connects to a target with a relatively large kato value and then sets a smaller kato with a Set Features command (e.g. connects with a 60 second kato value and then sets a 10 second kato value). The cause is that the keep-alive command interval on the host, which is defined as unsigned int kato in the nvme_ctrl structure, does not follow the kato value changes. This patch updates the keep-alive interval in the following steps when the kato is modified by a Set Features command: stop the keep-alive work, set the kato as the new timer value, and restart the work.

Signed-off-by: Tatsuya Sasaki <tatsuya6.sasaki@kioxia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
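A sketch of that sequence, with hypothetical stand-ins for the kernel's delayed-work machinery:

#include <stdio.h>

struct ctrl {
    unsigned int kato;      /* keep-alive timeout, seconds */
    unsigned int timer_sec; /* what the keep-alive work is armed with */
};

static void stop_keep_alive(struct ctrl *c)  { c->timer_sec = 0; }
static void start_keep_alive(struct ctrl *c) { c->timer_sec = c->kato; }

static void update_kato(struct ctrl *c, unsigned int new_kato)
{
    stop_keep_alive(c);   /* cancel the pending keep-alive work */
    c->kato = new_kato;   /* take the value from Set Features */
    start_keep_alive(c);  /* re-arm with the new interval */
}

int main(void)
{
    struct ctrl c = { .kato = 60 };

    start_keep_alive(&c);
    update_kato(&c, 10);  /* host now keeps pace with the smaller kato */
    printf("keep-alive every %u s\n", c.timer_sec);
    return 0;
}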
-
Submitted by Daniel Wagner

The spec says in 7.4.6.1 "Digest Error handling":

  When a host detects a data digest error in a C2HData PDU, that host
  shall continue processing C2HData PDUs associated with the command and
  when the command processing has completed, if a successful status was
  returned by the controller, the host shall fail the command with a
  non-fatal transport error.

Currently the transport is reset when a data digest error is detected. Instead, when a digest error is detected, mark the final status as NVME_SC_DATA_XFER_ERROR and let the upper layer handle the error. In order to keep track of the final result, maintain a status field in the nvme_tcp_request object and use it to overwrite the completion queue status (which might be successful even though a digest error has been detected) when completing the request.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Hannes Reinecke
The serial number is copied into the buffer via memcpy_and_pad() with the length NVMET_SN_MAX_SIZE. So when printing out we also need to take just that length as anything beyond that will be uninitialized.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig

The target core code never needs the host-side nvme_ctrl structure. Open code two uses of nvmet_is_passthru_req in passthru.c, and then switch the helpers used by the core to return bool. Also rename the functions to better match their usage:

  nvmet_passthru_ctrl     -> nvmet_is_passthru_subsys
  nvmet_req_passthru_ctrl -> nvmet_is_passthru_req

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Adam Manzanares

For a passthru controller, make cap initialization dependent on the cap of the passthru controller, given that multiple Command Set support needs to be supported by the underlying controller. For that, move the initialization of CAP later so that it can use the fully initialized nvmet_ctrl structure.

Fixes: ab5d0b38 ("nvmet: add Command Set Identifier support")
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: refactored the code a bit to keep it more contained in passthru.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Adam Manzanares
Preparatory patch in order to reuse nvme_multi_css in the nvme target code.

Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Hannes Reinecke

When triggering a rescan due to a namespace resize we will be receiving AENs on every controller, triggering a rescan of all attached namespaces. If multipath is active, only the current path and the ns_head disk will be updated; the other paths will still refer to the old size until AENs for the remaining controllers are received. If I/O comes in before that, it might be routed to one of the old paths, triggering an I/O failure with 'access beyond end of device'. With this patch the old paths are skipped from multipath path selection until the controller serving these paths has been rescanned.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[dwagner:
 - introduce NVME_NS_READY flag instead of NVME_NS_INVALIDATE
 - use 'revalidate' instead of 'invalidate', which follows the zoned
   device code path
 - clear NVME_NS_READY before clearing current_path]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig
The nvme multipathing code just dispatches bios to one of the blk-mq based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT to support REQ_NOWAIT bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
- 24 August 2021, 1 commit
-
-
Submitted by Christoph Hellwig
Switch to use the blk_mq_alloc_disk helper for allocating the request_queue and gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210816131910.615153-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 August 2021, 4 commits
-
-
Submitted by Christoph Hellwig
These values are unused now that the lightnvm support is gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
-
Submitted by Keith Busch

Now that the lightnvm driver is removed, we don't need a pointer to its now-nonexistent struct.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

Transport drivers need both the core and fabrics modules. Instead of selecting both, make the selection transitive: NVME_FABRICS selects NVME_CORE, and transport drivers select NVME_FABRICS.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20210804095634.460779-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
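bvec_virt is essentially page_address(bv->bv_page) + bv->bv_offset; a userspace sketch with stand-in types shows the open-coded form it replaces:

#include <stdio.h>

/* Stand-ins for the kernel's struct page and struct bio_vec. */
struct page { char data[4096]; };
struct bio_vec {
    struct page *bv_page;
    unsigned int bv_offset;
};

static void *page_address(struct page *p) { return p->data; }

/* The helper: virtual address of the data this bio_vec describes. */
static void *bvec_virt(struct bio_vec *bv)
{
    return (char *)page_address(bv->bv_page) + bv->bv_offset;
}

int main(void)
{
    static struct page pg;
    struct bio_vec bv = { .bv_page = &pg, .bv_offset = 128 };

    printf("virt = %p (page %p + %u)\n",
           bvec_virt(&bv), (void *)&pg, bv.bv_offset);
    return 0;
}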
-
- 16 August 2021, 3 commits
-
-
Submitted by Amit Engel

Check that the host sqsize is not greater than the Maximum Queue Entries Supported (MQES) value reported by the controller.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
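A sketch of the bounds check, assuming MQES sits in the low 16 bits of the CAP register as a zero's-based value; the helper names are illustrative:

#include <stdint.h>
#include <stdio.h>

#define NVME_CAP_MQES(cap) ((uint16_t)((cap) & 0xffff))

static int check_sqsize(uint64_t ctrl_cap, uint16_t host_sqsize)
{
    /* Both values are zero's based: reject a queue deeper than the
     * controller advertises it can support. */
    if (host_sqsize > NVME_CAP_MQES(ctrl_cap))
        return -1;
    return 0;
}

int main(void)
{
    uint64_t cap = 0x3ff; /* MQES = 1023 -> up to 1024 entries */

    printf("sqsize 1023: %s\n", check_sqsize(cap, 1023) ? "reject" : "ok");
    printf("sqsize 2047: %s\n", check_sqsize(cap, 2047) ? "reject" : "ok");
    return 0;
}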
-
Submitted by Amit Engel
According to the NVMe specification, if the host sends a Connect command specifying a queue id which has already been created, a status value of NVME_SC_CMD_SEQ_ERROR is returned.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Amit Engel

According to the NVMe specification, the response dword 0 value of the Connect command is based on the status code:

  - return cntlid for successful completion
  - return IPO and IATTR for connect invalid parameters

Fix missing error information for a zero-sized queue, and return the cntlid also for I/O queue Connect commands.

Signed-off-by: Amit Engel <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-