- 06 1月, 2021 5 次提交
-
-
由 Minwoo Im 提交于
There are no callers for nvme_reset_ctrl_sync() and nvme_alloc_request_qid() so that we keep the symbols exported. Unexport those functions, mark them static and update the header file respectively. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
While handling the completion queue, keep a local copy of the command id from the DMA-accessible completion entry. This silences a time-of-check to time-of-use (TOCTOU) warning from KF/x[1], with respect to a Thunderclap[2] vulnerability analysis. The double-read impact appears benign. There may be a theoretical window for @command_id to be used as an adversary-controlled array-index-value for mounting a speculative execution attack, but that mitigation is saved for a potential follow-on. A man-in-the-middle attack on the data payload is out of scope for this analysis and is hopefully mitigated by filesystem integrity mechanisms. [1] https://github.com/intel/kernel-fuzzer-for-xen-project [2] http://thunderclap.io/thunderclap-paper-ndss2019.pdfSigned-off-by: NLalithambika Krishna Kumar <lalithambika.krishnakumar@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We may send a request (with or without its data) from two paths: 1. From our I/O context nvme_tcp_io_work which is triggered from: - queue_rq - r2t reception - socket data_ready and write_space callbacks 2. Directly from queue_rq if the send_list is empty (because we want to save the context switch associated with scheduling our io_work). However, given that now we have the send_mutex, we may run into a race condition where none of these contexts will send the pending payload to the controller. Both io_work send path and queue_rq send path opportunistically attempt to acquire the send_mutex however queue_rq only attempts to send a single request, and if io_work context fails to acquire the send_mutex it will complete without rescheduling itself. The race can trigger with the following sequence: 1. queue_rq sends request (no incapsule data) and blocks 2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU to the send_list and schedules io_work 3. io_work triggers and cannot acquire the send_mutex - because of (1), ends without self rescheduling 4. queue_rq completes the send, and completes ==> no context will send the h2cdata - timeout. Fix this by having queue_rq sending as much as it can from the send_list such that if it still has any left, its because the socket buffer is full and the socket write_space callback will trigger, thus guaranteeing that a context will be scheduled to send the h2cdata PDU. Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context") Reported-by: NPotnuri Bharat Teja <bharat@chelsio.com> Reported-by: NSamuel Jones <sjones@kalrayinc.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Tested-by: NPotnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Gopal Tiwari 提交于
A system with more than one of these SSDs will only have one usable. Hence the kernel fails to detect nvme devices due to duplicate cntlids. [ 6.274554] nvme nvme1: Duplicate cntlid 33 with nvme0, rejecting [ 6.274566] nvme nvme1: Removing after probe failure status: -22 Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk to resolves the issue. Signed-off-by: NGopal Tiwari <gtiwari@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios used to be called from a timeout or work context. Now it is being called in an io completion context, which can be an interrupt handler. Unfortunately, the abort outstanding ios routine attempts to stop nvme queues and nested routines that may try to sleep, which is in conflict with the interrupt handler. Correct replacing the direct call with a work element scheduling, and the abort outstanding ios routine will be called in the work element. Fixes: 95ced8a2 ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery") Signed-off-by: NJames Smart <james.smart@broadcom.com> Reported-by: NDaniel Wagner <dwagner@suse.de> Tested-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 05 12月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The request_queue can trivially be derived from the bio. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 12月, 2020 13 次提交
-
-
由 Javier González 提交于
Allow ZNS NVMe SSDs to present a read-only namespace when append is not supported, instead of rejecting the namespace directly. This allows (i) the namespace to be used in read-only mode, which is not a problem as the append command only affects the write path, and (ii) to use standard management tools such as nvme-cli to choose a different format or firmware slot that is compatible with the Linux zoned block device. Signed-off-by: NJavier González <javier.gonz@samsung.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Javier González 提交于
Remane block device operations in preparation to add char device file operations. Signed-off-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Javier González 提交于
Rename controller base dev_t char device in preparation for adding a namespace char device. Signed-off-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Javier González 提交于
Cleanup unnecessary ret values that are not checked or used in nvme_alloc_ns(). Signed-off-by: NJavier González <javier.gonz@samsung.com> Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Minwoo Im 提交于
During the scan_work, an Identify command is issued to figure out which namespaces are active. If this command fails, the nvme driver falls back to scanning namespaces sequentially. In this situation, we don't see any warnings and don't even know whether list-ns command has been failed or not easiliy. Printa warning when the Identify command executin fail: [ 1.108399] nvme nvme0: Identify NS List failed (status=0x400b) [ 1.109583] nvme0n1: detected capacity change from 0 to 1048576 [ 1.112186] nvme nvme0: Identify Descriptors failed (nsid=2, status=0x4002) [ 1.113929] nvme nvme0: Identify Descriptors failed (nsid=3, status=0x4002) [ 1.116537] nvme nvme0: Identify Descriptors failed (nsid=4, status=0x4002) ... Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Minwoo Im 提交于
Add the namespace ID to the error message when the Identify command used to retrieve the Namespace Identification Descriptor list fails. This avoids rather useless and duplicative messages like the following: [ 1.321031] nvme nvme0: Identify Descriptors failed (16386) [ 1.321948] nvme nvme0: Identify Descriptors failed (16386) [ 1.322872] nvme nvme0: Identify Descriptors failed (16386) [ 1.323775] nvme nvme0: Identify Descriptors failed (16386) [ 1.324687] nvme nvme0: Identify Descriptors failed (16386) ... Also, print the nvme status code in hexadecimal rather than decimal format rather for better readability. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Victor Gladkov 提交于
Commands get stuck while Host NVMe-oF controller is in reconnect state. The controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until a successful reconnect or until the reconnect time-out is reached. The default reconnect time out is 10 minutes. Applications are expecting commands to complete with success or error within a certain timeout (30 seconds by default). The NVMe host is enforcing that timeout while it is connected, but during reconnect the timeout is not enforced and commands may get stuck for a long period or even forever. To fix this long delay due to the default timeout, introduce new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the controller reconnect and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (similar to current behavior). Signed-off-by: NVictor Gladkov <victor.gladkov@kioxia.com> Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChao Leng <lengchao@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Schnelle 提交于
currently the NVME_QUIRK_SHARED_TAGS quirk for Apple devices is handled during the assignment of nr_io_queues in nvme_setup_io_queues(). This however means that for these devices nvme_max_io_queues() will actually not return the supported maximum which is confusing and unexpected and also means that in nvme_probe() we are allocating for I/O queues that will never be used. Fix this by moving the quirk handling into nvme_max_io_queues(). Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Niklas Schnelle 提交于
in nvme_setup_io_queues() the number of I/O queues is set to either 1 in case of a quirky Apple device or to the min of nvme_max_io_queues() or dev->nr_allocated_queues - 1. This is unnecessarily complicated as dev->nr_allocated_queues is only assigned once and is nvme_max_io_queues() + 1. Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
Right now nvme_alloc_request() allocates a request from block layer based on the value of the qid. When qid set to NVME_QID_ANY it used blk_mq_alloc_request() else blk_mq_alloc_request_hctx(). The function nvme_alloc_request() is called from different context, The only place where it uses non NVME_QID_ANY value is for fabrics connect commands :- nvme_submit_sync_cmd() NVME_QID_ANY nvme_features() NVME_QID_ANY nvme_sec_submit() NVME_QID_ANY nvmf_reg_read32() NVME_QID_ANY nvmf_reg_read64() NVME_QID_ANY nvmf_reg_write32() NVME_QID_ANY nvmf_connect_admin_queue() NVME_QID_ANY nvme_submit_user_cmd() NVME_QID_ANY nvme_alloc_request() nvme_keep_alive() NVME_QID_ANY nvme_alloc_request() nvme_timeout() NVME_QID_ANY nvme_alloc_request() nvme_delete_queue() NVME_QID_ANY nvme_alloc_request() nvmet_passthru_execute_cmd() NVME_QID_ANY nvme_alloc_request() nvmf_connect_io_queue() QID __nvme_submit_sync_cmd() nvme_alloc_request() With passthru nvme_alloc_request() now falls into the I/O fast path such that blk_mq_alloc_request_hctx() is never gets called and that adds additional branch check in fast path. Split the nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid(). Replace each call of the nvme_alloc_request() with NVME_QID_ANY param with a call to newly added nvme_alloc_request() without NVME_QID_ANY. Replace a call to nvme_alloc_request() with QID param with a call to newly added nvme_alloc_request() and nvme_alloc_request_qid() based on the qid value set in the __nvme_submit_sync_cmd(). Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to make consistent with NVME_IO_TIMEOUT. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chaitanya Kulkarni 提交于
The function nvme_alloc_request() is called from different context (I/O and Admin queue) where callers do not consider the I/O timeout when called from I/O queue context. Update nvme_alloc_request() to set the default I/O and Admin timeout value based on whether the queuedata is set or not. Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Baolin Wang 提交于
Use the request's '->mq_hctx->queue_num' directly to simplify the nvme_req_qid() function. Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 16 11月, 2020 3 次提交
-
-
由 Christoph Hellwig 提交于
Use the block layer helper to update both the disk and block device sizes. Contrary to the name no notification is sent in this case, as a size 0 is special cased. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
The update_bdev argument is always set to true, so remove it. Also rename the function to the slighly less verbose set_capacity_and_notify, as propagating the disk size to the block device isn't really revalidation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NPetr Vorel <pvorel@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
There is no good reason to call revalidate_disk_size separately. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 11月, 2020 3 次提交
-
-
由 Keith Busch 提交于
xa_destroy() frees only internal data. The caller is responsible for freeing the exteranl objects referenced by an xarray. Fixes: 1cf7a12e ("nvme: use an xarray to lookup the Commands Supported and Effects log") Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Remove the struct used for tracking known command effects logs in a list. This is now saved in an xarray that doesn't use these elements. Instead, store the log directly instead of the wrapper struct. Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Minwoo Im 提交于
If Doorbell Buffer Config command fails even 'dev->dbbuf_dbs != NULL' which means OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is set, nvme_dbbuf_update_and_check_event() will check event even it's not been successfully set. This patch fixes mismatch among dbbuf for sq/cqs in case that dbbuf command fails. Signed-off-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 13 11月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
->dma_device is a private implementation detail of the RDMA core. Use the ibdev_to_node helper to get the NUMA node for a ib_device instead of poking into ->dma_device. Link: https://lore.kernel.org/r/20201106181941.1878556-5-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
-
- 10 11月, 2020 1 次提交
-
-
由 Sagi Grimberg 提交于
The offending commit breaks BLKROSET ioctl because a device revalidation will blindly override BLKROSET setting. Hence, we remove the disk rw setting in case NVME_NS_ATTR_RO is cleared from by the controller. Fixes: 1293477f ("nvme: set gendisk read only based on nsattr") Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 03 11月, 2020 6 次提交
-
-
由 Sagi Grimberg 提交于
The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_tcp_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChao Leng <lengchao@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
The request may be executed asynchronously, and rq->state may be changed to IDLE. To avoid repeated request completion, only MQ_RQ_COMPLETE of rq->state is checked in nvme_rdma_complete_timed_out. It is not safe, so need adding check IDLE for rq->state. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChao Leng <lengchao@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chao Leng 提交于
Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. Signed-off-by: NChao Leng <lengchao@huawei.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chao Leng 提交于
Now use teardown_lock to serialize for time out and tear down. This may cause abnormal: first cancel all request in tear down, then time out may complete the request again, but the request may already be freed or restarted. To avoid race between time out and tear down, in tear down process, first we quiesce the queue, and then delete the timer and cancel the time out work for the queue. At the same time we need to delete teardown_lock. Signed-off-by: NChao Leng <lengchao@huawei.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chao Leng 提交于
Introduce sync io queues for some scenarios which just only need sync io queues not sync all queues. Signed-off-by: NChao Leng <lengchao@huawei.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Multiple CPUs may be mapped to the same hctx, allowing mulitple submission contexts to attempt commit_rqs(). We need to verify we're not writing the same doorbell value multiple times since that's a spec violation. Revert commit 54b2fcee. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1878596Reported-by: N"B.L. Jones" <brandon.gustav@googlemail.com> Signed-off-by: NKeith Busch <kbusch@kernel.org>
-
- 28 10月, 2020 1 次提交
-
-
由 Jason Gunthorpe 提交于
There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED, either the handler triggers a completion and another thread does rdma_connect() or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handler's are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers. Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.comReported-and-tested-by: NGuoqing Jiang <guoqing.jiang@cloud.ionos.com> Fixes: 2a7cec53 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state") Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: NJack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMax Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
-
- 27 10月, 2020 6 次提交
-
-
由 James Smart 提交于
__nvme_fc_terminate_io() is now called by only 1 place, in reset_work. Consoldate and move the functionality of terminate_io into reset_work. In reset_work, rather than calling the create_association directly, schedule the connect work element to do its thing. After scheduling, flush the connect work element to continue with semantic of not returning until connect has been attempted at least once. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
nvme_fc_error_recovery() special cases handling when in CONNECTING state and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself special cases CONNECTING state and calls the routine to abort outstanding ios. Simplify the sequence by putting the call to abort outstanding I/Os directly in nvme_fc_error_recovery. Move the location of __nvme_fc_abort_outstanding_ios(), and nvme_fc_terminate_exchange() which is called by it, to avoid adding function prototypes for nvme_fc_error_recovery(). Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
err_work was created to handle errors (mainly I/O timeouts) while in CONNECTING state. The flag for err_work_active is also unneeded. Remove err_work_active and err_work. The actions to abort I/Os are moved inline to nvme_error_recovery(). Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Whenever there are errors during CONNECTING, the driver recovers by aborting all outstanding ios and counts on the io completion to fail them and thus the connection/association they are on. However, the connection failure depends on a failure state from the core routines. Not all commands that are issued by the core routine are guaranteed to cause a failure of the core routine. They may be treated as a failure status and the status is then ignored. As such, whenever the transport enters error_recovery while CONNECTING, it will set a new flag indicating an association failed. The create_association routine which creates and initializes the controller, will monitor the state of the flag as well as the core routine error status and ensure the association fails if there was an error. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 zhenwei pi 提交于
Receiving a zero length message leads to the following warnings because the CQE is processed twice: refcount_t: underflow; use-after-free. WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0xd9/0xe0 Call Trace: <IRQ> nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma] __ib_process_cq+0x76/0x150 [ib_core] ... Sanity check the received data length, to avoids this. Thanks to Chao Leng & Sagi for suggestions. Signed-off-by: Nzhenwei pi <pizhenwei@bytedance.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Revalidating nvme zoned namespaces requires IO commands, and there are controller states that prevent IO. For example, a sanitize in progress is required to fail all IO, but we don't want to remove a namespace we've previously added just because the controller is in such a state. Suppress the error in this case. Reported-by: NMichael Nguyen <michael.nguyen@wdc.com> Signed-off-by: NKeith Busch <kbusch@kernel.org> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-