- 20 February 2019, 6 commits
-
-
Submitted by Bart Van Assche
This patch does not change any functionality but makes the next patch in this series easier to read.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Bart Van Assche
Since nvme_delete_ctrl_sync() is not called from any other kernel module, unexport it.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Bart Van Assche
This patch avoids that the compiler complains about 'ret' being set but not being used when building with W=1.
Fixes: 3b6592f7 ("nvme: utilize two queue maps, one for reads and one for writes") # v5.0-rc1
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Bart Van Assche
This patch avoids that the kernel-doc tool reports a warning when building with W=1.
Fixes: 26c68227 ("nvme-fabrics: allow nvmf_connect_io_queue to poll") # v5.0-rc1
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Bart Van Assche
This patch avoids that smatch complains about inconsistent indentation.
Fixes: a07b4970 ("nvmet: add a generic NVMe target") # v4.10
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Hannes Reinecke

Implement a simple round-robin I/O policy for multipathing. Path selection is done in two rounds, first iterating across all optimized paths, and if that doesn't return any valid paths, iterating over all optimized and non-optimized paths. If no paths are found, use the existing algorithm. Also add a sysfs attribute 'iopolicy' to switch between the current NUMA-aware I/O policy and the 'round-robin' I/O policy.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
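A minimal sketch of the two-round selection described above; the struct, state names, and helper are illustrative placeholders, not the driver's actual symbols:

    #include <stddef.h>

    enum path_state { PATH_OPTIMIZED, PATH_NONOPTIMIZED, PATH_UNAVAILABLE };

    struct path { enum path_state state; };

    /* Round 1 accepts only optimized paths; round 2 also accepts
     * non-optimized ones.  Iteration starts after the last used slot so
     * consecutive calls rotate across the paths. */
    static int rr_select_path(const struct path *paths, size_t n, size_t last)
    {
        if (n == 0)
            return -1;
        for (int round = 0; round < 2; round++) {
            for (size_t i = 1; i <= n; i++) {
                size_t idx = (last + i) % n;
                if (paths[idx].state == PATH_OPTIMIZED ||
                    (round == 1 && paths[idx].state == PATH_NONOPTIMIZED))
                    return (int)idx;
            }
        }
        return -1; /* no usable path: fall back to the existing policy */
    }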
-
- 6 February 2019, 2 commits
-
-
Submitted by Keith Busch
A surprise removal may fail to tear down request queues if it is racing with the initial asynchronous probe. If that happens, the remove path won't see the queue resources to tear down, and the controller reset path may create a new request queue on a removed device, but will not be able to make forward progress, deadlocking the pci removal. Protect setting up non-blocking resources from a shutdown by holding the same mutex, and transition to the CONNECTING state after these resources are initialized so the probe path may see the dead controller state before dispatching new IO.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202081
Reported-by: Alex Gagniuc <Alex_Gagniuc@Dellteam.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Alex Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Keith Busch
If a controller supports the NS Change Notification, the namespace scan_work is automatically triggered after attaching a new namespace. Occasionally the namespace scan_work may append the new namespace to the list before the admin command effects handling is completed. The effects handling unfreezes namespaces, but if it unfreezes the newly attached namespace, its request_queue freeze depth will be off and we'll hit the warning in blk_mq_unfreeze_queue(). On the next namespace add, we will fail to freeze that queue due to the previous bad accounting and deadlock waiting for frozen. Fix that by preventing scan work from altering the namespace list while command effects handling needs to pair freeze with unfreeze.
Reported-by: Wen Xiong <wenxiong@us.ibm.com>
Tested-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 4 February 2019, 2 commits
-
-
Submitted by Sagi Grimberg
It is used now just to flush error recovery and reconnect work items in the RDMA and TCP transports, which can simply be moved to the corresponding teardown routines.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chaitanya Kulkarni

Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block device, if the device supports an optional command bit set for write zeroes. Add support to set up the write zeroes command. Set the maximum possible write zeroes sectors in one write zeroes command according to the nvme write zeroes command definition. This patch was posted as part of the block-write-zeroes support implementation (https://patchwork.kernel.org/patch/9454859/), but did not make it into the mainline kernel as it was reverted due to a failure on Linus's machine. In this patch, in order to be more cautious, we use the NVMe controller's maximum hardware sector size, which is calculated based on the controller's MDTS (Maximum Data Transfer Size) field, to calculate the maximum sectors for the write zeroes request.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[folded a fix from Keith Busch to properly respect NVME_QUIRK_DEALLOCATE_ZEROES]
Signed-off-by: Christoph Hellwig <hch@lst.de>
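As a rough illustration of the sizing described above, the limit can be derived from MDTS and the controller's minimum memory page size; this is a hedged sketch of the arithmetic, not the driver's exact code:

    #include <stdint.h>

    /* MDTS is a power of two in units of the minimum memory page size,
     * which is 2^(12 + CAP.MPSMIN) bytes; MDTS == 0 means no limit.
     * The result is in 512-byte sectors. */
    static uint32_t max_write_zeroes_sectors(uint8_t mdts, uint8_t mpsmin)
    {
        if (mdts == 0)
            return UINT32_MAX;
        uint64_t bytes = (1ULL << mdts) << (12 + mpsmin);
        return (uint32_t)(bytes >> 9);
    }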
-
- 24 January 2019, 5 commits
-
-
Submitted by Hannes Reinecke
Bit 6 in the ANACAP field is used to indicate that the ANA group ID doesn't change while the namespace is attached to the controller. There is an optimisation in the code to only allocate space for the ANA group header, as the namespace list won't change and hence would not need to be refreshed. However, this optimisation was never carried over to the actual workflow, which always assumes that the buffer is large enough to hold the ANA header _and_ the namespace list. So drop this optimisation and always allocate enough space.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
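A hedged sketch of the sizing the commit settles on: always reserve room for the header, the group descriptors, and the full namespace list. The sizes are passed in as parameters rather than taken from the uapi structs:

    #include <stddef.h>
    #include <stdint.h>

    static size_t ana_log_buffer_size(size_t hdr_size, size_t group_desc_size,
                                      uint32_t nr_ana_groups,
                                      uint32_t max_namespaces)
    {
        /* header + one descriptor per ANA group + one 32-bit NSID per
         * namespace that may appear in the list */
        return hdr_size +
               (size_t)nr_ana_groups * group_desc_size +
               (size_t)max_namespaces * sizeof(uint32_t);
    }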
-
Submitted by Raju Rangoju

Under heavy load, if we don't have any pre-allocated rsps left, we dynamically allocate an rsp, but we are not actually allocating memory for the nvme_completion (rsp->req.rsp). In such a case, accessing pointer fields (req->rsp->status) in nvmet_req_init() will result in a crash. To fix this, allocate the memory for the nvme_completion by calling nvmet_rdma_alloc_rsp().
Fixes: 8407879c ("nvmet-rdma: fix possible bogus dereference under heavy load")
Cc: <stable@vger.kernel.org>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
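The shape of the fix, in a simplified stand-alone model; the struct names are placeholders, and the real code uses the transport's own allocation helper:

    #include <stdlib.h>

    struct completion_buf { unsigned short status; };
    struct rsp {
        struct completion_buf *cqe;   /* models rsp->req.rsp */
        int dynamically_allocated;
    };

    /* When the pre-allocated pool is empty, the dynamically allocated
     * response must also get its completion buffer before anything
     * dereferences it. */
    static struct rsp *alloc_rsp_fallback(void)
    {
        struct rsp *r = calloc(1, sizeof(*r));
        if (!r)
            return NULL;
        r->cqe = calloc(1, sizeof(*r->cqe));   /* the missing allocation */
        if (!r->cqe) {
            free(r);
            return NULL;
        }
        r->dynamically_allocated = 1;
        return r;
    }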
-
Submitted by Sagi Grimberg

If the device supports fewer queues than provided (if the device has fewer completion vectors), we might hit a bug because we ignore that in nvme_rdma_map_queues (we override the map's nr_queues with the user opts). Instead, keep track of how many default/read/poll queues we actually allocated (rather than how many the user asked for) and use that to assign our queue mappings.
Fixes: b65bb777 ("nvme-rdma: support separate queue maps for read and write")
Reported-by: Saleem, Shiraz <shiraz.saleem@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
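The accounting change can be sketched as follows; the queue-type names mirror the block layer's HCTX types, but the structure itself is illustrative:

    enum { QMAP_DEFAULT, QMAP_READ, QMAP_POLL, QMAP_NR };

    struct queue_counts {
        unsigned int requested[QMAP_NR];   /* from user options */
        unsigned int allocated[QMAP_NR];   /* what the device really gave us */
    };

    /* The block-layer maps must be sized from what was actually
     * allocated, not from what the user asked for. */
    static void set_map_sizes(const struct queue_counts *c,
                              unsigned int nr_queues[QMAP_NR])
    {
        for (int t = 0; t < QMAP_NR; t++)
            nr_queues[t] = c->allocated[t];
    }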
-
Submitted by Sagi Grimberg

Currently, we have several problems with the timeout handler:

1. If we time out on the controller establishment flow, we will hang because we don't execute the error recovery (and we shouldn't, because the create_ctrl flow needs to fail and clean up on its own).
2. We might also hang if we get a disconnect on a queue while the controller is already deleting. This racy flow can cause the controller disable/shutdown admin command to hang.

We cannot complete a timed out request from the timeout handler without mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work). So we serialize it in the timeout handler and tear down the io and admin queues to guarantee that no one races with us in completing the request.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Submitted by Sagi Grimberg

Currently, we have several problems with the timeout handler:

1. If we time out on the controller establishment flow, we will hang because we don't execute the error recovery (and we shouldn't, because the create_ctrl flow needs to fail and clean up on its own).
2. We might also hang if we get a disconnect on a queue while the controller is already deleting. This racy flow can cause the controller disable/shutdown admin command to hang.

We cannot complete a timed out request from the timeout handler without mutual exclusion from the teardown flow (e.g. nvme_rdma_error_recovery_work). So we serialize it in the timeout handler and tear down the io and admin queues to guarantee that no one races with us in completing the request.
Reported-by: Jaesoo Lee <jalee@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
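Schematically, the serialization in these two timeout fixes looks like the sketch below; the state and helper names are placeholders, and the real code uses the controller state machine plus the transport teardown helpers:

    enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING };
    enum timeout_verdict { TIMEOUT_DONE, TIMEOUT_RESET_TIMER };

    static enum timeout_verdict request_timed_out(enum ctrl_state state)
    {
        if (state != CTRL_LIVE) {
            /* Connecting or deleting: error recovery may never run, so
             * tear down the io and admin queues here and complete the
             * request directly; the teardown excludes anyone else from
             * completing it concurrently. */
            return TIMEOUT_DONE;
        }
        /* Live controller: kick error recovery and give it time to run. */
        return TIMEOUT_RESET_TIMER;
    }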
-
- 17 January 2019, 2 commits
-
-
Submitted by Ming Lei

When -ENOSPC is returned from pci_alloc_irq_vectors_affinity(), we still try to allocate multiple irq vectors again, so the irq queues actually cover the admin queue. But we don't account for that, so the number of allocated irq vectors may be the same as the sum of io_queues[HCTX_TYPE_DEFAULT] and io_queues[HCTX_TYPE_READ]; this is obviously wrong, breaks nvme_pci_map_queues(), and triggers a warning from pci_irq_get_affinity(). IRQ queues should cover the admin queue, and this patch makes that point explicit in nvme_calc_io_queues(). We got several internal boot failure reports on aarch64, so please consider fixing this in v4.20.
Fixes: 6451fe73 ("nvme: fix irq vs io_queue calculations")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Tested-by: fin4478 <fin4478@hotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
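A hedged sketch of the accounting being made explicit: one vector always belongs to the admin queue, so only the remainder can be split between the default and read queue sets. The parameter names and the exact split policy here are illustrative, not the driver's final code:

    static void calc_io_queues(unsigned int nr_vectors,
                               unsigned int requested_write_queues,
                               unsigned int *nr_default, unsigned int *nr_read)
    {
        /* Reserve one vector for the admin queue. */
        unsigned int nr_io = nr_vectors > 1 ? nr_vectors - 1 : 1;

        if (!requested_write_queues || requested_write_queues >= nr_io) {
            *nr_default = nr_io;   /* a single shared map */
            *nr_read = 0;
        } else {
            *nr_default = requested_write_queues;
            *nr_read = nr_io - requested_write_queues;
        }
    }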
-
Submitted by Sagi Grimberg

If we end up in nvmet_tcp_try_recv_one with a bogus queue receive state, we will access 'result', which is uninitialized. Initialize 'result' to 0, which is treated as if no data was received by the tcp socket.
Fixes: 872d26a3 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 January 2019, 11 commits
-
-
Submitted by Andrey Smirnov

ctrl->cntlid will already be initialized from id->cntlid for non-NVME_F_FABRICS controllers a few lines below. For NVME_F_FABRICS controllers this field should already be initialized, otherwise the check if (ctrl->cntlid != le16_to_cpu(id->cntlid)) below will always be a no-op.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by James Dingwall
If a device provides an NQN it is expected to be globally unique. Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did not satisfy this requirement. In these circumstances if a system has >1 affected device then only one device is enabled. If this quirk is enabled then the device supplied subnqn is ignored and we fall back to generating one as if the field was empty. In this case we also suppress the version check so we don't print a warning when the quirk is enabled.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Keith Busch
We need to preserve the leading zeros in the vid and ssvid when generating a unique NQN. Truncating these may lead to naming collisions.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
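A small stand-alone example of the difference; the NQN template here is shortened and illustrative:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t vid = 0x8086, ssvid = 0x002a;
        char nqn[64];

        /* "%04x" keeps the leading zeros, so ssvid 0x002a contributes
         * "002a" rather than "2a" and two different devices cannot
         * collapse onto the same generated NQN. */
        snprintf(nqn, sizeof(nqn),
                 "nqn.2014.08.org.nvmexpress:%04x%04x", vid, ssvid);
        puts(nqn);
        return 0;
    }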
-
Submitted by Hannes Reinecke

When nvme_init_identify() fails, the ANA log buffer is deallocated but _not_ set to NULL. This can cause a double-free oops when this controller is deleted without ever being reconnected.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
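The defensive pattern the fix applies, in miniature; the field name matches the description above, the rest is illustrative:

    #include <stdlib.h>

    struct ctrl { void *ana_log_buf; };

    static void free_ana_log(struct ctrl *ctrl)
    {
        free(ctrl->ana_log_buf);
        ctrl->ana_log_buf = NULL;   /* a later cleanup pass now sees NULL,
                                     * so freeing again is a no-op */
    }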
-
Submitted by Sagi Grimberg

Even if user-space sent it to us, it got it wrong, so let's help by disallowing it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg
For sure we are a fabric driver.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg
We should never touch the opal device from the transport driver.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Hongbo Yao

There is an out of bounds array access in nvme_cqe_pending(). When irq_thread is enabled for the nvme interrupt, there is a race between updating and reading nvmeq->cq_head. nvmeq->cq_head is updated in nvme_update_cq_head(); if nvmeq->cq_head equals nvmeq->q_depth and nvme_cqe_pending() uses its value as an array index before it is set back to zero, the index will be out of bounds.
Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
[hch: slight coding style update]
Signed-off-by: Christoph Hellwig <hch@lst.de>
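One way to close that window, sketched with stand-alone types: never let cq_head hold the transient value q_depth that a concurrent reader could use as an index. The fix in the driver is equivalent in spirit, though the exact code differs:

    #include <stdint.h>

    struct cq {
        volatile uint16_t cq_head;   /* read concurrently by the pending check */
        uint16_t q_depth;
        uint8_t cq_phase;
    };

    static void update_cq_head(struct cq *q)
    {
        uint16_t next = q->cq_head + 1;

        if (next == q->q_depth) {    /* wrap in a local, then publish */
            next = 0;
            q->cq_phase ^= 1;
        }
        q->cq_head = next;
    }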
-
Submitted by Keith Busch
If the driver is unable to create a subset of IO queues for any reason, the read/write and polled queue sets will not match the actual allocated hardware contexts. This leaves gaps in the CPU affinity mappings and causes the following kernel panic after blk_mq_map_queue_type() returns a NULL hctx.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
    #PF error: [normal kernel read fault]
    PGD 0 P4D 0
    Oops: 0000 [#1] SMP
    CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
    Workqueue: nvme-wq nvme_scan_work [nvme_core]
    RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440
    RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286
    RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000
    RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820
    RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f
    R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008
    R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001
    FS:  0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     blk_mq_init_queue+0x35/0x60
     nvme_validate_ns+0xc6/0x7c0 [nvme_core]
     ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
     nvme_scan_work+0xc8/0x340 [nvme_core]
     ? __wake_up_common+0x6d/0x120
     ? try_to_wake_up+0x55/0x410
     process_one_work+0x1e9/0x3d0
     worker_thread+0x2d/0x3d0
     ? process_one_work+0x3d0/0x3d0
     kthread+0x111/0x130
     ? kthread_park+0x90/0x90
     ret_from_fork+0x1f/0x30
    Modules linked in: nvme nvme_core serio_raw
    CR2: 0000000000000198

Fix by re-running the interrupt vector setup from scratch using a reduced count that may be successful until the created queues match the irq affinity plus polling queue sets.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Liviu Dudau
When using HMB the PCIe host driver allocates host_mem_desc_bufs using dma_alloc_attrs() but frees them using dma_free_coherent(). Use the correct dma_free_attrs() function to free the buffers.
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
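The pairing rule the fix restores, sketched in kernel style; the size, handle, and attrs values are placeholders for whatever the actual allocation used:

    #include <linux/dma-mapping.h>

    /* Whatever attrs were passed to dma_alloc_attrs() must be passed to
     * dma_free_attrs() as well; dma_free_coherent() is only the correct
     * counterpart of dma_alloc_coherent(). */
    static void *hmb_buf_alloc(struct device *dev, size_t size,
                               dma_addr_t *dma, unsigned long attrs)
    {
        return dma_alloc_attrs(dev, size, dma, GFP_KERNEL, attrs);
    }

    static void hmb_buf_free(struct device *dev, size_t size, void *cpu_addr,
                             dma_addr_t dma, unsigned long attrs)
    {
        dma_free_attrs(dev, size, cpu_addr, dma, attrs);
    }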
-
Submitted by Jianchao Wang
We only set the nr_maps to 3 if poll queues are supported.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 8 January 2019, 1 commit
-
-
Submitted by Luis Chamberlain

We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superfluous. Phase it out. This change was generated with the following Coccinelle SmPL patch:

    @ replace_dma_zalloc_coherent @
    expression dev, size, data, handle, flags;
    @@
    -dma_zalloc_coherent(dev, size, handle, flags)
    +dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
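In driver terms the conversion is mechanical; a representative before/after, where the surrounding function is hypothetical:

    #include <linux/dma-mapping.h>

    static void *alloc_queue_buffer(struct device *dev, size_t size,
                                    dma_addr_t *handle)
    {
        /* before: return dma_zalloc_coherent(dev, size, handle, GFP_KERNEL); */
        return dma_alloc_coherent(dev, size, handle, GFP_KERNEL);
    }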
-
- 19 December 2018, 11 commits
-
-
Submitted by yupeng

Export the disk name, queue id, sq_head, sq_tail to a trace event in completion handling.

Usage example:

    cd /sys/kernel/debug/tracing/events/nvme/nvme_sq
    echo 'disk=="nvme1n1"' > filter
    echo 1 > enable
    cat /sys/kernel/debug/tracing/trace_pipe

Signed-off-by: yupeng <yupeng0921@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <keith.busch@intel.com>
[hch: slight formatting tweaks, use standard nvme tracepoint conventions]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

When nr_poll_queues is passed, set up additional queues with the cq polling context IB_POLL_DIRECT (no interrupts) and make sure to set QUEUE_FLAG_POLL on the connect_q. In addition, add the third queue mapping for polling queues. nvmf connect on such a queue is polled for like all other requests, so make nvmf_connect_io_queue poll for polling queues.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg
This argument will specify how many polling I/O queues to connect when creating the controller. These I/O queues will host I/O that is set with REQ_HIPRI.
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

Preparation for polling support for fabrics. Polling support means that our completion queues are not generating any interrupts, which means we need to poll for the nvmf io queue connect as well.
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

Pass a poll bool to indicate that we need to poll for the request. This prepares us for polling support in nvmf, since connect is an I/O that will be queued and has to be polled in order to complete. If poll is passed, we call nvme_execute_rq_polled, which sends the request and polls for its completion.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Colin Ian King
There is a spelling mistake in a dev_info message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Christoph Hellwig
By duplicating the nvme_process_cq in both branches we keep the sparse lock context checking happy, so do it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Christoph Hellwig
The block layer now enables polling support on a queue if nr_maps includes the poll map, so we should only set that if we actually support poll queues.
Fixes: 6544d229 ("block: enable polling by default if a poll map is initalized")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
-
Submitted by Chaitanya Kulkarni
This patch defines a new macro NVMET_NO_ERROR_LOC to represent the default error location value in the nvme-error-log-page. This is a pure cleanup patch and it does not change any functionality.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-