- 23 5月, 2017 1 次提交
-
-
由 Marta Rybczynska 提交于
In the case of small NVMe-oF queue size (<32) we may enter a deadlock caused by the fact that the IB completions aren't sent waiting for 32 and the send queue will fill up. The error is seen as (using mlx5): [ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273): [ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12 This patch changes the way the signaling is done so that it depends on the queue depth now. The magic define has been removed completely. Cc: stable@vger.kernel.org Signed-off-by: NMarta Rybczynska <marta.rybczynska@kalray.eu> Signed-off-by: NSamuel Jones <sjones@kalray.eu> Acked-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 21 5月, 2017 3 次提交
-
-
由 James Smart 提交于
Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Rather than waiting for reset work thread to stop queues and abort the ios, immediately stop the queues on error detection. Reset thread will restop the queues (as it's called on other paths), but it does not appear to have a side effect. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 James Smart 提交于
In order to create an association, the remoteport must be serving either a target role or a discovery role. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jon Derrick 提交于
CMB doesn't get unmapped until removal while getting remapped on every reset. Add the unmapping and sysfs file removal to the reset path in nvme_pci_disable to match the mapping path in nvme_pci_enable. Fixes: 202021c1 ("nvme : Add sysfs entry for NVMe CMBs when appropriate") Signed-off-by: NJon Derrick <jonathan.derrick@intel.com> Acked-by: NKeith Busch <keith.busch@intel.com> Reviewed-By: NStephen Bates <sbates@raithlin.com> Cc: <stable@vger.kernel.org> # 4.9+ Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 10 5月, 2017 1 次提交
-
-
由 Rakesh Pandit 提交于
Free up kmalloc allocated memory if failure happens while handling L2P table transfer in nvme_nvm_get_l2p_tbl. Fixes: 8e79b5cb ("lightnvm: move block provisioning to targets") Signed-off-by: NRakesh Pandit <rakesh@tuxera.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 08 5月, 2017 1 次提交
-
-
由 Geert Uytterhoeven 提交于
With gcc 4.1.2: drivers/nvme/host/lightnvm.c: In function ‘nvme_nvm_submit_io’: drivers/nvme/host/lightnvm.c:498: warning: ‘rq’ is used uninitialized in this function Indeed, since commit 2e13f33a ("lightnvm: create cmd before allocating request"), the request is passed to nvme_nvm_rqtocmd() before it is allocated. Fortunately, as of commit 91276162 ("lightnvm: refactor end_io functions for sync"), nvme_nvm_rqtocmd () no longer uses the passed request, so this warning is a false positive. Drop the unused parameter to clean up the code and kill the warning. Fixes: 2e13f33a ("lightnvm: create cmd before allocating request") Fixes: 91276162 ("lightnvm: refactor end_io functions for sync") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 5月, 2017 1 次提交
-
-
由 Javier González 提交于
Create nvme command before allocating a request using nvme_alloc_request, which uses the command direction. Up until now, the command has been zeroized, so all commands have been allocated as a read operation. Signed-off-by: NJavier González <javier@cnexlabs.com> Reviewed-by: NMatias Bjørling <matias@cnexlabs.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 02 5月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 27 4月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
This function just returns the same error code and sense data as the default statement in the switch in the caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com>
-
- 26 4月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
Found by sparse. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMatias Bjørling <matias@cnexlabs.com>
-
由 Jon Derrick 提交于
The current command submission code uses a sector-based value when considering the maximum number of blocks per command. With a 4k-formatted namespace and a command exceeding max hardware limits, this calculation doesn't split IOs which should be split and fails in the nvme layer. This patch fixes that calculation and enables IO splitting in these circumstances. Signed-off-by: NJon Derrick <jonathan.derrick@intel.com> Reviewed-by: NJens Axboe <axboe@fb.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Ewan D. Milne 提交于
Do not call nvmf_free_options() from the nvme_fc_ctlr destructor if nvme_fc_create_ctrl() returns an error, because nvmf_create_ctrl() frees the options when an error is returned. Signed-off-by: NEwan D. Milne <emilne@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 25 4月, 2017 3 次提交
-
-
由 Andy Lutomirski 提交于
We're probably going to be stuck quirking APST off on an over-broad range of devices for 4.11. Let's make it easy to override the quirk for testing. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andy Lutomirski 提交于
Debugging APST is currently a bit of a pain. This gives optional simple log messages that describe the APST state. The easiest way to use this is probably with the nvme_core.dyndbg=+p module parameter. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andy Lutomirski 提交于
There was a typo in the description of the timeout heuristic. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 24 4月, 2017 4 次提交
-
-
由 Christoph Hellwig 提交于
Found by sparse. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
由 James Smart 提交于
This patch actually does quite a few things. When looking to add controller reset support, the organization modeled after rdma was very fragmented. rdma duplicates the reset and teardown paths and does different things to the block layer on the two paths. The code to build up the controller is also duplicated between the initial creation and the reset/error recovery paths. So I decided to make this sane. I reorganized the controller creation and teardown so that there is a connect path and a disconnect path. Initial creation obviously uses the connect path. Controller teardown will use the disconnect path, followed last access code. Controller reset will use the disconnect path to stop operation, and then the connect path to re-establish the controller. Along the way, several things were fixed - aens were not properly set up. They are allocated differently from the per-request structure on the blk queues. - aens were oddly torn down. the prior patch corrected to abort, but we still need to dma unmap and free relative elements. - missed a few ref counting points: in aen completion and on i/o's that fail - controller initial create failure paths were still confused vs teardown before converting to ref counting vs after we convert to refcounting. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Add abort support for aens. Commonized the op abort to apply to aen or real ios (caused some reorg/routine movement). Abort path sets termination flag in prep for next patch that will be watching i/o abort completion before proceeding with controller teardown. Now that we're aborting aens, the "exit" code that simply cleared out their context no longer applies. Also clarified how we detect an AEN vs a normal io - by a flag, not by whether a rq exists or the a rqno is out of range. Note: saw some interesting cases where if the queues are stopped and we're waiting for the aborts, the core layer can call the complete_rq callback for the io. So the io completion synchronizes link side completion with possible blk layer completion under error. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
The code validates the command_id in the response to the original sqe command. But prior code was using the rq->rqno as the sqe command id. The core layer overwrites what the transport set there originally. Use the actual sqe content. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 21 4月, 2017 12 次提交
-
-
由 Junxiong Guan 提交于
Currently most IOs which return the nvme error codes are retried on the other path if those IOs returns EIO from NVMe driver. This patch let Multipath distinguish nvme media error codes and some generic or cmd-specific nvme error codes so that multipath will not retry those kinds of IO, to save bandwidth. Signed-off-by: NJunxiong Guan <guanjunxiong@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
If an IO timeout occurs, it's helpful to know if the controller did not post a completion or the driver missed an interrupt. While we never expect the latter, this patch will make it possible to tell the difference so we don't have to guess. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
由 James Smart 提交于
remoteport teardown never aborted the LS opertions. Add support. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 James Smart 提交于
Link LS's on the remoteport rather than the controller. LS's are between nport's. Makes more sense, especially on async teardown where the controller is torn down regardless of the LS (LS is more of a notifier to the target of the teardown), to have them on the remoteport. While revising ls send/done routines, issues were seen relative to refcounting and cleanup, especially in async path. Reworked these code paths. Signed-off-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Helen Koike 提交于
This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefy here. The main idea for the patch is provide the device two memory location locations: 1) to store the doorbell values so they can be lookup without the doorbell MMIO write 2) to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea on the performance boost with the vendor extension: Running fio [1], a stock NVMe driver I get about 200K read IOPs with my vendor patch I get about 1000K read IOPs. This was running with a null device i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: NRob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: NMing Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: NHelen Koike <helen.koike@collabora.co.uk> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <keith.busch@intel.com>
-
由 Keith Busch 提交于
The QPRIO field is only valid if weighted round robin arbitration is used, and this driver doesn't enable that controller configuration option. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Andy Lutomirski 提交于
There's a report that it malfunctions with APST on. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andy Lutomirski 提交于
I got a couple more reports: the Samsung APST issues appears to affect multiple 950-series devices in Dell XPS 15 9550 and Precision 5510 laptops. Change the quirk: rather than blacklisting the firmware on the first problematic SSD that was reported, disable APST on all 144d:a802 devices if they're installed in the two affected Dell models. While we're at it, disable only the deepest sleep state instead of all of them -- the reporters say that this is sufficient to fix the problem. (I have a device that appears to be entirely identical to one of the affected devices, but I have a different Dell laptop, so it's not the case that all Samsung devices with firmware BXW75D0Q are broken under all circumstances.) Samsung engineers have an affected system, and hopefully they'll give us a better workaround some time soon. In the mean time, this should minimize regressions. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Now that all drivers that call blk_mq_complete_requests have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Currently it's used by the lighnvm passthrough ioctl, but we'd like to make it private in preparation of block layer specific error code. Lighnvm already returns the real NVMe status anyway, so I think we can just limit it to returning -EIO for any status set. This will need a careful audit from the lightnvm folks, though. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
We want our own clearly defined error field for NVMe passthrough commands, and the request errors field is going away in its current form. Just store the status and result field in the nvme_request field from hardirq completion context (using a new helper) and then generate a Linux errno for the block layer only when we actually need it. Because we can't overload the status value with a negative error code for cancelled command we now have a flags filed in struct nvme_request that contains a bit for this condition. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
nvme_complete_async_event expects the little endian status code including the phase bit, and a new completion handler I plan to introduce will do so as well. Change the status variable into the little endian format with the phase bit used in the NVMe CQE to fix / enable this. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 4月, 2017 1 次提交
-
-
由 Bart Van Assche 提交于
This patch changes the behavior of the lightnvm driver as follows: * REQ_FAILFAST_MASK is set for read-ahead requests. * If no I/O priority has been set in the bio, the I/O priority is copied from the I/O context. * The rq_disk member is initialized if bio->bi_bdev != NULL. * The bio sector offset is copied into req->__sector instead of retaining the value -1 set by blk_mq_alloc_request(). * req->errors is initialized to zero. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Matias Bjørling <m@bjorling.me> Cc: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 19 4月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKeith Busch <keith.busch@intel.com>
-
- 17 4月, 2017 4 次提交
-
-
由 Javier González 提交于
The NVMe I/O command control bits are 16 bytes, but is interpreted as 32 bytes in the lightnvm user I/O data path. Signed-off-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
The asserts in _nvme_nvm_check_size are not compiled due to the function not begin called. Make sure that it is called, and also fix the wrong sizes of asserts for nvme_nvm_addr_format, and nvme_nvm_bb_tbl, which checked for number of bits instead of bytes. Reported-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Javier González 提交于
Until now erases have been submitted as synchronous commands through a dedicated erase function. In order to enable targets implementing asynchronous erases, refactor the erase path so that it uses the normal async I/O submission functions. If a target requires sync I/O, it can implement it internally. Also, adapt rrpc to use the new erase path. Signed-off-by: NJavier González <javier@cnexlabs.com> Fixed spelling error. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Scott Bauer 提交于
There are two closely named structs in lightnvm: struct nvme_nvm_addr_format and struct nvme_addr_format. The first struct has 4 reserved bytes at the end, the second does not. (gdb) p sizeof(struct nvme_nvm_addr_format) $1 = 16 (gdb) p sizeof(struct nvm_addr_format) $2 = 12 In the nvme_nvm_identify function we memcpy from the larger struct to the smaller struct. We incorrectly pass the length of the larger struct and overflow by 4 bytes, lets not do that. Signed-off-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 10 4月, 2017 2 次提交
-
-
由 Sagi Grimberg 提交于
both our sqsize and the controller MQES cap are a 0 based value, so making it 1 based is wrong. Reported-by: NTrapp, Darren <Darren.Trapp@cavium.com> Reported-by: NDaniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sagi Grimberg 提交于
both our sqsize and the controller MQES cap are a 0 based value, so making it 1 based is wrong. Reported-by: NTrapp, Darren <Darren.Trapp@cavium.com> Reported-by: NDaniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 4月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
But now for the real NVMe Write Zeroes yet, just to get rid of the discard abuse for zeroing. Also rename the quirk flag to be a bit more self-explanatory. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-