- 23 2月, 2017 9 次提交
-
-
由 Keith Busch 提交于
This driver previously required we have a special check for IO submitted to nvme IO queues that are temporarily suspended. That is no longer necessary since blk-mq provides a quiesce, so any IO that actually gets submitted to such a queue must be ended since the queue isn't going to start back up. This is fixing a condition where we have fewer IO queues after a controller reset. This may happen if the number of CPU's has changed, or controller firmware update changed the queue count, for example. While it may be possible to complete the IO on a different queue, the block layer does not provide a way to resubmit a request on a different hardware context once the request has entered the queue. We don't want these requests to be stuck indefinitely either, so ending them in error is our only option at the moment. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
If a namespace has already been marked dead, we don't want to kick the request_queue again since we may have just freed it from another thread. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
If the device is not present, the driver should disable the queues immediately. Prior to this, the driver was relying on the watchdog timer to kill the queues if requests were outstanding to the device, and that just delays removal up to one second. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andy Lutomirski 提交于
NVMe devices can advertise multiple power states. These states can be either "operational" (the device is fully functional but possibly slow) or "non-operational" (the device is asleep until woken up). Some devices can automatically enter a non-operational state when idle for a specified amount of time and then automatically wake back up when needed. The hardware configuration is a table. For each state, an entry in the table indicates the next deeper non-operational state, if any, to autonomously transition to and the idle time required before transitioning. This patch teaches the driver to program APST so that each successive non-operational state will be entered after an idle time equal to 100% of the total latency (entry plus exit) associated with that state. The maximum acceptable latency is controlled using dev_pm_qos (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational states with total latency greater than this value will not be used. As a special case, setting the latency tolerance to 0 will disable APST entirely. On hardware without APST support, the sysfs file will not be exposed. The latency tolerance for newly-probed devices is set by the module parameter nvme_core.default_ps_max_latency_us. In theory, the device can expose "default" APST table, but this doesn't seem to function correctly on my device (Samsung 950), nor does it seem particularly useful. There is also an optional mechanism by which a configuration can be "saved" so it will be automatically loaded on reset. This can be configured from userspace, but it doesn't seem useful to support in the driver. On my laptop, enabling APST seems to save nearly 1W. The hardware tables can be decoded in userspace with nvme-cli. 'nvme id-ctrl /dev/nvmeN' will show the power state table and 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST configuration. This feature is quirked off on a known-buggy Samsung device. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andy Lutomirski 提交于
Currently, all NVMe quirks are based on PCI IDs. Add a mechanism to define quirks based on identify_ctrl's vendor id, model number, and/or firmware revision. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Johannes Thumshirn 提交于
nvmf_create_ctrl() relys on the presence of a create_crtl callback in the registered nvmf_transport_ops, so make nvmf_register_transport require one. Update the available call-sites as well to reflect these changes. Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Parav Pandit 提交于
This patch defines CNS field as 8-bit field and avoids cpu_to/from_le conversions. Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS for readability (don't rely on the fact that NVME_ID_CNS_NS = 0). Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Max Gurtovoy 提交于
Reviewed-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Reviewed-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sagi Grimberg 提交于
Easier for debugging and testing state machine transitions. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 2月, 2017 2 次提交
-
-
由 Scott Bauer 提交于
We need to verify that the controller supports the security commands before actually trying to issue them. Signed-off-by: NScott Bauer <scott.bauer@intel.com> [hch: moved the check so that we don't call into the OPAL code if not supported] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
Insted of bloating the containing structure with it all the time this allocates struct opal_dev dynamically. Additionally this allows moving the definition of struct opal_dev into sed-opal.c. For this a new private data field is added to it that is passed to the send/receive callback. After that a lot of internals can be made private as well. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NScott Bauer <scott.bauer@intel.com> Reviewed-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 15 2月, 2017 1 次提交
-
-
由 Scott Bauer 提交于
When CONFIG_KASAN is enabled, compilation fails: block/sed-opal.c: In function 'sed_ioctl': block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Moved all the ioctl structures off the stack and dynamically allocate using _IOC_SIZE() Fixes: 455a7b23 ("block: Add Sed-opal library") Reported-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 09 2月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
NVMe supports up to 256 ranges per DSM command, so wire up support for ranged discards up to that limit. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 07 2月, 2017 1 次提交
-
-
由 Scott Bauer 提交于
This patch implements the necessary logic to unlock an Opal enabled device coming back from an S3. The patch also implements the SED/Opal allocation necessary to support the opal ioctls. Signed-off-by: NScott Bauer <scott.bauer@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 2月, 2017 2 次提交
-
-
由 Christoph Hellwig 提交于
Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Christoph Hellwig 提交于
This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 31 1月, 2017 5 次提交
-
-
由 Keith Busch 提交于
This patch sets the aborted flag only if an abort was sent, reducing excessive kernel message spamming for completed IO that wasn't actually aborted. Reported-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
When the lightnvm core had the "gennvm" layer between the device and the target, there was a need for the core to be able to figure out which target it should send an end_io callback to. Leading to a "double" end_io, first for the media manager instance, and then for the target instance. Now that core and gennvm is merged, there is no longer a need for this, and a single end_io callback will do. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
Enable user-space to issue vector I/O commands through ioctls. To issue a vector I/O, the ppa list with addresses is also required and must be mapped for the controller to access. For each ioctl, the result and status bits are returned as well, such that user-space can retrieve the open-channel SSD completion bits. The implementation covers the traditional use-cases of bad block management, and vectored read/write/erase. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Metadata implementation, test, and fixes. Signed-off-by: NSimon A.F. Lund <slund@cnexlabs.com> Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
The number of configuration groups has been limited to one in current code, even if there is support for up to four. With the introduction of the open-channel SSD 1.3 specification, only a single group is exposed onwards. Reflect this in the nvm_id structure. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
For the first iteration of Open-Channel SSDs, it was anticipated that there could be various media managers on top of an open-channel SSD, such to allow vendors to plug in their own host-side FTLs, without the media manager in between. Now that an Open-Channel SSD is exposed as a traditional block device, there is no longer a need for this. Therefore lets merge the gennvm code with core and simplify the stack. Signed-off-by: NMatias Bjørling <matias@cnexlabs.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 30 1月, 2017 2 次提交
-
-
由 Christoph Hellwig 提交于
The NVMe SCSI emulation doesn't use BLOCK_PC requests, so BLK_MAX_CDB doesn't have a meaning for it. Instead opencode the value of 16 and refactor the code a bit so that related checks are next to each other and we only need to use the value in one place. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
Since we moved the cdb parts and define out of the block proper, we need to include scsi/scsi_request.h for the nvme scsi layer. Fixes: 82ed4db4 ("block: split scsi_request out of struct request") Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 26 1月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Without this deallocate won't work properly due to the mismatch of the bio/request size and the actual payload size. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
-
- 18 1月, 2017 1 次提交
-
-
由 Jens Axboe 提交于
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't use for normal IO. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NOmar Sandoval <osandov@fb.com>
-
- 14 1月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
The new blk_rq_payload_bytes generalizes the payload length hacks that nvme_map_len did before. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 12 1月, 2017 2 次提交
-
-
由 Guilherme G. Piccoli 提交于
Commit 54adc010 ("nvme/quirk: Add a delay before checking for adapter readiness") introduced a quirk to adapters that cannot read the bit NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters need a delay or else the action of reading the bit NVME_CSTS_RDY could somehow corrupt adapter's registers state and it never recovers. When this quirk was added, we checked ctrl->tagset in order to avoid quirking in probe time, supposing we would never require such delay during probe. Well, it was too optimistic; we in fact need this quirk at probe time in some cases, like after a kexec. In some experiments, after abnormal shutdown of machine (aka power cord unplug), we booted into our bootloader in Power, which is a Linux kernel, and kexec'ed into another distro. If this kexec is too quick, we end up reaching the probe of NVMe adapter in that distro when adapter is in bad state (not fully initialized on our bootloader). What happens next is that nvme_wait_ready() is unable to complete, except if the quirk is enabled. So, this patch removes the original ctrl->tagset verification in order to enable the quirk even on probe time. Fixes: 54adc010 ("nvme/quirk: Add a delay before checking for adapter readiness") Reported-by: NAndrew Byrne <byrneadw@ie.ibm.com> Reported-by: NJaime A. H. Gomez <jahgomez@mx1.ibm.com> Reported-by: NZachary D. Myers <zdmyers@us.ibm.com> Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: NJeffrey Lien <Jeff.Lien@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Christoph Hellwig 提交于
Now that we don't abuse the cmd field in struct request for nvme command passthrough this function needs to be converted to the proper accessor as well. Fixes: d49187e9 ("nvme: introduce struct nvme_request") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
-
- 21 12月, 2016 6 次提交
-
-
由 Johannes Thumshirn 提交于
Simplify the error handling of nvme_fc_create_hw_io_queues(), this saves us one variable and one level of indentation. Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviwed-by: NJames Smart <james.smart@broadcom.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 James Smart 提交于
Dan Carpenters's tool caught a pointer reference - should have been just ptr, not &ptr. Don't bother. Remove the pointer value in the printf. Its irrelevant. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Andy Lutomirski 提交于
Now that the broken power state control is gone, it appears to serve no purpose. Just delete it. NVME devices don't have a concept of started vs stopped anyway. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
It is not theoretically possible for this driver to wrap twice while processing completions. The driver allocates only 'queue_depth - 1' tags, so there can never be more than that to reap when processing a completion queue. Removing this misleading comment makes it a little less likely people with broken controllers will blame the driver for their spurious interrupts. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Max Gurtovoy 提交于
Convert to tabs and remove unneeded whitespaces. Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Some OEMs believe they own the Identify Controller vendor specific region and will repurpose it with their own values. While not common, we can't rely on the PCI VID:DID to tell use how to decode the field we reserved for this as the stripe size so we need to do something else for the list of devices using this quirk. The field was supposed to allow flexibility on the device's back-end striping, but it turned out that never materialized; the chunk is always the same as MDTS in the products subscribing to this quirk, so this patch removes the stripe_size field and sets the chunk to the max hw transfer size for the devices using this quirk. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 19 12月, 2016 1 次提交
-
-
由 Stephen Bates 提交于
Make sure we are using the correct scnprintf in the sysfs show function for the CMB. Signed-off-by: NStephen Bates <sbates@raithlin.com> Reviewed-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 15 12月, 2016 1 次提交
-
-
由 Steve Wise 提交于
Also add nvme cm status strings and use them. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 14 12月, 2016 2 次提交
-
-
由 Andy Lutomirski 提交于
When debugging nvme controller crashes, it's nice to know whether the controller died cleanly so that the failure is just reflected in CSTS, it died and put an error in PCI_STATUS, or whether it died so badly that it stopped responding to PCI configuration space reads. I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung "SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q". Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NKeith Busch <keith.busch@intel.com> Fixed up white space and hunk reject. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Linus Torvalds 提交于
This reverts commit 6d31e3ba. This causes bootup problems for me both on my laptop and my desktop. What they have in common is that they have NVMe disks with dm-crypt, but it's not the same controller, so it's not controller-specific. Jens does not see it on his machine (also NVMe), so it's presumably something that triggers just on bootup. Possibly related to dm-crypt and the fact that I mark my luks volume with "allow-discards" in /etc/crypttab. It's 100% repeatable for me, which made it fairly straightforward to bisect the problem to this commit. Small mercies. So we don't know what the reason is yet, but the revert is needed to get things going again. Acked-by: NJens Axboe <axboe@fb.com> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 12月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 12月, 2016 1 次提交
-
-
由 James Smart 提交于
Implements the FC-NVME T11 definition of how nvme fabric capsules are performed on an FC fabric. Utilizes a lower-layer API to FC host adapters to send/receive FC-4 LS operations and FCP operations that comprise NVME over FC operation. The T11 definitions for FC-4 Link Services are implemented which create NVMeOF connections. Implements the hooks with blk-mq to then submit admin and io requests to the different connections. Signed-off-by: NJames Smart <james.smart@broadcom.com> Reviewed-by: NJay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-