- 05 11月, 2014 3 次提交
-
-
由 Matias Bjørling 提交于
This converts the NVMe driver to a blk-mq request-based driver. The NVMe driver is currently bio-based and implements queue logic within itself. By using blk-mq, a lot of these responsibilities can be moved and simplified. The patch is divided into the following blocks: * Per-command data and cmdid have been moved into the struct request field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id maintenance are now handled by blk-mq through the rq->tag field. * The logic for splitting bio's has been moved into the blk-mq layer. The driver instead notifies the block layer about limited gap support in SG lists. * blk-mq handles timeouts and is reimplemented within nvme_timeout(). This both includes abort handling and command cancelation. * Assignment of nvme queues to CPUs are replaced with the blk-mq version. The current blk-mq strategy is to assign the number of mapped queues and CPUs to provide synergy, while the nvme driver assign as many nvme hw queues as possible. This can be implemented in blk-mq if needed. * NVMe queues are merged with the tags structure of blk-mq. * blk-mq takes care of setup/teardown of nvme queues and guards invalid accesses. Therefore, RCU-usage for nvme queues can be removed. * IO tracing and accounting are handled by blk-mq and therefore removed. * Queue suspension logic is replaced with the logic from the block layer. Contributions in this patch from: Sam Bradshaw <sbradshaw@micron.com> Jens Axboe <axboe@fb.com> Keith Busch <keith.busch@intel.com> Robert Nelson <rlnelson@google.com> Acked-by: NKeith Busch <keith.busch@intel.com> Acked-by: NJens Axboe <axboe@fb.com> Updated for new ->queue_rq() prototype. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Submits NVMe asynchronous event requests, one event up to the controller maximum or number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 6月, 2014 1 次提交
-
-
由 Keith Busch 提交于
There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 04 6月, 2014 1 次提交
-
-
由 Matthew Wilcox 提交于
It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 05 5月, 2014 3 次提交
-
-
由 Keith Busch 提交于
It is possible a filesystem may send a flush flagged bio with write data. There is no such composite NVMe command, so the driver sends flush and write separately. The device is allowed to execute these commands in any order, so it was possible the driver ends the bio after the write completes, but while the flush is still active. We don't want to let a filesystem believe flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix, this patch splits the flush and write into chained bios. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
This configures an nvme request_queue as flush capable if the device has a volatile write cache present. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 11 4月, 2014 4 次提交
-
-
由 Keith Busch 提交于
For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod conviently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings. Signed-off-by: NKeith Busch <keith.busch@intel.com> [fixed checkpatch issue with long line] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Increase the default timeout to 30 seconds to match SCSI. Signed-off-by: NKeith Busch <keith.busch@intel.com> [use byte instead of ushort] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Registers with hot cpu notification to rebalance, and potentially allocate additional, io queues. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map the a qid to a cpu. This provides a convienient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in these situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 24 3月, 2014 2 次提交
-
-
由 Keith Busch 提交于
This adds rcu protected access to a queue in the nvme IOCTL path to fix potential races between a surprise removal and queue usage in nvme_submit_sync_cmd. The fix holds the rcu_read_lock() here to prevent the nvme_queue from freeing while this path is executing so it can't sleep, and so this path will no longer wait for a available command id should they all be in use at the time a passthrough IOCTL request is received. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
This adds rcu protected access to nvme_queue to fix a race between a surprise removal freeing the queue and a thread with open reference on a NVMe block device using that queue. The queues do not need to be rcu protected during the initialization or shutdown parts, so I've added a helper function for raw deferencing to get around the sparse errors. There is still a hole in the IOCTL path for the same problem, which is fixed in a subsequent patch. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 07 3月, 2014 1 次提交
-
-
由 Tejun Heo 提交于
PREPARE_[DELAYED_]WORK() are being phased out. They have few users and a nasty surprise in terms of reentrancy guarantee as workqueue considers work items to be different if they don't have the same work function. nvme_dev->reset_work is multiplexed with multiple work functions. Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and always use it as the work function and update the users to set the ->reset_workfn field instead of overriding the work function using PREPARE_WORK(). It would probably be best to route this with other related updates through the workqueue tree. Compile tested. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: linux-nvme@lists.infradead.org
-
- 28 1月, 2014 2 次提交
-
-
由 Keith Busch 提交于
Send nvme abort command to io requests that have timed out on an initialized device. If the command is not returned after another timeout, schedule the controller for reset. Signed-off-by: NKeith Busch <keith.busch@intel.com> [fix endianness issues] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Schedules a controller reset when it indicates it has a failed status. If the device does not become ready after a reset, the pci device will be scheduled for removal. Signed-off-by: NKeith Busch <keith.busch@intel.com> [fixed checkpatch issue] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 17 12月, 2013 2 次提交
-
-
由 Keith Busch 提交于
Adds controller error handling on resume power management. If the device fails to initialize, the device is queued for a reset. If the reset fails, a thread is spawned to remove the pci device. If the device resumes as "busy", the device is responding to admin commands but will not create IO queues. In this case, we need to remove the gendisks and free the IO queues since they can't be used and may be holding bios in their lists. From testing, the dma pools require a pci device so this had to change the pci driver 'remove' to release the dma resources in line with that call instead of after all references to the device are released. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
For 32-bit versions of sg3-utils running on a 64-bit system. This is mostly a copy from the relevent portions of fs/compat_ioctl.c, with slight modifications for going through block_device_operations. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NVishal Verma <vishal.l.verma@linux.intel.com> [fixed up CONFIG_COMPAT=n build problems] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 19 11月, 2013 1 次提交
-
-
由 Haiyan Hu 提交于
Changes the type of dev->db_stride to unsigned and changes the value stored there to be 1 << the current value. Then there is less calculation to be done at completion time. Signed-off-by: NHaiyan Hu <huhaiyan@huawei.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 04 9月, 2013 3 次提交
-
-
由 Keith Busch 提交于
The NVMe spec recommends using the shutdown normal sequence when safely taking the controller offline instead of hitting CC.EN on the next start-up to reset the controller. The spec recommends a minimum of 1 second for the shutdown complete. This patch waits 2 seconds to be on the safe side. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
The 'Number of Namespaces' read from the device was being treated as signed, which would cause us to not scan any namespaces for a device with more than 2 billion namespaces. That led to noticing that the namespace ID was also being treated as signed, which could lead to the result from NVME_IOCTL_ID being treated as an error code. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
To build user programs that call the NVMe ioctls, we need to have a user header file. Catch up to the new way of doing that by splitting the header file into kernel and uapi portions. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 21 6月, 2013 1 次提交
-
-
由 Keith Busch 提交于
Add io stats accounting for bio requests so nvme block devices show useful disk stats. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 08 5月, 2013 1 次提交
-
-
由 Matthew Wilcox 提交于
Add definitions for the three Firmware Activate actions, and change the SCSI translation code to construct the command into a temporary variable instead of translating the endianness back-and-forth. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: NVishal Verma <vishal.l.verma@linux.intel.com>
-
- 03 5月, 2013 2 次提交
-
-
由 Keith Busch 提交于
This adds support for namespaces with separate meta-data formats in the submit io ioctl. The meta-data buffer has to be a contiguous, so such a buffer is allocated and the mapped user pages are copied to/from this buffer for write/read commands. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
We have an nvme device that has a concept of a stripe size. IO requests that do not transfer data crossing a stripe boundary has greater performance compared to IO that does cross it. This patch sets the stripe size for the device if the device and vendor ids match one with this feature and splits IO requests that cross the stripe boundary. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 17 4月, 2013 3 次提交
-
-
由 Keith Busch 提交于
Registers a miscellaneous device for each nvme controller probed. This creates character device files as /dev/nvmeN, where N is the device instance, and supports nvme admin ioctl commands so devices without namespaces can be managed. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
When constructing the command, dsmgmt needs to be treated as a 32-bit value, not a 16-bit value. reftag, apptag and appmask all need to be converted from native-endian to little-endian. Again, sparse's bitwise warnings caught this problem. Thanks to Keith for pointing out the correct way to fix the reftag. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Acked-by: NKeith Busch <keith.busch@intel.com>
-
由 Matthew Wilcox 提交于
Introduce nvme_block_nr() to help convert sectors to block numbers. This fixes an integer overflow in the SCSI conversion layer, and it's slightly less typing than opencoding it. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Acked-by: NKeith Busch <keith.busch@intel.com>
-
- 29 3月, 2013 1 次提交
-
-
由 Vishal Verma 提交于
Translates SCSI commands in SG_IO ioctl to NVMe commands. Uses the scsi-nvme translation spec from nvmexpress.org as reference. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 27 3月, 2013 3 次提交
-
-
由 Vishal Verma 提交于
The SCSI emulation has the ability to send format commands, so we need to add the definition of the command. Also add a missing error code. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Vishal Verma 提交于
nvme-scsi.c uses several data structures and definitions that were previously private to nvme-core.c. Move the definitions to nvme.h, protected by __KERNEL__. Signed-off-by: NVishal Verma <vishal.l.verma@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
This adds discard support to block queues if the nvme device is capable of deallocating blocks as indicated by the controller's optional command support. A discard flagged bio request will submit an NVMe deallocate Data Set Management command for the requested blocks. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 13 11月, 2012 1 次提交
-
-
由 Keith Busch 提交于
This data structure is defined in the NVMe specification. It's not used by the kernel, but is available for use by userspace software. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 28 7月, 2012 1 次提交
-
-
由 Keith Busch 提交于
Set the depth for IO queues to the device's maximum supported queue entries if the requested depth exceeds the device's capabilities. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 27 7月, 2012 1 次提交
-
-
由 Keith Busch 提交于
Set the max hw sectors in a namespace's request queue if the nvme device has a max data transfer size. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 05 11月, 2011 3 次提交
-
-
由 Matthew Wilcox 提交于
The driver was still using an old definition of Identify Controller which only came to light once we started using the 'number of namespaces' field properly. Reported-by: NNisheeth Bhat <nisheeth.bhat@intel.com> Reported-by: NKhosrow Panah <Khosrow.Panah@idt.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
The doorbell stride allows devices to spread out their doorbells instead of packing them tightly. This feature was added as part of ECN 003. This patch also enables support for more than 512 queues :-) Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD ioctl that can submit any admin command. Add a new ID ioctl that returns the namespace ID of the queried device. It corresponds to the SCSI Idlun ioctl. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-