- 05 11月, 2014 17 次提交
-
-
由 Keith Busch 提交于
We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur. Fixes: b9afca3eSigned-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface. This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NNilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andreea-Cristina Bernat 提交于
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sam Bradshaw 提交于
nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than exist cpu cores. Thanks to Keith Busch for the implementation suggestion. Signed-off-by: NSam Bradshaw <sbradshaw@micron.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before. The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NMatthew Minter <matthew_minter@xyratex.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Dan McLeran 提交于
The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds. Signed-off-by: NDan McLeran <daniel.mcleran@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Fixing tabs inadvertently converted to spaces. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Race conditions are theoretically possible between the NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make the NVMe code use pci_stop_and_remove_bus_device_locked(). Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now. This also fixes an issue where the driver successfully sets the queue count, but either is unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, the driver will bring up a character device for managment but not create block devices. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Dan McLeran 提交于
Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl. Remove reading of the CC register and manage the state in dev->ctrl_config. Signed-off-by: NDan McLeran <daniel.mcleran@intel.com> [removed an unwanted write to CC] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Submits NVMe asynchronous event requests, one event up to the controller maximum or number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Joe Perches 提交于
Use the zeroing function instead of dma_alloc_coherent & memset(,0,) Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 10月, 2014 1 次提交
-
-
由 Mike Snitzer 提交于
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set QUEUE_FLAG_NONROT. Historically, all block devices have automatically made entropy contributions. But as previously stated in commit e2e1a148 ("block: add sysfs knob for turning off disk entropy contributions"): - On SSD disks, the completion times aren't as random as they are for rotational drives. So it's questionable whether they should contribute to the random pool in the first place. - Calling add_disk_randomness() has a lot of overhead. There are more reliable sources for randomness than non-rotational block devices. From a security perspective it is better to err on the side of caution than to allow entropy contributions from unreliable "random" sources. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 6月, 2014 1 次提交
-
-
由 Keith Busch 提交于
There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 04 6月, 2014 6 次提交
-
-
由 Matthew Wilcox 提交于
It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Sam Bradshaw 提交于
Recently, a new sysfs control "iostats" was added to selectively enable or disable io statistics collection for request queues. This patch hooks that control. IO statistics collection is rather expensive on large, multi-node machines with drives pushing millions of iops. Having the ability to disable collection if not needed can improve throughput significantly. As a data point, on a quad E5-4640, I see more than 50% throughput improvement when io statistics accounting is disabled during heavily multi-threaded small block random read benchmarks where device performance is in the million iops+ range. Signed-off-by: NSam Bradshaw <sbradshaw@micron.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
The routines to get and lock nvme queues required the caller to "put" or "unlock" them even if getting one returned NULL. This patch fixes that. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Signed-off-by: NKeith Busch <keith.busch@intel.com> [made admin_timeout static] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
This was originally set to 4 times the IO timeout, but that was when the IO timeout was 5 seconds instead of 30. 20 seconds for total time to failure seemed more reasonable than 2 minutes for most, but other users have requested to make this a module parameter instead. Signed-off-by: NKeith Busch <keith.busch@intel.com> [renamed the module parameter to retry_time] [made retry_time static] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Santosh Y 提交于
kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod' can fail. So check the return value. Signed-off-by: NSantosh Y <santosh.sy@samsung.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 28 5月, 2014 1 次提交
-
-
由 Keith Busch 提交于
Quiesce and shutdown the device prior to reset, then restart the device and resume IO after. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 10 5月, 2014 1 次提交
-
-
由 Matthew Wilcox 提交于
Since _nvme_check_size() wasn't being called from anywhere, the compiler was optimising it away ... along with all the link-time build failures that would result if any of the structures were the wrong size. Call it from nvme_exit() for no particular reason. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 05 5月, 2014 6 次提交
-
-
由 Keith Busch 提交于
It is possible a filesystem may send a flush flagged bio with write data. There is no such composite NVMe command, so the driver sends flush and write separately. The device is allowed to execute these commands in any order, so it was possible the driver ends the bio after the write completes, but while the flush is still active. We don't want to let a filesystem believe flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix, this patch splits the flush and write into chained bios. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
This configures an nvme request_queue as flush capable if the device has a volatile write cache present. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Adding tracepoints for bio_complete and block_split into nvme to help with gathering IO info using blktrace and blkparse. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
If a misbehaving device posts a CQE with a command id < depth but for one that was never allocated, the command info will have a callback function set to NULL and we don't want to try invoking that. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Matthew Wilcox 提交于
Help people diagnose what is going wrong at initialisation time by printing out which command has gone wrong and what the device returned. Also fix the error message printed while waiting for reset. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: NKeith Busch <keith.busch@intel.com>
-
由 Matthew Wilcox 提交于
Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 11 4月, 2014 6 次提交
-
-
由 Keith Busch 提交于
For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod conviently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings. Signed-off-by: NKeith Busch <keith.busch@intel.com> [fixed checkpatch issue with long line] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Some programs require HDIO_GETGEO work, which requires we implement getgeo. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Dan McLeran 提交于
Done to ensure nvme_thread is not running when there are no devices to poll. Signed-off-by: NDan McLeran <daniel.mcleran@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Increase the default timeout to 30 seconds to match SCSI. Signed-off-by: NKeith Busch <keith.busch@intel.com> [use byte instead of ushort] Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
Registers with hot cpu notification to rebalance, and potentially allocate additional, io queues. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
由 Keith Busch 提交于
The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map the a qid to a cpu. This provides a convienient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in these situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-
- 24 3月, 2014 1 次提交
-
-
由 Matthew Wilcox 提交于
Checkpatch has started warning against using DEFINE_PCI_DEVICE_TABLE, so replace it. Also update the copyright date and bump the module version number to 0.9. Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
-