- 18 Nov, 2014 5 commits
-
-
Committed by Jens Axboe
We are called for async event notification issues, and the nvmeq lock is already held. If we fail the request allocation, we'll just retry next time.
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Jens Axboe
No point in using blk_put_request(), since we know we are blk-mq. This only makes sense in core code where we could be dealing with either legacy or blk-mq drivers. Additionally, use blk_mq_free_hctx_request() for the request completion fast path, where we already know the mapping from request to hardware queue.
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Jens Axboe
-
Committed by Jens Axboe
It's silly to use blk_mq_free_request(), which in turn maps the request to the hardware queue, in places where we already know what the hardware queue is. This saves us an extra mapping of a hardware queue on request completion, if the caller knows this information already.
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Jens Axboe
Drivers that know they are blk-mq should just use this function instead of calling through blk_put_request().
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 12 Nov, 2014 2 commits
-
-
Committed by Paolo Bonzini
blk-mq is using preempt_disable/enable in order to ensure that the queue runners are placed on the right CPU. This does not work with the RT patches, because __blk_mq_run_hw_queue takes a non-raw spinlock within the preemption-disabled region. If there is contention on the lock, this violates the rules for preemption-disabled regions.
While this should be easily fixable within the RT patches just by doing migrate_disable/enable, we can do better and document _why_ this particular region runs with disabled preemption. After the previous patch, it is trivial to switch it to get/put_cpu; the RT patches then can change it to get_cpu_light, which lets virtio-blk run under RT kernels.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Paolo Bonzini
preempt_disable/enable surrounds every call to blk_mq_run_hw_queue, except the one in blk-flush.c. In fact that one is always asynchronous, and it does not need smp_processor_id(). We can do the same for all other calls, avoiding preempt_disable when async is true. This avoids peppering blk-mq.c with preemption-disabled regions.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Clark Williams <williams@redhat.com>
Tested-by: Clark Williams <williams@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 11 Nov, 2014 8 commits
-
-
Committed by Philipp Reisner
Old backward-compat cruft
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Philipp Reisner
A connection timeout affects all volumes of a resource!
Under the following conditions:
  a resource with multiple volumes AND ko-count >= 1 AND a write request triggers the timeout (ko-count * timeout)
DRBD's internal state gets confused. That in turn may lead to very misleading follow-up failures, e.g. "BUG: scheduling while atomic".
CC: stable@kernel.org # v3.17
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Lars Ellenberg
This was not noticed for many years. Affects operation if md raid is used as a backing device for DRBD.
CC: stable@kernel.org # v3.2+
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Lars Ellenberg
If for some reason DRBD resync was the only activity on a backend device, drbd_rs_c_min_rate_throttle() would mistakenly decide that it is still initialization time, and keep throttling the resync.
This patch explicitly initializes ->rs_last_events to the current backend event counters, and drops the rs_last_events == 0 check from the throttle condition.
Reported-by: Mikhail Sugakov <msugakov@amazon.de>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Philipp Reisner
Symptoms: If DRBD was "cleanly shut down" (all in sync, both Secondary before disconnect, identical data generation uuids), and then one side was promoted *during* the next connection handshake, the role change could confuse the handshake. The Primary would get stuck in WFBitmapS, the Secondary would log unexpected cstate (Connected) in receive_bitmap and get stuck in WFBitmapT.
Fix: The test in is_valid_soft_transition was wrong. It works because the disallowed actions (promote/attach) do not touch the cstate. The previous condition failed to demand a cstate change in one clause. In order to avoid deadlocks, give up the state_mutex while waiting for the transient state to go away.
Conflicts: drbd/drbd_state.c drbd/drbd_state.h drbd/drbd_wrappers.h
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Andreas Gruenbacher
Avoid generic netlink calls in other parts of the code base.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Andreas Gruenbacher
* Update comments
* drbd_set_{in,out_of}_sync(): Remove unused parameters
* Move common code into adm_del_resource()
* Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by kbuild test robot
drivers/block/nvme-core.c:865:5: sparse: symbol '__nvme_submit_admin_cmd' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 06 Nov, 2014 1 commit
-
-
Committed by Dan Carpenter
We recently converted this to blk_mq but the error checks have to be updated to check for IS_ERR() instead of NULL.
Fixes: a4aea562 ('NVMe: Convert to blk-mq')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
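The difference between a NULL check and an IS_ERR() check can be illustrated with a small userspace imitation of the kernel's error-pointer convention. This is only a sketch: the helpers below are simplified stand-ins for the kernel macros, and `alloc_request()` and `submit()` are hypothetical names that do not come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: on failure it returns an encoded errno, never NULL. */
void *alloc_request(int should_fail)
{
    static int dummy_request;
    return should_fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_request;
}

/* Returns 0 on success or a negative errno. */
int submit(int should_fail)
{
    void *req = alloc_request(should_fail);

    if (IS_ERR(req))            /* a "!req" test would miss this failure */
        return (int)PTR_ERR(req);

    assert(req != NULL);        /* allocator contract: never NULL */
    return 0;
}
```

With this convention a failed allocation is a non-NULL pointer, so code that only tests for NULL would go on to use the error value as if it were a valid request.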
-
- 05 Nov, 2014 24 commits
-
-
Committed by Matias Bjørling
This converts the NVMe driver to a blk-mq request-based driver.
The NVMe driver is currently bio-based and implements queue logic within itself. By using blk-mq, a lot of these responsibilities can be moved and simplified.
The patch is divided into the following blocks:
* Per-command data and cmdid have been moved into the struct request field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id maintenance is now handled by blk-mq through the rq->tag field.
* The logic for splitting bios has been moved into the blk-mq layer. The driver instead notifies the block layer about limited gap support in SG lists.
* blk-mq handles timeouts and is reimplemented within nvme_timeout(). This includes both abort handling and command cancelation.
* Assignment of nvme queues to CPUs is replaced with the blk-mq version. The current blk-mq strategy is to assign the number of mapped queues and CPUs to provide synergy, while the nvme driver assigns as many nvme hw queues as possible. This can be implemented in blk-mq if needed.
* NVMe queues are merged with the tags structure of blk-mq.
* blk-mq takes care of setup/teardown of nvme queues and guards invalid accesses. Therefore, RCU usage for nvme queues can be removed.
* IO tracing and accounting are handled by blk-mq and therefore removed.
* Queue suspension logic is replaced with the logic from the block layer.
Contributions in this patch from:
  Sam Bradshaw <sbradshaw@micron.com>
  Jens Axboe <axboe@fb.com>
  Keith Busch <keith.busch@intel.com>
  Robert Nelson <rlnelson@google.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Acked-by: Jens Axboe <axboe@fb.com>
Updated for new ->queue_rq() prototype.
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Discard requests are often for very large ranges. The discard size is not representative of the data transfer size, so we don't need to allocate for such a large prp list. This patch requests allocating only enough memory for the data transfer and saves a little over 8k of memory per max discard request.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
It is possible the block layer will request to open a block device after the driver deleted it. Subsequent releases will cause a double free, or the disk's private_data is pointing to freed memory. This patch protects the driver's freed disks from being opened and accessed: the nvme namespaces are freed only when the device's refcount is 0, so at that moment there were no active openers and no more should be allowed, and it is safe to clear the disk's private_data that is about to be freed.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Henry Chow <henry.chow@oracle.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
The nvme namespace request_queue's flags are initialized to QUEUE_FLAG_DEFAULT, which currently sets QUEUE_FLAG_STACKABLE. The device-mapper indicates this flag means the block driver is request-based, though this driver is bio-based and problems will occur if an nvme namespace is used with a request-based dm device. This patch clears the stackable flag.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
If we ever do parallel device probing, we need to wake up all processes waiting for nvme kthread to start, not just one. This is currently serialized so the bug is not reachable today, but fixing this anyway in the hopes we implement parallel or asynchronous probe in the future.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Updating commands and structures for NVMe 1.1 updates, mostly for nvme reservations. There are no additional in-kernel uses, but this is for the uapi. While doing this, I noticed that the software progress feature was using the wrong value, so updating that value as well.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
The NVME_IOCTL_SUBMIT_IO only works for IO commands with block data transfers and isn't usable for other NVMe commands like flush, data set management, or any sort of vendor unique command. The NVME_IOCTL_ADMIN_CMD, however, can easily be modified to accept arbitrary IO commands in addition to arbitrary admin commands without breaking backward compatibility. This patch just adds a new IOCTL to distinguish if the driver should submit the command on an IO or Admin queue.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
This adds a callback to revalidate the disk and change its block size and capacity if needed. Before, a user would have to remove + rescan an entire device if they changed the logical block size using an NVMe Format or other vendor specific command; now they can just run something that issues the BLKRRPART IOCTL, like
    # hdparm -z /dev/nvmeXnY
This can also be used in response to the 1.2 Spec's Namespace Attribute Change asynchronous event.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur.
Fixes: b9afca3e
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convenience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
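A translation helper of the kind described above might look like the following userspace sketch. The function name is hypothetical, the choice of -ENOSPC for "capacity exceeded" is an assumption (the message only says "more appropriate"), and the 0x7ff mask assumes the status code and status code type occupy the low 11 bits of the status value.

```c
#include <errno.h>

/* "Capacity Exceeded" in the NVMe generic command status set. The macro
 * name is illustrative; only the numeric value comes from the spec. */
#define NVME_SC_CAP_EXCEEDED 0x81

/* Hypothetical helper: map an NVMe completion status to a negative errno. */
int nvme_status_to_errno(unsigned int status)
{
    /* Assumption: mask off the upper flag bits (e.g. More, Do Not Retry)
     * and keep only the status code and status code type. */
    switch (status & 0x7ff) {
    case 0:
        return 0;               /* successful completion */
    case NVME_SC_CAP_EXCEEDED:
        return -ENOSPC;         /* assumed errno for "capacity exceeded" */
    default:
        return -EIO;            /* the previous catch-all behavior */
    }
}
```

Keeping the default branch at -EIO means any status without a dedicated mapping behaves exactly as it did before such a helper existed.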
-
Committed by Keith Busch
We've only been setting the sg_io_hdr status values on SCSI commands that require an nvme command to complete the translation. The fields in the struct are output parameters, so we have to set them, otherwise user space will see whatever was in memory from before. In the case of compat SG_IO, this would reveal kernel memory.
This fixes the issue by initializing the sg_io_hdr with successful status.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface.
This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Andreea-Cristina Bernat
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment:
"1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead.
The following Coccinelle semantic patch was used:
    @@
    @@

    - rcu_assign_pointer
    + RCU_INIT_POINTER
      (..., NULL)
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Sam Bradshaw
nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than there are cpu cores.
Thanks to Keith Busch for the implementation suggestion.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
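One common way to keep a CPU-derived index inside the range of available queues is to wrap it with a modulo. The message does not say which approach the patch took, so the following is only a sketch of that general idea; `pick_io_queue()` is a hypothetical name, and the index is zero-based among IO queues.

```c
#include <assert.h>

/* Hypothetical helper: map the submitting CPU to one of the online IO
 * queues, wrapping around when there are more CPUs than queues. */
unsigned int pick_io_queue(unsigned int cpu, unsigned int online_queues)
{
    assert(online_queues > 0);  /* caller must have at least one IO queue */
    return cpu % online_queues;
}
```

For example, with 4 online queues, CPU 5 maps to queue index 1 instead of indexing past the end of the queue array.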
-
Committed by Keith Busch
This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before.
The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reported-by: Matthew Minter <matthew_minter@xyratex.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Dan McLeran
The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds.
Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Fixing tabs inadvertently converted to spaces.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Race conditions are theoretically possible between the NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs.
To avoid those race conditions, make the NVMe code use pci_stop_and_remove_bus_device_locked().
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now.
This also fixes an issue where the driver successfully sets the queue count, but either is unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, the driver will bring up a character device for management but not create block devices.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Dan McLeran
Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl. Remove reading of the CC register and manage the state in dev->ctrl_config.
Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
[removed an unwanted write to CC]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Keith Busch
Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
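The page-splitting rule described above can be sketched in userspace as follows. This is not the driver's code: `split_host_page()` and its parameters are hypothetical, page sizes are assumed to be powers of two, and the specific errno values are assumptions (the message only says an error is returned).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill prps[] with the device-page-sized addresses
 * that cover one host page. Returns the number of entries written, or a
 * negative errno. */
int split_host_page(uint64_t host_addr, uint32_t host_page_size,
                    uint32_t dev_min_page, uint32_t dev_max_page,
                    uint64_t *prps, size_t cap)
{
    uint32_t prp_size;
    size_t n, i;

    /* The device cannot address anything as small as a host page. */
    if (dev_min_page > host_page_size)
        return -ENODEV;         /* assumed error code */

    /* Use the largest page size the device allows, capped at the host's. */
    prp_size = dev_max_page < host_page_size ? dev_max_page : host_page_size;
    n = host_page_size / prp_size;
    if (n > cap)
        return -EINVAL;         /* caller's array is too small */

    for (i = 0; i < n; i++)
        prps[i] = host_addr + (uint64_t)i * prp_size;
    return (int)n;
}
```

Under these assumptions, a 64 KiB host page on a device limited to 4 KiB pages yields 16 PRP entries, while a device whose maximum is at least the host page size yields a single entry.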
-