- 11 12月, 2014 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
Guard against issuing unsupported REQ_FUA and REQ_FLUSH was introduced in d11e6158 and was factored out into blkif_request_flush_valid() in 0f1ca65e. However: 1) This check in incomplete. In case we negotiated to feature_flush = REQ_FLUSH and flush_op = BLKIF_OP_FLUSH_DISKCACHE (so FUA is unsupported) FUA request will still pass the check. 2) blkif_request_flush_valid() is misnamed. It is bool but returns true when the request is invalid. 3) When blkif_request_flush_valid() fails -EIO is being returned. It seems that -EOPNOTSUPP is more appropriate here. Fix all of the above issues. This patch is based on the original patch by Laszlo Ersek and a comment by Jeff Moyer. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLaszlo Ersek <lersek@redhat.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 04 12月, 2014 1 次提交
-
-
由 Keith Busch 提交于
On retry, the req->special is pointing to an already setup IOD, but we still need to setup the command context and callback, otherwise you'll see false twice completed errors and leak requests. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 27 11月, 2014 1 次提交
-
-
由 Matias Bjorling 提交于
When either queue_mode or irq_mode parameter is set outside its boundaries, the driver will not complete requests. This stalls driver initialization when partitions are probed. Fix by setting out of bound values to the parameters default. Signed-off-by: NMatias Bjørling <m@bjorling.me> Updated by me to have the parse+check in just one function. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 24 11月, 2014 2 次提交
-
-
由 Gu Zheng 提交于
Use generic io stats accounting help functions (generic_{start,end}_io_acct) to simplify io stat accounting. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Gu Zheng 提交于
Use generic io stats accounting help functions (generic_{start,end}_io_acct) to simplify io stat accounting. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 22 11月, 2014 1 次提交
-
-
由 Keith Busch 提交于
It's already near impossible to tell what bits someone is running based on a 'modinfo nvme', and I don't want to try guessing if someone is running blk-mq or bio-based. Let's make it obvious with the module version that the blk-mq conversion is a major change. Future bio-based versions can increment to 0.10 in a fork if revisions occur. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 21 11月, 2014 2 次提交
-
-
由 Jens Axboe 提交于
The PCI init of NVMe doesn't check for valid bars before proceeding to map and use BAR 0. If the device is hosed (or firmware is), then we should catch this case and give up early. This fixes a: [ 1662.035778] WARNING: CPU: 0 PID: 4 at arch/x86/mm/ioremap.c:63 __ioremap_check_ram+0xa7/0xc0() and later badness on such a device. Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
If we do teardown and setup of the queue and block related parts of the driver, then we should clear nvmeq->hctx once we kill the hardware queue. Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 20 11月, 2014 1 次提交
-
-
由 Jens Axboe 提交于
The setup/probe part currently relies on INTx being there and working, that's not always the case. For devices that don't advertise INTx, enable a single MSIx vector early on and disable it again before we ask for our full range of queue vecs. Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 11月, 2014 3 次提交
-
-
由 Jens Axboe 提交于
Before the blk-mq conversion they were on by default, we should not change behavior there. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
We are called for async event notification issues, and the nvmeq lock is already held. If we fail the request allocation, we'll just retry next time. Reported-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
No point in using blk_put_request(), since we know we are blk-mq. This only makes sense in core code where we could be dealing with either legacy or blk-mq drivers. Additionally, use blk_mq_free_hctx_request() for the request completion fast path, where we already know the mapping from request to hardware queue. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 11月, 2014 1 次提交
-
-
由 Weijie Yang 提交于
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram page becomes a full-zeroed page after a partial write io. The current code doesn't handle this case and performs kunmap_atomic() on a NULL pointer, which panics the kernel. This patch fixes this issue. Signed-off-by: NWeijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: NJerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 11月, 2014 8 次提交
-
-
由 Philipp Reisner 提交于
Old backward-compat cruft Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
A connection timeout affects all volumes of a resource! Under the following conditions: A resource with multiple volumes AND ko-count >=1 AND a write request triggers the timeout (ko-count * timeout) DRBD's internal state gets confused. That in turn may lead to very miss leading follow up failures. E.g. "BUG: scheduling while atomic" CC: stable@kernel.org # v3.17 Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
This was not noticed for many years. Affects operation if md raid is used a backing device for DRBD. CC: stable@kernel.org # v3.2+ Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Lars Ellenberg 提交于
If for some reason DRBD resync was the only activity on a backend device, drbd_rs_c_min_rate_throttle() would mistakenly decide that it is still initialization time, and keep throttling the resync. This patch explicitly initializes ->rs_last_events to the current backend event counters, and drops the rs_last_events == 0 from the throttle condition. Reported-by: NMikhail Sugakov <msugakov@amazon.de> Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Philipp Reisner 提交于
Symptoms: If DRBD was "cleanly shut down" (all in sync, both Secondary before disconnect, identical data generation uuids), and then one side was promoted *during* the next connection handshake, the role change could confuse the handshake. The Primary would get stuck in WFBitmapS, the Secondary would log unexpected cstate (Connected) in receive_bitmap and get stuck in WFBitmapT. Fix: The test in is_valid_soft_transition wrong. It works because the not allowed actions (promote/attach) do not touch the cstate. The previous condition failed to demand a cstate change in one clause. In order to avoid deadlocks give up the state_mutex while waiting for the transient state to go away. Conflicts: drbd/drbd_state.c drbd/drbd_state.h drbd/drbd_wrappers.h Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andreas Gruenbacher 提交于
Avoid generic netlink calls in other parts of the code base. Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andreas Gruenbacher 提交于
. Update comments . drbd_set_{in,out_of}_sync(): Remove unused parameters . Move common code into adm_del_resource() . Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS Signed-off-by: NPhilipp Reisner <philipp.reisner@linbit.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 kbuild test robot 提交于
drivers/block/nvme-core.c:865:5: sparse: symbol '__nvme_submit_admin_cmd' was not declared. Should it be static? Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 11月, 2014 1 次提交
-
-
由 Dan Carpenter 提交于
We recently converted this to blk_mq but the error checks have to be updated to check for IS_ERR() instead of NULL. Fixes: a4aea562 ('NVMe: Convert to blk-mq') Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 11月, 2014 18 次提交
-
-
由 Matias Bjørling 提交于
This converts the NVMe driver to a blk-mq request-based driver. The NVMe driver is currently bio-based and implements queue logic within itself. By using blk-mq, a lot of these responsibilities can be moved and simplified. The patch is divided into the following blocks: * Per-command data and cmdid have been moved into the struct request field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id maintenance are now handled by blk-mq through the rq->tag field. * The logic for splitting bio's has been moved into the blk-mq layer. The driver instead notifies the block layer about limited gap support in SG lists. * blk-mq handles timeouts and is reimplemented within nvme_timeout(). This both includes abort handling and command cancelation. * Assignment of nvme queues to CPUs are replaced with the blk-mq version. The current blk-mq strategy is to assign the number of mapped queues and CPUs to provide synergy, while the nvme driver assign as many nvme hw queues as possible. This can be implemented in blk-mq if needed. * NVMe queues are merged with the tags structure of blk-mq. * blk-mq takes care of setup/teardown of nvme queues and guards invalid accesses. Therefore, RCU-usage for nvme queues can be removed. * IO tracing and accounting are handled by blk-mq and therefore removed. * Queue suspension logic is replaced with the logic from the block layer. Contributions in this patch from: Sam Bradshaw <sbradshaw@micron.com> Jens Axboe <axboe@fb.com> Keith Busch <keith.busch@intel.com> Robert Nelson <rlnelson@google.com> Acked-by: NKeith Busch <keith.busch@intel.com> Acked-by: NJens Axboe <axboe@fb.com> Updated for new ->queue_rq() prototype. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Discard requests are often for very large ranges. The discard size is not representative of the data transfer size so we don't need to allocate for such a large prp list. This patch requests allocating only enough for the memory needed for the data transfer and saves a little over 8k of memory per max discard request. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NPaul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
It is possible the block layer will request to open a block device after the driver deleted it. Subsequent releases will cause a double free, or the disk's private_data is pointing to freed memory. This patch protects the driver's freed disks from being opened and accessed: the nvme namespaces are freed only when the device's refcount is 0, so at that moment there were no active openers and no more should be allowed, and it is safe to clear the disk's private_data that is about to be freed. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NHenry Chow <henry.chow@oracle.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
The nvme namespace request_queue's flags are initialized to QUEUE_FLAG_DEFAULT, which currently sets QUEUE_FLAG_STACKABLE. The device-mapper indicates this flag means the block driver is requset based, though this driver is bio-based and problems will occur if an nvme namespace is used with a request based dm device. This patch clears the stackable flag. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
If we ever do parallel device probing, we need to wake up all processes waiting for nvme kthread to start, not just one. This is currently serialized so the bug is not reachable today, but fixing this anyway in the hopes we implement parallel or asynchronous probe in the future. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
The NVME_IOCTL_SUBMIT_IO only works for IO commands with block data transfers and isn't usable for other NVMe commands like flush, data set management, or any sort of vendor unique command. The NVME_IOCTL_ADMIN_CMD, however, can easily be modified to accept arbitrary IO commands in addition to arbitrary admin commands without breaking backward compatibility. This patch just adds a new IOCTL to distinguish if the driver should submit the command on an IO or Admin queue. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This adds a callback to revalidate the disk and change its block size and capacity if needed. Before, a user would have to remove + rescan an entire device if they changed the logical block size using an NVMe Format or other vendor specific command; now they can just run something that issues the BLKRRPART IOCTL, like # hdparm -z /dev/nvmeXnY This can also be used in response to the 1.2 Spec's Namespace Attribute Change asynchronous event. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur. Fixes: b9afca3eSigned-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
We've only been setting the sg_io_hdr status values on SCSI commands that require an nvme command to complete the translation. The fields in the struct are output parameters, so we have to set them, otherwise user space will see whatever was in memory from before. In the case of compat SG_IO, this would reveal kernel memory. This fixes the issue by initializing the sg_io_hdr with successful status. Signed-off-by: NKeith Busch <keith.busch@intel.com> Acked-by: NVishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface. This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NNilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Andreea-Cristina Bernat 提交于
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Sam Bradshaw 提交于
nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than exist cpu cores. Thanks to Keith Busch for the implementation suggestion. Signed-off-by: NSam Bradshaw <sbradshaw@micron.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before. The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reported-by: NMatthew Minter <matthew_minter@xyratex.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Dan McLeran 提交于
The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds. Signed-off-by: NDan McLeran <daniel.mcleran@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-