- 21 6月, 2019 2 次提交
-
-
由 Akinobu Mita 提交于
Currenlty fault injection support for nvme only enables to inject errors into the commands submitted to I/O queues. In preparation for fault injection into the admin commands, this makes the helper functions independent of struct nvme_ns. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Future use intends to make use of both, so export these functions. And since their implementation is identical except for the opcode, provide a new function that implement both. [akinobu.mita@gmail.com>: fix line over 80 characters] Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 18 5月, 2019 1 次提交
-
-
由 Keith Busch 提交于
A controller with multiple namespaces may have multiple request_queues with their own timeout work. If a controller fails with IO outstanding to diffent namespaces, each request queue may attempt to handle it, so ensure there is no previously scheduled timeout work executing prior to starting controller initialization by synchronizing with each queue. Reviewed-by: NMinwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <keith.busch@intel.com>
-
- 01 5月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
- 14 3月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Qemu started out with a broken implementation of Write Zeroes written by yours truly. Disable Write Zeroes on qemu for now, eventually we need to go back and make all the qemu quirks version specific, but that is left for another time. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com> Tested-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 2月, 2019 3 次提交
-
-
由 Christoph Hellwig 提交于
Update license to use SPDX-License-Identifier instead of verbose license text. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Bart Van Assche 提交于
Since nvme_delete_ctrl_sync() is not called from any other kernel module, unexport it. Signed-off-by: NBart Van Assche <bvanassche@acm.org> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Hannes Reinecke 提交于
Implement a simple round-robin I/O policy for multipathing. Path selection is done in two rounds, first iterating across all optimized paths, and if that doesn't return any valid paths, iterate over all optimized and non-optimized paths. If no paths are found, use the existing algorithm. Also add a sysfs attribute 'iopolicy' to switch between the current NUMA-aware I/O policy and the 'round-robin' I/O policy. Signed-off-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 06 2月, 2019 1 次提交
-
-
由 Keith Busch 提交于
If a controller supports the NS Change Notification, the namespace scan_work is automatically triggered after attaching a new namespace. Occasionally the namespace scan_work may append the new namespace to the list before the admin command effects handling is completed. The effects handling unfreezes namespaces, but if it unfreezes the newly attached namespace, its request_queue freeze depth will be off and we'll hit the warning in blk_mq_unfreeze_queue(). On the next namespace add, we will fail to freeze that queue due to the previous bad accounting and deadlock waiting for frozen. Fix that by preventing scan work from altering the namespace list while command effects handling needs to pair freeze with unfreeze. Reported-by: NWen Xiong <wenxiong@us.ibm.com> Tested-by: NWen Xiong <wenxiong@us.ibm.com> Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 04 2月, 2019 1 次提交
-
-
由 Sagi Grimberg 提交于
It is used now just to flush error recovery and reconnect work items in the RDMA and TCP transports, which can simply be moved to the corresponding teardown routines. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 10 1月, 2019 1 次提交
-
-
由 James Dingwall 提交于
If a device provides an NQN it is expected to be globally unique. Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did not satisfy this requirement. In these circumstances if a system has >1 affected device then only one device is enabled. If this quirk is enabled then the device supplied subnqn is ignored and we fallback to generating one as if the field was empty. In this case we also suppress the version check so we don't print a warning when the quirk is enabled. Reviewed-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJames Dingwall <james@dingwall.me.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 19 12月, 2018 1 次提交
-
-
由 Sagi Grimberg 提交于
Pass poll bool to indicate that we need it to poll. This prepares us for polling support in nvmf since connect is an I/O that will be queued and has to be polled in order to complete. If poll is passed, we call nvme_execute_rq_polled which sends the requests and polls for its completion. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 13 12月, 2018 2 次提交
-
-
由 Jens Axboe 提交于
When boxes are run near (or to) OOM, we have a problem with the discard page allocation in nvme. If we fail allocating the special page, we return busy, and it'll get retried. But since ordering is honored for dispatch requests, we can keep retrying this same IO and failing. Behind that IO could be requests that want to free memory, but they never get the chance. Allocate a fixed discard page per controller for a safe fallback, and use that if the initial allocation fails. Signed-off-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Chengguang Xu 提交于
Add __exit annotation to cleanup helper which is only called once in the module. Signed-off-by: NChengguang Xu <cgxu519@gmx.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 12 12月, 2018 1 次提交
-
-
由 Matias Bjørling 提交于
Currently the geometry of an OCSSD is enumerated using a two step approach: First, nvm_register is called, the OCSSD identify command is issued, and second the geometry sos and csecs values are read either from the OCSSD identify if it is a 1.2 drive, or from the NVMe namespace data structure if it is a 2.0 device. This patch recombines it into a single step, such that nvm_register can use the csecs and sos fields independent of which version is used. This enables one to dynamically size the lightnvm subsystem dma pool. Reviewed-by: NIgor Konopko <igor.j.konopko@intel.com> Reviewed-by: NJavier González <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <mb@lightnvm.io> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 08 12月, 2018 5 次提交
-
-
由 Israel Rukshin 提交于
Signed-off-by: NIsrael Rukshin <israelr@mellanox.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
A controller may have an internal state that is not able to successfully process commands for a short duration. In such states, an immediate command requeue is expected to fail. The driver may exceed its max retry count, which permanently ends the command in failure when the same command would succeed after waiting for the controller to be ready. NVMe ratified TP 4033 provides a delay hint in the completion status code for failed commands. Implement the retry delay based on the command completion status and the controller's requested delay. Note that requeued commands are handled per request_queue, not per individual request. If multiple commands fail, the controller should consistently report the desired delay time for retryable commands in all CQEs, otherwise the requeue list may be kicked too soon. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Sagi Grimberg 提交于
If the controller supports traffic based keep alive, we restart the keep alive timer if any admin or io commands was completed during the kato period. This prevents a possible starvation of keep alive commands in the presence of heavy traffic as in such case, we already have a health indication from the host perspective. Only set a comp_seen indicator in case the controller supports keep alive to minimize the overhead for pci controllers. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Sagi Grimberg 提交于
We get the controller attributes in identify, cache them as we'll need them for traffic based keep alive support. Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Hannes Reinecke 提交于
Instead of directly poking into the struct device add a new numa_node field to struct nvme_ctrl. This allows fabrics drivers where ctrl->dev is a virtual device to support NUMA affinity as well. Also expose the field as a sysfs attribute, and populate it for the RDMA and FC transports. Signed-off-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 12月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Without CONFIG_NVME_MULTIPATH enabled a multi-port subsystem might show up as invididual devices and cause problems, warn about it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
- 09 11月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
We have this functionality in sbitmap, but we don't export it in blk-mq for users of the tags busy iteration. This can be useful for stopping the iteration, if the caller doesn't need to find more requests. Reviewed-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 10月, 2018 1 次提交
-
-
由 Logan Gunthorpe 提交于
For P2P requests, we must use the pci_p2pmem_map_sg() function instead of the dma_map_sg functions. With that, we can then indicate PCI_P2P support in the request queue. For this, we create an NVME_F_PCI_P2P flag which tells the core to set QUEUE_FLAG_PCI_P2P in the request queue. Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com>
-
- 02 10月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Make current_path an array with an entry for every possible node, and cache the best path on a per-node basis. Take the node distance into account when selecting it. This is primarily useful for dual-ported PCIe devices which are connected to PCIe root ports on different sockets. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NHannes Reinecke <hare@suse.com>
-
- 28 9月, 2018 1 次提交
-
-
由 Hannes Reinecke 提交于
We should be registering the ns_id attribute as default sysfs attribute groups, otherwise we have a race condition between the uevent and the attributes appearing in sysfs. Suggested-by: NBart van Assche <bvanassche@acm.org> Reviewed-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 30 7月, 2018 1 次提交
-
-
由 Max Gurtovoy 提交于
Also moved the logic of the remapping to the nvme core driver instead of implementing it in the nvme pci driver. This way all the other nvme transport drivers will benefit from it (in case they'll implement metadata support). Suggested-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMax Gurtovoy <maxg@mellanox.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 28 7月, 2018 3 次提交
-
-
由 Christoph Hellwig 提交于
Add support for Asynchronous Namespace Access as specified in NVMe 1.3 TP 4004. With ANA each namespace attached to a controller belongs to an ANA group that describes the characteristics of accessing the namespaces through this controller. In the optimized and non-optimized states namespaces can be accessed regularly, although in a multi-pathing environment we should always prefer to access a namespace through a controller where an optimized relationship exists. Namespaces in Inaccessible, Permanent-Loss or Change state for a given controller should not be accessed. The states are updated through reading the ANA log page, which is read once during controller initialization, whenever the ANA change notice AEN is received, or when one of the ANA specific status codes that signal a state change is received on a command. The ANA state is kept in the nvme_ns structure, which makes the checks in the fast path very simple. Updating the ANA state when reading the log page is also very simple, the only downside is that finding the initial ANA state when scanning for namespaces is a bit cumbersome. The gendisk for a ns_head is only registered once a live path for it exists. Without that the kernel would hang during partition scanning. Includes fixes and improvements from Hannes Reinecke. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
由 Christoph Hellwig 提交于
Now that we just call out to blk_path_error there isn't really any good reason to not merge it into the only caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
由 Christoph Hellwig 提交于
Merge nvme_get_log and nvme_get_log_ext into a single helper, which takes a plain nsid instead of the nvme_ns pointer. Also add support for the log specific field while we're at it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
- 23 7月, 2018 2 次提交
-
-
由 Keith Busch 提交于
We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
由 Sagi Grimberg 提交于
We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 22 6月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
nvme requires an sg table allocation for each request. If the request is large, then the allocation can become quite large. For instance, with our default software settings of 1280KB IO size, we'll need 10248 bytes of sg table. That turns into a 2nd order allocation, which we can't always guarantee. If we fail the allocation, blk-mq will retry it later. But there's no guarantee that we'll EVER be able to allocate that much contigious memory. Limit the IO size such that we never need more than a single page of memory. That's a lot faster and more reliable. Then back that allocation with a mempool, so that we know we'll always be able to succeed the allocation at some point. Signed-off-by: NJens Axboe <axboe@kernel.dk> Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 14 6月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Unused now that all transports stopped using it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJens Axboe <axboe@kernel.dk>
-
- 09 6月, 2018 1 次提交
-
-
由 Dan Carpenter 提交于
The problem here is that set_bit() and test_bit() take a bit number so we should be passing 0 but instead we're passing (1 << 0) which leads to a double shift. It doesn't cause a runtime bug in the current code because it's done consistently and we only set that one bit. I decided to just re-use NVME_AER_NOTICE_NS_CHANGED instead of introducing a new define for this. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 6月, 2018 3 次提交
-
-
由 Christoph Hellwig 提交于
Per section 5.2 we need to issue the corresponding log page to clear an AEN, so for a namespace data changed AEN we need to read the changed namespace list log. And once we read that log anyway we might as well use it to optimize the rescan. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
由 Christoph Hellwig 提交于
And move it toward the top of the file to avoid a forward declaration. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
-
由 Hannes Reinecke 提交于
We should register for AEN events; some law-abiding targets might not be sending us AENs otherwise. Signed-off-by: NHannes Reinecke <hare@suse.com> [hch: slight cleanups] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
-
- 23 5月, 2018 1 次提交
-
-
由 Johannes Thumshirn 提交于
When running blktest's nvme/005 with a lockdep enabled kernel the test case fails due to the following lockdep splat in dmesg: ============================= WARNING: suspicious RCU usage 4.17.0-rc5 #881 Not tainted ----------------------------- drivers/nvme/host/nvme.h:457 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by kworker/u32:5/1102: #0: (ptrval) ((wq_completion)"nvme-wq"){+.+.}, at: process_one_work+0x152/0x5c0 #1: (ptrval) ((work_completion)(&ctrl->scan_work)){+.+.}, at: process_one_work+0x152/0x5c0 #2: (ptrval) (&subsys->lock#2){+.+.}, at: nvme_ns_remove+0x43/0x1c0 [nvme_core] The only caller of nvme_mpath_clear_current_path() is nvme_ns_remove() which holds the subsys lock so it's likely a false positive, but when using rcu_access_pointer(), we're telling rcu and lockdep that we're only after the pointer falue. Fixes: 32acab31 ("nvme: implement multipath access to nvme subsystems") Signed-off-by: NJohannes Thumshirn <jthumshirn@suse.de> Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <keith.busch@intel.com>
-
- 19 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
We'll need that in the PCIe driver soon as we'll read it straight off the CQ. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 12 5月, 2018 1 次提交
-
-
由 Jens Axboe 提交于
Some P3100 drives have a bug where they think WRRU (weighted round robin) is always enabled, even though the host doesn't set it. Since they think it's enabled, they also look at the submission queue creation priority. We used to set that to MEDIUM by default, but that was removed in commit 81c1cd98. This causes various issues on that drive. Add a quirk to still set MEDIUM priority for that controller. Fixes: 81c1cd98 ("nvme/pci: Don't set reserved SQ create flags") Cc: stable@vger.kernel.org Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NKeith Busch <keith.busch@intel.com>
-