1. 14 March 2019 (7 commits)
  2. 20 February 2019 (6 commits)
    • nvme: convert to SPDX identifiers · bc50ad75
      Authored by Christoph Hellwig
      Update license to use SPDX-License-Identifier instead of verbose license
      text.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
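      For reference, an SPDX conversion of this kind replaces a block of
      boilerplate with one machine-readable tag at the top of each file; the
      GPL-2.0 tag below is shown as an illustration, not a quote of the patch:

          /* Before: roughly a dozen lines of verbatim GPL notice,
           * e.g. "This program is free software; you can redistribute
           * it and/or modify it under the terms of ..." */

          // After: a single line that license scanners can parse.
          // SPDX-License-Identifier: GPL-2.0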
    • nvme: return error from nvme_alloc_ns() · ab4ab09c
      Authored by Hannes Reinecke
      nvme_alloc_ns() might fail, so we should be returning an error code.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
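      A minimal sketch of the shape of such a change; the error handling shown
      (returning -ENOMEM from a failed allocation) is an assumption for
      illustration, not the literal patch:

          /* Before: failures were silent. */
          static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid);

          /* After (sketch): report a negative errno to the caller. */
          static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
          {
                  struct nvme_ns *ns = kzalloc(sizeof(*ns), GFP_KERNEL);

                  if (!ns)
                          return -ENOMEM;
                  /* ... allocate the request queue, identify the namespace,
                   * and register the block device, unwinding and returning
                   * an errno on any failure ... */
                  return 0;
          }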
    • nvme: avoid that deleting a controller triggers a circular locking complaint · b9c77583
      Authored by Bart Van Assche
      Rework nvme_delete_ctrl_sync() such that it does not have to wait for
      queued work. This prevents test nvme/008 from triggering the following
      lockdep complaint:
      
      WARNING: possible circular locking dependency detected
      5.0.0-rc6-dbg+ #10 Not tainted
      ------------------------------------------------------
      nvme/7918 is trying to acquire lock:
      000000009a1a7b69 ((work_completion)(&ctrl->delete_work)){+.+.}, at: __flush_work+0x379/0x410
      
      but task is already holding lock:
      00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (kn->count#389){++++}:
             lock_acquire+0xc5/0x1e0
             __kernfs_remove+0x42a/0x4a0
             kernfs_remove_by_name_ns+0x45/0x90
             remove_files.isra.1+0x3a/0x90
             sysfs_remove_group+0x5c/0xc0
             sysfs_remove_groups+0x39/0x60
             device_remove_attrs+0x68/0xb0
             device_del+0x24d/0x570
             cdev_device_del+0x1a/0x50
             nvme_delete_ctrl_work+0xbd/0xe0
             process_one_work+0x4f1/0xa40
             worker_thread+0x67/0x5b0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 ((work_completion)(&ctrl->delete_work)){+.+.}:
             __lock_acquire+0x1323/0x17b0
             lock_acquire+0xc5/0x1e0
             __flush_work+0x399/0x410
             flush_work+0x10/0x20
             nvme_delete_ctrl_sync+0x65/0x70
             nvme_sysfs_delete+0x4f/0x60
             dev_attr_store+0x3e/0x50
             sysfs_kf_write+0x87/0xa0
             kernfs_fop_write+0x186/0x240
             __vfs_write+0xd7/0x430
             vfs_write+0xfa/0x260
             ksys_write+0xab/0x130
             __x64_sys_write+0x43/0x50
             do_syscall_64+0x71/0x210
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(kn->count#389);
                                     lock((work_completion)(&ctrl->delete_work));
                                     lock(kn->count#389);
        lock((work_completion)(&ctrl->delete_work));
      
       *** DEADLOCK ***
      
      3 locks held by nvme/7918:
       #0: 00000000e2223b44 (sb_writers#6){.+.+}, at: vfs_write+0x1eb/0x260
       #1: 000000003404976f (&of->mutex){+.+.}, at: kernfs_fop_write+0x128/0x240
       #2: 00000000ef5a45b4 (kn->count#389){++++}, at: kernfs_remove_self+0x196/0x210
      
      stack backtrace:
      CPU: 4 PID: 7918 Comm: nvme Not tainted 5.0.0-rc6-dbg+ #10
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       dump_stack+0x86/0xca
       print_circular_bug.isra.36.cold.54+0x173/0x1d5
       check_prev_add.constprop.45+0x996/0x1110
       __lock_acquire+0x1323/0x17b0
       lock_acquire+0xc5/0x1e0
       __flush_work+0x399/0x410
       flush_work+0x10/0x20
       nvme_delete_ctrl_sync+0x65/0x70
       nvme_sysfs_delete+0x4f/0x60
       dev_attr_store+0x3e/0x50
       sysfs_kf_write+0x87/0xa0
       kernfs_fop_write+0x186/0x240
       __vfs_write+0xd7/0x430
       vfs_write+0xfa/0x260
       ksys_write+0xab/0x130
       __x64_sys_write+0x43/0x50
       do_syscall_64+0x71/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
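      Based on the description above, the rework makes the sysfs-triggered
      delete run inline instead of flushing delete_work while kn->count is
      held; a sketch (using the nvme_do_delete_ctrl() helper from the entry
      below; details assumed, not a verbatim quote):

          static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
          {
                  /*
                   * Keep a reference until the delete is done, since
                   * ->delete_ctrl can free the controller.
                   */
                  nvme_get_ctrl(ctrl);
                  if (nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
                          nvme_do_delete_ctrl(ctrl); /* no flush_work() */
                  nvme_put_ctrl(ctrl);
          }

      With no flush_work() call, the (work_completion)(&ctrl->delete_work)
      lock is never acquired while kn->count is held, which breaks the cycle
      lockdep reported.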
    • nvme: introduce a helper function for controller deletion · a686ed75
      Authored by Bart Van Assche
      This patch does not change any functionality but makes the next patch
      in this series easier to read.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
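      Presumably the helper hoists the body of the delete work callback into a
      function that can also be called directly; a sketch under that
      assumption:

          /* Sketch: factor the teardown out of the workqueue callback. */
          static void nvme_do_delete_ctrl(struct nvme_ctrl *ctrl)
          {
                  /* ... stop keep alive, remove namespaces,
                   * uninitialize and put the controller ... */
          }

          static void nvme_delete_ctrl_work(struct work_struct *work)
          {
                  struct nvme_ctrl *ctrl =
                          container_of(work, struct nvme_ctrl, delete_work);

                  nvme_do_delete_ctrl(ctrl);
          }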
    • nvme: unexport nvme_delete_ctrl_sync() · d84c4b02
      Authored by Bart Van Assche
      Since nvme_delete_ctrl_sync() is not called from any other kernel module,
      unexport it.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
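      Unexporting a symbol is mechanical: drop the EXPORT_SYMBOL_GPL() line so
      the function is no longer visible to other modules, and once all callers
      live in one file it can become static; a sketch:

          /* Before: callable from any other (GPL-compatible) module. */
          void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
          EXPORT_SYMBOL_GPL(nvme_delete_ctrl_sync);

          /* After (sketch): the export is gone; with all callers in
           * core.c the function can also be marked static. */
          static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);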
    • nvme-multipath: round-robin I/O policy · 75c10e73
      Authored by Hannes Reinecke
      Implement a simple round-robin I/O policy for multipathing.  Path
      selection is done in two rounds: first iterate across all optimized
      paths, and if that does not yield a valid path, iterate over all
      optimized and non-optimized paths.  If no path is found, fall back to
      the existing algorithm.  Also add a sysfs attribute 'iopolicy' to
      switch between the current NUMA-aware I/O policy and the new
      'round-robin' I/O policy.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
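      A standalone model of the two-round selection described above; the
      struct and function names here are invented for illustration (the real
      driver walks the nvme_ns siblings list under RCU):

          #include <stdbool.h>

          /* Toy path model, not the driver's data structures. */
          struct path {
                  bool usable;    /* transport queue is live */
                  bool optimized; /* ANA "optimized" state */
          };

          /*
           * Round 0 accepts only optimized paths; round 1 accepts any
           * usable path.  Starting at prev + 1 is what makes the policy
           * round-robin.  Returns -1 when nothing is usable, in which
           * case the caller falls back to the existing algorithm.
           */
          static int rr_select(const struct path *p, int n, int prev)
          {
                  for (int round = 0; round < 2; round++) {
                          for (int i = 1; i <= n; i++) {
                                  int idx = (prev + i) % n;

                                  if (!p[idx].usable)
                                          continue;
                                  if (round == 0 && !p[idx].optimized)
                                          continue;
                                  return idx;
                          }
                  }
                  return -1;
          }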
  3. 06 February 2019 (1 commit)
  4. 04 February 2019 (2 commits)
  5. 10 January 2019 (3 commits)
  6. 19 December 2018 (1 commit)
  7. 14 December 2018 (1 commit)
  8. 13 December 2018 (3 commits)
  9. 12 December 2018 (1 commit)
  10. 08 December 2018 (6 commits)
  11. 07 December 2018 (1 commit)
    • nvme: validate controller state before rescheduling keep alive · 86880d64
      Authored by James Smart
      Delete operations are seeing NULL pointer references in call_timer_fn.
      Tracking these back, the timer appears to be the keep alive timer.
      
      nvme_keep_alive_work(), which is tied to the timer that is cancelled
      by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
      wait for its completion. So nvme_stop_keep_alive() only stops a timer
      when it's pending; when a keep alive is in flight, there is no timer
      running, and nvme_stop_keep_alive() will have no effect on the keep
      alive io. Thus, if the io completes successfully, the keep alive timer
      will be rescheduled. In the failure case, delete is called, the
      controller state is changed, nvme_stop_keep_alive() is called while
      the io is outstanding, and the delete path continues on. The keep
      alive happens to complete successfully before the delete path marks it
      as aborted as part of the queue termination, so the timer is
      restarted. The delete path then tears down the controller, and later
      on the timer code fires and the timer entry is now corrupt.
      
      Fix by validating the controller state before rescheduling the keep
      alive. Testing with the fix has confirmed the condition above was hit.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
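      The fix is roughly of this shape: in the keep alive completion path,
      check the controller state under the controller lock and only re-arm
      the timer for a live or connecting controller. A sketch, with details
      assumed:

          static void nvme_keep_alive_end_io(struct request *rq,
                                             blk_status_t status)
          {
                  struct nvme_ctrl *ctrl = rq->end_io_data;
                  unsigned long flags;
                  bool startka = false;

                  /* ... existing completion handling ... */

                  /*
                   * A controller that is being deleted must not re-arm
                   * the keep alive timer.
                   */
                  spin_lock_irqsave(&ctrl->lock, flags);
                  if (ctrl->state == NVME_CTRL_LIVE ||
                      ctrl->state == NVME_CTRL_CONNECTING)
                          startka = true;
                  spin_unlock_irqrestore(&ctrl->lock, flags);
                  if (startka)
                          schedule_delayed_work(&ctrl->ka_work,
                                                ctrl->kato * HZ);
          }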
  12. 01 December 2018 (1 commit)
  13. 28 November 2018 (1 commit)
    • nvme-pci: fix surprise removal · 751a0cc0
      Authored by Igor Konopko
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue.  Moments later, on the same path, nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on the admin queue and tries to access its
      hctx, which leads to the following oops:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
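      The guard presumably looks something like this in nvme_kill_queues():
      skip the unquiesce when the admin queue has already entered cleanup,
      since its hardware contexts are gone (a sketch, details assumed):

          void nvme_kill_queues(struct nvme_ctrl *ctrl)
          {
                  /* ... */

                  /*
                   * Don't touch an admin queue whose cleanup has begun:
                   * blk_cleanup_queue() has freed its hw contexts.
                   */
                  if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
                          blk_mq_unquiesce_queue(ctrl->admin_q);

                  /* ... unquiesce the namespace queues as before ... */
          }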
  14. 27 November 2018 (1 commit)
  15. 09 November 2018 (2 commits)
  16. 18 October 2018 (1 commit)
  17. 17 October 2018 (2 commits)