Commits · 66341331ba0d2de4ff421cdc401a1e34de50502a · openeuler / Kernel

30 Aug, 2019 · 4 commits

nvme-pci: Add support for Apple 2018+ models · 66341331

Committed by Benjamin Herrenschmidt on Aug 07, 2019

Based on reverse engineering and original patch by

Paul Pawlowski <paul@mrarm.io>

This adds support for Apple's weird implementation of NVMe in their
2018 or later machines. It accounts for the twice-as-big SQ entries
for the IO queues, and the fact that only interrupt vector 0 appears
to function properly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
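As a rough illustration of the "twice-as-big SQ entries" mentioned in the Apple commit: a standard NVMe submission queue entry is 64 bytes, so a doubled entry is 128 bytes. The sketch below only computes the memory a queue would need for each entry size. The helper name `sq_bytes`, the enum names, and the choice to express the entry size as a log2 value are illustrative assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Entry sizes as log2 values: 2^6 = 64 bytes (standard), 2^7 = 128 bytes. */
enum { SQES_STANDARD = 6, SQES_DOUBLED = 7 };

/* Hypothetical helper: bytes of memory needed for a submission queue. */
static size_t sq_bytes(size_t depth, unsigned int sqes_log2)
{
    assert(sqes_log2 < 16); /* keep the shift well-defined */
    return depth << sqes_log2;
}
```

With a queue depth of 1024, the doubled entry size needs twice the memory of the standard one, which is why the driver has to account for it when allocating I/O queues.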

nvme: trace bio completion · 35fe0d12

Committed by Hannes Reinecke on Jul 24, 2019

When native multipathing is enabled we cannot enable blktrace for
the underlying paths, so any completion is never traced.
Signed-off-by: Hannes Reinecke <hare@suse.com>
[fixed-up by Mikhail for non-multipath-build]
Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: don't pass cap to nvme_disable_ctrl · b5b05048

Committed by Sagi Grimberg on Jul 22, 2019

All callers seem to pass ctrl->cap, so there is no need to pass it
at all.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

nvme: move sqsize setting to the core · c0f2f45b

Committed by Sagi Grimberg on Jul 22, 2019

nvme_enable_ctrl reads the cap register right after, so there is
no need to do that locally in the transport driver. Do the
sqsize setting in nvme_init_identify instead.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

23 Jul, 2019 · 1 commit

nvme: fix multipath crash when ANA is deactivated · 66b20ac0

Committed by Marta Rybczynska on Jul 23, 2019

Fix a crash with multipath activated. It happens when the ANA log
page is larger than MDTS and ANA is disabled because of that.
The driver then tries to access an unallocated buffer when connecting
to an NVMe target. The signature is as follows:

[  300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
[  300.435387] nvme nvme0: disabling ANA support.
[  300.437835] nvme nvme0: creating 4 I/O queues.
[  300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
[  300.464609] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  300.466342] #PF error: [normal kernel read fault]
[  300.467385] PGD 0 P4D 0
[  300.467987] Oops: 0000 [#1] SMP PTI
[  300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
[  300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[  300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[  300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[  300.477374] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
[  300.478334] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
[  300.479784] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
[  300.481488] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
[  300.483203] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
[  300.484928] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
[  300.486626] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
[  300.488538] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  300.489907] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
[  300.491612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  300.493303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  300.494991] Call Trace:
[  300.495645]  nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
[  300.496880]  nvme_validate_ns+0x2ef/0x550 [nvme_core]
[  300.498105]  ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
[  300.499539]  nvme_scan_work+0x2b4/0x370 [nvme_core]
[  300.500717]  ? __switch_to_asm+0x35/0x70
[  300.501663]  process_one_work+0x171/0x380
[  300.502340]  worker_thread+0x49/0x3f0
[  300.503079]  kthread+0xf8/0x130
[  300.503795]  ? max_active_store+0x80/0x80
[  300.504690]  ? kthread_bind+0x10/0x10
[  300.505502]  ret_from_fork+0x35/0x40
[  300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio
[  300.514984] CR2: 0000000000000008
[  300.515569] ---[ end trace faa2eefad7e7f218 ]---
[  300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[  300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[  300.520353] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296
[  300.521229] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000
[  300.522399] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258
[  300.523560] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
[  300.524734] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0
[  300.525915] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0
[  300.527084] FS:  0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000
[  300.528396] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  300.529440] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0
[  300.530739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  300.531989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  300.533264] Kernel panic - not syncing: Fatal exception
[  300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---

Condition check refactoring from Christoph Hellwig.
Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu>
Tested-by: Jean-Baptiste Riaux <jbriaux@kalray.eu>
Signed-off-by: Christoph Hellwig <hch@lst.de>

10 Jul, 2019 · 1 commit

nvme: set physical block size and optimal I/O size · 81adb863

Committed by Bart Van Assche on Jun 28, 2019

From the NVMe 1.4 spec:

NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA,
and NOWS are defined for this namespace and should be used by the host for
I/O optimization;
[ ... ]
Namespace Preferred Write Granularity (NPWG): This field indicates the
smallest recommended write granularity in logical blocks for this namespace.
This is a 0's based value. The size indicated should be less than or equal
to Maximum Data Transfer Size (MDTS) that is specified in units of minimum
memory page size. The value of this field may change if the namespace is
reformatted. The size should be a multiple of Namespace Preferred Write
Alignment (NPWA). Refer to section 8.25 for how this field is utilized to
improve performance and endurance.
[ ... ]
Each Write, Write Uncorrectable, or Write Zeroes commands should address a
multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure
245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as
expressed in the NLB field), and the SLBA field of the command should be
aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245)
for best performance.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
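The fields quoted in the NPWG commit are 0's based counts of logical blocks, so turning them into byte sizes takes an add-one and a multiply. A minimal sketch of that arithmetic follows. The names `io_hints`, `nvme_io_hints` and `NSFEAT_OPTPERF` are hypothetical, and both the mapping of NOWS to the optimal I/O size and the fallback to the logical block size when NSFEAT bit 4 is clear are assumptions made for illustration, not a description of the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define NSFEAT_OPTPERF (1u << 4) /* hypothetical name for NSFEAT bit 4 */

struct io_hints {
    uint32_t phys_bs; /* physical block size in bytes */
    uint32_t io_opt;  /* optimal I/O size in bytes */
};

/* Convert 0's based NPWG/NOWS (in logical blocks) into byte sizes. */
static struct io_hints nvme_io_hints(uint32_t lba_size, uint8_t nsfeat,
                                     uint16_t npwg, uint16_t nows)
{
    /* Assumed fallback: both hints default to the logical block size. */
    struct io_hints h = { lba_size, lba_size };

    assert(lba_size != 0);
    if (nsfeat & NSFEAT_OPTPERF) {
        h.phys_bs = lba_size * (1u + npwg);
        h.io_opt = lba_size * (1u + nows);
    }
    return h;
}
```

For example, a namespace with 512-byte logical blocks that reports NPWG = 7 would get a 4096-byte physical block size under these assumptions.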

21 Jun, 2019 · 3 commits

nvme: enable to inject errors into admin commands · f79d5fda

Committed by Akinobu Mita on Jun 09, 2019

This makes it possible to inject errors into the commands submitted to the
admin queue.

It is useful for testing error handling during controller initialization.

	# echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability
	# echo 1 > /sys/kernel/debug/nvme0/fault_inject/times
	# echo 10 > /sys/kernel/debug/nvme0/fault_inject/space
	# nvme reset /dev/nvme0
	# dmesg
	...
	nvme nvme0: Could not set queue count (16385)
	nvme nvme0: IO queues not created
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: prepare for fault injection into admin commands · a3646451

Committed by Akinobu Mita on Jun 20, 2019

Currently fault injection support for nvme only allows injecting errors
into the commands submitted to I/O queues.

In preparation for fault injection into the admin commands, this makes
the helper functions independent of struct nvme_ns.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: export get and set features · 1a87ee65

Committed by Keith Busch on May 27, 2019

Future use intends to make use of both, so export these functions. And
since their implementation is identical except for the opcode, provide a
new function that implements both.

[akinobu.mita@gmail.com: fix line over 80 characters]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

18 May, 2019 · 1 commit

nvme-pci: Sync queues on reset · d6135c3a

Committed by Keith Busch on May 14, 2019

A controller with multiple namespaces may have multiple request_queues with
their own timeout work. If a controller fails with IO outstanding to
different namespaces, each request queue may attempt to handle it, so
ensure there is no previously scheduled timeout work executing prior to
starting controller initialization by synchronizing with each queue.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>

01 May, 2019 · 1 commit

nvme: mark nvme_core_init and nvme_core_exit static · 893a74b7

Committed by Christoph Hellwig on Apr 30, 2019

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

14 Mar, 2019 · 1 commit

nvme: disable Write Zeroes for qemu controllers · 7b210e4e

Committed by Christoph Hellwig on Mar 13, 2019

Qemu started out with a broken implementation of Write Zeroes written
by yours truly.  Disable Write Zeroes on qemu for now; eventually
we need to go back and make all the qemu quirks version specific,
but that is left for another time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

20 Feb, 2019 · 3 commits

nvme: convert to SPDX identifiers · bc50ad75

Committed by Christoph Hellwig on Feb 18, 2019

Update license to use SPDX-License-Identifier instead of verbose license
text.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

nvme: unexport nvme_delete_ctrl_sync() · d84c4b02

Committed by Bart Van Assche on Feb 14, 2019

Since nvme_delete_ctrl_sync() is not called from any other kernel module,
unexport it.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme-multipath: round-robin I/O policy · 75c10e73

Committed by Hannes Reinecke on Feb 18, 2019

Implement a simple round-robin I/O policy for multipathing.  Path
selection is done in two rounds, first iterating across all optimized
paths, and if that doesn't return any valid paths, iterate over all
optimized and non-optimized paths.  If no paths are found, use the
existing algorithm.  Also add a sysfs attribute 'iopolicy' to switch
between the current NUMA-aware I/O policy and the 'round-robin' I/O
policy.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
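The two-round selection described in the round-robin commit can be sketched roughly as follows. This is a simplified stand-alone model, not the kernel implementation: the `path` structure, the `path_state` enum and the function `rr_next_path` are all hypothetical names, and paths are held in a plain array rather than the driver's own list. A return value of -1 stands for "no path found", at which point the caller would fall back to the existing algorithm.

```c
#include <assert.h>
#include <stdbool.h>

enum path_state { PATH_OPTIMIZED, PATH_NONOPTIMIZED, PATH_UNUSABLE };

struct path {
    enum path_state state;
    bool live;
};

/*
 * Pick the next path after index `last` (use -1 when nothing was used yet).
 * Returns the chosen index, or -1 if no usable path exists.
 */
static int rr_next_path(const struct path *paths, int n, int last)
{
    assert(n > 0 && last >= -1 && last < n);

    /* Round 1: only optimized paths, starting just after the last one used. */
    for (int i = 1; i <= n; i++) {
        int idx = (last + i) % n;
        if (paths[idx].live && paths[idx].state == PATH_OPTIMIZED)
            return idx;
    }
    /* Round 2: accept optimized and non-optimized paths alike. */
    for (int i = 1; i <= n; i++) {
        int idx = (last + i) % n;
        if (paths[idx].live && paths[idx].state != PATH_UNUSABLE)
            return idx;
    }
    return -1;
}
```

Starting each round one slot past the previously used path is what spreads I/O across paths; the previous path is only picked again when it is the sole candidate.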

06 Feb, 2019 · 1 commit

nvme: lock NS list changes while handling command effects · e7ad43c3

Committed by Keith Busch on Jan 28, 2019

If a controller supports the NS Change Notification, the namespace
scan_work is automatically triggered after attaching a new namespace.

Occasionally the namespace scan_work may append the new namespace to the
list before the admin command effects handling is completed. The effects
handling unfreezes namespaces, but if it unfreezes the newly attached
namespace, its request_queue freeze depth will be off and we'll hit the
warning in blk_mq_unfreeze_queue().

On the next namespace add, we will fail to freeze that queue due to the
previous bad accounting and deadlock waiting for frozen.

Fix that by preventing scan work from altering the namespace list while
command effects handling needs to pair freeze with unfreeze.
Reported-by: Wen Xiong <wenxiong@us.ibm.com>
Tested-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

04 Feb, 2019 · 1 commit

nvme: remove the .stop_ctrl callout · 794a4cb3

Committed by Sagi Grimberg on Jan 01, 2019

It is used now just to flush error recovery and reconnect work items in
the RDMA and TCP transports, which can simply be moved to the
corresponding teardown routines.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

10 Jan, 2019 · 1 commit

nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN · 6299358d

Committed by James Dingwall on Jan 08, 2019

If a device provides an NQN it is expected to be globally unique.
Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did
not satisfy this requirement. In these circumstances if a system has >1
affected device then only one device is enabled. If this quirk is enabled
then the device supplied subnqn is ignored and we fall back to generating
one as if the field was empty. In this case we also suppress the version
check so we don't print a warning when the quirk is enabled.
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: James Dingwall <james@dingwall.me.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>

19 Dec, 2018 · 1 commit

nvme-core: optionally poll sync commands · 6287b51c

Committed by Sagi Grimberg on Dec 14, 2018

Pass poll bool to indicate that we need it to poll. This prepares us for
polling support in nvmf since connect is an I/O that will be queued
and has to be polled in order to complete. If poll is passed,
we call nvme_execute_rq_polled which sends the request and polls
for its completion.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

13 Dec, 2018 · 2 commits

nvme: provide fallback for discard alloc failure · cb5b7262

Committed by Jens Axboe on Dec 12, 2018

When boxes are run near (or to) OOM, we have a problem with the discard
page allocation in nvme. If we fail allocating the special page, we
return busy, and it'll get retried. But since ordering is honored for
dispatch requests, we can keep retrying this same IO and failing. Behind
that IO could be requests that want to free memory, but they never get
the chance.

Allocate a fixed discard page per controller for a safe fallback, and use
that if the initial allocation fails.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: add __exit annotation · 8eb5d89f

Committed by Chengguang Xu on Dec 11, 2018

Add __exit annotation to cleanup helper which is only
called once in the module.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

12 Dec, 2018 · 1 commit

lightnvm: simplify geometry enumeration · 85136c01

Committed by Matias Bjørling on Dec 11, 2018

Currently the geometry of an OCSSD is enumerated using a two step
approach:

First, nvm_register is called, the OCSSD identify command is issued,
and second the geometry sos and csecs values are read either from the
OCSSD identify if it is a 1.2 drive, or from the NVMe namespace data
structure if it is a 2.0 device.

This patch recombines it into a single step, such that nvm_register can
use the csecs and sos fields independent of which version is used. This
enables one to dynamically size the lightnvm subsystem dma pool.
Reviewed-by: Igor Konopko <igor.j.konopko@intel.com>
Reviewed-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <mb@lightnvm.io>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08 Dec, 2018 · 5 commits

nvme: remove unused function nvme_ctrl_ready · 3236b458

Committed by Israel Rukshin on Dec 03, 2018

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: implement Enhanced Command Retry · 49cd84b6

Committed by Keith Busch on Nov 27, 2018

A controller may have an internal state that is not able to successfully
process commands for a short duration. In such states, an immediate
command requeue is expected to fail. The driver may exceed its max
retry count, which permanently ends the command in failure when the same
command would succeed after waiting for the controller to be ready.

NVMe ratified TP 4033 provides a delay hint in the completion status
code for failed commands. Implement the retry delay based on the command
completion status and the controller's requested delay.

Note that requeued commands are handled per request_queue, not per
individual request. If multiple commands fail, the controller should
consistently report the desired delay time for retryable commands in
all CQEs, otherwise the requeue list may be kicked too soon.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
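To make the Enhanced Command Retry delay computation concrete, here is a small stand-alone sketch. It assumes the completion status has already had its phase bit shifted out, that the two-bit Command Retry Delay (CRD) index then sits in bits 12:11, and that the controller's three Command Retry Delay Time values are expressed in units of 100 ms. These details reflect the NVMe specification as understood here and should be checked against it. The macro names and the function `retry_delay_ms` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SC_CRD_MASK  0x1800u /* assumed position of the CRD field */
#define SC_CRD_SHIFT 11

/*
 * Return the requested retry delay in milliseconds for a failed command.
 * crdt[] holds the controller's three delay times in 100 ms units.
 */
static unsigned int retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
    unsigned int crd = (status & SC_CRD_MASK) >> SC_CRD_SHIFT;

    assert(crdt != NULL);
    if (crd == 0)
        return 0; /* no delay hint: requeue immediately */
    return crdt[crd - 1] * 100u;
}
```

A CRD value of 0 means the controller gave no hint, while values 1 to 3 select one of the three delay times reported by the controller.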

nvme: support traffic based keep-alive · 6e3ca03e

Committed by Sagi Grimberg on Nov 02, 2018

If the controller supports traffic based keep alive, we restart the keep
alive timer if any admin or I/O command was completed during the kato
period.  This prevents a possible starvation of keep alive commands in
the presence of heavy traffic as in such case, we already have a health
indication from the host perspective.

Only set a comp_seen indicator in case the controller supports keep
alive to minimize the overhead for pci controllers.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: cache controller attributes · 3e53ba38

Committed by Sagi Grimberg on Nov 02, 2018

We get the controller attributes in identify, cache them as we'll need
them for traffic based keep alive support.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

nvme: add a numa_node field to struct nvme_ctrl · 103e515e

Committed by Hannes Reinecke on Nov 16, 2018

Instead of directly poking into the struct device add a new numa_node
field to struct nvme_ctrl.  This allows fabrics drivers where ctrl->dev
is a virtual device to support NUMA affinity as well.

Also expose the field as a sysfs attribute, and populate it for the
RDMA and FC transports.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

01 Dec, 2018 · 1 commit

nvme: warn when finding multi-port subsystems without multipathing enabled · 14a1336e

Committed by Christoph Hellwig on Nov 20, 2018

Without CONFIG_NVME_MULTIPATH enabled a multi-port subsystem might
show up as individual devices and cause problems, so warn about it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

09 Nov, 2018 · 1 commit

blk-mq-tag: change busy_iter_fn to return whether to continue or not · 7baa8572

Committed by Jens Axboe on Nov 08, 2018

We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

18 Oct, 2018 · 1 commit

nvme-pci: Add support for P2P memory in requests · e0596ab2

Committed by Logan Gunthorpe on Oct 04, 2018

For P2P requests, we must use the pci_p2pmem_map_sg() function instead of
the dma_map_sg functions.

With that, we can then indicate PCI_P2P support in the request queue.  For
this, we create an NVME_F_PCI_P2P flag which tells the core to set
QUEUE_FLAG_PCI_P2P in the request queue.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>

02 Oct, 2018 · 1 commit

nvme: take node locality into account when selecting a path · f3334447

Committed by Christoph Hellwig on Sep 11, 2018

Make current_path an array with an entry for every possible node, and
cache the best path on a per-node basis.  Take the node distance into
account when selecting it.  This is primarily useful for dual-ported PCIe
devices which are connected to PCIe root ports on different sockets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>

28 Sep, 2018 · 1 commit

nvme: register ns_id attributes as default sysfs groups · 33b14f67

Committed by Hannes Reinecke on Sep 28, 2018

We should be registering the ns_id attribute as default sysfs
attribute groups, otherwise we have a race condition between
the uevent and the attributes appearing in sysfs.
Suggested-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Jul, 2018 · 1 commit

nvme: use blk API to remap ref tags for IOs with metadata · f7f1fc36

Committed by Max Gurtovoy on Jul 30, 2018

Also moved the logic of the remapping to the nvme core driver instead
of implementing it in the nvme pci driver. This way all the other nvme
transport drivers will benefit from it (in case they'll implement metadata
support).
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

28 Jul, 2018 · 3 commits

nvme: add ANA support · 0d0b660f

Committed by Christoph Hellwig on May 14, 2018

Add support for Asymmetric Namespace Access as specified in NVMe 1.3
TP 4004. With ANA each namespace attached to a controller belongs to an
ANA group that describes the characteristics of accessing the namespaces
through this controller. In the optimized and non-optimized states
namespaces can be accessed regularly, although in a multi-pathing
environment we should always prefer to access a namespace through a
controller where an optimized relationship exists. Namespaces in
Inaccessible, Permanent-Loss or Change state for a given controller
should not be accessed.

The states are updated through reading the ANA log page, which is read
once during controller initialization, whenever the ANA change notice
AEN is received, or when one of the ANA specific status codes that
signal a state change is received on a command.

The ANA state is kept in the nvme_ns structure, which makes the checks in
the fast path very simple. Updating the ANA state when reading the log
page is also very simple, the only downside is that finding the initial
ANA state when scanning for namespaces is a bit cumbersome.

The gendisk for a ns_head is only registered once a live path for it
exists. Without that the kernel would hang during partition scanning.

Includes fixes and improvements from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

nvme: remove nvme_req_needs_failover · 8decf5d5

Committed by Christoph Hellwig on Jun 04, 2018

Now that we just call out to blk_path_error there isn't really any good
reason to not merge it into the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

nvme: simplify the API for getting log pages · 0e98719b

Committed by Christoph Hellwig on Jun 06, 2018

Merge nvme_get_log and nvme_get_log_ext into a single helper, which takes
a plain nsid instead of the nvme_ns pointer.  Also add support for the
log specific field while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

23 Jul, 2018 · 2 commits

nvme: use hw qid in trace events · 5d87eb94

Committed by Keith Busch on Jun 29, 2018

We can not match a command to its completion based on the command
id alone. We need the submitting queue identifier to pair with the
completion, so this patch adds that to the trace buffer.

This patch also collapses the admin and IO submission traces into a
single one so we don't need to duplicate this and create unnecessary
code branches: we know if the command is an admin vs IO based on the qid.

And since we're here, the patch fixes code formatting in the area.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: move the qid helper to nvme.h and made it an inline function]
Signed-off-by: Christoph Hellwig <hch@lst.de>

nvme: cache struct nvme_ctrl reference to struct nvme_request · 59e29ce6

Committed by Sagi Grimberg on Jun 29, 2018

We will need to reference the controller in the setup and completion
time for tracing and future traffic based keep alive support.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

22 Jun, 2018 · 1 commit

nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e

Committed by Jens Axboe on Jun 21, 2018

nvme requires an sg table allocation for each request. If the request
is large, then the allocation can become quite large. For instance,
with our default software settings of 1280KB IO size, we'll need
10248 bytes of sg table. That turns into a 2nd order allocation,
which we can't always guarantee. If we fail the allocation, blk-mq
will retry it later. But there's no guarantee that we'll EVER be
able to allocate that much contiguous memory.

Limit the IO size such that we never need more than a single page
of memory. That's a lot faster and more reliable. Then back that
allocation with a mempool, so that we know we'll always be able
to succeed the allocation at some point.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
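The 10248-byte figure and the "2nd order allocation" in the commit above can be reproduced with a little arithmetic. The sketch below is one set of assumptions that yields those numbers, not a statement of how the driver sizes its allocation: 4 KiB pages, one 32-byte scatterlist entry per page of I/O, and 8 extra bytes for a single list pointer. The helper names `sg_table_bytes` and `alloc_order` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one sg entry per page of I/O plus some fixed extra. */
static size_t sg_table_bytes(size_t io_bytes, size_t page_size,
                             size_t sg_entry_bytes, size_t extra_bytes)
{
    assert(page_size != 0);
    size_t segments = (io_bytes + page_size - 1) / page_size;
    return segments * sg_entry_bytes + extra_bytes;
}

/* Smallest order such that (page_size << order) covers `bytes`. */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
    unsigned int order = 0;

    assert(page_size != 0);
    for (size_t span = page_size; span < bytes; span <<= 1)
        order++;
    return order;
}
```

Under these assumptions a 1280 KiB request needs 320 entries, and the resulting table no longer fits in two contiguous pages, which is what pushes it to an order-2 allocation.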

14 Jun, 2018 · 1 commit

nvme: remove nvme_reinit_tagset · 14dfa400

Committed by Christoph Hellwig on Jun 14, 2018

Unused now that all transports stopped using it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel · synced successfully more than 1 year ago
