提交 · aa41d2fe60ee2e4452b0f9ca9f0f6d80a4ff9f9d · openeuler / Kernel

31 5月, 2022 1 次提交

nvme: set controller enable bit in a separate write · aa41d2fe

由 Niklas Cassel 提交于 5月 26, 2022

The NVM Express Base Specification 2.0 specifies in the description
of the CC – Controller Configuration register:
"Host software shall set the Arbitration Mechanism Selected (CC.AMS),
the Memory Page Size (CC.MPS), and the I/O Command Set Selected (CC.CSS)
to valid values prior to enabling the controller by setting CC.EN to ‘1’.

While we haven't seen any controller misbehaving while setting all bits
in a single write, let's do it in the order that it is written in the
spec, as there could potentially be controllers that are implemented to
rely on the configuration bits being set before enabling the controller.
Signed-off-by: NNiklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

aa41d2fe

20 5月, 2022 1 次提交

nvme: set non-mdts limits in nvme_scan_work · 78288665

由 Chaitanya Kulkarni 提交于 5月 18, 2022

In current implementation we set the non-mdts limits by calling
nvme_init_non_mdts_limits() from nvme_init_ctrl_finish().
This also tries to set the limits for the discovery controller which
has no I/O queues resulting in the warning message reported by the
nvme_log_error() when running blktest nvme/002: -

[ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47
[ 2005.192223] loop: module loaded
[ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
[ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1

<------------------------------SNIP---------------------------------->

[ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997
[ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998
[ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999
[ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn.
*[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR*
[ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
[ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

Move the call of nvme_init_non_mdts_limits() to nvme_scan_work() after
we verify that I/O queues are created since that is a converging point
for each transport where these limits are actually used.

1. FC :
nvme_fc_create_association()
 ...
 nvme_fc_create_io_queues(ctrl);
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()

2. PCIe:-
nvme_reset_work()
 ...
 nvme_setup_io_queues()
  nvme_create_io_queues()
   nvme_alloc_queue()
 ...
 nvme_start_ctrl()
  nvme_scan_queue()
   nvme_scan_work()

3. RDMA :-
nvme_rdma_setup_ctrl
 ...
  nvme_rdma_configure_io_queues
  ...
  nvme_start_ctrl()
   nvme_scan_queue()
    nvme_scan_work()

4. TCP :-
nvme_tcp_setup_ctrl
 ...
  nvme_tcp_configure_io_queues
  ...
  nvme_start_ctrl()
   nvme_scan_queue()
    nvme_scan_work()

* nvme_scan_work()
...
nvme_validate_or_alloc_ns()
  nvme_alloc_ns()
   nvme_update_ns_info()
    nvme_update_disk_info()
     nvme_config_discard() <---
     blk_queue_max_write_zeroes_sectors() <---
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

78288665

19 5月, 2022 1 次提交

nvme: add support for TP4084 - Time-to-Ready Enhancements · 354201c5

由 Christoph Hellwig 提交于 5月 16, 2022

Add support for using longer timeouts during controller initialization
and letting the controller come up with namespaces that are not ready
for I/O yet.  We skip these not ready namespaces during scanning and
only bring them online once anoter scan is kicked off by the AEN that
is set when the NRDY bit gets set in the  I/O Command Set Independent
Identify Namespace Data Structure.   This asynchronous probing avoids
blocking the kernel boot when controllers take a very long time to
recover after unclean shutdowns (up to minutes).
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>

354201c5

16 5月, 2022 3 次提交

nvme: mark internal passthru request RQF_QUIET · 128126a7

由 Chaitanya Kulkarni 提交于 4月 19, 2022

Most of the internal passthru commands use __nvme_submit_sync_cmd()
interface. There are few places we open code the request submission :-

1. nvme_keep_alive_work(struct work_struct *work)
2. nvme_timeout(struct request *req, bool reserved)
3. nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)

Mark the internal passthru request quiet so that we can skip the verbose
error message from nvme_log_error() in nvme_end_req() completion path,
this will be consistent with what we have in __nvme_submit_sync_cmd().
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NAlan Adamson <alan.adamson@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

128126a7

nvme: set dma alignment to dword · 52fde2c0

由 Keith Busch 提交于 5月 04, 2022

The nvme specification only requires qword alignment for segment
descriptors, and the driver already guarantees that. The spec has always
allowed user data to be dword aligned, which is what the queue's
attribute is for, so relax the alignment requirement to that value.

While we could allow byte alignment for some controllers when using
SGLs, we still need to support PRP, and that only allows dword.

Fixes: 3b2a1ebc ("nvme: set dma alignment to qword")
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

52fde2c0

nvme: fix interpretation of DMRSL · 1a86924e

由 Tom Yan 提交于 4月 29, 2022

DMRSLl is in the unit of logical blocks, while max_discard_sectors is
in the unit of "linux sector".
Signed-off-by: NTom Yan <tom.ty89@gmail.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1a86924e

04 5月, 2022 1 次提交

nvme: remove a spurious clear of discard_alignment · 4e7f0ece

由 Christoph Hellwig 提交于 4月 18, 2022

The nvme driver never sets a discard_alignment, so it also doens't need
to clear it to zero.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-10-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

4e7f0ece

18 4月, 2022 1 次提交

block: remove QUEUE_FLAG_DISCARD · 70200574

由 Christoph Hellwig 提交于 4月 15, 2022

Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

70200574

15 4月, 2022 2 次提交

nvme: add a quirk to disable namespace identifiers · 00ff400e

由 Christoph Hellwig 提交于 4月 11, 2022

Add a quirk to disable using and exporting namespace identifiers for
controllers where they are broken beyond repair.

The most directly visible problem with non-unique namespace identifiers
is that they break the /dev/disk/by-id/ links, with the link for a
supposedly unique identifier now pointing to one of multiple possible
namespaces that share the same ID, and a somewhat random selection of
which one actually shows up.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

00ff400e

nvme: don't print verbose errors for internal passthrough requests · b42b6f44

由 Chaitanya Kulkarni 提交于 4月 10, 2022

Use the RQF_QUIET flag to skip the newly added verbose error reporting,
and set the flag in __nvme_submit_sync_cmd, which is used for most
internal passthrough requests where we do expect errors (e.g. due to
probing for optional functionality).  This is similar to what the SCSI
verbose error logging does.
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NAlan Adamson <alan.adamson@oracle.com>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Tested-by: NAlan Adamson <alan.adamson@oracle.com>
Tested-by: NYi Zhang <yi.zhang@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b42b6f44

29 3月, 2022 3 次提交

nvme-multipath: fix hang when disk goes live over reconnect · a4a6f3c8

由 Anton Eidelman 提交于 3月 24, 2022

nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a
fresh ANA log from the ctrl.  This is essential to have an up to date
path states for both existing namespaces and for those scan_work may
discover once the ctrl is up.

This happens in the following cases:
  1) A new ctrl is being connected.
  2) An existing ctrl is successfully reconnected.
  3) An existing ctrl is being reset.

While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and
nvme_read_ana_log() may call nvme_update_ns_ana_state().

This result in a hang when the ANA state of an existing namespace changes
and makes the disk live: nvme_mpath_set_live() issues IO to the namespace
through the ctrl, which does NOT have IO queues yet.

See sample hang below.

Solution:
- nvme_update_ns_ana_state() to call set_live only if ctrl is live
- nvme_read_ana_log() call from nvme_mpath_init_identify()
  therefore only fetches and parses the ANA log;
  any erros in this process will fail the ctrl setup as appropriate;
- a separate function nvme_mpath_update()
  is called in nvme_start_ctrl();
  this parses the ANA log without fetching it.
  At this point the ctrl is live,
  therefore, disks can be set live normally.

Sample failure:
    nvme nvme0: starting error recovery
    nvme nvme0: Reconnecting in 10 seconds...
    block nvme0n6: no usable path - requeuing I/O
    INFO: task kworker/u8:3:312 blocked for more than 122 seconds.
          Tainted: G            E     5.14.5-1.el7.elrepo.x86_64 #1
    Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
    Call Trace:
     __schedule+0x2a2/0x7e0
     schedule+0x4e/0xb0
     io_schedule+0x16/0x40
     wait_on_page_bit_common+0x15c/0x3e0
     do_read_cache_page+0x1e0/0x410
     read_cache_page+0x12/0x20
     read_part_sector+0x46/0x100
     read_lba+0x121/0x240
     efi_partition+0x1d2/0x6a0
     bdev_disk_changed.part.0+0x1df/0x430
     bdev_disk_changed+0x18/0x20
     blkdev_get_whole+0x77/0xe0
     blkdev_get_by_dev+0xd2/0x3a0
     __device_add_disk+0x1ed/0x310
     device_add_disk+0x13/0x20
     nvme_mpath_set_live+0x138/0x1b0 [nvme_core]
     nvme_update_ns_ana_state+0x2b/0x30 [nvme_core]
     nvme_update_ana_state+0xca/0xe0 [nvme_core]
     nvme_parse_ana_log+0xac/0x170 [nvme_core]
     nvme_read_ana_log+0x7d/0xe0 [nvme_core]
     nvme_mpath_init_identify+0x105/0x150 [nvme_core]
     nvme_init_identify+0x2df/0x4d0 [nvme_core]
     nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core]
     nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp]
     nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp]
     process_one_work+0x1bd/0x360
     worker_thread+0x50/0x3d0
Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

a4a6f3c8

nvme: fix RCU hole that allowed for endless looping in multipath round robin · d6d67427

由 Chris Leech 提交于 3月 21, 2022

Make nvme_ns_remove match the assumptions elsewhere.

1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is
   running in __nvme_find_path or nvme_round_robin_path that will
   re-assign this ns to current_path.

2) Any matching current_path entries need to be cleared before removing
   from the siblings list, to prevent calling nvme_round_robin_path with
   an "old" ns that's off list.

3) Finally the list_del_rcu can happen, and then synchronize again
   before releasing any reference counts.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

d6d67427

nvme: allow duplicate NSIDs for private namespaces · 5974ea7c

由 Sungup Moon 提交于 3月 14, 2022

A NVMe subsystem with multiple controller can have private namespaces
that use the same NSID under some conditions:

 "If Namespace Management, ANA Reporting, or NVM Sets are supported, the
  NSIDs shall be unique within the NVM subsystem. If the Namespace
  Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
   a) for shared namespace shall be unique; and
   b) for private namespace are not required to be unique."

Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.

Make sure this specific setup is supported in Linux.

Fixes: 9ad1927a ("nvme: always search for namespace head")
Signed-off-by: NSungup Moon <sungup.moon@samsung.com>
[hch: refactored and fixed the controller vs subsystem based naming
      conflict]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>

5974ea7c

23 3月, 2022 1 次提交

nvme: fix the read-only state for zoned namespaces with unsupposed features · 726be2c7

由 Pankaj Raghav 提交于 3月 22, 2022

commit 2f4c9ba2 ("nvme: export zoned namespaces without Zone Append
support read-only") marks zoned namespaces without append support
read-only.  It does iso by setting NVME_NS_FORCE_RO in ns->flags in
nvme_update_zone_info and checking for that flag later in
nvme_update_disk_info to mark the disk as read-only.

But commit 73d90386 ("nvme: cleanup zone information initialization")
rearranged nvme_update_disk_info to be called before
nvme_update_zone_info and thus not marking the disk as read-only.
The call order cannot be just reverted because nvme_update_zone_info sets
certain queue parameters such as zone_write_granularity that depend on the
prior call to nvme_update_disk_info.

Remove the call to set_disk_ro in nvme_update_disk_info. and call
set_disk_ro after nvme_update_zone_info and nvme_update_disk_info to set
the permission for ZNS drives correctly. The same applies to the
multipath disk path.

Fixes: 73d90386 ("nvme: cleanup zone information initialization")
Signed-off-by: NPankaj Raghav <p.raghav@samsung.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

726be2c7

16 3月, 2022 3 次提交

nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH · ce8d7861

由 Christoph Hellwig 提交于 3月 15, 2022

Start warning about exposing a namespace as multiple block devices,
and set a fixed deprecation release.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>

ce8d7861

nvme: remove nvme_alloc_request and nvme_alloc_request_qid · e559398f

由 Christoph Hellwig 提交于 3月 15, 2022

Just open code the allocation + initialization in the callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

e559398f

nvme: cleanup how disk->disk_name is assigned · b739e137

由 Christoph Hellwig 提交于 3月 15, 2022

They way how assigning the disk name and commenting on why it is done
is split over core.c and multipath.c seems to be rather confusing.

Now that ns_head->disk always exists we can do all the work in core.c
and have a single big comment explaining the issues.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

b739e137

08 3月, 2022 3 次提交

nvme: add support for enhanced metadata · 4020aad8

由 Keith Busch 提交于 3月 03, 2022

NVM Express ratified TP 4068 defines new protection information formats.
Implement support for the CRC64 guard tags.

Since the block layer doesn't support variable length reference tags,
driver support for the Storage Tag space is not supported at this time.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Klaus Jensen <its@irrelevant.dk>
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220303201312.3255347-9-kbusch@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

4020aad8

nvme: allow integrity on extended metadata formats · 84b73542

由 Keith Busch 提交于 3月 03, 2022

The block integrity subsystem knows how to construct protection
information buffers with metadata beyond the protection information
fields. Remove the driver restriction.

Note, this can only work if the PI field appears first in the metadata,
as the integrity subsystem doesn't calculate guard tags on preceding
metadata.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220303201312.3255347-3-kbusch@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>

84b73542

nvme: remove support or stream based temperature hint · 85e6c775

由 Christoph Hellwig 提交于 3月 04, 2022

This support was added for RocksDB, but RocksDB ended up not using it.
At the same time drives on the open marked (vs those build for OEMs
for non-Linux support) that actually support streams are extremly
rare.  Don't bloat the nvme driver for it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220304175556.407719-1-hch@lst.de
[axboe: fold in ctrl->nr_streams removal from Keith]
Signed-off-by: NJens Axboe <axboe@kernel.dk>

85e6c775

28 2月, 2022 12 次提交

nvme: check that EUI/GUID/UUID are globally unique · 2079f41e

由 Christoph Hellwig 提交于 2月 24, 2022

Add a check to verify that the unique identifiers are unique globally
in addition to the existing check that verifies that they are unique
inside a single subsystem.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

2079f41e

nvme: check for duplicate identifiers earlier · e2d77d2e

由 Christoph Hellwig 提交于 2月 24, 2022

Lift the check for duplicate identifiers into nvme_init_ns_head, which
avoids pointless error unwinding in case they don't match, and also
matches where we check identifier validity for the multipath case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

e2d77d2e

nvme: fix the check for duplicate unique identifiers · e2724cb9

由 Christoph Hellwig 提交于 2月 24, 2022

nvme_subsys_check_duplicate_ids should needs to return an error if any of
the identifiers matches, not just if all of them match.  But it does not
need to and should not look at the CSI value for this sanity check.

Rewrite the logic to be separate from nvme_ns_ids_equal and optimize it
by reducing duplicate checks for non-present identifiers.

Fixes: ed754e5d ("nvme: track shared namespaces")
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

e2724cb9

nvme: cleanup __nvme_check_ids · fd8099e7

由 Christoph Hellwig 提交于 2月 24, 2022

Pass the actual nvme_ns_ids used for the comparison instead of the
ns_head that isn't needed and use a more descriptive function name.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>

fd8099e7

nvme: remove nssa from struct nvme_ctrl · 0a9f8500

由 Keith Busch 提交于 2月 15, 2022

The reported number of streams is not used outside the function that
gets it, so no need to stash it in the controller structure. Use a local
variable instead.
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

0a9f8500

nvme: explicitly set non-error for directives · 1c3adf0d

由 Keith Busch 提交于 2月 15, 2022

Stream directives is an optional feature. It is not an error if a
controller doesn't support as many as the kernel can optionally use.
Explicitly set the non-error return value on this condition with a
comment explaining why.

Note, the return value was already 0 in this condition, so the setting
is redundant. This patch should just silence bots that falsely believe
the condition contains an error omission.
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

1c3adf0d

nvme: expose cntrltype and dctype through sysfs · 86c2457a

由 Martin Belanger 提交于 2月 08, 2022

TP8010 introduces the Discovery Controller Type attribute (dctype).
The dctype is returned in the response to the Identify command. This
patch exposes the dctype through the sysfs. Since the dctype depends on
the Controller Type (cntrltype), another attribute of the Identify
response, the patch also exposes the cntrltype as well. The dctype will
only be displayed for discovery controllers.

A note about the naming of this attribute:
Although TP8010 calls this attribute the Discovery Controller Type,
note that the dctype is now part of the response to the Identify
command for all controller types. I/O, Discovery, and Admin controllers
all share the same Identify response PDU structure. Non-discovery
controllers as well as pre-TP8010 discovery controllers will continue
to set this field to 0 (which has always been the default for reserved
bytes). Per TP8010, the value 0 now means "Discovery controller type is
not reported" instead of "Reserved". One could argue that this
definition is correct even for non-discovery controllers, and by
extension, exposing it in the sysfs for non-discovery controllers is
appropriate.
Signed-off-by: NMartin Belanger <martin.belanger@dell.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

86c2457a

nvme: send uevent on connection up · 20d64911

由 Martin Belanger 提交于 2月 08, 2022

When connectivity with a controller is lost, the driver will keep
trying to reconnect once every 10 sec. When connection is restored,
user-space apps need to be informed so that they can take proper
action. For example, TP8010 introduces the DIM PDU, which is used to
register with a discovery controller (DC). The DIM PDU is sent from
user-space.  The DIM PDU must be sent every time a connection is
established with a DC. Therefore, the kernel must tell user-space apps
when connection is restored so that registration can happen.

The uevent sent is a "change" uevent with environmental data
set to: "NVME_EVENT=connected".
Signed-off-by: NMartin Belanger <martin.belanger@dell.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NJohn Meneghini <jmeneghi@redhat.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

20d64911

nvme: add verbose error logging · bd83fe6f

由 Alan Adamson 提交于 2月 03, 2022

Improves logging of NVMe errors.  If NVME_VERBOSE_ERRORS is configured,
a verbose description of the error is logged, otherwise only status
codes/bits is logged.
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
[kch]: fix several nits, cosmetics, and trim down code.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NAlan Adamson <alan.adamson@oracle.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

bd83fe6f

nvme: replace ida_simple[get|remove] with the simler ida_[alloc|free] · 8b850475

由 Sagi Grimberg 提交于 2月 14, 2022

ida_simple_[get|remove] are wrappers anyways.

Also, use ida_alloc_min with the ns_ida as namespace
enumeration starts with 1.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

8b850475

nvme-core: remove unnecessary function parameter · ba326643

由 Chaitanya Kulkarni 提交于 1月 21, 2022

In function nvme_execute_rq() we don't use gendisk parameter at all.
Remove the unsed parameter and adjust the calls.
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ba326643

nvme-core: remove unnecessary semicolon · 50ab19d8

由 Chaitanya Kulkarni 提交于 1月 18, 2022

It is not a good practice to have a semicolon at the end of the
function definition. Remove it from nvme_pr_type().
Signed-off-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

50ab19d8

23 2月, 2022 2 次提交

nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info · 602e57c9

由 Christoph Hellwig 提交于 2月 16, 2022

Commit e7d65803 ("nvme-multipath: revalidate paths during rescan")
introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses
to check if a path can be used or not.  We also need to set this flag
for devices that fail the ZNS feature validation and which are available
through passthrough devices only to that they can be used in multipathing
setups.

Fixes: e7d65803 ("nvme-multipath: revalidate paths during rescan")
Reported-by: NKanchan Joshi <joshi.k@samsung.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NDaniel Wagner <dwagner@suse.de>
Tested-by: NKanchan Joshi <joshi.k@samsung.com>

602e57c9

nvme: don't return an error from nvme_configure_metadata · 363f6368

由 Christoph Hellwig 提交于 2月 16, 2022

When a fabrics controller claims to support an invalidate metadata
configuration we already warn and disable metadata support.  No need to
also return an error during revalidation.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NDaniel Wagner <dwagner@suse.de>
Tested-by: NKanchan Joshi <joshi.k@samsung.com>

363f6368

17 2月, 2022 1 次提交

block: fix surprise removal for drivers calling blk_set_queue_dying · 7a5428dc

由 Christoph Hellwig 提交于 2月 17, 2022

Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9e that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.

Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.

Fixes: 8e141f9e ("block: drain file system I/O on del_gendisk")
Reported-by: NMarkus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>

7a5428dc

09 2月, 2022 1 次提交

nvme: add nvme_complete_req tracepoint for batched completion · 00e757b6

由 Bean Huo 提交于 2月 08, 2022

Add NVMe request completion trace in nvme_complete_batch_req() because
nvme:nvme_complete_req tracepoint is missing in case of request batched
completion.
Signed-off-by: NBean Huo <beanhuo@micron.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

00e757b6

02 2月, 2022 1 次提交

nvme: fix a possible use-after-free in controller reset during load · 0fa0f99f

由 Sagi Grimberg 提交于 2月 01, 2022

Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
readiness for AER submission. This may lead to a use-after-free
condition that was observed with nvme-tcp.

The race condition may happen in the following scenario:
1. driver executes its reset_ctrl_work
2. -> nvme_stop_ctrl - flushes ctrl async_event_work
3. ctrl sends AEN which is received by the host, which in turn
   schedules AEN handling
4. teardown admin queue (which releases the queue socket)
5. AEN processed, submits another AER, calling the driver to submit
6. driver attempts to send the cmd
==> use-after-free

In order to fix that, add ctrl state check to validate the ctrl
is actually able to accept the AER submission.

This addresses the above race in controller resets because the driver
during teardown should:
1. change ctrl state to RESETTING
2. flush async_event_work (as well as other async work elements)

So after 1,2, any other AER command will find the
ctrl state to be RESETTING and bail out without submitting the AER.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

0fa0f99f

23 12月, 2021 3 次提交

nvme: add 'iopolicy' module parameter · e3d34794

由 Hannes Reinecke 提交于 12月 20, 2021

While the 'iopolicy' sysfs attribute can be set at runtime, most
storage arrays prefer to use the 'round-robin' iopolicy per default.
We can use udev rules to set this, but is getting rather unwieldy
for rebranded arrays as we would have to update the udev rules
anytime a new array shows up, leading to the same mess we currently
have in multipathd for configuring the RDAC arrays.

Hence this patch adds a module parameter 'iopolicy' to allow the
admin to switch the default, and to do away with the need for a
udev rule here.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NDaniel Wagner <dwagner@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e3d34794

nvme: drop unused variable ctrl in nvme_setup_cmd · 3a605e32

由 Geliang Tang 提交于 12月 22, 2021

The variable 'ctrl' became useless since the code using it was dropped
from nvme_setup_cmd() in the commit 292ddf67bbd5 ("nvme: increment
request genctr on completion"). Fix it to get rid of this compilation
warning in the nvme-5.17 branch:

 drivers/nvme/host/core.c: In function ‘nvme_setup_cmd’:
 drivers/nvme/host/core.c:993:20: warning: unused variable ‘ctrl’ [-Wunused-variable]
   struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
                     ^~~~

Fixes: 292ddf67bbd5 ("nvme: increment request genctr on completion")
Signed-off-by: NGeliang Tang <geliang.tang@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3a605e32

nvme: increment request genctr on completion · e4fdb2b1

由 Keith Busch 提交于 12月 13, 2021

The nvme request generation counter is intended to catch duplicate
completions. Incrementing the counter on submission means duplicates can
only be caught if the request tag is reallocated and dispatched prior to
the driver observing the corrupted CQE. Incrementing on completion
removes this window, making it possible to detect duplicate completions
in consecutive entries.
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e4fdb2b1

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功