1. 08 Dec 2018, 2 commits
  2. 02 Oct 2018, 1 commit
    • nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O · 783f4a44
      Committed by James Smart
      When an io is rejected by nvmf_check_ready() due to validation of the
      controller state, the nvmf_fail_nonready_command() will normally return
      BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
      dying or the I/O is marked for NVMe multipath, the I/O is failed so that
      the controller can terminate or so that the I/O can be issued on a
      different path.  Unfortunately, as this reject point is before the
      transport has accepted the command, blk-mq ends up completing the I/O
      and never calls nvme_complete_rq(), which is where multipath may preserve
      or re-route the I/O. The end result is that the device user sees an
      EIO error.
      
      Example: single path connectivity, controller is under load, and a reset
      is induced.  An I/O is received:
      
        a) while the reset state has been set but the queues have yet to be
           stopped; or
        b) after queues are started (at end of reset) but before the reconnect
           has completed.
      
      The I/O finishes with an EIO status.
      
      This patch makes the following changes (a sketch of the reject-path
      handling follows the list):

        - Adds the HOST_PATH_ERROR pathing status from TP4028.
        - Modifies the reject point so that the command appears to queue
          successfully, but is actually completed with the new pathing status
          and handed to nvme_complete_rq().
        - nvme_complete_rq() recognizes the new status, avoids resetting the
          controller (a reset was likely already done in order to produce this
          status), and calls the multipath code to clear the current path that
          errored.  This allows the next command (a retry or a new command) to
          select a new path if one is available.
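
      A minimal sketch of the modified reject point; REQ_NVME_MPATH and
      NVME_SC_HOST_PATH_ERROR are assumed names for the multipath request flag
      and the new pathing status, and the controller-state check is simplified
      rather than copied from the patch:

        blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
                                                struct request *rq)
        {
                /* Normal case: ask blk-mq to requeue and retry later. */
                if (ctrl->state != NVME_CTRL_DELETING &&
                    !(rq->cmd_flags & REQ_NVME_MPATH))
                        return BLK_STS_RESOURCE;

                /*
                 * Dying controller or multipath I/O: pretend the command
                 * queued successfully, tag it with the new pathing status,
                 * and complete it through nvme_complete_rq() so the
                 * multipath code can clear the failed path.
                 */
                nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
                blk_mq_start_request(rq);
                nvme_complete_rq(rq);
                return BLK_STS_OK;      /* blk-mq sees a successful queue */
        }
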
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 08 Aug 2018, 2 commits
  4. 28 Jul 2018, 2 commits
  5. 23 Jul 2018, 1 commit
  6. 01 Jun 2018, 3 commits
  7. 18 Jan 2018, 1 commit
  8. 11 Nov 2017, 3 commits
  9. 25 Sep 2017, 2 commits
  10. 12 Sep 2017, 1 commit
  11. 30 Aug 2017, 1 commit
  12. 29 Aug 2017, 5 commits
  13. 25 Jul 2017, 1 commit
  14. 20 Jul 2017, 1 commit
  15. 28 Jun 2017, 2 commits
  16. 16 Jun 2017, 1 commit
  17. 15 Jun 2017, 5 commits
  18. 13 Jun 2017, 2 commits
  19. 05 Jun 2017, 1 commit
  20. 21 Apr 2017, 1 commit
    • nvme: improve performance for virtual NVMe devices · f9f38e33
      Committed by Helen Koike
      This change provides a mechanism to reduce the number of MMIO doorbell
      writes in the NVMe driver. When running in a virtualized environment
      like QEMU, the cost of an MMIO write is quite hefty. The main idea of
      the patch is to provide the device with two memory locations:
       1) to store the doorbell values so they can be looked up without the
          doorbell MMIO write
       2) to store an event index.
      The doorbell value is self-explanatory; the event index less so.
      Similar to the virtio specification, the virtual device can tell the
      driver (guest OS) not to issue the MMIO write unless it is writing
      past this value.
      
      FYI: doorbell values are written by the nvme driver (guest OS) and the
      event index is written by the virtual device (host OS).
      
      The patch implements a new admin command that will communicate where
      these two memory locations reside. If the command fails, the nvme
      driver will work as before without any optimizations.
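
      As a rough illustration of the mechanism, a doorbell update under this
      scheme could look like the sketch below; the names (nvme_need_event,
      nvme_write_doorbell, shadow_db, event_idx) are illustrative rather than
      taken from the patch, and the wrap-safe comparison follows the virtio
      event-index convention:

        /* True when writing new_idx crosses the event index posted by the device. */
        static inline bool nvme_need_event(u16 event_idx, u16 new_idx, u16 old)
        {
                return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old);
        }

        static void nvme_write_doorbell(u16 value, u32 __iomem *db,
                                        u32 *shadow_db, volatile u32 *event_idx)
        {
                if (shadow_db) {
                        u16 old;

                        wmb();                  /* queue entries visible before doorbell */
                        old = *shadow_db;
                        *shadow_db = value;     /* device reads this shadow copy */
                        if (!nvme_need_event(*event_idx, value, old))
                                return;         /* device will notice; skip the MMIO */
                }
                writel(value, db);              /* real (expensive) doorbell write */
        }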
      
      Contributions:
        Eric Northup <digitaleric@google.com>
        Frank Swiderski <fes@google.com>
        Ted Tso <tytso@mit.edu>
        Keith Busch <keith.busch@intel.com>
      
      Just to give an idea of the performance boost with the vendor
      extension: running fio [1], with a stock NVMe driver I get about 200K
      read IOPS; with my vendor patch I get about 1000K read IOPS. This was
      running with a null device, i.e. the backing device simply returned
      success on every read I/O request.
      
      [1] Running on a 4 core machine:
        fio --time_based --name=benchmark --runtime=30
        --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
        --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
        --rw=randread --blocksize=4k --randrepeat=false
      Signed-off-by: Rob Nelson <rlnelson@google.com>
      [mlin: port for upstream]
      Signed-off-by: Ming Lin <mlin@kernel.org>
      [koike: updated for upstream]
      Signed-off-by: Helen Koike <helen.koike@collabora.co.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  21. 02 Apr 2017, 1 commit
  22. 23 Feb 2017, 1 commit
    • nvme: Enable autonomous power state transitions · c5552fde
      Committed by Andy Lutomirski
      NVMe devices can advertise multiple power states.  These states can
      be either "operational" (the device is fully functional but possibly
      slow) or "non-operational" (the device is asleep until woken up).
      Some devices can automatically enter a non-operational state when
      idle for a specified amount of time and then automatically wake back
      up when needed.
      
      The hardware configuration is a table.  For each state, an entry in
      the table indicates the next deeper non-operational state, if any,
      to autonomously transition to and the idle time required before
      transitioning.
      
      This patch teaches the driver to program APST so that each successive
      non-operational state will be entered after an idle time equal to 100%
      of the total latency (entry plus exit) associated with that state.
      The maximum acceptable latency is controlled using dev_pm_qos
      (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
      states with total latency greater than this value will not be used.
      As a special case, setting the latency tolerance to 0 will disable
      APST entirely.  On hardware without APST support, the sysfs file will
      not be exposed.
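
      A minimal sketch of the table construction described above, assuming
      illustrative names (nvme_build_apst_table, psd[], ps_max_latency_us) and
      an APST entry encoding with the target state in bits 3..7 and the idle
      time, in milliseconds, starting at bit 8; none of this is copied from
      the patch:

        static void nvme_build_apst_table(struct nvme_ctrl *ctrl,
                                          struct nvme_feat_auto_pst *table)
        {
                u64 target = 0;         /* encoded entry: deeper state + idle time */
                int state;

                /* Walk from the deepest power state up to PS0. */
                for (state = ctrl->npss; state >= 0; state--) {
                        u64 total_latency_us, idle_ms;

                        if (target)
                                table->entries[state] = cpu_to_le64(target);

                        /* Only non-operational states can be autonomous targets. */
                        if (!(ctrl->psd[state].flags & NVME_PS_FLAGS_NON_OP_STATE))
                                continue;

                        total_latency_us = le32_to_cpu(ctrl->psd[state].entry_lat) +
                                           le32_to_cpu(ctrl->psd[state].exit_lat);
                        if (total_latency_us > ctrl->ps_max_latency_us)
                                continue;       /* exceeds the pm_qos tolerance */

                        /* Idle time = 100% of this state's total latency. */
                        idle_ms = DIV_ROUND_UP(total_latency_us, 1000);
                        target = ((u64)state << 3) | (idle_ms << 8);
                }
        }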
      
      The latency tolerance for newly-probed devices is set by the module
      parameter nvme_core.default_ps_max_latency_us.
      
      In theory, the device can expose a "default" APST table, but this
      doesn't seem to function correctly on my device (Samsung 950), nor
      does it seem particularly useful.  There is also an optional
      mechanism by which a configuration can be "saved" so it will be
      automatically loaded on reset.  This can be configured from
      userspace, but it doesn't seem useful to support in the driver.
      
      On my laptop, enabling APST seems to save nearly 1W.
      
      The hardware tables can be decoded in userspace with nvme-cli.
      'nvme id-ctrl /dev/nvmeN' will show the power state table and
      'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
      configuration.
      
      This feature is quirked off on a known-buggy Samsung device.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>