Commit fb4da215 authored by Linus Torvalds

Merge tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Enumeration changes:

   - Evaluate PCI Boot Configuration _DSM to learn if firmware wants us
     to preserve its resource assignments (Benjamin Herrenschmidt)

   - Simplify resource distribution (Nicholas Johnson)

   - Decode 32 GT/s link speed (Gustavo Pimentel)

  Virtualization:

   - Fix incorrect caching of VF config space size (Alex Williamson)

   - Fix VF driver probing sysfs knobs (Alex Williamson)

  Peer-to-peer DMA:

   - Fix dma_virt_ops check (Logan Gunthorpe)

  Altera host bridge driver:

   - Allow building as module (Ley Foon Tan)

  Armada 8K host bridge driver:

   - add PHYs support (Miquel Raynal)

  DesignWare host bridge driver:

   - Export APIs to support removable loadable module (Vidya Sagar)

   - Enable Relaxed Ordering erratum workaround only on Tegra20 &
     Tegra30 (Vidya Sagar)

  Hyper-V host bridge driver:

   - Fix use-after-free in eject (Dexuan Cui)

  Mobiveil host bridge driver:

   - Clean up and fix many issues, including non-identity mapped
     windows, 64-bit windows, multi-MSI, class code, INTx clearing (Hou
     Zhiqiang)

  Qualcomm host bridge driver:

   - Use clk bulk API for 2.4.0 controllers (Bjorn Andersson)

   - Add QCS404 support (Bjorn Andersson)

   - Assert PERST for at least 100ms (Niklas Cassel)

  R-Car host bridge driver:

   - Add r8a774a1 DT support (Biju Das)

  Tegra host bridge driver:

   - Add support for Gen2, opportunistic UpdateFC and ACK (PCIe protocol
     details) AER, GPIO-based PERST# (Manikanta Maddireddy)

   - Fix many issues, including power-on failure cases, interrupt
     masking in suspend, UPHY settings, AFI dynamic clock gating,
     pending DLL transactions (Manikanta Maddireddy)

  Xilinx host bridge driver:

   - Fix NWL Multi-MSI programming (Bharat Kumar Gogada)

  Endpoint support:

   - Fix 64bit BAR support (Alan Mikhak)

   - Fix pcitest build issues (Alan Mikhak, Andy Shevchenko)

  Bug fixes:

   - Fix NVIDIA GPU multi-function power dependencies (Abhishek Sahu)

   - Fix NVIDIA GPU HDA enablement issue (Lukas Wunner)

   - Ignore lockdep for sysfs "remove" (Marek Vasut)

  Misc:

   - Convert docs to reST (Changbin Du, Mauro Carvalho Chehab)"

* tag 'pci-v5.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (107 commits)
  PCI: Enable NVIDIA HDA controllers
  tools: PCI: Fix installation when `make tools/pci_install`
  PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB
  PCI: Fix typos and whitespace errors
  PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr()
  PCI: mobiveil: Fix infinite-loop in the INTx handling function
  PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine
  PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window
  PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window
  PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup
  PCI: mobiveil: Clear the control fields before updating it
  PCI: mobiveil: Add configured inbound windows counter
  PCI: mobiveil: Fix the valid check for inbound and outbound windows
  PCI: mobiveil: Clean-up program_{ib/ob}_windows()
  PCI: mobiveil: Remove an unnecessary return value check
  PCI: mobiveil: Fix error return values
  PCI: mobiveil: Refactor the MEM/IO outbound window initialization
  PCI: mobiveil: Make some register updates more readable
  PCI: mobiveil: Reformat the code for readability
  dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional
  ...
@@ -5,7 +5,7 @@ Contact: linux-pm@vger.kernel.org
 Description:
 The powercap/ class sub directory belongs to the power cap
 subsystem. Refer to
-Documentation/power/powercap/powercap.txt for details.
+Documentation/power/powercap/powercap.rst for details.
 What: /sys/class/powercap/<control type>
 Date: September 2013
......
-ACPI considerations for PCI host bridges
+.. SPDX-License-Identifier: GPL-2.0
+========================================
+ACPI considerations for PCI host bridges
+========================================
 The general rule is that the ACPI namespace should describe everything the
 OS might use unless there's another way for the OS to find it [1, 2].
@@ -131,12 +135,13 @@ address always corresponds to bus 0, even if the bus range below the bridge
 [4] ACPI 6.2, sec 6.4.3.5.1, 2, 3, 4:
 QWord/DWord/Word Address Space Descriptor (.1, .2, .3)
 General Flags: Bit [0] Ignored
 Extended Address Space Descriptor (.4)
 General Flags: Bit [0] Consumer/Producer:
-1–This device consumes this resource
-0–This device produces and consumes this resource
+* 1 – This device consumes this resource
+* 0 – This device produces and consumes this resource
 [5] ACPI 6.2, sec 19.6.43:
 ResourceUsage specifies whether the Memory range is consumed by
......
+.. SPDX-License-Identifier: GPL-2.0
+======================
+PCI Endpoint Framework
+======================
+.. toctree::
+   :maxdepth: 2
+   pci-endpoint
+   pci-endpoint-cfs
+   pci-test-function
+   pci-test-howto
-CONFIGURING PCI ENDPOINT USING CONFIGFS
-Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+=======================================
+Configuring PCI Endpoint Using CONFIGFS
+=======================================
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 The PCI Endpoint Core exposes configfs entry (pci_ep) to configure the
 PCI endpoint function and to bind the endpoint function
 with the endpoint controller. (For introducing other mechanisms to
 configure the PCI Endpoint Function refer to [1]).
-*) Mounting configfs
+Mounting configfs
+=================
 The PCI Endpoint Core layer creates pci_ep directory in the mounted configfs
-directory. configfs can be mounted using the following command.
+directory. configfs can be mounted using the following command::
 mount -t configfs none /sys/kernel/config
-*) Directory Structure
+Directory Structure
+===================
 The pci_ep configfs has two directories at its root: controllers and
 functions. Every EPC device present in the system will have an entry in
 the *controllers* directory and and every EPF driver present in the system
 will have an entry in the *functions* directory.
+::
 /sys/kernel/config/pci_ep/
 .. controllers/
 .. functions/
-*) Creating EPF Device
+Creating EPF Device
+===================
 Every registered EPF driver will be listed in controllers directory. The
 entries corresponding to EPF driver will be created by the EPF core.
+::
 /sys/kernel/config/pci_ep/functions/
 .. <EPF Driver1>/
 ... <EPF Device 11>/
 ... <EPF Device 21>/
 .. <EPF Driver2>/
 ... <EPF Device 12>/
 ... <EPF Device 22>/
 In order to create a <EPF device> of the type probed by <EPF Driver>, the
 user has to create a directory inside <EPF DriverN>.
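As a rough illustration of the "create a directory inside the driver directory" step, the sketch below wraps the mkdir in a small Python helper. The helper name `create_epf_device` and its parameters are hypothetical and not part of the kernel documentation; `pci_epf_test` and `func1` are the example names used elsewhere in this commit. The demo runs against a scratch directory, so it only mimics the layout and does not talk to configfs.

```python
import os
import tempfile


def create_epf_device(pci_ep_dir, driver, device):
    """Create <pci_ep_dir>/functions/<driver>/<device> and return its path.

    On a real system the <driver> directory already exists (the EPF core
    creates it), and this mkdir is the request to create the EPF device.
    """
    path = os.path.join(pci_ep_dir, "functions", driver, device)
    os.mkdir(path)
    return path


if __name__ == "__main__":
    # Dry run in a scratch directory instead of /sys/kernel/config/pci_ep.
    with tempfile.TemporaryDirectory() as scratch:
        os.makedirs(os.path.join(scratch, "functions", "pci_epf_test"))
        print(create_epf_device(scratch, "pci_epf_test", "func1"))
```

On real hardware the first argument would be the mounted `pci_ep` directory, and the call would need sufficient privileges.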
@@ -44,34 +54,37 @@ Every <EPF device> directory consists of the following entries that can be
 used to configure the standard configuration header of the endpoint function.
 (These entries are created by the framework when any new <EPF Device> is
 created)
-.. <EPF Driver1>/
-... <EPF Device 11>/
-... vendorid
-... deviceid
-... revid
-... progif_code
-... subclass_code
-... baseclass_code
-... cache_line_size
-... subsys_vendor_id
-... subsys_id
-... interrupt_pin
-*) EPC Device
+::
+.. <EPF Driver1>/
+... <EPF Device 11>/
+... vendorid
+... deviceid
+... revid
+... progif_code
+... subclass_code
+... baseclass_code
+... cache_line_size
+... subsys_vendor_id
+... subsys_id
+... interrupt_pin
+EPC Device
+==========
 Every registered EPC device will be listed in controllers directory. The
 entries corresponding to EPC device will be created by the EPC core.
-/sys/kernel/config/pci_ep/controllers/
-.. <EPC Device1>/
-... <Symlink EPF Device11>/
-... <Symlink EPF Device12>/
-... start
-.. <EPC Device2>/
-... <Symlink EPF Device21>/
-... <Symlink EPF Device22>/
-... start
+::
+/sys/kernel/config/pci_ep/controllers/
+.. <EPC Device1>/
+... <Symlink EPF Device11>/
+... <Symlink EPF Device12>/
+... start
+.. <EPC Device2>/
+... <Symlink EPF Device21>/
+... <Symlink EPF Device22>/
+... start
 The <EPC Device> directory will have a list of symbolic links to
 <EPF Device>. These symbolic links should be created by the user to
@@ -81,7 +94,7 @@ The <EPC Device> directory will also have a *start* field. Once
 "1" is written to this field, the endpoint device will be ready to
 establish the link with the host. This is usually done after
 all the EPF devices are created and linked with the EPC device.
+::
 | controllers/
 | <Directory: EPC name>/
@@ -102,4 +115,4 @@ all the EPF devices are created and linked with the EPC device.
 | interrupt_pin
 | function
-[1] -> Documentation/PCI/endpoint/pci-endpoint.txt
+[1] :doc:`pci-endpoint`
-PCI ENDPOINT FRAMEWORK
-Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to use the PCI Endpoint Framework in order to create
 endpoint controller driver, endpoint function driver, and using configfs
 interface to bind the function driver to the controller driver.
-1. Introduction
+Introduction
+============
 Linux has a comprehensive PCI subsystem to support PCI controllers that
 operates in Root Complex mode. The subsystem has capability to scan PCI bus,
@@ -19,26 +21,30 @@ add endpoint mode support in Linux. This will help to run Linux in an
 EP system which can have a wide variety of use cases from testing or
 validation, co-processor accelerator, etc.
-2. PCI Endpoint Core
+PCI Endpoint Core
+=================
 The PCI Endpoint Core layer comprises 3 components: the Endpoint Controller
 library, the Endpoint Function library, and the configfs layer to bind the
 endpoint function with the endpoint controller.
-2.1 PCI Endpoint Controller(EPC) Library
+PCI Endpoint Controller(EPC) Library
+------------------------------------
 The EPC library provides APIs to be used by the controller that can operate
 in endpoint mode. It also provides APIs to be used by function driver/library
 in order to implement a particular endpoint function.
-2.1.1 APIs for the PCI controller Driver
+APIs for the PCI controller Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI controller driver.
-*) devm_pci_epc_create()/pci_epc_create()
+* devm_pci_epc_create()/pci_epc_create()
 The PCI controller driver should implement the following ops:
 * write_header: ops to populate configuration space header
 * set_bar: ops to configure the BAR
 * clear_bar: ops to reset the BAR
@@ -51,110 +57,116 @@ by the PCI controller driver.
 The PCI controller driver can then create a new EPC device by invoking
 devm_pci_epc_create()/pci_epc_create().
-*) devm_pci_epc_destroy()/pci_epc_destroy()
+* devm_pci_epc_destroy()/pci_epc_destroy()
 The PCI controller driver can destroy the EPC device created by either
 devm_pci_epc_create() or pci_epc_create() using devm_pci_epc_destroy() or
 pci_epc_destroy().
-*) pci_epc_linkup()
+* pci_epc_linkup()
 In order to notify all the function devices that the EPC device to which
 they are linked has established a link with the host, the PCI controller
 driver should invoke pci_epc_linkup().
-*) pci_epc_mem_init()
+* pci_epc_mem_init()
 Initialize the pci_epc_mem structure used for allocating EPC addr space.
-*) pci_epc_mem_exit()
+* pci_epc_mem_exit()
 Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
-2.1.2 APIs for the PCI Endpoint Function Driver
+APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
-*) pci_epc_write_header()
+* pci_epc_write_header()
 The PCI endpoint function driver should use pci_epc_write_header() to
 write the standard configuration header to the endpoint controller.
-*) pci_epc_set_bar()
+* pci_epc_set_bar()
 The PCI endpoint function driver should use pci_epc_set_bar() to configure
 the Base Address Register in order for the host to assign PCI addr space.
 Register space of the function driver is usually configured
 using this API.
-*) pci_epc_clear_bar()
+* pci_epc_clear_bar()
 The PCI endpoint function driver should use pci_epc_clear_bar() to reset
 the BAR.
-*) pci_epc_raise_irq()
+* pci_epc_raise_irq()
 The PCI endpoint function driver should use pci_epc_raise_irq() to raise
 Legacy Interrupt, MSI or MSI-X Interrupt.
-*) pci_epc_mem_alloc_addr()
+* pci_epc_mem_alloc_addr()
 The PCI endpoint function driver should use pci_epc_mem_alloc_addr(), to
 allocate memory address from EPC addr space which is required to access
 RC's buffer
-*) pci_epc_mem_free_addr()
+* pci_epc_mem_free_addr()
 The PCI endpoint function driver should use pci_epc_mem_free_addr() to
 free the memory space allocated using pci_epc_mem_alloc_addr().
-2.1.3 Other APIs
+Other APIs
+~~~~~~~~~~
 There are other APIs provided by the EPC library. These are used for binding
 the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
 using these APIs.
-*) pci_epc_get()
+* pci_epc_get()
 Get a reference to the PCI endpoint controller based on the device name of
 the controller.
-*) pci_epc_put()
+* pci_epc_put()
 Release the reference to the PCI endpoint controller obtained using
 pci_epc_get()
-*) pci_epc_add_epf()
+* pci_epc_add_epf()
 Add a PCI endpoint function to a PCI endpoint controller. A PCIe device
 can have up to 8 functions according to the specification.
-*) pci_epc_remove_epf()
+* pci_epc_remove_epf()
 Remove the PCI endpoint function from PCI endpoint controller.
-*) pci_epc_start()
+* pci_epc_start()
 The PCI endpoint function driver should invoke pci_epc_start() once it
 has configured the endpoint function and wants to start the PCI link.
-*) pci_epc_stop()
+* pci_epc_stop()
 The PCI endpoint function driver should invoke pci_epc_stop() to stop
 the PCI LINK.
-2.2 PCI Endpoint Function(EPF) Library
+PCI Endpoint Function(EPF) Library
+----------------------------------
 The EPF library provides APIs to be used by the function driver and the EPC
 library to provide endpoint mode functionality.
-2.2.1 APIs for the PCI Endpoint Function Driver
+APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint function driver.
-*) pci_epf_register_driver()
+* pci_epf_register_driver()
 The PCI Endpoint Function driver should implement the following ops:
 * bind: ops to perform when a EPC device has been bound to EPF device
@@ -166,50 +178,54 @@ by the PCI endpoint function driver.
 The PCI Function driver can then register the PCI EPF driver by using
 pci_epf_register_driver().
-*) pci_epf_unregister_driver()
+* pci_epf_unregister_driver()
 The PCI Function driver can unregister the PCI EPF driver by using
 pci_epf_unregister_driver().
-*) pci_epf_alloc_space()
+* pci_epf_alloc_space()
 The PCI Function driver can allocate space for a particular BAR using
 pci_epf_alloc_space().
-*) pci_epf_free_space()
+* pci_epf_free_space()
 The PCI Function driver can free the allocated space
 (using pci_epf_alloc_space) by invoking pci_epf_free_space().
-2.2.2 APIs for the PCI Endpoint Controller Library
+APIs for the PCI Endpoint Controller Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 This section lists the APIs that the PCI Endpoint core provides to be used
 by the PCI endpoint controller library.
-*) pci_epf_linkup()
+* pci_epf_linkup()
 The PCI endpoint controller library invokes pci_epf_linkup() when the
 EPC device has established the connection to the host.
-2.2.2 Other APIs
+Other APIs
+~~~~~~~~~~
 There are other APIs provided by the EPF library. These are used to notify
 the function driver when the EPF device is bound to the EPC device.
 pci-ep-cfs.c can be used as reference for using these APIs.
-*) pci_epf_create()
+* pci_epf_create()
 Create a new PCI EPF device by passing the name of the PCI EPF device.
 This name will be used to bind the the EPF device to a EPF driver.
-*) pci_epf_destroy()
+* pci_epf_destroy()
 Destroy the created PCI EPF device.
-*) pci_epf_bind()
+* pci_epf_bind()
 pci_epf_bind() should be invoked when the EPF device has been bound to
 a EPC device.
-*) pci_epf_unbind()
+* pci_epf_unbind()
 pci_epf_unbind() should be invoked when the binding between EPC device
 and EPF device is lost.
-PCI TEST
-Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+=================
+PCI Test Function
+=================
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 Traditionally PCI RC has always been validated by using standard
 PCI cards like ethernet PCI cards or USB PCI cards or SATA PCI cards.
@@ -23,65 +28,76 @@ The PCI endpoint test device has the following registers:
 8) PCI_ENDPOINT_TEST_IRQ_TYPE
 9) PCI_ENDPOINT_TEST_IRQ_NUMBER
-*) PCI_ENDPOINT_TEST_MAGIC
+* PCI_ENDPOINT_TEST_MAGIC
 This register will be used to test BAR0. A known pattern will be written
 and read back from MAGIC register to verify BAR0.
-*) PCI_ENDPOINT_TEST_COMMAND:
+* PCI_ENDPOINT_TEST_COMMAND
 This register will be used by the host driver to indicate the function
 that the endpoint device must perform.
-Bitfield Description:
-  Bit 0 : raise legacy IRQ
-  Bit 1 : raise MSI IRQ
-  Bit 2 : raise MSI-X IRQ
-  Bit 3 : read command (read data from RC buffer)
-  Bit 4 : write command (write data to RC buffer)
-  Bit 5 : copy command (copy data from one RC buffer to another
-          RC buffer)
+======== ================================================================
+Bitfield Description
+======== ================================================================
+Bit 0    raise legacy IRQ
+Bit 1    raise MSI IRQ
+Bit 2    raise MSI-X IRQ
+Bit 3    read command (read data from RC buffer)
+Bit 4    write command (write data to RC buffer)
+Bit 5    copy command (copy data from one RC buffer to another RC buffer)
+======== ================================================================
-*) PCI_ENDPOINT_TEST_STATUS
+* PCI_ENDPOINT_TEST_STATUS
 This register reflects the status of the PCI endpoint device.
-Bitfield Description:
-  Bit 0 : read success
-  Bit 1 : read fail
-  Bit 2 : write success
-  Bit 3 : write fail
-  Bit 4 : copy success
-  Bit 5 : copy fail
-  Bit 6 : IRQ raised
-  Bit 7 : source address is invalid
-  Bit 8 : destination address is invalid
+======== ==============================
+Bitfield Description
+======== ==============================
+Bit 0    read success
+Bit 1    read fail
+Bit 2    write success
+Bit 3    write fail
+Bit 4    copy success
+Bit 5    copy fail
+Bit 6    IRQ raised
+Bit 7    source address is invalid
+Bit 8    destination address is invalid
+======== ==============================
-*) PCI_ENDPOINT_TEST_SRC_ADDR
+* PCI_ENDPOINT_TEST_SRC_ADDR
 This register contains the source address (RC buffer address) for the
 COPY/READ command.
-*) PCI_ENDPOINT_TEST_DST_ADDR
+* PCI_ENDPOINT_TEST_DST_ADDR
 This register contains the destination address (RC buffer address) for
 the COPY/WRITE command.
-*) PCI_ENDPOINT_TEST_IRQ_TYPE
+* PCI_ENDPOINT_TEST_IRQ_TYPE
 This register contains the interrupt type (Legacy/MSI) triggered
 for the READ/WRITE/COPY and raise IRQ (Legacy/MSI) commands.
 Possible types:
-  - Legacy : 0
-  - MSI : 1
-  - MSI-X : 2
+====== ==
+Legacy 0
+MSI    1
+MSI-X  2
+====== ==
-*) PCI_ENDPOINT_TEST_IRQ_NUMBER
+* PCI_ENDPOINT_TEST_IRQ_NUMBER
 This register contains the triggered ID interrupt.
 Admissible values:
-  - Legacy : 0
-  - MSI : [1 .. 32]
-  - MSI-X : [1 .. 2048]
+====== ===========
+Legacy 0
+MSI    [1 .. 32]
+MSI-X  [1 .. 2048]
+====== ===========
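To make the COMMAND, STATUS, IRQ_TYPE and IRQ_NUMBER tables concrete, here is a minimal Python sketch that mirrors them as flag enums and a range check. The class, constant and function names (`Command`, `Status`, `decode_status`, `irq_number_is_valid`) are illustrative assumptions, not the identifiers used by the kernel drivers; only the bit positions and value ranges come from the tables in this document.

```python
from enum import IntFlag


class Command(IntFlag):
    """Bits of PCI_ENDPOINT_TEST_COMMAND, per the table above."""
    RAISE_LEGACY_IRQ = 1 << 0
    RAISE_MSI_IRQ = 1 << 1
    RAISE_MSIX_IRQ = 1 << 2
    READ = 1 << 3
    WRITE = 1 << 4
    COPY = 1 << 5


class Status(IntFlag):
    """Bits of PCI_ENDPOINT_TEST_STATUS, per the table above."""
    READ_SUCCESS = 1 << 0
    READ_FAIL = 1 << 1
    WRITE_SUCCESS = 1 << 2
    WRITE_FAIL = 1 << 3
    COPY_SUCCESS = 1 << 4
    COPY_FAIL = 1 << 5
    IRQ_RAISED = 1 << 6
    SRC_ADDR_INVALID = 1 << 7
    DST_ADDR_INVALID = 1 << 8


# IRQ_TYPE values and the inclusive IRQ_NUMBER range admitted for each.
IRQ_TYPE_LEGACY, IRQ_TYPE_MSI, IRQ_TYPE_MSIX = 0, 1, 2
IRQ_NUMBER_RANGE = {
    IRQ_TYPE_LEGACY: (0, 0),
    IRQ_TYPE_MSI: (1, 32),
    IRQ_TYPE_MSIX: (1, 2048),
}


def decode_status(value):
    """Return the names of the STATUS bits set in value, lowest bit first."""
    return [flag.name for flag in Status if value & flag]


def irq_number_is_valid(irq_type, irq_number):
    """Check irq_number against the admissible range for irq_type."""
    if irq_type not in IRQ_NUMBER_RANGE:
        return False
    low, high = IRQ_NUMBER_RANGE[irq_type]
    return low <= irq_number <= high


if __name__ == "__main__":
    print(decode_status(0x41))
    print(hex(Command.READ | Command.RAISE_MSI_IRQ))
    print(irq_number_is_valid(IRQ_TYPE_MSI, 33))
```

A host-side tool could use such a decoder to turn a raw STATUS read into readable names before reporting a test result.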
-PCI TEST USERGUIDE
-Kishon Vijay Abraham I <kishon@ti.com>
+.. SPDX-License-Identifier: GPL-2.0
+===================
+PCI Test User Guide
+===================
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
 This document is a guide to help users use pci-epf-test function driver
 and pci_endpoint_test host driver for testing PCI. The list of steps to
 be followed in the host side and EP side is given below.
-1. Endpoint Device
+Endpoint Device
+===============
-1.1 Endpoint Controller Devices
+Endpoint Controller Devices
+---------------------------
-To find the list of endpoint controller devices in the system:
+To find the list of endpoint controller devices in the system::
 # ls /sys/class/pci_epc/
 51000000.pcie_ep
-If PCI_ENDPOINT_CONFIGFS is enabled
+If PCI_ENDPOINT_CONFIGFS is enabled::
 # ls /sys/kernel/config/pci_ep/controllers
 51000000.pcie_ep
-1.2 Endpoint Function Drivers
-To find the list of endpoint function drivers in the system:
+Endpoint Function Drivers
+-------------------------
+To find the list of endpoint function drivers in the system::
 # ls /sys/bus/pci-epf/drivers
 pci_epf_test
-If PCI_ENDPOINT_CONFIGFS is enabled
+If PCI_ENDPOINT_CONFIGFS is enabled::
 # ls /sys/kernel/config/pci_ep/functions
 pci_epf_test
-1.3 Creating pci-epf-test Device
+Creating pci-epf-test Device
+----------------------------
 PCI endpoint function device can be created using the configfs. To create
-pci-epf-test device, the following commands can be used
+pci-epf-test device, the following commands can be used::
 # mount -t configfs none /sys/kernel/config
 # cd /sys/kernel/config/pci_ep/
@@ -42,7 +55,7 @@ The "mkdir func1" above creates the pci-epf-test function device that will
 be probed by pci_epf_test driver.
 The PCI endpoint framework populates the directory with the following
-configurable fields.
+configurable fields::
 # ls functions/pci_epf_test/func1
 baseclass_code interrupt_pin progif_code subsys_id
...@@ -51,67 +64,83 @@ configurable fields. ...@@ -51,67 +64,83 @@ configurable fields.
The PCI endpoint function driver populates these entries with default values The PCI endpoint function driver populates these entries with default values
when the device is bound to the driver. The pci-epf-test driver populates when the device is bound to the driver. The pci-epf-test driver populates
vendorid with 0xffff and interrupt_pin with 0x0001 vendorid with 0xffff and interrupt_pin with 0x0001::
# cat functions/pci_epf_test/func1/vendorid # cat functions/pci_epf_test/func1/vendorid
0xffff 0xffff
# cat functions/pci_epf_test/func1/interrupt_pin # cat functions/pci_epf_test/func1/interrupt_pin
0x0001 0x0001

Configuring pci-epf-test Device
-------------------------------

The user can configure the pci-epf-test device using its configfs entries. To
change the vendorid, the deviceid and the number of MSI and MSI-X interrupts
used by the function device, the following commands can be used::

    # echo 0x104c > functions/pci_epf_test/func1/vendorid
    # echo 0xb500 > functions/pci_epf_test/func1/deviceid
    # echo 16 > functions/pci_epf_test/func1/msi_interrupts
    # echo 8 > functions/pci_epf_test/func1/msix_interrupts
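
The same attributes can also be written from a program instead of the shell.
The following is only a minimal sketch in C: ``write_attr()`` is a hypothetical
helper introduced here for illustration (it is not part of the kernel or of any
library), and the directory passed to it is assumed to be the function
directory created earlier.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel or library API): write a string value
 * to one attribute file below a configfs directory, for example
 * dir = "functions/pci_epf_test/func1" and attr = "vendorid".
 * Returns 0 on success and -1 on failure.
 */
static int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[512];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path would have been truncated */

    f = fopen(path, "w");
    if (!f)
        return -1;              /* attribute missing or not writable */

    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }

    /* The write may only reach the file when the stream is closed. */
    return fclose(f) == 0 ? 0 : -1;
}
```

With the current directory set to /sys/kernel/config/pci_ep/, a call such as
``write_attr("functions/pci_epf_test/func1", "vendorid", "0x104c")`` would
correspond to the first ``echo`` command above.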

Binding pci-epf-test Device to EP Controller
--------------------------------------------

For the endpoint function device to be useful, it has to be bound to a PCI
endpoint controller driver. Use configfs to bind the function device to one
of the controller drivers present in the system::

    # ln -s functions/pci_epf_test/func1 controllers/51000000.pcie_ep/

Once the above step is completed, the PCI endpoint is ready to establish a link
with the host.

Start the Link
--------------

In order for the endpoint device to establish a link with the host, the _start_
field should be populated with '1'::

    # echo 1 > controllers/51000000.pcie_ep/start

RootComplex Device
==================

lspci Output
------------

Note that the devices listed here correspond to the values populated in
"Configuring pci-epf-test Device" above::

    00:00.0 PCI bridge: Texas Instruments Device 8888 (rev 01)
    01:00.0 Unassigned class [ff00]: Texas Instruments Device b500

Using Endpoint Test function Device
-----------------------------------

pcitest.sh added in tools/pci/ can be used to run all the default PCI endpoint
tests. To compile this tool the following commands should be used::

    # cd <kernel-dir>
    # make -C tools/pci

or if you desire to compile and install in your system::

    # cd <kernel-dir>
    # make -C tools/pci install

The tool and script will be located in <rootfs>/usr/bin/

pcitest.sh Output
~~~~~~~~~~~~~~~~~
::

    # pcitest.sh
    BAR tests
......
.. SPDX-License-Identifier: GPL-2.0

=======================
Linux PCI Bus Subsystem
=======================

.. toctree::
   :maxdepth: 2
   :numbered:

   pci
   picebus-howto
   pci-iov-howto
   msi-howto
   acpi-info
   pci-error-recovery
   pcieaer-howto
   endpoint/index
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==========================
The MSI Driver Guide HOWTO
==========================

:Authors: Tom L Nguyen <tom.l.nguyen@intel.com>;
          Martine Silbermann <Martine.Silbermann@hp.com>;
          Matthew Wilcox <willy@linux.intel.com>
:Date: 10/03/2003
:Revised: Feb 12, 2004 by Martine Silbermann; Jun 25, 2004 by Tom L Nguyen;
          Jul 9, 2008 by Matthew Wilcox
:Copyright: 2003, 2008 Intel Corporation

About this guide
================

This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
...@@ -15,7 +18,8 @@ to change your driver to use MSI or MSI-X and some basic diagnostics to
try if a device doesn't support MSIs.


What are MSIs?
==============

A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.
...@@ -29,7 +33,8 @@ Devices may support both MSI and MSI-X, but only one can be enabled at
a time.


Why use MSIs?
=============

There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.
...@@ -61,14 +66,16 @@ Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.


How to use MSIs
===============

PCI devices are initialised to use pin-based interrupts. The device
driver has to set up the device to use MSI or MSI-X. Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.

Include kernel support for MSIs
-------------------------------

To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled. This option is only available on some architectures,
...@@ -76,14 +83,15 @@ and it may depend on some other options also being set. For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.

Using MSI
---------

Most of the hard work is done for the driver in the PCI layer. The driver
simply has to request that the PCI layer set up the MSI capability for this
device.

To automatically use MSI or MSI-X interrupt vectors, use the following
function::

    int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                              unsigned int max_vecs, unsigned int flags);
...@@ -101,12 +109,12 @@ any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.

To get the Linux IRQ number to pass to request_irq() and free_irq() for a
given vector, use the following function::

    int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

Any allocated resources should be freed before removing the device using
the following function::

    void pci_free_irq_vectors(struct pci_dev *dev);
...@@ -126,7 +134,7 @@ The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device. If nvec is
larger than the number supported by the device it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand::

    nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
    if (nvec < 0)
...@@ -135,7 +143,7 @@ vectors supported beforehand:
If a driver is unable or unwilling to deal with a variable number of MSI
interrupts it can request a particular number of interrupts by passing that
number to the pci_alloc_irq_vectors() function as both 'min_vecs' and
'max_vecs' parameters::

    ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
    if (ret < 0)
...@@ -143,23 +151,24 @@ number to pci_alloc_irq_vectors() function as both 'min_vecs' and
The most notorious example of the request type described above is enabling
the single MSI mode for a device. It could be done by passing two 1s as
'min_vecs' and 'max_vecs'::

    ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
    if (ret < 0)
        goto out_err;

Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable::

    nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
    if (nvec < 0)
        goto out_err;
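
The interaction between 'min_vecs', 'max_vecs' and the device limit can be
illustrated outside the kernel. The following is a user-space toy model, not
the kernel implementation: ``model_alloc_irq_vectors()`` and its ``supported``
parameter are hypothetical names introduced for this sketch, and the negative
error values it returns are illustrative choices only.

```c
#include <errno.h>

/*
 * Toy model of the request semantics described above (NOT kernel code).
 * 'supported' is a hypothetical stand-in for the number of vectors the
 * device and platform can provide.
 *
 * Returns the number of vectors granted, or a negative value on failure.
 */
static int model_alloc_irq_vectors(unsigned int supported,
                                   unsigned int min_vecs,
                                   unsigned int max_vecs)
{
    if (min_vecs == 0 || max_vecs < min_vecs)
        return -ERANGE;         /* nonsensical request (illustrative code) */

    if (supported < min_vecs)
        return -ENOSPC;         /* cannot satisfy the minimum */

    /* The request is capped to what the device supports. */
    return (int)(max_vecs < supported ? max_vecs : supported);
}
```

In this model a device supporting 8 vectors grants 8 for a request of 1 to 32,
grants exactly 1 for a request of 1 to 1, and fails a fixed request of 16 to
16, which mirrors the three usage patterns shown above.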

Legacy APIs
-----------

The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code::

    pci_enable_msi()            /* deprecated */
    pci_disable_msi()           /* deprecated */
...@@ -174,9 +183,11 @@ number of vectors. If you have a legitimate special use case for the count
of vectors we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.

Considerations when using MSIs
------------------------------

Spinlocks
~~~~~~~~~

Most device drivers have a per-device spinlock which is taken in the
interrupt handler. With pin-based interrupts or a single MSI, it is not
...@@ -188,7 +199,8 @@ acquire the spinlock. Such deadlocks can be avoided by using
spin_lock_irqsave() or spin_lock_irq() which disable local interrupts
and acquire the lock (see Documentation/kernel-hacking/locking.rst).

How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------

Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
...@@ -196,7 +208,8 @@ has an 'Enable' flag which is followed with either "+" (enabled)
or "-" (disabled).
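
The 'Enable' flag can also be checked programmatically on captured 'lspci -v'
output. The following is a minimal sketch in C, assuming capability lines of
the form ``Capabilities: [50] MSI: Enable+ ...``; that line layout is an
assumption about typical lspci output, and ``msi_cap_enabled()`` is a
hypothetical helper, not part of pciutils.

```c
#include <string.h>

/*
 * Hypothetical helper: inspect one line of 'lspci -v' output.
 * 'cap' is the capability name to look for, "MSI" or "MSI-X".
 *
 * Returns 1 if the capability is reported with "Enable+", 0 if it is
 * reported with "Enable-", and -1 if the line does not describe 'cap'.
 */
static int msi_cap_enabled(const char *line, const char *cap)
{
    char key[32];
    const char *p;
    size_t n = strlen(cap);

    if (n + 2 > sizeof(key))
        return -1;

    /* Match "MSI:" or "MSI-X:" exactly, so one is not taken for the other. */
    memcpy(key, cap, n);
    key[n] = ':';
    key[n + 1] = '\0';

    p = strstr(line, key);
    if (!p)
        return -1;

    p = strstr(p, "Enable");
    if (!p)
        return -1;

    p += strlen("Enable");
    if (*p == '+')
        return 1;
    if (*p == '-')
        return 0;
    return -1;
}
```

Since only one of MSI and MSI-X can be enabled at a time, a caller would
usually test each capability line of a device for both names.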


MSI quirks
==========

Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:
...@@ -205,7 +218,8 @@ The PCI stack provides three ways to disable MSIs:
2. on all devices behind a specific bridge
3. on a single device

Disabling MSIs globally
-----------------------

Some host chipsets simply don't support MSIs properly. If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
...@@ -219,7 +233,8 @@ on the kernel command line to disable MSIs on all devices. It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including a full 'lspci -v' so we can add the quirks to the kernel.

Disabling MSIs below a bridge
-----------------------------

Some PCI bridges are not able to route MSIs between busses properly.
In this case, MSIs must be disabled on all devices behind the bridge.
...@@ -230,7 +245,7 @@ as the nVidia nForce and Serverworks HT2000). As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable
MSIs in configuration space using whatever method you know works, then
enable MSIs on that bridge by doing::

    echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
...@@ -244,7 +259,8 @@ below this bridge.
Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.

Disabling MSIs on a single device
---------------------------------

Some devices are known to have faulty MSI implementations. Usually this
is handled in the individual device driver, but occasionally it's necessary
...@@ -252,7 +268,8 @@ to handle this with a quirk. Some drivers have an option to disable use
of MSI. While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.

Finding why MSIs are disabled on a device
-----------------------------------------

From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device. Your first step should
...@@ -260,8 +277,8 @@ be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine. You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.

Then, 'lspci -t' gives the list of bridges above a device. Reading
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
or disabled (0). If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.
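
The msi_bus check described above can be sketched in C as follows. This is a
minimal illustration only: ``read_msi_bus()`` and ``msi_blocked_by_bridge()``
are hypothetical helpers, and the caller is assumed to have already collected
the msi_bus paths of the bridges between the PCI root and the device (for
example from the 'lspci -t' tree).

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one msi_bus attribute file.
 * Returns 1 (MSIs enabled), 0 (MSIs disabled) or -1 on error.
 */
static int read_msi_bus(const char *path)
{
    FILE *f = fopen(path, "r");
    int val;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &val) != 1)
        val = -1;               /* file did not contain a number */
    fclose(f);

    return (val == 0 || val == 1) ? val : -1;
}

/*
 * Hypothetical helper: 'paths' holds the msi_bus files of every bridge
 * between the PCI root and the device of interest.
 * Returns 1 if any bridge reports 0 (MSIs are disabled for the device),
 * 0 if all bridges report 1, and -1 if a file could not be read and no
 * bridge reported 0.
 */
static int msi_blocked_by_bridge(const char *const *paths, size_t n)
{
    size_t i;
    int err = 0;

    for (i = 0; i < n; i++) {
        int v = read_msi_bus(paths[i]);

        if (v == 0)
            return 1;
        if (v < 0)
            err = 1;
    }
    return err ? -1 : 0;
}
```

A single 0 anywhere on the path is enough to return 1, which matches the rule
stated above.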
......
.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>
:Date: February 2, 2006; updated 27-Jul-2009 by Richard Lary and Mike Mason

Many PCI bus controllers are able to detect a variety of hardware
...@@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.
...@@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent. The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

    struct pci_error_handlers
    {
        int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
        int (*mmio_enabled)(struct pci_dev *dev);
        int (*slot_reset)(struct pci_dev *dev);
        void (*resume)(struct pci_dev *dev);
    };

The possible channel states are::

    enum pci_channel_state {
        pci_channel_io_normal,       /* I/O channel is in normal state */
        pci_channel_io_frozen,       /* I/O to channel is blocked */
        pci_channel_io_perm_failure, /* PCI card is dead */
    };

Possible return values are::

    enum pci_ers_result {
        PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
        PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
        PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
        PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
        PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
    };

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
...@@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
All drivers participating in this system must implement this call.
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled, below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.
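
One way a driver might choose among these result codes is to base the decision
on the reported channel state. The following is a user-space sketch only: the
two enums are local copies of the listings above so that the example compiles
on its own, ``example_error_detected()`` is a hypothetical function that omits
the ``struct pci_dev *`` argument of the real callback, and the mapping it
implements is an assumed policy rather than a rule stated in this document.

```c
/* Local copies of the enums listed above, for a stand-alone example. */
enum pci_channel_state {
    pci_channel_io_normal,       /* I/O channel is in normal state */
    pci_channel_io_frozen,       /* I/O to channel is blocked */
    pci_channel_io_perm_failure, /* PCI card is dead */
};

enum pci_ers_result {
    PCI_ERS_RESULT_NONE,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED,
};

/*
 * Hypothetical error_detected() policy (assumption, not kernel code):
 * - a permanently failed channel cannot be recovered, so disconnect;
 * - a frozen channel blocks I/O, so ask for a slot reset;
 * - otherwise MMIO still works, so try to recover without a reset.
 */
static enum pci_ers_result
example_error_detected(enum pci_channel_state state)
{
    switch (state) {
    case pci_channel_io_perm_failure:
        return PCI_ERS_RESULT_DISCONNECT;
    case pci_channel_io_frozen:
        return PCI_ERS_RESULT_NEED_RESET;
    default:
        return PCI_ERS_RESULT_CAN_RECOVER;
    }
}
```

A real driver would also stop issuing new I/O before returning, as described
for this callback above.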

The next step taken will depend on the result codes returned by the
drivers.
...@@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.) This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not. I/O's will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog. A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------

The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.
...@@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/O's should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:
  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure, no recovery even after
      reset; driver dead. (To be defined more precisely)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
...@@ -293,31 +303,33 @@ device will be considered "dead" in this case.
Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53cxx2
driver performs device init only from PCI function 0::

    +       if (PCI_FUNC(pdev->devfn) == 0)
    +               sym_reset_scsi_bus(np, 0);

Result codes:
  - PCI_ERS_RESULT_DISCONNECT
    Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

    +       /* Set EEH reset type to fundamental if required by hba */
    +       if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
    +               pdev->needs_freset = 1;
    +

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.
STEP 5: Resume Operations STEP 5: Resume Operations
@@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NOTHANDLED. It's up to the platform to deal with that
   condition, typically by masking the IRQ source during the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)
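
As a rough user-space sketch of the second rule above, the handler below declines the interrupt while its device is in an error state. The ``struct demo_dev`` type, its ``in_error`` flag, ``demo_irq_handler`` and the ``DEMO_IRQ_*`` values are hypothetical stand-ins so that the fragment compiles outside the kernel; a real driver would use the kernel's own interrupt return type and would more likely test something like ``pci_channel_offline(pdev)``.

```c
#include <stdbool.h>

/* Local mock-ups of "not handled" / "handled" return values (assumption). */
enum demo_irqreturn { DEMO_IRQ_NONE = 0, DEMO_IRQ_HANDLED = 1 };

/* Hypothetical per-device state kept by a driver. */
struct demo_dev {
	bool in_error;        /* set on error detection, cleared after recovery */
	unsigned int handled; /* number of interrupts actually serviced */
};

/*
 * Interrupt handler sketch: while the device is in an error state the
 * interrupt cannot be acked safely, so report "not handled" and leave it
 * to the platform to mask the IRQ source.
 */
static enum demo_irqreturn demo_irq_handler(struct demo_dev *dev)
{
	if (dev->in_error)
		return DEMO_IRQ_NONE;

	dev->handled++;
	return DEMO_IRQ_HANDLED;
}
```

The point of the design is that the handler never touches the device once an error is known; everything else is left to the platform.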
.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.txt

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c
   - drivers/net/qlge
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

====================================
PCI Express I/O Virtualization Howto
====================================

:Copyright: |copy| 2009 Intel Corporation
:Authors: - Yu Zhao <yu.zhao@intel.com>
          - Donald Dutile <ddutile@redhat.com>

Overview
========

What is SR-IOV
--------------

Single Root I/O Virtualization (SR-IOV) is a PCI Express Extended
capability which makes one physical device appear as multiple virtual
@@ -23,9 +28,11 @@ Memory Space, which is used to map its register set. VF device driver
operates on the register set so it can be functional and appear as a
real existing PCI device.

User Guide
==========

How can I enable SR-IOV capability
----------------------------------

Multiple methods are available for SR-IOV enablement.
In the first method, the device driver (PF driver) will control the
@@ -43,105 +50,123 @@ checks, e.g., check numvfs == 0 if enabling VFs, ensure
numvfs <= totalvfs.
The second method is the recommended method for new/future VF devices.
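
The sanity checks mentioned above can be sketched in plain C as follows. This is a user-space illustration, not the kernel's sysfs code: ``check_numvfs_request`` and its parameters are hypothetical names, and the choice of ``-ERANGE`` and ``-EBUSY`` as error codes is an assumption.

```c
#include <errno.h>

/*
 * Validate a request to change the number of enabled VFs.
 *
 * current_vfs:   VFs enabled right now
 * requested_vfs: value written by the user
 * total_vfs:     maximum the device supports
 *
 * Returns 0 if the request may proceed, or a negative errno value.
 */
static int check_numvfs_request(unsigned int current_vfs,
				unsigned int requested_vfs,
				unsigned int total_vfs)
{
	/* Never enable more VFs than the device advertises. */
	if (requested_vfs > total_vfs)
		return -ERANGE;

	/* Disabling (writing 0) is always acceptable. */
	if (requested_vfs == 0)
		return 0;

	/* Enabling requires that no VFs are currently enabled. */
	if (current_vfs != 0)
		return -EBUSY;

	return 0;
}
```

With this sketch, changing from 4 VFs to 2 has to go through 0 first: the direct request is rejected because VFs are still enabled.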
How can I use the Virtual Functions
-----------------------------------

VFs are treated as hot-plugged PCI devices in the kernel, so they
should be able to work in the same way as real PCI devices. A VF
requires a device driver, just as a normal PCI device does.

Developer Guide
===============

SR-IOV API
----------

To enable SR-IOV capability:

(a) For the first method, in the driver::

        int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);

    'nr_virtfn' is the number of VFs to be enabled.

(b) For the second method, from sysfs::

        echo 'nr_virtfn' > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs

To disable SR-IOV capability:

(a) For the first method, in the driver::

        void pci_disable_sriov(struct pci_dev *dev);

(b) For the second method, from sysfs::

        echo 0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_numvfs

To enable auto probing VFs by a compatible driver on the host, run the
command below before enabling SR-IOV capabilities. This is the
default behavior.
::

        echo 1 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe

To disable auto probing VFs by a compatible driver on the host, run the
command below before enabling SR-IOV capabilities. Updating this
entry will not affect VFs which are already probed.
::

        echo 0 > \
        /sys/bus/pci/devices/<DOMAIN:BUS:DEVICE.FUNCTION>/sriov_drivers_autoprobe
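
The ``<DOMAIN:BUS:DEVICE.FUNCTION>`` placeholder in the paths above is the device's PCI address. As a small illustration, the hypothetical helper ``sriov_attr_path`` below builds the full sysfs path for a given address and attribute name; the helper name and its parameters are assumptions made for this sketch and are not part of any kernel or library API.

```c
#include <stdio.h>

/*
 * Build "/sys/bus/pci/devices/DDDD:BB:DD.F/<attr>" into buf.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
static int sriov_attr_path(char *buf, size_t len,
			   unsigned int domain, unsigned int bus,
			   unsigned int device, unsigned int function,
			   const char *attr)
{
	int n = snprintf(buf, len,
			 "/sys/bus/pci/devices/%04x:%02x:%02x.%x/%s",
			 domain, bus, device, function, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For domain 0, bus 3, device 0, function 0 and the attribute ``sriov_numvfs``, the helper fills the buffer with ``/sys/bus/pci/devices/0000:03:00.0/sriov_numvfs``.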
Usage example
-------------

The following piece of code illustrates the usage of the SR-IOV API.
::

        static int dev_probe(struct pci_dev *dev, const struct pci_device_id *id)
        {
                pci_enable_sriov(dev, NR_VIRTFN);

                ...

                return 0;
        }

        static void dev_remove(struct pci_dev *dev)
        {
                pci_disable_sriov(dev);

                ...
        }

        static int dev_suspend(struct pci_dev *dev, pm_message_t state)
        {
                ...

                return 0;
        }

        static int dev_resume(struct pci_dev *dev)
        {
                ...

                return 0;
        }

        static void dev_shutdown(struct pci_dev *dev)
        {
                ...
        }

        static int dev_sriov_configure(struct pci_dev *dev, int numvfs)
        {
                if (numvfs > 0) {
                        ...
                        pci_enable_sriov(dev, numvfs);
                        ...
                        return numvfs;
                }
                if (numvfs == 0) {
                        ...
                        pci_disable_sriov(dev);
                        ...
                        return 0;
                }

                /* negative counts are not valid */
                return -EINVAL;
        }

        static struct pci_driver dev_driver = {
                .name =         "SR-IOV Physical Function driver",
                .id_table =     dev_id_table,
                .probe =        dev_probe,
                .remove =       dev_remove,
                .suspend =      dev_suspend,
                .resume =       dev_resume,
                .shutdown =     dev_shutdown,
                .sriov_configure = dev_sriov_configure,
        };
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================================
The PCI Express Advanced Error Reporting Driver Guide HOWTO
===========================================================

:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
          - Yanmin Zhang <yanmin.zhang@intel.com>

:Copyright: |copy| 2006 Intel Corporation

Overview
========

About this guide
----------------

This guide describes the basics of the PCI Express Advanced Error
Reporting (AER) driver and provides information on how to use it, as
well as how to enable the drivers of endpoint devices to conform with
PCI Express AER driver.


What is the PCI Express AER Driver?
-----------------------------------

PCI Express error signaling can occur on the PCI Express link itself
or on behalf of transactions initiated on the link. PCI Express
@@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI
Express Advanced Error Reporting capability. The PCI Express AER
driver provides three basic functions:

  - Gathers the comprehensive error information if errors occurred.
  - Reports error to the users.
  - Performs error recovery actions.

The AER driver only attaches to root ports which support the PCI Express
AER capability.


User Guide
==========

Include the PCI Express AER Root Driver into the Linux Kernel
-------------------------------------------------------------

The PCI Express AER Root driver is a Root Port service driver attached
to the PCI Express Port Bus driver. If a user wants to use it, the driver
@@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It
depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
CONFIG_PCIEAER=y.

Load PCI Express AER Root Driver
--------------------------------

Some systems have AER support in firmware. Enabling Linux AER support at
the same time the firmware handles AER may result in unpredictable
@@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware
grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
Specification for details regarding _OSC usage.

AER error output
----------------

When a PCIe AER error is captured, an error message will be output to
console. If it's a correctable error, it is output as a warning.
Otherwise, it is printed as an error. So users could choose different
log level to filter out correctable error messages.

Below shows an example::

  0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
  0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
  0000:50:00.0: [20] Unsupported Request (First)
  0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100

In the example, 'Requester ID' means the ID of the device that sends
the error message to the root port. Please refer to the PCI Express
specifications for the other fields.
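
To make the ``id=0500`` field and the ``[20]`` bit index in the example more concrete, here is a small sketch of how such values decode. A Requester ID is a 16-bit bus/device/function number (8 bits of bus, 5 bits of device, 3 bits of function). The struct and function names are hypothetical and are not taken from the AER driver.

```c
#include <stdint.h>

/* Decoded bus/device/function triple. */
struct demo_bdf {
	uint8_t bus;
	uint8_t device;
	uint8_t function;
};

/* Split a 16-bit Requester ID into bus[15:8], device[7:3], function[2:0]. */
static struct demo_bdf decode_requester_id(uint16_t id)
{
	struct demo_bdf bdf = {
		.bus = (uint8_t)(id >> 8),
		.device = (uint8_t)((id >> 3) & 0x1f),
		.function = (uint8_t)(id & 0x7),
	};

	return bdf;
}

/* Return the index of the lowest set bit in an error status word, or -1. */
static int first_status_bit(uint32_t status)
{
	for (int bit = 0; bit < 32; bit++)
		if (status & (UINT32_C(1) << bit))
			return bit;

	return -1;
}
```

With the values from the example log, ``decode_requester_id(0x0500)`` gives bus 0x05, device 0x00, function 0, and ``first_status_bit(0x00100000)`` gives 20, the bit index shown on the ``Unsupported Request`` line.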
AER Statistics / Counters
-------------------------

When PCIe AER errors are captured, the counters / statistics are also exposed
in the form of sysfs attributes which are documented at
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats

Developer Guide
===============

Enabling AER-aware support requires a software driver to configure
the AER capability structure within its device and to provide callbacks.
@@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific
errors because device specific errors will still get sent directly to
the device driver.

Configure the AER capability structure
--------------------------------------

AER-aware drivers of PCI Express components need to change the device
control registers to enable AER. They also could change AER registers,
@@ -128,9 +144,11 @@ including mask and severity registers. Helper function
pci_enable_pcie_error_reporting could be used to enable AER. See
section 3.3.

Provide callbacks
-----------------

callback reset_link to reset pci express link
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This callback is used to reset the pci express physical link when a
fatal error happens. The root port aer service driver provides a
@@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions.
In struct pcie_port_service_driver, a new pointer, reset_link, is
added.
::

        pci_ers_result_t (*reset_link) (struct pci_dev *dev);

Section 3.2.2.2 provides more detailed info on when to call
reset_link.

PCI error-recovery callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI Express AER Root driver uses error callbacks to coordinate
with downstream device drivers associated with a hierarchy in question
@@ -161,7 +181,8 @@ definitions of the callbacks.
The sections below specify when to call the error callback functions.

Correctable errors
~~~~~~~~~~~~~~~~~~

Correctable errors have no impact on the functionality of
the interface. The PCI Express protocol can recover without any
@@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.

Non-correctable (non-fatal and fatal) errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If an error message indicates a non-fatal error, performing link reset
at upstream is not required. The AER driver calls error_detected(dev,
pci_channel_io_normal) to all drivers associated within a hierarchy in
question. For example::

        EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort

If Upstream port A captures an AER error, the hierarchy consists of
Downstream port B and EndPoint.
@@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.

helper functions
----------------

::

  int pci_enable_pcie_error_reporting(struct pci_dev *dev);

pci_enable_pcie_error_reporting enables the device to send error
messages to the root port when an error is detected. Note that devices
don't enable the error reporting by default, so device drivers need
to call this function to enable it.

::

  int pci_disable_pcie_error_reporting(struct pci_dev *dev);

pci_disable_pcie_error_reporting stops the device from sending error
messages to the root port when an error is detected.

::

  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);

pci_cleanup_aer_uncorrect_error_status cleans up the uncorrectable
error status register.

Frequently Asked Questions
--------------------------

Q:
  What happens if a PCI Express device driver does not provide an
  error recovery handler (pci_driver->err_handler is equal to NULL)?

A:
  The devices attached to the driver won't be recovered. If the
  error is fatal, the kernel will print out warning messages. Please refer
  to section 3 for more information.

Q:
  What happens if an upstream port service driver does not provide
  callback reset_link?

A:
  Fatal error recovery will fail if the errors are reported by the
  upstream ports that the service driver is attached to.

Q:
  How does this infrastructure deal with a driver that is not PCI
  Express aware?

A:
  This infrastructure calls the error callback functions of the
  driver when an error happens. But if the driver is not aware of
  PCI Express, the device might not report its own errors to root
  port.

Q:
  What modifications will that driver need to make it compatible
  with the PCI Express AER Root driver?

A:
  It could call the helper functions to enable AER in devices and
  clean up the uncorrectable status register. Please refer to section 3.3.

Software error injection
========================

Debugging PCIe AER error recovery code is quite difficult because it
is hard to trigger real hardware errors. Software based error
@@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named
Then, you need a user space tool named aer-inject, which can be obtained
from:

    https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/

More information about aer-inject can be found in the document comes
......
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================
The PCI Express Port Bus Driver Guide HOWTO
===========================================

:Author: Tom L Nguyen tom.l.nguyen@intel.com 11/03/2004
:Copyright: |copy| 2004 Intel Corporation

About this guide
================

This guide describes the basics of the PCI Express Port Bus driver
and provides information on how to enable the service drivers to
register/unregister with the PCI Express Port Bus Driver.


What is the PCI Express Port Bus Driver
=======================================

A PCI Express Port is a logical PCI-PCI Bridge structure. There
are two types of PCI Express Port: the Root Port and the Switch
@@ -30,7 +37,8 @@ support (AER), and virtual channel support (VC). These services may
be handled by a single complex driver or be individually distributed
and handled by corresponding service drivers.

Why use the PCI Express Port Bus Driver?
========================================

In existing Linux kernels, the Linux Device Driver Model allows a
physical device to be handled by only a single driver. The PCI
@@ -51,28 +59,31 @@ PCI Express Ports and distributes all provided service requests
to the corresponding service drivers as required. Some key
advantages of using the PCI Express Port Bus driver are listed below:

  - Allow multiple service drivers to run simultaneously on
    a PCI-PCI Bridge Port device.

  - Allow service drivers to be implemented in an independent,
    staged approach.

  - Allow one service driver to run on multiple PCI-PCI Bridge
    Port devices.

  - Manage and distribute resources of a PCI-PCI Bridge Port
    device to requested service drivers.

Configuring the PCI Express Port Bus Driver vs. Service Drivers
===============================================================

Including the PCI Express Port Bus Driver Support into the Kernel
-----------------------------------------------------------------

Including the PCI Express Port Bus driver depends on whether the PCI
Express support is included in the kernel config. The kernel will
automatically include the PCI Express Port Bus driver as a kernel
driver when the PCI Express support is enabled in the kernel.

Enabling Service Driver Support
-------------------------------

PCI device drivers are implemented based on Linux Device Driver Model.
All service drivers are PCI device drivers. As discussed above, it is
@@ -89,9 +100,11 @@ header file /include/linux/pcieport_if.h, before calling these APIs.
Failure to do so will result in an identity mismatch, which prevents
the PCI Express Port Bus driver from loading a service driver.


pcie_port_service_register
~~~~~~~~~~~~~~~~~~~~~~~~~~
::

  int pcie_port_service_register(struct pcie_port_service_driver *new)

This API replaces the Linux Driver Model's pci_register_driver API. A
service driver should always call pcie_port_service_register at
@@ -99,69 +112,76 @@ module init. Note that after the service driver is loaded, calls
such as pci_enable_device(dev) and pci_set_master(dev) are no longer
necessary since these calls are executed by the PCI Port Bus driver.

pcie_port_service_unregister
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
::

  void pcie_port_service_unregister(struct pcie_port_service_driver *new)

pcie_port_service_unregister replaces the Linux Driver Model's
pci_unregister_driver. It is always called by the service driver when a
module exits.

Sample Code
~~~~~~~~~~~

Below is sample service driver code to initialize the port service
driver data structure.
::

  static struct pcie_port_service_id service_id[] = { {
    .vendor = PCI_ANY_ID,
    .device = PCI_ANY_ID,
    .port_type = PCIE_RC_PORT,
    .service_type = PCIE_PORT_SERVICE_AER,
    }, { /* end: all zeroes */ }
  };

  static struct pcie_port_service_driver root_aerdrv = {
    .name		= (char *)device_name,
    .id_table	= &service_id[0],

    .probe		= aerdrv_load,
    .remove		= aerdrv_unload,

    .suspend	= aerdrv_suspend,
    .resume		= aerdrv_resume,
  };

Below is sample code for registering and unregistering a service
driver.
::

  static int __init aerdrv_service_init(void)
  {
    int retval = 0;

    retval = pcie_port_service_register(&root_aerdrv);
    if (!retval) {
      /*
      * FIX ME
      */
    }
    return retval;
  }

  static void __exit aerdrv_service_exit(void)
  {
    pcie_port_service_unregister(&root_aerdrv);
  }

  module_init(aerdrv_service_init);
  module_exit(aerdrv_service_exit);

Possible Resource Conflicts
===========================

Since all service drivers of a PCI-PCI Bridge Port device are
allowed to run simultaneously, the following lists a few possible resource
conflicts along with proposed solutions.

MSI and MSI-X Vector Resource
-----------------------------

Once MSI or MSI-X interrupts are enabled on a device, it stays in this
mode until they are disabled again. Since service drivers of the same
@@ -179,7 +199,8 @@ driver. Service drivers should use (struct pcie_device*)dev->irq to
call request_irq/free_irq. In addition, the interrupt mode is stored
in the field interrupt_mode of struct pcie_device.

PCI Memory/IO Mapped Regions
----------------------------

Service drivers for PCI Express Power Management (PME), Advanced
Error Reporting (AER), Hot-Plug (HP) and Virtual Channel (VC) access
@@ -188,7 +209,8 @@ registers accessed are independent of each other. This patch assumes
that all service drivers will be well behaved and not overwrite
other service drivers' configuration settings.

PCI Config Registers
--------------------

Each service driver runs its PCI config operations on its own
capability structure except the PCI Express capability structure, in
......
@@ -13,7 +13,7 @@
			For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
			are available
			See also Documentation/power/runtime_pm.rst, pci=noacpi
	acpi_apic_instance=	[ACPI, IOAPIC]
			Format: <int>
@@ -223,7 +223,7 @@
	acpi_sleep=	[HW,ACPI] Sleep options
			Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
				  old_ordering, nonvs, sci_force_enable, nobl }
			See Documentation/power/video.rst for information on
			s3_bios and s3_mode.
			s3_beep is for debugging; it makes the PC's speaker beep
			as soon as the kernel's real-mode entry point is called.
@@ -4119,7 +4119,7 @@
			Specify the offset from the beginning of the partition
			given by "resume=" at which the swap header is located,
			in <PAGE_SIZE> units (needed only for swap files).
			See  Documentation/power/swsusp-and-swap-files.rst
	resumedelay=	[HIBERNATION] Delay (in seconds) to pause before attempting to
			read the resume files
......
@@ -95,7 +95,7 @@ flags - flags of the cpufreq driver
3. CPUFreq Table Generation with Operating Performance Point (OPP)
==================================================================

For details about OPP, see Documentation/power/opp.rst

dev_pm_opp_init_cpufreq_table -
	This function provides a ready to use conversion routine to translate
......
@@ -10,8 +10,10 @@ Required properties:
	interrupt source. The value must be 1.
- compatible: Should contain "mbvl,gpex40-pcie"
- reg: Should contain PCIe registers location and length
	Mandatory:
	"config_axi_slave": PCIe controller registers
	"csr_axi_slave"	  : Bridge config registers
	Optional:
	"gpio_slave"	  : GPIO registers to control slot power
	"apb_csr"	  : MSI registers
......
@@ -65,6 +65,14 @@ Required properties:
  - afi
  - pcie_x

Optional properties:
- pinctrl-names: A list of pinctrl state names. Must contain the following
  entries:
  - "default": active state, puts PCIe I/O out of deep power down state
  - "idle": puts PCIe I/O into deep power down state
- pinctrl-0: phandle for the default/active state of pin configurations.
- pinctrl-1: phandle for the idle state of pin configurations.

Required properties on Tegra124 and later (deprecated):
- phys: Must contain an entry for each entry in phy-names.
- phy-names: Must include the following entries:
......
@@ -24,6 +24,9 @@ driver implementation may support the following properties:
   unsupported link speed, for instance, trying to do training for
   unsupported link speed, etc.  Must be '4' for gen4, '3' for gen3, '2'
   for gen2, and '1' for gen1. Any other values are invalid.
- reset-gpios:
   If present this property specifies PERST# GPIO. Host drivers can parse the
   GPIO and apply fundamental reset to endpoints.

PCI-PCI Bridge properties
-------------------------
......
@@ -10,6 +10,7 @@
			- "qcom,pcie-msm8996" for msm8996 or apq8096
			- "qcom,pcie-ipq4019" for ipq4019
			- "qcom,pcie-ipq8074" for ipq8074
			- "qcom,pcie-qcs404" for qcs404

- reg:
	Usage: required
@@ -116,6 +117,15 @@
			- "ahb"		AHB clock
			- "aux"		Auxiliary clock

- clock-names:
	Usage: required for qcs404
	Value type: <stringlist>
	Definition: Should contain the following entries
			- "iface"	AHB clock
			- "aux"		Auxiliary clock
			- "master_bus"	AXI Master clock
			- "slave_bus"	AXI Slave clock

- resets:
	Usage: required
	Value type: <prop-encoded-array>
@@ -167,6 +177,17 @@
			- "ahb"			AHB Reset
			- "axi_m_sticky"	AXI Master Sticky reset

- reset-names:
	Usage: required for qcs404
	Value type: <stringlist>
	Definition: Should contain the following entries
			- "axi_m"		AXI Master reset
			- "axi_s"		AXI Slave reset
			- "axi_m_sticky"	AXI Master Sticky reset
			- "pipe_sticky"		PIPE sticky reset
			- "pwr"			PWR reset
			- "ahb"			AHB reset

- power-domains:
	Usage: required for apq8084 and msm8996/apq8096
	Value type: <prop-encoded-array>
@@ -195,12 +216,12 @@
	Definition: A phandle to the PCIe endpoint power supply

- phys:
	Usage: required for apq8084 and qcs404
	Value type: <phandle>
	Definition: List of phandle(s) as listed in phy-names property

- phy-names:
	Usage: required for apq8084 and qcs404
	Value type: <stringlist>
	Definition: Should contain "pciephy"
......
@@ -3,6 +3,7 @@
Required properties:
compatible: "renesas,pcie-r8a7743" for the R8A7743 SoC;
	    "renesas,pcie-r8a7744" for the R8A7744 SoC;
	    "renesas,pcie-r8a774a1" for the R8A774A1 SoC;
	    "renesas,pcie-r8a774c0" for the R8A774C0 SoC;
	    "renesas,pcie-r8a7779" for the R8A7779 SoC;
	    "renesas,pcie-r8a7790" for the R8A7790 SoC;
......
@@ -225,7 +225,7 @@ system-wide transition to a sleep state even though its :c:member:`runtime_auto`
flag is clear.

For more information about the runtime power management framework, refer to
:file:`Documentation/power/runtime_pm.rst`.

Calling Drivers to Enter and Leave System Sleep States
@@ -728,7 +728,7 @@ it into account in any way.
Devices may be defined as IRQ-safe which indicates to the PM core that their
runtime PM callbacks may be invoked with disabled interrupts (see
:file:`Documentation/power/runtime_pm.rst` for more information).  If an
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
disallowed, unless the domain itself is defined as IRQ-safe. However, it
makes sense to define a PM domain as IRQ-safe only if all the devices in it
@@ -795,7 +795,7 @@ so on) and the final state of the device must reflect the "active" runtime PM
status in that case.

During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
[Refer to that document for more information regarding this particular issue as
well as for information on the device runtime power management framework in
general.]
......
@@ -46,7 +46,7 @@ device is turned off while the system as a whole remains running, we
call it a "dynamic suspend" (also known as a "runtime suspend" or
"selective suspend").  This document concentrates mostly on how
dynamic PM is implemented in the USB subsystem, although system PM is
covered to some extent (see ``Documentation/power/*.rst`` for more
information about system PM).

System PM support is present only if the kernel was built with
......
@@ -103,6 +103,7 @@ needed).
   vm/index
   bpf/index
   usb/index
   PCI/index
   misc-devices/index

Architecture-specific documentation
......
============
APM or ACPI?
============

If you have a relatively recent x86 mobile, desktop, or server system,
odds are it supports either Advanced Power Management (APM) or
Advanced Configuration and Power Interface (ACPI).  ACPI is the newer
@@ -28,5 +30,7 @@ and be sure that they are started sometime in the system boot process.
Go ahead and start both.  If ACPI or APM is not available on your
system the associated daemon will exit gracefully.

  =====  =======================================
  apmd   http://ftp.debian.org/pool/main/a/apmd/
  acpid  http://acpid.sf.net/
  =====  =======================================
=================================
Debugging hibernation and suspend
=================================

	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)
===================================================

To check if hibernation works, you can try to hibernate in the "reboot" mode::

	# echo reboot > /sys/power/disk
	# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition.  If that happens,
@@ -15,20 +19,21 @@ test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine
fails to hibernate or resume in the "reboot" mode, you should try the
"platform" mode::

	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work::

	# echo shutdown > /sys/power/disk
	# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).
@@ -37,6 +42,7 @@ If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation
----------------------------

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
@@ -44,36 +50,38 @@ there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
	- test the freezing of processes

devices
	- test the freezing of processes and suspending of devices

platform
	- test the freezing of processes, suspending of devices and platform
	  global control methods [1]_

processors
	- test the freezing of processes, suspending of devices, platform
	  global control methods [1]_ and the disabling of nonboot CPUs

core
	- test the freezing of processes, suspending of devices, platform global
	  control methods\ [1]_, the disabling of nonboot CPUs and suspending
	  of platform/system devices

.. [1]

    the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending devices) and issue the standard hibernation commands.  For example,
to use the "devices" test mode along with the "platform" mode of hibernation,
you should do the following::

	# echo devices > /sys/power/pm_test
	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
@@ -108,11 +116,12 @@ If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:

- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.
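
The two rules above amount to an ordinary bisection. The sketch below models
it in userspace C under stated assumptions: exactly one driver is at fault,
the drivers are kept in a fixed order, the test passes with none of them
loaded, and it fails with all of them loaded. The names ``pm_test_fn``,
``find_failing_driver`` and ``fake_devices_test`` are hypothetical; a real run
would replace the callback with an actual "devices" test cycle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: report whether the "devices" test passes when only
 * the first n_loaded drivers (in some fixed order) are loaded.
 */
typedef bool (*pm_test_fn)(size_t n_loaded, void *ctx);

/*
 * Bisect for a single failing driver. Assumes n_drivers >= 1, that the test
 * passes with no drivers loaded and fails with all n_drivers loaded.
 * Returns the index of the driver whose loading makes the test fail.
 */
static size_t find_failing_driver(size_t n_drivers, pm_test_fn test_passes,
				  void *ctx)
{
	size_t good = 0;		/* largest prefix known to pass */
	size_t bad = n_drivers;		/* smallest prefix known to fail */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (test_passes(mid, ctx))
			good = mid;	/* success: load half of what was unloaded */
		else
			bad = mid;	/* failure: unload half of the suspects */
	}
	return bad - 1;
}

/* Stand-in for a real test run: fails once the culprit driver is loaded. */
static bool fake_devices_test(size_t n_loaded, void *ctx)
{
	size_t culprit = *(const size_t *)ctx;

	return n_loaded <= culprit;
}
```

With N drivers this needs about log2(N) test cycles, which matters because
each cycle may involve a reboot. If more than one driver is failing, the search
has to be repeated after the first culprit is removed.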

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
@@ -146,6 +155,7 @@ indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration
--------------------------------

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
@@ -165,14 +175,15 @@ Again, if you find the offending module(s), it(they) must be unloaded every time
before hibernation, and please report the problem with it(them).

c) Using the "test_resume" hibernation option
---------------------------------------------

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image.  One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration.  Namely,
after doing::

	# echo test_resume > /sys/power/disk
	# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.
@@ -190,6 +201,7 @@ to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging
---------------------

If hibernation does not work on your system even in the minimal
configuration and compiling more drivers as modules is not practical or some
@@ -200,9 +212,10 @@ kernel messages using the serial console.  This may provide you with some
information about the reasons for the suspend (resume) failure.  Alternatively,
it may be possible to use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst.

2. Testing suspend to RAM (STR)
===============================

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
@@ -230,7 +243,8 @@ you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output::

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
@@ -248,6 +262,7 @@ example of its output.
				-16
	  last_failed_step:	suspend
				suspend

The success field is the number of successful suspend-to-RAM transitions, and
the fail field is the number of failed ones. The other fields count failures in
the individual steps of suspend to RAM. suspend_stats lists only the last 2
failed devices, error numbers and
......
===============
Charger Manager Charger Manager
===============
(C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL (C) 2011 MyungJoo Ham <myungjoo.ham@samsung.com>, GPL
Charger Manager provides in-kernel battery charger management that Charger Manager provides in-kernel battery charger management that
...@@ -55,41 +58,39 @@ Charger Manager supports the following: ...@@ -55,41 +58,39 @@ Charger Manager supports the following:
notification to users with UEVENT. notification to users with UEVENT.
2. Global Charger-Manager Data related with suspend_again 2. Global Charger-Manager Data related with suspend_again
======================================================== =========================================================
In order to setup Charger Manager with suspend-again feature In order to setup Charger Manager with suspend-again feature
(in-suspend monitoring), the user should provide charger_global_desc (in-suspend monitoring), the user should provide charger_global_desc
with setup_charger_manager(struct charger_global_desc *). with setup_charger_manager(`struct charger_global_desc *`).
This charger_global_desc data for in-suspend monitoring is global This charger_global_desc data for in-suspend monitoring is global
as the name suggests. Thus, the user needs to provide only once even as the name suggests. Thus, the user needs to provide only once even
if there are multiple batteries. If there are multiple batteries, the if there are multiple batteries. If there are multiple batteries, the
multiple instances of Charger Manager share the same charger_global_desc multiple instances of Charger Manager share the same charger_global_desc
and it will manage in-suspend monitoring for all instances of Charger Manager. and it will manage in-suspend monitoring for all instances of Charger Manager.
The user needs to provide all the three entries properly in order to activate The user needs to provide all the three entries to `struct charger_global_desc`
in-suspend monitoring: properly in order to activate in-suspend monitoring:

`char *rtc_name;`
    The name of the rtc (e.g., "rtc0") used to wake up the system from
    suspend for Charger Manager. The alarm interrupt (AIE) of the rtc
    should be able to wake up the system from suspend. Charger Manager
    saves and restores the alarm value and uses the previously-defined
    alarm if it is going to go off earlier than Charger Manager's, so that
    Charger Manager does not interfere with previously-defined alarms.

`bool (*rtc_only_wakeup)(void);`
    This callback should let CM know whether
    the wakeup-from-suspend is caused only by the alarm of "rtc" in the
    same struct. If any other wakeup source triggered the
    wakeup, it should return false. If the "rtc" is the only wakeup
    reason, it should return true.

`bool assume_timer_stops_in_suspend;`
    If true, Charger Manager assumes that
    the timer (CM uses jiffies as timer) stops during suspend. Then, CM
    assumes that the suspend duration is the same as the alarm length.

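As a rough illustration of the three entries described above, the sketch
below fills in a descriptor the way board code might. It is written as
stand-alone C so that it compiles outside the kernel: the struct is a local
stand-in that mirrors only the three fields listed here, and
board_rtc_only_wakeup(), board_other_wakeup_pending and
cm_global_desc_is_complete() are hypothetical names used for illustration.
In a real board file the descriptor would be handed to
setup_charger_manager().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in mirroring the three documented fields (not the kernel header). */
struct charger_global_desc {
	char *rtc_name;
	bool (*rtc_only_wakeup)(void);
	bool assume_timer_stops_in_suspend;
};

/* Hypothetical board state: set when a non-RTC wakeup source fired. */
static bool board_other_wakeup_pending;

/* Return true only if the RTC alarm was the sole wakeup reason. */
static bool board_rtc_only_wakeup(void)
{
	return !board_other_wakeup_pending;
}

/* Example descriptor; "rtc0" is the example name used in the text above. */
static struct charger_global_desc board_cm_global = {
	.rtc_name = "rtc0",
	.rtc_only_wakeup = board_rtc_only_wakeup,
	.assume_timer_stops_in_suspend = true,
};

/*
 * Sanity check before registration: the rtc name and the callback must be
 * set; the boolean is meaningful with either value.
 */
static bool cm_global_desc_is_complete(const struct charger_global_desc *d)
{
	return d != NULL && d->rtc_name != NULL && d->rtc_only_wakeup != NULL;
}
```

Because the data is global, a single descriptor like this would be shared by
every Charger Manager instance, however many batteries there are.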
3. How to setup suspend_again
=============================
...@@ -109,26 +110,28 @@ if the system was woken up by Charger Manager and the polling ...@@ -109,26 +110,28 @@ if the system was woken up by Charger Manager and the polling
============================================= =============================================
For each battery charged independently from other batteries (if a series of
batteries are charged by a single charger, they are counted as one independent
battery), an instance of Charger Manager is attached to it. The descriptor,
struct charger_desc, has the following elements:

`char *psy_name;`
    The power-supply-class name of the battery. Default is
    "battery" if psy_name is NULL. Users can access the psy entries
    at "/sys/class/power_supply/[psy_name]/".

`enum polling_modes polling_mode;`
    CM_POLL_DISABLE:
        do not poll this battery.
    CM_POLL_ALWAYS:
        always poll this battery.
    CM_POLL_EXTERNAL_POWER_ONLY:
        poll this battery if and only if an external power
        source is attached.
    CM_POLL_CHARGING_ONLY:
        poll this battery if and only if the battery is being charged.

`unsigned int fullbatt_vchkdrop_ms; / unsigned int fullbatt_vchkdrop_uV;`
    If both have non-zero values, Charger Manager will check the
    battery voltage drop fullbatt_vchkdrop_ms after the battery is fully
    charged. If the voltage drop is over fullbatt_vchkdrop_uV, Charger
    Manager will try to recharge the battery by disabling and enabling
...@@ -136,50 +139,52 @@ unsigned int fullbatt_vchkdrop_uV; ...@@ -136,50 +139,52 @@ unsigned int fullbatt_vchkdrop_uV;
    condition) needs to be implemented with hardware interrupts from
    fuel gauges or charger devices/chips.

`unsigned int fullbatt_uV;`
    If specified with a non-zero value, Charger Manager assumes
    that the battery is full (capacity = 100) if the battery is not being
    charged and the battery voltage is equal to or greater than
    fullbatt_uV.

`unsigned int polling_interval_ms;`
    Required polling interval in ms. Charger Manager will poll
    this battery every polling_interval_ms or more frequently.

`enum data_source battery_present;`
    CM_BATTERY_PRESENT:
        assume that the battery exists.
    CM_NO_BATTERY:
        assume that the battery does not exist.
    CM_FUEL_GAUGE:
        get battery presence information from fuel gauge.
    CM_CHARGER_STAT:
        get battery presence from chargers.

`char **psy_charger_stat;`
    An array ending with NULL that has power-supply-class names of
    chargers. Each power-supply-class should provide "PRESENT" (if
    battery_present is "CM_CHARGER_STAT"), "ONLINE" (shows whether an
    external power source is attached or not), and "STATUS" (shows whether
    the battery is {"FULL" or not FULL} or {"FULL", "Charging",
    "Discharging", "NotCharging"}).

`int num_charger_regulators; / struct regulator_bulk_data *charger_regulators;`
    Regulators representing the chargers, in the form used by the
    regulator framework's bulk functions.

`char *psy_fuel_gauge;`
    Power-supply-class name of the fuel gauge.

`int (*temperature_out_of_range)(int *mC); / bool measure_battery_temp;`
    This callback returns 0 if the temperature is safe for charging,
    a positive number if it is too hot to charge, and a negative number
    if it is too cold to charge. Through the variable mC, the callback
    returns the temperature in millidegrees Celsius (1/1000 of a degree).
    The temperature source can be either the battery or the ambient
    temperature, according to the value of measure_battery_temp.

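The return convention of temperature_out_of_range() is easy to get backwards,
so here is a minimal stand-alone sketch of such a callback. The charging
window (0 to 45 degrees Celsius), the board_last_temp_mc variable and the
function name are assumptions made up for this example; a real driver would
read a sensor and use limits from the battery's datasheet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical charging window, in millidegrees Celsius. */
#define BOARD_TEMP_MIN_MC	0
#define BOARD_TEMP_MAX_MC	45000

/* Hypothetical sensor reading; a real driver would query hardware here. */
static int board_last_temp_mc = 25000;

/*
 * Report the temperature through mC and classify it:
 * 0 = safe to charge, positive = too hot, negative = too cold.
 */
static int board_temperature_out_of_range(int *mC)
{
	assert(mC != NULL);

	*mC = board_last_temp_mc;
	if (*mC > BOARD_TEMP_MAX_MC)
		return 1;
	if (*mC < BOARD_TEMP_MIN_MC)
		return -1;
	return 0;
}
```

The temperature is always written to mC, including in the out-of-range
cases, so the caller can log or report the value that caused charging to
stop.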
5. Notify Charger-Manager of charger events: cm_notify_event()
==============================================================
If there is a charger event that Charger Manager needs to be notified of,
the charger device driver that triggers the event can call
cm_notify_event(psy, type, msg) to notify the corresponding Charger Manager.
......
====================================================
Testing suspend and resume support in device drivers
====================================================

(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Preparing the test system
============================

Unfortunately, to effectively test the support for the system-wide suspend and
resume transitions in a driver, it is necessary to suspend and resume a fully
...@@ -14,19 +18,20 @@ the machine's BIOS. ...@@ -14,19 +18,20 @@ the machine's BIOS.
Of course, for this purpose the test system has to be known to suspend and
resume without the driver being tested. Thus, if possible, you should first
resolve all suspend/resume-related problems in the test system before you start
testing the new driver. Please see Documentation/power/basic-pm-debugging.rst
for more information about the debugging of suspend/resume functionality.

2. Testing the driver
=====================

Once you have resolved the suspend/resume-related problems with your test system
without the new driver, you are ready to test it:

a) Build the driver as a module, load it and try the test modes of hibernation
   (see: Documentation/power/basic-pm-debugging.rst, 1).

b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
   "platform" modes (see: Documentation/power/basic-pm-debugging.rst, 1).

c) Compile the driver directly into the kernel and try the test modes of
   hibernation.
...@@ -34,12 +39,12 @@ c) Compile the driver directly into the kernel and try the test modes of ...@@ -34,12 +39,12 @@ c) Compile the driver directly into the kernel and try the test modes of
d) Attempt to hibernate with the driver compiled directly into the kernel
   in the "reboot", "shutdown" and "platform" modes.

e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.rst,
   2). [As far as the STR tests are concerned, it should not matter whether or
   not the driver is built as a module.]

f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
   (see: Documentation/power/basic-pm-debugging.rst, 2).

Each of the above tests should be repeated several times and the STD tests
should be mixed with the STR tests. If any of them fails, the driver cannot be
......
====================
Energy Model of CPUs
====================

1. Overview
-----------
...@@ -20,7 +20,7 @@ kernel, hence enabling to avoid redundant work. ...@@ -20,7 +20,7 @@ kernel, hence enabling to avoid redundant work.
The figure below depicts an example of drivers (Arm-specific here, but the
approach is applicable to any architecture) providing power costs to the EM
framework, and interested clients reading the data from it::

       +---------------+  +-----------------+  +---------------+
       | Thermal (IPA) |  | Scheduler (EAS) |  |     Other     |
...@@ -58,15 +58,17 @@ micro-architectures. ...@@ -58,15 +58,17 @@ micro-architectures.
2. Core APIs
------------

2.1 Config options
^^^^^^^^^^^^^^^^^^

CONFIG_ENERGY_MODEL must be enabled to use the EM framework.


2.2 Registration of performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Drivers are expected to register performance domains into the EM framework by
calling the following API::

  int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
                              struct em_data_callback *cb);
...@@ -80,7 +82,8 @@ callback, and kernel/power/energy_model.c for further documentation on this ...@@ -80,7 +82,8 @@ callback, and kernel/power/energy_model.c for further documentation on this
API.


2.3 Accessing performance domains
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Subsystems interested in the energy model of a CPU can retrieve it using the
em_cpu_get() API. The energy model tables are allocated once upon creation of
...@@ -99,46 +102,46 @@ More details about the above APIs can be found in include/linux/energy_model.h. ...@@ -99,46 +102,46 @@ More details about the above APIs can be found in include/linux/energy_model.h.
This section provides a simple example of a CPUFreq driver registering a
performance domain in the Energy Model framework using the (fake) 'foo'
protocol. The driver implements an est_power() function to be provided to the
EM framework::

  -> drivers/cpufreq/foo_cpufreq.c

  static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
  {
          long freq, power;

          /* Use the 'foo' protocol to ceil the frequency */
          freq = foo_get_freq_ceil(cpu, *KHz);
          if (freq < 0)
                  return freq;

          /* Estimate the power cost for the CPU at the relevant freq. */
          power = foo_estimate_power(cpu, freq);
          if (power < 0)
                  return power;

          /* Return the values to the EM framework */
          *mW = power;
          *KHz = freq;

          return 0;
  }

  static int foo_cpufreq_init(struct cpufreq_policy *policy)
  {
          struct em_data_callback em_cb = EM_DATA_CB(est_power);
          int nr_opp, ret;

          /* Do the actual CPUFreq init work ... */
          ret = do_foo_cpufreq_init(policy);
          if (ret)
                  return ret;

          /* Find the number of OPPs for this policy */
          nr_opp = foo_get_nr_opp(policy);

          /* And register the new performance domain */
          em_register_perf_domain(policy->cpus, nr_opp, &em_cb);

          return 0;
  }
=================
Freezing of tasks
=================

(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

I. What is the freezing of tasks?
=================================

The freezing of tasks is a mechanism by which user space processes and some
kernel threads are controlled during hibernation or system-wide suspend (on some
architectures).

II. How does it work?
=====================

There are three per-task flags used for that, PF_NOFREEZE, PF_FROZEN
and PF_FREEZER_SKIP (the last one is auxiliary). The tasks that have
...@@ -41,7 +46,7 @@ explicitly in suitable places or use the wait_event_freezable() or ...@@ -41,7 +46,7 @@ explicitly in suitable places or use the wait_event_freezable() or
wait_event_freezable_timeout() macros (defined in include/linux/freezer.h)
that combine interruptible sleep with checking if the task is to be frozen and
calling try_to_freeze(). The main loop of a freezable kernel thread may look
like the following one::

	set_freezable();
	do {
...@@ -65,7 +70,7 @@ order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that ...@@ -65,7 +70,7 @@ order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
have been frozen leave __refrigerator() and continue running.


Rationale behind the functions dealing with freezing and thawing of tasks
-------------------------------------------------------------------------

freeze_processes():
...@@ -86,6 +91,7 @@ thaw_processes(): ...@@ -86,6 +91,7 @@ thaw_processes():
III. Which kernel threads are freezable?
========================================

Kernel threads are not freezable by default. However, a kernel thread may clear
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
...@@ -93,37 +99,39 @@ directly is not allowed). From this point it is regarded as freezable ...@@ -93,37 +99,39 @@ directly is not allowed). From this point it is regarded as freezable
and must call try_to_freeze() in a suitable place.

IV. Why do we do that?
======================

Generally speaking, there are several reasons to use the freezing of tasks:

1. The principal reason is to prevent filesystems from being damaged after
   hibernation. At the moment we have no simple means of checkpointing
   filesystems, so if there are any modifications made to filesystem data and/or
   metadata on disks, we cannot bring them back to the state from before the
   modifications. At the same time each hibernation image contains some
   filesystem-related information that must be consistent with the state of the
   on-disk data and metadata after the system memory state has been restored
   from the image (otherwise the filesystems will be damaged in a nasty way,
   usually making them almost impossible to repair). We therefore freeze
   tasks that might cause the on-disk filesystems' data and metadata to be
   modified after the hibernation image has been created and before the
   system is finally powered off. The majority of these are user space
   processes, but if any of the kernel threads may cause something like this
   to happen, they have to be freezable.

2. Next, to create the hibernation image we need to free a sufficient amount of
   memory (approximately 50% of available RAM) and we need to do that before
   devices are deactivated, because we generally need them for swapping out.
   Then, after the memory for the image has been freed, we don't want tasks
   to allocate additional memory and we prevent them from doing that by
   freezing them earlier. [Of course, this also means that device drivers
   should not allocate substantial amounts of memory from their .suspend()
   callbacks before hibernation, but this is a separate issue.]

3. The third reason is to prevent user space processes and some kernel threads
   from interfering with the suspending and resuming of devices. A user space
   process running on a second CPU while we are suspending devices may, for
   example, be troublesome and without the freezing of tasks we would need some
   safeguards against race conditions that might occur in such a case.

   Although Linus Torvalds doesn't like the freezing of tasks, he said this in one
   of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
...@@ -132,7 +140,7 @@ of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608): ...@@ -132,7 +140,7 @@ of the discussions on LKML (http://lkml.org/lkml/2007/4/27/608):
   Linus: In many ways, 'at all'.

   I **do** realize the IO request queue issues, and that we cannot actually do
   s2ram with some devices in the middle of a DMA. So we want to be able to
   avoid *that*, there's no question about that. And I suspect that stopping
   user threads and then waiting for a sync is practically one of the easier
...@@ -150,17 +158,18 @@ thawed after the driver's .resume() callback has run, so it won't be accessing ...@@ -150,17 +158,18 @@ thawed after the driver's .resume() callback has run, so it won't be accessing
the device while it's suspended.

4. Another reason for freezing tasks is to prevent user space processes from
   realizing that a hibernation (or suspend) operation is taking place. Ideally,
   user space processes should not notice that such a system-wide operation has
   occurred and should continue running without any problems after the restore
   (or resume from suspend). Unfortunately, in the most general case this
   is quite difficult to achieve without the freezing of tasks. Consider,
   for example, a process that depends on all CPUs being online while it's
   running. Since we need to disable nonboot CPUs during the hibernation,
   if this process is not frozen, it may notice that the number of CPUs has
   changed and may start to work incorrectly because of that.

V. Are there any problems related to the freezing of tasks?
===========================================================

Yes, there are.
...@@ -172,11 +181,12 @@ may be undesirable. That's why kernel threads are not freezable by default. ...@@ -172,11 +181,12 @@ may be undesirable. That's why kernel threads are not freezable by default.
Second, there are the following two problems related to the freezing of user
space processes:

1. Putting processes into an uninterruptible sleep distorts the load average.
2. Now that we have FUSE, plus the framework for doing device drivers in
   userspace, it gets even more complicated because some userspace processes are
   now doing the sorts of things that kernel threads do
   (https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012309.html).

The problem 1. seems to be fixable, although it hasn't been fixed so far. The
other one is more serious, but it seems that we can work around it by using
...@@ -201,6 +211,7 @@ requested early enough using the suspend notifier API described in ...@@ -201,6 +211,7 @@ requested early enough using the suspend notifier API described in
Documentation/driver-api/pm/notifiers.rst.

VI. Are there any precautions to be taken to prevent freezing failures?
=======================================================================

Yes, there are.
...@@ -226,6 +237,8 @@ So, to summarize, use [un]lock_system_sleep() instead of directly using ...@@ -226,6 +237,8 @@ So, to summarize, use [un]lock_system_sleep() instead of directly using
mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures.

VII. Miscellaneous
==================

/sys/power/pm_freeze_timeout controls the maximum time allowed for freezing
all user space processes or all freezable kernel threads, in milliseconds.
The default value is 20000, and any unsigned integer value is accepted.
:orphan:

================
Power Management
================

.. toctree::
    :maxdepth: 1

    apm-acpi
    basic-pm-debugging
    charger-manager
    drivers-testing
    energy-model
    freezing-of-tasks
    interface
    opp
    pci
    pm_qos_interface
    power_supply_class
    runtime_pm
    s2ram
    suspend-and-cpuhotplug
    suspend-and-interrupts
    swsusp-and-swap-files
    swsusp-dmcrypt
    swsusp
    video
    tricks

    userland-swsusp

    powercap/powercap

    regulator/consumer
    regulator/design
    regulator/machine
    regulator/overview
    regulator/regulator

.. only::  subproject and html

   Indices
   =======

   * :ref:`genindex`
===========================================
Power Management Interface for System Sleep
===========================================

Copyright (c) 2016 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>
...@@ -11,10 +13,10 @@ mounted at /sys). ...@@ -11,10 +13,10 @@ mounted at /sys).
Reading from it returns a list of supported sleep states, encoded as:

- 'freeze' (Suspend-to-Idle)
- 'standby' (Power-On Suspend)
- 'mem' (Suspend-to-RAM)
- 'disk' (Suspend-to-Disk)

Suspend-to-Idle is always supported. Suspend-to-Disk is always supported
too as long as the kernel has been configured to support hibernation at all
@@ -32,18 +34,18 @@ Specifically, it tells the kernel what to do after creating a hibernation image.
Reading from it returns a list of supported options encoded as:

- 'platform' (put the system into sleep using a platform-provided method)
- 'shutdown' (shut the system down)
- 'reboot' (reboot the system)
- 'suspend' (trigger a Suspend-to-RAM transition)
- 'test_resume' (resume-after-hibernation test mode)

The currently selected option is printed in square brackets.
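
As a minimal user-space sketch (not part of the kernel documentation), the bracket convention described above could be decoded as follows. The function name ``selected_option`` and the sample strings used with it are assumptions made for illustration; a real tool would first read the line from /sys/power/disk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the option enclosed in square brackets from `line` into `out`.
 * Returns 0 on success, -1 if no complete "[...]" pair is found or if
 * the option (plus its trailing NUL) does not fit into `out`.
 * Hypothetical helper: the kernel provides nothing by this name.
 */
int selected_option(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);

	open = strchr(line, '[');
	if (!open)
		return -1;
	close = strchr(open + 1, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}
```

Given a line such as ``platform [shutdown] reboot suspend test_resume``, the helper stores ``shutdown`` in the output buffer.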
The 'platform' option is only available if the platform provides a special
mechanism to put the system to sleep after creating a hibernation image (ACPI
does that, for example). The 'suspend' option is available if Suspend-to-RAM
is supported. Refer to Documentation/power/basic-pm-debugging.rst for the
description of the 'test_resume' option.

To select an option, write the string representing it to /sys/power/disk.

@@ -71,7 +73,7 @@ If /sys/power/pm_trace contains '1', the fingerprint of each suspend/resume
event point in turn will be stored in the RTC memory (overwriting the actual
RTC information), so it will survive a system crash if one occurs right after
storing it and it can be used later to identify the driver that caused the crash
to happen (see Documentation/power/s2ram.rst for more information).

Initially it contains '0' which may be changed to '1' by writing a string
representing a nonzero integer into it.
==========================================
Operating Performance Points (OPP) Library
==========================================

(C) 2009-2010 Nishanth Menon <nm@ti.com>, Texas Instruments Incorporated

.. Contents

  1. Introduction
  2. Initial OPP List Registration
  3. OPP Search Functions
  4. OPP Availability Control Functions
  5. OPP Data Retrieval Functions
  6. Data Structures

1. Introduction
===============

1.1 What is an Operating Performance Point (OPP)?
-------------------------------------------------

Complex SoCs of today consists of a multiple sub-modules working in conjunction.
In an operational system executing varied use cases, not all modules in the SoC
@@ -28,16 +31,19 @@ the device will support per domain are called Operating Performance Points or
OPPs.

As an example:

Let us consider an MPU device which supports the following:
{300MHz at minimum voltage of 1V}, {800MHz at minimum voltage of 1.2V},
{1GHz at minimum voltage of 1.3V}

We can represent these as three OPPs as the following {Hz, uV} tuples:

- {300000000, 1000000}
- {800000000, 1200000}
- {1000000000, 1300000}
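
For illustration only, the three tuples above can be written down as a small user-space table. This is a sketch under stated assumptions: ``struct opp_tuple``, ``mpu_opps`` and the helper functions are hypothetical names and are not the OPP library's internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one {Hz, uV} tuple. */
struct opp_tuple {
	unsigned long freq_hz;	/* frequency in Hz */
	unsigned long u_volt;	/* minimum voltage in microvolts */
};

/* The MPU example from the text: 300 MHz, 800 MHz and 1 GHz. */
const struct opp_tuple mpu_opps[] = {
	{  300000000UL, 1000000UL },	/* 300 MHz at 1.0 V */
	{  800000000UL, 1200000UL },	/* 800 MHz at 1.2 V */
	{ 1000000000UL, 1300000UL },	/* 1 GHz at 1.3 V */
};

const size_t mpu_opp_count = sizeof(mpu_opps) / sizeof(mpu_opps[0]);

/* Unit helpers, so the table can be checked against the prose values. */
unsigned long mhz_to_hz(unsigned long mhz)
{
	return mhz * 1000000UL;
}

unsigned long mv_to_uv(unsigned long mv)
{
	return mv * 1000UL;
}

/* Bounds-checked accessor for the illustrative table. */
const struct opp_tuple *opp_at(size_t i)
{
	assert(i < mpu_opp_count);
	return &mpu_opps[i];
}
```

Keeping the frequency in Hz and the voltage in microvolts mirrors the {Hz, uV} convention used in the tuples above.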
1.2 Operating Performance Points Library
----------------------------------------

OPP library provides a set of helper functions to organize and query the OPP
information. The library is located in drivers/base/power/opp.c and the header
@@ -46,9 +52,10 @@ CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
optionally boot at a certain OPP without needing cpufreq.

Typical usage of the OPP library is as follows::

 (users)        -> registers a set of default OPPs          -> (library)
 SoC framework  -> modifies on required cases certain OPPs  -> OPP layer
                -> queries to search/retrieve information   ->

OPP layer expects each domain to be represented by a unique device pointer. SoC
@@ -57,8 +64,9 @@ list is expected to be an optimally small number typically around 5 per device.
This initial list contains a set of OPPs that the framework expects to be safely
enabled by default in the system.

Note on OPP Availability
^^^^^^^^^^^^^^^^^^^^^^^^

As the system proceeds to operate, SoC framework may choose to make certain
OPPs available or not available on each device based on various external
factors. Example usage: Thermal management or other exceptional situations where
@@ -88,7 +96,8 @@ registering the OPPs is maintained by OPP library throughout the device
operation. The SoC framework can subsequently control the availability of the
OPPs dynamically using the dev_pm_opp_enable / disable functions.

dev_pm_opp_add
	Add a new OPP for a specific domain represented by the device pointer.
	The OPP is defined using the frequency and voltage. Once added, the OPP
	is assumed to be available and control of it's availability can be done
	with the dev_pm_opp_enable/disable functions. OPP library internally stores
@@ -96,9 +105,11 @@ dev_pm_opp_add - Add a new OPP for a specific domain represented by the device p
	used by SoC framework to define a optimal list as per the demands of
	SoC usage environment.

	WARNING:
	  Do not use this function in interrupt context.

	Example::

	 soc_pm_init()
	 {
		/* Do things */
@@ -125,12 +136,15 @@ Callers of these functions shall call dev_pm_opp_put() after they have used the
OPP. Otherwise the memory for the OPP will never get freed and result in
memleak.

dev_pm_opp_find_freq_exact
	Search for an OPP based on an *exact* frequency and
	availability. This function is especially useful to enable an OPP which
	is not available by default.
	Example: In a case when SoC framework detects a situation where a
	higher frequency could be made available, it can use this function to
	find the OPP prior to call the dev_pm_opp_enable to actually make
	it available::

	 opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
	 dev_pm_opp_put(opp);
	 /* dont operate on the pointer.. just do a sanity check.. */
@@ -141,27 +155,34 @@ dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
		dev_pm_opp_enable(dev,1000000000);
	 }

	NOTE:
	  This is the only search function that operates on OPPs which are
	  not available.
dev_pm_opp_find_freq_floor
	Search for an available OPP which is *at most* the
	provided frequency. This function is useful while searching for a lesser
	match OR operating on OPP information in the order of decreasing
	frequency.
	Example: To find the highest opp for a device::

	 freq = ULONG_MAX;
	 opp = dev_pm_opp_find_freq_floor(dev, &freq);
	 dev_pm_opp_put(opp);
dev_pm_opp_find_freq_ceil
	Search for an available OPP which is *at least* the
	provided frequency. This function is useful while searching for a
	higher match OR operating on OPP information in the order of increasing
	frequency.
	Example 1: To find the lowest opp for a device::

	 freq = 0;
	 opp = dev_pm_opp_find_freq_ceil(dev, &freq);
	 dev_pm_opp_put(opp);

	Example 2: A simplified implementation of a SoC cpufreq_driver->target::

	 soc_cpufreq_target(..)
	 {
		/* Do stuff like policy checks etc. */
@@ -184,12 +205,15 @@ fine grained dynamic control of which sets of OPPs are operationally available.
These functions are intended to *temporarily* remove an OPP in conditions such
as thermal considerations (e.g. don't use OPPx until the temperature drops).

WARNING:
	Do not use these functions in interrupt context.

dev_pm_opp_enable
	Make a OPP available for operation.
	Example: Lets say that 1GHz OPP is to be made available only if the
	SoC temperature is lower than a certain threshold. The SoC framework
	implementation might choose to do something as follows::

	 if (cur_temp < temp_low_thresh) {
		/* Enable 1GHz if it was disabled */
		opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
@@ -201,10 +225,12 @@ dev_pm_opp_enable - Make a OPP available for operation.
			goto try_something_else;
	 }

dev_pm_opp_disable
	Make an OPP to be not available for operation
	Example: Lets say that 1GHz OPP is to be disabled if the temperature
	exceeds a threshold value. The SoC framework implementation might
	choose to do something as follows::

	 if (cur_temp > temp_high_thresh) {
		/* Disable 1GHz if it was enabled */
		opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
@@ -223,11 +249,13 @@ information from the OPP structure is necessary. Once an OPP pointer is
retrieved using the search functions, the following functions can be used by SoC
framework to retrieve the information represented inside the OPP layer.

dev_pm_opp_get_voltage
	Retrieve the voltage represented by the opp pointer.
	Example: At a cpufreq transition to a different frequency, SoC
	framework requires to set the voltage represented by the OPP using
	the regulator framework to the Power Management chip providing the
	voltage::

	 soc_switch_to_freq_voltage(freq)
	 {
		/* do things */
@@ -239,10 +267,12 @@ dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
		/* do other things */
	 }

dev_pm_opp_get_freq
	Retrieve the freq represented by the opp pointer.
	Example: Lets say the SoC framework uses a couple of helper functions
	we could pass opp pointers instead of doing additional parameters to
	handle quiet a bit of data parameters::

	 soc_cpufreq_target(..)
	 {
		/* do things.. */
@@ -264,9 +294,11 @@ dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
		/* do things.. */
	 }

dev_pm_opp_get_opp_count
	Retrieve the number of available opps for a device
	Example: Lets say a co-processor in the SoC needs to know the available
	frequencies in a table, the main processor can notify as following::

	 soc_notify_coproc_available_frequencies()
	 {
		/* Do things */
@@ -289,54 +321,59 @@ dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
==================

Typically an SoC contains multiple voltage domains which are variable. Each
domain is represented by a device pointer. The relationship to OPP can be
represented as follows::

  SoC
   |- device 1
   |	|- opp 1 (availability, freq, voltage)
   |	|- opp 2 ..
   ...	...
   |	`- opp n ..
   |- device 2
   ...
   `- device m

OPP library maintains a internal list that the SoC framework populates and
accessed by various functions as described above. However, the structures
representing the actual OPPs and domains are internal to the OPP library itself
to allow for suitable abstraction reusable across systems.

struct dev_pm_opp
	The internal data structure of OPP library which is used to
	represent an OPP. In addition to the freq, voltage, availability
	information, it also contains internal book keeping information required
	for the OPP library to operate on. Pointer to this structure is
	provided back to the users such as SoC framework to be used as a
	identifier for OPP in the interactions with OPP layer.

	WARNING:
	  The struct dev_pm_opp pointer should not be parsed or modified by the
	  users. The defaults of for an instance is populated by
	  dev_pm_opp_add, but the availability of the OPP can be modified
	  by dev_pm_opp_enable/disable functions.

struct device
	This is used to identify a domain to the OPP layer. The
	nature of the device and it's implementation is left to the user of
	OPP library such as the SoC framework.

Overall, in a simplistic view, the data structure operations is represented as
following::

  Initialization / modification:
                     +-----+        /- dev_pm_opp_enable
  dev_pm_opp_add --> | opp | <-------
    |                +-----+        \- dev_pm_opp_disable
    \-------> domain_info(device)

  Search functions:
               /-- dev_pm_opp_find_freq_ceil  ---\   +-----+
  domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
               \-- dev_pm_opp_find_freq_floor ---/   +-----+

  Retrieval functions:
  +-----+     /- dev_pm_opp_get_voltage
  | opp | <---
  +-----+     \- dev_pm_opp_get_freq

  domain_info <- dev_pm_opp_get_opp_count
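
The interaction between the availability control functions and the count of available OPPs can be sketched with a small user-space model. This is an illustration under stated assumptions, not the library's code: ``struct model_opp``, ``model_set_available`` and ``model_get_opp_count`` are hypothetical names, and the real functions operate on a struct device rather than on a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one OPP entry. */
struct model_opp {
	unsigned long freq_hz;
	unsigned long u_volt;
	bool available;
};

/*
 * Model of enabling or disabling an OPP: flip the availability of the
 * entry whose frequency matches exactly.
 * Returns 0 on success, -1 if no entry has that frequency.
 */
int model_set_available(struct model_opp *table, size_t n,
			unsigned long freq_hz, bool available)
{
	size_t i;

	assert(table != NULL);

	for (i = 0; i < n; i++) {
		if (table[i].freq_hz == freq_hz) {
			table[i].available = available;
			return 0;
		}
	}
	return -1;
}

/* Model of the count query: only available entries are counted. */
size_t model_get_opp_count(const struct model_opp *table, size_t n)
{
	size_t i, count = 0;

	assert(table != NULL);

	for (i = 0; i < n; i++)
		if (table[i].available)
			count++;
	return count;
}
```

In this model, disabling the 1GHz entry lowers the count by one while the entry itself stays in the table, which matches the idea that disabling is a temporary removal rather than a deletion.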
====================
PCI Power Management
====================

Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

@@ -9,14 +11,14 @@ management. Based on previous work by Patrick Mochel <mochel@transmeta.com>
This document only covers the aspects of power management specific to PCI
devices. For general description of the kernel's interfaces related to device
power management refer to Documentation/driver-api/pm/devices.rst and
Documentation/power/runtime_pm.rst.

.. contents:

   1. Hardware and Platform Support for PCI Power Management
   2. PCI Subsystem and Device Power Management
   3. PCI Device Drivers and Power Management
   4. Resources

1. Hardware and Platform Support for PCI Power Management
@@ -24,6 +26,7 @@ Documentation/power/runtime_pm.txt.
1.1. Native and Platform-Based Power Management
-----------------------------------------------

In general, power management is a feature allowing one to save energy by putting
devices into states in which they draw less power (low-power states) at the
price of reduced functionality or performance.
@@ -67,6 +70,7 @@ mechanisms have to be used simultaneously to obtain the desired result.
1.2. Native PCI Power Management
--------------------------------

The PCI Bus Power Management Interface Specification (PCI PM Spec) was
introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
standard interface for performing various operations related to power
@@ -134,6 +138,7 @@ sufficiently active to generate a wakeup signal.
1.3. ACPI Device Power Management
---------------------------------

The platform firmware support for the power management of PCI devices is
system-specific. However, if the system in question is compliant with the
Advanced Configuration and Power Interface (ACPI) Specification, like the
@@ -194,6 +199,7 @@ enabled for the device to be able to generate wakeup signals.
1.4. Wakeup Signaling
---------------------

Wakeup signals generated by PCI devices, either as native PCI PMEs, or as
a result of the execution of the _DSW (or _PSW) ACPI control method before
putting the device into a low-power state, have to be caught and handled as
@@ -265,14 +271,15 @@ the native PCI Express PME signaling cannot be used by the kernel in that case.
2.1. Device Power Management Callbacks
--------------------------------------

The PCI Subsystem participates in the power management of PCI devices in a
number of ways. First of all, it provides an intermediate code layer between
the device power management core (PM core) and PCI device drivers.
Specifically, the pm field of the PCI subsystem's struct bus_type object,
pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing
pointers to several device power management callbacks::

  const struct dev_pm_ops pci_dev_pm_ops = {
	.prepare = pci_pm_prepare,
	.complete = pci_pm_complete,
	.suspend = pci_pm_suspend,
@@ -290,7 +297,7 @@ const struct dev_pm_ops pci_dev_pm_ops = {
	.runtime_suspend = pci_pm_runtime_suspend,
	.runtime_resume = pci_pm_runtime_resume,
	.runtime_idle = pci_pm_runtime_idle,
  };

These callbacks are executed by the PM core in various situations related to
device power management and they, in turn, execute power management callbacks
@@ -299,9 +306,9 @@ involving some standard configuration registers of PCI devices that device
drivers need not know or care about.

The structure representing a PCI device, struct pci_dev, contains several fields
that these callbacks operate on::

  struct pci_dev {
	...
	pci_power_t current_state; /* Current operating state. */
	int pm_cap; /* PM capability offset in the
@@ -315,13 +322,14 @@ struct pci_dev {
	unsigned int wakeup_prepared:1; /* Device prepared for wake up */
	unsigned int d3_delay; /* D3->D0 transition time in ms */
	...
  };

They also indirectly use some fields of the struct device that is embedded in
struct pci_dev.
2.2. Device Initialization
--------------------------

The PCI subsystem's first task related to device power management is to
prepare the device for power management and initialize the fields of struct
pci_dev used for this purpose. This happens in two functions defined in
@@ -348,10 +356,11 @@ during system-wide transitions to a sleep state and back to the working state.
2.3. Runtime Device Power Management
------------------------------------

The PCI subsystem plays a vital role in the runtime power management of PCI
devices. For this purpose it uses the general runtime power management
(runtime PM) framework described in Documentation/power/runtime_pm.rst.
Namely, it provides subsystem-level callbacks::

	pci_pm_runtime_suspend()
	pci_pm_runtime_resume()
@@ -425,13 +434,14 @@ to the given subsystem before the next phase begins. These phases always run
after tasks have been frozen.

2.4.1. System Suspend
^^^^^^^^^^^^^^^^^^^^^

When the system is going into a sleep state in which the contents of memory will
be preserved, such as one of the ACPI sleep states S1-S3, the phases are:

	prepare, suspend, suspend_noirq.
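
The fixed ordering of these phases can be written down as a small lookup table. This is a user-space sketch only: ``struct pm_phase``, ``suspend_phases`` and ``suspend_phase_index`` are hypothetical names. The callback names are those of the PCI bus type; pci_pm_suspend_noirq() is taken from the kernel source rather than from this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a phase name with its PCI bus type callback. */
struct pm_phase {
	const char *phase;
	const char *bus_callback;
};

/* System suspend phases, in the order in which they are run. */
const struct pm_phase suspend_phases[] = {
	{ "prepare",       "pci_pm_prepare" },
	{ "suspend",       "pci_pm_suspend" },
	{ "suspend_noirq", "pci_pm_suspend_noirq" },
};

const size_t suspend_phase_count =
	sizeof(suspend_phases) / sizeof(suspend_phases[0]);

/* Return the position of `phase` in the ordering, or -1 if unknown. */
int suspend_phase_index(const char *phase)
{
	size_t i;

	assert(phase != NULL);

	for (i = 0; i < suspend_phase_count; i++)
		if (strcmp(suspend_phases[i].phase, phase) == 0)
			return (int)i;
	return -1;
}
```

A lower index means an earlier phase, so comparing two indices is enough to tell which callback runs first for a given device.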
The following PCI bus type's callbacks, respectively, are used in these phases: The following PCI bus type's callbacks, respectively, are used in these phases::
pci_pm_prepare() pci_pm_prepare()
pci_pm_suspend() pci_pm_suspend()
@@ -492,6 +502,7 @@ this purpose). PCI device drivers are not encouraged to do that, but in some
 rare cases doing that in the driver may be the optimum approach.
 2.4.2. System Resume
+^^^^^^^^^^^^^^^^^^^^
 When the system is undergoing a transition from a sleep state in which the
 contents of memory have been preserved, such as one of the ACPI sleep states
@@ -500,7 +511,7 @@ S1-S3, into the working state (ACPI S0), the phases are:
 resume_noirq, resume, complete.
 The following PCI bus type's callbacks, respectively, are executed in these
-phases:
+phases::
 pci_pm_resume_noirq()
 pci_pm_resume()
@@ -539,6 +550,7 @@ The pci_pm_complete() routine only executes the device driver's pm->complete()
 callback, if defined.
 2.4.3. System Hibernation
+^^^^^^^^^^^^^^^^^^^^^^^^^
 System hibernation is more complicated than system suspend, because it requires
 a system image to be created and written into a persistent storage medium. The
@@ -551,7 +563,7 @@ to be free) in the following three phases:
 prepare, freeze, freeze_noirq
-that correspond to the PCI bus type's callbacks:
+that correspond to the PCI bus type's callbacks::
 pci_pm_prepare()
 pci_pm_freeze()
@@ -580,7 +592,7 @@ back to the fully functional state and this is done in the following phases:
 thaw_noirq, thaw, complete
-using the following PCI bus type's callbacks:
+using the following PCI bus type's callbacks::
 pci_pm_thaw_noirq()
 pci_pm_thaw()
@@ -608,7 +620,7 @@ three phases:
 where the prepare phase is exactly the same as for system suspend. The other
 two phases are analogous to the suspend and suspend_noirq phases, respectively.
-The PCI subsystem-level callbacks they correspond to
+The PCI subsystem-level callbacks they correspond to::
 pci_pm_poweroff()
 pci_pm_poweroff_noirq()
@@ -618,6 +630,7 @@ although they don't attempt to save the device's standard configuration
 registers.
 2.4.4. System Restore
+^^^^^^^^^^^^^^^^^^^^^
 System restore requires a hibernation image to be loaded into memory and the
 pre-hibernation memory contents to be restored before the pre-hibernation system
@@ -653,7 +666,7 @@ phases:
 The first two of these are analogous to the resume_noirq and resume phases
 described above, respectively, and correspond to the following PCI subsystem
-callbacks:
+callbacks::
 pci_pm_restore_noirq()
 pci_pm_restore()
@@ -671,6 +684,7 @@ resume.
 3.1. Power Management Callbacks
 -------------------------------
 PCI device drivers participate in power management by providing callbacks to be
 executed by the PCI subsystem's power management routines described above and by
 controlling the runtime power management of their devices.
@@ -698,6 +712,7 @@ defined, though, they are expected to behave as described in the following
 subsections.
 3.1.1. prepare()
+^^^^^^^^^^^^^^^^
 The prepare() callback is executed during system suspend, during hibernation
 (when a hibernation image is about to be created), during power-off after
@@ -716,6 +731,7 @@ preallocated earlier, for example in a suspend/hibernate notifier as described
 in Documentation/driver-api/pm/notifiers.rst).
 3.1.2. suspend()
+^^^^^^^^^^^^^^^^
 The suspend() callback is only executed during system suspend, after prepare()
 callbacks have been executed for all devices in the system.
@@ -742,6 +758,7 @@ operations relying on the driver's ability to handle interrupts should be
 carried out in this callback.
 3.1.3. suspend_noirq()
+^^^^^^^^^^^^^^^^^^^^^^
 The suspend_noirq() callback is only executed during system suspend, after
 suspend() callbacks have been executed for all devices in the system and
@@ -753,6 +770,7 @@ suspend_noirq() can carry out operations that would cause race conditions to
 arise if they were performed in suspend().
 3.1.4. freeze()
+^^^^^^^^^^^^^^^
 The freeze() callback is hibernation-specific and is executed in two situations,
 during hibernation, after prepare() callbacks have been executed for all devices
@@ -770,6 +788,7 @@ or put it into a low-power state. Still, either it or freeze_noirq() should
 save the device's standard configuration registers using pci_save_state().
 3.1.5. freeze_noirq()
+^^^^^^^^^^^^^^^^^^^^^
 The freeze_noirq() callback is hibernation-specific. It is executed during
 hibernation, after prepare() and freeze() callbacks have been executed for all
@@ -786,6 +805,7 @@ The difference between freeze_noirq() and freeze() is analogous to the
 difference between suspend_noirq() and suspend().
 3.1.6. poweroff()
+^^^^^^^^^^^^^^^^^
 The poweroff() callback is hibernation-specific. It is executed when the system
 is about to be powered off after saving a hibernation image to a persistent
@@ -802,6 +822,7 @@ into a low-power state, respectively, but it need not save the device's standard
 configuration registers.
 3.1.7. poweroff_noirq()
+^^^^^^^^^^^^^^^^^^^^^^^
 The poweroff_noirq() callback is hibernation-specific. It is executed after
 poweroff() callbacks have been executed for all devices in the system.
@@ -814,6 +835,7 @@ The difference between poweroff_noirq() and poweroff() is analogous to the
 difference between suspend_noirq() and suspend().
 3.1.8. resume_noirq()
+^^^^^^^^^^^^^^^^^^^^^
 The resume_noirq() callback is only executed during system resume, after the
 PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
@@ -827,6 +849,7 @@ it should only be used for performing operations that would lead to race
 conditions if carried out by resume().
 3.1.9. resume()
+^^^^^^^^^^^^^^^
 The resume() callback is only executed during system resume, after
 resume_noirq() callbacks have been executed for all devices in the system and
@@ -837,6 +860,7 @@ device and bringing it back to the fully functional state. The device should be
 able to process I/O in a usual way after resume() has returned.
 3.1.10. thaw_noirq()
+^^^^^^^^^^^^^^^^^^^^
 The thaw_noirq() callback is hibernation-specific. It is executed after a
 system image has been created and the non-boot CPUs have been enabled by the PM
@@ -851,6 +875,7 @@ freeze() and freeze_noirq(), so in general it does not need to modify the
 contents of the device's registers.
 3.1.11. thaw()
+^^^^^^^^^^^^^^
 The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
 callbacks have been executed for all devices in the system and after device
@@ -860,6 +885,7 @@ This callback is responsible for restoring the pre-freeze configuration of
 the device, so that it will work in a usual way after thaw() has returned.
 3.1.12. restore_noirq()
+^^^^^^^^^^^^^^^^^^^^^^^
 The restore_noirq() callback is hibernation-specific. It is executed in the
 restore_noirq phase of hibernation, when the boot kernel has passed control to
@@ -875,6 +901,7 @@ For the vast majority of PCI device drivers there is no difference between
 resume_noirq() and restore_noirq().
 3.1.13. restore()
+^^^^^^^^^^^^^^^^^
 The restore() callback is hibernation-specific. It is executed after
 restore_noirq() callbacks have been executed for all devices in the system and
@@ -888,14 +915,17 @@ For the vast majority of PCI device drivers there is no difference between
 resume() and restore().
 3.1.14. complete()
+^^^^^^^^^^^^^^^^^^
 The complete() callback is executed in the following situations:
 - during system resume, after resume() callbacks have been executed for all
 devices,
 - during hibernation, before saving the system image, after thaw() callbacks
 have been executed for all devices,
 - during system restore, when the system is going back to its pre-hibernation
 state, after restore() callbacks have been executed for all devices.
 It also may be executed if the loading of a hibernation image into memory fails
 (in that case it is run after thaw() callbacks have been executed for all
 devices that have drivers in the boot kernel).
@@ -904,6 +934,7 @@ This callback is entirely optional, although it may be necessary if the
 prepare() callback performs operations that need to be reversed.
 3.1.15. runtime_suspend()
+^^^^^^^^^^^^^^^^^^^^^^^^^
 The runtime_suspend() callback is specific to device runtime power management
 (runtime PM). It is executed by the PM core's runtime PM framework when the
@@ -915,6 +946,7 @@ put into a low-power state, but it must allow the PCI subsystem to perform all
 of the PCI-specific actions necessary for suspending the device.
 3.1.16. runtime_resume()
+^^^^^^^^^^^^^^^^^^^^^^^^
 The runtime_resume() callback is specific to device runtime PM. It is executed
 by the PM core's runtime PM framework when the device is about to be resumed
@@ -927,6 +959,7 @@ The device is expected to be able to process I/O in the usual way after
 runtime_resume() has returned.
 3.1.17. runtime_idle()
+^^^^^^^^^^^^^^^^^^^^^^
 The runtime_idle() callback is specific to device runtime PM. It is executed
 by the PM core's runtime PM framework whenever it may be desirable to suspend
@@ -939,6 +972,7 @@ PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
 cause the driver's runtime_suspend() callback to be executed.
 3.1.18. Pointing Multiple Callback Pointers to One Routine
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Although in principle each of the callbacks described in the previous
 subsections can be defined as a separate function, it often is convenient to
@@ -962,6 +996,7 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
 be pointed to by the .resume(), .thaw(), and .restore() members.
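The idea in section 3.1.18 of sharing one routine between several callback pointers can be modelled in plain C outside the kernel. The sketch below is a user-space mock and not kernel code: `struct mock_device`, `struct mock_pm_ops` and `MOCK_SIMPLE_PM_OPS` are hypothetical names invented for illustration, loosely modelled on dev_pm_ops. It shows one suspend routine and one resume routine each serving three callback slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a device; not a kernel type. */
struct mock_device {
    int suspended;      /* 1 while the mock device is "suspended" */
    int suspend_calls;  /* how many times the suspend routine ran */
    int resume_calls;   /* how many times the resume routine ran */
};

/* Hypothetical subset of a callback table, loosely modelled on dev_pm_ops. */
struct mock_pm_ops {
    int (*suspend)(struct mock_device *dev);
    int (*resume)(struct mock_device *dev);
    int (*freeze)(struct mock_device *dev);
    int (*thaw)(struct mock_device *dev);
    int (*poweroff)(struct mock_device *dev);
    int (*restore)(struct mock_device *dev);
};

/* Point the three suspend-like slots at one routine and the three
 * resume-like slots at another. */
#define MOCK_SIMPLE_PM_OPS(name, suspend_fn, resume_fn) \
    const struct mock_pm_ops name = { \
        .suspend = suspend_fn, \
        .resume = resume_fn, \
        .freeze = suspend_fn, \
        .thaw = resume_fn, \
        .poweroff = suspend_fn, \
        .restore = resume_fn, \
    }

/* One routine quiesces the mock device for suspend, freeze and poweroff. */
static int mock_suspend(struct mock_device *dev)
{
    assert(dev != NULL);
    dev->suspended = 1;
    dev->suspend_calls++;
    return 0;
}

/* One routine brings the mock device back for resume, thaw and restore. */
static int mock_resume(struct mock_device *dev)
{
    assert(dev != NULL);
    dev->suspended = 0;
    dev->resume_calls++;
    return 0;
}

static MOCK_SIMPLE_PM_OPS(mock_ops, mock_suspend, mock_resume);
```

Calling `mock_ops.freeze(&dev)` in this model runs the same code as `mock_ops.suspend(&dev)`, which is the whole point of the shortcut: a driver whose device needs identical handling in all three cases writes the routine once.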
 3.1.19. Driver Flags for Power Management
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 The PM core allows device drivers to set flags that influence the handling of
 power management for the devices by the core itself and by middle layer code
@@ -1007,6 +1042,7 @@ it.
 3.2. Device Runtime Power Management
 ------------------------------------
 In addition to providing device power management callbacks PCI device drivers
 are responsible for controlling the runtime power management (runtime PM) of
 their devices.
@@ -1073,22 +1109,27 @@ device the PM core automatically queues a request to check if the device is
 idle), device drivers are generally responsible for queuing power management
 requests for their devices. For this purpose they should use the runtime PM
 helper functions provided by the PM core, discussed in
-Documentation/power/runtime_pm.txt.
+Documentation/power/runtime_pm.rst.
 Devices can also be suspended and resumed synchronously, without placing a
 request into pm_wq. In the majority of cases this also is done by their
 drivers that use helper functions provided by the PM core for this purpose.
 For more information on the runtime PM of devices refer to
-Documentation/power/runtime_pm.txt.
+Documentation/power/runtime_pm.rst.
 4. Resources
 ============
 PCI Local Bus Specification, Rev. 3.0
 PCI Bus Power Management Interface Specification, Rev. 1.2
 Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b
 PCI Express Base Specification, Rev. 2.0
 Documentation/driver-api/pm/devices.rst
-Documentation/power/runtime_pm.txt
+Documentation/power/runtime_pm.rst
-PM Quality Of Service Interface.
+===============================
+PM Quality Of Service Interface
+===============================
 This interface provides a kernel and user mode interface for registering
 performance expectations by drivers, subsystems and user space applications on
@@ -11,6 +13,7 @@ memory_bandwidth.
 constraints and PM QoS flags.
 Each parameters have defined units:
 * latency: usec
 * timeout: usec
 * throughput: kbs (kilo bit / sec)
@@ -18,6 +21,7 @@ Each parameters have defined units:
 1. PM QoS framework
+===================
 The infrastructure exposes multiple misc device nodes one per implemented
 parameter. The set of parameters implement is defined by pm_qos_power_init()
@@ -37,38 +41,39 @@ reading the aggregated value does not require any locking mechanism.
 From kernel mode the use of this interface is simple:
 void pm_qos_add_request(handle, param_class, target_value):
 Will insert an element into the list for that identified PM QoS class with the
 target value. Upon change to this list the new target is recomputed and any
 registered notifiers are called only if the target value is now different.
 Clients of pm_qos need to save the returned handle for future use in other
 pm_qos API functions.
 void pm_qos_update_request(handle, new_target_value):
 Will update the list element pointed to by the handle with the new target value
 and recompute the new aggregated target, calling the notification tree if the
 target is changed.
 void pm_qos_remove_request(handle):
 Will remove the element. After removal it will update the aggregate target and
 call the notification tree if the target was changed as a result of removing
 the request.
 int pm_qos_request(param_class):
 Returns the aggregated value for a given PM QoS class.
 int pm_qos_request_active(handle):
 Returns if the request is still active, i.e. it has not been removed from a
 PM QoS class constraints list.
 int pm_qos_add_notifier(param_class, notifier):
 Adds a notification callback function to the PM QoS class. The callback is
 called when the aggregated value for the PM QoS class is changed.
 int pm_qos_remove_notifier(int param_class, notifier):
 Removes the notification callback function for the PM QoS class.
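The add/update/remove behaviour described above (recompute the aggregate on every list change, notify only when the aggregate actually differs) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel implementation: every `mock_` name is hypothetical, the list is a fixed-size array, a single notifier replaces the notification chain, and the aggregate is assumed to be the minimum of the active requests, as would suit a latency-style class.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_REQUESTS 8

/* Hypothetical request handle; the kernel's real handle type differs. */
struct mock_qos_request {
    int active;  /* non-zero while the request is on the list */
    int value;   /* requested target value */
};

/* Hypothetical QoS class: a fixed-size request list plus the aggregate. */
struct mock_qos_class {
    struct mock_qos_request *reqs[MOCK_MAX_REQUESTS];
    int default_value;                /* target when no request is active */
    int target;                       /* current aggregated target */
    void (*notifier)(int new_target); /* single notifier, may be NULL */
};

/* Example notifier that only counts how often it was called. */
static int mock_notify_count;
static void mock_count_notifier(int new_target)
{
    (void)new_target;
    mock_notify_count++;
}

static void mock_qos_init(struct mock_qos_class *c, int default_value,
                          void (*notifier)(int))
{
    size_t i;

    assert(c != NULL);
    for (i = 0; i < MOCK_MAX_REQUESTS; i++)
        c->reqs[i] = NULL;
    c->default_value = default_value;
    c->target = default_value;
    c->notifier = notifier;
}

/* Recompute the minimum; call the notifier only if the target changed. */
static void mock_qos_recompute(struct mock_qos_class *c)
{
    int new_target = c->default_value;
    int found = 0;
    size_t i;

    for (i = 0; i < MOCK_MAX_REQUESTS; i++) {
        if (!c->reqs[i])
            continue;
        if (!found || c->reqs[i]->value < new_target)
            new_target = c->reqs[i]->value;
        found = 1;
    }
    if (new_target != c->target) {
        c->target = new_target;
        if (c->notifier)
            c->notifier(new_target);
    }
}

/* Returns 0 on success, -1 if the fixed-size list is full. */
static int mock_qos_add_request(struct mock_qos_class *c,
                                struct mock_qos_request *req, int value)
{
    size_t i;

    assert(c != NULL && req != NULL);
    for (i = 0; i < MOCK_MAX_REQUESTS; i++) {
        if (c->reqs[i])
            continue;
        req->value = value;
        req->active = 1;
        c->reqs[i] = req;
        mock_qos_recompute(c);
        return 0;
    }
    return -1;
}

static void mock_qos_update_request(struct mock_qos_class *c,
                                    struct mock_qos_request *req, int new_value)
{
    assert(c != NULL && req != NULL && req->active);
    req->value = new_value;
    mock_qos_recompute(c);
}

static void mock_qos_remove_request(struct mock_qos_class *c,
                                    struct mock_qos_request *req)
{
    size_t i;

    assert(c != NULL && req != NULL);
    for (i = 0; i < MOCK_MAX_REQUESTS; i++)
        if (c->reqs[i] == req)
            c->reqs[i] = NULL;
    req->active = 0;
    mock_qos_recompute(c);
}
```

In this model, adding a request of 500 while a request of 100 is already active leaves the target at 100 and does not call the notifier, which mirrors the "only if the target value is now different" rule in the text.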
 From user mode:
 Only processes can register a pm_qos request. To provide for automatic
 cleanup of a process, the interface requires the process to register its
 parameter requests in the following way:
@@ -89,6 +94,7 @@ node.
 2. PM QoS per-device latency and flags framework
+================================================
 For each device, there are three lists of PM QoS requests. Two of them are
 maintained along with the aggregated targets of resume latency and active
@@ -107,73 +113,80 @@ the aggregated value does not require any locking mechanism.
 From kernel mode the use of this interface is the following:
 int dev_pm_qos_add_request(device, handle, type, value):
 Will insert an element into the list for that identified device with the
 target value. Upon change to this list the new target is recomputed and any
 registered notifiers are called only if the target value is now different.
 Clients of dev_pm_qos need to save the handle for future use in other
 dev_pm_qos API functions.
 int dev_pm_qos_update_request(handle, new_value):
-Will update the list element pointed to by the handle with the new target value
-and recompute the new aggregated target, calling the notification trees if the
-target is changed.
+Will update the list element pointed to by the handle with the new target
+value and recompute the new aggregated target, calling the notification
+trees if the target is changed.
 int dev_pm_qos_remove_request(handle):
-Will remove the element. After removal it will update the aggregate target and
-call the notification trees if the target was changed as a result of removing
-the request.
+Will remove the element. After removal it will update the aggregate target
+and call the notification trees if the target was changed as a result of
+removing the request.
 s32 dev_pm_qos_read_value(device):
 Returns the aggregated value for a given device's constraints list.
 enum pm_qos_flags_status dev_pm_qos_flags(device, mask)
 Check PM QoS flags of the given device against the given mask of flags.
 The meaning of the return values is as follows:
-PM_QOS_FLAGS_ALL: All flags from the mask are set
-PM_QOS_FLAGS_SOME: Some flags from the mask are set
-PM_QOS_FLAGS_NONE: No flags from the mask are set
-PM_QOS_FLAGS_UNDEFINED: The device's PM QoS structure has not been
-initialized or the list of requests is empty.
+PM_QOS_FLAGS_ALL:
+All flags from the mask are set
+PM_QOS_FLAGS_SOME:
+Some flags from the mask are set
+PM_QOS_FLAGS_NONE:
+No flags from the mask are set
+PM_QOS_FLAGS_UNDEFINED:
+The device's PM QoS structure has not been initialized
+or the list of requests is empty.
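The four return values listed above amount to a three-way comparison between the device's flags and the mask, plus an "undefined" case. The sketch below models that decision in user-space C. It is an illustration only: the `mock_` enum and function are hypothetical, the numeric enum values are arbitrary, and the `effective` argument is assumed to be the combined set of flags currently requested for the device.

```c
#include <assert.h>

/* Hypothetical mirror of the four result names; values are illustrative. */
enum mock_flags_status {
    MOCK_FLAGS_UNDEFINED = -1,
    MOCK_FLAGS_NONE,
    MOCK_FLAGS_SOME,
    MOCK_FLAGS_ALL,
};

/*
 * have_requests: 0 models an uninitialized PM QoS structure or an
 *                empty list of requests.
 * effective:     combined flags currently requested (an assumption of
 *                this model).
 * mask:          the flags the caller is asking about.
 */
static enum mock_flags_status mock_check_flags(int have_requests,
                                               unsigned int effective,
                                               unsigned int mask)
{
    unsigned int val;

    if (!have_requests)
        return MOCK_FLAGS_UNDEFINED;

    val = effective & mask;
    if (!val)
        return MOCK_FLAGS_NONE;

    return val == mask ? MOCK_FLAGS_ALL : MOCK_FLAGS_SOME;
}
```

Flags set outside the mask do not affect the result in this model: with `effective = 0x7` and `mask = 0x3` every bit of the mask is set, so the answer is still "all".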
int dev_pm_qos_add_ancestor_request(dev, handle, type, value) int dev_pm_qos_add_ancestor_request(dev, handle, type, value)
Add a PM QoS request for the first direct ancestor of the given device whose Add a PM QoS request for the first direct ancestor of the given device whose
power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests) power.ignore_children flag is unset (for DEV_PM_QOS_RESUME_LATENCY requests)
or whose power.set_latency_tolerance callback pointer is not NULL (for or whose power.set_latency_tolerance callback pointer is not NULL (for
DEV_PM_QOS_LATENCY_TOLERANCE requests). DEV_PM_QOS_LATENCY_TOLERANCE requests).
int dev_pm_qos_expose_latency_limit(device, value) int dev_pm_qos_expose_latency_limit(device, value)
Add a request to the device's PM QoS list of resume latency constraints and Add a request to the device's PM QoS list of resume latency constraints and
create a sysfs attribute pm_qos_resume_latency_us under the device's power
directory allowing user space to manipulate that request.

void dev_pm_qos_hide_latency_limit(device)
  Drop the request added by dev_pm_qos_expose_latency_limit() from the device's
  PM QoS list of resume latency constraints and remove sysfs attribute
  pm_qos_resume_latency_us from the device's power directory.

int dev_pm_qos_expose_flags(device, value)
  Add a request to the device's PM QoS list of flags and create sysfs attribute
  pm_qos_no_power_off under the device's power directory allowing user space to
  change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.

void dev_pm_qos_hide_flags(device)
  Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
  of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
  directory.

Notification mechanisms:

The per-device PM QoS framework has a per-device notification tree.

int dev_pm_qos_add_notifier(device, notifier):
  Adds a notification callback function for the device.
  The callback is called when the aggregated value of the device constraints list
  is changed (for resume latency device PM QoS only).

int dev_pm_qos_remove_notifier(device, notifier):
  Removes the notification callback function for the device.
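
The notifier behaviour described above can be illustrated with a small
user-space model. This is only a sketch and not the kernel implementation:
every name in it (``toy_qos``, ``toy_qos_add_request()`` and so on) is
hypothetical, and it assumes that the aggregated resume latency is the minimum
of all requests, i.e. the strictest limit wins. The callback fires only when
that aggregate actually changes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy user-space model, NOT the kernel API: all names are hypothetical. */
#define TOY_MAX_REQUESTS 8
#define TOY_NO_CONSTRAINT INT_MAX	/* stands in for "no latency limit" */

typedef void (*toy_latency_cb)(int new_value_us, void *ctx);

struct toy_qos {
	int requests[TOY_MAX_REQUESTS];	/* requested latency limits, in us */
	size_t count;
	int aggregate;			/* assumed aggregate: the minimum */
	toy_latency_cb notifier;
	void *ctx;
};

static void toy_qos_init(struct toy_qos *q, toy_latency_cb cb, void *ctx)
{
	q->count = 0;
	q->aggregate = TOY_NO_CONSTRAINT;
	q->notifier = cb;
	q->ctx = ctx;
}

/* Recompute the aggregate and notify only if it changed. */
static void toy_qos_update(struct toy_qos *q)
{
	int min = TOY_NO_CONSTRAINT;
	size_t i;

	for (i = 0; i < q->count; i++)
		if (q->requests[i] < min)
			min = q->requests[i];

	if (min != q->aggregate) {
		q->aggregate = min;
		if (q->notifier)
			q->notifier(min, q->ctx);
	}
}

static void toy_qos_add_request(struct toy_qos *q, int latency_us)
{
	assert(q->count < TOY_MAX_REQUESTS);
	q->requests[q->count++] = latency_us;
	toy_qos_update(q);
}

/* Example callback: counts how many times the aggregate changed. */
static void toy_count_changes(int new_value_us, void *ctx)
{
	(void)new_value_us;
	++*(int *)ctx;
}
```

In this model, adding requests of 500, 800 and 100 microseconds in that order
triggers the callback twice: once when the first request sets the aggregate and
once when the 100 microsecond request tightens it. The 800 microsecond request
leaves the aggregate unchanged, so no notification is sent.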

Active state latency tolerance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This device PM QoS type is used to support systems in which hardware may switch
to energy-saving operation modes on the fly. In those systems, if the operation
......
========================
Linux power supply class
========================

...@@ -56,112 +57,155 @@ Quoting include/linux/power_supply.h:

Attributes/properties detailed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+--------------------------------------------------------------------------+
| **Charge/Energy/Capacity - how not to confuse them**                     |
+--------------------------------------------------------------------------+
| **Because both "charge" (µAh) and "energy" (µWh) represent "capacity"    |
| of a battery, this class distinguishes these terms. Don't mix them!**    |
|                                                                          |
| - `CHARGE_*`                                                             |
|   attributes represent capacity in µAh only.                             |
| - `ENERGY_*`                                                             |
|   attributes represent capacity in µWh only.                             |
| - `CAPACITY`                                                             |
|   attribute represents capacity in *percent*, from 0 to 100.             |
+--------------------------------------------------------------------------+

Postfixes:

_AVG
  *hardware* averaged value, use it if your hardware is really able to
  report averaged values.
_NOW
  momentary/instantaneous values.

STATUS
  this attribute represents operating status (charging, full,
  discharging (i.e. powering a load), etc.). This corresponds to
  `BATTERY_STATUS_*` values, as defined in battery.h.

CHARGE_TYPE
  batteries can typically charge at different rates.
  This defines trickle and fast charges. For batteries that
  are already charged or discharging, 'n/a' can be displayed (or
  'unknown', if the status is not known).

AUTHENTIC
  indicates whether the power supply (battery or charger) connected
  to the platform is authentic(1) or non authentic(0).

HEALTH
  represents health of the battery, values correspond to
  POWER_SUPPLY_HEALTH_*, defined in battery.h.

VOLTAGE_OCV
  open circuit voltage of the battery.

VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN
  design values for maximal and minimal power supply voltages.
  Maximal/minimal means values of voltages when the battery is considered
  "full"/"empty" at normal conditions. Yes, there is no direct relation
  between voltage and battery capacity, but some dumb
  batteries use voltage for a very approximate calculation of capacity.
  A battery driver can also use this attribute just to inform userspace
  about maximal and minimal voltage thresholds of a given battery.

VOLTAGE_MAX, VOLTAGE_MIN
  same as _DESIGN voltage values except that these ones should be used
  if hardware could only guess (measure and retain) the thresholds of a
  given power supply.

VOLTAGE_BOOT
  Reports the voltage measured during boot

CURRENT_BOOT
  Reports the current measured during boot

CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN
  design charge values, when the battery is considered full/empty.

ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN
  same as above but for energy.

CHARGE_FULL, CHARGE_EMPTY
  These attributes mean "last remembered value of charge when the battery
  became full/empty". It also could mean "value of charge when the battery
  is considered full/empty at given conditions (temperature, age)".
  I.e. these attributes represent real thresholds, not design values.

ENERGY_FULL, ENERGY_EMPTY
  same as above but for energy.

CHARGE_COUNTER
  the current charge counter (in µAh). This could easily
  be negative; there is no empty or full value. It is only useful for
  relative, time-based measurements.

PRECHARGE_CURRENT
  the maximum charge current during precharge phase of charge cycle
  (typically 20% of battery capacity).

CHARGE_TERM_CURRENT
  Charge termination current. The charge cycle terminates when battery
  voltage is above recharge threshold, and charge current is below
  this setting (typically 10% of battery capacity).

CONSTANT_CHARGE_CURRENT
  constant charge current programmed by charger.

CONSTANT_CHARGE_CURRENT_MAX
  maximum charge current supported by the power supply object.

CONSTANT_CHARGE_VOLTAGE
  constant charge voltage programmed by charger.

CONSTANT_CHARGE_VOLTAGE_MAX
  maximum charge voltage supported by the power supply object.

INPUT_CURRENT_LIMIT
  input current limit programmed by charger. Indicates
  the current drawn from a charging source.

CHARGE_CONTROL_LIMIT
  current charge control limit setting

CHARGE_CONTROL_LIMIT_MAX
  maximum charge control limit setting

CALIBRATE
  battery or coulomb counter calibration status

CAPACITY
  capacity in percent.

CAPACITY_ALERT_MIN
  minimum capacity alert value in percent.

CAPACITY_ALERT_MAX
  maximum capacity alert value in percent.

CAPACITY_LEVEL
  capacity level. This corresponds to POWER_SUPPLY_CAPACITY_LEVEL_*.

TEMP
  temperature of the power supply.

TEMP_ALERT_MIN
  minimum battery temperature alert.

TEMP_ALERT_MAX
  maximum battery temperature alert.

TEMP_AMBIENT
  ambient temperature.

TEMP_AMBIENT_ALERT_MIN
  minimum ambient temperature alert.

TEMP_AMBIENT_ALERT_MAX
  maximum ambient temperature alert.

TEMP_MIN
  minimum operable temperature

TEMP_MAX
  maximum operable temperature

TIME_TO_EMPTY
  seconds left for battery to be considered empty
  (i.e. while battery powers a load)

TIME_TO_FULL
  seconds left for battery to be considered full
  (i.e. while battery is charging)
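
To make the distinction between the CHARGE_* values (µAh) and CAPACITY
(percent) concrete, here is a minimal user-space sketch of how a percentage
could be derived from charge readings. The helper name ``capacity_percent()``
and its parameters are hypothetical and not part of the power supply class;
the code only illustrates the arithmetic, under the assumption that the three
inputs are consistent readings taken from the same battery.

```c
#include <assert.h>

/*
 * Hypothetical user-space helper (not a kernel or power supply class API).
 * All three inputs are charge values in µAh, e.g. CHARGE_NOW, CHARGE_FULL
 * and CHARGE_EMPTY. Returns a percentage clamped to 0..100, or -1 if the
 * full/empty thresholds do not describe a usable range.
 */
static int capacity_percent(long long charge_now_uah,
			    long long charge_full_uah,
			    long long charge_empty_uah)
{
	long long span = charge_full_uah - charge_empty_uah;
	long long pct;

	if (span <= 0)
		return -1;

	pct = (charge_now_uah - charge_empty_uah) * 100 / span;
	if (pct < 0)
		pct = 0;
	if (pct > 100)
		pct = 100;

	return (int)pct;
}
```

The same calculation done with ENERGY_* values would work in µWh instead, but
the two units must never be combined in one expression.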

Battery <-> external power supply interaction
...@@ -193,8 +237,11 @@ for naming consistency between sysfs attributes and battery node properties.

QA
~~

Q:
   Where is POWER_SUPPLY_PROP_XYZ attribute?
A:
   If you cannot find an attribute suitable for your driver needs, feel free
   to add it and send a patch along with your driver.

   The attributes available currently are the ones currently provided by the
...@@ -204,20 +251,24 @@ A: If you cannot find attribute suitable for your driver needs, feel free
   etc.

Q:
   I have some very specific attribute (e.g. battery color), should I add
   this attribute to standard ones?
A:
   Most likely, no. Such an attribute can be placed in the driver itself, if
   it is useful. Of course, if the attribute in question is applicable to
   a large set of batteries, provided by many drivers, and/or comes from
   some general battery specification/standard, it may be a candidate to
   be added to the core attribute set.

Q:
   Suppose my battery monitoring chip/firmware does not provide capacity
   in percent, but provides charge_{now,full,empty}. Should I calculate
   percentage capacity manually, inside the driver, and register the CAPACITY
   attribute? The same question about time_to_empty/time_to_full.
A:
   Most likely, no. This class is designed to export properties which are
   directly measurable by the specific hardware available.

   Inferring not available properties using some heuristics or mathematical
......
==========================
Regulator API design notes
==========================

...@@ -14,7 +15,9 @@ Safety
   have different power requirements, and not all components with power
   requirements are visible to software.

.. note::

     The API should make no changes to the hardware state unless it has
     specific knowledge that these changes are safe to perform on this
     particular system.

...@@ -28,6 +31,8 @@ Consumer use cases
 - Many of the power supplies in the system will be shared between many
   different consumers.

.. note::

     The consumer API should be structured so that these use cases are
     very easy to handle and so that consumers will work with shared
     supplies without any additional effort.

==================================
Regulator Machine Driver Interface
==================================

The regulator machine driver interface is intended for board/machine specific
initialisation code to configure the regulator subsystem.

Consider the following machine::

  Regulator-1 -+-> Regulator-2 --> [Consumer A @ 1.8 - 2.0V]
               |
...@@ -13,31 +14,31 @@ Consider the following machine :-

The drivers for consumers A & B must be mapped to the correct regulator in
order to control their power supplies. This mapping can be achieved in machine
initialisation code by creating a struct regulator_consumer_supply for
each regulator::

  struct regulator_consumer_supply {
	const char *dev_name;	/* consumer dev_name() */
	const char *supply;	/* consumer supply - e.g. "vcc" */
  };

e.g. for the machine above::

  static struct regulator_consumer_supply regulator1_consumers[] = {
	REGULATOR_SUPPLY("Vcc", "consumer B"),
  };

  static struct regulator_consumer_supply regulator2_consumers[] = {
	REGULATOR_SUPPLY("Vcc", "consumer A"),
  };

This maps Regulator-1 to the 'Vcc' supply for Consumer B and maps Regulator-2
to the 'Vcc' supply for Consumer A.

Constraints can now be registered by defining a struct regulator_init_data
for each regulator power domain. This structure also maps the consumers
to their supply regulators::

  static struct regulator_init_data regulator1_data = {
	.constraints = {
		.name = "Regulator-1",
		.min_uV = 3300000,
...@@ -46,7 +47,7 @@ static struct regulator_init_data regulator1_data = {
	},
	.num_consumer_supplies = ARRAY_SIZE(regulator1_consumers),
	.consumer_supplies = regulator1_consumers,
  };

The name field should be set to something that is usefully descriptive
for the board for configuration of supplies for other regulators and
...@@ -57,9 +58,9 @@ name is provided then the subsystem will choose one.

Regulator-1 supplies power to Regulator-2. This relationship must be registered
with the core so that Regulator-1 is also enabled when Consumer A enables its
supply (Regulator-2). The supply regulator is set by the supply_regulator
field below and co::

  static struct regulator_init_data regulator2_data = {
	.supply_regulator = "Regulator-1",
	.constraints = {
		.min_uV = 1800000,
...@@ -69,11 +70,11 @@ static struct regulator_init_data regulator2_data = {
	},
	.num_consumer_supplies = ARRAY_SIZE(regulator2_consumers),
	.consumer_supplies = regulator2_consumers,
  };

Finally the regulator devices must be registered in the usual manner::

  static struct platform_device regulator_devices[] = {
	{
		.name = "regulator",
		.id = DCDC_1,
...@@ -88,9 +89,9 @@ static struct platform_device regulator_devices[] = {
			.platform_data = &regulator2_data,
		},
	},
  };
  /* register regulator 1 device */
  platform_device_register(&regulator_devices[0]);

  /* register regulator 2 device */
  platform_device_register(&regulator_devices[1]);
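
The reason the Regulator-1 to Regulator-2 supply relationship has to be
registered can be shown with a small user-space model. This is a sketch only,
not the regulator core: ``struct toy_regulator``, ``toy_enable()`` and
``toy_disable()`` are hypothetical names, and the model assumes simple
reference counting in which the first enable of a regulator also enables its
supply and the last disable releases it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model, NOT the regulator core: all names are hypothetical. */
struct toy_regulator {
	const char *name;
	struct toy_regulator *supply;	/* upstream regulator, or NULL */
	int use_count;			/* number of active enables */
};

/* The first enable powers up the supply chain before this regulator. */
static void toy_enable(struct toy_regulator *r)
{
	assert(r != NULL);
	if (r->use_count == 0 && r->supply)
		toy_enable(r->supply);
	r->use_count++;
}

/* The last disable releases this regulator's reference on its supply. */
static void toy_disable(struct toy_regulator *r)
{
	assert(r != NULL && r->use_count > 0);
	r->use_count--;
	if (r->use_count == 0 && r->supply)
		toy_disable(r->supply);
}
```

With Regulator-2 pointing at Regulator-1 as its supply, enabling Regulator-2
on behalf of Consumer A also takes a reference on Regulator-1. If Consumer B
then enables Regulator-1 directly, Regulator-1 stays on after Consumer A
disables Regulator-2, because Consumer B still holds a reference.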

====================================
System Suspend and Device Interrupts
====================================

Copyright (C) 2014 Intel Corp.
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
......
================
swsusp/S3 tricks
================

Pavel Machek <pavel@ucw.cz>

If you want to trick swsusp/S3 into working, you might want to try:
......
...@@ -117,7 +117,7 @@ PM support:
implemented") error. You should also try to make sure that your
driver uses as little power as possible when it's not doing
anything. For the driver testing instructions see
Documentation/power/drivers-testing.rst and for a relatively
complete overview of the power management issues related to
drivers see :ref:`Documentation/driver-api/pm/devices.rst <driverapi_pm_devices>`.
......