
PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel <mochel@transmeta.com>
(and others)

---------------------------------------------------------------------------

1. Overview
2. How The PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It is a standard interface for controlling various
power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to 
an operational state (D0).

There are actually two D3 states.  When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context).  But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state is discarded); or both.

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM spec.

The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.
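
The transition table above can be expressed as a small predicate. This is a
minimal userspace sketch, not kernel code: the enum and the function name are
illustrative assumptions, and staying in the same state is treated as "not a
transition" because the table does not list it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; these are not the kernel's pci_power_t values. */
enum pm_state { D0 = 0, D1 = 1, D2 = 2, D3 = 3 };

/*
 * Return true if the table lists a transition from `from` to `to`:
 * any move to a deeper (higher-numbered) state, or a return to D0
 * from D1, D2 or D3.
 */
bool pm_transition_allowed(enum pm_state from, enum pm_state to)
{
	assert(from <= D3 && to <= D3);

	if (from == to)
		return false;	/* not listed in the table (assumption) */
	if (to > from)
		return true;	/* D0 -> D1/D2/D3, D1 -> D2/D3, D2 -> D3 */
	return to == D0;	/* the only allowed move to a shallower state */
}
```

For example, pm_transition_allowed(D2, D1) is false, while
pm_transition_allowed(D2, D0) is true.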

2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth-first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global power state. The second walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
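
The walk order described above can be illustrated with a toy device tree.
This is a userspace sketch under stated assumptions, not the PCI subsystem's
code: struct node, its fields and the three function names are hypothetical
stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_DEVS     16

/* Hypothetical stand-in for a PCI device or bridge; not struct pci_dev. */
struct node {
	int id;
	int veto;			/* nonzero: device refuses to suspend */
	int saved;			/* set by the first walk */
	int powered;			/* 1 = D0, 0 = low power state */
	size_t nchildren;
	struct node *child[MAX_CHILDREN];
};

/*
 * First walk: depth first, save state and check for vetoes. Nothing is
 * powered down here, so a failure leaves every device operational.
 */
int save_walk(struct node *n)
{
	for (size_t i = 0; i < n->nchildren; i++)
		if (save_walk(n->child[i]) != 0)
			return -1;
	if (n->veto)
		return -1;
	n->saved = 1;
	return 0;
}

/* Second walk: depth first, children are powered down before their bridge. */
void suspend_walk(struct node *n, int *order, size_t *count)
{
	for (size_t i = 0; i < n->nchildren; i++)
		suspend_walk(n->child[i], order, count);
	n->powered = 0;
	assert(*count < MAX_DEVS);
	order[(*count)++] = n->id;
}

/* Resume: breadth first, so each bridge is powered on before its children. */
void resume_walk(struct node *root, int *order, size_t *count)
{
	struct node *queue[MAX_DEVS];
	size_t head = 0, tail = 0;

	queue[tail++] = root;
	while (head < tail) {
		struct node *n = queue[head++];

		n->powered = 1;
		order[(*count)++] = n->id;
		for (size_t i = 0; i < n->nchildren; i++) {
			assert(tail < MAX_DEVS);
			queue[tail++] = n->child[i];
		}
	}
}
```

With a root bridge 0 whose children are device 1 and bridge 2, and bridge 2
having child 3, suspend_walk() visits 1, 3, 2, 0 and resume_walk() visits
0, 1, 2, 3.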


3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.


pci_save_state
--------------

Usage:
	pci_save_state(struct pci_dev *dev);

Description:
	Save first 64 bytes of PCI config space, along with any additional
	PCI-Express or PCI-X information.


pci_restore_state
-----------------

Usage:
	pci_restore_state(struct pci_dev *dev);

Description:
	Restore previously saved config space.


pci_set_power_state
-------------------

Usage:
	pci_set_power_state(struct pci_dev *dev, pci_power_t state);

Description:
	Transition device to low power state using PCI PM Capabilities
	registers.

	Will fail under one of the following conditions:
	- If state is less than current state, but not D0 (illegal transition)
	- Device doesn't support PM Capabilities
	- Device does not support requested state
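
The three failure conditions listed above can be modelled in userspace as
follows. This is only a sketch: struct model_dev, its fields and
model_set_power_state() are hypothetical, and the real function operates on
the PM Capabilities registers rather than on flags like these.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; these are not the fields of struct pci_dev. */
struct model_dev {
	bool has_pm_cap;	/* device implements the PCI PM capability */
	bool supports[4];	/* supports[n]: state Dn is implemented */
	int current_state;	/* 0..3 for D0..D3 */
};

/* Return 0 on success, -1 if one of the listed conditions applies. */
int model_set_power_state(struct model_dev *dev, int state)
{
	assert(dev);

	if (state < 0 || state > 3)
		return -1;	/* outside D0..D3 (extra check for the model) */
	if (state < dev->current_state && state != 0)
		return -1;	/* illegal transition */
	if (!dev->has_pm_cap)
		return -1;	/* no PM Capabilities */
	if (!dev->supports[state])
		return -1;	/* requested state not supported */

	dev->current_state = state;
	return 0;
}
```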


pci_enable_wake
---------------

Usage:
	pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);

Description:
	Enable device to generate PME# during low power state using PCI PM 
	Capabilities.

	Checks whether the device supports generating PME# from the requested
	state and fails if it does not, unless enable == 0 (a request to disable
	wake events, which is implicitly satisfied if the device does not support
	them in the first place).

	Note that the PMC Register in the device's PM Capabilities has a bitmask
	of the states it supports generating PME# from. D3hot is bit 3 and
	D3cold is bit 4. So, while a value of 4 as the state may not seem
	semantically correct, it is. 
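
The pass/fail rule described above can be sketched in userspace as follows.
The function name and the idea of passing the 5-bit bitmask directly are
assumptions made for the example; the real function reads the bitmask from
the device's PMC Register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * pme_mask is the 5-bit "PME# supported from" bitmask described above:
 * bit 0 = D0, bit 1 = D1, bit 2 = D2, bit 3 = D3hot, bit 4 = D3cold.
 * Returns 0 on success, -1 on failure.
 */
int model_enable_wake(uint8_t pme_mask, int state, int enable)
{
	assert((pme_mask & ~0x1Fu) == 0);	/* only five bits are defined */

	if (!enable)
		return 0;	/* disabling wake events always succeeds */
	if (state < 0 || state > 4)
		return -1;
	if (!(pme_mask & (1u << state)))
		return -1;	/* no PME# from the requested state */
	return 0;
}
```

With a mask of 0x18 (D3hot and D3cold only), a state of 4 selects D3cold and
succeeds, while a state of 1 (D1) fails.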


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in 
struct pci_driver:

        int  (*suspend) (struct pci_dev *dev, pm_message_t state);
        int  (*resume) (struct pci_dev *dev);
        int  (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable);


suspend
-------

Usage:

if (dev->driver && dev->driver->suspend)
	dev->driver->suspend(dev,state);

A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus-mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().

Bus mastering may be disabled by doing:

pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state to match the suspend() parameter:

pci_set_power_state(dev,state);

The driver is also responsible for disabling any other device-specific features
(e.g. blanking the screen, turning off on-card memory, etc.).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.

resume
------

Usage:

if (dev->driver && dev->driver->resume)
	dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state. 

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().

If the device is currently in D3, it may need to be reinitialized in resume().

  * Some types of devices, like bus controllers, will preserve context in D3hot
    (using Vcc power).  Their drivers will often want to avoid re-initializing
    them after re-entering D0 (perhaps to avoid resetting downstream devices).

  * Other kinds of devices in D3hot will discard device context as part of a
    soft reset when re-entering the D0 state.
    
  * Devices resuming from D3cold always go through a power-on reset.  Some
    device context can also be preserved using Vaux power.

  * Some systems hide D3cold resume paths from drivers.  For example, on PCs
    the resume path for suspend-to-disk often runs BIOS powerup code, which
    will sometimes re-initialize the device.

To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume().  Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.
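
The idea of sharing initialization between probe() and resume() can be
sketched with a toy userspace model. Everything here is hypothetical (struct
toy_dev, the register array, and the toy_* functions); the model simply
treats the device context as lost somewhere during the round trip through D3.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device model; `regs` stands for context lost on a reset. */
struct toy_dev {
	int regs[2];		/* device context */
	int saved_param;	/* found at probe time, kept by the driver */
	int state;		/* 0 = D0, 3 = D3 */
};

/* Initialization code shared by probe and resume. */
void toy_hw_init(struct toy_dev *d)
{
	d->regs[0] = 1;			/* e.g. enable the function */
	d->regs[1] = d->saved_param;	/* reprogram from the saved value */
}

void toy_probe(struct toy_dev *d, int discovered_param)
{
	assert(d);
	d->saved_param = discovered_param;
	d->state = 0;
	toy_hw_init(d);
}

void toy_suspend(struct toy_dev *d)
{
	d->state = 3;
	/* Model the device context being discarded while in D3. */
	memset(d->regs, 0, sizeof(d->regs));
}

void toy_resume(struct toy_dev *d)
{
	int was_d3 = (d->state == 3);

	d->state = 0;
	if (was_d3)
		toy_hw_init(d);	/* no re-probe: saved_param is reused */
}
```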

If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:

pci_set_power_state(dev,0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. It is impossible to determine which of the two events is
taking place in the driver, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.


enable_wake
-----------

Usage:

if (dev->driver && dev->driver->enable_wake)
	dev->driver->enable_wake(dev,state,enable);

This callback is generally only relevant for devices that support the PCI PM
spec and have the ability to generate a PME# (Power Management Event Signal)
to wake the system up. (However, it is possible that a device may support 
some non-standard way of generating a wake event on sleep.)

Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
PM Capabilities describe what power states the device supports generating a 
wake event from:

+------------------+
|  Bit  |  State   |
+------------------+
|  11   |   D0     |
|  12   |   D1     |
|  13   |   D2     |
|  14   |   D3hot  |
|  15   |   D3cold |
+------------------+

A device can use this to enable wake events:

	 pci_enable_wake(dev,state,enable);

Note that to enable PME# from D3cold, a value of 4 should be passed to
pci_enable_wake (since it uses an index into a bitmask). If a driver gets
a request to enable wake events from D3, two calls should be made to
pci_enable_wake (one each for D3hot and D3cold).
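
The mapping between PMC bits 15:11 and the state index can be sketched as
follows. The constant, enum and function names are illustrative assumptions
for this userspace example; only the bit positions come from the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants for bits 15:11 of the 16-bit PMC register. */
#define PMC_PME_SHIFT	11
#define PMC_PME_MASK	0xF800u

/* Index into the 5-bit field; note that D3cold is index 4. */
enum wake_state { WAKE_D0, WAKE_D1, WAKE_D2, WAKE_D3HOT, WAKE_D3COLD };

/* Extract the 5-bit PME# support field from a PMC register value. */
unsigned int pmc_pme_field(uint16_t pmc)
{
	return (pmc & PMC_PME_MASK) >> PMC_PME_SHIFT;
}

/* True if the device can generate a wake event from state `s`. */
bool pmc_can_wake_from(uint16_t pmc, enum wake_state s)
{
	assert(s <= WAKE_D3COLD);
	return (pmc_pme_field(pmc) >> s) & 1u;
}
```

A request to wake from "D3" corresponds to two indices here, WAKE_D3HOT (3)
and WAKE_D3COLD (4), which is why two calls are needed.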


A reference implementation
--------------------------
.suspend()
{
	/* driver specific operations */

	/* Disable IRQ */
	free_irq();
	/* If using MSI */
	pci_disable_msi();

	pci_save_state();
	pci_enable_wake();
	/* Disable IO/bus master/irq router */
	pci_disable_device();
	pci_set_power_state(pci_choose_state());
}

.resume()
{
	pci_set_power_state(PCI_D0);
	pci_restore_state();
	/* the device's IRQ may have changed; the driver must handle that */
	pci_enable_device();
	pci_set_master();

	/* if using MSI, the device's vector may have changed */
	pci_enable_msi();

	request_irq();
	/* driver specific operations; */
}

This is a typical implementation. Drivers may change the order of the
operations slightly, omit some operations, or add more driver-specific
operations, but on the whole they should follow this pattern.

5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification 
PCI Bus Power Management Interface Specification

  http://www.pcisig.com