		The MSI Driver Guide HOWTO
	Tom L Nguyen tom.l.nguyen@intel.com
			10/03/2003
	Revised Feb 12, 2004 by Martine Silbermann
		email: Martine.Silbermann@hp.com
	Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function.  In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature, and a
required feature for PCI Express devices. MSI enables a device function
to request service by sending an Inbound Memory Write on its PCI bus to
the FSB as a Message Signaled Interrupt transaction. Because MSI is
generated in the form of a Memory Write, all transaction conditions,
such as a Retry, Master-Abort, Target-Abort or normal completion, are
supported.

A PCI device that supports MSI must also support pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during device
initial configuration.

An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These registers
provide the bus driver control over MSI. The Message Control register
indicates the MSI capability supported by the device. The Message
Address register specifies the target address and the Message Data
register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure as described below.

	- Support a larger maximum number of vectors per function.

	- Provide the ability for system software to configure
	each vector with an independent message address and message
	data, specified by a table that resides in Memory Space.

	- MSI and MSI-X both support per-vector masking. Per-vector
	masking is an optional extension of MSI but a required
	feature for MSI-X. Per-vector masking provides the kernel the
	ability to mask/unmask a single MSI while running its
	interrupt service routine. If per-vector masking is
	not supported, then the device driver should provide the
	hardware/software synchronization to ensure that the device
	generates MSI when the driver wants it to do so.

4. Why use MSI?

MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another
step towards a legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge) while
MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on all devices that
support this capability. The CONFIG_PCI_MSI kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel config.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Due to the non-contiguous fashion in vector assignment of the
existing Linux kernel, this version does not support multiple
messages regardless of whether a device function is capable of
supporting more than one vector. Enabling MSI on a device function's
MSI capability structure requires a device driver to call the
function pci_enable_msi() explicitly.

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

With this new API, a device driver that wants to have MSI
enabled on its device function must call this API to enable MSI.
A successful call will initialize the MSI capability structure
with ONE vector, regardless of whether a device function is
capable of supporting multiple messages. This vector replaces the
pre-assigned dev->irq with a new MSI vector. To avoid a conflict
between the newly assigned vector and the existing pre-assigned
vector, a device driver must call this API before calling request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq with
the pre-assigned IOAPIC vector and switches a device's interrupt
mode to PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI vector
that it has done request_irq() on before calling this API. Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector.

5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

	 ------------   pci_enable_msi 	 ------------------------
	|	     | <===============	| 			 |
	| MSI MODE   |	  	     	| PIN-IRQ ASSERTION MODE |
	| 	     | ===============>	|			 |
 	 ------------	pci_disable_msi  ------------------------


Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode. Legacy
in this context means PCI pin-irq assertion or PCI-Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem and a new
assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector. Otherwise, the PCI subsystem restores a device's
dev->irq with a pre-assigned IOAPIC vector and marks the released
MSI vector as unused.

Once marked as unused, there is no guarantee that the PCI
subsystem will reserve this MSI vector for a device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI may be re-assigned.

For the case where the PCI subsystem re-assigns this MSI vector to
another driver, a request to switch back to MSI mode may result
in being assigned a different MSI vector or a failure if no more
vectors are available.
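
The mode switches described above can be modelled in a few lines of
userspace C. This is only a sketch of the bookkeeping, not kernel
code: struct model_dev and the model_*() functions are hypothetical
stand-ins invented for illustration, the vector numbers a caller
passes in are arbitrary, and the model returns -1 where the kernel
would hit a BUG_ON().

```c
#include <stdbool.h>

/* Hypothetical userspace model of a device function; not struct pci_dev. */
struct model_dev {
	int irq;		/* what a driver would pass to request_irq() */
	int saved_irq;		/* pre-assigned vector kept while in MSI mode */
	bool msi_enabled;
	bool irq_requested;	/* true between request_irq() and free_irq() */
};

/* Legacy -> MSI: save the pre-assigned vector, install the new one. */
static int model_enable_msi(struct model_dev *dev, int msi_vector)
{
	/* Model choice: refuse if a handler is already registered. */
	if (dev->msi_enabled || dev->irq_requested)
		return -1;
	dev->saved_irq = dev->irq;
	dev->irq = msi_vector;
	dev->msi_enabled = true;
	return 0;
}

/* MSI -> legacy: refuse while a handler is still registered. */
static int model_disable_msi(struct model_dev *dev)
{
	if (!dev->msi_enabled)
		return 0;	/* nothing to undo */
	if (dev->irq_requested)
		return -1;	/* the kernel would BUG_ON() here */
	dev->irq = dev->saved_irq;
	dev->msi_enabled = false;
	return 0;
}
```

In this model, enabling MSI on a device whose irq is 17 with MSI
vector 193 sets irq to 193, and disabling restores 17, provided
irq_requested has been cleared first.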

5.3 Configuring for MSI-X support

Due to the ability of the system software to configure each vector of
the MSI-X capability structure with an independent message address
and message data, the non-contiguous fashion in vector assignment of
the existing Linux kernel has no impact on supporting multiple
messages on an MSI-X capable device function. Enabling MSI-X on
a device function's MSI-X capability structure requires its device
driver to call the function pci_enable_msix() explicitly.

The function pci_enable_msix(), once invoked, enables either
all or nothing, depending on the current availability of PCI vector
resources. If the PCI vector resources are available for the number
of vectors requested by a device driver, this function will configure
the MSI-X table of the MSI-X capability structure of a device with
requested messages. For example, a device may be capable of
supporting a maximum of 32 vectors while its software driver
typically requests only 4. It is recommended that the device
driver call this function once during its initialization phase.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
into the field 'vector' of each element contained in the second argument.
Note that the pre-assigned IOAPIC dev->irq is valid only if the device
operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
using dev->irq by the device driver to request interrupt service
may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable
this vector with its corresponding interrupt service handler. It is
a device driver's choice to assign all vectors with the same
interrupt service handler or each vector with a unique interrupt
service handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification has implementation notes that MMIO address
space for a device's MSI-X structure should be isolated so that the
software system can set different pages for controlling accesses to the
MSI-X structure. The implementation of MSI support requires the PCI
subsystem, not a device driver, to maintain full control of the MSI-X
table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
table/MSI-X PBA.  A device driver is prohibited from requesting the MMIO
address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem
will fail enabling MSI-X on its hardware device when it calls the function
pci_enable_msix().

5.3.2 Handling MSI-X allocation

Determining the number of MSI-X vectors allocated to a function is
dependent on the number of MSI capable devices and MSI-X capable
devices populated in the system. The policy of allocating MSI-X
vectors to a function is defined as follows:

Number of MSI-X vectors allocated to a function = (x - y)/z, where

x = 	The number of available PCI vector resources by the time
	the device driver calls pci_enable_msix(). The PCI vector
	resources are the sum of the number of unassigned vectors
	(new) and the number of released vectors when any MSI/MSI-X
	device driver switches its hardware device back to a legacy
	mode or is hot-removed.	The number of unassigned vectors
	may exclude some vectors reserved, as defined in parameter
	NR_HP_RESERVED_VECTORS, for the case where the system is
	capable of supporting hot-add/hot-remove operations. Users
	may change the value defined in NR_HP_RESERVED_VECTORS to
	meet their specific needs.

y =	The number of MSI capable devices populated in the system.
	This policy ensures that each MSI capable device has its
	vector reserved to avoid the case where some MSI-X capable
	drivers may attempt to claim all available vector resources.

z =	The number of MSI-X capable devices populated in the system.
	This policy ensures that maximum (x - y) is distributed
	evenly among MSI-X capable devices.

Note that the PCI subsystem scans y and z during a bus enumeration.
When the PCI subsystem completes configuring MSI/MSI-X capability
structure of a device as requested by its device driver, y/z is
decremented accordingly.
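
As a worked example of the policy above, the sketch below computes
the per-function share. The function name msix_share() is
hypothetical, and rounding the quotient down is an assumption; this
guide does not say how a remainder is handled.

```c
/*
 * Share of MSI-X vectors per function under the (x - y) / z policy.
 *   available    = x, vector resources free at the time of the call
 *   msi_devices  = y, MSI capable devices populated in the system
 *   msix_devices = z, MSI-X capable devices populated in the system
 * Returns 0 when nothing can be handed out. Integer division rounds
 * down (an assumption of this sketch).
 */
static int msix_share(int available, int msi_devices, int msix_devices)
{
	if (msix_devices <= 0 || available <= msi_devices)
		return 0;
	return (available - msi_devices) / msix_devices;
}
```

With x = 100, y = 4 and z = 8, each MSI-X capable function would be
offered (100 - 4) / 8 = 12 vectors.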

5.3.3 Handling MSI-X shortages

For the case where fewer MSI-X vectors are allocated to a function
than requested, the function pci_enable_msix() will return the
maximum number of MSI-X vectors available to the caller. A device
driver may then re-send its request, asking for no more vectors than
the number indicated in the return value. For example, if a device
driver requests 5 vectors but only 3 vectors are available, a value
of 3 will be returned by the pci_enable_msix() call. A function could
be designed so that its driver uses only 3 MSI-X table entries, in
different combinations such as ABC--, A-B-C, A--CB, etc. Note that
this patch does not support multiple entries with the same vector.
Any attempt by a device driver to use 5 MSI-X table entries with 3
vectors, such as ABBCC, AABCC, BCCBA, etc., will result in a failure
of the function pci_enable_msix(). Below are the reasons why
supporting multiple entries with the same vector is undesirable.

	- The PCI subsystem cannot determine the entry that
	  generated the message to mask/unmask MSI while handling
	  software driver ISR. Attempting to walk through all MSI-X
	  table entries (2048 max) to mask/unmask any matching vector
	  is an undesirable solution.

	- Walking through all MSI-X table entries (2048 max) to handle
	  SMP affinity of any matching vector is an undesirable solution.

5.3.4 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request the PCI subsystem
to enable MSI-X messages on its hardware device. Depending on
the availability of PCI vector resources, the PCI subsystem enables
either all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in /driver/pci/msi.h:

struct msix_entry {
	u16 	vector; /* kernel uses to write alloc vector */
	u16	entry; /* driver uses to specify entry */
};

A device driver is responsible for initializing the field 'entry' of
each element with a unique entry supported by the MSI-X table. Otherwise,
-EINVAL will be returned as a result. A successful return of zero
indicates the PCI subsystem completed initializing each of the requested
entries of the MSI-X table with message address and message data.
Last but not least, the PCI subsystem will write the 1:1
vector-to-entry mapping into the field 'vector' of each element. A
device driver is responsible for keeping track of allocated MSI-X
vectors in its internal data structure.
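
One way a driver might prepare the 'entries' array before making the
call is sketched below. This is a userspace illustration, assuming
the driver simply wants table entries 0 to nvec-1. The struct is
re-declared with <stdint.h> types so that the example builds outside
the kernel, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Userspace copy of the struct shown above, using <stdint.h> types. */
struct msix_entry {
	uint16_t vector;	/* written by the kernel */
	uint16_t entry;		/* chosen by the driver */
};

/* Fill entries[i].entry with 0..nvec-1 so that every entry is unique. */
static void msix_fill_entries(struct msix_entry *entries, int nvec)
{
	for (int i = 0; i < nvec; i++) {
		entries[i].entry = (uint16_t)i;
		entries[i].vector = 0;
	}
}

/* Return 1 if no two elements name the same MSI-X table entry, else 0. */
static int msix_entries_unique(const struct msix_entry *entries, int nvec)
{
	for (int i = 0; i < nvec; i++)
		for (int j = i + 1; j < nvec; j++)
			if (entries[i].entry == entries[j].entry)
				return 0;
	return 1;
}
```

A driver that picks its entries by some other rule could run a check
like msix_entries_unique() before the call to catch the duplicate
case early.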

A return of zero indicates that the requested number of MSI-X vectors
was successfully allocated. A return of greater than zero indicates an
MSI-X vector shortage. A return of less than zero indicates a failure.
This failure may be a result of duplicate entries specified in the
second argument, or a result of no available vector,

5.3.5 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON(), and
the device will be left with MSI-X enabled and will leak its vectors.

5.3.6 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-X capable device function between MSI-X mode and
PIN-IRQ assertion mode (legacy).

	 ------------   pci_enable_msix(,,n) ------------------------
	|	     | <===============	    | 			     |
	| MSI-X MODE |	  	     	    | PIN-IRQ ASSERTION MODE |
	| 	     | ===============>	    |			     |
 	 ------------	pci_disable_msix     ------------------------

Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches a
device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however,
unlike MSI mode, the PCI subsystem will not replace dev->irq with an
assigned MSI-X vector because the PCI subsystem already writes the 1:1
vector-to-entry mapping into the field 'vector' of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
a device driver should always call free_irq() on all MSI-X vectors it
has done request_irq() on before calling pci_disable_msix(). Failure
to do so results in a BUG_ON(), and the device will be left with MSI-X
enabled and will leak its vectors. Otherwise, the PCI subsystem switches a
device function's interrupt mode from MSI-X mode to legacy mode and
marks all allocated MSI-X vectors as unused.

Once marked as unused, there is no guarantee that the PCI
subsystem will reserve these MSI-X vectors for a device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
re-assigned.

For the case where the PCI subsystem re-assigns these MSI-X vectors
to other drivers, a request to switch back to MSI-X mode may result
in being assigned another set of MSI-X vectors or in a failure if no
more vectors are available.

5.4 Handling function implementing both MSI and MSI-X capabilities

For the case where a function implements both MSI and MSI-X
capabilities, the PCI subsystem enables a device to run either in MSI
mode or MSI-X mode but not both. A device driver determines whether it
wants MSI or MSI-X enabled on its hardware device. Once a device
driver requests MSI, for example, it is prohibited from requesting
MSI-X; in other words, a device driver is not permitted to ping-pong
between MSI mode and MSI-X mode at run time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 System hardware support

Since the target of the MSI address is the local APIC, enabling
MSI/MSI-X support in the Linux kernel is dependent on whether existing
system hardware supports local APIC. Users should verify that their
system supports local APIC operation by testing that it runs when
CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
vectors are newly allocated at run time and MSI/MSI-X support does not
depend on BIOS support. This key independence enables MSI/MSI-X
support on future IOxAPIC free platforms.

5.5.2 Device hardware support

The hardware device function supports MSI by indicating the
MSI/MSI-X capability structure on its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during the system boot. In other words,
the device function is running in its default pin assertion mode.
Note that in many cases the hardware supporting MSI has bugs,
which may result in system hangs. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi() or not. A return of zero indicates the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is now running in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on device function

At the driver level, a return of zero from the function call of
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function is initialized successfully and ready to run in
MSI/MSI-X mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows MSI mode
enabled on a SCSI Adaptec 39320D Ultra320 controller.
L
           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI-MSI  aic79xx
201:         30          0         PCI-MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. MSI quirks

Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally

6.1. Disabling MSI on a single device

Under some circumstances it might be required to disable MSI on a
single device.  This may be achieved by either not calling pci_enable_msi()
at all, or by setting the pci_dev->no_msi flag beforehand (most of the
time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between busses. In this case, MSI has to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.

Some bridges actually support dynamic MSI support enabling/disabling
by changing some bits in their PCI configuration space (especially
the Hypertransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be required to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e", do:

	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

Assuming that MSI is not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
is enabled (1) or disabled (0). If 0 is found in a single bridge
msi_bus file above the device, MSI cannot be enabled.

7. FAQ

Q1. Are there any limitations on using the MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3 IPI's are transmitted on the APIC local
bus and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has a
redirection hint bit cleared, it should work.
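
For illustration, the 0xfeexxxxx target address can be composed as
sketched below. The bit positions (destination APIC ID in bits 19:12,
redirection hint in bit 3) follow the x86 interrupt message address
format as documented by Intel and are not taken from this guide; the
macro and function names are hypothetical.

```c
#include <stdint.h>

#define MSI_ADDR_BASE		0xFEE00000u	/* fixed upper bits, 0xfee */
#define MSI_ADDR_DEST_SHIFT	12		/* destination ID, bits 19:12 */
#define MSI_ADDR_RH		(1u << 3)	/* redirection hint bit */

/* Build the 32-bit message address for one destination APIC ID. */
static uint32_t msi_address(uint8_t dest_apic_id, int redirection_hint)
{
	uint32_t addr = MSI_ADDR_BASE;

	addr |= (uint32_t)dest_apic_id << MSI_ADDR_DEST_SHIFT;
	if (redirection_hint)
		addr |= MSI_ADDR_RH;
	return addr;
}
```

With the redirection hint cleared, msi_address(1, 0) yields
0xfee01000, a message directed at the local APIC whose ID is 1.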

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
of errors occurring during inbound memory write, then it may
wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc... We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).