- 17 2月, 2010 1 次提交
-
-
由 Breno Leitao 提交于
During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: Unable to handle kernel paging request for data at address 0x000000a0 Faulting instruction address: 0xc00000000006b8b4 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0 LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 Call Trace: [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70 The bug occurs because pci_name() tries to access a null pointer. This patch just guarantee that pci_name() is not called on Null pointers. Signed-off-by: NBreno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 9月, 2009 1 次提交
-
-
由 Mike Mason 提交于
By default, the EEH framework on powerpc does what's known as a "hot reset" during recovery of a PCI Express device. We've found a case where the device needs a "fundamental reset" to recover properly. The current PCI error recovery and EEH frameworks do not support this distinction. The attached patch makes changes to EEH to utilize the new bit field. Signed-off-by: NMike Mason <mmlnx@us.ibm.com> Signed-off-by: NRichard Lary <rlary@us.ibm.com> Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
-
- 06 11月, 2008 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
To properly fix PCI hotplug, it's useful to be able to make the fixup passes on all devices whether they were just hot plugged or already there. The EEH code however used to not be very friendly with calling eeh_add_device_late() multiple time, and not very rebust in the way it generally tests whether a device is in the expected state vs. the EEH code. This improves it, along with cleaning up a couple of debug printk's. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 7月, 2008 1 次提交
-
-
由 Mike Mason 提交于
This patch changes the EEH_MAX_FAILS action from panic to printing an error message. Panicking under under this condition is too harsh. Although performance will be affected and the device may not recover, the system is still running, which at the very least should allow for a more graceful shutdown. The patch also removes the msleep() within a spinlock, which can lead to a deadlock and is not recommended. Signed-off-by: NMike Mason <mmlnx@us.ibm.com> Acked-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 09 7月, 2008 1 次提交
-
-
由 Mike Mason 提交于
The following patch restores the PERR and SERR bits in the PCI command register during an EEH device recovery. We have found at least one case (an Agilent test card) where the PERR/SERR bits are set to 1 by firmware at boot time, but are not restored to 1 during EEH recovery. The patch fixes the Agilent card problem. It has been tested on several other EEH-enabled cards with no regressions. Signed-off-by: NMike Mason <mmlnx@us.ibm.com> Acked-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 4月, 2008 1 次提交
-
-
由 Denis V. Lunev 提交于
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data be setup before gluing PDE to main tree. Add correct ->owner to proc_fops to fix reading/module unloading race. Signed-off-by: NDenis V. Lunev <den@openvz.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 4月, 2008 1 次提交
-
-
由 Michael Ellerman 提交于
Add a DEBUG config setting which turns on all (most) of the debugging under platforms/pseries. To have this take effect we need to remove all the #undef DEBUG's, in various files. We leave the #undef DEBUG in platforms/pseries/lpar.c, as this enables debugging printks from the low-level hash table routines, and tends to make your system unusable. If you want those enabled you still have to turn them on by hand. Also some of the RAS code has a DEBUG block which causes a functional change, so I've keyed this off a different (non-existant) debug #define. This is only enabled if you have PPC_EARLY_DEBUG enabled also. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 07 4月, 2008 1 次提交
-
-
由 Nathan Lynch 提交于
A couple of places are duplicating the function of of_device_is_available; convert them to use it. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 17 1月, 2008 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 03 12月, 2007 2 次提交
-
-
由 Linas Vepstas 提交于
If an "empty" slot is failing, make sure its a permanent failure; else process the error normally. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Perform all error checking at the "partitonable endpoint" of the device. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 08 11月, 2007 2 次提交
-
-
由 Linas Vepstas 提交于
Fix old buglet; a warning message should have been printed when a hardware reset takes too long. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Bugfix: avoid crash if there's no PCI device for a given openfirmware node. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 02 10月, 2007 1 次提交
-
-
由 Linas Vepstas 提交于
It seems that some versions of firmware will report a device node status as the string "okay". As we are not expecting this string, the device node will be ignored by the EEH subsystem. Which means EEH will not be enabled. When EEH is not enabled, PCI errors will be converted into Machine Check exceptions, and we'll have a very unhappy system. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 17 8月, 2007 3 次提交
-
-
由 Linas Vepstas 提交于
Remove dead code, and a misleading comment about EEH checking for video devices. The removed code is a left-over from the olden days where there was concern over how video devices worked in Linux. We are never going to go that way again, so kill this. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 17 ----------------- 1 file changed, 17 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Gather bridge-specific data on EEH events. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Print return code to print message. Also fix whitespace. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 14 6月, 2007 4 次提交
-
-
由 Linas Vepstas 提交于
Twiddle the copyright notices. Per current guidelines, the use of the (C) or (c) in source code is deprecated. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 6 +++++- arch/powerpc/platforms/pseries/eeh_cache.c | 3 ++- arch/powerpc/platforms/pseries/eeh_driver.c | 6 +++--- 3 files changed, 10 insertions(+), 5 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Remove some dead code. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Track and report the number of times we read an all-1s value (0xff, 0xffff or 0xffffffff) from each device which is valid data, not indicating EEH isolation. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 5 +++++ arch/powerpc/platforms/pseries/eeh_sysfs.c | 3 +++ include/asm-powerpc/pci-bridge.h | 1 + 3 files changed, 9 insertions(+) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Add sysfs blinkenlights for EEH statistics. Shuffle the eeh_add_device_tree() call so that it appears in the correct sequence. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/Makefile | 2 arch/powerpc/platforms/pseries/eeh.c | 4 + arch/powerpc/platforms/pseries/eeh_cache.c | 2 arch/powerpc/platforms/pseries/eeh_sysfs.c | 84 +++++++++++++++++++++++++++++ arch/powerpc/platforms/pseries/pci_dlpar.c | 7 +- include/asm-powerpc/ppc-pci.h | 3 + 6 files changed, 98 insertions(+), 4 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 10 5月, 2007 1 次提交
-
-
由 Linas Vepstas 提交于
Assorted minor cleanups to EEH code; -- use literals, use kerneldoc format. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c | 13 ++++++++++--- arch/powerpc/platforms/pseries/eeh_driver.c | 7 ++++--- include/asm-powerpc/ppc-pci.h | 18 +++++++++++++++--- 3 files changed, 29 insertions(+), 9 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 09 5月, 2007 2 次提交
-
-
由 Linas Vepstas 提交于
When an EEH event is detected, and after the device driver has been notified, but before the device is reset, enable MMIO to the adapter, and grab the contents of the PCI status and command registers, the PCI-X status and command, and the PCI-E capability 10 and AER registers. Pass these up to the RTAS error log, and also printk them. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
If an EEH event is observed, capture PCI config space info about the device, wrap it up and pass it to the event logger. This pach just slots in the basic logging function. A later patch will provide for more through data gathering. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 08 5月, 2007 1 次提交
-
-
由 Brian King 提交于
Adds the pSeries platform implementation for a new PCI API which can be used to issue various types of PCI-E reset, including PCI-E warm reset and PCI-E hot reset. This is needed for an ipr PCI-E adapter which does not properly implement BIST. Running BIST on this adapter results in PCI-E errors. The only reliable reset mechanism that exists on this hardware is PCI Fundamental reset (warm reset). Acked-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 4月, 2007 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 3月, 2007 8 次提交
-
-
由 Linas Vepstas 提交于
Rework how multi-function PCI devices are identified and traversed. This fixes a bug with multi-function recovery on Power4 that was introduced by a recent Power4 EEH patch. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
After requesting a state change, verify that the state change actually ocurred, and the system ends up in the expected state. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
The EEH event notification system passes around data that is not needed or at least, not used properly. Stop passing this data; get it in a more reliable fashion. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Modify routine that returns PCI slot status to wait for slot status to become available. This is needed, as slots that are in some remote card cage may go offline for extended periods of time. New users for this routine in following patches. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Some firmware versions will return a slot reset state of "1" when a slot is EEH frozen. Recognize this as a state that can be handled. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Provide support for the new ibm,get-config-addr-info2 RTAS token, whenever it is actually available. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Some drivers will attempt to perform a lot of mmio even after an EEH event was detected. This is especially the case for fast cpu's and PCI-E slots. Be a bit more lenient in allowing this. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Change the order in which pci error state is examined; the "capabilites" is not valid if "reset state" is 5. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 2月, 2007 1 次提交
-
-
由 Arjan van de Ven 提交于
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. [akpm@osdl.org: sparc64 fix] Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 2月, 2007 1 次提交
-
-
由 Linas Vepstas 提交于
It appears that EEH is improperly enabled for some Power4 systems. On these systems, the ibm,set-eeh-option returns a value of success even when EEH is not supported on the given node. Thus, an explicit check for support is required. During boot, on power4, without this patch, one sees messages similar to: EEH: event on unsupported device, rc=0 dn=/pci@400000000110/IBM,sp@1 EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2 EEH: event on unsupported device, rc=0 dn=/pci@400000000110/pci@2,2 etc. The patch makes these go away. Without this patch, EEH recovery does seem to work correctly for at least some devices (I tested ethernet e1000), but fails to recover others (the Emulex LightPulse LPFC, most notably). Off the top of my head, I don't remember why some devices are affected, but not others. The PAPR indicates that the correct way to test for EEH is as done in this patch; its not clear to me if this was in the PAPR all along, or recently added; if it was there all along, its not clear to me why this hadn't been fixed long ago. I suspect only certain firmware levels are affected. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 08 12月, 2006 1 次提交
-
-
由 Linas Vepstas 提交于
If one attempts to create a device driver recovery sequence that does not depend on a hard reset of the device, but simply just attempts to resume processing, then one discovers that the recovery sequence implemented on powerpc is not quite right. This patch fixes this up. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 26 9月, 2006 1 次提交
-
-
由 Linas Vepstas 提交于
Bug fix: when marking a slot as frozen, we forgot to mark pci device itself as frozen. (we did manage to mark the pci children, but forget the parent itself.) This is needed so that some device drivers can check the pci status in critical sections (e.g. in spin loops with interrupts disabled). Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 9月, 2006 1 次提交
-
-
由 Linas Vepstas 提交于
On detection of an EEH error, some Power4 systems seem to occasionally want to be reset twice before they report themselves as fully recovered. This patch re-arranges the code to attempt additional resets if the first one doesn't take. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 21 9月, 2006 1 次提交
-
-
由 Linas Vepstas 提交于
Add wrapper around the rtas call to enable MMIO or DMA on a frozen pci slot. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-