- 20 5月, 2014 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This reverts commit b2b5efcf. This code was way too board specific, there are quirks as to how the PERST line is wired on different boards, we'll have to revisit this using/creating appropriate firmware interfaces. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 4月, 2014 7 次提交
-
-
由 Gavin Shan 提交于
The patch intends to support fundamental reset on PLX downstream ports. If the PCI device matches any one of the internal table, which includes PLX vendor ID, bridge device ID, register offset for fundamental reset and bit, fundamental reset will be done accordingly. Otherwise, it will fail back to hot reset. Additional flag (EEH_DEV_FRESET) is introduced to record the last reset type on the PCI bridge. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
Basically, we have 3 types of resets to fulfil PE reset: fundamental, hot and PHB reset. For the later 2 cases, we need PCI bus reset hold and settlement delay as specified by PCI spec. PowerNV and pSeries platforms are running on top of different firmware and some of the delays have been covered by underly firmware (PowerNV). The patch makes the delays unified to be done in backend, instead of EEH core. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The issue was detected in a bit complicated test case where we have multiple hierarchical PEs shown as following figure: +-----------------+ | PE#3 p2p#0 | | p2p#1 | +-----------------+ | +-----------------+ | PE#4 pdev#0 | | pdev#1 | +-----------------+ PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p bridges. We accidentally had less-known scenario: PE#4 was removed permanently from the system because of permanent failure (e.g. exceeding the max allowd failure times in last hour), then we detects EEH errors on PE#3 and tried to recover it. However, eeh_dev instances for pdev#0/1 were not detached from PE#4, which was still connected to PE#3. All of that was because of the fact that we rely on count-based pcibios_release_device(), which isn't reliable enough. When doing recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which are not valid any more. Eventually, we run into kernel crash. The patch fixes above issue from two aspects. For unplug, we simply skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read its PCI config so that PCI core will omit them. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
There're 2 EEH subsystem variables: eeh_subsystem_enabled and eeh_probe_mode. We needn't maintain 2 variables and we can just have one variable and introduce different flags. The patch also introduces additional flag EEH_FORCE_DISABLE, which will be used to disable EEH subsystem via boot parameter ("eeh=off") in future. Besides, the patch also introduces flag EEH_ENABLED, which is changed to disable or enable EEH functionality on the fly through debugfs entry in future. With the patch applied, the creteria to check the enabled EEH functionality is changed to: !EEH_FORCE_DISABLED && EEH_ENABLED : Enabled Other cases : Disabled Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
When calling into eeh_gather_pci_data() on pSeries platform, we possiblly don't have pci_dev instance yet, but eeh_dev is always ready. So we use cached capability from eeh_dev instead of pci_dev for log dump there. In order to keep things unified, we also cache PCI capability positions to eeh_dev for PowerNV as well. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
We've observed multiple PE reset failures because of PCI-CFG access during that period. Potentially, some device drivers can't support EEH very well and they can't put the device to motionless state before PE reset. So those device drivers might produce PCI-CFG accesses during PE reset. Also, we could have PCI-CFG access from user space (e.g. "lspci"). Since access to frozen PE should return 0xFF's, we can block PCI-CFG access during the period of PE reset so that we won't get recrusive EEH errors. The patch adds flag EEH_PE_RESET, which is kept during PE reset. The PowerNV/pSeries PCI-CFG accessors reuse the flag to block PCI-CFG accordingly. Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD would be removed from the system. However, it's safe to replace that with EEH_PE_ISOLATED. The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled, either failure or success. It makes the PHB PE state consistent with: PHB functions normally NONE PHB has been removed EEH_PE_ISOLATED PHB fenced, recovery in progress EEH_PE_ISOLATED | RECOVERING Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 17 2月, 2014 1 次提交
-
-
由 Gavin Shan 提交于
The patch cleans up variable eeh_subsystem_enabled so that we needn't refer the variable directly from external. Instead, we will use function eeh_enabled() and eeh_set_enable() to operate the variable. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 15 1月, 2014 3 次提交
-
-
由 Gavin Shan 提交于
For one PCI error relevant OPAL event, we possibly have multiple EEH errors for that. For example, multiple frozen PEs detected on different PHBs. Unfortunately, we didn't cover the case. The patch enumarates the return value from eeh_ops::next_error() and change eeh_handle_special_event() and eeh_ops::next_error() to handle all existing EEH errors. As Ben pointed out, we needn't list_for_each_entry_safe() since we are not deleting any PHB from the hose_list and the EEH serialized lock should be held while purging EEH events. The patch covers those suggestions as well. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
When EEH error comes to one specific PCI device before its driver is loaded, we will apply hotplug to recover the error. During the plug time, the PCI device will be probed and its driver is loaded. Then we wrongly calls to the error handlers if the driver supports EEH explicitly. The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and set it before we remove the PCI device. In turn, we can avoid wrongly calls the error handlers of the PCI device after its driver loaded. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
After reset on the specific PE or PHB, we never configure AER correctly on PowerNV platform. We needn't care it on pSeries platform. The patch introduces additional EEH operation eeh_ops:: restore_config() so that we have chance to configure AER correctly for PowerNV platform. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 7月, 2013 6 次提交
-
-
由 Gavin Shan 提交于
The patch introduces flag EEH_DEV_SYSFS to keep track that the sysfs entries for the corresponding EEH device (then PCI device) has been added or removed, in order to avoid race condition. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
While restoring BARs for one specific PCI device, the pci_dev instance should have been released. So it's not reliable to use the pci_dev instance on restoring BARs. However, we still need some information (e.g. PCIe capability position, header type) from the pci_dev instance. So we have to store those information to EEH device in advance. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
When EEH error happens to one specific PE, some devices with drivers supporting EEH won't except hotplug on the device. However, there might have other deivces without driver, or with driver without EEH support. For the case, we need do partial hotplug in order to make sure that the PE becomes absolutely quite during reset. Otherise, the PE reset might fail and leads to failure of error recovery. The current code doesn't handle that 'mixed' case properly, it either uses the error callbacks to the drivers, or tries hotplug, but doesn't handle a PE (EEH domain) composed of a combination of the two. The patch intends to support so-called "partial" hotplug for EEH: Before we do reset, we stop and remove those PCI devices without EEH sensitive driver. The corresponding EEH devices are not detached from its PE, but with special flag. After the reset is done, those EEH devices with the special flag will be scanned one by one. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
Currently, we're trasversing the EEH devices list using list_for_each_entry(). That's not safe enough because the EEH devices might be removed from its parent PE while doing iteration. The patch replaces that with list_for_each_entry_safe(). Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
When we do normal hotplug, the PE (shadow EEH structure) shouldn't be kept around. However, we need to keep it if the hotplug an artifial one caused by EEH errors recovery. Since we remove EEH device through the PCI hook pcibios_release_device(), the flag "purge_pe" passed to various functions is meaningless. So the patch removes the meaningless flag and introduce new flag "EEH_PE_KEEP" to save the PE while doing hotplug during EEH error recovery. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
Make some functions public in order to support hotplug on either specific PCI bus or PCI device in future. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 01 7月, 2013 1 次提交
-
-
由 Gavin Shan 提交于
The patch is for avoiding following build warnings: The function .pnv_pci_ioda_fixup() references the function __init .eeh_init(). This is often because .pnv_pci_ioda_fixup lacks a __init The function .pnv_pci_ioda_fixup() references the function __init .eeh_addr_cache_build(). This is often because .pnv_pci_ioda_fixup lacks a __init Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 25 6月, 2013 1 次提交
-
-
由 Gavin Shan 提交于
Originally, eeh_mutex was introduced to protect the PE hierarchy tree and the attached EEH devices because EEH core was possiblly running with multiple threads to access the PE hierarchy tree. However, we now have only one kthread in EEH core. So we needn't the eeh_mutex and just remove it. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 20 6月, 2013 9 次提交
-
-
由 Gavin Shan 提交于
On PowerNV platform, the EEH event caused by interrupt won't have binding PE. The patch enables EEH core to handle the special event. To avoid the current logic we have, The eeh_handle_event() is renamed to eeh_handle_normal_event(), and the eeh_handle_special_event() is introduced. The function eeh_handle_event() dispatches to above two functions according to the input parameter. Besides, new backend "next_error" added to eeh_ops and it's expected to have following return values: 4 - Dead IOC 3 - Dead PHB 2 - Fenced PHB 1 - Frozen PE 0 - No error found Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
An EEH event is created and queued to the event queue for each ingress EEH error. When there're mutiple EEH errors, we need serialize the process to keep consistent PE state (flags). The spinlock "confirm_error_lock" was introduced for the purpose. We'll inject EEH event upon error reporting interrupts on PowerNV platform. So we export the spinlock for that to use for consistent PE state. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
We're not expecting that one specific PE got frozen for over 5 times in last hour. Otherwise, the PE will be removed from the system upon newly coming EEH errors. The patch introduces time stamp to trace the first error on specific PE in last hour and function to update that accordingly. Besides, the time stamp is recovered during PE hotplug path as we did for frozen count. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch adds new EEH operation post_init. It's used to notify the platform that EEH core has completed the EEH probe. By that, PowerNV platform starts to use the services supplied by EEH functionality. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
For EEH on PowerNV platform, we will do EEH probe based on the real PCI devices. The PCI devices are available after PCI probe. So we have to call eeh_init() explicitly on PowerNV platform after PCI probe. The patch also does EEH probe for PowerNV platform in eeh_init(). Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
There're several types of PEs can be supported for now: PHB, Bus and Device dependent PE. For PCI bus dependent PE, tracing the corresponding PCI bus from PE (struct eeh_pe) would make the code more efficient. The patch also enables the retrieval of PCI bus based on the PCI bus dependent PE. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
While processing EEH event interrupt from P7IOC, we need function to retrieve the PE according to the indicated EEH device. The patch makes function eeh_pe_get() public so that other source files can call it for that purpose. Also, the patch fixes referring to wrong BDF (Bus/Device/Function) address while searching PE in function __eeh_pe_get(). Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
One of the possible cases indicated by P7IOC interrupt is fenced PHB. For that case, we need fetch the PE corresponding to the PHB and disable the PHB and all subordinate PCI buses/devices, recover from the fenced state and eventually enable the whole PHB. We need one function to fetch the PHB PE outside eeh_pe.c and the patch is going to make eeh_phb_pe_get() public for that purpose. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
Under some special circumstances, the EEH device doesn't have the associated device tree node or PCI device. The patch enhances those functions converting EEH device to device tree node or PCI device accordingly to avoid unnecessary system crash. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 1月, 2013 1 次提交
-
-
The DDW code uses a eeh_dev struct from the pci_dev. However, this is not set until eeh_add_device_late is called. Since pci_bus_add_devices is called before eeh_add_device_late, the PCI devices are added to the bus, making drivers' probe hooks to be called. These will call set_dma_mask, which will call the DDW code, which will require the eeh_dev struct from pci_dev. This would result in a crash, due to a NULL dereference. Calling eeh_add_device_late after pci_bus_add_devices would make the system BUG, because device files shouldn't be added to devices there were not added to the system. So, a new function is needed to add such files only after pci_bus_add_devices have been called. Cc: stable@vger.kernel.org Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 1月, 2013 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 18 9月, 2012 2 次提交
-
-
由 Gavin Shan 提交于
Function eeh_rmv_from_parent_pe() could be called by the path of either normal PCI hotplug, or EEH recovery. For the former case, we need purge the corresponding PE on removal of the associated PE bus. The patch tries to cover that by passing more information to function pcibios_remove_pci_devices() so that we know if the corresponding PE needs to be purged or be marked as "invalid". Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
When EEH error happens on the PE whose PCI devices don't have attached drivers. In function eeh_handle_event(), the default value PCI_ERS_RESULT_NONE will be returned after iterating all drivers of those PCI devices belonging to the PE. Actually, we don't have installed drivers for the PCI devices. Under the circumstance, we will remove the corresponding PCI bus of the PE, including the associated EEH devices and PE instance. However, we still need the information stored in the PE instance to do PE reset after that. So it's unsafe to free the PE instance. The patch introduces EEH_PE_INVALID type PE to address the issue. When the PCI bus and the corresponding attached EEH devices are removed, we will mark the PE as EEH_PE_INVALID. At later point, the PE will be changed to EEH_PE_DEVICE or EEH_PE_BUS when the corresponding EEH devices are attached again. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 9月, 2012 7 次提交
-
-
由 Gavin Shan 提交于
The patch does cleanup on EEH PCI address cache based on the fact EEH core is the only user of the component. * Cleanup on function names so that they all have prefix "eeh" and looks more short. * Function printk() has been replaced with pr_debug() or pr_warning() accordingly. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The idea comes from Benjamin Herrenschmidt. The eeh cache helps fetching the pci device according to the given I/O address. Since the eeh cache is serving for eeh, it's reasonable for eeh cache to trace eeh device except pci device. The patch make eeh cache to trace eeh device. Also, the major eeh entry function eeh_dn_check_failure has been renamed to eeh_dev_check_failure since it will take eeh device as input parameter. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
While EEH module is installed, PCI devices is checked one by one to see if it supports eeh. On different platforms, the PCI devices are referred through different ways when the EEH module is loaded. For example, on pSeries platform, that is done by OF node. However, we would do that by real PCI devices (struct pci_dev) on PowerNV platform in future. So we needs some mechanism to differentiate those cases by classifying them to probe modes, either from OF nodes or real PCI devices. The patch implements the support to eeh probe mode. Also, the EEH on pSeries has set it into EEH_PROBE_MODE_DEVTREE. That means the probe will be done based on OF nodes on pSeries platform. In addition, On pSeries platform, it's done by OF nodes. The patch moves the the probe function from EEH core to platform dependent backend and some cleanup applied. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch removes the eeh related statistics for eeh device since they have been maintained by the corresponding eeh PE. Also, the flags used to trace the state of eeh device and PE have been reworked for a little bit. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch reworks the current implementation so that the eeh errors will be handled basing on PE instead of eeh device. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
The patch introduces the function to traverse the devices of the specified PE and its child PEs. Also, the restore on device bars is implemented based on the traverse function. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Gavin Shan 提交于
Originally, all the EEH operations were implemented based on OF node. Actually, it explicitly breaks the rules that the operation target is PE instead of device. Therefore, the patch makes all the operations based on PE instead of device. Unfortunately, the backend for config space has to be kept as original because it doesn't depend on PE. Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-