- 01 11月, 2013 1 次提交
-
-
由 Luck, Tony 提交于
cper.c contains code to decode and print "Common Platform Error Records". Originally added under drivers/acpi/apei because the only user was in that same directory - but now we have another consumer, and we shouldn't have to force CONFIG_ACPI_APEI get access to this code. Since CPER is defined in the UEFI specification - the logical home for this code is under drivers/firmware/efi/ Acked-by: NMatt Fleming <matt.fleming@intel.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 24 10月, 2013 3 次提交
-
-
由 Chen, Gong 提交于
Memory error reporting is much too verbose. Most users do not care about the DIMM internal bank/row/column information. Downgrade the fine details to "pr_debug" status so that those few who do care can get them if they really want to. The detail information will be later be provided by perf/trace interface. Since things are still a bit scary, and users are sometimes overly nervous, provide a reassuring message that corrected errors do not generally require any further action. Suggested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Reviewed-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Chen, Gong 提交于
After H/W error happens under FFM enabled mode, lots of information are shown but new fields added by UEFI 2.4 (e.g. DIMM location) need to be added. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Chen, Gong 提交于
In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 22 10月, 2013 2 次提交
-
-
由 Chen, Gong 提交于
We have a lot of confusing names of functions and data structures in amongs the the error reporting code. In particular the "apei" prefix has been applied to many objects that are not part of APEI. Since we will be using these routines for extended error log reporting it will be clearer if we fix up the names first. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Chen, Gong 提交于
Commit aaf9d93b: ACPI / APEI: fix error status check condition for CPER only catches condition check before print, but a similar check is needed during printing CPER error sections. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 20 8月, 2013 4 次提交
-
-
由 Aruna Balakrishnaiah 提交于
In pstore write, set the section type to CPER_SECTION_TYPE_DMESG_COMPR if the data is compressed. In pstore read, read the section type and update the 'compressed' flag accordingly. Signed-off-by: NAruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Aruna Balakrishnaiah 提交于
Backends will set the flag 'compressed' after reading the log from persistent store to indicate the data being returned to pstore is compressed or not. Signed-off-by: NAruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Aruna Balakrishnaiah 提交于
Addition of new argument 'compressed' in the write call back will help the backend to know if the data passed from pstore is compressed or not (In case where compression fails.). If compressed, the backend can add a tag indicating the data is compressed while writing to persistent store. Signed-off-by: NAruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Reviewed-by: NKees Cook <keescook@chromium.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Wei Yongjun 提交于
Add the missing iounmap() before return from erst_exec_move_data() in the error handling case. Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 12 8月, 2013 1 次提交
-
-
由 Naveen N. Rao 提交于
Randconfig testing found this build error: >> hest.c(.init.text+0x6004): undefined reference to 'mce_disable_bank' Fix by wrapping body of hest_parse_cmc() inside #ifdef CONFIG_X86_MCE Reported-by: N"Wu, Fengguang" <fengguang.wu@intel.com> Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NBorislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/0129220@agluck-desk.sc.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 7月, 2013 1 次提交
-
-
由 Borislav Petkov 提交于
... according to acpi/apei/ conventions. Use standard pr_fmt prefix while at it. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 11 7月, 2013 1 次提交
-
-
由 Naveen N. Rao 提交于
If the firmware indicates in GHES error data entry that the error threshold has exceeded for a corrected error event, then we try to soft-offline the page. This could be called in interrupt context, so we queue this up similar to how we handle memory failure scenarios. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 09 7月, 2013 2 次提交
-
-
由 Naveen N. Rao 提交于
Add a boot option to disable firmware first mode for corrected errors. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Naveen N. Rao 提交于
The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 01 7月, 2013 1 次提交
-
-
由 Aruna Balakrishnaiah 提交于
Header size is needed to distinguish between header and the dump data. Incorporate the addition of new argument (hsize) in the pstore write callback. Signed-off-by: NAruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 29 6月, 2013 1 次提交
-
-
由 Lenny Szubowicz 提交于
This is patch 2/3 of a patch set that avoids what misleadingly appears to be a error during boot: ERST: Could not register with persistent store This message is displayed if the system has a valid ACPI ERST table and the pstore.backend kernel parameter has been used to disable use of ERST by pstore. But this same message is used for errors that preclude registration. In erst_init don't complain if the setting of kernel parameter pstore.backend precludes use of ACPI ERST for pstore. Routine pstore_register will inform about the facility that does register. Also, don't leave a dangling pointer to deallocated mem for the pstore buffer when registration fails. Signed-off-by: NLenny Szubowicz <lszubowi@redhat.com> Reported-by: NNaotaka Hamaguchi <n.hamaguchi@jp.fujitsu.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 07 6月, 2013 2 次提交
-
-
由 Chen Gong 提交于
When param1 is enabled in EINJ but not assigned with a valid value, sometimes it will cause the error like below: APEI: Can not request [mem 0x7aaa7000-0x7aaa7007] for APEI EINJ Trigger registers It is because some firmware will access target address specified in param1 to trigger the error when injecting memory error. This will cause resource conflict with regular memory. So It must be removed from trigger table resources, but incorrect param1/param2 combination will stop this action. Add extra check to avoid this kind of error. Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Betty Dall 提交于
The CPER error record has a reset bit that indicates that the platform has reset the component. The reset bit can be set for any severity error including recoverable. From the AER code path's perspective, any error is fatal if the component has been reset. This patch upgrades the severity of the AER recovery to AER_FATAL whenever the CPER error record indicates that the component has been reset. [bhelgaas: s/bus has been reset/component has been reset/] Signed-off-by: NBetty Dall <betty.dall@hp.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 06 6月, 2013 1 次提交
-
-
由 Wei Yongjun 提交于
Fix to return -ENOMEM in the debugfs_create_xxx() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 05 6月, 2013 1 次提交
-
-
由 Wei Yongjun 提交于
Fix to return a negative error code in the acpi_gsi_to_irq() and request_irq() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 31 5月, 2013 1 次提交
-
-
由 Lance Ortiz 提交于
The following warning was seen on 3.9 when a corrected PCIe error was being handled by the AER subsystem. WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90() This occurred because a call to pci_get_domain_bus_and_slot() was added to cper_print_pcie() to setup for the call to cper_print_aer(). The warning showed up because cper_print_pcie() is called in an interrupt context and pci_get* functions are not supposed to be called in that context. The solution is to move the cper_print_aer() call out of the interrupt context and into aer_recover_work_func() to avoid any warnings when calling pci_get* functions. Signed-off-by: NLance Ortiz <lance.ortiz@hp.com> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 27 3月, 2013 1 次提交
-
-
由 Chen Gong 提交于
In Table 18-289, ACPI5.0 SPEC, the error data length in CPER Generic Error Data Entry can be 0, which means this generic error data entry can have only one header. So fix the check conditon for it. Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Reviewed-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 26 2月, 2013 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
In order to allow reporting errors via EDAC, add hooks for: 1) register an EDAC driver; 2) unregister an EDAC driver; 3) report errors via EDAC. As the EDAC driver will need to access the ghes structure, adds it as one of the parameters for ghes_do_proc. Acked-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 23 2月, 2013 1 次提交
-
-
由 Rafael J. Wysocki 提交于
After commit 92ef2a25 (ACPI: Change the ordering of PCI root bridge driver registrarion), acpi_hest_init() is never called for acpi=off (acpi_disabled), so hest_disable is not set, but hest_tab is NULL, which causes apei_hest_parse() to crash when it is called from aer_acpi_firmware_first(). Fix that by making apei_hest_parse() check if hest_tab is not NULL in addition to checking hest_disable. Also remove the now useless acpi_disabled check from apei_hest_parse(). Reported-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 22 2月, 2013 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
As a ghes_edac driver will need to access ghes structures, in order to properly handle the errors, move those structures to a separate header file. No functional changes. Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 19 1月, 2013 1 次提交
-
-
由 Lans Zhang 提交于
The bit width check was introduced by 15afae60 (ACPI, APEI: Fix incorrect APEI register bit width check and usage), and a fixup for incorrect 32-bit width memory address was given by f712c71f (ACPI, APEI: Fixup common access width firmware bug). Now there is a similar symptom: [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0x12345000/64/0/3/0] Another bogus BIOS reports an incorrect 64-bit width in trigger table. Thus, apply to a similar workaround for 64-bit width memory address. Signed-off-by: NLans Zhang <jia.zhang@windriver.com> Acked-by: NGary Hade <garyhade@us.ibm.com> Acked-by: NMyron Stowe <myron.stowe@redhat.com> Acked-by: NJean Delvare <jdelvare@suse.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 04 1月, 2013 1 次提交
-
-
由 Lance Ortiz 提交于
This patch will provide a more reliable and easy way for user-space applications to have access to AER logs rather than reading them from the message buffer. It also provides a way to notify user-space when an AER event occurs. The aer driver is updated to generate a trace event of function 'aer_event' when a PCIe error is reported over the AER interface. The trace event was added to both the interrupt based aer path and the firmware first path. Signed-off-by: NLance Ortiz <lance.ortiz@hp.com> Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com> Acked-by: NBoris Petkov <bp@alien8.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 03 1月, 2013 1 次提交
-
-
由 Adrian Huang 提交于
If the persistent store is empty initially, the function 'erst_dbg_read' returns a nonzero value. The better way is to return a zero indicating the read operation reaches EOF. Tested on two different servers. Signed-off-by: NAdrian Huang <adrian.huang@hp.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 08 12月, 2012 1 次提交
-
-
由 Chen Gong 提交于
To handle error trigger table correctly, memory region must be removed from request region. We had a series of patches to do this culminating in: commit b4e008dc ACPI, APEI, EINJ, Refine the fix of resource conflict but when ACPI5 support was added, we missed updating this area. So when using EINJ table on an ACPI5 enabled machine, we get following error: APEI: Can not request [mem 0x526b80000-0x526b80007] for APEI EINJ Trigger registers Fix this by checking for the acpi5 case and using the same code that was added earlier. Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 29 11月, 2012 1 次提交
-
-
由 Bill Pemberton 提交于
CONFIG_HOTPLUG is going away as an option so __devinit is no longer needed. Signed-off-by: NBill Pemberton <wfp5p@virginia.edu> Acked-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 11月, 2012 2 次提交
-
-
由 Seiji Aguchi 提交于
[Issue] Currently, a variable name, which identifies each entry, consists of type, id and ctime. But if multiple events happens in a short time, a second/third event may fail to log because efi_pstore can't distinguish each event with current variable name. [Solution] A reasonable way to identify all events precisely is introducing a sequence counter to the variable name. The sequence counter has already supported in a pstore layer with "oopscount". So, this patch adds it to a variable name. Also, it is passed to read/erase callbacks of platform drivers in accordance with the modification of the variable name. <before applying this patch> a variable name of first event: dump-type0-1-12345678 a variable name of second event: dump-type0-1-12345678 type:0 id:1 ctime:12345678 If multiple events happen in a short time, efi_pstore can't distinguish them because variable names are same among them. <after applying this patch> it can be distinguishable by adding a sequence counter as follows. a variable name of first event: dump-type0-1-1-12345678 a variable name of Second event: dump-type0-1-2-12345678 type:0 id:1 sequence counter: 1(first event), 2(second event) ctime:12345678 In case of a write callback executed in pstore_console_write(), "0" is added to an argument of the write callback because it just logs all kernel messages and doesn't need to care about multiple events. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NMike Waychison <mikew@google.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Seiji Aguchi 提交于
[Issue] Currently, a variable name, which is used to identify each log entry, consists of type, id and ctime. But an erase callback does not use ctime. If efi_pstore supported just one log, type and id were enough. However, in case of supporting multiple logs, it doesn't work because it can't distinguish each entry without ctime at erasing time. <Example> As you can see below, efi_pstore can't differentiate first event from second one without ctime. a variable name of first event: dump-type0-1-12345678 a variable name of second event: dump-type0-1-23456789 type:0 id:1 ctime:12345678, 23456789 [Solution] This patch adds ctime to an argument of an erase callback. It works across reboots because ctime of pstore means the date that the record was originally stored. To do this, efi_pstore saves the ctime to variable name at writing time and passes it to pstore at reading time. Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com> Acked-by: NMike Waychison <mikew@google.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 22 11月, 2012 1 次提交
-
-
由 Bill Pemberton 提交于
CONFIG_HOTPLUG is going away as an option so __devexit is no longer needed. Signed-off-by: NBill Pemberton <wfp5p@virginia.edu> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 14 7月, 2012 1 次提交
-
-
由 Jean Delvare 提交于
Many firmwares have a common register definition bug where 8-bit access width is specified for a 32-bit register. Ideally this should be fixed in the BIOS, but earlier versions of the kernel did not complain, so fix that up silently. This closes kernel bug #43282: https://bugzilla.kernel.org/show_bug.cgi?id=43282Signed-off-by: NJean Delvare <jdelvare@suse.de> Acked-by: NHuang Ying <ying.huang@intel.com> Acked-by: NGary Hade <garyhade@us.ibm.com> Cc: stable@vger.kernel.org [3.4+] Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 12 6月, 2012 1 次提交
-
-
由 Huang Ying 提交于
This patch fixed the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=43282 This is caused by a firmware bug checking (checking generic address register provided by firmware) in runtime. The checking should be done in address mapping time instead of runtime to avoid too much error reporting in runtime. Reported-by: NPawel Sikora <pluto@agmk.net> Signed-off-by: NHuang Ying <ying.huang@intel.com> Tested-by: NJean Delvare <khali@linux-fr.org> Cc: stable@vger.kernel.org Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 30 3月, 2012 4 次提交
-
-
由 Jiang Liu 提交于
The function apei_estatus_print() and apei_estatus_check() forget to move ahead the gdata pointer when dealing with multiple generic error data sections. Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Gary Hade 提交于
The current code incorrectly assumes that (1) the APEI register bit width is always 8, 16, 32, or 64 and (2) the APEI register bit width is always equal to the APEI register access width. ERST serialization instructions entries such as: [030h 0048 1] Action : 00 [Begin Write Operation] [031h 0049 1] Instruction : 03 [Write Register Value] [032h 0050 1] Flags (decoded below) : 01 Preserve Register Bits : 1 [033h 0051 1] Reserved : 00 [034h 0052 12] Register Region : [Generic Address Structure] [034h 0052 1] Space ID : 00 [SystemMemory] [035h 0053 1] Bit Width : 03 [036h 0054 1] Bit Offset : 00 [037h 0055 1] Encoded Access Width : 03 [DWord Access:32] [038h 0056 8] Address : 000000007F2D7038 [040h 0064 8] Value : 0000000000000001 [048h 0072 8] Mask : 0000000000000007 break this assumption by yielding: [Firmware Bug]: APEI: Invalid bit width in GAR [0x7f2d7038/3/0] I have found no ACPI specification requirements corresponding with the above assumptions. There is even a good example in the Serialization Instruction Entries section (ACPI 4.0 section 17.4,1.2, ACPI 4.0a section 2.5.1.2, ACPI 5.0 section 18.5.1.2) that mentions a serialization instruction with a bit range of [6:2] which is 5 bits wide, _not_ 8, 16, 32, or 64 bits wide. Compile and boot tested with 3.3.0-rc7 on a IBM HX5. Signed-off-by: NGary Hade <garyhade@us.ibm.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Chen Gong 提交于
Some APEI firmware implementation will access injected address specified in param1 to trigger the error when injecting memory error, which means if one SRAR error is injected, the crash always happens because it is executed in kernel context. This new parameter can disable trigger action and control is taken over by the user. In this way, an SRAR error can happen in user context instead of crashing the system. This function is highly depended on BIOS implementation so please ensure you know the BIOS trigger procedure before you enable this switch. v2: notrigger should be created together with param1/param2 Tested-by: NTony Luck <tony.luck@lintel.com> Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Chen Gong 提交于
On the platforms with ACPI4.x support, parameter extension is not always doable, which means only parameter extension is enabled, einj_param can take effect. v2->v1: stopping early in einj_get_parameter_address for einj_param Signed-off-by: NChen Gong <gong.chen@linux.intel.com> Acked-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-