Commits · 340286cd4e9aff841affb897f6d2535ed27605cf · openeuler / raspberrypi-kernel

24 Oct, 2013 1 commit

ACPI, APEI, CPER: Add UEFI 2.4 support for memory error · 147de147

Committed by Chen, Gong on Oct 18, 2013

In the latest UEFI spec (currently 2.4), the memory error definition
for CPER (UEFI 2.4 Appendix N, Common Platform Error Record)
adds some new fields. These fields help map a memory error to an
actual DIMM location.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

22 Oct, 2013 1 commit

ACPI, CPER: Update cper info · 88f074f4

Committed by Chen, Gong on Oct 18, 2013

We have a lot of confusing names for functions and data structures
in the error reporting code.  In particular the "apei" prefix
has been applied to many objects that are not part of APEI.  Since we
will be using these routines for extended error log reporting it will
be clearer if we fix up the names first.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

11 Jul, 2013 1 commit

mce: acpi/apei: Soft-offline a page on firmware GHES notification · cf870c70

Committed by Naveen N. Rao on Jul 10, 2013

If the firmware indicates in the GHES error data entry that the error
threshold has been exceeded for a corrected error event, then we try to
soft-offline the page. This could be called in interrupt context, so we
queue this up, similar to how we handle memory failure scenarios.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
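
The decision described in this message can be illustrated with a small
userspace sketch. It is not the kernel code: every name here is
hypothetical, the flag value is an assumed bit position, and a fixed-size
array stands in for whatever deferral mechanism the kernel uses. It only
shows the shape of the logic: check severity and the threshold flag, then
record the page for later handling instead of offlining it on the spot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in kernel headers. */
enum sketch_severity {
    SKETCH_SEV_CORRECTED,
    SKETCH_SEV_RECOVERABLE,
    SKETCH_SEV_FATAL
};
#define SKETCH_FLAG_THRESHOLD_EXCEEDED 0x0008u  /* assumed bit position */

#define SKETCH_QUEUE_LEN 16

/* Hypothetical queue: filled in atomic context, drained later. */
struct sketch_offline_queue {
    uint64_t pfn[SKETCH_QUEUE_LEN];
    unsigned int count;
};

/* A corrected error qualifies only if firmware set the threshold flag. */
static bool sketch_should_soft_offline(enum sketch_severity sev, uint32_t flags)
{
    return sev == SKETCH_SEV_CORRECTED &&
           (flags & SKETCH_FLAG_THRESHOLD_EXCEEDED) != 0;
}

/*
 * Record the page for deferred soft-offlining. Returns true if the page
 * was queued, false if it did not qualify or the queue was full.
 */
static bool sketch_handle_mem_error(struct sketch_offline_queue *q,
                                    enum sketch_severity sev,
                                    uint32_t flags, uint64_t pfn)
{
    if (!sketch_should_soft_offline(sev, flags))
        return false;
    if (q->count >= SKETCH_QUEUE_LEN)
        return false;   /* queue full: drop the request */
    q->pfn[q->count++] = pfn;
    return true;
}
```

Keeping the handler down to a flag test and an array store is the point of
the sketch: the expensive work happens elsewhere, outside interrupt context.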

07 Jun, 2013 1 commit

ACPI / APEI: Force fatal AER severity when component has been reset · 0ba98ec9

Committed by Betty Dall on Jun 06, 2013

The CPER error record has a reset bit that indicates that the platform
has reset the component. The reset bit can be set for any severity
error including recoverable.  From the AER code path's perspective,
any error is fatal if the component has been reset.  This patch
upgrades the severity of the AER recovery to AER_FATAL whenever the
CPER error record indicates that the component has been reset.

[bhelgaas: s/bus has been reset/component has been reset/]
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
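
As a rough illustration of the rule in this message, the following
userspace sketch upgrades the severity whenever a reset flag is set. The
enum, the flag value and the function name are all hypothetical stand-ins,
not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's AER severities and CPER flag. */
enum sketch_aer_severity {
    SKETCH_AER_NONFATAL,
    SKETCH_AER_FATAL,
    SKETCH_AER_CORRECTABLE
};
#define SKETCH_FLAG_RESET 0x0004u  /* assumed bit position */

/*
 * Pick the severity to hand to the recovery path. If the platform has
 * already reset the component, treat the error as fatal no matter what
 * severity the record itself reports.
 */
static enum sketch_aer_severity
sketch_recovery_severity(enum sketch_aer_severity reported, uint32_t flags)
{
    if (flags & SKETCH_FLAG_RESET)
        return SKETCH_AER_FATAL;
    return reported;
}
```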

05 Jun, 2013 1 commit

ACPI / APEI: fix error return code in ghes_probe() · a98d4f64

Committed by Wei Yongjun on Jun 03, 2013

Fix to return a negative error code in the acpi_gsi_to_irq() and
request_irq() error handling case instead of 0, as done elsewhere
in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
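
The class of bug fixed here can be shown with a generic userspace sketch.
It does not reproduce ghes_probe(); sketch_map_irq() and sketch_probe()
are hypothetical, and the only point is that a failing step must leave a
negative code in the variable that the error path returns.

```c
#include <errno.h>

/* Hypothetical setup step: returns 0 on success, negative errno on failure. */
static int sketch_map_irq(int gsi, unsigned int *irq)
{
    if (gsi < 0)
        return -EINVAL;
    *irq = (unsigned int)gsi + 32;
    return 0;
}

static int sketch_probe(int gsi)
{
    unsigned int irq = 0;
    int rc = 0;   /* still 0 from earlier, successful steps */

    /*
     * Buggy pattern: testing the call directly and jumping to the error
     * label would return the stale rc == 0, which the caller reads as
     * success. Capturing the result in rc keeps the failure visible.
     */
    rc = sketch_map_irq(gsi, &irq);
    if (rc)
        goto err;

    (void)irq;    /* a real probe would go on to request the IRQ */
    return 0;
err:
    /* Undo earlier setup here, then report the failure. */
    return rc;
}
```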

31 May, 2013 1 commit

aerdrv: Move cper_print_aer() call out of interrupt context · 37448adf

Committed by Lance Ortiz on May 30, 2013

The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to set up for the call to cper_print_aer().  The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

26 Feb, 2013 1 commit

ghes: add the needed hooks for EDAC error report · 21480547

Committed by Mauro Carvalho Chehab on Feb 15, 2013

In order to allow reporting errors via EDAC, add hooks for:

1) register an EDAC driver;
2) unregister an EDAC driver;
3) report errors via EDAC.

As the EDAC driver will need to access the ghes structure, add it
as one of the parameters of ghes_do_proc().

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

22 Feb, 2013 1 commit

ghes: move structures/enum to a header file · 40e06415

Committed by Mauro Carvalho Chehab on Feb 15, 2013

As a ghes_edac driver will need to access ghes structures, in order
to properly handle the errors, move those structures to a separate
header file. No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

29 Nov, 2012 1 commit

acpi: remove use of __devinit · da095fd3

Committed by Bill Pemberton on Nov 19, 2012

CONFIG_HOTPLUG is going away as an option so __devinit is no longer
needed.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

22 Nov, 2012 1 commit

ACPI: remove use of __devexit · b59bc2fb

Committed by Bill Pemberton on Nov 21, 2012

CONFIG_HOTPLUG is going away as an option so __devexit is no
longer needed.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12 Jun, 2012 1 commit

ACPI, APEI, Avoid too much error reporting in runtime · 34ddeb03

Committed by Huang Ying on Jun 12, 2012

This patch fixes the following bug:

https://bugzilla.kernel.org/show_bug.cgi?id=43282

The bug is caused by a firmware bug check (a check of the generic
address register provided by firmware) performed at runtime.  The check
should be done at address mapping time instead, to avoid excessive
error reporting at runtime.

Reported-by: Pawel Sikora <pluto@agmk.net>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>

17 Jan, 2012 4 commits

ACPI APEI: Convert atomicio routines · 700130b4

Committed by Myron Stowe on Nov 07, 2011

APEI needs memory access in interrupt context.  The obvious choice is
acpi_read(), but originally it couldn't be used in interrupt context
because it makes temporary mappings with ioremap().  Therefore, we added
drivers/acpi/atomicio.c, which provides:
    acpi_pre_map_gar()     -- ioremap in process context
    acpi_atomic_read()     -- memory access in interrupt context
    acpi_post_unmap_gar()  -- iounmap

Later we added acpi_os_map_generic_address() (29718521) and enhanced
acpi_read() so it works in interrupt context as long as the address has
been previously mapped (620242ae).  Now this sequence:
    acpi_os_map_generic_address()    -- ioremap in process context
    acpi_read()/apei_read()          -- now OK in interrupt context
    acpi_os_unmap_generic_address()
is equivalent to what atomicio.c provides.

This patch introduces apei_read() and apei_write(), which currently are
functional equivalents of acpi_read() and acpi_write().  This is mainly
proactive, to prevent APEI breakages if acpi_read() and acpi_write()
are ever augmented to support the 'bit_offset' field of GAS, as APEI's
__apei_exec_write_register() precludes splitting up functionality
related to 'bit_offset' and APEI's 'mask' (see its
APEI_EXEC_PRESERVE_REGISTER block).

With apei_read() and apei_write() in place, usages of atomicio routines
are converted to apei_read()/apei_write() and existing calls within
osl.c and the CA, based on the re-factoring that was done in an earlier
patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
    acpi_pre_map_gar()     -->  acpi_os_map_generic_address()
    acpi_post_unmap_gar()  -->  acpi_os_unmap_generic_address()
    acpi_atomic_read()     -->  apei_read()
    acpi_atomic_write()    -->  apei_write()

Note that acpi_read() and acpi_write() currently use 'bit_width'
for accessing GARs which seems incorrect.  'bit_width' is the size of
the register, while 'access_width' is the size of the access the
processor must generate on the bus.  The 'access_width' may be larger,
for example, if the hardware only supports 32-bit or 64-bit reads.  I
wanted to minimize any possible impacts with this patch series so I
did *not* change this behavior.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
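
The distinction between 'bit_width' and 'access_width' in the note above
can be sketched in userspace C. This is not kernel or ACPICA code: the
struct is a reduced, hypothetical view of a Generic Address Structure, and
falling back to 'bit_width' when the access size is undefined is an
assumption made for the sketch. The access size encoding used (1 = byte,
2 = word, 3 = dword, 4 = qword, 0 = undefined) follows the ACPI
specification.

```c
#include <stdint.h>

/* Hypothetical, reduced view of a Generic Address Structure (GAS). */
struct sketch_gas {
    uint8_t bit_width;     /* size of the register, in bits */
    uint8_t access_width;  /* access size code: 0 undefined, 1 byte .. 4 qword */
};

/*
 * Return the size in bits of the bus access to generate, or 0 if the
 * description is unusable. The access size wins when it is defined, even
 * if the register itself is narrower.
 */
static unsigned int sketch_access_bits(const struct sketch_gas *gas)
{
    if (gas->access_width > 4)
        return 0;                              /* reserved encoding */
    if (gas->access_width != 0)
        return 8u << (gas->access_width - 1);  /* 8, 16, 32 or 64 */

    /* Assumption: with no access size given, use the register width. */
    switch (gas->bit_width) {
    case 8:
    case 16:
    case 32:
    case 64:
        return gas->bit_width;
    default:
        return 0;
    }
}
```

An 8-bit register on hardware that only supports 32-bit reads would be
described with bit_width 8 and access_width 3, and the sketch returns 32
for it.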

ACPI, APEI, Printk queued error record before panic · 46d12f0b

Committed by Huang Ying on Dec 08, 2011

Because printk is not safe inside an NMI handler, recoverable error
records received in the NMI handler are queued to be printed in a
delayed IRQ context via irq_work.  If a fatal error occurs after the
recoverable error and before the irq_work is processed, the error
report is lost.

To solve the issue, the queued error records are printed in the NMI
handler if the system is about to panic.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Distinguish interleaved error report in kernel log · 5ba82ab5

Committed by Huang Ying on Dec 08, 2011

In most cases, printk only guarantees that messages from different
printk calls are not interleaved with each other.  But one APEI GHES
hardware error report involves multiple printk calls, normally one per
line.  So hardware error reports coming from different generic hardware
error sources may be interleaved.

In this patch, a sequence number is prefixed to each line of an error
report, so that even if reports are interleaved, they can still be
distinguished by the prefixed sequence number.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
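
The idea can be sketched in userspace C with a shared atomic counter. The
function names and the "{N}" prefix format are assumptions chosen for
illustration; the sketch only shows that one number is taken per report
and then stamped on every line of that report.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Shared counter; static storage starts it at zero. */
static atomic_uint sketch_seqno;

/* Take one sequence number per report; all of its lines share it. */
static unsigned int sketch_next_seqno(void)
{
    return atomic_fetch_add(&sketch_seqno, 1u);
}

/*
 * Format one line of a report as "{seq}text" into buf.
 * Returns the value of snprintf().
 */
static int sketch_format_line(char *buf, size_t len, unsigned int seq,
                              const char *text)
{
    return snprintf(buf, len, "{%u}%s", seq, text);
}
```

Two reports whose lines end up interleaved in the log can then be
separated again by grouping lines on the number inside the braces.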

ACPI, APEI, GHES: Add PCIe AER recovery support · a654e5ee

Committed by Huang Ying on Dec 08, 2011

When firmware reports recoverable PCIe AER errors, aer_recover_queue()
is called to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

13 Jan, 2012 1 commit

module_param: make bool parameters really bool (drivers & misc) · 90ab5ee9

Committed by Rusty Russell on Jan 13, 2012

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

10 Oct, 2011 1 commit

x86, nmi: Wire up NMI handlers to new routines · 9c48f1c6

Committed by Don Zickus on Sep 30, 2011

Just convert all the files that have an nmi handler to the new routines.
Most of it is a straightforward conversion.  A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

03 Aug, 2011 4 commits

APEI GHES: 32-bit buildfix · 70cb6e1d

Committed by Len Brown on Aug 02, 2011

drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression

ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
  in function ghes_estatus_cache_add().

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a

Committed by Huang Ying on Jul 13, 2011

When firmware reports recoverable memory errors, memory_failure_queue()
is called to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Error records content based throttle · 152cef40

Committed by Huang Ying on Jul 13, 2011

GHES uses printk to report hardware errors.  The printk is ratelimited
to avoid too many hardware error reports in the kernel log, because
there may be thousands or even millions of corrected hardware errors
while the system is running.

Currently, a simple scheme is used: the total number of hardware error
reports is ratelimited.  This may cause some issues in practice.

For example, suppose two kinds of hardware errors occur in the system.
One is a corrected memory error: because the faulty memory address is
accessed frequently, there may be hundreds of error reports per second.
The other is a corrected PCIe AER error, which is reported once per
second.  Because they share one ratelimit control structure, it is
highly likely that only the memory error is reported.

To avoid this issue, this patch implements a throttle algorithm based
on error record content.  After the first successful report, all
identical error records are throttled for some time, to give other
kinds of error records the opportunity to be reported.

In the example above, the memory errors are throttled for some time
after being printed, so the PCIe AER error can then be printed
successfully.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
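
A content-based throttle of this kind can be sketched in userspace C as a
small cache keyed on the raw bytes of a record. This is an illustration
under assumptions, not the patch's implementation: the slot count, the
10-second quiet period, the oldest-entry replacement policy, the
caller-supplied clock and all names are made up for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHE_SLOTS   4
#define SKETCH_RECORD_MAX    64
#define SKETCH_THROTTLE_SECS 10   /* assumed quiet period per distinct record */

struct sketch_cache_entry {
    bool used;
    size_t len;
    uint8_t data[SKETCH_RECORD_MAX];
    uint64_t last_reported;       /* seconds, from a caller-supplied clock */
};

struct sketch_cache {
    struct sketch_cache_entry slot[SKETCH_CACHE_SLOTS];
};

/*
 * Return true if the record should be printed now. A record identical to
 * one printed less than SKETCH_THROTTLE_SECS ago is suppressed; records
 * with different content are unaffected. Assumes `now` never goes back.
 */
static bool sketch_should_report(struct sketch_cache *c, const void *rec,
                                 size_t len, uint64_t now)
{
    struct sketch_cache_entry *victim = &c->slot[0];
    size_t i;

    if (len > SKETCH_RECORD_MAX)
        return true;              /* too large to cache: never throttled */

    for (i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        struct sketch_cache_entry *e = &c->slot[i];

        if (e->used && e->len == len && memcmp(e->data, rec, len) == 0) {
            if (now - e->last_reported < SKETCH_THROTTLE_SECS)
                return false;     /* same content, still throttled */
            e->last_reported = now;
            return true;
        }
        /* Prefer a free slot, otherwise the least recently reported one. */
        if (!e->used ||
            (victim->used && e->last_reported < victim->last_reported))
            victim = e;
    }

    /* New content: remember it and let it through. */
    victim->used = true;
    victim->len = len;
    memcpy(victim->data, rec, len);
    victim->last_reported = now;
    return true;
}
```

With this shape, a flood of identical memory error records occupies one
slot and is printed at most once per quiet period, while a PCIe AER record
arriving in between has different content and is printed immediately.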

ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99

Committed by Huang Ying on Jul 13, 2011

Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.

To solve the issue, a lock-less memory allocator is used to allocate
memory in the NMI handler, the error record is saved into the allocated
memory, and the record is put on a lock-less list.  An irq_work is then
used to defer the work from NMI context to IRQ context.  The irq_work
IRQ handler removes nodes from the lock-less list, printks the error
record, does further processing including the recovery operation, and
then frees the memory.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
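
The producer/consumer split described here can be sketched in userspace
with C11 atomics. This is not the kernel's lock-less list or allocator:
the names are hypothetical, nodes are supplied by the caller instead of
coming from a lock-less allocator, and nothing here models NMI or irq_work
itself. It only shows a push that never takes a lock and a consumer that
detaches the whole list in one step.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next;
    int record_id;                /* stand-in for a saved error record */
};

struct sketch_llist {
    _Atomic(struct sketch_node *) head;
};

/* Producer side (think: NMI handler): push without taking any lock. */
static void sketch_llist_add(struct sketch_llist *l, struct sketch_node *n)
{
    struct sketch_node *old = atomic_load(&l->head);

    do {
        n->next = old;            /* old is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak(&l->head, &old, n));
}

/* Consumer side (think: deferred IRQ work): detach the whole list at once. */
static struct sketch_node *sketch_llist_del_all(struct sketch_llist *l)
{
    return atomic_exchange(&l->head, (struct sketch_node *)NULL);
}

/* Entries come off newest-first; reverse to process in arrival order. */
static struct sketch_node *sketch_reverse(struct sketch_node *n)
{
    struct sketch_node *prev = NULL;

    while (n) {
        struct sketch_node *next = n->next;

        n->next = prev;
        prev = n;
        n = next;
    }
    return prev;
}
```

The consumer would walk the reversed list, print each record, run any
recovery step, and release the node, all outside the context that
produced it.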

14 Jul, 2011 3 commits

ACPI, APEI, Add WHEA _OSC support · 9fb0bfe1

Committed by Huang Ying on Jul 13, 2011

APEI firmware first mode must be turned on explicitly on some
machines, otherwise there may be no GHES hardware error record for
hardware error notification.  The APEI bit in the generic _OSC call can
be used to do that, but on some machines a special WHEA _OSC call must
be used.  This patch adds support for that WHEA _OSC call.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Support disable GHES at boot time · b6a95016

Committed by Huang Ying on Jul 13, 2011

Some machines may have broken firmware, so that GHES and firmware first
mode should be disabled.  This patch adds support for that.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic · 5588340d

Committed by Huang Ying on Jul 13, 2011

GHES uses printk to report hardware errors.  Normally, the printk is
ratelimited to avoid too many hardware error reports in the kernel log,
because there may be thousands or even millions of corrected hardware
errors while the system is running.

Fatal hardware errors are different: because the system will panic as
soon as possible, there will be no more than a few error records.
These error records are valuable for system fault diagnosis, so they
should not be ratelimited.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

31 Mar, 2011 1 commit

Fix common misspellings · 25985edc

Committed by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

12 Jan, 2011 1 commit

ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support · 81e88fdc

Committed by Huang Ying on Jan 12, 2011

Generic Hardware Error Source provides a way to report platform
hardware errors (such as those from the chipset). It works in so-called
"Firmware First" mode: hardware errors are reported to firmware first,
then reported to Linux by firmware. This way, firmware can check
non-standard hardware error registers or non-standard hardware links
to produce more valuable hardware error information for Linux.

This patch adds support for the POLL/IRQ/NMI notification types.

The memory area used to transfer hardware error information from BIOS
to Linux can be determined only in the NMI, IRQ or timer handler, but
general ioremap cannot be used in atomic context, so a special atomic
version of ioremap is implemented for that.

Known issue:

- Error information cannot be printed for recoverable errors notified
  via NMI, because printk is not NMI-safe. This will be fixed by
  deferring printing to IRQ context via irq_work, or by making printk
  NMI-safe.

v2:

- Adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

14 Dec, 2010 1 commit

ACPI, APEI, Report GHES error information via printk · 32c361f5

Committed by Huang Ying on Dec 07, 2010

printk is one of the methods to report hardware errors to user space.
This patch implements hardware error reporting for GHES via printk.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

30 Sep, 2010 1 commit

ACPI, APEI, HEST Fix the unsuitable usage of platform_data · 1dd6b20e

Committed by Jin Dongming on Sep 29, 2010

platform_data in hest_parse_ghes() is used to save the address of the
entry information in erst_tab. When adding the device fails,
platform_data is freed by platform_device_put(). But the value saved in
platform_data should not be freed there; freeing it makes the system panic.

So platform_data should instead hold the address of allocated memory
that stores the entry information of erst_tab.

This patch fixes it; I confirmed the fix on the x86_64 next tree.

v2:
    Pass the hest_hdr pointer to platform_data using
    platform_device_add_data()

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

09 Aug, 2010 2 commits

ACPI, APEI, Manage GHES as platform devices · 7ad6e943

Committed by Huang Ying on Aug 02, 2010

Register GHES as platform devices during HEST initialization, and turn
the GHES driver into a platform device driver, so that the GHES driver
module can be loaded automatically when GHES are available.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, Rename CPER and GHES severity constants · ad4ecef2

Committed by Huang Ying on Aug 02, 2010

The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

20 May, 2010 1 commit

ACPI, APEI, Generic Hardware Error Source memory error support · d334a491

Committed by Huang Ying on May 18, 2010

Generic Hardware Error Source provides a way to report platform
hardware errors (such as those from the chipset). It works in so-called
"Firmware First" mode: hardware errors are reported to firmware first,
then reported to Linux by firmware. This way, firmware can check
non-standard hardware error registers or non-standard hardware links
to produce more valuable hardware error information for Linux.

For now, only the SCI notification type and memory errors are
supported. More notification types and hardware error types will be
added later. These memory errors are reported to user space through
/dev/mcelog by faking a corrected Machine Check, so that the faulty
memory page can be offlined by /sbin/mcelog if the error count for one
page exceeds the threshold.

On some machines, Machine Check cannot report the physical address for
some corrected memory errors, but GHES can. So this simplified GHES is
implemented first.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
