Commits · e5517c2a5a49ed5e99047008629f1cd60246ea0e · openeuler / Kernel

24 Oct, 2016 · 1 commit

ACPI / APEI: Fix incorrect return value of ghes_proc() · 806487a8

Committed by Punit Agrawal on Oct 18, 2016

Although ghes_proc() tests for errors while reading the error status,
it always returns success (0). Fix this by propagating the return
value.

Fixes: d334a491 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
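
The difference described in this commit can be sketched in user-space C. This is only an illustration under assumptions: `read_error_status()`, `proc_always_success()` and `proc_propagate()` are hypothetical stand-ins, not the kernel's functions or signatures.

```c
#include <errno.h>

/* Hypothetical stand-in for reading the error status; not the kernel API. */
static int read_error_status(int fail)
{
    return fail ? -EIO : 0;
}

/* Before the fix: the failure is detected, but the caller always sees 0. */
static int proc_always_success(int fail)
{
    int rc = read_error_status(fail);

    if (rc)
        goto out;
    /* ... the error record would be handled here ... */
out:
    return 0;
}

/* After the fix: the result of the read is propagated to the caller. */
static int proc_propagate(int fail)
{
    int rc = read_error_status(fail);

    if (rc)
        goto out;
    /* ... the error record would be handled here ... */
out:
    return rc;
}
```

With the second variant, a caller can tell a failed status read apart from a successfully processed record.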

21 Sep, 2016 · 1 commit

ACPI / APEI: Send correct severity to calculate AER severity · 2458d66b

Committed by Tyler Baicar on Sep 14, 2016

Currently the AER severity is calculated by calling cper_severity_to_aer(),
but the parameter sent is actually the GHES severity.  This causes the AER
severity to be incorrect.

Fix the parameter to be the CPER severity instead of the GHES severity.
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

10 Mar, 2016 · 1 commit

drivers/acpi: make apei/ghes.c more explicitly non-modular · 020bf066

Committed by Paul Gortmaker on Feb 15, 2016

The Kconfig currently controlling compilation of this code is:

config ACPI_APEI_GHES
      bool "APEI Generic Hardware Error Source"

...meaning that it currently is not being built as a module by anyone.

Let's remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We replace module.h with moduleparam.h as we are keeping the
pre-existing module_param that the file has, as currently that is
the easiest way to maintain compatibility with the existing boot
arg use cases.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

14 Sep, 2015 · 1 commit

acpi/apei: Use appropriate pgprot_t to map GHES memory · 8ece249a

Committed by Jonathan (Zhixiong) Zhang on Sep 04, 2015

If the ACPI APEI firmware handles hardware error first (called
"firmware first handling"), the firmware updates the GHES memory
region with hardware error record (called "generic hardware
error record"). Essentially the firmware writes hardware error
records in the GHES memory region, triggers an NMI/interrupt,
then the GHES driver goes off and grabs the error record from
the GHES region.

The kernel currently maps the GHES memory region as cacheable
(PAGE_KERNEL) for all architectures. However, on some arm64
platforms, there is a mismatch between how the kernel maps the
GHES region (PAGE_KERNEL) and how the firmware maps it
(EFI_MEMORY_UC, i.e. uncacheable), leading to the possibility of
the kernel GHES driver reading stale data from the cache when it
receives the interrupt.

With stale data being read, the kernel is unaware that there is a new
hardware error to be handled when there actually is one; this may
lead to further damage in various scenarios, such as data
corruption caused by error propagation. If an uncorrected error (such
as a double-bit ECC error) occurs during a memory operation and the
kernel is unaware of the event, erroneous data may
be propagated to the disk.

Instead, the GHES memory region should be mapped with the page protection
type returned by arch_apei_get_mem_attribute().
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Small stylistic tweaks. ]
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441372302-23242-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>

08 Jul, 2015 · 1 commit

ACPI: Remove FSF mailing addresses · 4c62dbbc

Committed by Jarkko Nikula on Jun 26, 2015

There is no need to carry potentially outdated Free Software Foundation
mailing address in file headers since the COPYING file includes it.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

28 Apr, 2015 · 5 commits

GHES: Make NMI handler have a single reader · 6fe9e7c2

Committed by Jiri Kosina on Mar 27, 2015

Since GHES sources are global, we theoretically need only a single CPU
reading them per NMI instead of a thundering herd of CPUs waiting on a
spinlock in NMI context for no reason at all.

Do that.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
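
One way to picture the single-reader idea is an atomic guard that lets only the first CPU in and makes the others return immediately instead of spinning on a lock. The following is a user-space sketch with C11 atomics; the names `in_nmi_reader`, `nmi_reader_enter()`, `nmi_reader_exit()` and `nmi_handler()` are hypothetical and the loop body is a placeholder, so this is not the driver's actual code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard: 0 = nobody is reading, 1 = one CPU is reading. */
static atomic_int in_nmi_reader = 0;

/* Try to become the single reader; false means another CPU already is. */
static bool nmi_reader_enter(void)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&in_nmi_reader, &expected, 1);
}

/* Release the guard so that a later NMI can read the sources again. */
static void nmi_reader_exit(void)
{
    atomic_store(&in_nmi_reader, 0);
}

/* Returns the number of sources visited, or 0 if another CPU is reading. */
static int nmi_handler(int nr_sources)
{
    int handled = 0;

    if (!nmi_reader_enter())
        return 0;   /* bail out instead of waiting on a lock */

    for (int i = 0; i < nr_sources; i++)
        handled++;  /* placeholder for reading one error source */

    nmi_reader_exit();
    return handled;
}
```

The compare-and-exchange never blocks, which matters in NMI context where waiting CPUs would otherwise pile up behind the one doing the work.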

GHES: Elliminate double-loop in the NMI handler · 2383844d

Committed by Borislav Petkov on Mar 18, 2015

There's no real need to iterate twice over the HW error sources in the
NMI handler. With the previous cleanups, eliminating the second loop is
almost trivial.
Signed-off-by: Borislav Petkov <bp@suse.de>

GHES: Panic right after detection · 6169ddf8

Committed by Borislav Petkov on Mar 18, 2015

The moment we log an error of panic severity, there's no need to noodle
through the ghes_nmi list anymore. So panic instead right then and
there.
Signed-off-by: Borislav Petkov <bp@suse.de>

GHES: Carve out the panic functionality · e10be03f

Committed by Borislav Petkov on Mar 18, 2015

... into another function for more clarity. No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>

GHES: Carve out error queueing in a separate function · 11568496

Committed by Borislav Petkov on Mar 18, 2015

Make the handler more readable.

No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>

22 Oct, 2014 · 2 commits

GHES: Make ghes_estatus_caches static · 8f7c31f6

Committed by Borislav Petkov on Sep 29, 2014

It is used only in ghes.c.
Signed-off-by: Borislav Petkov <bp@suse.de>

APEI, GHES: Cleanup unnecessary function for lockless list · 8d21d4c9

Committed by Chen, Gong on Jul 28, 2014

We have a generic function to reverse a lockless list, kill homegrown
copy.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1406530260-26078-2-git-send-email-gong.chen@linux.intel.com
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
[ Boris: correct commit msg ]
Signed-off-by: Borislav Petkov <bp@suse.de>

20 Oct, 2014 · 1 commit

acpi: apei: drop owner assignment from platform_drivers · e61bf8d0

Committed by Wolfram Sang on Oct 20, 2014

A platform_driver does not need to set an owner; it will be populated by the
driver core.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

23 Jul, 2014 · 3 commits

acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context. · 594c7255

Committed by Tomasz Nowicki on Jul 22, 2014

GHES currently maps two pages with atomic_ioremap.  From now
on, NMI support is architecture-dependent, so there is no need to allocate
an NMI page for platforms without NMI support.

To make it possible to not use a second page, swap the existing
page order so that the IRQ context page is first, and the optional
NMI context page is second.  Then, use HAVE_ACPI_APEI_NMI to decide
how many pages are to be allocated.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

acpi, apei, ghes: Make NMI error notification to be GHES architecture extension. · 44a69f61

Committed by Tomasz Nowicki on Jul 22, 2014

Currently APEI depends on x86 architecture. It is because of NMI hardware
error notification of GHES which is currently supported by x86 only.
However, many other APEI features can be still used perfectly by other
architectures.

This commit adds two symbols:
1. HAVE_ACPI_APEI for those archs which support APEI.
2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c
   file. NMI related data and functions are grouped so they can be wrapped
   inside one #ifdef section. Appropriate function stubs are provided for
   !NMI case.

Note that there are no functional changes for x86, because it hard-selects the
HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

apei, mce: Factor out APEI architecture specific MCE calls. · 9dae3d0d

Committed by Tomasz Nowicki on Jul 22, 2014

This commit abstracts MCE calls and provides weak default
implementations for those architectures which do not need arch-specific
actions. Each platform that needs additional architectural actions
should provide its own function definition. This avoids wrapping
code in #ifdef in generic code and spares new platforms from introducing
dummy stub functions.

Initially, there are two APEI arch-specific calls:
- arch_apei_enable_cmcff()
- arch_apei_report_mem_error()
Both interact with the MCE driver on the x86 architecture.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
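
The weak-default pattern mentioned in this commit can be sketched as follows. The hook name `arch_report_mem_error_sketch()` and its signature are assumptions made for illustration, not the kernel's real interface, and `__attribute__((weak))` is a GCC/Clang extension.

```c
/*
 * Hypothetical hook name and signature, for illustration only.
 * The weak default does nothing. An architecture that needs extra work
 * supplies a normal (strong) definition in its own file, and the linker
 * prefers that definition over this one.
 */
__attribute__((weak)) int arch_report_mem_error_sketch(int severity)
{
    (void)severity;
    return 0;   /* 0 = no arch-specific action taken */
}

/* Generic code calls the hook unconditionally, with no #ifdef needed. */
static int generic_handle_mem_error(int severity)
{
    return arch_report_mem_error_sketch(severity);
}
```

When no strong definition is linked in, the call resolves to the weak default, so new platforms do not have to add stub functions.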

17 Jun, 2014 · 1 commit

ACPICA: Restore error table definitions to reduce code differences between... · 0a00fd5e

Committed by Lv Zheng on Jun 03, 2014

ACPICA: Restore error table definitions to reduce code differences between Linux and ACPICA upstream.

The following commit has changed ACPICA table header definitions:

 Commit: 88f074f4
 Subject: ACPI, CPER: Update cper info

However, such definitions are currently maintained in ACPICA. As the
modifications applying to the table definitions affect other OSPMs'
drivers, it is very difficult for ACPICA to initiate a process to
complete the merge. Thus this commit ultimately leaves us only with divergences.

Revert such naming modifications to reduce the source code differences
between Linux and ACPICA upstream. No functional changes.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

21 Dec, 2013 · 2 commits

ACPI, APEI, GHES: Cleanup ghes memory error handling · ca104edc

Committed by Chen, Gong on Nov 25, 2013

Cleanup the logic in ghes_handle_memory_failure(). While at it, add
proper PFN validity check for UC error and cleanup the code logic to
make it simpler and cleaner.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1385363701-12387-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

ACPI, APEI, GHES: Do not report only correctable errors with SCI · addccbb2

Committed by Chen, Gong on Nov 25, 2013

Currently SCI is employed to handle corrected errors (memory corrected
errors, more specifically), but in fact SCI can still be used to handle
any errors, e.g. uncorrected or even fatal ones, if enabled by the BIOS.
Enable logging for those kinds of errors too.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message, rename function arg. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

07 Dec, 2013 · 1 commit

ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h> · 27d50c82

Committed by Lv Zheng on Dec 06, 2013

To avoid build problems and breaking dependencies between ACPI header
files, <acpi/acpi.h> should not be included directly by code outside
of the ACPI core subsystem.  However, that is possible if
<linux/acpi_io.h> is included, because that file contains
a direct inclusion of <acpi/acpi.h>.

For this reason, remove the direct <acpi/acpi.h> inclusion from
<linux/acpi_io.h>, move that file from include/linux/ to include/acpi/
and make <linux/acpi.h> include it for CONFIG_ACPI set along with the
other ACPI header files.  Accordingly, remove the inclusions of
<linux/acpi_io.h> from everywhere.

Of course, that causes the contents of the new <acpi/acpi_io.h> file
to be available for CONFIG_ACPI set only, so intel_opregion.o that
depends on it should also depend on CONFIG_ACPI (and it really should
not be compiled for CONFIG_ACPI unset anyway).

References: https://01.org/linuxgraphics/sites/default/files/documentation/acpi_igd_opregion_spec.pdf
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

24 Oct, 2013 · 1 commit

ACPI, APEI, CPER: Add UEFI 2.4 support for memory error · 147de147

Committed by Chen, Gong on Oct 18, 2013

In the latest UEFI spec (2.4 at the time of writing), the memory error
definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to trace a
memory error to an actual DIMM location.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

22 Oct, 2013 · 1 commit

ACPI, CPER: Update cper info · 88f074f4

Committed by Chen, Gong on Oct 18, 2013

We have a lot of confusing names of functions and data structures
in the error reporting code.  In particular the "apei" prefix
has been applied to many objects that are not part of APEI.  Since we
will be using these routines for extended error log reporting it will
be clearer if we fix up the names first.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

11 Jul, 2013 · 1 commit

mce: acpi/apei: Soft-offline a page on firmware GHES notification · cf870c70

Committed by Naveen N. Rao on Jul 10, 2013

If the firmware indicates in the GHES error data entry that the error threshold
has been exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>

07 Jun, 2013 · 1 commit

ACPI / APEI: Force fatal AER severity when component has been reset · 0ba98ec9

Committed by Betty Dall on Jun 06, 2013

The CPER error record has a reset bit that indicates that the platform
has reset the component. The reset bit can be set for any severity
error including recoverable.  From the AER code path's perspective,
any error is fatal if the component has been reset.  This patch
upgrades the severity of the AER recovery to AER_FATAL whenever the
CPER error record indicates that the component has been reset.

[bhelgaas: s/bus has been reset/component has been reset/]
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
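
The rule in this commit amounts to a small override of the reported severity. A minimal sketch, assuming made-up names and values: the enum `sketch_aer_severity` and the flag `SK_SEC_RESET` below are illustrative placeholders, not the kernel's constants.

```c
/* Illustrative values only; the real constants live in kernel headers. */
enum sketch_aer_severity {
    SK_AER_CORRECTABLE,
    SK_AER_NONFATAL,
    SK_AER_FATAL
};

#define SK_SEC_RESET 0x4u   /* hypothetical "component was reset" flag bit */

/* Upgrade to fatal whenever the record says the component has been reset. */
static enum sketch_aer_severity
effective_severity(enum sketch_aer_severity reported, unsigned int flags)
{
    if (flags & SK_SEC_RESET)
        return SK_AER_FATAL;
    return reported;
}
```

A recoverable error with the reset flag set is therefore treated as fatal, while the same error without the flag keeps its reported severity.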

05 Jun, 2013 · 1 commit

ACPI / APEI: fix error return code in ghes_probe() · a98d4f64

Committed by Wei Yongjun on Jun 03, 2013

Fix to return a negative error code in the acpi_gsi_to_irq() and
request_irq() error handling case instead of 0, as done elsewhere
in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

31 May, 2013 · 1 commit

aerdrv: Move cper_print_aer() call out of interrupt context · 37448adf

Committed by Lance Ortiz on May 30, 2013

The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer().  The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

26 Feb, 2013 · 1 commit

ghes: add the needed hooks for EDAC error report · 21480547

Committed by Mauro Carvalho Chehab on Feb 15, 2013

In order to allow reporting errors via EDAC, add hooks for:

1) register an EDAC driver;
2) unregister an EDAC driver;
3) report errors via EDAC.

As the EDAC driver will need to access the ghes structure, add it
as one of the parameters for ghes_do_proc.
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

22 Feb, 2013 · 1 commit

ghes: move structures/enum to a header file · 40e06415

Committed by Mauro Carvalho Chehab on Feb 15, 2013

As a ghes_edac driver will need to access ghes structures, in order
to properly handle the errors, move those structures to a separate
header file. No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

29 Nov, 2012 · 1 commit

acpi: remove use of __devinit · da095fd3

Committed by Bill Pemberton on Nov 19, 2012

CONFIG_HOTPLUG is going away as an option so __devinit is no longer
needed.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

22 Nov, 2012 · 1 commit

ACPI: remove use of __devexit · b59bc2fb

Committed by Bill Pemberton on Nov 21, 2012

CONFIG_HOTPLUG is going away as an option so __devexit is no
longer needed.
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

12 Jun, 2012 · 1 commit

ACPI, APEI, Avoid too much error reporting in runtime · 34ddeb03

Committed by Huang Ying on Jun 12, 2012

This patch fixes the following bug.

https://bugzilla.kernel.org/show_bug.cgi?id=43282

The bug is caused by a firmware bug check (validating the generic address
register provided by firmware) being run at runtime.  The check should be
done at address mapping time instead, to avoid excessive
error reporting at runtime.
Reported-by: Pawel Sikora <pluto@agmk.net>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>

17 Jan, 2012 · 4 commits

ACPI APEI: Convert atomicio routines · 700130b4

Committed by Myron Stowe on Nov 07, 2011

APEI needs memory access in interrupt context.  The obvious choice is
acpi_read(), but originally it couldn't be used in interrupt context
because it makes temporary mappings with ioremap().  Therefore, we added
drivers/acpi/atomicio.c, which provides:
    acpi_pre_map_gar()     -- ioremap in process context
    acpi_atomic_read()     -- memory access in interrupt context
    acpi_post_unmap_gar()  -- iounmap

Later we added acpi_os_map_generic_address() (29718521) and enhanced
acpi_read() so it works in interrupt context as long as the address has
been previously mapped (620242ae).  Now this sequence:
    acpi_os_map_generic_address()    -- ioremap in process context
    acpi_read()/apei_read()          -- now OK in interrupt context
    acpi_os_unmap_generic_address()
is equivalent to what atomicio.c provides.

This patch introduces apei_read() and apei_write(), which currently are
functional equivalents of acpi_read() and acpi_write().  This is mainly
proactive, to prevent APEI breakages if acpi_read() and acpi_write()
are ever augmented to support the 'bit_offset' field of GAS, as APEI's
__apei_exec_write_register() precludes splitting up functionality
related to 'bit_offset' and APEI's 'mask' (see its
APEI_EXEC_PRESERVE_REGISTER block).

With apei_read() and apei_write() in place, usages of atomicio routines
are converted to apei_read()/apei_write() and existing calls within
osl.c and the CA, based on the re-factoring that was done in an earlier
patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
    acpi_pre_map_gar()     -->  acpi_os_map_generic_address()
    acpi_post_unmap_gar()  -->  acpi_os_unmap_generic_address()
    acpi_atomic_read()     -->  apei_read()
    acpi_atomic_write()    -->  apei_write()

Note that acpi_read() and acpi_write() currently use 'bit_width'
for accessing GARs which seems incorrect.  'bit_width' is the size of
the register, while 'access_width' is the size of the access the
processor must generate on the bus.  The 'access_width' may be larger,
for example, if the hardware only supports 32-bit or 64-bit reads.  I
wanted to minimize any possible impacts with this patch series so I
did *not* change this behavior.
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, Printk queued error record before panic · 46d12f0b

Committed by Huang Ying on Dec 08, 2011

Because printk is not safe inside the NMI handler, the recoverable error
records received in the NMI handler are queued to be printked in a
delayed IRQ context via irq_work.  If a fatal error occurs after the
recoverable error and before the irq_work is processed, we lose an error
report.

To solve the issue, the queued error records are printked in the NMI
handler if the system is about to panic.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Distinguish interleaved error report in kernel log · 5ba82ab5

Committed by Huang Ying on Dec 08, 2011

In most cases, printk only guarantees that messages from different printk
calls will not be interleaved with each other.  But one APEI
GHES hardware error report involves multiple printk calls,
normally one per line.  So it is possible that hardware error
reports coming from different generic hardware error sources will be
interleaved.

In this patch, a sequence number is prefixed to each line of an error
report, so that even if reports are interleaved, they can still be
distinguished by the prefixed sequence number.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
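
The sequence-number prefix can be illustrated with a small user-space sketch. The `{N}` prefix format and the helper names `next_report_seqno()` and `format_report_line()` are assumptions for this example; the actual prefix used by the driver may differ.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Shared counter; each report takes one number for all of its lines. */
static atomic_uint report_seqno = 0;

/* Allocate the sequence number for a new report. */
static unsigned int next_report_seqno(void)
{
    return atomic_fetch_add(&report_seqno, 1);
}

/* Format one line of a report with a "{seqno}" prefix (hypothetical format). */
static int format_report_line(char *buf, size_t len, unsigned int seqno,
                              const char *text)
{
    return snprintf(buf, len, "{%u} %s", seqno, text);
}
```

Every line of one report carries the same number, so lines from two reports remain attributable even when they alternate in the log.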

ACPI, APEI, GHES: Add PCIe AER recovery support · a654e5ee

Committed by Huang Ying on Dec 08, 2011

aer_recover_queue() is called when recoverable PCIe AER errors are
notified by firmware to do the recovery work.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

13 Jan, 2012 · 1 commit

module_param: make bool parameters really bool (drivers & misc) · 90ab5ee9

Committed by Rusty Russell on Jan 13, 2012

module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

10 Oct, 2011 · 1 commit

x86, nmi: Wire up NMI handlers to new routines · 9c48f1c6

Committed by Don Zickus on Sep 30, 2011

Just convert all the files that have an nmi handler to the new routines.
Most of it is straightforward conversion.  A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

03 Aug, 2011 · 3 commits

APEI GHES: 32-bit buildfix · 70cb6e1d

Committed by Len Brown on Aug 02, 2011

drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression

ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
  in function ghes_estatus_cache_add().
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a

Committed by Huang Ying on Jul 13, 2011

memory_failure_queue() is called when recoverable memory errors are
notified by firmware to do the recovery work.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

ACPI, APEI, GHES, Error records content based throttle · 152cef40

Committed by Huang Ying on Jul 13, 2011

printk is used by GHES to report hardware errors.  Ratelimit is
enforced on the printk to avoid too many hardware error reports in
the kernel log, because there may be thousands or even millions of
corrected hardware errors while the system is running.

Currently, a simple scheme is used.  That is, the total number of
hardware error reports is ratelimited.  This may cause some issues
in practice.

For example, suppose two kinds of hardware errors occur in the
system.  One is a corrected memory error; because the faulty memory
address is accessed frequently, there may be hundreds of error reports
per second.  The other is a corrected PCIe AER error, which is
reported once per second.  Because they share one ratelimit control
structure, it is highly possible that only the memory error is reported.

To avoid the above issue, an error record content based throttle
algorithm is implemented in the patch.  After the first
successful report, all error records that are the same are throttled for
some time, to let other kinds of error records have the opportunity to
be reported.

In the above example, the memory errors will be throttled for some time
after being printked.  Then the PCIe AER error will be printked
successfully.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
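
A content-based throttle of the kind described above can be sketched as a small cache keyed on the record bytes. Everything here is an assumption for illustration: the slot count, the size cap, the caller-supplied timestamps and the names `throttle_cache` and `throttle_should_report()` are hypothetical and do not reflect the patch's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SK_RECORD_MAX 64    /* hypothetical cap on the compared record size */
#define SK_SLOTS      4     /* hypothetical number of remembered records */

/* One remembered record and the time it was last reported. */
struct throttle_slot {
    unsigned char data[SK_RECORD_MAX];
    size_t len;
    unsigned long last_report;  /* caller-supplied timestamp */
    bool used;
};

struct throttle_cache {
    struct throttle_slot slots[SK_SLOTS];
};

/*
 * Return true if the record should be reported now. A record identical to
 * a remembered one is suppressed until `window` time units have passed;
 * a record with new content is always reported and then remembered.
 */
static bool throttle_should_report(struct throttle_cache *c,
                                   const void *rec, size_t len,
                                   unsigned long now, unsigned long window)
{
    struct throttle_slot *victim = &c->slots[0];

    if (len > SK_RECORD_MAX)
        return true;    /* too large to cache: never throttled here */

    for (int i = 0; i < SK_SLOTS; i++) {
        struct throttle_slot *s = &c->slots[i];

        if (s->used && s->len == len && memcmp(s->data, rec, len) == 0) {
            if (now - s->last_report < window)
                return false;   /* same content, still inside the window */
            s->last_report = now;
            return true;
        }
        /* Prefer an empty slot, otherwise the least recently reported one. */
        if (!s->used || (victim->used && s->last_report < victim->last_report))
            victim = s;
    }

    memcpy(victim->data, rec, len);
    victim->len = len;
    victim->last_report = now;
    victim->used = true;
    return true;
}
```

Because the decision depends on the record content rather than on one global counter, a frequent memory error no longer uses up the budget that a rarer PCIe AER error needs in order to be reported.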

openeuler / Kernel · synced successfully over 1 year ago