Commits · 0b8080b5784ac48732feea01056637c00a4cfd8d · openanolis / cloud-kernel

17 Jan, 2020 · 2 commits

x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units · 0b8080b5

Committed by Yazen Ghannam on Nov 19, 2019

commit 3ad7e748c12cc771df6020a552def3e1727e8a17 upstream.

The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.

Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.

Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
Signed-off-by: WANG Siyuan <Siyuan.Wang@amd.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

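The versioned lookup this commit describes can be sketched in userspace C. This is an illustrative sketch, not the kernel code: the (HWID, McaType) values, the bank name and the description strings are made-up placeholders, and `describe()`, `bank_kinds[]` and `NELEMS()` are hypothetical helpers. It shows two tuples that resolve to the same user-visible name while each keeps its own description list, because the same error-code position can mean different things in each version.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine HWID and McaType into a single lookup key (HWID above McaType). */
#define HWID_MCATYPE(hwid, mcatype) (((uint32_t)(hwid) << 16) | (uint32_t)(mcatype))
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* One user-visible name shared by both hardware versions (placeholder). */
static const char unit_name[] = "unit";

/*
 * Hypothetical per-version description lists, indexed by extended error
 * code. Position 1 means something different in each version, which is
 * why v2 gets its own list instead of reusing v1's.
 */
static const char * const unit_v1_descs[] = { "Shared error", "V1-only error" };
static const char * const unit_v2_descs[] = { "Shared error", "V2-only error" };

struct bank_kind {
	uint32_t hwid_mcatype;		/* key built from MCA_IPID */
	const char *name;		/* what the user sees */
	const char * const *descs;	/* version-specific descriptions */
	size_t ndescs;
};

/* Made-up (HWID, McaType) values: same HWID, two McaType versions. */
static const struct bank_kind bank_kinds[] = {
	{ HWID_MCATYPE(0x100, 0x0), unit_name, unit_v1_descs, NELEMS(unit_v1_descs) },
	{ HWID_MCATYPE(0x100, 0x1), unit_name, unit_v2_descs, NELEMS(unit_v2_descs) },
};

/*
 * Look up the description for an error and optionally report the bank name.
 * Returns NULL for an unknown tuple or an out-of-range error code.
 */
static const char *describe(uint16_t hwid, uint16_t mcatype, unsigned int xec,
			    const char **name)
{
	uint32_t key = HWID_MCATYPE(hwid, mcatype);

	for (size_t i = 0; i < NELEMS(bank_kinds); i++) {
		const struct bank_kind *k = &bank_kinds[i];

		if (k->hwid_mcatype != key)
			continue;
		if (name)
			*name = k->name;
		return xec < k->ndescs ? k->descs[xec] : NULL;
	}
	return NULL;
}
```

Because both entries point at the same `unit_name`, a log reader sees one name for both versions; the raw MCA_IPID saved with the record still tells them apart, as the commit message notes.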

x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types · 43886175

Committed by Yazen Ghannam on Nov 19, 2019

commit cbfa447edd6a3825fdb8a4ffae74ff7208f2d2c0 upstream.

Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
PCIE SMCA bank types.

Also, add their respective error descriptions to the MCE decoding module
edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
Signed-off-by: WANG Siyuan <Siyuan.Wang@amd.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

22 Feb, 2018 · 1 commit

x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type · 68627a69

Committed by Yazen Ghannam on Feb 21, 2018

Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
bank 4 in the smca_banks array. This means that when we check if a bank
is initialized, like during boot or resume, we will see that bank 4 is
not initialized and try to initialize it.

This will cause a call trace, when resuming from suspend, due to
rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
an IPI but we're running with interrupts disabled. This triggers:

  WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
  ...

Reserved banks will be read-as-zero, so their MCA_IPID register will be
zero. So, like the smca_banks array, the threshold_banks array will not
have an entry for a reserved bank since all its MCA_MISC* registers will
be zero.

Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.

Use the "Reserved" type when checking if a bank is reserved. It's
possible that other bank numbers may be reserved on future systems.

Don't try to find the block address on reserved banks.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.14.x
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

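A minimal sketch of the reserved-bank check, assuming HWID occupies MCA_IPID[43:32] and McaType occupies MCA_IPID[63:48] (consistent with the IPID `0x0001002e00000002` in the MC22 log shown elsewhere in this history). The `ipids[]` array stands in for per-bank MSR reads, and `bank_is_reserved()` and `count_banks_to_init()` are hypothetical names. A bank whose IPID reads as zero matches the (0, 0) "Reserved" type and is skipped, so no further register access is attempted for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MCA_IPID layout: HWID in bits 43:32, McaType in bits 63:48. */
#define IPID_HWID(ipid)    ((uint32_t)((ipid) >> 32) & 0xfffu)
#define IPID_MCATYPE(ipid) ((uint32_t)((ipid) >> 48) & 0xffffu)

/* A reserved bank is read-as-zero, so it matches the (0, 0) "Reserved" type. */
static bool bank_is_reserved(uint64_t ipid)
{
	return IPID_HWID(ipid) == 0 && IPID_MCATYPE(ipid) == 0;
}

/*
 * Count the banks that would go through initialization. ipids[] stands in
 * for per-bank MCA_IPID reads; reserved banks are skipped so that no block
 * address lookup or other register access is attempted for them.
 */
static unsigned int count_banks_to_init(const uint64_t *ipids, unsigned int nbanks)
{
	unsigned int n = 0;

	for (unsigned int bank = 0; bank < nbanks; bank++) {
		if (bank_is_reserved(ipids[bank]))
			continue;
		n++;
	}
	return n;
}
```

Matching on the type rather than on a fixed bank number is what lets the same check cover other reserved bank numbers on future systems.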

21 Aug, 2017 · 3 commits

EDAC, mce_amd: Get rid of local var in amd_filter_mce() · 39844347

Committed by Borislav Petkov on Jul 25, 2017

... and use the macro for that.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>

EDAC, mce_amd: Get rid of most struct cpuinfo_x86 uses · f3c0891c

Committed by Borislav Petkov on Jul 25, 2017

struct mce.cpuid contains CPUID(1).EAX which contains family, model and
stepping and thus has enough information for our purposes. Thus get rid
of some external dependencies which are not really needed.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>

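The family/model/stepping triple printed in the logs of this history (for example `(17:1:0)`) can be recovered from CPUID(1).EAX alone. The sketch below follows the standard CPUID encoding the way Linux's `x86_family()` and `x86_model()` helpers compute it (extended family added when the base family is 0xF; extended model used for family 6 and above). The `cpuid_*()` names are hypothetical stand-ins, not the kernel's functions.

```c
#include <stdint.h>

/* Base family is EAX[11:8]; extended family EAX[27:20] is added when base is 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int fam = (eax >> 8) & 0xf;

	if (fam == 0xf)
		fam += (eax >> 20) & 0xff;
	return fam;
}

/*
 * Base model is EAX[7:4]; extended model EAX[19:16] becomes the high nibble
 * for family 6 and above.
 */
static unsigned int cpuid_model(uint32_t eax)
{
	unsigned int model = (eax >> 4) & 0xf;

	if (cpuid_family(eax) >= 0x6)
		model += ((eax >> 16) & 0xf) << 4;
	return model;
}

/* Stepping is EAX[3:0]. */
static unsigned int cpuid_stepping(uint32_t eax)
{
	return eax & 0xf;
}
```

For example, an EAX value of `0x00800F10` decodes to family 0x17, model 1, stepping 0, which is the `(17:1:0)` shown in the MC22 log elsewhere on this page.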

EDAC, mce_amd: Rename decode_smca_errors() to decode_smca_error() · 4ab1784b

Committed by Borislav Petkov on Jul 25, 2017

Singular fits better because it decodes a single error.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>

17 Jul, 2017 · 1 commit

EDAC, mce_amd: Use cpu_to_node() to find the node ID · fbe63acf

Committed by Yazen Ghannam on Mar 20, 2017

Using the homegrown amd_get_nb_id() to find a node ID on AMD was fine
while the L3 to node mapping was 1:1. And Zen topology broke this. So
let's start slowly moving away from it and use the topology interfaces
instead.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1490041614-90057-2-git-send-email-Yazen.Ghannam@amd.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

13 Jun, 2017 · 1 commit

EDAC, mce_amd: Fix typo in SMCA error description · bdf1bf17

Committed by Yazen Ghannam on Jun 12, 2017

Fix typo in "poison consumption" error description.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1497286703-62853-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

16 Feb, 2017 · 1 commit

EDAC, mce_amd: Print IPID and Syndrome on a separate line · 75bf2f64

Committed by Yazen Ghannam on Feb 15, 2017

Currently, the IPID and Syndrome are printed on the same line as the
Address. There are cases when we can have a valid Syndrome but not a
valid Address.

For example, the MCA_SYND register can be used to hold more detailed
error info that the hardware folks can use. It's not just DRAM ECC
syndromes. There are some error types that aren't related to memory that
may have valid syndromes, like some errors related to links in the Data
Fabric, etc.

In these cases, the IPID and Syndrome are not printed at the same log
level as the rest of the stanza, so users won't see them on the console.

Console:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Dmesg:
  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Print the IPID first and on a new line. The IPID should always be
printed on SMCA systems. The Syndrome will then be printed with the IPID
and at the same log level when valid:

  [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
  [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
  [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

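A userspace sketch of the new line layout, assuming SyndV is MCA_STATUS bit 53 (consistent with the `SyndV` flag shown for status `0xd82000000002080b` above). `format_ipid_synd()` is a hypothetical helper that writes into a buffer instead of calling printk; the IPID is always emitted and the Syndrome is appended on the same line only when SyndV is set.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MCI_STATUS_SYNDV (1ULL << 53)	/* assumed: MCA_SYND contents are valid */

/*
 * Write the SMCA "IPID: ..., Syndrome: ..." line into buf. The IPID is
 * always printed; the Syndrome follows on the same line only when SyndV is
 * set. Returns the formatted length: a value >= len means the output was
 * truncated, a negative value means an encoding error.
 */
static int format_ipid_synd(char *buf, size_t len, uint64_t status,
			    uint64_t ipid, uint64_t synd)
{
	int n = snprintf(buf, len, "IPID: 0x%016" PRIx64, ipid);

	if (n < 0 || (size_t)n >= len)
		return n;	/* error or truncated: stop here */
	if (status & MCI_STATUS_SYNDV) {
		int m = snprintf(buf + n, len - (size_t)n,
				 ", Syndrome: 0x%016" PRIx64, synd);
		if (m < 0)
			return m;
		n += m;
	}
	return n;
}
```

Fed the status, IPID and syndrome from the example above, it yields `IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000`; with SyndV clear only the IPID part is written.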

28 Jan, 2017 · 1 commit

EDAC, mce_amd: Give more context to deferred error message · 67d7fd30

Committed by Yazen Ghannam on Jan 24, 2017

Users may not be familiar with the concept of deferred errors. There is
no action for users to take on this type of error, so give more context
in the error message to make this more clear.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1485297149-13733-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

24 Jan, 2017 · 3 commits

x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority · 9026cc82

Committed by Borislav Petkov on Jan 23, 2017

Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

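How a priority turns into call order can be sketched with a sorted singly linked list. This is a simplified, hypothetical model of a notifier chain (no locking, no return-value handling, a made-up `struct nb` and made-up handler names); it mirrors the general rule that higher-priority entries are called first and equal priorities keep their registration order.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a notifier block. */
struct nb {
	const char *name;
	int priority;		/* higher runs earlier */
	struct nb *next;
};

/*
 * Insert n so the chain stays sorted by descending priority. An entry with
 * the same priority as existing ones goes after them, so registration order
 * is kept among equals.
 */
static void chain_register(struct nb **head, struct nb *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain in call order, copying out up to max names. */
static size_t chain_order(const struct nb *head, const char **out, size_t max)
{
	size_t i = 0;

	for (; head && i < max; head = head->next)
		out[i++] = head->name;
	return i;
}
```

With explicit priorities, the call order no longer depends on which module happened to register first.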

EDAC/mce/amd: Dump TSC value · 0bceab67

Committed by Borislav Petkov on Jan 23, 2017

Dump the TSC value of the time when the MCE got logged.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-8-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

EDAC/mce/amd: Unexport amd_decode_mce() · 1fbcd909

Committed by Borislav Petkov on Jan 23, 2017

It is not used outside of the driver anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-7-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

29 Nov, 2016 · 1 commit

EDAC, mce_amd: Don't report poison bit on Fam15h, bank 4 · a6c14dce

Committed by Yazen Ghannam on Nov 18, 2016

MCA_STATUS[43] has been defined as "Poison" or "Reserved" for every bank
since Fam15h except for Fam15h, bank 4 in which case it's defined as
part of the McaStatSubCache bitfield.

Filter out that case.

Reported-by: Dean Liberty <Dean.Liberty@amd.com>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479478222-19896-1-git-send-email-Yazen.Ghannam@amd.com
[ Split an almost unparseable ternary conditional, add a comment. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

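The filtering rule reduces to one predicate. The sketch below assumes bit 43 is the Poison bit as the commit states; `bit43_is_poison()` and `poison_field()` are hypothetical helpers, the latter returning the text for the Poison slot of the status stanza, or NULL when the slot should not be printed at all. The condition is split into separate checks rather than one ternary, in the spirit of the maintainer note above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_POISON (1ULL << 43)	/* MCA_STATUS[43], per the commit */

/*
 * Bit 43 means Poison (or Reserved) from Fam15h on, except on Fam15h
 * bank 4, where it is part of the McaStatSubCache bitfield instead.
 */
static bool bit43_is_poison(unsigned int family, unsigned int bank)
{
	if (family < 0x15)
		return false;
	if (family == 0x15 && bank == 4)
		return false;
	return true;
}

/* Text for the Poison slot of the status stanza, or NULL to omit the slot. */
static const char *poison_field(unsigned int family, unsigned int bank,
				uint64_t status)
{
	if (!bit43_is_poison(family, bank))
		return NULL;
	return (status & MCI_STATUS_POISON) ? "Poison" : "-";
}
```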

21 Nov, 2016 · 1 commit

EDAC, mce_amd: Rename nb_bus_decoder to dram_ecc_decoder · 5c332202

Committed by Yazen Ghannam on Nov 17, 2016

nb_bus_decoder() is only used for DRAM ECC errors so rename it so that
the name is more generic and descriptive.

Also, call it for DRAM ECC errors on SMCA systems.

[ Boris: rename it to real function name with a verb in it. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1479423463-8536-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

09 Nov, 2016 · 3 commits

x86/RAS: Hide SMCA bank names · c09a8c40

Committed by Borislav Petkov on Nov 03, 2016

Add accessor functions and hide the smca_names array. Also, add a
sanity-check to bank HWID assignment in get_smca_bank_info().

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/RAS: Rename smca_bank_names to smca_names · a9a1c0ee

Committed by Borislav Petkov on Nov 02, 2016

Make it differ more from struct smca_bank_name for better readability.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/RAS: Simplify SMCA HWID descriptor struct · 1ce9cd7f

Committed by Borislav Petkov on Nov 02, 2016

Call it simply smca_hwid and call local variables "hwid". More readable.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

13 Sep, 2016 · 6 commits

x86/MCE/AMD, EDAC: Handle reserved bank 4 on Fam17h properly · a884675b

Committed by Yazen Ghannam on Sep 12, 2016

Bank 4 is reserved on family 0x17 and shouldn't generate any MCE
records. However, broken hardware and software is not something unheard
of so warn about bank 4 errors. They shouldn't be coming from bank 4
naturally but users can still use mce_amd_inj to simulate errors from it
for testing purposes.

Also, avoid special handling in the injector mce_amd_inj like it is
being done on the older families.

[ bp: Rewrite commit message and merge into one patch. Use boot_cpu_data. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Link: http://lkml.kernel.org/r/1473384591-5323-1-git-send-email-Yazen.Ghannam@amd.com
Link: http://lkml.kernel.org/r/1473384591-5323-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/mce, EDAC/mce_amd: Print MCA_SYND and MCA_IPID during MCE on SMCA systems · 4b711f92

Committed by Yazen Ghannam on Sep 12, 2016

The MCA_SYND and MCA_IPID registers contain valuable information and
should be included in MCE output. The MCA_SYND register contains
syndrome and other error information, and the MCA_IPID register will
uniquely identify the MCA bank's type without having to rely on system
software.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472680624-34221-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types · 5896820e

Committed by Yazen Ghannam on Sep 12, 2016

Scalable MCA defines a number of IP types. An MCA bank on an SMCA
system is defined as one of these IP types. A bank's type is uniquely
identified by the combination of the HWID and MCATYPE values read from
its MCA_IPID register.

Add the required tables in order to be able to lookup error descriptions
based on a bank's type and the error's extended error code.

[ bp: Align comments, simplify a bit. ]

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472741832-1690-1-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

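A lookup of this kind can be sketched as a small table keyed on the (HWID, McaType) pair taken from MCA_IPID. The sketch assumes HWID sits in IPID[43:32], McaType in IPID[63:48], and the SMCA extended error code in MCA_STATUS[21:16]. The only table entries are the two that this history itself shows: the all-zero Reserved type and the 0x2E/0x1 bank that the MC22 log decodes as "Power, Interrupts, etc.". All helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HWID_MCATYPE(hwid, mcatype) (((uint32_t)(hwid) << 16) | (uint32_t)(mcatype))
#define IPID_HWID(ipid)    ((uint32_t)((ipid) >> 32) & 0xfffu)		/* IPID[43:32] */
#define IPID_MCATYPE(ipid) ((uint32_t)((ipid) >> 48) & 0xffffu)	/* IPID[63:48] */
#define SMCA_XEC(status)   ((unsigned int)((status) >> 16) & 0x3fu)	/* STATUS[21:16] */

struct smca_kind {
	uint32_t hwid_mcatype;
	const char *name;
};

/* Only the two types that appear in this commit history. */
static const struct smca_kind smca_kinds[] = {
	{ HWID_MCATYPE(0x00, 0x0), "Reserved" },
	{ HWID_MCATYPE(0x2E, 0x1), "Power, Interrupts, etc." },
};

/* Map an MCA_IPID value to a bank type name, or NULL if it is unknown. */
static const char *smca_kind_name(uint64_t ipid)
{
	uint32_t key = HWID_MCATYPE(IPID_HWID(ipid), IPID_MCATYPE(ipid));

	for (size_t i = 0; i < sizeof(smca_kinds) / sizeof(smca_kinds[0]); i++)
		if (smca_kinds[i].hwid_mcatype == key)
			return smca_kinds[i].name;
	return NULL;
}

/* Format the per-bank line the way the logs in this history show it. */
static int format_smca_line(char *buf, size_t len, uint64_t ipid, uint64_t status)
{
	const char *name = smca_kind_name(ipid);

	if (!name)
		return snprintf(buf, len, "Invalid IP block specified.");
	return snprintf(buf, len, "%s Extended Error Code: %u", name, SMCA_XEC(status));
}
```

With IPID `0x0001002e00000002` and status `0xd82000000002080b` it produces `Power, Interrupts, etc. Extended Error Code: 2`, matching the decoded MC22 line in the IPID/Syndrome commit above. A real table would also point each entry at its own description array, indexed by the extended error code.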

EDAC/mce_amd: Use SMCA prefix for error descriptions arrays · 856095b1

Committed by Yazen Ghannam on Sep 12, 2016

The error descriptions defined for Fam17h can be reused for other SMCA
systems, so their names should reflect this.

Change f17h prefix to smca for error descriptions.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-4-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

EDAC/mce_amd: Add missing SMCA error descriptions · c019b951

Committed by Yazen Ghannam on Sep 12, 2016

Add missing SMCA error descriptions to the error descriptions arrays.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1472673994-12235-3-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

EDAC/mce_amd: Print syndrome register value on SMCA systems · b300e873

Committed by Yazen Ghannam on Sep 12, 2016

Print SyndV bit status and print the raw value of the MCA_SYND register.
Further decoding of the syndrome from struct mce.synd can be done in
other places where appropriate, e.g. DRAM ECC.

Boris: make the error stanza more compact by putting the error address
and syndrome on the same line:

  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:2 (17:0:0) MC4_STATUS[-|CE|-|PCC|AddrV|-|-|SyndV|CECC]: 0x96204100001e0117
  [Hardware Error]: Error Addr: 0x000000007f4c52e3, Syndrome: 0x0000000000000000
  [Hardware Error]: Invalid IP block specified.
  [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: RD

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1467633035-32080-2-git-send-email-Yazen.Ghannam@amd.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

12 May, 2016 · 1 commit

EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA · a348ed83

Committed by Yazen Ghannam on May 11, 2016

Use X86_FEATURE_SMCA when detecting if SMCA is available instead of
directly using CPUID 0x80000007_EBX.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

08 Mar, 2016 · 1 commit

x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors · be0aec23

Committed by Aravind Gopalakrishnan on Mar 07, 2016

For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

[ Minor fixups. ]

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

13 Aug, 2015 · 2 commits

x86/mce: Kill drain_mcelog_buffer() · eef4dfa0

Committed by Borislav Petkov on Aug 12, 2015

This used to flush out MCEs logged during early boot and which
were in the MCA registers from a previous system run. No need
for that now, since we've moved to a genpool.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/mce: Remove the MCE ring for Action Optional errors · fd4cf79f

Committed by Chen, Gong on Aug 12, 2015

Use unified genpool to save Action Optional error events and put
Action Optional error handling in the same notification chain as
MCE error decoding.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
[ Fold in subsequent patch from Boris for early boot logging. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Correct a lot. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

14 Jul, 2015 · 1 commit

EDAC, mce_amd: Don't emit 'CE' for Deferred error · 99e1dfb7

Committed by Aravind Gopalakrishnan on Jul 13, 2015

Currently, when decoding an MCE, we display 'CE' for a Deferred error, like
this:

[Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00095080813

When the 'UC' bit in the MCx_STATUS register is clear, the error status
is either a Corrected error or Deferred error as determined by the
'Deferred' bit. So do not print 'CE' on a deferred error.

Refer to AMD Error Scope Hierarchy table in a newer BKDG (example:
49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features").

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1436788382-6463-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

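The rule for the second field of the MCx_STATUS stanza can be sketched as below, assuming UC is MCA_STATUS bit 61 and Deferred is bit 44 (the example status above has bit 44 set and bit 61 clear, and the log shows it as Deferred). `uc_field()` is a hypothetical helper name.

```c
#include <stdint.h>

#define MCI_STATUS_UC       (1ULL << 61)	/* assumed: uncorrected error */
#define MCI_STATUS_DEFERRED (1ULL << 44)	/* assumed: deferred error */

/* Second field of "MCx_STATUS[...]": "UE", "CE", or "-" for a deferred error. */
static const char *uc_field(uint64_t status)
{
	if (status & MCI_STATUS_UC)
		return "UE";
	/* UC clear: the Deferred bit decides between deferred and corrected. */
	return (status & MCI_STATUS_DEFERRED) ? "-" : "CE";
}
```

For `0xdc04b00095080813` it returns "-" rather than "CE", while `0x96204100001e0117` (UC and Deferred both clear) still yields "CE".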

25 Nov, 2014 · 1 commit

EDAC, MCE, AMD: Correct formatting of decoded text · 50872ccd

Committed by Borislav Petkov on Nov 22, 2014

Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.

Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

05 Nov, 2014 · 1 commit

EDAC, MCE, AMD: Add decoding table for MC6 xec · bc4febe9

Committed by Aravind Gopalakrishnan on Nov 04, 2014

Extended error code meanings are tabulated for other banks. Extend that
tradition for MC6 too.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

14 Jul, 2014 · 1 commit

EDAC, MCE, AMD: Add MCE decoding for F15h M60h · eba4bfb3

Committed by Aravind Gopalakrishnan on Jul 14, 2014

Add decoding logic for new Fam15h model 60h.

Tested using mce_amd_inj module and works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: simplify a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

09 May, 2014 · 1 commit

EDAC, MCE, AMD: Remove leftover unused mask · c5c0903b

Committed by Borislav Petkov on May 08, 2014

295d8cda ("EDAC, MCE, AMD: Drop local coreid reporting") removed the
code snippet which used that mask but forgot to drop the mask itself. Do
that now.

Signed-off-by: Borislav Petkov <bp@suse.de>

24 Feb, 2014 · 1 commit

MCE, AMD: Fix decoding module loading on unsupported hw · fd0f5fff

Committed by Borislav Petkov on Feb 17, 2014

We want to still be able to issue some error information on systems for
which there is no decoding support (think older distro kernels here,
for example). Therefore, we allow module registration but skip the
per-family bank-specific decoders and issue the general information
only, i.e.:

[   46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[   46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[   46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

with the hope that it still contains helpful useful bits.

Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

08 Jun, 2013 · 1 commit

EDAC, MCE, AMD: Add an MCE signature for new Fam15h models · aad19e51

Committed by Aravind Gopalakrishnan on Jun 05, 2013

Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>

23 Jan, 2013 · 3 commits

EDAC, MCE, AMD: Remove unneeded exports · 0f08669e

Committed by Borislav Petkov on Dec 23, 2012

Initially, those strings describing different parts of an MCE message
were shared with amd64_edac and were therefore exported to modules.
However, all except pp_msgs are used only in one place right now so hide
them and make them static.

No functionality change.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

EDAC, MCE, AMD: Add MCE decoding support for Family 16h · 980eec8b

Committed by Jacob Shin on Dec 18, 2012

Add MCE decoding logic for AMD Family 16h processors.

Boris:

- drop unneeded uu_msgs export
- exit early in cat_mc1_mce and save us an indentation level

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

EDAC, MCE, AMD: Make MC2 decoding per-family · 4a73d3de

Committed by Jacob Shin on Dec 18, 2012

Currently only AMD Family 15h processors have special handling for MC2
errors. Since upcoming Family 16h will also need unique handling, let's
make MC2 handling part of amd_decoder_ops.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

28 Nov, 2012 · 2 commits

MCE, AMD: Dump error status · d5c6770d

Committed by Borislav Petkov on Sep 14, 2012

Dump error status after decoding the error which describes the error
disposition.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

MCE, AMD: Report decoded error type first · d824c771

Committed by Borislav Petkov on Sep 14, 2012

Instead of starting with the error details, report the decoded, readable
error type first.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

openanolis / cloud-kernel: synced successfully more than 1 year ago
