- 18 12月, 2019 1 次提交
-
-
由 Shirish S 提交于
[ Upstream commit 30aa3d26edb0f3d7992757287eec0ca588a5c259 ] The MC4_MISC thresholding quirk needs to be applied during S5 -> S0 and S3 -> S0 state transitions, which follow different code paths. Carve it out into a separate function and call it mce_amd_feature_init() where the two code paths of the state transitions converge. [ bp: massage commit message and the carved out function. ] Signed-off-by: NShirish S <shirish.s@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1547651417-23583-3-git-send-email-shirish.s@amd.comSigned-off-by: NSasha Levin <sashal@kernel.org>
-
- 26 1月, 2019 1 次提交
-
-
由 Borislav Petkov 提交于
[ Upstream commit 68b5e4326e4b8ac9080835005d8254fed0fb3c56 ] Add the proper includes and make smca_get_name() static. Fix an actual bug too which the warning triggered: arch/x86/kernel/cpu/mcheck/therm_throt.c:395:39: error: conflicting \ types for ‘smp_thermal_interrupt’ asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *r) ^~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/kernel/cpu/mcheck/therm_throt.c:29: ./arch/x86/include/asm/traps.h:107:17: note: previous declaration of \ ‘smp_thermal_interrupt’ was here asmlinkage void smp_thermal_interrupt(void); Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Yi Wang <wang.yi59@zte.com.cn> Cc: Michael Matz <matz@suse.de> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811081633160.1549@nanos.tec.linutronix.deSigned-off-by: NSasha Levin <sashal@kernel.org>
-
- 06 12月, 2018 1 次提交
-
-
由 Borislav Petkov 提交于
commit 60c8144afc287ef09ce8c1230c6aa972659ba1bb upstream. Currently, the code sets up the thresholding interrupt vector and only then goes about initializing the thresholding banks. Which is wrong, because an early thresholding interrupt would cause a NULL pointer dereference when accessing those banks and prevent the machine from booting. Therefore, set the thresholding interrupt vector only *after* having initialized the banks successfully. Fixes: 18807ddb ("x86/mce/AMD: Reset Threshold Limit after logging error") Reported-by: NRafał Miłecki <rafal@milecki.pl> Reported-by: NJohn Clemens <clemej@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NRafał Miłecki <rafal@milecki.pl> Tested-by: NJohn Clemens <john@deater.net> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 19 5月, 2018 2 次提交
-
-
由 Borislav Petkov 提交于
We used rdmsr_safe_on_cpu() to make sure we're reading the proper CPU's MISC block addresses. However, that caused trouble with CPU hotplug due to the _on_cpu() helper issuing an IPI while IRQs are disabled. But we don't have to do that: the block addresses are the same on any CPU so we can read them on any CPU. (What practically happens is, we read them on the BSP and cache them, and for later reads, we service them from the cache). Suggested-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
... into a global, two-dimensional array and service subsequent reads from that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs disabled). In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage of the bank->blocks pointer. Fixes: 27bd5950 ("x86/mce/AMD: Get address from already initialized block") Reported-by: NJohannes Hirte <johannes.hirte@datenkhaos.de> Tested-by: NJohannes Hirte <johannes.hirte@datenkhaos.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
-
- 22 2月, 2018 4 次提交
-
-
由 Yazen Ghannam 提交于
Carve out the SMCA code in get_block_address() into a separate helper function. No functional change. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> [ Save an indentation level. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180215210943.11530-4-Yazen.Ghannam@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
The block address is saved after the block is initialized when threshold_init_device() is called. Use the saved block address, if available, rather than trying to rediscover it. This will avoid a call trace, when resuming from suspend, due to the rdmsr_safe_on_cpu() call in get_block_address(). The rdmsr_safe_on_cpu() call issues an IPI but we're running with interrupts disabled. This triggers: WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0 Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.14.x Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180221101900.10326-8-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
Currently, bank 4 is reserved on Fam17h, so we chose not to initialize bank 4 in the smca_banks array. This means that when we check if a bank is initialized, like during boot or resume, we will see that bank 4 is not initialized and try to initialize it. This will cause a call trace, when resuming from suspend, due to rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue an IPI but we're running with interrupts disabled. This triggers: WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0 ... Reserved banks will be read-as-zero, so their MCA_IPID register will be zero. So, like the smca_banks array, the threshold_banks array will not have an entry for a reserved bank since all its MCA_MISC* registers will be zero. Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0. Use the "Reserved" type when checking if a bank is reserved. It's possible that other bank numbers may be reserved on future systems. Don't try to find the block address on reserved banks. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.14.x Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
Pass the bank number to smca_get_bank_type() since that's all we need. Also, we should compare the bank number to MAX_NR_BANKS (size of the smca_banks array) not the number of bank types. Bank types are reused for multiple banks, so the number of types can be different from the number of banks in a system and thus we could return an invalid bank type. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.14.x Cc: <stable@vger.kernel.org> # 4.14.x: 11cf8877 x86/MCE/AMD: Define a function to get SMCA bank type Cc: <stable@vger.kernel.org> # 4.14.x: c6708d50 x86/MCE: Report only DRAM ECC as memory errors on AMD systems Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20180221101900.10326-6-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 12月, 2017 2 次提交
-
-
由 Yazen Ghannam 提交于
The MCA_STATUS[ErrorCodeExt] field is very bank type specific. We currently check if the ErrorCodeExt value is 0x0 or 0x8 in mce_is_memory_error(), but we don't check the bank number. This means that we could flag non-memory errors as memory errors. We know that we want to flag DRAM ECC errors as memory errors, so let's do those cases first. We can add more cases later when needed. Define a wrapper function in mce_amd.c so we can use SMCA enums. [ bp: Remove brackets around return statements. ] Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171207203955.118171-2-Yazen.Ghannam@amd.com
-
由 Yazen Ghannam 提交于
Scalable MCA systems have various types of banks. The bank's type can determine how we handle errors from it. For example, if a bank represents a UMC (Unified Memory Controller) then we will need to convert its address from a normalized address to a system physical address before handling the error. [ bp: Verify m->bank is within range and use bank pointer. ] Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20171207203955.118171-1-Yazen.Ghannam@amd.com
-
- 05 12月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
The McaIntrCfg register (MSRC000_0410), previously known as CU_DEFER_ERR, is used on SMCA systems to set the LVT offset for the Threshold and Deferred error interrupts. This register was used on non-SMCA systems to also set the Deferred interrupt type in bits 2:1. However, these bits are reserved on SMCA systems. Only set MSRC000_0410[2:1] on non-SMCA systems. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20171120162646.5210-1-Yazen.Ghannam@amd.com
-
- 05 10月, 2017 1 次提交
-
-
由 Borislav Petkov 提交于
Now that lguest is gone, put it in the internal header which should be used only by MCA/RAS code. Add missing header guards while at it. No functional change. Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
-
- 29 8月, 2017 1 次提交
-
-
由 Thomas Gleixner 提交于
Machine checks are not really high frequency events. The extra two NOP5s for the disabled tracepoints are noise vs. the heavy lifting which needs to be done in the MCE handler. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064957.144301907@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 7月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
Current SMCA implementations have the same banks on each CPU with the non-core banks only visible to a "master thread" on each die. Practically, this means the smca_banks array, which describes the banks, only needs to be populated once by a single master thread. CPU 0 seemed like a good candidate to do the populating. However, it's possible that CPU 0 is not enabled in which case the smca_banks array won't be populated. Rather than try to figure out another master thread to do the populating, we should just allow any CPU to populate the array. Drop the CPU 0 check and return early if the bank was already initialized. Also, drop the WARNing about an already initialized bank, since this will be a common, expected occurrence. The smca_banks array is only populated at boot time and CPUs are brought online sequentially. So there's no need for locking around the array. If the first CPU up is a master thread, then it will populate the array with all banks, core and non-core. Every CPU afterwards will return early. If the first CPU up is not a master thread, then it will populate the array with all core banks. The first CPU afterwards that is a master thread will skip populating the core banks and continue populating the non-core banks. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NJack Miller <jack@codezen.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170724101228.17326-4-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 6月, 2017 2 次提交
-
-
由 Yazen Ghannam 提交于
In the amd_threshold_interrupt() handler, we loop through every possible block in each bank and rediscover the block's address and if it's valid, e.g. valid, counter present and not locked. However, we already have the address saved in the threshold blocks list for each CPU and bank. The list only contains blocks that have passed all the valid checks. Besides the redundancy, there's also a smp_call_function* in get_block_address() which causes a warning when servicing the interrupt: WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0 ... Call Trace: <IRQ> rdmsr_safe_on_cpu() get_block_address.isra.2() amd_threshold_interrupt() smp_threshold_interrupt() threshold_interrupt() because we do get called in an interrupt handler *with* interrupts disabled, which can result in a deadlock. Drop the redundant valid checks and move the overflow check, logging and block reset into a separate function. Check the first block then iterate over the rest. This procedure is needed since the first block is used as the head of the list. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS. This may cause the following warning: unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000) Call Trace: <IRQ> smp_threshold_interrupt() threshold_interrupt() Use msr_stat instead which has the MSR address. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 37d43acf ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 5月, 2017 3 次提交
-
-
由 Yazen Ghannam 提交于
Scalable MCA systems have a new MCA_CONFIG register that we use to configure each bank. We currently use this when we set up thresholding. However, this is logically separate. Group all SMCA-related initialization into a single function. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-2-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yazen Ghannam 提交于
We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems. However, the guidance for current SMCA implementations of is to continue using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred error was not found in the former registers. If we logged a Deferred error in MCA_STATUS then we should also clear MCA_DESTAT. This also means we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat]. Rework __log_error() to only log an error and add helpers for the different error types being logged from the corresponding interrupt handlers. Boris: carve out common functionality into a _log_error_bank(). Cleanup comments, check MCi_STATUS bits before reading MSRs. Streamline flow. Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Elena Reshetova 提交于
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Suggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com> Reviewed-by: NDavid Windsor <dwindsor@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1492695536-5947-1-git-send-email-elena.reshetova@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 31 3月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
MCA bank 3 is reserved on systems pre-Fam17h, so it didn't have a name. However, MCA bank 3 is defined on Fam17h systems and can be accessed using legacy MSRs. Without a name we get a stack trace on Fam17h systems when trying to register sysfs files for bank 3 on kernels that don't recognize Scalable MCA. Call MCA bank 3 "decode_unit" since this is what it represents on Fam17h. This will allow kernels without SMCA support to see this bank on Fam17h+ and prevent the stack trace. This will not affect older systems since this bank is reserved on them, i.e. it'll be ignored. Tested on AMD Fam15h and Fam17h systems. WARNING: CPU: 26 PID: 1 at lib/kobject.c:210 kobject_add_internal kobject: (ffff88085bb256c0): attempted to be registered with empty name! ... Call Trace: kobject_add_internal kobject_add kobject_create_and_add threshold_create_device threshold_init_device Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1490102285-3659-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 24 1月, 2017 2 次提交
-
-
由 Borislav Petkov 提交于
Add the TSC value to the MCE record only when the MCE being logged is precise, i.e., it is logged as an exception or an MCE-related interrupt. So it doesn't look particularly easy to do without touching/changing a bunch of places. That's why I'm trying tricks first. For example, the mce-apei.c case I'm addressing by setting ->tsc only for errors of panic severity. The idea there is, that, panic errors will have raised an #MC and not polled. And then instead of propagating a flag to mce_setup(), it seems easier/less code to set ->tsc depending on the call sites, i.e., are we polling or are we preparing an MCE record in an exception handler/thresholding interrupt. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-5-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
Currently, we append the MCA_IPID[InstanceId] to the bank name to create the sysfs filename. The InstanceId field uniquely identifies a bank instance but it doesn't look very nice for most banks. Replace the InstanceId with a simpler, ascending (0, 1, ..) value. Only use this in the sysfs name when there is more than 1 instance. Otherwise, just use the bank's name as the sysfs name. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-3-git-send-email-Yazen.Ghannam@amd.com Link: http://lkml.kernel.org/r/20170123183514.13356-4-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 1月, 2017 1 次提交
-
-
This patch adds the __irq_entry annotation to the default x86 platform IRQ handlers. ftrace's function_graph tracer uses the __irq_entry annotation to notify the entry and return of IRQ handlers. For example, before the patch: 354549.667252 | 3) d..1 | default_idle_call() { 354549.667252 | 3) d..1 | arch_cpu_idle() { 354549.667253 | 3) d..1 | default_idle() { 354549.696886 | 3) d..1 | smp_trace_reschedule_interrupt() { 354549.696886 | 3) d..1 | irq_enter() { 354549.696886 | 3) d..1 | rcu_irq_enter() { After the patch: 366416.254476 | 3) d..1 | arch_cpu_idle() { 366416.254476 | 3) d..1 | default_idle() { 366416.261566 | 3) d..1 ==========> | 366416.261566 | 3) d..1 | smp_trace_reschedule_interrupt() { 366416.261566 | 3) d..1 | irq_enter() { 366416.261566 | 3) d..1 | rcu_irq_enter() { KASAN also uses this annotation. The smp_apic_timer_interrupt() was already annotated. Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Claudio Fontana <claudio.fontana@huawei.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicolai Stange <nicstange@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/059fdf437c2f0c09b13c18c8fe4e69999d3ffe69.1483528431.git.bristot@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
One include less is always a good thing(tm). Good riddance. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20161209182912.2726-6-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 22 11月, 2016 1 次提交
-
-
由 Yazen Ghannam 提交于
The Unified Memory Controllers (UMCs) on Fam17h log a normalized address in their MCA_ADDR registers. We need to convert that normalized address to a system physical address in order to support a few facilities: 1) To offline poisoned pages in DRAM proactively in the deferred error handler. 2) To print sysaddr and page info for DRAM ECC errors in EDAC. [ Boris: fixes/cleanups ontop: * hi_addr_offset = 0 - no need for that branch. Stick it all under the HiAddrOffsetEn case. It confines hi_addr_offset's declaration too. * Move variables to the innermost scope they're used at so that we save on stack and not blow it up immediately on function entry. * Do not modify *sys_addr prematurely - we want to not exit early and have modified *sys_addr some, which callers get to see. We either convert to a sys_addr or we don't do anything. And we signal that with the retval of the function. * Rename label out -> out_err - because it is the error path. * No need to pr_err of the conversion failed case: imagine a sparsely-populated machine with UMCs which don't have DIMMs. Callers should look at the retval instead and issue a printk only when really necessary. No need for useless info in dmesg. * s/temp_reg/tmp/ and other variable names shortening => shorter code. * Use BIT() everywhere. * Make error messages more informative. * Small build fix for the !CONFIG_X86_MCE_AMD case. * ... and more minor cleanups. ] Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161122111133.mjzpvzhf7o7yl2oa@pd.tnic [ Typo fixes. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 11月, 2016 1 次提交
-
-
由 Borislav Petkov 提交于
So adding thresholding_en et al was a good thing for removing the per-CPU thresholding callback, i.e., threshold_cpu_callback. But, in order for it to work and especially that test in mce_threshold_create_device() so that all thresholding banks get properly created and not the whole thing to fail with a NULL ptr dereference at mce_cpu_pre_down() when we offline the CPUs, we need to set the thresholding_en flag *before* we start creating the devices. Yap, it failed because thresholding_en wasn't set at the time we were creating the banks so we didn't create any and then at mce_cpu_pre_down() -> mce_threshold_remove_device() time, we would blow up. And the fix is actually easy: we have thresholding on the system when we have managed to set the thresholding vector to amd_threshold_interrupt() earlier in mce_amd_feature_init() while we were picking apart the thresholding banks and what is set and what not. So let's do that. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Fixes: 4d7b02d5 ("x86/mcheck: Split threshold_cpu_callback into two callbacks") Link: http://lkml.kernel.org/r/20161119103402.5227-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 11月, 2016 5 次提交
-
-
由 Yazen Ghannam 提交于
The error count field in MCA_MISC does not get reset by hardware when the threshold has been reached. Software is expected to reset it. Currently, the threshold limit only gets reset during init or when a user writes to sysfs. If the user is not monitoring threshold interrupts and resetting the limit then the user will only see 1 interrupt when the limit is first hit. So if, for example, the limit is set to 10 then only 1 interrupt will be recorded after 10 errors even if 100 errors have occurred. The user may then assume that only 10 errors have occurred. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1479244433-69267-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
The threshold_cpu_callback callbacks looks like one of the notifier and its arguments are almost the same. Split this out and have one ONLINE and one DEAD callback. This will come handy later once the main code gets changed to use the callback mechanism. Also, handle threshold_cpu_callback_online() return value so we don't continue if the function fails. Boris Petkov removed the callback pointer and replaced it with proper functions. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NBorislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-5-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
If we try a CPU down and fail in the middle then we roll back to the online state. This means we would perform CPU_ONLINE / mce_device_create() without invoking CPU_DEAD / mce_device_remove() for the cleanup of what was allocated in CPU_ONLINE. Be prepared for this and don't allocate the struct if we have it already. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NBorislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-4-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
If the ONLINE callback fails, the driver does not any clean up right away instead it waits to get to the DEAD stage to do it. Yes, it waits. Since we don't pass the error code back to the caller, no one knows. Do the clean up right away so it does not look like a leak. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NBorislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-3-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
Move the threshold_create_device() so it can use threshold_remove_device() without a forward declaration. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: NBorislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: rt@linutronix.de Cc: linux-edac@vger.kernel.org Link: http://lkml.kernel.org/r/20161110174447.11848-2-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 09 11月, 2016 4 次提交
-
-
由 Borislav Petkov 提交于
Add accessor functions and hide the smca_names array. Also, add a sanity-check to bank HWID assignment in get_smca_bank_info(). Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161104152317.5r276t35df53qk76@pd.tnicSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Make it differ more from struct smca_bank_name for better readability. Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-3-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Call it simply smca_hwid and call local variables "hwid". More readable. Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-2-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Call the struct simply smca_bank, it's instance ID can be simply ->id. Makes the code much more readable. Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Link: http://lkml.kernel.org/r/20161103125556.15482-1-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 13 9月, 2016 2 次提交
-
-
由 Yazen Ghannam 提交于
The MCA_ADDR registers on Scalable MCA systems contain the ErrorAddr in bits [55:0] and the least significant bit of the address in bits [61:56]. We should extract the valid ErrorAddr bits from the MCA_ADDR register rather than saving the raw value to struct mce. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1473275643-1721-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yazen Ghannam 提交于
The MCA_IPID register uniquely identifies a bank's type and instance on Scalable MCA systems. We should save the value of this register in struct mce along with the other relevant error information. This ensures that we can decode errors without relying on system software to correlate the bank to the type. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1472680624-34221-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-