- 17 2月, 2015 1 次提交
-
-
由 Daniel J Blueman 提交于
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: NDaniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com Cc: stable@vger.kernel.org [ Boris: massage commit message ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 05 11月, 2014 1 次提交
-
-
由 Tomasz Pala 提交于
By popular demand, enable amd64_edac on 32-bit too. Boris: - update Kconfig text. - add a warning on load which states that 32-bit configurations are unsupported. Signed-off-by: NTomasz Pala <gotar@polanet.pl> Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.plSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 30 10月, 2014 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
This patch adds support for ECC error decoding for F15h M60h processor. Aside from the usual changes, the patch adds support for some new features in the processor: - DDR4(unbuffered, registered); LRDIMM DDR3 support - relevant debug messages have been modified/added to report these memory types - new dbam_to_cs mappers - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find cs_size. This multiplier value is obtained from the per-dimm DCSM register. So, change the interface to accept a 'cs_mask_nr' value to facilitate this calculation - switch-casing determine_memory_type() - done to cleanse the function of too many if-else statements and improve readability - This is now called early in read_mc_regs() to cache dram_type Misc cleanup: - amd64_pci_table[] is condensed by using PCI_VDEVICE macro. Testing details: Tested the patch by injecting 'ECC' type errors using mce_amd_inj and error decoding works fine. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: determine_memory_type() cleanups ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 23 9月, 2014 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
Rationale behind this change: - F2x1xx addresses were stopped from being mapped explicitly to DCT1 from F15h (OR) onwards. They use _dct[0:1] mechanism to access the registers. So we should move away from using address ranges to select DCT for these families. - On newer processors, the address ranges used to indicate DCT1 (0x140, 0x1a0) have different meanings than what is assumed currently. Changes introduced: - amd64_read_dct_pci_cfg() now takes in dct value and uses it for 'selecting the dct' - Update usage of the function. Keep in mind that different families have specific handling requirements - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different from amd64_read_pci_cfg() - Move the k8 specific check to amd64_read_pci_cfg - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg() - Remove now needless .read_dct_pci_cfg Testing: - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG' and mce_amd_inj - driver obtains info from F2x registers and caches it in pvt structures correctly - ECC decoding works fine Signed-off-by: NAravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 28 2月, 2014 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC turned on using mce_amd_inj module and the patch works fine. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.comTested-by: NArindam Nath <Arindam.Nath@amd.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 2月, 2014 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
Update current channel selection logic to include F15h, M30h memory controllers. Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low) (Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf) Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 16 12月, 2013 3 次提交
-
-
由 Borislav Petkov 提交于
No need for the namespace tagging there. Cleanup setup_pci_device while at it. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Drop wrapper function and prefixes. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Rashika Kheria 提交于
This patch marks the function amd64_decode_bus_error() as static because it is not used outside of amd64_edac.c. It also eliminates the following warning: drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes] Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 06 12月, 2013 2 次提交
-
-
由 Jingoo Han 提交于
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: NJingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Aravind Gopalakrishnan 提交于
The value returned from 'f15_m30h_determine_channel' will always be 0x3 max. The condition (channel > 4 || channel < 0) works as hardware never returns a value of 4, but it leads to static checker analysis errors like http://marc.info/?l=linux-edac&m=138607615131951&w=2. Fix that. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain [ Boris: massage commit message a bit. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 22 10月, 2013 1 次提交
-
-
由 Chen, Gong 提交于
GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 27 8月, 2013 2 次提交
-
-
由 Aravind Gopalakrishnan 提交于
dct_base and dct_limit obtain 32 bit register values when they read their respective pci config space registers. A left shift beyond 32 bits will cause them to wrap around. Similar case for chan_addr as can be seen from the bug report (link below). In the patch, we rectify this by casting chan_addr to u64 and by comparing dct_base and dct_limit against properly shifted sys_addr in order to compare the correct bits. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountainSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later. Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 8月, 2013 2 次提交
-
-
由 Borislav Petkov 提交于
Now that we cache (family, model, stepping) locally, use them instead of boot_cpu_data. No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Aravind Gopalakrishnan 提交于
On newer models, support has been included for upto 4 DCT's, however, only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10). Also, the routing DRAM Requests algorithm is different for F15h M30h. Thus it is cleaner to use a brand new function rather than adding quirks to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM Routing Requests" in the BKDG for further info. Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and verified to be functionally correct. While at it, verify if erratum workarounds for E505 and E637 still hold. From email conversations within AMD, the current status of the errata is: * Erratum 505: fixed in model 0x1, stepping 0x1 and later. * Erratum 637: not fixed. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Cleanups, corrections ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 29 7月, 2013 1 次提交
-
-
由 Borislav Petkov 提交于
It can happen that configurations are running in a single-channel mode even with a dual-channel memory controller, by, say, putting the DIMMs only on the one channel and leaving the other empty. This causes a problem in init_csrows which implicitly assumes that when the second channel is enabled, i.e. channel 1, the struct dimm hierarchy will be present. Which is not. So always allocate two channels unconditionally. This provides for the nice side effect that the data structures are initialized so some day, when memory hotplug is supported, it should just work out of the box when all of a sudden a second channel appears. Reported-and-tested-by: NRoger Leigh <rleigh@debian.org> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 19 4月, 2013 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 16 3月, 2013 2 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Mauro Carvalho Chehab 提交于
We were filling the csrow size with a wrong value. 16a528ee ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 23 1月, 2013 1 次提交
-
-
由 Borislav Petkov 提交于
5e2af0c0 ("edac: Don't initialize csrow's first_page & friends when not needed") removed useless initialization of variables but left in the functions which did that. They're unused now so drop them. Signed-off-by: NBorislav Petkov <bp@alien8.de>
-
- 10 1月, 2013 4 次提交
-
-
由 Daniel J Blueman 提交于
Use appropriate types for northbridge IDs and memory ranges. Mark immutable data const and keep within compilation unit on related structures. Signed-off-by: NDaniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com [Boris: Drop arg change to node_to_amd_nb] Signed-off-by: NBorislav Petkov <bp@alien8.de>
-
由 Daniel J Blueman 提交于
Fix locating sibling memory controller PCI functions by using the correct PCI domain and use a northbridge descriptor only if found. We need to at least warn if it wasn't found so that it gets fixed and we don't go off with wrong results. Signed-off-by: NDaniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com [Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails] Signed-off-by: NBorislav Petkov <bp@alien8.de>
-
由 Daniel J Blueman 提交于
Change amd_get_nb_id to return u16 to support >255 memory controllers, and related consistency fixes. Signed-off-by: NDaniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.comSigned-off-by: NBorislav Petkov <bp@alien8.de>
-
由 Daniel J Blueman 提交于
Fix get_node_id to match northbridge IDs from the array of detected ones, allowing multi-server support such as with Numascale's NumaConnect, renaming to 'amd_get_node_id' for consistency. Signed-off-by: NDaniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com [Boris: shorten lines to fit 80 cols] Signed-off-by: NBorislav Petkov <bp@alien8.de>
-
- 04 1月, 2013 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 11月, 2012 9 次提交
-
-
由 Borislav Petkov 提交于
On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
The first flag is ->csbased and will be used in common EDAC code later. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
Make sure code pays attention to K8 having only one DCT, reformat and cleanup code, correct debug messages, remove unused code. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
Instead of open-coding it, use the DBAM_DIMM macro in amd64_csrow_nr_pages() which we have already. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
This basically reverts 603adaf6 ("amd64_edac: fix K8 chip select reporting") because it was a clumsy workaround for DIMM sizes reporting on K8 which got superceded by a much more correct one with 41d8bfab ("amd64_edac: Improve DRAM address mapping") without removing the prior one. Remove it now finally. Reported-by: NJosh Hunt <johunt@akamai.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
Rewrite CE/UE paths so that they use the same code and drop additional code duplication in handle_ue. Add a struct err_info which collects required info for the error reporting. This, in turn, helps slimming all edac_mc_handle_error() calls down to one. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
All families report a valid error address when encountering a DRAM ECC error so no need to check it. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine does this by injecting the error in the next non-cached access. This takes relatively long time on a normal system so that in order for us to expedite it, we disable the caches around the injection. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
由 Borislav Petkov 提交于
amd64_get_dram_hole_info: remove local variable 'base'. sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize constants formatting. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 24 10月, 2012 1 次提交
-
-
由 Andrew Morton 提交于
If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element. As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element. Boris: this code is fragile anyway, see here why: http://marc.info/?l=linux-kernel&m=135102834131236&w=2 It will be rewritten more robustly soonish. Reported-by: NDenis Kirjanov <kirjanov@gmail.com> Cc: stable@vger.kernel.org Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
-
- 12 6月, 2012 4 次提交
-
-
由 Mauro Carvalho Chehab 提交于
In order to avoid loosing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
由 Mauro Carvalho Chehab 提交于
Remove the arch-dependent parameter, as it were not used, as the MCE tracepoint weren't implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this will cost more bytes at the tracepoint, and tracepoint is not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g; print $file; Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
由 Mauro Carvalho Chehab 提交于
The EDAC driver name doesn't help to handle EDAC errors. So, remove it from the EDAC error messages, preserving only the error_message. Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
由 Joe Perches 提交于
Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-