- 25 5月, 2017 5 次提交
-
-
由 Borislav Petkov 提交于
... to slim down get_dimm_config(). No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
It is called "sb_edac.c" now. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Qiuxu Zhuo 提交于
Tony pointed out: "currently the driver pretends there is one big 8-channel memory controller per socket instead of 2 4-channel controllers. This is fine with all memory controller populated with symmetrical DIMM configurations, but runs into difficulties on asymmetrical setups". Restructure the driver to assign an EDAC memory controller to each real h/w memory controller to resolve the issue. Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com [ Break some lines at convenient points. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Tony Luck 提交于
EDAC assigns logical memory controller numbers in the order that we find memory controllers, which depends on which PCI bus they are on. Some systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on socket3. All this is made more confusing for users because we use the string "Socket" while generating names for memory controllers, but the number that we attach there is the memory controller number. E.g. EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT) Change the names to say "SrcID#%d" (where the number we use is read from the h/w associated with the memory controller instead of some logical number internal to the EDAC driver). New message: EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT) Reported-by: NAndrey Korolyov <andrey@xdel.ru> Reported-by: NPatrick Geary <patrickg@supermicro.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Qiuxu Zhuo 提交于
Each of the PCI device IDs belongs to a CPU socket, or to one of the integrated memory controllers. Provide an enum to specify the domain of each, and distinguish the resource number in each domain: the number of the PCI device IDs per integrated memory controller/socket, and the number of integrated memory controllers per socket. Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com [ Realign pci_dev_descr_knl members. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 24 5月, 2017 1 次提交
-
-
由 Tobias Klauser 提交于
struct irq_domain_ops is not modified, so it can be made const. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Cc: Thor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170524133505.1233-1-tklauser@distanz.chSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 03 5月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
The wrong index into the csbases/csmasks arrays was being passed to the function to compute the chip select sizes, which resulted in the wrong size being computed. Address that so that the correct values are computed and printed. Also, redo how we calculate the number of pages in a CS row. Reported-by: NBenjamin Bennett <benbennett@gmail.com> Signed-off-by: NYazen Ghannam <yazen.ghannam@amd.com> Cc: <stable@vger.kernel.org> # 4.10.x Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493313114-11260-1-git-send-email-Yazen.Ghannam@amd.com [ Remove unneeded integer math comment, minor cleanups. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 27 4月, 2017 1 次提交
-
-
由 Borislav Petkov 提交于
Leave it to the user to decide whether to enable this or not. Otherwise, platform-specific drivers won't initialize (currently, EDAC supports only a single platform driver loaded). Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 10 4月, 2017 10 次提交
-
-
由 Borislav Petkov 提交于
Change them to have the edac_ prefix. No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Move the remaining functionality to edac_mc.c. Convert "edac_report=" to a module parameter. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Remove the old URLs. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Move all the EDAC core functionality behind CONFIG_EDAC and get rid of that indirection. Update defconfigs which had it. While at it, fix dependencies such that EDAC depends on RAS for the tracepoints. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-edac@vger.kernel.org
-
由 Borislav Petkov 提交于
... and this happens only when CONFIG_RAS is enabled. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
... as part of moving stuff away from edac_stub.c Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
... and the glue around it. It is not needed anymore. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Use mc_devices list instead to check whether we have EDAC driver instances successfully registered with EDAC core. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Apparently, some machines used to report DRAM errors through a PCI SERR NMI. This is why we have a call into EDAC in the NMI handler. See c0d12172 ("drivers/edac: add new nmi rescan"). From looking at the patch above, that's two drivers: e752x_edac.c and e7xxx_edac.c. Now, I wanna say those are old machines which are probably decommissioned already. Tony says that "[t]the newest CPU supported by either of those drivers is the Xeon E7520 (a.k.a. "Nehalem") released in Q1'2010. Possibly some folks are still using these ... but people that hold onto h/w for 7 years generally cling to old s/w too ... so I'd guess it unlikely that we will get complaints for breaking these in upstream." So even if there is a small number still in use, we did load EDAC with edac_op_state == EDAC_OPSTATE_POLL by default (we still do, in fact) which means a default EDAC setup without any parameters supplied on the command line or otherwise would never even log the error in the NMI handler because we're polling by default: inline int edac_handler_set(void) { if (edac_op_state == EDAC_OPSTATE_POLL) return 0; return atomic_read(&edac_handlers); } So, long story short, I'd like to get rid of that nastiness called edac_stub.c and confine all the EDAC drivers solely to drivers/edac/. If we ever have to do stuff like that again, it should be notifiers we're using and not some insanity like this one. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com>
-
由 Borislav Petkov 提交于
... like the rest of the file. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 4月, 2017 2 次提交
-
-
由 Sergey Temerkhanov 提交于
Remove unused code reserved for upcoming CPUs. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NSergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170406113834.17153-1-s.temerkhanov@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Sergey Temerkhanov 提交于
Shift the node number by 3 bits instead of 8 allowing proper functioning with default EDAC_MAX_MCS. Signed-off-by: NSergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David Daney <david.daney@cavium.com> Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170406113755.17082-1-s.temerkhanov@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 06 4月, 2017 1 次提交
-
-
由 Thor Thayer 提交于
The peripherals' RAS functionality only exist on the Arria10 SoCFPGA. The Cyclone5 initialization generates EDAC warnings when the peripherals aren't found in the device tree. Fix by checking for Arria10 in the init functions. Signed-off-by: NThor Thayer <thor.thayer@linux.intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1491415262-5018-1-git-send-email-thor.thayer@linux.intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 05 4月, 2017 1 次提交
-
-
由 Jan Glauber 提交于
Fix a typo that disabled the MCI interrupts using the wrong bitmask. Signed-off-by: NJan Glauber <jglauber@cavium.com> Cc: David Daney <david.daney@cavium.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Sergey Temerkhanov <s.temerkhanov@gmail.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170405102739.6301-1-jglauber@cavium.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 27 3月, 2017 1 次提交
-
-
由 Sergey Temerkhanov 提交于
Add support for Cavium ThunderX EDAC capable on-chip peripherals, namely the DRAM controller (LMC), cache coherent processor interconnect (CCPI) and level 2 cache blocks (L2C-TAD, L2C-MCI, L2C-CBC) Signed-off-by: NSergey Temerkhanov <s.temerkhanov@gmail.com> Cc: David.Daney@cavium.com Cc: Jan.Glauber@cavium.com Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170324222837.60583-1-s.temerkhanov@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 26 3月, 2017 1 次提交
-
-
由 Qiuxu Zhuo 提交于
DIMM number passed to edac_mc_handle_error() was accidentally hardcoded to zero. Pass in the correct daddr->dimm value. Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 23 3月, 2017 2 次提交
-
-
由 Borislav Petkov 提交于
Provide debugfs function stubs when EDAC_DEBUG is not enabled so that we don't fail the build: drivers/edac/pnd2_edac.c: In function ‘pnd2_init’: drivers/edac/pnd2_edac.c:1521:2: error: implicit declaration of function ‘setup_pnd2_debug’ [-Werror=implicit-function-declaration] setup_pnd2_debug(); ^ drivers/edac/pnd2_edac.c: In function ‘pnd2_exit’: drivers/edac/pnd2_edac.c:1529:2: error: implicit declaration of function ‘teardown_pnd2_debug’ [-Werror=implicit-function-declaration] teardown_pnd2_debug(); ^ Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
The debugfs.c functionality relies on DEBUG_FS so select it. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 16 3月, 2017 1 次提交
-
-
由 Tony Luck 提交于
Initial target for this driver is the Intel Apollo Lake platform and Denverton micro-server, they use the same internal memory controller IP called Pondicherry2. Memory controller registers are not in PCI config space like earlier Intel memory controllers. For Apollo Lake platform they are accessed via a "side-band" interface, for Denverton micro-server they are access via PCI config space and memory map I/O. This driver is for Apollo Lake and Denverton, but only the Denverton is fully enabled while we wait for the sideband driver. Apollo lake driver and initial cut at Denverton driver by Tony Luck. Extensive cleanup, refactoring and basic verification by Qiuxu Zhuo. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170308174539.14432-1-qiuxu.zhuo@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 09 3月, 2017 1 次提交
-
-
由 Jérémy Lefaure 提交于
The MTR_DRAM_WIDTH macro returns the data width. It is sometimes used as if it returned a boolean true if the width if 8. Fix the tests where MTR_DRAM_WIDTH is misused. Signed-off-by: NJérémy Lefaure <jeremy.lefaure@lse.epita.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170309011809.8340-1-jeremy.lefaure@lse.epita.frSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 3月, 2017 1 次提交
-
-
由 Colin Ian King 提交于
Fix spelling mistake in dev_err message. Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NLoc Ho <lho@apm.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170223002609.9440-1-colin.king@canonical.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 16 2月, 2017 1 次提交
-
-
由 Yazen Ghannam 提交于
Currently, the IPID and Syndrome are printed on the same line as the Address. There are cases when we can have a valid Syndrome but not a valid Address. For example, the MCA_SYND register can be used to hold more detailed error info that the hardware folks can use. It's not just DRAM ECC syndromes. There are some error types that aren't related to memory that may have valid syndromes, like some errors related to links in the Data Fabric, etc. In these cases, the IPID and Syndrome are not printed at the same log level as the rest of the stanza, so users won't see them on the console. Console: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Dmesg: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Print the IPID first and on a new line. The IPID should always be printed on SMCA systems. The Syndrome will then be printed with the IPID and at the same log level when valid: [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000 [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2 Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 14 2月, 2017 1 次提交
-
-
由 Borislav Petkov 提交于
Last time we did that was when we enabled Bulldozer. Now, we enabled Zen so it is only natural ... :-) Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
-
- 10 2月, 2017 1 次提交
-
-
由 Wei Yongjun 提交于
Fix the following sparse warnings: drivers/edac/fsl_ddr_edac.c:148:1: warning: symbol 'dev_attr_inject_data_hi' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:150:1: warning: symbol 'dev_attr_inject_data_lo' was not declared. Should it be static? drivers/edac/fsl_ddr_edac.c:152:1: warning: symbol 'dev_attr_inject_ctrl' was not declared. Should it be static? Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170209150424.15124-1-weiyj.lk@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 03 2月, 2017 1 次提交
-
-
由 Chris Packham 提交于
The L2 cache controller on the T2080 SoC has similar capabilities to the others already supported by the mpc85xx_edac driver. Add it to the list of compatible devices. Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz> Acked-by: NJohannes Thumshirn <jth@kernel.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: devicetree@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170201231624.28843-1-chris.packham@alliedtelesis.co.nzSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 28 1月, 2017 7 次提交
-
-
由 Yazen Ghannam 提交于
Match one of the devices in amd64_cpuids[] before loading the module. This is an additional sanity check against users trying to load amd64_edac_mod on unsupported systems. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-9-git-send-email-Yazen.Ghannam@amd.com [ Get rid of err_ret label, make it a bit more readable this way. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
Having ECC disabled on a node doesn't necessarily mean that it's disabled for the entire system. So let's return a non-failing code when ECC is disabled on a node. This way we can skip initialization for the node but still continue with the remaining nodes. After probing all instances, make sure we have at least one MC device allocated. This issue is seen and fix tested on Fam15h and Fam17h MCM systems. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-8-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
We need to know if any MC devices have been allocated. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-7-git-send-email-Yazen.Ghannam@amd.com [ Prettify text. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
amd64_{debug,notice} don't have any users, so remove them. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-6-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
Print the node number when informing that DRAM ECC is disabled so that we can show which nodes have DRAM ECC disabled. Also, print more detailed system information as edac_dbg(), so as to not bother general users. Switch amd64_notice to amd64_info to match the message above it. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485537863-2707-5-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
We have a few functions that register/unregister an ECC error decoding routine. These functions are called when we init/remove instances. However, they are global and so don't need to be registered/unregistered multiple times. So move them out of the init/remove instance functions and into the module init/exit routines. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-4-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Yazen Ghannam 提交于
Jump to memory freeing routines when init_one_instance() fails. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1485297149-13733-3-git-send-email-Yazen.Ghannam@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-