- 20 Nov, 2020 4 commits
-
-
Committed by Qiuxu Zhuo
Add a new entry to 'enum mem_type' and a new string to 'edac_mem_types[]' for the new DDR5 memory type.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Committed by Qiuxu Zhuo
Instead of raw access, use readl() to access the MMIO registers of the memory controller to avoid possible compiler re-ordering.

Fixes: d4dc89d0 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Cc: <stable@vger.kernel.org>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Committed by Qiuxu Zhuo
Add debugfs support to fake memory correctable errors to test the error reporting path and the error address decoding logic in the igen6_edac driver. Please note that the fake errors are also reported to the EDAC core, so the CE counter in EDAC sysfs is increased as well.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
Committed by Qiuxu Zhuo
This driver supports Intel client SoCs with an integrated memory controller using In-Band ECC (IBECC). The memory correctable and uncorrectable errors are reported via NMIs. The driver handles the NMIs and decodes the memory error address to a platform-specific address. The first IBECC-supported SoC is Elkhart Lake.

[Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build]

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- 06 Nov, 2020 1 commit
-
-
Committed by Qiuxu Zhuo
There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- 10 Oct, 2020 1 commit
-
-
Committed by Yazen Ghannam
AMD Family 19h Models 20h-2Fh use the same PCI IDs as Family 17h Models 70h-7Fh. The same family ops and number of channels also apply. Use the Family17h Model 70h family_type and ops for Family 19h Models 20h-2Fh. Update the controller name to match the system.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201009171803.3214354-1-Yazen.Ghannam@amd.com
-
- 18 Sep, 2020 2 commits
-
-
Committed by Xiongfeng Wang
Reading those sysfs entries gives:

  [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/max_location
  memory 3 [root@localhost /]# cat /sys/devices/system/edac/mc/mc0/dimm0/dimm_location
  memory 0 [root@localhost /]#

Add a newline after the printed value for better readability.

[ bp: Make len a signed int and change the check to catch wraparound. Increment the pointer p only when the length check passes. Use scnprintf(). ]

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1600051734-8993-1-git-send-email-wangxiongfeng2@huawei.com
-
Committed by Liu Shixin
Use module_platform_driver(), which makes the code simpler by eliminating boilerplate code.

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Link: https://lkml.kernel.org/r/20200914065358.3726216-1-liushixin2@huawei.com
-
- 17 Sep, 2020 2 commits
-
-
Committed by Alex Kluver
Updates to the UEFI 2.8 Memory Error Record allow splitting the bank field into bank address and bank group, and using the last 3 bits of the extended field as a chip identifier. When needed, print the correct version of the bank field, bank group, and chip identification.

Based on UEFI 2.8 Table 299. Memory Error Record.

Signed-off-by: Alex Kluver <alex.kluver@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200819143544.155096-3-alex.kluver@hpe.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
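The field split described in the commit above can be sketched roughly as follows, in Python for brevity. The bit positions are assumptions based on one reading of the UEFI 2.8 Memory Error Record (bank address in the low byte and bank group in the high byte of the 16-bit bank field, chip identifier in the top 3 bits of an 8-bit extended field). The function and constant names are hypothetical and are not the kernel's identifiers.

```python
# Assumed layout, not taken from the commit itself:
BANK_ADDRESS_MASK = 0xFF  # low byte of the 16-bit bank field
BANK_GROUP_SHIFT = 8      # high byte of the 16-bit bank field
CHIP_ID_SHIFT = 5         # top 3 bits of an 8-bit extended field


def split_bank(bank):
    """Return (bank_address, bank_group) from a raw 16-bit bank field."""
    address = bank & BANK_ADDRESS_MASK
    group = (bank >> BANK_GROUP_SHIFT) & 0xFF
    return address, group


def chip_id(extended):
    """Return the 3-bit chip identifier from the extended field."""
    return (extended & 0xFF) >> CHIP_ID_SHIFT
```

A caller would only use the split form when the record marks the corresponding fields as valid, which is what "when needed" in the commit message refers to.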
-
Committed by Alex Kluver
Memory errors could be printed with incorrect row values since the DIMM size has outgrown the 16-bit row field in the CPER structure. UEFI Specification Version 2.8 has increased the size of row by allowing it to use the first 2 bits from a previously reserved space within the structure. When needed, add the extension bits to the row value printed.

Based on UEFI 2.8 Table 299. Memory Error Record.

Signed-off-by: Alex Kluver <alex.kluver@hpe.com>
Tested-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200819143544.155096-2-alex.kluver@hpe.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
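As an illustration of the row extension described in the commit above, here is a minimal Python sketch. It assumes the two extension bits are bits 0 and 1 of the extended field and that they become row bits 16 and 17. The names are hypothetical, not the kernel's.

```python
EXT_ROW_MASK = 0x3   # two extension bits (assumed to be bits 0-1)
EXT_ROW_SHIFT = 16   # placed directly above the 16-bit row field


def full_row(row, extended):
    """Combine the 16-bit row field with its two extension bits."""
    return (row & 0xFFFF) | ((extended & EXT_ROW_MASK) << EXT_ROW_SHIFT)
```

With this layout, a row field of 0xFFFF and extension bits 0b01 yield row 0x1FFFF, a value that could not be represented in the original 16-bit field.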
-
- 15 Sep, 2020 2 commits
-
-
Committed by Borislav Petkov
With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe, unregister and probe again a driver.

When ghes_edac is attempted to be loaded on a system which is not on the safe platforms list, ghes_edac_register() would return early. The unregister counterpart ghes_edac_unregister() would still attempt to unregister and exit early at the refcount test, leading to the refcount underflow below.

In order to not do *anything* on the unregister path too, reuse the force_load parameter and check it on that path too, before fumbling with the refcount.

  ghes_edac: ghes_edac_register: entry
  ghes_edac: ghes_edac_register: return -ENODEV
  ------------[ cut here ]------------
  refcount_t: underflow; use-after-free.
  WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100
  Modules linked in:
  CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:refcount_warn_saturate+0xb9/0x100
  Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c
  RSP: 0018:ffffc90000037d58 EFLAGS: 00010292
  RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff
  RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001
  R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000
  R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
  FS: 0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0
  Call Trace:
   ghes_edac_unregister
   ghes_remove
   platform_drv_remove
   really_probe
   driver_probe_device
   device_driver_attach
   __driver_attach
   ? device_driver_attach
   ? device_driver_attach
   bus_for_each_dev
   bus_add_driver
   driver_register
   ? bert_init
   ghes_init
   do_one_initcall
   ? rcu_read_lock_sched_held
   kernel_init_freeable
   ? rest_init
   kernel_init
   ret_from_fork
  ...
  ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200911164950.GB19320@zn.tnic
-
Committed by Borislav Petkov
Commit b972fdba ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") didn't clear all the information from the scanned system and, more specifically, left ghes_hw.num_dimms at its previous value.

On a second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the leftover num_dimms value, which is not 0, and thus the 0 check in enumerate_dimms() would get bypassed and it would go directly to the pointer deref:

  d = &hw->dimms[hw->num_dimms];

which is, of course, NULL:

  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:enumerate_dimms.cold+0x7b/0x375

Reset the whole ghes_hw on driver unregister so that no stale values are used on a second system scan.

Fixes: b972fdba ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")
Cc: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200911164817.GA19320@zn.tnic
-
- 09 Sep, 2020 1 commit
-
-
Committed by Tom Rix
The clang static analyzer reports this problem:

  sb_edac.c:959:2: warning: Undefined or garbage value returned to caller
    return type;
    ^~~~~~~~~~~

This is a false positive. However, by initializing the type to DEV_UNKNOWN, the 3 case can be removed from the switch, saving a comparison and a jump.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200907153225.7294-1-trix@redhat.com
-
- 02 Sep, 2020 2 commits
-
-
Committed by Krzysztof Kozlowski
platform_get_irq() returns a negative error number on error. In such a case, a comparison to 0 would pass the check. Therefore, check the return value properly, i.e. whether it is negative.

[ bp: Massage commit message. ]

Fixes: 86a18ee2 ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tero Kristo <t-kristo@ti.com>
Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org
-
Committed by Krzysztof Kozlowski
platform_get_irq() returns a negative error number on error. In such a case, a comparison to 0 would pass the check. Therefore, check the return value properly, i.e. whether it is negative.

[ bp: Massage commit message. ]

Fixes: 9b7e6242 ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net>
Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org
-
- 01 Sep, 2020 1 commit
-
-
Committed by Dinghao Liu
When pci_get_device_func() fails, the driver doesn't need to execute pci_dev_put(). mci should still be freed, though, to prevent a memory leak. When pci_enable_device() fails, the error injection PCI device "einj" doesn't need to be disabled either.

[ bp: Massage commit message, rename label to "bail_mc_free". ]

Fixes: 52608ba2 ("i5100_edac: probe for device 19 function 0")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn
-
- 28 Aug, 2020 1 commit
-
-
Committed by Shiju Jose
After b9cae277 ("EDAC/ghes: Scan the system once on driver init") and with CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled, ghes_hw.dimms becomes a NULL pointer after the second ->probe() (aka ghes_edac_register()) which the config option causes to be called.

This happens because the static variable which records whether the system has been scanned already doesn't get reset in ghes_edac_unregister(). Then, on the second probe, ghes_scan_system() doesn't get to enumerate the DIMMs, leading to ghes_hw.dimms remaining NULL.

Clear the variable and rename it to something more descriptive so that a second probe succeeds.

[ bp: Rewrite commit message. ]

Fixes: b9cae277 ("EDAC/ghes: Scan the system once on driver init")
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200827140450.1620-1-shiju.jose@huawei.com
-
- 24 Aug, 2020 1 commit
-
-
Committed by Gustavo A. R. Silva
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings where applicable.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
-
- 20 Aug, 2020 1 commit
-
-
Committed by Yazen Ghannam
The Extended Error Code Bitmap (xec_bitmap) for a Scalable MCA bank type was intended to be used by the kernel to filter out invalid error codes on a system. However, this is unnecessary after a few product releases because the hardware will only report valid error codes. Thus, there's no need for it with future systems.

Remove the xec_bitmap field and all references to it.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200720145353.43924-1-Yazen.Ghannam@amd.com
-
- 18 Aug, 2020 1 commit
-
-
Committed by Tony Luck
IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto the stack as part of machine check delivery is valid or not.

Various drivers copied a code fragment that uses the RIPV bit to determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where RIPV is set as "FATAL").

Reverse the tests so that the error is marked fatal when RIPV is not set.

Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com
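The corrected test from the commit above can be modelled in a few lines of Python. This is only a sketch of the decision for an uncorrected error; it assumes RIPV is bit 0 of IA32_MCG_STATUS, and the function name is hypothetical. The severity names are the ones quoted in the commit message.

```python
MCG_STATUS_RIPV = 1 << 0  # assumed: RIPV is bit 0 of IA32_MCG_STATUS


def uncorrected_severity(mcgstatus):
    """Classify an uncorrected error using the corrected RIPV test."""
    if mcgstatus & MCG_STATUS_RIPV:
        # A valid return RIP means execution can resume.
        return "HW_EVENT_ERR_UNCORRECTED"
    # No valid return RIP: the error is fatal.
    return "HW_EVENT_ERR_FATAL"
```

The reversed check that the commit fixes would have returned the two strings the other way round.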
-
- 17 Aug, 2020 4 commits
-
-
Committed by Wei Yongjun
Symbol 'lmc_dfs_ents' is not used outside of thunderx_edac.c, so make it static:

  drivers/edac/thunderx_edac.c:457:22: warning: symbol 'lmc_dfs_ents' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Robert Richter <rrichter@marvell.com>
Link: https://lkml.kernel.org/r/20200714142308.46612-1-weiyongjun1@huawei.com
-
Committed by Talel Shenhar
The Amazon Annapurna Labs Memory Controller supports ECC for error detection and correction (single-bit error correction, double-bit error detection). This commit introduces an EDAC driver for that capability.

[ bp: Remove "EDAC" string from Kconfig tristate as it is redundant. ]

Signed-off-by: Talel Shenhar <talel@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lkml.kernel.org/r/20200816185551.19108-3-talel@amazon.com
-
Committed by Yazen Ghannam
A few existing MCA bank types will have new error types in future SMCA systems. Add the descriptions for the new error types.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200708153515.1911642-1-Yazen.Ghannam@amd.com
-
Committed by Alexander A. Klimov
Rationale: Reduces attack surface on kernel devs opening the links for MITM, as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

[ bp: Merge all EDAC patches into a single one. ]

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tero Kristo <t-kristo@ti.com> # ti_edac
Link: https://lkml.kernel.org/r/20200708113546.14135-1-grandmaster@al2klimov.de
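The link-selection part of the algorithm in the commit above can be sketched in Python using the three regular expressions quoted in the message. This covers only the per-line filtering. The .svg file check and the network step (comparing the HTTP and HTTPS responses) are left out, and the function name is hypothetical.

```python
import re

# Patterns copied from the commit message.
LINK_RE = re.compile(r"\bhttp://[^# \t\r\n]*(?:\w|/)")
XMLNS_RE = re.compile(r"\bxmlns\b")
EXCLUDED = (
    re.compile(r"\bgnu\.org/license"),
    re.compile(r"\bmozilla\.org/MPL\b"),
)


def candidate_links(line):
    """Return the HTTP links on a line that qualify for an HTTPS check."""
    if XMLNS_RE.search(line):
        return []  # lines mentioning xmlns are skipped entirely
    return [
        link
        for link in LINK_RE.findall(line)
        if not any(pattern.search(link) for pattern in EXCLUDED)
    ]
```

Note that the link pattern requires the match to end in a word character or a slash, so trailing punctuation such as a sentence-final period is not treated as part of the URL.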
-
- 11 Aug, 2020 1 commit
-
-
Committed by Jason Baron
The Intel uncore driver may claim some of the PCI IDs from ie31200, which means that the ie31200 EDAC driver will not initialize them as part of pci_register_driver().

Let's add a fallback for this case to 'pci_get_device()' to get a reference on the device such that it can still be configured. This is similar in approach to other EDAC drivers.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/1594923911-10885-1-git-send-email-jbaron@akamai.com
-
- 23 Jun, 2020 1 commit
-
-
Committed by Smita Koralahalli
Print the Protected Processor Identification Number (PPIN) on processors which support it.

[ bp: Massage. ]

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200623130059.8870-1-Smita.KoralahalliChannabasappa@amd.com
-
- 19 Jun, 2020 1 commit
-
-
Committed by Borislav Petkov
Commit da92110d ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h") added support for F15h, model 0x60 CPUs but in doing so, missed reading back the SCRCTRL PCI config register on F15h CPUs which are *not* model 0x60.

Add that read so that doing

  $ cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate

can show the previously set DRAM scrub rate.

Fixes: da92110d ("EDAC, amd64_edac: Extend scrub rate support to F15hM60h")
Reported-by: Anders Andersson <pipatron@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> #v4.4..
Link: https://lkml.kernel.org/r/CAKkunMbNWppx_i6xSdDHLseA2QQmGJqj_crY=NF-GZML5np4Vw@mail.gmail.com
-
- 17 Jun, 2020 2 commits
-
-
Committed by Qiushi Wu
When kobject_init_and_add() returns an error, it should be handled because kobject_init_and_add() takes a reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object.

Therefore, replace the kfree() call with kobject_put() and add a missing kobject_put() in the edac_device_register_sysfs_main_kobj() error path.

[ bp: Massage and merge into a single patch. ]

Fixes: b2ed215a ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
-
Committed by Borislav Petkov
Change the hardware scanning and figuring out how many DIMMs a machine has to a single, one-time thing which happens once on driver init. After that scanning completes, struct ghes_hw_desc contains a representation of the hardware which the driver can then use for later initialization.

Then, copy the DIMM information into the respective EDAC core representation of those. Get rid of ghes_edac_dimm_fill and use a struct dimm_info array directly.

This way, hw detection and further driver initialization is nicely and logically split.

Further additions should all be added to ghes_scan_system() and the hw representation extended as needed.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 16 Jun, 2020 5 commits
-
-
Committed by Robert Richter
The struct members list and ghes of struct ghes_edac_pvt are unused; remove them. On that occasion, rename it to the shorter name struct ghes_pvt.

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200519104443.15673-2-rrichter@marvell.com
-
Committed by Robert Richter
The ghes driver reports errors with 'unknown label' even if the actual DIMM label is known, e.g.:

  EDAC MC0: 1 CE Single-bit ECC on unknown label (node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x966a9b3 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:0 col:13 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory)

Fix this by using struct dimm_info's label string in error reports:

  EDAC MC0: 1 CE Single-bit ECC on N0 DIMM_A0 (node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x99223d8 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:515 col:14 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory)

The labels are initialized by reading the bank and device strings from DMI. Now, the label information can also be read from sysfs. E.g. a ThunderX2 system will show the following:

  /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
  /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
  /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
  /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
  /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
  /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
  /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
  /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
  /sys/devices/system/edac/mc/mc0/dimm8/dimm_label:N1 DIMM_I0
  /sys/devices/system/edac/mc/mc0/dimm9/dimm_label:N1 DIMM_J0
  /sys/devices/system/edac/mc/mc0/dimm10/dimm_label:N1 DIMM_K0
  /sys/devices/system/edac/mc/mc0/dimm11/dimm_label:N1 DIMM_L0
  /sys/devices/system/edac/mc/mc0/dimm12/dimm_label:N1 DIMM_M0
  /sys/devices/system/edac/mc/mc0/dimm13/dimm_label:N1 DIMM_N0
  /sys/devices/system/edac/mc/mc0/dimm14/dimm_label:N1 DIMM_O0
  /sys/devices/system/edac/mc/mc0/dimm15/dimm_label:N1 DIMM_P0

Since dimm_labels can be rewritten, that label will be used in a later error report:

  # echo foobar >/sys/devices/system/edac/mc/mc0/dimm0/dimm_label
  # # some error injection here
  # dmesg | grep foobar
  [ 751.383533] EDAC MC0: 1 CE Single-bit ECC on foobar (node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 page:0x8c8dc74 offset:0x0 grain:1 syndrome:0x0 - APEI location: node:0 card:0 module:0 rank:1 bank:259 col:3 bit_pos:16 DIMM location:N0 DIMM_A0 status(0x0000000000000400): Storage error in DRAM memory)

[ bp: Remove curly brackets around a single if-statement in dimm_setup_label(). ]

Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200528101307.23245-1-rrichter@marvell.com
-
Committed by Qiuxu Zhuo
Use the X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS() macro to pass CPU stepping specific configurations to {skx,i10nm}_init(), so that the CPU stepping check can be deleted from i10nm_init().

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200509010822.76331-1-qiuxu.zhuo@intel.com
-
Committed by Zhenzhong Duan
By calling edac_inc_ue_error() before panic, we get a correct UE error count for core dump analysis.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200610065846.3626-2-zhenzhong.duan@gmail.com
-
Committed by Zhenzhong Duan
Avoid giving it MCE_PRIO_LOWEST priority by default.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200610065846.3626-1-zhenzhong.duan@gmail.com
-
- 14 Jun, 2020 1 commit
-
-
Committed by Masahiro Yamada
Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances.

This commit finishes the conversion. While I touched the lines, I also fixed the indentation.

There are a variety of indentation styles found.

  a) 4 spaces + '---help---'
  b) 7 spaces + '---help---'
  c) 8 spaces + '---help---'
  d) 1 space + 1 tab + '---help---'
  e) 1 tab + '---help---' (correct indentation)
  f) 1 tab + 1 space + '---help---'
  g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the following command:

  $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
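To see what the sed substitution in the commit above does to each indentation style, here is a small Python equivalent. It is an approximation: Python's `\s` stands in for the POSIX `[[:space:]]` class, and the function name is hypothetical.

```python
import re

# Approximates sed 's/^[[:space:]]*---help---/\thelp/' for one line.
HELP_RE = re.compile(r"^\s*---help---")


def normalize_help(line):
    """Rewrite any indented '---help---' marker as one tab plus 'help'."""
    return HELP_RE.sub("\thelp", line)
```

All of the styles a) through g) collapse to a single tab followed by `help`, while lines that already use the plain `help` keyword are left unchanged.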
-
- 29 May, 2020 1 commit
-
-
Committed by Colin Ian King
The variable ret is being assigned a value that is never read, and it is updated later with a new value. The initialization is redundant, so remove it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200429154847.287001-1-colin.king@canonical.com
-
- 23 May, 2020 1 commit
-
-
Committed by Alexander Monakov
Add support for AMD Renoir (4000-series Ryzen CPUs).

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lkml.kernel.org/r/20200510204842.2603-4-amonakov@ispras.ru
-
- 20 May, 2020 1 commit
-
-
Committed by Qiuxu Zhuo
The skx_edac driver wrongly uses the mtr register to retrieve the two fields close_pg and bank_xor_enable. Fix it by using the correct mcmtr register to get the two fields.

Cc: <stable@vger.kernel.org>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-by: Matthew Riley <mattdr@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200515210146.1337-1-tony.luck@intel.com
-
- 28 Apr, 2020 2 commits
-
-
Committed by Qiuxu Zhuo
The i10nm_edac driver failed to load on Ice Lake and Tremont/Jacobsville servers if their CPU stepping was >= 4, and failed on Ice Lake-D servers from stepping 0. The root cause was that for Ice Lake and Tremont/Jacobsville servers with CPU stepping >= 4, the offset of the bus number configuration register was updated from 0xcc to 0xd0. For Ice Lake-D servers, all the steppings use the updated 0xd0 offset.

Fix the issue by using the appropriate offset for the bus number configuration register according to the CPU model number and stepping.

Reported-by: Jerry Chen <jerry.t.chen@intel.com>
Reported-and-tested-by: Jin Wen <wen.jin@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/linux-edac/20200427084022.GC11036@zn.tnic
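The offset selection rule stated in the commit above can be written out as a small Python function. The model labels are hypothetical strings chosen for this sketch, not the kernel's CPU model constants. Only the offsets 0xcc and 0xd0 and the stepping threshold of 4 come from the commit message.

```python
def busno_cfg_offset(model, stepping):
    """Pick the bus number configuration register offset for a CPU."""
    if model == "ICELAKE_D":
        return 0xD0  # all Ice Lake-D steppings use the updated offset
    if model in ("ICELAKE", "TREMONT_JACOBSVILLE") and stepping >= 4:
        return 0xD0  # updated offset from stepping 4 onwards
    return 0xCC      # original offset for earlier steppings
```

In the driver itself this choice is expressed as per-model configuration data rather than a runtime branch, as the res_config commit from the same day describes.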
-
Committed by Qiuxu Zhuo
The device ID for the configuration agent PCI device and the offset of the bus number configuration register can be CPU model specific. So add a new structure res_config to make them configurable, and pass res_config to {skx,i10nm}_init() and skx_get_all_bus_mappings() for use.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200427083246.GB11036@zn.tnic
-