- 23 2月, 2015 5 次提交
-
-
由 Takashi Iwai 提交于
... instead of possibly uninitialized return value. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-5-git-send-email-tiwai@suse.de [ Add a commit message, albeit a small one. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Takashi Iwai 提交于
Instead of calling device_create_file() and device_remove_file() manually, pass the static attribute groups with the new edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs files is done by a proper is_visible callback. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.deSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Takashi Iwai 提交于
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info object with the optional attribute groups. This allows drivers to pass additional sysfs entries without manual (and racy) device_create_file() and co calls. edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups() with NULL groups. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.deSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Takashi Iwai 提交于
Instead of manual calls of device_create_file() and device_remove_file(), use static attribute groups with proper is_visible callbacks for managing the sysfs entries. This simplifies the code a lot and avoids the possible races. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.deSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Markus Elfring 提交于
The pci_dev_put() function tests whether its argument is NULL and thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Link: http://lkml.kernel.org/r/54CFC12C.9010002@users.sourceforge.netSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 17 2月, 2015 1 次提交
-
-
由 Daniel J Blueman 提交于
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: NDaniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com Cc: stable@vger.kernel.org [ Boris: massage commit message ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 09 2月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
d0585cd8 ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: d0585cd8 ("sb_edac: Claim a different PCI device") Cc: stable@vger.kernel.org # 3.18 Acked-by: NAristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 31 1月, 2015 1 次提交
-
-
由 Dan Carpenter 提交于
If edac_mc_add_mc() fails then we should preserve the error code, but instead the current code returns success. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20150128191351.GC10259@mwandaSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 30 1月, 2015 2 次提交
-
-
由 Borislav Petkov 提交于
Fix sparse warnings. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Junjie Mao 提交于
Also use goto labels for all failure paths in edac_create_sysfs_mci_device and update meaningless labels. Signed-off-by: NJunjie Mao <junjie.mao@hotmail.com> Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl [ Boris: Use ! for 0 checks and add newlines for less crammed code. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 1月, 2015 1 次提交
-
-
由 Rickard Strandqvist 提交于
Remove the function i5100_recmema_dm_buf_id() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: NRickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Link: http://lkml.kernel.org/r/1420999030-21770-1-git-send-email-rickard_strandqvist@spectrumdigital.seSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 1月, 2015 1 次提交
-
-
Add EDAC support for ecc errors reporting on the synopsys ddr controller. The ddr ecc controller corrects single bit errors and detects double bit errors. Selected important-ish notes from the changelog: - I have not taken care of spliting synps_edac_geterror_info function as it adds additional indentation levels and moreover the existing changes were made as part of the v2 review comments - Removed dt binding info as already there is a binding info available under memorycontroller. so, updated ecc info there. - Shortened the prefix "sysnopsys" to "synps" Signed-off-by: NPunnaiah Choudary Kalluri <punnaia@xilinx.com> Link: http://lkml.kernel.org/r/a728a8d4678f4dbf9de189a480297c3d@BY2FFO11FD034.protection.gbl [ Boris: massage commit message. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 02 1月, 2015 1 次提交
-
-
由 Alexander Kuleshov 提交于
s/kenel/kernel/g [ Boris: massage commit message a bit ] Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Cc: Johannes Thumshirn <johannes.thumshirn@men.de> Link: http://lkml.kernel.org/r/1419749085-7128-1-git-send-email-kuleshovmail@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 21 12月, 2014 1 次提交
-
-
由 Wei Yongjun 提交于
Fixes the following sparse warnings: drivers/edac/mce_amd_inj.c:204:3: warning: symbol 'dfs_fls' was not declared. Should it be static? Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Link: http://lkml.kernel.org/r/1418087095-14174-1-git-send-email-weiyj_lk@163.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 03 12月, 2014 2 次提交
-
-
由 Tony Luck 提交于
Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Tony Luck 提交于
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 02 12月, 2014 3 次提交
-
-
由 Tony Luck 提交于
Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Jim Snow 提交于
Signed-off-by: NJim Snow <jim.snow@intel.com> Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Jim Snow 提交于
This prevented edac sysfs code from properly handling 6 channels per memory controller. Signed-off-by: NJim Snow <jim.snow@intel.com> Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 25 11月, 2014 5 次提交
-
-
由 Borislav Petkov 提交于
Write out MCx_ADDR into the more humanly readable "MCx Error Address" and remove double colon in the output. Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Selectively inject either a real MCE or a sw-only version which exercises the decoding path only. The hardware-injected MCE triggers a machine check exception (#MC) so that the MCE handler can be bothered to do something too. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Expose struct mce->inject_flags. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Normally, writing those causes a #GP but HWCR[McStatusWrEn] controls that. Provide a knob. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
This module's interface belongs in debugfs, not in sysfs. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 20 11月, 2014 1 次提交
-
-
由 Chen Yucong 提交于
Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for AMD platform. This patch extends the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. Reviewed-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: NChen Yucong <slaoub@gmail.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 19 11月, 2014 1 次提交
-
-
由 Markus Elfring 提交于
The pci_dev_put() function tests whether its argument is NULL and then returns immediately. Thus the test before the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net [ Boris: commit message. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 11月, 2014 2 次提交
-
-
由 Andreas Ruprecht 提交于
The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already modelled in the Makefile, but edac_pci_sysfs.o is still contained in the list of files compiled even without CONFIG_PCI. This change removes edac_pci_sysfs.o from the list of built objects when not having CONFIG_PCI enabled and removes the then-unnecessary ifdef from the source file. Signed-off-by: NAndreas Ruprecht <rupran@einserver.de> Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.deSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Dan Carpenter 提交于
My static checker complains because the "e->location" has up to 256 characters but we are copying it into the "pvt->detail_location" which only has space for 240 characters. That's not counting the surrounding text and the "e->other_detail" string which can be over 80 characters long. I am not familiar with this code but presumably it normally works. Let's add a limit though for safety. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com> Link: http://lkml.kernel.org/r/20140801082514.GD28869@mwandaSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 05 11月, 2014 2 次提交
-
-
由 Tomasz Pala 提交于
By popular demand, enable amd64_edac on 32-bit too. Boris: - update Kconfig text. - add a warning on load which states that 32-bit configurations are unsupported. Signed-off-by: NTomasz Pala <gotar@polanet.pl> Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.plSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Aravind Gopalakrishnan 提交于
Extended error code meanings are tabulated for other banks. Extend that tradition for MC6 too. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 30 10月, 2014 1 次提交
-
-
由 Aravind Gopalakrishnan 提交于
This patch adds support for ECC error decoding for F15h M60h processor. Aside from the usual changes, the patch adds support for some new features in the processor: - DDR4(unbuffered, registered); LRDIMM DDR3 support - relevant debug messages have been modified/added to report these memory types - new dbam_to_cs mappers - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find cs_size. This multiplier value is obtained from the per-dimm DCSM register. So, change the interface to accept a 'cs_mask_nr' value to facilitate this calculation - switch-casing determine_memory_type() - done to cleanse the function of too many if-else statements and improve readability - This is now called early in read_mc_regs() to cache dram_type Misc cleanup: - amd64_pci_table[] is condensed by using PCI_VDEVICE macro. Testing details: Tested the patch by injecting 'ECC' type errors using mce_amd_inj and error decoding works fine. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: determine_memory_type() cleanups ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 23 10月, 2014 4 次提交
-
-
由 Jason Baron 提交于
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: NJason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Jason Baron 提交于
Fix UE event being reported as HW_EVENT_ERR_CORRECTED. Signed-off-by: NJason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Jason Baron 提交于
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: NJason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Jason Baron 提交于
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: NJason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 20 10月, 2014 4 次提交
-
-
由 Wolfram Sang 提交于
A platform_driver does not need to set an owner, it will be populated by the driver core. Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
-
由 Michael Opdenacker 提交于
It's a NOOP since 2.6.35. Signed-off-by: NMichael Opdenacker <michael.opdenacker@free-electrons.com> Link: http://lkml.kernel.org/r/1412159043-7348-1-git-send-email-michael.opdenacker@free-electrons.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Borislav Petkov 提交于
Make keeping the sync between the mem_types enum and the actual string names simpler by using designated initializers. Signed-off-by: NBorislav Petkov <bp@suse.de>
-
由 Aravind Gopalakrishnan 提交于
F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here. Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: improve comments. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 09 10月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others. Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device. An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC. The only user-visible change should be that sb_edac appears to bind a different device. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-