Commits · 0c510cc83bdbaac8406f4f7caef34f4da0ba35ea · openeuler / raspberrypi-kernel

17 Feb, 2015 · 1 commit

EDAC, amd64_edac: Prevent OOPS with >16 memory controllers · 0c510cc8

Committed by Daniel J Blueman on Feb 17, 2015

When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
the kernel fatally dereferences unallocated structures, see splat below;
this occurs on at least NumaConnect systems.

Fix by checking if a memory controller info structure was found.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in:
CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
Stack:
 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
 ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
Call Trace:
 <IRQ>
 ? vprintk_default
 ? printk
 amd_decode_mce
 notifier_call_chain
 atomic_notifier_call_chain
 mce_log
 machine_check_poll
 mce_timer_fn
 ? mce_cpu_restart
 call_timer_fn.isra.29
 run_timer_softirq
 __do_softirq
 irq_exit
 smp_apic_timer_interrupt
 apic_timer_interrupt
 <EOI>
 ? down_read_trylock
 __do_page_fault
 ? __schedule
 do_page_fault
 page_fault
Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
Cc: stable@vger.kernel.org
[ Boris: massage commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>

0c510cc8
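
The fix described in 0c510cc8 amounts to guarding a lookup that can come back empty. The following is a minimal sketch in Python rather than kernel C; `mcis`, `find_mci_by_node`, and the dictionary returned by this `decode_bus_error` are hypothetical stand-ins for illustration, not the driver's actual data structures.

```python
EDAC_MAX_MCS = 16  # limit named in the commit message

# Hypothetical table: an info structure exists only for the first
# EDAC_MAX_MCS memory controllers.
mcis = {node: {"node": node} for node in range(EDAC_MAX_MCS)}


def find_mci_by_node(node_id):
    """Return the controller info for node_id, or None if none was allocated."""
    return mcis.get(node_id)


def decode_bus_error(node_id, status):
    """Decode an error only when a controller info structure was found."""
    mci = find_mci_by_node(node_id)
    if mci is None:
        # Without this check the code would go on to use a missing structure.
        return None
    return {"node": mci["node"], "status": status}
```

With this guard, an error reported for a controller beyond the limit is skipped instead of being decoded through a missing structure.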

09 Feb, 2015 · 1 commit

sb_edac: Fix detection on SNB machines · 11249e73

Committed by Borislav Petkov on Feb 05, 2015

d0585cd8 ("sb_edac: Claim a different PCI device") changed the
probing of sb_edac to look for PCI device 0x3ca0:

3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
...

but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
in sbridge_probe() therefore the probing fails.

Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
i.e., the 14.0 device, fixes the issue and driver loads successfully
again:

[ 2449.013120] EDAC DEBUG: sbridge_init:
[ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
[ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
[ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
[ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
[ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
...

Add a debug printk while at it to be able to catch the failure in the
future and dump driver version on successful load.

Fixes: d0585cd8 ("sb_edac: Claim a different PCI device")
Cc: stable@vger.kernel.org # 3.18
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

11249e73
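
The hex dump quoted in 11249e73 already shows which ID the probe must match: the first four bytes of PCI configuration space hold the vendor and device IDs as little-endian 16-bit values. A small Python sketch of that decoding follows; the constant names `PROBED_ID_BEFORE_FIX` and `PROBED_ID_AFTER_FIX` are made up for this example.

```python
import struct

# The 16 configuration-space bytes quoted in the commit message.
config_space = bytes.fromhex("8680a03c000000000700800800008000")

# Offset 0x00 is the vendor ID, offset 0x02 the device ID, both little-endian.
vendor_id, device_id = struct.unpack_from("<HH", config_space, 0)

PROBED_ID_BEFORE_FIX = 0x3CA8  # PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA
PROBED_ID_AFTER_FIX = 0x3CA0   # PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0

print(f"{vendor_id:04x}:{device_id:04x}")  # prints 8086:3ca0
print("matches before fix:", device_id == PROBED_ID_BEFORE_FIX)
print("matches after fix:", device_id == PROBED_ID_AFTER_FIX)
```

The decoded pair is the same `8086:3ca0` that appears in the "Seeking for" debug lines.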

03 Dec, 2014 · 2 commits

sb_edac: Fix typo computing number of banks · fec53af5

Committed by Tony Luck on Dec 02, 2014

Code will always think there are 16 banks because of a typo

Reported-by: Misha
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

fec53af5

sb_edac: Add support for Broadwell-DE processor · 1f39581a

Committed by Tony Luck on Dec 02, 2014

Broadwell-DE is the microserver version of next generation Xeon
processors.  A whole bunch of new PCIe device ids, but otherwise
pretty much the same as Haswell.
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

1f39581a

02 Dec, 2014 · 3 commits

sb_edac: Fix discovery of top-of-low-memory for Haswell · f7cf2a22

Committed by Tony Luck on Oct 29, 2014

Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset. There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).

This resulted in a bogus value for the top of low memory:

EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523eac86
sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

sbridge MC3: HANDLING MCE MEMORY ERROR
sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
sbridge MC3: TSC 0
sbridge MC3: ADDR 2000000
sbridge MC3: MISC 523e1086
sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

f7cf2a22
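
The mask typo in f7cf2a22 can be checked with a little arithmetic. The Python sketch below is an illustration, not the driver's code: `format_gb` is a hypothetical helper whose rounding was chosen so that it reproduces the two TOLM debug lines quoted in the commit message, and `above_tolm` is a simplified stand-in for the range check.

```python
GOOD_MASK = (1 << 26) - 1  # 0x3ffffff: all of the low 26 bits set
BAD_MASK = 0x1FFFFFF       # the typo: only 25 bits set


def format_gb(limit):
    """Format an inclusive limit as 'G.MMM GB (0x...)' like the TOLM lines."""
    mb = (limit + 1) >> 20
    return f"{mb // 1000}.{mb % 1000:03d} GB (0x{limit:016x})"


def above_tolm(addr, tolm):
    """True when addr lies above the top of low memory."""
    return addr > tolm


bogus_tolm = 0x1FFFFFF   # value reported before the fix
fixed_tolm = 0x7FFFFFFF  # value reported after the fix
addr = 0x2000000         # error address from the log

print(format_gb(bogus_tolm))
print(format_gb(fixed_tolm))
print(above_tolm(addr, bogus_tolm), above_tolm(addr, fixed_tolm))
```

Address 0x2000000 sits just above the bogus limit, which is why it was reported in the "TOLM area", and well below the corrected one.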

sb_edac: Fix erroneous bytes->gigabytes conversion · 8c009100

Committed by Jim Snow on Nov 18, 2014

Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

8c009100

sb_edac: Fix off-by-one error in number of channels · 50043e25

Committed by Jim Snow on Nov 18, 2014

This prevented edac sysfs code from properly handling 6 channels
per memory controller.
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

50043e25

25 Nov, 2014 · 5 commits

EDAC, MCE, AMD: Correct formatting of decoded text · 50872ccd

Committed by Borislav Petkov on Nov 22, 2014

Write out MCx_ADDR into the more humanly readable "MCx Error Address"
and remove double colon in the output.

Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

50872ccd

EDAC, mce_amd_inj: Add an injector function · 51756a50

Committed by Borislav Petkov on Nov 22, 2014

Selectively inject either a real MCE or a sw-only version which
exercises the decoding path only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.
Signed-off-by: Borislav Petkov <bp@suse.de>

51756a50

EDAC, mce_amd_inj: Add hw-injection attributes · b18f3864

Committed by Borislav Petkov on Nov 22, 2014

Expose struct mce->inject_flags.
Signed-off-by: Borislav Petkov <bp@suse.de>

b18f3864

EDAC, mce_amd_inj: Enable direct writes to MCE MSRs · 21690934

Committed by Borislav Petkov on Nov 22, 2014

Normally, writing those causes a #GP but HWCR[McStatusWrEn] controls
that. Provide a knob.
Signed-off-by: Borislav Petkov <bp@suse.de>

21690934

EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs · fd19fcd6

Committed by Borislav Petkov on Nov 22, 2014

This module's interface belongs in debugfs, not in sysfs.
Signed-off-by: Borislav Petkov <bp@suse.de>

fd19fcd6

20 Nov, 2014 · 1 commit

x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error · e3480271

Committed by Chen Yucong on Nov 18, 2014

Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.
Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

e3480271

19 Nov, 2014 · 1 commit

EDAC: Delete unnecessary check before calling pci_dev_put() · 0a98babd

Committed by Markus Elfring on Nov 19, 2014

The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test before the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Link: http://lkml.kernel.org/r/546CB20D.4070808@users.sourceforge.net
[ Boris: commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

0a98babd
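
The reasoning in 0a98babd is that a guard is redundant when the callee already performs the same test. A minimal Python sketch of that pattern follows; `dev_put`, `teardown_before`, and `teardown_after` are hypothetical names that only model the behaviour the commit message describes for `pci_dev_put()`.

```python
released = []


def dev_put(dev):
    """Stand-in for a release helper that returns at once for a missing device."""
    if dev is None:
        return
    released.append(dev)


def teardown_before(dev):
    """Caller with the extra guard the commit removes."""
    if dev is not None:  # redundant: dev_put() repeats this test
        dev_put(dev)


def teardown_after(dev):
    """Caller after the cleanup: same behaviour, one check fewer."""
    dev_put(dev)
```

Both callers behave identically for a missing device and for a real one, so dropping the guard changes no behaviour.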

12 Nov, 2014 · 2 commits

EDAC, pci_sysfs: remove unneccessary ifdef around entire file · 19ca5a3c

Committed by Andreas Ruprecht on Aug 10, 2014

The file edac_pci_sysfs.c is dependent on CONFIG_PCI. This is already
modelled in the Makefile, but edac_pci_sysfs.o is still contained in
the list of files compiled even without CONFIG_PCI.

This change removes edac_pci_sysfs.o from the list of built objects
when not having CONFIG_PCI enabled and removes the then-unnecessary
ifdef from the source file.
Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Link: http://lkml.kernel.org/r/1407697803-3837-1-git-send-email-rupran@einserver.de
Signed-off-by: Borislav Petkov <bp@suse.de>

19ca5a3c

ghes_edac: Use snprintf() to silence a static checker warning · 665aa8cd

Committed by Dan Carpenter on Aug 01, 2014

My static checker complains because the "e->location" has up to 256
characters but we are copying it into the "pvt->detail_location" which
only has space for 240 characters.  That's not counting the surrounding
text and the "e->other_detail" string which can be over 80 characters
long.

I am not familiar with this code but presumably it normally works.
Let's add a limit though for safety.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/20140801082514.GD28869@mwanda
Signed-off-by: Borislav Petkov <bp@suse.de>

665aa8cd
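
The limit added in 665aa8cd relies on `snprintf()` never writing more than the destination can hold. The Python sketch below only mimics that truncation rule; `bounded_format` and the format string are assumptions made for this example, while the sizes 240, 256, and 80 come from the commit message.

```python
def bounded_format(size, fmt, *args):
    """Mimic snprintf(): keep at most size - 1 characters, leaving room for NUL."""
    text = fmt % args
    return text[: max(size - 1, 0)]


DETAIL_LOCATION_SIZE = 240  # capacity of pvt->detail_location
location = "L" * 256        # worst-case length of e->location
other_detail = "D" * 80     # e->other_detail can be this long or longer

# Hypothetical surrounding text; the real format string is not shown above.
detail = bounded_format(
    DETAIL_LOCATION_SIZE, "location: %s detail: %s", location, other_detail
)
print(len(detail))
```

However long the inputs are, the result never exceeds the destination's capacity minus the terminator.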

05 Nov, 2014 · 2 commits

amd64_edac: Build module on x86-32 · f5b10c45

Committed by Tomasz Pala on Nov 02, 2014

By popular demand, enable amd64_edac on 32-bit too.

Boris:
 - update Kconfig text.
 - add a warning on load which states that 32-bit configurations are unsupported.
Signed-off-by: Tomasz Pala <gotar@polanet.pl>
Link: http://lkml.kernel.org/r/20141102102212.GA7034@polanet.pl
Signed-off-by: Borislav Petkov <bp@suse.de>

f5b10c45

EDAC, MCE, AMD: Add decoding table for MC6 xec · bc4febe9

Committed by Aravind Gopalakrishnan on Nov 04, 2014

Extended error code meanings are tabulated for other banks. Extend that
tradition for MC6 too.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

bc4febe9

30 Oct, 2014 · 1 commit

amd64_edac: Add F15h M60h support · a597d2a5

Committed by Aravind Gopalakrishnan on Oct 30, 2014

This patch adds support for ECC error decoding for F15h M60h processor.
Aside from the usual changes, the patch adds support for some new features
in the processor:
 - DDR4(unbuffered, registered); LRDIMM DDR3 support
   - relevant debug messages have been modified/added to report these
     memory types
 - new dbam_to_cs mappers
   - if (F15h M60h && LRDIMM); we need a 'multiplier' value to find
     cs_size. This multiplier value is obtained from the per-dimm
     DCSM register. So, change the interface to accept a 'cs_mask_nr'
     value to facilitate this calculation
 - switch-casing determine_memory_type()
   - done to cleanse the function of too many if-else statements
     and improve readability
   - This is now called early in read_mc_regs() to cache dram_type

Misc cleanup:
 - amd64_pci_table[] is condensed by using PCI_VDEVICE macro.

Testing details:
Tested the patch by injecting 'ECC' type errors using mce_amd_inj
and error decoding works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1414617483-4941-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: determine_memory_type() cleanups ]
Signed-off-by: Borislav Petkov <bp@suse.de>

a597d2a5

23 Oct, 2014 · 4 commits

e7xxx_edac: Report CE events properly · 8030122a

Committed by Jason Baron on Oct 18, 2014

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

8030122a

cpc925_edac: Report UE events properly · fa19ac4b

Committed by Jason Baron on Oct 15, 2014

Fix UE event being reported as HW_EVENT_ERR_CORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

fa19ac4b

i82860_edac: Report CE events properly · ab0543de

Committed by Jason Baron on Oct 15, 2014

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

ab0543de

i3200_edac: Report CE events properly · 8a3f075d

Committed by Jason Baron on Oct 15, 2014

Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

8a3f075d

20 Oct, 2014 · 4 commits

edac: drop owner assignment from platform_drivers · b7382f83

Committed by Wolfram Sang on Oct 20, 2014

A platform_driver does not need to set an owner, it will be populated by the
driver core.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

b7382f83

{mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED · 5c43cbdf

Committed by Michael Opdenacker on Oct 01, 2014

It's a NOOP since 2.6.35.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Link: http://lkml.kernel.org/r/1412159043-7348-1-git-send-email-michael.opdenacker@free-electrons.com
Signed-off-by: Borislav Petkov <bp@suse.de>

5c43cbdf

EDAC: Sync memory types and names · 4cfc3a40

Committed by Borislav Petkov on Sep 30, 2014

Make keeping the sync between the mem_types enum and the actual string
names simpler by using designated initializers.
Signed-off-by: Borislav Petkov <bp@suse.de>

4cfc3a40
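
Designated initializers in C bind each string to its enum constant by index instead of by position, which is what keeps the two lists in sync in 4cfc3a40. A rough Python analogue is a table keyed by enum member. In the sketch below, `MemType` is a hypothetical three-entry subset and the strings are placeholders, not the kernel's actual names.

```python
from enum import IntEnum


class MemType(IntEnum):
    """Hypothetical subset of memory types, for illustration only."""
    MEM_EMPTY = 0
    MEM_DDR3 = 1
    MEM_DDR4 = 2


# Keyed by enum member: reordering or extending the enum cannot shift a name
# onto the wrong type, which a purely positional list would allow.
MEM_TYPE_NAMES = {
    MemType.MEM_EMPTY: "Empty",
    MemType.MEM_DDR3: "DDR3",
    MemType.MEM_DDR4: "DDR4",
}


def mem_type_name(mem_type):
    """Look up the display name, falling back for unknown values."""
    return MEM_TYPE_NAMES.get(mem_type, "Unknown")
```

A missing entry then shows up as a lookup miss for one type instead of silently mislabelling every type after it.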

EDAC: Add DDR3 LRDIMM entries to edac_mem_types · 348fec70

Committed by Aravind Gopalakrishnan on Sep 18, 2014

F15hM60h adds support for DDR4 and DDR3 LRDIMMs. Add them here.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1411070218-10258-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

348fec70

09 Oct, 2014 · 3 commits

sb_edac: Claim a different PCI device · d0585cd8

Committed by Andy Lutomirski on Aug 14, 2014

sb_edac controls a large number of different PCI functions.  Rather
than registering as a normal PCI driver for all of them, it
registers for just one so that it gets probed and, at probe time, it
looks for all the others.

Coincidentally, the device it registers for also contains the SMBUS
registers, so the PCI core will refuse to probe both sb_edac and a
future iMC SMBUS driver.  The drivers don't actually conflict, so
just change sb_edac's device table to probe a different device.

An alternative fix would be to merge the two drivers, but sb_edac
will also refuse to load on non-ECC systems, whereas i2c_imc would
still be useful without ECC.

The only user-visible change should be that sb_edac appears to bind
a different device.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Rui Wang <ruiv.wang@gmail.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

d0585cd8

Move Intel SNB device ids from sb_edac to pci_ids.h · 68939df1

Committed by Andy Lutomirski on Aug 14, 2014

The i2c_imc driver will use two of them, and moving only part of
the list seems messier.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

68939df1

sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel · 351fc4a9

Committed by Seth Jennings on Sep 05, 2014

Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but
EDAC doesn't know about this and returns and INTERNAL ERROR when the
channel is greater than NUM_CHANNELS:

kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f
kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0
kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4)
kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0)

This commit changes sb_edac to forward a channel of -1 to EDAC if the
channel is not specified. edac_mc_handle_error() sets the channel to -1
internally after the error message anyway, so this commit should have no
effect other than avoiding the INTERNAL ERROR message when the channel
is not specified.
Signed-off-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>

351fc4a9
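
The change in 351fc4a9 maps the "not specified" channel value to -1 before the error is handed on. The Python sketch below illustrates that mapping. It assumes the channel is taken from the low four bits of the status word, which is consistent with the value 15 reported for status `940000000000009f` in the log; `channel_from_status` is a hypothetical helper name.

```python
CHANNEL_UNSPECIFIED = 0xF  # 'not specified' per the SDM table cited above


def channel_from_status(status):
    """Return the channel from status bits 3:0, or -1 when it is unspecified."""
    channel = status & 0xF
    if channel == CHANNEL_UNSPECIFIED:
        return -1
    return channel


print(channel_from_status(0x940000000000009F))
```

Any other value in those bits is passed through unchanged as the channel number.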

30 Sep, 2014 · 1 commit

mpc85xx_edac: Make L2 interrupt shared too · a18c3f16

Committed by Borislav Petkov on Sep 30, 2014

The other two interrupt handlers in this driver are shared, except this
one. When loading the driver, it fails like this.

So make the IRQ line shared.

Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
mpc85xx_mc_err_probe: No ECC DIMMs discovered
EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:521
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
task: ee058000 ti: ee046000 task.ti: ee046000
NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000

GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
NIP [c016c0c4] remove_proc_entry
LR [c016c0c4] remove_proc_entry
Call Trace:
remove_proc_entry (unreliable)
unregister_irq_proc
free_desc
irq_free_descs
mpc85xx_l2_err_probe
platform_drv_probe
really_probe
__driver_attach
bus_for_each_dev
bus_add_driver
driver_register
mpc85xx_mc_init
do_one_initcall
kernel_init_freeable
kernel_init
ret_from_kernel_thread
Instruction dump: ...

Reported-and-tested-by: <lpb_098@163.com>
Acked-by: Johannes Thumshirn <johannes.thumshirn@men.de>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>

a18c3f16
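
The "Flags mismatch" line in a18c3f16 shows one handler requesting the line with flags `00000000` while another already holds it with `00000080`. The Python sketch below models only the sharing part of that rule. `can_share_irq` is a hypothetical helper, and the kernel's real check covers more than this single flag.

```python
IRQF_SHARED = 0x00000080  # the flag value visible in the mismatch line


def can_share_irq(existing_flags, new_flags):
    """Simplified rule: every handler on one line must request IRQF_SHARED."""
    return bool(existing_flags & new_flags & IRQF_SHARED)


pci_err_flags = 0x00000080        # "[EDAC] PCI err" in the log
l2_err_flags_before = 0x00000000  # "[EDAC] L2 err" before the fix
l2_err_flags_after = IRQF_SHARED  # after making the line shared

print(can_share_irq(pci_err_flags, l2_err_flags_before))
print(can_share_irq(pci_err_flags, l2_err_flags_after))
```

Once the L2 handler also passes the shared flag, both requests agree and the second registration can succeed.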

23 Sep, 2014 · 1 commit

amd64_edac: Modify usage of amd64_read_dct_pci_cfg() · 7981a28f

Committed by Aravind Gopalakrishnan on Sep 15, 2014

Rationale behind this change:
 - F2x1xx addresses were stopped from being mapped explicitly to DCT1
   from F15h (OR) onwards. They use _dct[0:1] mechanism to access the
   registers. So we should move away from using address ranges to select
   DCT for these families.
 - On newer processors, the address ranges used to indicate DCT1 (0x140,
   0x1a0) have different meanings than what is assumed currently.

Changes introduced:
 - amd64_read_dct_pci_cfg() now takes in dct value and uses it for
   'selecting the dct'
 - Update usage of the function. Keep in mind that different families
   have specific handling requirements
 - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different
   from amd64_read_pci_cfg()
   - Move the k8 specific check to amd64_read_pci_cfg
 - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
 - Remove now needless .read_dct_pci_cfg

Testing:
 - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG'
   and mce_amd_inj
 - driver obtains info from F2x registers and caches it in pvt
   structures correctly
 - ECC decoding works fine
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>

7981a28f

15 Sep, 2014 · 1 commit

ppc4xx_edac: Fix build error caused by wrong member access · 2d34056d

Committed by Pranith Kumar on Aug 19, 2014

Fix the following error

drivers/edac/ppc4xx_edac.c:977:45: error: request for member 'dimm' in something
not a structure or union

by changing member access to pointer dereference.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Link: http://lkml.kernel.org/r/1408482646-22541-1-git-send-email-bobby.prani@gmail.com
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

2d34056d

05 Sep, 2014 · 1 commit

edac: altera: Add Altera SDRAM EDAC support · 71bcada8

Committed by Thor Thayer on Sep 03, 2014

This patch adds support for the CycloneV and ArriaV SDRAM controllers.
Correction and reporting of SBEs, Panic on DBEs.

There was a discussion thread on whether this driver should be an mfd driver
or just make use of syscon, which is already a mfd. Ultimately, the
decision to use a simple syscon interface was reached.[1]

[1] https://lkml.org/lkml/2014/7/30/514

[dinguyen] Fixed Kconfig to have EDAC_ALTERA_MC as a tristate to prevent a
build failure for allmodconfig.
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Acked-by: Borislav Petkov <bp@suse.de>
[dinguyen] cleaned up commit message
Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>

71bcada8

02 Sep, 2014 · 1 commit

EDAC: Fix mem_types strings type · f4ce6eca

Committed by Borislav Petkov on Aug 13, 2014

This one got forgotten during an earlier cleanup.
Signed-off-by: Borislav Petkov <bp@suse.de>

f4ce6eca

14 Jul, 2014 · 1 commit

EDAC, MCE, AMD: Add MCE decoding for F15h M60h · eba4bfb3

Committed by Aravind Gopalakrishnan on Jul 14, 2014

Add decoding logic for new Fam15h model 60h.

Tested using mce_amd_inj module and works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: simplify a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>

eba4bfb3

10 Jul, 2014 · 1 commit

ie31200_edac: Allocate mci and map mchbar first · 78fd4d12

Committed by Jason Baron on Jul 09, 2014

Check for memory allocation and mchbar mapping failures before
initializing the dimm info tables needlessly.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Suggested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/ead8f53e699f1ce21c2e17f3cffb4685d4faf72a.1404939455.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

78fd4d12

04 Jul, 2014 · 2 commits

ie31200_edac: Introduce the driver · 7ee40b89

Committed by Jason Baron on Jul 04, 2014

Add a driver for the E3-1200 series of Intel DRAM controllers, based on
the following E3-1200 specs:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200-family-vol-2-datasheet.html
http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-vol-2-datasheet.html

I've tested this on bad memory hardware, and observed correlating bad
reads and uncorrected memory errors as reported by the driver.

Tested against:

CPU E3-1270 v3 @ 3.50GHz : 8086:0c08 (haswell)
CPU E3-1270 V2 @ 3.50GHz : 8086:0158 (ivy bridge)
CPU E31270 @ 3.40GHz : 8086:0108 (sandy bridge)
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/95c83e80dd40b5377e8bb206285c5d95ac623872.1403818526.git.jbaron@akamai.com
[ Boris: realign defines ]
Signed-off-by: Borislav Petkov <bp@suse.de>

7ee40b89

x38_edac: make use of lo_hi_readq() · a21e98ce

Committed by Jason Baron on Jun 26, 2014

Convert to the generic API.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/bb9a4cbb980cc7b51be75cbfcf644553bf6a04cd.1403818526.git.jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>

a21e98ce
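
The generic `lo_hi_readq()` helper adopted in a21e98ce reads a 64-bit register as two 32-bit accesses, low half first. The Python sketch below models that composition. The `read32` callback and the `registers` dictionary are hypothetical stand-ins for memory-mapped I/O, and this `lo_hi_readq` takes the callback as an extra argument, unlike the kernel helper.

```python
def lo_hi_readq(read32, addr):
    """Read a 64-bit value as two 32-bit reads, low half first."""
    low = read32(addr)
    high = read32(addr + 4)
    return low | (high << 32)


# Hypothetical register space used only for this demonstration.
registers = {0x100: 0x89ABCDEF, 0x104: 0x01234567}
accesses = []


def read32(addr):
    """Record the access order and return the 32-bit value at addr."""
    accesses.append(addr)
    return registers[addr]


value = lo_hi_readq(read32, 0x100)
print(hex(value), [hex(a) for a in accesses])
```

The recorded access order confirms that the low word is read before the high word.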

27 Jun, 2014 · 1 commit

sb_edac: add support for Haswell based systems · 50d1bb93

Committed by Aristeu Rozanski on Jun 20, 2014

Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge
ones. This patch adds support to Haswell based systems.

[m.chehab@samsung.com: Fix CodingStyle issues]
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>

50d1bb93