- 13 8月, 2015 1 次提交
-
-
由 Chen, Gong 提交于
Use unified genpool to save Action Optional error events and put Action Optional error handling in the same notification chain as MCE error decoding. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> [ Fold in subsequent patch from Boris for early boot logging. ] Signed-off-by: NTony Luck <tony.luck@intel.com> [ Correct a lot. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 6月, 2015 3 次提交
-
-
由 Tony Luck 提交于
Basic support for the single socket Broadwell-DE processor was added back in commit 1f39581a sb_edac: Add support for Broadwell-DE processor This patch extends Broadwell support to cover the two socket "-EP" and four socket "-EX" versions of Broadwell. Only tested on the 2 socket - but this code is largely cloned from the Haswell path. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Tony Luck 提交于
First noticed a problem on a 4 socket machine where EDAC only reported half the DIMMS. Tracked this down to the code that assumes that systems with two home agents only have two memory channels on each agent. This is true on 2 sockect ("-EP") machines. But four socket ("-EX") machines have four memory channels on each home agent. The old code would have had problems on two socket systems as it did a shuffling trick to make the internals of the code think that the channels from the first agent were '0' and '1', with the second agent providing '2' and '3'. But the code didn't uniformly convert from {ha,channel} tuples to this internal representation. New code always considers up to eight channels. On a machine with a single home agent these map easily to edac channels 0, 1, 2, 3. On machines with two home agents we map using: edac_channel = 4*ha# + channel So on a -EP machine where each home agent supports only two channels we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0, 1, 2, 3, 4, 5, 6, 7. [mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed a few CodingStyle issues] Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Tony Luck 提交于
typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6} in the algorithm to spread access between memory resources. But the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9) and so picking bits {9, 8, 7} thinko: BIT(1) of the dram_rule registers chooses whether to just use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are, or to XOR them with bits {18, 17, 16} but the code inverted the test. We need the additional XOR when dram_rule{1} == 0. Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 09 2月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
d0585cd8 ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: d0585cd8 ("sb_edac: Claim a different PCI device") Cc: stable@vger.kernel.org # 3.18 Acked-by: NAristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 03 12月, 2014 2 次提交
-
-
由 Tony Luck 提交于
Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Tony Luck 提交于
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 02 12月, 2014 2 次提交
-
-
由 Tony Luck 提交于
Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Jim Snow 提交于
Signed-off-by: NJim Snow <jim.snow@intel.com> Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 09 10月, 2014 3 次提交
-
-
由 Andy Lutomirski 提交于
sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others. Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device. An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC. The only user-visible change should be that sb_edac appears to bind a different device. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Andy Lutomirski 提交于
The i2c_imc driver will use two of them, and moving only part of the list seems messier. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Seth Jennings 提交于
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and returns and INTERNAL ERROR when the channel is greater than NUM_CHANNELS: kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0 kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4) kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0) This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified. Signed-off-by: NSeth Jennings <sjenning@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 27 6月, 2014 9 次提交
-
-
由 Aristeu Rozanski 提交于
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge ones. This patch adds support to Haswell based systems. [m.chehab@samsung.com: Fix CodingStyle issues] Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Mauro Carvalho Chehab 提交于
We should not have spaces before ^I on alignments. Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
When a MC is handled, the correct sbridge_dev is searched based on the node, checking again later with the assumption the first memory controller found is the first socket's memory controller is a bogus assumption. Get rid of it. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
channel_mask will be used in the future to determine which group of memory modules is causing the errors since when mirroring, lockstep and close page are enabled you can't. While that doesn't happen, use the channel_mask to determine the channel instead of relying on the MC event/exception. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This patch fixes the obvious bug while handling the socket/HA bitmask used in Ivy Bridge memory controllers. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This patch changes the way devices are searched by using product id instead of device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to make sure everything works properly. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has a different way to retrieve RIR limits, make this procedure per model. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has a different way to retrieve the node id, make so this procedure can be reimplemented. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has different register, offset to determine memory type and supports DDR4 in some models. This patch makes it easier to have a different method depending on the memory controller type. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 13 3月, 2014 2 次提交
-
-
由 Aristeu Rozanski 提交于
Since the driver is decoding the MCE, it's useless to have these messages printed unless you're debugging a problem in the driver. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Corrected Errors are MC events, not exceptions and reporting as the later might confuse users. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 20 2月, 2014 1 次提交
-
-
由 Jiang Liu 提交于
On a system with four Intel processors, it generates too many messages "EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give many useful information for normal users, so change log level from INFO to DEBUG. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.comAcked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 2月, 2014 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
There are several left overs with my old email address. Remove their occurrences and add myself at CREDITS, to allow people to be able to reach me on my new addresses. Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 16 12月, 2013 1 次提交
-
-
由 Rashika Kheria 提交于
This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 12月, 2013 1 次提交
-
-
由 Chen, Gong 提交于
Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to *force*, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 06 12月, 2013 1 次提交
-
-
由 Jingoo Han 提交于
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: NJingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 30 11月, 2013 1 次提交
-
-
由 Aristeu Rozanski 提交于
Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 15 11月, 2013 11 次提交
-
-
由 Aristeu Rozanski 提交于
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is preparation of Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for the Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation of Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-