- 03 12月, 2014 2 次提交
-
-
由 Tony Luck 提交于
Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Tony Luck 提交于
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 02 12月, 2014 2 次提交
-
-
由 Tony Luck 提交于
Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Jim Snow 提交于
Signed-off-by: NJim Snow <jim.snow@intel.com> Signed-off-by: NLukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 09 10月, 2014 3 次提交
-
-
由 Andy Lutomirski 提交于
sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others. Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device. An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC. The only user-visible change should be that sb_edac appears to bind a different device. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Andy Lutomirski 提交于
The i2c_imc driver will use two of them, and moving only part of the list seems messier. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
由 Seth Jennings 提交于
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and returns and INTERNAL ERROR when the channel is greater than NUM_CHANNELS: kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0 kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4) kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0) This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified. Signed-off-by: NSeth Jennings <sjenning@redhat.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 27 6月, 2014 9 次提交
-
-
由 Aristeu Rozanski 提交于
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge ones. This patch adds support to Haswell based systems. [m.chehab@samsung.com: Fix CodingStyle issues] Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Mauro Carvalho Chehab 提交于
We should not have spaces before ^I on alignments. Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
When a MC is handled, the correct sbridge_dev is searched based on the node, checking again later with the assumption the first memory controller found is the first socket's memory controller is a bogus assumption. Get rid of it. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
channel_mask will be used in the future to determine which group of memory modules is causing the errors since when mirroring, lockstep and close page are enabled you can't. While that doesn't happen, use the channel_mask to determine the channel instead of relying on the MC event/exception. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This patch fixes the obvious bug while handling the socket/HA bitmask used in Ivy Bridge memory controllers. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This patch changes the way devices are searched by using product id instead of device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to make sure everything works properly. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has a different way to retrieve RIR limits, make this procedure per model. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has a different way to retrieve the node id, make so this procedure can be reimplemented. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Haswell has different register, offset to determine memory type and supports DDR4 in some models. This patch makes it easier to have a different method depending on the memory controller type. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 13 3月, 2014 2 次提交
-
-
由 Aristeu Rozanski 提交于
Since the driver is decoding the MCE, it's useless to have these messages printed unless you're debugging a problem in the driver. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Corrected Errors are MC events, not exceptions and reporting as the later might confuse users. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 20 2月, 2014 1 次提交
-
-
由 Jiang Liu 提交于
On a system with four Intel processors, it generates too many messages "EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give many useful information for normal users, so change log level from INFO to DEBUG. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.comAcked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 07 2月, 2014 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
There are several left overs with my old email address. Remove their occurrences and add myself at CREDITS, to allow people to be able to reach me on my new addresses. Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 16 12月, 2013 1 次提交
-
-
由 Rashika Kheria 提交于
This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: NRashika Kheria <rashika.kheria@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 12 12月, 2013 1 次提交
-
-
由 Chen, Gong 提交于
Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to *force*, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Acked-by: NTony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 06 12月, 2013 1 次提交
-
-
由 Jingoo Han 提交于
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: NJingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: NBorislav Petkov <bp@suse.de>
-
- 30 11月, 2013 1 次提交
-
-
由 Aristeu Rozanski 提交于
Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 15 11月, 2013 11 次提交
-
-
由 Aristeu Rozanski 提交于
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for Ivy Bridge support Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is preparation of Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation for the Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
由 Aristeu Rozanski 提交于
This is in preparation of Ivy Bridge support. Signed-off-by: NAristeu Rozanski <arozansk@redhat.com> Signed-off-by: NMauro Carvalho Chehab <m.chehab@samsung.com>
-
- 22 10月, 2013 1 次提交
-
-
由 Chen, Gong 提交于
GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: NMauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 29 4月, 2013 1 次提交
-
-
由 Luck, Tony 提交于
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 04 1月, 2013 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 21 12月, 2012 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
[ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 25 9月, 2012 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Cc: stable@kernel.org Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-