- 02 Dec, 2014 2 commits
-
-
Committed by Jim Snow
Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
-
Committed by Jim Snow
This prevented the EDAC sysfs code from properly handling 6 channels per memory controller. Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
-
- 23 Oct, 2014 4 commits
-
-
Committed by Jason Baron
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
Committed by Jason Baron
Fix UE event being reported as HW_EVENT_ERR_CORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
Committed by Jason Baron
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
Committed by Jason Baron
Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 09 Oct, 2014 3 commits
-
-
Committed by Andy Lutomirski
sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others.

Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device.

An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC.

The only user-visible change should be that sb_edac appears to bind a different device.

Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
-
Committed by Andy Lutomirski
The i2c_imc driver will use two of them, and moving only part of the list seems messier. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
-
Committed by Seth Jennings
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and reports an INTERNAL ERROR when the channel is greater than NUM_CHANNELS:

    kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f
    kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0
    kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4)
    kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0)

This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified.

Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
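The mapping described in this commit can be sketched in plain C. This is only an illustration, not the sb_edac code: the function name and both macros are hypothetical, and the sketch assumes the channel sits in the low four bits of the machine check status word, which is consistent with the status value 940000000000009f and the reported channel 15 in the log.

```c
#include <stdint.h>

/* Hypothetical constants for this sketch; not taken from sb_edac. */
#define SKETCH_CHANNEL_MASK     0xfu /* assumed: low 4 bits carry the channel */
#define SKETCH_CHANNEL_UNKNOWN  0xfu /* 0xf means "not specified" */

/*
 * Return the channel to hand to the EDAC core, or -1 when the hardware
 * did not specify one, so no out-of-range value is ever forwarded.
 */
static int channel_from_status(uint64_t status)
{
    unsigned int channel = (unsigned int)(status & SKETCH_CHANNEL_MASK);

    return channel == SKETCH_CHANNEL_UNKNOWN ? -1 : (int)channel;
}
```

With the status word from the log, channel_from_status(0x940000000000009fULL) yields -1 instead of 15.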
-
- 30 Sep, 2014 1 commit
-
-
Committed by Borislav Petkov
The other two interrupt handlers in this driver are shared, but this one is not. Loading the driver fails as shown in the log below, so make the IRQ line shared.

    Freescale(R) MPC85xx EDAC driver, (C) 2006 Montavista Software
    mpc85xx_mc_err_probe: No ECC DIMMs discovered
    EDAC DEVICE0: Giving out device to module MPC85xx_edac controller mpc85xx_l2_err: DEV mpc85xx_l2_err (INTERRUPT)
    genirq: Flags mismatch irq 16. 00000000 ([EDAC] L2 err) vs. 00000080 ([EDAC] PCI err)
    mpc85xx_l2_err_probe: Unable to request irq 16 for MPC85xx L2 err
    remove_proc_entry: removing non-empty directory 'irq/16', leaking at least 'aerdrv'
    ------------[ cut here ]------------
    WARNING: at fs/proc/generic.c:521
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc5-dirty #1
    task: ee058000 ti: ee046000 task.ti: ee046000
    NIP: c016c0c4 LR: c016c0c4 CTR: c037b51c
    REGS: ee047c10 TRAP: 0700 Not tainted (3.17.0-rc5-dirty)
    MSR: 00029000 <CE,EE,ME> CR: 22008022 XER: 20000000
    GPR00: c016c0c4 ee047cc0 ee058000 00000053 00029000 00000000 c037c744 00000003
    GPR08: c09aab28 c09aab24 c09aab28 00000156 20008028 00000000 c0002ac8 00000000
    GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000139 c0950394
    GPR24: c09f0000 ee5585b0 ee047d08 c0a10000 ee047d08 ee15f808 00000002 ee03f660
    NIP [c016c0c4] remove_proc_entry
    LR [c016c0c4] remove_proc_entry
    Call Trace:
    remove_proc_entry (unreliable)
    unregister_irq_proc
    free_desc
    irq_free_descs
    mpc85xx_l2_err_probe
    platform_drv_probe
    really_probe
    __driver_attach
    bus_for_each_dev
    bus_add_driver
    driver_register
    mpc85xx_mc_init
    do_one_initcall
    kernel_init_freeable
    kernel_init
    ret_from_kernel_thread
    Instruction dump: ...

Reported-and-tested-by: <lpb_098@163.com> Acked-by: Johannes Thumshirn <johannes.thumshirn@men.de> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
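The "Flags mismatch" line in the log can be read with a small model. The sketch below is a deliberately simplified stand-in for the kernel's check, not the genirq implementation: the function name is hypothetical, and it only models the rule that every requester of a line must ask for sharing. The value of IRQF_SHARED matches the 00000080 shown in the log.

```c
#include <stdbool.h>

/* Same value as the kernel's IRQF_SHARED flag (00000080 in the log). */
#define IRQF_SHARED 0x00000080u

/*
 * Simplified model: a second handler may be attached to an IRQ line only
 * if both the existing and the new request carry IRQF_SHARED. The real
 * check also compares trigger types and other flags, omitted here.
 */
static bool irq_flags_compatible(unsigned int existing, unsigned int requested)
{
    return (existing & requested & IRQF_SHARED) != 0;
}
```

In the log, the PCI error handler holds the line with 00000080 and the L2 error handler asks for 00000000, so irq_flags_compatible(0x80, 0x0) is false. Once the L2 request also passes IRQF_SHARED, the pair is compatible.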
-
- 23 Sep, 2014 1 commit
-
-
Committed by Aravind Gopalakrishnan
Rationale behind this change:
  - F2x1xx addresses are no longer mapped explicitly to DCT1 from F15h (OR) onwards. These families use the _dct[0:1] mechanism to access the registers, so we should move away from using address ranges to select the DCT for them.
  - On newer processors, the address ranges used to indicate DCT1 (0x140, 0x1a0) have different meanings than what is assumed currently.

Changes introduced:
  - amd64_read_dct_pci_cfg() now takes a dct value and uses it to select the DCT
  - Update usage of the function. Keep in mind that different families have specific handling requirements
  - Remove [k8|f10]_read_dct_pci_cfg() as they don't do much different from amd64_read_pci_cfg()
  - Move the k8 specific check to amd64_read_pci_cfg
  - Remove f15_read_dct_pci_cfg() and move logic to amd64_read_dct_pci_cfg()
  - Remove now needless .read_dct_pci_cfg

Testing:
  - Tested on Fam 10h; Fam15h Models: 00h, 30h; Fam16h using 'EDAC_DEBUG' and mce_amd_inj
  - driver obtains info from F2x registers and caches it in pvt structures correctly
  - ECC decoding works fine

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1410799058-3149-1-git-send-email-aravind.gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 15 Sep, 2014 1 commit
-
-
Committed by Pranith Kumar
Fix the following error:

    drivers/edac/ppc4xx_edac.c:977:45: error: request for member 'dimm' in something not a structure or union

by changing member access to pointer dereference.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://lkml.kernel.org/r/1408482646-22541-1-git-send-email-bobby.prani@gmail.com CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de>
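The class of error quoted in this commit appears whenever '.' is applied to something that is a pointer rather than a structure. The sketch below uses hypothetical, reduced structure definitions (they are not the EDAC core's) to show the corrected access pattern.

```c
/* Hypothetical, reduced types for illustration only. */
struct dimm_info {
    int nr_pages;
};

struct rank_info {
    struct dimm_info *dimm;
};

/*
 * 'channels' is an array of pointers, so channels[idx] is a pointer.
 * Writing channels[idx].dimm would fail to compile with "request for
 * member 'dimm' in something not a structure or union"; '->' is needed.
 */
static int pages_of(struct rank_info **channels, int idx)
{
    return channels[idx]->dimm->nr_pages;
}
```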
-
- 05 Sep, 2014 1 commit
-
-
Committed by Thor Thayer
This patch adds support for the CycloneV and ArriaV SDRAM controllers. It corrects and reports single-bit errors (SBEs) and panics on double-bit errors (DBEs).

There was a discussion thread on whether this driver should be an mfd driver or just make use of syscon, which is already an mfd. Ultimately, the decision to use a simple syscon interface was reached. [1]

[1] https://lkml.org/lkml/2014/7/30/514

[dinguyen] Fixed Kconfig to have EDAC_ALTERA_MC as a tristate to prevent a build failure for allmodconfig.

Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Acked-by: Borislav Petkov <bp@suse.de> [dinguyen] cleaned up commit message Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com>
-
- 02 Sep, 2014 1 commit
-
-
Committed by Borislav Petkov
This one got forgotten during an earlier cleanup. Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 14 Jul, 2014 1 commit
-
-
Committed by Aravind Gopalakrishnan
Add decoding logic for the new Fam15h model 60h. Tested using the mce_amd_inj module and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Boris: simplify a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 10 Jul, 2014 1 commit
-
-
Committed by Jason Baron
Check for memory allocation and mchbar mapping failures before initializing the dimm info tables needlessly. Signed-off-by: Jason Baron <jbaron@akamai.com> Suggested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/ead8f53e699f1ce21c2e17f3cffb4685d4faf72a.1404939455.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 04 Jul, 2014 2 commits
-
-
Committed by Jason Baron
Add a driver for the E3-1200 series of Intel DRAM controllers, based on the following E3-1200 specs:

    http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200-family-vol-2-datasheet.html
    http://www.intel.com/content/www/us/en/processors/xeon/xeon-e3-1200v3-vol-2-datasheet.html

I've tested this on bad memory hardware, and observed correlating bad reads and uncorrected memory errors as reported by the driver.

Tested against:

    CPU E3-1270 v3 @ 3.50GHz : 8086:0c08 (haswell)
    CPU E3-1270 V2 @ 3.50GHz : 8086:0158 (ivy bridge)
    CPU E31270 @ 3.40GHz : 8086:0108 (sandy bridge)

Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/95c83e80dd40b5377e8bb206285c5d95ac623872.1403818526.git.jbaron@akamai.com [ Boris: realign defines ] Signed-off-by: Borislav Petkov <bp@suse.de>
-
Committed by Jason Baron
Convert to the generic API. Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/bb9a4cbb980cc7b51be75cbfcf644553bf6a04cd.1403818526.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 27 Jun, 2014 12 commits
-
-
Committed by Aristeu Rozanski
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge ones. This patch adds support for Haswell-based systems. [m.chehab@samsung.com: Fix CodingStyle issues] Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Mauro Carvalho Chehab
We should not have spaces before ^I on alignments. Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
The Haswell memory controller can make use of DDR4 and Registered DDR4. Cc: tony.luck@intel.com Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
When a machine check (MC) is handled, the correct sbridge_dev is looked up based on the node. Checking again later, on the assumption that the first memory controller found is the first socket's memory controller, is bogus. Get rid of it. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
channel_mask will be used in the future to determine which group of memory modules is causing the errors, since the exact module cannot be determined when mirroring, lockstep and close page are enabled. Until that happens, use the channel_mask to determine the channel instead of relying on the MC event/exception. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
This patch fixes the obvious bug while handling the socket/HA bitmask used in Ivy Bridge memory controllers. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
Kconfig wasn't updated when Ivy Bridge support was added. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
This patch changes the way devices are searched by using the product id instead of device/function numbers. Tested on a Sandy Bridge and an Ivy Bridge machine to make sure everything works properly. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
Haswell has a different way to retrieve RIR limits; make this procedure per model. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
Haswell has a different way to retrieve the node id; make it so this procedure can be reimplemented. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
Haswell uses a different register and offset to determine the memory type, and supports DDR4 in some models. This patch makes it easier to have a different method depending on the memory controller type. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Grant Likely
There are a bunch of users open coding the for_each_node_by_name() by calling of_find_node_by_name() directly instead of using the macro. This is getting in the way of some cleanups, and the possibility of removing of_find_node_by_name() entirely. Clean it up so that all the users are consistent. Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Takashi Iwai <tiwai@suse.de>
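The difference between open coding the loop and using an iterator macro can be shown with a userspace mock. Everything in the sketch is a stand-in: struct node, find_node_by_name() and for_each_node_named() are hypothetical names that only mimic the shape of of_find_node_by_name() and for_each_node_by_name(), and there is no reference counting as there would be for real device tree nodes.

```c
#include <stddef.h>
#include <string.h>

/* Mock node table standing in for a device tree. */
struct node {
    const char *name;
};

static struct node nodes[] = { { "cpu" }, { "memory" }, { "cpu" } };

/*
 * Mock lookup: resume the search after 'from', or start at the beginning
 * when 'from' is NULL. Returns NULL when no further match exists.
 */
static struct node *find_node_by_name(struct node *from, const char *name)
{
    size_t n = sizeof(nodes) / sizeof(nodes[0]);
    size_t i = from ? (size_t)(from - nodes) + 1 : 0;

    for (; i < n; i++)
        if (strcmp(nodes[i].name, name) == 0)
            return &nodes[i];
    return NULL;
}

/* Iterator macro wrapping the lookup, so callers do not open code it. */
#define for_each_node_named(dn, name)                      \
    for ((dn) = find_node_by_name(NULL, (name)); (dn);     \
         (dn) = find_node_by_name((dn), (name)))

/* Count the nodes with a given name using the macro. */
static int count_nodes(const char *name)
{
    struct node *dn;
    int count = 0;

    for_each_node_named(dn, name)
        count++;
    return count;
}
```

Callers written against the macro no longer mention the lookup function at all, which is what makes it possible to change or remove that function later.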
-
- 24 Jun, 2014 2 commits
-
-
Committed by Fabian Frederick
An unsigned long value is never < 0. Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Link: http://lkml.kernel.org/r/1402341618-10674-1-git-send-email-fabf@skynet.be Signed-off-by: Borislav Petkov <bp@suse.de>
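A short sketch of why such a check is dead code. The function below is hypothetical and not taken from the patched file; it only shows that a negative input converted to unsigned long becomes a very large value, so an upper-bound test is the check that actually rejects it.

```c
#include <stdbool.h>

/*
 * Hypothetical range check. There is no 'value < 0' test because that
 * comparison is always false for an unsigned type; a value that started
 * out negative wraps to a huge number and fails the upper bound instead.
 */
static bool value_in_range(unsigned long value, unsigned long max)
{
    return value <= max;
}
```

For example, value_in_range((unsigned long)-1, 100) is false because the converted value is the largest unsigned long.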
-
Committed by Chen, Gong
To avoid confusion and conflicting usage of RAS-related trace events, add a unified RAS trace event stub. Start a RAS subsystem menu which will be fleshed out in time, when more features get added to it. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
-
- 31 May, 2014 1 commit
-
-
Committed by Yinghai Lu
Assign PCI resources before pci_bus_add_device(). The resources must be assigned before a driver can claim the device. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 30 May, 2014 1 commit
-
-
Committed by Yijing Wang
pci_bus_add_device() always returns 0, so there's no point in returning anything at all. Make it a void function and remove the tests of the return value from the callers. [bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 09 May, 2014 2 commits
-
-
Committed by Loc Ho
The MC structure field scrub_mode is of integer type, not a bit field. Use it accordingly. Signed-off-by: Loc Ho <lho@apm.com> Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
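The distinction matters because a bitwise test on a sequentially numbered value gives false positives. The sketch below uses a hypothetical, reduced enum with sequential values; the helper names are invented for illustration and do not come from the EDAC core.

```c
#include <stdbool.h>

/* Hypothetical reduced enum: values are sequential, not one bit each. */
enum scrub_type {
    SCRUB_NONE = 1,
    SCRUB_SW_PROG = 2,
    SCRUB_SW_SRC = 3
};

/* Wrong: treats the mode as a bit mask, so other modes can match too. */
static bool is_sw_src_bitwise(enum scrub_type mode)
{
    return (mode & SCRUB_SW_SRC) != 0;
}

/* Right: the mode is a plain integer value, so compare for equality. */
static bool is_sw_src(enum scrub_type mode)
{
    return mode == SCRUB_SW_SRC;
}
```

Here is_sw_src_bitwise(SCRUB_NONE) is true because 1 & 3 is nonzero, while is_sw_src(SCRUB_NONE) correctly returns false.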
-
Committed by Borislav Petkov
295d8cda ("EDAC, MCE, AMD: Drop local coreid reporting") removed the code snippet which used that mask but forgot to drop the mask itself. Do that now. Signed-off-by: Borislav Petkov <bp@suse.de>
-
- 01 Apr, 2014 2 commits
-
-
Committed by Daniel Walker
This adds an ad-hoc error injection method. Octeon II doesn't have hardware support for injection, so this simulates it. Signed-off-by: Daniel Walker <dwalker@fifo99.com> Cc: David Daney <david.daney@cavium.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-edac@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/5873/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
-
Committed by Daniel Walker
If opstate_init() isn't called, the driver won't start properly. I just added it in what appears to be an appropriate place. Signed-off-by: Daniel Walker <dwalker@fifo99.com> Cc: David Daney <david.daney@cavium.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-edac@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/5872/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
-
- 13 Mar, 2014 2 commits
-
-
Committed by Aristeu Rozanski
Since the driver is decoding the MCE, it's useless to have these messages printed unless you're debugging a problem in the driver. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-
Committed by Aristeu Rozanski
Corrected Errors are MC events, not exceptions, and reporting them as the latter might confuse users. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
-