Commits · 1446f17adfebc32055a892c23da0b96d43e3d57b · openeuler / Kernel

05 Nov, 2013 · 5 commits

edac, highbank: Moving error injection to sysfs for edac · 78cfbf0b

Committed by Robert Richter on Oct 29, 2013

Always have the error injection i/f available, even if there is no
debugfs or EDAC_DEBUG enabled. We need this for testing production
kernels and environments.

Thus, the entry moves from:

 /sys/kernel/debug/edac/mc0/inject_ctrl

to:

 /sys/devices/system/edac/mc/mc0/inject_ctrl

No other changes to the interface.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>

78cfbf0b
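
For test scripts that have to run on kernels both before and after this change, a minimal Python sketch of locating the injection file could look like the following. The two paths are the ones quoted in the commit message; the `find_inject_ctrl` helper, its `exists` parameter and the `{mc}` placeholder for other controller indexes are assumptions made for illustration.

```python
import os

# Paths quoted in the commit message. The {mc} placeholder generalizes "mc0"
# and is an assumption of this sketch.
SYSFS_FMT = "/sys/devices/system/edac/mc/mc{mc}/inject_ctrl"
DEBUGFS_FMT = "/sys/kernel/debug/edac/mc{mc}/inject_ctrl"


def find_inject_ctrl(mc=0, exists=os.path.exists):
    """Return the first inject_ctrl path that exists, or None.

    The sysfs location (new) is tried before the debugfs location (old).
    `exists` is injectable so the lookup can be exercised without hardware.
    """
    for fmt in (SYSFS_FMT, DEBUGFS_FMT):
        path = fmt.format(mc=mc)
        if exists(path):
            return path
    return None
```

Trying sysfs first means the script keeps working on production kernels built without debugfs or EDAC_DEBUG, which is the case the commit targets.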

edac: Unify reporting of device info for device, mc and pci · 7270a608

Committed by Robert Richter on Oct 10, 2013

Log messages differ slightly between the EDAC subsystems. Unify them.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>

7270a608

edac, highbank: Improve and unify naming · 41ec0e8d

Committed by Robert Richter on Oct 10, 2013

Assign correct names to the 'hb_mc_edac' and 'hb_l2_edac' EDAC
modules for module, controller and device. Reported values for
Highbank in dmesg are now:

 EDAC MC0: Giving out device to module hb_mc_edac controller
 calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT)

 EDAC DEVICE0: Giving out device to module hb_l2_edac controller
 calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT)
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>

41ec0e8d

edac, highbank: Add Calxeda ECX-2000 support · 0ec8579e

Committed by Robert Richter on Oct 10, 2013

Implement EDAC support for Calxeda ECX-2000.

The ECX-2000 memory controller is similar to Highbank but has
different register bases for error and interrupt registers. It has
its own device tree name, "calxeda,ecx-2000-ddr-ctrl", for identification
and initialization of the ECX-2000 and its base addresses.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>

0ec8579e

edac, highbank: Fix interrupt setup of mem and l2 controller · a72b8859

Committed by Robert Richter on Oct 10, 2013

Register and enable interrupts after the EDAC registration. Otherwise
incoming ECC error interrupts lead to crashes during device setup.

Fix this in the drivers for mc and l2.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: stable <stable@vger.kernel.org> # 3.6+
Signed-off-by: Robert Richter <rric@kernel.org>

a72b8859

27 Aug, 2013 · 2 commits

amd64_edac: Fix incorrect wraparounds · 4fc06b31

Committed by Aravind Gopalakrishnan on Aug 24, 2013

dct_base and dct_limit hold 32-bit register values read from their
respective PCI config space registers. A left shift beyond 32 bits
will cause them to wrap around. The same applies to chan_addr, as can be
seen from the bug report (link below). The patch rectifies this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
a properly shifted sys_addr in order to compare the correct bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>

4fc06b31
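
The wraparound described above can be modelled in a few lines of Python by masking results to the register width. This is only a sketch of the arithmetic, not the driver code: the register value `0x123`, the shift amount `27` and all helper names are illustrative assumptions.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def shift_as_u32(value, shift):
    """Left shift whose result is kept in 32 bits (high bits are lost)."""
    return (value << shift) & U32_MASK


def shift_as_u64(value, shift):
    """The same shift after the operand was widened to 64 bits."""
    return (value << shift) & U64_MASK


def in_range(sys_addr, base, limit, shift):
    """Compare against a right-shifted address instead of shifting base/limit left."""
    return base <= (sys_addr >> shift) <= limit


base = 0x123   # illustrative 32-bit register value
shift = 27     # illustrative shift amount

wrapped = shift_as_u32(base, shift)   # upper bits fall off the 32-bit result
widened = shift_as_u64(base, shift)   # full value is preserved
```

With these illustrative numbers the 32-bit result is `0x18000000` while the widened result is `0x918000000`. Shifting the 64-bit address right and comparing it with the unshifted 32-bit base and limit avoids the problematic left shift altogether.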

amd64_edac: Correct erratum 505 range · 3f0aba4f

Committed by Borislav Petkov on Aug 24, 2013

Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.

Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>

3f0aba4f

14 Aug, 2013 · 1 commit

cpc925_edac: Use proper array termination · 75a9551f

Committed by Jingoo Han on Aug 12, 2013

The struct should be terminated by using empty braces in order to
fix the following sparse warning.

drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>

75a9551f

12 Aug, 2013 · 2 commits

amd64_edac: Get rid of boot_cpu_data accesses · a4b4bedc

Committed by Borislav Petkov on Aug 10, 2013

Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.

No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>

a4b4bedc

amd64_edac: Add ECC decoding support for newer F15h models · 18b94f66

Committed by Aravind Gopalakrishnan on Aug 09, 2013

On newer models, support has been included for up to 4 DCTs; however,
only DCT0 and DCT3 are currently configured (cf. BKDG Section 2.10).
Also, the algorithm for routing DRAM requests is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.

Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.

While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:

      * Erratum 505: fixed in model 0x1, stepping 0x1 and later.
      * Erratum 637: not fixed.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>

18b94f66

09 Aug, 2013 · 2 commits

x38_edac: Make a local function static · e0d391ab

Committed by Jingoo Han on Aug 09, 2013

Make a local function static in order to fix the following sparse
warning:

drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>

e0d391ab

i3200_edac: Make a local function static · 166e9334

Committed by Jingoo Han on Aug 09, 2013

This local symbol is used only in this file.
Fix the following sparse warning:

drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

166e9334

29 Jul, 2013 · 1 commit

amd64_edac: Fix single-channel setups · f0a56c48

Committed by Borislav Petkov on Jul 23, 2013

It can happen that configurations are running in single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on one channel and leaving the other empty. This causes a
problem in init_csrows, which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present, which it is not.

So always allocate two channels unconditionally.

This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.
Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

f0a56c48

24 Jul, 2013 · 2 commits

EDAC: Replace strict_strtol() with kstrtol() · c542b53d

Committed by Jingoo Han on Jul 19, 2013

The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

c542b53d

EDAC: Fix lockdep splat · 88d84ac9

Committed by Borislav Petkov on Jul 19, 2013

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
  dump_stack
  warn_slowpath_common
  warn_slowpath_fmt
  lockdep_init_map
  ? trace_hardirqs_on_caller
  ? trace_hardirqs_on
  debug_mutex_init
  __mutex_init
  bus_register
  edac_create_sysfs_mci_device
  edac_mc_add_mc
  sbridge_probe
  pci_device_probe
  driver_probe_device
  __driver_attach
  ? driver_probe_device
  bus_for_each_dev
  driver_attach
  bus_add_driver
  driver_register
  __pci_register_driver
  ? 0xffffffffa0010fff
  sbridge_init
  ? 0xffffffffa0010fff
  do_one_initcall
  load_module
  ? unset_module_init_ro_nx
  SyS_init_module
  tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register needs a statically allocated lock_key
because the latter is handed to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>

88d84ac9

18 Jul, 2013 · 1 commit

edac: Remove redundant platform_set_drvdata() · 8e42e211

Committed by Sachin Kamat on May 03, 2013

Commit 0998d063 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set the driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>

8e42e211

11 Jun, 2013 · 1 commit

MIPS: OCTEON: Rename Kconfig CAVIUM_OCTEON_REFERENCE_BOARD to CAVIUM_OCTEON_SOC · 9ddebc46

Committed by David Daney on May 22, 2013

Use CAVIUM_OCTEON_SOC in most places where we used to use
CPU_CAVIUM_OCTEON. This allows us to use CPU_CAVIUM_OCTEON in places
where we have no OCTEON SOC.

Remove CAVIUM_OCTEON_SIMULATOR as it doesn't really do anything; we can
get the same configuration with CAVIUM_OCTEON_SOC.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-ide@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-i2c@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: spi-devel-general@lists.sourceforge.net
Cc: devel@driverdev.osuosl.org
Cc: linux-usb@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Patchwork: https://patchwork.linux-mips.org/patch/5295/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

9ddebc46

08 Jun, 2013 · 2 commits

EDAC, MCE, AMD: Add an MCE signature for new Fam15h models · aad19e51

Committed by Aravind Gopalakrishnan on Jun 05, 2013

Add a new error signature for Family 15h, models 30h-3fh. The patch has
been tested on Fam15h using the mce_amd_inj facility and has been
verified to work correctly.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>

aad19e51

EDAC: Replace strict_strtoul() with kstrtoul() · c7f62fc8

Committed by Jingoo Han on Jun 01, 2013

The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

c7f62fc8

04 Jun, 2013 · 1 commit

Finally eradicate CONFIG_HOTPLUG · 40b31360

Committed by Stephen Rothwell on May 21, 2013

Ever since commit 45f035ab ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

40b31360

21 May, 2013 · 1 commit

amd64_edac: Fix bogus sysfs file permissions · bbb013b9

Committed by Borislav Petkov on May 12, 2013

Fix yet another issue caught by 8f46baaa ("base: core: WARN() about
bogus permissions on device attributes").
Signed-off-by: Borislav Petkov <bp@suse.de>

bbb013b9

09 May, 2013 · 1 commit

EDAC: Don't give write permission to read-only files · c8c64d16

Committed by Srivatsa S. Bhat on Apr 30, 2013

I get the following warning on boot:

------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name:  -[8737R2A]-
Write permission without 'store'
...
</snip>

Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>

c8c64d16
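
The condition behind the "Write permission without 'store'" warning can be modelled as a simple predicate over the attribute's mode bits and its store callback. The Python sketch below is an assumption-laden model for illustration, not the kernel check itself; `write_without_store` and its parameters are hypothetical names.

```python
import stat

# Any of the user/group/other write bits.
ANY_WRITE = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def write_without_store(mode, store):
    """True when an attribute is marked writable but has no store callback.

    `mode` is the file mode of the attribute and `store` is the callback
    that would handle writes (None if the attribute is read-only).
    """
    return bool(mode & ANY_WRITE) and store is None


# A ce_count-like attribute: readable by everyone, no store callback.
read_only_mode = 0o444
bogus_mode = read_only_mode | stat.S_IWUSR   # the mode that triggered the warning
```

Under this model, `write_without_store(bogus_mode, None)` holds while `write_without_store(read_only_mode, None)` does not, which corresponds to the fix of dropping the write bit rather than adding a store function.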

29 Apr, 2013 · 2 commits

edac: sb_edac.c should not require prescence of IMC_DDRIO device · de4772c6

Committed by Luck, Tony on Mar 28, 2013

The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers.  While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.

Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

de4772c6

i7300_edac: Fix memory detection in single mode · 33ad4126

Committed by Mauro Carvalho Chehab on Mar 13, 2013

When the machine is in single mode, only branch 0 channel 0
is valid. However, the code does not honour this:

[ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
...
[ 1952.639351] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH0 = 0x1:
[ 1952.639353] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH1 = 0x0:
[ 1952.639355] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH2 = 0x0:
[ 1952.639358] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH3 = 0x0:
...
[ 1952.639360] EDAC DEBUG: decode_mtr: 	MTR0 CH0: DIMMs are Present (mtr)
[ 1952.639362] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639363] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639364] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639366] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639367] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639368] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639370] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639371] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639373] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
[ 1952.639374] EDAC DEBUG: decode_mtr: 	MTR0 CH1: DIMMs are Present (mtr)
[ 1952.639376] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639377] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639379] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639380] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639381] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639383] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639384] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639385] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639387] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
...
[ 1952.639449] EDAC DEBUG: print_dimm_size:               channel 0 | channel 1 | channel 2 | channel 3 |
[ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
[ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0   512 MB   |  512 MB   |    0 MB   |    0 MB   |
[ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------

Instead of detecting a single memory at channel 0, it reports
twice the memory.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

33ad4126

19 Apr, 2013 · 1 commit

amd64_edac: Add Family 16h support · 94c1acf2

Committed by Aravind Gopalakrishnan on Apr 17, 2013

Add code to handle decoding of DRAM ECC errors for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>

94c1acf2

25 Mar, 2013 · 1 commit

EDAC, mc_sysfs.c: Fix string array pointer types · 8b7719e0

Committed by Borislav Petkov on Mar 25, 2013

Those should be const ptr to a const string, fix them.
Signed-off-by: Borislav Petkov <bp@suse.de>

8b7719e0

16 Mar, 2013 · 2 commits

EDAC: Merge mci.mem_is_per_rank with mci.csbased · 9713faec

Committed by Mauro Carvalho Chehab on Mar 11, 2013

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

9713faec
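
The detection quoted in the commit message (the core scans the layers and sets `per_rank` if any of them is of the chip-select type) can be restated as a small Python sketch. The `LayerType` enum and `Layer` class are stand-ins invented for this example; only the chip-select check itself comes from the commit message.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LayerType(Enum):
    # Stand-ins for the layer type constants; only CHIP_SELECT
    # (EDAC_MC_LAYER_CHIP_SELECT) is named in the commit message.
    CHANNEL = auto()
    SLOT = auto()
    CHIP_SELECT = auto()


@dataclass
class Layer:
    type: LayerType
    size: int


def is_per_rank(layers):
    """True if any layer of the memory hierarchy is of the csrow type."""
    return any(layer.type is LayerType.CHIP_SELECT for layer in layers)


csrow_based = [Layer(LayerType.CHIP_SELECT, 8), Layer(LayerType.CHANNEL, 2)]
dimm_based = [Layer(LayerType.CHANNEL, 4), Layer(LayerType.SLOT, 3)]
```

Because the answer is derived from the layer description alone, a driver does not need to set a separate flag, which is why the two fields could be merged.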

amd64_edac: Correct DIMM sizes · 1eef1282

Committed by Mauro Carvalho Chehab on Mar 11, 2013

We were filling the csrow size with a wrong value. 16a528ee ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>

1eef1282

05 Mar, 2013 · 1 commit

EDAC: Make sysfs functions static · fbe2d361

Committed by Stephen Hemminger on Feb 21, 2013

Fixes lots of sparse warnings here.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

fbe2d361

26 Feb, 2013 · 9 commits

i5100_edac: convert to use simple_open() · b0769891

Committed by Wei Yongjun on Feb 26, 2013

This removes an open-coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

b0769891

ghes_edac: fix to use list_for_each_entry_safe() when delete list items · 5dae92a7

Committed by Wei Yongjun on Feb 26, 2013

Since we will remove items from the list using list_del(), we need
to use a safe version of the list_for_each_entry() macro, aptly named
list_for_each_entry_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5dae92a7
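
Why deletion during traversal needs the "safe" variant can be shown with a toy singly linked list in Python. This is a simplified analogy, not the kernel macros: `Node`, `build`, `unlink_and_poison`, `walk_unsafe` and `walk_safe` are all hypothetical names, and clearing `next` merely stands in for the way a removed entry's links become unusable.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def build(values):
    """Build a singly linked list and return its head."""
    head = None
    for v in reversed(values):
        node = Node(v)
        node.next = head
        head = node
    return head


def unlink_and_poison(node):
    """Stand-in for deleting an entry: its link can no longer be followed."""
    node.next = None


def walk_unsafe(head, visit):
    node = head
    while node is not None:
        visit(node)
        node = node.next   # read AFTER visit() may have cleared the link


def walk_safe(head, visit):
    node = head
    while node is not None:
        nxt = node.next    # cache the successor BEFORE the entry is removed
        visit(node)
        node = nxt
```

If `visit` records the value and then calls `unlink_and_poison`, `walk_unsafe` stops after the first entry, whereas `walk_safe` still reaches every entry because it saved the successor first. In this toy model the unsafe walk merely ends early; with real poisoned pointers the outcome would be worse.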

ghes_edac: Fix RAS tracing · 8ae8f50a

Committed by Mauro Carvalho Chehab on Feb 19, 2013

With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location is kept untouched in case we can, at some point in
the future, actually map the error into the DIMM labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4-socket E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

8ae8f50a

ghes_edac: Make it compliant with UEFI spec 2.3.1 · 689c9cd8

Committed by Mauro Carvalho Chehab on Feb 19, 2013

The UEFI spec defines the memory error types and the bits that
validate each field of the memory error record in
Appendix N, items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, showing only the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4-CPU E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

689c9cd8
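
To make "only showing the valid fields" concrete, the Python sketch below decodes a validation bitmask such as the `0x000040b9` seen in the debug line of the log. The bit-to-field table is written from my understanding of the UEFI Memory Error Section and should be checked against the spec before being relied on; the field names and the `valid_fields` helper are illustrative.

```python
# (bit position, field name); an assumed layout to be verified against
# UEFI Appendix N.2.5.
VALIDATION_BITS = [
    (0, "error_status"),
    (1, "physical_address"),
    (2, "physical_address_mask"),
    (3, "node"),
    (4, "card"),
    (5, "module"),
    (6, "bank"),
    (7, "device"),
    (8, "row"),
    (9, "column"),
    (10, "bit_position"),
    (11, "requestor_id"),
    (12, "responder_id"),
    (13, "target_id"),
    (14, "error_type"),
]


def valid_fields(validation_bits):
    """Return the names of the fields whose validation bit is set."""
    return [name for bit, name in VALIDATION_BITS
            if validation_bits & (1 << bit)]


fields = valid_fields(0x000040B9)
```

Under the assumed layout, `0x40b9` selects error_status, node, card, module, device and error_type, which lines up with the fields that actually appear in the log excerpt above.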

ghes_edac: Improve driver's printk messages · d2a68566

Committed by Mauro Carvalho Chehab on Feb 15, 2013

Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide a warning about the risk of relying on this
	  driver.

While here, change the size of the fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Conflicts:
	drivers/edac/ghes_edac.c

d2a68566

ghes_edac: Don't credit the same memory dimm twice · 5ee726db

Committed by Mauro Carvalho Chehab on Feb 15, 2013

In my tests on a system with 4 x E5-4650 CPUs, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call looks up all DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMs to the right memory
controller, just put everything on memory controller 0 (which should
always exist).
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5ee726db
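
The 128 GB versus 256 GB figures follow directly from the table: every third slot (memory0, memory3, ..., memory45) holds an 8192 MB DIMM, and each DIMM was listed under both mc0 and mc1. A short Python check of that arithmetic, with variable names chosen for this example only:

```python
DIMM_SIZE_MB = 8192

# Populated slots in the table: memory0, memory3, ..., memory45.
populated_slots = list(range(0, 46, 3))

per_controller_mb = len(populated_slots) * DIMM_SIZE_MB
per_controller_gb = per_controller_mb // 1024

# Before the fix the same DIMMs were credited to both mc0 and mc1.
double_counted_gb = 2 * per_controller_gb
```

Sixteen DIMMs of 8192 MB give the machine's real 128 GB; counting them once per controller yields the 256 GB total shown above, and crediting everything to memory controller 0 removes the duplicate.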

ghes_edac: do a better job of filling EDAC DIMM info · 32fa1f53

Committed by Mauro Carvalho Chehab on Feb 14, 2013

Instead of just faking a random value for the DIMM data, get
the information that is available via the DMI table.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

32fa1f53

ghes_edac: add support for reporting errors via EDAC · f04c62a7

Committed by Mauro Carvalho Chehab on Feb 15, 2013

Now that the EDAC core is capable of just forwarding the errors via
the userspace API, add a report mechanism for the GHES errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

f04c62a7

ghes_edac: Register at EDAC core the BIOS report · 77c5f5d2

Committed by Mauro Carvalho Chehab on Feb 15, 2013

Register GHES at the EDAC MC core, in order to prevent other
drivers from also handling errors and mangling the error data.

The EDAC core will ensure that just one driver is used,
so the first one to register (BIOS first) will be the one that
reports the hardware errors.

For now, the EDAC driver does nothing but register at the
EDAC core, preventing the hardware-driven mechanism from
interfering with GHES.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

77c5f5d2

22 Feb, 2013 · 2 commits

edac: add support for raw error reports · e7e24830

Committed by Mauro Carvalho Chehab on Oct 31, 2012

This allows the APEI GHES driver to report errors directly, using
the EDAC error report API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7e24830

edac: reduce stack pressure by using a pre-allocated buffer · c7ef7645

Committed by Mauro Carvalho Chehab on Feb 21, 2013

The number of variables on the stack is too large.
Reduce the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c7ef7645

openeuler / Kernel · synced successfully more than 1 year ago
