Commits · f3acb96f38bb16057e98f862e70e56ca3588ef54 · openanolis / cloud-kernel

08 Jun, 2013 2 commits

EDAC, MCE, AMD: Add an MCE signature for new Fam15h models · aad19e51

Authored by Aravind Gopalakrishnan on Jun 05, 2013

Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>

aad19e51

EDAC: Replace strict_strtoul() with kstrtoul() · c7f62fc8

Authored by Jingoo Han on Jun 01, 2013

The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

c7f62fc8
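For readers unfamiliar with the strict-parsing contract behind this change, here is a minimal userspace sketch. It is not the kernel implementation: `parse_ulong_strict` is a hypothetical helper built on the C library's `strtoul()`, and it only approximates the behaviour this kind of helper is expected to have (reject trailing garbage, tolerate a single trailing newline, report overflow).

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace helper (an assumption, not kernel code):
 * returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int parse_ulong_strict(const char *s, unsigned int base,
			      unsigned long *res)
{
	char *end;
	unsigned long val;

	/* strtoul() would silently accept leading blanks and a minus sign. */
	if (s == NULL || *s == '\0' || *s == '-' ||
	    isspace((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	val = strtoul(s, &end, (int)base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s)
		return -EINVAL;		/* no digits were consumed */

	/* Tolerate one trailing newline, as writes from "echo" carry one. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	*res = val;
	return 0;
}
```

The point of the strict variant is that input such as `42abc` is rejected outright instead of being parsed as 42.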

04 Jun, 2013 1 commit

Finally eradicate CONFIG_HOTPLUG · 40b31360

Authored by Stephen Rothwell on May 21, 2013

Ever since commit 45f035ab ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off.  Remove all the remaining references to it.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

40b31360

21 May, 2013 1 commit

amd64_edac: Fix bogus sysfs file permissions · bbb013b9

Authored by Borislav Petkov on May 12, 2013

Fix yet another issue caught by 8f46baaa ("base: core: WARN() about
bogus permissions on device attributes").

Signed-off-by: Borislav Petkov <bp@suse.de>

bbb013b9

09 May, 2013 1 commit

EDAC: Don't give write permission to read-only files · c8c64d16

Authored by Srivatsa S. Bhat on Apr 30, 2013

I get the following warning on boot:

------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name:  -[8737R2A]-
Write permission without 'store'
...
</snip>

Drilling down, this is related to dynamic channel ce_count attribute
files sporting an S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>

c8c64d16

29 Apr, 2013 2 commits

edac: sb_edac.c should not require prescence of IMC_DDRIO device · de4772c6

Authored by Luck, Tony on Mar 28, 2013

The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers.  While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.

Make the check for this device optional. If it isn't present,
just report the memory type as "MEM_UNKNOWN".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

de4772c6

i7300_edac: Fix memory detection in single mode · 33ad4126

Authored by Mauro Carvalho Chehab on Mar 13, 2013

When the machine is on single mode, only branch 0 channel 0
is valid. However, the code is not honouring it:

[ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
...
[ 1952.639351] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH0 = 0x1:
[ 1952.639353] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH1 = 0x0:
[ 1952.639355] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH2 = 0x0:
[ 1952.639358] EDAC DEBUG: i7300_init_csrows: 		AMB-present CH3 = 0x0:
...
[ 1952.639360] EDAC DEBUG: decode_mtr: 	MTR0 CH0: DIMMs are Present (mtr)
[ 1952.639362] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639363] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639364] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639366] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639367] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639368] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639370] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639371] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639373] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
[ 1952.639374] EDAC DEBUG: decode_mtr: 	MTR0 CH1: DIMMs are Present (mtr)
[ 1952.639376] EDAC DEBUG: decode_mtr: 		WIDTH: x8
[ 1952.639377] EDAC DEBUG: decode_mtr: 		ELECTRICAL THROTTLING is enabled
[ 1952.639379] EDAC DEBUG: decode_mtr: 		NUMBANK: 4 bank(s)
[ 1952.639380] EDAC DEBUG: decode_mtr: 		NUMRANK: single
[ 1952.639381] EDAC DEBUG: decode_mtr: 		NUMROW: 16,384 - 14 rows
[ 1952.639383] EDAC DEBUG: decode_mtr: 		NUMCOL: 1,024 - 10 columns
[ 1952.639384] EDAC DEBUG: decode_mtr: 		SIZE: 512 MB
[ 1952.639385] EDAC DEBUG: decode_mtr: 		ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639387] EDAC DEBUG: decode_mtr: 		Scrub algorithm for x8 is on enhanced mode
...
[ 1952.639449] EDAC DEBUG: print_dimm_size:               channel 0 | channel 1 | channel 2 | channel 3 |
[ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
[ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0   512 MB   |  512 MB   |    0 MB   |    0 MB   |
[ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7     0 MB   |    0 MB   |    0 MB   |    0 MB   |
[ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------

Instead of detecting a single memory at channel 0, it is showing
twice the memory.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

33ad4126

19 Apr, 2013 1 commit

amd64_edac: Add Family 16h support · 94c1acf2

Authored by Aravind Gopalakrishnan on Apr 17, 2013

Add code to handle DRAM ECC errors decoding for Fam16h.

Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>

94c1acf2

25 Mar, 2013 1 commit

EDAC, mc_sysfs.c: Fix string array pointer types · 8b7719e0

Authored by Borislav Petkov on Mar 25, 2013

Those should be const ptr to a const string, fix them.

Signed-off-by: Borislav Petkov <bp@suse.de>

8b7719e0
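As a reminder of what "const ptr to a const string" means in C, here is a small sketch. The table contents and the names `demo_mem_types` and `demo_mem_type_name` are made up for illustration and are not taken from mc_sysfs.c.

```c
#include <stddef.h>

/*
 * The first const protects the characters of each string, the second
 * protects the pointer slots of the array, so the whole table can live
 * in read-only data. Entries are hypothetical.
 */
static const char * const demo_mem_types[] = {
	"Unknown",
	"Registered",
	"Unbuffered",
};

/* Bounds-checked lookup; out-of-range indexes map to a fixed string. */
static const char *demo_mem_type_name(size_t idx)
{
	size_t n = sizeof(demo_mem_types) / sizeof(demo_mem_types[0]);

	return idx < n ? demo_mem_types[idx] : "Invalid";
}
```

With only `const char *names[]`, code could still reassign `names[0]`; the second `const` turns that into a compile-time error.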

16 Mar, 2013 2 commits

EDAC: Merge mci.mem_is_per_rank with mci.csbased · 9713faec

Authored by Mauro Carvalho Chehab on Mar 11, 2013

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
		per_rank = true;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

9713faec
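The detection described in the commit message above amounts to a scan over the layer descriptions. A self-contained sketch follows, using hypothetical `demo_*` types in place of the EDAC core's structures; only the idea (any chip-select layer means the controller is csrows based) comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layer kinds, standing in for the EDAC layer types. */
enum demo_layer_type {
	DEMO_LAYER_BRANCH,
	DEMO_LAYER_CHANNEL,
	DEMO_LAYER_SLOT,
	DEMO_LAYER_CHIP_SELECT,
};

struct demo_layer {
	enum demo_layer_type type;
	unsigned int size;
};

/* True if any layer of the memory hierarchy is a chip-select layer. */
static bool demo_is_csrow_based(const struct demo_layer *layers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (layers[i].type == DEMO_LAYER_CHIP_SELECT)
			return true;
	return false;
}
```

Because the answer is derivable from the layer list, drivers would not need to set a separate flag, which is the rationale the commit gives for merging the two fields.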

amd64_edac: Correct DIMM sizes · 1eef1282

Authored by Mauro Carvalho Chehab on Mar 11, 2013

We were filling the csrow size with a wrong value. 16a528ee ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>

1eef1282

05 Mar, 2013 1 commit

EDAC: Make sysfs functions static · fbe2d361

Authored by Stephen Hemminger on Feb 21, 2013

Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

fbe2d361

26 Feb, 2013 9 commits

i5100_edac: convert to use simple_open() · b0769891

Authored by Wei Yongjun on Feb 26, 2013

This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

b0769891

ghes_edac: fix to use list_for_each_entry_safe() when delete list items · 5dae92a7

Authored by Wei Yongjun on Feb 26, 2013

Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5dae92a7
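The reason a "safe" iterator is needed when deleting can be shown with a plain singly linked list. This is a userspace sketch with hypothetical `demo_*` names, not the kernel's list.h macros; it demonstrates the same idea of caching the next pointer before the current node is released.

```c
#include <stdlib.h>

struct demo_node {
	int value;
	struct demo_node *next;
};

/* Push a new node at the head; on allocation failure the list is unchanged. */
static struct demo_node *demo_push(struct demo_node *head, int value)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (n == NULL)
		return head;
	n->value = value;
	n->next = head;
	return n;
}

/* Free every node and return how many were released. */
static size_t demo_free_all(struct demo_node **head)
{
	struct demo_node *pos, *tmp;
	size_t freed = 0;

	for (pos = *head; pos != NULL; pos = tmp) {
		tmp = pos->next;	/* read the link before pos is freed */
		free(pos);
		freed++;
	}
	*head = NULL;
	return freed;
}
```

Writing the loop step as `pos = pos->next` instead would read from memory that has already been freed, which is the class of bug the commit addresses.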

ghes_edac: Fix RAS tracing · 8ae8f50a

Authored by Mauro Carvalho Chehab on Feb 19, 2013

With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location is kept untouched, in case we can, at some point in
the future, actually map the error onto the DIMM labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

8ae8f50a

ghes_edac: Make it compliant with UEFI spec 2.3.1 · 689c9cd8

Authored by Mauro Carvalho Chehab on Feb 19, 2013

The UEFI spec defines the memory error types and the bits that
validate each field on the memory error record, in
Appendix N, items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

689c9cd8

ghes_edac: Improve driver's printk messages · d2a68566

Authored by Mauro Carvalho Chehab on Feb 15, 2013

Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide warning about the risk of relying on this
	  driver.

While here, change the size of the fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Conflicts:
	drivers/edac/ghes_edac.c

d2a68566

ghes_edac: Don't credit the same memory dimm twice · 5ee726db

Authored by Mauro Carvalho Chehab on Feb 15, 2013

In my tests on a system with four E5-4650 CPUs, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call scans all the DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMs to the right memory
controller, just put everything on memory controller 0 (which should
always exist).

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5ee726db

ghes_edac: do a better job of filling EDAC DIMM info · 32fa1f53

Authored by Mauro Carvalho Chehab on Feb 14, 2013

Instead of just faking a random value for the DIMM data, get
the information that is available via the DMI table.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

32fa1f53

ghes_edac: add support for reporting errors via EDAC · f04c62a7

Authored by Mauro Carvalho Chehab on Feb 15, 2013

Now that the EDAC core is capable of just forwarding the errors via
the userspace API, add a report mechanism for the GHES errors.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

f04c62a7

ghes_edac: Register at EDAC core the BIOS report · 77c5f5d2

Authored by Mauro Carvalho Chehab on Feb 15, 2013

Register GHES at EDAC MC core, in order to prevent other
drivers from also handling errors and mangling the error data.

The EDAC core will guarantee that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.

For now, the EDAC driver does nothing but register at the
EDAC core, preventing the hardware-driven mechanism from
interfering with GHES.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

77c5f5d2

22 Feb, 2013 2 commits

edac: add support for raw error reports · e7e24830

Authored by Mauro Carvalho Chehab on Oct 31, 2012

That allows APEI GHES driver to report errors directly, using
the EDAC error report API.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7e24830

edac: reduce stack pressure by using a pre-allocated buffer · c7ef7645

Authored by Mauro Carvalho Chehab on Feb 21, 2013

The number of variables on the stack is too big.
Reduce the stack usage by using a pre-allocated error
buffer.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c7ef7645

21 Feb, 2013 11 commits

edac: lock module owner to avoid error report conflicts · 80cc7d87

Authored by Mauro Carvalho Chehab on Oct 31, 2012

APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
	"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
   the EDAC core, as it is driver's responsibility to number
   the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
   doing it, as one will interfere with the other.

So, we need to add a module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way of doing this lock seems
to be using the driver's name, as this is unique, and won't require
changes on every driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

80cc7d87
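The "there can be only one" rule keyed on the driver name might look roughly like the sketch below. It is an illustration only: `demo_owner`, `demo_claim_owner` and `demo_release_owner` are hypothetical names, the `-EBUSY` return value is an assumption, and real code would also need proper locking around the shared state, which this single-threaded sketch leaves out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that currently owns error reporting; NULL if none. */
static const char *demo_owner;

/*
 * The first module to register becomes the owner. The same module may
 * register again (for example for a second memory controller); any other
 * module is refused.
 */
static int demo_claim_owner(const char *mod_name)
{
	if (demo_owner != NULL && strcmp(demo_owner, mod_name) != 0)
		return -EBUSY;
	demo_owner = mod_name;
	return 0;
}

/* Only the current owner can give ownership up. */
static void demo_release_owner(const char *mod_name)
{
	if (demo_owner != NULL && strcmp(demo_owner, mod_name) == 0)
		demo_owner = NULL;
}
```

Keying on the name means individual drivers do not have to change: the core can compare the name each driver already supplies when it registers.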

edac: add a new memory layer type · c66b5a79

Authored by Mauro Carvalho Chehab on Feb 15, 2013

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c66b5a79

edac: initialize the core earlier · 4ab19b06

Authored by Mauro Carvalho Chehab on Feb 15, 2013

In order for it to work when built in, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() is called:

...
[    4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[    4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[    6.519495] EDAC MC: Ver: 3.0.0
[    6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

The net result is that no EDAC sysfs nodes will appear.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

4ab19b06

edac: better report error conditions in debug mode · 3d958823

Authored by Mauro Carvalho Chehab on Feb 15, 2013

It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

3d958823

i5100_edac: Remove two checkpatch warnings · 59b9796d

Authored by Mauro Carvalho Chehab on Feb 21, 2013

The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+       if (priv->debugfs)
+               debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+       if (i5100_debugfs)
+               debugfs_remove(i5100_debugfs);

Get rid of them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

59b9796d

i5100_edac: connect fault injection to debugfs node · 9cbc6d38

Authored by Niklas Söderlund on Aug 08, 2012

Create a debugfs directory i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is performed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel,
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is necessary to enable the injection more than once (echo to
the inject_enable node) for the injection to happen; I am not sure why.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

9cbc6d38

i5100_edac: add fault injection code · 53ceafd6

Authored by Niklas Söderlund on Aug 08, 2012

Add fault injection based on information from the i5100 datasheet,
see [1]. In addition to the i5100 datasheet, some missing information on
injection functions was found through experimentation and the i7300
datasheet, see [2].

[1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset MemoryController Hub (MCH)
    Doc.Nr: 318082
    http://www.intel.com/assets/pdf/datasheet/318082.pdf

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

53ceafd6

i5100_edac: probe for device 19 function 0 · 52608ba2

Authored by Niklas Söderlund on Aug 08, 2012

Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

52608ba2

edac: only create sdram_scrub_rate where supported · e7100478

Authored by Mauro Carvalho Chehab on Feb 19, 2013

Currently, the sdram_scrub_rate sysfs node is created even if the device
doesn't support getting/setting the scrub rate. Change the logic to only
create this device node when the operation is supported.

Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7100478

i3200_edac: Fix the logic that detects filled memories · 61734e18

Authored by Mauro Carvalho Chehab on Jan 10, 2013

After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.

It seems to be partially caused by the BIOS and partially by the driver.

The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.

However, the BIOS on this particular machine doesn't do it:

[   16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[   16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

So, the driver was assuming that 2 channels are enabled (well, they are,
but the second is unused).

Combined with that, I found two issues in the logic that creates the
EDAC data, which were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).

The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, does some calculation with that.

The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.

I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:

[   16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[   16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[   16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[   16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[   16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[   16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

and the corresponding sysfs nodes are now properly filled.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

61734e18

i3200_edac: Add more debug to the driver · 5f466cb0

Authored by Mauro Carvalho Chehab on Jan 10, 2013

Currently, it is not possible to know, when debug is enabled,
whether the driver is using 2 DIMMs per channel mode or not. Nor is
it possible to know the values of the drbs registers, used
to identify the memory rank sizes.

Add debug for both, as it helps to track issues in the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5f466cb0

10 Feb, 2013 1 commit

mpc85xx_edac: Fix typo · e7d2c215

Authored by Baruch Siach on Feb 10, 2013

Correct typos.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

e7d2c215

30 Jan, 2013 2 commits

EDAC: Fix kcalloc argument order · d3d09e18

Authored by Joe Perches on Jan 26, 2013

First number, then size.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>

d3d09e18
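The same "first number, then size" convention applies to the C library's `calloc()`, which makes for an easy userspace illustration. The `demo_counter` structure and `demo_alloc_counters` helper are hypothetical and only serve to show the argument order.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-DIMM error counters. */
struct demo_counter {
	uint32_t ce;	/* corrected errors */
	uint32_t ue;	/* uncorrected errors */
};

/* Allocate a zeroed array of 'count' counters. */
static struct demo_counter *demo_alloc_counters(size_t count)
{
	/* Element count first, element size second. */
	return calloc(count, sizeof(struct demo_counter));
}
```

The number of bytes requested is the product of the two arguments either way, so swapping them is mostly a readability and tooling problem, but keeping to the documented order avoids confusing reviewers and static checkers.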

EDAC: Test correct variable in ->store function · 8024c4c0

Authored by Dan Carpenter on Jan 26, 2013

We're testing for ->show but calling ->store().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Borislav Petkov <bp@suse.de>

8024c4c0
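The bug pattern named in this commit (checking one callback, then calling another) is easy to reproduce outside the kernel. In the sketch below all names are hypothetical, and the `-EIO` return for a missing callback is an assumption chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute with optional read and write callbacks. */
struct demo_dev_attr {
	int (*show)(char *buf, size_t len);
	int (*store)(const char *buf, size_t len);
};

/* Example callbacks used to build read-only and read-write attributes. */
static int demo_show_nothing(char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

static int demo_store_len(const char *buf, size_t len)
{
	(void)buf;
	return (int)len;
}

/* Dispatch a write: test the callback that is actually about to be called. */
static int demo_attr_store(const struct demo_dev_attr *a,
			   const char *buf, size_t len)
{
	if (a->store == NULL)
		return -EIO;
	return a->store(buf, len);
}
```

Had the guard tested `a->show` instead, a read-only attribute (show set, store NULL) would pass the check and then call through a NULL pointer.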

23 Jan, 2013 3 commits

EDAC, MCE, AMD: Remove unneeded exports · 0f08669e

Authored by Borislav Petkov on Dec 23, 2012

Initially, those strings describing different parts of an MCE message
were shared with amd64_edac and were therefore exported to modules.
However, all except pp_msgs are used only in one place right now so hide
them and make them static.

No functionality change.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

0f08669e

EDAC, MCE, AMD: Add MCE decoding support for Family 16h · 980eec8b

Authored by Jacob Shin on Dec 18, 2012

Add MCE decoding logic for AMD Family 16h processors.

Boris:

- drop unneeded uu_msgs export
- exit early in cat_mc1_mce and save us an indentation level

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

980eec8b

EDAC, MCE, AMD: Make MC2 decoding per-family · 4a73d3de

Authored by Jacob Shin on Dec 18, 2012

Currently only AMD Family 15h processors have special handling for MC2
errors. Since upcoming Family 16h will also need unique handling, let's
make MC2 handling part of amd_decoder_ops.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>

4a73d3de
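Moving a family-specific handler into an ops table, as this commit describes, follows a common C pattern. The sketch below is not the amd_decoder_ops definition: the `demo_*` names, the handler signature and the returned tags are all assumptions, and the handlers deliberately do no real decoding.

```c
#include <stddef.h>

/* Tag identifying which handler ran; stands in for real decoding output. */
enum demo_handler {
	DEMO_GENERIC,
	DEMO_F15H,
	DEMO_F16H,
};

/* Hypothetical per-family operations table. */
struct demo_decoder_ops {
	enum demo_handler (*mc2_mce)(unsigned int ec);
};

static enum demo_handler demo_generic_mc2(unsigned int ec)
{
	(void)ec;
	return DEMO_GENERIC;
}

static enum demo_handler demo_f15h_mc2(unsigned int ec)
{
	(void)ec;
	return DEMO_F15H;
}

static enum demo_handler demo_f16h_mc2(unsigned int ec)
{
	(void)ec;
	return DEMO_F16H;
}

/* Select the ops table once, based on the CPU family. */
static const struct demo_decoder_ops *demo_pick_ops(unsigned int family)
{
	static const struct demo_decoder_ops generic = { demo_generic_mc2 };
	static const struct demo_decoder_ops f15h = { demo_f15h_mc2 };
	static const struct demo_decoder_ops f16h = { demo_f16h_mc2 };

	switch (family) {
	case 0x15:
		return &f15h;
	case 0x16:
		return &f16h;
	default:
		return &generic;
	}
}
```

With the handler behind a function pointer, the decode path can call `ops->mc2_mce(ec)` without any family checks of its own, and adding a new family means adding one table entry.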
