Commits · f338d736910edf00e8426ee4322cfda585268d50 · openanolis / cloud-kernel

May 10, 2010 · 40 commits

i7core_edac: Convert UDIMM error counters into a proper sysfs group · f338d736

Committed by Mauro Carvalho Chehab on Sep 24, 2009

Instead of displaying 3 values in the same sysfs node, break them into 3
different sysfs nodes:

/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

For registered DIMMs, however, the error counters are already being
displayed at:
	/sys/devices/system/edac/mc/mc0/csrow*/ce_count

So, there's no need to add any extra sysfs nodes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

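Here is a minimal userspace sketch in C that reads one of these counters. It assumes each udimmN node holds a single decimal integer, which the commit does not state. `udimm_count_path()` and `read_udimm_count()` are hypothetical helpers and are not part of EDAC.

```c
#include <stdio.h>

/* Hypothetical helper: build the path of one per-UDIMM counter node.
 * Returns 0 on success, -1 if the path does not fit in buf. */
int udimm_count_path(char *buf, size_t len, int mc, int dimm)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/edac/mc/mc%d/all_channel_counts/udimm%d",
			 mc, dimm);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read one counter, assuming the node contains a
 * single decimal number. Returns -1 if the node is missing or unreadable. */
long read_udimm_count(int mc, int dimm)
{
	char path[128];
	long value = -1;
	FILE *f;

	if (udimm_count_path(path, sizeof(path), mc, dimm) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Returning -1 for a missing node lets a caller handle RDIMM-only machines, where these nodes are not created.
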
edac: Don't create csrow entries on instance groups · c419d921

Committed by Mauro Carvalho Chehab on Sep 24, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac: store/show methods for device groups weren't working · cc301b3a

Committed by Mauro Carvalho Chehab on Sep 24, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Add support for sysfs addrmatch group · a5538e53

Committed by Mauro Carvalho Chehab on Sep 23, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_core: Allow the creation of sysfs groups · 9fa2fc2e

Committed by Mauro Carvalho Chehab on Sep 23, 2009

Currently, all sysfs nodes are created directly under /sys/.*/mc (regex).
However, attribute groups are sometimes needed.

This patch extends edac_core to allow the creation of groups.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Avoid printing a warning when debug is disabled · 4af91889

Committed by Mauro Carvalho Chehab on Sep 24, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: We need to use list_for_each_entry_safe to avoid errors · 42538680

Committed by Mauro Carvalho Chehab on Sep 24, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: change remove module strategy · 22e6bcbd

Committed by Mauro Carvalho Chehab on Sep 05, 2009

The old module removal strategy didn't work on devices with multiple
cores because, due to the nature of Nehalem, only one PCI device is used
to open all MCs.

Also, it was based on the pdev value. However, this doesn't point to the
PCI device used at mci->dev.

So, instead, it unregisters all devices at once, deleting them from the
device list.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

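The removal strategy above deletes entries while it walks the device list. That is presumably why commit 42538680 switched to `list_for_each_entry_safe`. The userspace sketch below shows the same idea on a plain singly linked list: the successor is saved before the current node is freed. `struct mc_dev`, `mc_dev_add()` and `unregister_all()` are hypothetical, and the sketch does not reproduce the kernel list macros.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one registered memory controller device. */
struct mc_dev {
	int socket;
	struct mc_dev *next;
};

/* Push a new device at the head of the list; returns the new head
 * (the old head unchanged if allocation fails). */
struct mc_dev *mc_dev_add(struct mc_dev *head, int socket)
{
	struct mc_dev *d = malloc(sizeof(*d));

	if (!d)
		return head;
	d->socket = socket;
	d->next = head;
	return d;
}

/* Unregister every device at once and return how many were freed.
 * "next" is read before "cur" is freed, which is what the _safe
 * iterator guarantees: reading cur->next after free(cur) would be a
 * use-after-free. */
int unregister_all(struct mc_dev **head)
{
	struct mc_dev *cur = *head, *next;
	int count = 0;

	while (cur) {
		next = cur->next;
		free(cur);
		count++;
		cur = next;
	}
	*head = NULL;
	return count;
}
```
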
i7core_edac: remove static counter for max sockets · 0f062792

Committed by Mauro Carvalho Chehab on Sep 05, 2009

The number of sockets is now fully dynamic. Get rid of this obsolete
variable.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: at remove, don't remove all pci devices at once · 13d6e9b6

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Fix a bug when printing error counts with RDIMMs · d88b8507

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: a few fixes for multiple mc's · d4c27795

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: sanity check: print a warning if a mcelog is ignored · 6c6aa3af

Committed by Mauro Carvalho Chehab on Sep 05, 2009

In theory, the other memory controller should handle it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: create one mc per socket/QPI · f4742949

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Instead of creating just one memory controller, create one per socket
(i.e., per QuickPath Interconnect, QPI).

This better reflects the Nehalem architecture.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Dynamically allocate memory for PCI devices · 66607706

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Instead of using a static table that always assumes 2 CPU sockets, allocate
space dynamically for the Nehalem PCI devices.

This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.

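Here is a rough userspace sketch of replacing a fixed two-socket table with storage that grows on demand. `struct socket_devs`, `struct socket_table`, `socket_table_get()` and the `NUM_FUNCS` value are all hypothetical. The driver itself uses kernel allocators rather than `realloc()`.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_FUNCS 8 /* hypothetical number of PCI functions tracked per socket */

/* Hypothetical per-socket record of discovered PCI functions. */
struct socket_devs {
	int present[NUM_FUNCS];
};

struct socket_table {
	struct socket_devs *sockets;
	size_t count;
};

/* Return the entry for "socket", growing the table if needed.
 * Newly added entries are zeroed. Returns NULL on allocation failure,
 * leaving the existing table untouched. */
struct socket_devs *socket_table_get(struct socket_table *t, size_t socket)
{
	if (socket >= t->count) {
		size_t new_count = socket + 1;
		struct socket_devs *p = realloc(t->sockets, new_count * sizeof(*p));

		if (!p)
			return NULL;
		memset(p + t->count, 0, (new_count - t->count) * sizeof(*p));
		t->sockets = p;
		t->count = new_count;
	}
	return &t->sockets[socket];
}

/* Release the whole table. */
void socket_table_free(struct socket_table *t)
{
	free(t->sockets);
	t->sockets = NULL;
	t->count = 0;
}
```

Growing to the highest socket index seen keeps lookups a simple array index, so one memory controller per socket can be addressed directly.
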
i7core: temporary workaround to allow it to compile against 2.6.30 · a55456f3

Committed by Mauro Carvalho Chehab on Sep 05, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Improve corrected_error_counts output for RDIMM · 3a3bb4a6

Committed by Mauro Carvalho Chehab on Sep 03, 2009

Just cosmetics. Instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 1, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more compact and easier to parse.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

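Here is a small sketch of how the one-line format could be produced and parsed back. `format_rdimm_counts()` is a hypothetical helper, and the counter values are plain inputs rather than values read from hardware.

```c
#include <stdio.h>

/* Hypothetical helper: format the corrected error counts of the three
 * RDIMMs on one channel as a single line, e.g.
 * "socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0".
 * Returns the snprintf() result (the length the full line needs). */
int format_rdimm_counts(char *buf, size_t len, int socket, int channel,
			const unsigned long counts[3])
{
	return snprintf(buf, len,
			"socket %d, channel %d RDIMM0: %lu RDIMM1: %lu RDIMM2: %lu",
			socket, channel, counts[0], counts[1], counts[2]);
}
```

Because every field sits on one line, a consumer can recover all five values with a single `sscanf()` using the same format string.
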
i7core_edac: Probe on Xeons eariler · bc2d7245

Committed by Keith Mannthey on Sep 03, 2009

On the Xeon 55XX series CPUs, the PCI devices are not exposed via ACPI, so
we must explicitly probe them to make them usable as Linux PCI devices.

This moves the detection of this state to before pci_register_driver is
called. Its present position was not working on my systems: the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: Use registered memories per processor · 14d2c083

Committed by Mauro Carvalho Chehab on Sep 02, 2009

Instead of assuming that the entire machine has either registered or
unregistered memories, decide it per CPU socket.

While here, fix a bug in i7core_mce_output_error(), where m->cpu was
used directly as if it represented a socket. Instead, the proper
socket_id is given by cpu_data[m->cpu].phys_proc_id.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

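The bug fixed here is an indexing mistake: a CPU number is not a socket number. Here is a userspace sketch that uses a hypothetical hard-coded table in the role of `cpu_data[cpu].phys_proc_id`. On a live system the same value is exposed as `/sys/devices/system/cpu/cpuN/topology/physical_package_id`. `cpu_to_socket()`, `cpu_has_registered_dimms()` and the CPU layout are illustrative only.

```c
#include <stddef.h>

#define MAX_SOCKETS_SKETCH 4 /* hypothetical bound for this sketch */

/* Hypothetical CPU -> physical package table standing in for
 * cpu_data[cpu].phys_proc_id: CPUs 0-3 on socket 0, CPUs 4-7 on socket 1. */
static const int phys_proc_id[] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Per-socket flag: does this socket have registered DIMMs? */
static int is_registered[MAX_SOCKETS_SKETCH] = { 1, 0 };

/* Map a CPU number to its socket; -1 if the CPU is unknown. */
int cpu_to_socket(int cpu)
{
	if (cpu < 0 || (size_t)cpu >= sizeof(phys_proc_id) / sizeof(phys_proc_id[0]))
		return -1;
	return phys_proc_id[cpu];
}

/* Index the per-socket state by socket, not by CPU. Using
 * is_registered[cpu] directly would read the wrong entry, or run past
 * the end of the array, for most CPU numbers on this layout. */
int cpu_has_registered_dimms(int cpu)
{
	int socket = cpu_to_socket(cpu);

	if (socket < 0 || socket >= MAX_SOCKETS_SKETCH)
		return -1;
	return is_registered[socket];
}
```
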
i7core_edac: Use Device 3 function 2 to report errors with RDIMM's · b4e8f0b6

Committed by Mauro Carvalho Chehab on Sep 02, 2009

Nehalem and later chipsets provide a special device with counters for
corrected memory errors detected on registered DIMMs. This device is only
visible if registered memories are plugged in.

After this patch, on a machine fully equipped with RDIMMs, the driver will
use Device 3 function 2 to count corrected errors instead of relying on
mcelog.

For unregistered DIMMs, it keeps the old behavior, counting errors
via mcelog.

This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

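The policy described above can be summarized as a choice of counter source. This sketch encodes only the rule stated in the commit: use the Device 3 function 2 counters when registered DIMMs (and therefore that device) are present, and fall back to mcelog otherwise. `enum ce_source` and `pick_ce_source()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical names for the two corrected-error count sources. */
enum ce_source {
	CE_FROM_MCELOG,   /* count corrected errors decoded from mcelog */
	CE_FROM_DEV3_FN2, /* read the RDIMM corrected-error counters */
};

/* Choose where corrected errors are counted. Device 3 function 2 is
 * only visible when registered DIMMs are installed, so both conditions
 * are checked before using it. */
enum ce_source pick_ce_source(bool has_rdimms, bool dev3_fn2_present)
{
	if (has_rdimms && dev3_fn2_present)
		return CE_FROM_DEV3_FN2;
	return CE_FROM_MCELOG;
}
```
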
i7core_edac: Fix ecc enable shift · 61053fde

Committed by Keith Mannthey on Sep 02, 2009

From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c.

This correctly identifies the state of ECC on the machine.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

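Given the register layout stated above, the check reduces to a mask test on bit 4. In this sketch the register value is passed in rather than read from PCI configuration space, and the macro names are illustrative and not taken from the driver.

```c
#include <stdint.h>

#define MC_STATUS_OFFSET	0x4c		/* Dev 3, Fun 0, per this commit */
#define MC_STATUS_ECC_ENABLED	(1u << 4)	/* bit 4 of MC_STATUS */

/* Return 1 if the MC_STATUS value reports ECC as enabled, 0 otherwise. */
int ecc_enabled(uint32_t mc_status)
{
	return (mc_status & MC_STATUS_ECC_ENABLED) != 0;
}
```

A wrong shift in the mask makes this test look at a different bit, which is the kind of error the commit corrects.
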
i7core_edac: Print an error message if pci register fails · 3ef288a9

Committed by Mauro Carvalho Chehab on Sep 02, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: CodingSyle fixes/cleanups · b990538a

Committed by Mauro Carvalho Chehab on Aug 05, 2009

No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: fix error injection · 4157d9f5

Committed by Mauro Carvalho Chehab on Aug 05, 2009

There were two stupid error injection bugs introduced by a wrong
cut-and-paste: one in the socket store, and another in the error injection
register. The latter was causing the code not to work at all.

While here, add debug messages to show which registers are being
set when injecting an error.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: fix error codes for sysfs error injection interface · 2068def5

Committed by Mauro Carvalho Chehab on Aug 05, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: some fixes at error injection code · 276b824c

Committed by Mauro Carvalho Chehab on Jul 22, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Some cleanups at displayed info · 17cb7b0c

Committed by Mauro Carvalho Chehab on Jul 20, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: remove some uneeded noisy debug messages · 086271a0

Committed by Mauro Carvalho Chehab on Jul 18, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: add socket info at the debug msg · 3a7dde7f

Committed by Mauro Carvalho Chehab on Jul 18, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: better document i7core_get_active_channels() · ec6df24c

Committed by Mauro Carvalho Chehab on Jul 18, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: fix get_devices routine for Xeon55xx · c77720b9

Committed by Mauro Carvalho Chehab on Jul 18, 2009

i7core_get_devices() was prepared to get just the first device found of each type.
Because of that, on Xeon 55xx, only socket 1 was retrieved.

Rework i7core_get_devices() to clean it up and to properly support Xeon 55xx.

While here, fix a small typo.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: enrich error information based on memory transaction type · a639539f

Committed by Mauro Carvalho Chehab on Jul 17, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: check if the memory error is fatal or non-fatal · c5d34528

Committed by Mauro Carvalho Chehab on Jul 17, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: fix probing on Xeon55xx · 310cbb72

Committed by Mauro Carvalho Chehab on Jul 17, 2009

Xeon55xx fails to probe with this error message:

EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22

This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: some fixes at memory error parser · f237fcf2

Committed by Mauro Carvalho Chehab on Jul 15, 2009

m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improve the comments for
the Nehalem banks.

A further fix is still needed in order to get bank/rank information from
the MCA error log.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: decode mcelog error and send it via edac interface · 8a2f118e

Committed by Mauro Carvalho Chehab on Jul 15, 2009

Enrich mcelog errors by using the information encoded in the MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).

Some fixes are still needed here in order to properly fill the EDAC
fields.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

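Here is a sketch of decoding the architectural fields of IA32_MCi_STATUS as documented in the Intel SDM: bit 63 VAL, bit 61 UC, bit 58 ADDRV, and the MCA error code in bits 15:0. For memory controller errors the SDM uses the compound code `000F 0000 1MMM CCCC`, where MMM is the transaction type and CCCC the channel. The helper names are hypothetical and this is not the driver's own decoder. The commit does not say whether its fatal/non-fatal check relies on the UC bit.

```c
#include <stdint.h>

#define MCI_STATUS_VAL   (1ULL << 63) /* register contains valid data */
#define MCI_STATUS_UC    (1ULL << 61) /* uncorrected error */
#define MCI_STATUS_ADDRV (1ULL << 58) /* IA32_MCi_ADDR holds an address */

/* Names for the MMM transaction type field; values 5-7 are reserved. */
static const char *const mem_transaction[] = {
	"generic undefined request",
	"memory read error",
	"memory write error",
	"address/command error",
	"memory scrubbing error",
};

/* 1 if the MCA error code (bits 15:0) has the memory controller form
 * 000F 0000 1MMM CCCC; bit 12 (F, filtering) is ignored. */
int is_memory_controller_error(uint64_t status)
{
	uint16_t mcacod = status & 0xffff;

	return (mcacod & 0xef80) == 0x0080;
}

/* Transaction type string taken from MMM (bits 6:4). */
const char *mem_transaction_type(uint64_t status)
{
	unsigned int mmm = (status >> 4) & 0x7;

	return mmm < 5 ? mem_transaction[mmm] : "reserved";
}

/* Channel from CCCC (bits 3:0); -1 when 1111b means "not specified". */
int mem_error_channel(uint64_t status)
{
	unsigned int ch = status & 0xf;

	return ch == 0xf ? -1 : (int)ch;
}

/* 1 if the UC (uncorrected) bit is set. */
int is_uncorrected(uint64_t status)
{
	return (status & MCI_STATUS_UC) != 0;
}
```
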
i7core_edac: maps all sockets as if ther are one MC controller · ba6c5c62

Committed by Mauro Carvalho Chehab on Jul 15, 2009

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: add support for more than one MC socket · 67166af4

Committed by Mauro Carvalho Chehab on Jul 15, 2009

Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.

Currently, the driver uses up to 2 sockets, but supporting a larger
number is just a matter of increasing the MAX_SOCKETS definition.

This seems to be required to properly support Xeon 55xx.

It still needs testing with Xeon 55xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

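Here is a sketch of the socket/bus relationship. It assumes, as this entry and d1fd4fb6 suggest, that socket 0 sits on bus 255 and that each further socket uses the next lower bus (254 for socket 1 on Xeon 55xx). The one-bus-per-socket countdown beyond socket 1 is an assumption, and `socket_to_bus()` and `bus_to_socket()` are hypothetical helpers.

```c
#define MAX_SOCKETS 2	/* "up to 2 sockets", as described in this commit */
#define LAST_BUS    255	/* socket 0 is located at bus 255 */

/* Hypothetical: bus holding the MC devices of a given socket, assuming
 * one bus per socket counting down from LAST_BUS. Returns -1 for
 * sockets outside 0..MAX_SOCKETS-1. */
int socket_to_bus(int socket)
{
	if (socket < 0 || socket >= MAX_SOCKETS)
		return -1;
	return LAST_BUS - socket;
}

/* Inverse mapping; -1 if the bus is not an MC bus under this scheme. */
int bus_to_socket(int bus)
{
	int socket = LAST_BUS - bus;

	if (bus > LAST_BUS || socket >= MAX_SOCKETS)
		return -1;
	return socket;
}
```

Raising MAX_SOCKETS widens the accepted range without other changes, which matches the commit's point that more sockets only need a larger definition.
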
i7core_edac: Add a code to probe Xeon 55xx bus · d1fd4fb6

Committed by Mauro Carvalho Chehab on Jul 10, 2009

This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI bus. This should
work fine with Xeon 35xx, but on Xeon 55xx the MC devices are on buses
254 and 255, which are not properly detected by the non-legacy PCI methods.

The new detection code specifically scans buses 254 and 255 for the
Xeon 55xx devices.

This code has not been tested yet. Once it works, a further change will
be needed, since i7core is not yet ready to work with 2 sets of
MCs.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Adds write unlock to MC registers · e9bd2e73

Committed by Mauro Carvalho Chehab on Jul 09, 2009

The public Intel Xeon 5500 datasheet, volume 2, describes on page 53,
section 2.6.7, a register called MC_CFG_CONTROL that can lock/unlock the
Memory Controller configuration registers.

Add support for it, in the hope that software error injection will
work. In my tests with Xeon 35xx, something is still missing.
With a program that does sequential bit writes at dev 0.0, after unlocking
MC_CFG_CONTROL, it sometimes produces error injection (and
sometimes it just locks up my test machine).

Later, I'll try to discover by trial and error which register
solves this issue on Xeon 35xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
