Commits · a55456f3446d19853af54b64b3840312f46b6ea5 · openeuler / Kernel

May 10, 2010 · 40 commits

i7core: temporary workaround to allow it to compile against 2.6.30 · a55456f3
Authored by Mauro Carvalho Chehab on Sep 05, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Improve corrected_error_counts output for RDIMM · 3a3bb4a6

Authored by Mauro Carvalho Chehab on Sep 03, 2009

Just cosmetics. Instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more compact and easier to parse.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
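
The one-line layout described in this commit can be illustrated with a small userspace sketch. This is not the driver's code: `format_rdimm_counts`, `MAX_DIMMS`, and the buffer handling are hypothetical names chosen for illustration, and the three-slot channel is an assumption taken from the example output above.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_DIMMS 3 /* assumption: three DIMM slots per channel, as in the example */

/*
 * Format the corrected-error counters of one channel on a single line.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the buffer is too small. Hypothetical helper, not part of i7core_edac.
 */
static int format_rdimm_counts(char *buf, size_t len, int socket, int channel,
                               const unsigned int counts[MAX_DIMMS])
{
    int n = snprintf(buf, len, "socket %d, channel %d", socket, channel);

    if (n < 0 || (size_t)n >= len)
        return -1;

    for (int i = 0; i < MAX_DIMMS; i++) {
        size_t room = len - (size_t)n;
        int r = snprintf(buf + n, room, " RDIMM%d: %u", i, counts[i]);

        if (r < 0 || (size_t)r >= room)
            return -1;
        n += r;
    }
    return n;
}
```

Keeping each channel on one line, with every counter prefixed by its label, lets a script split on spaces without tracking which socket the preceding lines belonged to.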

i7core_edac: Probe on Xeons earlier · bc2d7245

Authored by Keith Mannthey on Sep 03, 2009

On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so
we must explicitly probe them to make them usable as Linux PCI devices.

This moves the detection of this state to before pci_register_driver is
called. Its present position was not working on my systems, the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: Use registered memories per processor · 14d2c083

Authored by Mauro Carvalho Chehab on Sep 02, 2009

Instead of assuming that the entire machine has either registered or
unregistered memories, determine this per CPU socket.

While here, fix a bug in i7core_mce_output_error(), where we were
using m->cpu directly as if it represented a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Use Device 3 function 2 to report errors with RDIMM's · b4e8f0b6

Authored by Mauro Carvalho Chehab on Sep 02, 2009

Nehalem and newer chipsets provide a special device that holds counters for
corrected memory errors detected on registered DIMMs. This device is only
visible if registered memories are plugged in.

After this patch, on a machine fully equipped with RDIMMs, the driver uses
Device 3 function 2 to count corrected errors instead of relying on mcelog.

For unregistered DIMMs, it keeps the old behavior, counting errors
via mcelog.

This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Fix ecc enable shift · 61053fde

Authored by Keith Mannthey on Sep 02, 2009

From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

This correctly identifies the state of ECC on the machine.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
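
As a rough illustration of the bit test described above, the sketch below extracts bit 4 from a 32-bit MC_STATUS value. The macro and function names are hypothetical; only the bit position and the register location (Dev 3, Fn 0, offset 0x4c) come from the commit message.

```c
#include <stdint.h>
#include <stdbool.h>

#define MC_STATUS_OFFSET  0x4c /* Dev 3, Fn 0, per the commit message */
#define ECC_ENABLED_SHIFT 4    /* ECC_ENABLED is bit 4 of MC_STATUS */

/* Hypothetical helper: true if the ECC_ENABLED bit is set in MC_STATUS. */
static bool ecc_enabled(uint32_t mc_status)
{
    return ((mc_status >> ECC_ENABLED_SHIFT) & 1u) != 0;
}
```

A shift that is off by one would read a neighbouring bit of the register, so the driver could report ECC as enabled or disabled regardless of the real setting.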

i7core_edac: Print an error message if pci register fails · 3ef288a9
Authored by Mauro Carvalho Chehab on Sep 02, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: CodingStyle fixes/cleanups · b990538a
Authored by Mauro Carvalho Chehab on Aug 05, 2009
```
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: fix error injection · 4157d9f5

Authored by Mauro Carvalho Chehab on Aug 05, 2009

There were two stupid error injection bugs introduced by a wrong
cut-and-paste: one at socket store, and another at the error inject
register. The latter was causing the code to not work at all.

While here, add debug messages that show which registers are being
set while sending error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: fix error codes for sysfs error injection interface · 2068def5
Authored by Mauro Carvalho Chehab on Aug 05, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: some fixes at error injection code · 276b824c
Authored by Mauro Carvalho Chehab on Jul 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Some cleanups at displayed info · 17cb7b0c
Authored by Mauro Carvalho Chehab on Jul 20, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: remove some unneeded noisy debug messages · 086271a0
Authored by Mauro Carvalho Chehab on Jul 18, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: add socket info at the debug msg · 3a7dde7f
Authored by Mauro Carvalho Chehab on Jul 18, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: better document i7core_get_active_channels() · ec6df24c
Authored by Mauro Carvalho Chehab on Jul 18, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: fix get_devices routine for Xeon55xx · c77720b9

Authored by Mauro Carvalho Chehab on Jul 18, 2009

i7core_get_devices() was prepared to get just the first device found of each type.
Due to that, on Xeon 55xx, only socket 1 was retrieved.

Rework i7core_get_devices() to clean it up and to properly support Xeon 55xx.

While here, fix a small typo.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core: enrich error information based on memory transaction type · a639539f
Authored by Mauro Carvalho Chehab on Jul 17, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: check if the memory error is fatal or non-fatal · c5d34528
Authored by Mauro Carvalho Chehab on Jul 17, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core: fix probing on Xeon55xx · 310cbb72

Authored by Mauro Carvalho Chehab on Jul 17, 2009

Xeon55xx fails to probe with this error message:

EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22

This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: some fixes at memory error parser · f237fcf2

Authored by Mauro Carvalho Chehab on Jul 15, 2009

m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improve the comments for
Nehalem bank.

A later fix is needed in order to get bank/rank information from the MCA
error log.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: decode mcelog error and send it via edac interface · 8a2f118e

Authored by Mauro Carvalho Chehab on Jul 15, 2009

Enrich the mcelog error using the information encoded in the MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).

Some fixes are still needed here in order to properly fill the EDAC
fields.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: maps all sockets as if they are one MC controller · ba6c5c62
Authored by Mauro Carvalho Chehab on Jul 15, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: add support for more than one MC socket · 67166af4

Authored by Mauro Carvalho Chehab on Jul 15, 2009

Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.

Currently, it is using up to 2 sockets, but increasing it to a larger
number is just a matter of increasing MAX_SOCKETS definition.

This seems to be required to properly support Xeon 55xx.

Still needs testing with Xeon 55xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Add code to probe Xeon 55xx bus · d1fd4fb6

Authored by Mauro Carvalho Chehab on Jul 10, 2009

This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI bus. This should
work fine with Xeon 35xx but, on Xeon 55xx, the MC devices sit on buses 254
and 255, which are not properly detected by the non-legacy PCI methods.

The new detection code scans specifically at buses 254 and 255 for the
Xeon 55xx devices.

This code has not been tested yet. Once it works, a further change to the
code will be needed, since i7core is not yet ready to work with 2 sets of
MC.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Adds write unlock to MC registers · e9bd2e73

Authored by Mauro Carvalho Chehab on Jul 09, 2009

The public Intel Xeon 5500 volume 2 datasheet describes, on page 53,
section 2.6.7, a register called MC_CFG_CONTROL that can lock/unlock the
Memory Controller configuration registers.

Add support for it in the hope that software error injection would
work. In my tests with Xeon 35xx, there's still something missing.
With a program that does sequential bit writes at dev 0.0, it sometimes
produces error injection after unlocking MC_CFG_CONTROL (and
sometimes it just locks up my testing machine).

I'll try later to discover by trial and error which register
solves this issue on Xeon 35xx.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Add edac_mce glue · d5381642

Authored by Mauro Carvalho Chehab on Jul 09, 2009

Add glue code to allow i7core to work with mcelog. With the glue,
i7core registers itself with edac_mce. When mce detects an error,
it calls all registered drivers (in this case, i7core) for EDAC error
handling.

TODO: It currently just prints the MCE error log using about the same
      format as mce panic messages. The error message should be enhanced
      with mcelog userspace info and converted into the proper EDAC format,
      to feed the EDAC error counts.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
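
The register-then-dispatch pattern this commit describes can be sketched in plain userspace C. Everything below is hypothetical and is not the kernel's edac_mce interface: `struct mce_record` stands in for the kernel's `struct mce`, and `mce_glue_register`, `mce_glue_dispatch`, and `count_errors` are invented names for illustration.

```c
#include <stddef.h>

#define MAX_HANDLERS 4 /* hypothetical limit for this sketch */

struct mce_record {            /* stand-in for the kernel's struct mce */
    unsigned int cpu;
    unsigned long long status;
};

/* A driver callback returns nonzero if it handled the record. */
typedef int (*mce_handler_fn)(void *priv, const struct mce_record *m);

struct mce_handler {
    mce_handler_fn check_error;
    void *priv;
};

static struct mce_handler handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Register a driver callback; returns 0 on success, -1 on failure. */
static int mce_glue_register(mce_handler_fn fn, void *priv)
{
    if (!fn || nr_handlers == MAX_HANDLERS)
        return -1;
    handlers[nr_handlers].check_error = fn;
    handlers[nr_handlers].priv = priv;
    nr_handlers++;
    return 0;
}

/* Pass one record to every registered driver; returns how many handled it. */
static int mce_glue_dispatch(const struct mce_record *m)
{
    int handled = 0;

    for (size_t i = 0; i < nr_handlers; i++)
        if (handlers[i].check_error(handlers[i].priv, m))
            handled++;
    return handled;
}

/* Example driver callback: counts the records it sees in *priv. */
static int count_errors(void *priv, const struct mce_record *m)
{
    (void)m;
    (*(unsigned int *)priv)++;
    return 1;
}
```

A real implementation would also need locking and an unregister path, both of which this sketch leaves out.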

i7core_edac: CodingStyle fixes · 41fcb7fe
Authored by Mauro Carvalho Chehab on Jun 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: fill csrows edac sysfs info · eb94fc40

Authored by Mauro Carvalho Chehab on Jun 22, 2009

csrows is still fake, since we can't identify its representation with
Nehalem registers.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Memory info fixes and preparation for properly filling csrow data · 5566cb7c

Authored by Mauro Carvalho Chehab on Jun 22, 2009

Now, memory size is properly displayed:

EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked
EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400
EDAC i7core: Memory channel configuration:
EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400
EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs
EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
numrank: 1, numrow: 0x4000, numcol: 0x400

Still, since the way to retrieve csrows info is not known, it maps
what is available onto the csrow basic unit of the EDAC core.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Get more info about the memory DIMMs · 854d3349
Authored by Mauro Carvalho Chehab on Jun 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Add more information about each active dimm · 7dd6953c

Authored by Mauro Carvalho Chehab on Jun 22, 2009

Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Improve error handling · b7c76151
Authored by Mauro Carvalho Chehab on Jun 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Properly fill struct csrow_info · 1c6fed80

Authored by Mauro Carvalho Chehab on Jun 22, 2009

Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Add additional tests for error detection · ef708b53

Authored by Mauro Carvalho Chehab on Jun 22, 2009

Properly check the number of channels and improve probing error detection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: Add a memory check routine, based on device 3 function 4 · 442305b1

Authored by Mauro Carvalho Chehab on Jun 22, 2009

This function appears only in the Xeon 5500 datasheet. Yet, testing with a
Xeon 3503 showed that it is also implemented on other Nehalem
processors.

At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any
value. Modify the CE error logic to update the error count only after the
second read.

An alternative approach would be to write to the rcv0 and rcv1
registers, but it seemed better to keep them untouched, since the BIOS might
assume that they are reserved for its own use.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
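
The "count only after the second read" idea can be sketched as a small delta tracker: the first sample only records a baseline, and later samples add the difference. This is an illustrative userspace sketch, not the driver's code. `struct ce_tracker`, `ce_tracker_update`, and the 15-bit counter width in `CE_COUNTER_MASK` are assumptions made for the example.

```c
#include <stdint.h>
#include <stdbool.h>

#define CE_COUNTER_MASK 0x7fffu /* assumption: 15-bit hardware counter */

struct ce_tracker {
    bool     primed; /* false until the first read has been stored */
    uint32_t last;   /* raw counter value seen on the previous read */
    uint64_t total;  /* corrected errors accumulated since priming */
};

/* Feed one raw register sample into the tracker. */
static void ce_tracker_update(struct ce_tracker *t, uint32_t raw)
{
    raw &= CE_COUNTER_MASK;
    if (t->primed) {
        /* Modular difference, so a counter wrap is still counted correctly. */
        t->total += (raw - t->last) & CE_COUNTER_MASK;
    }
    t->last = raw;
    t->primed = true;
}
```

Because the registers are only read, whatever value the firmware left in them is treated as the starting point rather than as errors.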

i7core_edac: need mci->edac_check, otherwise module removal doesn't work · 87d1d272

Authored by Mauro Carvalho Chehab on Jun 22, 2009

There are some locking troubles with edac_core: if you don't declare an
edac_check, the module may suffer from a soft lockup.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

i7core_edac: A few fixes at error injection code · 7b029d03
Authored by Mauro Carvalho Chehab on Jun 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Show read/write virtual/physical channel association · f122a892
Authored by Mauro Carvalho Chehab on Jun 22, 2009
```
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
```

i7core_edac: Registers all supported MC functions · 8f331907

Authored by Mauro Carvalho Chehab on Jun 22, 2009

Now, it will try to register on all supported Memory Controller
functions.

Note that dev 3, function 2 is present only on chips with
registered DIMMs, according to the datasheet. So, the driver doesn't
return -ENODEV if all functions but this one were successfully
registered and enabled:

    EDAC i7core: Registered device 8086:2c18 fn=3 0
    EDAC i7core: Registered device 8086:2c19 fn=3 1
    EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2)
    EDAC i7core: Registered device 8086:2c1c fn=3 4
    EDAC i7core: Registered device 8086:2c20 fn=4 0
    EDAC i7core: Registered device 8086:2c21 fn=4 1
    EDAC i7core: Registered device 8086:2c22 fn=4 2
    EDAC i7core: Registered device 8086:2c23 fn=4 3
    EDAC i7core: Registered device 8086:2c28 fn=5 0
    EDAC i7core: Registered device 8086:2c29 fn=5 1
    EDAC i7core: Registered device 8086:2c2a fn=5 2
    EDAC i7core: Registered device 8086:2c2b fn=5 3
    EDAC i7core: Registered device 8086:2c30 fn=6 0
    EDAC i7core: Registered device 8086:2c31 fn=6 1
    EDAC i7core: Registered device 8086:2c32 fn=6 2
    EDAC i7core: Registered device 8086:2c33 fn=6 3
    EDAC i7core: Driver loaded.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
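
One way to express "every function is required except dev 3, function 2" is a descriptor table with an optional flag. The sketch below is a userspace illustration under that assumption. `struct mc_func_desc`, `register_all`, the shortened `example_funcs` table, and the two simulated probe functions are hypothetical and do not reproduce the driver's actual tables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

struct mc_func_desc {
    int  dev;
    int  func;
    bool optional; /* true if the function may legitimately be absent */
};

/* Hypothetical probe callback: returns true if (dev, func) was found. */
typedef bool (*probe_fn)(int dev, int func);

/* Returns 0 if every mandatory function was found, -ENODEV otherwise. */
static int register_all(const struct mc_func_desc *tbl, size_t n, probe_fn probe)
{
    for (size_t i = 0; i < n; i++) {
        if (probe(tbl[i].dev, tbl[i].func))
            continue;
        if (tbl[i].optional)
            continue; /* e.g. dev 3 fn 2 on systems without RDIMMs */
        return -ENODEV;
    }
    return 0;
}

/* Shortened example table: only the device 3 functions from the log above. */
static const struct mc_func_desc example_funcs[] = {
    { 3, 0, false },
    { 3, 1, false },
    { 3, 2, true  },
    { 3, 4, false },
};

/* Simulated system without registered DIMMs: only dev 3 fn 2 is missing. */
static bool probe_without_rdimm(int dev, int func)
{
    return !(dev == 3 && func == 2);
}

/* Simulated broken system: a mandatory function (dev 3 fn 0) is missing. */
static bool probe_missing_fn0(int dev, int func)
{
    return !(dev == 3 && func == 0);
}
```

With these definitions, `register_all(example_funcs, 4, probe_without_rdimm)` succeeds, while the same call with `probe_missing_fn0` fails with `-ENODEV`.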

i7core_edac: Add more status functions to EDAC driver · 0b2b7b7e

Authored by Mauro Carvalho Chehab on Jun 22, 2009

This patch was co-authored with Aristeu Rozanski.
Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
