提交 · 5d1267c68169de90ef1157751d37ac7d3ee3470f · openanolis / cloud-kernel

14 11月, 2018 1 次提交

EDAC, {i7core,sb,skx}_edac: Fix uncorrected error counting · 5d1267c6

由 Tony Luck 提交于 9月 28, 2018

commit 432de7fd upstream.

The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.
Signed-off-by: NTony Luck <tony.luck@intel.com>
[ Massage commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180928213934.19890-1-tony.luck@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5d1267c6

25 7月, 2018 1 次提交

EDAC, sb_edac: Add support for systems with segmented PCI buses · 190bd6e9

由 Masayoshi Mizuma 提交于 7月 24, 2018

Extend the driver to check whether segment number and bus number matches
when deciding how to group memory controller PCI devices to CPU sockets.
Signed-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180724190213.26359-1-msys.mizuma@gmail.com
[ Cleanup commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

190bd6e9

17 3月, 2018 1 次提交

EDAC, sb_edac: Remove variable length array usage · 6fd05266

由 Gustavo A. R. Silva 提交于 3月 14, 2018

In preparation for enabling -Wvla, remove VLA and replace it with a
fixed-length array instead.

Also, remove max_interleave as it is no longer needed.
Reviewed-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180314182131.GA25259@embeddedgusSigned-off-by: NBorislav Petkov <bp@suse.de>

6fd05266

23 2月, 2018 1 次提交

EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL · bf848670

由 Anna Karbownik 提交于 2月 22, 2018

Commit

  3286d3eb ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")

decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
Landing which supports up to 6 channels.

This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
variables which don't pay critical role on KNL code path, so the memory
corruption wasn't causing any visible driver failures.

The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.

An alternative solution would be to restructure the KNL part of the
driver to 2MC/3channel representation.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAnna Karbownik <anna.karbownik@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: jim.m.snow@intel.com
Cc: krzysztof.paliswiat@intel.com
Cc: lukasz.odzioba@intel.com
Cc: qiuxu.zhuo@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3286d3eb ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
[ Massage commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

bf848670

19 10月, 2017 1 次提交

EDAC, sb_edac: Fix missing break in switch · a8e9b186

由 Gustavo A. R. Silva 提交于 10月 16, 2017

Add missing break statement in order to prevent the code from falling
through.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.comSigned-off-by: NBorislav Petkov <bp@suse.de>

a8e9b186

11 10月, 2017 1 次提交

EDAC, sb_edac: Fix missing DIMM sysfs entries with KNL SNC2/SNC4 mode · 24281a2f

由 Luis Felipe Sandoval Castro 提交于 9月 28, 2017

When figuring out the size of the DIMMs and the cluster mode is SNC2 or SNC4 the
current algorithm ignores the contribution of some of the channels resulting in
EDAC never knowing of the existence of some DIMMs attached to such channels (thus
sysfs is not populated).

Instead of selectively iterating from 0 to interlv_ways when looking for all the
participants in the interleave, do an exhaustive search and iterate from 0 to
KNL_MAX_CHANNELS. The algorithm is already smart enough to consider participants
only one time.

This works fine in all KNL cluster modes and even when there are missing DIMMs
as the contribution of those channels is 0.
Signed-off-by: NLuis Felipe Sandoval Castro <luis.felipe.sandoval.castro@intel.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: arozansk@redhat.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: qiuxu.zhuo@intel.com
Link: http://lkml.kernel.org/r/1506606882-90521-1-git-send-email-luis.felipe.sandoval.castro@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

24281a2f

27 9月, 2017 1 次提交

EDAC, sb_edac: Don't create a second memory controller if HA1 is not present · 15cc3ae0

由 Qiuxu Zhuo 提交于 9月 13, 2017

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdfReported-and-tested-by: NYi Zhang <yizhan@redhat.com>
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

15cc3ae0

25 9月, 2017 1 次提交

EDAC: Add owner check to the x86 platform drivers · 301375e7

由 Toshi Kani 提交于 8月 23, 2017

Change x86 EDAC platform drivers to verify the module owner at the
beginning of their module init functions. This allows them to fail their
init immediately when ghes_edac is enabled. Similar change can be made
to other edac drivers if necessary.

Also, remove ".c" from module names of pnp2_edac, sb_edac, and skx_edac.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Suggested-by: NBorislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170823225447.15608-6-toshi.kani@hpe.comSigned-off-by: NBorislav Petkov <bp@suse.de>

301375e7

21 9月, 2017 1 次提交

EDAC: Handle return value of kasprintf() · 75f029c3

由 Arvind Yadav 提交于 9月 21, 2017

kasprintf() can fail and we must check its return value.
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Cc: linux-edac@vger.kernel.org
[ Merged into a single patch, small formatting fixups. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

75f029c3

02 8月, 2017 1 次提交

EDAC, sb_edac: Classify memory mirroring modes · 039d7af6

由 Qiuxu Zhuo 提交于 7月 31, 2017

Basically, there are full memory mirroring and address range partial
memory mirroring (supported by Haswell EX and Broadwell EX) modes.

a) In full memory mirroring, the memory behind each memory controller
is mirrored, i.e. the memory is split into two identical mirrors
(primary and secondary), half of the memory is reserved for redundancy.

b) In address range partial memory mirroring, the memory size (range)
of primary and secondary behind each memory controller can be user
defined by the TAD0 register. The rest of memory ranges defined by
TAD1/TAD2/... in that memory controller are non-mirrored.

For more detail on memory mirroring, see the following link written by Tony Luck:

https://01.org/lkp/blogs/tonyluck/2016/address-range-partial-memory-mirroring-linux

Currently the sb_edac driver only supports address decoding in full
memory mirroring and non-mirroring modes. In address range partial
memory mirroring mode, it may fail to decode an address that falls in a
non-mirroring area (the following was one of this kind of failed logs).

mce: Uncorrected hardware memory error in user-access at 566d53a400
Memory failure: 0x566d53a8: Killing einj_mem_uc:4647 due to hardware memory corruption
Memory failure: 0x566d53a8: recovery action for dirty LRU page: Recovered
mce: [Hardware Error]: Machine check events logged
EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
EDAC sbridge MC1: CPU 48: Machine Check Event: 0 Bank 7: ec00000000010090
EDAC sbridge MC1: TSC 4b914aa5a99dab
EDAC sbridge MC1: ADDR 566d53a400
EDAC sbridge MC1: MISC 1443a0c86
EDAC sbridge MC1: PROCESSOR 0:406f1 TIME 1499712764 SOCKET 2 APIC 80
EDAC MC1: 0 UE Can't discover the memory rank for ch addr 0x7fb54e900 on any memory ( page:0x0 offset:0x0 grain:32)
mce: [Hardware Error]: Machine check events logged

Therefore, classify memory mirroring modes and make the address decoding
in address range partial memory mode correct.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170730180651.30060-1-qiuxu.zhuo@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

039d7af6

17 7月, 2017 1 次提交

EDAC: Get rid of mci->mod_ver · c54182ec

由 Borislav Petkov 提交于 6月 29, 2017

It is a write-only variable so get rid of it.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NRobert Richter <rric@kernel.org>
Acked-by: NMichal Simek <michal.simek@xilinx.com>
Acked-by: NThor Thayer <thor.thayer@linux.intel.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Daney <david.daney@cavium.com>
Cc: Loc Ho <lho@apm.com>
Cc: linux-edac@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org

c54182ec

14 6月, 2017 1 次提交

EDAC, sb_edac: Avoid creating SOCK memory controller · 133e4455

由 Qiuxu Zhuo 提交于 6月 08, 2017

Xiaolong Ye reported the following failure on Broadwell D server:

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per
socket) use the same PCI device IDs for IMC0 per socket, then they
share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case,
Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller
and reports above error messages, since it has no IMC1 per socket.

Avoid creating the nonexistent SOCK memory controller.
Reported-and-tested-by: NXiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com
[ Massage. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

133e4455

25 5月, 2017 8 次提交

EDAC, sb_edac: Bump driver version and do some cleanups · d14e3a20

由 Qiuxu Zhuo 提交于 5月 23, 2017

Collapse 'case:' in *_mci_bind_devs() and update driver version from
1.1.1 to 1.1.2.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

d14e3a20

EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present · 4d475dde

由 Qiuxu Zhuo 提交于 5月 25, 2017

This is based on previous work by Patrick Geary, see Link.

Additional cleanups ontop:

 - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro,
 now that TA0 and TA1 are unified.

 - Remove get_pdev_same_bus(), since in get_dimm_config() the
 variable "pvt->pci_ta" for KNL is also ready, we can simply use
 pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read
 MCMTR.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com
Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com
[ Make __populate_dimms() return int. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

4d475dde

EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4 · 3286d3eb

由 Qiuxu Zhuo 提交于 5月 23, 2017

We don't need this quirk anymore now that the EDAC memory controller
representation matches the hardware.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com
[ Commit message. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

3286d3eb

EDAC, sb_edac: Carve out dimm-populating loop · 66965229

由 Borislav Petkov 提交于 5月 25, 2017

... to slim down get_dimm_config().

No functionality change.
Signed-off-by: NBorislav Petkov <bp@suse.de>

66965229

B
EDAC, sb_edac: Fix mod_name · 199389ac
由 Borislav Petkov 提交于 5月 25, 2017
```
It is called "sb_edac.c" now.
Signed-off-by: NBorislav Petkov <bp@suse.de>
```
199389ac

EDAC, sb_edac: Assign EDAC memory controller per h/w controller · e2f747b1

由 Qiuxu Zhuo 提交于 5月 23, 2017

Tony pointed out: "currently the driver pretends there is one big
8-channel memory controller per socket instead of 2 4-channel
controllers. This is fine with all memory controller populated with
symmetrical DIMM configurations, but runs into difficulties on
asymmetrical setups".

Restructure the driver to assign an EDAC memory controller to each real
h/w memory controller to resolve the issue.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com
[ Break some lines at convenient points. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

e2f747b1

EDAC, sb_edac: Don't use "Socket#" in the memory controller name · 7fd562b7

由 Tony Luck 提交于 5月 23, 2017

EDAC assigns logical memory controller numbers in the order that we find
memory controllers, which depends on which PCI bus they are on. Some
systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on
socket3.

All this is made more confusing for users because we use the string
"Socket" while generating names for memory controllers, but the number
that we attach there is the memory controller number. E.g.

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT)

Change the names to say "SrcID#%d" (where the number we use is read from
the h/w associated with the memory controller instead of some logical
number internal to the EDAC driver). New message:

  EDAC MC0: Giving out device to module sbridge_edac.c controller
    Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT)
Reported-by: NAndrey Korolyov <andrey@xdel.ru>
Reported-by: NPatrick Geary <patrickg@supermicro.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

7fd562b7

EDAC, sb_edac: Classify PCI-IDs by topology · 00cf50d9

由 Qiuxu Zhuo 提交于 5月 23, 2017

Each of the PCI device IDs belongs to a CPU socket, or to one of the
integrated memory controllers. Provide an enum to specify the domain of
each, and distinguish the resource number in each domain: the number
of the PCI device IDs per integrated memory controller/socket, and the
number of integrated memory controllers per socket.
Signed-off-by: NQiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com
[ Realign pci_dev_descr_knl members. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

00cf50d9

10 4月, 2017 1 次提交

EDAC: Rename report status accessors · bffc7dec

由 Borislav Petkov 提交于 2月 04, 2017

Change them to have the edac_ prefix.

No functionality change.
Signed-off-by: NBorislav Petkov <bp@suse.de>

bffc7dec

24 1月, 2017 1 次提交

x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority · 9026cc82

由 Borislav Petkov 提交于 1月 23, 2017

Assign all notifiers on the MCE decode chain a priority so that they get
called in the correct order.
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

9026cc82

23 1月, 2017 1 次提交

EDAC, sb_edac: Get rid of ->show_interleave_mode() · 127c1225

由 Nicolas Iooss 提交于 1月 22, 2017

Function sbridge_register_mci() sets pvt->info.show_interleave_mode
to knl_show_interleave_mode() on Knight's Landing and
show_interleave_mode() anywhere else.

Merge show_interleave_mode() and knl_show_interleave_mode() in a single
implementation and use it without an indirect function pointer.
Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org
[ Call it get_intlv_mode_str(). ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

127c1225

15 12月, 2016 1 次提交

edac: rename edac_core.h to edac_mc.h · 78d88e8a

由 Mauro Carvalho Chehab 提交于 10月 29, 2016

Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

78d88e8a

19 10月, 2016 2 次提交

EDAC, sb_edac: Add Knights Mill support · 9a9260ca

由 Piotr Luc 提交于 10月 13, 2016

Add Knights Mill (KNM) to the list of CPU models supported by sb_edac.
Signed-off-by: NPiotr Luc <piotr.luc@intel.com>
Reviewed-by: NDave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

9a9260ca

EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them · 20f4d692

由 Dave Hansen 提交于 9月 29, 2016

We now have symbolic names for a bunch of Intel CPU models via
asm/intel-family.h. The original conversion missed the EDAC drivers.
Convert them.
Signed-off-by: NDave Hansen <dave.hansen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com
[ Remove comment, macro name is descriptive enough. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

20f4d692

13 9月, 2016 1 次提交

EDAC, sb_edac: Remove NULL pointer check on array pci_tad · c7c35407

由 Colin Ian King 提交于 9月 08, 2016

pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and
hence cannot be NULL, so the NULL pointer check on pci_tad is redundant.
Remove it.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.comSigned-off-by: NBorislav Petkov <bp@suse.de>

c7c35407

08 8月, 2016 1 次提交

EDAC, sb_edac: Fix channel reporting on Knights Landing · c5b48fa7

由 Lukasz Odzioba 提交于 7月 23, 2016

On Intel Xeon Phi Knights Landing processor family the channels of the
memory controller have untypical arrangement - MC0 is mapped to CH3,4,5
and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the
channel name incorrectly.

We missed this change earlier, so the code already contains similar
comment, but the translation function is incorrect.

Without this patch:
  errors in DIMM_A and DIMM_D were reported in DIMM_D
  errors in DIMM_B and DIMM_E were reported in DIMM_E
  errors in DIMM_C and DIMM_F were reported in DIMM_F

Correct this.

Hubert Chrzaniuk:
 - rebased to 4.8
 - comments and code cleanup

Fixes: d0cdf900 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
Reviewed-by: NTony Luck <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Cc: lukasz.odzioba@intel.com
Cc: mchehab@kernel.org
Cc: <stable@vger.kernel.org> # v4.5..
Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.comSigned-off-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
[ Boris: Simplify a bit by removing char mc. ]
Signed-off-by: NBorislav Petkov <bp@suse.de>

c5b48fa7

16 7月, 2016 1 次提交

EDAC, sb_edac: Fix Knights Landing · 0ba169ac

由 Tony Luck 提交于 7月 14, 2016

In commit 2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver
detection") I broke Knights Landing because I failed to notice that it
called a wrapper macro "sbridge_get_all_devices_knl" instead of
"sbridge_get_all_devices" like all the other types.

Now that we include the processor type in the pci_id_table structure we
can skip the wrappers and just have the sbridge_get_all_devices() check
the type to decide whether to allow duplicate devices and controllers to
have registers spread across buses.

Fixes: 2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Tested-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: NAristeu Rozanski <aris@redhat.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ba169ac

03 6月, 2016 2 次提交

EDAC, sb_edac: Readd accidentally dropped Broadwell-D support · 665f05e0

由 Tony Luck 提交于 6月 02, 2016

In commit

  2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver detection")

we switched from using PCI ids to determine which platform we are
running on to using CPU model instead.

I forgot that Broadwell-DE has its own distinct model number different
from Broadwell-EP or -EX.

Fixing this isn't just adding a line to the array of cpuids - the
exising code assumed a 1:1 mapping between entries in that array and the
"enum type" values. Added the type to pci_id_table structure to remove
this dependency and allows two Broadwell cpu models.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: 2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver detection")
Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

665f05e0

EDAC, sb_edac: Fix rank lookup on Broadwell · c7103f65

由 Tony Luck 提交于 5月 31, 2016

Broadwell made a small change to the rank target register moving the
target rank ID field up from bits 16:19 to bits 20:23.

Also found that the offset field grew by one bit in the IVY_BRIDGE to
HASWELL transition, so fix the RIR_OFFSET() macro too.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org # v3.19+
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

c7103f65

03 5月, 2016 1 次提交

EDAC, sb_edac: Use cpu family/model in driver detection · 2c1ea4c7

由 Tony Luck 提交于 4月 28, 2016

Instead of picking a random PCI ID from the dozen or so we need to
access, just use x86_match_cpu() to pick based on CPU model number. The
choosing of PCI devices has been problematic in the past, see

  11249e73 ("sb_edac: Fix detection on SNB machines")

which fixed problems introduced by

  d0585cd8 ("sb_edac: Claim a different PCI device").

This is especially ugly if future hardware might not even have
EDAC-relevant registers in PCI config space and we would still be
required to choose some "random" PCI devices to scan for just so our
driver loads.

Is this cleaner/clearer? It deletes much more code than it adds. Only
tested on Broadwell. The driver loads/unloads and loads again. Still
decodes errors too.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Suggested-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>

2c1ea4c7

29 4月, 2016 1 次提交

EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback · c4fc1956

由 Tony Luck 提交于 4月 29, 2016

Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Acked-by: NAristeu Rozanski <aris@redhat.com>
Acked-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

c4fc1956

23 4月, 2016 1 次提交

EDAC, sb_edac: Remove double buffering of error records · ad08c4e9

由 Tony Luck 提交于 4月 15, 2016

In the bad old days the functions from x86_mce_decoder_chain could be
called in machine check context. So we used to carefully copy them and
defer processing until later. But in

f29a7aff ("x86/mce: Avoid potential deadlock due to printk() in MCE context")

we switched the logging code to save the record in a genpool, and call
the functions that registered to be notified later from a work queue.

So drop all the double buffering and do all the work we want to do as
soon as sbridge_mce_check_error() is called.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: patrickg@supermicro.com
Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

ad08c4e9

22 4月, 2016 2 次提交

x86 EDAC, sb_edac.c: Take account of channel hashing when needed · ea5dfb5f

由 Tony Luck 提交于 4月 14, 2016

Haswell and Broadwell can be configured to hash the channel
interleave function using bits [27:12] of the physical address.

On those processor models we must check to see if hashing is
enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and
act accordingly.

Based on a patch by patrickg <patrickg@supermicro.com>
Tested-by: NPatrick Geary <patrickg@supermicro.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Acked-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ea5dfb5f

x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address · ff15e95c

由 Tony Luck 提交于 4月 14, 2016

In commit:

  eb1af3b7 ("Fix computation of channel address")

I switched the "sck_way" variable from holding the log2 value read
from the h/w to instead be the actual number. Unfortunately it
is needed in log2 form when used to shift the address.
Tested-by: NPatrick Geary <patrickg@supermicro.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Acked-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: eb1af3b7 ("Fix computation of channel address")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ff15e95c

11 3月, 2016 1 次提交

EDAC/sb_edac: Fix computation of channel address · eb1af3b7

由 Luck, Tony 提交于 3月 09, 2016

Large memory Haswell-EX systems with multiple DIMMs per channel were
sometimes reporting the wrong DIMM.

Found three problems:

 1) Debug printouts for socket and channel interleave were not interpreting
    the register fields correctly. The socket interleave field is a 2^X
    value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2,
    2=3. 3=4).

 2) Actual use of the socket interleave value didn't interpret as 2^X

 3) Conversion of address to channel address was complicated, and wrong.
Signed-off-by: NTony Luck <tony.luck@intel.com>
Acked-by: NAristeu Rozanski <arozansk@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

eb1af3b7

08 3月, 2016 1 次提交

EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi · 83bdaad4

由 Hubert Chrzaniuk 提交于 3月 07, 2016

Correct a typo introduced by

  d0cdf900 ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support")

As a result under some configurations DIMMs were not correctly
recognized. Problem affects only Xeon Phi architecture.
Signed-off-by: NHubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Acked-by: NAristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

83bdaad4

11 12月, 2015 1 次提交

EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing · 45f4d3ab

由 Hubert Chrzaniuk 提交于 12月 11, 2015

Knights Landing does not come with register that could be used to fetch
DIMM width. However the value is fixed for this architecture so it can
be hardcoded.
Signed-off-by: NHubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.comSigned-off-by: NBorislav Petkov <bp@suse.de>

45f4d3ab

06 12月, 2015 1 次提交

EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support · d0cdf900

由 Jim Snow 提交于 12月 03, 2015

Knights Landing is the next generation architecture for HPC market.

KNL introduces concept of a tile and CHA - Cache/Home Agent for memory
accesses.

Some things are fixed in KNL:
() There's single DIMM slot per channel
() There's 2 memory controllers with 3 channels each, however,
   from EDAC standpoint, it is presented as single memory controller
   with 6 channels. In order to represent 2 MCs w/ 3 CH, it would
   require major redesign of EDAC core driver.

Basically, two functionalities are added/extended:
() during driver initialization KNL topology is being recognized, i.e.
   which channels are populated with what DIMM sizes
   (knl_get_dimm_capacity function)
() handle MCE errors - channel swizzling
Reviewed-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NJim Snow <jim.m.snow@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: lukasz.anaczkowski@intel.com
Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com
[ Rebase to 4.4-rc3. ]
Signed-off-by: NHubert Chrzaniuk <hubert.chrzaniuk@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>

d0cdf900

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功