Commits · 9d090926ce22896bff781623770cd59378c3e15a · openanolis / cloud-kernel

02 Sep, 2020 5 commits

EDAC/amd64: Set grain per DIMM · 9d090926

Committed by Yazen Ghannam on Oct 22, 2019

fix #29035167

commit 466503d6b1b33be46ab87c6090f0ade6c6011cbc upstream

The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.
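
As a rough illustration of why a zero grain is a problem, the sketch below assumes the EDAC core derives the number of grain bits as the position of the highest set bit in `grain - 1`. The helper name `grain_bits` and the fallback to 1 are assumptions made for this example, not the kernel's actual code.

```python
CACHELINE_SIZE = 64  # bytes; the per-DIMM grain this commit sets


def grain_bits(grain: int) -> int:
    """Return how many low address bits an error grain of `grain` bytes covers.

    Hypothetical helper: models the calculation as "highest set bit of
    grain - 1", which rounds non-power-of-two grains up.
    """
    if grain <= 0:
        # Assumption: a missing (zero) grain is the case that triggers the
        # warning; fall back to a 1-byte grain so the result stays defined.
        grain = 1
    return (grain - 1).bit_length()


if __name__ == "__main__":
    print(grain_bits(CACHELINE_SIZE))
```

With a 64-byte grain the helper yields 6 bits, while a driver that leaves the grain at zero only gets a defined result through the fallback path.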

Fixes: 3724ace582d9 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

9d090926

EDAC/amd64: Find Chip Select memory size using Address Mask · ac316351

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit e53a3b267fb0a79db9ca1f1e08b97889b22013e6 upstream

Chip Select memory size reporting on AMD Family 17h was recently fixed
in order to account for interleaving. However, the current method is not
robust.

The Chip Select Address Mask can be used to find the memory size. There
are a couple of cases.

1) For single-rank and dual-rank non-interleaved, use the address mask
plus 1 as the size.

2) For dual-rank interleaved, do #1 but "de-interleave" the address mask
first.

Always "de-interleave" the address mask in order to simplify the code
flow. Bit mask manipulation is necessary to check for interleaving, so
just go ahead and do the de-interleaving. In the non-interleaved case,
the original and de-interleaved address masks will be the same.

To de-interleave the mask, count the number of zero bits in the middle
of the mask and swap them with the most significant bits.

For example,
Original=0xFFFF9FE, De-interleaved=0x3FFFFFE
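
The de-interleaving step can be sketched as follows. This is a minimal model rather than the driver's code: the function name `deinterleave_addr_mask` is hypothetical, and it assumes bit 0 of the mask is always zero, so the number of zero bits in the middle equals the index of the most significant set bit minus the number of set bits.

```python
def deinterleave_addr_mask(mask: int) -> int:
    """Move the zero bits in the middle of `mask` to the top.

    Assumes bit 0 of the mask is always 0 (an assumption of this sketch).
    """
    if mask == 0:
        return 0
    msb = mask.bit_length() - 1        # index of the highest set bit
    weight = bin(mask).count("1")      # number of set bits
    num_zero_bits = msb - weight       # zero bits between bit 1 and the MSB
    top = msb - num_zero_bits          # new highest set bit
    # Contiguous run of ones from bit `top` down to bit 1.
    return ((1 << (top + 1)) - 1) & ~1


if __name__ == "__main__":
    print(hex(deinterleave_addr_mask(0xFFFF9FE)))
```

For the mask in the example above, the two zero bits are removed from the top and the result is 0x3FFFFFE. A mask that is already contiguous, such as 0x3FFFFFE, comes back unchanged, which matches the non-interleaved case.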

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-5-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

ac316351

EDAC/amd64: Initialize DIMM info for systems with more than two channels · b635b9c9

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 353a1fcb8f9e5857c0fb720b9e57a86c1fb7c17e upstream

Currently, the DIMM info for AMD Family 17h systems is initialized in
init_csrows(). This function is shared with legacy systems, and it has a
limit of two channel support.

This prevents initialization of the DIMM info for a number of ranks, so
there will be missing ranks in the EDAC sysfs.

Create a new init_csrows_df() for Family17h+ and revert init_csrows()
back to pre-Family17h support.

Loop over all channels in the new function in order to support systems
with more than two channels.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-4-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

b635b9c9

EDAC/amd64: Support more than two controllers for chip selects handling · 47af478f

Committed by Yazen Ghannam on Aug 21, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.

Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.

Fix number of DIMMs and Chip Select bases/masks on Family17h, because
AMD Family 17h systems support 2 DIMMs, 4 CS bases, and 2 CS masks per
channel.

Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190821235938.118710-2-Yazen.Ghannam@amd.com
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

47af478f

Revert "EDAC/amd64: Support more than two controllers for chip select handling" · b7e38109

Committed by Borislav Petkov on Apr 25, 2019

fix #29035167

commit 8de9930a4618811edfaebc4981a9fafff2af9170 upstream

This reverts commit 0a227af521d6df5286550b62f4b591417170b4ea.

Unfortunately, this commit caused wrong detection of chip select sizes
on some F17h client machines:

  --- 00-rc6+     2019-02-14 14:28:03.126622904 +0100
  +++ 01-rc4+     2019-04-14 21:06:16.060614790 +0200
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0MB
   EDAC MC: UMC1 chip selects:
   EDAC amd64: MC: 0:     0MB 1:     0MB
  -EDAC amd64: MC: 2: 16383MB 3: 16383MB
  +EDAC amd64: MC: 2:     0MB 3: 2097151MB
   EDAC amd64: MC: 4:     0MB 5:     0MB
   EDAC amd64: MC: 6:     0MB 7:     0M

Revert it for now until it has been solved properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Artie Ding <artie.ding@linux.alibaba.com>

b7e38109

18 Mar, 2020 1 commit

EDAC, skx: Retrieve and print retry_rd_err_log registers · a20889d9

Committed by Tony Luck on Aug 15, 2019

commit e80634a75aba90e7485cd1fdb463fcac5d45f14d upstream

Skylake logs some additional useful information in per-channel
registers in addition to the architectural status/addr/misc
logged in the machine check bank.

Pick up this information and add it to the EDAC log:

retry_rd_err_[five 32-bit register values]

Sorry, no definitions for these registers. OEMs and DIMM vendors
will be able to use them to isolate which cells in the DIMM are
causing problems.

correrrcnt[per rank corrected error counts]

Note that if additional errors are logged while these registers are
being read, you may see a jumble of values some from earlier errors,
others from later errors (since the registers report the most recent
logged error). The correrrcnt registers provide error counts per possible
rank. If these counts only change by one since the previous error logged
for this channel, then it is safe to assume that the registers logged
provide a coherent view of one error.

With this change EDAC logs look like this:

EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 - err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])
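
The coherence check described above can be sketched as a small log post-processor. The function names are hypothetical, the values are assumed to be printed in hexadecimal, and counter wrap-around is ignored.

```python
import re

RETRY_RE = re.compile(r"retry_rd_err_log\[([0-9a-fA-F ]+)\]")
COUNT_RE = re.compile(r"correrrcnt\[([0-9a-fA-F ]+)\]")


def parse_skx_extras(line):
    """Return (retry_rd_err_log values, correrrcnt values), or None if absent."""
    retry = RETRY_RE.search(line)
    counts = COUNT_RE.search(line)
    if not retry or not counts:
        return None
    # Assumption: both fields are space-separated hexadecimal numbers.
    return ([int(v, 16) for v in retry.group(1).split()],
            [int(v, 16) for v in counts.group(1).split()])


def looks_coherent(prev_counts, cur_counts):
    """True if the per-rank counts rose by exactly one in total since the
    previous error on this channel (wrap-around is not handled here)."""
    deltas = [cur - prev for prev, cur in zip(prev_counts, cur_counts)]
    return all(d >= 0 for d in deltas) and sum(deltas) == 1


if __name__ == "__main__":
    sample = ("retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] "
              "correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])")
    retry_regs, counts = parse_skx_extras(sample)
    print(looks_coherent([0] * 8, counts))
```

A caller would keep the last `correrrcnt` list seen per channel and treat the retry_rd_err_log values as a consistent snapshot only when `looks_coherent` returns True.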

Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: zhaobing <zhaobing@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>

a20889d9

17 Jan, 2020 13 commits

HYGON: EDAC, amd64: Add Hygon Dhyana support · 4ed05f02

Committed by Pu Wen on Jan 10, 2020

commit c4a3e94641449362ee970f521a2cdb0e8cd08690 upstream.

Add support for Hygon Dhyana CPU to EDAC.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: mchehab@kernel.org
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: thomas.lendacky@amd.com
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/9d71061301177822bc55b3bfd44f91057458d886.1537533369.git.puwen@hygon.cn
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

4ed05f02

EDAC: skx_common: downgrade message importance on missing PCI device · 8386996b

Committed by Aristeu Rozanski on Dec 04, 2019

cherry-picked from linux-next commit 854bb48018d5da261d438b2232fa683bdb553979.

Both skx_edac and i10nm_edac drivers are loaded based on the matching CPU being
available which leads the module to be automatically loaded in virtual machines
as well. That will fail due to the missing PCI devices. In both drivers the first
function to make use of the PCI devices is skx_get_hi_lo(), which will simply print

EDAC skx: Can't get tolm/tohm

for each CPU core, which is noisy. This patch makes it a debug message.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20191204212325.c4k47p5hrnn3vpb5@redhat.com
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

8386996b

EDAC/amd64: Adjust printed chip select sizes when interleaved · f2e28686