1. 17 Jan, 2020 (2 commits)
    • x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units · 0b8080b5
      Committed by Yazen Ghannam
      commit 3ad7e748c12cc771df6020a552def3e1727e8a17 upstream.
      
      The existing CS, PSP, and SMU SMCA bank types will see new versions (as
      indicated by their McaTypes) in future SMCA systems.
      
      Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
      same names as the older versions, since they are logically the same to
      the user. SMCA systems won't mix and match IP blocks with different
      McaType versions in the same system, so there isn't a need to
      distinguish them. The MCA_IPID register is saved when logging an MCA
      error, and that can be used to triage the error.
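      For illustration, here is a minimal C sketch of the (HWID, MCATYPE) lookup described above. It assumes that HWID sits in MCA_IPID[43:32] and McaType in MCA_IPID[63:48], and it packs the pair as (HWID << 16) | McaType. The table entries and the names bank_types, ipid_to_key() and bank_name() are hypothetical stand-ins, not the kernel's actual table. Note that both McaType versions of CS map to the same user-visible name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a (HWID, McaType) pair into one 32-bit key (layout assumed). */
#define HWID_MCATYPE(hwid, mcatype) \
	(((uint32_t)(hwid) << 16) | (uint32_t)(mcatype))

struct bank_type {
	uint32_t hwid_mcatype; /* packed (HWID, McaType) key */
	const char *name;      /* name shown to the user */
};

/* Hypothetical table: both McaType versions of CS share one name. */
static const struct bank_type bank_types[] = {
	{ HWID_MCATYPE(0x2E, 0x0), "CS"  },
	{ HWID_MCATYPE(0x2E, 0x2), "CS"  }, /* newer McaType, same name */
	{ HWID_MCATYPE(0x2E, 0x1), "PIE" },
};

/* Assumed layout: HWID = IPID[43:32], McaType = IPID[63:48]. */
static uint32_t ipid_to_key(uint64_t ipid)
{
	uint32_t high = (uint32_t)(ipid >> 32);

	return HWID_MCATYPE(high & 0xFFFu, high >> 16);
}

/* Return the bank name for an IPID, or NULL if no tuple matches. */
static const char *bank_name(uint64_t ipid)
{
	uint32_t key = ipid_to_key(ipid);

	for (size_t i = 0; i < sizeof(bank_types) / sizeof(bank_types[0]); i++) {
		assert(bank_types[i].name != NULL);
		if (bank_types[i].hwid_mcatype == key)
			return bank_types[i].name;
	}
	return NULL;
}
```

      Under these assumptions, the IPID 0x0001002e00000002 decodes to HWID 0x2E and McaType 0x1, which the sample table names "PIE".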
      
      Also, add the new error descriptions to edac_mce_amd. Some error types
      (positions in the list) are overloaded compared to the previous
      McaTypes. Therefore, just create new lists of the error descriptions to
      keep things simple even if some of the error descriptions are the same
      between versions.
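      The "separate list per McaType version" approach can be sketched in C as follows. All strings, the DEMO_V1/DEMO_V2 versions, and decode_xec() are hypothetical placeholders, not the real edac_mce_amd descriptions. The sketch shows one position (index 1) that carries a different meaning in each version, so each version keeps its own list even though some entries repeat. Out-of-range error codes fall back to a generic string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lists; index 1 is "overloaded" between versions. */
static const char * const demo_v1_errs[] = {
	"Illustrative error 0",
	"Illustrative error 1 (v1 meaning)",
};

static const char * const demo_v2_errs[] = {
	"Illustrative error 0",              /* duplicated on purpose */
	"Illustrative error 1 (v2 meaning)",
	"Illustrative error 2 (v2 only)",
};

struct err_list {
	const char * const *descs;
	size_t count;
};

#define ERR_LIST(a) { (a), sizeof(a) / sizeof((a)[0]) }

enum demo_version { DEMO_V1, DEMO_V2 };

/* One description list per McaType version. */
static const struct err_list err_lists[] = {
	[DEMO_V1] = ERR_LIST(demo_v1_errs),
	[DEMO_V2] = ERR_LIST(demo_v2_errs),
};

/* Map an extended error code to its description for a given version. */
static const char *decode_xec(enum demo_version v, unsigned int xec)
{
	const struct err_list *l;

	assert((size_t)v < sizeof(err_lists) / sizeof(err_lists[0]));
	l = &err_lists[v];
	return xec < l->count ? l->descs[xec] : "Unknown error";
}
```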
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Shirish S <Shirish.S@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
      Signed-off-by: WANG Siyuan <Siyuan.Wang@amd.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
    • x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types · 43886175
      Committed by Yazen Ghannam
      commit cbfa447edd6a3825fdb8a4ffae74ff7208f2d2c0 upstream.
      
      Add the (HWID, MCATYPE) tuples and names for the new MP5, NBIO, and
      PCIE SMCA bank types.
      
      Also, add their respective error descriptions to the MCE decoding module
      edac_mce_amd.
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Shirish S <Shirish.S@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190201225534.8177-2-Yazen.Ghannam@amd.com
      Signed-off-by: WANG Siyuan <Siyuan.Wang@amd.com>
      Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
  2. 22 Feb, 2018 (1 commit)
    • x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type · 68627a69
      Committed by Yazen Ghannam
      Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
      bank 4 in the smca_banks array. This means that when we check if a bank
      is initialized, like during boot or resume, we will see that bank 4 is
      not initialized and try to initialize it.
      
      This will cause a call trace, when resuming from suspend, due to
      rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
      an IPI but we're running with interrupts disabled. This triggers:
      
        WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
        ...
      
      Reserved banks will be read-as-zero, so their MCA_IPID register will be
      zero. So, like the smca_banks array, the threshold_banks array will not
      have an entry for a reserved bank since all its MCA_MISC* registers will
      be zero.
      
      Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
      
      Use the "Reserved" type when checking if a bank is reserved. It's
      possible that other bank numbers may be reserved on future systems.
      
      Don't try to find the block address on reserved banks.
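      The reserved-bank check can be sketched in C as follows. This is a minimal illustration that assumes the same IPID field layout as above (HWID = IPID[43:32], McaType = IPID[63:48]). The helpers bank_is_reserved() and count_banks_to_init() are hypothetical and stand in for the real init and resume paths. Because a reserved bank reads as zero, its IPID yields the (0, 0) tuple and the bank is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HWID_MCATYPE(hwid, mcatype) \
	(((uint32_t)(hwid) << 16) | (uint32_t)(mcatype))

/* The "Reserved" type matches a (HWID, McaType) of (0, 0). */
#define HWID_MCATYPE_RESERVED HWID_MCATYPE(0x0, 0x0)

/* Assumed layout: HWID = IPID[43:32], McaType = IPID[63:48]. */
static uint32_t ipid_to_key(uint64_t ipid)
{
	uint32_t high = (uint32_t)(ipid >> 32);

	return HWID_MCATYPE(high & 0xFFFu, high >> 16);
}

/* A reserved bank is read-as-zero, so its IPID maps to (0, 0). */
static bool bank_is_reserved(uint64_t ipid)
{
	return ipid_to_key(ipid) == HWID_MCATYPE_RESERVED;
}

/*
 * Count the banks a hypothetical init/resume path would set up.
 * Reserved banks are skipped, mirroring the rule of not initializing
 * them or looking up their block addresses.
 */
static size_t count_banks_to_init(const uint64_t *ipids, size_t n)
{
	size_t count = 0;

	assert(ipids != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!bank_is_reserved(ipids[i]))
			count++;
	return count;
}
```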
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.14.x
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 21 Aug, 2017 (3 commits)
  4. 17 Jul, 2017 (1 commit)
  5. 13 Jun, 2017 (1 commit)
  6. 16 Feb, 2017 (1 commit)
    • EDAC, mce_amd: Print IPID and Syndrome on a separate line · 75bf2f64
      Committed by Yazen Ghannam
      Currently, the IPID and Syndrome are printed on the same line as the
      Address. There are cases when we can have a valid Syndrome but not a
      valid Address.
      
      For example, the MCA_SYND register can be used to hold more detailed
      error info that the hardware folks can use. It's not just DRAM ECC
      syndromes. There are some error types that aren't related to memory that
      may have valid syndromes, like some errors related to links in the Data
      Fabric, etc.
      
      In these cases, the IPID and Syndrome are not printed at the same log
      level as the rest of the stanza, so users won't see them on the console.
      
      Console:
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
      
      Dmesg:
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
      
      Print the IPID first and on a new line. The IPID should always be
      printed on SMCA systems. The Syndrome will then be printed with the IPID
      and at the same log level when valid:
      
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
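      The following C sketch reproduces the new output format in user space. It always prints the IPID and appends the Syndrome only when MCA_STATUS[SyndV] is set. Treating SyndV as bit 53 is an assumption here, chosen because it matches the SyndV flag shown for the sample status 0xd82000000002080b. format_ipid_line() is a hypothetical helper; the kernel decoder writes through its own printk calls rather than into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed position of the SyndV flag in MCA_STATUS. */
#define MCI_STATUS_SYNDV (1ULL << 53)

/* Format the IPID line; append the Syndrome only if SyndV is set. */
static void format_ipid_line(char *buf, size_t len, uint64_t status,
			     uint64_t ipid, uint64_t synd)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "[Hardware Error]: IPID: 0x%016llx",
		 (unsigned long long)ipid);

	if (status & MCI_STATUS_SYNDV) {
		size_t used = strlen(buf); /* always < len after snprintf */

		snprintf(buf + used, len - used, ", Syndrome: 0x%016llx",
			 (unsigned long long)synd);
	}
}
```

      With the sample values from this commit, the function produces the same IPID/Syndrome line as the expected output above. It produces only the IPID part when SyndV is clear.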
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  7. 28 Jan, 2017 (1 commit)
  8. 24 Jan, 2017 (3 commits)
  9. 29 Nov, 2016 (1 commit)
  10. 21 Nov, 2016 (1 commit)
  11. 09 Nov, 2016 (3 commits)
  12. 13 Sep, 2016 (6 commits)
  13. 12 May, 2016 (1 commit)
  14. 08 Mar, 2016 (1 commit)
  15. 13 Aug, 2015 (2 commits)
  16. 14 Jul, 2015 (1 commit)
  17. 25 Nov, 2014 (1 commit)
  18. 05 Nov, 2014 (1 commit)
  19. 14 Jul, 2014 (1 commit)
  20. 09 May, 2014 (1 commit)
  21. 24 Feb, 2014 (1 commit)
  22. 08 Jun, 2013 (1 commit)
  23. 23 Jan, 2013 (3 commits)
  24. 28 Nov, 2012 (2 commits)