1. 24 Apr, 2019 1 commit
      x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models · 71a84402
      Committed by Yazen Ghannam
      AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
      errors under certain conditions. The errors are benign and can safely be
      ignored. However, the high error rate may cause the MCA threshold
      counter to overflow causing a high rate of thresholding interrupts.
      
      In addition, users may see the errors reported through the AMD MCE
      decoder module, even with the interrupt disabled, due to MCA polling.
      
      Clear the "Counter Present" bit in the Instruction Fetch bank's
      MCA_MISC0 register. This will prevent enabling MCA thresholding on this
      bank which will prevent the high interrupt rate due to this error.
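The effect of clearing the bit can be sketched in plain C on a register value. This is a minimal illustration, not the kernel code: it assumes that "Counter Present" is bit 62 of MCA_MISC0, and the helper names are hypothetical. The real change operates on the MSR itself, which needs privileged access.

```c
#include <stdint.h>

/* Assumption: "Counter Present" (CntP) is bit 62 of MCA_MISC0. */
#define MISC0_CNTP_BIT 62

/* Hypothetical helper: return the MCA_MISC0 value with CntP cleared. */
static uint64_t misc0_clear_counter_present(uint64_t misc0)
{
	return misc0 & ~(UINT64_C(1) << MISC0_CNTP_BIT);
}

/* Hypothetical helper: report whether the register advertises a counter. */
static int misc0_counter_present(uint64_t misc0)
{
	return (int)((misc0 >> MISC0_CNTP_BIT) & 1);
}
```

Once the bit reads as zero, code that checks for a present counter before setting up thresholding would skip this bank.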
      
      Define an AMD-specific function to filter these errors from the MCE
      event pool so that they don't get reported during early boot.
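A filter of this kind might look roughly like the sketch below. It is a self-contained illustration under stated assumptions: `struct mce_record` is a hypothetical, simplified stand-in for the kernel's MCE record, the extended error code is assumed to sit in MCA_STATUS bits 21:16, and the value 10 is assumed to identify the L1 BTB error in the Instruction Fetch bank.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's MCE record. */
struct mce_record {
	uint8_t  family;     /* CPU family, e.g. 0x17 */
	uint8_t  model;      /* CPU model */
	bool     bank_is_if; /* true if the bank is the Instruction Fetch bank */
	uint64_t status;     /* MCA_STATUS value */
};

/* Assumption: the extended error code (XEC) is MCA_STATUS bits 21:16. */
static uint8_t mce_xec(uint64_t status)
{
	return (uint8_t)((status >> 16) & 0x3F);
}

/* Assumption: XEC 10 in the Instruction Fetch bank is the L1 BTB error. */
#define IF_XEC_L1_BTB 10

/* Return true if the record should be dropped before it is logged. */
static bool filter_l1_btb(const struct mce_record *m)
{
	return m->family == 0x17 &&
	       m->model >= 0x10 && m->model <= 0x2F &&
	       m->bank_is_if &&
	       mce_xec(m->status) == IF_XEC_L1_BTB;
}
```

A caller would run the filter before adding a record to the event pool and discard the record when it returns true.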
      
      Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
      at it.
      
       [ bp: Move function prototype to the internal header and
         massage/cleanup, fix typos. ]
      Reported-by: Rafał Miłecki <rafal@milecki.pl>
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "clemej@gmail.com" <clemej@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
      Cc: Shirish S <Shirish.S@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Cc: <stable@vger.kernel.org> # 5.0.x: c95b323d: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
      Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
      Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd40: x86/MCE: Group AMD function prototypes in <asm/mce.h>
      Cc: <stable@vger.kernel.org> # 5.0.x
      Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
  2. 15 Feb, 2019 2 commits
  3. 05 Feb, 2019 1 commit
  4. 03 Feb, 2019 4 commits
  5. 28 Sep, 2018 1 commit
  6. 22 Feb, 2018 1 commit
      x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type · 68627a69
      Committed by Yazen Ghannam
      Currently, bank 4 is reserved on Fam17h, so we chose not to initialize
      bank 4 in the smca_banks array. This means that when we check if a bank
      is initialized, like during boot or resume, we will see that bank 4 is
      not initialized and try to initialize it.
      
      This will cause a call trace, when resuming from suspend, due to
      rdmsr_*on_cpu() calls in the init path. The rdmsr_*on_cpu() calls issue
      an IPI but we're running with interrupts disabled. This triggers:
      
        WARNING: CPU: 0 PID: 11523 at kernel/smp.c:291 smp_call_function_single+0xdc/0xe0
        ...
      
      Reserved banks will be read-as-zero, so their MCA_IPID register will be
      zero. So, like the smca_banks array, the threshold_banks array will not
      have an entry for a reserved bank since all its MCA_MISC* registers will
      be zero.
      
      Enumerate a "Reserved" bank type that matches on a HWID_MCATYPE of 0,0.
      
      Use the "Reserved" type when checking if a bank is reserved. It's
      possible that other bank numbers may be reserved on future systems.
      
      Don't try to find the block address on reserved banks.
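The reserved-bank check described above can be sketched as follows. This is an illustration only: it assumes the hardware ID sits in MCA_IPID bits 43:32 and the MCA type in bits 63:48, and the function names are hypothetical rather than the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HWID is MCA_IPID bits 43:32, McaType is bits 63:48. */
static uint32_t ipid_hwid(uint64_t ipid)
{
	return (uint32_t)(ipid >> 32) & 0xFFF;
}

static uint32_t ipid_mcatype(uint64_t ipid)
{
	return (uint32_t)(ipid >> 48) & 0xFFFF;
}

/* Pack both fields into one key so a bank type can be matched in one compare. */
static uint32_t hwid_mcatype(uint32_t hwid, uint32_t mcatype)
{
	return (hwid << 16) | mcatype;
}

/* A reserved bank reads as zero, so its key matches the (0, 0) type. */
static bool bank_is_reserved(uint64_t ipid)
{
	return hwid_mcatype(ipid_hwid(ipid), ipid_mcatype(ipid)) ==
	       hwid_mcatype(0, 0);
}
```

With this check in place, init and block-address lookup code can return early for any bank whose key is (0, 0), whatever its bank number.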
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # 4.14.x
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20180221101900.10326-7-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 21 Aug, 2017 3 commits
  8. 17 Jul, 2017 1 commit
  9. 13 Jun, 2017 1 commit
  10. 16 Feb, 2017 1 commit
      EDAC, mce_amd: Print IPID and Syndrome on a separate line · 75bf2f64
      Committed by Yazen Ghannam
      Currently, the IPID and Syndrome are printed on the same line as the
      Address. There are cases when we can have a valid Syndrome but not a
      valid Address.
      
      For example, the MCA_SYND register can be used to hold more detailed
      error info that the hardware folks can use. It's not just DRAM ECC
      syndromes. There are some error types that aren't related to memory that
      may have valid syndromes, like some errors related to links in the Data
      Fabric, etc.
      
      In these cases, the IPID and Syndrome are not printed at the same log
      level as the rest of the stanza, so users won't see them on the console.
      
      Console:
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
      
      Dmesg:
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        , Syndrome: 0x000000010b404000, IPID: 0x0001002e00000002
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
      
      Print the IPID first and on a new line. The IPID should always be
      printed on SMCA systems. The Syndrome will then be printed with the IPID
      and at the same log level when valid:
      
        [Hardware Error]: CPU:16 (17:1:0) MC22_STATUS[Over|CE|MiscV|-|-|-|-|SyndV|-]: 0xd82000000002080b
        [Hardware Error]: IPID: 0x0001002e00000002, Syndrome: 0x000000010b404000
        [Hardware Error]: Power, Interrupts, etc. Extended Error Code: 2
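The new output format can be reproduced with a small user-space sketch. It assumes the SyndV flag is bit 53 of MCA_STATUS, which is consistent with the sample status value above; `format_ipid_line` is a hypothetical helper, and the kernel itself prints with its own logging functions rather than `snprintf`.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: SyndV (syndrome valid) is bit 53 of MCA_STATUS. */
#define STATUS_SYNDV (UINT64_C(1) << 53)

/*
 * Hypothetical helper: always emit the IPID, and append the Syndrome on the
 * same line only when the status marks it as valid.
 */
static int format_ipid_line(char *buf, size_t len, uint64_t status,
			    uint64_t ipid, uint64_t synd)
{
	int n = snprintf(buf, len, "[Hardware Error]: IPID: 0x%016llx",
			 (unsigned long long)ipid);

	/* Stop on error or if the buffer is already full. */
	if (n < 0 || (size_t)n >= len)
		return n;

	if (status & STATUS_SYNDV)
		n += snprintf(buf + n, len - (size_t)n,
			      ", Syndrome: 0x%016llx",
			      (unsigned long long)synd);
	return n;
}
```

Because both values share one line and one prefix, they are emitted at the same log level as the rest of the stanza.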
      Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1487192182-2474-1-git-send-email-Yazen.Ghannam@amd.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
  11. 28 Jan, 2017 1 commit
  12. 24 Jan, 2017 3 commits
  13. 29 Nov, 2016 1 commit
  14. 21 Nov, 2016 1 commit
  15. 09 Nov, 2016 3 commits
  16. 13 Sep, 2016 6 commits
  17. 12 May, 2016 1 commit
  18. 08 Mar, 2016 1 commit
  19. 13 Aug, 2015 2 commits
  20. 14 Jul, 2015 1 commit
  21. 25 Nov, 2014 1 commit
  22. 05 Nov, 2014 1 commit
  23. 14 Jul, 2014 1 commit
  24. 09 May, 2014 1 commit