1. 19 Oct 2022, 2 commits
  2. 18 Oct 2022, 3 commits
    •
      !163 ICX: EDAC driver decoder for Ice Lake · 28e59bca
      openeuler-ci-bot committed
      Merge Pull Request from: @youquan_song 
       
      [Description]
      https://gitee.com/openeuler/intel-kernel/issues/I5V3IO
      
      The current i10nm_edac driver only supports the firmware decoder (ACPI DSM methods).
      MCA bank registers of Ice Lake or Tremont CPUs contain the information
      to decode DDR memory errors. To get better decoding performance, add
      the driver decoder (decoding DDR memory errors via extracting error
      information from MCA bank registers) for Ice Lake and Tremont CPUs.
      
      This patchset avoids triggering an SMI to call the firmware decoder, which is especially valuable when CEs (Correctable Errors) occur frequently on DDR memory.
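
      The bash sketch below illustrates what "extracting error information from MCA bank registers" involves. It splits the MCi_STATUS value logged in the test (`Bank 25: 0x8c00004200800090`) into the architectural fields defined in the Intel SDM machine-check chapters. It then decodes the compound memory-controller error code. This is not the i10nm_edac implementation, and the function names are hypothetical. The DIMM-level fields (rank, row, column, bank) come from model-specific registers, which this sketch does not read.

      ```bash
      #!/usr/bin/env bash
      # Split an IA32_MCi_STATUS value into its architectural fields.
      # Bash arithmetic is signed 64-bit: a value with bit 63 set is stored as a
      # negative number, but (s >> n) & 1 still yields bit n.
      decode_mci_status() {
        local s=$(( $1 ))
        printf 'VAL=%d OVER=%d UC=%d EN=%d MISCV=%d ADDRV=%d PCC=%d\n' \
          $(( (s >> 63) & 1 )) $(( (s >> 62) & 1 )) $(( (s >> 61) & 1 )) \
          $(( (s >> 60) & 1 )) $(( (s >> 59) & 1 )) $(( (s >> 58) & 1 )) \
          $(( (s >> 57) & 1 ))
        # Bits 31:16 = model-specific error code, bits 15:0 = MCA error code.
        printf 'MSCOD=0x%04x MCACOD=0x%04x\n' \
          $(( (s >> 16) & 0xffff )) $(( s & 0xffff ))
      }

      # Decode a compound memory-controller MCA error code (000F 0000 1MMM CCCC).
      decode_mem_mcacod() {
        local c=$(( $1 ))
        # Bits 15:13 and 11:8 must be 0 and bit 7 must be 1; bit 12 (filter) is ignored.
        if (( (c & 0xef80) != 0x0080 )); then
          echo "not a memory controller error"
          return 1
        fi
        local -a kinds=(generic read write address/command scrubbing)
        local mmm=$(( (c >> 4) & 7 )) ch=$(( c & 0xf ))
        local kind=${kinds[mmm]:-reserved}   # MMM 101-111 are reserved
        (( ch == 15 )) && ch=unspecified     # CCCC 1111 = channel not specified
        echo "memory ${kind} error, channel ${ch}"
      }
      ```

      For the logged value, the sketch reports VAL=1, UC=0 (a corrected error), MISCV=1 and ADDRV=1. It gives MSCOD:MCACOD = 0x0080:0x0090, which matches `err_code:0x0080:0x0090` in the dmesg output. `decode_mem_mcacod 0x0090` prints `memory read error, channel 0`.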
      
      fe32f366 EDAC/skx_common: Use driver decoder first
      627d551a EDAC/skx_common: Make output format similar
      2738c69a EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
      
      [Testing]
      #echo 1 > /sys/module/i10nm_edac/parameters/decoding_via_mca
      #modprobe einj
      #rdmsr 0x34    (read SMI count)
      132
      #/home/ras-tools/cmcistorm 1
      0: vaddr = 0x1401490 paddr = 9686af490
      #rdmsr 0x34
      133      --- increased by only one, for the EINJ error injection; no extra SMI is triggered to call _DSM for EDAC decoding.
      #dmesg
      [ 467.460634] EINJ: Error INJection is initialized.
      [ 666.964249] mce: [Hardware Error]: Machine check events logged
      [ 666.964258] EDAC skx MC7: HANDLING MCE MEMORY ERROR
      [ 666.964262] EDAC skx MC7: CPU 36: Machine Check Event: 0x0 Bank 25: 0x8c00004200800090
      [ 666.964265] EDAC skx MC7: TSC 0x1ca2cdd7071
      [ 666.964267] EDAC skx MC7: ADDR 0x9686af480
      [ 666.964269] EDAC skx MC7: MISC 0x9004b016851cc86
      [ 666.964272] EDAC skx MC7: PROCESSOR 0:0x606a6 TIME 1529666570 SOCKET 1 APIC 0x80
      [ 666.964297] EDAC DEBUG: skx_mce_output_error: err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000]
      [ 666.964308] EDAC MC7: 1 CE memory read error on CPU_SrcID#1_MC#3_Chan#0_DIMM#0 (channel:0 slot:0 page:0x9686af offset:0x480 grain:32 syndrome:0x0 - err_code:0x0080:0x0090 ProcessorSocketId:0x1 MemoryControllerId:0x3 PhysicalRankId:0x1 Row:0x2d0a Column:0x398 Bank:0x2 BankGroup:0x3 retry_rd_err_log[00438209 00000000 00000001 07316041 00002d0a 00000009686af480] correrrcnt[0000 0001 0000 0000 0000 0000 0000 0000]) 
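
      The SMI check in these steps can be wrapped in a small helper. This sketch assumes `rdmsr` from msr-tools, run as root after `modprobe msr`. It reads MSR 0x34 (MSR_SMI_COUNT) on CPU 0 only. `smi_delta` and `count_smis` are hypothetical names. `rdmsr` prints hexadecimal by default, so the readings are parsed as base 16.

      ```bash
      #!/usr/bin/env bash
      # SMI count delta between two rdmsr readings (rdmsr prints hex by default).
      smi_delta() {
        echo $(( 16#$2 - 16#$1 ))
      }

      # Run a command and report how many SMIs CPU 0 saw while it ran.
      # Assumes msr-tools' rdmsr, root privileges and the msr module loaded.
      count_smis() {
        local before after
        before=$(rdmsr -p 0 0x34) || return 1   # MSR 0x34 = MSR_SMI_COUNT
        "$@"
        after=$(rdmsr -p 0 0x34) || return 1
        echo "SMIs during '$*': $(smi_delta "$before" "$after")"
      }
      ```

      With `decoding_via_mca` set, `count_smis /home/ras-tools/cmcistorm 1` prints the delta directly. Based on the log above, the expected delta is 1, which is the EINJ injection itself.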
       
      Link:https://gitee.com/openeuler/kernel/pulls/163 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Reviewed-by: Jun Tian <jun.j.tian@intel.com> 
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
    •
      !162 SPR: eDPC recovered but PCI configuration register values changed · 4aff3e5e
      openeuler-ci-bot committed
      Merge Pull Request from: @youquan_song 
       
      [Description]
      https://gitee.com/openeuler/intel-kernel/issues/I5V39J
      When the current OLK-5.10 kernel runs on an SPR server and eDPC (Enhanced Downstream Port Containment) is triggered on a PCIe device, the OS detects the error and recovers from it. It resets the link and, if the drivers request NEED_RESET, also resets the slot. However, with the current OLK-5.10 kernel, some PCI configuration registers (e.g. MaxReadReq/MaxPayload) differ after eDPC recovery from their values before it. The cause is that OLK-5.10 is missing the patch below. Without it, the result of the link reset overwrites the NEED_RESET status, so the slot reset may be skipped.
      
      PCI/ERR: Retain status from error notification
      
      commit 387c72cd upstream.
      
      Overwriting the frozen detected status with the result of the link reset
      loses the NEED_RESET result that drivers are depending on for error
      handling to report the .slot_reset() callback. Retain this status so
      that subsequent error handling has the correct flow.
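
      The toy bash model below (not kernel code) makes the status flow concrete for the frozen-channel path described in the quoted commit. Drivers vote NEED_RESET during error notification, the link is reset, and `.slot_reset()` is reported only if the retained status is still NEED_RESET. The `recover` function and its arguments are hypothetical, and intermediate steps such as MMIO re-enable are omitted.

      ```bash
      #!/usr/bin/env bash
      # Toy model of the frozen-channel recovery path (not kernel code).
      #   $1  "old" (link-reset result overwrites the drivers' vote) or "fixed"
      #   $2  status voted by the drivers during error notification
      #   $3  result of the link reset
      recover() {
        local mode=$1 status=$2 link_reset=$3
        if [[ $link_reset != RECOVERED ]]; then
          echo "recovery failed"
          return 1
        fi
        # Before the fix, the link-reset result replaced the drivers' vote.
        if [[ $mode == old ]]; then
          status=$link_reset
        fi
        # .slot_reset() is only reported when a reset was requested.
        if [[ $status == NEED_RESET ]]; then
          echo "slot_reset() called"
        else
          echo "slot_reset() skipped"
        fi
      }
      ```

      `recover old NEED_RESET RECOVERED` prints `slot_reset() skipped`, while `recover fixed NEED_RESET RECOVERED` prints `slot_reset() called`.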

      [Testing]
      
      Set the BIOS to enable eDPC.
      Log the target device's PCIe configuration values.
      Trigger eDPC (via hotplug) or inject an eDPC error via software.
      Log the target device's PCIe configuration values again.
      Compare the PCIe configuration before and after eDPC recovery.
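
      The before/after comparison in these steps can be scripted. This is a minimal sketch that assumes `lspci -vv` is run as root, so the PCIe capability is shown. It also assumes the DevCtl line has the usual `MaxPayload N bytes, MaxReadReq N bytes` form. The helper names and the example BDF `0000:17:00.0` are hypothetical.

      ```bash
      #!/usr/bin/env bash
      # Extract the DevCtl payload/read-request sizes from `lspci -vv` text on stdin.
      # DevCap also mentions MaxPayload, but only DevCtl prints this exact pair.
      devctl_sizes() {
        grep -o 'MaxPayload [0-9]* bytes, MaxReadReq [0-9]* bytes'
      }

      # Snapshot one device, e.g. snapshot 0000:17:00.0 (hypothetical BDF).
      snapshot() {
        lspci -vv -s "$1" | devctl_sizes
      }

      # Compare two snapshots; returns non-zero if recovery changed the settings.
      compare_sizes() {
        if [[ $1 == "$2" ]]; then
          echo "unchanged: $1"
        else
          printf 'changed:\n  before: %s\n  after:  %s\n' "$1" "$2"
          return 1
        fi
      }
      ```

      To use it, run `before=$(snapshot 0000:17:00.0)`, trigger eDPC and wait for recovery, then run `compare_sizes "$before" "$(snapshot 0000:17:00.0)"`. Status bits may change during the event, so the sketch compares only the two control fields named in the issue.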
       
      Link:https://gitee.com/openeuler/kernel/pulls/162 
      Reviewed-by: Jun Tian <jun.j.tian@intel.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
    •
      !158 Intel SPR: SGX: Backport SGX EDMM support · 16046abd
      openeuler-ci-bot committed
      Merge Pull Request from: @zhiquan1-li 
       
      **Content:**
      This PR includes incremental backport patches that mainly cover [SGX EDMM](https://lore.kernel.org/linux-sgx/239f0f5692d9c00f3c9e0d5d58cd77d2e5ba5eb4.camel@kernel.org/T/#m5f94561a7fef3f33e9922a41f45e5dcf88ad9880) (Enclave Dynamic
      Memory Management) support and its dependencies, as well as subsequent fixes up to upstream v6.0.
      
      There are 54 patches in total:
      - SGX EDMM support (commits 22~52)
        [[PATCH V5 00/31] x86/sgx and selftests/sgx: Support SGX2](https://lore.kernel.org/linux-sgx/239f0f5692d9c00f3c9e0d5d58cd77d2e5ba5eb4.camel@kernel.org/T/#m5f94561a7fef3f33e9922a41f45e5dcf88ad9880)
      - Its dependencies (commits 1~21)
      - Subsequent bug fixes up to upstream v6.0 (commits 53~54)
      
      **Intel-kernel issue:**
      https://gitee.com/openeuler/intel-kernel/issues/I5USAM
      
      **Test:**
      1. Builds successfully at each commit
      2. Kernel selftest - SGX: PASSED
         (this patchset includes dedicated test cases for EDMM)
         ```sh
         cd tools/testing/selftests/sgx/
         make
         ./test_sgx
         ```
      3. SGX internal stress test: No new failure
      
      **Known issue:**
      None
      
      **Default config change:**
      None 
       
      Link:https://gitee.com/openeuler/kernel/pulls/158 
      Reviewed-by: Jun Tian <jun.j.tian@intel.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
  3. 14 Oct 2022, 12 commits
  4. 13 Oct 2022, 4 commits
  5. 11 Oct 2022, 19 commits