1. 11 Oct, 2022 2 commits
  2. 29 Sep, 2022 2 commits
  3. 14 Sep, 2022 1 commit
  4. 17 Aug, 2022 1 commit
  5. 13 Jul, 2022 1 commit
  6. 06 Jul, 2022 2 commits
  7. 11 Jun, 2022 2 commits
  8. 02 Jun, 2022 2 commits
  9. 27 May, 2022 1 commit
  10. 11 May, 2022 2 commits
  11. 26 Apr, 2022 1 commit
  12. 23 Apr, 2022 3 commits
  13. 29 Mar, 2022 1 commit
  14. 16 Mar, 2022 1 commit
  15. 03 Mar, 2022 2 commits
  16. 24 Feb, 2022 2 commits
  17. 18 Feb, 2022 3 commits
  18. 15 Feb, 2022 4 commits
  19. 08 Feb, 2022 3 commits
    • Revert "drm/amdgpu: Add judgement to avoid infinite loop" · a50b0482
      yipechai committed
      Commit d5e8ff5f ("drm/amdgpu: Fixed the defect of soft lock caused by infinite loop")
      fixed this defect.
      
      Revert the workaround,
      commit a2170b4a ("drm/amdgpu: Add judgement to avoid infinite loop").
      Signed-off-by: yipechai <YiPeng.Chai@amd.com>
      Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Fixed the defect of soft lock caused by infinite loop · d5e8ff5f
      yipechai committed
      1. The infinite loop only occurs when multiple cards support
         ras functions.
      2. For an explanation of the root cause, refer to commit 76641cbbf196
         ("drm/amdgpu: Add judgement to avoid infinite loop").
      3. Create a new node to manage each unique ras instance, which
         guarantees that each device's .ras_list is completely independent.
      4. Fixes: commit 7a6b8ab3231b51 ("drm/amdgpu: Unify ras block
         interface for each ras block").
      5. The soft lockup logs are as follows:
      [  262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G           OE     5.13.0-27-generic #29~20.04.1-Ubuntu
      [  262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020
      [  262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
      [  262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu]
      [  262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45
      [  262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202
      [  262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248
      [  262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000
      [  262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001
      [  262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e
      [  262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003
      [  262.166258] FS:  0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000
      [  262.166261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0
      [  262.166267] Call Trace:
      [  262.166272]  amdgpu_ras_do_recovery+0x130/0x290 [amdgpu]
      [  262.166529]  ? psi_task_switch+0xd2/0x250
      [  262.166537]  ? __switch_to+0x11d/0x460
      [  262.166542]  ? __switch_to_asm+0x36/0x70
      [  262.166549]  process_one_work+0x220/0x3c0
      [  262.166556]  worker_thread+0x4d/0x3f0
      [  262.166560]  ? process_one_work+0x3c0/0x3c0
      [  262.166563]  kthread+0x12b/0x150
      [  262.166568]  ? set_kthread_struct+0x40/0x40
      [  262.166571]  ret_from_fork+0x22/0x30
      Signed-off-by: yipechai <YiPeng.Chai@amd.com>
      Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
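The fix described in commit d5e8ff5f (a separate list node per device for each unique ras instance) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the driver's code: `struct list_head` here is a hand-written stand-in for the kernel type, and `struct ras_block`, `struct ras_block_node`, `register_ras_block`, `count_blocks`, `free_blocks`, `per_device_node_demo` and the block names are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same pointer updates as the kernel's list_add_tail(). */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *last = head->prev;

    node->next = head;
    node->prev = last;
    last->next = node;
    head->prev = node;
}

/* Hypothetical shared descriptor: one global instance per ras block. */
struct ras_block {
    const char *name;
};

/* Hypothetical per-device wrapper: owns the list links, points at the block. */
struct ras_block_node {
    struct list_head node;      /* first member, so a list_head * can be cast back */
    struct ras_block *block;
};

/* Allocate a fresh node per (device, block) pair instead of reusing one node. */
static int register_ras_block(struct list_head *dev_ras_list, struct ras_block *block)
{
    struct ras_block_node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->block = block;
    list_add_tail(&n->node, dev_ras_list);
    return 0;
}

/* Count entries, giving up after max_steps hops so a broken list cannot hang us. */
static size_t count_blocks(const struct list_head *head, size_t max_steps)
{
    size_t n = 0;

    for (const struct list_head *pos = head->next;
         pos != head && n < max_steps; pos = pos->next)
        n++;
    return n;
}

static void free_blocks(struct list_head *head)
{
    struct list_head *pos = head->next;

    while (pos != head) {
        struct list_head *next = pos->next;

        free((struct ras_block_node *)pos);
        pos = next;
    }
    init_list_head(head);
}

/* Register the same two shared blocks on two devices and count each list. */
static int per_device_node_demo(size_t *dev0_count, size_t *dev1_count)
{
    static struct ras_block gfx = { "gfx" }, sdma = { "sdma" };
    struct list_head dev0_ras_list, dev1_ras_list;
    int err = 0;

    init_list_head(&dev0_ras_list);
    init_list_head(&dev1_ras_list);

    err |= register_ras_block(&dev0_ras_list, &gfx);
    err |= register_ras_block(&dev0_ras_list, &sdma);
    err |= register_ras_block(&dev1_ras_list, &gfx);
    err |= register_ras_block(&dev1_ras_list, &sdma);

    *dev0_count = count_blocks(&dev0_ras_list, 1000);
    *dev1_count = count_blocks(&dev1_ras_list, 1000);

    free_blocks(&dev0_ras_list);
    free_blocks(&dev1_ras_list);
    return err;
}
```

Because the links live in the per-device wrapper rather than in the shared block, registering a block on a second device no longer rewrites the pointers that the first device's list depends on, and both walks return to their own head.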
    • drm/amdgpu: Print once if RAS unsupported · afa37315
      Luben Tuikov committed
      MESA polls for errors every 2-3 seconds. Printing with dev_info() causes
      the dmesg log to fill up with the same message, e.g.,
      
      [18028.206676] amdgpu 0000:0b:00.0: amdgpu: df doesn't config ras function.
      
      Make it dev_dbg_once(), as it isn't something correctible during boot or
      thereafter, so printing just once is sufficient. Also sanitize the message.
      
      Cc: Alex Deucher <Alexander.Deucher@amd.com>
      Cc: Hawking Zhang <Hawking.Zhang@amd.com>
      Cc: John Clements <john.clements@amd.com>
      Cc: Tao Zhou <tao.zhou1@amd.com>
      Cc: yipechai <YiPeng.Chai@amd.com>
      Fixes: 8b0fb0e9 ("drm/amdgpu: Modify gfx block to fit for the unified ras block data and ops")
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
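The print-once behaviour that commit afa37315 relies on can be illustrated with a small userspace analogue. This sketch only models the "once per call site" part; the kernel's `dev_dbg_once()` additionally depends on debug output being enabled. `LOG_ONCE`, `messages_emitted`, `poll_ras_errors` and the message text are hypothetical names chosen for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many messages actually reached the log (for illustration). */
static int messages_emitted;

/* Print at most once per call site: each expansion gets its own static flag. */
#define LOG_ONCE(...)                          \
    do {                                       \
        static bool already_logged;            \
        if (!already_logged) {                 \
            already_logged = true;             \
            fprintf(stderr, __VA_ARGS__);      \
            messages_emitted++;                \
        }                                      \
    } while (0)

/* Simulate a client polling for errors `polls` times on an unsupported block. */
static int poll_ras_errors(int polls)
{
    for (int i = 0; i < polls; i++)
        LOG_ONCE("RAS is not supported on block %s\n", "df");
    return messages_emitted;
}
```

However many times the poll runs, the message appears once, which is the property that keeps a periodic poller from flooding the log.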
  20. 03 Feb, 2022 1 commit
    • drm/amdgpu: Add judgement to avoid infinite loop · a2170b4a
      yipechai committed
      1. The infinite loop that causes the soft lockup occurs when multiple
         amdgpu cards support the ras feature.
      2. This is a workaround patch to fix 6492e1b0.
         It is valid for multiple amdgpu cards of the same type.
      3. The root cause is that each GPU card device has a separate .ras_list
         list head, but the instance and list node of each ras block are
         unique (shared by all devices). When each device is initialized,
         every ras instance adds its single list node to that device's list
         again. As a result, only the .ras_list of the last initialized
         device is completely correct. The .ras_list->prev and
         .ras_list->next of an earlier initialized device still point to the
         correct ras instances, but the prev and next pointers of those ras
         instances point to the last initialized device's .ras_list instead
         of the earlier device's .ras_list. When list_for_each_entry_safe
         searches for a non-existent ras node on any device other than the
         last one, the next pointer of the last ras instance never equals
         the .ras_list head the search started from, so the loop cannot
         terminate and the program enters an infinite loop.
         BTW: Since the data and initialization process of each card are the
         same, the links between ras instances are not destroyed each time a
         device is initialized.
      4. The soft lockup logs are as follows:
      [  262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G           OE     5.13.0-27-generic #29~20.04.1-Ubuntu
      [  262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020
      [  262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
      [  262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu]
      [  262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45
      [  262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202
      [  262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248
      [  262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000
      [  262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001
      [  262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e
      [  262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003
      [  262.166258] FS:  0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000
      [  262.166261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0
      [  262.166267] Call Trace:
      [  262.166272]  amdgpu_ras_do_recovery+0x130/0x290 [amdgpu]
      [  262.166529]  ? psi_task_switch+0xd2/0x250
      [  262.166537]  ? __switch_to+0x11d/0x460
      [  262.166542]  ? __switch_to_asm+0x36/0x70
      [  262.166549]  process_one_work+0x220/0x3c0
      [  262.166556]  worker_thread+0x4d/0x3f0
      [  262.166560]  ? process_one_work+0x3c0/0x3c0
      [  262.166563]  kthread+0x12b/0x150
      [  262.166568]  ? set_kthread_struct+0x40/0x40
      [  262.166571]  ret_from_fork+0x22/0x30
      
      Fixes: 6492e1b0 ("drm/amdgpu: Unify ras block interface for each ras block")
      Signed-off-by: yipechai <YiPeng.Chai@amd.com>
      Reviewed-by: John Clements <john.clements@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
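The root cause described in commit a2170b4a can be reproduced with a small userspace sketch: one list node is added to two different list heads, and a bounded walk from the first head never gets back to it. This is an illustration under stated assumptions, not driver code: `struct list_head` is a hand-written stand-in for the kernel type, and `walk_terminates`, `shared_node_demo` and the variable names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same pointer updates as the kernel's list_add_tail(). */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *last = head->prev;

    node->next = head;
    node->prev = last;
    last->next = node;
    head->prev = node;
}

/* Walk forward from head; true if we return to head within max_steps hops. */
static bool walk_terminates(const struct list_head *head, size_t max_steps)
{
    const struct list_head *pos = head->next;

    for (size_t i = 0; i < max_steps; i++) {
        if (pos == head)
            return true;
        pos = pos->next;
    }
    return false;   /* a real unbounded loop would spin here forever */
}

/* One shared ras block node registered on two devices, as in the bug report. */
static void shared_node_demo(bool *first_ok, bool *last_ok)
{
    struct list_head dev0_ras_list, dev1_ras_list;
    struct list_head shared_block_node;

    init_list_head(&dev0_ras_list);
    init_list_head(&dev1_ras_list);

    list_add_tail(&shared_block_node, &dev0_ras_list);
    /* Second add overwrites the node's next/prev; dev0's head is now unreachable. */
    list_add_tail(&shared_block_node, &dev1_ras_list);

    *first_ok = walk_terminates(&dev0_ras_list, 1000);
    *last_ok = walk_terminates(&dev1_ras_list, 1000);
}
```

After the second add, the shared node's `next` points at the second device's head, so a walk that starts from the first head bounces between the node and the other head. Only the last initialized device's list is well formed, which matches point 3 of the commit message.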
  21. 28 Jan, 2022 1 commit
  22. 26 Jan, 2022 1 commit
  23. 20 Jan, 2022 1 commit