1. 04 Jan, 2023 3 commits
  2. 21 Dec, 2022 4 commits
  3. 14 Dec, 2022 1 commit
  4. 30 Nov, 2022 1 commit
  5. 23 Nov, 2022 2 commits
  6. 18 Nov, 2022 1 commit
  7. 16 Nov, 2022 8 commits
  8. 06 Nov, 2022 1 commit
  9. 05 Nov, 2022 2 commits
  10. 03 Nov, 2022 1 commit
    • drm/amd: Fail the suspend if resources can't be evicted · 8d4de331
      Committed by Mario Limonciello
      If a system does not have swap and memory is at 100% usage,
      amdgpu will fail to evict resources.  Currently the suspend
      carries on regardless and proceeds to reset the GPU:
      
      ```
      [drm] evicting device resources failed
      [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
      [drm] free PSP TMR buffer
      [TTM] Failed allocating page table
      [drm] evicting device resources failed
      amdgpu 0000:03:00.0: amdgpu: MODE1 reset
      amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
      amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
      ```
      
      At this point, if the suspend had actually succeeded, I think that
      amdgpu would have recovered because the GPU would have had its power
      cut off and restored.  However, the kernel fails to continue the
      suspend because of the memory pressure, and amdgpu fails to run the
      "resume" from the aborted suspend.
      
      ```
      ACPI: PM: Preparing to enter system sleep state S3
      SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
        cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
        node 0: slabs: 22, objs: 1122, free: 0
      ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)
      
      [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
      [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
      [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
      amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
      PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
      amdgpu 0000:03:00.0: PM: failed to resume async: error -62
      ```
      
      To avoid this series of unfortunate events, fail amdgpu's suspend
      when the memory eviction fails.  This will let the system gracefully
      recover and the user can try suspend again when the memory pressure
      is relieved.
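
      The idea can be illustrated with a small userspace sketch. This is
      not the driver code: `struct mock_gpu` and every `mock_*` function
      are hypothetical stand-ins, and the only behaviour modelled is that
      an eviction failure (-ENOMEM, the -12 seen in the log above) is
      returned to the caller before any hardware state is changed.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
      struct mock_gpu {
          bool out_of_memory;  /* simulates no swap and fully used RAM */
          bool hw_was_reset;   /* set once the later suspend phase touched hardware */
      };

      /* Hypothetical eviction step: fails with -ENOMEM under memory pressure. */
      int mock_evict_resources(struct mock_gpu *gpu)
      {
          assert(gpu != NULL);
          return gpu->out_of_memory ? -ENOMEM : 0;
      }

      /* Hypothetical later phase that powers down or resets the hardware. */
      void mock_suspend_hardware(struct mock_gpu *gpu)
      {
          gpu->hw_was_reset = true;
      }

      /* Suspend path that aborts early when eviction fails. */
      int mock_suspend(struct mock_gpu *gpu)
      {
          int r = mock_evict_resources(gpu);

          if (r)
              return r;  /* propagate the error and leave the hardware alone */

          mock_suspend_hardware(gpu);
          return 0;
      }
      ```

      With `out_of_memory` set, `mock_suspend()` returns -ENOMEM and
      `hw_was_reset` stays false, so a caller can abort the suspend and
      retry later instead of resuming a device that was already reset.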
      
      Reported-by: post@davidak.de
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  11. 28 Oct, 2022 1 commit
    • drm/amd: Fail the suspend if resources can't be evicted · 7863c155
      Committed by Mario Limonciello
      If a system does not have swap and memory is at 100% usage,
      amdgpu will fail to evict resources.  Currently the suspend
      carries on regardless and proceeds to reset the GPU:
      
      ```
      [drm] evicting device resources failed
      [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <vcn_v3_0> failed -12
      [drm] free PSP TMR buffer
      [TTM] Failed allocating page table
      [drm] evicting device resources failed
      amdgpu 0000:03:00.0: amdgpu: MODE1 reset
      amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
      amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
      ```
      
      At this point, if the suspend had actually succeeded, I think that
      amdgpu would have recovered because the GPU would have had its power
      cut off and restored.  However, the kernel fails to continue the
      suspend because of the memory pressure, and amdgpu fails to run the
      "resume" from the aborted suspend.
      
      ```
      ACPI: PM: Preparing to enter system sleep state S3
      SLUB: Unable to allocate memory on node -1, gfp=0xdc0(GFP_KERNEL|__GFP_ZERO)
        cache: Acpi-State, object size: 80, buffer size: 80, default order: 0, min order: 0
        node 0: slabs: 22, objs: 1122, free: 0
      ACPI Error: AE_NO_MEMORY, Could not update object reference count (20210730/utdelete-651)
      
      [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed!
      [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
      [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
      amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
      PM: dpm_run_callback(): pci_pm_resume+0x0/0x100 returns -62
      amdgpu 0000:03:00.0: PM: failed to resume async: error -62
      ```
      
      To avoid this series of unfortunate events, fail amdgpu's suspend
      when the memory eviction fails.  This will let the system gracefully
      recover and the user can try suspend again when the memory pressure
      is relieved.
      
      Reported-by: post@davidak.de
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2223
      Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  12. 27 Oct, 2022 1 commit
  13. 25 Oct, 2022 4 commits
  14. 19 Oct, 2022 2 commits
  15. 18 Oct, 2022 2 commits
  16. 29 Sep, 2022 3 commits
  17. 28 Sep, 2022 1 commit
  18. 21 Sep, 2022 1 commit
  19. 20 Sep, 2022 1 commit
    • drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu device · 83d29a5f
      Committed by YiPeng Chai
      V3:
      Fixed psp fence and memory issues for the asic
      using smu v13_0_2 when removing amdgpu device.
      
      [Why]:
      1. psp_suspend->psp_free_shared_bufs->
             psp_ta_free_shared_buf->
                 amdgpu_bo_free_kernel->
                   ...->amdgpu_bo_release_notify->
                          amdgpu_fill_buffer
         psp frees the vram it uses when psp_suspend is called.
         But for the asic using smu v13_0_2, psp_suspend is called
         before adev->shutdown is set to true when removing the first
         hive device, so amdgpu_fill_buffer will be called, which
         causes fence issues when evicting all vram resources in
         amdgpu_vram_mgr_fini.
      2. Since psp_hw_fini is not called after psp_suspend, and
         psp_suspend only calls psp_ring_stop, the psp ring memory
         is not released when the amdgpu device is removed.
      
      [How]:
      1. Set shutdown to true before calling amdgpu_device_gpu_recover,
         then amdgpu_fill_buffer will not be called when psp_suspend is
         called.
      2. Free psp ring memory in psp_sw_fini.
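
      The two steps above can be sketched in plain userspace C. This is
      a toy model under stated assumptions, not the driver code:
      `struct mock_dev` and all `mock_*` functions are hypothetical, the
      release hook is assumed to skip the buffer fill once the shutdown
      flag is set, and `malloc`/`free` stand in for the ring allocation.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for the device; not the real struct amdgpu_device. */
      struct mock_dev {
          bool shutdown;          /* models adev->shutdown */
          int fill_buffer_calls;  /* counts simulated amdgpu_fill_buffer calls */
          void *psp_ring_mem;     /* models the psp ring memory */
      };

      /* Release hook: assumed to skip the fill once shutdown has been set. */
      void mock_bo_release_notify(struct mock_dev *dev)
      {
          if (dev->shutdown)
              return;
          dev->fill_buffer_calls++;
      }

      /* Suspend frees shared buffers (triggering the hook) but keeps the ring. */
      void mock_psp_suspend(struct mock_dev *dev)
      {
          mock_bo_release_notify(dev);
      }

      /* Step 2: software teardown releases the ring memory. */
      void mock_psp_sw_fini(struct mock_dev *dev)
      {
          free(dev->psp_ring_mem);
          dev->psp_ring_mem = NULL;
      }

      /* Removal path: step 1 sets shutdown before anything calls suspend. */
      void mock_remove_device(struct mock_dev *dev)
      {
          assert(dev != NULL);
          dev->shutdown = true;
          mock_psp_suspend(dev);  /* stands in for the recovery path */
          mock_psp_sw_fini(dev);
      }
      ```

      After `mock_remove_device()` the fill counter is still zero and the
      ring pointer is NULL, which mirrors the two outcomes the commit is
      after: no buffer fill during removal and no leaked ring memory.
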
      Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>