1. 20 Sep, 2022 2 commits
    • drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu device · 83d29a5f
      Committed by YiPeng Chai
      V3:
      Fixed psp fence and memory issues for the asic
      using smu v13_0_2 when removing amdgpu device.
      
      [Why]:
      1. psp_suspend->psp_free_shared_bufs->
             psp_ta_free_shared_buf->
                 amdgpu_bo_free_kernel->
                   ...->amdgpu_bo_release_notify->
                          amdgpu_fill_buffer
         psp frees the vram memory it uses when psp_suspend is
         called. But for the asic using smu v13_0_2, psp_suspend
         is called before adev->shutdown is set to true when
         removing the first hive device, so amdgpu_fill_buffer
         is called, which causes fence issues when evicting all
         vram resources in amdgpu_vram_mgr_fini.
      2. Since psp_hw_fini is not called after psp_suspend, and
         psp_suspend only calls psp_ring_stop, the psp ring memory
         is not released when the amdgpu device is removed.
      
      [How]:
      1. Set shutdown to true before calling amdgpu_device_gpu_recover,
         then amdgpu_fill_buffer will not be called when psp_suspend is
         called.
      2. Free psp ring memory in psp_sw_fini.
      Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
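The ordering problem and the two fixes described in this commit can be modelled in user space. This is a minimal sketch, not the driver code: `fake_dev`, `fake_bo_release_notify`, `fake_psp_suspend`, `fake_psp_sw_fini` and `fake_remove` are hypothetical stand-ins, and the sketch assumes the release hook skips the buffer wipe once the shutdown flag is set, as the commit message implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device; not the real amdgpu structure. */
struct fake_dev {
    bool shutdown;       /* plays the role of adev->shutdown */
    int fill_calls;      /* how often the buffer-wipe path ran */
    void *psp_ring_mem;  /* plays the role of the psp ring memory */
};

/* Release hook: wipes the buffer unless the device is shutting down. */
static void fake_bo_release_notify(struct fake_dev *d)
{
    if (d->shutdown)
        return;
    d->fill_calls++; /* the real path would submit a fill job with a fence */
}

/* Suspend frees the shared buffers, which reaches the release hook. */
static void fake_psp_suspend(struct fake_dev *d)
{
    fake_bo_release_notify(d);
}

/* Fix 2: sw_fini frees the ring memory that suspend leaves allocated. */
static void fake_psp_sw_fini(struct fake_dev *d)
{
    free(d->psp_ring_mem);
    d->psp_ring_mem = NULL;
}

/* Removal path; set_shutdown_first selects the ordering from fix 1.
 * Returns how many times the buffer-wipe path ran. */
static int fake_remove(bool set_shutdown_first)
{
    struct fake_dev d = { .shutdown = false, .fill_calls = 0 };

    d.psp_ring_mem = malloc(64);
    assert(d.psp_ring_mem != NULL);

    if (set_shutdown_first)
        d.shutdown = true;   /* fixed ordering */
    fake_psp_suspend(&d);
    d.shutdown = true;       /* original ordering set the flag only here */
    fake_psp_sw_fini(&d);
    assert(d.psp_ring_mem == NULL);
    return d.fill_calls;
}
```

With the original ordering, `fake_remove(false)` runs the wipe path once. With the flag set first, `fake_remove(true)` skips it.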
    • drm/amdgpu: Adjust removal control flow for smu v13_0_2 · f5c7e779
      Committed by YiPeng Chai
      Adjust removal control flow for smu v13_0_2:
         During amdgpu uninstallation, when removing the first
      device, the kernel needs to first send a mode1reset message
      to all gpu devices. Otherwise, smu initialization will fail
      the next time amdgpu is installed.
      
      V2:
      1. Update commit comments.
      2. Remove the global variable amdgpu_device_remove_cnt
         and add a variable to the structure amdgpu_hive_info.
      3. Use hive to detect the first removed device instead of
         a global variable.
      
      V3:
       1. Update commit comments.
       2. Split a patch into multiple patches.
       3. The current patch does:
          a. Add a work mode AMDGPU_RESET_FOR_DEVICE_REMOVE to the
             existing gpu recover path, which makes all devices in
             the hive list perform only a HW reset with no resume
             (except the base IP).
          b. Call amdgpu_device_gpu_recover with the
             AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET
             modes in amdgpu_pci_remove when removing the first
             device in the hive list.
          c. When removing the first device, the key function call
             sequence of the IP blocks is as follows:
      .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini.
         ^                           |
         |-<----------<---------<----|
             The first three steps come from the call to
             amdgpu_device_gpu_recover. They are executed in a
             loop until all devices in the hive list have been
             iterated.
             The steps starting from .hw_fini only apply to the
             first device. Since .suspend has already been called,
             .hw_fini does nothing for the ip blocks of the current
             device, except the resumed phase1 basic ip blocks.
          d. When removing the other devices, the calling sequence is
             the same as legacy:
                .hw_fini -> .early_fini -> .sw_fini.
             Since .suspend was already called when removing the first
             device, .hw_fini does nothing for the ip blocks of the
             current device, except the resumed phase1 basic ip blocks.
      Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
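The removal sequence described in this commit can be traced with a small user-space model. This is a sketch under stated assumptions: `fake_hive`, `trace_step` and `fake_pci_remove` are hypothetical names, the per-device loop follows the diagram in the commit message literally, and a `reset_done` flag stands in for whatever state the hive structure actually uses to detect the first removed device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_MAX 512

/* Hypothetical hive model; not the real amdgpu_hive_info. */
struct fake_hive {
    int num_devs;    /* devices in the hive list */
    int reset_done;  /* set once the first device has been removed */
};

/* Append one step name to a space-separated trace string. */
static void trace_step(char *buf, const char *step)
{
    size_t used = strlen(buf);

    assert(used < TRACE_MAX);
    snprintf(buf + used, TRACE_MAX - used, "%s%s", used ? " " : "", step);
}

/* Model of removing one device from the hive. */
static void fake_pci_remove(struct fake_hive *hive, char *buf)
{
    if (!hive->reset_done) {
        /* First removal: the recover path visits every device in the hive. */
        for (int i = 0; i < hive->num_devs; i++) {
            trace_step(buf, "suspend");
            trace_step(buf, "mode1reset");
            trace_step(buf, "resume(base)");
        }
        hive->reset_done = 1;
    }
    /* Legacy teardown, applied to the device being removed. */
    trace_step(buf, "hw_fini");
    trace_step(buf, "early_fini");
    trace_step(buf, "sw_fini");
}
```

For a two-device hive, the first call records the suspend, mode1reset and resume steps twice before the teardown steps. The second call records only the legacy teardown.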
  2. 15 Sep, 2022 1 commit
  3. 14 Sep, 2022 2 commits
  4. 08 Sep, 2022 1 commit
    • drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is uninstalled · fac53471
      Committed by YiPeng Chai
      V1:
        The psp_cmd_submit_buf function is called by psp_hw_fini to send
      TA unload messages to psp to terminate ras, asd and tmr. But when
      amdgpu is uninstalled, drm_dev_unplug is called earlier than
      psp_hw_fini in amdgpu_pci_remove. The calling order is as follows:
      static void amdgpu_pci_remove(struct pci_dev *pdev) {
      	drm_dev_unplug
      	......
      	amdgpu_driver_unload_kms->amdgpu_device_fini_hw->...
      		->.hw_fini->psp_hw_fini->...
      		->psp_ta_unload->psp_cmd_submit_buf
      	......
      }
      psp_cmd_submit_buf returns early when it calls drm_dev_enter.
      
      So the call to drm_dev_enter in psp_cmd_submit_buf should be
      removed, so that the TA unload messages can be sent to the psp
      when amdgpu is uninstalled.
      
      V2:
      1. Restore psp_cmd_submit_buf to its original code.
      2. Move drm_dev_unplug call after amdgpu_driver_unload_kms in
         amdgpu_pci_remove.
      3. Since amdgpu_device_fini_hw is called by amdgpu_driver_unload_kms,
         remove the unplug check to release device mmio resource in
         amdgpu_device_fini_hw before calling drm_dev_unplug.
      Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
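The effect of the call ordering in this commit can be illustrated with a user-space model. This is a sketch, not the DRM API: `fake_drm_dev`, `fake_dev_enter`, `fake_psp_cmd_submit` and `fake_pci_remove_order` are hypothetical names, and the sketch assumes only what the commit message states, namely that the submit path returns early once the device has been unplugged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the drm device state. */
struct fake_drm_dev {
    bool unplugged;      /* set by the modelled unplug step */
    int ta_unload_sent;  /* TA unload messages that reached the psp */
};

/* Modelled enter check: fails once the device is unplugged. */
static bool fake_dev_enter(const struct fake_drm_dev *d)
{
    return !d->unplugged;
}

/* Modelled submit path: returns early if the enter check fails. */
static void fake_psp_cmd_submit(struct fake_drm_dev *d)
{
    if (!fake_dev_enter(d))
        return;          /* message silently dropped */
    d->ta_unload_sent++;
}

/* Removal path; unplug_first selects the original ordering.
 * Returns how many TA unload messages were actually sent. */
static int fake_pci_remove_order(bool unplug_first)
{
    struct fake_drm_dev d = { .unplugged = false, .ta_unload_sent = 0 };

    if (unplug_first)
        d.unplugged = true;   /* original: unplug before unload */
    fake_psp_cmd_submit(&d);  /* reached through the .hw_fini chain */
    d.unplugged = true;       /* V2: unplug after unload */
    assert(d.unplugged);      /* the device always ends up unplugged */
    return d.ta_unload_sent;
}
```

With the original ordering, `fake_pci_remove_order(true)` sends no message. With the V2 ordering, `fake_pci_remove_order(false)` sends one.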
  5. 30 Aug, 2022 1 commit
  6. 26 Aug, 2022 1 commit
  7. 23 Aug, 2022 2 commits
  8. 17 Aug, 2022 3 commits
  9. 11 Aug, 2022 1 commit
  10. 29 Jul, 2022 2 commits
  11. 19 Jul, 2022 1 commit
  12. 13 Jul, 2022 1 commit
  13. 30 Jun, 2022 3 commits
  14. 28 Jun, 2022 2 commits
  15. 22 Jun, 2022 1 commit
  16. 15 Jun, 2022 1 commit
  17. 11 Jun, 2022 4 commits
  18. 08 Jun, 2022 1 commit
  19. 07 Jun, 2022 2 commits
  20. 04 Jun, 2022 4 commits
  21. 27 May, 2022 2 commits
  22. 11 May, 2022 1 commit
  23. 07 May, 2022 1 commit