1. 13 Nov, 2020 1 commit
    • drm/amd/amdgpu/amdgpu_cs: Add a couple of missing function param descriptions · fec3124d
      Committed by Lee Jones
      Fixes the following W=1 kernel build warning(s):
      
       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:685: warning: Function parameter or member 'backoff' not described in 'amdgpu_cs_parser_fini'
       drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1655: warning: Function parameter or member 'map' not described in 'amdgpu_cs_find_mapping'
      
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Jerome Glisse <glisse@freedesktop.org>
      Cc: amd-gfx@lists.freedesktop.org
      Cc: dri-devel@lists.freedesktop.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 04 Nov, 2020 1 commit
  3. 03 Nov, 2020 2 commits
  4. 24 Sep, 2020 1 commit
  5. 25 Aug, 2020 1 commit
  6. 19 Aug, 2020 1 commit
  7. 18 Aug, 2020 1 commit
  8. 15 Aug, 2020 1 commit
  9. 06 Aug, 2020 2 commits
  10. 28 Jul, 2020 1 commit
    • drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Committed by Dennis Li
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, which may cause the GPU reset to fail. Therefore a new
      rw_semaphore, adev->reset_sem, is introduced to protect the GPU from
      being accessed by external threads during recovery.
      
      v2:
      1. add the rwlock to some ioctls, debugfs and the file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in the
      kfd driver.
      3. remove try_lock and make adev->in_gpu_reset atomic, to avoid
      re-entering GPU recovery for the same GPU hang.
      
      v3:
      1. change back to using adev->reset_sem to protect the kfd callback
      functions, because dqm_lock could not protect all of the code; for
      example, free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce the atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove commented-out code in amdgpu_device.c
      3. add more detailed comments to the commit message
      4. define a wrapper function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tukov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  11. 01 Jul, 2020 1 commit
  12. 20 May, 2020 1 commit
  13. 01 May, 2020 1 commit
  14. 29 Apr, 2020 3 commits
  15. 02 Apr, 2020 1 commit
  16. 10 Mar, 2020 1 commit
  17. 27 Feb, 2020 1 commit
  18. 05 Feb, 2020 2 commits
  19. 17 Jan, 2020 2 commits
  20. 10 Dec, 2019 1 commit
  21. 24 Nov, 2019 1 commit
  22. 25 Oct, 2019 1 commit
  23. 18 Oct, 2019 1 commit
  24. 16 Oct, 2019 1 commit
  25. 16 Sep, 2019 1 commit
  26. 14 Sep, 2019 2 commits
    • drm/amdgpu: Avoid HW GPU reset for RAS. · 7c6e68c7
      Committed by Andrey Grodzovsky
      Problem:
      Under certain conditions, when some IP blocks take a RAS error,
      we can get into a situation where a GPU reset is not possible
      due to issues in RAS in SMU/PSP.
      
      Temporary fix until a proper solution in PSP/SMU is ready:
      When an uncorrectable error happens, the DF unconditionally
      broadcasts error event packets to all its clients/slaves upon
      receiving the fatal error event and freezes all its outbound queues;
      the err_event_athub interrupt is then triggered.
      In that case we use this interrupt to issue a GPU reset. The GPU reset
      code is modified for this case to avoid a HW reset: it only stops the
      schedulers, detaches the fences of all in-progress and not-yet-scheduled
      jobs, sets an error code on them, and signals them.
      It also rejects any new incoming job submissions from user space.
      All this is done to notify the applications of the problem.
      
      v2:
      Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
      Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
      Remove print param from amdgpu_ras_query_error_count
      
      v3:
      Update based on the previous bug-fixing patch to properly call amdgpu_amdkfd_pre_reset
      for other XGMI hive members.
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: remove amdgpu_cs_try_evict · 43ce6bab
      Committed by Christian König
      Trying to evict things from the current working set doesn't work that
      well anymore because of per VM BOs.
      
      Rely on reserving VRAM for page tables to avoid contention.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  27. 22 Aug, 2019 2 commits
  28. 13 Aug, 2019 1 commit
  29. 06 Aug, 2019 1 commit
  30. 05 Aug, 2019 1 commit
  31. 31 Jul, 2019 2 commits