1. 10 Feb 2021, 1 commit
  2. 13 Nov 2020, 1 commit
  3. 10 Oct 2020, 1 commit
  4. 25 Aug 2020, 4 commits
  5. 15 Aug 2020, 1 commit
  6. 08 Aug 2020, 1 commit
  7. 28 Jul 2020, 1 commit
      drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Authored by Dennis Li
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomic adev->in_gpu_reset and
      hive->in_reset flags are used to avoid re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, which may cause the reset to fail. Therefore the new
      rw_semaphore adev->reset_sem is introduced, which protects the GPU
      from being accessed by external threads during recovery.
      
      v2:
      1. add an rwlock for some ioctls, debugfs and the file-close function.
      2. change to using dqm->is_resetting and dqm_lock for protection in
      the kfd driver.
      3. remove try_lock and change adev->in_gpu_reset to an atomic, to
      avoid re-entering GPU recovery for the same GPU hang.
      
      v3:
      1. change back to using adev->reset_sem to protect the kfd callback
      functions, because dqm_lock could not protect all code paths; for
      example, free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce the atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove comment codes in amdgpu_device.c
      3. add more detailed comment in commit message
      4. define a wrap function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      df9c8d1a
  8. 01 Jul 2020, 1 commit
  9. 22 May 2020, 1 commit
      drm/amdgpu: fix incorrect sysfs remove behavior for xgmi · a89b5dae
      Authored by Jack Zhang
      Under an xgmi setup, some sysfs nodes fail to be created the second
      time the kmd driver is loaded, because the nodes were not removed
      properly on the previous unload.
      
      Changes in this patch:
      1. remove the sysfs entry for dev_attr_xgmi_error
      2. remove the sysfs_link from adev->dev->kobj with the target name;
         it only needs to be removed once per xgmi setup
      3. remove the sysfs_link from hive->kobj with the target name
      
      In amdgpu_xgmi_remove_device:
      1. amdgpu_xgmi_sysfs_rem_dev_info needs to run per device
      2. amdgpu_xgmi_sysfs_destroy needs to run only on the last device
      node.
      
      v2: initialize array with memset
      Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      a89b5dae
  10. 15 May 2020, 1 commit
  11. 09 May 2020, 1 commit
  12. 28 Apr 2020, 1 commit
  13. 23 Apr 2020, 1 commit
  14. 02 Apr 2020, 1 commit
  15. 11 Mar 2020, 1 commit
  16. 07 Mar 2020, 2 commits
  17. 27 Feb 2020, 2 commits
  18. 07 Feb 2020, 1 commit
      drm/amdgpu: move xgmi init/fini to xgmi_add/remove_device call (v2) · 0b9d3760
      Authored by Hawking Zhang
      For sriov, the psp ip block has to be initialized before the
      ih block, because of the dynamic register programming interface
      needed for the vf ih ring buffer. On the other hand, the
      current psp ip block hw_init function initializes the
      xgmi session, which actually depends on interrupts to
      return the session context. This results in an empty xgmi ta
      session id and later failures in all the xgmi ta commands
      invoked from the vf. xgmi ta session initialization has to
      be done after the ih ip block hw_init call.
      
      To unify xgmi session init/fini for both the bare-metal and
      sriov virtualization scenarios, move xgmi ta init
      into the xgmi_add_device call, and accordingly terminate the
      xgmi ta session in the xgmi_remove_device call.
      
      The existing suspend/resume sequence will not be changed.
      
      v2: squash in return fix from Nirmoy
      Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Reviewed-by: Frank Min <Frank.Min@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      0b9d3760
  19. 14 Jan 2020, 2 commits
  20. 19 Dec 2019, 1 commit
  21. 08 Nov 2019, 1 commit
  22. 07 Nov 2019, 2 commits
  23. 03 Oct 2019, 1 commit
  24. 16 Sep 2019, 1 commit
  25. 31 Jul 2019, 1 commit
  26. 19 Jul 2019, 3 commits
  27. 25 May 2019, 3 commits
  28. 13 Apr 2019, 1 commit
  29. 28 Mar 2019, 1 commit