1. 06 Nov, 2020 1 commit
  2. 26 Sep, 2020 1 commit
  3. 25 Aug, 2020 2 commits
  4. 15 Aug, 2020 1 commit
  5. 05 Aug, 2020 1 commit
  6. 28 Jul, 2020 1 commit
    • drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Committed by Dennis Li
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomic adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.

      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, as this may cause the GPU reset to fail. Therefore a new
      rw_semaphore, adev->reset_sem, is introduced, which protects the GPU
      from being accessed by external threads during recovery.
      
      v2:
      1. add rwlock for some ioctls, debugfs and the file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in the
      kfd driver.
      3. remove try_lock and make adev->in_gpu_reset atomic, to avoid
      re-entering GPU recovery for the same GPU hang.

      v3:
      1. change back to using adev->reset_sem to protect kfd callback
      functions, because dqm_lock couldn't protect all of the code, for
      example: free_mqd must be called outside of dqm_lock;
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove commented-out code in amdgpu_device.c
      3. add a more detailed comment in the commit message
      4. define a wrapper function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
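The re-entry guard described in this commit message can be illustrated with a small user-space sketch. It is not the driver code: `struct fake_adev` and the `fake_*` helpers are hypothetical names, C11 atomics stand in for the kernel's atomic operations, and the `adev->reset_sem` read/write semaphore is deliberately left out. The sketch only shows how an atomic flag lets exactly one caller claim a recovery while later callers back off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device structure; only the flag is modeled. */
struct fake_adev {
    atomic_int in_gpu_reset;   /* 0 = idle, 1 = recovery in progress */
};

static void fake_init(struct fake_adev *adev)
{
    atomic_init(&adev->in_gpu_reset, 0);
}

/* Try to claim the recovery: only the caller that flips 0 -> 1 succeeds. */
static bool fake_begin_reset(struct fake_adev *adev)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&adev->in_gpu_reset, &expected, 1);
}

/* Release the claim once recovery has finished. */
static void fake_end_reset(struct fake_adev *adev)
{
    atomic_store(&adev->in_gpu_reset, 0);
}

/* Query helper, loosely analogous to the amdgpu_in_reset wrapper named in v4. */
static bool fake_in_reset(struct fake_adev *adev)
{
    return atomic_load(&adev->in_gpu_reset) != 0;
}
```

A second path that detects the same hang would call `fake_begin_reset`, see `false`, and return without starting another recovery.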
  7. 03 Jul, 2020 1 commit
  8. 01 Jul, 2020 3 commits
  9. 18 May, 2020 1 commit
  10. 24 Apr, 2020 1 commit
  11. 14 Apr, 2020 2 commits
  12. 02 Apr, 2020 3 commits
  13. 23 Jan, 2020 1 commit
  14. 12 Dec, 2019 1 commit
    • drm/amd/powerplay: enable pp one vf mode for vega10 · c9ffa427
      Committed by Yintian Tao
      Originally, due to restrictions in PSP and SMU, the VF had
      to send a message to the hypervisor driver to handle powerplay
      changes, which is complicated and redundant. Now SMU
      and PSP allow the VF to handle powerplay changes
      directly by itself. Therefore, the old code for the handshake
      between VF and PF to handle powerplay is removed, and the VF
      uses the new registers below to handshake with SMU.
      mmMP1_SMN_C2PMSG_101: register to handle SMU message
      mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
      mmMP1_SMN_C2PMSG_103: register to handle SMU response
      
      v2: remove module parameter pp_one_vf
      v3: fix the parens
      v4: forbid vf to change smu feature
      v5: use hwmon_attributes_visible to skip specified hwmon attribute
      v6: change skip condition at vega10_copy_table_to_smc
      Signed-off-by: Yintian Tao <yttao@amd.com>
      Acked-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
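The three-register handshake listed in this commit message can be pictured with a toy mailbox model. Everything here is an assumption made for illustration: `struct sketch_mailbox`, `sketch_fw_step`, `sketch_send_msg`, the ordering of the writes, and the value `SKETCH_RESP_OK` are hypothetical, and the fake firmware simply acknowledges any pending message. The commit message only states which register carries the message, the parameter, and the response.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the three mailbox registers named in the commit message. */
struct sketch_mailbox {
    uint32_t msg;    /* plays the role of mmMP1_SMN_C2PMSG_101 (message)   */
    uint32_t param;  /* plays the role of mmMP1_SMN_C2PMSG_102 (parameter) */
    uint32_t resp;   /* plays the role of mmMP1_SMN_C2PMSG_103 (response)  */
};

#define SKETCH_RESP_OK 1u  /* assumed "success" value for this sketch only */

/* Hypothetical firmware model: acknowledges whatever message is pending. */
static void sketch_fw_step(struct sketch_mailbox *mb)
{
    if (mb->msg != 0) {
        mb->msg = 0;
        mb->resp = SKETCH_RESP_OK;
    }
}

/* Clear the response, post parameter and message, then poll for a reply.
 * Returns 0 on success and -1 on timeout or a non-OK response. */
static int sketch_send_msg(struct sketch_mailbox *mb, uint32_t msg,
                           uint32_t param, int max_polls)
{
    mb->resp = 0;
    mb->param = param;
    mb->msg = msg;

    for (int i = 0; i < max_polls; i++) {
        sketch_fw_step(mb);          /* stand-in for the SMU making progress */
        if (mb->resp != 0)
            return mb->resp == SKETCH_RESP_OK ? 0 : -1;
    }
    return -1;
}
```

In the real driver the registers would be accessed through MMIO helpers and the firmware runs on its own; the explicit `sketch_fw_step` call exists only so the sketch can run in user space.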
  15. 02 Aug, 2019 1 commit
  16. 17 Jul, 2019 1 commit
  17. 11 Jun, 2019 1 commit
  18. 25 May, 2019 1 commit
  19. 20 Apr, 2019 1 commit
  20. 11 Apr, 2019 1 commit
  21. 20 Nov, 2018 1 commit
  22. 06 Nov, 2018 5 commits
  23. 11 Sep, 2018 1 commit
  24. 28 Aug, 2018 1 commit
  25. 08 Mar, 2018 1 commit
  26. 02 Mar, 2018 1 commit
    • drm/amdgpu: try again kiq access if not in IRQ(v4) · a22144a5
      Committed by Monk Liu
      Sometimes the GPU is switched to other VFs and won't switch
      back soon, so the KIQ register access will not signal within
      a short period. Instead of busy-waiting for a long time (MAX_KIQ_REG_WAIT)
      and returning a timeout, we can sleep 5ms and try again
      later (in non-IRQ context).

      And since the wait in kiq_r/wreg is a busy wait, MAX_KIQ_REG_WAIT
      shouldn't be set to a long time; 10ms is more appropriate.

      If the GPU is already in reset state, don't retry the KIQ register access,
      otherwise it would always hang because the KIQ has usually already died.
      
      v2:
      replace schedule() with msleep() for the wait
      
      v3:
      use while loop for the wait repeating
      use macros for the sleep period
      more description for it
      
      v4:
      drop unused variable
      Signed-off-by: Monk Liu <Monk.Liu@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Pixel Ding <Pixel.Ding@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
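The wait policy described in this commit message (a short busy wait, then sleep and retry only when sleeping is allowed and the GPU is not in reset) can be sketched in user-space C as follows. The names `sketch_kiq_wait`, `sketch_poll`, `sketch_nap`, and the retry cap `SKETCH_MAX_RETRY` are hypothetical; the commit message does not state a retry limit, and the callbacks stand in for the fence poll and the 5 ms sleep.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_RETRY 3   /* hypothetical retry cap, not from the commit */

/* Test double: the "fence" signals after a given number of failed polls. */
struct sketch_ctx {
    int polls_until_signal;
    int naps;                /* how many times the caller slept */
};

/* Stand-in for one short, bounded busy wait on the KIQ fence. */
static bool sketch_poll(void *p)
{
    struct sketch_ctx *c = p;
    if (c->polls_until_signal > 0) {
        c->polls_until_signal--;
        return false;
    }
    return true;
}

/* Stand-in for sleeping 5 ms between attempts. */
static void sketch_nap(void *p)
{
    ((struct sketch_ctx *)p)->naps++;
}

/* Returns 0 if the access signaled, -1 if it timed out. */
static int sketch_kiq_wait(bool (*poll)(void *), void (*nap)(void *),
                           void *ctx, bool in_irq, bool in_reset)
{
    int tries = 0;

    if (poll(ctx))
        return 0;

    /* Sleeping is not allowed in IRQ context, and retrying during a
     * reset is pointless because the KIQ is usually already dead. */
    if (in_irq || in_reset)
        return -1;

    while (tries++ < SKETCH_MAX_RETRY) {
        nap(ctx);
        if (poll(ctx))
            return 0;
    }
    return -1;
}
```

Keeping each poll short and moving the long wait into a sleep means the CPU is not held in a busy loop while the GPU is serving another VF.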
  27. 20 Feb, 2018 1 commit
  28. 07 Dec, 2017 2 commits
  29. 05 Dec, 2017 1 commit