1. 10 2月, 2022 1 次提交
  2. 15 12月, 2021 1 次提交
  3. 14 12月, 2021 1 次提交
    • L
      drm/amdgpu: add support for SMU debug option · 6ff7fddb
      Lang Yu 提交于
      SMU firmware expects the driver maintains error context
      and doesn't interact with SMU any more when SMU errors
      occurred. That will aid in debugging SMU firmware issues.
      
      Add SMU debug option support for this request, it can be
      enabled or disabled via amdgpu_smu_debug debugfs file.
      Use a 32-bit mask to indicate corresponding debug modes.
      Currently, only one mode(HALT_ON_ERROR) is supported.
      When enabled, it brings hardware to a kind of halt state
      so that no one can touch it any more in the envent of SMU
      errors.
      
      The dirver interacts with SMU via sending messages. And
      threre are three ways to sending messages to SMU in current
      implementation. Handle them respectively as following:
      
      1, smu_cmn_send_smc_msg_with_param() for normal timeout cases
      
        Halt on any error.
      
      2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
      for longer timeout cases
      
        Halt on errors apart from ETIME. Otherwise this way won't work.
        Let the user handle ETIME error in such a case.
      
      3, smu_cmn_send_msg_without_waiting() for no waiting cases
      
        Halt on errors apart from ETIME. Otherwise second way won't work.
      
      == Command Guide ==
      
      1, enable HALT_ON_ERROR mode
      
       # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
      
      2, disable HALT_ON_ERROR mode
      
       # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug
      
      v5:
       - Use bit mask to allow more debug features.(Evan)
       - Use WRAN() instead of BUG().(Evan)
      
      v4:
       - Set to halt state instead of a simple hang.(Christian)
      
      v3:
       - Use debugfs_create_bool().(Christian)
       - Put variable into smu_context struct.
       - Don't resend command when timeout.
      
      v2:
       - Resend command when timeout.(Lijo)
       - Use debugfs file instead of module parameter.
      Signed-off-by: NLang Yu <lang.yu@amd.com>
      Reviewed-by: NLijo Lazar <lijo.lazar@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      6ff7fddb
  4. 07 10月, 2021 2 次提交
  5. 06 10月, 2021 1 次提交
  6. 05 10月, 2021 1 次提交
  7. 15 9月, 2021 4 次提交
  8. 02 9月, 2021 1 次提交
  9. 17 8月, 2021 1 次提交
    • J
      drm/amd/amdgpu embed hw_fence into amdgpu_job · c530b02f
      Jack Zhang 提交于
      Why: Previously hw fence is alloced separately with job.
      It caused historical lifetime issues and corner cases.
      The ideal situation is to take fence to manage both job
      and fence's lifetime, and simplify the design of gpu-scheduler.
      
      How:
      We propose to embed hw_fence into amdgpu_job.
      1. We cover the normal job submission by this method.
      2. For ib_test, and submit without a parent job keep the
      legacy way to create a hw fence separately.
      v2:
      use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
      embedded in a job.
      v3:
      remove redundant variable ring in amdgpu_job
      v4:
      add tdr sequence support for this feature. Add a job_run_counter to
      indicate whether this job is a resubmit job.
      v5
      add missing handling in amdgpu_fence_enable_signaling
      Signed-off-by: NJingwen Chen <Jingwen.Chen2@amd.com>
      Signed-off-by: NJack Zhang <Jack.Zhang7@hotmail.com>
      Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
      Reviewed by: Monk Liu <monk.liu@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      c530b02f
  10. 16 6月, 2021 1 次提交
  11. 21 5月, 2021 1 次提交
    • L
      drm/amd/amdgpu/amdgpu_debugfs: Fix a couple of misnamed functions · e72d4a8b
      Lee Jones 提交于
      Fixes the following W=1 kernel build warning(s):
      
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1004: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_write(). Prototype was for amdgpu_debugfs_gfxoff_write() instead
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1053: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_status(). Prototype was for amdgpu_debugfs_gfxoff_read() instead
      
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: amd-gfx@lists.freedesktop.org
      Cc: dri-devel@lists.freedesktop.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      e72d4a8b
  12. 04 3月, 2021 1 次提交
  13. 03 3月, 2021 1 次提交
  14. 27 2月, 2021 1 次提交
  15. 19 2月, 2021 4 次提交
  16. 14 1月, 2021 1 次提交
  17. 08 12月, 2020 2 次提交
  18. 14 11月, 2020 1 次提交
    • L
      drm/amd/amdgpu/amdgpu_debugfs: Demote obvious abuse of kernel-doc formatting · 20ed491b
      Lee Jones 提交于
      Fixes the following W=1 kernel build warning(s):
      
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_read'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_read'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_read'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_read'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_write'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_write'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_write'
       drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_write'
      
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: amd-gfx@lists.freedesktop.org
      Cc: dri-devel@lists.freedesktop.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Signed-off-by: NLee Jones <lee.jones@linaro.org>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      20ed491b
  19. 03 11月, 2020 1 次提交
  20. 27 10月, 2020 1 次提交
  21. 16 10月, 2020 1 次提交
  22. 07 10月, 2020 1 次提交
  23. 01 10月, 2020 1 次提交
  24. 25 8月, 2020 4 次提交
  25. 15 8月, 2020 2 次提交
  26. 28 7月, 2020 1 次提交
    • D
      drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Dennis Li 提交于
      when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
      the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
      re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe that other threads access GPU,
      which maybe cause GPU reset failed. Therefore the new rw_semaphore
      adev->reset_sem is introduced, which protect GPU from being accessed by
      external threads during recovery.
      
      v2:
      1. add rwlock for some ioctls, debugfs and file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in kfd
      driver.
      3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
      re-enter GPU recovery for the same GPU hang.
      
      v3:
      1. change back to use adev->reset_sem to protect kfd callback
      functions, because dqm_lock couldn't protect all codes, for example:
      free_mqd must be called outside of dqm_lock;
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce atomic hive->in_reset, to avoid
      re-enter GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove comment codes in amdgpu_device.c
      3. add more detailed comment in commit message
      4. define a wrap function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: NChristian König <christian.koenig@amd.com>
      Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: NLuben Tukov <luben.tuikov@amd.com>
      Signed-off-by: NDennis Li <Dennis.Li@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      df9c8d1a
  27. 22 7月, 2020 1 次提交
  28. 15 7月, 2020 1 次提交