1. 14 Dec, 2021 1 commit
  2. 23 Nov, 2021 1 commit
  3. 06 Nov, 2021 1 commit
  4. 27 Aug, 2021 1 commit
  5. 25 Aug, 2021 1 commit
  6. 19 Aug, 2021 1 commit
    • drm/amdgpu: get extended xgmi topology data · 44357a1b
      Committed by Jonathan Kim
      The TA has a limit to the amount of data that can be retrieved from
      GET_TOPOLOGY.  For setups that exceed this limit, the xGMI topology
      needs to be re-initialized and data needs to be re-fetched from the
      extended link records by setting a flag in the shared command buffer.
      
      The number of hops and the number of links must be accumulated by the
      driver. Other data points are all fetched from the first request.
      Because the TA has already exceeded its link record limit, it
      cannot hold bidirectional information.  Rather than issue more
      than two fetches, the driver reflects the topology information
      in the opposite direction itself.
      
      v2: squashed with internal reviewed fix
      Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
      Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
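The accumulate-and-reflect steps described in this commit message can be sketched in plain C as follows. This is only an illustration under stated assumptions: `struct demo_peer_info` and both helper functions are hypothetical names, not the driver's actual types or API, and the sketch assumes the two fetches return peers in the same order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer record; field names are illustrative only. */
struct demo_peer_info {
    uint64_t node_id;
    uint8_t  num_hops;
    uint8_t  num_links;
};

/* Fold the counts from the second (extended) fetch into the first fetch.
 * Every other field keeps the value returned by the first request.
 * Assumption: both arrays list the same peers in the same order. */
static void demo_accumulate_extended(struct demo_peer_info *first,
                                     const struct demo_peer_info *extended,
                                     size_t count)
{
    for (size_t i = 0; i < count; i++) {
        first[i].num_hops  += extended[i].num_hops;
        first[i].num_links += extended[i].num_links;
    }
}

/* Mirror the local node's view of a peer into the peer's own table, so the
 * peer sees the same hop and link counts back toward the local node.
 * Returns 0 on success, -1 if the peer has no entry for local_id. */
static int demo_reflect_to_peer(uint64_t local_id,
                                const struct demo_peer_info *local_view_of_peer,
                                struct demo_peer_info *peer_table,
                                size_t peer_count)
{
    for (size_t i = 0; i < peer_count; i++) {
        if (peer_table[i].node_id == local_id) {
            peer_table[i].num_hops  = local_view_of_peer->num_hops;
            peer_table[i].num_links = local_view_of_peer->num_links;
            return 0;
        }
    }
    return -1;
}
```

Reflecting the counts on the driver side avoids a third fetch, which is the trade-off the commit message describes.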
  7. 17 Aug, 2021 1 commit
  8. 23 Jul, 2021 1 commit
  9. 10 Apr, 2021 3 commits
    • drm/amdgpu: move xgmi ras functions to xgmi_ras_funcs · 52137ca8
      Committed by Hawking Zhang
      xgmi ras is not managed by the gpu driver when the gpu is
      connected to the cpu through xgmi. Move all xgmi ras
      functions to xgmi_ras_funcs so the gpu driver only
      initializes xgmi ras functions when it manages
      xgmi ras.
      Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Reviewed-by: Dennis Li <Dennis.Li@amd.com>
      Reviewed-by: John Clements <John.Clements@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit · 36000c7a
      Committed by Tian Tao
      Fix the following coccicheck warning:
      drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
      use scnprintf or sprintf
      drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
      use scnprintf or sprintf
      Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
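The motivation for this kind of conversion can be illustrated outside the kernel. The sketch below is a user-space stand-in, not the kernel's `sysfs_emit()`: `demo_sysfs_emit`, `demo_device_id_show`, and `DEMO_PAGE_SIZE` are hypothetical names, and the 4096-byte page size is an assumption. It shows the core idea that the helper itself bounds the output to one page, so a show callback cannot overrun its buffer the way an unbounded `sprintf` could.

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumption: sysfs hands show callbacks a one-page buffer of 4096 bytes. */
#define DEMO_PAGE_SIZE 4096

/* User-space stand-in for a sysfs_emit-style helper: format into a
 * page-sized buffer and return the number of characters actually stored,
 * excluding the terminating NUL (never more than DEMO_PAGE_SIZE - 1). */
static int demo_sysfs_emit(char *buf, const char *fmt, ...)
{
    va_list args;
    int len;

    va_start(args, fmt);
    len = vsnprintf(buf, DEMO_PAGE_SIZE, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;                    /* formatting error: report nothing */
    if (len >= DEMO_PAGE_SIZE)
        len = DEMO_PAGE_SIZE - 1;    /* output was truncated to one page */
    return len;
}

/* Hypothetical show callback: the caller no longer passes a size at all. */
static int demo_device_id_show(char *buf, unsigned long long id)
{
    return demo_sysfs_emit(buf, "%llu\n", id);
}
```

Returning the stored length rather than the would-be length is what lets a show callback return the helper's result directly.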
    • drm/amd/pm: label these APIs used internally as static · c6ce68e6
      Committed by Evan Quan
      Also drop unnecessary header file and declarations.
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  10. 24 Mar, 2021 2 commits
    • drm/amdgpu: Reset the devices in the XGMI hive duirng probe · e3c1b071
      Committed by shaoyunl
      In a passthrough configuration, the hypervisor triggers the SBR (secondary bus reset) on the devices
      without synchronizing them with each other. This can hang the devices, because in an XGMI configuration
      all the devices within the hive need to be reset within a limited time slot. This series of patches tries
      to solve the issue by cooperating with a new SMU, which does only minimal housekeeping in response to the
      SBR request and leaves the real reset job to the driver. The driver needs to do the whole SW init and a
      minimal HW init to bring up the SMU and trigger the reset (possibly BACO) on all the ASICs at the same time.
      Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
      Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: mask the xgmi number of hops reported from psp to kfd · 4ac5617c
      Committed by Jonathan Kim
      The psp supplies the link type in the upper 2 bits of the psp xgmi node
      information num_hops field.  With a new link type, Aldebaran has these
      bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
      the incorrect IO link weights without proper masking.
      The actual number of hops is located in the 3 least significant bits of
      this field so mask it off accordingly before passing it to the KFD.
      Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
      Reviewed-by: Amber Lin <amber.lin@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
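The bit layout described above can be sketched as a pair of small helpers. The macro and function names are hypothetical, and the sketch assumes that `num_hops` is an 8-bit field, so "the upper 2 bits" means bits 7:6; the commit message itself only states which bits carry which value.

```c
#include <stdint.h>

/* Assumption: num_hops is 8 bits wide. All names here are illustrative. */
#define DEMO_NUM_HOPS_MASK   0x07u  /* bits 2:0 = actual number of hops */
#define DEMO_LINK_TYPE_SHIFT 6      /* bits 7:6 = link type */
#define DEMO_LINK_TYPE_MASK  0x03u

/* Strip the link-type bits so only the hop count is passed on. */
static inline uint8_t demo_hops_count(uint8_t raw_num_hops)
{
    return (uint8_t)(raw_num_hops & DEMO_NUM_HOPS_MASK);
}

/* Extract the link type (per the commit message, 1 = xGMI3). */
static inline uint8_t demo_link_type(uint8_t raw_num_hops)
{
    return (uint8_t)((raw_num_hops >> DEMO_LINK_TYPE_SHIFT) & DEMO_LINK_TYPE_MASK);
}
```

For example, a raw value of `0x42` carries link type 1 in the upper bits and 2 hops in the lower bits; without masking, a consumer would read it as 66 hops.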
  11. 10 Feb, 2021 1 commit
  12. 13 Nov, 2020 1 commit
  13. 10 Oct, 2020 1 commit
  14. 25 Aug, 2020 4 commits
  15. 15 Aug, 2020 1 commit
  16. 08 Aug, 2020 1 commit
  17. 28 Jul, 2020 1 commit
    • drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Committed by Dennis Li
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, which may cause the GPU reset to fail. Therefore a new
      rw_semaphore, adev->reset_sem, is introduced to protect the GPU from
      being accessed by external threads during recovery.
      
      v2:
      1. add rwlock for some ioctls, debugfs and file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in kfd
      driver.
      3. remove try_lock and change adev->in_gpu_reset to an atomic, to avoid
      re-entering GPU recovery for the same GPU hang.
      
      v3:
      1. change back to using adev->reset_sem to protect kfd callback
      functions, because dqm_lock cannot protect all code paths; for example,
      free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove commented-out code in amdgpu_device.c
      3. add more detailed comment in commit message
      4. define a wrap function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tukov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  18. 01 Jul, 2020 1 commit
  19. 22 May, 2020 1 commit
    • drm/amdgpu fix incorrect sysfs remove behavior for xgmi · a89b5dae
      Committed by Jack Zhang
      Under an xgmi setup, some sysfs nodes fail to be created the second
      time the kmd driver is loaded. This is because the sysfs nodes were
      not removed properly when the driver was last unloaded.
      
      Changes of this patch:
      1. remove sysfs for dev_attr_xgmi_error
      2. remove sysfs_link adev->dev->kobj with target name.
         And it only needs to be removed once for a xgmi setup
      3. remove sysfs_link hive->kobj with target name
      
      In amdgpu_xgmi_remove_device:
      1. amdgpu_xgmi_sysfs_rem_dev_info needs to be run per device
      2. amdgpu_xgmi_sysfs_destroy needs to be run on the last node of
      device.
      
      v2: initialize array with memset
      Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
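The removal rule stated above (per-device cleanup on every removal, hive-wide cleanup only when the last node goes away) can be sketched with a simple counter. The struct and function are hypothetical; the two counters merely stand in for calls to `amdgpu_xgmi_sysfs_rem_dev_info` and `amdgpu_xgmi_sysfs_destroy`, whose real signatures are not shown in the commit message.

```c
/* Hypothetical hive bookkeeping used only for this illustration. */
struct demo_hive {
    int num_devices;       /* devices still attached to the hive */
    int dev_info_removed;  /* how often per-device cleanup ran */
    int hive_destroyed;    /* how often hive-wide cleanup ran */
};

/* Remove one device from the hive. */
static void demo_xgmi_remove_device(struct demo_hive *hive)
{
    /* Per-device sysfs entries are removed for every device. */
    hive->dev_info_removed++;

    /* Hive-wide sysfs entries are removed only with the last device,
     * so a later driver load can create them again without conflict. */
    if (--hive->num_devices == 0)
        hive->hive_destroyed++;
}
```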
  20. 15 May, 2020 1 commit
  21. 09 May, 2020 1 commit
  22. 28 Apr, 2020 1 commit
  23. 23 Apr, 2020 1 commit
  24. 02 Apr, 2020 1 commit
  25. 11 Mar, 2020 1 commit
  26. 07 Mar, 2020 2 commits
  27. 27 Feb, 2020 2 commits
  28. 07 Feb, 2020 1 commit
    • drm/amdgpu: move xgmi init/fini to xgmi_add/remove_device call (v2) · 0b9d3760
      Committed by Hawking Zhang
      For sriov, the psp ip block has to be initialized before the
      ih block for the dynamic register programming interface
      that is needed for the vf ih ring buffer. On the other hand,
      the current psp ip block hw_init function initializes the
      xgmi session, which actually depends on an interrupt to
      return the session context. This results in an empty xgmi ta
      session id and later failures of all the xgmi ta cmds
      invoked from the vf. xgmi ta session initialization has to
      be done after the ih ip block hw_init call.
      
      To unify xgmi session init/fini for both the bare-metal and
      sriov virtualization use scenarios, move xgmi ta init
      to the xgmi_add_device call, and accordingly terminate the xgmi
      ta session in the xgmi_remove_device call.
      
      The existing suspend/resume sequence will not be changed.
      
      v2: squash in return fix from Nirmoy
      Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Reviewed-by: Frank Min <Frank.Min@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  29. 14 Jan, 2020 2 commits
  30. 19 Dec, 2019 1 commit
  31. 08 Nov, 2019 1 commit