1. 18 Sep, 2020 2 commits
  2. 16 Sep, 2020 3 commits
    • drm/amdkfd: Fix -Wunused-const-variable warning · 2b3bbf23
      Committed by YueHaibing
      If KFD_SUPPORT_IOMMU_V2 is not set, gcc warns:
      
      drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:121:37: warning: ‘raven_device_info’ defined but not used [-Wunused-const-variable=]
       static const struct kfd_device_info raven_device_info = {
                                           ^~~~~~~~~~~~~~~~~
      
      As Huang Rui suggested, Raven already has a fallback path,
      so it should be moved out from under the IOMMU v2 flag.
      Suggested-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
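The warning pattern described in this commit can be illustrated with a minimal sketch. Everything here is hypothetical and simplified: `DEMO_SUPPORT_IOMMU_V2` stands in for the real config flag, and `demo_device_info`, `demo_supported`, and `demo_lookup` are invented for illustration, not taken from the driver. The point is only that a `static const` object defined unconditionally must also be referenced unconditionally, otherwise gcc can report it as unused when `-Wunused-const-variable` is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the driver's device-info struct. */
struct demo_device_info {
    const char *name;
};

/* DEMO_SUPPORT_IOMMU_V2 is a made-up macro standing in for the real flag. */
#ifdef DEMO_SUPPORT_IOMMU_V2
static const struct demo_device_info demo_iommu_only_info = { "iommu-only" };
#endif

/* Defined unconditionally, so it must also be referenced unconditionally. */
static const struct demo_device_info demo_raven_info = { "raven" };

static const struct demo_device_info *const demo_supported[] = {
#ifdef DEMO_SUPPORT_IOMMU_V2
    &demo_iommu_only_info,
#endif
    /* If this entry sat inside the #ifdef above, a build without the macro
     * would leave demo_raven_info unreferenced and trigger the warning. */
    &demo_raven_info,
};

/* Look up a device by name; returns NULL when it is not in the table. */
const struct demo_device_info *demo_lookup(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(demo_supported) / sizeof(demo_supported[0]); i++)
        if (strcmp(demo_supported[i]->name, name) == 0)
            return demo_supported[i];
    return NULL;
}
```

With the table entry outside the guard, the lookup for "raven" succeeds whether or not the macro is defined.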
    • drm/amdkfd: fix a memory leak issue · edb084f4
      Committed by Dennis Li
      In the resume stage of GPU recovery, start_cpsch calls pm_init,
      which sets pm->allocated to false, so the next pm_release_ib has
      no chance to release the ib memory.
      
      Add pm_release_ib to stop_cpsch, which is called in the suspend
      stage of GPU recovery.
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
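The leak described in this commit can be sketched in user-space C. This is a simplified model under stated assumptions, not the driver code: `demo_pm`, `demo_pm_init`, `demo_pm_alloc_ib`, `demo_pm_release_ib`, and the `demo_live_ibs` counter are hypothetical stand-ins that only mirror the roles named in the commit message. The sketch shows why clearing the `allocated` flag on re-init strands a buffer unless it is released on the suspend path first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified packet-manager state. */
struct demo_pm {
    bool allocated;
    void *ib;
};

static int demo_live_ibs;   /* number of buffers currently outstanding */

static void demo_pm_init(struct demo_pm *pm)
{
    pm->allocated = false;  /* forgets any buffer still held from before */
}

static void demo_pm_alloc_ib(struct demo_pm *pm)
{
    if (pm->allocated)
        return;
    pm->ib = malloc(64);
    assert(pm->ib != NULL);
    pm->allocated = true;
    demo_live_ibs++;
}

static void demo_pm_release_ib(struct demo_pm *pm)
{
    if (!pm->allocated)
        return;             /* nothing to do once the flag has been cleared */
    free(pm->ib);
    pm->ib = NULL;
    pm->allocated = false;
    demo_live_ibs--;
}

/* Runs start -> use -> suspend -> resume -> release and returns how many
 * buffers are still outstanding afterwards. */
int demo_leaked_after_recovery(bool release_in_stop)
{
    struct demo_pm pm = { false, NULL };
    int leaked;

    demo_live_ibs = 0;
    demo_pm_init(&pm);              /* initial start */
    demo_pm_alloc_ib(&pm);          /* normal operation allocates the buffer */
    if (release_in_stop)
        demo_pm_release_ib(&pm);    /* suspend path with the fix applied */
    demo_pm_init(&pm);              /* resume path clears the flag */
    demo_pm_release_ib(&pm);        /* a no-op if the flag was just cleared */

    leaked = demo_live_ibs;
    if (leaked)
        free(pm.ib);                /* tidy up so the demo itself does not leak */
    return leaked;
}
```

In this model, releasing on the suspend path leaves nothing outstanding, while skipping it leaves one buffer that no later release call can reach.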
    • drm/kfd: fix a system crash issue during GPU recovery · a9a83a92
      Committed by Dennis Li
      The crash log is as follows:
      
      [Thu Aug 20 23:18:14 2020] general protection fault: 0000 [#1] SMP NOPTI
      [Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G           OE     5.4.0-42-generic #46~18.04.1-Ubuntu
      [Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020
      [Thu Aug 20 23:18:14 2020] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
      [Thu Aug 20 23:18:14 2020] RIP: 0010:evict_process_queues_cpsch+0xc9/0x130 [amdgpu]
      [Thu Aug 20 23:18:14 2020] Code: 49 8d 4d 10 48 39 c8 75 21 eb 44 83 fa 03 74 36 80 78 72 00 74 0c 83 ab 68 01 00 00 01 41 c6 45 41 00 48 8b 00 48 39 c8 74 25 <80> 78 70 00 c6 40 6d 01 74 ee 8b 50 28 c6 40 70 00 83 ab 60 01 00
      [Thu Aug 20 23:18:14 2020] RSP: 0018:ffffb29b52f6fc90 EFLAGS: 00010213
      [Thu Aug 20 23:18:14 2020] RAX: 1c884edb0a118914 RBX: ffff8a0d45ff3c00 RCX: ffff8a2d83e41038
      [Thu Aug 20 23:18:14 2020] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a0e2e4178c0
      [Thu Aug 20 23:18:14 2020] RBP: ffffb29b52f6fcb0 R08: 0000000000001b64 R09: 0000000000000004
      [Thu Aug 20 23:18:14 2020] R10: ffffb29b52f6fb78 R11: 0000000000000001 R12: ffff8a0d45ff3d28
      [Thu Aug 20 23:18:14 2020] R13: ffff8a2d83e41028 R14: 0000000000000000 R15: 0000000000000000
      [Thu Aug 20 23:18:14 2020] FS:  0000000000000000(0000) GS:ffff8a0e2e400000(0000) knlGS:0000000000000000
      [Thu Aug 20 23:18:14 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Thu Aug 20 23:18:14 2020] CR2: 000055c783c0e6a8 CR3: 00000034a1284000 CR4: 0000000000340ee0
      [Thu Aug 20 23:18:14 2020] Call Trace:
      [Thu Aug 20 23:18:14 2020]  kfd_process_evict_queues+0x43/0xd0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kfd_suspend_all_processes+0x60/0xf0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kgd2kfd_suspend.part.7+0x43/0x50 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  kgd2kfd_pre_reset+0x46/0x60 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_device_gpu_recover+0x377/0xf90 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
      [Thu Aug 20 23:18:14 2020]  process_one_work+0x20f/0x400
      [Thu Aug 20 23:18:14 2020]  worker_thread+0x34/0x410
      
      When the GPU hangs, a user process fails to create a compute queue,
      and the queue's struct object is freed later, but the driver wrongly
      adds this queue to the queue list of the process. kfd_process_evict_queues
      then accesses the freed memory, which causes a system crash.
      
      v2:
      A failure of execute_queues should probably not be reported to
      the caller of create_queue, because the queue was already created.
      Therefore, ignore the return value from execute_queues.
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
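The ordering problem behind this crash can be modelled in a few lines of C. This is a sketch under assumptions, not the driver's implementation: `demo_queue`, `demo_process`, `demo_create_queue`, and `demo_execute_queues` are hypothetical, and a `freed` flag is set instead of actually calling `free()` so the demo never touches released memory. It shows that once a queue is linked into the process list, reporting a later failure to the caller (who then frees the queue) leaves a stale entry for the eviction walk to find.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_queue {
    bool freed;                 /* set instead of free(), to keep the demo well-defined */
    struct demo_queue *next;
};

struct demo_process {
    struct demo_queue *queues;  /* singly linked list of queues */
};

/* Hypothetical stand-in: fails whenever the GPU is hung. */
static int demo_execute_queues(bool gpu_hung)
{
    return gpu_hung ? -EIO : 0;
}

static int demo_create_queue(struct demo_process *p, struct demo_queue *q,
                             bool gpu_hung, bool propagate_error)
{
    int ret;

    assert(p != NULL && q != NULL);
    q->next = p->queues;        /* the queue is now on the process list */
    p->queues = q;

    ret = demo_execute_queues(gpu_hung);
    if (propagate_error)
        return ret;             /* caller sees failure and frees q, but q stays listed */
    return 0;                   /* v2 behaviour: the queue exists, so report success */
}

/* Returns true if walking the process list would touch a freed queue. */
bool demo_eviction_hits_freed_queue(bool propagate_error)
{
    struct demo_process p = { NULL };
    struct demo_queue q = { false, NULL };
    struct demo_queue *it;

    if (demo_create_queue(&p, &q, true, propagate_error) != 0)
        q.freed = true;         /* the caller's error path would free the queue here */

    for (it = p.queues; it != NULL; it = it->next)
        if (it->freed)
            return true;
    return false;
}
```

Ignoring the return value keeps the list and the caller's view of the queue consistent, which is the design choice the v2 note argues for.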
  3. 01 Sep, 2020 1 commit
  4. 27 Aug, 2020 4 commits
  5. 25 Aug, 2020 1 commit
  6. 19 Aug, 2020 1 commit
  7. 15 Aug, 2020 1 commit
  8. 11 Aug, 2020 3 commits
  9. 05 Aug, 2020 1 commit
  10. 28 Jul, 2020 4 commits
  11. 16 Jul, 2020 5 commits
  12. 08 Jul, 2020 1 commit
  13. 03 Jul, 2020 2 commits
  14. 01 Jul, 2020 11 commits