1. 27 Aug, 2020 2 commits
  2. 28 Jul, 2020 1 commit
  3. 16 Jul, 2020 3 commits
  4. 03 Jul, 2020 1 commit
  5. 01 Jul, 2020 4 commits
  6. 29 Apr, 2020 1 commit
  7. 02 Apr, 2020 1 commit
  8. 27 Feb, 2020 1 commit
  9. 13 Feb, 2020 1 commit
    • drm/amdkfd: refactor runtime pm for baco · 9593f4d6
      Rajneesh Bhardwaj committed
      So far the kfd driver has implemented the same routines for runtime and
      system wide suspend and resume (s2idle or mem). During system wide
      suspend the kfd acquires an atomic lock that prevents any more user
      processes from creating queues and interacting with the kfd driver and
      the amd gpu. This mechanism creates a problem when the amdgpu device is
      runtime suspended with BACO enabled. Any application that relies on the
      kfd driver fails to load because the driver reports a locked kfd device
      while the gpu is runtime suspended.
      
      However, in the ideal case, when the gpu is runtime suspended the kfd
      driver should be able to:
      
       - auto resume the amdgpu driver whenever a client requests compute service
       - prevent runtime suspend for amdgpu while kfd is in use
      
      This change refactors the amdgpu and amdkfd drivers to support BACO and
      runtime power management.
      Reviewed-by: Oak Zeng <oak.zeng@amd.com>
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
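The two goals listed in the commit message above (resume on demand, no runtime suspend while in use) can be illustrated with a usage counter. The following is a minimal user-space sketch and not the driver's code: `toy_gpu`, `toy_gpu_get`, `toy_gpu_put` and `toy_gpu_try_runtime_suspend` are hypothetical names invented for this illustration. The kernel's runtime PM framework is built around a similar usage-count idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in amdgpu or amdkfd. */
struct toy_gpu {
    bool suspended;   /* true while the device is runtime suspended */
    int  usage_count; /* number of compute clients holding the device */
};

/* A client requests compute service: resume if needed, then take a reference. */
static void toy_gpu_get(struct toy_gpu *gpu)
{
    if (gpu->suspended)
        gpu->suspended = false; /* auto resume */
    gpu->usage_count++;
}

/* A client is done with the device: drop its reference. */
static void toy_gpu_put(struct toy_gpu *gpu)
{
    assert(gpu->usage_count > 0);
    gpu->usage_count--;
}

/* Runtime suspend is refused while any client still holds a reference. */
static bool toy_gpu_try_runtime_suspend(struct toy_gpu *gpu)
{
    if (gpu->usage_count > 0)
        return false;
    gpu->suspended = true;
    return true;
}
```

In this model a suspended device wakes up as soon as the first client calls `toy_gpu_get`, and it can only be suspended again after the last client has called `toy_gpu_put`.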
  10. 08 Jan, 2020 1 commit
  11. 19 Dec, 2019 1 commit
  12. 14 Nov, 2019 1 commit
  13. 26 Oct, 2019 1 commit
    • drm/amdkfd: don't use dqm lock during device reset/suspend/resume · 2c99a547
      Philip Yang committed
      If device reset/suspend/resume fails for some reason, the dqm lock is
      held forever and this causes a deadlock. Below is a kernel backtrace
      from when an application opens kfd after suspend/resume has failed.
      
      Instead of holding the dqm lock in pre_reset and releasing it in
      post_reset, add a dqm->sched_running flag which is modified in
      dqm->ops.start and dqm->ops.stop. The flag doesn't need extra lock
      protection because all writes and reads happen inside the dqm lock.
      
      For the HWS case, map_queues_cpsch and unmap_queues_cpsch check the
      sched_running flag before sending the updated runlist.
      
      v2: For the no-HWS case, when the device is stopped, don't call
      load/destroy_mqd for eviction, restore and create queue, and avoid
      the debugfs dump of hqds.
      
      Backtrace of dqm lock deadlock:
      
      [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more than 120 seconds.
      [Thu Oct 17 16:43:37 2019]       Not tainted 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
      [Thu Oct 17 16:43:37 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [Thu Oct 17 16:43:37 2019] rocminfo        D    0  3024   2947 0x80000000
      [Thu Oct 17 16:43:37 2019] Call Trace:
      [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
      [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
      [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
      [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
      [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
      [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0 [amdgpu]
      [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0 [amdgpu]
      [Thu Oct 17 16:43:37 2019]  kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
      [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220 [amdgpu]
      [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
      [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
      [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
      [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
      [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
      [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
      [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
      [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
      [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
      [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
      [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
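The flag-based approach described in the commit message above can be sketched in a few lines. This is a simplified user-space model under stated assumptions, not the driver's implementation: `toy_dqm`, `toy_dqm_start`, `toy_dqm_stop` and `toy_map_queues` are hypothetical names, and the lock that callers would hold around every access is only mentioned in comments.

```c
#include <stdbool.h>

/* Hypothetical model of a queue manager; not the real device_queue_manager. */
struct toy_dqm {
    bool sched_running; /* set by start, cleared by stop */
    int  runlists_sent; /* counts runlist submissions to the "hardware" */
};

/* Both helpers are assumed to be called with the manager's lock held. */
static void toy_dqm_start(struct toy_dqm *dqm) { dqm->sched_running = true; }
static void toy_dqm_stop(struct toy_dqm *dqm)  { dqm->sched_running = false; }

/*
 * Assumed to be called with the lock held as well. Returns true if a
 * runlist was sent and false if the call was skipped because the
 * scheduler is stopped (for example during reset or suspend).
 */
static bool toy_map_queues(struct toy_dqm *dqm)
{
    if (!dqm->sched_running)
        return false;
    dqm->runlists_sent++;
    return true;
}
```

The design point of the sketch is that the lock is only held for the duration of each call. If a reset or resume fails, the flag simply stays false and later callers return early instead of blocking on a lock that is never released.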
  14. 08 Oct, 2019 2 commits
  15. 03 Oct, 2019 4 commits
  16. 16 Sep, 2019 3 commits
    • drm/amdkfd: fix the missed asic name while inited renoir_device_info · acb9acbe
      Huang Rui committed
      This patch fixes the NULL pointer issue below. I missed initializing the
      renoir asic name while rebasing the patches.
      
      [  106.004250] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  106.004254] #PF: supervisor read access in kernel mode
      [  106.004256] #PF: error_code(0x0000) - not-present page
      [  106.004257] PGD 0 P4D 0
      [  106.004261] Oops: 0000 [#1] SMP NOPTI
      [  106.004264] CPU: 3 PID: 1422 Comm: modprobe Not tainted 5.2.0-rc1-custom #1
      [  106.004266] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD9814N_Weekly_19_08_1 08/14/2019
      [  106.004272] RIP: 0010:strncpy+0x12/0x30
      [  106.004274] Code: c1 c0 11 48 c1 c6 15 48 31 d0 48 c1 c2 20 31 c2 89 d0 31 f0 41 5c 5d c3 55 48 85 d2 48 89 f8 48 89 e5 74 1e 48 01 fa 48 89 f9 <44> 0f b6 06 41 80 f8 01 44 88 01 48 83 de ff 48 83 c1 01 48 39 d1
      [  106.004278] RSP: 0018:ffffc092c1fd37a8 EFLAGS: 00010286
      [  106.004281] RAX: ffff9e943466a28c RBX: 00000000000036ed RCX: ffff9e943466a28c
      [  106.004283] RDX: ffff9e943466a2ac RSI: 0000000000000000 RDI: ffff9e943466a28c
      [  106.004285] RBP: ffffc092c1fd37a8 R08: ffff9e943d100000 R09: 0000000000000228
      [  106.004287] R10: ffff9e94418dc5a8 R11: ffff9e944746c0d0 R12: 0000000000000000
      [  106.004289] R13: ffff9e943fa1ec00 R14: ffff9e943466a200 R15: ffff9e943466a200
      [  106.004291] FS:  00007f7a022c5540(0000) GS:ffff9e9447ac0000(0000) knlGS:0000000000000000
      [  106.004294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.004296] CR2: 0000000000000000 CR3: 00000001ff0b0000 CR4: 0000000000340ee0
      [  106.004298] Call Trace:
      [  106.004382]  kfd_topology_add_device+0x150/0x610 [amdgpu]
      [  106.004445]  kgd2kfd_device_init+0x2e0/0x4f0 [amdgpu]
      [  106.004509]  amdgpu_amdkfd_device_init+0x14c/0x1b0 [amdgpu]
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
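The oops above shows strncpy being called with a NULL source pointer (RSI is 0). A plausible way this arises in C is a designated initializer that omits a string member, which leaves that pointer NULL. The sketch below illustrates the failure mode under that assumption. `toy_device_info`, its fields and `toy_copy_asic_name` are hypothetical names, and the NULL guard is an addition for the illustration, not something taken from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-ASIC device description. */
struct toy_device_info {
    const char *asic_name;
    int max_pasid_bits;
};

/* The name member is omitted, so C initializes the pointer to NULL. */
static const struct toy_device_info toy_missing_name = {
    .max_pasid_bits = 16,
};

/* The same description with the name filled in. */
static const struct toy_device_info toy_with_name = {
    .asic_name = "renoir",
    .max_pasid_bits = 16,
};

/* Copies the name into dst; returns -1 instead of dereferencing NULL. */
static int toy_copy_asic_name(char *dst, size_t len,
                              const struct toy_device_info *info)
{
    if (!info->asic_name || len == 0)
        return -1;
    snprintf(dst, len, "%s", info->asic_name);
    return 0;
}
```

Copying from `toy_missing_name` without the guard would read through a NULL pointer, which is the kind of access the oops reports at address 0.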
    • drm/amdkfd: add renoir kfd device info (v2) · 2b9c2211
      Huang Rui committed
      This patch initializes the renoir kfd device info, so we treat renoir as
      a "dgpu" (bypassing iommu v2). needs_iommu_device will be enabled once
      the renoir iommu is ready.
      
      v2: rebase and align the drm-next
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Support Navi14 in KFD · 8099ae40
      Yong Zhao committed
      Initial support of Navi14 in KFD. The device IDs will be added later.
      Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  17. 14 Sep, 2019 2 commits
  18. 24 Aug, 2019 1 commit
  19. 22 Aug, 2019 1 commit
  20. 19 Jul, 2019 5 commits
  21. 03 Jul, 2019 1 commit
  22. 02 Jul, 2019 1 commit
  23. 28 Jun, 2019 1 commit