1. 10 Mar, 2021 1 commit
  2. 02 Feb, 2021 1 commit
    • drm/amdkfd: fix null pointer panic while free buffer in kfd · 875440fd
      Committed by Huang Rui
      drm_gem_object_free calls the funcs of the DRM buffer object, so
      kfd_alloc should use amdgpu_gem_object_create instead of
      amdgpu_bo_create so that the funcs are initialized to
      amdgpu_gem_object_funcs.
      
      [  396.231390] amdgpu: Release VA 0x7f76b4ada000 - 0x7f76b4add000
      [  396.231394] amdgpu:   remove VA 0x7f76b4ada000 - 0x7f76b4add000 in entry 0000000085c24a47
      [  396.231408] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  396.231445] #PF: supervisor read access in kernel mode
      [  396.231466] #PF: error_code(0x0000) - not-present page
      [  396.231484] PGD 0 P4D 0
      [  396.231495] Oops: 0000 [#1] SMP NOPTI
      [  396.231509] CPU: 7 PID: 1352 Comm: clinfo Tainted: G           OE     5.11.0-rc2-custom #1
      [  396.231537] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD0401N_Weekly_20_04_0 04/01/2020
      [  396.231563] RIP: 0010:drm_gem_object_free+0xc/0x22 [drm]
      [  396.231606] Code: eb ec 48 89 c3 eb e7 0f 1f 44 00 00 55 48 89 e5 48 8b bf 00 06 00 00 e8 72 0d 01 00 5d c3 0f 1f 44 00 00 48 8b 87 40 01 00 00 <48> 8b 00 48 85 c0 74 0b 55 48 89 e5 e8 54 37 7c db 5d c3 0f 0b c3
      [  396.231666] RSP: 0018:ffffb4704177fcf8 EFLAGS: 00010246
      [  396.231686] RAX: 0000000000000000 RBX: ffff993a0d0cc400 RCX: 0000000000003113
      [  396.231711] RDX: 0000000000000001 RSI: e9cda7a5d0791c6d RDI: ffff993a333a9058
      [  396.231736] RBP: ffffb4704177fdd0 R08: ffff993a03855858 R09: 0000000000000000
      [  396.231761] R10: ffff993a0d1f7158 R11: 0000000000000001 R12: 0000000000000000
      [  396.231785] R13: ffff993a0d0cc428 R14: 0000000000003000 R15: ffffb4704177fde0
      [  396.231811] FS:  00007f76b5730740(0000) GS:ffff993b275c0000(0000) knlGS:0000000000000000
      [  396.231840] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  396.231860] CR2: 0000000000000000 CR3: 000000016d2e2000 CR4: 0000000000350ee0
      [  396.231885] Call Trace:
      [  396.231897]  ? amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x24c/0x25f [amdgpu]
      [  396.232056]  ? __dynamic_dev_dbg+0xcd/0x100
      [  396.232076]  kfd_ioctl_free_memory_of_gpu+0x91/0x102 [amdgpu]
      [  396.232214]  kfd_ioctl+0x211/0x35b [amdgpu]
      [  396.232341]  ? kfd_ioctl_get_queue_wave_state+0x52/0x52 [amdgpu]
      
      Fixes: 246cb7e4 ("drm/amdgpu: Introduce GEM object functions")
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Tested-by: Changfeng <changzhu@amd.com>
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
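
The failure mode described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: every `demo_*` name below is hypothetical, and the generic release path here returns an error where the kernel path in the trace above dereferenced the missing table and oopsed. The point is only that a low-level creator which never installs the callback table leaves the generic free path with nothing to call, while a wrapper that installs the table makes it work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_obj;

/* Per-object callback table, loosely analogous to a GEM funcs table. */
struct demo_obj_funcs {
    void (*free)(struct demo_obj *obj);
};

struct demo_obj {
    const struct demo_obj_funcs *funcs; /* NULL unless a creator installs it */
    int *freed_flag;                    /* set to 1 by the free callback */
};

/* The object-specific release callback. */
void demo_obj_release(struct demo_obj *obj)
{
    if (obj->freed_flag)
        *obj->freed_flag = 1;
    free(obj);
}

const struct demo_obj_funcs demo_funcs = { .free = demo_obj_release };

/* Low-level creator: allocates the object but leaves funcs NULL. */
struct demo_obj *demo_raw_create(int *freed_flag)
{
    struct demo_obj *obj = calloc(1, sizeof(*obj));

    if (obj)
        obj->freed_flag = freed_flag;
    return obj;
}

/* Higher-level creator: wraps the raw one and installs the table. */
struct demo_obj *demo_gem_style_create(int *freed_flag)
{
    struct demo_obj *obj = demo_raw_create(freed_flag);

    if (obj)
        obj->funcs = &demo_funcs;
    return obj;
}

/*
 * Generic release path. Returns 0 on success and -1 when no callback
 * table was installed (the sketch reports the error instead of crashing).
 */
int demo_generic_free(struct demo_obj *obj)
{
    assert(obj != NULL);
    if (!obj->funcs || !obj->funcs->free)
        return -1;
    obj->funcs->free(obj);
    return 0;
}
```

An object from `demo_gem_style_create` is released through its callback, whereas one from `demo_raw_create` cannot be released through the generic path at all and has to be freed directly by the caller.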
  3. 16 Dec, 2020 1 commit
  4. 15 Dec, 2020 1 commit
  5. 02 Dec, 2020 1 commit
  6. 03 Nov, 2020 1 commit
  7. 27 Oct, 2020 1 commit
  8. 10 Oct, 2020 1 commit
  9. 24 Sep, 2020 1 commit
  10. 18 Sep, 2020 1 commit
  11. 25 Aug, 2020 1 commit
  12. 15 Aug, 2020 1 commit
  13. 07 Aug, 2020 2 commits
  14. 29 Jul, 2020 1 commit
    • dma-buf: Use sequence counter with associated wound/wait mutex · cd29f220
      Committed by Ahmed S. Darwish
      A sequence counter write side critical section must be protected by some
      form of locking to serialize writers. If the serialization primitive is
      not disabling preemption implicitly, preemption has to be explicitly
      disabled before entering the sequence counter write side critical
      section.
      
      The dma-buf reservation subsystem uses plain sequence counters to manage
      updates to reservations. Writer serialization is accomplished through a
      wound/wait mutex.
      
      Acquiring a wound/wait mutex does not disable preemption, so this needs
      to be done manually before and after the write side critical section.
      
      Use the newly-added seqcount_ww_mutex_t instead:
      
        - It associates the ww_mutex with the sequence count, which enables
          lockdep to validate that the write side critical section is properly
          serialized.
      
        - It removes the need to explicitly add preempt_disable/enable()
          around the write side critical section because the write_begin/end()
          functions for this new data type automatically do this.
      
      If lockdep is disabled this ww_mutex lock association is compiled out
      and has neither storage size nor runtime overhead.
      Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://lkml.kernel.org/r/20200720155530.1173732-13-a.darwish@linutronix.de
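
The pairing of a sequence counter with the lock that serializes its writers can be sketched in user space with C11 atomics and a pthread mutex. This is only an analogue under stated assumptions: it does not use `seqcount_ww_mutex_t` or any kernel API, the `demo_*` names are hypothetical, a plain mutex stands in for the wound/wait mutex, and there is no user-space counterpart to disabling preemption. All accesses use sequentially consistent atomics to keep the sketch simple rather than fast.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Two values that readers must always observe as a consistent pair. */
struct demo_seq_pair {
    pthread_mutex_t lock; /* serializes writers */
    atomic_uint seq;      /* odd while a write is in progress */
    atomic_int a;
    atomic_int b;
};

void demo_seq_pair_init(struct demo_seq_pair *p)
{
    pthread_mutex_init(&p->lock, NULL);
    atomic_init(&p->seq, 0);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
}

/* Write side: take the lock, then bracket the update with two bumps. */
void demo_seq_pair_write(struct demo_seq_pair *p, int a, int b)
{
    pthread_mutex_lock(&p->lock);
    atomic_fetch_add(&p->seq, 1); /* counter becomes odd */
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    atomic_fetch_add(&p->seq, 1); /* counter becomes even again */
    pthread_mutex_unlock(&p->lock);
}

/* Read side: lock-free, retries if a writer was active or intervened. */
void demo_seq_pair_read(struct demo_seq_pair *p, int *a, int *b)
{
    unsigned int start;

    do {
        start = atomic_load(&p->seq);
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
    } while ((start & 1u) || atomic_load(&p->seq) != start);
}
```

Keeping the lock inside the same structure as the counter mirrors the idea of associating the two: the write helper cannot be entered without taking the serializing lock, so callers cannot forget it.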
  15. 28 Jul, 2020 1 commit
    • drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Committed by Dennis Li
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomic adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, because that may cause the GPU reset to fail. Therefore a new
      rw_semaphore, adev->reset_sem, is introduced to protect the GPU from
      being accessed by external threads during recovery.
      
      v2:
      1. add rwlock for some ioctls, debugfs and the file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in the
      kfd driver.
      3. remove try_lock and make adev->in_gpu_reset atomic, to avoid
      re-entering GPU recovery for the same GPU hang.
      
      v3:
      1. change back to using adev->reset_sem to protect kfd callback
      functions, because dqm_lock couldn't protect all code paths; for
      example, free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove comment codes in amdgpu_device.c
      3. add more detailed comment in commit message
      4. define a wrap function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tukov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
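
The two mechanisms this commit message describes (an atomic flag that stops recovery from being re-entered, and a reader/writer lock that keeps other threads off the device while recovery runs) can be sketched in user space as follows. This is an illustration under assumptions, not the driver code: the `demo_*` names are hypothetical, a pthread rwlock stands in for the kernel rw_semaphore, and a single atomic integer stands in for device state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct demo_dev {
    atomic_int in_reset;        /* 1 while a recovery is in flight */
    pthread_rwlock_t reset_sem; /* write-held for the whole recovery */
    atomic_int reg;             /* stand-in for device state */
    int reset_count;            /* only modified under the write lock */
};

void demo_dev_init(struct demo_dev *d)
{
    atomic_init(&d->in_reset, 0);
    pthread_rwlock_init(&d->reset_sem, NULL);
    atomic_init(&d->reg, 0);
    d->reset_count = 0;
}

/*
 * Recovery entry point. Several detection paths may call this for the
 * same hang; only the caller that flips in_reset from 0 to 1 proceeds.
 * Returns 0 if this call performed the recovery, -1 if one was already
 * in progress.
 */
int demo_dev_recover(struct demo_dev *d)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&d->in_reset, &expected, 1))
        return -1;

    pthread_rwlock_wrlock(&d->reset_sem); /* waits for accessors to leave */
    atomic_store(&d->reg, 0);             /* "reset" the device state */
    d->reset_count++;
    pthread_rwlock_unlock(&d->reset_sem);

    atomic_store(&d->in_reset, 0);
    return 0;
}

/*
 * Ordinary device access (an ioctl, debugfs or file-close path in the
 * commit's terms). Takes the lock shared, so accessors run concurrently
 * with each other but never with a recovery.
 */
int demo_dev_write_reg(struct demo_dev *d, int value)
{
    int rc = pthread_rwlock_rdlock(&d->reset_sem);

    if (rc != 0)
        return rc;
    atomic_store(&d->reg, value);
    pthread_rwlock_unlock(&d->reset_sem);
    return 0;
}
```

The flag and the lock solve different problems: the compare-and-exchange deduplicates recovery attempts without blocking, while the lock makes accessors wait until the device is usable again.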
  16. 01 Jul, 2020 1 commit
  17. 10 Jun, 2020 1 commit
  18. 22 May, 2020 1 commit
  19. 09 May, 2020 1 commit
  20. 07 May, 2020 1 commit
  21. 01 May, 2020 1 commit
  22. 24 Apr, 2020 1 commit
  23. 23 Apr, 2020 1 commit
  24. 14 Apr, 2020 1 commit
  25. 11 Mar, 2020 1 commit
  26. 29 Feb, 2020 1 commit
  27. 27 Feb, 2020 1 commit
  28. 05 Feb, 2020 1 commit
  29. 28 Jan, 2020 1 commit
  30. 10 Dec, 2019 1 commit
  31. 27 Nov, 2019 3 commits
  32. 24 Nov, 2019 2 commits
  33. 14 Nov, 2019 1 commit
  34. 30 Oct, 2019 1 commit
  35. 25 Oct, 2019 1 commit
  36. 26 Sep, 2019 1 commit