1. 14 Jul, 2022 1 commit
    • drm/amdgpu: Fix recursive locking warning · 1f1b4f34
      Committed by Rajneesh Bhardwaj
      stable inclusion
      from stable-v5.10.111
      commit 6694b8643bde4b940f6b410507960793b922a77d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6694b8643bde4b940f6b410507960793b922a77d
      
      --------------------------------
      
      [ Upstream commit 447c7997 ]
      
      Noticed the below warning while running a pytorch workload on vega10
      GPUs. Change to trylock to avoid conflicts with already held reservation
      locks.
      
      [  +0.000003] WARNING: possible recursive locking detected
      [  +0.000003] 5.13.0-kfd-rajneesh #1030 Not tainted
      [  +0.000004] --------------------------------------------
      [  +0.000002] python/4822 is trying to acquire lock:
      [  +0.000004] ffff932cd9a259f8 (reservation_ww_class_mutex){+.+.}-{3:3},
      at: amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000203]
                    but task is already holding lock:
      [  +0.000003] ffff932cbb7181f8 (reservation_ww_class_mutex){+.+.}-{3:3},
      at: ttm_eu_reserve_buffers+0x270/0x470 [ttm]
      [  +0.000017]
                    other info that might help us debug this:
      [  +0.000002]  Possible unsafe locking scenario:
      
      [  +0.000003]        CPU0
      [  +0.000002]        ----
      [  +0.000002]   lock(reservation_ww_class_mutex);
      [  +0.000004]   lock(reservation_ww_class_mutex);
      [  +0.000003]
                     *** DEADLOCK ***
      
      [  +0.000002]  May be due to missing lock nesting notation
      
      [  +0.000003] 7 locks held by python/4822:
      [  +0.000003]  #0: ffff932c4ac028d0 (&process->mutex){+.+.}-{3:3}, at:
      kfd_ioctl_map_memory_to_gpu+0x10b/0x320 [amdgpu]
      [  +0.000232]  #1: ffff932c55e830a8 (&info->lock#2){+.+.}-{3:3}, at:
      amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x64/0xf60 [amdgpu]
      [  +0.000241]  #2: ffff932cc45b5e68 (&(*mem)->lock){+.+.}-{3:3}, at:
      amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0xdf/0xf60 [amdgpu]
      [  +0.000236]  #3: ffffb2b35606fd28
      (reservation_ww_class_acquire){+.+.}-{0:0}, at:
      amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x232/0xf60 [amdgpu]
      [  +0.000235]  #4: ffff932cbb7181f8
      (reservation_ww_class_mutex){+.+.}-{3:3}, at:
      ttm_eu_reserve_buffers+0x270/0x470 [ttm]
      [  +0.000015]  #5: ffffffffc045f700 (*(sspp++)){....}-{0:0}, at:
      drm_dev_enter+0x5/0xa0 [drm]
      [  +0.000038]  #6: ffff932c52da7078 (&vm->eviction_lock){+.+.}-{3:3},
      at: amdgpu_vm_bo_update_mapping+0xd5/0x4f0 [amdgpu]
      [  +0.000195]
                    stack backtrace:
      [  +0.000003] CPU: 11 PID: 4822 Comm: python Not tainted
      5.13.0-kfd-rajneesh #1030
      [  +0.000005] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F02
      08/29/2018
      [  +0.000003] Call Trace:
      [  +0.000003]  dump_stack+0x6d/0x89
      [  +0.000010]  __lock_acquire+0xb93/0x1a90
      [  +0.000009]  lock_acquire+0x25d/0x2d0
      [  +0.000005]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000184]  ? lock_is_held_type+0xa2/0x110
      [  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000184]  __ww_mutex_lock.constprop.17+0xca/0x1060
      [  +0.000007]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000183]  ? lock_release+0x13f/0x270
      [  +0.000005]  ? lock_is_held_type+0xa2/0x110
      [  +0.000006]  ? amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000183]  amdgpu_bo_release_notify+0xc4/0x160 [amdgpu]
      [  +0.000185]  ttm_bo_release+0x4c6/0x580 [ttm]
      [  +0.000010]  amdgpu_bo_unref+0x1a/0x30 [amdgpu]
      [  +0.000183]  amdgpu_vm_free_table+0x76/0xa0 [amdgpu]
      [  +0.000189]  amdgpu_vm_free_pts+0xb8/0xf0 [amdgpu]
      [  +0.000189]  amdgpu_vm_update_ptes+0x411/0x770 [amdgpu]
      [  +0.000191]  amdgpu_vm_bo_update_mapping+0x324/0x4f0 [amdgpu]
      [  +0.000191]  amdgpu_vm_bo_update+0x251/0x610 [amdgpu]
      [  +0.000191]  update_gpuvm_pte+0xcc/0x290 [amdgpu]
      [  +0.000229]  ? amdgpu_vm_bo_map+0xd7/0x130 [amdgpu]
      [  +0.000190]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x912/0xf60
      [amdgpu]
      [  +0.000234]  kfd_ioctl_map_memory_to_gpu+0x182/0x320 [amdgpu]
      [  +0.000218]  kfd_ioctl+0x2b9/0x600 [amdgpu]
      [  +0.000216]  ? kfd_ioctl_unmap_memory_from_gpu+0x270/0x270 [amdgpu]
      [  +0.000216]  ? lock_release+0x13f/0x270
      [  +0.000006]  ? __fget_files+0x107/0x1e0
      [  +0.000007]  __x64_sys_ioctl+0x8b/0xd0
      [  +0.000007]  do_syscall_64+0x36/0x70
      [  +0.000004]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  +0.000007] RIP: 0033:0x7fbff90a7317
      [  +0.000004] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
      48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
      05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
      [  +0.000005] RSP: 002b:00007fbe301fe648 EFLAGS: 00000246 ORIG_RAX:
      0000000000000010
      [  +0.000006] RAX: ffffffffffffffda RBX: 00007fbcc402d820 RCX:
      00007fbff90a7317
      [  +0.000003] RDX: 00007fbe301fe690 RSI: 00000000c0184b18 RDI:
      0000000000000004
      [  +0.000003] RBP: 00007fbe301fe690 R08: 0000000000000000 R09:
      00007fbcc402d880
      [  +0.000003] R10: 0000000002001000 R11: 0000000000000246 R12:
      00000000c0184b18
      [  +0.000003] R13: 0000000000000004 R14: 00007fbf689593a0 R15:
      00007fbcc402d820
      
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Alex Deucher <Alexander.Deucher@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      Reviewed-by: Wei Li <liwei391@huawei.com>
  2. 21 Oct, 2021 1 commit
  3. 11 Sep, 2020 1 commit
  4. 25 Aug, 2020 1 commit
  5. 18 Aug, 2020 1 commit
  6. 12 Aug, 2020 1 commit
  7. 06 Aug, 2020 3 commits
  8. 05 Aug, 2020 1 commit
  9. 25 Jun, 2020 1 commit
  10. 29 Apr, 2020 1 commit
  11. 11 Mar, 2020 1 commit
  12. 27 Feb, 2020 3 commits
  13. 05 Feb, 2020 2 commits
  14. 18 Oct, 2019 1 commit
  15. 17 Oct, 2019 1 commit
  16. 16 Oct, 2019 1 commit
  17. 03 Oct, 2019 1 commit
  18. 17 Sep, 2019 1 commit
  19. 16 Sep, 2019 1 commit
  20. 22 Aug, 2019 1 commit
  21. 13 Aug, 2019 1 commit
  22. 06 Aug, 2019 2 commits
  23. 05 Aug, 2019 1 commit
  24. 02 Aug, 2019 1 commit
    • drm/amdgpu: Implement VRAM wipe on release · ab2f7a5c
      Committed by Felix Kuehling
      Wipe VRAM memory containing sensitive data when moving or releasing
      BOs. Clearing the memory is pipelined to minimize any impact on
      subsequent memory allocation latency. Use of a poison value should
      help debug future use-after-free bugs.
      
      When moving BOs, the existing ttm_bo_pipelined_move ensures that the
      memory won't be reused before being wiped.
      
      When releasing BOs, the BO is fenced with the memory fill operation,
      which results in queuing the BO for a delayed delete.
      
      v2: Move amdgpu_amdkfd_unreserve_memory_limit into
      amdgpu_bo_release_notify so that KFD can use memory that's still
      being cleared in the background
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
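      The poison-on-release idea from the commit message above can be sketched in plain C. This is a simplified illustration, not the driver's implementation: the fill here is a synchronous CPU loop rather than a pipelined GPU fill, and `DEMO_POISON`, `demo_buf`, `demo_buf_release`, and `demo_buf_looks_freed` are hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poison pattern; the value used by the driver is not assumed here. */
#define DEMO_POISON 0xDEADBEEFu

/* Hypothetical buffer descriptor standing in for a buffer object. */
struct demo_buf {
    uint32_t *words;  /* backing memory */
    size_t count;     /* number of 32-bit words */
    int released;     /* set once the buffer has been wiped */
};

/* Overwrite the whole buffer with the poison pattern before it is reused. */
static void demo_buf_release(struct demo_buf *buf)
{
    for (size_t i = 0; i < buf->count; i++)
        buf->words[i] = DEMO_POISON;
    buf->released = 1;
}

/* Debug helper: a read that returns the poison pattern hints at use-after-free. */
static int demo_buf_looks_freed(const struct demo_buf *buf)
{
    return buf->count > 0 && buf->words[0] == DEMO_POISON;
}
```

      Wiping with a recognizable pattern instead of zeros serves both goals named in the commit message: old contents are no longer readable, and a stale pointer that reads the pattern back is easy to spot when debugging.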
  25. 31 Jul, 2019 2 commits
  26. 21 Jun, 2019 1 commit
  27. 12 Jun, 2019 1 commit
  28. 11 Jun, 2019 1 commit
  29. 20 Apr, 2019 1 commit
  30. 14 Feb, 2019 1 commit
  31. 01 Feb, 2019 1 commit
  32. 15 Dec, 2018 1 commit
  33. 08 Dec, 2018 1 commit