1. 22 Sep 2022, 5 commits
  2. 20 Sep 2022, 1 commit
  3. 06 Jul 2022, 1 commit
  4. 22 Jun 2022, 1 commit
  5. 08 Jun 2022, 2 commits
  6. 04 Jun 2022, 1 commit
  7. 11 May 2022, 1 commit
  8. 04 May 2022, 1 commit
  9. 15 Apr 2022, 1 commit
    • drm/amdgpu: Fix one use-after-free of VM · 7c703a7d
      Committed by xinhui pan
      The VM might already be freed by the time amdgpu_vm_tlb_seq_cb() is called;
      we see the call trace below.

      Fix it by keeping the last flush fence around and waiting for it to signal
      (see the sketch after this log entry).
      
      BUG kmalloc-4k (Not tainted): Poison overwritten
      
      0xffff9c88630414e8-0xffff9c88630414e8 @offset=5352. First byte 0x6c instead of 0x6b
      Allocated in amdgpu_driver_open_kms+0x9d/0x360 [amdgpu] age=44 cpu=0 pid=2343
       __slab_alloc.isra.0+0x4f/0x90
       kmem_cache_alloc_trace+0x6b8/0x7a0
       amdgpu_driver_open_kms+0x9d/0x360 [amdgpu]
       drm_file_alloc+0x222/0x3e0 [drm]
       drm_open+0x11d/0x410 [drm]
      Freed in amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu] age=22 cpu=1
      pid=2485
       kfree+0x4a2/0x580
       amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu]
       drm_file_free+0x24e/0x3c0 [drm]
       drm_close_helper.isra.0+0x90/0xb0 [drm]
       drm_release+0x97/0x1a0 [drm]
       __fput+0xb6/0x280
       ____fput+0xe/0x10
       task_work_run+0x64/0xb0
      Suggested-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      7c703a7d
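      A minimal sketch of the fix described above, assuming hypothetical names
      (my_vm, last_flush_fence, my_vm_note_flush, my_vm_fini) rather than the real
      amdgpu structures: the VM keeps a reference to its most recent TLB-flush
      fence and waits for it before freeing itself, so the flush callback can no
      longer touch freed memory.

      #include <linux/dma-fence.h>
      #include <linux/slab.h>

      struct my_vm {
              struct dma_fence *last_flush_fence;  /* last TLB-flush fence, may be NULL */
              /* ... other VM state ... */
      };

      /* Remember the newest flush fence, dropping the reference to the previous one. */
      static void my_vm_note_flush(struct my_vm *vm, struct dma_fence *fence)
      {
              dma_fence_put(vm->last_flush_fence);
              vm->last_flush_fence = dma_fence_get(fence);
      }

      /* On teardown, wait for the last flush to signal before the VM memory is freed. */
      static void my_vm_fini(struct my_vm *vm)
      {
              if (vm->last_flush_fence) {
                      dma_fence_wait(vm->last_flush_fence, false);
                      dma_fence_put(vm->last_flush_fence);
              }
              kfree(vm);
      }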
  10. 07 Apr 2022, 2 commits
  11. 06 Apr 2022, 1 commit
  12. 05 Apr 2022, 2 commits
  13. 01 Apr 2022, 2 commits
  14. 29 Mar 2022, 2 commits
  15. 26 Mar 2022, 4 commits
  16. 03 Mar 2022, 2 commits
  17. 24 Feb 2022, 2 commits
    • drm/amdgpu: check vm ready by amdgpu_vm->evicting flag · c1a66c3b
      Committed by Qiang Yu
      The workstation application ANSA/META v21.1.4 gets this error in dmesg when
      running the CI test suite provided by ANSA/META:
      [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
      
      This is caused by:
      1. a 256MB buffer is created in invisible VRAM
      2. the CPU maps and accesses the buffer, which triggers a vm_fault and an
         attempt to move it to visible VRAM
      3. visible VRAM space is forced and all VRAM BOs are traversed to check
         whether evicting each one is worthwhile
      4. when checking a VM BO (in invisible VRAM), amdgpu_vm_evictable() sets
         amdgpu_vm->evicting, but since the BO is not in visible VRAM it is not
         really evicted and so is not added to amdgpu_vm->evicted
      5. before the next CS clears amdgpu_vm->evicting, a user VM ops ioctl
         passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted) but fails in
         amdgpu_vm_bo_update_mapping() (which checks amdgpu_vm->evicting),
         producing this error log
      
      This error does not affect functionality, as the next CS will finish the
      waiting VM ops. But it is better to silence the error log by checking the
      amdgpu_vm->evicting flag in amdgpu_vm_ready() so that
      amdgpu_vm_bo_update_mapping() is not called later (see the sketch after
      this log entry).
      
      Another reason is that the amdgpu_vm->evicted list holds all BOs (both
      user buffers and page tables), but only the eviction of page-table BOs
      prevents VM ops. The amdgpu_vm->evicting flag is set only for page-table
      BOs, so the evicting flag should be used instead of the evicted list in
      amdgpu_vm_ready().
      
      The side effect of this change is that a previously blocked VM op (a user
      buffer on the "evicted" list, but no page table on it) now gets done
      immediately.
      
      v2: update commit comments.
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Qiang Yu <qiang.yu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      c1a66c3b
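      A hedged sketch of the readiness check described above, following the commit
      message rather than the verbatim upstream diff; the eviction-lock helpers are
      assumed to be the existing amdgpu_vm ones, and vm_ready_sketch is an
      illustrative name. The idea is to base readiness on the per-VM evicting flag,
      which is set only when page-table BOs are being evicted, instead of the
      evicted list, which also holds user buffers.

      /* Reject user VM ops only when page tables are actually being evicted. */
      static bool vm_ready_sketch(struct amdgpu_vm *vm)
      {
              bool ret;

              amdgpu_vm_eviction_lock(vm);    /* vm->evicting is protected by this lock */
              ret = !vm->evicting;
              amdgpu_vm_eviction_unlock(vm);

              return ret;
      }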
    • drm/amdgpu: check vm ready by amdgpu_vm->evicting flag · b74e2476
      Committed by Qiang Yu
      b74e2476
  18. 08 Feb 2022, 3 commits
  19. 03 Feb 2022, 1 commit
  20. 15 Dec 2021, 1 commit
  21. 20 Oct 2021, 1 commit
  22. 09 Oct 2021, 1 commit
  23. 24 Sep 2021, 1 commit
    • drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · b2fe31cf
      Committed by xinhui pan
      We hit a soft hang while running a memory pressure test on one NUMA system.
      After a quick look, this is because kfd invalidates/validates userptr memory
      frequently with the process_info lock held. It looks like updating the page
      table mappings uses too much CPU time.

      perf top reports the following:
      75.81%  [kernel]       [k] __srcu_read_unlock
       6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
       3.56%  [kernel]       [k] __srcu_read_lock
       2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
       2.20%  [kernel]       [k] __sg_page_iter_dma_next
       2.15%  [drm]          [k] drm_dev_enter
       1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
       1.18%  [kernel]       [k] __sg_alloc_table_from_pages
       1.09%  [drm]          [k] drm_dev_exit
      
      So move drm_dev_enter/exit out of the GMC code and let the callers do it
      instead. Those callers are gart_unbind, gart_map, vm_clear_bo, vm_update_pdes
      and gmc_init_pdb0; vm_bo_update_mapping already calls it. (A sketch of the
      pattern follows after this entry.)
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      b2fe31cf
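      A minimal sketch of the pattern described above, with an illustrative caller
      name (update_mappings_example); drm_dev_enter()/drm_dev_exit() are the real
      DRM unplug-protection APIs, everything else here is assumed. The caller enters
      the drm_dev SRCU section once around the whole update instead of the GMC
      helpers doing it for every PTE.

      #include <drm/drm_drv.h>

      static int update_mappings_example(struct drm_device *ddev)
      {
              int idx;

              if (!drm_dev_enter(ddev, &idx))
                      return -ENODEV;         /* device already unplugged */

              /*
               * Hot loop: per-PTE helpers such as amdgpu_gmc_set_pte_pde() run
               * here without calling drm_dev_enter()/drm_dev_exit() themselves.
               */

              drm_dev_exit(idx);
              return 0;
      }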
  24. 26 Aug 2021, 1 commit