1. 22 Sep, 2022 (2 commits)
  2. 20 Sep, 2022 (1 commit)
  3. 06 Jul, 2022 (1 commit)
  4. 22 Jun, 2022 (1 commit)
  5. 08 Jun, 2022 (2 commits)
  6. 04 Jun, 2022 (1 commit)
  7. 11 May, 2022 (1 commit)
  8. 04 May, 2022 (1 commit)
  9. 15 Apr, 2022 (1 commit)
    • drm/amdgpu: Fix one use-after-free of VM · 7c703a7d
      Committed by xinhui pan
      The VM might already be freed by the time amdgpu_vm_tlb_seq_cb() is
      called; the call trace below shows the resulting use-after-free.
      
      Fix it by keeping the last flush fence around and waiting for it to
      signal before freeing the VM.
      
      BUG kmalloc-4k (Not tainted): Poison overwritten
      
      0xffff9c88630414e8-0xffff9c88630414e8 @offset=5352. First byte 0x6c
      instead of 0x6b Allocated in amdgpu_driver_open_kms+0x9d/0x360 [amdgpu]
      age=44 cpu=0 pid=2343
       __slab_alloc.isra.0+0x4f/0x90
       kmem_cache_alloc_trace+0x6b8/0x7a0
       amdgpu_driver_open_kms+0x9d/0x360 [amdgpu]
       drm_file_alloc+0x222/0x3e0 [drm]
       drm_open+0x11d/0x410 [drm]
      Freed in amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu] age=22 cpu=1
      pid=2485
       kfree+0x4a2/0x580
       amdgpu_driver_postclose_kms+0x3e9/0x550 [amdgpu]
       drm_file_free+0x24e/0x3c0 [drm]
       drm_close_helper.isra.0+0x90/0xb0 [drm]
       drm_release+0x97/0x1a0 [drm]
       __fput+0xb6/0x280
       ____fput+0xe/0x10
       task_work_run+0x64/0xb0
      Suggested-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  10. 07 Apr, 2022 (2 commits)
  11. 06 Apr, 2022 (1 commit)
  12. 05 Apr, 2022 (2 commits)
  13. 01 Apr, 2022 (2 commits)
  14. 29 Mar, 2022 (2 commits)
  15. 26 Mar, 2022 (4 commits)
  16. 03 Mar, 2022 (2 commits)
  17. 24 Feb, 2022 (2 commits)
    • drm/amdgpu: check vm ready by amdgpu_vm->evicting flag · c1a66c3b
      Committed by Qiang Yu
      The workstation application ANSA/META v21.1.4 gets this dmesg error
      when running the CI test suite provided by ANSA/META:
      [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
      
      This is caused by:
      1. a 256 MB buffer is created in invisible VRAM
      2. CPU-mapping and accessing the buffer causes a vm_fault, which
         tries to move it to visible VRAM
      3. forcing visible VRAM space traverses all VRAM BOs to check
         whether evicting each BO is worthwhile
      4. when checking a VM BO (in invisible VRAM), amdgpu_vm_evictable()
         sets amdgpu_vm->evicting, but later, because the BO is not in
         visible VRAM, it is not actually evicted and so is not added to
         amdgpu_vm->evicted
      5. before the next CS clears amdgpu_vm->evicting, a user VM-ops
         ioctl passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted)
         but fails in amdgpu_vm_bo_update_mapping() (which checks
         amdgpu_vm->evicting), producing this error log
      
      This error does not affect functionality, as the next CS will finish
      the pending VM ops. Still, we should silence the error log by
      checking the amdgpu_vm->evicting flag in amdgpu_vm_ready() so that
      amdgpu_vm_bo_update_mapping() is not called in this state.
      
      Another reason is that the amdgpu_vm->evicted list holds all BOs
      (both user buffers and page tables), but only the eviction of
      page-table BOs prevents VM ops. The amdgpu_vm->evicting flag is set
      only for page-table BOs, so we should use the evicting flag instead
      of the evicted list in amdgpu_vm_ready().
      
      The side effect of this change is that a previously blocked VM op
      (user buffer in the "evicted" list but no page table in it) now
      gets done immediately.
      
      v2: update commit comments.
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Qiang Yu <qiang.yu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
    • drm/amdgpu: check vm ready by amdgpu_vm->evicting flag · b74e2476
      Committed by Qiang Yu
      The workstation application ANSA/META v21.1.4 gets this dmesg error
      when running the CI test suite provided by ANSA/META:
      [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16)
      
      This is caused by:
      1. a 256 MB buffer is created in invisible VRAM
      2. CPU-mapping and accessing the buffer causes a vm_fault, which
         tries to move it to visible VRAM
      3. forcing visible VRAM space traverses all VRAM BOs to check
         whether evicting each BO is worthwhile
      4. when checking a VM BO (in invisible VRAM), amdgpu_vm_evictable()
         sets amdgpu_vm->evicting, but later, because the BO is not in
         visible VRAM, it is not actually evicted and so is not added to
         amdgpu_vm->evicted
      5. before the next CS clears amdgpu_vm->evicting, a user VM-ops
         ioctl passes amdgpu_vm_ready() (which checks amdgpu_vm->evicted)
         but fails in amdgpu_vm_bo_update_mapping() (which checks
         amdgpu_vm->evicting), producing this error log
      
      This error does not affect functionality, as the next CS will finish
      the pending VM ops. Still, we should silence the error log by
      checking the amdgpu_vm->evicting flag in amdgpu_vm_ready() so that
      amdgpu_vm_bo_update_mapping() is not called in this state.
      
      Another reason is that the amdgpu_vm->evicted list holds all BOs
      (both user buffers and page tables), but only the eviction of
      page-table BOs prevents VM ops. The amdgpu_vm->evicting flag is set
      only for page-table BOs, so we should use the evicting flag instead
      of the evicted list in amdgpu_vm_ready().
      
      The side effect of this change is that a previously blocked VM op
      (user buffer in the "evicted" list but no page table in it) now
      gets done immediately.
      
      v2: update commit comments.
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Qiang Yu <qiang.yu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  18. 08 Feb, 2022 (3 commits)
  19. 03 Feb, 2022 (1 commit)
  20. 15 Dec, 2021 (1 commit)
  21. 20 Oct, 2021 (1 commit)
  22. 09 Oct, 2021 (1 commit)
  23. 24 Sep, 2021 (1 commit)
    • drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · b2fe31cf
      Committed by xinhui pan
      We hit a soft hang while running a memory-pressure test on a NUMA
      system. After a quick look, this turned out to be because KFD
      invalidates/validates userptr memory frequently with the
      process_info lock held. Updating the page-table mappings appears
      to use too much CPU time.
      
      perf top shows:
      75.81%  [kernel]       [k] __srcu_read_unlock
       6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
       3.56%  [kernel]       [k] __srcu_read_lock
       2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
       2.20%  [kernel]       [k] __sg_page_iter_dma_next
       2.15%  [drm]          [k] drm_dev_enter
       1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
       1.18%  [kernel]       [k] __sg_alloc_table_from_pages
       1.09%  [drm]          [k] drm_dev_exit
      
      So move drm_dev_enter/exit out of the GMC code and let the callers
      do it instead. Those callers are gart_unbind, gart_map, vm_clear_bo,
      vm_update_pdes and gmc_init_pdb0. vm_bo_update_mapping already
      calls it.
      Signed-off-by: xinhui pan <xinhui.pan@amd.com>
      Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  24. 26 Aug, 2021 (1 commit)
  25. 25 Aug, 2021 (1 commit)
  26. 17 Aug, 2021 (1 commit)
    • drm/amd/amdgpu embed hw_fence into amdgpu_job · c530b02f
      Committed by Jack Zhang
      Why: previously, the hw fence was allocated separately from the job,
      which caused historical lifetime issues and corner cases. The ideal
      situation is to let the fence manage both the job's and the fence's
      lifetime, simplifying the design of the GPU scheduler.
      
      How:
      We propose to embed the hw_fence into amdgpu_job.
      1. Normal job submission is covered by this method.
      2. For ib_test and submissions without a parent job, keep the
         legacy way of creating a hw fence separately.
      
      v2: use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to indicate that the
      fence is embedded in a job.
      v3: remove the redundant ring variable in amdgpu_job.
      v4: add TDR sequence support for this feature; add a
      job_run_counter to indicate whether this job is a resubmitted job.
      v5: add missing handling in amdgpu_fence_enable_signaling.
      Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
      Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Reviewed-by: Monk Liu <monk.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  27. 03 Aug, 2021 (1 commit)