drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs (79867462) · 提交 · openanolis / cloud-kernel

提交 79867462 编写于 9月 28, 2017 作者: N Nicolai Hähnle 提交者： Alex Deucher 10月 06, 2017

drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

    [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
    [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
    [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
    [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
    [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
    [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
    [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
    [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
    [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
    [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
    [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
    [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
    [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
    [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
    [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
    [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
    [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

Note: The correctness of this change depends on the earlier commit
"drm/amd/sched: move adding finish callback to amd_sched_job_begin"

v2: set an error on the finished fence
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

上级 29d25355

隐藏空白更改

内联并排

浏览文件 @ 79867462

@@ -227,8 +227,14 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler *sched,
 		 */
 		kthread_park(sched->thread);
 		kthread_unpark(sched->thread);
 		while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
 		while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
 			struct amd_sched_fence *s_fence = job->s_fence;
 			amd_sched_fence_scheduled(s_fence);
 			dma_fence_set_error(&s_fence->finished, -ESRCH);
 			amd_sched_fence_finished(s_fence);
 			dma_fence_put(&s_fence->finished);
 			sched->ops->free_job(job);
+		}
+	}
 	kfifo_free(&entity->job_queue);
-...

想要评论请注册或

openanolis / cloud-kernel 大约 1 年 前同步成功

drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs

openanolis / cloud-kernel
大约 1 年前同步成功