    drm/sched: fix the bug of time out calculation(v4) · bcf26654
    Committed by Monk Liu
    issue:
    in cleanup_job, cancel_delayed_work cancels the TO (timeout) timer
    even if its corresponding job is still running.
    
    fix:
    do not cancel the timer unconditionally in cleanup_job; instead, cancel
    it only when the head job is signaled, and if there is a "next" job,
    call start_timeout again.
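
The rule described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `toy_sched`, `toy_job`, and every `toy_*` function are hypothetical names, the pending list is a fixed-size array, and a boolean stands in for the delayed timeout work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical stand-in for a scheduler job. */
struct toy_job {
    bool signaled; /* true once the job's "finished" fence has signaled */
};

/* Hypothetical stand-in for the scheduler and its timeout (TDR) timer. */
struct toy_sched {
    struct toy_job *pending[TOY_MAX_PENDING]; /* FIFO, head at index 0 */
    size_t count;
    bool timer_armed; /* models the delayed timeout work being queued */
};

/* Arm the timeout timer if any job is pending. */
static void toy_start_timeout(struct toy_sched *s)
{
    if (s->count > 0)
        s->timer_armed = true;
}

static void toy_cancel_timeout(struct toy_sched *s)
{
    s->timer_armed = false;
}

/* Queue a job and make sure a timer is running. Returns 0, or -1 if full. */
int toy_job_begin(struct toy_sched *s, struct toy_job *job)
{
    assert(s != NULL && job != NULL);
    if (s->count == TOY_MAX_PENDING)
        return -1;
    s->pending[s->count++] = job;
    toy_start_timeout(s);
    return 0;
}

/* Retire the head job if it has signaled; otherwise touch nothing. */
struct toy_job *toy_cleanup_job(struct toy_sched *s)
{
    assert(s != NULL);
    if (s->count == 0 || !s->pending[0]->signaled)
        return NULL; /* head still running: leave its timer alone */

    struct toy_job *done = s->pending[0];
    for (size_t i = 1; i < s->count; i++)
        s->pending[i - 1] = s->pending[i];
    s->count--;

    toy_cancel_timeout(s); /* this timer belonged to the finished job */
    toy_start_timeout(s);  /* re-arms only if a "next" job exists */
    return done;
}
```

In this model, calling `toy_cleanup_job` while the head job is unsignaled returns NULL and leaves `timer_armed` set, which is the behavior the buggy version lacked. Once the last pending job is retired, the timer stays cancelled.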
    
    v2:
    further clean up the logic, and cancel the TDR timer if the signaled job
    is the last one in its scheduler.
    
    v3:
    change the issue description
    remove the cancel_delayed_work at the beginning of cleanup_job
    restore the implementation of drm_sched_job_begin.
    
    v4:
    remove the kthread_should_park() check in the cleanup_job routine;
    we should clean up the signaled job ASAP
    
    TODO:
    1) introduce pause/resume of the scheduler in job_timeout to serialize
    the handling of the scheduler and job_timeout.
    2) drop the bad job's del and insert in the scheduler, which the above
    serialization makes unnecessary (no race remains once handling is serialized)
    
    Tested-by: jingwen <jingwen.chen@amd.com>
    Signed-off-by: Monk Liu <Monk.Liu@amd.com>
    Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/1630457207-13107-1-git-send-email-Monk.Liu@amd.com
sched_main.c 29.7 KB