    drm/i915/guc: Refcount context during error capture · 08d1ecd9
    John Harrison committed
    When i915 receives a context reset notification from GuC, it triggers
    an error capture before resetting any outstanding requests of that
    context. Unfortunately, the error capture is not a time-bound
    operation. In certain situations it can take a long time, particularly
    when multiple large LMEM buffers must be read back and encoded. If
    this delay is longer than other timeouts (heartbeat, test recovery,
    etc.) then a full GT reset can be triggered in the middle.
    
    That can result in the context being reset by GuC actually being
    destroyed before the error capture completes and the GuC submission
    code resumes. Thus, the GuC side can start dereferencing stale
    pointers and Bad Things ensue.
    
    So add a refcount get of the context during the entire reset
    operation. That way, the context can't be destroyed part way through
    no matter what other resets or user interactions occur.
    
    v2:
     (Matthew Brost)
      - Update patch to work with async error capture
    v3:
     (Matthew Brost)
      - Drop async capture support as that hasn't landed yet
    Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Matthew Brost <matthew.brost@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20211108164054.23588-1-matthew.brost@intel.com
intel_guc_submission.c 125.0 KB
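
The diff itself is not reproduced here. As a rough illustration of the idea in the commit message (take a reference on the context for the whole reset operation so that a concurrent reset cannot free it), here is a minimal single-threaded userspace sketch. It is not the actual patch: `struct ctx`, `ctx_get`, `ctx_put`, `slow_error_capture` and `handle_context_reset` are hypothetical names invented for this example, and a plain `int` stands in for the atomic reference count a kernel driver would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GuC-managed context; not the i915 struct. */
struct ctx {
    int refcount;   /* plain int: this model is single-threaded */
    bool captured;  /* set once the (modelled) error capture has run */
};

static int ctx_destroyed; /* counts frees so the behaviour is observable */

static struct ctx *ctx_create(void)
{
    struct ctx *c = calloc(1, sizeof(*c));

    if (c)
        c->refcount = 1; /* the creator's reference */
    return c;
}

static struct ctx *ctx_get(struct ctx *c)
{
    assert(c->refcount > 0); /* never revive an already-freed object */
    c->refcount++;
    return c;
}

static void ctx_put(struct ctx *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        ctx_destroyed++;
        free(c);
    }
}

/*
 * Models a slow error capture. The optional callback plays the role of
 * something else (for example a full GT reset) dropping the owner's
 * reference while the capture is still in progress.
 */
static void slow_error_capture(struct ctx *c, void (*concurrent_event)(struct ctx *))
{
    if (concurrent_event)
        concurrent_event(c);
    c->captured = true; /* safe only because the caller holds a reference */
}

static void handle_context_reset(struct ctx *c, void (*concurrent_event)(struct ctx *))
{
    ctx_get(c); /* pin the context for the entire reset operation */
    slow_error_capture(c, concurrent_event);
    /* ... resetting outstanding requests would happen here ... */
    ctx_put(c); /* may turn out to be the final reference */
}
```

Calling `handle_context_reset(c, ctx_put)` on a freshly created context models the race described above: the owner's reference is dropped mid-capture, yet the object is only freed by the final `ctx_put` at the end of the handler. Without the `ctx_get`/`ctx_put` pair, the write to `c->captured` would touch freed memory.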