    drm/i915: fix gpu hang vs. flip stall deadlocks · 122f46ba
    Committed by Daniel Vetter
    Since we've started to clean up pending flips when the gpu hangs in
    
    commit 96a02917
    Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Date:   Mon Feb 18 19:08:49 2013 +0200
    
        drm/i915: Finish page flips and update primary planes after a GPU reset
    
    the gpu reset work now also grabs modeset locks. But work items on our
    private workqueue are not allowed to do that, since the pageflip code
    calls flush_workqueue on that workqueue while holding those locks, so
    this results in a neat deadlock:
    
    INFO: task kms_flip:14676 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kms_flip        D ffff88019283a5c0     0 14676  13344 0x00000004
     ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8
     ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000
     ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898
    Call Trace:
     [<ffffffff8138ee7b>] schedule+0x60/0x62
     [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915]
     [<ffffffff81050ff4>] ? finish_wait+0x60/0x60
     [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915]
     [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm]
     [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm]
     [<ffffffff810e44da>] ? might_fault+0x38/0x86
     [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm]
     [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf
     [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm]
     [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d
     [<ffffffff81117f33>] vfs_ioctl+0x18/0x34
     [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454
     [<ffffffff81396b37>] ? sysret_check+0x1b/0x56
     [<ffffffff81118886>] SyS_ioctl+0x52/0x7d
     [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b
    2 locks held by kms_flip/14676:
     #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm]
     #1:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm]
    INFO: task kworker/u8:4:175 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u8:4    D ffff88018de9a5c0     0   175      2 0x00000000
    Workqueue: i915 i915_error_work_func [i915]
     ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8
     ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018
     0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0
    Call Trace:
     [<ffffffff8138ee7b>] schedule+0x60/0x62
     [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb
     [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1
     [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
     [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
     [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
     [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915]
     [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a
     [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a
     [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0
     [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275
     [<ffffffff8105076d>] kthread+0xac/0xb4
     [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0
     [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
     [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0
     [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
    3 locks held by kworker/u8:4/175:
     #0:  (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
     #1:  ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
     #2:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
    
    This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible
    on one of my older machines.
    
    Unfortunately (despite the proper lockdep annotations for
    flush_workqueue) lockdep still doesn't detect this correctly, so we
    need to rely on chance to discover these bugs.
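    The cycle above can be pictured as a lock-order graph in the spirit of
    lockdep, where flush_workqueue() is annotated as acquiring the workqueue
    as a pseudo-lock and a running work item counts as holding it (the
    "(i915)" entry held by kworker/u8:4 in the log above). The sketch below is
    a toy model under that assumption, not lockdep's implementation; the lock
    classes and the helpers record(), would_invert() and reset_work_on() are
    hypothetical. It records "crtc->mutex held, then i915 wq flushed" from the
    modeset/pageflip path and checks whether the reset work taking
    crtc->mutex closes a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes for this toy model. Workqueues are treated as
 * pseudo-locks, mirroring how lockdep annotates flush_workqueue() and
 * running work items. */
enum lock_class { MODE_CONFIG_MUTEX, CRTC_MUTEX, I915_WQ, SYSTEM_WQ, NR_CLASSES };

/* after[a][b] is true once class b has been acquired while a was held. */
static bool after[NR_CLASSES][NR_CLASSES];

static void record(enum lock_class held, enum lock_class acquired)
{
	assert(held < NR_CLASSES && acquired < NR_CLASSES);
	after[held][acquired] = true;
}

/* Depth-first search over recorded edges: can 'to' be reached from 'from'? */
static bool reachable(int from, int to, bool seen[NR_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (after[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Taking 'acquired' while holding 'held' is an inversion if 'held' is
 * already known to be taken (directly or indirectly) after 'acquired'. */
static bool would_invert(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };
	return reachable(acquired, held, seen);
}

/* Returns true if running the reset work on 'reset_wq' closes a cycle. */
static bool reset_work_on(enum lock_class reset_wq)
{
	memset(after, 0, sizeof(after));

	/* Modeset/pageflip path: modeset locks held, then the i915 wq is flushed. */
	record(MODE_CONFIG_MUTEX, CRTC_MUTEX);
	record(CRTC_MUTEX, I915_WQ);

	/* Reset work: runs while "holding" its workqueue and takes crtc->mutex. */
	if (would_invert(reset_wq, CRTC_MUTEX))
		return true;
	record(reset_wq, CRTC_MUTEX);
	return false;
}
```

    In this model reset_work_on(I915_WQ) reports an inversion, while
    reset_work_on(SYSTEM_WQ) does not, because nothing recorded here orders
    crtc->mutex before the system workqueue.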
    
    Apply the usual bugfix and schedule the reset work on the system
    workqueue to keep our own driver workqueue free of any modeset lock
    grabbing.
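    To illustrate why moving the work helps, here is a userspace pthreads
    sketch, a loose model rather than driver code. Assumptions: each queued
    item runs on its own thread, toy_flush_timeout() stands in for
    flush_workqueue() with a 200 ms timeout replacing the 120 s hung-task
    detector, and a single modeset_lock stands in for the modeset locks. All
    names (struct toy_wq, toy_queue_work(), flip_vs_reset()) are
    hypothetical. In kernel terms, the system workqueue is the one
    schedule_work() queues onto, as opposed to queue_work() with the driver's
    own workqueue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy workqueue: only tracks how many items are pending. */
struct toy_wq {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	int pending;
};

struct toy_work {
	struct toy_wq *wq;
	void (*fn)(void);
};

/* Stand-in for the modeset locks (mode_config.mutex, crtc->mutex). */
static pthread_mutex_t modeset_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the reset work: it needs the modeset locks. */
static void reset_work(void)
{
	pthread_mutex_lock(&modeset_lock);
	/* would finish pending flips and update primary planes here */
	pthread_mutex_unlock(&modeset_lock);
}

static void toy_wq_init(struct toy_wq *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->idle, NULL);
	wq->pending = 0;
}

static void toy_wq_destroy(struct toy_wq *wq)
{
	pthread_cond_destroy(&wq->idle);
	pthread_mutex_destroy(&wq->lock);
}

static void *run_work(void *arg)
{
	struct toy_work *w = arg;

	w->fn();
	pthread_mutex_lock(&w->wq->lock);
	if (--w->wq->pending == 0)
		pthread_cond_broadcast(&w->wq->idle);
	pthread_mutex_unlock(&w->wq->lock);
	return NULL;
}

/* Each queued item gets its own thread, which is enough for this model. */
static void toy_queue_work(struct toy_wq *wq, struct toy_work *w, pthread_t *t)
{
	pthread_mutex_lock(&wq->lock);
	wq->pending++;
	pthread_mutex_unlock(&wq->lock);
	w->wq = wq;
	int rc = pthread_create(t, NULL, run_work, w);
	assert(rc == 0);
	(void)rc;
}

/* Like flush_workqueue(), but gives up after timeout_ms; false = still busy. */
static bool toy_flush_timeout(struct toy_wq *wq, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&wq->lock);
	while (wq->pending > 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&wq->idle, &wq->lock, &deadline);
	bool idle = wq->pending == 0;
	pthread_mutex_unlock(&wq->lock);
	return idle;
}

/* Returns true if the flip path's flush of the driver wq would hang. */
static bool flip_vs_reset(bool reset_on_system_wq)
{
	struct toy_wq driver_wq, system_wq;
	struct toy_work reset = { .fn = reset_work };
	pthread_t t;

	toy_wq_init(&driver_wq);
	toy_wq_init(&system_wq);

	pthread_mutex_lock(&modeset_lock);  /* flip path holds modeset locks */
	toy_queue_work(reset_on_system_wq ? &system_wq : &driver_wq, &reset, &t);
	bool hung = !toy_flush_timeout(&driver_wq, 200);  /* and flushes driver wq */
	pthread_mutex_unlock(&modeset_lock);  /* let the model unwind either way */
	pthread_join(t, NULL);

	toy_wq_destroy(&driver_wq);
	toy_wq_destroy(&system_wq);
	return hung;
}
```

    flip_vs_reset(false) reports a hang: the flush waits for the reset work,
    which waits for the lock held by the flushing thread. flip_vs_reset(true)
    does not, since the driver workqueue has nothing pending, so the flip
    path drops the lock and the reset work then runs to completion.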
    
    Note that this is not a terribly serious regression, since before the
    offending commit we'd simply have stalled userspace forever because we
    failed to abort all outstanding pageflips.
    
    v2: Add a comment as requested by Chris.
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>