1. 10 9月, 2013 2 次提交
  2. 09 9月, 2013 2 次提交
    • D
      drm/i915: fix wait_for_pending_flips vs gpu hang deadlock · 17e1df07
      Daniel Vetter 提交于
      My g33 here seems to be shockingly good at hitting them all. This time
      around kms_flip/flip-vs-panning-vs-hang blows up:
      
      intel_crtc_wait_for_pending_flips correctly checks for gpu hangs and
      if a gpu hang is pending aborts the wait for outstanding flips so that
      the setcrtc call will succeed and release the crtc mutex. And the gpu
      hang handler needs that lock in intel_display_handle_reset to be able
      to complete outstanding flips.
      
      The problem is that we can race in two ways:
      - Waiters on the dev_priv->pending_flip_queue aren't woken up after
        we've the reset as pending, but before we actually start the reset
        work. This means that the waiter doesn't notice the pending reset
        and hence will keep on hogging the locks.
      
        Like with dev->struct_mutex and the ring->irq_queue wait queues we
        there need to wake up everyone that potentially holds a lock which
        the reset handler needs.
      
      - intel_display_handle_reset was called _after_ we've already
        signalled the completion of the reset work. Which means a waiter
        could sneak in, grab the lock and never release it (since the
        pageflips won't ever get released).
      
        Similar to resetting the gem state all the reset work must complete
        before we update the reset counter. Contrary to the gem reset we
        don't need to have a second explicit wake up call since that will
        have happened already when completing the pageflips. We also don't
        have any issues that the completion happens while the reset state is
        still pending - wait_for_pending_flips is only there to ensure we
        display the right frame. After a gpu hang&reset events such
        guarantees are out the window anyway. This is in contrast to the gem
        code where too-early wake-up would result in unnecessary restarting
        of ioctls.
      
      Also, since we've gotten these various deadlocks and ordering
      constraints wrong so often throw copious amounts of comments at the
      code.
      
      This deadlock regression has been introduced in the commit which added
      the pageflip reset logic to the gpu hang work:
      
      commit 96a02917
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Mon Feb 18 19:08:49 2013 +0200
      
          drm/i915: Finish page flips and update primary planes after a GPU reset
      
      v2:
      - Add comments to explain how the wake_up serves as memory barriers
        for the atomic_t reset counter.
      - Improve the comments a bit as suggested by Chris Wilson.
      - Extract the wake_up calls before/after the reset into a little
        i915_error_wake_up and unconditionally wake up the
        pending_flip_queue waiters, again as suggested by Chris Wilson.
      
      v3: Throw copious amounts of comments at i915_error_wake_up as
      suggested by Chris Wilson.
      
      Cc: stable@vger.kernel.org
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      17e1df07
    • C
      drm/i915: Track pfit enable state separately from size · fd4daa9c
      Chris Wilson 提交于
      Detangle the additional state of whether or not the hw has the pfit
      enabled from whether it has zero size. This allows us to cleanly
      distinguish in the code when we expect the pfit to be enabled (for
      Haswell pc8), and when the BIOS is confused and needs sanitizing.
      Reported-by: Nshui yanwei <yangweix.shui@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68251Tested-by: Nshui yanwei <yangweix.shui@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      fd4daa9c
  3. 07 9月, 2013 1 次提交
    • V
      drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done · 6e1b4fda
      Ville Syrjälä 提交于
      When transitioning away from vgacon the system tries to save the
      current contents of the VGA memory, so that it can be cleanly handed
      off to fbcon (or whatever comes afterwards).
      
      The recent change
      
       commit 81b5c7bc
       Author: Alex Williamson <alex.williamson@redhat.com>
       Date:   Wed Aug 28 09:39:08 2013 -0600
      
          i915: Update VGA arbiter support for newer devices
      
      caused i915 to disable VGA memory decode for the IGD when i915 is
      initializing. Unfortunately that happens before the vgacon->fbcon
      handoff so vgacon_save_screen() will read out all ones from the
      VGA memory.
      
      After the handoff fbcon will inherit the bogus state from vgacon,
      and pre-fills the fb with matching contents. The end result is
      a white rectangle in the top left corner of the screen, the size
      of which matches the now inactive VGA console.
      
      To remedy the situation delay the disabling of VGA memory until
      the vgacon->fbcon handoff has happened.
      
      Also rename i915_enable_vga to i915_enable_vga_mem to make
      the relationship between these functions clearer.
      
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6e1b4fda
  4. 06 9月, 2013 2 次提交
  5. 05 9月, 2013 3 次提交
    • C
      drm/i915: Skip stolen region initialisation if none is reserved · 6644a4e9
      Chris Wilson 提交于
      Paulo reported that if he set the amount of reserved memory to 0, then
      we emitted a warning about a conflict before disabling our use of stolen
      memory. This was introduced with
      
      commit eaba1b8f
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Jul 4 12:28:35 2013 +0100
      
          drm/i915: Verify that our stolen memory doesn't conflict
      
      and is simply fixed by checking for a no reservation first.
      Reported-by: NPaulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6644a4e9
    • D
      drm/i915: fix gpu hang vs. flip stall deadlocks · 122f46ba
      Daniel Vetter 提交于
      Since we've started to clean up pending flips when the gpu hangs in
      
      commit 96a02917
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Mon Feb 18 19:08:49 2013 +0200
      
          drm/i915: Finish page flips and update primary planes after a GPU reset
      
      the gpu reset work now also grabs modeset locks. But since work items
      on our private work queue are not allowed to do that due to the
      flush_workqueue from the pageflip code this results in a neat
      deadlock:
      
      INFO: task kms_flip:14676 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kms_flip        D ffff88019283a5c0     0 14676  13344 0x00000004
       ffff88018e62dbf8 0000000000000046 ffff88013bdb12e0 ffff88018e62dfd8
       ffff88018e62dfd8 00000000001d3b00 ffff88019283a5c0 ffff88018ec21000
       ffff88018f693f00 ffff88018eece000 ffff88018e62dd60 ffff88018eece898
      Call Trace:
       [<ffffffff8138ee7b>] schedule+0x60/0x62
       [<ffffffffa046c0dd>] intel_crtc_wait_for_pending_flips+0xb2/0x114 [i915]
       [<ffffffff81050ff4>] ? finish_wait+0x60/0x60
       [<ffffffffa0478041>] intel_crtc_set_config+0x7f3/0x81e [i915]
       [<ffffffffa031780a>] drm_mode_set_config_internal+0x4f/0xc6 [drm]
       [<ffffffffa0319cf3>] drm_mode_setcrtc+0x44d/0x4f9 [drm]
       [<ffffffff810e44da>] ? might_fault+0x38/0x86
       [<ffffffffa030d51f>] drm_ioctl+0x2f9/0x447 [drm]
       [<ffffffff8107a722>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffffa03198a6>] ? drm_mode_setplane+0x343/0x343 [drm]
       [<ffffffff8112222f>] ? mntput_no_expire+0x3e/0x13d
       [<ffffffff81117f33>] vfs_ioctl+0x18/0x34
       [<ffffffff81118776>] do_vfs_ioctl+0x396/0x454
       [<ffffffff81396b37>] ? sysret_check+0x1b/0x56
       [<ffffffff81118886>] SyS_ioctl+0x52/0x7d
       [<ffffffff81396b12>] system_call_fastpath+0x16/0x1b
      2 locks held by kms_flip/14676:
       #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa0316545>] drm_modeset_lock_all+0x22/0x59 [drm]
       #1:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa031656b>] drm_modeset_lock_all+0x48/0x59 [drm]
      INFO: task kworker/u8:4:175 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u8:4    D ffff88018de9a5c0     0   175      2 0x00000000
      Workqueue: i915 i915_error_work_func [i915]
       ffff88018e37dc30 0000000000000046 ffff8801938ab8a0 ffff88018e37dfd8
       ffff88018e37dfd8 00000000001d3b00 ffff88018de9a5c0 ffff88018ec21018
       0000000000000246 ffff88018e37dca0 000000005a865a86 ffff88018de9a5c0
      Call Trace:
       [<ffffffff8138ee7b>] schedule+0x60/0x62
       [<ffffffff8138f23d>] schedule_preempt_disabled+0x9/0xb
       [<ffffffff8138d0cd>] mutex_lock_nested+0x205/0x3b1
       [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa0477094>] ? intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
       [<ffffffffa044e0a2>] i915_error_work_func+0x128/0x147 [i915]
       [<ffffffff8104a89a>] process_one_work+0x1d4/0x35a
       [<ffffffff8104a821>] ? process_one_work+0x15b/0x35a
       [<ffffffff8104b4a5>] worker_thread+0x144/0x1f0
       [<ffffffff8104b361>] ? rescuer_thread+0x275/0x275
       [<ffffffff8105076d>] kthread+0xac/0xb4
       [<ffffffff81059d30>] ? finish_task_switch+0x3b/0xc0
       [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
       [<ffffffff81396a6c>] ret_from_fork+0x7c/0xb0
       [<ffffffff810506c1>] ? __kthread_parkme+0x60/0x60
      3 locks held by kworker/u8:4/175:
       #0:  (i915){.+.+.+}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
       #1:  ((&dev_priv->gpu_error.work)){+.+.+.}, at: [<ffffffff8104a821>] process_one_work+0x15b/0x35a
       #2:  (&crtc->mutex){+.+.+.}, at: [<ffffffffa0477094>] intel_display_handle_reset+0x7e/0xbd [i915]
      
      This blew up while running kms_flip/flip-vs-panning-vs-hang-interruptible
      on one of my older machines.
      
      Unfortunately (despite the proper lockdep annotations for
      flush_workqueue) lockdep still doesn't detect this correctly, so we
      need to rely on chance to discover these bugs.
      
      Apply the usual bugfix and schedule the reset work on the system
      workqueue to keep our own driver workqueue free of any modeset lock
      grabbing.
      
      Note that this is not a terribly serious regression since before the
      offending commit we'd simply have stalled userspace forever due to
      failing to abort all outstanding pageflips.
      
      v2: Add a comment as requested by Chris.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      122f46ba
    • C
      drm/i915: Hold an object reference whilst we shrink it · 57094f82
      Chris Wilson 提交于
      Whilst running the shrinker, we need to hold a reference as we unbind
      the objects, or else we may end up waiting for and retiring requests,
      which in turn may result in this object being freed.
      
      This is very similar to the eviction code which also has to be very
      careful to keep a reference to its objects as it retires and unbinds
      them.
      
      Another similarity, that Ben pointed out, is that as we may call
      retire-requests, the unbound_list is outside of our control. We must
      only process a single element of that list at a time, that is we can not
      rely on the "safe" next pointer being valid after a call to
      i915_vma_unbind().
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        IP: [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        PGD 758d3067 PUD ac0d6067 PMD 0
        Oops: 0000 [#1] SMP
        Modules linked in: dm_mod snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support pcspkr snd_hda_intel i2c_i801 snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd lpc_ich mfd_core soundcore battery ac option usb_wwan usbserial uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev i915 video button drm_kms_helper drm acpi_cpufreq mperf freq_table
        CPU: 1 PID: 16835 Comm: fbo-maxsize Not tainted 3.11.0-rc7_nightlytop_8fdad4_20130902_+ #7977
        task: ffff8800712106d0 ti: ffff880028e4a000 task.ti: ffff880028e4a000
        RIP: 0010:[<ffffffffa0082892>]  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        RSP: 0018:ffff880028e4b9e8  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff880145734000 RCX: ffff880145735328
        RDX: ffff8801457353fc RSI: 0000000000000000 RDI: ffff88007597cc00
        RBP: ffff88007597cc00 R08: 0000000000000001 R09: ffff88014f257f00
        R10: ffffea0001d65f00 R11: 0000000000bba60b R12: ffff880149e5b000
        R13: ffff880145734001 R14: ffff88007597ccc8 R15: ffff88007597cc00
        FS:  00007ff5bc919740(0000) GS:ffff88014f240000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000008 CR3: 0000000028f4c000 CR4: 00000000001407e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Stack:
         0000000000000000 ffff88007597cc00 ffff8801440d6840 0000000000000000
         ffff880145734000 ffffffffa007c854 0000000000000010 ffff88007597c900
         0000000000018000 00000000004a1201 ffff88007597cc60 ffffffffa007d183
        Call Trace:
         [<ffffffffa007c854>] ? i915_vma_unbind+0xe2/0x1d1 [i915]
         [<ffffffffa007d183>] ? __i915_gem_shrink+0xf1/0x162 [i915]
         [<ffffffffa007d2ee>] ? i915_gem_object_get_pages_gtt+0xfa/0x303 [i915]
         [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
         [<ffffffffa007cbda>] ? i915_gem_object_pin+0x238/0x5ce [i915]
         [<ffffffff812cba5f>] ? __sg_page_iter_next+0x2b/0x58
         [<ffffffffa0082056>] ? gen6_ppgtt_insert_entries+0xf2/0x114 [i915]
         [<ffffffffa007fe4b>] ? i915_gem_execbuffer_reserve_vma.isra.13+0x79/0x18d [i915]
         [<ffffffffa008017c>] ? i915_gem_execbuffer_reserve+0x21d/0x347 [i915]
         [<ffffffffa0080bfb>] ? i915_gem_do_execbuffer.isra.17+0x4f3/0xe61 [i915]
         [<ffffffffa00795f4>] ? i915_gem_object_get_pages+0x54/0x89 [i915]
         [<ffffffffa007e405>] ? i915_gem_pwrite_ioctl+0x743/0x7a5 [i915]
         [<ffffffffa0081a46>] ? i915_gem_execbuffer2+0x15e/0x1e4 [i915]
         [<ffffffffa000e20d>] ? drm_ioctl+0x2a5/0x3c4 [drm]
         [<ffffffffa00818e8>] ? i915_gem_execbuffer+0x37f/0x37f [i915]
         [<ffffffff816f64c0>] ? __do_page_fault+0x3ab/0x449
         [<ffffffff810be3da>] ? do_mmap_pgoff+0x2b2/0x341
         [<ffffffff810e49be>] ? vfs_ioctl+0x1e/0x31
         [<ffffffff810e5194>] ? do_vfs_ioctl+0x3ad/0x3ef
         [<ffffffff810e5224>] ? SyS_ioctl+0x4e/0x7e
         [<ffffffff816f88d2>] ? system_call_fastpath+0x16/0x1b
        Code: 52 0c a0 48 c7 c6 22 30 0d a0 31 c0 e8 ef 00 f9 ff bf c6 a7 00 00 e8 90 5d 24 e1 f6 85 13 01 00 00 10 75 44 48 8b 85 18 01 00 00 <8b> 50 08 48 8b 30 49 8b 84 24 88 02 00 00 48 89 c7 48 81 c7 98
        RIP  [<ffffffffa0082892>] i915_gem_gtt_finish_object+0x68/0xbd [i915]
        RSP <ffff880028e4b9e8>
        CR2: 0000000000000008
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      [danvet: Bikeshed the comments a bit as discussed with Chris.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      57094f82
  6. 04 9月, 2013 16 次提交
  7. 03 9月, 2013 7 次提交
  8. 02 9月, 2013 7 次提交