1. 16 3月, 2017 1 次提交
    • C
      drm/i915/userptr: Reinvent GGTT self-faulting protection · 15c344f4
      Chris Wilson 提交于
      lockdep doesn't like us taking the mm->mmap_sem inside the get_pages
      callback for a couple of reasons. The straightforward deadlock:
      
      [13755.434059] =============================================
      [13755.434061] [ INFO: possible recursive locking detected ]
      [13755.434064] 4.11.0-rc1-CI-CI_DRM_297+ #1 Tainted: G     U
      [13755.434066] ---------------------------------------------
      [13755.434068] gem_userptr_bli/8398 is trying to acquire lock:
      [13755.434070]  (&mm->mmap_sem){++++++}, at: [<ffffffffa00c988a>] i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
      [13755.434096]
                     but task is already holding lock:
      [13755.434098]  (&mm->mmap_sem){++++++}, at: [<ffffffff8104d485>] __do_page_fault+0x105/0x560
      [13755.434105]
                     other info that might help us debug this:
      [13755.434108]  Possible unsafe locking scenario:
      
      [13755.434110]        CPU0
      [13755.434111]        ----
      [13755.434112]   lock(&mm->mmap_sem);
      [13755.434115]   lock(&mm->mmap_sem);
      [13755.434117]
                      *** DEADLOCK ***
      
      [13755.434121]  May be due to missing lock nesting notation
      
      [13755.434126] 2 locks held by gem_userptr_bli/8398:
      [13755.434128]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8104d485>] __do_page_fault+0x105/0x560
      [13755.434135]  #1:  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa00b887d>] __i915_gem_object_get_pages+0x1d/0x70 [i915]
      [13755.434156]
                     stack backtrace:
      [13755.434161] CPU: 3 PID: 8398 Comm: gem_userptr_bli Tainted: G     U          4.11.0-rc1-CI-CI_DRM_297+ #1
      [13755.434165] Hardware name: GIGABYTE GB-BKi7(H)A-7500/MFLP7AP-00, BIOS F4 02/20/2017
      [13755.434169] Call Trace:
      [13755.434174]  dump_stack+0x67/0x92
      [13755.434178]  __lock_acquire+0x133a/0x1b50
      [13755.434182]  lock_acquire+0xc9/0x220
      [13755.434200]  ? i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
      [13755.434204]  down_read+0x42/0x70
      [13755.434221]  ? i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
      [13755.434238]  i915_gem_userptr_get_pages+0x5a/0x2e0 [i915]
      [13755.434255]  ____i915_gem_object_get_pages+0x25/0x60 [i915]
      [13755.434272]  __i915_gem_object_get_pages+0x59/0x70 [i915]
      [13755.434288]  i915_gem_fault+0x397/0x6a0 [i915]
      [13755.434304]  ? i915_gem_fault+0x1a1/0x6a0 [i915]
      [13755.434308]  ? __lock_acquire+0x449/0x1b50
      [13755.434311]  ? __lock_acquire+0x449/0x1b50
      [13755.434315]  ? vm_mmap_pgoff+0xa9/0xd0
      [13755.434318]  __do_fault+0x19/0x70
      [13755.434321]  __handle_mm_fault+0x863/0xe50
      [13755.434325]  handle_mm_fault+0x17f/0x370
      [13755.434329]  ? handle_mm_fault+0x40/0x370
      [13755.434332]  __do_page_fault+0x279/0x560
      [13755.434336]  do_page_fault+0xc/0x10
      [13755.434339]  page_fault+0x22/0x30
      [13755.434342] RIP: 0033:0x7f5ab91b5880
      [13755.434345] RSP: 002b:00007fff62922218 EFLAGS: 00010216
      [13755.434348] RAX: 0000000000b74500 RBX: 00007f5ab7f81000 RCX: 0000000000000000
      [13755.434352] RDX: 0000000000100000 RSI: 00007f5ab7f81000 RDI: 00007f5aba61c000
      [13755.434355] RBP: 00007f5aba61c000 R08: 0000000000000007 R09: 0000000100000000
      [13755.434359] R10: 000000000000037d R11: 00007f5ab91b5840 R12: 0000000000000001
      [13755.434362] R13: 0000000000000005 R14: 0000000000000001 R15: 0000000000000000
      
      and cyclic deadlocks:
      
      [ 2566.458979] ======================================================
      [ 2566.459054] [ INFO: possible circular locking dependency detected ]
      [ 2566.459127] 4.11.0-rc1+ #26 Not tainted
      [ 2566.459194] -------------------------------------------------------
      [ 2566.459266] gem_streaming_w/759 is trying to acquire lock:
      [ 2566.459334]  (&obj->mm.lock){+.+.+.}, at: [<ffffffffa034bc80>] i915_gem_object_pin_pages+0x0/0xc0 [i915]
      [ 2566.459605]
      [ 2566.459605] but task is already holding lock:
      [ 2566.459699]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106fd11>] __do_page_fault+0x121/0x500
      [ 2566.459814]
      [ 2566.459814] which lock already depends on the new lock.
      [ 2566.459814]
      [ 2566.459934]
      [ 2566.459934] the existing dependency chain (in reverse order) is:
      [ 2566.460030]
      [ 2566.460030] -> #1 (&mm->mmap_sem){++++++}:
      [ 2566.460139]        lock_acquire+0xfe/0x220
      [ 2566.460214]        down_read+0x4e/0x90
      [ 2566.460444]        i915_gem_userptr_get_pages+0x6e/0x340 [i915]
      [ 2566.460669]        ____i915_gem_object_get_pages+0x8b/0xd0 [i915]
      [ 2566.460900]        __i915_gem_object_get_pages+0x6a/0x80 [i915]
      [ 2566.461132]        __i915_vma_do_pin+0x7fa/0x930 [i915]
      [ 2566.461352]        eb_add_vma+0x67b/0x830 [i915]
      [ 2566.461572]        eb_lookup_vmas+0xafe/0x1010 [i915]
      [ 2566.461792]        i915_gem_do_execbuffer+0x715/0x2870 [i915]
      [ 2566.462012]        i915_gem_execbuffer2+0x106/0x2b0 [i915]
      [ 2566.462152]        drm_ioctl+0x36c/0x670 [drm]
      [ 2566.462236]        do_vfs_ioctl+0x12c/0xa60
      [ 2566.462317]        SyS_ioctl+0x41/0x70
      [ 2566.462399]        entry_SYSCALL_64_fastpath+0x1c/0xb1
      [ 2566.462477]
      [ 2566.462477] -> #0 (&obj->mm.lock){+.+.+.}:
      [ 2566.462587]        __lock_acquire+0x1602/0x1790
      [ 2566.462661]        lock_acquire+0xfe/0x220
      [ 2566.462893]        i915_gem_object_pin_pages+0x4c/0xc0 [i915]
      [ 2566.463116]        i915_gem_fault+0x2c2/0x8c0 [i915]
      [ 2566.463197]        __do_fault+0x42/0x130
      [ 2566.463276]        __handle_mm_fault+0x92c/0x1280
      [ 2566.463356]        handle_mm_fault+0x1e2/0x440
      [ 2566.463443]        __do_page_fault+0x1c4/0x500
      [ 2566.463529]        do_page_fault+0xc/0x10
      [ 2566.463613]        page_fault+0x1f/0x30
      [ 2566.463693]
      [ 2566.463693] other info that might help us debug this:
      [ 2566.463693]
      [ 2566.463820]  Possible unsafe locking scenario:
      [ 2566.463820]
      [ 2566.463918]        CPU0                    CPU1
      [ 2566.463988]        ----                    ----
      [ 2566.464068]   lock(&mm->mmap_sem);
      [ 2566.464143]                                lock(&obj->mm.lock);
      [ 2566.464226]                                lock(&mm->mmap_sem);
      [ 2566.464304]   lock(&obj->mm.lock);
      [ 2566.464378]
      [ 2566.464378]  *** DEADLOCK ***
      [ 2566.464378]
      [ 2566.464504] 1 lock held by gem_streaming_w/759:
      [ 2566.464576]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106fd11>] __do_page_fault+0x121/0x500
      [ 2566.464699]
      [ 2566.464699] stack backtrace:
      [ 2566.464801] CPU: 0 PID: 759 Comm: gem_streaming_w Not tainted 4.11.0-rc1+ #26
      [ 2566.464881] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F8 03/02/2016
      [ 2566.464983] Call Trace:
      [ 2566.465061]  dump_stack+0x68/0x9f
      [ 2566.465144]  print_circular_bug+0x20b/0x260
      [ 2566.465234]  __lock_acquire+0x1602/0x1790
      [ 2566.465323]  ? debug_check_no_locks_freed+0x1a0/0x1a0
      [ 2566.465564]  ? i915_gem_object_wait+0x238/0x650 [i915]
      [ 2566.465657]  ? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
      [ 2566.465749]  lock_acquire+0xfe/0x220
      [ 2566.465985]  ? i915_sg_trim+0x1b0/0x1b0 [i915]
      [ 2566.466223]  i915_gem_object_pin_pages+0x4c/0xc0 [i915]
      [ 2566.466461]  ? i915_sg_trim+0x1b0/0x1b0 [i915]
      [ 2566.466699]  i915_gem_fault+0x2c2/0x8c0 [i915]
      [ 2566.466939]  ? i915_gem_pwrite_ioctl+0xce0/0xce0 [i915]
      [ 2566.467030]  ? __lock_acquire+0x642/0x1790
      [ 2566.467122]  ? __lock_acquire+0x642/0x1790
      [ 2566.467209]  ? debug_lockdep_rcu_enabled+0x35/0x40
      [ 2566.467299]  ? get_unmapped_area+0x1b4/0x1d0
      [ 2566.467387]  __do_fault+0x42/0x130
      [ 2566.467474]  __handle_mm_fault+0x92c/0x1280
      [ 2566.467564]  ? __pmd_alloc+0x1e0/0x1e0
      [ 2566.467651]  ? vm_mmap_pgoff+0x160/0x190
      [ 2566.467740]  ? handle_mm_fault+0x111/0x440
      [ 2566.467827]  handle_mm_fault+0x1e2/0x440
      [ 2566.467914]  ? handle_mm_fault+0x5d/0x440
      [ 2566.468002]  __do_page_fault+0x1c4/0x500
      [ 2566.468090]  do_page_fault+0xc/0x10
      [ 2566.468180]  page_fault+0x1f/0x30
      [ 2566.468263] RIP: 0033:0x557895ced32a
      [ 2566.468337] RSP: 002b:00007fffd6dd8a10 EFLAGS: 00010202
      [ 2566.468419] RAX: 00007f659a4db000 RBX: 0000000000000003 RCX: 00007f659ad032da
      [ 2566.468501] RDX: 0000000000000000 RSI: 0000000000100000 RDI: 0000000000000000
      [ 2566.468586] RBP: 0000000000000007 R08: 0000000000000003 R09: 0000000100000000
      [ 2566.468667] R10: 0000000000000001 R11: 0000000000000246 R12: 0000557895ceda60
      [ 2566.468749] R13: 0000000000000001 R14: 00007fffd6dd8ac0 R15: 00007f659a4db000
      
      By checking the status of the gup worker (serialized by the
      obj->mm.lock) we can determine whether it is still active, has failed or
      has succeeded. If the worker is still active (or failed), we know that
      it cannot be bound and so we can skip taking struct_mutex (risking
      potential recursion). As we check the worker status, we mark it to
      discard any partial results, forcing us to restart on the next
      get_pages.
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Fixes: 1c8782dd ("drm/i915/userptr: Disallow wrapping GTT into a userptr")
      Testcase: igt/gem_userptr_blits/map-fixed-invalidate-gup
      Testcase: igt/gem_userptr_blits/dmabuf-sync
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170315140150.19432-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      15c344f4
  2. 09 3月, 2017 3 次提交
    • C
      drm/i915/userptr: Disallow wrapping GTT into a userptr · 1c8782dd
      Chris Wilson 提交于
      If we allow the user to convert a GTT mmap address into a userptr, we
      may end up in recursion hell, where currently we hit a mutex deadlock
      but other possibilities include use-after-free during the
      unbind/cancel_userptr.
      
      [  143.203989] gem_userptr_bli D    0   902    898 0x00000000
      [  143.204054] Call Trace:
      [  143.204137]  __schedule+0x511/0x1180
      [  143.204195]  ? pci_mmcfg_check_reserved+0xc0/0xc0
      [  143.204274]  schedule+0x57/0xe0
      [  143.204327]  schedule_timeout+0x383/0x670
      [  143.204374]  ? trace_hardirqs_on_caller+0x187/0x280
      [  143.204457]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      [  143.204507]  ? usleep_range+0x110/0x110
      [  143.204657]  ? irq_exit+0x89/0x100
      [  143.204710]  ? retint_kernel+0x2d/0x2d
      [  143.204794]  ? trace_hardirqs_on_caller+0x187/0x280
      [  143.204857]  ? _raw_spin_unlock_irq+0x33/0x60
      [  143.204944]  wait_for_common+0x1f0/0x2f0
      [  143.205006]  ? out_of_line_wait_on_atomic_t+0x170/0x170
      [  143.205103]  ? wake_up_q+0xa0/0xa0
      [  143.205159]  ? flush_workqueue_prep_pwqs+0x15a/0x2c0
      [  143.205237]  wait_for_completion+0x1d/0x20
      [  143.205292]  flush_workqueue+0x2e9/0xbb0
      [  143.205339]  ? flush_workqueue+0x163/0xbb0
      [  143.205418]  ? __schedule+0x533/0x1180
      [  143.205498]  ? check_flush_dependency+0x1a0/0x1a0
      [  143.205681]  i915_gem_userptr_mn_invalidate_range_start+0x1c7/0x270 [i915]
      [  143.205865]  ? i915_gem_userptr_dmabuf_export+0x40/0x40 [i915]
      [  143.205955]  __mmu_notifier_invalidate_range_start+0xc6/0x120
      [  143.206044]  ? __mmu_notifier_invalidate_range_start+0x51/0x120
      [  143.206123]  zap_page_range_single+0x1c7/0x1f0
      [  143.206171]  ? unmap_single_vma+0x160/0x160
      [  143.206260]  ? unmap_mapping_range+0xa9/0x1b0
      [  143.206308]  ? vma_interval_tree_subtree_search+0x75/0xd0
      [  143.206397]  unmap_mapping_range+0x18f/0x1b0
      [  143.206444]  ? zap_vma_ptes+0x70/0x70
      [  143.206524]  ? __pm_runtime_resume+0x67/0xa0
      [  143.206723]  i915_gem_release_mmap+0x1ba/0x1c0 [i915]
      [  143.206846]  i915_vma_unbind+0x5c2/0x690 [i915]
      [  143.206925]  ? __lock_is_held+0x52/0x100
      [  143.207076]  i915_gem_object_set_tiling+0x1db/0x650 [i915]
      [  143.207236]  i915_gem_set_tiling_ioctl+0x1d3/0x3b0 [i915]
      [  143.207377]  ? i915_gem_set_tiling_ioctl+0x5/0x3b0 [i915]
      [  143.207457]  drm_ioctl+0x36c/0x670
      [  143.207535]  ? debug_lockdep_rcu_enabled.part.0+0x1a/0x30
      [  143.207730]  ? i915_gem_object_set_tiling+0x650/0x650 [i915]
      [  143.207793]  ? drm_getunique+0x120/0x120
      [  143.207875]  ? __handle_mm_fault+0x996/0x14a0
      [  143.207939]  ? vm_insert_page+0x340/0x340
      [  143.208028]  ? up_write+0x28/0x50
      [  143.208086]  ? vm_mmap_pgoff+0x160/0x190
      [  143.208163]  do_vfs_ioctl+0x12c/0xa60
      [  143.208218]  ? debug_lockdep_rcu_enabled+0x35/0x40
      [  143.208267]  ? ioctl_preallocate+0x150/0x150
      [  143.208353]  ? __do_page_fault+0x36a/0x6e0
      [  143.208400]  ? mark_held_locks+0x23/0xc0
      [  143.208479]  ? up_read+0x1f/0x40
      [  143.208526]  ? entry_SYSCALL_64_fastpath+0x5/0xc6
      [  143.208669]  ? __fget_light+0xa7/0xc0
      [  143.208747]  SyS_ioctl+0x41/0x70
      
      To prevent the possibility of a deadlock, we defer scheduling the worker
      until after we have proven that given the current mm, the userptr range
      does not overlap a GGTT mmaping. If another thread tries to remap the
      GGTT over the userptr before the worker is scheduled, it will be stopped
      by its invalidate-range flushing the current work, before the deadlock
      can occur.
      
      v2: Improve discussion of how we end up in the deadlock.
      v3: Don't forget to mark the userptr as active after a successful
      gup_fast. Rename overlaps_ggtt to noncontiguous_or_overlaps_ggtt.
      v4: Fix test ordering between invalid GTT mmaping and range completion
      (Tvrtko)
      Reported-by: NMichał Winiarski <michal.winiarski@intel.com>
      Testcase: igt/gem_userptr_blits/map-fixed-invalidate-gup
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170308215903.24171-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      1c8782dd
    • C
      drm/i915/userptr: Only flush the workqueue if required · d151e9ce
      Chris Wilson 提交于
      To avoid waiting for work from other invalidate-range threads where
      not required, only wait on the userptr cancel workqueue if we have added
      some work to it.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-2-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      d151e9ce
    • C
      drm/i915/userptr: Deactivate a failed userptr if the worker reports an EFAULT · 42953b3c
      Chris Wilson 提交于
      If the worker fails, it no longer has pages to release and can be
      immediately removed from the invalidate-tree.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170307205851.32578-1-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      42953b3c
  3. 02 3月, 2017 1 次提交
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> · 6e84f315
      Ingo Molnar 提交于
      We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/mm.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      The APIs that are going to be moved first are:
      
         mm_alloc()
         __mmdrop()
         mmdrop()
         mmdrop_async_fn()
         mmdrop_async()
         mmget_not_zero()
         mmput()
         mmput_async()
         get_task_mm()
         mm_access()
         mm_release()
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6e84f315
  4. 28 2月, 2017 2 次提交
  5. 15 12月, 2016 1 次提交
    • L
      mm: add locked parameter to get_user_pages_remote() · 5b56d49f
      Lorenzo Stoakes 提交于
      Patch series "mm: unexport __get_user_pages_unlocked()".
      
      This patch series continues the cleanup of get_user_pages*() functions
      taking advantage of the fact we can now pass gup_flags as we please.
      
      It firstly adds an additional 'locked' parameter to
      get_user_pages_remote() to allow for its callers to utilise
      VM_FAULT_RETRY functionality.  This is necessary as the invocation of
      __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
      this and no other existing higher level function would allow it to do
      so.
      
      Secondly existing callers of __get_user_pages_unlocked() are replaced
      with the appropriate higher-level replacement -
      get_user_pages_unlocked() if the current task and memory descriptor are
      referenced, or get_user_pages_remote() if other task/memory descriptors
      are referenced (having acquiring mmap_sem.)
      
      This patch (of 2):
      
      Add a int *locked parameter to get_user_pages_remote() to allow
      VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
      
      Taking into account the previous adjustments to get_user_pages*()
      functions allowing for the passing of gup_flags, we are now in a
      position where __get_user_pages_unlocked() need only be exported for his
      ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
      subsequently unexport __get_user_pages_unlocked() as well as allowing
      for future flexibility in the use of get_user_pages_remote().
      
      [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
        Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.comSigned-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b56d49f
  6. 02 12月, 2016 1 次提交
  7. 11 11月, 2016 1 次提交
  8. 02 11月, 2016 1 次提交
  9. 01 11月, 2016 1 次提交
  10. 29 10月, 2016 6 次提交
  11. 19 10月, 2016 1 次提交
  12. 09 9月, 2016 1 次提交
  13. 19 8月, 2016 1 次提交
  14. 05 8月, 2016 3 次提交
  15. 04 8月, 2016 5 次提交
  16. 20 7月, 2016 4 次提交
  17. 20 5月, 2016 2 次提交
  18. 18 4月, 2016 1 次提交
  19. 14 4月, 2016 2 次提交
    • C
      drm/i915: Prevent leaking of -EIO from i915_wait_request() · f4457ae7
      Chris Wilson 提交于
      Reporting -EIO from i915_wait_request() has proven very troublematic
      over the years, with numerous hard-to-reproduce bugs cropping up in the
      corner case of where a reset occurs and the code wasn't expecting such
      an error.
      
      If the we reset the GPU or have detected a hang and wish to reset the
      GPU, the request is forcibly complete and the wait broken. Currently, we
      report either -EAGAIN or -EIO in order for the caller to retreat and
      restart the wait (if appropriate) after dropping and then reacquiring
      the struct_mutex (essential to allow the GPU reset to proceed). However,
      if we take the view that the request is complete (no further work will
      be done on it by the GPU because it is dead and soon to be reset), then
      we can proceed with the task at hand and then drop the struct_mutex
      allowing the reset to occur. This transfers the burden of checking
      whether it is safe to proceed to the caller, which in all but one
      instance it is safe - completely eliminating the source of all spurious
      -EIO.
      
      Of note, we only have two API entry points where we expect that
      userspace can observe an EIO. First is when submitting an execbuf, if
      the GPU is terminally wedged, then the operation cannot succeed and an
      -EIO is reported. Secondly, existing userspace uses the throttle ioctl
      to detect an already wedged GPU before starting using HW acceleration
      (or to confirm that the GPU is wedged after an error condition). So if
      the GPU is wedged when the user calls throttle, also report -EIO.
      
      v2: Split more carefully the change to i915_wait_request() and assorted
      ABI from the reset handling.
      v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
      so that we don't start to leak EIO there in future (and break our hang
      resistant modesetting).
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
      Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
      f4457ae7
    • C
      drm/i915: Store the reset counter when constructing a request · 299259a3
      Chris Wilson 提交于
      As the request is only valid during the same global reset epoch, we can
      record the current reset_counter when constructing the request and reuse
      it when waiting upon that request in future. This removes a very hairy
      atomic check serialised by the struct_mutex at the time of waiting and
      allows us to transfer those waits to a central dispatcher for all
      waiters and all requests.
      
      PS: With per-engine resets, we obviously cannot assume a global reset
      epoch for the requests - a per-engine epoch makes the most sense. The
      challenge then is how to handle checking in the waiter for when to break
      the wait, as the fine-grained reset may also want to requeue the
      request (i.e. the assumption that just because the epoch changes the
      request is completed may be broken - or we just avoid breaking that
      assumption with the fine-grained resets).
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
      299259a3
  20. 12 4月, 2016 2 次提交