1. 28 May 2019, 2 commits
  2. 20 April 2019, 1 commit
    • drm/i915: Start writeback from the shrinker · 2d6692e6
      Committed by Chris Wilson
      When we are called to relieve mempressure via the shrinker, the only way
      we can make progress is either by discarding unwanted pages (those
      objects that userspace has marked MADV_DONTNEED) or by reclaiming the
      dirty objects via swap. As we know that is the only way to make further
      progress, we can initiate the writeback as we invalidate the objects.
      This means the objects we put onto the inactive anon lru list are
      already marked for reclaim+writeback and so will trigger a wait upon the
      writeback inside direct reclaim, greatly improving the success rate of
      direct reclaim on i915 objects.
      
      The corollary is that we may start a slow swap on opportunistic
      mempressure from the likes of the compaction + migration kthreads. This
      is limited by those threads only being allowed to shrink idle pages, but
      also that if we reactivate the page before it is swapped out by gpu
      activity, we only pay the cost of repinning the page. The cost is most
      felt when an object is reused after mempressure, which hopefully
      excludes the latency sensitive tasks (as we are just extending the
      impact of swap thrashing to them).
      
      Apparently this is not the first time we've had this idea. Back in
      commit 5537252b ("drm/i915: Invalidate our pages under memory
      pressure") we wanted to start writeback but settled on invalidate after
      Hugh Dickins warned us about a possibility of a deadlock within shmemfs
      if we started writeback from shrink_slab. Looking at the callchain,
      using writeback from i915_gem_shrink should be equivalent to the pageout
      also employed by shrink_slab, i.e. it should not be any riskier afaict.
      
      v2: Leave mmappings intact. At this point, the only mmappings of our
      objects will be via CPU mmaps on the shmemfs filp, which are
      out-of-scope for our LRU tracking. Instead leave those pages to the
      inactive anon LRU page list for aging and pageout as normal.
      
      v3: Be selective on which paths trigger writeback, in particular
      excluding paths shrinking just to reclaim vm space (e.g. mmap, vmap
      reapers) and avoid starting writeback on the entire process space from
      within the pm freezer.
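
      As a minimal sketch of the approach (close to, but not verbatim, the
      patch itself; shmem_writeback() is our name for the helper), writeback
      is kicked per page while skipping anything still mapped, per v2 above:

          static void shmem_writeback(struct address_space *mapping,
                                      pgoff_t nr_pages)
          {
              struct writeback_control wbc = {
                  .sync_mode = WB_SYNC_NONE,  /* start I/O, don't wait on it */
                  .nr_to_write = SWAP_CLUSTER_MAX,
                  .range_start = 0,
                  .range_end = LLONG_MAX,
                  .for_reclaim = 1,
              };
              pgoff_t i;

              /* Begin writeback on each dirty page, so the pages we then
               * release to the inactive anon LRU arrive already marked
               * for reclaim+writeback. */
              for (i = 0; i < nr_pages; i++) {
                  struct page *page = find_lock_page(mapping, i);

                  if (!page)
                      continue;

                  /* Leave mmapped pages to normal LRU aging (v2). */
                  if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
                      SetPageReclaim(page);
                      /* writepage() unlocks the page on success */
                      if (!mapping->a_ops->writepage(page, &wbc)) {
                          put_page(page);
                          continue;
                      }
                      ClearPageReclaim(page);
                  }
                  unlock_page(page);
                  put_page(page);
              }
          }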
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=108686
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v1
      Link: https://patchwork.freedesktop.org/patch/msgid/20190420115539.29081-1-chris@chris-wilson.co.uk
  3. 29 January 2019, 2 commits
    • drm/i915: Pull VM lists under the VM mutex. · 09d7e46b
      Committed by Chris Wilson
      A starting point to counter the pervasive struct_mutex. For the goal of
      avoiding (or at least blocking under them!) global locks during user
      request submission, a simple but important step is being able to manage
      each client's GTT separately. To do that, we want to replace the
      struct_mutex as the guard for all things GTT/VM and switch instead
      to a specific mutex inside i915_address_space.
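
      A minimal sketch of the intended shape (field and helper names are
      illustrative, not the final API):

          struct i915_address_space {
              struct mutex mutex;            /* guards the VM lists below */
              struct list_head bound_list;   /* vma bound into this GTT */
              struct list_head unbound_list;
              /* ... */
          };

          /* List manipulation takes vm->mutex, not dev->struct_mutex. */
          static void vma_move_to_bound(struct i915_vma *vma)
          {
              struct i915_address_space *vm = vma->vm;

              mutex_lock(&vm->mutex);
              list_move_tail(&vma->vm_link, &vm->bound_list);
              mutex_unlock(&vm->mutex);
          }
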
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk
    • drm/i915: Stop tracking MRU activity on VMA · 499197dc
      Committed by Chris Wilson
      Our goal is to remove struct_mutex and replace it with fine grained
      locking. One of the thorny issues is our eviction logic for reclaiming
      space for an execbuffer (or GTT mmapping, among a few other examples).
      While eviction itself is easy to move under a per-VM mutex, performing
      the activity tracking is less agreeable. One solution is not to do any
      MRU tracking and do a simple coarse evaluation during eviction of
      active/inactive, with a loose temporal ordering of last
      insertion/evaluation. That keeps all the locking constrained to when we
      are manipulating the VM itself, neatly avoiding the tricky handling of
      possible recursive locking during execbuf and elsewhere.
      
      Note that discarding the MRU (currently implemented as a pair of lists,
      to avoid scanning the active list for a NONBLOCKING search) is unlikely
      to impact upon our efficiency to reclaim VM space (where we think an LRU
      model is best) as our current strategy is to use random idle replacement
      first before doing a search, and over time the use of softpinned 48b
      per-ppGTT is growing (thereby eliminating any need to perform any eviction
      searches, in theory at least) with the remaining users being found on
      much older devices (gen2-gen6).
      
      v2: Changelog and commentary rewritten to elaborate on the duality of a
      single list being both an inactive and active list.
      v3: Consolidate bool parameters into a single set of flags; don't
      comment on the duality of a single variable being a multiplicity of
      bits.
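
      A sketch of what the coarse active/inactive selection over a single
      list might look like (names and the EVICT_NONBLOCK flag are
      illustrative):

          static struct i915_vma *
          first_evict_candidate(struct i915_address_space *vm,
                                unsigned int flags)
          {
              struct i915_vma *vma, *active = NULL;

              lockdep_assert_held(&vm->mutex);

              /* One list in loose temporal order of last insertion or
               * evaluation: take the first inactive entry, remembering
               * an active one only as a last-resort fallback. */
              list_for_each_entry(vma, &vm->bound_list, vm_link) {
                  if (i915_vma_is_active(vma)) {
                      if (!active && !(flags & EVICT_NONBLOCK))
                          active = vma;
                      continue;
                  }
                  return vma;
              }

              return active;  /* NULL for a NONBLOCKING search */
          }
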
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk
  4. 15 January 2019, 3 commits
  5. 10 January 2019, 2 commits
  6. 09 January 2019, 1 commit
  7. 08 January 2019, 1 commit
    • drm/i915: Return immediately if trylock fails for direct-reclaim · d25f71a1
      Committed by Chris Wilson
      Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
      in the shrinker while performing direct-reclaim. The trade-off being
      (much) lower latency for non-i915 clients at an increased risk of being
      unable to obtain a page from direct-reclaim without hitting the
      oom-notifier. The proviso being that we still keep trying hard to
      obtain the lock for kswapd so that we can reap under heavy memory
      pressure.
      
      v2: Taint all mutexes taken within the shrinker with the struct_mutex
      subclass as an early warning system, and drop I915_SHRINK_ACTIVE from
      vmap to reduce the number of dangerous paths. We also have to drop
      I915_SHRINK_ACTIVE from oom-notifier to be able to make the same claim
      that ACTIVE is only used from outside context, which fits in with a
      longer strategy of avoiding stalls due to scanning active during
      shrinking.
      
      The danger in using the subclass struct_mutex is that we declare
      ourselves more knowledgeable than lockdep and deprive ourselves of
      automatic coverage. Instead, we require ourselves to mark up any mutex
      taken inside the shrinker in order to detect lock-inversion, and if we
      miss any we are doomed to a deadlock at the worst possible moment.
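
      A condensed sketch of the resulting policy (using that era's
      mutex_trylock_recursive(); the subclass name is illustrative):

          static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
          {
              struct mutex *m = &i915->drm.struct_mutex;

              switch (mutex_trylock_recursive(m)) {
              case MUTEX_TRYLOCK_RECURSIVE:
                  *unlock = false;  /* already ours; leave it held */
                  return true;

              case MUTEX_TRYLOCK_SUCCESS:
                  *unlock = true;
                  return true;

              case MUTEX_TRYLOCK_FAILED:
                  *unlock = false;
                  if (!current_is_kswapd())
                      return false;  /* direct reclaim: bail immediately */

                  /* kswapd: keep trying hard, tainting the lock with the
                   * shrinker subclass as an early-warning system. */
                  mutex_lock_nested(m, I915_MM_SHRINKER);
                  *unlock = true;
                  return true;
              }

              BUG();
          }
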
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190107115509.12523-1-chris@chris-wilson.co.uk
  8. 11 July 2018, 1 commit
  9. 09 July 2018, 1 commit
    • drm/i915: Provide a timeout to i915_gem_wait_for_idle() · ec625fb9
      Committed by Chris Wilson
      Usually we have no idea about the upper bound we need to wait to catch
      up with userspace when idling the device, but in a few situations we
      know the system was idle beforehand and can provide a short timeout in
      order to very quickly catch a failure, long before hangcheck kicks in.
      
      In the following patches, we will use the timeout to curtail two overly
      long waits, where we know we can expect the GPU to complete within a
      reasonable time or declare it broken.
      
      In particular, with a broken GPU we expect it to fail during the initial
      GPU setup where we do a couple of context switches to record the defaults.
      This is a task that takes a few milliseconds even on the slowest of
      devices, but we may have to wait 60s for hangcheck to give in and
      declare the machine inoperable. This is a case where any GPU hang
      is unacceptable, both from a timeliness and a practical standpoint.
      
      The other improvement is that in selftests, we do not need to arm an
      independent timer to inject a wedge, as we can just limit the timeout on
      the wait directly.
      
      v2: Include the timeout parameter in the trace.
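
      The resulting API shape, roughly (the flags and the bound shown are
      illustrative):

          /* A timeout in jiffies bounds the wait; MAX_SCHEDULE_TIMEOUT
           * keeps the old wait-forever behaviour for callers with no
           * better information. */
          int i915_gem_wait_for_idle(struct drm_i915_private *i915,
                                     unsigned int flags, long timeout);

          /* e.g. while recording default contexts at init, where a
           * healthy GPU finishes in milliseconds: */
          err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5);
          if (err)  /* -ETIME: wedge the GPU instead of waiting 60s */
              goto err_wedged;
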
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
  10. 06 June 2018, 1 commit
  11. 22 February 2018, 1 commit
  12. 01 February 2018, 1 commit
  13. 18 January 2018, 1 commit
  14. 28 November 2017, 2 commits
  15. 09 November 2017, 1 commit
    • drm/i915: Idle the GPU before shrinking everything · 0676e794
      Committed by Chris Wilson
      The handling of contexts is peculiar. Instead of tying their vma to
      activity, we pin the context. This means that we cannot simply unbind
      the context object itself at will (which would normally cause us to wait
      for the vma to be idle), but must manually idle the GPU and retire
      requests first.
      
      A consequence of this peculiarity arises when making a last desperate
      attempt to recover memory. If the memory is tied up inside active context
      objects, we will fail to recover any memory simply by trying to unbind
      the objects without first doing a wait-for-idle.
      
      A side-effect of removing the call to shrinker_lock_uninterruptible()
      from i915_gem_shrinker_oom() was that we removed an unlocked
      wait-for-idle, and so lost the "natural" shrinkage of context objects.
      By replacing that with a locked wait from inside i915_gem_shrink(), we
      not only replace it with the ability to recover all context objects, but
      do so for all i915_gem_shrink_all() callers.
      
      v2: Switching requires request allocation, which is not permitted from
      inside the shrinker as it only uses ordinary allocations.
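
      In rough outline (helper names abbreviated from this era's code, and
      the signatures simplified):

          unsigned long i915_gem_shrink(struct drm_i915_private *i915,
                                        unsigned long target,
                                        unsigned int flags)
          {
              unsigned long freed = 0;

              /* Contexts stay pinned until idle: a last-ditch shrink must
               * idle the GPU and retire requests before their vma can be
               * unbound. Per v2, the wait itself allocates a request, so
               * this path must never be entered from inside the shrinker. */
              if (flags & I915_SHRINK_ACTIVE) {
                  i915_gem_wait_for_idle(i915);
                  i915_gem_retire_requests(i915);
              }

              /* ... walk the bound/unbound lists, unbind, drop pages ... */

              return freed;
          }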
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=102936
      Fixes: f2123818 ("drm/i915: Move dev_priv->mm.[un]bound_list to its own lock")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171108094400.1386-1-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit 2f6a3783)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  16. 08 November 2017, 1 commit
    • drm/i915: Idle the GPU before shrinking everything · 2f6a3783
      Committed by Chris Wilson
      The handling of contexts is peculiar. Instead of tying their vma to
      activity, we pin the context. This means that we cannot simply unbind
      the context object itself at will (which would normally cause us to wait
      for the vma to be idle), but must manually idle the GPU and retire
      requests first.
      
      A consequence of this peculiarity arises when making a last desperate
      attempt to recover memory. If the memory is tied up inside active context
      objects, we will fail to recover any memory simply by trying to unbind
      the objects without first doing a wait-for-idle.
      
      A side-effect of removing the call to shrinker_lock_uninterruptible()
      from i915_gem_shrinker_oom() was that we removed an unlocked
      wait-for-idle, and so lost the "natural" shrinkage of context objects.
      By replacing that with a locked wait from inside i915_gem_shrink(), we
      not only replace it with the ability to recover all context objects, but
      do so for all i915_gem_shrink_all() callers.
      
      v2: Switching requires request allocation, which is not permitted from
      inside the shrinker as it only uses ordinary allocations.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=102936
      Fixes: f2123818 ("drm/i915: Move dev_priv->mm.[un]bound_list to its own lock")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171108094400.1386-1-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  17. 17 October 2017, 5 commits
  18. 07 September 2017, 1 commit
    • drm/i915: wire up shrinkctl->nr_scanned · 912d572d
      Committed by Chris Wilson
      shrink_slab() allows us to report back the number of objects we
      successfully scanned (out of the target shrinkctl->nr_to_scan). As we
      report the number of pages owned by each GEM object as a separate item
      to the shrinker, we cannot precisely control the number of shrinker
      objects we scan on each pass; and indeed may free more than requested.
      If we fail to tell the shrinker about the number of objects we process,
      it will continue to hold a grudge against us as any objects left
      unscanned are added to the next reclaim -- and so we will keep on
      "unfairly" shrinking our own slab in comparison to other slabs.
      
      Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 07 August 2017, 1 commit
  20. 04 August 2017, 1 commit
  21. 14 June 2017, 1 commit
  22. 02 June 2017, 1 commit
  23. 26 May 2017, 1 commit
  24. 18 May 2017, 1 commit
  25. 11 May 2017, 1 commit
  26. 11 April 2017, 1 commit
    • drm/i915: Don't call synchronize_rcu_expedited under struct_mutex · c053b5a5
      Committed by Joonas Lahtinen
      Only call synchronize_rcu_expedited after unlocking struct_mutex to
      avoid deadlock because the workqueues depend on struct_mutex.
      
      From original patch by Andrea:
      
      synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
      hang until its own workqueues are run. The i915 gem workqueues will
      wait on the struct_mutex to be released. So we cannot wait for a
      quiescent state using those rcu primitives while holding the
      struct_mutex or it creates a circular lock dependency resulting in
      kernel hangs (which is reproducible but goes undetected by lockdep).
      
      kswapd0         D    0   700      2 0x00000000
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? _synchronize_rcu_expedited.constprop.65+0x2ef/0x300
      ? wake_up_bit+0x20/0x20
      ? rcu_stall_kick_kthreads.part.54+0xc0/0xc0
      ? rcu_exp_wait_wake+0x530/0x530
      ? i915_gem_shrink+0x34b/0x4b0
      ? i915_gem_shrinker_scan+0x7c/0x90
      ? i915_gem_shrinker_scan+0x7c/0x90
      ? shrink_slab.part.61.constprop.72+0x1c1/0x3a0
      ? shrink_zone+0x154/0x160
      ? kswapd+0x40a/0x720
      ? kthread+0xf4/0x130
      ? try_to_free_pages+0x450/0x450
      ? kthread_create_on_node+0x40/0x40
      ? ret_from_fork+0x23/0x30
      plasmashell     D    0  4657   4614 0x00000000
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? schedule_preempt_disabled+0xe/0x10
      ? __mutex_lock.isra.4+0x1c9/0x790
      ? i915_gem_close_object+0x26/0xc0
      ? i915_gem_close_object+0x26/0xc0
      ? drm_gem_object_release_handle+0x48/0x90
      ? drm_gem_handle_delete+0x50/0x80
      ? drm_ioctl+0x1fa/0x420
      ? drm_gem_handle_create+0x40/0x40
      ? pipe_write+0x391/0x410
      ? __vfs_write+0xc6/0x120
      ? do_vfs_ioctl+0x8b/0x5d0
      ? SyS_ioctl+0x3b/0x70
      ? entry_SYSCALL_64_fastpath+0x13/0x94
      kworker/0:0     D    0 29186      2 0x00000000
      Workqueue: events __i915_gem_free_work
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? schedule_preempt_disabled+0xe/0x10
      ? __mutex_lock.isra.4+0x1c9/0x790
      ? del_timer_sync+0x44/0x50
      ? update_curr+0x57/0x110
      ? __i915_gem_free_objects+0x31/0x300
      ? __i915_gem_free_objects+0x31/0x300
      ? __i915_gem_free_work+0x2d/0x40
      ? process_one_work+0x13a/0x3b0
      ? worker_thread+0x4a/0x460
      ? kthread+0xf4/0x130
      ? process_one_work+0x3b0/0x3b0
      ? kthread_create_on_node+0x40/0x40
      ? ret_from_fork+0x23/0x30
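
      A hedged sketch of the principle (the function shown is illustrative,
      not the exact diff):

          static void drop_caches(struct drm_i915_private *i915)
          {
              mutex_lock(&i915->drm.struct_mutex);
              i915_gem_shrink_all(i915);
              mutex_unlock(&i915->drm.struct_mutex);

              /* Wait for the RCU grace period only after releasing
               * struct_mutex: the workqueues that run the RCU-delayed
               * frees take struct_mutex themselves, so waiting under
               * it deadlocks. */
              synchronize_rcu_expedited();
          }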
      
      Fixes: 3d3d18f0 ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      (cherry picked from commit 8f612d05)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  27. 07 April 2017, 2 commits
    • drm/i915: Simplify shrinker locking · e92075ff
      Committed by Joonas Lahtinen
      By using the same structure for both interruptible and
      uninterruptible locking in shrinker code, combined with the
      information that mm.interruptible is only being written to, the
      code can be greatly simplified.
      
      Also removing the i915_gem_ prefix from the locking functions so
      that nobody in their wildest dreams considers exporting them.
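
      The unified helper ends up roughly like this (a sketch using that
      era's mutex_trylock_recursive()):

          static bool shrinker_lock(struct drm_i915_private *i915,
                                    bool *unlock)
          {
              switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
              case MUTEX_TRYLOCK_FAILED:
                  return false;

              case MUTEX_TRYLOCK_SUCCESS:
                  *unlock = true;   /* we took it, so we release it */
                  return true;

              case MUTEX_TRYLOCK_RECURSIVE:
                  *unlock = false;  /* the caller already holds it */
                  return true;
              }

              BUG();
          }
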
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1491562175-27680-1-git-send-email-joonas.lahtinen@linux.intel.com
    • drm/i915: Don't call synchronize_rcu_expedited under struct_mutex · 8f612d05
      Committed by Joonas Lahtinen
      Only call synchronize_rcu_expedited after unlocking struct_mutex to
      avoid deadlock because the workqueues depend on struct_mutex.
      
      From original patch by Andrea:
      
      synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
      hang until its own workqueues are run. The i915 gem workqueues will
      wait on the struct_mutex to be released. So we cannot wait for a
      quiescent state using those rcu primitives while holding the
      struct_mutex or it creates a circular lock dependency resulting in
      kernel hangs (which is reproducible but goes undetected by lockdep).
      
      kswapd0         D    0   700      2 0x00000000
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? _synchronize_rcu_expedited.constprop.65+0x2ef/0x300
      ? wake_up_bit+0x20/0x20
      ? rcu_stall_kick_kthreads.part.54+0xc0/0xc0
      ? rcu_exp_wait_wake+0x530/0x530
      ? i915_gem_shrink+0x34b/0x4b0
      ? i915_gem_shrinker_scan+0x7c/0x90
      ? i915_gem_shrinker_scan+0x7c/0x90
      ? shrink_slab.part.61.constprop.72+0x1c1/0x3a0
      ? shrink_zone+0x154/0x160
      ? kswapd+0x40a/0x720
      ? kthread+0xf4/0x130
      ? try_to_free_pages+0x450/0x450
      ? kthread_create_on_node+0x40/0x40
      ? ret_from_fork+0x23/0x30
      plasmashell     D    0  4657   4614 0x00000000
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? schedule_preempt_disabled+0xe/0x10
      ? __mutex_lock.isra.4+0x1c9/0x790
      ? i915_gem_close_object+0x26/0xc0
      ? i915_gem_close_object+0x26/0xc0
      ? drm_gem_object_release_handle+0x48/0x90
      ? drm_gem_handle_delete+0x50/0x80
      ? drm_ioctl+0x1fa/0x420
      ? drm_gem_handle_create+0x40/0x40
      ? pipe_write+0x391/0x410
      ? __vfs_write+0xc6/0x120
      ? do_vfs_ioctl+0x8b/0x5d0
      ? SyS_ioctl+0x3b/0x70
      ? entry_SYSCALL_64_fastpath+0x13/0x94
      kworker/0:0     D    0 29186      2 0x00000000
      Workqueue: events __i915_gem_free_work
      Call Trace:
      ? __schedule+0x1a5/0x660
      ? schedule+0x36/0x80
      ? schedule_preempt_disabled+0xe/0x10
      ? __mutex_lock.isra.4+0x1c9/0x790
      ? del_timer_sync+0x44/0x50
      ? update_curr+0x57/0x110
      ? __i915_gem_free_objects+0x31/0x300
      ? __i915_gem_free_objects+0x31/0x300
      ? __i915_gem_free_work+0x2d/0x40
      ? process_one_work+0x13a/0x3b0
      ? worker_thread+0x4a/0x460
      ? kthread+0xf4/0x130
      ? process_one_work+0x3b0/0x3b0
      ? kthread_create_on_node+0x40/0x40
      ? ret_from_fork+0x23/0x30
      
      Fixes: 3d3d18f0 ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
  28. 21 March 2017, 1 commit
    • drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker) · 3d3d18f0
      Committed by Chris Wilson
      rcu_barrier() takes the cpu_hotplug mutex, which is itself not
      reclaim-safe, and so rcu_barrier() is illegal from inside the shrinker.
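
      The fix, roughly (a sketch of i915_gem_shrink_all(); a plain
      grace-period wait suffices here and, unlike rcu_barrier(), does not
      take cpu_hotplug.lock):

          unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)
          {
              unsigned long freed;

              freed = i915_gem_shrink(i915, -1UL,
                                      I915_SHRINK_BOUND |
                                      I915_SHRINK_UNBOUND |
                                      I915_SHRINK_ACTIVE);
              synchronize_rcu(); /* was rcu_barrier(): wait for our
                                  * earlier RCU-delayed slab frees */

              return freed;
          }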
      
      [  309.661373] =========================================================
      [  309.661376] [ INFO: possible irq lock inversion dependency detected ]
      [  309.661380] 4.11.0-rc1-CI-CI_DRM_2333+ #1 Tainted: G        W
      [  309.661383] ---------------------------------------------------------
      [  309.661386] gem_exec_gttfil/6435 just changed the state of lock:
      [  309.661389]  (rcu_preempt_state.barrier_mutex){+.+.-.}, at: [<ffffffff81100731>] _rcu_barrier+0x31/0x160
      [  309.661399] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [  309.661402]  (cpu_hotplug.lock){+.+.+.}
      [  309.661404]
      
                     and interrupts could create inverse lock ordering between them.
      
      [  309.661410]
                     other info that might help us debug this:
      [  309.661414]  Possible interrupt unsafe locking scenario:
      
      [  309.661417]        CPU0                    CPU1
      [  309.661419]        ----                    ----
      [  309.661421]   lock(cpu_hotplug.lock);
      [  309.661425]                                local_irq_disable();
      [  309.661432]                                lock(rcu_preempt_state.barrier_mutex);
      [  309.661441]                                lock(cpu_hotplug.lock);
      [  309.661446]   <Interrupt>
      [  309.661448]     lock(rcu_preempt_state.barrier_mutex);
      [  309.661453]
                      *** DEADLOCK ***
      
      [  309.661460] 4 locks held by gem_exec_gttfil/6435:
      [  309.661464]  #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff8120d83d>] vfs_write+0x17d/0x1f0
      [  309.661475]  #1:  (debugfs_srcu){......}, at: [<ffffffff81320491>] debugfs_use_file_start+0x41/0xa0
      [  309.661486]  #2:  (&attr->mutex){+.+.+.}, at: [<ffffffff8123a3e7>] simple_attr_write+0x37/0xe0
      [  309.661495]  #3:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0091b4a>] i915_drop_caches_set+0x3a/0x150 [i915]
      [  309.661540]
                     the shortest dependencies between 2nd lock and 1st lock:
      [  309.661547]  -> (cpu_hotplug.lock){+.+.+.} ops: 829 {
      [  309.661553]     HARDIRQ-ON-W at:
      [  309.661560]                       __lock_acquire+0x5e5/0x1b50
      [  309.661565]                       lock_acquire+0xc9/0x220
      [  309.661572]                       __mutex_lock+0x6e/0x990
      [  309.661576]                       mutex_lock_nested+0x16/0x20
      [  309.661583]                       get_online_cpus+0x61/0x80
      [  309.661590]                       kmem_cache_create+0x25/0x1d0
      [  309.661596]                       debug_objects_mem_init+0x30/0x249
      [  309.661602]                       start_kernel+0x341/0x3fe
      [  309.661607]                       x86_64_start_reservations+0x2a/0x2c
      [  309.661612]                       x86_64_start_kernel+0x173/0x186
      [  309.661619]                       verify_cpu+0x0/0xfc
      [  309.661622]     SOFTIRQ-ON-W at:
      [  309.661627]                       __lock_acquire+0x611/0x1b50
      [  309.661632]                       lock_acquire+0xc9/0x220
      [  309.661636]                       __mutex_lock+0x6e/0x990
      [  309.661641]                       mutex_lock_nested+0x16/0x20
      [  309.661646]                       get_online_cpus+0x61/0x80
      [  309.661650]                       kmem_cache_create+0x25/0x1d0
      [  309.661655]                       debug_objects_mem_init+0x30/0x249
      [  309.661660]                       start_kernel+0x341/0x3fe
      [  309.661664]                       x86_64_start_reservations+0x2a/0x2c
      [  309.661669]                       x86_64_start_kernel+0x173/0x186
      [  309.661674]                       verify_cpu+0x0/0xfc
      [  309.661677]     RECLAIM_FS-ON-W at:
      [  309.661682]                          mark_held_locks+0x6f/0xa0
      [  309.661687]                          lockdep_trace_alloc+0xb3/0x100
      [  309.661693]                          kmem_cache_alloc_trace+0x31/0x2e0
      [  309.661699]                          __smpboot_create_thread.part.1+0x27/0xe0
      [  309.661704]                          smpboot_create_threads+0x61/0x90
      [  309.661709]                          cpuhp_invoke_callback+0x9c/0x8a0
      [  309.661713]                          cpuhp_up_callbacks+0x31/0xb0
      [  309.661718]                          _cpu_up+0x7a/0xc0
      [  309.661723]                          do_cpu_up+0x5f/0x80
      [  309.661727]                          cpu_up+0xe/0x10
      [  309.661734]                          smp_init+0x71/0xb3
      [  309.661738]                          kernel_init_freeable+0x94/0x19e
      [  309.661743]                          kernel_init+0x9/0xf0
      [  309.661748]                          ret_from_fork+0x2e/0x40
      [  309.661752]     INITIAL USE at:
      [  309.661757]                      __lock_acquire+0x234/0x1b50
      [  309.661761]                      lock_acquire+0xc9/0x220
      [  309.661766]                      __mutex_lock+0x6e/0x990
      [  309.661771]                      mutex_lock_nested+0x16/0x20
      [  309.661775]                      get_online_cpus+0x61/0x80
      [  309.661780]                      __cpuhp_setup_state+0x44/0x170
      [  309.661785]                      page_alloc_init+0x23/0x3a
      [  309.661790]                      start_kernel+0x124/0x3fe
      [  309.661794]                      x86_64_start_reservations+0x2a/0x2c
      [  309.661799]                      x86_64_start_kernel+0x173/0x186
      [  309.661804]                      verify_cpu+0x0/0xfc
      [  309.661807]   }
      [  309.661813]   ... key      at: [<ffffffff81e37690>] cpu_hotplug+0xb0/0x100
      [  309.661817]   ... acquired at:
      [  309.661821]    lock_acquire+0xc9/0x220
      [  309.661825]    __mutex_lock+0x6e/0x990
      [  309.661829]    mutex_lock_nested+0x16/0x20
      [  309.661833]    get_online_cpus+0x61/0x80
      [  309.661837]    _rcu_barrier+0x9f/0x160
      [  309.661841]    rcu_barrier+0x10/0x20
      [  309.661847]    netdev_run_todo+0x5f/0x310
      [  309.661852]    rtnl_unlock+0x9/0x10
      [  309.661856]    default_device_exit_batch+0x133/0x150
      [  309.661862]    ops_exit_list.isra.0+0x4d/0x60
      [  309.661866]    cleanup_net+0x1d8/0x2c0
      [  309.661872]    process_one_work+0x1f4/0x6d0
      [  309.661876]    worker_thread+0x49/0x4a0
      [  309.661881]    kthread+0x107/0x140
      [  309.661884]    ret_from_fork+0x2e/0x40
      
      [  309.661890] -> (rcu_preempt_state.barrier_mutex){+.+.-.} ops: 179 {
      [  309.661896]    HARDIRQ-ON-W at:
      [  309.661901]                     __lock_acquire+0x5e5/0x1b50
      [  309.661905]                     lock_acquire+0xc9/0x220
      [  309.661910]                     __mutex_lock+0x6e/0x990
      [  309.661914]                     mutex_lock_nested+0x16/0x20
      [  309.661919]                     _rcu_barrier+0x31/0x160
      [  309.661923]                     rcu_barrier+0x10/0x20
      [  309.661928]                     netdev_run_todo+0x5f/0x310
      [  309.661932]                     rtnl_unlock+0x9/0x10
      [  309.661936]                     default_device_exit_batch+0x133/0x150
      [  309.661941]                     ops_exit_list.isra.0+0x4d/0x60
      [  309.661946]                     cleanup_net+0x1d8/0x2c0
      [  309.661951]                     process_one_work+0x1f4/0x6d0
      [  309.661955]                     worker_thread+0x49/0x4a0
      [  309.661960]                     kthread+0x107/0x140
      [  309.661964]                     ret_from_fork+0x2e/0x40
      [  309.661968]    SOFTIRQ-ON-W at:
      [  309.661972]                     __lock_acquire+0x611/0x1b50
      [  309.661977]                     lock_acquire+0xc9/0x220
      [  309.661981]                     __mutex_lock+0x6e/0x990
      [  309.661986]                     mutex_lock_nested+0x16/0x20
      [  309.661990]                     _rcu_barrier+0x31/0x160
      [  309.661995]                     rcu_barrier+0x10/0x20
      [  309.661999]                     netdev_run_todo+0x5f/0x310
      [  309.662003]                     rtnl_unlock+0x9/0x10
      [  309.662008]                     default_device_exit_batch+0x133/0x150
      [  309.662013]                     ops_exit_list.isra.0+0x4d/0x60
      [  309.662017]                     cleanup_net+0x1d8/0x2c0
      [  309.662022]                     process_one_work+0x1f4/0x6d0
      [  309.662027]                     worker_thread+0x49/0x4a0
      [  309.662031]                     kthread+0x107/0x140
      [  309.662035]                     ret_from_fork+0x2e/0x40
      [  309.662039]    IN-RECLAIM_FS-W at:
      [  309.662043]                        __lock_acquire+0x638/0x1b50
      [  309.662048]                        lock_acquire+0xc9/0x220
      [  309.662053]                        __mutex_lock+0x6e/0x990
      [  309.662058]                        mutex_lock_nested+0x16/0x20
      [  309.662062]                        _rcu_barrier+0x31/0x160
      [  309.662067]                        rcu_barrier+0x10/0x20
      [  309.662089]                        i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662109]                        i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662114]                        simple_attr_write+0xc7/0xe0
      [  309.662119]                        full_proxy_write+0x4f/0x70
      [  309.662124]                        __vfs_write+0x23/0x120
      [  309.662128]                        vfs_write+0xc6/0x1f0
      [  309.662133]                        SyS_write+0x44/0xb0
      [  309.662138]                        entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  309.662142]    INITIAL USE at:
      [  309.662147]                    __lock_acquire+0x234/0x1b50
      [  309.662151]                    lock_acquire+0xc9/0x220
      [  309.662156]                    __mutex_lock+0x6e/0x990
      [  309.662160]                    mutex_lock_nested+0x16/0x20
      [  309.662165]                    _rcu_barrier+0x31/0x160
      [  309.662169]                    rcu_barrier+0x10/0x20
      [  309.662174]                    netdev_run_todo+0x5f/0x310
      [  309.662178]                    rtnl_unlock+0x9/0x10
      [  309.662183]                    default_device_exit_batch+0x133/0x150
      [  309.662188]                    ops_exit_list.isra.0+0x4d/0x60
      [  309.662192]                    cleanup_net+0x1d8/0x2c0
      [  309.662197]                    process_one_work+0x1f4/0x6d0
      [  309.662202]                    worker_thread+0x49/0x4a0
      [  309.662206]                    kthread+0x107/0x140
      [  309.662210]                    ret_from_fork+0x2e/0x40
      [  309.662214]  }
      [  309.662220]  ... key      at: [<ffffffff81e4e1c8>] rcu_preempt_state+0x508/0x780
      [  309.662225]  ... acquired at:
      [  309.662229]    check_usage_forwards+0x12b/0x130
      [  309.662233]    mark_lock+0x360/0x6f0
      [  309.662237]    __lock_acquire+0x638/0x1b50
      [  309.662241]    lock_acquire+0xc9/0x220
      [  309.662245]    __mutex_lock+0x6e/0x990
      [  309.662249]    mutex_lock_nested+0x16/0x20
      [  309.662253]    _rcu_barrier+0x31/0x160
      [  309.662257]    rcu_barrier+0x10/0x20
      [  309.662279]    i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662298]    i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662303]    simple_attr_write+0xc7/0xe0
      [  309.662307]    full_proxy_write+0x4f/0x70
      [  309.662311]    __vfs_write+0x23/0x120
      [  309.662315]    vfs_write+0xc6/0x1f0
      [  309.662319]    SyS_write+0x44/0xb0
      [  309.662323]    entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  309.662329]
                     stack backtrace:
      [  309.662335] CPU: 1 PID: 6435 Comm: gem_exec_gttfil Tainted: G        W       4.11.0-rc1-CI-CI_DRM_2333+ #1
      [  309.662342] Hardware name: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 786H1 v01.13 07/14/2011
      [  309.662348] Call Trace:
      [  309.662354]  dump_stack+0x67/0x92
      [  309.662359]  print_irq_inversion_bug.part.19+0x1a4/0x1b0
      [  309.662365]  check_usage_forwards+0x12b/0x130
      [  309.662369]  mark_lock+0x360/0x6f0
      [  309.662374]  ? print_shortest_lock_dependencies+0x1a0/0x1a0
      [  309.662379]  __lock_acquire+0x638/0x1b50
      [  309.662383]  ? __mutex_unlock_slowpath+0x3e/0x2e0
      [  309.662388]  ? trace_hardirqs_on+0xd/0x10
      [  309.662392]  ? _rcu_barrier+0x31/0x160
      [  309.662396]  lock_acquire+0xc9/0x220
      [  309.662400]  ? _rcu_barrier+0x31/0x160
      [  309.662404]  ? _rcu_barrier+0x31/0x160
      [  309.662409]  __mutex_lock+0x6e/0x990
      [  309.662412]  ? _rcu_barrier+0x31/0x160
      [  309.662416]  ? _rcu_barrier+0x31/0x160
      [  309.662421]  ? synchronize_rcu_expedited+0x35/0xb0
      [  309.662426]  ? _raw_spin_unlock_irqrestore+0x52/0x60
      [  309.662434]  mutex_lock_nested+0x16/0x20
      [  309.662438]  _rcu_barrier+0x31/0x160
      [  309.662442]  rcu_barrier+0x10/0x20
      [  309.662464]  i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662484]  i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662489]  simple_attr_write+0xc7/0xe0
      [  309.662494]  full_proxy_write+0x4f/0x70
      [  309.662498]  __vfs_write+0x23/0x120
      [  309.662503]  ? rcu_read_lock_sched_held+0x75/0x80
      [  309.662507]  ? rcu_sync_lockdep_assert+0x2a/0x50
      [  309.662512]  ? __sb_start_write+0x102/0x210
      [  309.662516]  ? vfs_write+0x17d/0x1f0
      [  309.662520]  vfs_write+0xc6/0x1f0
      [  309.662524]  ? trace_hardirqs_on_caller+0xe7/0x200
      [  309.662529]  SyS_write+0x44/0xb0
      [  309.662533]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  309.662537] RIP: 0033:0x7f507eac24a0
      [  309.662541] RSP: 002b:00007fffda8720e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  309.662548] RAX: ffffffffffffffda RBX: ffffffff81482bd3 RCX: 00007f507eac24a0
      [  309.662552] RDX: 0000000000000005 RSI: 00007fffda8720f0 RDI: 0000000000000005
      [  309.662557] RBP: ffffc9000048bf88 R08: 0000000000000000 R09: 000000000000002c
      [  309.662561] R10: 0000000000000014 R11: 0000000000000246 R12: 00007fffda872230
      [  309.662566] R13: 00007fffda872228 R14: 0000000000000201 R15: 00007fffda8720f0
      [  309.662572]  ? __this_cpu_preempt_check+0x13/0x20
      
      Fixes: 0eafec6d ("drm/i915: Enable lockless lookup of request tracking via RCU")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100192
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: <stable@vger.kernel.org> # v4.9+
      Link: http://patchwork.freedesktop.org/patch/msgid/20170314115019.18127-1-chris@chris-wilson.co.uk
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      (cherry picked from commit bd784b7c)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170321144531.12344-1-chris@chris-wilson.co.uk
  29. 14 March 2017, 1 commit
    • drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker) · bd784b7c
      Committed by Chris Wilson
      rcu_barrier() takes the cpu_hotplug mutex, which is itself not
      reclaim-safe, and so rcu_barrier() is illegal from inside the shrinker.
      
      [  309.661373] =========================================================
      [  309.661376] [ INFO: possible irq lock inversion dependency detected ]
      [  309.661380] 4.11.0-rc1-CI-CI_DRM_2333+ #1 Tainted: G        W
      [  309.661383] ---------------------------------------------------------
      [  309.661386] gem_exec_gttfil/6435 just changed the state of lock:
      [  309.661389]  (rcu_preempt_state.barrier_mutex){+.+.-.}, at: [<ffffffff81100731>] _rcu_barrier+0x31/0x160
      [  309.661399] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [  309.661402]  (cpu_hotplug.lock){+.+.+.}
      [  309.661404]
      
                     and interrupts could create inverse lock ordering between them.
      
      [  309.661410]
                     other info that might help us debug this:
      [  309.661414]  Possible interrupt unsafe locking scenario:
      
      [  309.661417]        CPU0                    CPU1
      [  309.661419]        ----                    ----
      [  309.661421]   lock(cpu_hotplug.lock);
      [  309.661425]                                local_irq_disable();
      [  309.661432]                                lock(rcu_preempt_state.barrier_mutex);
      [  309.661441]                                lock(cpu_hotplug.lock);
      [  309.661446]   <Interrupt>
      [  309.661448]     lock(rcu_preempt_state.barrier_mutex);
      [  309.661453]
                      *** DEADLOCK ***
      
      [  309.661460] 4 locks held by gem_exec_gttfil/6435:
      [  309.661464]  #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff8120d83d>] vfs_write+0x17d/0x1f0
      [  309.661475]  #1:  (debugfs_srcu){......}, at: [<ffffffff81320491>] debugfs_use_file_start+0x41/0xa0
      [  309.661486]  #2:  (&attr->mutex){+.+.+.}, at: [<ffffffff8123a3e7>] simple_attr_write+0x37/0xe0
      [  309.661495]  #3:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0091b4a>] i915_drop_caches_set+0x3a/0x150 [i915]
      [  309.661540]
                     the shortest dependencies between 2nd lock and 1st lock:
      [  309.661547]  -> (cpu_hotplug.lock){+.+.+.} ops: 829 {
      [  309.661553]     HARDIRQ-ON-W at:
      [  309.661560]                       __lock_acquire+0x5e5/0x1b50
      [  309.661565]                       lock_acquire+0xc9/0x220
      [  309.661572]                       __mutex_lock+0x6e/0x990
      [  309.661576]                       mutex_lock_nested+0x16/0x20
      [  309.661583]                       get_online_cpus+0x61/0x80
      [  309.661590]                       kmem_cache_create+0x25/0x1d0
      [  309.661596]                       debug_objects_mem_init+0x30/0x249
      [  309.661602]                       start_kernel+0x341/0x3fe
      [  309.661607]                       x86_64_start_reservations+0x2a/0x2c
      [  309.661612]                       x86_64_start_kernel+0x173/0x186
      [  309.661619]                       verify_cpu+0x0/0xfc
      [  309.661622]     SOFTIRQ-ON-W at:
      [  309.661627]                       __lock_acquire+0x611/0x1b50
      [  309.661632]                       lock_acquire+0xc9/0x220
      [  309.661636]                       __mutex_lock+0x6e/0x990
      [  309.661641]                       mutex_lock_nested+0x16/0x20
      [  309.661646]                       get_online_cpus+0x61/0x80
      [  309.661650]                       kmem_cache_create+0x25/0x1d0
      [  309.661655]                       debug_objects_mem_init+0x30/0x249
      [  309.661660]                       start_kernel+0x341/0x3fe
      [  309.661664]                       x86_64_start_reservations+0x2a/0x2c
      [  309.661669]                       x86_64_start_kernel+0x173/0x186
      [  309.661674]                       verify_cpu+0x0/0xfc
      [  309.661677]     RECLAIM_FS-ON-W at:
      [  309.661682]                          mark_held_locks+0x6f/0xa0
      [  309.661687]                          lockdep_trace_alloc+0xb3/0x100
      [  309.661693]                          kmem_cache_alloc_trace+0x31/0x2e0
      [  309.661699]                          __smpboot_create_thread.part.1+0x27/0xe0
      [  309.661704]                          smpboot_create_threads+0x61/0x90
      [  309.661709]                          cpuhp_invoke_callback+0x9c/0x8a0
      [  309.661713]                          cpuhp_up_callbacks+0x31/0xb0
      [  309.661718]                          _cpu_up+0x7a/0xc0
      [  309.661723]                          do_cpu_up+0x5f/0x80
      [  309.661727]                          cpu_up+0xe/0x10
      [  309.661734]                          smp_init+0x71/0xb3
      [  309.661738]                          kernel_init_freeable+0x94/0x19e
      [  309.661743]                          kernel_init+0x9/0xf0
      [  309.661748]                          ret_from_fork+0x2e/0x40
      [  309.661752]     INITIAL USE at:
      [  309.661757]                      __lock_acquire+0x234/0x1b50
      [  309.661761]                      lock_acquire+0xc9/0x220
      [  309.661766]                      __mutex_lock+0x6e/0x990
      [  309.661771]                      mutex_lock_nested+0x16/0x20
      [  309.661775]                      get_online_cpus+0x61/0x80
      [  309.661780]                      __cpuhp_setup_state+0x44/0x170
      [  309.661785]                      page_alloc_init+0x23/0x3a
      [  309.661790]                      start_kernel+0x124/0x3fe
      [  309.661794]                      x86_64_start_reservations+0x2a/0x2c
      [  309.661799]                      x86_64_start_kernel+0x173/0x186
      [  309.661804]                      verify_cpu+0x0/0xfc
      [  309.661807]   }
      [  309.661813]   ... key      at: [<ffffffff81e37690>] cpu_hotplug+0xb0/0x100
      [  309.661817]   ... acquired at:
      [  309.661821]    lock_acquire+0xc9/0x220
      [  309.661825]    __mutex_lock+0x6e/0x990
      [  309.661829]    mutex_lock_nested+0x16/0x20
      [  309.661833]    get_online_cpus+0x61/0x80
      [  309.661837]    _rcu_barrier+0x9f/0x160
      [  309.661841]    rcu_barrier+0x10/0x20
      [  309.661847]    netdev_run_todo+0x5f/0x310
      [  309.661852]    rtnl_unlock+0x9/0x10
      [  309.661856]    default_device_exit_batch+0x133/0x150
      [  309.661862]    ops_exit_list.isra.0+0x4d/0x60
      [  309.661866]    cleanup_net+0x1d8/0x2c0
      [  309.661872]    process_one_work+0x1f4/0x6d0
      [  309.661876]    worker_thread+0x49/0x4a0
      [  309.661881]    kthread+0x107/0x140
      [  309.661884]    ret_from_fork+0x2e/0x40
      
      [  309.661890] -> (rcu_preempt_state.barrier_mutex){+.+.-.} ops: 179 {
      [  309.661896]    HARDIRQ-ON-W at:
      [  309.661901]                     __lock_acquire+0x5e5/0x1b50
      [  309.661905]                     lock_acquire+0xc9/0x220
      [  309.661910]                     __mutex_lock+0x6e/0x990
      [  309.661914]                     mutex_lock_nested+0x16/0x20
      [  309.661919]                     _rcu_barrier+0x31/0x160
      [  309.661923]                     rcu_barrier+0x10/0x20
      [  309.661928]                     netdev_run_todo+0x5f/0x310
      [  309.661932]                     rtnl_unlock+0x9/0x10
      [  309.661936]                     default_device_exit_batch+0x133/0x150
      [  309.661941]                     ops_exit_list.isra.0+0x4d/0x60
      [  309.661946]                     cleanup_net+0x1d8/0x2c0
      [  309.661951]                     process_one_work+0x1f4/0x6d0
      [  309.661955]                     worker_thread+0x49/0x4a0
      [  309.661960]                     kthread+0x107/0x140
      [  309.661964]                     ret_from_fork+0x2e/0x40
      [  309.661968]    SOFTIRQ-ON-W at:
      [  309.661972]                     __lock_acquire+0x611/0x1b50
      [  309.661977]                     lock_acquire+0xc9/0x220
      [  309.661981]                     __mutex_lock+0x6e/0x990
      [  309.661986]                     mutex_lock_nested+0x16/0x20
      [  309.661990]                     _rcu_barrier+0x31/0x160
      [  309.661995]                     rcu_barrier+0x10/0x20
      [  309.661999]                     netdev_run_todo+0x5f/0x310
      [  309.662003]                     rtnl_unlock+0x9/0x10
      [  309.662008]                     default_device_exit_batch+0x133/0x150
      [  309.662013]                     ops_exit_list.isra.0+0x4d/0x60
      [  309.662017]                     cleanup_net+0x1d8/0x2c0
      [  309.662022]                     process_one_work+0x1f4/0x6d0
      [  309.662027]                     worker_thread+0x49/0x4a0
      [  309.662031]                     kthread+0x107/0x140
      [  309.662035]                     ret_from_fork+0x2e/0x40
      [  309.662039]    IN-RECLAIM_FS-W at:
      [  309.662043]                        __lock_acquire+0x638/0x1b50
      [  309.662048]                        lock_acquire+0xc9/0x220
      [  309.662053]                        __mutex_lock+0x6e/0x990
      [  309.662058]                        mutex_lock_nested+0x16/0x20
      [  309.662062]                        _rcu_barrier+0x31/0x160
      [  309.662067]                        rcu_barrier+0x10/0x20
      [  309.662089]                        i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662109]                        i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662114]                        simple_attr_write+0xc7/0xe0
      [  309.662119]                        full_proxy_write+0x4f/0x70
      [  309.662124]                        __vfs_write+0x23/0x120
      [  309.662128]                        vfs_write+0xc6/0x1f0
      [  309.662133]                        SyS_write+0x44/0xb0
      [  309.662138]                        entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  309.662142]    INITIAL USE at:
      [  309.662147]                    __lock_acquire+0x234/0x1b50
      [  309.662151]                    lock_acquire+0xc9/0x220
      [  309.662156]                    __mutex_lock+0x6e/0x990
      [  309.662160]                    mutex_lock_nested+0x16/0x20
      [  309.662165]                    _rcu_barrier+0x31/0x160
      [  309.662169]                    rcu_barrier+0x10/0x20
      [  309.662174]                    netdev_run_todo+0x5f/0x310
      [  309.662178]                    rtnl_unlock+0x9/0x10
      [  309.662183]                    default_device_exit_batch+0x133/0x150
      [  309.662188]                    ops_exit_list.isra.0+0x4d/0x60
      [  309.662192]                    cleanup_net+0x1d8/0x2c0
      [  309.662197]                    process_one_work+0x1f4/0x6d0
      [  309.662202]                    worker_thread+0x49/0x4a0
      [  309.662206]                    kthread+0x107/0x140
      [  309.662210]                    ret_from_fork+0x2e/0x40
      [  309.662214]  }
      [  309.662220]  ... key      at: [<ffffffff81e4e1c8>] rcu_preempt_state+0x508/0x780
      [  309.662225]  ... acquired at:
      [  309.662229]    check_usage_forwards+0x12b/0x130
      [  309.662233]    mark_lock+0x360/0x6f0
      [  309.662237]    __lock_acquire+0x638/0x1b50
      [  309.662241]    lock_acquire+0xc9/0x220
      [  309.662245]    __mutex_lock+0x6e/0x990
      [  309.662249]    mutex_lock_nested+0x16/0x20
      [  309.662253]    _rcu_barrier+0x31/0x160
      [  309.662257]    rcu_barrier+0x10/0x20
      [  309.662279]    i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662298]    i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662303]    simple_attr_write+0xc7/0xe0
      [  309.662307]    full_proxy_write+0x4f/0x70
      [  309.662311]    __vfs_write+0x23/0x120
      [  309.662315]    vfs_write+0xc6/0x1f0
      [  309.662319]    SyS_write+0x44/0xb0
      [  309.662323]    entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  309.662329]
                     stack backtrace:
      [  309.662335] CPU: 1 PID: 6435 Comm: gem_exec_gttfil Tainted: G        W       4.11.0-rc1-CI-CI_DRM_2333+ #1
      [  309.662342] Hardware name: Hewlett-Packard HP Compaq 8100 Elite SFF PC/304Ah, BIOS 786H1 v01.13 07/14/2011
      [  309.662348] Call Trace:
      [  309.662354]  dump_stack+0x67/0x92
      [  309.662359]  print_irq_inversion_bug.part.19+0x1a4/0x1b0
      [  309.662365]  check_usage_forwards+0x12b/0x130
      [  309.662369]  mark_lock+0x360/0x6f0
      [  309.662374]  ? print_shortest_lock_dependencies+0x1a0/0x1a0
      [  309.662379]  __lock_acquire+0x638/0x1b50
      [  309.662383]  ? __mutex_unlock_slowpath+0x3e/0x2e0
      [  309.662388]  ? trace_hardirqs_on+0xd/0x10
      [  309.662392]  ? _rcu_barrier+0x31/0x160
      [  309.662396]  lock_acquire+0xc9/0x220
      [  309.662400]  ? _rcu_barrier+0x31/0x160
      [  309.662404]  ? _rcu_barrier+0x31/0x160
      [  309.662409]  __mutex_lock+0x6e/0x990
      [  309.662412]  ? _rcu_barrier+0x31/0x160
      [  309.662416]  ? _rcu_barrier+0x31/0x160
      [  309.662421]  ? synchronize_rcu_expedited+0x35/0xb0
      [  309.662426]  ? _raw_spin_unlock_irqrestore+0x52/0x60
      [  309.662434]  mutex_lock_nested+0x16/0x20
      [  309.662438]  _rcu_barrier+0x31/0x160
      [  309.662442]  rcu_barrier+0x10/0x20
      [  309.662464]  i915_gem_shrink_all+0x33/0x40 [i915]
      [  309.662484]  i915_drop_caches_set+0x141/0x150 [i915]
      [  309.662489]  simple_attr_write+0xc7/0xe0
      [  309.662494]  full_proxy_write+0x4f/0x70
      [  309.662498]  __vfs_write+0x23/0x120
      [  309.662503]  ? rcu_read_lock_sched_held+0x75/0x80
      [  309.662507]  ? rcu_sync_lockdep_assert+0x2a/0x50
      [  309.662512]  ? __sb_start_write+0x102/0x210
      [  309.662516]  ? vfs_write+0x17d/0x1f0
      [  309.662520]  vfs_write+0xc6/0x1f0
      [  309.662524]  ? trace_hardirqs_on_caller+0xe7/0x200
      [  309.662529]  SyS_write+0x44/0xb0
      [  309.662533]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  309.662537] RIP: 0033:0x7f507eac24a0
      [  309.662541] RSP: 002b:00007fffda8720e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  309.662548] RAX: ffffffffffffffda RBX: ffffffff81482bd3 RCX: 00007f507eac24a0
      [  309.662552] RDX: 0000000000000005 RSI: 00007fffda8720f0 RDI: 0000000000000005
      [  309.662557] RBP: ffffc9000048bf88 R08: 0000000000000000 R09: 000000000000002c
      [  309.662561] R10: 0000000000000014 R11: 0000000000000246 R12: 00007fffda872230
      [  309.662566] R13: 00007fffda872228 R14: 0000000000000201 R15: 00007fffda8720f0
      [  309.662572]  ? __this_cpu_preempt_check+0x13/0x20
      
      Fixes: 0eafec6d ("drm/i915: Enable lockless lookup of request tracking via RCU")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100192
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: <stable@vger.kernel.org> # v4.9+
      Link: http://patchwork.freedesktop.org/patch/msgid/20170314115019.18127-1-chris@chris-wilson.co.uk
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>