1. 23 Mar 2020, 1 commit
    • drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period · 8e87e013
      Authored by Chris Wilson
      Since we take advantage of RCU for some i915_active objects, like the
      intel_timeline_cacheline, we need to delay the i915_active_fini until
      after the RCU grace period and we perform the kfree -- that is until
      after all RCU protected readers.
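
      A minimal sketch of the resulting destruction path, assuming a
      reduced struct (the i915 types are taken as in scope, and the helper
      names are illustrative rather than the exact code from the patch):

          #include <linux/rcupdate.h>
          #include <linux/slab.h>

          struct intel_timeline_cacheline {
                  struct i915_active active; /* peeked at by RCU readers */
                  struct rcu_head rcu;
                  /* ... the real struct also carries the vma and vaddr */
          };

          static void __cacheline_free_rcu(struct rcu_head *rcu)
          {
                  struct intel_timeline_cacheline *cl =
                          container_of(rcu, typeof(*cl), rcu);

                  /* All rcu_read_lock() readers have drained, so it is
                   * now safe to mark the i915_active as destroyed and
                   * release the memory.
                   */
                  i915_active_fini(&cl->active);
                  kfree(cl);
          }

          static void cacheline_free(struct intel_timeline_cacheline *cl)
          {
                  /* Readers may still inspect cl->active under RCU, so
                   * defer i915_active_fini() past the grace period
                   * instead of calling it here.
                   */
                  call_rcu(&cl->rcu, __cacheline_free_rcu);
          }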
      
      <3> [108.204873] ODEBUG: assert_init not available (active state 0) object type: i915_active hint: __cacheline_active+0x0/0x80 [i915]
      <4> [108.207377] WARNING: CPU: 3 PID: 2342 at lib/debugobjects.c:488 debug_print_object+0x67/0x90
      <4> [108.207400] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec ax88179_178a snd_hwdep usbnet btusb snd_hda_core btrtl mii btbcm btintel snd_pcm bluetooth ecdh_generic ecc i915 i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers
      <4> [108.207587] CPU: 3 PID: 2342 Comm: gem_exec_parall Tainted: G     U            5.6.0-rc6-CI-Patchwork_17047+ #1
      <4> [108.207609] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
      <4> [108.207639] RIP: 0010:debug_print_object+0x67/0x90
      <4> [108.207668] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 87 d2 8a 02 8b 53 10 4c 89 e6 48 c7 c7 38 2b 32 82 48 8b 14 d5 80 2f 07 82 e8 49 d5 b7 ff <0f> 0b 5b 83 05 c3 f6 22 01 01 5d 41 5c c3 83 05 b8 f6 22 01 01 c3
      <4> [108.207692] RSP: 0018:ffffc90000e7f890 EFLAGS: 00010282
      <4> [108.207723] RAX: 0000000000000000 RBX: ffffc90000e7f8b0 RCX: 0000000000000001
      <4> [108.207747] RDX: 0000000080000001 RSI: ffff88817ada8cb8 RDI: 00000000ffffffff
      <4> [108.207770] RBP: ffffffffa0341cc0 R08: ffff88816b5a8948 R09: 0000000000000000
      <4> [108.207792] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82322d54
      <4> [108.207814] R13: ffffffffa0341cc0 R14: ffffffff83df9568 R15: ffff88816064f400
      <4> [108.207839] FS:  00007f437d753700(0000) GS:ffff88817ad80000(0000) knlGS:0000000000000000
      <4> [108.207863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [108.207887] CR2: 00007f2ad1fb5000 CR3: 00000001725d8004 CR4: 00000000003606e0
      <4> [108.207907] Call Trace:
      <4> [108.207959]  debug_object_assert_init+0x15c/0x180
      <4> [108.208475]  ? i915_active_acquire_if_busy+0x10/0x50 [i915]
      <4> [108.208513]  ? rcu_read_lock_held+0x4d/0x60
      <4> [108.208970]  i915_active_acquire_if_busy+0x10/0x50 [i915]
      <4> [108.209380]  intel_timeline_read_hwsp+0x81/0x540 [i915]
      <4> [108.210262]  __emit_semaphore_wait+0x45/0x1b0 [i915]
      <4> [108.210726]  ? i915_request_await_dma_fence+0x143/0x560 [i915]
      <4> [108.211156]  i915_request_await_dma_fence+0x28a/0x560 [i915]
      <4> [108.211633]  i915_request_await_object+0x24a/0x3f0 [i915]
      <4> [108.212102]  eb_submit.isra.47+0x58f/0x920 [i915]
      <4> [108.212622]  i915_gem_do_execbuffer+0x1706/0x2c70 [i915]
      <4> [108.213071]  ? i915_gem_execbuffer2_ioctl+0xc0/0x470 [i915]
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-1-chris@chris-wilson.co.uk
  2. 07 Mar 2020, 1 commit
  3. 03 Feb 2020, 1 commit
  4. 31 Jan 2020, 1 commit
  5. 09 Jan 2020, 1 commit
  6. 18 Dec 2019, 1 commit
  7. 14 Dec 2019, 1 commit
  8. 28 Nov 2019, 1 commit
    • drm/i915: Serialise i915_active_fence_set() with itself · df9f85d8
      Authored by Chris Wilson
      The expected downside to commit 58b4c1a0 ("drm/i915: Reduce nested
      prepare_remote_context() to a trylock") was that it would need to return
      -EAGAIN to userspace in order to resolve potential mutex inversion. Such
      an unsightly round trip is unnecessary if we could atomically insert a
      barrier into the i915_active_fence, so make it happen.
      
      Currently, we use the timeline->mutex (or some other named outer lock)
      to order insertion into the i915_active_fence (and so individual nodes
      of i915_active). Inside __i915_active_fence_set, we then only need to
      serialise with the interrupt handler in order to claim the timeline
      for ourselves.
      
      However, if we remove the outer lock, we need to ensure the order is
      intact between not only multiple threads trying to insert themselves
      into the timeline, but also with the interrupt handler completing the
      previous occupant. We use xchg() on insert so that we have an ordered
      sequence of insertions (and each caller knows the previous fence on
      which to wait, preserving the chain of all fences in the timeline), but
      we then have to cmpxchg() in the interrupt handler to avoid overwriting
      the new occupant. The only nasty side-effect is having to temporarily
      strip off the RCU-annotations to apply the atomic operations, otherwise
      the rules are much more conventional!
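
      A hedged sketch of that pairing, with simplified types (the real
      __i915_active_fence_set() also manages the callback list under an
      irq-safe lock; only the atomics are shown here):

          #include <linux/atomic.h>
          #include <linux/dma-fence.h>

          struct active_fence {
                  struct dma_fence __rcu *fence;
          };

          static struct dma_fence **
          __active_fence_slot(struct active_fence *ref)
          {
                  /* Temporarily strip the RCU annotation so the atomic
                   * operations below can be applied to the slot.
                   */
                  return (struct dma_fence ** __force)&ref->fence;
          }

          /* Insertion: atomically claim the slot. The returned previous
           * occupant tells the caller which fence to wait on, preserving
           * the chain of all fences in the timeline.
           */
          static struct dma_fence *
          active_fence_set(struct active_fence *ref, struct dma_fence *f)
          {
                  return xchg(__active_fence_slot(ref), f);
          }

          /* Interrupt handler: clear the slot only if we still own it,
           * so a newly xchg()'d occupant is never overwritten.
           */
          static void
          active_fence_signal(struct active_fence *ref, struct dma_fence *f)
          {
                  cmpxchg(__active_fence_slot(ref), f, NULL);
          }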
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
      Fixes: 58b4c1a0 ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
  9. 25 Nov 2019, 3 commits
    • drm/i915/gt: Schedule request retirement when timeline idles · 31177017
      Authored by Chris Wilson
      The major drawback of commit 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX
      corruption WA") is that it disables RC6 while Skylake (and friends) is
      active, and we do not consider the GPU idle until all outstanding
      requests have been retired and the engine switched over to the kernel
      context. If userspace is idle, this task falls onto our background idle
      worker, which only runs roughly once a second, meaning that userspace has
      to have been idle for a couple of seconds before we enable RC6 again.
      Naturally, this causes us to consume considerably more energy than
      before as powersaving is effectively disabled while a display server
      (here's looking at you Xorg) is running.
      
      Since execlists receive a completion event as each context is
      completed, we can use this interrupt to queue a retire worker bound
      to this engine to clean up idle timelines. We will then immediately
      notice the idle
      engine (without userspace intervention or the aid of the background
      retire worker) and start parking the GPU. Thus during light workloads,
      we will do much more work to idle the GPU faster...  Hopefully with
      commensurate power saving!
      
      v2: Watch context completions and only look at those local to the engine
      when retiring to reduce the amount of excess work we perform.
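
      A hedged sketch of the shape this takes; the worker and helper
      names are hypothetical stand-ins for the engine-local retirement
      added by the patch:

          #include <linux/workqueue.h>

          struct intel_engine_cs;
          /* hypothetical: retire only this engine's timelines */
          void engine_retire_requests(struct intel_engine_cs *engine);

          struct engine_retire {
                  struct work_struct work; /* INIT_WORK() at engine init */
                  struct intel_engine_cs *engine;
          };

          static void engine_retire_worker(struct work_struct *wrk)
          {
                  struct engine_retire *er =
                          container_of(wrk, typeof(*er), work);

                  /* Per the v2 note: look only at timelines local to
                   * this engine to bound the amount of work performed.
                   */
                  engine_retire_requests(er->engine);
          }

          /* Called from the execlists completion interrupt when a
           * context switches out, instead of waiting up to a second for
           * the background retire worker to notice the idle engine.
           */
          static void schedule_engine_retire(struct engine_retire *er)
          {
                  schedule_work(&er->work);
          }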
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
      References: 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      References: 2248a283 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
      (cherry picked from commit 4f88f874)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915/gt: Close race between engine_park and intel_gt_retire_requests · ca1711d1
      Authored by Chris Wilson
      The general concept was that intel_timeline.active_count was locked by
      the intel_timeline.mutex. The exception was for power management, where
      the engine->kernel_context->timeline could be manipulated under the
      global wakeref.mutex.
      
      This was quite solid, as we always manipulated the timeline only while
      we held an engine wakeref.
      
      And then we started retiring requests outside of struct_mutex, only
      using the timelines.active_list and the timeline->mutex. There we
      started manipulating intel_timeline.active_count outside of an engine
      wakeref, and so introduced a race between __engine_park() and
      intel_gt_retire_requests(), a race that could result in the
      engine->kernel_context not being added to the active timelines and so
      losing requests, which caused us to keep the system permanently powered
      up [and unloadable].
      
      The race would be easy to close if we could take the engine wakeref for
      the timeline before we retire -- except timelines are not bound to any
      engine and so we would need to keep all active engines awake. The
      alternative is to guard intel_timeline_enter/intel_timeline_exit for use
      outside of the timeline->mutex.
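
      A hedged sketch of such a guard, assuming active_count becomes an
      atomic_t; the list and lock names stand in for the gt-wide timeline
      bookkeeping and are not the exact fields from the patch:

          #include <linux/atomic.h>
          #include <linux/list.h>
          #include <linux/spinlock.h>

          struct intel_timeline_sketch {
                  atomic_t active_count;
                  struct list_head link;
                  spinlock_t *gt_lock;         /* gt->timelines lock */
                  struct list_head *gt_active; /* gt active_list */
          };

          static void timeline_enter(struct intel_timeline_sketch *tl)
          {
                  /* Fast path: already active, just bump the count,
                   * with no engine wakeref or mutex required.
                   */
                  if (atomic_add_unless(&tl->active_count, 1, 0))
                          return;

                  spin_lock(tl->gt_lock);
                  if (!atomic_fetch_inc(&tl->active_count))
                          list_add_tail(&tl->link, tl->gt_active);
                  spin_unlock(tl->gt_lock);
          }

          static void timeline_exit(struct intel_timeline_sketch *tl)
          {
                  /* Only the final exit takes the lock and removes the
                   * timeline, closing the race with __engine_park().
                   */
                  if (!atomic_dec_and_lock(&tl->active_count, tl->gt_lock))
                          return;

                  list_del(&tl->link);
                  spin_unlock(tl->gt_lock);
          }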
      
      Fixes: e5dadff4 ("drm/i915: Protect request retirement with timeline->mutex")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk
      (cherry picked from commit a6edbca7)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915/gt: Schedule request retirement when timeline idles · 4f88f874
      Authored by Chris Wilson
      The major drawback of commit 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX
      corruption WA") is that it disables RC6 while Skylake (and friends) is
      active, and we do not consider the GPU idle until all outstanding
      requests have been retired and the engine switched over to the kernel
      context. If userspace is idle, this task falls onto our background idle
      worker, which only runs roughly once a second, meaning that userspace has
      to have been idle for a couple of seconds before we enable RC6 again.
      Naturally, this causes us to consume considerably more energy than
      before as powersaving is effectively disabled while a display server
      (here's looking at you Xorg) is running.
      
      Since execlists receive a completion event as each context is
      completed, we can use this interrupt to queue a retire worker bound
      to this engine to clean up idle timelines. We will then immediately
      notice the idle
      engine (without userspace intervention or the aid of the background
      retire worker) and start parking the GPU. Thus during light workloads,
      we will do much more work to idle the GPU faster...  Hopefully with
      commensurate power saving!
      
      v2: Watch context completions and only look at those local to the engine
      when retiring to reduce the amount of excess work we perform.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
      References: 7e34f4e4 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      References: 2248a283 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
  10. 21 Nov 2019, 2 commits
  11. 20 Nov 2019, 1 commit
  12. 01 Nov 2019, 1 commit
  13. 24 Oct 2019, 1 commit
  14. 04 Oct 2019, 2 commits
  15. 20 Sep 2019, 2 commits
    • drm/i915: Protect timeline->hwsp dereferencing · 9eee0dd7
      Authored by Chris Wilson
      Not only is the signal->timeline volatile, but so is acquiring the
      timeline's HWSP. We must first carefully acquire the timeline from
      the signaling request and only then lock it. With the removal of the
      struct_mutex serialisation of request construction, we can have multiple
      timelines active at once, and so we must avoid using the nested mutex
      lock as it is quite possible for both timelines to be establishing
      semaphores on the other and so deadlock.
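
      A hedged sketch of that careful acquisition (the helper name is
      illustrative; the timeline is assumed to be kref-counted and the
      request's timeline pointer RCU-protected, as in this series):

          #include <linux/kref.h>
          #include <linux/rcupdate.h>

          static struct intel_timeline *
          get_signal_timeline(struct i915_request *signal)
          {
                  struct intel_timeline *tl;

                  rcu_read_lock();
                  tl = rcu_dereference(signal->timeline);
                  if (tl && !kref_get_unless_zero(&tl->kref))
                          tl = NULL; /* being freed: treat as retired */
                  rcu_read_unlock();

                  return tl;
          }

      Only once that reference is held do we take tl->mutex, and never
      while holding our own timeline's mutex, avoiding the deadlock of
      two timelines establishing semaphores on each other.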
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-3-chris@chris-wilson.co.uk
    • drm/i915: Mark i915_request.timeline as a volatile, rcu pointer · d19d71fc
      Authored by Chris Wilson
      The request->timeline is only valid until the request is retired (i.e.
      before it is completed). Upon retiring the request, the context may be
      unpinned and freed, and along with it the timeline may be freed. We
      therefore need to be very careful when chasing rq->timeline that the
      pointer does not disappear beneath us. The vast majority of users are in
      a protected context, either during request construction or retirement,
      where the timeline->mutex is held and the timeline cannot disappear. It
      is those few off the beaten path (where we access a second timeline) that
      need extra scrutiny -- to be added in the next patch after first adding
      the warnings about dangerous access.
      
      One complication, where we cannot use the timeline->mutex itself, is
      during request submission onto hardware (under spinlocks). Here, we want
      to check on the timeline to finalize the breadcrumb, and so we need to
      impose a second rule to ensure that the request->timeline is indeed
      valid. As we are submitting the request, its context and timeline must
      be pinned, as it will be used by the hardware. Since it is pinned, we
      know the request->timeline must still be valid, and we cannot submit the
      idle barrier until after we release the engine->active.lock, ergo while
      submitting and holding that spinlock, a second thread cannot release the
      timeline.
      
      v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
      as we need it, and tidy up acquiring the timeline with a bit of
      refactoring (i915_active_add_request)
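
      A hedged sketch of a checked accessor in the spirit of what this
      patch adds, letting lockdep verify the "timeline->mutex is held"
      rule at every plain dereference:

          #include <linux/lockdep.h>
          #include <linux/rcupdate.h>

          static inline struct intel_timeline *
          request_timeline(struct i915_request *rq)
          {
                  /* Legal only during construction/retirement, where
                   * the timeline's own mutex pins the pointer.
                   */
                  return rcu_dereference_protected(rq->timeline,
                          lockdep_is_held(&rcu_access_pointer(rq->timeline)->mutex));
          }

      Readers off the beaten path instead wrap rcu_dereference() in an
      explicit rcu_read_lock() section, as scrutinised in the next patch.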
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
  16. 07 Sep 2019, 1 commit
  17. 24 Aug 2019, 1 commit
  18. 17 Aug 2019, 2 commits
  19. 16 Aug 2019, 3 commits
  20. 26 Jun 2019, 2 commits
  21. 22 Jun 2019, 2 commits
  22. 21 Jun 2019, 4 commits
  23. 15 Jun 2019, 1 commit
  24. 06 Jun 2019, 1 commit
  25. 09 Apr 2019, 1 commit
  26. 21 Mar 2019, 1 commit
  27. 02 Mar 2019, 1 commit
    • drm/i915: Keep timeline HWSP allocated until idle across the system · ebece753
      Authored by Chris Wilson
      In preparation for enabling HW semaphores, we need to keep the
      in-flight timeline HWSP alive until its use across the entire system
      has completed, as any other timeline active on the GPU may still
      refer back to the already retired timeline. We therefore have to
      delay both recycling available cachelines and unpinning the old HWSP
      until the next idle point.
      
      An easy option would be to simply keep all used HWSP until the system as
      a whole was idle, i.e. we could release them all at once on parking.
      However, on a busy system, we may never see a global idle point,
      essentially meaning the resource will be leaked until we are forced to
      do a GC pass. We already employ a fine-grained idle detection mechanism
      for vma, which we can reuse here so that each cacheline can be freed
      immediately after the last request using it is retired.
      
      v3: Keep track of the activity of each cacheline.
      v4: cacheline_free() on canceling the seqno tracking
      v5: Finally with a testcase to exercise wraparound
      v6: Pack cacheline into empty bits of page-aligned vaddr
      v7: Use i915_utils to hide the pointer casting around bit manipulation
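
      A hedged sketch of the v6 packing trick: the HWSP vaddr is
      page-aligned, so its low bits are free to carry the cacheline index
      (CACHELINE_BITS is assumed to be 6, i.e. 64 seqno slots per 4K
      page; the i915 tree hides the casting behind ptr_pack_bits() and
      ptr_unpack_bits() in i915_utils.h, per the v7 note):

          #define CACHELINE_BITS 6
          #define CACHELINE_MASK ((1ul << CACHELINE_BITS) - 1)

          /* Pack a small cacheline index into the spare low bits of a
           * page-aligned pointer, saving a separate field.
           */
          static inline void *pack_cacheline(void *vaddr, unsigned long cl)
          {
                  return (void *)((unsigned long)vaddr | (cl & CACHELINE_MASK));
          }

          /* Recover the index and the original aligned pointer. */
          static inline unsigned long unpack_cacheline(void *packed,
                                                       void **vaddr)
          {
                  unsigned long v = (unsigned long)packed;

                  *vaddr = (void *)(v & ~CACHELINE_MASK);
                  return v & CACHELINE_MASK;
          }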
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk