1. 28 January 2020, 1 commit
    • drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutex · e5429340
      Committed by Chris Wilson
      Similar to commit ac0e331a ("drm/i915: Tighten atomicity of
      i915_active_acquire vs i915_active_release") we have the same race of
      trying to pin the context underneath a mutex while allowing the
      decrement to be atomic outside of that mutex. This leads to the problem
      where two threads may simultaneously try to pin the context, and the
      second does not notice that it needs to repin the context.
      
      <2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387!
      <4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      <4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G     U  W         5.5.0-rc6-CI-CI_DRM_7755+ #1
      <4> [198.669723] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
      <4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915]
      <4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48
      <4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296
      <4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006
      <4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009
      <4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000
      <4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980
      <4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068
      <4> [198.669849] FS:  00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000
      <4> [198.669857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0
      <4> [198.669872] Call Trace:
      <4> [198.669924]  intel_timeline_get_seqno+0x12/0x40 [i915]
      <4> [198.669977]  __i915_request_create+0x76/0x5a0 [i915]
      <4> [198.670024]  i915_request_create+0x86/0x1c0 [i915]
      <4> [198.670068]  i915_gem_do_execbuffer+0xbf2/0x2500 [i915]
      <4> [198.670082]  ? __lock_acquire+0x460/0x15d0
      <4> [198.670128]  i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915]
      <4> [198.670171]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
      <4> [198.670181]  drm_ioctl_kernel+0xa7/0xf0
      <4> [198.670188]  drm_ioctl+0x2e1/0x390
      <4> [198.670233]  ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915]
      
      Fixes: 84135022 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin")
      References: ac0e331a ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk
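      For illustration, a minimal userspace C sketch of the pin idiom this commit
      is tightening: a lock-free fast path that only bumps an already non-zero
      pin count, plus a mutex-protected slow path for the first pin. All names
      here (obj_pin, obj_unpin, pinned_state) are hypothetical stand-ins, not the
      intel_context API, and the final teardown is kept under the mutex for
      simplicity; i915 additionally lets the unpin decrement run outside the
      mutex, which is exactly why ce->active has to be acquired before touching
      ce->pin_count/ce->pin_mutex.

      #include <pthread.h>
      #include <stdatomic.h>

      struct obj {
              atomic_int pin_count;          /* fast-path pin reference count */
              pthread_mutex_t pin_mutex;     /* serialises first pin / last unpin */
              int pinned_state;              /* exists only while pinned */
      };

      static struct obj example = { .pin_mutex = PTHREAD_MUTEX_INITIALIZER };

      static void obj_pin(struct obj *o)
      {
              int v = atomic_load(&o->pin_count);

              /* Fast path: bump the count only if someone already pinned. */
              while (v > 0) {
                      if (atomic_compare_exchange_weak(&o->pin_count, &v, v + 1))
                              return;
              }

              /* Slow path: first pin. Re-check under the mutex, because a
               * racing thread may have completed the first pin (or the last
               * unpin) between the check above and taking the lock. */
              pthread_mutex_lock(&o->pin_mutex);
              if (atomic_load(&o->pin_count) == 0)
                      o->pinned_state = 1;   /* (re)build the pinned state */
              atomic_fetch_add(&o->pin_count, 1);
              pthread_mutex_unlock(&o->pin_mutex);
      }

      static void obj_unpin(struct obj *o)
      {
              /* Simplified: the last-unpin teardown stays under the mutex. */
              pthread_mutex_lock(&o->pin_mutex);
              if (atomic_fetch_sub(&o->pin_count, 1) == 1)
                      o->pinned_state = 0;
              pthread_mutex_unlock(&o->pin_mutex);
      }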
  2. 10 January 2020, 2 commits
  3. 09 January 2020, 3 commits
  4. 06 January 2020, 2 commits
  5. 22 December 2019, 1 commit
  6. 20 December 2019, 2 commits
  7. 14 December 2019, 1 commit
  8. 05 December 2019, 1 commit
  9. 04 December 2019, 1 commit
  10. 28 November 2019, 1 commit
    • drm/i915: Serialise i915_active_fence_set() with itself · df9f85d8
      Committed by Chris Wilson
      The expected downside to commit 58b4c1a0 ("drm/i915: Reduce nested
      prepare_remote_context() to a trylock") was that it would need to return
      -EAGAIN to userspace in order to resolve potential mutex inversion. Such
      an unsightly round trip would be unnecessary if we could atomically insert a
      barrier into the i915_active_fence, so make it happen.
      
      Currently, we use the timeline->mutex (or some other named outer lock)
      to order insertion into the i915_active_fence (and so individual nodes
      of i915_active). Inside __i915_active_fence_set, we then only need to
      serialise with the interrupt handler in order to claim the timeline for
      ourselves.
      
      However, if we remove the outer lock, we need to ensure the order is
      intact between not only multiple threads trying to insert themselves
      into the timeline, but also with the interrupt handler completing the
      previous occupant. We use xchg() on insert so that we have an ordered
      sequence of insertions (and each caller knows the previous fence on
      which to wait, preserving the chain of all fences in the timeline), but
      we then have to cmpxchg() in the interrupt handler to avoid overwriting
      the new occupant. The only nasty side-effect is having to temporarily
      strip off the RCU-annotations to apply the atomic operations, otherwise
      the rules are much more conventional!
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
      Fixes: 58b4c1a0 ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
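      For illustration, a small C11-atomics sketch of the xchg()/cmpxchg()
      discipline described above: insertion swaps itself into the slot and
      learns the previous occupant (the fence to wait on, preserving the
      chain), while the completion path clears the slot only if its fence is
      still the occupant, so a newer fence can never be overwritten. The types
      and names (struct fence, slot_set, slot_retire) are hypothetical
      stand-ins, not the i915_active_fence API.

      #include <stdatomic.h>
      #include <stddef.h>

      struct fence;

      struct active_slot {
              _Atomic(struct fence *) cur;   /* last fence inserted, or NULL */
      };

      /* Insertion (any thread): atomically install @f and return the previous
       * occupant, which the caller must order itself after. */
      static struct fence *slot_set(struct active_slot *slot, struct fence *f)
      {
              return atomic_exchange(&slot->cur, f);
      }

      /* Completion (e.g. interrupt handler) of @f: clear the slot only if @f
       * is still current; if a newer fence raced in via slot_set(), leave it. */
      static void slot_retire(struct active_slot *slot, struct fence *f)
      {
              struct fence *expected = f;

              atomic_compare_exchange_strong(&slot->cur, &expected, NULL);
      }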
  11. 27 November 2019, 1 commit
  12. 26 November 2019, 1 commit
  13. 24 October 2019, 1 commit
  14. 08 October 2019, 1 commit
  15. 04 October 2019, 3 commits
  16. 20 September 2019, 1 commit
    • drm/i915: Mark i915_request.timeline as a volatile, rcu pointer · d19d71fc
      Committed by Chris Wilson
      The request->timeline is only valid until the request is retired (i.e.
      before it is completed). Upon retiring the request, the context may be
      unpinned and freed, and along with it the timeline may be freed. We
      therefore need to be very careful when chasing rq->timeline that the
      pointer does not disappear beneath us. The vast majority of users are in
      a protected context, either during request construction or retirement,
      where the timeline->mutex is held and the timeline cannot disappear. It
      is those few off the beaten path (where we access a second timeline) that
      need extra scrutiny -- to be added in the next patch after first adding
      the warnings about dangerous access.
      
      One complication, where we cannot use the timeline->mutex itself, is
      during request submission onto hardware (under spinlocks). Here, we want
      to check on the timeline to finalize the breadcrumb, and so we need to
      impose a second rule to ensure that the request->timeline is indeed
      valid. As we are submitting the request, its context and timeline must
      be pinned, as it will be used by the hardware. Since it is pinned, we
      know the request->timeline must still be valid, and we cannot submit the
      idle barrier until after we release the engine->active.lock, ergo while
      submitting and holding that spinlock, a second thread cannot release the
      timeline.
      
      v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
      as we need it, and tidy up acquiring the timeline with a bit of
      refactoring (i915_active_add_request)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
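      As a kernel-style sketch of what the two rules above amount to (the
      struct and helper names here are hypothetical, not the actual i915
      accessors added for this): the common construction/retirement paths get
      an accessor that asserts the timeline lock, while anyone off the beaten
      path may only take an RCU-protected peek.

      #include <linux/mutex.h>
      #include <linux/rcupdate.h>

      struct timeline;

      struct request {
              struct timeline __rcu *timeline;  /* may be freed once retired */
      };

      /* Rule 1: the caller holds the timeline mutex (request construction or
       * retirement), so the pointer cannot change underneath it. */
      static struct timeline *request_timeline(struct request *rq,
                                               struct mutex *tl_mutex)
      {
              return rcu_dereference_protected(rq->timeline,
                                               lockdep_is_held(tl_mutex));
      }

      /* Rule 2: off the beaten path, only peek under rcu_read_lock() and
       * treat the result as stale the moment the read lock is dropped. */
      static struct timeline *request_timeline_rcu(struct request *rq)
      {
              return rcu_dereference(rq->timeline);
      }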
  17. 11 September 2019, 1 commit
  18. 17 August 2019, 1 commit
  19. 16 August 2019, 1 commit
  20. 10 August 2019, 3 commits
  21. 03 August 2019, 1 commit
    • drm/i915: Hide unshrinkable context objects from the shrinker · 1aff1903
      Committed by Chris Wilson
      The shrinker cannot touch objects used by the contexts (logical state
      and ring). Currently we mark those as "pin_global" to let the shrinker
      skip over them; however, if we remove them from the shrinker lists
      entirely, we don't even have to include them in our shrink accounting.
      
      By keeping the unshrinkable objects in our shrinker tracking, we report
      a large number of objects available to be shrunk, and leave the shrinker
      deeply unsatisfied when we fail to reclaim those. The shrinker will
      persist in trying to reclaim the unavailable objects, forcing the system
      into a livelock (not even hitting the dread oomkiller).
      
      v2: Extend unshrinkable protection for perma-pinned scratch and guc
      allocations (Tvrtko)
      v3: Notice that we should be pinned when marking unshrinkable and so the
      link cannot be empty; merge duplicate paths.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190802212137.22207-1-chris@chris-wilson.co.uk
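      For illustration, a kernel-style sketch (hypothetical names, not the
      i915 helpers) of what hiding an object from the shrinker amounts to:
      unlink it from the scan list and subtract it from the count reported as
      reclaimable, so the core MM is never promised pages that cannot actually
      be freed.

      #include <linux/list.h>
      #include <linux/spinlock.h>

      struct shrink_state {
              spinlock_t lock;
              struct list_head shrink_list;    /* objects the shrinker may scan */
              unsigned long shrinkable_bytes;  /* what we report as reclaimable */
      };

      struct obj {
              struct list_head shrink_link;
              unsigned long size;
      };

      /* Called once the object is perma-pinned (context state, ring, scratch,
       * GuC allocations): the shrinker neither sees nor counts it any more. */
      static void obj_make_unshrinkable(struct shrink_state *s, struct obj *obj)
      {
              spin_lock(&s->lock);
              if (!list_empty(&obj->shrink_link)) {
                      list_del_init(&obj->shrink_link);
                      s->shrinkable_bytes -= obj->size;
              }
              spin_unlock(&s->lock);
      }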
  22. 02 August 2019, 1 commit
  23. 30 July 2019, 1 commit
  24. 29 July 2019, 1 commit
  25. 27 July 2019, 1 commit
  26. 23 July 2019, 1 commit
  27. 17 July 2019, 1 commit
  28. 26 June 2019, 1 commit
  29. 22 June 2019, 1 commit
  30. 20 June 2019, 1 commit
  31. 19 June 2019, 1 commit
    • drm/i915: Make the semaphore saturation mask global · 44d89409
      Committed by Chris Wilson
      The idea behind keeping the saturation mask local to a context backfired
      spectacularly. The premise with the local mask was that we would be more
      proactive in attempting to use semaphores after each time the context
      idled, and that all new contexts would attempt to use semaphores
      ignoring the current state of the system. This turns out to be horribly
      optimistic. If the system state is still oversaturated and the existing
      workloads have all stopped using semaphores, the new workloads would
      attempt to use semaphores and be deprioritised behind real work. The
      new contexts would not switch off using semaphores until their initial
      batch of low priority work had completed. Given a sufficient backlog of
      work of equal user priority, this would completely starve the new work of any
      GPU time.
      
      To compensate, remove the local tracking in favour of keeping it as
      global state on the engine -- once the system is saturated and
      semaphores are disabled, everyone stops attempting to use semaphores
      until the system is idle again. One of the reasons for preferring local
      context tracking was that it worked with virtual engines, so to switch to
      global state we could either do a complete check of all the
      virtual siblings or simply disable semaphores for those requests. This
      takes the simpler approach of disabling semaphores on virtual engines.
      
      The downside is that the decision that the engine is saturated is a
      local measure -- we are only checking whether or not this context was
      scheduled in a timely fashion; it may be legitimately delayed due to user
      priorities. We still have the same dilemma, though, that we do not want
      to employ the semaphore poll unless it will be used.
      
      v2: Explain why we need to assume the worst wrt virtual engines.
      
      Fixes: ca6e56f6 ("drm/i915: Disable semaphore busywaits on saturated systems")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
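      For illustration, a minimal C sketch of the global saturation mask
      described above (field and function names are illustrative, not the
      exact i915 ones): each engine keeps one bitmask of peers it should stop
      busywaiting on, requests consult it before emitting a semaphore, virtual
      engines simply opt out, and the mask is cleared once the engine idles.

      #include <stdbool.h>

      struct engine {
              unsigned int mask;       /* single bit identifying this engine */
              unsigned int saturated;  /* peers we should not busywait on */
      };

      /* May a request on @to spin on a semaphore for work running on @from?
       * Virtual engines answer no -- the "assume the worst" choice from v2. */
      static bool can_use_semaphore(const struct engine *to,
                                    const struct engine *from,
                                    bool to_is_virtual)
      {
              if (to_is_virtual)
                      return false;
              return !(to->saturated & from->mask);
      }

      /* A semaphore busywait on @from proved pointless (we had to sleep
       * anyway): record it so later requests stop emitting semaphores. */
      static void note_saturated(struct engine *to, const struct engine *from)
      {
              to->saturated |= from->mask;
      }

      /* Once the engine idles, semaphores become worth trying again. */
      static void engine_park(struct engine *e)
      {
              e->saturated = 0;
      }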