1. 23 Feb 2022 (1 commit)
  2. 15 Nov 2021 (1 commit)
    • drm/i915/request: fix early tracepoints · 00bc1252
      Authored by Matthew Auld
      stable inclusion
      from stable-5.10.71
      commit d35d95e8b9da638d27bce9552262e0c486138343
      bugzilla: 182981 https://gitee.com/openeuler/kernel/issues/I4I3KD
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d35d95e8b9da638d27bce9552262e0c486138343
      
      --------------------------------
      
      [ Upstream commit c83ff018 ]
      
      Currently we blow up in trace_dma_fence_init when calling into
      get_driver_name or get_timeline_name, since both the engine and
      context might be NULL (or contain some garbage address) in the case
      of newly allocated slab objects via the request ctor. Note that we
      also use SLAB_TYPESAFE_BY_RCU here, which allows requests to be
      immediately freed, but delays freeing the underlying page by an RCU
      grace period. With this scheme requests can be re-allocated at the
      same time as they are being read by some lockless RCU lookup
      mechanism.
      
      In the ctor case, which is only called for new slab objects (i.e.
      allocate a new page and call the ctor for each object), it's safe to
      reset the context/engine prior to calling into dma_fence_init, since
      we can be certain that no one is doing an RCU lookup which might
      depend on peeking at the engine/context, like in active_engine(),
      because the object can't yet be externally visible.
      
      In the recycled case (which might also be externally visible) the
      request refcount always transitions from 0->1 after we set the
      context/engine etc, which should ensure it's valid to dereference
      the engine, for example, when doing an RCU list-walk, so long as we
      can also increment the refcount first. If the refcount is already
      zero, then the request is considered complete/released. If it's
      non-zero, then the request might be in the process of being
      re-allocated, or potentially still in flight; however, after
      successfully incrementing the refcount, it's possible to carefully
      inspect the request state to determine if the request is still what
      we were looking for. Note that all externally visible requests
      returned to the cache must have zero refcount.
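
      A minimal sketch of that lookup pattern (the helper, the list walk
      and the final state check are illustrative, not the driver's actual
      code):

          static struct i915_request *
          get_active_request(struct list_head *requests)
          {
                  struct i915_request *rq;

                  rcu_read_lock();
                  list_for_each_entry_rcu(rq, requests, link) {
                          /* Zero refcount: complete/released, skip. */
                          if (!kref_get_unless_zero(&rq->fence.refcount))
                                  continue;

                          /* The slot may have been recycled between the
                           * walk and the get; now that the reference
                           * pins it, re-check that rq is still what we
                           * were looking for. */
                          if (!i915_request_completed(rq)) {
                                  rcu_read_unlock();
                                  return rq;
                          }

                          i915_request_put(rq);
                  }
                  rcu_read_unlock();

                  return NULL;
          }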
      
      One possible fix then is to move dma_fence_init out from the request
      ctor. Originally this was how it was done, but it was moved in:
      
      commit 855e39e6
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Feb 3 09:41:48 2020 +0000
      
          drm/i915: Initialise basic fence before acquiring seqno
      
      where it looks like intel_timeline_get_seqno() relied on some of the
      rq->fence state, but that is no longer the case since:
      
      commit 12ca695d
      Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Date:   Tue Mar 23 16:49:50 2021 +0100
      
          drm/i915: Do not share hwsp across contexts any more, v8.
      
      intel_timeline_get_seqno() could also be cleaned up slightly by dropping
      the request argument.
      
      Moving dma_fence_init back out of the ctor should ensure we have
      enough of the request initialised in case of trace_dma_fence_init.
      Functionally this should be the same, and is effectively what we
      were already open coding before, except now we also assign the
      fence->lock and fence->ops; but since these are invariant for
      recycled requests (which might be externally visible), and will
      therefore already hold the same value, it shouldn't matter.
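
      In code terms, the shape of the fix looks roughly like the sketch
      below (names are borrowed from the commit text; this is not the
      literal upstream diff):

          /* Runs for new slab pages only: nothing can be doing an RCU
           * lookup on this object yet, so resetting these is race-free. */
          static void __i915_request_ctor(void *arg)
          {
                  struct i915_request *rq = arg;

                  rq->engine = NULL;
                  rq->context = NULL;
          }

          /* Runs on every allocation, fresh or recycled. fence->lock and
           * fence->ops are invariant for recycled requests, so assigning
           * them again is harmless. */
          static void init_request_fence(struct i915_request *rq,
                                         u64 context, u64 seqno)
          {
                  dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
                                 context, seqno);
          }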
      
      An alternative fix, since we don't yet have a fully initialised
      request when in the ctor, is to just set the context/engine to NULL,
      but this does require adding some extra handling in get_driver_name
      etc.
      
      v2 (Daniel):
        - Try to make the commit message less confusing
      
      Fixes: 855e39e6 ("drm/i915: Initialise basic fence before acquiring seqno")
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Michael Mason <michael.w.mason@intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210921134202.3803151-1-matthew.auld@intel.com
      (cherry picked from commit be988eae)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  3. 15 Oct 2021 (1 commit)
  4. 01 Oct 2020 (2 commits)
  5. 16 Sep 2020 (1 commit)
    • drm/i915: Be wary of data races when reading the active execlists · b82a8b93
      Authored by Chris Wilson
      To implement preempt-to-busy (and so efficient timeslicing and best
      utilization of the hardware submission ports) we let the GPU run
      asynchronously with respect to the ELSP submission queue. This
      created challenges in keeping and accessing the driver state that
      mirrors the asynchronous GPU execution.
      
      The latest occurrence of this was spotted by KCSAN:
      
      [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
      [ 1413.563221]
      [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
      [ 1413.563548]  __await_execution+0x217/0x370 [i915]
      [ 1413.563891]  i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
      [ 1413.564235]  i915_request_await_object+0x421/0x490 [i915]
      [ 1413.564577]  i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
      [ 1413.564967]  i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
      [ 1413.564998]  drm_ioctl_kernel+0x156/0x1b0
      [ 1413.565022]  drm_ioctl+0x2ff/0x480
      [ 1413.565046]  __x64_sys_ioctl+0x87/0xd0
      [ 1413.565069]  do_syscall_64+0x4d/0x80
      [ 1413.565094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      To complicate matters, we have to both avoid the read tearing of
      *active and avoid any write tearing as we perform the
      pending[] -> inflight[] promotion of the execlists.
      
      This is because we cannot rely on memcpy doing u64-aligned copies
      on all kernels/platforms, and so we opt to open-code it with
      explicit WRITE_ONCE annotations to satisfy KCSAN.
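
      A sketch of that open-coded promotion (the function wrapper and
      parameter names are illustrative; the real code lives in the
      execlists submission path):

          /* Copy pending[] into inflight[] one pointer at a time, so
           * each store is a single, tear-free write. Readers pair this
           * with READ_ONCE() when loading *active. */
          static void promote_pending(struct i915_request **inflight,
                                      struct i915_request * const *pending,
                                      unsigned int count)
          {
                  unsigned int n;

                  for (n = 0; n < count; n++)
                          WRITE_ONCE(inflight[n], pending[n]);
          }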
      
      v2: When in doubt, write the same comment again.
      v3: Expanded commit message.
      
      Fixes: b55230e5 ("drm/i915: Check for awaits on still currently executing requests")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      [Joonas: Rebased and reordered into drm-intel-gt-next branch]
      [Joonas: Added expanded commit message from Tvrtko and Chris]
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit b4d9145b)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  6. 07 Sep 2020 (9 commits)
  7. 18 Aug 2020 (1 commit)
  8. 14 Jul 2020 (1 commit)
    • drm/i915: Skip signaling a signaled request · 1d9221e9
      Authored by Chris Wilson
      Preempt-to-busy introduces various fascinating complications in that
      the requests may complete as we are unsubmitting them from HW. As
      they may then signal after unsubmission, we may find ourselves having
      to clean up the signaling request from within the signaling callback.
      This causes us to recurse onto the same i915_request.lock.
      
      However, if the request is already signaled (as it will be before we
      enter the signal callbacks), we know we can skip the signaling of that
      request during submission, neatly evading the spinlock recursion.
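
      As a sketch, the guard looks something like this (the flag test
      shown is the generic dma-fence one and the helper name is
      hypothetical; the driver uses its own request-level equivalents):

          static void submit_request(struct i915_request *rq)
          {
                  /* If the request already signaled (e.g. it completed
                   * on the HW while being unsubmitted), skip re-arming
                   * signaling: doing so from within the signaling
                   * callback would recurse on the request lock. */
                  if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
                               &rq->fence.flags))
                          return;

                  enable_signaling(rq);   /* hypothetical helper */
          }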
      
      unsubmit(ve.rq0) # timeslice expiration or other preemption
       -> virtual_submit_request(ve.rq0)
      dma_fence_signal(ve.rq0) # request completed before preemption ack
       -> submit_notify(ve.rq1)
         -> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0
            -> __i915_request_submit(ve.rq0)
      
      [  264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093
      [  264.210150]  lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2
      [  264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G     U
      [  264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019
      [  264.210160] Call Trace:
      [  264.210167]  dump_stack+0x98/0xda
      [  264.210174]  spin_dump.cold+0x24/0x3c
      [  264.210178]  do_raw_spin_lock+0x9a/0xd0
      [  264.210184]  _raw_spin_lock_nested+0x6a/0x70
      [  264.210314]  __i915_request_submit+0x10a/0x3c0 [i915]
      [  264.210415]  virtual_submit_request+0x9b/0x380 [i915]
      [  264.210516]  submit_notify+0xaf/0x14c [i915]
      [  264.210602]  __i915_sw_fence_complete+0x8a/0x230 [i915]
      [  264.210692]  i915_sw_fence_complete+0x2d/0x40 [i915]
      [  264.210762]  __dma_i915_sw_fence_wake+0x19/0x30 [i915]
      [  264.210767]  dma_fence_signal_locked+0xb1/0x1c0
      [  264.210772]  dma_fence_signal+0x29/0x50
      [  264.210871]  i915_request_wait+0x5cb/0x830 [i915]
      [  264.210876]  ? dma_resv_get_fences_rcu+0x294/0x5d0
      [  264.210974]  i915_gem_object_wait_fence+0x2f/0x40 [i915]
      [  264.211084]  i915_gem_object_wait+0xce/0x400 [i915]
      [  264.211178]  i915_gem_wait_ioctl+0xff/0x290 [i915]
      
      Fixes: 22b7a426 ("drm/i915/execlists: Preempt-to-busy")
      References: 6d06779e ("drm/i915: Load balancing across a virtual engine")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
      Cc: <stable@vger.kernel.org> # v5.4+
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
  9. 03 Jun 2020 (1 commit)
  10. 01 Jun 2020 (3 commits)
  11. 30 May 2020 (1 commit)
    • drm/i915: Check for awaits on still currently executing requests · b55230e5
      Authored by Chris Wilson
      With the advent of preempt-to-busy, a request may still be on the
      GPU as we unwind. And in the case of an unpreemptible [due to HW]
      request, that request will remain indefinitely on the GPU even
      though we have returned it back to our submission queue, and cleared
      the active bit.
      
      We only run the execution callbacks on transferring the request from
      our submission queue to the execution queue, but if this is a bonded
      request that the HW is waiting for, we will not submit it (as we
      wait for a fresh execution) even though it is still being executed.
      
      As we know that there are always preemption points between requests,
      we know that only the currently executing request may still be
      active even though we have cleared the flag. However, we do not
      precisely know which request is in ELSP[0] due to a delay in
      processing events, and furthermore we only store the last request in
      a context in our state tracker.
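
      A sketch of the resulting check (i915_request_started() and
      i915_request_completed() are the driver's seqno-based tests; the
      wrapper itself is illustrative):

          /* With preempt-to-busy, a cleared active flag is not proof the
           * request left the GPU: anything that has started but not yet
           * completed must be assumed to still be executing. */
          static bool request_may_be_executing(const struct i915_request *rq)
          {
                  return i915_request_started(rq) &&
                         !i915_request_completed(rq);
          }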
      
      Fixes: 22b7a426 ("drm/i915/execlists: Preempt-to-busy")
      Testcase: igt/gem_exec_balancer/bonded-dual
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
  12. 29 May 2020 (1 commit)
  13. 27 May 2020 (2 commits)
  14. 26 May 2020 (1 commit)
  15. 25 May 2020 (1 commit)
  16. 22 May 2020 (1 commit)
  17. 14 May 2020 (2 commits)
  18. 12 May 2020 (2 commits)
  19. 09 May 2020 (3 commits)
  20. 08 May 2020 (4 commits)
  21. 07 May 2020 (1 commit)