1. 26 Mar 2021, 1 commit
  2. 24 Mar 2021, 1 commit
    • drm/i915: Do not share hwsp across contexts any more, v8. · 12ca695d
      Maarten Lankhorst authored
      Instead of sharing pages with breadcrumbs, give each timeline its
      own page. Unrelated timelines then no longer share locks during
      command submission.
      
      As an additional benefit, seqno wraparound no longer requires
      i915_vma_pin, which means we no longer need to worry about a
      potential -EDEADLK at a point where we are ready to submit.
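      
      A minimal sketch of the layout this implies, under stated
      assumptions: demo_timeline, demo_timeline_create() and
      demo_timeline_wrap() are hypothetical names, not the driver's
      actual allocator; only the one-private-page-per-timeline idea is
      taken from the commit.
      
      /*
       * Hedged sketch only -- hypothetical names, not the upstream i915
       * code. One private page of seqno slots per timeline.
       */
      #include <linux/gfp.h>
      #include <linux/kref.h>
      #include <linux/mm.h>
      #include <linux/slab.h>
      #include <linux/string.h>
      
      #define TIMELINE_SEQNO_BYTES 8 /* replaces CACHELINE_BYTES, per v8 */
      
      struct demo_timeline {
              struct kref ref;          /* v5: refcount the timeline itself */
              void *hwsp_page;          /* page owned by this timeline alone */
              unsigned int hwsp_offset; /* current seqno slot in that page */
      };
      
      static struct demo_timeline *demo_timeline_create(void)
      {
              struct demo_timeline *tl = kzalloc(sizeof(*tl), GFP_KERNEL);
      
              if (!tl)
                      return NULL;
      
              /* A whole page per timeline: no slots shared with unrelated
               * timelines, hence no shared locks during submission. */
              tl->hwsp_page = (void *)get_zeroed_page(GFP_KERNEL);
              if (!tl->hwsp_page) {
                      kfree(tl);
                      return NULL;
              }
      
              kref_init(&tl->ref);
              return tl;
      }
      
      /* On seqno wraparound, advance to the next slot in our own page; no
       * i915_vma_pin is required, so no -EDEADLK risk at submission time. */
      static void demo_timeline_wrap(struct demo_timeline *tl)
      {
              tl->hwsp_offset = (tl->hwsp_offset + TIMELINE_SEQNO_BYTES) &
                                (PAGE_SIZE - 1);
              memset(tl->hwsp_page + tl->hwsp_offset, 0, TIMELINE_SEQNO_BYTES);
      }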
      
      Changes since v1:
      - Fix erroneous i915_vma_acquire that should be a i915_vma_release (ickle).
      - Extra check for completion in intel_read_hwsp().
      Changes since v2:
      - Fix inconsistent indent in hwsp_alloc() (kbuild)
      - memset entire cacheline to 0.
      Changes since v3:
      - Do same in intel_timeline_reset_seqno(), and clflush for good measure.
      Changes since v4:
      - Use refcounting on timeline, instead of relying on i915_active.
      - Fix waiting on kernel requests.
      Changes since v5:
      - Bump the number of slots to the maximum (256), for best wraparound behavior.
      - Add hwsp_offset to i915_request to fix potential wraparound hang.
      - Ensure timeline wrap test works with the changes.
      - Assign hwsp in intel_timeline_read_hwsp() within the rcu lock to
        fix a hang.
      Changes since v6:
      - Rename i915_request_active_offset to i915_request_active_seqno(),
        and elaborate the function. (tvrtko)
      Changes since v7:
      - Move hunk to where it belongs. (jekstrand)
      - Replace CACHELINE_BYTES with TIMELINE_SEQNO_BYTES. (jekstrand)
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com> #v1
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210323155059.628690-2-maarten.lankhorst@linux.intel.com
  3. 15 Jan 2021, 3 commits
  4. 10 Jan 2021, 1 commit
  5. 31 Dec 2020, 3 commits
  6. 24 Dec 2020, 1 commit
  7. 21 Dec 2020, 1 commit
  8. 16 Dec 2020, 1 commit
  9. 27 Nov 2020, 1 commit
  10. 24 Nov 2020, 1 commit
  11. 20 Nov 2020, 2 commits
  12. 13 Oct 2020, 1 commit
  13. 01 Oct 2020, 2 commits
  14. 29 Sep 2020, 1 commit
  15. 26 Sep 2020, 1 commit
  16. 16 Sep 2020, 1 commit
    • drm/i915: Be wary of data races when reading the active execlists · b82a8b93
      Chris Wilson authored
      To implement preempt-to-busy (and so efficient timeslicing and best utilization
      of the hardware submission ports) we let the GPU run asynchronously
      with respect to the ELSP submission queue. This created challenges
      in keeping and accessing
      the driver state mirroring the asynchronous GPU execution.
      
      The latest occurrence of this was spotted by KCSAN:
      
      [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915]
      [ 1413.563221]
      [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1:
      [ 1413.563548]  __await_execution+0x217/0x370 [i915]
      [ 1413.563891]  i915_request_await_dma_fence+0x4eb/0x6a0 [i915]
      [ 1413.564235]  i915_request_await_object+0x421/0x490 [i915]
      [ 1413.564577]  i915_gem_do_execbuffer+0x29b7/0x3c40 [i915]
      [ 1413.564967]  i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915]
      [ 1413.564998]  drm_ioctl_kernel+0x156/0x1b0
      [ 1413.565022]  drm_ioctl+0x2ff/0x480
      [ 1413.565046]  __x64_sys_ioctl+0x87/0xd0
      [ 1413.565069]  do_syscall_64+0x4d/0x80
      [ 1413.565094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      To complicate matters, we have to both avoid read tearing of
      *active and avoid any write tearing as we perform the
      pending[] -> inflight[] promotion of the execlists.
      
      This is because we cannot rely on the memcpy doing u64 aligned copies on all
      kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE
      annotations to satisfy KCSAN.
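      
      A hedged sketch of what such an open-coded promotion looks like
      (simplified; promote_pending() and its parameters are illustrative
      names, not the upstream hunk):
      
      /*
       * Hedged sketch, not the upstream function: promote pending[] to
       * inflight[] one pointer at a time with explicit WRITE_ONCE.
       */
      #include <linux/compiler.h>
      
      struct i915_request;
      
      static void
      promote_pending(struct i915_request **inflight,
                      struct i915_request * const *pending,
                      unsigned int count)
      {
              unsigned int n;
      
              /* memcpy() is not guaranteed to use u64-aligned copies on
               * every kernel/platform; per-pointer WRITE_ONCE prevents
               * store tearing, pairs with READ_ONCE(*active) on the reader
               * side, and satisfies KCSAN. */
              for (n = 0; n < count; n++)
                      WRITE_ONCE(inflight[n], pending[n]);
      }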
      
      v2: When in doubt, write the same comment again.
      v3: Expanded commit message.
      
      Fixes: b55230e5 ("drm/i915: Check for awaits on still currently executing requests")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200716142207.13003-1-chris@chris-wilson.co.uk
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      [Joonas: Rebased and reordered into drm-intel-gt-next branch]
      [Joonas: Added expanded commit message from Tvrtko and Chris]
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit b4d9145b)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  17. 07 Sep 2020, 9 commits
  18. 18 Aug 2020, 1 commit
  19. 14 Jul 2020, 1 commit
    • drm/i915: Skip signaling a signaled request · 1d9221e9
      Chris Wilson authored
      Preempt-to-busy introduces various fascinating complications in that the
      requests may complete as we are unsubmitting them from HW. As they may
      then signal after unsubmission, we may find ourselves having to clean up
      the signaling request from within the signaling callback. This causes us
      to recurse onto the same i915_request.lock.
      
      However, if the request is already signaled (as it will be before we
      enter the signal callbacks), we know we can skip the signaling of that
      request during submission, neatly evading the spinlock recursion.
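      
      A hedged sketch of the shape of that guard (simplified;
      demo_submit_signaling() is a hypothetical name, not the exact
      upstream hunk):
      
      /*
       * Hedged sketch, not the exact upstream code: skip re-arming the
       * signaling for a fence that has already signaled, so we never take
       * i915_request.lock from within its own signal callback.
       */
      #include <linux/bitops.h>
      #include <linux/dma-fence.h>
      #include <linux/spinlock.h>
      
      static void demo_submit_signaling(struct dma_fence *fence,
                                        spinlock_t *lock)
      {
              /* Completed during unsubmission? The signal callbacks are
               * already running; re-arming them here would recurse onto
               * this very lock. */
              if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
                      return;
      
              spin_lock(lock);
              /* ... attach breadcrumb / enable signaling ... */
              spin_unlock(lock);
      }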
      
      unsubmit(ve.rq0) # timeslice expiration or other preemption
       -> virtual_submit_request(ve.rq0)
      dma_fence_signal(ve.rq0) # request completed before preemption ack
       -> submit_notify(ve.rq1)
         -> virtual_submit_request(ve.rq1) # sees that we have completed ve.rq0
            -> __i915_request_submit(ve.rq0)
      
      [  264.210142] BUG: spinlock recursion on CPU#2, sample_multi_tr/2093
      [  264.210150]  lock: 0xffff9efd6ac55080, .magic: dead4ead, .owner: sample_multi_tr/2093, .owner_cpu: 2
      [  264.210155] CPU: 2 PID: 2093 Comm: sample_multi_tr Tainted: G     U
      [  264.210158] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X212.B01.1909060036 09/06/2019
      [  264.210160] Call Trace:
      [  264.210167]  dump_stack+0x98/0xda
      [  264.210174]  spin_dump.cold+0x24/0x3c
      [  264.210178]  do_raw_spin_lock+0x9a/0xd0
      [  264.210184]  _raw_spin_lock_nested+0x6a/0x70
      [  264.210314]  __i915_request_submit+0x10a/0x3c0 [i915]
      [  264.210415]  virtual_submit_request+0x9b/0x380 [i915]
      [  264.210516]  submit_notify+0xaf/0x14c [i915]
      [  264.210602]  __i915_sw_fence_complete+0x8a/0x230 [i915]
      [  264.210692]  i915_sw_fence_complete+0x2d/0x40 [i915]
      [  264.210762]  __dma_i915_sw_fence_wake+0x19/0x30 [i915]
      [  264.210767]  dma_fence_signal_locked+0xb1/0x1c0
      [  264.210772]  dma_fence_signal+0x29/0x50
      [  264.210871]  i915_request_wait+0x5cb/0x830 [i915]
      [  264.210876]  ? dma_resv_get_fences_rcu+0x294/0x5d0
      [  264.210974]  i915_gem_object_wait_fence+0x2f/0x40 [i915]
      [  264.211084]  i915_gem_object_wait+0xce/0x400 [i915]
      [  264.211178]  i915_gem_wait_ioctl+0xff/0x290 [i915]
      
      Fixes: 22b7a426 ("drm/i915/execlists: Preempt-to-busy")
      References: 6d06779e ("drm/i915: Load balancing across a virtual engine")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
      Cc: <stable@vger.kernel.org> # v5.4+
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200713141636.29326-1-chris@chris-wilson.co.uk
  20. 03 Jun 2020, 1 commit
  21. 01 Jun 2020, 3 commits
  22. 30 May 2020, 1 commit
    • drm/i915: Check for awaits on still currently executing requests · b55230e5
      Chris Wilson authored
      With the advent of preempt-to-busy, a request may still be on the GPU as
      we unwind. And in the case of an unpreemptible [due to HW] request, that
      request will remain indefinitely on the GPU even though we have
      returned it back to our submission queue, and cleared the active bit.
      
      We only run the execution callbacks on transferring the request from our
      submission queue to the execution queue, but if this is a bonded request
      that the HW is waiting for, we will not submit it (as we wait for a
      fresh execution) even though it is still being executed.
      
      As we know that there are always preemption points between requests, we
      know that only the currently executing request may be still active even
      though we have cleared the flag. However, we do not precisely know which
      request is in ELSP[0] due to a delay in processing events, and
      furthermore we only store the last request in a context in our state
      tracker.
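      
      As a hedged sketch of the shape of the resulting check (hypothetical
      names, simplified from the commit's reasoning; not the upstream
      code):
      
      /*
       * Hedged sketch, hypothetical names: with preempt-to-busy, a request
       * whose active bit has been cleared may still be running if it is
       * the engine's current request, so check both before assuming idle.
       */
      #include <linux/compiler.h>
      #include <linux/types.h>
      
      struct demo_engine;
      
      struct demo_request {
              bool active;                 /* cleared when we unwind */
              struct demo_engine *engine;
      };
      
      struct demo_engine {
              struct demo_request *current_rq; /* best guess at ELSP[0] */
      };
      
      static bool demo_request_maybe_executing(const struct demo_request *rq)
      {
              if (READ_ONCE(rq->active))
                      return true;
      
              /* Event processing lags the HW, so this is conservative: it
               * may report "executing" for a request that just completed,
               * but never miss one that is still on the GPU. */
              return READ_ONCE(rq->engine->current_rq) == rq;
      }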
      
      Fixes: 22b7a426 ("drm/i915/execlists: Preempt-to-busy")
      Testcase: igt/gem_exec_balancer/bonded-dual
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200529143926.3245-1-chris@chris-wilson.co.uk
  23. 29 May 2020, 1 commit
  24. 27 May 2020, 1 commit
    • drm/i915/gt: Do not schedule normal requests immediately along virtual · 511b6d9a
      Chris Wilson authored
      When we push a virtual request onto the HW, we update the rq->engine to
      point to the physical engine. A request that the user then submits,
      waiting upon the virtual engine but along the physical engine in
      use, will see that it is due to be submitted to the same engine
      and take a shortcut (and be queued without waiting for the completion
      fence). However, the virtual request may be preempted (either by higher
      priority users, or by timeslicing) and removed from the physical engine
      to be migrated over to one of its siblings. The dependent normal request
      however is oblivious to the removal of the virtual request and remains
      queued to execute on HW, believing that once it reaches the head of its
      queue all of its predecessors will have completed executing!
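      
      A hedged sketch of the guard this suggests, drawing on the v2 note
      below about execution_mask (demo_can_take_engine_shortcut() is a
      hypothetical name, not the exact upstream hunk):
      
      /*
       * Hedged sketch, not the exact upstream code: only take the
       * same-engine submission shortcut when both requests are pinned to
       * a single, identical physical engine. A virtual request has
       * several bits set in its execution_mask and may migrate between
       * siblings, so a matching rq->engine pointer is not a stable
       * guarantee.
       */
      #include <linux/compiler.h>
      #include <linux/log2.h>
      
      struct demo_request {
              unsigned int execution_mask; /* engines this may run on */
      };
      
      static bool
      demo_can_take_engine_shortcut(const struct demo_request *to,
                                    const struct demo_request *from)
      {
              /* A single set bit in the combined mask means one fixed
               * physical engine for both requests; anything else may be
               * rescheduled elsewhere before it executes. */
              return is_power_of_2(to->execution_mask |
                                   READ_ONCE(from->execution_mask));
      }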
      
      v2: Beware restriction of signal->execution_mask prior to submission.
      
      Fixes: 6d06779e ("drm/i915: Load balancing across a virtual engine")
      Testcase: igt/gem_exec_balancer/sliced
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v5.3+
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200526090753.11329-2-chris@chris-wilson.co.uk