1. 17 May 2017 (2 commits)
• drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
  By Chris Wilson
  All requests at the same priority are executed in FIFO order; they do
  not need to be stored in the rbtree themselves, as they form a simple
  list within a level. If we move the requests at each priority into a list,
  we can reduce the rbtree to the set of priorities. This should keep
  the height of the rbtree small, as the number of active priorities cannot
  exceed the number of active requests and is typically only a few.
      
  Currently we have ~2k distinct priority levels, and that number may
  grow to allow even finer-grained selection. Allocating them all in
  advance seems wasteful (and may be impossible), so we opt to allocate
  a level upon first use and free it once its requests are depleted. To
  avoid an allocation failure causing us to lose a request, we preallocate
  the default priority (0) and bump any request to that priority if we
  fail to allocate the appropriate plist for it. Having a request (that
  is ready to run, so not leading to corruption) execute out of order is
  better than leaking the request (and its dependency tree) entirely.
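  A minimal sketch of the shape this takes; the names below
  (sketch_priolist, sketch_priolist_lookup) are illustrative, not the
  driver's own:

      #include <linux/rbtree.h>
      #include <linux/list.h>

      /* One node per *active* priority level; requests at that priority
       * live in a FIFO list hanging off the node. */
      struct sketch_priolist {
              struct rb_node node;
              struct list_head requests;
              int priority;
      };

      /* Walk the rbtree of levels; the tree height scales with the
       * number of distinct active priorities, not with the number of
       * requests. */
      static struct sketch_priolist *
      sketch_priolist_lookup(struct rb_root *root, int prio)
      {
              struct rb_node *rb = root->rb_node;

              while (rb) {
                      struct sketch_priolist *p =
                              rb_entry(rb, struct sketch_priolist, node);

                      if (prio < p->priority)
                              rb = rb->rb_left;
                      else if (prio > p->priority)
                              rb = rb->rb_right;
                      else
                              return p; /* append to p->requests */
              }
              return NULL; /* level not yet allocated */
      }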
      
  There should be a benefit from reducing execlists_dequeue() to
  principally walking a simple list (and from reducing the frequency of
  both rbtree iteration and rebalancing on erase), but for typical
  workloads request coalescing should be small enough that we notice no
  change. The main gain comes from cheaper priority-inheritance (PI)
  calls into the scheduler, and the explicit list within a level should
  make request unwinding simpler (we just insert at the head of the list
  rather than the tail, without complicating the rbtree search).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
  v3: Michał found the solution to handling the allocation failure
  gracefully. If we disable all priority scheduling after an allocation
  failure, the remaining requests are executed in FIFO order, which
  guarantees that this request and its dependencies stay in strict FIFO
  (even though the code does not realise it is now a single list).
  Normal scheduling is restored once we know the device is idle, until
  the next failure!
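  A hedged sketch of that fallback; the field names (default_priolist,
  no_priolist) are illustrative stand-ins, not necessarily the driver's:

      p = kmalloc(sizeof(*p), GFP_ATOMIC);
      if (unlikely(!p)) {
              /* Fall back to the preallocated default level and stop
               * prioritising until the engine idles; from here on,
               * everything runs in strict FIFO order. */
              prio = 0;                       /* the default priority */
              p = &engine->default_priolist;  /* hypothetical field */
              engine->no_priolist = true;     /* hypothetical flag */
      }
      list_add_tail(&request->link, &p->requests);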
  Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Cc: Michał Winiarski <michal.winiarski@intel.com>
  Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
  Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
• drm/i915: Use a define for the default priority [0] · e4f815f6
  By Chris Wilson
      Explicitly assign the default priority, and give it a name. After much
      discussion, we have chosen to call it I915_PRIORITY_NORMAL!
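  Presumably this boils down to little more than the following (a
  sketch; the value 0 matches the default priority described above):

      #define I915_PRIORITY_NORMAL 0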
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-7-chris@chris-wilson.co.uk
2. 15 April 2017 (1 commit)
3. 17 March 2017 (1 commit)
4. 09 March 2017 (1 commit)
5. 03 March 2017 (1 commit)
6. 28 February 2017 (1 commit)
• drm/i915: Signal first fence from irq handler if complete · 56299fb7
  By Chris Wilson
  As execlists and other non-semaphore multi-engine devices coordinate
  between engines using interrupts, we can shave a few tens of
  microseconds off the scheduling latency by doing the fence signaling
  from the interrupt rather than from an RT kthread. (Realistically the
  delay adds about 1% to an individual cross-engine workload.) We only
  signal the first fence in order to limit the amount of work moved into
  the interrupt handler. We must also remember that our breadcrumbs may
  be unordered with respect to the interrupt, so we still require the
  waiter process to perform some heavyweight coherency fixups, as well
  as traverse the tree of waiters.
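  A rough sketch of the idea, with first_waiting_request() as a
  hypothetical helper standing in for the driver's breadcrumb lookup:

      /* Called from the engine's user-interrupt handler: wake only the
       * first waiter if its request appears complete; the heavyweight
       * fixups stay with the waiter thread. */
      static void sketch_notify_ring(struct intel_engine_cs *engine)
      {
              struct drm_i915_gem_request *rq;

              rcu_read_lock();
              rq = first_waiting_request(engine); /* hypothetical */
              if (rq && i915_gem_request_completed(rq))
                      dma_fence_signal(&rq->fence); /* first fence only */
              rcu_read_unlock();
      }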
      
      v2: No need for early exit in irq handler - it breaks the flow between
      patches and prevents the tracepoint
      v3: Restore rcu hold across irq signaling of request
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
  Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
7. 23 February 2017 (4 commits)
8. 03 January 2017 (1 commit)
9. 23 December 2016 (1 commit)
10. 19 December 2016 (1 commit)
• drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f
  By Chris Wilson
  The requests conversion introduced a nasty bug where we could generate
  a new request in the middle of constructing a request, if we needed to
  idle the system in order to evict space for a context. The request to
  idle would be executed (and waited upon) before the current one,
  wreaking minor havoc in the seqno accounting: we would consider the
  current request already completed (prior to deferred seqno
  assignment), yet ring->last_retired_head would have been updated and
  could still allow us to overwrite the current request before execution.
      
  We also employed two different mechanisms to track the active context
  until it was switched out. The legacy method allowed for waiting upon
  an active context (it could forcibly evict any vma, including a
  context's), but the execlists method took a step backwards by pinning
  the vma for the entire active lifespan of the context (the only way to
  evict was to idle the entire GPU, not individual contexts). However,
  to circumvent the tricky locking issue (i.e. we cannot take
  struct_mutex at the time of i915_gem_request_submit(), where we would
  want to move the previous context onto the active tracker and unpin
  it), we take the execlists approach and keep the contexts pinned until
  retirement. The benefit of the execlists approach, more important for
  execlists than legacy, was the reduction in per-request pinning work:
  as the context was kept pinned until idle, it could short-circuit the
  pinning for all active contexts.
      
  We introduce new engine vfuncs to pin and unpin the context
  respectively. The context is pinned at the start of the request, and
  only unpinned when the following request is retired (this ensures that
  the context is idle and coherent in main memory before we unpin it).
  We move the engine->last_context tracking into the retirement itself
  (rather than during request submission) in order to allow the
  submission to be reordered or unwound without undue difficulty.
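  The new hooks presumably take a shape along these lines (signatures
  illustrative, not the exact driver API):

      struct intel_engine_cs {
              /* ... */
              /* Pin the context before the first request using it is
               * constructed; unpin once the following request retires. */
              int  (*context_pin)(struct intel_engine_cs *engine,
                                  struct i915_gem_context *ctx);
              void (*context_unpin)(struct intel_engine_cs *engine,
                                    struct i915_gem_context *ctx);
              struct i915_gem_context *last_retired_context;
              /* ... */
      };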
      
      And finally an ulterior motive for unifying context handling was to
      prepare for mock requests.
      
      v2: Rename to last_retired_context, split out legacy_context tracking
      for MI_SET_CONTEXT.
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
11. 15 November 2016 (4 commits)
12. 11 November 2016 (1 commit)
13. 08 November 2016 (1 commit)
14. 29 October 2016 (6 commits)
15. 25 October 2016 (1 commit)
• dma-buf: Rename struct fence to dma_fence · f54d1867
  By Chris Wilson
      I plan to usurp the short name of struct fence for a core kernel struct,
      and so I need to rename the specialised fence/timeline for DMA
      operations to make room.
      
  A consensus was reached in
  https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
  that making clear this fence applies to DMA operations was a good
  thing. Since then the patch has grown a bit as usage has increased, so
  hopefully it remains a good thing!
      
      (v2...: rebase, rerun spatch)
      v3: Compile on msm, spotted a manual fixup that I broke.
      v4: Try again for msm, sorry Daniel
      
      coccinelle script:
      @@
      
      @@
      - struct fence
      + struct dma_fence
      @@
      
      @@
      - struct fence_ops
      + struct dma_fence_ops
      @@
      
      @@
      - struct fence_cb
      + struct dma_fence_cb
      @@
      
      @@
      - struct fence_array
      + struct dma_fence_array
      @@
      
      @@
      - enum fence_flag_bits
      + enum dma_fence_flag_bits
      @@
      
      @@
      (
      - fence_init
      + dma_fence_init
      |
      - fence_release
      + dma_fence_release
      |
      - fence_free
      + dma_fence_free
      |
      - fence_get
      + dma_fence_get
      |
      - fence_get_rcu
      + dma_fence_get_rcu
      |
      - fence_put
      + dma_fence_put
      |
      - fence_signal
      + dma_fence_signal
      |
      - fence_signal_locked
      + dma_fence_signal_locked
      |
      - fence_default_wait
      + dma_fence_default_wait
      |
      - fence_add_callback
      + dma_fence_add_callback
      |
      - fence_remove_callback
      + dma_fence_remove_callback
      |
      - fence_enable_sw_signaling
      + dma_fence_enable_sw_signaling
      |
      - fence_is_signaled_locked
      + dma_fence_is_signaled_locked
      |
      - fence_is_signaled
      + dma_fence_is_signaled
      |
      - fence_is_later
      + dma_fence_is_later
      |
      - fence_later
      + dma_fence_later
      |
      - fence_wait_timeout
      + dma_fence_wait_timeout
      |
      - fence_wait_any_timeout
      + dma_fence_wait_any_timeout
      |
      - fence_wait
      + dma_fence_wait
      |
      - fence_context_alloc
      + dma_fence_context_alloc
      |
      - fence_array_create
      + dma_fence_array_create
      |
      - to_fence_array
      + to_dma_fence_array
      |
      - fence_is_array
      + dma_fence_is_array
      |
      - trace_fence_emit
      + trace_dma_fence_emit
      |
      - FENCE_TRACE
      + DMA_FENCE_TRACE
      |
      - FENCE_WARN
      + DMA_FENCE_WARN
      |
      - FENCE_ERR
      + DMA_FENCE_ERR
      )
       (
       ...
       )
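  For reference, a tree-wide mechanical rename like this is typically
  applied with spatch; an illustrative invocation (the .cocci file name
  here is hypothetical) would be:

      spatch --sp-file dma-fence-rename.cocci --in-place --dir .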
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
  Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
  Acked-by: Christian König <christian.koenig@amd.com>
  Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
16. 09 September 2016 (7 commits)
17. 22 August 2016 (1 commit)
• drm/i915: Ensure consistent control flow __i915_gem_active_get_rcu · c75870d8
  By Daniel Vetter
  The issue here is (I think) purely theoretical, since a compiler would
  need to be especially foolish to recompute the value of
  i915_gem_request_completed right after it was already used. Hence the
  additional barrier() is not really a restriction either.

  But I believe it to be at least permissible, and since our RCU
  trickery is a beast, it is worth annotating all the corner cases.
  Chris proposed instead to just wrap a READ_ONCE around
  request->fence.seqno in i915_gem_request_completed. But that has a
  measurable impact on code size, and everywhere we hold a full
  reference to the underlying request it is also not needed. Personally,
  I'd like to have just enough barriers and locking for correctness, but
  no more: that makes it much easier in the future to understand what's
  going on.
      
  Since the busy ioctl has now fully embraced its races, there's no
  point annotating it there too. We really only need it in
  active_get_rcu, since that function _must_ deliver a correct snapshot
  of the active fences (and not chase something else).
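  A minimal sketch of the annotation in question (a paraphrase, not the
  exact driver code):

      request = rcu_dereference(active->request);
      if (!request || i915_gem_request_completed(request))
              return NULL;

      /* Compiler barrier: prevent the completion test above from being
       * recomputed after the request has been handed out for use. */
      barrier();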
      
      v2: Polish the comment a bit more (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
  Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
  Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1471856122-466-1-git-send-email-daniel.vetter@ffwll.ch
18. 15 August 2016 (2 commits)
19. 10 August 2016 (1 commit)
20. 09 August 2016 (2 commits)