1. 06 Oct, 2017 1 commit
  2. 25 Sep, 2017 3 commits
  3. 22 Sep, 2017 1 commit
  4. 18 Sep, 2017 4 commits
  5. 13 Sep, 2017 4 commits
  6. 19 Jun, 2017 1 commit
  7. 31 May, 2017 1 commit
  8. 23 May, 2017 1 commit
  9. 19 May, 2017 3 commits
  10. 17 May, 2017 3 commits
    • drm/i915: Create a kmem_cache to allocate struct i915_priolist from · c5cf9a91
      Committed by Chris Wilson
      The i915_priolist are allocated within an atomic context on a path where
      we wish to minimise latency. If we use a dedicated kmem_cache, we have
      the advantage of a local freelist from which to service new requests
      that should keep the latency impact of an allocation small. Though
      currently we expect the majority of requests to be at default priority
      (and so hit the preallocated priolist), once userspace starts using
      priorities it is likely to use many fine-grained policies, improving
      the utilisation of a private slab.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
    • drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Committed by Chris Wilson
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they form a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities cannot
      exceed the number of active requests and should typically be only a few.
      
      Currently, we have ~2k possible different priority levels, which may
      increase to allow even more fine-grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      a level upon first use, and freeing it once its requests are depleted.
      To avoid the possibility of an allocation failure causing us to lose a
      request, we preallocate the default priority (0) and bump any request to
      that priority if we fail to allocate the appropriate plist for it.
      Having a request (that is ready to run, so not leading to corruption)
      execute out-of-order is better than leaking the request (and its
      dependency tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in FIFO order, and
      we ensure that this request and its dependencies stay in strict FIFO
      (even when the code doesn't realise it is only a single list). Normal
      scheduling is restored once we know the device is idle, until the next
      failure!
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
    • drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Committed by Chris Wilson
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important change there is the improvement to
      intel_lrc_irq_handler and execlists_submit_ports (a net improvement,
      since execlists_update_context is now inlined).
      
      v2: Use the port_api() for guc as well (even though we do not currently
      pack any counters in there) and hide all port->request_count accesses
      inside the helpers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
  11. 28 Apr, 2017 1 commit
    • drm/i915: Sanitize engine context sizes · 63ffbcda
      Committed by Joonas Lahtinen
      Pre-calculate engine context size based on engine class and device
      generation and store it in the engine instance.
      
      v2:
      - Squash and get rid of hw_context_size (Chris)
      
      v3:
      - Move after MMIO init for probing on Gen7 and 8 (Chris)
      - Retained rounding (Tvrtko)
      
      v4:
      - Rebase for deferred legacy context allocation
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: intel-gvt-dev@lists.freedesktop.org
      Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
  12. 26 Apr, 2017 1 commit
  13. 25 Apr, 2017 1 commit
  14. 29 Mar, 2017 1 commit
    • Revert "drm/i915: Skip execlists_dequeue() early if the list is empty" · 18afa288
      Committed by Chris Wilson
      This reverts commit 6c943de6 ("drm/i915: Skip execlists_dequeue()
      early if the list is empty").
      
      The validity of using READ_ONCE there depends upon having a mb to
      coordinate the assignment of engine->execlist_first inside
      submit_request() and checking prior to taking the spinlock in
      execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
      memory barrier making this cross-cpu checking safe", but failed to
      notice that this mb was *conditional* on the execlists being ready, i.e.
      there wasn't the required mb when it was most necessary!
      
      We could install an unconditional memory barrier to fixup the
      READ_ONCE():
      
      diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
      index 7dd732cb9f57..1ed164b16d44 100644
      --- a/drivers/gpu/drm/i915/intel_lrc.c
      +++ b/drivers/gpu/drm/i915/intel_lrc.c
      @@ -616,6 +616,7 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
      
              if (insert_request(&request->priotree, &engine->execlist_queue)) {
                      engine->execlist_first = &request->priotree.node;
      +               smp_wmb();
                      if (execlists_elsp_ready(engine))
      But we have opted to remove the race, as it should rarely be effective,
      and doing so saves us having to explain the necessary memory barriers,
      which we quite clearly failed at.
      Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Fixes: 6c943de6 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
  15. 27 Mar, 2017 1 commit
  16. 24 Mar, 2017 1 commit
  17. 23 Mar, 2017 10 commits
  18. 21 Mar, 2017 2 commits