1. 17 May 2017, 2 commits
    • drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Committed by Chris Wilson
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities cannot
      exceed the number of active requests and is typically only a few.
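
      A minimal sketch of that layout, with illustrative field names rather
      than the exact i915 structures:

      #include <linux/rbtree.h>
      #include <linux/list.h>

      struct priolist {
              struct rb_node node;       /* engine's rbtree, keyed by priority */
              struct list_head requests; /* FIFO of requests at this priority */
              int priority;              /* key; 0 is the preallocated default */
      };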
      
      Currently, we have ~2k possible different priority levels, that may
      increase to allow even more fine grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      upon first use, and freeing after its requests are depleted. To avoid
      the possibility of an allocation failure causing us to lose a request,
      we preallocate the default priority (0) and bump any request to that
      priority if we fail to allocate the appropriate plist for it. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution for handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in FIFO order, and we
      ensure that this request and its dependencies stay in strict FIFO (even
      though the code doesn't realise it is only a single list). Normal
      scheduling is restored once we know the device is idle, until the next
      failure!
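
      A hedged sketch of that fallback path; the helper and field names below
      are hypothetical, not the driver's actual API:

      static struct priolist *lookup_priolist(struct intel_engine *engine, int prio)
      {
              struct priolist *p;

              if (engine->no_priolist)
                      prio = 0; /* an earlier failure forced strict FIFO */

              /* find_or_insert_priolist() is a hypothetical rbtree helper */
              p = find_or_insert_priolist(&engine->queue, prio);
              if (!p) {
                      p = &engine->default_priolist; /* preallocated, cannot fail */
                      engine->no_priolist = true;    /* cleared once the engine idles */
              }
              return p;
      }
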
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
    • drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Committed by Chris Wilson
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important change there is the improvement to
      intel_lrc_irq_handler and execlists_submit_ports (a net improvement,
      since execlists_update_context is now inlined).
      
      v2: Use the port_api() for guc as well (even though we do not pack any
      counters in there yet) and hide all port->request_count accesses inside
      the helpers.
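
      For illustration, this style of pointer packing usually looks like the
      sketch below; the two-bit count width and these helper names are
      assumptions, not the driver's exact helpers:

      #define PORT_COUNT_MASK 0x3ul /* requests are at least 4-byte aligned */

      static inline unsigned long port_pack(void *request, unsigned long count)
      {
              /* stash a small submission count in the free low bits */
              return (unsigned long)request | (count & PORT_COUNT_MASK);
      }

      static inline void *port_unpack(unsigned long packed, unsigned long *count)
      {
              if (count)
                      *count = packed & PORT_COUNT_MASK;
              return (void *)(packed & ~PORT_COUNT_MASK);
      }
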
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
  2. 12 May 2017, 1 commit
  3. 04 May 2017, 1 commit
  4. 28 Apr 2017, 1 commit
    • drm/i915: Sanitize engine context sizes · 63ffbcda
      Committed by Joonas Lahtinen
      Pre-calculate engine context size based on engine class and device
      generation and store it in the engine instance.
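
      As a rough sketch of what pre-calculating by class and generation
      amounts to (the sizes, constants, and names below are placeholders, not
      the driver's actual values):

      static u32 engine_context_size(unsigned int class, unsigned int gen)
      {
              switch (class) {
              case RENDER_CLASS: /* placeholder constant */
                      /* render contexts grow across generations; sizes assumed */
                      return (gen >= 9 ? 22 : 20) * PAGE_SIZE;
              default:
                      return 2 * PAGE_SIZE; /* non-render engines: assumed */
              }
      }

      /* stored once in the engine instance at init, after MMIO setup */
      engine->context_size = engine_context_size(engine->class, gen);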
      
      v2:
      - Squash and get rid of hw_context_size (Chris)
      
      v3:
      - Move after MMIO init for probing on Gen7 and 8 (Chris)
      - Retained rounding (Tvrtko)
      
      v4:
      - Rebase for deferred legacy context allocation
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: intel-gvt-dev@lists.freedesktop.org
      Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
  5. 25 Apr 2017, 2 commits
  6. 12 Apr 2017, 2 commits
  7. 03 Apr 2017, 1 commit
    • drm/i915: intel_ring.engine is unused · d822bb18
      Committed by Chris Wilson
      Or rather, it is used only by intel_ring_pin() to extract the
      drm_i915_private, which we can just as easily pass in. As this is a
      relatively rare operation, save the space in the struct; the extra code
      for passing the parameter around means the change roughly breaks even:
      
      add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0)
      function                                     old     new   delta
      intel_init_ring_buffer                       906     918     +12
      execlists_context_pin                       1308    1311      +3
      mock_engine                                  407     403      -4
      intel_engine_create_ring                     367     363      -4
      intel_ring_pin                               326     319      -7
      Total: Before=1261794, After=1261794, chg +0.00%
      
      v2: Reorder intel_init_ring_buffer to keep the ring setup together:
      
      add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6)
      function                                     old     new   delta
      intel_init_ring_buffer                       906     912      +6
      execlists_context_pin                       1308    1311      +3
      mock_engine                                  407     403      -4
      intel_engine_create_ring                     367     363      -4
      intel_ring_pin                               326     319      -7
      Total: Before=1261794, After=1261788, chg -0.00%
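
      Schematically, the change trades the stored back-pointer for an explicit
      parameter (signatures assumed, not verbatim):

      /* before: intel_ring kept a back-pointer used only by this call */
      int intel_ring_pin(struct intel_ring *ring);

      /* after: the caller passes the device in; the struct field is dropped */
      int intel_ring_pin(struct intel_ring *ring, struct drm_i915_private *i915);
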
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
  8. 31 Mar 2017, 1 commit
  9. 29 Mar 2017, 3 commits
  10. 27 Mar 2017, 4 commits
  11. 24 Mar 2017, 1 commit
  12. 21 Mar 2017, 4 commits
  13. 20 Mar 2017, 1 commit
  14. 17 Mar 2017, 4 commits
  15. 16 Mar 2017, 1 commit
    • drm/i915/scheduler: emulate a scheduler for guc · 31de7350
      Committed by Chris Wilson
      This emulates execlists on top of the GuC in order to defer submission of
      requests to the hardware. This deferral allows time for high-priority
      requests to gazump their way to the head of the queue; however, it nerfs
      the GuC by converting it back into a simple execlist (where the CPU has
      to wake up after every request to feed new commands into the GuC).
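
      A hedged sketch of that deferral; the lock and helper names here are
      illustrative (the timeline lock is the one the notes below refer to):

      static void guc_submit_request(struct drm_i915_gem_request *rq)
      {
              struct intel_engine_cs *engine = rq->engine;
              unsigned long flags;

              /* queue the request instead of writing it to the GuC directly */
              spin_lock_irqsave(&engine->timeline->lock, flags);
              insert_request(engine, rq); /* hypothetical: into the priority queue */
              spin_unlock_irqrestore(&engine->timeline->lock, flags);

              /* the tasklet later dequeues in priority order and feeds the GuC */
              tasklet_hi_schedule(&engine->irq_tasklet);
      }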
      
      v2: Drop hack status - though iirc there is still a lockdep inversion
      between fence and engine->timeline->lock (which is impossible as the
      nesting only occurs on different fences - hopefully just requires some
      judicious lockdep annotation)
      v3: Apply lockdep nesting to enabling signaling on the request, using
      the pattern we already have in __i915_gem_request_submit().
      v4: Replaying requests after a hang also now needs the timeline
      spinlock, to disable the interrupts at least
      v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal
      v6: Reorder interrupt checking for a happier gcc.
      v7: Only signal the tasklet after a user-interrupt if using guc scheduling
      v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko)
      v9: Avoid re-initialising the engine->irq_tasklet from inside a reset
      v10: Hook up the execlists-style tracepoints
      v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.uk
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
  16. 03 Mar 2017, 3 commits
  17. 27 Feb 2017, 1 commit
  18. 24 Feb 2017, 1 commit
  19. 21 Feb 2017, 2 commits
  20. 20 Feb 2017, 1 commit
  21. 17 Feb 2017, 2 commits
  22. 16 Feb 2017, 1 commit