1. 13 9月, 2017 4 次提交
  2. 19 6月, 2017 1 次提交
  3. 31 5月, 2017 1 次提交
  4. 23 5月, 2017 1 次提交
  5. 19 5月, 2017 3 次提交
  6. 17 5月, 2017 3 次提交
    • C
      drm/i915: Create a kmem_cache to allocate struct i915_priolist from · c5cf9a91
      Chris Wilson 提交于
      The i915_priolist are allocated within an atomic context on a path where
      we wish to minimise latency. If we use a dedicated kmem_cache, we have
      the advantage of a local freelist from which to service new requests
      that should keep the latency impact of an allocation small. Though
      currently we expect the majority of requests to be at default priority
      (and so hit the preallocate priolist), once userspace starts using
      priorities they are likely to use many fine grained policies improving
      the utilisation of a private slab.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
      c5cf9a91
    • C
      drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Chris Wilson 提交于
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities can not
      exceed the number of active requests and should be typically only a few.
      
      Currently, we have ~2k possible different priority levels, that may
      increase to allow even more fine grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      upon first use, and freeing after its requests are depleted. To avoid
      the possibility of an allocation failure causing us to lose a request,
      we preallocate the default priority (0) and bump any request to that
      priority if we fail to allocate it the appropriate plist. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in fifo and we will
      ensure that this request and its dependencies are in strict fifo (even
      when it doesn't realise it is only a single list). Normal scheduling is
      restored once we know the device is idle, until the next failure!
      Suggested-by: NMichał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
      6c067579
    • C
      drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Chris Wilson 提交于
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important change there is the improve to the
      intel_lrc_irq_handler and excclist_submit_ports (net improvement since
      execlists_update_context is now inlined).
      
      v2: Use the port_api() for guc as well (even though currently we do not
      pack any counters in there, yet) and hide all port->request_count inside
      the helpers.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
      77f0d0e9
  7. 28 4月, 2017 1 次提交
    • J
      drm/i915: Sanitize engine context sizes · 63ffbcda
      Joonas Lahtinen 提交于
      Pre-calculate engine context size based on engine class and device
      generation and store it in the engine instance.
      
      v2:
      - Squash and get rid of hw_context_size (Chris)
      
      v3:
      - Move after MMIO init for probing on Gen7 and 8 (Chris)
      - Retained rounding (Tvrtko)
      v4:
      - Rebase for deferred legacy context allocation
      Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
      Cc: intel-gvt-dev@lists.freedesktop.org
      Acked-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      63ffbcda
  8. 26 4月, 2017 1 次提交
  9. 25 4月, 2017 1 次提交
  10. 29 3月, 2017 1 次提交
    • C
      Revert "drm/i915: Skip execlists_dequeue() early if the list is empty" · 18afa288
      Chris Wilson 提交于
      This reverts commit 6c943de6 ("drm/i915: Skip execlists_dequeue()
      early if the list is empty").
      
      The validity of using READ_ONCE there depends upon having a mb to
      coordinate the assignment of engine->execlist_first inside
      submit_request() and checking prior to taking the spinlock in
      execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
      memory barrier making this cross-cpu checking safe", but failed to
      notice that this mb was *conditional* on the execlists being ready, i.e.
      there wasn't the required mb when it was most necessary!
      
      We could install an unconditional memory barrier to fixup the
      READ_ONCE():
      
      diff --git a/drivers/gpu/drm/i915/intel_lrc.c
      b/drivers/gpu/drm/i915/intel_lrc.c
      index 7dd732cb9f57..1ed164b16d44 100644
      --- a/drivers/gpu/drm/i915/intel_lrc.c
      +++ b/drivers/gpu/drm/i915/intel_lrc.c
      @@ -616,6 +616,7 @@ static void execlists_submit_request(struct
      drm_i915_gem_request *request)
      
              if (insert_request(&request->priotree, &engine->execlist_queue))
      {
                      engine->execlist_first = &request->priotree.node;
      +               smp_wmb();
                      if (execlists_elsp_ready(engine))
      
      But we have opted to remove the race as it should be rarely effective,
      and saves us having to explain the necessary memory barriers which we
      quite clearly failed at.
      Reported-and-tested-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Fixes: 6c943de6 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      18afa288
  11. 27 3月, 2017 1 次提交
  12. 24 3月, 2017 1 次提交
  13. 23 3月, 2017 10 次提交
  14. 21 3月, 2017 2 次提交
  15. 17 3月, 2017 2 次提交
  16. 16 3月, 2017 1 次提交
    • C
      drm/i915/scheduler: emulate a scheduler for guc · 31de7350
      Chris Wilson 提交于
      This emulates execlists on top of the GuC in order to defer submission of
      requests to the hardware. This deferral allows time for high priority
      requests to gazump their way to the head of the queue, however it nerfs
      the GuC by converting it back into a simple execlist (where the CPU has
      to wake up after every request to feed new commands into the GuC).
      
      v2: Drop hack status - though iirc there is still a lockdep inversion
      between fence and engine->timeline->lock (which is impossible as the
      nesting only occurs on different fences - hopefully just requires some
      judicious lockdep annotation)
      v3: Apply lockdep nesting to enabling signaling on the request, using
      the pattern we already have in __i915_gem_request_submit();
      v4: Replaying requests after a hang also now needs the timeline
      spinlock, to disable the interrupts at least
      v5: Hold wq lock for completeness, and emit a tracepoint for enabling signal
      v6: Reorder interrupt checking for a happier gcc.
      v7: Only signal the tasklet after a user-interrupt if using guc scheduling
      v8: Restore lost update of rq through the i915_guc_irq_handler (Tvrtko)
      v9: Avoid re-initialising the engine->irq_tasklet from inside a reset
      v10: Hook up the execlists-style tracepoints
      v11: Clear the execlists irq_posted bit after taking over the interrupt/tasklet
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170316125619.6856-1-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
      31de7350
  17. 15 3月, 2017 1 次提交
  18. 13 3月, 2017 1 次提交
  19. 12 3月, 2017 2 次提交
  20. 10 3月, 2017 1 次提交
  21. 02 3月, 2017 1 次提交
    • C
      drm/i915/guc: Disable irq for __i915_guc_submit wq_lock · 25afdf89
      Chris Wilson 提交于
      __i915_guc_submit may be, despite my assertion, called from outside of
      an irq-safe spinlock so we need to use a full spin_lock_irqsave and not
      cheat using a spin_lock. (The initial notify callback from the completed
      fence is called before the spinlock is taken to wake up all waiters and
      call their callbacks.)
      
      [   48.166581] kernel BUG at drivers/gpu/drm/i915/i915_guc_submission.c:527!
      [   48.166617] invalid opcode: 0000 [#1] PREEMPT SMP
      [   48.166644] Modules linked in: i915 prime_numbers x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mei_me mei i2c_i801 netconsole i2c_hid [last unloaded: i915]
      [   48.166733] CPU: 2 PID: 5 Comm: kworker/u8:0 Tainted: G     U          4.10.0nightly-170302-guc_scrub+ #19
      [   48.166778] Hardware name:                  /NUC6i5SYB, BIOS SYSKLi35.86A.0054.2016.0930.1102 09/30/2016
      [   48.166835] Workqueue: i915 __intel_autoenable_gt_powersave [i915]
      [   48.166865] task: ffff88084ab7cf40 task.stack: ffffc90000064000
      [   48.166921] RIP: 0010:__i915_guc_submit+0x1e6/0x2a0 [i915]
      [   48.166953] RSP: 0018:ffffc90000067c80 EFLAGS: 00010202
      [   48.166979] RAX: 0000000000000202 RBX: ffff8808465e0c68 RCX: 0000000000000201
      [   48.167016] RDX: 0000000080000201 RSI: ffff88084ab7d798 RDI: ffff88082b8a8040
      [   48.167054] RBP: ffffc90000067cd8 R08: 0000000000000001 R09: 0000000000000000
      [   48.167085] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88082b8a8148
      [   48.167126] R13: 0000000000000000 R14: ffff88082f440000 R15: ffff88082e85e660
      [   48.167156] FS:  0000000000000000(0000) GS:ffff88086ed00000(0000) knlGS:0000000000000000
      [   48.167195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.167226] CR2: 000055862ffcdc2c CR3: 0000000001e0f000 CR4: 00000000003406e0
      [   48.167257] Call Trace:
      [   48.168112]  ? trace_hardirqs_on+0xd/0x10
      [   48.168966]  ? _raw_spin_unlock_irqrestore+0x4a/0x80
      [   48.169831]  i915_guc_submit+0x1a/0x20 [i915]
      [   48.170680]  submit_notify+0x89/0xc0 [i915]
      [   48.171512]  __i915_sw_fence_complete+0x175/0x220 [i915]
      [   48.172340]  i915_sw_fence_complete+0x2a/0x50 [i915]
      [   48.173158]  i915_sw_fence_commit+0x21/0x30 [i915]
      [   48.173968]  __i915_add_request+0x238/0x530 [i915]
      [   48.174764]  __intel_autoenable_gt_powersave+0x8b/0xb0 [i915]
      [   48.175549]  process_one_work+0x218/0x690
      [   48.176318]  ? process_one_work+0x197/0x690
      [   48.177183]  worker_thread+0x4e/0x4a0
      [   48.178039]  kthread+0x10c/0x140
      [   48.178878]  ? process_one_work+0x690/0x690
      [   48.179718]  ? kthread_create_on_node+0x40/0x40
      [   48.180568]  ret_from_fork+0x31/0x40
      [   48.181423] Code: 02 00 00 43 89 84 ae 50 11 00 00 e8 75 01 62 e1 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c1 e0 20 48 09 c2 49 89 d0 eb 82 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 49 c1 e8 20 44 89 43 34 4a
      [   48.183336] RIP: __i915_guc_submit+0x1e6/0x2a0 [i915] RSP: ffffc90000067c80
      Reported-by: NArkadiusz Hiler <arkadiusz.hiler@intel.com>
      Fixes: 349ab919 ("drm/i915/guc: Make wq_lock irq-safe")
      Fixes: 67b807a8 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170302145323.12886-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NArkadiusz Hiler <arkadiusz.hiler@intel.com>
      Tested-by: NArkadiusz Hiler <arkadiusz.hiler@intel.com>
      25afdf89