1. 27 Apr 2019, 1 commit
  2. 25 Apr 2019, 2 commits
    • C
      drm/i915: Invert the GEM wakeref hierarchy · 79ffac85
      Committed by Chris Wilson
      In the current scheme, on submitting a request we take a single global
      GEM wakeref, which trickles down to wake up all GT power domains. This
      is undesirable as we would like to be able to localise our power
      management to the available power domains and to remove the global GEM
      operations from the heart of the driver. (The intent there is to push
      global GEM decisions to the boundary as used by the GEM user interface.)
      
      Now during request construction, each request is responsible via its
      logical context to acquire a wakeref on each power domain it intends to
      utilize. Currently, each request takes a wakeref on the engine(s) and
      the engines themselves take a chipset wakeref. This gives us a
      transition on each engine which we can extend if we want to insert more
      power management control (such as soft rc6). The global GEM operations
      that currently require a struct_mutex are reduced to listening to pm
      events from the chipset GT wakeref. As we reduce the struct_mutex
      requirement, these listeners should evaporate.
      
      Perhaps the biggest immediate change is that this removes the
      struct_mutex requirement around GT power management, allowing us greater
      flexibility in request construction. Another important knock-on effect,
      is that by tracking engine usage, we can insert a switch back to the
      kernel context on that engine immediately, avoiding any extra delay or
      inserting global synchronisation barriers. This makes tracking when an
      engine and its associated contexts are idle much easier -- important for
      when we forgo our assumed execution ordering and need idle barriers to
      unpin used contexts. In the process, it means we remove a large chunk of
      code whose only purpose was to switch back to the kernel context.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
      79ffac85
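The inverted hierarchy described above can be modelled as nested refcounts: each engine keeps its own wakeref, and only the first engine acquisition takes the chipset (GT) wakeref. The following userspace sketch is illustrative only; the struct and function names are hypothetical, not the driver's actual API.

```c
#include <assert.h>

/* Toy model of the inverted wakeref hierarchy: the first engine user
 * wakes the chipset, the last one lets it sleep again. */
struct gt_power { int wakeref; };
struct engine { struct gt_power *gt; int wakeref; };

static void gt_get(struct gt_power *gt) { gt->wakeref++; }
static void gt_put(struct gt_power *gt) { assert(gt->wakeref > 0); gt->wakeref--; }

static void engine_pm_get(struct engine *e)
{
	if (e->wakeref++ == 0)	/* first user takes the chipset wakeref */
		gt_get(e->gt);
}

static void engine_pm_put(struct engine *e)
{
	assert(e->wakeref > 0);
	if (--e->wakeref == 0)	/* last user drops the chipset wakeref */
		gt_put(e->gt);
}
```

Each request would call `engine_pm_get()` on construction and `engine_pm_put()` on retirement, so the GT wakeref trickles up from actual engine usage rather than down from a single global GEM wakeref.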
    • C
      drm/i915: Pass intel_context to i915_request_create() · 2ccdf6a1
      Committed by Chris Wilson
      Start acquiring the logical intel_context and using that as our primary
      means for request allocation. This is the initial step to allow us to
      avoid requiring struct_mutex for request allocation along the
      perma-pinned kernel context, but it also provides a foundation for
      breaking up the complex request allocation to handle different scenarios
      inside execbuf.
      
      For the purpose of emitting a request from inside retirement (see the
      next patch for engine power management), we also need to lift control
      over the timeline mutex to the caller.
      
      v2: Note that the request carries the active reference upon construction.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk
      2ccdf6a1
  3. 11 Apr 2019, 1 commit
    • C
      drm/i915: Bump ready tasks ahead of busywaits · b7404c7e
      Committed by Chris Wilson
      Consider two tasks that are running in parallel on a pair of engines
      (vcs0, vcs1), but then must complete on a shared engine (rcs0). To
      maximise throughput, we want to run the first ready task on rcs0 (i.e.
      the first task that completes on either of vcs0 or vcs1). When using
      semaphores, however, we will instead queue onto rcs in submission order.
      
      To resolve this incorrect ordering, we want to re-evaluate the priority
      queue as each request becomes ready. Normally this happens because
      we only insert into the priority queue requests that are ready, but with
      semaphores we are inserting ahead of their readiness and to compensate
      we penalize those tasks with reduced priority (so that tasks that do not
      need to busywait should naturally be run first). However, given a series
      of tasks that each use semaphores, the queue degrades into submission
      fifo rather than readiness fifo, and so to counter this we give a small
      boost to semaphore users as their dependent tasks are completed (and so
      we no longer require any busywait prior to running the user task as they
      are then ready themselves).
      
      v2: Fixup irqsave for schedule_lock (Tvrtko)
      
      Testcase: igt/gem_exec_schedule/semaphore-codependency
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
      b7404c7e
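The penalty-then-boost scheme above can be sketched as a pair of priority adjustments: semaphore users are enqueued with a reduced priority, and recover it once their dependencies complete and no busywait is needed. Constants and names below are hypothetical, chosen only to illustrate the mechanism.

```c
#include <assert.h>

/* Sketch: deprioritize requests that would busywait on a semaphore,
 * and bump them back once their dependencies have completed. */
enum { PRIO_SEMAPHORE_PENALTY = 1, PRIO_READY_BOOST = 1 };

struct sched_request {
	int prio;
	int uses_semaphore;
};

static void submit(struct sched_request *rq, int base_prio)
{
	rq->prio = base_prio;
	if (rq->uses_semaphore)
		rq->prio -= PRIO_SEMAPHORE_PENALTY; /* may need to busywait */
}

static void dependencies_completed(struct sched_request *rq)
{
	if (rq->uses_semaphore) {
		rq->prio += PRIO_READY_BOOST; /* now truly ready, no busywait */
		rq->uses_semaphore = 0;
	}
}
```

With this, a chain of semaphore users no longer degrades the queue into submission fifo: each task regains normal priority exactly when it becomes ready.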
  4. 05 Apr 2019, 1 commit
  5. 22 Mar 2019, 1 commit
    • C
      drm/i915: Allow contexts to share a single timeline across all engines · ea593dbb
      Committed by Chris Wilson
      Previously, our view has been always to run the engines independently
      within a context. (Multiple engines happened before we had contexts and
      timelines, so they always operated independently and that behaviour
      persisted into contexts.) However, at the user level the context often
      represents a single timeline (e.g. GL contexts) and userspace must
      ensure that the individual engines are serialised to present that
      ordering to the client (or forget about this detail entirely and hope no
      one notices - a fair ploy if the client can only directly control one
      engine themselves ;)
      
      In the next patch, we will want to construct a set of engines that
      operate as one, that have a single timeline interwoven between them, to
      present a single virtual engine to the user. (They submit to the virtual
      engine, and we then decide which engine to execute on.)
      
      To that end, we want to be able to create contexts which have a single
      timeline (fence context) shared between all engines, rather than multiple
      timelines.
      
      v2: Move the specialised timeline ordering to its own function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
      ea593dbb
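A single shared timeline means every request in the context draws its seqno from one fence context, whatever engine it targets, giving a total order. A minimal model, with hypothetical names:

```c
#include <assert.h>

/* All engines of a context draw seqnos from one timeline, so requests
 * are totally ordered regardless of which engine executes them. */
struct shared_timeline { unsigned int fence_context; unsigned int seqno; };

struct tl_request { unsigned int fence_context; unsigned int seqno; int engine; };

static struct tl_request tl_request_create(struct shared_timeline *tl, int engine)
{
	struct tl_request rq = { tl->fence_context, ++tl->seqno, engine };
	return rq;
}
```

Requests on different engines then share a fence context and carry strictly increasing seqnos, which is exactly the serialisation userspace previously had to enforce by hand.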
  6. 06 Mar 2019, 1 commit
  7. 02 Mar 2019, 2 commits
    • C
      drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ · e8861964
      Committed by Chris Wilson
      Having introduced per-context seqno, we now have a means to identify
      progress across the system without fear of rollback, as befell the
      global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
      advance of submission safe in the knowledge that our target seqno and
      address is stable.
      
      However, since we are telling the GPU to busy-spin on the target address
      until it matches the signaling seqno, we only want to do so when we are
      sure that busy-spin will be completed quickly. To achieve this we only
      submit the request to HW once the signaler is itself executing (modulo
      preemption causing us to wait longer), and we only do so for default and
      above priority requests (so that idle priority tasks never themselves
      hog the GPU waiting for others).
      
      As might be reasonably expected, HW semaphores excel in inter-engine
      synchronisation microbenchmarks (where the 3x reduced latency / increased
      throughput more than offset the power cost of spinning on a second ring)
      and have significant improvement (can be up to ~10%, most see no change)
      for single clients that utilize multiple engines (typically media players
      and transcoders), without regressing multiple clients that can saturate
      the system or changing the power envelope dramatically.
      
      v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
      v4: Tell the world and include it as part of scheduler caps.
      
      Testcase: igt/gem_exec_whisper
      Testcase: igt/benchmarks/gem_wsim
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
      e8861964
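The condition MI_SEMAPHORE_WAIT polls for can be emulated in userspace: spin until the value at the signaler's HWSP address passes the target seqno, using a wraparound-safe comparison in the style of the driver's i915_seqno_passed(). The polling loop itself is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe seqno comparison, as used throughout i915. */
static int seqno_passed(uint32_t seqno, uint32_t target)
{
	return (int32_t)(seqno - target) >= 0;
}

/* Busy-spin on the signaler's HWSP until the target seqno appears,
 * giving up after a bounded number of spins. */
static int semaphore_wait(const volatile uint32_t *hwsp, uint32_t target,
			  unsigned long spins)
{
	while (spins--) {
		if (seqno_passed(*hwsp, target))
			return 0;	/* signaler has executed far enough */
	}
	return -1;	/* gave up: a real waiter would yield or sleep here */
}
```

The signed subtraction is what makes the check robust across seqno wraparound, which is why per-context seqnos (with no global rollback) make pre-programmed semaphore waits safe.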
    • C
      drm/i915: Keep timeline HWSP allocated until idle across the system · ebece753
      Committed by Chris Wilson
      In preparation for enabling HW semaphores, we need to keep the in-flight
      timeline HWSP alive until its use across the entire system has completed,
      as any other timeline active on the GPU may still refer back to the
      already retired timeline. We have to delay both recycling available
      cachelines and unpinning the old HWSP until the next idle point.
      
      An easy option would be to simply keep all used HWSP until the system as
      a whole was idle, i.e. we could release them all at once on parking.
      However, on a busy system, we may never see a global idle point,
      essentially meaning the resource will be leaked until we are forced to
      do a GC pass. We already employ a fine-grained idle detection mechanism
      for vma, which we can reuse here so that each cacheline can be freed
      immediately after the last request using it is retired.
      
      v3: Keep track of the activity of each cacheline.
      v4: cacheline_free() on canceling the seqno tracking
      v5: Finally with a testcase to exercise wraparound
      v6: Pack cacheline into empty bits of page-aligned vaddr
      v7: Use i915_utils to hide the pointer casting around bit manipulation
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk
      ebece753
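The v6 note, packing the cacheline into the empty bits of a page-aligned vaddr, relies on the low bits of an aligned pointer being zero. The ptr_pack_bits()/ptr_unpack_bits() helpers in i915_utils.h hide exactly this kind of cast; the functions below are simplified stand-ins, not the real helpers.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BITS 6	/* 64 B cachelines: 64 per 4 KiB page */

/* A cacheline-aligned vaddr has CACHELINE_BITS free low bits, so a
 * small index fits inside the pointer itself. */
static void *pack_cacheline(void *vaddr, unsigned int cacheline)
{
	assert(((uintptr_t)vaddr & ((1u << CACHELINE_BITS) - 1)) == 0);
	return (void *)((uintptr_t)vaddr | cacheline);
}

static unsigned int unpack_cacheline(void *packed)
{
	return (uintptr_t)packed & ((1u << CACHELINE_BITS) - 1);
}

static void *unpack_vaddr(void *packed)
{
	return (void *)((uintptr_t)packed &
			~(uintptr_t)((1u << CACHELINE_BITS) - 1));
}
```

This keeps the per-timeline bookkeeping to a single pointer-sized field while still recording which HWSP cacheline the timeline owns.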
  8. 28 Feb 2019, 1 commit
  9. 26 Feb 2019, 1 commit
  10. 06 Feb 2019, 1 commit
  11. 30 Jan 2019, 2 commits
    • C
      drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
      Committed by Chris Wilson
      A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
      thundering i915_wait_request herd"), the issue of handling multiple
      clients waiting in parallel was brought to our attention. The
      requirement was that every client should be woken immediately upon its
      request being signaled, without incurring any cpu overhead.
      
      Handling certain fragility of our hw meant that we could not do a
      simple check inside the irq handler (some generations required almost
      unbounded delays before we could be sure of seqno coherency) and so
      request completion checking required delegation.
      
      Before commit 688e6c72, the solution was simple. Every client
      waiting on a request would be woken on every interrupt and each would do
      a heavyweight check to see if their request was complete. Commit
      688e6c72 introduced an rbtree so that only the earliest waiter on
      the global timeline would be woken, and would wake the next and so on.
      (Along with various complications to handle requests being reordered
      along the global timeline, and also a requirement for kthread to provide
      a delegate for fence signaling that had no process context.)
      
      The global rbtree depends on knowing the execution timeline (and global
      seqno). Without knowing that order, we must instead check all contexts
      queued to the HW to see which may have advanced. We trim that list by
      only checking queued contexts that are being waited on, but still we
      keep a list of all active contexts and their active signalers that we
      inspect from inside the irq handler. By moving the waiters onto the fence
      signal list, we can combine the client wakeup with the dma_fence
      signaling (a dramatic reduction in complexity, but does require the HW
      being coherent, the seqno must be visible from the cpu before the
      interrupt is raised - we keep a timer backup just in case).
      
      Having previously fixed all the issues with irq-seqno serialisation (by
      inserting delays onto the GPU after each request instead of random delays
      on the CPU after each interrupt), we can rely on the seqno state to
      perform direct wakeups from the interrupt handler. This allows us to
      preserve our single context switch behaviour of the current routine,
      with the only downside that we lose the RT priority sorting of wakeups.
      In general, direct wakeup latency of multiple clients is about the same
      (about 10% better in most cases) with a reduction in total CPU time spent
      in the waiter (about 20-50% depending on gen). Average herd behaviour is
      improved, but at the cost of not delegating wakeups on task_prio.
      
      v2: Capture fence signaling state for error state and add comments to
      warm even the most cold of hearts.
      v3: Check if the request is still active before busywaiting
      v4: Reduce the amount of pointer misdirection with list_for_each_safe
      and using a local i915_request variable inside the loops
      v5: Add a missing pluralisation to a purely informative selftest message.
      
      References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
      52c0fdb2
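The per-context scheme above amounts to: on each interrupt, walk only the contexts that have waiters, signaling every fence whose seqno the context's HWSP shows as passed. A toy model, with invented structure and names:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WAITERS 8

struct breadcrumb_ctx {
	uint32_t hwsp_seqno;		/* written by the GPU */
	uint32_t waiters[MAX_WAITERS];	/* seqnos with sleepers attached */
	int nr_waiters, signaled;
};

static int bc_seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Called from the irq handler: signal completed fences, keep the rest. */
static void irq_signal_breadcrumbs(struct breadcrumb_ctx *ctx)
{
	int i, n = 0;

	for (i = 0; i < ctx->nr_waiters; i++) {
		if (bc_seqno_passed(ctx->hwsp_seqno, ctx->waiters[i]))
			ctx->signaled++;	/* dma_fence_signal() here */
		else
			ctx->waiters[n++] = ctx->waiters[i];
	}
	ctx->nr_waiters = n;
}
```

Because completion is checked against each context's own HWSP, no global execution order (or global seqno) is needed, only CPU-visible seqno coherency before the interrupt fires.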
    • C
      drm/i915: Identify active requests · 85474441
      Committed by Chris Wilson
      To allow requests to forgo a common execution timeline, one question we
      need to be able to answer is "is this request running?". To track
      whether a request has started on HW, we can emit a breadcrumb at the
      beginning of the request and check its timeline's HWSP to see if the
      breadcrumb has advanced past the start of this request. (This is in
      contrast to the global timeline where we need only ask if we are on the
      global timeline and if the timeline has advanced past the end of the
      previous request.)
      
      There is still confusion from a preempted request, which has already
      started but relinquished the HW to a high priority request. For the
      common case, this discrepancy should be negligible. However, for
      identification of hung requests, knowing which one was running at the
      time of the hang will be much more important.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
      85474441
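The "is this request running?" test reduces to comparing the timeline's HWSP against the breadcrumb emitted at the request's start. A sketch with hypothetical field names:

```c
#include <assert.h>
#include <stdint.h>

struct active_request {
	const uint32_t *hwsp;	/* timeline hardware status page entry */
	uint32_t init_seqno;	/* breadcrumb emitted at the request's start */
};

/* Wraparound-safe: has the HWSP advanced past our starting breadcrumb? */
static int request_started(const struct active_request *rq)
{
	return (int32_t)(*rq->hwsp - rq->init_seqno) >= 0;
}
```

As the commit notes, a preempted request still reads as "started" even though it has relinquished the HW, which is acceptable for the common case but matters when identifying which request was running at the time of a hang.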
  12. 29 Jan 2019, 2 commits
  13. 22 Jan 2019, 1 commit
  14. 27 Dec 2018, 1 commit
  15. 02 Oct 2018, 2 commits
  16. 14 Sep 2018, 1 commit
    • C
      drm/i915: Limit the backpressure for i915_request allocation · 11abf0c5
      Committed by Chris Wilson
      If we try and fail to allocate an i915_request, we apply some
      backpressure on the clients to throttle the memory allocations coming
      from i915.ko. Currently, we wait until completely idle, but this is far
      too heavy and leads to some situations where the only escape is to
      declare a client hung and reset the GPU. The intent is to only ratelimit
      the allocation requests and to allow ourselves to recycle requests and
      memory from any long queues built up by a client hog.
      
      Although system memory is inherently a global resource, we don't
      want to overly penalize an unlucky client by making it pay the price of
      reaping a hog. To reduce the influence of one client on another, we can,
      instead of waiting for the entire GPU to idle, impose a barrier on the
      local client.
      (One end goal for request allocation is for scalability to many
      concurrent allocators; simultaneous execbufs.)
      
      To prevent ourselves from getting caught out by long running requests
      (requests that may never finish without userspace intervention, whom we
      are blocking) we need to impose a finite timeout, ideally shorter than
      hangcheck. A long time ago Paul McKenney suggested that RCU users should
      ratelimit themselves using judicious use of cond_synchronize_rcu(). This
      gives us the opportunity to reduce our indefinite wait for the GPU to
      idle to a wait for the RCU grace period of the previous allocation along
      this timeline to expire, satisfying both the local and finite properties
      we desire for our ratelimiting.
      
      There are still a few global steps (reclaim not least amongst those!)
      when we exhaust the immediate slab pool, at least now the wait is itself
      decoupled from struct_mutex for our glorious highly parallel future!
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk
      11abf0c5
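The cond_synchronize_rcu() idea can be caricatured without real RCU: each timeline remembers a "grace cookie" from its previous allocation, and a new allocation waits only if that grace period has not yet elapsed, keeping the throttle both local and finite. Everything below is an illustrative model, not the kernel's RCU machinery.

```c
#include <assert.h>

struct grace_clock { unsigned long completed; };	/* global grace counter */
struct rl_timeline { unsigned long cookie; };		/* per-client state */

static unsigned long grace_snapshot(struct grace_clock *gc)
{
	return gc->completed + 1;	/* "the next grace period" */
}

/* Returns 1 if the caller had to wait (i.e. was throttled). */
static int cond_wait_grace(struct grace_clock *gc, struct rl_timeline *tl)
{
	int waited = gc->completed < tl->cookie;

	if (waited)
		gc->completed = tl->cookie;	/* stand-in for blocking */
	tl->cookie = grace_snapshot(gc);	/* arm for the next request */
	return waited;
}
```

A client that allocates back-to-back pays a bounded wait for its own previous allocation's grace period, rather than waiting for the whole GPU to idle.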
  17. 07 Aug 2018, 1 commit
  18. 07 Jul 2018, 2 commits
  19. 14 Jun 2018, 1 commit
  20. 11 Jun 2018, 1 commit
    • C
      drm/i915/ringbuffer: Fix context restore upon reset · b3ee09a4
      Committed by Chris Wilson
      The discovery while trying to enable full-ppgtt was that we were
      completely failing to load both the mm and context following the
      reset. Although we were performing mmio to set the PP_DIR (per-process
      GTT) and CCID (context), these were taking no effect (the assumption was
      that this would trigger reload of the context and restore the page
      tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
      following context switch would anything occur.
      
      Since we are then required to reset the context image and PP_DIR using
      CS commands, we place those commands into every batch. The hardware
      should recognise the no-ops and eliminate the expensive context loads,
      but we still have to pay the cost of using cross-powerwell register
      writes. In practice, this has no effect on actual context switch times,
      and only adds a few hundred nanoseconds to no-op switches. We can improve
      the latter by eliminating the w/a around known no-op switches, but there
      is an ulterior motive to keeping them.
      
      Always emitting the context switch at the beginning of the request (and
      relying on HW to skip unneeded switches) does have one key advantage.
      Should we implement request reordering on Haswell, we will not know in
      advance what the previous executing context was on the GPU and so we
      would not be able to elide the MI_SET_CONTEXT commands ourselves and
      always have to emit them. Having our hand forced now actually prepares
      us for later.
      
      Now, since the context and mm follow the request, we no longer (and have
      not for a long time, since requests took over!) require a trace point to
      tell when we write the switch into the ring, since it is always emitted. (This is
      even more important when you remember that simply writing into the ring
      bears no relation to the current mm.)
      
      v2: Sandybridge has to agree to use LRI as well.
      
      Testcase: igt/drv_selftests/live_hangcheck
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
      b3ee09a4
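"Emit the context switch at the beginning of every request" can be sketched as an unconditional preamble in each batch: the PP_DIR load (via LRI) and MI_SET_CONTEXT come first, and the HW is trusted to elide redundant loads. The opcodes below are placeholders, not real command-stream encodings.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_LRI_PP_DIR = 1, OP_SET_CONTEXT = 2, OP_BATCH = 3 };

/* Every request begins with the context/mm restore, unconditionally. */
static int emit_request(uint32_t *ring, uint32_t ctx_id)
{
	int n = 0;

	ring[n++] = OP_LRI_PP_DIR;	/* restore per-process GTT */
	ring[n++] = OP_SET_CONTEXT;	/* restore context image */
	ring[n++] = ctx_id;
	ring[n++] = OP_BATCH;		/* the request's own commands */
	return n;
}
```

Because the preamble never depends on knowing the previously executing context, this structure survives request reordering, which is exactly the Haswell argument the commit makes.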
  21. 01 Jun 2018, 1 commit
  22. 18 May 2018, 2 commits
  23. 03 May 2018, 1 commit
  24. 19 Apr 2018, 3 commits
  25. 06 Mar 2018, 1 commit
    • C
      drm/i915/breadcrumbs: Reduce signaler rbtree to a sorted list · cd46c545
      Committed by Chris Wilson
      The goal here is to try and reduce the latency of signaling additional
      requests following the wakeup from interrupt by reducing the list of
      to-be-signaled requests from an rbtree to a sorted linked list. The
      original choice of using an rbtree was to facilitate random insertions
      of request into the signaler while maintaining a sorted list. However,
      if we assume that most new requests are added when they are submitted,
      we see those new requests in execution order, making an insertion sort
      fast and the reduction in overhead of each signaler iteration
      significant.
      
      Since commit 56299fb7 ("drm/i915: Signal first fence from irq handler
      if complete"), we signal most fences directly from notify_ring() in the
      interrupt handler greatly reducing the amount of work that actually
      needs to be done by the signaler kthread. All the thread is then
      required to do is operate as the bottom-half, cleaning up after the
      interrupt handler and preparing the next waiter. This includes signaling
      all later completed fences in a saturated system, but on a mostly idle
      system we only have to rebuild the wait rbtree in time for the next
      interrupt. With this de-emphasis of the signaler's role, we want to
      rejig its data structures to reduce the amount of work we require to
      both set up the signal tree and maintain it on every interrupt.
      
      References: 56299fb7 ("drm/i915: Signal first fence from irq handler if complete")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-1-chris@chris-wilson.co.uk
      cd46c545
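The rbtree-to-sorted-list argument rests on insertion sort being cheap when requests arrive nearly in execution order. A simplified singly-linked version (not the driver's list implementation), using the usual wraparound-safe seqno ordering:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct signaler {
	uint32_t seqno;
	struct signaler *next;
};

/* Insert into a seqno-sorted list; near-in-order arrivals mean the
 * scan usually terminates almost immediately. */
static void insert_signaler(struct signaler **head, struct signaler *sig)
{
	struct signaler **p = head;

	while (*p && (int32_t)(sig->seqno - (*p)->seqno) >= 0)
		p = &(*p)->next;	/* walk past earlier seqnos */
	sig->next = *p;
	*p = sig;
}
```

Iterating the signaler then becomes a plain list walk from the head, with none of the rebalancing cost the rbtree paid on every insertion.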
  26. 22 Feb 2018, 1 commit
  27. 19 Jan 2018, 1 commit
    • C
      drm/i915: Avoid waitboosting on the active request · e9af4ea2
      Committed by Chris Wilson
      Watching a light workload on Baytrail (running glxgears and a 1080p
      decode), instead of the system remaining at low frequency, the glxgears
      would regularly trigger waitboosting after which it would have to spend
      a few seconds throttling back down. Here, the waitboosting is
      counterproductive, as the minimal wait for glxgears doesn't prevent it
      from functioning correctly and delivering frames on time. Moreover,
      glxgears happens to almost always be waiting on the current request,
      which we already expect to complete quickly (see i915_spin_request) and
      so avoiding the waitboost on the active request and spinning instead
      provides the best latency without overcommitting to upclocking.
      However, if the system falls behind we still force the waitboost.
      Similarly, we will also trigger upclocking if we detect the system is
      not delivering frames on time - again using a mechanism that tries to
      detect a miss and not preemptively upclock.
      
      v2: Also skip boosting for after missed vblank if the desired request is
      already active.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180118131609.16574-1-chris@chris-wilson.co.uk
      e9af4ea2
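The heuristic above boils down to a two-way decision: if the waiter is blocked on the already-executing request and the system is keeping up, spin briefly; otherwise upclock. The function below is an illustrative reduction of that logic, not the driver's code.

```c
#include <assert.h>

enum wait_action { WAIT_SPIN, WAIT_BOOST };

/* Spin on the active request (cheap, expected to complete quickly);
 * boost only when queued behind others or already missing frames. */
static enum wait_action choose_wait(int request_is_active, int frames_missed)
{
	if (request_is_active && !frames_missed)
		return WAIT_SPIN;
	return WAIT_BOOST;
}
```

The key property is that boosting stays reactive (triggered by a detected miss) rather than preemptive, so light workloads such as glxgears no longer ratchet the frequency up.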
  28. 03 Jan 2018, 2 commits
  29. 13 Dec 2017, 1 commit
  30. 05 Oct 2017, 1 commit
    • C
      drm/i915/scheduler: Support user-defined priorities · ac14fbd4
      Committed by Chris Wilson
      Use a priority stored in the context as the initial value when
      submitting a request. This allows us to change the default priority on a
      per-context basis, allowing different contexts to be favoured with GPU
      time at the expense of lower importance work. The user can adjust the
      context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
      values being higher priority (they will be serviced earlier, after their
      dependencies have been resolved). Any prerequisite work for an execbuf
      will have its priority raised to match the new request as required.
      
      Normal users can specify any value in the range of -1023 to 0 [default],
      i.e. they can reduce the priority of their workloads (and temporarily
      boost it back to normal if so desired).
      
      Privileged users can specify any value in the range of -1023 to 1023
      [default is 0], i.e. they can raise their priority above all others and
      so potentially starve the system.
      
      Note that the existing schedulers are not fair, nor load balancing, the
      execution is strictly by priority on a first-come, first-served basis,
      and the driver may choose to boost some requests above the range
      available to users.
      
      This priority was originally based around nice(2), but evolved to allow
      clients to adjust their priority within a small range, and allow for a
      privileged high priority range.
      
      For example, this can be used to implement EGL_IMG_context_priority
      https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt
      
	"EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of
              the context to be created. This attribute is a hint, as an
              implementation may not support multiple contexts at some
              priority levels and system policy may limit access to high
              priority contexts to appropriate system privilege level. The
              default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is
              EGL_CONTEXT_PRIORITY_MEDIUM_IMG."
      
      so we can map
      
	PRIORITY_HIGH -> 1023 [privileged, will fall back to 0]
      	PRIORITY_MED -> 0 [default]
      	PRIORITY_LOW -> -1023
      
      They also map onto the priorities used by VkQueue (and a VkQueue is
      essentially a timeline, our i915_gem_context under full-ppgtt).
      
      v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/
      v3: Report min/max user priorities as defines in the uapi, and rebase
      internal priorities on the exposed values.
      
      Testcase: igt/gem_exec_schedule
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-9-chris@chris-wilson.co.uk
      ac14fbd4
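The validation rules stated above (normal clients may only lower their priority; CAP_SYS_NICE holders may raise it up to 1023) can be sketched directly. The constants mirror the ranges in the commit text; the function itself is an illustrative stand-in for the I915_CONTEXT_PARAM_PRIORITY handler, not the actual uapi code.

```c
#include <assert.h>

#define PRIO_MIN	(-1023)
#define PRIO_MAX	1023
#define PRIO_DEFAULT	0

/* Returns 0 on success, -1 (an EPERM/EINVAL stand-in) on rejection. */
static int set_context_priority(int *ctx_prio, int requested, int privileged)
{
	if (requested < PRIO_MIN || requested > PRIO_MAX)
		return -1;
	if (requested > PRIO_DEFAULT && !privileged)
		return -1;	/* raising priority needs CAP_SYS_NICE */
	*ctx_prio = requested;
	return 0;
}
```

Under this scheme the EGL mapping in the commit (HIGH -> 1023, MED -> 0, LOW -> -1023) works for any client except that an unprivileged HIGH request is rejected and falls back to the default.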