1. 15 June 2019, 2 commits
    • drm/i915: Replace engine->timeline with a plain list · 422d7df4
      Committed by Chris Wilson
      To continue the onslaught of removing the assumption of a global
      execution ordering, another casualty is the engine->timeline. Without an
      actual timeline to track, it is overkill and we can replace it with a
      much less grand plain list. We still need a list of requests inflight,
      for the simple purpose of finding inflight requests (for retiring,
      resetting, preemption etc).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
      422d7df4
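      A minimal sketch of the idea (illustrative names such as sketch_engine and sketch_request, not the driver's actual structures): requests are appended to a per-engine list under a lock on submission, and retiring simply walks that list until it meets the first incomplete request.

      #include <linux/list.h>
      #include <linux/spinlock.h>
      #include <linux/types.h>

      struct sketch_request {
              struct list_head sched_link;          /* link in engine->requests */
              bool (*completed)(const struct sketch_request *rq);
      };

      struct sketch_engine {
              spinlock_t active_lock;               /* protects the requests list */
              struct list_head requests;            /* in-flight requests, submission order */
      };

      /* On submission: no global timeline, just append to this engine's list. */
      static void sketch_submit(struct sketch_engine *engine, struct sketch_request *rq)
      {
              spin_lock(&engine->active_lock);
              list_add_tail(&rq->sched_link, &engine->requests);
              spin_unlock(&engine->active_lock);
      }

      /* Retiring only needs to find completed requests, not order them globally. */
      static void sketch_retire_completed(struct sketch_engine *engine)
      {
              struct sketch_request *rq, *rn;

              spin_lock(&engine->active_lock);
              list_for_each_entry_safe(rq, rn, &engine->requests, sched_link) {
                      if (!rq->completed(rq))
                              break;                /* per engine, requests complete in order */
                      list_del(&rq->sched_link);
              }
              spin_unlock(&engine->active_lock);
      }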
    • drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Committed by Chris Wilson
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch; for
      simplicity, we use a fresh request from the kernel context.
      
      The sequence of operations for keeping the context pinned until saved is:
      
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
         shrinker)
      
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
      
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
      
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
      
      v2: intel_context_active_acquire/_release
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
      ce476c80
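      A rough sketch of the barrier-node scheme described above (illustrative types, not the driver's intel_context_active_acquire/_release code): a node is preallocated while pinning, parked on the engine when the last request retires, and only drained once the switch-to-kernel-context request itself retires.

      #include <linux/list.h>
      #include <linux/slab.h>

      struct barrier_node {
              struct list_head link;
              void (*unpin)(void *context);         /* final unpin once the context is saved */
              void *context;
      };

      struct engine_barriers {
              struct list_head barrier_tasks;       /* nodes awaiting a kernel-context switch */
      };

      /* Activation: preallocate so deactivation never allocates (it may run in FS_RECLAIM). */
      static struct barrier_node *barrier_alloc(void *context, void (*unpin)(void *context))
      {
              struct barrier_node *node = kmalloc(sizeof(*node), GFP_KERNEL);

              if (node) {
                      node->context = context;
                      node->unpin = unpin;
              }
              return node;
      }

      /* Last active request retired: the GPU may still write the image, so defer the unpin. */
      static void barrier_defer_unpin(struct engine_barriers *eb, struct barrier_node *node)
      {
              list_add_tail(&node->link, &eb->barrier_tasks);
      }

      /* The switch-to-kernel-context request retired: all earlier contexts must be saved. */
      static void barrier_flush(struct engine_barriers *eb)
      {
              struct barrier_node *node, *next;

              list_for_each_entry_safe(node, next, &eb->barrier_tasks, link) {
                      list_del(&node->link);
                      node->unpin(node->context);
                      kfree(node);
              }
      }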
  2. 12 June 2019, 1 commit
  3. 07 June 2019, 1 commit
  4. 28 May 2019, 1 commit
    • drm/i915/guc: Updates for GuC 32.0.3 firmware · ffd5ce22
      Committed by Michal Wajdeczko
      New GuC 32.0.3 firmware made many changes around its ABI that
      require driver updates:
      
      * FW release version numbering schema now includes patch number
      * FW release version encoding in CSS header
      * Boot parameters
      * Suspend/resume protocol
      * Sample-forcewake command
      * Additional Data Structures (ADS)
      
      This commit is a squash of patches 3-8 from series [1].
      [1] https://patchwork.freedesktop.org/series/58760/
      Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
      Cc: Jeff Mcgee <jeff.mcgee@intel.com>
      Cc: John Spotswood <john.a.spotswood@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Tomasz Lis <tomasz.lis@intel.com>
      Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> # numbering schema
      Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> # css header
      Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> # boot params
      Acked-by: John Spotswood <john.a.spotswood@intel.com> # suspend/resume
      Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> # sample-forcewake
      Acked-by: John Spotswood <john.a.spotswood@intel.com> # sample-forcewake
      Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> # ADS
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190527183613.17076-4-michal.wajdeczko@intel.com
      ffd5ce22
  5. 08 May 2019, 2 commits
  6. 01 May 2019, 1 commit
  7. 27 April 2019, 1 commit
  8. 25 April 2019, 2 commits
    • drm/i915: Invert the GEM wakeref hierarchy · 79ffac85
      Committed by Chris Wilson
      In the current scheme, on submitting a request we take a single global
      GEM wakeref, which trickles down to wake up all GT power domains. This
      is undesirable as we would like to be able to localise our power
      management to the available power domains and to remove the global GEM
      operations from the heart of the driver. (The intent there is to push
      global GEM decisions to the boundary as used by the GEM user interface.)
      
      Now during request construction, each request is responsible via its
      logical context to acquire a wakeref on each power domain it intends to
      utilize. Currently, each request takes a wakeref on the engine(s) and
      the engines themselves take a chipset wakeref. This gives us a
      transition on each engine which we can extend if we want to insert more
      power management control (such as soft rc6). The global GEM operations
      that currently require a struct_mutex are reduced to listening to pm
      events from the chipset GT wakeref. As we reduce the struct_mutex
      requirement, these listeners should evaporate.
      
      Perhaps the biggest immediate change is that this removes the
      struct_mutex requirement around GT power management, allowing us greater
      flexibility in request construction. Another important knock-on effect,
      is that by tracking engine usage, we can insert a switch back to the
      kernel context on that engine immediately, avoiding any extra delay or
      inserting global synchronisation barriers. This makes tracking when an
      engine and its associated contexts are idle much easier -- important for
      when we forgo our assumed execution ordering and need idle barriers to
      unpin used contexts. In the process, it means we remove a large chunk of
      code whose only purpose was to switch back to the kernel context.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
      79ffac85
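      A toy sketch of the inverted hierarchy (hypothetical names, not the driver's intel_engine_pm/intel_gt_pm API): each request takes its engine's wakeref, and only the first engine wakeref takes the chipset (GT) wakeref, so every engine gets a 0->1 transition to hang power-management hooks on.

      #include <linux/atomic.h>

      struct gt_pm {
              atomic_t wakeref;                     /* chipset-level reference count */
      };

      struct engine_pm {
              atomic_t wakeref;                     /* per-engine reference count */
              struct gt_pm *gt;
      };

      /* First user of an engine wakes the GT; the 0->1 edge is where e.g. soft rc6 fits. */
      static void engine_pm_get(struct engine_pm *engine)
      {
              if (atomic_fetch_inc(&engine->wakeref) == 0)
                      atomic_inc(&engine->gt->wakeref);
      }

      /* Last user parks the engine: a natural point to queue a switch to the kernel context. */
      static void engine_pm_put(struct engine_pm *engine)
      {
              if (atomic_dec_and_test(&engine->wakeref))
                      atomic_dec(&engine->gt->wakeref);
      }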
    • drm/i915: Move GraphicsTechnology files under gt/ · 112ed2d3
      Committed by Chris Wilson
      Start partitioning off the code that talks to the hardware (GT) from the
      uapi layers and move the device facing code under gt/
      
      One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
      subdivide that header and body further (and split out the submission
      code from the ringbuffer and logical context handling). This patch aims
      to be simple motion so git can fixup inflight patches with little mess.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
      112ed2d3
  9. 16 April 2019, 1 commit
  10. 12 April 2019, 1 commit
  11. 11 April 2019, 1 commit
  12. 27 March 2019, 1 commit
  13. 19 March 2019, 2 commits
  14. 08 March 2019, 2 commits
  15. 06 March 2019, 3 commits
  16. 02 March 2019, 1 commit
    • drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ · e8861964
      Committed by Chris Wilson
      Having introduced per-context seqno, we now have a means to identify
      progress across the system without fear of rollback as befell the
      global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
      advance of submission safe in the knowledge that our target seqno and
      address is stable.
      
      However, since we are telling the GPU to busy-spin on the target address
      until it matches the signaling seqno, we only want to do so when we are
      sure that busy-spin will be completed quickly. To achieve this we only
      submit the request to HW once the signaler is itself executing (modulo
      preemption causing us to wait longer), and we only do so for default and
      above priority requests (so that idle priority tasks never themselves
      hog the GPU waiting for others).
      
      As might be reasonably expected, HW semaphores excel in inter-engine
      synchronisation microbenchmarks (where the 3x reduced latency / increased
      throughput more than offset the power cost of spinning on a second ring)
      and have significant improvement (can be up to ~10%, most see no change)
      for single clients that utilize multiple engines (typically media players
      and transcoders), without regressing multiple clients that can saturate
      the system or changing the power envelope dramatically.
      
      v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
      v4: Tell the world and include it as part of scheduler caps.
      
      Testcase: igt/gem_exec_whisper
      Testcase: igt/benchmarks/gem_wsim
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
      e8861964
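      A condensed sketch of emitting such a wait (the opcode/flag values below are placeholders, not the real MI encodings from the driver's command definitions): the GPU polls the signaler's per-context HWSP slot until the stored seqno reaches the target, so the wait can be written into the ring before the signaler has even started.

      #include <linux/kernel.h>
      #include <linux/types.h>

      /* Placeholder encodings for illustration only. */
      #define SKETCH_MI_SEMAPHORE_WAIT      (0x1cu << 23)
      #define SKETCH_MI_SEMAPHORE_POLL      (1u << 15)   /* poll rather than signal mode */
      #define SKETCH_MI_SEMAPHORE_SAD_GTE   (1u << 12)   /* wait until *addr >= data */

      /* Emit four dwords telling the GPU to busy-spin until *hwsp_addr >= target_seqno. */
      static u32 *sketch_emit_semaphore_wait(u32 *cs, u32 target_seqno, u64 hwsp_addr)
      {
              *cs++ = SKETCH_MI_SEMAPHORE_WAIT |
                      SKETCH_MI_SEMAPHORE_POLL |
                      SKETCH_MI_SEMAPHORE_SAD_GTE;
              *cs++ = target_seqno;                 /* value to compare against */
              *cs++ = lower_32_bits(hwsp_addr);     /* stable per-context HWSP address */
              *cs++ = upper_32_bits(hwsp_addr);
              return cs;
      }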
  17. 28 February 2019, 2 commits
  18. 26 February 2019, 3 commits
  19. 23 February 2019, 1 commit
  20. 12 February 2019, 1 commit
  21. 08 February 2019, 1 commit
    • drm/i915: Hack and slash, throttle execbuffer hogs · d6f328bf
      Committed by Chris Wilson
      Apply backpressure to hogs that emit requests faster than the GPU can
      process them by waiting for their ring to be less than half-full before
      proceeding with taking the struct_mutex.
      
      This is a gross hack to apply throttling backpressure; the long-term
      goal is to remove the struct_mutex contention so that each client
      naturally waits, preferably in an asynchronous, nonblocking fashion
      (pipelined operations for the win), for their own resources and never
      blocks another client, at least within the driver. (Realtime priority
      goals would extend to ensuring that resource contention favours high
      priority clients as well.)
      
      This patch only limits excessive request production and does not attempt
      to throttle clients that block waiting for eviction (either global GTT or
      system memory) or any other global resources, see above for the long term
      goal.
      
      No microbenchmarks are harmed (to the best of my knowledge).
      
      Testcase: igt/gem_exec_schedule/pi-ringfull-*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
      d6f328bf
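      A simplified sketch of that backpressure (illustrative fields, not the driver's actual ring bookkeeping): before taking the lock to build another request, the client waits until its own context's ring is less than half full, so only the hog blocks.

      #include <linux/types.h>
      #include <linux/wait.h>

      struct sketch_ring {
              u32 size;                             /* ring size in bytes, power of two */
              u32 head;                             /* consumed by the GPU */
              u32 tail;                             /* advanced as requests are emitted */
              wait_queue_head_t writers;            /* woken as requests retire */
      };

      static u32 sketch_ring_used(const struct sketch_ring *ring)
      {
              return (ring->tail - ring->head) & (ring->size - 1);
      }

      /* Block only this client until its ring drops below half full (or a signal arrives). */
      static int sketch_ring_throttle(struct sketch_ring *ring)
      {
              return wait_event_interruptible(ring->writers,
                                              sketch_ring_used(ring) < ring->size / 2);
      }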
  22. 06 February 2019, 1 commit
  23. 04 February 2019, 1 commit
    • drm/i915: Allow normal clients to always preempt idle priority clients · 3d7a64b9
      Committed by Chris Wilson
      When first enabling preemption, we hesitated from making it a free-for-all
      where every higher priority client would force a preempt-to-idle cycle
      and take over from all lower priority clients. We hesitated because we
      were uncertain just how well preemption would work in practice, whether
      the preemption latency itself would detract from the latency gains for
      higher priority tasks and whether it would work at all. Since
      introducing preemption, we have been enabling it for more common tasks,
      even giving normal clients a small preemptive boost when they first
      start (to aid fairness and improve interactivity). Now let's take one
      step further and give permission for all normal (priority:0) clients to
      preempt any idle (priority:<0) task so that users running long compute
      jobs do not overly impact other jobs (i.e. their desktop) and the system
      remains responsive under such idle loads.
      
      References: f6322edd ("drm/i915/preemption: Allow preemption between submission ports")
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: "Bloomfield, Jon" <jon.bloomfield@intel.com>
      Cc: "Stead, Alan" <alan.stead@intel.com>
      Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190204084116.3013-1-chris@chris-wilson.co.uk
      3d7a64b9
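      In effect the policy reduces to the comparison sketched below (a simplified illustration, not the scheduler's actual need_preempt() logic): priority 0 is the default for normal clients and negative values mark idle work, so any normal client now outranks, and immediately preempts, a long-running idle-priority job.

      #include <linux/types.h>

      #define SKETCH_PRIORITY_IDLE    (-1)          /* run only when nothing else wants the GPU */
      #define SKETCH_PRIORITY_NORMAL  0             /* default for new clients */

      /* Preempt the running request whenever something strictly more important is queued. */
      static bool sketch_need_preempt(int queued_prio, int active_prio)
      {
              return queued_prio > active_prio;
      }

      With this ordering, sketch_need_preempt(SKETCH_PRIORITY_NORMAL, SKETCH_PRIORITY_IDLE) is true, so a desktop client immediately displaces an idle-priority compute job, while two normal clients never preempt one another.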
  24. 30 January 2019, 5 commits
    • drm/i915: Drop fake breadcrumb irq · 789659f4
      Committed by Chris Wilson
      Missed breadcrumb detection is defunct due to the tight coupling with
      dma_fence signaling and the myriad ways we may signal fences from
      everywhere but from an interrupt, i.e. we frequently signal a fence
      before we even see its interrupt. This means that even if we miss an
      interrupt for a fence, it still is signaled before our breadcrumb
      hangcheck fires, so simplify the breadcrumb hangchecking by moving it
      into the GPU hangcheck and forgo fake interrupts.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-3-chris@chris-wilson.co.uk
      789659f4
    • drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
      Committed by Chris Wilson
      A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
      thundering i915_wait_request herd"), the issue of handling multiple
      clients waiting in parallel was brought to our attention. The
      requirement was that every client should be woken immediately upon its
      request being signaled, without incurring any cpu overhead.
      
      Handling certain fragility of our hw meant that we could not do a
      simple check inside the irq handler (some generations required almost
      unbounded delays before we could be sure of seqno coherency) and so
      request completion checking required delegation.
      
      Before commit 688e6c72, the solution was simple. Every client
      waiting on a request would be woken on every interrupt and each would do
      a heavyweight check to see if their request was complete. Commit
      688e6c72 introduced an rbtree so that only the earliest waiter on
      the global timeline would be woken, and would wake the next and so on.
      (Along with various complications to handle requests being reordered
      along the global timeline, and also a requirement for kthread to provide
      a delegate for fence signaling that had no process context.)
      
      The global rbtree depends on knowing the execution timeline (and global
      seqno). Without knowing that order, we must instead check all contexts
      queued to the HW to see which may have advanced. We trim that list by
      only checking queued contexts that are being waited on, but still we
      keep a list of all active contexts and their active signalers that we
      inspect from inside the irq handler. By moving the waiters onto the fence
      signal list, we can combine the client wakeup with the dma_fence
      signaling (a dramatic reduction in complexity, but does require the HW
      being coherent, the seqno must be visible from the cpu before the
      interrupt is raised - we keep a timer backup just in case).
      
      Having previously fixed all the issues with irq-seqno serialisation (by
      inserting delays onto the GPU after each request instead of random delays
      on the CPU after each interrupt), we can rely on the seqno state to
      perform direct wakeups from the interrupt handler. This allows us to
      preserve our single context switch behaviour of the current routine,
      with the only downside that we lose the RT priority sorting of wakeups.
      In general, direct wakeup latency of multiple clients is about the same
      (about 10% better in most cases) with a reduction in total CPU time spent
      in the waiter (about 20-50% depending on gen). Average herd behaviour is
      improved, but at the cost of not delegating wakeups on task_prio.
      
      v2: Capture fence signaling state for error state and add comments to
      warm even the most cold of hearts.
      v3: Check if the request is still active before busywaiting
      v4: Reduce the amount of pointer misdirection with list_for_each_safe
      and using a local i915_request variable inside the loops
      v5: Add a missing pluralisation to a purely informative selftest message.
      
      References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
      52c0fdb2
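      A schematic of the per-context signal-list walk described above (illustrative structures; the real logic lives in the driver's breadcrumbs handling): each signaling context keeps its waiting requests in order, and the interrupt handler signals every fence whose seqno the context's HWSP shows as complete, waking clients through the normal dma_fence path.

      #include <linux/dma-fence.h>
      #include <linux/list.h>
      #include <linux/types.h>

      struct signal_ctx {
              struct list_head link;                /* on the engine's list of signaling contexts */
              struct list_head signals;             /* requests awaiting signaling, oldest first */
              const u32 *hwsp;                      /* CPU view of this context's breadcrumb slot */
      };

      struct signal_rq {
              struct list_head link;
              u32 seqno;                            /* breadcrumb written after this request */
              struct dma_fence fence;
      };

      static bool sketch_seqno_passed(u32 current_seqno, u32 seqno)
      {
              return (s32)(current_seqno - seqno) >= 0;
      }

      /* From the user interrupt: no global seqno, so walk each waited-upon context's list. */
      static void sketch_signal_breadcrumbs(struct list_head *signaling_contexts)
      {
              struct signal_ctx *ce;

              list_for_each_entry(ce, signaling_contexts, link) {
                      struct signal_rq *rq, *next;

                      list_for_each_entry_safe(rq, next, &ce->signals, link) {
                              if (!sketch_seqno_passed(READ_ONCE(*ce->hwsp), rq->seqno))
                                      break;        /* ordered list: the rest are incomplete */

                              list_del(&rq->link);
                              dma_fence_signal(&rq->fence);   /* wakes waiters directly */
                      }
              }
      }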
    • drm/i915/execlists: Suppress preempting self · c9a64622
      Committed by Chris Wilson
      In order to avoid preempting ourselves, we currently refuse to schedule
      the tasklet if we reschedule an inflight context. However, this glosses
      over a few issues such as what happens after a CS completion event and
      we then preempt the newly executing context with itself, or if something
      else causes a tasklet_schedule triggering the same evaluation to
      preempt the active context with itself.
      
      However, when we avoid preempting ELSP[0], we still retain the preemption
      value as it may match a second preemption request within the same time period
      that we need to resolve after the next CS event. But since we only
      store the maximum preemption priority seen, it may not match the
      subsequent event and so we should double check whether or not we
      actually do need to trigger a preempt-to-idle by comparing the top
      priorities from each queue. Later, this gives us a hook for finer
      control over deciding whether the preempt-to-idle is justified.
      
      The sequence of events where we end up preempting to no avail is:
      
      1. Queue requests/contexts A, B
      2. Priority boost A; no preemption as it is executing, but keep hint
      3. After CS switch, B is less than hint, force preempt-to-idle
      4. Resubmit B after idling
      
      v2: We can simplify a bunch of tests based on the knowledge that PI will
      ensure that earlier requests along the same context will have the highest
      priority.
      v3: Demonstrate the stale preemption hint with a selftest
      
      References: a2bf92e8 ("drm/i915/execlists: Avoid kicking priority on the current context")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
      c9a64622
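      Conceptually the double check reduces to something like this (a simplified sketch, not the driver's real need_preempt()): the stored hint only says "something important may have arrived", and the actual decision re-reads the highest priority genuinely still waiting in the queue.

      #include <linux/types.h>

      /*
       * Cheap rejection first using the cached hint; if that passes, confirm against
       * the real head of the queue so a stale hint (e.g. the boosted request already
       * completed) does not preempt the running context with itself.
       */
      static bool sketch_need_preempt(int queue_priority_hint, int top_of_queue_prio,
                                      int active_prio)
      {
              if (queue_priority_hint <= active_prio)
                      return false;

              return top_of_queue_prio > active_prio;
      }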
    • drm/i915: Rename execlists->queue_priority to queue_priority_hint · 4d97cbe0
      Committed by Chris Wilson
      After noticing that we trigger preemption events for currently executing
      requests, as well as requests that complete before the preemption and
      attempting to suppress those preemption events, it is wise to not
      consider the queue_priority to be authoritative. As we only track the
      maximum priority seen between dequeue passes, if the maximum priority
      request is no longer available for dequeuing (it completed or is even
      executing on another engine), we have no knowledge of the previous
      queue_priority as it would require us to keep a full history of enqueued
      requests -- but we already have that history in the priolists!
      
      Rename the queue_priority to queue_priority_hint so that we do not
      confuse it as being exactly the maximum priority in the queue, but merely
      an indication that we have seen a new maximum priority value and as such
      we should check whether it should preempt the currently running request.
      
      v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
      being only used for the singular task of preemption and not the wider
      question of waking up due to a change in the queue.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
      4d97cbe0
    • drm/i915: Identify active requests · 85474441
      Committed by Chris Wilson
      To allow requests to forgo a common execution timeline, one question we
      need to be able to answer is "is this request running?". To track
      whether a request has started on HW, we can emit a breadcrumb at the
      beginning of the request and check its timeline's HWSP to see if the
      breadcrumb has advanced past the start of this request. (This is in
      contrast to the global timeline where we need only ask if we are on the
      global timeline and if the timeline has advanced past the end of the
      previous request.)
      
      There is still confusion from a preempted request, which has already
      started but relinquished the HW to a high priority request. For the
      common case, this discrepancy should be negligible. However, for
      identification of hung requests, knowing which one was running at the
      time of the hang will be much more important.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
      85474441
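      A sketch of that query (simplified relative to the driver's i915_request_started(); field names are illustrative): a breadcrumb carrying the previous request's seqno is emitted at the head of each request, so comparing the timeline's HWSP against it tells us whether this request has begun on the HW.

      #include <linux/types.h>

      struct sketch_request {
              const u32 *hwsp_seqno;                /* CPU view of this request's timeline HWSP */
              u32 fence_seqno;                      /* seqno written by this request's final breadcrumb */
      };

      static bool sketch_seqno_passed(u32 current_seqno, u32 seqno)
      {
              return (s32)(current_seqno - seqno) >= 0;
      }

      /* Started once the HWSP has advanced past the previous request's seqno. */
      static bool sketch_request_started(const struct sketch_request *rq)
      {
              return sketch_seqno_passed(READ_ONCE(*rq->hwsp_seqno), rq->fence_seqno - 1);
      }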
  25. 29 January 2019, 2 commits