1. 19 Jun 2019, 2 commits
    • drm/i915/execlists: Detect cross-contamination with GuC · 73591341
      Authored by Chris Wilson
      The process_csb routine from execlists_submission is incompatible with
      the GuC backend. Add a warning to detect if we accidentally end up in
      the wrong spot.
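
      A minimal sketch of the guard, with a stand-in engine type and a
      plain assert in place of the i915 GEM_BUG_ON machinery:

          #include <assert.h>
          #include <stdbool.h>

          struct intel_engine_cs { bool uses_guc; }; /* stand-in type */

          /* The CSB decoder belongs to the execlists backend; trip an
           * assertion if the GuC owns submission on this engine. */
          void process_csb(struct intel_engine_cs *engine)
          {
                  assert(!engine->uses_guc); /* cross-contamination! */
                  /* ... decode context-switch buffer events ... */
          }
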
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190618110736.31155-1-chris@chris-wilson.co.uk
    • drm/i915: Make the semaphore saturation mask global · 44d89409
      Authored by Chris Wilson
      The idea behind keeping the saturation mask local to a context backfired
      spectacularly. The premise with the local mask was that we would be more
      proactive in attempting to use semaphores after each time the context
      idled, and that all new contexts would attempt to use semaphores
      ignoring the current state of the system. This turns out to be horribly
      optimistic. If the system state is still oversaturated and the existing
      workloads have all stopped using semaphores, the new workloads would
      attempt to use semaphores and be deprioritised behind real work. The
      new contexts would not switch off using semaphores until their initial
      batch of low priority work had completed. Given a sufficient backlog
      of equal user priority work, this would completely starve the new
      work of any GPU time.
      
      To compensate, remove the local tracking in favour of keeping it as
      global state on the engine -- once the system is saturated and
      semaphores are disabled, everyone stops attempting to use semaphores
      until the system is idle again. One of the reasons for preferring local
      context tracking was that it worked with virtual engines, so for
      switching to global state we could either do a complete check of all the
      virtual siblings or simply disable semaphores for those requests. This
      takes the simpler approach of disabling semaphores on virtual engines.
      
      The downside is that the decision that the engine is saturated is a
      local measure -- we are only checking whether or not this context was
      scheduled in a timely fashion; it may be legitimately delayed due to user
      priorities. We still have the same dilemma though, that we do not want
      to employ the semaphore poll unless it will be used.
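
      A sketch of the new global tracking (field and function names here
      are illustrative, not the exact i915 layout):

          #include <stdbool.h>
          #include <stdint.h>

          typedef uint32_t intel_engine_mask_t;

          struct intel_engine_cs {
                  intel_engine_mask_t mask;      /* this engine's bit */
                  intel_engine_mask_t saturated; /* engines not to busywait on */
                  bool is_virtual;
          };

          /* Once any request on this engine sees a late semaphore, record
           * it engine-wide so all later requests skip the busywait. */
          void mark_saturated(struct intel_engine_cs *engine,
                              const struct intel_engine_cs *waitee)
          {
                  engine->saturated |= waitee->mask;
          }

          bool can_use_semaphore(const struct intel_engine_cs *engine,
                                 const struct intel_engine_cs *waitee)
          {
                  if (engine->is_virtual) /* assume the worst for siblings */
                          return false;
                  return !(engine->saturated & waitee->mask);
          }
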
      
      v2: Explain why we need to assume the worst wrt virtual engines.
      
      Fixes: ca6e56f6 ("drm/i915: Disable semaphore busywaits on saturated systems")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190618074153.16055-8-chris@chris-wilson.co.uk
  2. 15 Jun 2019, 2 commits
    • drm/i915: Replace engine->timeline with a plain list · 422d7df4
      Authored by Chris Wilson
      To continue the onslaught of removing the assumption of a global
      execution ordering, another casualty is the engine->timeline. Without an
      actual timeline to track, it is overkill and we can replace it with a
      much less grand plain list. We still need a list of the requests
      inflight, for the simple purpose of finding them (for retiring,
      resetting, preemption, etc.).
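
      A toy model of the replacement, with a hand-rolled singly linked
      list standing in for the kernel's list_head:

          #include <stdbool.h>
          #include <stddef.h>

          struct i915_request {
                  struct i915_request *next; /* engine->active.requests link */
                  bool completed;
          };

          struct intel_engine_cs {
                  struct i915_request *active_requests; /* oldest first */
          };

          /* With no timeline to consult, retiring is a walk of the
           * engine's inflight list up to the first incomplete request. */
          void engine_retire(struct intel_engine_cs *engine)
          {
                  while (engine->active_requests &&
                         engine->active_requests->completed)
                          engine->active_requests =
                                  engine->active_requests->next;
          }
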
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-3-chris@chris-wilson.co.uk
    • drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Authored by Chris Wilson
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch; for
      simplicity we use a fresh request from the kernel context.
      
      The sequence of operations for keeping the context pinned until saved is:
      
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
         shrinker)
      
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
      
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
      
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
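
      A sketch of the three steps above, with simplified types and
      illustrative names (queue_barrier and kernel_switch_retired are not
      the real i915 entry points):

          #include <stddef.h>

          struct intel_context { int pin_count; };

          struct barrier_node {             /* preallocated at activation */
                  struct barrier_node *next;
                  struct intel_context *ce; /* context awaiting its save */
          };

          struct intel_engine_cs { struct barrier_node *barriers; };

          /* Step 2: on retiring the context's last request, queue the
           * preallocated node; no allocation, so safe under FS_RECLAIM. */
          void queue_barrier(struct intel_engine_cs *engine,
                             struct barrier_node *node)
          {
                  node->next = engine->barriers;
                  engine->barriers = node;
          }

          /* Step 3: the switch to the kernel context has retired, so
           * every queued context image must have been saved: unpin all. */
          void kernel_switch_retired(struct intel_engine_cs *engine)
          {
                  struct barrier_node *n;

                  for (n = engine->barriers; n; n = n->next)
                          n->ce->pin_count--;
                  engine->barriers = NULL;
          }
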
      
      v2: intel_context_active_acquire/_release
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
  3. 11 Jun 2019, 2 commits
  4. 07 Jun 2019, 2 commits
  5. 28 May 2019, 3 commits
  6. 22 May 2019, 3 commits
    • drm/i915/execlists: Virtual engine bonding · ee113690
      Authored by Chris Wilson
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
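
      A sketch of the bond lookup, assuming a flat master-to-sibling-mask
      table (simplified from the real uapi and structures):

          #include <stdint.h>

          typedef uint32_t intel_engine_mask_t;
          struct intel_engine_cs;

          struct ve_bond {                  /* one master:slaves entry */
                  const struct intel_engine_cs *master;
                  intel_engine_mask_t sibling_mask;
          };

          struct virtual_engine {
                  struct ve_bond *bonds;
                  unsigned int num_bonds;
          };

          /* When the submit fence signals from `master`, constrain the
           * virtual engine to the bonded siblings; otherwise any sibling. */
          intel_engine_mask_t
          bond_execution_mask(const struct virtual_engine *ve,
                              const struct intel_engine_cs *master,
                              intel_engine_mask_t all_siblings)
          {
                  unsigned int n;

                  for (n = 0; n < ve->num_bonds; n++)
                          if (ve->bonds[n].master == master)
                                  return ve->bonds[n].sibling_mask;
                  return all_siblings;
          }
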
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
    • drm/i915: Apply an execution_mask to the virtual_engine · 78e41ddd
      Authored by Chris Wilson
      Allow the user to direct which physical engines of the virtual engine
      they wish to execute on, as sometimes it is necessary to override the
      load balancing algorithm.
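
      A sketch of the check this enables (simplified types; the mask lives
      on the request and is consulted by the balancer):

          #include <stdbool.h>
          #include <stdint.h>

          typedef uint32_t intel_engine_mask_t;

          struct intel_engine_cs { intel_engine_mask_t mask; };
          struct i915_request { intel_engine_mask_t execution_mask; };

          /* A sibling may only claim the request if its bit survives in
           * execution_mask; a single set bit pins the request to one
           * specific physical engine, overriding the load balancer. */
          bool can_run_on(const struct i915_request *rq,
                          const struct intel_engine_cs *sibling)
          {
                  return rq->execution_mask & sibling->mask;
          }
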
      
      v2: Only kick the virtual engines on context-out if required
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
    • drm/i915: Load balancing across a virtual engine · 6d06779e
      Authored by Chris Wilson
      Having allowed the user to define a set of engines that they want to
      use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
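
      A toy model of the claim (the real code does this under the engine
      lock in the submission tasklet; C11 atomics stand in here):

          #include <stdatomic.h>
          #include <stddef.h>

          struct i915_request;

          struct virtual_engine {
                  /* the single pending request, visible to every sibling */
                  _Atomic(struct i915_request *) pending;
          };

          /* Each sibling that becomes ready races to take the pending
           * request; the first exchange wins, the rest see NULL. */
          struct i915_request *claim_request(struct virtual_engine *ve)
          {
                  return atomic_exchange(&ve->pending, NULL);
          }
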
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is a virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
  7. 17 May 2019, 2 commits
    • drm/i915/execlists: Drop promotion on unsubmit · 4cc79cbb
      Authored by Chris Wilson
      With the disappearance of NEWCLIENT, we no longer need to provide the
      priority boost on preemption in order to prevent repeated gazumping,
      and we can remove the dead code.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk
    • drm/i915: Downgrade NEWCLIENT to non-preemptive · 68fc728b
      Authored by Chris Wilson
      Commit 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") had the
      intended consequence of not allowing a sequence of work that merely
      crossed into a new engine the privilege to be promoted to NEWCLIENT
      status. It also had the unintended consequence of actually making
      NEWCLIENT effective on heavily oversubscribed transcode machines and
      impacting upon their throughput.
      
      If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
      those clients, using the NEWCLIENT boost that will be scheduled as
      
      	rcsA x 30, (rcsB, vcs) x 30
      
      where as before it would have been
      
      	(rcsA, rcsB, vcs) x 30
      
      That is with NEWCLIENT only boosting the first request of each client,
      we would execute all rcsA requests prior to running on the vcs engines;
      accruing a lot of dead time as compared to the previous case where the
      vcs engine would be started in parallel to processing the second client.
      
      The previous patch has the effect of delaying submission until it is
      required by a third party (either the user with an explicit wait, or by
      another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
      which has the effect of removing its preemptive grant and reducing it to
      the same level as any other user interaction -- that it will not be
      promoted above the inter-engine dependencies, and so preventing NEWCLIENTs
      from starving other engines. This is a large nerf to the rrul properties of
      the current NEWCLIENT, but it still does give prioritised submission to
      new requests from light workloads.
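
      A sketch of the change in the bump (the flag values are for
      illustration only, not the exact i915 priority encoding):

          /* Internal priority hint bits, values illustrative. */
          #define I915_PRIORITY_WAIT      (1 << 0) /* non-preemptive nudge */
          #define I915_PRIORITY_NEWCLIENT (1 << 1) /* preemptive grant */

          int boost_first_request(int prio)
          {
                  /* Before: prio | I915_PRIORITY_NEWCLIENT, which could
                   * preempt running work. Now only the WAIT bump, which
                   * stays ordered behind inter-engine dependencies. */
                  return prio | I915_PRIORITY_WAIT;
          }
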
      
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Fixes: 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") # customer impact
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
  8. 08 May 2019, 3 commits
  9. 07 May 2019, 1 commit
  10. 03 May 2019, 1 commit
  11. 01 May 2019, 1 commit
  12. 27 Apr 2019, 1 commit
  13. 25 Apr 2019, 3 commits
    • drm/i915: Invert the GEM wakeref hierarchy · 79ffac85
      Authored by Chris Wilson
      In the current scheme, on submitting a request we take a single global
      GEM wakeref, which trickles down to wake up all GT power domains. This
      is undesirable as we would like to be able to localise our power
      management to the available power domains and to remove the global GEM
      operations from the heart of the driver. (The intent there is to push
      global GEM decisions to the boundary as used by the GEM user interface.)
      
      Now during request construction, each request is responsible via its
      logical context to acquire a wakeref on each power domain it intends to
      utilize. Currently, each request takes a wakeref on the engine(s) and
      the engines themselves take a chipset wakeref. This gives us a
      transition on each engine which we can extend if we want to insert more
      power management control (such as soft rc6). The global GEM operations
      that currently require a struct_mutex are reduced to listening to pm
      events from the chipset GT wakeref. As we reduce the struct_mutex
      requirement, these listeners should evaporate.
      
      Perhaps the biggest immediate change is that this removes the
      struct_mutex requirement around GT power management, allowing us greater
      flexibility in request construction. Another important knock-on effect
      is that by tracking engine usage, we can insert a switch back to the
      kernel context on that engine immediately, avoiding any extra delay or
      inserting global synchronisation barriers. This makes tracking when an
      engine and its associated contexts are idle much easier -- important for
      when we forgo our assumed execution ordering and need idle barriers to
      unpin used contexts. In the process, it means we remove a large chunk of
      code whose only purpose was to switch back to the kernel context.
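
      A sketch of the inverted hierarchy, with toy reference counts
      standing in for the real wakeref machinery (names illustrative):

          struct intel_gt { int wakeref; };      /* chipset power domain */

          struct intel_engine_cs {
                  struct intel_gt *gt;
                  int wakeref;
          };

          void engine_pm_get(struct intel_engine_cs *engine)
          {
                  if (engine->wakeref++ == 0)
                          engine->gt->wakeref++; /* first user wakes the GT */
          }

          void engine_pm_put(struct intel_engine_cs *engine)
          {
                  if (--engine->wakeref == 0) {
                          /* engine idle: queue the switch back to the
                           * kernel context here, then release the GT */
                          engine->gt->wakeref--;
                  }
          }
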
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
    • drm/i915: Introduce context->enter() and context->exit() · 6eee33e8
      Authored by Chris Wilson
      We wish to start segregating the power management into different control
      domains, both with respect to the hardware and the user interface. The
      first step is that at the lowest level flow of requests, we want to
      process a context event (and not a global GEM operation). In this patch,
      we introduce the context callbacks that in future patches will be
      redirected to per-engine interfaces leading to global operations as
      required.
      
      The intent is that this will be guarded by the timeline->mutex, except
      that retiring has not quite finished transitioning over from being
      guarded by struct_mutex. So at the moment it is protected by
      struct_mutex, with a reminder to switch.
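
      A sketch of the callback shape; intel_context_enter_engine is the
      default named in the v2 note below, while the structs here are
      simplified stand-ins:

          struct intel_context;

          struct intel_context_ops {
                  void (*enter)(struct intel_context *ce);
                  void (*exit)(struct intel_context *ce);
          };

          struct intel_context {
                  const struct intel_context_ops *ops;
                  int active_count; /* guarded by timeline->mutex, eventually */
          };

          void intel_context_enter(struct intel_context *ce)
          {
                  if (ce->active_count++ == 0)
                          ce->ops->enter(ce); /* e.g. take an engine wakeref */
          }

          void intel_context_exit(struct intel_context *ce)
          {
                  if (--ce->active_count == 0)
                          ce->ops->exit(ce);
          }
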
      
      v2: Rename default handlers to intel_context_enter_engine.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk
    • drm/i915: Move GraphicsTechnology files under gt/ · 112ed2d3
      Authored by Chris Wilson
      Start partitioning off the code that talks to the hardware (GT) from the
      uapi layers and move the device facing code under gt/
      
      One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to
      subdivide that header and body further (and split out the submission
      code from the ringbuffer and logical context handling). This patch aims
      to be simple motion so git can fixup inflight patches with little mess.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
  14. 24 Apr 2019, 1 commit
  15. 12 Apr 2019, 3 commits
    • drm/i915: Flush the CSB pointer reset · 0edda1d6
      Authored by Chris Wilson
      The HW resets its CSB tail pointer on resetting the engine. Most of the
      time. In case it doesn't (and for system resume) we write the expected
      value anyway. For extra paranoia, flush the write before we invalidate
      the cacheline.
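
      A sketch of the flush, using a volatile pointer in place of the
      kernel's MMIO accessors (the register value is illustrative):

          #include <stdint.h>

          void reset_csb_pointer(volatile uint32_t *csb_mmio,
                                 uint32_t expected)
          {
                  *csb_mmio = expected; /* write the expected head/tail */
                  (void)*csb_mmio;      /* posting read: flush the write
                                         * before invalidating the CSB
                                         * cacheline */
          }
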
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190412110159.10495-1-chris@chris-wilson.co.uk
    • drm/i915/execlists: Always reset the context's RING registers · 1863e302
      Authored by Chris Wilson
      During reset, we try and stop the active ring. This has the consequence
      that we often clobber the RING registers within the context image. When
      we find an active request, we update the context image to rerun that
      request (if it was guilty, we replace the hanging user payload with
      NOPs). However, we were ignoring an active context if the request had
      completed, with the consequence that the next submission on that context
      would start with RING_HEAD==0 and not the tail of the previous request,
      causing all requests still in the ring to be rerun. Rare, but
      occasionally seen within CI where we would spot that the context seqno
      would reverse and complain that we were retiring an incomplete request.
      
          <0> [412.390350]   <idle>-0       3d.s2 408373352us : __i915_request_submit: rcs0 fence 1e95b:3640 -> current 3638
          <0> [412.390350]   <idle>-0       3d.s2 408373353us : __i915_request_submit: rcs0 fence 1e95b:3642 -> current 3638
          <0> [412.390350]   <idle>-0       3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3644 -> current 3638
          <0> [412.390350]   <idle>-0       3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3646 -> current 3638
          <0> [412.390350]   <idle>-0       3d.s2 408373356us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3646 (current 3638), prio=4
          <0> [412.390350] i915_sel-4613    0.... 408373374us : __i915_request_commit: rcs0 fence 1e95b:3648
          <0> [412.390350] i915_sel-4613    0d..1 408373377us : process_csb: rcs0 cs-irq head=2, tail=3
          <0> [412.390350] i915_sel-4613    0d..1 408373377us : process_csb: rcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
          <0> [412.390350] i915_sel-4613    0d..1 408373378us : __i915_request_submit: rcs0 fence 1e95b:3648 -> current 3638
          <0> [412.390350]   <idle>-0       3..s1 408373378us : execlists_submission_tasklet: rcs0 awake?=1, active=5
          <0> [412.390350] i915_sel-4613    0d..1 408373379us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.2, fence 1e95b:3648 (current 3638), prio=4
          <0> [412.390350] i915_sel-4613    0.... 408373381us : i915_reset_engine: rcs0 flags=4
          <0> [412.390350] i915_sel-4613    0.... 408373382us : execlists_reset_prepare: rcs0: depth<-0
          <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 cs-irq head=3, tail=4
          <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 csb[4]: status=0x00008002:0x00000002, active=0x1
          <0> [412.390350]   <idle>-0       3d.s2 408373390us : process_csb: rcs0 out[0]: ctx=2.2, fence 1e95b:3648 (current 3640), prio=4
          <0> [412.390350] i915_sel-4613    0.... 408373401us : intel_engine_stop_cs: rcs0
          <0> [412.390350] i915_sel-4613    0d..1 408373402us : process_csb: rcs0 cs-irq head=4, tail=4
          <0> [412.390350] i915_sel-4613    0.... 408373403us : intel_gpu_reset: engine_mask=1
          <0> [412.390350] i915_sel-4613    0d..1 408373408us : execlists_cancel_port_requests: rcs0:port0 fence 1e95b:3648, (current 3648)
          <0> [412.390350] i915_sel-4613    0.... 408373442us : intel_engine_cancel_stop_cs: rcs0
          <0> [412.390350] i915_sel-4613    0.... 408373442us : execlists_reset_finish: rcs0: depth->0
          <0> [412.390350] ksoftirq-26      3..s. 408373442us : execlists_submission_tasklet: rcs0 awake?=1, active=0
          <0> [412.390350] ksoftirq-26      3d.s1 408373443us : process_csb: rcs0 cs-irq head=5, tail=5
          <0> [412.390350] i915_sel-4613    0.... 408373475us : i915_request_retire: rcs0 fence 1e95b:3640, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373476us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3640, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373494us : __i915_request_commit: rcs0 fence 1e95b:3650
          <0> [412.390350] i915_sel-4613    0d..1 408373496us : process_csb: rcs0 cs-irq head=5, tail=5
          <0> [412.390350] i915_sel-4613    0d..1 408373496us : __i915_request_submit: rcs0 fence 1e95b:3650 -> current 3648
          <0> [412.390350] i915_sel-4613    0d..1 408373498us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3650 (current 3648), prio=6
          <0> [412.390350] i915_sel-4613    0.... 408373500us : i915_request_retire_upto: rcs0 fence 1e95b:3648, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373500us : i915_request_retire: rcs0 fence 1e95b:3642, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373501us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3642, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373514us : i915_request_retire: rcs0 fence 1e95b:3644, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373515us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3644, current 3648
          <0> [412.390350] i915_sel-4613    0.... 408373527us : i915_request_retire: rcs0 fence 1e95b:3646, current 3640
          <0> [412.390350]   <idle>-0       3..s1 408373569us : execlists_submission_tasklet: rcs0 awake?=1, active=1
          <0> [412.390350]   <idle>-0       3d.s2 408373569us : process_csb: rcs0 cs-irq head=5, tail=1
          <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
          <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 csb[1]: status=0x00000018:0x00000002, active=0x5
          <0> [412.390350]   <idle>-0       3d.s2 408373570us : process_csb: rcs0 out[0]: ctx=2.1, fence 1e95b:3650 (current 3650), prio=6
          <0> [412.390350]   <idle>-0       3d.s2 408373571us : process_csb: rcs0 completed ctx=2
          <0> [412.390350] i915_sel-4613    0.... 408373621us : i915_request_retire: i915_request_retire:253 GEM_BUG_ON(!i915_request_completed(request))
      
      v2: Fixup the cancellation path to drain the CSB and reset the pointers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-2-chris@chris-wilson.co.uk
    • drm/i915/guc: Implement reset locally · 292ad25c
      Authored by Chris Wilson
      Before causing guc and execlists to diverge further (breaking guc in the
      process), take a copy of the current reset procedure and make it local to
      the guc submission backend.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-1-chris@chris-wilson.co.uk
  16. 11 Apr 2019, 3 commits
  17. 05 Apr 2019, 2 commits
    • drm/i915: Make RING_PDP relative to engine->mmio_base · 6d425728
      Authored by Chris Wilson
      The PDP registers are an oddity inside the set of context saved
      registers in that they take the engine as a parameter to the macro and
      not the mmio_base as the others do. Make it accept the engine->mmio_base
      for consistency in programming the context registers.
      
      add/remove: 0/0 grow/shrink: 2/1 up/down: 3/-32 (-29)
      Function                                     old     new   delta
      emit_ppgtt_update                            324     326      +2
      capture                                     5102    5103      +1
      execlists_init_reg_state.isra               1128    1096     -32
      
      And similar savings later!
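
      A sketch of the macro change (the register offset and the _MMIO
      wrapper here are illustrative stand-ins, not the exact definitions):

          #define _MMIO(r) (r) /* stand-in for the i915 register wrapper */

          /* Before: the one context register macro taking the engine */
          #define GEN8_RING_PDP_UDW_OLD(engine, n) \
                  _MMIO((engine)->mmio_base + 0x270 + (n) * 8 + 4)

          /* After: takes the mmio base like its neighbours; callers pass
           * engine->mmio_base (or a constant base) themselves. */
          #define GEN8_RING_PDP_UDW(base, n) \
                  _MMIO((base) + 0x270 + (n) * 8 + 4)
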
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190405123831.9724-1-chris@chris-wilson.co.uk
    • drm/i915/execlists: Enable coarse preemption boundaries for gen8 · bac24f59
      Authored by Chris Wilson
      When we introduced preemption, we chose to keep it disabled for gen8 as
      supporting preemption inside GPGPU user batches required various w/a in
      userspace. Since then, the desire to preempt long queues of requests
      between batches (e.g. within busywaiting semaphores) has grown. So allow
      arbitration within the busywaits and between requests, but disable
      arbitration within user batches so that we can preempt between requests
      and not risk breaking GPGPU.
      
      However, since this preemption is much coarser and doesn't interfere
      with userspace, we decline to include it amongst the scheduler
      capabilities. (This is also required for us to skip over the preemption
      selftests that expect to be able to preempt user batches.)
      
      Michal suggested that we could perhaps allow preemption inside gen8
      userspace batches if we can satisfy ourselves that the default
      preemption settings are viable with existing userspace (principally
      OpenCL which already should carry any known workaround). We could then
      merge the two code paths back into one, even dropping the artificial
      has-preemption device feature flag.
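
      A sketch of the coarser batch emission (the MI_* encodings below
      are stand-ins; the real values live in the driver's command
      definitions):

          #include <stdint.h>

          /* mnemonic stand-ins for the real MI_* encodings */
          #define MI_NOOP                    0u
          #define MI_ARB_ON_OFF              (0x08u << 23)
          #define MI_ARB_ENABLE              (1u << 0)
          #define MI_ARB_DISABLE             0u
          #define MI_BATCH_BUFFER_START_GEN8 ((0x31u << 23) | 1u)

          uint32_t *emit_bb_start(uint32_t *cs, uint64_t offset)
          {
                  /* arbitration point between requests, before the batch */
                  *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;

                  *cs++ = MI_BATCH_BUFFER_START_GEN8;
                  *cs++ = (uint32_t)offset;         /* lower_32_bits */
                  *cs++ = (uint32_t)(offset >> 32); /* upper_32_bits */

                  /* no arbitration across the user payload itself */
                  *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
                  *cs++ = MI_NOOP;
                  return cs;
          }
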
      
      Testcase: igt/gem_exec_scheduler/semaphore-user
      References: beecec90 ("drm/i915/execlists: Preemption!")
      Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michal Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-chris@chris-wilson.co.uk
  18. 27 Mar 2019, 2 commits
  19. 22 Mar 2019, 2 commits
    • drm/i915: Allow contexts to share a single timeline across all engines · ea593dbb
      Authored by Chris Wilson
      Previously, our view has been always to run the engines independently
      within a context. (Multiple engines happened before we had contexts and
      timelines, so they always operated independently and that behaviour
      persisted into contexts.) However, at the user level the context often
      represents a single timeline (e.g. GL contexts) and userspace must
      ensure that the individual engines are serialised to present that
      ordering to the client (or forget about this detail entirely and hope no
      one notices - a fair ploy if the client can only directly control one
      engine themselves ;)
      
      In the next patch, we will want to construct a set of engines that
      operate as one, that have a single timeline interwoven between them, to
      present a single virtual engine to the user. (They submit to the virtual
      engine, then we decide which engine to execute on based on load.)
      
      To that end, we want to be able to create contexts which have a single
      timeline (fence context) shared between all engines, rather than multiple
      timelines.
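
      A sketch of what a single timeline means in fence terms (simplified
      types; the mutex that serialises seqno allocation is elided):

          #include <stdint.h>

          struct intel_timeline {
                  uint64_t fence_context; /* one dma-fence context for all
                                           * engines in the gem context */
                  uint32_t next_seqno;
          };

          struct i915_gem_context {
                  /* NULL => the old behaviour: one timeline per engine */
                  struct intel_timeline *timeline;
          };

          uint32_t timeline_advance(struct intel_timeline *tl)
          {
                  /* consecutive seqnos across engines: the client sees a
                   * single ordered stream of fences */
                  return ++tl->next_seqno;
          }
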
      
      v2: Move the specialised timeline ordering to its own function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
    • drm/i915: Flush pages on acquisition · a679f58d
      Authored by Chris Wilson
      When we return pages to the system, we ensure that they are marked as
      being in the CPU domain since any external access is uncontrolled and we
      must assume the worst. This means that we need to always flush the pages
      on acquisition if we need to use them on the GPU, and from the beginning
      have used set-domain. Set-domain is overkill for the purpose as it is a
      general synchronisation barrier, but our intent is to only flush the
      pages being swapped in. If we move that flush into the pages acquisition
      phase, we know then that when we have obj->mm.pages, they are coherent
      with the GPU and need only maintain that status without resorting to
      heavy handed use of set-domain.
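
      A sketch of moving the flush into acquisition (drm_clflush_virt_range
      is the real DRM helper; the object layout here is illustrative):

          #include <stdbool.h>
          #include <stddef.h>

          void drm_clflush_virt_range(void *addr, unsigned long length);

          struct drm_i915_gem_object {
                  bool cache_coherent; /* llc: GPU snoops CPU caches */
                  void *pages;
                  size_t size;
          };

          /* Flush while acquiring, so holding obj->mm.pages implies the
           * pages are coherent with the GPU; set-domain is no longer
           * needed as a heavy-handed barrier for swapped-in pages. */
          void set_pages(struct drm_i915_gem_object *obj, void *pages)
          {
                  obj->pages = pages;
                  if (!obj->cache_coherent)
                          drm_clflush_virt_range(pages, obj->size);
          }
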
      
      The principal knock-on effect for userspace is through mmap-gtt
      pagefaulting. Our uAPI has always implied that the GTT mmap was async
      (especially as when any pagefault occurs is unpredictable to userspace)
      and so userspace had to apply explicit domain control itself
      (set-domain). However, swapping is transparent to the kernel, and so on
      first fault we need to acquire the pages and make them coherent for
      access through the GTT. Our use of set-domain here leaks into the uABI
      that the first pagefault was synchronous. This is unintentional and,
      barring a few igt tests, should go unnoticed; nevertheless we bump the uABI
      version for mmap-gtt to reflect the change in behaviour.
      
      Another implication of the change is that gem_create() is presumed to
      create an object that is coherent with the CPU and is in the CPU write
      domain, so a set-domain(CPU) following a gem_create() would be a minor
      operation that merely checked whether we could allocate all pages for
      the object. On applying this change, a set-domain(CPU) causes a clflush
      as we acquire the pages. This will have a small impact on mesa as we move
      the clflush here on !llc from execbuf time to create, but that should
      have minimal performance impact, as the same clflush exists but is now
      done early; and because of the clflush issue, userspace recycles bo and
      so should resist allocating fresh objects.
      
      Internally, the presumption that objects are created in the CPU
      write-domain and remain so through writes to obj->mm.mapping is more
      prevalent than I expected, but easy enough to catch and apply a manual
      flush.
      
      For the future, we should push the page flush from the central
      set_pages() into the callers so that we can more finely control when it
      is applied, but for now doing it in one location is easier to validate, at
      the cost of sometimes flushing when there is no need.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Antonio Argenziano <antonio.argenziano@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
  20. 21 Mar 2019, 1 commit