1. 03 9月, 2019 1 次提交
  2. 31 8月, 2019 1 次提交
  3. 24 8月, 2019 1 次提交
  4. 20 8月, 2019 1 次提交
  5. 17 8月, 2019 1 次提交
  6. 10 8月, 2019 3 次提交
  7. 08 8月, 2019 1 次提交
  8. 06 8月, 2019 1 次提交
  9. 02 8月, 2019 1 次提交
  10. 31 7月, 2019 1 次提交
  11. 30 7月, 2019 2 次提交
  12. 17 7月, 2019 1 次提交
  13. 13 7月, 2019 1 次提交
  14. 11 7月, 2019 1 次提交
  15. 05 7月, 2019 1 次提交
  16. 22 6月, 2019 1 次提交
  17. 21 6月, 2019 2 次提交
  18. 20 6月, 2019 1 次提交
    • C
      drm/i915/execlists: Preempt-to-busy · 22b7a426
      Chris Wilson 提交于
      When using a global seqno, we required a precise stop-the-workd event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much more
      messy, as at any point then we can see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
      
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick in a timely fashion.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
      22b7a426
  19. 17 6月, 2019 1 次提交
  20. 15 6月, 2019 1 次提交
    • C
      drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Chris Wilson 提交于
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch, for
      simplicity we use a fresh request from the kernel context.
      
      The sequence of operations for keeping the context pinned until saved is:
      
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
         shrinker)
      
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
      
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
      
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
      
      v2: intel_context_active_acquire/_release
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
      ce476c80
  21. 11 6月, 2019 2 次提交
  22. 06 6月, 2019 2 次提交
  23. 28 5月, 2019 2 次提交
  24. 22 5月, 2019 7 次提交
    • C
      drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson 提交于
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
      ee113690
    • C
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson 提交于
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always take priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
    • C
      drm/i915: Allow userspace to clone contexts on creation · b81dde71
      Chris Wilson 提交于
      A usecase arose out of handling context recovery in mesa, whereby they
      wish to recreate a context with fresh logical state but preserving all
      other details of the original. Currently, they create a new context and
      iterate over which bits they want to copy across, but it would much more
      convenient if they were able to just pass in a target context to clone
      during creation. This essentially extends the setparam during creation
      to pull the details from a target context instead of the user supplied
      parameters.
      
      The ideal here is that we don't expose control over anything more than
      can be obtained via CONTEXT_PARAM. That is userspace retains explicit
      control over all features, and this api is just convenience.
      
      For example, you could replace
      
      	struct context_param p = { .param = CONTEXT_PARAM_VM };
      
      	param.ctx_id = old_id;
      	gem_context_get_param(&p.param);
      
      	new_id = gem_context_create();
      
      	param.ctx_id = new_id;
      	gem_context_set_param(&p.param);
      
      	gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */
      
      with
      
      	struct create_ext_param p = {
      	  { .name = CONTEXT_CREATE_CLONE },
      	  .clone_id = old_id,
      	  .flags = CLONE_FLAGS_VM
      	}
      	new_id = gem_context_create_ext(&p);
      
      and not have to worry about stray namespace pollution etc.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk
      b81dde71
    • C
      drm/i915: Re-expose SINGLE_TIMELINE flags for context creation · 8319f44c
      Chris Wilson 提交于
      The SINGLE_TIMELINE flag can be used to create a context such that all
      engine instances within that context share a common timeline. This can
      be useful for mixing operations between real and virtual engines, or
      when using a composite context for a single client API context.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk
      8319f44c
    • C
      drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] · e620f7b3
      Chris Wilson 提交于
      Allow the user to specify a local engine index (as opposed to
      class:index) that they can use to refer to a preset engine inside the
      ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
      This will be useful for setting SSEU parameters on virtual engines that
      are local to the context and do not have a valid global class:instance
      lookup.
      
      Note that due to the ambiguity in using class:instance with
      ctx->engines[], if a user supplied engine map is active the user must
      specify the engine to alter by its index into the ctx->engines[].
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk
      e620f7b3
    • C
      drm/i915: Allow a context to define its set of engines · 976b55f0
      Chris Wilson 提交于
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines, that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, it uses the user defined engine map from the context.
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
      976b55f0
    • C
      drm/i915: Restore control over ppgtt for context creation ABI · 7f3f317a
      Chris Wilson 提交于
      Having hid the partially exposed new ABI from the PR, put it back again
      for completion of context recovery. A significant part of context
      recovery is the ability to reuse as much of the old context as is
      feasible (to avoid expensive reconstruction). The biggest chunk kept
      hidden at the moment is fine-control over the ctx->ppgtt (the GPU page
      tables and associated translation tables and kernel maps), so make
      control over the ctx->ppgtt explicit.
      
      This allows userspace to create and share virtual memory address spaces
      (within the limits of a single fd) between contexts they own, along with
      the ability to query the contexts for the vm state.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk
      7f3f317a
  25. 29 4月, 2019 1 次提交
  26. 27 4月, 2019 2 次提交