1. 04 10月, 2019 5 次提交
  2. 25 9月, 2019 1 次提交
  3. 20 9月, 2019 1 次提交
    • C
      drm/i915: Mark i915_request.timeline as a volatile, rcu pointer · d19d71fc
      Chris Wilson 提交于
      The request->timeline is only valid until the request is retired (i.e.
      before it is completed). Upon retiring the request, the context may be
      unpinned and freed, and along with it the timeline may be freed. We
      therefore need to be very careful when chasing rq->timeline that the
      pointer does not disappear beneath us. The vast majority of users are in
      a protected context, either during request construction or retirement,
      where the timeline->mutex is held and the timeline cannot disappear. It
      is those few off the beaten path (where we access a second timeline) that
      need extra scrutiny -- to be added in the next patch after first adding
      the warnings about dangerous access.
      
      One complication, where we cannot use the timeline->mutex itself, is
      during request submission onto hardware (under spinlocks). Here, we want
      to check on the timeline to finalize the breadcrumb, and so we need to
      impose a second rule to ensure that the request->timeline is indeed
      valid. As we are submitting the request, it's context and timeline must
      be pinned, as it will be used by the hardware. Since it is pinned, we
      know the request->timeline must still be valid, and we cannot submit the
      idle barrier until after we release the engine->active.lock, ergo while
      submitting and holding that spinlock, a second thread cannot release the
      timeline.
      
      v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
      as we need it, and tidy up acquiring the timeline with a bit of
      refactoring (i915_active_add_request)
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
      d19d71fc
  4. 03 9月, 2019 1 次提交
  5. 31 8月, 2019 1 次提交
  6. 24 8月, 2019 1 次提交
  7. 20 8月, 2019 1 次提交
  8. 17 8月, 2019 1 次提交
  9. 10 8月, 2019 3 次提交
  10. 08 8月, 2019 1 次提交
  11. 06 8月, 2019 1 次提交
  12. 02 8月, 2019 1 次提交
  13. 31 7月, 2019 1 次提交
  14. 30 7月, 2019 2 次提交
  15. 17 7月, 2019 1 次提交
  16. 13 7月, 2019 1 次提交
  17. 11 7月, 2019 1 次提交
  18. 05 7月, 2019 1 次提交
  19. 22 6月, 2019 1 次提交
  20. 21 6月, 2019 2 次提交
  21. 20 6月, 2019 1 次提交
    • C
      drm/i915/execlists: Preempt-to-busy · 22b7a426
      Chris Wilson 提交于
      When using a global seqno, we required a precise stop-the-workd event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much more
      messy, as at any point then we can see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
      
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick in a timely fashion.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
      22b7a426
  22. 17 6月, 2019 1 次提交
  23. 15 6月, 2019 1 次提交
    • C
      drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Chris Wilson 提交于
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch, for
      simplicity we use a fresh request from the kernel context.
      
      The sequence of operations for keeping the context pinned until saved is:
      
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
         shrinker)
      
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
      
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
      
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
      
      v2: intel_context_active_acquire/_release
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
      ce476c80
  24. 11 6月, 2019 2 次提交
  25. 06 6月, 2019 2 次提交
  26. 28 5月, 2019 2 次提交
  27. 22 5月, 2019 3 次提交
    • C
      drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson 提交于
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
      ee113690
    • C
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson 提交于
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always take priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
    • C
      drm/i915: Allow userspace to clone contexts on creation · b81dde71
      Chris Wilson 提交于
      A usecase arose out of handling context recovery in mesa, whereby they
      wish to recreate a context with fresh logical state but preserving all
      other details of the original. Currently, they create a new context and
      iterate over which bits they want to copy across, but it would much more
      convenient if they were able to just pass in a target context to clone
      during creation. This essentially extends the setparam during creation
      to pull the details from a target context instead of the user supplied
      parameters.
      
      The ideal here is that we don't expose control over anything more than
      can be obtained via CONTEXT_PARAM. That is userspace retains explicit
      control over all features, and this api is just convenience.
      
      For example, you could replace
      
      	struct context_param p = { .param = CONTEXT_PARAM_VM };
      
      	param.ctx_id = old_id;
      	gem_context_get_param(&p.param);
      
      	new_id = gem_context_create();
      
      	param.ctx_id = new_id;
      	gem_context_set_param(&p.param);
      
      	gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */
      
      with
      
      	struct create_ext_param p = {
      	  { .name = CONTEXT_CREATE_CLONE },
      	  .clone_id = old_id,
      	  .flags = CLONE_FLAGS_VM
      	}
      	new_id = gem_context_create_ext(&p);
      
      and not have to worry about stray namespace pollution etc.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk
      b81dde71