1. 03 7月, 2019 1 次提交
  2. 20 6月, 2019 1 次提交
  3. 19 6月, 2019 1 次提交
  4. 14 6月, 2019 1 次提交
  5. 11 6月, 2019 1 次提交
  6. 29 5月, 2019 1 次提交
  7. 28 5月, 2019 2 次提交
  8. 22 5月, 2019 3 次提交
    • C
      drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson 提交于
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
      ee113690
    • C
      drm/i915: Apply an execution_mask to the virtual_engine · 78e41ddd
      Chris Wilson 提交于
      Allow the user to direct which physical engines of the virtual engine
      they wish to execute one, as sometimes it is necessary to override the
      load balancing algorithm.
      
      v2: Only kick the virtual engines on context-out if required
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
      78e41ddd
    • C
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson 提交于
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always take priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
  9. 17 5月, 2019 1 次提交
    • C
      drm/i915: Bump signaler priority on adding a waiter · 6e7eb7a8
      Chris Wilson 提交于
      The handling of the no-preemption priority level imposes the restriction
      that we need to maintain the implied ordering even though preemption is
      disabled. Otherwise we may end up with an AB-BA deadlock across multiple
      engine due to a real preemption event reordering the no-preemption
      WAITs. To resolve this issue we currently promote all requests to WAIT
      on unsubmission, however this interferes with the timeslicing
      requirement that we do not apply any implicit promotion that will defeat
      the round-robin timeslice list. (If we automatically promote the active
      request it will go back to the head of the queue and not the tail!)
      
      So we need implicit promotion to prevent reordering around semaphores
      where we are not allowed to preempt, and we must avoid implicit
      promotion on unsubmission. So instead of at unsubmit, if we apply that
      implicit promotion on adding the dependency, we avoid the semaphore
      deadlock and we also reduce the gains made by the promotion for user
      space waiting. Furthermore, by keeping the earlier dependencies at a
      higher level, we reduce the search space for timeslicing without
      altering runtime scheduling too badly (no dependencies at all will be
      assigned a higher priority for rrul).
      
      v2: Limit the bump to external edges (as originally intended) i.e.
      between contexts and out to the user.
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
      6e7eb7a8
  10. 08 5月, 2019 1 次提交
  11. 27 4月, 2019 1 次提交
  12. 25 4月, 2019 1 次提交
  13. 05 4月, 2019 2 次提交
  14. 22 3月, 2019 3 次提交
    • C
      drm/i915/selftests: Mark up preemption tests for hang detection · e70d3d80
      Chris Wilson 提交于
      Use the igt_live_test framework for detecting whether an unwanted hang
      occurred during test execution, and report failure if it does.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321194031.20240-2-chris@chris-wilson.co.uk
      e70d3d80
    • C
      drm/i915/selftests: Calculate maximum ring size for preemption chain · d067994c
      Chris Wilson 提交于
      32 is too many for the likes of kbl, and in order to insert that many
      requests into the ring requires us to declare the first few hung --
      understandably a slow and unexpected process. Instead, measure the size
      of a singe requests and use that to estimate the upper bound on the
      chain length we can use for our test, remembering to flush the previous
      chain between tests for safety.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: N"Yokoyama, Caz" <caz.yokoyama@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321194031.20240-1-chris@chris-wilson.co.uk
      d067994c
    • C
      drm/i915: Flush pages on acquisition · a679f58d
      Chris Wilson 提交于
      When we return pages to the system, we ensure that they are marked as
      being in the CPU domain since any external access is uncontrolled and we
      must assume the worst. This means that we need to always flush the pages
      on acquisition if we need to use them on the GPU, and from the beginning
      have used set-domain. Set-domain is overkill for the purpose as it is a
      general synchronisation barrier, but our intent is to only flush the
      pages being swapped in. If we move that flush into the pages acquisition
      phase, we know then that when we have obj->mm.pages, they are coherent
      with the GPU and need only maintain that status without resorting to
      heavy handed use of set-domain.
      
      The principle knock-on effect for userspace is through mmap-gtt
      pagefaulting. Our uAPI has always implied that the GTT mmap was async
      (especially as when any pagefault occurs is unpredicatable to userspace)
      and so userspace had to apply explicit domain control itself
      (set-domain). However, swapping is transparent to the kernel, and so on
      first fault we need to acquire the pages and make them coherent for
      access through the GTT. Our use of set-domain here leaks into the uABI
      that the first pagefault was synchronous. This is unintentional and
      baring a few igt should be unoticed, nevertheless we bump the uABI
      version for mmap-gtt to reflect the change in behaviour.
      
      Another implication of the change is that gem_create() is presumed to
      create an object that is coherent with the CPU and is in the CPU write
      domain, so a set-domain(CPU) following a gem_create() would be a minor
      operation that merely checked whether we could allocate all pages for
      the object. On applying this change, a set-domain(CPU) causes a clflush
      as we acquire the pages. This will have a small impact on mesa as we move
      the clflush here on !llc from execbuf time to create, but that should
      have minimal performance impact as the same clflush exists but is now
      done early and because of the clflush issue, userspace recycles bo and
      so should resist allocating fresh objects.
      
      Internally, the presumption that objects are created in the CPU
      write-domain and remain so through writes to obj->mm.mapping is more
      prevalent than I expected; but easy enough to catch and apply a manual
      flush.
      
      For the future, we should push the page flush from the central
      set_pages() into the callers so that we can more finely control when it
      is applied, but for now doing it one location is easier to validate, at
      the cost of sometimes flushing when there is no need.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Antonio Argenziano <antonio.argenziano@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NMatthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
      a679f58d
  15. 08 3月, 2019 1 次提交
  16. 06 3月, 2019 1 次提交
  17. 01 3月, 2019 1 次提交
    • C
      drm/i915/execlists: Suppress mere WAIT preemption · b5773a36
      Chris Wilson 提交于
      WAIT is occasionally suppressed by virtue of preempted requests being
      promoted to NEWCLIENT if they have not all ready received that boost.
      Make this consistent for all WAIT boosts that they are not allowed to
      preempt executing contexts and are merely granted the right to be at the
      front of the queue for the next execution slot. This is in keeping with
      the desire that the WAIT boost be a minor tweak that does not give
      excessive promotion to its user and open ourselves to trivial abuse.
      
      The problem with the inconsistent WAIT preemption becomes more apparent
      as the preemption is propagated across the engines, where one engine may
      preempt and the other not, and we be relying on the exact execution
      order being consistent across engines (e.g. using HW semaphores to
      coordinate parallel execution).
      
      v2: Also protect GuC submission from false preemption loops.
      v3: Build bug safeguards and better debug messages for st.
      v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
      unwind), applying it earlier during submit causes out-of-order execution
      combined with execute fences.
      v5: Call sw_fence_fini for our dummy request (Matthew)
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
      b5773a36
  18. 21 2月, 2019 1 次提交
  19. 06 2月, 2019 1 次提交
  20. 30 1月, 2019 1 次提交
    • C
      drm/i915/execlists: Suppress preempting self · c9a64622
      Chris Wilson 提交于
      In order to avoid preempting ourselves, we currently refuse to schedule
      the tasklet if we reschedule an inflight context. However, this glosses
      over a few issues such as what happens after a CS completion event and
      we then preempt the newly executing context with itself, or if something
      else causes a tasklet_schedule triggering the same evaluation to
      preempt the active context with itself.
      
      However, when we avoid preempting ELSP[0], we still retain the preemption
      value as it may match a second preemption request within the same time period
      that we need to resolve after the next CS event. However, since we only
      store the maximum preemption priority seen, it may not match the
      subsequent event and so we should double check whether or not we
      actually do need to trigger a preempt-to-idle by comparing the top
      priorities from each queue. Later, this gives us a hook for finer
      control over deciding whether the preempt-to-idle is justified.
      
      The sequence of events where we end up preempting for no avail is:
      
      1. Queue requests/contexts A, B
      2. Priority boost A; no preemption as it is executing, but keep hint
      3. After CS switch, B is less than hint, force preempt-to-idle
      4. Resubmit B after idling
      
      v2: We can simplify a bunch of tests based on the knowledge that PI will
      ensure that earlier requests along the same context will have the highest
      priority.
      v3: Demonstrate the stale preemption hint with a selftest
      
      References: a2bf92e8 ("drm/i915/execlists: Avoid kicking priority on the current context")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
      c9a64622
  21. 17 1月, 2019 1 次提交
  22. 15 1月, 2019 2 次提交
  23. 02 1月, 2019 1 次提交
  24. 30 11月, 2018 1 次提交
  25. 03 10月, 2018 1 次提交
    • C
      drm/i915/selftests: Hold task_struct ref for smoking kthread · 5ec244f4
      Chris Wilson 提交于
      As the kthread may terminate itself, the parent must hold a task_struct
      reference for it to call kthread_stop().
      
      <4> [498.827675] stack segment: 0000 [#1] PREEMPT SMP PTI
      <4> [498.827683] CPU: 0 PID: 3872 Comm: drv_selftest Tainted: G     U            4.19.0-rc6-CI-CI_DRM_4915+ #1
      <4> [498.827686] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
      <4> [498.827695] RIP: 0010:kthread_stop+0x36/0x210
      <4> [498.827698] Code: 05 df 3d f6 7e 89 c0 48 0f a3 05 95 f8 29 01 0f 82 56 01 00 00 f0 ff 43 20 f6 43 26 20 0f 84 7f 01 00 00 48 8b ab b0 05 00 00 <f0> 80 4d 00 02 48 89 df e8 5d ff ff ff 48 89 df e8 15 c7 00 00 48
      <4> [498.827701] RSP: 0018:ffffc900003937d0 EFLAGS: 00010202
      <4> [498.827704] RAX: 0000000000000001 RBX: ffff8802165ece40 RCX: 0000000000000001
      <4> [498.827707] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: ffffffff82247460
      <4> [498.827709] RBP: 6b6b6b6b6b6b6b6b R08: 00000000581395cb R09: 0000000000000001
      <4> [498.827711] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000393868
      <4> [498.827713] R13: ffffc900003937f0 R14: ffff88026c068040 R15: 0000000000001057
      <4> [498.827716] FS:  00007fc0c464b980(0000) GS:ffff880277e00000(0000) knlGS:0000000000000000
      <4> [498.827718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [498.827720] CR2: 000056178c2feca0 CR3: 000000026983c000 CR4: 0000000000340ef0
      <4> [498.827723] Call Trace:
      <4> [498.827824]  smoke_crescendo+0x14c/0x1d0 [i915]
      <4> [498.827837]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
      <4> [498.827898]  ? __i915_gem_context_pin_hw_id+0x69/0x5f0 [i915]
      <4> [498.827902]  ? ida_alloc_range+0x1f2/0x3d0
      <4> [498.827907]  ? __mutex_unlock_slowpath+0x46/0x2b0
      <4> [498.827914]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
      <4> [498.827979]  live_preempt_smoke+0x2c2/0x470 [i915]
      <4> [498.828047]  __i915_subtests+0x5e/0xf0 [i915]
      <4> [498.828113]  __run_selftests+0x10b/0x190 [i915]
      <4> [498.828175]  i915_live_selftests+0x2c/0x60 [i915]
      <4> [498.828232]  i915_pci_probe+0x50/0xa0 [i915]
      <4> [498.828238]  pci_device_probe+0xa1/0x130
      <4> [498.828244]  really_probe+0x25d/0x3c0
      <4> [498.828249]  driver_probe_device+0x10a/0x120
      <4> [498.828253]  __driver_attach+0xdb/0x100
      <4> [498.828256]  ? driver_probe_device+0x120/0x120
      <4> [498.828259]  bus_for_each_dev+0x74/0xc0
      <4> [498.828264]  bus_add_driver+0x15f/0x250
      <4> [498.828268]  ? 0xffffffffa00c3000
      <4> [498.828271]  driver_register+0x56/0xe0
      <4> [498.828274]  ? 0xffffffffa00c3000
      <4> [498.828278]  do_one_initcall+0x58/0x2e0
      <4> [498.828281]  ? rcu_lockdep_current_cpu_online+0x8f/0xd0
      <4> [498.828285]  ? do_init_module+0x1d/0x1ea
      <4> [498.828289]  ? rcu_read_lock_sched_held+0x6f/0x80
      <4> [498.828293]  ? kmem_cache_alloc_trace+0x264/0x290
      <4> [498.828297]  do_init_module+0x56/0x1ea
      <4> [498.828302]  load_module+0x26f5/0x29d0
      <4> [498.828309]  ? vfs_read+0x122/0x140
      <4> [498.828318]  ? __se_sys_finit_module+0xd3/0xf0
      <4> [498.828321]  __se_sys_finit_module+0xd3/0xf0
      <4> [498.828329]  do_syscall_64+0x55/0x190
      <4> [498.828332]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4> [498.828335] RIP: 0033:0x7fc0c3f16839
      
      Fixes: 992d2098 ("drm/i915/selftests: Split preemption smoke test into threads")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181002132927.7669-1-chris@chris-wilson.co.uk
      5ec244f4
  26. 02 10月, 2018 1 次提交
  27. 01 10月, 2018 3 次提交
  28. 27 9月, 2018 1 次提交
  29. 21 9月, 2018 1 次提交
  30. 17 7月, 2018 1 次提交
  31. 07 7月, 2018 1 次提交