1. 19 June 2021, 1 commit
    • drm/i915: Move priolist to new i915_sched_engine object · 3e28d371
      Committed by Matthew Brost
      Introduce the i915_sched_engine object, a lower-level data structure
      that i915_scheduler / generic code can operate on without touching
      execlists-specific structures. This allows additional submission backends
      to be added without breaking the layering. Currently the execlists
      backend uses one of these objects per engine (physical or virtual), but
      future backends such as the GuC will point to fewer instances, making
      use of the reference counting.
      
      This is a bit of a detour on the way to integrating the i915 with the
      DRM scheduler, but the object will still exist when the DRM scheduler
      lands in the i915. It will, however, look a bit different: it will
      encapsulate the drm_gpu_scheduler object plus the variables common to
      the backends that relate to scheduling. Regardless, this is a step in
      the right direction.
      
      This patch starts the aforementioned transition by moving the priolist
      into the i915_sched_engine object; a rough sketch of the object's shape
      follows this entry.
      
      v3:
       (Jason Ekstrand)
        Update comment next to intel_engine_cs.virtual
        Add kernel doc
       (Checkpatch)
         Fix doubled "the" in commit message
      v4:
       (Daniele)
        Update comment message.
        Add comment about subclass field
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210618010638.98941-2-matthew.brost@intel.com
      3e28d371
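      To make the new layering concrete, here is a minimal sketch of what such
      a reference-counted scheduling object can look like; the struct, field,
      and function names below are illustrative assumptions, not the driver's
      actual definitions.

      /*
       * Illustrative sketch only: a reference-counted scheduling object that
       * owns the priority queue, so backends can share it rather than embed
       * it per engine. Names are assumptions, not the i915 definitions.
       */
      #include <linux/kernel.h>
      #include <linux/kref.h>
      #include <linux/rbtree.h>
      #include <linux/slab.h>
      #include <linux/spinlock.h>

      struct sched_engine_sketch {
              struct kref ref;                /* shared via reference counting */
              spinlock_t lock;                /* protects the priority queue */
              struct rb_root_cached queue;    /* priolists ordered by priority */
              bool no_priolist;               /* single-priority fast path */
      };

      static inline struct sched_engine_sketch *
      sched_engine_get(struct sched_engine_sketch *se)
      {
              kref_get(&se->ref);
              return se;
      }

      static void sched_engine_release(struct kref *ref)
      {
              kfree(container_of(ref, struct sched_engine_sketch, ref));
      }

      static inline void sched_engine_put(struct sched_engine_sketch *se)
      {
              kref_put(&se->ref, sched_engine_release);
      }

      An execlists-style backend would then allocate one such object per
      physical or virtual engine, while a GuC-style backend could take
      additional references to a single shared instance.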
  2. 25 March 2021, 2 commits
  3. 15 January 2021, 1 commit
  4. 24 December 2020, 1 commit
  5. 20 November 2020, 1 commit
    • drm/i915: Show timeline dependencies for debug · da7ac715
      Committed by Tvrtko Ursulin
      Include the signalers each request in the timeline is waiting on, as a
      means of trying to identify the cause of a stall. This can be quite
      verbose, even though for now we only show each request in the timeline
      and its immediate antecedents. (A simplified sketch of the traversal
      follows this entry.)
      
      This generates output like:
      
      Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 }
        U 886:29a-  prio=0 @ 134ms: gem_exec_parall<4621>
          U bc1:27a-  prio=0 @ 134ms: gem_exec_parall[4917]
      Timeline 825: { count 1, ready: 0, inflight: 0, seqno: { current: 802, last: 804 }, engine: vcs0 }
        U 825:324  prio=0 @ 107ms: gem_exec_parall<4518>
          U b75:140-  prio=0 @ 110ms: gem_exec_parall<5486>
      Timeline b46: { count 1, ready: 0, inflight: 0, seqno: { current: 782, last: 784 }, engine: vcs0 }
        U b46:310-  prio=0 @ 70ms: gem_exec_parall<5428>
          U c11:170-  prio=0 @ 70ms: gem_exec_parall[5501]
      Timeline 96b: { count 1, ready: 0, inflight: 0, seqno: { current: 632, last: 634 }, engine: vcs0 }
        U 96b:27a-  prio=0 @ 67ms: gem_exec_parall<4878>
          U b75:19e-  prio=0 @ 67ms: gem_exec_parall<5486>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-6-chris@chris-wilson.co.uk
      da7ac715
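      Conceptually, the dump walks each request on the timeline and then its
      immediate signalers. A self-contained sketch of that traversal is below;
      the types, field names, and formatting are hypothetical, not the
      driver's.

      /*
       * Hypothetical traversal sketch: print each request on the timeline,
       * then its immediate signalers (antecedents).
       */
      #include <linux/list.h>
      #include <linux/printk.h>

      struct req_sketch {
              unsigned long long ctx, seqno;
              int prio;
              struct list_head signalers;     /* list of struct dep_sketch */
      };

      struct dep_sketch {
              struct req_sketch *signaler;
              struct list_head link;
      };

      static void show_timeline_request(struct req_sketch *rq)
      {
              struct dep_sketch *dep;

              pr_info("  U %llx:%llx prio=%d\n", rq->ctx, rq->seqno, rq->prio);
              list_for_each_entry(dep, &rq->signalers, link)
                      pr_info("    U %llx:%llx prio=%d\n",
                              dep->signaler->ctx, dep->signaler->seqno,
                              dep->signaler->prio);
      }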
  6. 25 May 2020, 1 commit
  7. 19 May 2020, 1 commit
  8. 14 May 2020, 1 commit
  9. 12 May 2020, 1 commit
  10. 08 May 2020, 2 commits
  11. 01 April 2020, 2 commits
  12. 11 March 2020, 1 commit
  13. 10 March 2020, 1 commit
    • drm/i915/execlists: Mark up racy inspection of current i915_request priority · a4e648a0
      Committed by Chris Wilson
      [  120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915]
      [  120.176566]
      [  120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3:
      [  120.176792]  __i915_schedule+0x63e/0x920 [i915]
      [  120.177007]  __bump_priority+0x63/0x80 [i915]
      [  120.177220]  __i915_sched_node_add_dependency+0x258/0x300 [i915]
      [  120.177438]  i915_sched_node_add_dependency+0x50/0xa0 [i915]
      [  120.177654]  i915_request_await_dma_fence+0x1da/0x530 [i915]
      [  120.177867]  i915_request_await_object+0x2fe/0x470 [i915]
      [  120.178081]  i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
      [  120.178292]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
      [  120.178309]  drm_ioctl_kernel+0xe4/0x120
      [  120.178322]  drm_ioctl+0x297/0x4c7
      [  120.178335]  ksys_ioctl+0x89/0xb0
      [  120.178348]  __x64_sys_ioctl+0x42/0x60
      [  120.178361]  do_syscall_64+0x6e/0x2c0
      [  120.178375]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  120.178387]
      [  120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2:
      [  120.178606]  effective_prio+0x25/0xc0 [i915]
      [  120.178812]  process_csb+0xe8b/0x10a0 [i915]
      [  120.179021]  execlists_submission_tasklet+0x30/0x170 [i915]
      [  120.179038]  tasklet_action_common.isra.0+0x42/0xa0
      [  120.179053]  __do_softirq+0xd7/0x2cd
      [  120.179066]  irq_exit+0xbe/0xe0
      [  120.179078]  do_IRQ+0x51/0x100
      [  120.179090]  ret_from_intr+0x0/0x1c
      [  120.179104]  cpuidle_enter_state+0x1b8/0x5d0
      [  120.179117]  cpuidle_enter+0x50/0x90
      [  120.179131]  do_idle+0x1a1/0x1f0
      [  120.179145]  cpu_startup_entry+0x14/0x16
      [  120.179158]  start_secondary+0x120/0x180
      [  120.179172]  secondary_startup_64+0xa4/0xb0
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk
      a4e648a0
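      The KCSAN report above flags a plain write in __i915_schedule() racing
      with a plain read in effective_prio(). The usual mark-up for an
      intentional lockless inspection is to wrap both sides in
      WRITE_ONCE()/READ_ONCE(), sketched below with illustrative names.

      /*
       * Sketch of the annotation pattern only; names are illustrative. The
       * scheduler writes the priority under its lock while the submission
       * tasklet inspects it locklessly, so both accesses are marked for KCSAN.
       */
      #include <linux/compiler.h>

      struct sched_attr_sketch {
              int priority;
      };

      /* writer side: runs under the scheduler lock */
      static void sketch_set_priority(struct sched_attr_sketch *attr, int prio)
      {
              WRITE_ONCE(attr->priority, prio);
      }

      /* reader side: racy inspection from interrupt/tasklet context */
      static int sketch_effective_prio(const struct sched_attr_sketch *attr)
      {
              return READ_ONCE(attr->priority);
      }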
  14. 06 March 2020, 1 commit
  15. 20 February 2020, 2 commits
  16. 18 February 2020, 1 commit
    • drm/i915/gt: Protect defer_request() from new waiters · 19b5f3b4
      Committed by Chris Wilson
      Mika spotted
      
      <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI
      <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1
      <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
      <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915]
      <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05
      <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083
      <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122
      <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48
      <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe
      <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8
      <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80
      <4>[17436.705532] FS:  0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000
      <4>[17436.705534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0
      <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      <4>[17436.705542] Call Trace:
      <4>[17436.705545]  <IRQ>
      <4>[17436.705603]  execlists_submission_tasklet+0xc0/0x130 [i915]
      
      which is us consuming a partially initialised new waiter in
      defer_request(). We can prevent this by initialising the i915_dependency
      prior to making it visible and, since we are using a concurrent
      list_add/iterator, by marking them up for the compiler. (The
      publish-after-init pattern is sketched after this entry.)
      
      Fixes: 8ee36e04 ("drm/i915/execlists: Minimalistic timeslicing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
      (cherry picked from commit f14f27b1)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      19b5f3b4
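      One common way to realise "initialise before making visible" for a
      concurrently walked list is sketched below; it uses
      list_add_rcu()/list_for_each_entry_rcu() purely as an illustration of
      the publish-after-init idea, and all names are hypothetical rather than
      the driver's.

      /*
       * Hypothetical publish-after-init sketch (not the driver's code): fully
       * initialise the new waiter, then publish it with list_add_rcu() so the
       * store carries a release; concurrent walkers use the RCU iterator so
       * their loads are marked for the compiler.
       */
      #include <linux/compiler.h>
      #include <linux/list.h>
      #include <linux/rculist.h>
      #include <linux/rcupdate.h>

      struct waiter_sketch {
              void *signaler;
              unsigned long flags;
              struct list_head wait_link;
      };

      static void add_waiter(struct list_head *waiters, struct waiter_sketch *w,
                             void *signaler, unsigned long flags)
      {
              /* initialise everything a walker may look at ... */
              w->signaler = signaler;
              w->flags = flags;

              /* ... before making the node visible to concurrent walkers */
              list_add_rcu(&w->wait_link, waiters);
      }

      static void walk_waiters(struct list_head *waiters)
      {
              struct waiter_sketch *w;

              rcu_read_lock();
              list_for_each_entry_rcu(w, waiters, wait_link)
                      (void)READ_ONCE(w->signaler);   /* lockless, marked read */
              rcu_read_unlock();
      }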
  17. 12 February 2020, 1 commit
  18. 07 February 2020, 2 commits
    • drm/i915/gt: Protect execlists_hold/unhold from new waiters · 793c2261
      Committed by Chris Wilson
      As we may add new waiters to a request as it is being run, we need to
      mark the list iteration as being safe for concurrent addition.
      
      v2: Mika spotted that we used the same trick for signalers_list, so warn
      the compiler about the lockless walk there as well.
      
      Fixes: 32ff621f ("drm/i915/gt: Allow temporary suspension of inflight requests")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk
      793c2261
    • drm/i915/gt: Protect defer_request() from new waiters · f14f27b1
      Committed by Chris Wilson
      Mika spotted
      
      <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI
      <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1
      <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
      <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915]
      <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05
      <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083
      <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122
      <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48
      <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe
      <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8
      <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80
      <4>[17436.705532] FS:  0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000
      <4>[17436.705534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0
      <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      <4>[17436.705542] Call Trace:
      <4>[17436.705545]  <IRQ>
      <4>[17436.705603]  execlists_submission_tasklet+0xc0/0x130 [i915]
      
      which is us consuming a partially initialised new waiter in
      defer_request(). We can prevent this by initialising the i915_dependency
      prior to making it visible and, since we are using a concurrent
      list_add/iterator, by marking them up for the compiler.
      
      Fixes: 8ee36e04 ("drm/i915/execlists: Minimalistic timeslicing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
      f14f27b1
  19. 17 January 2020, 1 commit
  20. 20 December 2019, 1 commit
  21. 16 December 2019, 1 commit
    • drm/i915: Copy across scheduler behaviour flags across submit fences · 99de9536
      Committed by Chris Wilson
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
      	A -> B
           	     \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.
      
      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI. This requires us to
      relax our sanity-checks that dependencies are strictly in order. (A
      rough sketch of the inheritance step follows this entry.)
      
      v2: Synchronise (B, B') execution on all platforms, regardless of using
      a scheduler; any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      (cherry picked from commit c81471f5)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      99de9536
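      Purely as an illustration of the intent (every name below is
      hypothetical, not the driver's API): the bonded request copies its
      master's scheduling attributes, including the semaphore-chain state, and
      records the master as a signaler so priority inheritance flows through
      it.

      /*
       * Hypothetical sketch of the intent, not the driver's API: B' copies
       * B's scheduling attributes (including the semaphore-chain state) and
       * records B as a signaler so that priority inheritance flows through it.
       */
      #include <linux/list.h>

      struct sched_node_sketch {
              int priority;
              unsigned long semaphores;       /* engines waited on via semaphores */
              struct list_head signalers;     /* dependencies coupled in for PI */
      };

      struct dep_link_sketch {
              struct sched_node_sketch *signaler;
              struct list_head link;
      };

      static void bond_inherit(struct sched_node_sketch *bonded,  /* B' */
                               struct sched_node_sketch *master,  /* B  */
                               struct dep_link_sketch *dep)
      {
              bonded->priority = master->priority;            /* same queue depth as B */
              bonded->semaphores |= master->semaphores;       /* inherit the chain */

              dep->signaler = master;                         /* couple B' -> B for PI */
              list_add(&dep->link, &bonded->signalers);
      }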
  22. 11 December 2019, 1 commit
    • drm/i915: Copy across scheduler behaviour flags across submit fences · c81471f5
      Committed by Chris Wilson
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
      	A -> B
           	     \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.
      
      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI. This requires us to
      relax our sanity-checks that dependencies are strictly in order.
      
      v2: Synchronise (B, B') execution on all platforms, regardless of using
      a scheduler; any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      c81471f5
  23. 22 November 2019, 1 commit
    • drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request · 67a3acaa
      Committed by Chris Wilson
      As we start peeking into requests for longer and longer, e.g.
      incorporating use of spinlocks when only protected by an
      rcu_read_lock(), we need to be careful in how we reset the request when
      recycling and need to preserve any barriers that may still be in use as
      the request is reset for reuse.
      
      Quoting Linus Torvalds:
      
      > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
      
        .. because the object can be accessed (by RCU) after the refcount has
        gone down to zero, and the thing has been released.
      
        That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
      
        That flag basically says:
      
        "I may end up accessing this object *after* it has been free'd,
        because there may be RCU lookups in flight"
      
        This has nothing to do with constructors. It's ok if the object gets
        reused as an object of the same type and does *not* get
        re-initialized, because we're perfectly fine seeing old stale data.
      
        What it guarantees is that the slab isn't shared with any other kind
        of object, _and_ that the underlying pages are free'd after an RCU
        quiescent period (so the pages aren't shared with another kind of
        object either during an RCU walk).
      
        And it doesn't necessarily have to have a constructor, because the
        thing that a RCU walk will care about is
      
          (a) guaranteed to be an object that *has* been on some RCU list (so
          it's not a "new" object)
      
          (b) the RCU walk needs to have logic to verify that it's still the
          *same* object and hasn't been re-used as something else.
      
        In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
        immediately, but because it gets reused as the same kind of object,
        the RCU walker can "know" what parts have meaning for re-use, in a way
        it couldn't if the re-use was random.
      
        That said, it *is* subtle, and people should be careful.
      
      > So the re-use might initialize the fields lazily, not necessarily using a ctor.
      
        If you have a well-defined refcount, and use "atomic_inc_not_zero()"
        to guard the speculative RCU access section, and use
        "atomic_dec_and_test()" in the freeing section, then you should be
        safe wrt new allocations.
      
        If you have a completely new allocation that has "random stale
        content", you know that it cannot be on the RCU list, so there is no
        speculative access that can ever see that random content.
      
        So the only case you need to worry about is a re-use allocation, and
        you know that the refcount will start out as zero even if you don't
        have a constructor.
      
        So you can think of the refcount itself as always having a zero
        constructor, *BUT* you need to be careful with ordering.
      
        In particular, whoever does the allocation needs to then set the
        refcount to a non-zero value *after* it has initialized all the other
        fields. And in particular, it needs to make sure that it uses the
        proper memory ordering to do so.
      
        NOTE! One thing to be very worried about is that re-initializing
        whatever RCU lists means that now the RCU walker may be walking on the
        wrong list so the walker may do the right thing for this particular
        entry, but it may miss walking *other* entries. So then you can get
        spurious lookup failures, because the RCU walker never walked all the
        way to the end of the right list. That ends up being a much more
        subtle bug.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
      67a3acaa
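      The discipline Linus describes can be sketched as follows, assuming a
      cache created with SLAB_TYPESAFE_BY_RCU; all names are illustrative. The
      reader only takes a speculative reference with a "get unless zero" and
      then re-checks the object's identity, while the allocator makes the
      refcount live only after the other fields are initialised.

      /*
       * Sketch of the SLAB_TYPESAFE_BY_RCU lookup discipline. The reader may
       * see a recycled object, so it (a) only takes a reference while the
       * count is non-zero and (b) re-checks the object's identity afterwards.
       */
      #include <linux/compiler.h>
      #include <linux/kref.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>

      struct obj_sketch {
              struct kref ref;
              unsigned long long id;  /* identity, re-checked after the get */
      };

      static struct kmem_cache *obj_cache;    /* created with SLAB_TYPESAFE_BY_RCU */

      static void obj_release(struct kref *ref)
      {
              kmem_cache_free(obj_cache, container_of(ref, struct obj_sketch, ref));
      }

      static struct obj_sketch *obj_alloc(unsigned long long id)
      {
              struct obj_sketch *obj = kmem_cache_alloc(obj_cache, GFP_KERNEL);

              if (!obj)
                      return NULL;

              obj->id = id;           /* initialise everything first ... */
              kref_init(&obj->ref);   /* ... then make the refcount live */
              return obj;
      }

      /* speculative RCU lookup of a possibly recycled object */
      static struct obj_sketch *obj_get_rcu(struct obj_sketch *candidate,
                                            unsigned long long id)
      {
              struct obj_sketch *obj = NULL;

              rcu_read_lock();
              if (candidate && kref_get_unless_zero(&candidate->ref)) {
                      if (READ_ONCE(candidate->id) == id)
                              obj = candidate;        /* still the same object */
                      else
                              kref_put(&candidate->ref, obj_release);
              }
              rcu_read_unlock();
              return obj;
      }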
  24. 19 November 2019, 1 commit
  25. 05 November 2019, 1 commit
  26. 04 November 2019, 1 commit
  27. 21 October 2019, 1 commit
  28. 18 October 2019, 1 commit
  29. 14 August 2019, 1 commit
    • drm/i915: Push the wakeref->count deferral to the backend · a79ca656
      Committed by Chris Wilson
      If the backend wishes to defer the wakeref parking, make it responsible
      for unlocking the wakeref (i.e. bumping the counter). This allows it to
      time the unlock much more carefully in case it happens to need the
      wakeref to be active during its deferral.
      
      For instance, during engine parking we may choose to emit an idle
      barrier (a request). To do so, we borrow the engine->kernel_context
      timeline and to ensure exclusive access we keep the
      engine->wakeref.count at 0. However, submitting that request to HW may
      require an intel_engine_pm_get() (e.g. to keep the submission tasklet
      alive), and before we allow that we have to rewake our wakeref to avoid a
      recursive deadlock. (A minimal sketch of this deferral follows this
      entry.)
      
      <4> [257.742916] IRQs not enabled as expected
      <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100
      <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid
      <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W         5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1
      <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100
      <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3
      <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086
      <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302
      <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff
      <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001
      <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008
      <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430
      <4> [257.743074] FS:  0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000
      <4> [257.743082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0
      <4> [257.743096] Call Trace:
      <4> [257.743104]  <IRQ>
      <4> [257.743265]  __i915_request_commit+0x240/0x5d0 [i915]
      <4> [257.743427]  ? __i915_request_create+0x228/0x4c0 [i915]
      <4> [257.743584]  __engine_park+0x64/0x250 [i915]
      <4> [257.743730]  ____intel_wakeref_put_last+0x1c/0x70 [i915]
      <4> [257.743878]  i915_sample+0x2ee/0x310 [i915]
      <4> [257.744030]  ? i915_pmu_cpu_offline+0xb0/0xb0 [i915]
      <4> [257.744040]  __hrtimer_run_queues+0x11e/0x4b0
      <4> [257.744068]  hrtimer_interrupt+0xea/0x250
      <4> [257.744079]  ? lockdep_hardirqs_off+0x79/0xd0
      <4> [257.744101]  smp_apic_timer_interrupt+0x96/0x280
      <4> [257.744114]  apic_timer_interrupt+0xf/0x20
      <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae
      
      v2: Keep the priority_hint assert
      v3: That assert was desperately trying to point out my bug. Sorry, little
      assert.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
      a79ca656
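      A minimal sketch of the deferral, with hypothetical names: if the park
      callback decides it still needs the hardware (e.g. to emit an idle
      barrier), it becomes responsible for bumping the wakeref count itself,
      timed just before the work that would otherwise recurse into the
      wakeref.

      /*
       * Hypothetical sketch (names are not the driver's): when the park
       * callback decides to defer, it is now the one that "unlocks" the
       * wakeref by bumping the counter, and it can time that bump to just
       * before it needs the wakeref active again (e.g. for an idle barrier).
       */
      #include <linux/atomic.h>

      struct wakeref_sketch {
              atomic_t count;         /* 0 while parking is in progress */
      };

      /* returns true if fully parked, false if parking was deferred */
      static bool backend_park(struct wakeref_sketch *wf, bool need_idle_barrier)
      {
              if (!need_idle_barrier)
                      return true;    /* parked; the framework finishes up */

              /*
               * Deferred: re-awaken the wakeref ourselves before doing
               * anything that might recursively try to take it, avoiding the
               * deadlock shown in the backtrace above.
               */
              atomic_inc(&wf->count);
              /* ... emit the idle-barrier request with the wakeref held ... */
              return false;
      }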
  30. 20 June 2019, 2 commits
    • drm/i915/execlists: Minimalistic timeslicing · 8ee36e04
      Committed by Chris Wilson
      If we have multiple contexts of equal priority pending execution,
      activate a timer to demote the currently executing context in favour of
      the next in the queue when that timeslice expires. This enforces
      fairness between contexts (so long as they allow preemption -- forced
      preemption, in the future, will kick those who do not obey) and allows
      us to avoid userspace blocking forward progress with e.g. unbounded
      MI_SEMAPHORE_WAIT.
      
      As a starting point, we use one jiffy as our timeslice so that we should
      be reasonably efficient with respect to frequent CPU wakeups. (A sketch
      of the timer arming follows this entry.)
      
      Testcase: igt/gem_exec_scheduler/semaphore-resolve
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk
      8ee36e04
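      A hedged sketch of the timer arming (names are assumptions, not the
      driver's): when another context of equal priority is queued, arm a
      one-jiffy timer, and on expiry kick the submission tasklet so the queue
      is re-evaluated and the running context can be demoted.

      /*
       * Illustrative timer sketch. The timer is armed for one jiffy when
       * another context of equal priority is waiting; on expiry the
       * submission tasklet is kicked so the running context can be demoted.
       */
      #include <linux/interrupt.h>
      #include <linux/jiffies.h>
      #include <linux/timer.h>

      struct engine_sketch {
              struct timer_list timeslice;
              struct tasklet_struct submission;
      };

      static void timeslice_expired(struct timer_list *t)
      {
              struct engine_sketch *e = from_timer(e, t, timeslice);

              tasklet_hi_schedule(&e->submission);    /* re-evaluate the queue */
      }

      static void engine_sketch_init(struct engine_sketch *e)
      {
              timer_setup(&e->timeslice, timeslice_expired, 0);
              /* tasklet initialisation elided */
      }

      static void start_timeslice(struct engine_sketch *e, bool contended)
      {
              if (contended)  /* another equal-priority context is queued */
                      mod_timer(&e->timeslice, jiffies + 1);  /* one-jiffy slice */
      }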
    • drm/i915/execlists: Preempt-to-busy · 22b7a426
      Committed by Chris Wilson
      When using a global seqno, we required a precise stop-the-world event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much more
      messy, as at any point then we can see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
      
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick up in a timely fashion. (The
      pending/inflight promotion is sketched after this entry.)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
      22b7a426
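      To illustrate the tracking change (names and port count below are
      assumptions): each write to the ELSP ports is recorded as its own
      pending block, the CS preemption event promotes pending to inflight
      while discarding whatever was tracked before, and completion events then
      trim stale entries from inflight.

      /*
       * Hypothetical sketch of the ELSP tracking. Each submission is recorded
       * as its own "pending" block, and the CS preemption event promotes it
       * to "inflight", replacing whatever was tracked there before.
       */
      #include <linux/string.h>

      #define SKETCH_PORTS 2

      struct request_sketch;

      struct execlists_sketch {
              struct request_sketch *inflight[SKETCH_PORTS + 1];     /* NULL terminated */
              struct request_sketch *pending[SKETCH_PORTS + 1];
      };

      /* on writing a new pair of contexts to the ELSP ports */
      static void submit_block(struct execlists_sketch *el,
                               struct request_sketch *rq0,
                               struct request_sketch *rq1)
      {
              el->pending[0] = rq0;
              el->pending[1] = rq1;
              el->pending[2] = NULL;
      }

      /* on the CS event acknowledging the promotion/preemption */
      static void promote_pending(struct execlists_sketch *el)
      {
              memcpy(el->inflight, el->pending, sizeof(el->pending));
              memset(el->pending, 0, sizeof(el->pending));
      }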
  31. 15 June 2019, 1 commit
  32. 22 May 2019, 1 commit
    • drm/i915: Load balancing across a virtual engine · 6d06779e
      Committed by Chris Wilson
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system. (The claim race is sketched after this entry.)
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is a virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
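      A hedged sketch of the late greedy claim (all names hypothetical): a
      ready request is advertised to every sibling, and the first physical
      engine to look for work claims it atomically; the others see it is
      already owned and move on. The real implementation works through the
      engines' priority queues and tasklets, so this only models the claim
      race itself.

      /*
       * Hypothetical sketch of the claim race only. A ready request is
       * advertised to every sibling; the first physical engine to flip the
       * flag wins, and the losers back off.
       */
      #include <linux/atomic.h>
      #include <linux/compiler.h>

      struct vrequest_sketch {
              atomic_t claimed;       /* 0 = unclaimed, 1 = owned by an engine */
      };

      struct vengine_sketch {
              struct vrequest_sketch *ready;  /* single queue: next ready request */
      };

      /* called by each physical sibling when it looks for its next request */
      static struct vrequest_sketch *sibling_claim(struct vengine_sketch *ve)
      {
              struct vrequest_sketch *rq = READ_ONCE(ve->ready);

              if (!rq)
                      return NULL;

              /* the first engine to flip the flag wins; the others back off */
              if (atomic_cmpxchg(&rq->claimed, 0, 1) != 0)
                      return NULL;

              WRITE_ONCE(ve->ready, NULL);
              return rq;
      }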
  33. 20 May 2019, 2 commits