1. 06 Mar, 2020 1 commit
  2. 20 Feb, 2020 2 commits
  3. 12 Feb, 2020 1 commit
  4. 07 Feb, 2020 2 commits
    • C
      drm/i915/gt: Protect execlists_hold/unhold from new waiters · 793c2261
      Chris Wilson committed
      As we may add new waiters to a request as it is being run, we need to
      mark the list iteration as being safe for concurrent addition.
      
      v2: Mika spotted that we used the same trick for signalers_list, so warn
      the compiler about the lockless walk there as well.
      
      Fixes: 32ff621f ("drm/i915/gt: Allow temporary suspension of inflight requests")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200207110213.2734386-1-chris@chris-wilson.co.uk
      793c2261
    • C
      drm/i915/gt: Protect defer_request() from new waiters · f14f27b1
      Chris Wilson committed
      Mika spotted
      
      <4>[17436.705441] general protection fault: 0000 [#1] PREEMPT SMP PTI
      <4>[17436.705447] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.5.0+ #1
      <4>[17436.705449] Hardware name: System manufacturer System Product Name/Z170M-PLUS, BIOS 3805 05/16/2018
      <4>[17436.705512] RIP: 0010:__execlists_submission_tasklet+0xc4d/0x16e0 [i915]
      <4>[17436.705516] Code: c5 4c 8d 60 e0 75 17 e9 8c 07 00 00 49 8b 44 24 20 49 39 c5 4c 8d 60 e0 0f 84 7a 07 00 00 49 8b 5c 24 08 49 8b 87 80 00 00 00 <48> 39 83 d8 fe ff ff 75 d9 48 8b 83 88 fe ff ff a8 01 0f 84 b6 05
      <4>[17436.705518] RSP: 0018:ffffc9000012ce80 EFLAGS: 00010083
      <4>[17436.705521] RAX: ffff88822ae42000 RBX: 5a5a5a5a5a5a5a5a RCX: dead000000000122
      <4>[17436.705523] RDX: ffff88822ae42588 RSI: ffff8881e32a7908 RDI: ffff8881c429fd48
      <4>[17436.705525] RBP: ffffc9000012cf00 R08: ffff88822ae42588 R09: 00000000fffffffe
      <4>[17436.705527] R10: ffff8881c429fb80 R11: 00000000a677cf08 R12: ffff8881c42a0aa8
      <4>[17436.705529] R13: ffff8881c429fd38 R14: ffff88822ae42588 R15: ffff8881c429fb80
      <4>[17436.705532] FS:  0000000000000000(0000) GS:ffff88822ed00000(0000) knlGS:0000000000000000
      <4>[17436.705534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[17436.705536] CR2: 00007f858c76d000 CR3: 0000000005610003 CR4: 00000000003606e0
      <4>[17436.705538] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[17436.705540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      <4>[17436.705542] Call Trace:
      <4>[17436.705545]  <IRQ>
      <4>[17436.705603]  execlists_submission_tasklet+0xc0/0x130 [i915]
      
      which is us consuming a partially initialised new waiter in
      defer_request(). We can prevent this by initialising the i915_dependency
      prior to making it visible and, since we are using a concurrent
      list_add/iterator, by marking them up for the compiler.
      
      Fixes: 8ee36e04 ("drm/i915/execlists: Minimalistic timeslicing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200206204915.2636606-2-chris@chris-wilson.co.uk
      f14f27b1
  5. 17 Jan, 2020 1 commit
  6. 20 Dec, 2019 1 commit
  7. 16 Dec, 2019 1 commit
    • C
      drm/i915: Copy across scheduler behaviour flags across submit fences · 99de9536
      Chris Wilson committed
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
              A -> B
                   \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B'.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.

      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI (priority inheritance).
      This requires us to relax our sanity-checks that dependencies are
      strictly in order.

      v2: Synchronise (B, B') execution on all platforms, regardless of whether
      a scheduler is in use; any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      (cherry picked from commit c81471f5)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      99de9536
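The ordering hazard described in this commit can be illustrated with a toy model. This is a sketch under stated assumptions, not the driver's scheduler: the helper names (`pick_head`, `transitive_waits`, `deadlocks`) and the numeric priorities are hypothetical, and an engine is assumed to run only the head of its queue while that head busy-waits on a semaphore.

```python
def pick_head(queue, priority):
    """Request the engine runs first: highest priority, ties broken by queue order."""
    # max() returns the first maximal element, so equal priorities keep queue order.
    return max(queue, key=lambda name: priority[name])


def transitive_waits(start, waits_on):
    """Every request that `start` directly or indirectly waits on."""
    seen, stack = set(), list(waits_on.get(start, ()))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(waits_on.get(name, ()))
    return seen


def deadlocks(queue, priority, waits_on):
    """True if the head of the queue waits on something stuck behind it."""
    head = pick_head(queue, priority)
    stuck_behind_head = set(queue) - {head}
    return bool(transitive_waits(head, waits_on) & stuck_behind_head)


# A and B' share one engine; B busy-waits on A from another engine.
queue = ["A", "B'"]
waits_on = {"B": ["A"], "B'": ["B"]}

# Hypothetical priorities: without inheritance B' sorts above A.
without_inheritance = {"A": 0, "B": 0, "B'": 1}
# With inheritance B' carries the same priority as B.
with_inheritance = {"A": 0, "B": 0, "B'": 0}

print(deadlocks(queue, without_inheritance, waits_on))
print(deadlocks(queue, with_inheritance, waits_on))
```

In the first case the head of A's engine is B', which waits through B on A itself. In the second case the tie falls back to queue order, so A runs first and the chain resolves.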
  8. 11 Dec, 2019 1 commit
    • C
      drm/i915: Copy across scheduler behaviour flags across submit fences · c81471f5
      Chris Wilson committed
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
              A -> B
                   \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B'.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.

      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI (priority inheritance).
      This requires us to relax our sanity-checks that dependencies are
      strictly in order.

      v2: Synchronise (B, B') execution on all platforms, regardless of whether
      a scheduler is in use; any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      c81471f5
  9. 22 Nov, 2019 1 commit
    • C
      drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request · 67a3acaa
      Chris Wilson committed
      As we start peeking into requests for longer and longer, e.g.
      incorporating use of spinlocks when only protected by an
      rcu_read_lock(), we need to be careful in how we reset the request when
      recycling and need to preserve any barriers that may still be in use as
      the request is reset for reuse.
      
      Quoting Linus Torvalds:
      
      > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
      
        .. because the object can be accessed (by RCU) after the refcount has
        gone down to zero, and the thing has been released.
      
        That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
      
        That flag basically says:
      
        "I may end up accessing this object *after* it has been free'd,
        because there may be RCU lookups in flight"
      
        This has nothing to do with constructors. It's ok if the object gets
        reused as an object of the same type and does *not* get
        re-initialized, because we're perfectly fine seeing old stale data.
      
        What it guarantees is that the slab isn't shared with any other kind
        of object, _and_ that the underlying pages are free'd after an RCU
        quiescent period (so the pages aren't shared with another kind of
        object either during an RCU walk).
      
        And it doesn't necessarily have to have a constructor, because the
        thing that a RCU walk will care about is
      
          (a) guaranteed to be an object that *has* been on some RCU list (so
          it's not a "new" object)
      
          (b) the RCU walk needs to have logic to verify that it's still the
          *same* object and hasn't been re-used as something else.
      
        In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
        immediately, but because it gets reused as the same kind of object,
        the RCU walker can "know" what parts have meaning for re-use, in a way
        it couldn't if the re-use was random.
      
        That said, it *is* subtle, and people should be careful.
      
      > So the re-use might initialize the fields lazily, not necessarily using a ctor.
      
        If you have a well-defined refcount, and use "atomic_inc_not_zero()"
        to guard the speculative RCU access section, and use
        "atomic_dec_and_test()" in the freeing section, then you should be
        safe wrt new allocations.
      
        If you have a completely new allocation that has "random stale
        content", you know that it cannot be on the RCU list, so there is no
        speculative access that can ever see that random content.
      
        So the only case you need to worry about is a re-use allocation, and
        you know that the refcount will start out as zero even if you don't
        have a constructor.
      
        So you can think of the refcount itself as always having a zero
        constructor, *BUT* you need to be careful with ordering.
      
        In particular, whoever does the allocation needs to then set the
        refcount to a non-zero value *after* it has initialized all the other
        fields. And in particular, it needs to make sure that it uses the
        proper memory ordering to do so.
      
        NOTE! One thing to be very worried about is that re-initializing
        whatever RCU lists means that now the RCU walker may be walking on the
        wrong list so the walker may do the right thing for this particular
        entry, but it may miss walking *other* entries. So then you can get
        spurious lookup failures, because the RCU walker never walked all the
        way to the end of the right list. That ends up being a much more
        subtle bug.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
      67a3acaa
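The lookup discipline quoted in this commit (take a reference only if the count is non-zero, then verify it is still the same object) can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: `Slot`, `allocate`, `lookup` and `put` are hypothetical names, and the real kernel pattern relies on atomic operations and memory ordering that plain Python does not express.

```python
class Slot:
    """A recyclable object; the refcount stays meaningful across reuse."""

    def __init__(self):
        self.refcount = 0  # zero means "free", as if zero-constructed
        self.key = None


def inc_not_zero(slot):
    """Model of atomic_inc_not_zero(): only take a reference on a live object."""
    if slot.refcount == 0:
        return False
    slot.refcount += 1
    return True


def put(slot):
    """Model of atomic_dec_and_test(): returns True when the last reference goes."""
    slot.refcount -= 1
    return slot.refcount == 0


def allocate(slot, key):
    """Initialise every other field first, then publish via the refcount."""
    slot.key = key
    slot.refcount = 1  # in the kernel this store needs the proper ordering
    return slot


def lookup(slot, key):
    """Speculative lookup: grab a reference, then check it is still our object."""
    if not inc_not_zero(slot):
        return None  # object was freed
    if slot.key != key:
        put(slot)  # slot was recycled as a different object of the same type
        return None
    return slot
```

A caller that finds `lookup()` returning `None` treats the object as gone, whether it was freed outright or reused for something else in the meantime.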
  10. 19 Nov, 2019 1 commit
  11. 05 Nov, 2019 1 commit
  12. 04 Nov, 2019 1 commit
  13. 21 Oct, 2019 1 commit
  14. 18 Oct, 2019 1 commit
  15. 14 Aug, 2019 1 commit
    • C
      drm/i915: Push the wakeref->count deferral to the backend · a79ca656
      Chris Wilson committed
      If the backend wishes to defer the wakeref parking, make it responsible
      for unlocking the wakeref (i.e. bumping the counter). This allows it to
      time the unlock much more carefully in case it happens to need the
      wakeref to be active during its deferral.

      For instance, during engine parking we may choose to emit an idle
      barrier (a request). To do so, we borrow the engine->kernel_context
      timeline and to ensure exclusive access we keep the
      engine->wakeref.count as 0. However, submitting that request to HW may
      require an intel_engine_pm_get() (e.g. to keep the submission tasklet
      alive) and before we allow that we have to rewake our wakeref to avoid a
      recursive deadlock.
      
      <4> [257.742916] IRQs not enabled as expected
      <4> [257.742930] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:169 __local_bh_enable_ip+0xa9/0x100
      <4> [257.742936] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 btusb btrtl btbcm btintel snd_hda_intel snd_intel_nhlt bluetooth snd_hda_codec coretemp snd_hwdep crct10dif_pclmul snd_hda_core crc32_pclmul ecdh_generic ecc ghash_clmulni_intel snd_pcm r8169 realtek lpc_ich prime_numbers i2c_hid
      <4> [257.742991] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W         5.3.0-rc3-g5d0a06cd532c-drmtip_340+ #1
      <4> [257.742998] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      <4> [257.743008] RIP: 0010:__local_bh_enable_ip+0xa9/0x100
      <4> [257.743017] Code: 37 5b 5d c3 8b 80 50 08 00 00 85 c0 75 a9 80 3d 0b be 25 01 00 75 a0 48 c7 c7 f3 0c 06 ac c6 05 fb bd 25 01 01 e8 77 84 ff ff <0f> 0b eb 89 48 89 ef e8 3b 41 06 00 eb 98 e8 e4 5c f4 ff 5b 5d c3
      <4> [257.743025] RSP: 0018:ffffa78600003cb8 EFLAGS: 00010086
      <4> [257.743035] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000010302
      <4> [257.743042] RDX: 0000000080010302 RSI: 0000000000000000 RDI: 00000000ffffffff
      <4> [257.743050] RBP: ffffffffc0494bb3 R08: 0000000000000000 R09: 0000000000000001
      <4> [257.743058] R10: 0000000014c8f0e9 R11: 00000000fee2ff8e R12: ffffa23ba8c38008
      <4> [257.743065] R13: ffffa23bacc579c0 R14: ffffa23bb7db0f60 R15: ffffa23b9cc8c430
      <4> [257.743074] FS:  0000000000000000(0000) GS:ffffa23bbba00000(0000) knlGS:0000000000000000
      <4> [257.743082] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [257.743089] CR2: 00007fe477b20778 CR3: 000000011f72a000 CR4: 00000000001006f0
      <4> [257.743096] Call Trace:
      <4> [257.743104]  <IRQ>
      <4> [257.743265]  __i915_request_commit+0x240/0x5d0 [i915]
      <4> [257.743427]  ? __i915_request_create+0x228/0x4c0 [i915]
      <4> [257.743584]  __engine_park+0x64/0x250 [i915]
      <4> [257.743730]  ____intel_wakeref_put_last+0x1c/0x70 [i915]
      <4> [257.743878]  i915_sample+0x2ee/0x310 [i915]
      <4> [257.744030]  ? i915_pmu_cpu_offline+0xb0/0xb0 [i915]
      <4> [257.744040]  __hrtimer_run_queues+0x11e/0x4b0
      <4> [257.744068]  hrtimer_interrupt+0xea/0x250
      <4> [257.744079]  ? lockdep_hardirqs_off+0x79/0xd0
      <4> [257.744101]  smp_apic_timer_interrupt+0x96/0x280
      <4> [257.744114]  apic_timer_interrupt+0xf/0x20
      <4> [257.744125] RIP: 0010:__do_softirq+0xb3/0x4ae
      
      v2: Keep the priority_hint assert
      v3: That assert was desperately trying to point out my bug. Sorry, little
      assert.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111378
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190813190705.23869-1-chris@chris-wilson.co.uk
      a79ca656
  16. 20 Jun, 2019 2 commits
    • C
      drm/i915/execlists: Minimalistic timeslicing · 8ee36e04
      Chris Wilson committed
      If we have multiple contexts of equal priority pending execution,
      activate a timer to demote the currently executing context in favour of
      the next in the queue when that timeslice expires. This enforces
      fairness between contexts (so long as they allow preemption -- forced
      preemption, in the future, will kick those who do not obey) and allows
      us to avoid userspace blocking forward progress with e.g. unbounded
      MI_SEMAPHORE_WAIT.
      
      As a starting point, we use one jiffy as our timeslice so that we
      remain reasonably efficient with respect to frequent CPU wakeups.
      
      Testcase: igt/gem_exec_scheduler/semaphore-resolve
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-2-chris@chris-wilson.co.uk
      8ee36e04
    • C
      drm/i915/execlists: Preempt-to-busy · 22b7a426
      Chris Wilson committed
      When using a global seqno, we required a precise stop-the-world event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much
      messier, as at any point we may see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
      
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick up in a timely fashion.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
      22b7a426
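The block-based tracking described in the preempt-to-busy commit above (each ELSP submission is its own block, promoted from pending to inflight on the CS preemption event) can be sketched as follows. This is an illustrative model only: the `ExeclistsTracker` class and its method names are hypothetical, and the real driver also handles semaphores, ring pointers and request completion checks that are not modelled here.

```python
class ExeclistsTracker:
    """Toy model of the pending/inflight blocks; contexts are plain strings."""

    def __init__(self):
        self.pending = []   # most recent submission, not yet acknowledged by HW
        self.inflight = []  # what we believe the HW is currently executing

    def submit(self, block):
        """Record one ELSP submission as its own block."""
        self.pending = list(block)

    def on_preempt_event(self):
        """CS acknowledged the submission: promote pending, discard old tracking."""
        self.inflight = self.pending
        self.pending = []

    def on_complete_event(self):
        """A normal CS completion event: drop the stale head entry."""
        if self.inflight:
            self.inflight.pop(0)


tracker = ExeclistsTracker()
tracker.submit(["ctx1", "ctx2"])
tracker.on_preempt_event()
tracker.submit(["ctx3", "ctx1"])   # preempt ctx1 with a higher priority ctx3
print(tracker.inflight)            # still the old block until HW acknowledges
tracker.on_preempt_event()
tracker.on_complete_event()
print(tracker.inflight)
```

Between `submit()` and `on_preempt_event()` the old block remains the tracked inflight state, which is the window in which requests put back on the run queue are still inflight.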
  17. 15 Jun, 2019 1 commit
  18. 22 May, 2019 1 commit
    • C
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson committed
      Having allowed the user to define the set of engines that they want to
      use exclusively, we go one step further and allow them to bind those
      engines into a single virtual instance. Submitting a batch to the virtual
      engine will then forward it to any one of the set so as to best
      distribute load. The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is a virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
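The "late greedy" policy described in this commit (a ready request is offered to every sibling, and the first engine to go idle claims it) can be approximated with a small simulation. This is a sketch under stated assumptions: `late_greedy_balance` is a hypothetical helper, request durations are assumed to be known up front, and the requests are assumed to come from independent queues, since a single virtual engine has one timeline and does not run its own batches concurrently.

```python
import heapq


def late_greedy_balance(durations, n_engines):
    """Give each ready request to whichever engine becomes idle first.

    Returns a list of (engine, start_time) in request order.
    """
    # Heap of (time the engine goes idle, engine index).
    idle_at = [(0, engine) for engine in range(n_engines)]
    heapq.heapify(idle_at)

    placement = []
    for duration in durations:
        free_time, engine = heapq.heappop(idle_at)  # first engine to claim
        placement.append((engine, free_time))
        heapq.heappush(idle_at, (free_time + duration, engine))
    return placement


# One long request followed by three short ones on two engines.
print(late_greedy_balance([5, 1, 1, 1], 2))
```

The long request pins engine 0 while the short ones drain through engine 1, which is the "earliest opportunity" behaviour the commit message describes.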
  19. 20 May, 2019 3 commits
  20. 17 May, 2019 1 commit
    • C
      drm/i915: Bump signaler priority on adding a waiter · 6e7eb7a8
      Chris Wilson committed
      The handling of the no-preemption priority level imposes the restriction
      that we need to maintain the implied ordering even though preemption is
      disabled. Otherwise we may end up with an AB-BA deadlock across multiple
      engines due to a real preemption event reordering the no-preemption
      WAITs. To resolve this issue we currently promote all requests to WAIT
      on unsubmission, however this interferes with the timeslicing
      requirement that we do not apply any implicit promotion that will defeat
      the round-robin timeslice list. (If we automatically promote the active
      request it will go back to the head of the queue and not the tail!)
      
      So we need implicit promotion to prevent reordering around semaphores
      where we are not allowed to preempt, and we must avoid implicit
      promotion on unsubmission. So instead of at unsubmit, if we apply that
      implicit promotion on adding the dependency, we avoid the semaphore
      deadlock and we also reduce the gains made by the promotion for user
      space waiting. Furthermore, by keeping the earlier dependencies at a
      higher level, we reduce the search space for timeslicing without
      altering runtime scheduling too badly (no dependencies at all will be
      assigned a higher priority for rrul).
      
      v2: Limit the bump to external edges (as originally intended) i.e.
      between contexts and out to the user.
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
      6e7eb7a8
  21. 13 May, 2019 3 commits
  22. 08 May, 2019 1 commit
  23. 11 Apr, 2019 1 commit
    • C
      drm/i915: Bump ready tasks ahead of busywaits · b7404c7e
      Chris Wilson committed
      Consider two tasks that are running in parallel on a pair of engines
      (vcs0, vcs1), but then must complete on a shared engine (rcs0). To
      maximise throughput, we want to run the first ready task on rcs0 (i.e.
      the first task that completes on either of vcs0 or vcs1). When using
      semaphores, however, we will instead queue onto rcs in submission order.
      
      To resolve this incorrect ordering, we want to re-evaluate the priority
      queue when each of the requests is ready. Normally this happens because
      we only insert into the priority queue requests that are ready, but with
      semaphores we are inserting ahead of their readiness and to compensate
      we penalize those tasks with reduced priority (so that tasks that do not
      need to busywait should naturally be run first). However, given a series
      of tasks that each use semaphores, the queue degrades into submission
      fifo rather than readiness fifo, and so to counter this we give a small
      boost to semaphore users as their dependent tasks are completed (and so
      we no longer require any busywait prior to running the user task as they
      are then ready themselves).
      
      v2: Fixup irqsave for schedule_lock (Tvrtko)
      
      Testcase: igt/gem_exec_schedule/semaphore-codependency
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190409152922.23894-1-chris@chris-wilson.co.uk
      b7404c7e
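The difference between "submission fifo" and "readiness fifo" in this commit can be shown with a minimal ordering function. This is only a sketch: `rcs_order`, the `boost` value and the task names are hypothetical, and the real scheduler works on priority lists rather than a one-shot sort.

```python
def rcs_order(waiters, finished, boost=1):
    """Order tasks queued on the shared engine.

    waiters:  list of (task, signaler) in submission order
    finished: set of signalers that have already completed
    boost:    small priority bump for tasks whose signaler is done
    """
    def priority(entry):
        _, signaler = entry
        return boost if signaler in finished else 0

    # sorted() is stable, so tasks with equal priority keep submission order.
    return [task for task, _ in sorted(waiters, key=priority, reverse=True)]


# T1 waits on work from vcs0, T2 on work from vcs1; T1 was submitted first.
waiters = [("T1", "vcs0-job"), ("T2", "vcs1-job")]

print(rcs_order(waiters, finished=set()))          # nothing ready yet
print(rcs_order(waiters, finished={"vcs1-job"}))   # vcs1 finished first
```

With no boost the shared engine would busy-wait on T1 even though T2 is ready; with the boost the ready task moves ahead.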
  24. 02 Apr, 2019 1 commit
    • C
      drm/i915: Only emit one semaphore per request · 7881e605
      Chris Wilson committed
      Ideally we only need one semaphore per ring to accommodate waiting on
      multiple engines in parallel. However, since we do not know which fences
      we will finally be waiting on, we emit a semaphore for every fence. It
      turns out to be quite easy to trick ourselves into exhausting our
      ringbuffer causing an error, just by feeding in a batch that depends on
      several thousand contexts.
      
      Since we never can be waiting on more than one semaphore in parallel
      (other than perhaps the desire to busywait on multiple engines), just
      pick the first fence for our semaphore. If we pick the wrong fence to
      busywait on, we just miss an opportunity to reduce latency.
      
      An adaption might be to use sched.flags as either a semaphore counter,
      or to track the first busywait on each engine, converting it back to a
      single use bit prior to closing the request.
      
      v2: Track first semaphore used per-engine (this caters for our basic
      igt that semaphores are working).
      Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Testcase: igt/gem_exec_fence/long-history
      Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-3-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      7881e605
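The v2 approach in this commit (track the first semaphore used per engine) bounds the number of emitted semaphores by the number of engines, however many fences a request depends on. A minimal sketch, assuming a per-request bitmask with one bit per engine; `emit_semaphores` and the tuple layout are hypothetical, and how the remaining fences are waited on is not modelled.

```python
def emit_semaphores(fences, used_mask=0):
    """Emit a semaphore only for the first fence seen on each engine.

    fences:    iterable of (engine_id, fence_name)
    used_mask: bit N set means engine N already has a semaphore
    Returns (emitted, used_mask).
    """
    emitted = []
    for engine_id, fence in fences:
        bit = 1 << engine_id
        if used_mask & bit:
            continue  # this engine already has its semaphore; skip busywait
        used_mask |= bit
        emitted.append((engine_id, fence))
    return emitted, used_mask


# Several thousand fences from the same two engines still cost two semaphores.
fences = [(i % 2, f"fence-{i}") for i in range(4000)]
emitted, mask = emit_semaphores(fences)
print(len(emitted), bin(mask))
```

Picking the "wrong" fence on an engine only costs a missed latency optimisation, as the commit message notes, so taking the first one keeps ring usage bounded without affecting correctness.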
  25. 09 Mar, 2019 1 commit
    • C
      drm/i915: Reacquire priolist cache after dropping the engine lock · 7b1366b4
      Chris Wilson committed
      If we drop the engine lock, we may run execlists_dequeue which may free
      the priolist. Therefore if we ever drop the execution lock on the
      engine, we have to discard our cache and refetch the priolist to ensure
      we do not use a stale pointer.
      
      [  506.418935] [IGT] gem_exec_whisper: starting subtest contexts-priority
      [  593.240825] general protection fault: 0000 [#1] SMP
      [  593.240863] CPU: 1 PID: 494 Comm: gem_exec_whispe Tainted: G     U            5.0.0-rc6+ #100
      [  593.240879] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
      [  593.240965] RIP: 0010:__i915_schedule+0x1fe/0x320 [i915]
      [  593.240981] Code: 48 8b 0c 24 48 89 c3 49 8b 45 28 49 8b 75 20 4c 89 3c 24 48 89 46 08 48 89 30 48 8b 43 08 48 89 4b 08 49 89 5d 20 49 89 45 28 <48> 89 08 45 39 a7 b8 03 00 00 7d 44 45 89 a7 b8 03 00 00 49 8b 85
      [  593.240999] RSP: 0018:ffffc90000057a60 EFLAGS: 00010046
      [  593.241013] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8882582d7870 RCX: ffff88826baba6f0
      [  593.241026] RDX: 0000000000000000 RSI: ffff8882582d6e70 RDI: ffff888273482194
      [  593.241049] RBP: ffffc90000057a68 R08: ffff8882582d7680 R09: ffff8882582d7840
      [  593.241068] R10: 0000000000000000 R11: ffffea00095ebe08 R12: 0000000000000728
      [  593.241105] R13: ffff88826baba6d0 R14: ffffc90000057a40 R15: ffff888273482158
      [  593.241120] FS:  00007f4613fb3900(0000) GS:ffff888277a80000(0000) knlGS:0000000000000000
      [  593.241133] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  593.241146] CR2: 00007f57d3c66a84 CR3: 000000026e2b6000 CR4: 00000000001406e0
      [  593.241158] Call Trace:
      [  593.241233]  i915_schedule+0x1f/0x30 [i915]
      [  593.241326]  i915_request_add+0x1a9/0x290 [i915]
      [  593.241393]  i915_gem_do_execbuffer+0x45f/0x1150 [i915]
      [  593.241411]  ? init_object+0x49/0x80
      [  593.241425]  ? ___slab_alloc.constprop.91+0x4b8/0x4e0
      [  593.241491]  ? i915_gem_execbuffer2_ioctl+0x99/0x380 [i915]
      [  593.241563]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241629]  i915_gem_execbuffer2_ioctl+0x1bb/0x380 [i915]
      [  593.241705]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241724]  drm_ioctl_kernel+0x81/0xd0
      [  593.241738]  drm_ioctl+0x1a7/0x310
      [  593.241803]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241819]  ? __update_load_avg_se+0x1c9/0x240
      [  593.241834]  ? pick_next_entity+0x7e/0x120
      [  593.241851]  do_vfs_ioctl+0x88/0x5d0
      [  593.241880]  ksys_ioctl+0x35/0x70
      [  593.241894]  __x64_sys_ioctl+0x11/0x20
      [  593.241907]  do_syscall_64+0x44/0xf0
      [  593.241924]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  593.241940] RIP: 0033:0x7f4615ffe757
      [  593.241952] Code: 00 00 90 48 8b 05 39 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 a7 0c 00 f7 d8 64 89 01 48
      [  593.241970] RSP: 002b:00007ffc1030ddf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  593.241984] RAX: ffffffffffffffda RBX: 00007ffc10324420 RCX: 00007f4615ffe757
      [  593.241997] RDX: 00007ffc1030e220 RSI: 0000000040406469 RDI: 0000000000000003
      [  593.242010] RBP: 00007ffc1030e220 R08: 00007f46160c9208 R09: 00007f46160c9240
      [  593.242022] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040406469
      [  593.242038] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      [  593.242058] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers
      
      v2: Track the local engine cache and explicitly clear it when switching
      engine locks.
      
      Fixes: a02eb975 ("drm/i915/execlists: Cache the priolist when rescheduling")
      Testcase: igt/gem_exec_whisper/contexts-priority # rare!
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190211204647.26723-1-chris@chris-wilson.co.uk
      (cherry picked from commit ed7dc677)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      7b1366b4
  26. 06 Mar, 2019 1 commit
  27. 02 Mar, 2019 1 commit
  28. 01 Mar, 2019 1 commit
    • C
      drm/i915/execlists: Suppress mere WAIT preemption · b5773a36
      Chris Wilson committed
      WAIT is occasionally suppressed by virtue of preempted requests being
      promoted to NEWCLIENT if they have not already received that boost.
      Make this consistent for all WAIT boosts that they are not allowed to
      preempt executing contexts and are merely granted the right to be at the
      front of the queue for the next execution slot. This is in keeping with
      the desire that the WAIT boost be a minor tweak that does not give
      excessive promotion to its user and open ourselves to trivial abuse.
      
      The problem with the inconsistent WAIT preemption becomes more apparent
      as the preemption is propagated across the engines, where one engine may
      preempt and the other not, and we may be relying on the exact execution
      order being consistent across engines (e.g. using HW semaphores to
      coordinate parallel execution).
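
The rule described above can be illustrated with a small userspace sketch. It is not the driver's code: the bit layout and every name here (`PRIO_WAIT`, `PRIO_NO_PREEMPT`, `make_prio`, `effective_prio`, `should_preempt`, `sorts_ahead`) are assumptions chosen for illustration. The idea shown is that queue ordering uses the raw priority, while the preemption decision treats the running request as if it already held the WAIT boost, so a WAIT boost alone can never exceed it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: low bits carry internal boosts, the rest the user level. */
#define PRIO_USER_SHIFT 2
#define PRIO_WAIT       (1 << 0)   /* assumed internal WAIT boost bit */
#define PRIO_NO_PREEMPT PRIO_WAIT  /* boosts that must not trigger preemption */

/* Combine a user priority level with internal boost bits. */
static int make_prio(int user_level, int boosts)
{
	assert(boosts >= 0 && boosts < (1 << PRIO_USER_SHIFT));
	return user_level * (1 << PRIO_USER_SHIFT) + boosts;
}

/* Priority of the running request as seen by the preemption check. */
static int effective_prio(int running_prio)
{
	return running_prio | PRIO_NO_PREEMPT;
}

/* Preempt only if the queued request beats the running one by more than WAIT. */
static bool should_preempt(int queued_prio, int running_prio)
{
	return queued_prio > effective_prio(running_prio);
}

/* Queue ordering still uses the raw value, so WAIT moves a request forward. */
static bool sorts_ahead(int a, int b)
{
	return a > b;
}
```

With this layout a request at user level 0 carrying only the WAIT boost sorts ahead of an unboosted level-0 request but does not preempt it, whereas a request at user level 1 does.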
      
      v2: Also protect GuC submission from false preemption loops.
      v3: Build bug safeguards and better debug messages for st.
      v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
      unwind), applying it earlier during submit causes out-of-order execution
      combined with execute fences.
      v5: Call sw_fence_fini for our dummy request (Matthew)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
      b5773a36
  29. 28 Feb, 2019 1 commit
  30. 27 Feb, 2019 1 commit
  31. 12 Feb, 2019 1 commit
    • C
      drm/i915: Reacquire priolist cache after dropping the engine lock · ed7dc677
      Chris Wilson committed
      If we drop the engine lock, we may run execlists_dequeue which may free
      the priolist. Therefore if we ever drop the execution lock on the
      engine, we have to discard our cache and refetch the priolist to ensure
      we do not use a stale pointer.
      
      [  506.418935] [IGT] gem_exec_whisper: starting subtest contexts-priority
      [  593.240825] general protection fault: 0000 [#1] SMP
      [  593.240863] CPU: 1 PID: 494 Comm: gem_exec_whispe Tainted: G     U            5.0.0-rc6+ #100
      [  593.240879] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
      [  593.240965] RIP: 0010:__i915_schedule+0x1fe/0x320 [i915]
      [  593.240981] Code: 48 8b 0c 24 48 89 c3 49 8b 45 28 49 8b 75 20 4c 89 3c 24 48 89 46 08 48 89 30 48 8b 43 08 48 89 4b 08 49 89 5d 20 49 89 45 28 <48> 89 08 45 39 a7 b8 03 00 00 7d 44 45 89 a7 b8 03 00 00 49 8b 85
      [  593.240999] RSP: 0018:ffffc90000057a60 EFLAGS: 00010046
      [  593.241013] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8882582d7870 RCX: ffff88826baba6f0
      [  593.241026] RDX: 0000000000000000 RSI: ffff8882582d6e70 RDI: ffff888273482194
      [  593.241049] RBP: ffffc90000057a68 R08: ffff8882582d7680 R09: ffff8882582d7840
      [  593.241068] R10: 0000000000000000 R11: ffffea00095ebe08 R12: 0000000000000728
      [  593.241105] R13: ffff88826baba6d0 R14: ffffc90000057a40 R15: ffff888273482158
      [  593.241120] FS:  00007f4613fb3900(0000) GS:ffff888277a80000(0000) knlGS:0000000000000000
      [  593.241133] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  593.241146] CR2: 00007f57d3c66a84 CR3: 000000026e2b6000 CR4: 00000000001406e0
      [  593.241158] Call Trace:
      [  593.241233]  i915_schedule+0x1f/0x30 [i915]
      [  593.241326]  i915_request_add+0x1a9/0x290 [i915]
      [  593.241393]  i915_gem_do_execbuffer+0x45f/0x1150 [i915]
      [  593.241411]  ? init_object+0x49/0x80
      [  593.241425]  ? ___slab_alloc.constprop.91+0x4b8/0x4e0
      [  593.241491]  ? i915_gem_execbuffer2_ioctl+0x99/0x380 [i915]
      [  593.241563]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241629]  i915_gem_execbuffer2_ioctl+0x1bb/0x380 [i915]
      [  593.241705]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241724]  drm_ioctl_kernel+0x81/0xd0
      [  593.241738]  drm_ioctl+0x1a7/0x310
      [  593.241803]  ? i915_gem_execbuffer_ioctl+0x270/0x270 [i915]
      [  593.241819]  ? __update_load_avg_se+0x1c9/0x240
      [  593.241834]  ? pick_next_entity+0x7e/0x120
      [  593.241851]  do_vfs_ioctl+0x88/0x5d0
      [  593.241880]  ksys_ioctl+0x35/0x70
      [  593.241894]  __x64_sys_ioctl+0x11/0x20
      [  593.241907]  do_syscall_64+0x44/0xf0
      [  593.241924]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  593.241940] RIP: 0033:0x7f4615ffe757
      [  593.241952] Code: 00 00 90 48 8b 05 39 a7 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 09 a7 0c 00 f7 d8 64 89 01 48
      [  593.241970] RSP: 002b:00007ffc1030ddf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  593.241984] RAX: ffffffffffffffda RBX: 00007ffc10324420 RCX: 00007f4615ffe757
      [  593.241997] RDX: 00007ffc1030e220 RSI: 0000000040406469 RDI: 0000000000000003
      [  593.242010] RBP: 00007ffc1030e220 R08: 00007f46160c9208 R09: 00007f46160c9240
      [  593.242022] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000040406469
      [  593.242038] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      [  593.242058] Modules linked in: i915 intel_gtt drm_kms_helper prime_numbers
      
      v2: Track the local engine cache and explicitly clear it when switching
      engine locks.
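
A minimal userspace sketch of the pattern described in the v2 note, under stated assumptions: the types and helpers below (`struct engine`, `struct sched_cache`, `switch_engine_lock`, `lookup_cached`) are hypothetical stand-ins, the `locked` flag merely models the engine spinlock, and the sketch assumes one cached bucket is enough for a single scheduling pass. The point is only that the cached priolist pointer is discarded whenever the held lock changes, so it must be looked up again under the new lock.

```c
#include <assert.h>
#include <stddef.h>

struct priolist { int priority; };

struct engine {
	int locked;               /* stand-in for the engine spinlock */
	struct priolist lists[4]; /* hypothetical: one bucket per priority level */
};

/* Cache of the last priolist lookup; valid only under the lock it was made. */
struct sched_cache { struct priolist *priolist; };

/* Move from the held engine lock to the target's, dropping the stale cache. */
static struct engine *switch_engine_lock(struct engine *target,
					 struct engine *held,
					 struct sched_cache *cache)
{
	assert(held != NULL);

	if (target != held) {
		held->locked = 0;       /* other code may now free our priolist */
		cache->priolist = NULL; /* so never reuse the old pointer */
		target->locked = 1;
	}

	return target;
}

/* Return the cached bucket, refetching it if the cache was cleared. */
static struct priolist *lookup_cached(struct engine *e, int prio,
				      struct sched_cache *cache)
{
	if (!cache->priolist)
		cache->priolist = &e->lists[prio];

	return cache->priolist;
}
```

Re-locking the same engine leaves the cache intact; only an actual lock switch clears it.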
      
      Fixes: a02eb975 ("drm/i915/execlists: Cache the priolist when rescheduling")
      Testcase: igt/gem_exec_whisper/contexts-priority # rare!
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190211204647.26723-1-chris@chris-wilson.co.uk
      ed7dc677
  32. 30 Jan, 2019 2 commits
    • C
      drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
      Chris Wilson committed
      A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
      thundering i915_wait_request herd"), the issue of handling multiple
      clients waiting in parallel was brought to our attention. The
      requirement was that every client should be woken immediately upon its
      request being signaled, without incurring any cpu overhead.
      
      Handling certain fragilities of our hw meant that we could not do a
      simple check inside the irq handler (some generations required almost
      unbounded delays before we could be sure of seqno coherency) and so
      request completion checking required delegation.
      
      Before commit 688e6c72, the solution was simple. Every client
      waiting on a request would be woken on every interrupt and each would do
      a heavyweight check to see if their request was complete. Commit
      688e6c72 introduced an rbtree so that only the earliest waiter on
      the global timeline would be woken, and would wake the next and so on.
      (Along with various complications to handle requests being reordered
      along the global timeline, and also a requirement for kthread to provide
      a delegate for fence signaling that had no process context.)
      
      The global rbtree depends on knowing the execution timeline (and global
      seqno). Without knowing that order, we must instead check all contexts
      queued to the HW to see which may have advanced. We trim that list by
      only checking queued contexts that are being waited on, but still we
      keep a list of all active contexts and their active signalers that we
      inspect from inside the irq handler. By moving the waiters onto the fence
      signal list, we can combine the client wakeup with the dma_fence
      signaling (a dramatic reduction in complexity, but does require the HW
      being coherent, the seqno must be visible from the cpu before the
      interrupt is raised - we keep a timer backup just in case).
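
As a rough illustration of the per-context approach described above, here is a userspace sketch. It is not the driver's implementation: `struct request`, `struct context`, `seqno_passed` and `signal_completed` are hypothetical names, and the sketch assumes each context keeps its signalers in submission order and exposes its current seqno through a pointer the CPU can read. The interrupt-side work then reduces to walking that list and signaling every request whose seqno has been passed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct request {
	uint32_t seqno;
	bool signaled;
	struct request *next; /* next signaler on the same context */
};

struct context {
	const uint32_t *hw_seqno;  /* assumed: last seqno written by the HW */
	struct request *signalers; /* waited-upon requests, oldest first */
};

/* Wraparound-safe comparison: has `current` reached `target`? */
static bool seqno_passed(uint32_t current, uint32_t target)
{
	return (int32_t)(current - target) >= 0;
}

/* Signal every completed request at the head of the context's list. */
static unsigned int signal_completed(struct context *ctx)
{
	const uint32_t now = *ctx->hw_seqno;
	unsigned int woken = 0;

	while (ctx->signalers && seqno_passed(now, ctx->signalers->seqno)) {
		struct request *rq = ctx->signalers;

		ctx->signalers = rq->next; /* unlink before signaling */
		rq->next = NULL;
		rq->signaled = true;       /* stands in for waking the client */
		woken++;
	}

	return woken;
}
```

Because the walk stops at the first incomplete request, the cost per interrupt is proportional to the number of requests that actually completed rather than to the number of waiters.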
      
      Having previously fixed all the issues with irq-seqno serialisation (by
      inserting delays onto the GPU after each request instead of random delays
      on the CPU after each interrupt), we can rely on the seqno state to
      perform direct wakeups from the interrupt handler. This allows us to
      preserve our single context switch behaviour of the current routine,
      with the only downside that we lose the RT priority sorting of wakeups.
      In general, direct wakeup latency of multiple clients is about the same
      (about 10% better in most cases) with a reduction in total CPU time spent
      in the waiter (about 20-50% depending on gen). Average herd behaviour is
      improved, but at the cost of not delegating wakeups on task_prio.
      
      v2: Capture fence signaling state for error state and add comments to
      warm even the most cold of hearts.
      v3: Check if the request is still active before busywaiting
      v4: Reduce the amount of pointer misdirection with list_for_each_safe
      and using a local i915_request variable inside the loops
      v5: Add a missing pluralisation to a purely informative selftest message.
      
      References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
      52c0fdb2
    • C
      drm/i915/execlists: Suppress preempting self · c9a64622
      Chris Wilson committed
      In order to avoid preempting ourselves, we currently refuse to schedule
      the tasklet if we reschedule an inflight context. However, this glosses
      over a few issues such as what happens after a CS completion event and
      we then preempt the newly executing context with itself, or if something
      else causes a tasklet_schedule triggering the same evaluation to
      preempt the active context with itself.
      
      However, when we avoid preempting ELSP[0], we still retain the preemption
      value as it may match a second preemption request within the same time period
      that we need to resolve after the next CS event. However, since we only
      store the maximum preemption priority seen, it may not match the
      subsequent event and so we should double check whether or not we
      actually do need to trigger a preempt-to-idle by comparing the top
      priorities from each queue. Later, this gives us a hook for finer
      control over deciding whether the preempt-to-idle is justified.
      
      The sequence of events where we end up preempting to no avail is:
      
      1. Queue requests/contexts A, B
      2. Priority boost A; no preemption as it is executing, but keep hint
      3. After CS switch, B is less than hint, force preempt-to-idle
      4. Resubmit B after idling
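
The sequence above can be reproduced with a small sketch that contrasts trusting the stored hint with rechecking the actual top priorities. The function names and the `PRIO_NONE` sentinel are assumptions made for this illustration, not the driver's API; the check order (hint first, then the next request in the port, then the head of the queue) follows the description in this message only loosely.

```c
#include <limits.h>
#include <stdbool.h>

#define PRIO_NONE INT_MIN /* hypothetical sentinel: nothing waiting there */

/* Naive check: trust the maximum priority hint seen so far. */
static bool naive_need_preempt(int hint, int active_prio)
{
	return hint > active_prio;
}

/*
 * Rechecked variant: the hint only says a preemption may be needed.
 * Confirm it against what is really waiting behind the active request.
 */
static bool need_preempt(int hint, int active_prio,
			 int next_port_prio, int queue_top_prio)
{
	if (hint <= active_prio)
		return false;

	if (next_port_prio > active_prio)
		return true;

	return queue_top_prio > active_prio;
}
```

In the scenario from the list, A is boosted to priority 2 while executing, leaving a hint of 2. Once B (priority 0) is executing with nothing queued behind it, the naive check still asks for a preempt-to-idle, while the rechecked variant does not.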
      
      v2: We can simplify a bunch of tests based on the knowledge that PI will
      ensure that earlier requests along the same context will have the highest
      priority.
      v3: Demonstrate the stale preemption hint with a selftest
      
      References: a2bf92e8 ("drm/i915/execlists: Avoid kicking priority on the current context")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
      c9a64622