1. 06 3月, 2019 1 次提交
  2. 02 3月, 2019 1 次提交
    • C
      drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ · e8861964
      Chris Wilson 提交于
      Having introduced per-context seqno, we now have a means to identity
      progress across the system without feel of rollback as befell the
      global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in
      advance of submission safe in the knowledge that our target seqno and
      address is stable.
      
      However, since we are telling the GPU to busy-spin on the target address
      until it matches the signaling seqno, we only want to do so when we are
      sure that busy-spin will be completed quickly. To achieve this we only
      submit the request to HW once the signaler is itself executing (modulo
      preemption causing us to wait longer), and we only do so for default and
      above priority requests (so that idle priority tasks never themselves
      hog the GPU waiting for others).
      
      As might be reasonably expected, HW semaphores excel in inter-engine
      synchronisation microbenchmarks (where the 3x reduced latency / increased
      throughput more than offset the power cost of spinning on a second ring)
      and have significant improvement (can be up to ~10%, most see no change)
      for single clients that utilize multiple engines (typically media players
      and transcoders), without regressing multiple clients that can saturate
      the system or changing the power envelope dramatically.
      
      v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
      v4: Tell the world and include it as part of scheduler caps.
      
      Testcase: igt/gem_exec_whisper
      Testcase: igt/benchmarks/gem_wsim
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
      e8861964
  3. 28 2月, 2019 3 次提交
  4. 27 2月, 2019 1 次提交
    • C
      drm/i915: Avoid waking the engines just to check if they are idle · 0b702dca
      Chris Wilson 提交于
      Exploit that reads of the ring registers return 0 from the engine when
      it is idle and we do not apply forcewake to know that if the engine is
      idle then both reads will be identical (and so we interpret the ring as
      idle).
      
      The ulterior motive is to try and reduce the number of spurious wakeups
      to avoid untimely death, such as:
      
      <3> [85.046836] [drm:fw_domains_get [i915]] *ERROR* render: timed out waiting for forcewake ack request.
      <4> [85.051916] ------------[ cut here ]------------
      <4> [85.051917] GT thread status wait timed out
      <4> [85.051963] WARNING: CPU: 2 PID: 2195 at drivers/gpu/drm/i915/intel_uncore.c:303 __gen6_gt_wait_for_thread_c0+0x6e/0xa0 [i915]
      <4> [85.051964] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_hda_codec broadcom bcm_phy_lib i2c_i801 snd_hwdep snd_hda_core tg3 snd_pcm ptp pps_core mei_me mei prime_numbers lpc_ich
      <4> [85.051980] CPU: 2 PID: 2195 Comm: drm_read Tainted: G     U            5.0.0-rc8-CI-CI_DRM_5662+ #1
      <4> [85.051981] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
      <4> [85.052012] RIP: 0010:__gen6_gt_wait_for_thread_c0+0x6e/0xa0 [i915]
      <4> [85.052015] Code: 8b 92 5c 80 13 00 83 e2 07 75 d5 5b 5d c3 80 3d 5b 6a 1a 00 00 75 f4 48 c7 c7 38 21 31 a0 c6 05 4b 6a 1a 00 01 e8 e2 84 ea e0 <0f> 0b eb dd 80 3d 3a 6a 1a 00 00 75 98 48 c7 c6 08 21 31 a0 48 c7
      <4> [85.052016] RSP: 0018:ffffc9000043bd00 EFLAGS: 00010086
      <4> [85.052019] RAX: 0000000000000000 RBX: ffff888217c50000 RCX: 0000000000000000
      <4> [85.052020] RDX: 0000000000000007 RSI: ffffffff820cb141 RDI: 00000000ffffffff
      <4> [85.052022] RBP: 00000013cd30f2fb R08: 0000000000000000 R09: 0000000000000001
      <4> [85.052024] R10: ffffc9000043bce0 R11: 0000000000000000 R12: ffff888217c50ee0
      <4> [85.052025] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff888218076530
      <4> [85.052028] FS:  00007fc79d049980(0000) GS:ffff888227a80000(0000) knlGS:0000000000000000
      <4> [85.052029] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [85.052031] CR2: 00007f782e2940f8 CR3: 000000022458e006 CR4: 00000000000606e0
      <4> [85.052033] Call Trace:
      <4> [85.052064]  gen6_read32+0x14e/0x250 [i915]
      <4> [85.052096]  intel_engine_is_idle+0x7d/0x180 [i915]
      <4> [85.052126]  intel_engines_are_idle+0x29/0x50 [i915]
      <4> [85.052153]  i915_drop_caches_set+0x21c/0x290 [i915]
      <4> [85.052160]  simple_attr_write+0xb0/0xd0
      <4> [85.052165]  full_proxy_write+0x51/0x80
      <4> [85.052170]  __vfs_write+0x31/0x190
      <4> [85.052176]  ? rcu_read_lock_sched_held+0x6f/0x80
      <4> [85.052178]  ? rcu_sync_lockdep_assert+0x29/0x50
      <4> [85.052181]  ? __sb_start_write+0x152/0x1f0
      <4> [85.052183]  ? __sb_start_write+0x163/0x1f0
      <4> [85.052187]  vfs_write+0xbd/0x1b0
      <4> [85.052191]  ksys_write+0x50/0xc0
      <4> [85.052196]  do_syscall_64+0x55/0x190
      <4> [85.052200]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4> [85.052202] RIP: 0033:0x7fc79c9d3281
      <4> [85.052204] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
      <4> [85.052206] RSP: 002b:00007fffa4a0a7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      <4> [85.052208] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fc79c9d3281
      <4> [85.052210] RDX: 0000000000000005 RSI: 00007fffa4a0a880 RDI: 0000000000000008
      <4> [85.052212] RBP: 00007fffa4a0a820 R08: 0000000000000000 R09: 0000000000000000
      <4> [85.052213] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc79c9bc718
      <4> [85.052215] R13: 0000000000000003 R14: 00007fc79c9c1628 R15: 00007fc79c9bdd80
      <4> [85.052223] irq event stamp: 71630
      <4> [85.052226] hardirqs last  enabled at (71629): [<ffffffff8197b64c>] _raw_spin_unlock_irqrestore+0x4c/0x60
      <4> [85.052228] hardirqs last disabled at (71630): [<ffffffff8197b4bd>] _raw_spin_lock_irqsave+0xd/0x50
      <4> [85.052231] softirqs last  enabled at (70444): [<ffffffff81c0033a>] __do_softirq+0x33a/0x4b9
      <4> [85.052234] softirqs last disabled at (70433): [<ffffffff810b51b1>] irq_exit+0xd1/0xe0
      <4> [85.052264] WARNING: CPU: 2 PID: 2195 at drivers/gpu/drm/i915/intel_uncore.c:303 __gen6_gt_wait_for_thread_c0+0x6e/0xa0 [i915]
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190227114958.32438-1-chris@chris-wilson.co.uk
      0b702dca
  5. 26 2月, 2019 3 次提交
  6. 21 2月, 2019 1 次提交
  7. 12 2月, 2019 1 次提交
  8. 06 2月, 2019 1 次提交
  9. 30 1月, 2019 3 次提交
    • C
      drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
      Chris Wilson 提交于
      A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
      thundering i915_wait_request herd"), the issue of handling multiple
      clients waiting in parallel was brought to our attention. The
      requirement was that every client should be woken immediately upon its
      request being signaled, without incurring any cpu overhead.
      
      To handle certain fragility of our hw meant that we could not do a
      simple check inside the irq handler (some generations required almost
      unbounded delays before we could be sure of seqno coherency) and so
      request completion checking required delegation.
      
      Before commit 688e6c72, the solution was simple. Every client
      waiting on a request would be woken on every interrupt and each would do
      a heavyweight check to see if their request was complete. Commit
      688e6c72 introduced an rbtree so that only the earliest waiter on
      the global timeline would woken, and would wake the next and so on.
      (Along with various complications to handle requests being reordered
      along the global timeline, and also a requirement for kthread to provide
      a delegate for fence signaling that had no process context.)
      
      The global rbtree depends on knowing the execution timeline (and global
      seqno). Without knowing that order, we must instead check all contexts
      queued to the HW to see which may have advanced. We trim that list by
      only checking queued contexts that are being waited on, but still we
      keep a list of all active contexts and their active signalers that we
      inspect from inside the irq handler. By moving the waiters onto the fence
      signal list, we can combine the client wakeup with the dma_fence
      signaling (a dramatic reduction in complexity, but does require the HW
      being coherent, the seqno must be visible from the cpu before the
      interrupt is raised - we keep a timer backup just in case).
      
      Having previously fixed all the issues with irq-seqno serialisation (by
      inserting delays onto the GPU after each request instead of random delays
      on the CPU after each interrupt), we can rely on the seqno state to
      perfom direct wakeups from the interrupt handler. This allows us to
      preserve our single context switch behaviour of the current routine,
      with the only downside that we lose the RT priority sorting of wakeups.
      In general, direct wakeup latency of multiple clients is about the same
      (about 10% better in most cases) with a reduction in total CPU time spent
      in the waiter (about 20-50% depending on gen). Average herd behaviour is
      improved, but at the cost of not delegating wakeups on task_prio.
      
      v2: Capture fence signaling state for error state and add comments to
      warm even the most cold of hearts.
      v3: Check if the request is still active before busywaiting
      v4: Reduce the amount of pointer misdirection with list_for_each_safe
      and using a local i915_request variable inside the loops
      v5: Add a missing pluralisation to a purely informative selftest message.
      
      References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
      52c0fdb2
    • C
      drm/i915: Rename execlists->queue_priority to queue_priority_hint · 4d97cbe0
      Chris Wilson 提交于
      After noticing that we trigger preemption events for currently executing
      requests, as well as requests that complete before the preemption and
      attempting to suppress those preemption events, it is wise to not
      consider the queue_priority to be authoritative. As we only track the
      maximum priority seen between dequeue passes, if the maximum priority
      request is no longer available for dequeuing (it completed or is even
      executing on another engine), we have no knowledge of the previous
      queue_priority as it would require us to keep a full history of enqueued
      requests -- but we already have that history in the priolists!
      
      Rename the queue_priority to queue_priority_hint so that we do not
      confuse it as being exactly the maximum priority in the queue, but merely
      an indication that we have seen a new maximum priority value and as such
      we should check whether it should preempt the currently running request.
      
      v2: s/preempt_priority_hint/queue_priority_hint/ as preempt implies it
      being only used for the singular task of preemption and not the wider
      question of waking up due to a change in the queue.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-3-chris@chris-wilson.co.uk
      4d97cbe0
    • C
      drm/i915: Identify active requests · 85474441
      Chris Wilson 提交于
      To allow requests to forgo a common execution timeline, one question we
      need to be able to answer is "is this request running?". To track
      whether a request has started on HW, we can emit a breadcrumb at the
      beginning of the request and check its timeline's HWSP to see if the
      breadcrumb has advanced past the start of this request. (This is in
      contrast to the global timeline where we need only ask if we are on the
      global timeline and if the timeline has advanced past the end of the
      previous request.)
      
      There is still confusion from a preempted request, which has already
      started but relinquished the HW to a high priority request. For the
      common case, this discrepancy should be negligible. However, for
      identification of hung requests, knowing which one was running at the
      time of the hang will be much more important.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
      85474441
  10. 29 1月, 2019 3 次提交
  11. 25 1月, 2019 3 次提交
  12. 18 1月, 2019 1 次提交
  13. 17 1月, 2019 2 次提交
  14. 16 1月, 2019 1 次提交
  15. 15 1月, 2019 2 次提交
  16. 03 1月, 2019 1 次提交
  17. 02 1月, 2019 1 次提交
  18. 31 12月, 2018 1 次提交
  19. 28 12月, 2018 1 次提交
  20. 18 12月, 2018 1 次提交
  21. 13 12月, 2018 3 次提交
  22. 12 12月, 2018 1 次提交
  23. 07 12月, 2018 1 次提交
  24. 05 12月, 2018 1 次提交
    • T
      drm/i915: Introduce per-engine workarounds · 90098efa
      Tvrtko Ursulin 提交于
      We stopped re-applying the GT workarounds after engine reset since commit
      59b449d5 ("drm/i915: Split out functions for different kinds of
      workarounds").
      
      Issue with this is that some of the GT workarounds live in the MMIO space
      which gets lost during engine resets. So far the registers in 0x2xxx and
      0xbxxx address range have been identified to be affected.
      
      This losing of applied workarounds has obvious negative effects and can
      even lead to hard system hangs (see the linked Bugzilla).
      
      Rather than just restoring this re-application, because we have also
      observed that it is not safe to just re-write all GT workarounds after
      engine resets (GPU might be live and weird hardware states can happen),
      we introduce a new class of per-engine workarounds and move only the
      affected GT workarounds over.
      
      Using the framework introduced in the previous patch, we therefore after
      engine reset, re-apply only the workarounds living in the affected MMIO
      address ranges.
      
      v2:
       * Move Wa_1406609255:icl to engine workarounds as well.
       * Rename API. (Chris Wilson)
       * Drop redundant IS_KABYLAKE. (Chris Wilson)
       * Re-order engine wa/ init so latest platforms are first. (Rodrigo Vivi)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=107945
      Fixes: 59b449d5 ("drm/i915: Split out functions for different kinds of workarounds")
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: intel-gfx@lists.freedesktop.org
      Acked-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181203133341.10258-1-tvrtko.ursulin@linux.intel.com
      (cherry picked from commit 4a15c75c)
      Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      90098efa
  25. 04 12月, 2018 2 次提交