1. 01 Feb, 2018 1 commit
    • C
      drm/i915: Always run hangcheck while the GPU is busy · b26a32a8
      Committed by Chris Wilson
      Previously, we relied on only running the hangcheck while somebody was
      waiting on the GPU, in order to minimise the amount of time hangcheck
      had to run. (If nobody was watching the GPU, nobody would notice if the
      GPU wasn't responding -- eventually somebody would care and so kick
      hangcheck into action.) However, this falls apart from around commit
      4680816b ("drm/i915: Wait first for submission, before waiting for
      request completion"), as not all waiters declare themselves to hangcheck
      and so we could switch off hangcheck and miss GPU hangs even when
      waiting under the struct_mutex.
      
      If we enable hangcheck from the first request submission, and let it run
      until the GPU is idle again, we forgo all the complexity involved with
      only enabling around waiters. We just have to remember to be careful that
      we do not declare a GPU hang when idly waiting for the next request to
      become ready, as we will run hangcheck continuously even when the
      engines are stalled waiting for external events. This should be true
      already as we should only be tracking requests submitted to hardware for
      execution as an indicator that the engine is busy.
      
      Fixes: 4680816b ("drm/i915: Wait first for submission, before waiting for request completion")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      (cherry picked from commit 88923048)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      b26a32a8
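
The lifetime described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_engine`, the `toy_*` helpers and `TOY_HANG_THRESHOLD` are hypothetical names invented here, not i915 code. The model arms hangcheck when the first request arrives, keeps it running until nothing is left, and counts a stall only while something is actually on the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_HANG_THRESHOLD 3 /* stalled ticks before declaring a hang (arbitrary) */

/* Hypothetical, simplified engine state; not the real i915 structures. */
struct toy_engine {
    unsigned int waiting;   /* queued, still blocked on external events */
    unsigned int inflight;  /* submitted to hardware, not yet completed */
    uint32_t completed;     /* seqno of the last completed request */
    uint32_t last_seen;     /* value of `completed` at the previous tick */
    unsigned int stalled;   /* consecutive ticks without progress */
    bool hangcheck_armed;
};

/* A new request arrives: the device is busy, so hangcheck starts running. */
static void toy_request_add(struct toy_engine *e)
{
    e->waiting++;
    e->hangcheck_armed = true;
}

/* The request's dependencies resolved; hand it to the hardware. */
static void toy_request_submit(struct toy_engine *e)
{
    assert(e->waiting > 0);
    e->waiting--;
    e->inflight++;
}

/* The hardware finished a request; disarm hangcheck once fully idle. */
static void toy_request_complete(struct toy_engine *e)
{
    assert(e->inflight > 0);
    e->inflight--;
    e->completed++;
    if (e->waiting == 0 && e->inflight == 0)
        e->hangcheck_armed = false;
}

/* One hangcheck tick; returns true if a hang is declared. */
static bool toy_hangcheck_tick(struct toy_engine *e)
{
    if (!e->hangcheck_armed)
        return false;

    /* Nothing on the hardware, or progress was made: not a hang. */
    if (e->inflight == 0 || e->completed != e->last_seen) {
        e->last_seen = e->completed;
        e->stalled = 0;
        return false;
    }

    return ++e->stalled >= TOY_HANG_THRESHOLD;
}
```

In this model, a request that sits in `waiting` can be ticked any number of times without a hang being declared, while a request stuck in `inflight` trips the check after `TOY_HANG_THRESHOLD` ticks. The real driver also needs locking and a timer, which the sketch leaves out.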
  2. 14 Dec, 2017 1 commit
    • C
      drm/i915: Stop listening to request resubmission from the signaler kthread · 74c7b078
      Committed by Chris Wilson
      The intent here was that we would be listening to
      i915_gem_request_unsubmit in order to cancel the signaler quickly and
      release the reference on the request. Cancelling the signaler is done
      directly via intel_engine_cancel_signaling (called from unsubmit), but
      that does not directly wake up the signaling thread, and neither does
      setting the request->global_seqno back to zero wake up listeners to the
      request->execute waitqueue. So the only time that listening to the
      request->execute waitqueue would wake up the signaling kthread would be
      on the request resubmission, during which time we would already receive
      wake ups from rejoining the global breadcrumbs wait rbtree.
      
      Trying to wake up to release the request remains an issue. If the
      signaling was cancelled and no other request required signaling, then it
      is possible for us to shut down with the reference on the request still
      held. To ensure that we do not try to shut down, leaking that request, we
      kick the signaling threads whenever we disarm the breadcrumbs, i.e. on
      parking the engine when idle.
      
      v2: We do need to be sure to release the last reference on stopping the
      kthread; asserting that it has been dropped already is insufficient.
      
      Fixes: d6a2289d ("drm/i915: Remove the preempted request from the execution queue")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171208121033.5236-1-chris@chris-wilson.co.uk
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      (cherry picked from commit 776bc27f)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      74c7b078
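
The v2 note in this commit message (release the last reference when the kthread stops, rather than asserting it is already gone) can be sketched with a toy reference count. Everything here is an assumption made for illustration: `struct toy_request`, `struct toy_signaler` and the `toy_*` helpers are hypothetical and do not mirror the i915 signaler implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request with a plain reference count. */
struct toy_request {
    unsigned int refcount;
};

/* Hypothetical signaler state: the request the thread currently tracks. */
struct toy_signaler {
    struct toy_request *held;
};

static struct toy_request *toy_request_get(struct toy_request *rq)
{
    rq->refcount++;
    return rq;
}

static void toy_request_put(struct toy_request *rq)
{
    assert(rq->refcount > 0);
    rq->refcount--;
}

/*
 * Switch the signaler to `next` (NULL when signaling was cancelled),
 * taking the new reference before dropping the old one.
 */
static void toy_signaler_track(struct toy_signaler *s, struct toy_request *next)
{
    if (next == s->held)
        return;
    if (next)
        toy_request_get(next);
    if (s->held)
        toy_request_put(s->held);
    s->held = next;
}

/* On stop, actively release whatever is still held instead of asserting. */
static void toy_signaler_stop(struct toy_signaler *s)
{
    toy_signaler_track(s, NULL);
}
```

If the signaler is stopped while it still tracks a request, `toy_signaler_stop()` drops that reference, so the request's count returns to what its other owners hold.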
  3. 12 Dec, 2017 1 commit
  4. 11 Dec, 2017 1 commit
    • C
      drm/i915: Stop listening to request resubmission from the signaler kthread · 776bc27f
      Committed by Chris Wilson
      The intent here was that we would be listening to
      i915_gem_request_unsubmit in order to cancel the signaler quickly and
      release the reference on the request. Cancelling the signaler is done
      directly via intel_engine_cancel_signaling (called from unsubmit), but
      that does not directly wake up the signaling thread, and neither does
      setting the request->global_seqno back to zero wake up listeners to the
      request->execute waitqueue. So the only time that listening to the
      request->execute waitqueue would wake up the signaling kthread would be
      on the request resubmission, during which time we would already receive
      wake ups from rejoining the global breadcrumbs wait rbtree.
      
      Trying to wake up to release the request remains an issue. If the
      signaling was cancelled and no other request required signaling, then it
      is possible for us to shut down with the reference on the request still
      held. To ensure that we do not try to shut down, leaking that request, we
      kick the signaling threads whenever we disarm the breadcrumbs, i.e. on
      parking the engine when idle.
      
      v2: We do need to be sure to release the last reference on stopping the
      kthread; asserting that it has been dropped already is insufficient.
      
      Fixes: d6a2289d ("drm/i915: Remove the preempted request from the execution queue")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171208121033.5236-1-chris@chris-wilson.co.uk
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      776bc27f
  5. 09 Dec, 2017 1 commit
  6. 21 Nov, 2017 1 commit
  7. 16 Nov, 2017 1 commit
  8. 10 Nov, 2017 1 commit
  9. 01 Nov, 2017 1 commit
  10. 26 Oct, 2017 1 commit
    • C
      drm/i915/guc: Always enable the breadcrumbs irq · bcbd5c33
      Committed by Chris Wilson
      The execlists emulation on top of the GuC (used for scheduling and
      preemption) depends on the MI_USER_INTERRUPT for its notifications and
      tasklet action. As we always employ the irq, there is no advantage in
      ever disabling it while we are using the GuC, so allow us to arm the
      breadcrumb irq when enabling GuC submission and disarm upon disabling.
      The impact should be lessened by the delayed irq disabling we do (we
      only disable after receiving an interrupt for which no one was wanting),
      but allowing guc to explicitly manage the irq in relation to itself is
      simpler and prevents an issue with losing an interrupt for preemption
      as it is not coupled to an active request.
      
      Internally, we add a reference counter (breadcrumbs.irq_enabled) as a
      simple mechanism to allow GuC to keep the breadcrumb irq enabled. To
      improve upon always enabling the irq while guc is selected, we need
      to hook into the parking facility of intel_engines so that we only enable
      the breadcrumbs while the GT is active (one step better would be to
      individually park/unpark each engine).
      
      In effect, this means that we keep the breadcrumb irq always enabled for
      the entire duration the guc is busy, whereas before we would try to
      switch it off whenever we idled for more than one interrupt with no
      associated waiters. The difference *should* be negligible in practice!
      
      v2: Stop abusing fence signaling (and its auxiliary data structures) to
      enable the breadcrumbs irqs.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-3-chris@chris-wilson.co.uk
      bcbd5c33
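
The reference counter mentioned in this commit message (`breadcrumbs.irq_enabled`) can be pictured with a minimal userspace sketch. Only the field name `irq_enabled` comes from the commit message; `struct toy_breadcrumbs`, `hw_irq_on`, `toy_pin_irq()` and `toy_unpin_irq()` are hypothetical names, and the locking that the driver would need is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the breadcrumbs bookkeeping. */
struct toy_breadcrumbs {
    unsigned int irq_enabled; /* number of holders that want the irq on */
    bool hw_irq_on;           /* stand-in for the hardware interrupt mask */
};

/* Take a reference; the first holder switches the interrupt on. */
static void toy_pin_irq(struct toy_breadcrumbs *b)
{
    if (b->irq_enabled++ == 0)
        b->hw_irq_on = true;
}

/* Drop a reference; the last holder switches the interrupt off. */
static void toy_unpin_irq(struct toy_breadcrumbs *b)
{
    assert(b->irq_enabled > 0);
    if (--b->irq_enabled == 0)
        b->hw_irq_on = false;
}
```

With this shape, a long-lived holder (such as GuC submission in the commit's description) keeps the interrupt on for as long as it holds its reference, regardless of how many short-lived holders come and go in the meantime.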
  11. 18 Oct, 2017 1 commit
  12. 27 Sep, 2017 1 commit
  13. 08 Jun, 2017 2 commits
  14. 26 Apr, 2017 3 commits
  15. 24 Apr, 2017 1 commit
  16. 04 Apr, 2017 2 commits
  17. 21 Mar, 2017 1 commit
    • C
      drm/i915: Protect intel_engine_wakeup() for call from irq context · 467221bc
      Committed by Chris Wilson
      intel_engine_wakeup() is called by nop_submit_request(), which is
      installed to handle third party fences completed from within irq
      context. As such, it needs the full irqsave/irqrestore and not the
      partial spin_lock_irq handling.
      
      [18942.714467] =================================
      [18942.719076] [ INFO: inconsistent lock state ]
      [18942.723522] 4.11.0-rc2-CI-CI_DRM_2368+ #1 Tainted: G     U  W
      [18942.729970] ---------------------------------
      [18942.734466] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      [18942.740594] gem_eio/1275 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [18942.745932]  (&(&fence->lock)->rlock){+.?...}, at: [<ffffffff815ec100>] dma_fence_signal+0x100/0x230
      [18942.755331] {IN-SOFTIRQ-W} state was registered at:
      [18942.760356]   __lock_acquire+0x5d0/0x1bb0
      [18942.764444]   lock_acquire+0xc9/0x220
      [18942.768196]   _raw_spin_lock_irqsave+0x41/0x60
      [18942.772747]   dma_fence_signal+0x100/0x230
      [18942.776927]   vgem_fence_timeout+0x9/0x10 [vgem]
      [18942.781701]   call_timer_fn+0x92/0x380
      [18942.785557]   expire_timers+0x150/0x1f0
      [18942.789491]   run_timer_softirq+0x7c/0x160
      [18942.793705]   __do_softirq+0x116/0x4c0
      [18942.797560]   irq_exit+0xa9/0xc0
      [18942.800873]   smp_apic_timer_interrupt+0x38/0x50
      [18942.805611]   apic_timer_interrupt+0x90/0xa0
      [18942.810008]   cpuidle_enter_state+0x135/0x380
      [18942.814503]   cpuidle_enter+0x12/0x20
      [18942.818250]   call_cpuidle+0x1e/0x40
      [18942.821906]   do_idle+0x17e/0x1f0
      [18942.825333]   cpu_startup_entry+0x18/0x20
      [18942.829463]   rest_init+0x127/0x130
      [18942.833025]   start_kernel+0x3f1/0x3fe
      [18942.836908]   x86_64_start_reservations+0x2a/0x2c
      [18942.841733]   x86_64_start_kernel+0x173/0x186
      [18942.846234]   verify_cpu+0x0/0xfc
      [18942.849604] irq event stamp: 30568
      [18942.853140] hardirqs last  enabled at (30567): [<ffffffff8110b81f>] ktime_get+0xef/0x120
      [18942.861468] hardirqs last disabled at (30568): [<ffffffff81876377>] _raw_spin_lock_irqsave+0x17/0x60
      [18942.870812] softirqs last  enabled at (30462): [<ffffffff81085cd9>] __do_softirq+0x1d9/0x4c0
      [18942.879443] softirqs last disabled at (30439): [<ffffffff81086139>] irq_exit+0xa9/0xc0
      [18942.887616]
      [18942.887616] other info that might help us debug this:
      [18942.894279]  Possible unsafe locking scenario:
      [18942.894279]
      [18942.900336]        CPU0
      [18942.902851]        ----
      [18942.905362]   lock(&(&fence->lock)->rlock);
      [18942.909647]   <Interrupt>
      [18942.912330]     lock(&(&fence->lock)->rlock);
      [18942.916821]
      [18942.916821]  *** DEADLOCK ***
      [18942.916821]
      [18942.922862] 1 lock held by gem_eio/1275:
      [18942.926859]  #0:  (&(&fence->lock)->rlock){+.?...}, at: [<ffffffff815ec100>] dma_fence_signal+0x100/0x230
      [18942.936651]
      [18942.936651] stack backtrace:
      [18942.941142] CPU: 3 PID: 1275 Comm: gem_eio Tainted: G     U  W       4.11.0-rc2-CI-CI_DRM_2368+ #1
      [18942.950367] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F21 01/06/2017
      [18942.959756] Call Trace:
      [18942.962244]  dump_stack+0x67/0x92
      [18942.965626]  print_usage_bug.part.23+0x259/0x268
      [18942.970362]  mark_lock+0x12c/0x6f0
      [18942.973851]  ? check_usage_forwards+0x130/0x130
      [18942.978487]  mark_held_locks+0x6f/0xa0
      [18942.982329]  ? _raw_spin_unlock_irq+0x27/0x50
      [18942.986797]  trace_hardirqs_on_caller+0x150/0x200
      [18942.991599]  trace_hardirqs_on+0xd/0x10
      [18942.995515]  _raw_spin_unlock_irq+0x27/0x50
      [18942.999796]  intel_engine_wakeup+0x26/0x30 [i915]
      [18943.004670]  intel_engine_init_global_seqno+0x131/0x1a0 [i915]
      [18943.010745]  nop_submit_request+0x2e/0x40 [i915]
      [18943.015476]  submit_notify+0x3f/0x5c [i915]
      [18943.019763]  __i915_sw_fence_complete+0x176/0x220 [i915]
      [18943.025234]  ? try_to_del_timer_sync+0x4d/0x60
      [18943.029825]  i915_sw_fence_complete+0x25/0x40 [i915]
      [18943.034887]  dma_i915_sw_fence_wake+0x26/0x60 [i915]
      [18943.039959]  dma_fence_signal+0x146/0x230
      [18943.044109]  vgem_fence_signal_ioctl+0x6c/0xc0 [vgem]
      [18943.049275]  drm_ioctl+0x200/0x450
      [18943.052758]  ? vgem_fence_attach_ioctl+0x270/0x270 [vgem]
      [18943.058334]  do_vfs_ioctl+0x90/0x6e0
      [18943.061991]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
      [18943.066843]  ? __this_cpu_preempt_check+0x13/0x20
      [18943.071643]  ? trace_hardirqs_on_caller+0xe7/0x200
      [18943.076532]  SyS_ioctl+0x3c/0x70
      [18943.079842]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [18943.084558] RIP: 0033:0x7f0dfcc14357
      [18943.088240] RSP: 002b:00007ffeb4628da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [18943.095996] RAX: ffffffffffffffda RBX: ffffffff8147eb93 RCX: 00007f0dfcc14357
      [18943.103311] RDX: 00007ffeb4628de0 RSI: 0000000040086442 RDI: 0000000000000005
      [18943.110574] RBP: ffffc9000176ff88 R08: 0000000000000004 R09: 0000000000000000
      [18943.117845] R10: 0000000000000029 R11: 0000000000000246 R12: 0000000000000001
      [18943.125168] R13: 0000000000000005 R14: 0000000040086442 R15: 0000000000000000
      [18943.132520]  ? __this_cpu_preempt_check+0x13/0x20
      
      Fixes: cdc3a453 ("drm/i915: No need to save/restore irq status in intel_engine_wakeup")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170320143133.1507-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      467221bc
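
The difference between the two locking styles named in this commit message can be shown with a userspace model of the interrupt-enable state. This is a sketch only: the `toy_*` functions and the global `toy_irqs_enabled` flag are hypothetical stand-ins for the behaviour of the kernel's `spin_lock_irq()`/`spin_unlock_irq()` and `spin_lock_irqsave()`/`spin_unlock_irqrestore()`, with the actual spinning left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU's "interrupts enabled" state. */
static bool toy_irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void toy_lock_irq(void)
{
    toy_irqs_enabled = false;
}

/* Models spin_unlock_irq(): re-enables interrupts unconditionally. */
static void toy_unlock_irq(void)
{
    toy_irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the previous state. */
static bool toy_lock_irqsave(void)
{
    bool flags = toy_irqs_enabled;

    toy_irqs_enabled = false;
    return flags;
}

/* Models spin_unlock_irqrestore(): puts the previous state back. */
static void toy_unlock_irqrestore(bool flags)
{
    toy_irqs_enabled = flags;
}

/* A wakeup helper written with the unconditional variant. */
static void toy_wakeup_irq(void)
{
    toy_lock_irq();
    /* ... wake the waiter ... */
    toy_unlock_irq();
}

/* The same helper written with save/restore. */
static void toy_wakeup_irqsave(void)
{
    bool flags = toy_lock_irqsave();

    /* ... wake the waiter ... */
    toy_unlock_irqrestore(flags);
}

/*
 * A caller that already holds a lock with interrupts off calls `wakeup`.
 * Returns true if interrupts were still off when `wakeup` returned.
 */
static bool toy_outer_irqs_stay_off(void (*wakeup)(void))
{
    bool flags = toy_lock_irqsave();
    bool still_off;

    wakeup();
    still_off = !toy_irqs_enabled;
    toy_unlock_irqrestore(flags);
    return still_off;
}
```

Passing `toy_wakeup_irq` to `toy_outer_irqs_stay_off()` shows interrupts being switched back on while the outer lock is still held, which is the kind of inconsistency the lockdep report in this commit flags. Passing `toy_wakeup_irqsave` leaves the outer caller's state untouched.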
  18. 16 Mar, 2017 5 commits
  19. 07 Mar, 2017 2 commits
  20. 06 Mar, 2017 1 commit
    • C
      drm/i915: Wake up all waiters before idling · e1c0c91b
      Committed by Chris Wilson
      When we idle, we wake up the first waiter (checking to see if it missed
      an earlier wakeup) and disarm the breadcrumbs. However, we now assert
      that there are no waiters when the interrupt is disabled, triggering an
      assert if there were multiple waiters when we idled.
      
      [  420.842275] invalid opcode: 0000 [#1] PREEMPT SMP
      [  420.842285] Modules linked in: vgem snd_hda_codec_realtek x86_pkg_temp_thermal snd_hda_codec_generic intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep mei_me snd_hda_core mei snd_pcm lpc_ich i915 r8169 mii prime_numbers
      [  420.842357] CPU: 4 PID: 8714 Comm: kms_pipe_crc_ba Tainted: G     U  W       4.10.0-CI-CI_DRM_2280+ #1
      [  420.842377] Hardware name: Hewlett-Packard HP Pro 3500 Series/2ABF, BIOS 8.11 10/24/2012
      [  420.842395] task: ffff880117ddce40 task.stack: ffffc90001114000
      [  420.842439] RIP: 0010:__intel_engine_remove_wait+0x1f4/0x200 [i915]
      [  420.842454] RSP: 0018:ffffc90001117b18 EFLAGS: 00010046
      [  420.842467] RAX: 0000000000000000 RBX: ffff88010c25c2a8 RCX: 0000000000000001
      [  420.842481] RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffc90001117c50
      [  420.842495] RBP: ffffc90001117b58 R08: 0000000011e52352 R09: c4d16acc00000000
      [  420.842511] R10: ffffffff82789eb0 R11: ffff880117ddce40 R12: ffffc90001117c50
      [  420.842525] R13: ffffc90001117c50 R14: 0000000000000078 R15: 0000000000000000
      [  420.842540] FS:  00007fe47dda0a40(0000) GS:ffff88011fb00000(0000) knlGS:0000000000000000
      [  420.842559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  420.842571] CR2: 00007fd6c0a2cec4 CR3: 000000010a5e5000 CR4: 00000000001406e0
      [  420.842586] Call Trace:
      [  420.842595]  ? do_raw_spin_lock+0xad/0xb0
      [  420.842635]  intel_engine_remove_wait.part.3+0x26/0x40 [i915]
      [  420.842678]  intel_engine_remove_wait+0xe/0x20 [i915]
      [  420.842721]  i915_wait_request+0x4f0/0x8c0 [i915]
      [  420.842736]  ? wake_up_q+0x70/0x70
      [  420.842747]  ? wake_up_q+0x70/0x70
      [  420.842787]  i915_gem_object_wait_fence+0x7d/0x1a0 [i915]
      [  420.842829]  i915_gem_object_wait+0x30d/0x520 [i915]
      [  420.842842]  ? __this_cpu_preempt_check+0x13/0x20
      [  420.842884]  i915_gem_wait_ioctl+0x12e/0x2e0 [i915]
      [  420.842924]  ? i915_gem_wait_ioctl+0x22/0x2e0 [i915]
      [  420.842939]  drm_ioctl+0x200/0x450
      [  420.842976]  ? i915_gem_set_wedged+0x90/0x90 [i915]
      [  420.842993]  do_vfs_ioctl+0x90/0x6e0
      [  420.843003]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
      [  420.843017]  ? __this_cpu_preempt_check+0x13/0x20
      [  420.843030]  ? trace_hardirqs_on_caller+0xe7/0x200
      [  420.843042]  SyS_ioctl+0x3c/0x70
      [  420.843054]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  420.843065] RIP: 0033:0x7fe47c4b9357
      [  420.843075] RSP: 002b:00007ffc3c0633c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  420.843094] RAX: ffffffffffffffda RBX: ffffffff81482393 RCX: 00007fe47c4b9357
      [  420.843109] RDX: 00007ffc3c063400 RSI: 00000000c010646c RDI: 0000000000000004
      [  420.843123] RBP: ffffc90001117f88 R08: 0000000000000008 R09: 0000000000000000
      [  420.843137] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [  420.843151] R13: 0000000000000004 R14: 00000000c010646c R15: 0000000000000000
      [  420.843168]  ? __this_cpu_preempt_check+0x13/0x20
      [  420.843180] Code: 81 48 c7 c1 40 6a 16 a0 48 c7 c2 47 29 15 a0 be 17 01 00 00 48 c7 c7 10 6a 16 a0 e8 c7 ea fe e0 e9 5d ff ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 67 41 7e e1
      [  420.843325] RIP: __intel_engine_remove_wait+0x1f4/0x200 [i915] RSP: ffffc90001117b18
      
      Fixes: b66255f0 ("drm/i915: Refactor wakeup of the next breadcrumb waiter")
      Fixes: 67b807a8 ("drm/i915: Delay disabling the user interrupt for breadcrumbs")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170306092916.11623-2-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      e1c0c91b
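
The invariant behind this fix (no waiters may remain once the interrupt is disarmed) can be sketched with a simple linked list of waiters. This is an illustrative model with hypothetical names (`struct toy_waiter`, `struct toy_breadcrumbs`, `toy_add_waiter()`, `toy_wake_all_and_disarm()`); it assumes that waking a waiter also detaches it from the list, and it does not reproduce the driver's data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter, linked into a singly linked list. */
struct toy_waiter {
    struct toy_waiter *next;
    bool woken;
};

/* Hypothetical breadcrumbs state: waiter list plus interrupt state. */
struct toy_breadcrumbs {
    struct toy_waiter *waiters;
    bool irq_armed;
};

/* Adding a waiter arms the interrupt. */
static void toy_add_waiter(struct toy_breadcrumbs *b, struct toy_waiter *w)
{
    w->woken = false;
    w->next = b->waiters;
    b->waiters = w;
    b->irq_armed = true;
}

/*
 * Idle path: wake and detach every waiter, not just the first, and only
 * then disarm. Returns how many waiters were woken.
 */
static unsigned int toy_wake_all_and_disarm(struct toy_breadcrumbs *b)
{
    unsigned int woken = 0;

    while (b->waiters) {
        struct toy_waiter *w = b->waiters;

        b->waiters = w->next;
        w->next = NULL;
        w->woken = true;
        woken++;
    }

    b->irq_armed = false;
    assert(b->waiters == NULL); /* the invariant checked when disarmed */
    return woken;
}
```

Waking only the head of the list before disarming would leave the remaining waiters attached with the interrupt off, which is the state the assertion in the trace above objects to.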
  21. 04 Mar, 2017 2 commits
  22. 02 Mar, 2017 3 commits
  23. 28 Feb, 2017 5 commits
  24. 23 Feb, 2017 1 commit