1. 09 May 2018, 1 commit
    • drm/i915: Annotate timeline lock nesting · 0adb90d3
      Authored by Chris Wilson
      CI noticed
      
      <4>[   23.430701] ============================================
      <4>[   23.430706] WARNING: possible recursive locking detected
      <4>[   23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted
      <4>[   23.430720] --------------------------------------------
      <4>[   23.430725] systemd-udevd/169 is trying to acquire lock:
      <4>[   23.430732]         (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915]
      <4>[   23.430888]
                        but task is already holding lock:
      <4>[   23.430894]         (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
      <4>[   23.430995]
                        other info that might help us debug this:
      <4>[   23.431002]  Possible unsafe locking scenario:
      
      <4>[   23.431007]        CPU0
      <4>[   23.431010]        ----
      <4>[   23.431013]   lock(&(&timeline->lock)->rlock);
      <4>[   23.431021]   lock(&(&timeline->lock)->rlock);
      <4>[   23.431028]
                         *** DEADLOCK ***
      
      <4>[   23.431036]  May be due to missing lock nesting notation
      
      <4>[   23.431044] 5 locks held by systemd-udevd/169:
      <4>[   23.431049]  #0:         (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0
      <4>[   23.431065]  #1:         (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0
      <4>[   23.431078]  #2:         (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915]
      <4>[   23.431174]  #3:         (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915]
      <4>[   23.431271]  #4:         (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
      <4>[   23.431369]
                        stack backtrace:
      <4>[   23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1
      <4>[   23.431385] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
      <4>[   23.431394] Call Trace:
      <4>[   23.431403]  dump_stack+0x67/0x9b
      <4>[   23.431411]  __lock_acquire+0xc67/0x1b50
      <4>[   23.431421]  ? ring_buffer_lock_reserve+0x154/0x3f0
      <4>[   23.431429]  ? lock_acquire+0xa6/0x210
      <4>[   23.431435]  lock_acquire+0xa6/0x210
      <4>[   23.431530]  ? move_to_timeline+0x48/0x12c [i915]
      <4>[   23.431540]  _raw_spin_lock+0x2a/0x40
      <4>[   23.431634]  ? move_to_timeline+0x48/0x12c [i915]
      <4>[   23.431730]  move_to_timeline+0x48/0x12c [i915]
      <4>[   23.431826]  __i915_request_submit+0xfa/0x280 [i915]
      <4>[   23.431923]  i915_request_submit+0x25/0x40 [i915]
      <4>[   23.432024]  i9xx_submit_request+0x11/0x140 [i915]
      <4>[   23.432120]  submit_notify+0x8d/0x124 [i915]
      <4>[   23.432202]  __i915_sw_fence_complete+0x81/0x250 [i915]
      <4>[   23.432300]  __i915_request_add+0x31c/0x7c0 [i915]
      <4>[   23.432395]  i915_gem_init+0x621/0x630 [i915]
      <4>[   23.432476]  i915_driver_load+0xbee/0x10b0 [i915]
      <4>[   23.432485]  ? trace_hardirqs_on_caller+0xe0/0x1b0
      <4>[   23.432566]  i915_pci_probe+0x29/0x90 [i915]
      <4>[   23.432574]  pci_device_probe+0xa1/0x130
      <4>[   23.432582]  driver_probe_device+0x306/0x480
      <4>[   23.432589]  __driver_attach+0xb7/0xe0
      <4>[   23.432596]  ? driver_probe_device+0x480/0x480
      <4>[   23.432602]  ? driver_probe_device+0x480/0x480
      <4>[   23.432609]  bus_for_each_dev+0x74/0xc0
      <4>[   23.432616]  bus_add_driver+0x15f/0x250
      <4>[   23.432623]  ? 0xffffffffa02d7000
      <4>[   23.432629]  driver_register+0x52/0xc0
      <4>[   23.432635]  ? 0xffffffffa02d7000
      <4>[   23.432642]  do_one_initcall+0x58/0x370
      <4>[   23.432653]  ? do_init_module+0x1d/0x1ea
      <4>[   23.432660]  ? rcu_read_lock_sched_held+0x6f/0x80
      <4>[   23.432667]  ? kmem_cache_alloc_trace+0x282/0x2e0
      <4>[   23.432675]  do_init_module+0x56/0x1ea
      <4>[   23.432682]  load_module+0x2435/0x2b20
      <4>[   23.432694]  ? __se_sys_finit_module+0xd3/0xf0
      <4>[   23.432701]  __se_sys_finit_module+0xd3/0xf0
      <4>[   23.432710]  do_syscall_64+0x55/0x190
      <4>[   23.432717]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4>[   23.432724] RIP: 0033:0x7fa780782839
      <4>[   23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      <4>[   23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839
      <4>[   23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e
      <4>[   23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780
      <4>[   23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
      <4>[   23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30
      
      but did not report it as an issue as it only occurred during the first
      module load on boot. This is due to the removal of the distinct global
      timeline, and its separate lock class. So instead mark up the expected
      nesting. An alternative would be to define a separate lock class for the
      engine, but since we only expect to have a single point of nesting, we
      can avoid having multiple lock classes for the struct.
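
      As a rough sketch of the technique (structure and helper names below are
      illustrative, not the exact i915 change): lockdep's spin_lock_nested()
      with SINGLE_DEPTH_NESTING marks the inner acquisition of a second lock of
      the same class as an intentional, bounded recursion rather than a deadlock.

      #include <linux/spinlock.h>
      #include <linux/lockdep.h>
      #include <linux/list.h>

      struct timeline {
              spinlock_t lock;
              struct list_head requests;
      };

      struct request {
              struct list_head link;
      };

      /* Caller already holds from->lock (the engine's timeline). */
      static void move_request(struct request *rq,
                               struct timeline *from, struct timeline *to)
      {
              lockdep_assert_held(&from->lock);

              /* Same lock class as 'from': annotate the single expected level
               * of nesting so lockdep does not report recursive locking. */
              spin_lock_nested(&to->lock, SINGLE_DEPTH_NESTING);
              list_move_tail(&rq->link, &to->requests);
              spin_unlock(&to->lock);
      }

      Keeping a single annotated nesting point avoids defining a second lock
      class for the engine timeline, as the message above explains.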
      
      Fixes: a89d1f92 ("drm/i915: Split i915_gem_timeline into individual timelines")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Tested-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180508153514.20251-1-chris@chris-wilson.co.uk
  2. 08 May 2018, 1 commit
  3. 04 May 2018, 2 commits
    • drm/i915: Remove assertion of active_rings must be non-empty if active_requests · 43c8c441
      Authored by Chris Wilson
      "An outstanding request must still be on an active ring somewhere" is
      only true if we haven't just been interrupted by the shrinker in the
      middle of allocating the request itself. (At the start of
      i915_request_alloc() we pin the context and prepare the GT for activity,
      marking it as active, and then try to allocate the request. If this
      allocation invokes the shrinker, we try to reclaim some space by calling
      i915_retire_requests() which may then be confused by the pre-reservation
      of active_requests.)
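
      A simplified sketch of a retire loop written under that assumption
      (structure and field names are illustrative, not the exact i915 code):
      the active_requests count may run ahead of the active_rings list, so the
      loop walks whatever rings are live instead of asserting the list is
      non-empty.

      #include <linux/list.h>
      #include <linux/types.h>

      struct ring {
              struct list_head active_link;
      };

      struct gt_state {
              unsigned int active_requests;
              struct list_head active_rings;
      };

      static void ring_retire_requests(struct ring *ring) { /* ... */ }

      static void retire_requests(struct gt_state *gt)
      {
              struct ring *ring, *next;

              if (!gt->active_requests)
                      return;

              /* No assertion that active_rings is non-empty: the shrinker may
               * have interrupted request allocation after the active_requests
               * pre-reservation but before any ring joined the active list. */
              list_for_each_entry_safe(ring, next, &gt->active_rings, active_link)
                      ring_retire_requests(ring);
      }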
      
      <3>[  125.472695] i915_retire_requests:1429 GEM_BUG_ON(list_empty(&i915->gt.active_rings))
      <2>[  125.472792] kernel BUG at drivers/gpu/drm/i915/i915_request.c:1429!
      <4>[  125.472822] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      <4>[  125.498764] Modules linked in: snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btusb btrtl btbcm btintel cdc_ether snd_hda_codec_realtek bluetooth i915 snd_hda_codec_generic usbnet r8152 mii ecdh_generic lpc_ich mei_me snd_hda_intel snd_hda_codec mei snd_hwdep snd_hda_core snd_pcm prime_numbers
      <4>[  125.498923] CPU: 0 PID: 1115 Comm: gem_exec_create Tainted: G     U            4.17.0-rc3-gc49cbe0d1eb8-kasan_32+ #1
      <4>[  125.498955] Hardware name: GOOGLE Peppy/Peppy, BIOS MrChromebox 02/04/2018
      <4>[  125.499074] RIP: 0010:i915_retire_requests+0x3f2/0x590 [i915]
      <4>[  125.499095] RSP: 0018:ffff88004e5dec40 EFLAGS: 00010282
      <4>[  125.499117] RAX: 0000000000000010 RBX: ffff8800458f0000 RCX: 0000000000000000
      <4>[  125.499140] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff880060c2f6f0
      <4>[  125.499164] RBP: ffff88004e5dee30 R08: ffffed000c185ee6 R09: ffffed000c185ee6
      <4>[  125.499187] R10: 0000000000000001 R11: ffffed000c185ee5 R12: ffff8800553da160
      <4>[  125.499210] R13: dffffc0000000000 R14: 0000000000000000 R15: ffff8800458faed0
      <4>[  125.499235] FS:  00007fe18f052980(0000) GS:ffff880065400000(0000) knlGS:0000000000000000
      <4>[  125.499262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[  125.499282] CR2: 00007f01df11efb8 CR3: 00000000518d4001 CR4: 00000000000606f0
      <4>[  125.499304] Call Trace:
      <4>[  125.499417]  i915_gem_shrink+0x576/0xb50 [i915]
      <4>[  125.499532]  ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915]
      <4>[  125.499561]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      <4>[  125.499671]  ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
      <4>[  125.499782]  ? i915_gem_shrinker_scan+0xc4/0x320 [i915]
      <4>[  125.499889]  i915_gem_shrinker_scan+0xc4/0x320 [i915]
      <4>[  125.499997]  ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915]
      <4>[  125.500021]  ? do_raw_spin_unlock+0x4f/0x240
      <4>[  125.500042]  ? _raw_spin_unlock+0x29/0x40
      <4>[  125.500149]  ? i915_gem_shrinker_count+0x1d6/0x2f0 [i915]
      <4>[  125.500177]  shrink_slab.part.18+0x23e/0x8f0
      <4>[  125.500202]  ? unregister_shrinker+0x1f0/0x1f0
      <4>[  125.500226]  ? mem_cgroup_iter+0x379/0xcc0
      <4>[  125.500249]  shrink_node+0xa7e/0x1180
      <4>[  125.500276]  ? shrink_node_memcg+0x11f0/0x11f0
      <4>[  125.500297]  ? __delayacct_freepages_start+0x38/0x80
      <4>[  125.500319]  ? __is_insn_slot_addr+0xe3/0x1a0
      <4>[  125.500342]  ? recalibrate_cpu_khz+0x10/0x10
      <4>[  125.500361]  ? ktime_get+0xb2/0x140
      <4>[  125.500382]  do_try_to_free_pages+0x2d3/0xe40
      <4>[  125.500407]  ? allow_direct_reclaim.part.23+0x1e0/0x1e0
      <4>[  125.500429]  ? shrink_node+0x1180/0x1180
      <4>[  125.500450]  ? __read_once_size_nocheck.constprop.4+0x10/0x10
      <4>[  125.500476]  try_to_free_pages+0x1af/0x560
      <4>[  125.500497]  ? do_try_to_free_pages+0xe40/0xe40
      <4>[  125.500525]  __alloc_pages_nodemask+0xadc/0x2130
      <4>[  125.500553]  ? gfp_pfmemalloc_allowed+0x150/0x150
      <4>[  125.500654]  ? i915_gem_do_execbuffer+0x219d/0x32e0 [i915]
      <4>[  125.500678]  ? debug_check_no_locks_freed+0x2a0/0x2a0
      <4>[  125.500701]  ? __debug_object_init+0x322/0xd90
      <4>[  125.500722]  ? debug_check_no_locks_freed+0x2a0/0x2a0
      <4>[  125.500827]  ? i915_gem_do_execbuffer+0xdc2/0x32e0 [i915]
      <4>[  125.500942]  ? i915_request_alloc+0x5b5/0x13f0 [i915]
      <4>[  125.500964]  ? page_frag_free+0x170/0x170
      <4>[  125.500984]  ? debug_check_no_locks_freed+0x2a0/0x2a0
      <4>[  125.501008]  new_slab+0x21d/0x5c0
      <4>[  125.501029]  ___slab_alloc.constprop.35+0x322/0x3e0
      <4>[  125.501052]  ? reservation_object_reserve_shared+0x10b/0x250
      <4>[  125.501074]  ? __ww_mutex_lock.constprop.3+0x1104/0x2cf0
      <4>[  125.501097]  ? _raw_spin_unlock_irqrestore+0x39/0x60
      <4>[  125.501120]  ? fs_reclaim_acquire+0x10/0x10
      <4>[  125.501138]  ? lock_acquire+0x138/0x3c0
      <4>[  125.501156]  ? lock_acquire+0x3c0/0x3c0
      <4>[  125.501176]  ? reservation_object_reserve_shared+0x10b/0x250
      <4>[  125.501198]  ? __slab_alloc.isra.27.constprop.34+0x3d/0x70
      <4>[  125.501219]  __slab_alloc.isra.27.constprop.34+0x3d/0x70
      <4>[  125.501243]  ? reservation_object_reserve_shared+0x10b/0x250
      <4>[  125.501265]  __kmalloc_track_caller+0x313/0x350
      <4>[  125.501287]  krealloc+0x62/0xb0
      <4>[  125.501305]  reservation_object_reserve_shared+0x10b/0x250
      <4>[  125.501411]  i915_gem_do_execbuffer+0x2040/0x32e0 [i915]
      <4>[  125.501522]  ? eb_relocate_slow+0xad0/0xad0 [i915]
      <4>[  125.501544]  ? debug_check_no_locks_freed+0x2a0/0x2a0
      <4>[  125.501646]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
      <4>[  125.501755]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
      <4>[  125.501779]  ? drm_dev_get+0x20/0x20
      <4>[  125.501803]  ? __might_fault+0xea/0x1a0
      <4>[  125.501902]  ? i915_gem_execbuffer2_ioctl+0x108/0x770 [i915]
      <4>[  125.502012]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502116]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502218]  i915_gem_execbuffer2_ioctl+0x3c5/0x770 [i915]
      <4>[  125.502243]  ? drm_dev_enter+0xe0/0xe0
      <4>[  125.502260]  ? lock_acquire+0x138/0x3c0
      <4>[  125.502362]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502470]  ? i915_gem_object_create.part.28+0x570/0x570 [i915]
      <4>[  125.502575]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502680]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502702]  drm_ioctl_kernel+0x151/0x200
      <4>[  125.502721]  ? drm_ioctl_permit+0x2a0/0x2a0
      <4>[  125.502746]  drm_ioctl+0x63a/0x920
      <4>[  125.502844]  ? i915_gem_execbuffer_ioctl+0xb90/0xb90 [i915]
      <4>[  125.502868]  ? drm_getstats+0x20/0x20
      <4>[  125.502886]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      <4>[  125.502919]  do_vfs_ioctl+0x173/0xe90
      <4>[  125.502936]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      <4>[  125.502957]  ? ioctl_preallocate+0x170/0x170
      <4>[  125.502978]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      <4>[  125.503002]  ? retint_kernel+0x2d/0x2d
      <4>[  125.503024]  ksys_ioctl+0x35/0x60
      <4>[  125.503043]  __x64_sys_ioctl+0x6a/0xb0
      <4>[  125.503061]  do_syscall_64+0x97/0x400
      <4>[  125.503081]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      <4>[  125.503101] RIP: 0033:0x7fe18e4f65d7
      <4>[  125.503116] RSP: 002b:00007ffe2ffc06a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      <4>[  125.503145] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe18e4f65d7
      <4>[  125.503168] RDX: 00007ffe2ffc07f0 RSI: 0000000040406469 RDI: 0000000000000003
      <4>[  125.503191] RBP: 00007ffe2ffc07f0 R08: 0000000000000004 R09: 00007ffe2ffcf080
      <4>[  125.503215] R10: 000000000002c7de R11: 0000000000000246 R12: 0000000040406469
      <4>[  125.503238] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
      <4>[  125.503268] Code: e8 18 a0 c9 da 48 8b 35 25 3a 47 00 49 c7 c0 a0 3b 88 c0 b9 95 05 00 00 48 c7 c2 e0 49 88 c0 48 c7 c7 8d 3b 5d c0 e8 ee 7e db da <0f> 0b 48 89 ef e8 a4 26 f5 da e9 51 fe ff ff e8 8a 26 f5 da e9
      <1>[  125.503548] RIP: i915_retire_requests+0x3f2/0x590 [i915] RSP: ffff88004e5dec40
      
      Fixes: 643b450a ("drm/i915: Only track live rings for retiring")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180504101147.26286-1-chris@chris-wilson.co.uk
    • drm/i915: Keep one request in our ring_list · 7c572e1b
      Authored by Chris Wilson
      Don't pre-emptively retire the oldest request in our ring's list if it
      is the only request. We keep various bits of state alive using the
      active reference from the request and would rather transfer that state
      over to a new request than go through the more involved process of
      retiring and reacquiring it.
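
      A minimal sketch of that policy (names are illustrative, not the exact
      upstream diff): when opportunistically retiring from the allocation path,
      leave the oldest request alone if it is the only entry on the ring's list.

      #include <linux/list.h>
      #include <linux/types.h>

      struct request {
              struct list_head ring_link;
      };

      struct ring {
              struct list_head request_list;
      };

      static bool request_completed(const struct request *rq) { return true; }
      static void request_retire(struct request *rq) { list_del(&rq->ring_link); }

      static void retire_oldest_unless_last(struct ring *ring)
      {
              struct request *rq;

              if (list_empty(&ring->request_list))
                      return;

              rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);

              /* Keep the sole remaining request: its active reference is
               * cheaper to hand over to the new request than to retire and
               * reacquire the state it keeps alive. */
              if (!list_is_last(&rq->ring_link, &ring->request_list) &&
                  request_completed(rq))
                      request_retire(rq);
      }
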
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-2-chris@chris-wilson.co.uk
  4. 03 May 2018, 3 commits
  5. 30 Apr 2018, 4 commits
  6. 19 Apr 2018, 2 commits
  7. 09 Apr 2018, 1 commit
  8. 07 Apr 2018, 2 commits
    • drm/i915: Pass the set of guilty engines to i915_reset() · d0667e9c
      Authored by Chris Wilson
      Currently, we rely on inspecting the hangcheck state from within the
      i915_reset() routines to determine which engines were guilty of the
      hang. This is problematic for cases where we want to run
      i915_handle_error() and call i915_reset() independently of hangcheck.
      Instead of relying on the indirect parameter passing, turn it into an
      explicit parameter providing the set of stalled engines which then are
      treated as guilty until proven innocent.
      
      While we are removing the implicit stalled parameter, also make the
      reason into an explicit parameter to i915_reset(). We still need a
      back-channel for i915_handle_error() to hand over the task to the locked
      waiter, but let's keep that its own channel rather than incriminate
      another.
      
      This leaves stalled/seqno as being private to hangcheck, with no more
      nefarious snooping by reset, be it whole-device or per-engine. \o/
      
      The only real issue now is that this makes it crystal clear that we
      don't actually do any testing of hangcheck per se in
      drv_selftest/live_hangcheck, merely of resets!
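
      A rough sketch of the shape of the change (types and signatures here are
      illustrative, not the upstream prototypes): the caller computes which
      engines stalled and hands that set to the reset path as an explicit
      bitmask, together with an explicit reason string.

      #include <linux/bitops.h>
      #include <linux/types.h>

      #define MAX_ENGINES 8

      struct gpu;

      static void engine_mark_guilty(struct gpu *gpu, unsigned int engine_id)
      {
              /* contexts on a stalled engine are presumed guilty */
      }

      /* Engines set in @stalled_mask are treated as guilty until proven
       * innocent; the reset path no longer inspects hangcheck state. */
      static void gpu_reset(struct gpu *gpu, unsigned long stalled_mask,
                            const char *reason)
      {
              unsigned int id;

              for_each_set_bit(id, &stalled_mask, MAX_ENGINES)
                      engine_mark_guilty(gpu, id);

              /* ... perform the device or per-engine reset ... */
      }
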
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Jeff McGee <jeff.mcgee@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180406220354.18911-2-chris@chris-wilson.co.uk
    • drm/i915: Split out parking from the idle worker for reuse · e4d2006f
      Authored by Chris Wilson
      We will want to park GEM before disengaging the drive^W^W^W unwedging.
      Since we already do the work for idling, expose the guts as a new
      function that we can then reuse.
      
      v2: Just skip if already parked; makes it more forgiving to use by
      future callers.
      v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
      i915_gem_park so that we can evaluate it for symmetry more easily.
      Calling GEM from inside i915_request looks to be a bit of a layering
      violation; for the moment I am imagining them as being notify_cb.
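
      A minimal sketch of the split (the i915_gem_park/i915_gem_unpark names
      follow the text above; the internals are illustrative): the parking work
      is a standalone function that simply returns if the GT is still busy or
      already parked, so both the idle worker and the unwedge path can call it.

      #include <linux/mutex.h>
      #include <linux/lockdep.h>
      #include <linux/types.h>

      struct gt {
              struct mutex lock;              /* stand-in for struct_mutex */
              unsigned int active_requests;
              bool awake;
      };

      static void i915_gem_park(struct gt *gt)
      {
              lockdep_assert_held(&gt->lock);

              if (gt->active_requests)        /* still busy */
                      return;

              if (!gt->awake)                 /* already parked: be forgiving */
                      return;

              gt->awake = false;
              /* ... cancel retire/hangcheck workers, release the GT wakeref ... */
      }

      static void i915_gem_unpark(struct gt *gt)
      {
              lockdep_assert_held(&gt->lock);

              if (gt->awake)
                      return;

              gt->awake = true;
              /* ... take the GT wakeref, queue the retire worker ... */
      }
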
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
      Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
  9. 29 Mar 2018, 1 commit
  10. 22 Mar 2018, 2 commits
  11. 20 Mar 2018, 1 commit
  12. 16 Mar 2018, 2 commits
  13. 13 Mar 2018, 1 commit
  14. 09 Mar 2018, 2 commits
  15. 07 Mar 2018, 2 commits
  16. 24 Feb 2018, 1 commit
  17. 22 Feb 2018, 1 commit
  18. 21 Feb 2018, 1 commit
  19. 08 Feb 2018, 2 commits
  20. 07 Feb 2018, 2 commits
  21. 05 Feb 2018, 1 commit
  22. 01 Feb 2018, 1 commit
    • drm/i915: Always run hangcheck while the GPU is busy · b26a32a8
      Authored by Chris Wilson
      Previously, we relied on only running the hangcheck while somebody was
      waiting on the GPU, in order to minimise the amount of time hangcheck
      had to run. (If nobody was watching the GPU, nobody would notice if the
      GPU wasn't responding -- eventually somebody would care and so kick
      hangcheck into action.) However, this falls apart from around commit
      4680816b ("drm/i915: Wait first for submission, before waiting for
      request completion"), as not all waiters declare themselves to hangcheck
      and so we could switch off hangcheck and miss GPU hangs even when
      waiting under the struct_mutex.
      
      If we enable hangcheck from the first request submission, and let it run
      until the GPU is idle again, we forgo all the complexity involved with
      only enabling around waiters. We just have to remember to be careful that
      we do not declare a GPU hang when idly waiting for the next request to
      become ready, as we will run hangcheck continuously even when the
      engines are stalled waiting for external events. This should be true
      already as we should only be tracking requests submitted to hardware for
      execution as an indicator that the engine is busy.
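
      A sketch of that scheme (names are illustrative): hangcheck is armed when
      the GPU leaves idle, i.e. on the first request submitted, and its worker
      keeps re-queuing itself until the GPU parks again, rather than being
      switched on and off around waiters.

      #include <linux/workqueue.h>
      #include <linux/jiffies.h>
      #include <linux/kernel.h>
      #include <linux/types.h>

      #define HANGCHECK_PERIOD msecs_to_jiffies(1500)

      struct gpu {
              bool awake;
              struct delayed_work hangcheck_work;
      };

      static void queue_hangcheck(struct gpu *gpu)
      {
              queue_delayed_work(system_long_wq, &gpu->hangcheck_work,
                                 HANGCHECK_PERIOD);
      }

      /* Called when the GPU leaves idle, i.e. on the first request submitted. */
      static void mark_busy(struct gpu *gpu)
      {
              if (gpu->awake)
                      return;

              gpu->awake = true;
              queue_hangcheck(gpu);
      }

      static void hangcheck_worker(struct work_struct *work)
      {
              struct gpu *gpu = container_of(work, struct gpu, hangcheck_work.work);

              /* Sample engine state here; only requests actually submitted to
               * hardware count as expected progress, so an engine stalled on an
               * external event is not declared hung. */

              if (gpu->awake)                 /* keep running until idle */
                      queue_hangcheck(gpu);
      }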
      
      Fixes: 4680816b ("drm/i915: Wait first for submission, before waiting for request completion")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      (cherry picked from commit 88923048)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
  23. 31 Jan 2018, 1 commit
    • drm/i915: Always run hangcheck while the GPU is busy · 88923048
      Authored by Chris Wilson
      Previously, we relied on only running the hangcheck while somebody was
      waiting on the GPU, in order to minimise the amount of time hangcheck
      had to run. (If nobody was watching the GPU, nobody would notice if the
      GPU wasn't responding -- eventually somebody would care and so kick
      hangcheck into action.) However, this falls apart from around commit
      4680816b ("drm/i915: Wait first for submission, before waiting for
      request completion"), as not all waiters declare themselves to hangcheck
      and so we could switch off hangcheck and miss GPU hangs even when
      waiting under the struct_mutex.
      
      If we enable hangcheck from the first request submission, and let it run
      until the GPU is idle again, we forgo all the complexity involved with
      only enabling around waiters. We just have to remember to be careful that
      we do not declare a GPU hang when idly waiting for the next request to
      become ready, as we will run hangcheck continuously even when the
      engines are stalled waiting for external events. This should be true
      already as we should only be tracking requests submitted to hardware for
      execution as an indicator that the engine is busy.
      
      Fixes: 4680816b ("drm/i915: Wait first for submission, before waiting for request completion")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104840
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180129144104.3921-1-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
  24. 29 Jan 2018, 1 commit
  25. 24 Jan 2018, 1 commit
  26. 20 Jan 2018, 1 commit