1. 01 Dec 2021, 3 commits
  2. 26 Nov 2021, 1 commit
    • drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code · 8b91cdd4
      Committed by Thomas Hellström
      The capture code is typically run entirely in the fence signalling
      critical path. We're about to add lockdep annotation in an upcoming patch
      which reveals a lockdep splat similar to the below one.
      
      Fix the associated potential deadlocks using __GFP_KSWAPD_RECLAIM
(which is the same as GFP_NOWAIT, but open-coded for clarity) rather than
      GFP_KERNEL for memory allocation in the capture path. This has the
      potential drawback that capture might fail in situations with memory
      pressure.
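
      A minimal sketch of the resulting allocation pattern (the ALLOW_FAIL
      name and exact mask here are illustrative, not the literal patch):

      	/* Only wake kswapd; never block on direct reclaim, so the
      	 * allocation cannot recurse into fs_reclaim while we are
      	 * inside the dma_fence_map annotation. */
      	#define ALLOW_FAIL (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)

      	dst = kmalloc(size, ALLOW_FAIL); /* was GFP_KERNEL */
      	if (!dst)
      		return NULL; /* capture degrades gracefully */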
      
      [  234.842048] WARNING: possible circular locking dependency detected
      [  234.842050] 5.15.0-rc7+ #20 Tainted: G     U  W
      [  234.842052] ------------------------------------------------------
      [  234.842054] gem_exec_captur/1180 is trying to acquire lock:
      [  234.842056] ffffffffa3e51c00 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x4d/0x330
      [  234.842063]
                     but task is already holding lock:
      [  234.842064] ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915]
      [  234.842138]
                     which lock already depends on the new lock.
      
      [  234.842140]
                     the existing dependency chain (in reverse order) is:
      [  234.842142]
                     -> #2 (dma_fence_map){++++}-{0:0}:
      [  234.842145]        __dma_fence_might_wait+0x41/0xa0
      [  234.842149]        dma_resv_lockdep+0x1dc/0x28f
      [  234.842151]        do_one_initcall+0x58/0x2d0
      [  234.842154]        kernel_init_freeable+0x273/0x2bf
      [  234.842157]        kernel_init+0x16/0x120
      [  234.842160]        ret_from_fork+0x1f/0x30
      [  234.842163]
                     -> #1 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
      [  234.842166]        fs_reclaim_acquire+0x6d/0xd0
      [  234.842168]        __kmalloc_node+0x51/0x3a0
      [  234.842171]        alloc_cpumask_var_node+0x1b/0x30
      [  234.842174]        native_smp_prepare_cpus+0xc7/0x292
      [  234.842177]        kernel_init_freeable+0x160/0x2bf
      [  234.842179]        kernel_init+0x16/0x120
      [  234.842181]        ret_from_fork+0x1f/0x30
      [  234.842184]
                     -> #0 (fs_reclaim){+.+.}-{0:0}:
      [  234.842186]        __lock_acquire+0x1161/0x1dc0
      [  234.842189]        lock_acquire+0xb5/0x2b0
      [  234.842192]        fs_reclaim_acquire+0xa1/0xd0
      [  234.842193]        __kmalloc+0x4d/0x330
      [  234.842196]        i915_vma_coredump_create+0x78/0x5b0 [i915]
      [  234.842253]        intel_engine_coredump_add_vma+0x36/0xe0 [i915]
      [  234.842307]        __i915_gpu_coredump+0x290/0x5e0 [i915]
      [  234.842365]        i915_capture_error_state+0x57/0xa0 [i915]
      [  234.842415]        intel_gt_handle_error+0x348/0x3e0 [i915]
      [  234.842462]        intel_gt_debugfs_reset_store+0x3c/0x90 [i915]
      [  234.842504]        simple_attr_write+0xc1/0xe0
      [  234.842507]        full_proxy_write+0x53/0x80
      [  234.842509]        vfs_write+0xbc/0x350
      [  234.842513]        ksys_write+0x58/0xd0
      [  234.842514]        do_syscall_64+0x38/0x90
      [  234.842516]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  234.842519]
                     other info that might help us debug this:
      
      [  234.842521] Chain exists of:
                       fs_reclaim --> mmu_notifier_invalidate_range_start --> dma_fence_map
      
      [  234.842526]  Possible unsafe locking scenario:
      
      [  234.842528]        CPU0                    CPU1
      [  234.842529]        ----                    ----
      [  234.842531]   lock(dma_fence_map);
      [  234.842532]                                lock(mmu_notifier_invalidate_range_start);
      [  234.842535]                                lock(dma_fence_map);
      [  234.842537]   lock(fs_reclaim);
      [  234.842539]
                      *** DEADLOCK ***
      
      [  234.842540] 4 locks held by gem_exec_captur/1180:
      [  234.842543]  #0: ffff9007812d9460 (sb_writers#17){.+.+}-{0:0}, at: ksys_write+0x58/0xd0
      [  234.842547]  #1: ffff900781d9ecb8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write+0x3a/0xe0
      [  234.842552]  #2: ffffffffc11913a8 (capture_mutex){+.+.}-{3:3}, at: i915_capture_error_state+0x1a/0xa0 [i915]
      [  234.842602]  #3: ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915]
      [  234.842656]
                     stack backtrace:
      [  234.842658] CPU: 0 PID: 1180 Comm: gem_exec_captur Tainted: G     U  W         5.15.0-rc7+ #20
      [  234.842661] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [  234.842664] Call Trace:
      [  234.842666]  dump_stack_lvl+0x57/0x72
      [  234.842669]  check_noncircular+0xde/0x100
      [  234.842672]  ? __lock_acquire+0x3bf/0x1dc0
      [  234.842675]  __lock_acquire+0x1161/0x1dc0
      [  234.842678]  lock_acquire+0xb5/0x2b0
      [  234.842680]  ? __kmalloc+0x4d/0x330
      [  234.842683]  ? finish_task_switch.isra.0+0xf2/0x360
      [  234.842686]  ? i915_vma_coredump_create+0x78/0x5b0 [i915]
      [  234.842734]  fs_reclaim_acquire+0xa1/0xd0
      [  234.842737]  ? __kmalloc+0x4d/0x330
      [  234.842739]  __kmalloc+0x4d/0x330
      [  234.842742]  i915_vma_coredump_create+0x78/0x5b0 [i915]
      [  234.842793]  ? capture_vma+0xbe/0x110 [i915]
      [  234.842844]  intel_engine_coredump_add_vma+0x36/0xe0 [i915]
      [  234.842892]  __i915_gpu_coredump+0x290/0x5e0 [i915]
      [  234.842939]  i915_capture_error_state+0x57/0xa0 [i915]
      [  234.842985]  intel_gt_handle_error+0x348/0x3e0 [i915]
      [  234.843032]  ? __mutex_lock+0x81/0x830
      [  234.843035]  ? simple_attr_write+0x3a/0xe0
      [  234.843038]  ? __lock_acquire+0x3bf/0x1dc0
      [  234.843041]  intel_gt_debugfs_reset_store+0x3c/0x90 [i915]
      [  234.843083]  ? _copy_from_user+0x45/0x80
      [  234.843086]  simple_attr_write+0xc1/0xe0
      [  234.843089]  full_proxy_write+0x53/0x80
      [  234.843091]  vfs_write+0xbc/0x350
      [  234.843094]  ksys_write+0x58/0xd0
      [  234.843096]  do_syscall_64+0x38/0x90
      [  234.843098]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  234.843101] RIP: 0033:0x7fa467480877
      [  234.843103] Code: 75 05 48 83 c4 58 c3 e8 37 4e ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [  234.843108] RSP: 002b:00007ffd14d79b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  234.843112] RAX: ffffffffffffffda RBX: 00007ffd14d79b60 RCX: 00007fa467480877
      [  234.843114] RDX: 0000000000000014 RSI: 00007ffd14d79b60 RDI: 0000000000000007
      [  234.843116] RBP: 0000000000000007 R08: 0000000000000000 R09: 00007ffd14d79ab0
      [  234.843119] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000014
      [  234.843121] R13: 0000000000000000 R14: 00007ffd14d79b60 R15: 0000000000000005
      
      v5:
      - Use __GFP_KSWAPD_RECLAIM rather than __GFP_NOWAIT for clarity.
        (Daniel Vetter)
      v6:
      - Include an instance in execlists_capture_work().
      - Rework the commit message due to patch reordering.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-3-thomas.hellstrom@linux.intel.com
      8b91cdd4
  3. 25 Nov 2021, 1 commit
  4. 24 Nov 2021, 2 commits
    • drm/i915/gt: Hold RPM wakelock during PXP suspend · d22d446f
      Committed by Tejas Upadhyay
selftest --r live shows failures in the suspend tests when the
RPM wakelock is not acquired during suspend.

This change addresses the error below:
      <4> [154.177535] RPM wakelock ref not held during HW access
      <4> [154.177575] WARNING: CPU: 4 PID: 5772 at
      drivers/gpu/drm/i915/intel_runtime_pm.h:113
      fwtable_write32+0x240/0x320 [i915]
      <4> [154.177974] Modules linked in: i915(+) vgem drm_shmem_helper
      fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic
      ledtrig_audio mei_hdcp mei_pxp x86_pkg_temp_thermal coretemp
      crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_intel_dspcfg
      snd_hda_codec snd_hwdep igc snd_hda_core ttm mei_me ptp
      snd_pcm prime_numbers mei i2c_i801 pps_core i2c_smbus intel_lpss_pci
      btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
      <4> [154.178143] CPU: 4 PID: 5772 Comm: i915_selftest Tainted: G
      U            5.15.0-rc6-CI-Patchwork_21432+ #1
      <4> [154.178154] Hardware name: ASUS System Product Name/TUF GAMING
      Z590-PLUS WIFI, BIOS 0811 04/06/2021
      <4> [154.178160] RIP: 0010:fwtable_write32+0x240/0x320 [i915]
      <4> [154.178604] Code: 15 7b e1 0f 0b e9 34 fe ff ff 80 3d a9 89 31
      00 00 0f 85 31 fe ff ff 48 c7 c7 88 9e 4f a0 c6 05 95 89 31 00 01 e8
      c0 15 7b e1 <0f> 0b e9 17 fe ff ff 8b 05 0f 83 58 e2 85 c0 0f 85 8d
      00 00 00 48
      <4> [154.178614] RSP: 0018:ffffc900016279f0 EFLAGS: 00010286
      <4> [154.178626] RAX: 0000000000000000 RBX: ffff888204fe0ee0
      RCX: 0000000000000001
      <4> [154.178634] RDX: 0000000080000001 RSI: ffffffff823142b5
      RDI: 00000000ffffffff
      <4> [154.178641] RBP: 00000000000320f0 R08: 0000000000000000
      R09: c0000000ffffcd5a
      <4> [154.178647] R10: 00000000000f8c90 R11: ffffc90001627808
      R12: 0000000000000000
      <4> [154.178654] R13: 0000000040000000 R14: ffffffffa04d12e0
      R15: 0000000000000000
      <4> [154.178660] FS:  00007f7390aa4c00(0000) GS:ffff88844f000000(0000)
      knlGS:0000000000000000
      <4> [154.178669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4> [154.178675] CR2: 000055bc40595028 CR3: 0000000204474005
      CR4: 0000000000770ee0
      <4> [154.178682] PKRU: 55555554
      <4> [154.178687] Call Trace:
      <4> [154.178706]  intel_pxp_fini_hw+0x23/0x30 [i915]
      <4> [154.179284]  intel_pxp_suspend+0x1f/0x30 [i915]
      <4> [154.179807]  live_gt_resume+0x5b/0x90 [i915]
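
      A minimal sketch of the shape of the fix, assuming the standard i915
      runtime-PM helpers (not the literal patch):

      	/* Hold an RPM wakeref across the PXP HW teardown so the
      	 * register writes in intel_pxp_fini_hw() see the wakelock
      	 * held during suspend. */
      	intel_wakeref_t wakeref;

      	with_intel_runtime_pm(gt->uncore->rpm, wakeref)
      		intel_pxp_fini_hw(pxp);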
      
Changes since V3 :
	- Remove boolean in intel_pxp_runtime_prepare for
	  non-pxp configs. Solves a build error
Changes since V2 :
	- Open-code intel_pxp_runtime_suspend - Daniele
	- Remove boolean in intel_pxp_runtime_prepare - Daniele
      Changes since V1 :
      	- split the HW access parts in gt_suspend_late - Daniele
      	- Remove default PXP configs
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Fixes: 0cfab4cb ("drm/i915/pxp: Enable PXP power management")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211117060321.3729343-1-tejaskumarx.surendrakumar.upadhyay@intel.com
      d22d446f
    • drm/i915/pmu: Increase the live_engine_busy_stats sample period · 5979873e
      Committed by Umesh Nerlige Ramappa
      Irrespective of the backend for request submissions, busyness for an
      engine with an active context is calculated using:
      
      busyness = total + (current_time - context_switch_in_time)
      
      In execlists mode of operation, the context switch events are handled
      by the CPU. Context switch in/out time and current_time are captured
      in CPU time domain using ktime_get().
      
      In GuC mode of submission, context switch events are handled by GuC and
      the times in the above formula are captured in GT clock domain. This
      information is shared with the CPU through shared memory. This results
      in 2 caveats:
      
      1) The time taken between start of a batch and the time that CPU is able
      to see the context_switch_in_time in shared memory is dependent on GuC
      and memory bandwidth constraints.
      
      2) Determining current_time requires an MMIO read that can take anywhere
      between a few us to a couple ms. A reference CPU time is captured soon
      after reading the MMIO so that the caller can compare the cpu delta
      between 2 busyness samples. The issue here is that the CPU delta and the
      busyness delta can be skewed because of the time taken to read the
      register.
      
      These 2 factors affect the accuracy of the selftest -
      live_engine_busy_stats. For (1) the selftest waits until busyness stats
      are visible to the CPU. The effects of (2) are more prominent for the
      current busyness sample period of 100 us. Increase the busyness sample
period from 100 us to 10 ms to overcome (2).
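
      Sketched in C, the sampling the selftest performs looks roughly like
      this (intel_engine_get_busy_time() is assumed here as the busyness
      accessor):

      	/* With a 10 ms window, the few-us-to-ms cost of the MMIO
      	 * read behind each sample is noise rather than a large
      	 * fraction of the sample period. */
      	ktime_t t0, t1, busy0, busy1;

      	busy0 = intel_engine_get_busy_time(engine, &t0);
      	msleep(10); /* 10 ms sample period instead of 100 us */
      	busy1 = intel_engine_get_busy_time(engine, &t1);

      	/* busyness = ktime_sub(busy1, busy0) / ktime_sub(t1, t0) */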
      
      v2: Fix checkpatch issues
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211115221640.30793-1-umesh.nerlige.ramappa@intel.com
      5979873e
  5. 22 Nov 2021, 1 commit
  6. 20 Nov 2021, 4 commits
  7. 17 Nov 2021, 2 commits
  8. 16 Nov 2021, 2 commits
  9. 13 Nov 2021, 1 commit
  10. 12 Nov 2021, 3 commits
  11. 10 Nov 2021, 3 commits
  12. 09 Nov 2021, 1 commit
  13. 05 Nov 2021, 1 commit
  14. 04 Nov 2021, 3 commits
  15. 03 Nov 2021, 1 commit
  16. 01 Nov 2021, 1 commit
  17. 29 Oct 2021, 4 commits
    • drm/i915/gtt: stop caching the scratch page · b0cc4dca
      Committed by Matthew Auld
Normal users shouldn't be hitting this; it would likely indicate a
userspace bug. So don't bother caching, which should be safe now that
we manually flush the page.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-2-matthew.auld@intel.com
      b0cc4dca
    • drm/i915/gtt: flush the scratch page · 2ca77606
      Committed by Matthew Auld
The scratch page is directly visible in the user's address space, and
while the kernel forces it to CACHE_LLC, we still have to contend with
things like "Bypass-LLC" MOCS. So just flush it no matter what.
      
      v2(Thomas):
        - Make sure we use drm_clflush_virt_range here, in case clflush support
          is missing.
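
      Sketch of the idea (scratch_page here stands in for the vm's
      scratch page; not the exact patch):

      	/* Scrub and flush through the CPU cache unconditionally;
      	 * drm_clflush_virt_range() degrades to a wbinvd cross-call
      	 * when CLFLUSH is unavailable, so this is safe everywhere. */
      	void *va = kmap_local_page(scratch_page);

      	memset(va, 0, PAGE_SIZE);
      	drm_clflush_virt_range(va, PAGE_SIZE);
      	kunmap_local(va);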
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211028092638.3142258-1-matthew.auld@intel.com
      2ca77606
    • drm/i915/pmu: Connect engine busyness stats from GuC to pmu · 77cdd054
      Committed by Umesh Nerlige Ramappa
      With GuC handling scheduling, i915 is not aware of the time that a
      context is scheduled in and out of the engine. Since i915 pmu relies on
      this info to provide engine busyness to the user, GuC shares this info
      with i915 for all engines using shared memory. For each engine, this
      info contains:
      
      - total busyness: total time that the context was running (total)
      - id: id of the running context (id)
      - start timestamp: timestamp when the context started running (start)
      
      At the time (now) of sampling the engine busyness, if the id is valid
      (!= ~0), and start is non-zero, then the context is considered to be
      active and the engine busyness is calculated using the below equation
      
      	engine busyness = total + (now - start)
      
      All times are obtained from the gt clock base. For inactive contexts,
      engine busyness is just equal to the total.
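
      As a sketch (the record layout is assumed), a single busyness
      sample in GuC mode reduces to:

      	/* All values are in GT clocks, read from shared memory. */
      	if (rec->id != ~0u && rec->start != 0) /* context active */
      		busy = rec->total + (now - rec->start);
      	else
      		busy = rec->total;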
      
      The start and total values provided by GuC are 32 bits and wrap around
      in a few minutes. Since perf pmu provides busyness as 64 bit
      monotonically increasing values, there is a need for this implementation
      to account for overflows and extend the time to 64 bits before returning
      busyness to the user. In order to do that, a worker runs periodically at
      frequency = 1/8th the time it takes for the timestamp to wrap. As an
      example, that would be once in 27 seconds for a gt clock frequency of
      19.2 MHz.
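
      Sketched with assumed names, deriving the worker period from the
      GT clock frequency looks like:

      	/* A 32-bit timestamp at gt_clk_hz wraps every
      	 * U32_MAX / gt_clk_hz seconds; ping at 1/8th of that so a
      	 * wrap can never be missed. At 19.2 MHz the wrap is ~224 s,
      	 * giving a ping roughly every 27-28 s. */
      	u64 wrap_ns = div_u64(mul_u32_u32(U32_MAX, NSEC_PER_SEC),
      			      gt_clk_hz);
      	unsigned long ping_delay = nsecs_to_jiffies(wrap_ns / 8);

      	mod_delayed_work(system_wq, &guc->timestamp.work, ping_delay);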
      
Note:
There might be over-accounting of busyness because GuC may be updating
the total and start values while the KMD is reading them (i.e. the KMD
may read the updated total and the stale start). In such a case, the
user may see a higher busyness value followed by smaller ones, which
would eventually catch up to the higher value.
      
      v2: (Tvrtko)
      - Include details in commit message
      - Move intel engine busyness function into execlist code
      - Use union inside engine->stats
      - Use natural type for ping delay jiffies
      - Drop active_work condition checks
      - Use for_each_engine if iterating all engines
      - Drop seq locking, use spinlock at GuC level to update engine stats
      - Document worker specific details
      
      v3: (Tvrtko/Umesh)
      - Demarcate GuC and execlist stat objects with comments
      - Document known over-accounting issue in commit
      - Provide a consistent view of GuC state
      - Add hooks to gt park/unpark for GuC busyness
      - Stop/start worker in gt park/unpark path
      - Drop inline
      - Move spinlock and worker inits to GuC initialization
      - Drop helpers that are called only once
      
      v4: (Tvrtko/Matt/Umesh)
      - Drop addressed opens from commit message
      - Get runtime pm in ping, remove from the park path
      - Use cancel_delayed_work_sync in disable_submission path
      - Update stats during reset prepare
      - Skip ping if reset in progress
      - Explicitly name execlists and GuC stats objects
      - Since disable_submission is called from many places, move resetting
        stats to intel_guc_submission_reset_prepare
      
      v5: (Tvrtko)
      - Add a trylock helper that does not sleep and synchronize PMU event
        callbacks and worker with gt reset
      
      v6: (CI BAT failures)
      - DUTs using execlist submission failed to boot since __gt_unpark is
        called during i915 load. This ends up calling the GuC busyness unpark
        hook and results in kick-starting an uninitialized worker. Let
        park/unpark hooks check if GuC submission has been initialized.
      - drop cant_sleep() from trylock helper since rcu_read_lock takes care
        of that.
      
      v7: (CI) Fix igt@i915_selftest@live@gt_engines
      - For GuC mode of submission the engine busyness is derived from gt time
        domain. Use gt time elapsed as reference in the selftest.
      - Increase busyness calculation to 10ms duration to ensure batch runs
        longer and falls within the busyness tolerances in selftest.
      
      v8:
      - Use ktime_get in selftest as before
      - intel_reset_trylock_no_wait results in a lockdep splat that is not
        trivial to fix since the PMU callback runs in irq context and the
        reset paths are tightly knit into the driver. The test that uncovers
        this is igt@perf_pmu@faulting-read. Drop intel_reset_trylock_no_wait,
        instead use the reset_count to synchronize with gt reset during pmu
        callback. For the ping, continue to use intel_reset_trylock since ping
        is not run in irq context.
      
      - GuC PM timestamp does not tick when GuC is idle. This can potentially
        result in wrong busyness values when a context is active on the
        engine, but GuC is idle. Use the RING TIMESTAMP as GPU timestamp to
        process the GuC busyness stats. This works since both GuC timestamp and
        RING timestamp are synced with the same clock.
      
      - The busyness stats may get updated after the batch starts running.
        This delay causes the busyness reported for 100us duration to fall
        below 95% in the selftest. The only option at this time is to wait for
        GuC busyness to change from idle to active before we sample busyness
        over a 100us period.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-2-umesh.nerlige.ramappa@intel.com
      77cdd054
    • drm/i915/pmu: Add a name to the execlists stats · 344e6947
      Committed by Umesh Nerlige Ramappa
      In preparation for GuC pmu stats, add a name to the execlists stats
      structure so that it can be differentiated from the GuC stats.
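
      Conceptually (field names assumed), the execlists fields move
      under an explicitly named member so a GuC variant can later sit
      beside it:

      	struct intel_engine_execlists_stats {
      		unsigned int active;
      		seqcount_t lock;
      		ktime_t total;
      		ktime_t start;
      	};

      	struct intel_engine_cs {
      		/* ... */
      		struct {
      			struct intel_engine_execlists_stats execlists;
      			/* a GuC stats struct is added later */
      		} stats;
      	};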
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211027004821.66097-1-umesh.nerlige.ramappa@intel.com
      344e6947
  18. 27 Oct 2021, 1 commit
    • drm/i915/guc: Fix recursive lock in GuC submission · 9ca8bb7a
      Committed by Matthew Brost
      Use __release_guc_id (lock held) rather than release_guc_id (acquires
      lock), add lockdep annotations.
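
      The shape of the fix, sketched with assumed bodies (the function
      names match the commit text):

      	/* Double-underscore variant: caller holds the lock. */
      	static void __release_guc_id(struct intel_guc *guc,
      				     struct intel_context *ce)
      	{
      		lockdep_assert_held(&guc->submission_state.lock);
      		/* ... return the guc_id to the pool ... */
      	}

      	/* Locking variant: never call with the lock already held. */
      	static void release_guc_id(struct intel_guc *guc,
      				   struct intel_context *ce)
      	{
      		unsigned long flags;

      		spin_lock_irqsave(&guc->submission_state.lock, flags);
      		__release_guc_id(guc, ce);
      		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
      	}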
      
[ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
      [ 213.283459] ============================================
      [ 213.283462] WARNING: possible recursive locking detected
[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
      [ 213.283470] --------------------------------------------
      [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
      [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
[ 213.283618]
 but task is already holding lock:
      [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283720]
 other info that might help us debug this:
[ 213.283724] Possible unsafe locking scenario:
[ 213.283727] CPU0
      [ 213.283728] ----
      [ 213.283730] lock(&guc->submission_state.lock);
      [ 213.283734] lock(&guc->submission_state.lock);
[ 213.283737]
 *** DEADLOCK ***
[ 213.283740] May be due to missing lock nesting notation
[ 213.283744] 3 locks held by kworker/u24:0/8:
      [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283860]
 stack backtrace:
      [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
      [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
      [ 213.283957] Call Trace:
      [ 213.283960] dump_stack_lvl+0x57/0x72
      [ 213.283966] __lock_acquire.cold+0x191/0x2d3
      [ 213.283972] lock_acquire+0xb5/0x2b0
      [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
      [ 213.284139] ? lock_release+0xb9/0x280
      [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
      [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284310] process_one_work+0x270/0x550
      [ 213.284315] worker_thread+0x52/0x3b0
      [ 213.284319] ? process_one_work+0x550/0x550
      [ 213.284322] kthread+0x135/0x160
      [ 213.284326] ? set_kthread_struct+0x40/0x40
      [ 213.284331] ret_from_fork+0x1f/0x30
      
      and a bit later in the trace:
      
[ 227.499864] do_raw_spin_lock+0x94/0xa0
      [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
      [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
      [ 227.500209] ? mark_held_locks+0x49/0x70
      [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
      [ 227.500320] reset_prepare+0x78/0x90 [i915]
      [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
      [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
      [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
      [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
      [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
      [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
      [ 227.500838] local_pci_probe+0x42/0x80
      [ 227.500842] pci_device_probe+0xd9/0x190
      [ 227.500844] really_probe+0x1f2/0x3f0
      [ 227.500847] __driver_probe_device+0xfe/0x180
      [ 227.500848] driver_probe_device+0x1e/0x90
      [ 227.500850] __driver_attach+0xc4/0x1d0
      [ 227.500851] ? __device_attach_driver+0xe0/0xe0
      [ 227.500853] ? __device_attach_driver+0xe0/0xe0
      [ 227.500854] bus_for_each_dev+0x64/0x90
      [ 227.500856] bus_add_driver+0x12e/0x1f0
      [ 227.500857] driver_register+0x8f/0xe0
      [ 227.500859] i915_init+0x1d/0x8f [i915]
      [ 227.500934] ? 0xffffffffc144a000
      [ 227.500936] do_one_initcall+0x58/0x2d0
      [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
      [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
      [ 227.500944] do_init_module+0x5c/0x270
      [ 227.500946] __do_sys_finit_module+0x95/0xe0
      [ 227.500949] do_syscall_64+0x38/0x90
      [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
      [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
      [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
      [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
      [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
      [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
      [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0
      
      v2:
       (CI build)
        - Fix build error
      
      Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
      (cherry picked from commit 12a9917e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      9ca8bb7a
  19. 26 Oct 2021, 2 commits
  20. 23 Oct 2021, 1 commit
    • drm/i915/guc: Fix recursive lock in GuC submission · 12a9917e
      Committed by Matthew Brost
      Use __release_guc_id (lock held) rather than release_guc_id (acquires
      lock), add lockdep annotations.
      
[ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
      [ 213.283459] ============================================
      [ 213.283462] WARNING: possible recursive locking detected
[ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
      [ 213.283470] --------------------------------------------
      [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
      [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
[ 213.283618]
 but task is already holding lock:
      [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283720]
 other info that might help us debug this:
[ 213.283724] Possible unsafe locking scenario:
[ 213.283727] CPU0
      [ 213.283728] ----
      [ 213.283730] lock(&guc->submission_state.lock);
      [ 213.283734] lock(&guc->submission_state.lock);
[ 213.283737]
 *** DEADLOCK ***
[ 213.283740] May be due to missing lock nesting notation
[ 213.283744] 3 locks held by kworker/u24:0/8:
      [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
[ 213.283860]
 stack backtrace:
      [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
      [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
      [ 213.283957] Call Trace:
      [ 213.283960] dump_stack_lvl+0x57/0x72
      [ 213.283966] __lock_acquire.cold+0x191/0x2d3
      [ 213.283972] lock_acquire+0xb5/0x2b0
      [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
      [ 213.284139] ? lock_release+0xb9/0x280
      [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
      [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284310] process_one_work+0x270/0x550
      [ 213.284315] worker_thread+0x52/0x3b0
      [ 213.284319] ? process_one_work+0x550/0x550
      [ 213.284322] kthread+0x135/0x160
      [ 213.284326] ? set_kthread_struct+0x40/0x40
      [ 213.284331] ret_from_fork+0x1f/0x30
      
      and a bit later in the trace:
      
[ 227.499864] do_raw_spin_lock+0x94/0xa0
      [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
      [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
      [ 227.500209] ? mark_held_locks+0x49/0x70
      [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
      [ 227.500320] reset_prepare+0x78/0x90 [i915]
      [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
      [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
      [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
      [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
      [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
      [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
      [ 227.500838] local_pci_probe+0x42/0x80
      [ 227.500842] pci_device_probe+0xd9/0x190
      [ 227.500844] really_probe+0x1f2/0x3f0
      [ 227.500847] __driver_probe_device+0xfe/0x180
      [ 227.500848] driver_probe_device+0x1e/0x90
      [ 227.500850] __driver_attach+0xc4/0x1d0
      [ 227.500851] ? __device_attach_driver+0xe0/0xe0
      [ 227.500853] ? __device_attach_driver+0xe0/0xe0
      [ 227.500854] bus_for_each_dev+0x64/0x90
      [ 227.500856] bus_add_driver+0x12e/0x1f0
      [ 227.500857] driver_register+0x8f/0xe0
      [ 227.500859] i915_init+0x1d/0x8f [i915]
      [ 227.500934] ? 0xffffffffc144a000
      [ 227.500936] do_one_initcall+0x58/0x2d0
      [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
      [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
      [ 227.500944] do_init_module+0x5c/0x270
      [ 227.500946] __do_sys_finit_module+0x95/0xe0
      [ 227.500949] do_syscall_64+0x38/0x90
      [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
      [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
      [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
      [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
      [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
      [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
      [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0
      
      v2:
       (CI build)
        - Fix build error
      
      Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
      12a9917e
  21. 22 Oct 2021, 1 commit
  22. 18 Oct 2021, 1 commit