1. 23 October 2021, 1 commit
    • drm/i915/guc: Fix recursive lock in GuC submission · 12a9917e
      Committed by Matthew Brost
      Use __release_guc_id (caller holds the lock) rather than release_guc_id
      (which acquires the lock itself), and add lockdep annotations. A sketch
      of this lock-held-variant pattern follows this entry.
      
      [ 213.280129] i915: Running i915_perf_live_selftests/live_noa_gpr
      [ 213.283459] ============================================
      [ 213.283462] WARNING: possible recursive locking detected
      [ 213.283466] 5.15.0-rc6+ #18 Tainted: G U W
      [ 213.283470] --------------------------------------------
      [ 213.283472] kworker/u24:0/8 is trying to acquire lock:
      [ 213.283475] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.283618] but task is already holding lock:
      [ 213.283621] ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      [ 213.283720] other info that might help us debug this:
      [ 213.283724] Possible unsafe locking scenario:
      [ 213.283727]        CPU0
      [ 213.283728]        ----
      [ 213.283730]   lock(&guc->submission_state.lock);
      [ 213.283734]   lock(&guc->submission_state.lock);
      [ 213.283737] *** DEADLOCK ***
      [ 213.283740] May be due to missing lock nesting notation
      [ 213.283744] 3 locks held by kworker/u24:0/8:
      [ 213.283747] #0: ffff8ffb80059d38 ((wq_completion)events_unbound){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283757] #1: ffffb509000e3e78 ((work_completion)(&guc->submission_state.destroyed_worker)){..}-{0:0}, at: process_one_work+0x1f3/0x550
      [ 213.283766] #2: ffff8ffc4f6cc1e8 (&guc->submission_state.lock){....}-{2:2}, at: destroyed_worker_func+0x4f/0x350 [i915]
      [ 213.283860] stack backtrace:
      [ 213.283863] CPU: 8 PID: 8 Comm: kworker/u24:0 Tainted: G U W 5.15.0-rc6+ #18
      [ 213.283868] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021
      [ 213.283873] Workqueue: events_unbound destroyed_worker_func [i915]
      [ 213.283957] Call Trace:
      [ 213.283960] dump_stack_lvl+0x57/0x72
      [ 213.283966] __lock_acquire.cold+0x191/0x2d3
      [ 213.283972] lock_acquire+0xb5/0x2b0
      [ 213.283978] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284059] ? destroyed_worker_func+0x2d7/0x350 [i915]
      [ 213.284139] ? lock_release+0xb9/0x280
      [ 213.284143] _raw_spin_lock_irqsave+0x48/0x60
      [ 213.284148] ? destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284226] destroyed_worker_func+0x2df/0x350 [i915]
      [ 213.284310] process_one_work+0x270/0x550
      [ 213.284315] worker_thread+0x52/0x3b0
      [ 213.284319] ? process_one_work+0x550/0x550
      [ 213.284322] kthread+0x135/0x160
      [ 213.284326] ? set_kthread_struct+0x40/0x40
      [ 213.284331] ret_from_fork+0x1f/0x30
      
      and a bit later in the trace:
      
      [ 227.499864] do_raw_spin_lock+0x94/0xa0
      [ 227.499868] _raw_spin_lock_irqsave+0x50/0x60
      [ 227.499871] ? guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.499995] guc_flush_destroyed_contexts+0x4f/0xf0 [i915]
      [ 227.500104] intel_guc_submission_reset_prepare+0x99/0x4b0 [i915]
      [ 227.500209] ? mark_held_locks+0x49/0x70
      [ 227.500212] intel_uc_reset_prepare+0x46/0x50 [i915]
      [ 227.500320] reset_prepare+0x78/0x90 [i915]
      [ 227.500412] __intel_gt_set_wedged.part.0+0x13/0xe0 [i915]
      [ 227.500485] intel_gt_set_wedged.part.0+0x54/0x100 [i915]
      [ 227.500556] intel_gt_set_wedged_on_fini+0x1a/0x30 [i915]
      [ 227.500622] intel_gt_driver_unregister+0x1e/0x60 [i915]
      [ 227.500694] i915_driver_remove+0x4a/0xf0 [i915]
      [ 227.500767] i915_pci_probe+0x84/0x170 [i915]
      [ 227.500838] local_pci_probe+0x42/0x80
      [ 227.500842] pci_device_probe+0xd9/0x190
      [ 227.500844] really_probe+0x1f2/0x3f0
      [ 227.500847] __driver_probe_device+0xfe/0x180
      [ 227.500848] driver_probe_device+0x1e/0x90
      [ 227.500850] __driver_attach+0xc4/0x1d0
      [ 227.500851] ? __device_attach_driver+0xe0/0xe0
      [ 227.500853] ? __device_attach_driver+0xe0/0xe0
      [ 227.500854] bus_for_each_dev+0x64/0x90
      [ 227.500856] bus_add_driver+0x12e/0x1f0
      [ 227.500857] driver_register+0x8f/0xe0
      [ 227.500859] i915_init+0x1d/0x8f [i915]
      [ 227.500934] ? 0xffffffffc144a000
      [ 227.500936] do_one_initcall+0x58/0x2d0
      [ 227.500938] ? rcu_read_lock_sched_held+0x3f/0x80
      [ 227.500940] ? kmem_cache_alloc_trace+0x238/0x2d0
      [ 227.500944] do_init_module+0x5c/0x270
      [ 227.500946] __do_sys_finit_module+0x95/0xe0
      [ 227.500949] do_syscall_64+0x38/0x90
      [ 227.500951] entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 227.500953] RIP: 0033:0x7ffa59d2ae0d
      [ 227.500954] Code: c8 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 80 0c 00 f7 d8 64 89 01 48
      [ 227.500955] RSP: 002b:00007fff320bbf48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [ 227.500956] RAX: ffffffffffffffda RBX: 00000000022ea710 RCX: 00007ffa59d2ae0d
      [ 227.500957] RDX: 0000000000000000 RSI: 00000000022e1d90 RDI: 0000000000000004
      [ 227.500958] RBP: 0000000000000020 R08: 00007ffa59df3a60 R09: 0000000000000070
      [ 227.500958] R10: 00000000022e1d90 R11: 0000000000000246 R12: 00000000022e1d90
      [ 227.500959] R13: 00000000022e58e0 R14: 0000000000000043 R15: 00000000022e42c0
      
      v2:
       (CI build)
        - Fix build error
      
      Fixes: 1a52faed ("drm/i915/guc: Take GT PM ref when deregistering context")
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211020192147.8048-1-matthew.brost@intel.com
      12a9917e
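      A minimal sketch of the locking pattern behind this fix, under the
      assumption of simplified structure, field and helper names (only the
      __release_guc_id()/release_guc_id() naming convention and the
      destroyed-contexts worker come from the commit message); this is not
      the driver's actual code:

      /*
       * Hedged sketch, not the real i915 code. The double-underscore variant
       * documents (and asserts via lockdep) that the caller already holds
       * guc->submission_state.lock, so a worker that walks the
       * destroyed-contexts list under that lock can release guc_ids without
       * taking the lock a second time.
       */
      #include <linux/list.h>
      #include <linux/lockdep.h>
      #include <linux/spinlock.h>
      #include <linux/types.h>

      struct guc_submission_state {
              spinlock_t lock;                /* guards guc_id space and the list below */
              struct list_head destroyed_contexts;
      };

      struct guc_ctx {                        /* placeholder for the context struct */
              struct list_head destroyed_link;
              u16 guc_id;
      };

      /* Lock-held variant: caller must already hold state->lock. */
      static void __release_guc_id(struct guc_submission_state *state,
                                   struct guc_ctx *ce)
      {
              lockdep_assert_held(&state->lock);
              /* ... hand ce->guc_id back to the allocator, no locking here ... */
      }

      /* Locking wrapper for callers that do not hold the lock. */
      static void release_guc_id(struct guc_submission_state *state,
                                 struct guc_ctx *ce)
      {
              unsigned long flags;

              spin_lock_irqsave(&state->lock, flags);
              __release_guc_id(state, ce);
              spin_unlock_irqrestore(&state->lock, flags);
      }

      /*
       * Worker walking the destroyed list with the lock held: it must call
       * the lock-held variant. Calling release_guc_id() here would acquire
       * state->lock a second time on the same CPU, which is the recursion
       * lockdep reports in the splat above.
       */
      static void flush_destroyed_contexts(struct guc_submission_state *state)
      {
              struct guc_ctx *ce, *cn;
              unsigned long flags;

              spin_lock_irqsave(&state->lock, flags);
              list_for_each_entry_safe(ce, cn, &state->destroyed_contexts,
                                       destroyed_link) {
                      list_del_init(&ce->destroyed_link);
                      __release_guc_id(state, ce);    /* not release_guc_id() */
              }
              spin_unlock_irqrestore(&state->lock, flags);
      }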
  2. 16 October 2021, 17 commits
  3. 02 October 2021, 4 commits
  4. 24 September 2021, 2 commits
    • drm/i915: Reduce the number of objects subject to memcpy recover · a259cc14
      Committed by Thomas Hellström
      We really only need memcpy restore for objects that affect the
      operability of the migrate context. That is, primarily the page-table
      objects of the migrate VM.
      
      Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
      restores using memcpy and a way to assign LMEM page-table object flags
      to be used by the vms.
      
      Restore objects without this flag with the gpu blitter, and only objects
      carrying the flag using TTM memcpy (see the sketch after this entry).
      
      Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
      defer to a later audit the question of which vms actually need it. Most
      importantly, user-allocated vms with pinned page-table objects can be
      restored using the blitter.
      
      Performance-wise, memcpy restore is probably as fast as gpu restore if
      not faster, but using gpu restore will help tackle future restrictions
      in mappable LMEM size.
      
      v4:
      - Don't mark the aliasing ppgtt page table flags for early resume, but
        rather the ggtt page table flags as intended. (Matthew Auld)
      - The check for user buffer objects during early resume is pointless, since
        they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld)
      v5:
      - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored
        before we fire up the migrate context.
      
      Cc: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
      a259cc14
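      As an illustration only, a sketch of the restore split this commit
      describes. Only the flag name I915_BO_ALLOC_PM_EARLY matches the commit;
      the flag's bit position and the two restore helpers below are
      hypothetical stand-ins:

      /*
       * Hedged sketch (not the driver's code) of splitting resume into an
       * early memcpy phase and a later blitter phase based on a per-object
       * allocation flag.
       */
      #include <linux/bits.h>
      #include <linux/types.h>

      #define I915_BO_ALLOC_PM_EARLY  BIT(3)  /* bit position assumed, for illustration */

      struct lmem_obj {
              unsigned int flags;
              /* ... backing store, backup copy, etc. ... */
      };

      static void ttm_memcpy_restore(struct lmem_obj *bo) { /* hypothetical */ }
      static void blitter_restore(struct lmem_obj *bo)    { /* hypothetical */ }

      /*
       * Phase 1 (early_phase == true) runs before the migrate context is
       * usable, so it may only touch the few objects the migrate context
       * itself depends on (its VM's page tables, GuC objects, ...).
       * Everything else waits for phase 2 and the gpu blitter, which copes
       * better with limited mappable LMEM.
       */
      static void restore_lmem_objects(struct lmem_obj **objs, int count,
                                       bool early_phase)
      {
              int i;

              for (i = 0; i < count; i++) {
                      struct lmem_obj *bo = objs[i];
                      bool early = bo->flags & I915_BO_ALLOC_PM_EARLY;

                      if (early_phase && early)
                              ttm_memcpy_restore(bo);
                      else if (!early_phase && !early)
                              blitter_restore(bo);
              }
      }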
    • drm/i915/gt: Register the migrate contexts with their engines · 3e42cc61
      Committed by Thomas Hellström
      Pinned contexts, like the migrate contexts, need to be reset after
      resume since their context images may have been lost. The GuC also
      needs to register pinned contexts.
      
      Add a list to struct intel_engine_cs to which all pinned contexts are
      added on creation, and traverse that list at resume time to reset the
      pinned contexts (see the sketch after this entry).
      
      This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now,
      but proper LMEM backup / restore is needed for full suspend functionality.
      However, note that even with full LMEM backup / restore it may be
      desirable to keep the reset since backing up the migrate context images
      must happen using memcpy() after the migrate context has become inactive,
      and for performance- and other reasons we want to avoid memcpy() from
      LMEM.
      
      Also traverse the list in guc_init_lrc_mapping(), calling
      guc_kernel_context_pin() for the pinned contexts, as is already done
      for the kernel context.
      
      v2:
      - Don't reset the contexts on each __engine_unpark() but rather at
        resume time (Chris Wilson).
      v3:
      - Reset contexts in the engine sanitize callback. (Chris Wilson)
      
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Brost Matthew <matthew.brost@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-6-thomas.hellstrom@linux.intel.com
      3e42cc61
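      A minimal sketch of the per-engine list bookkeeping described above;
      the structure names stand in for intel_engine_cs and the context
      struct, and the reset helper is a placeholder:

      /*
       * Hedged sketch (not the driver's code): keep every pinned context on
       * a per-engine list so the engine sanitize/resume path can walk the
       * list and reset each context image that may have been lost over
       * suspend.
       */
      #include <linux/list.h>

      struct engine {                         /* stand-in for intel_engine_cs */
              struct list_head pinned_contexts;
      };

      struct pinned_ctx {                     /* stand-in for the context struct */
              struct list_head pinned_contexts_link;
              struct engine *engine;
      };

      static void reset_context_image(struct pinned_ctx *ce) { /* placeholder */ }

      /* Called once when a pinned context (kernel, migrate, ...) is created. */
      static void engine_add_pinned_context(struct pinned_ctx *ce)
      {
              list_add_tail(&ce->pinned_contexts_link,
                            &ce->engine->pinned_contexts);
      }

      /*
       * Called from the engine sanitize callback at resume. The same
       * traversal can also be used to re-register the pinned contexts with
       * the GuC, as the commit does from guc_init_lrc_mapping().
       */
      static void engine_reset_pinned_contexts(struct engine *engine)
      {
              struct pinned_ctx *ce;

              list_for_each_entry(ce, &engine->pinned_contexts,
                                  pinned_contexts_link)
                      reset_context_image(ce);
      }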
  5. 23 September 2021, 1 commit
    • drm/i915/guc, docs: Fix pdfdocs build error by removing nested grid · 017792a0
      Committed by Akira Yokosawa
      Nested grids in grid-table cells are not specified as proper ReST
      constructs.
      Commit 572f2a5c ("drm/i915/guc: Update firmware to v62.0.0")
      added a couple of kerneldoc tables of the form:
      
        +---+-------+------------------------------------------------------+
        | 1 |  31:0 |  +------------------------------------------------+  |
        +---+-------+  |                                                |  |
        |...|       |  |  Embedded `HXG Message`_                       |  |
        +---+-------+  |                                                |  |
        | n |  31:0 |  +------------------------------------------------+  |
        +---+-------+------------------------------------------------------+
      
      For "make htmldocs", they happen to work as one might expect,
      but they are incompatible with "make latexdocs" and "make pdfdocs",
      and cause the generated gpu.tex file to become incomplete and
      unbuildable by xelatex.
      
      Restore the compatibility by removing those nested grids in the tables.
      
      Size comparison of generated gpu.tex:
      
                        Sphinx 2.4.4  Sphinx 4.2.0
        v5.14:               3238686       3841631
        v5.15-rc1:            376270        432729
        with this fix:       3377846       3998095
      
      Fixes: 572f2a5c ("drm/i915/guc: Update firmware to v62.0.0")
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/4a227569-074f-c501-58bb-d0d8f60a8ae9@gmail.com
      017792a0
  6. 21 September 2021, 4 commits
  7. 19 September 2021, 1 commit
  8. 14 September 2021, 10 commits