1. 27 10月, 2022 1 次提交
    • M
      drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis · 83321094
      Matthew Brost 提交于
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
      
      Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headrooom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      
      Design awareness: Selftest impact.
      As temporary WA disable this feature for the selftests. Selftests are
      very timing sensitive and any change in timing can cause failure. A
      follow up patch will fixup the selftests to understand this delay.
      
      Design awareness: Race between guc_request_alloc and guc_context_close.
      If a context close is issued while there is a request submission in
      flight and a delayed schedule disable is pending, guc_context_close
      and guc_request_alloc will race to cancel the delayed disable.
      To close the race, make sure that guc_request_alloc waits for
      guc_context_close to finish running before checking any state.
      
      Design awareness: GT Reset event.
      If a gt reset is triggered, as preparation steps, add an additional step
      to ensure all contexts that have a pending delay-disable-schedule task
      be flushed of it. Move them directly into the closed state after cancelling
      the worker. This is okay because the existing flow flushes all
      yet-to-arrive G2H's dropping them anyway.
      Signed-off-by: NMatthew Brost <matthew.brost@intel.com>
      Signed-off-by: NAlan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
      83321094
  2. 05 10月, 2022 1 次提交
  3. 19 9月, 2022 1 次提交
    • C
      drm/i915/gem: Really move i915_gem_context.link under ref protection · ad3aa7c3
      Chris Wilson 提交于
      i915_perf assumes that it can use the i915_gem_context reference to
      protect its i915->gem.contexts.list iteration. However, this requires
      that we do not remove the context from the list until after we drop the
      final reference and release the struct. If, as currently, we remove the
      context from the list during context_close(), the link.next pointer may
      be poisoned while we are holding the context reference and cause a GPF:
      
      [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x1fffff ctx_id_mask=0x1fffff
      [ 4070.574881] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP
      [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G            E     5.17.9 #180
      [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
      [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
      [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
      [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202
      [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
      [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68
      [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc
      [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860
      [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc
      [ 4070.575016] FS:  00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
      [ 4070.575021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0
      [ 4070.575029] Call Trace:
      [ 4070.575033]  <TASK>
      [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
      [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
      [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
      [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
      [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
      [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
      [ 4070.575302]  drm_ioctl+0x1d3/0x370
      [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
      [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
      [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
      [ 4070.575456]  do_syscall_64+0x35/0x80
      [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 4070.575467] RIP: 0033:0x7f1ed5c10397
      [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
      [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397
      [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006
      [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005
      [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a
      [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0
      [ 4070.575505]  </TASK>
      [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
      [ 4070.575549] ---[ end trace 0000000000000000 ]---
      
      v3: fix incorrect syntax of spin_lock() replacing spin_lock_irqsave()
      
      v2: irqsave not required in a worker, neither conversion to irq safe
          elsewhere (Tvrtko),
        - perf: it's safe to call gen8_configure_context() even if context has
          been closed, no need to check,
        - drop unrelated cleanup (Andi, Tvrtko)
      Reported-by: NMark Janes <mark.janes@intel.com>
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
      References: a4e7ccda ("drm/i915: Move context management under GEM")
      Fixes: f8246cf4 ("drm/i915/gem: Drop free_work for GEM contexts")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: NJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v5.12+
      Signed-off-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-3-janusz.krzysztofik@linux.intel.com
      ad3aa7c3
  4. 21 8月, 2022 1 次提交
  5. 19 8月, 2022 1 次提交
    • M
      drm/i915/guc: Add delay to disable scheduling after pin count goes to zero · 6a079903
      Matthew Brost 提交于
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
      
      As temporary WA disable this feature for the selftests. Selftests are
      very timing sensitive and any change in timing can cause failure. A
      follow up patch will fixup the selftests to understand this delay.
      
      Alan Previn: Matt Brost first introduced this series back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headrooom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      Signed-off-by: NMatthew Brost <matthew.brost@intel.com>
      Signed-off-by: NAlan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
      6a079903
  6. 27 6月, 2022 1 次提交
  7. 22 6月, 2022 1 次提交
  8. 17 6月, 2022 1 次提交
    • T
      drm/i915: Improve user experience and driver robustness under SIGINT or similar · 45c64ecf
      Tvrtko Ursulin 提交于
      We have long standing customer complaints that pressing Ctrl-C (or to the
      effect of) causes engine resets with otherwise well behaving programs.
      
      Not only is logging engine resets during normal operation not desirable
      since it creates support incidents, but more fundamentally we should avoid
      going the engine reset path when we can since any engine reset introduces
      a chance of harming an innocent context.
      
      Reason for this undesirable behaviour is that the driver currently does
      not distinguish between banned contexts and non-persistent contexts which
      have been closed.
      
      To fix this we add the distinction between the two reasons for revoking
      contexts, which then allows the strict timeout only be applied to banned,
      while innocent contexts (well behaving) can preempt cleanly and exit
      without triggering the engine reset path.
      
      Note that the added context exiting category applies both to closed non-
      persistent context, and any exiting context when hangcheck has been
      disabled by the user.
      
      At the same time we rename the backend operation from 'ban' to 'revoke'
      which more accurately describes the actual semantics. (There is no ban at
      the backend level since banning is a concept driven by the scheduling
      frontend. Backends are simply able to revoke a running context so that
      is the more appropriate name chosen.)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NAndrzej Hajda <andrzej.hajda@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
      45c64ecf
  9. 02 6月, 2022 1 次提交
    • M
      drm/i915/sseu: Disassociate internal subslice mask representation from uapi · b87d3901
      Matt Roper 提交于
      As with EU masks, it's easier to store subslice/DSS masks internally in
      a format that's more natural for the driver to work with, and then only
      covert into the u8[] uapi form when the query ioctl is invoked.  Since
      the hardware design changed significantly with Xe_HP, we'll use a union
      to choose between the old "hsw-style" subslice masks or the newer xehp
      mask.  HSW-style masks will be stored in an array of u8's, indexed by
      slice (there's never more than 6 subslices per slice on older
      platforms).  For Xe_HP and beyond where slices no longer exist, we only
      need a single bitmask.  However we already know that this mask is
      eventually going to grow too large for a simple u64 to hold, so we'll
      represent it in a manner that can be operated on by the utilities in
      linux/bitmap.h.
      
      v2:
       - Fix typo: BIT(s) -> BIT(ss) in gen9_sseu_device_status()
      
      v3:
       - Eliminate sseu->ss_stride and just calculate the stride while
         specifically handling uapi.  (Tvrtko)
       - Use BITMAP_BITS() macro to refer to size of masks rather than
         passing I915_MAX_SS_FUSE_BITS directly.  (Tvrtko)
       - Report compute/geometry DSS masks separately when dumping Xe_HP SSEU
         info.  (Tvrtko)
       - Restore dropped range checks to intel_sseu_has_subslice().  (Tvrtko)
      
      v4:
       - Make the bitmap size macro check the size of the .xehp field rather
         than the containing union.  (Tvrtko)
       - Don't add GEM_BUG_ON() intel_sseu_has_subslice()'s check for whether
         slice or subslice ID exceed sseu->max_[sub]slices; various loops
         in the driver are expected to exceed these, so we should just
         silently return 'false.'
      
      v5:
       - Move XEHP_BITMAP_BITS() to the header so that we can also replace a
         usage of I915_MAX_SS_FUSE_BITS in one of the inline functions.
         (Bala)
       - Change the local variable in intel_slicemask_from_xehp_dssmask() from
         u16 to 'unsigned long' to make it a bit more future-proof.
      
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
      Signed-off-by: NMatt Roper <matthew.d.roper@intel.com>
      Acked-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NBalasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220601150725.521468-6-matthew.d.roper@intel.com
      b87d3901
  10. 05 4月, 2022 3 次提交
  11. 07 3月, 2022 1 次提交
  12. 04 3月, 2022 1 次提交
  13. 02 3月, 2022 1 次提交
  14. 14 2月, 2022 3 次提交
  15. 27 12月, 2021 1 次提交
  16. 24 12月, 2021 2 次提交
  17. 18 12月, 2021 1 次提交
  18. 01 12月, 2021 1 次提交
  19. 18 10月, 2021 1 次提交
  20. 16 10月, 2021 2 次提交
  21. 13 10月, 2021 1 次提交
  22. 08 10月, 2021 1 次提交
    • L
      drm/i915: remove IS_ACTIVE · 1a839e01
      Lucas De Marchi 提交于
      When trying to bring IS_ACTIVE to linux/kconfig.h I thought it wouldn't
      provide much value just encapsulating it in a boolean context. So I also
      added the support for handling undefined macros as the IS_ENABLED()
      counterpart. However the feedback received from Masahiro Yamada was that
      it is too ugly, not providing much value. And just wrapping in a boolean
      context is too dumb - we could simply open code it.
      
      As detailed in commit babaab2f ("drm/i915: Encapsulate kconfig
      constant values inside boolean predicates"), the IS_ACTIVE macro was
      added to workaround a compilation warning. However after checking again
      our current uses of IS_ACTIVE it turned out there is only
      1 case in which it triggers a warning in clang (due
      -Wconstant-logical-operand) and 2 in smatch. All the others
      can simply use the shorter version, without wrapping it in any macro.
      
      So here I'm dialing all the way back to simply removing the macro. That
      single case hit by clang can be changed to make the constant come first,
      so it doesn't think it's mask:
      
      	-       if (context && CONFIG_DRM_I915_FENCE_TIMEOUT)
      	+       if (CONFIG_DRM_I915_FENCE_TIMEOUT && context)
      
      As talked with Dan Carpenter, that logic will be added in smatch as
      well, so it will also stop warning about it.
      Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: NJani Nikula <jani.nikula@intel.com>
      Reviewed-by: NMasahiro Yamada <masahiroy@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211005171728.3147094-1-lucas.demarchi@intel.com
      1a839e01
  23. 05 10月, 2021 3 次提交
  24. 24 9月, 2021 1 次提交
    • T
      drm/i915: Reduce the number of objects subject to memcpy recover · a259cc14
      Thomas Hellström 提交于
      We really only need memcpy restore for objects that affect the
      operability of the migrate context. That is, primarily the page-table
      objects of the migrate VM.
      
      Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
      restores using memcpy and a way to assign LMEM page-table object flags
      to be used by the vms.
      
      Restore objects without this flag with the gpu blitter and only objects
      carrying the flag using TTM memcpy.
      
      Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
      defer for a later audit which vms actually need it. Most importantly, user-
      allocated vms with pinned page-table objects can be restored using the
      blitter.
      
      Performance-wise memcpy restore is probably as fast as gpu restore if not
      faster, but using gpu restore will help tackling future restrictions in
      mappable LMEM size.
      
      v4:
      - Don't mark the aliasing ppgtt page table flags for early resume, but
        rather the ggtt page table flags as intended. (Matthew Auld)
      - The check for user buffer objects during early resume is pointless, since
        they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld)
      v5:
      - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored
        before we fire up the migrate context.
      
      Cc: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: NThomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
      a259cc14
  25. 14 9月, 2021 1 次提交
    • D
      drm/i915: Release ctx->syncobj on final put, not on ctx close · 03153666
      Daniel Vetter 提交于
      gem context refcounting is another exercise in least locking design it
      seems, where most things get destroyed upon context closure (which can
      race with anything really). Only the actual memory allocation and the
      locks survive while holding a reference.
      
      This tripped up Jason when reimplementing the single timeline feature
      in
      
      commit 00dae4d3
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:12 2021 -0500
      
          drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
      
      We could fix the bug by holding ctx->mutex in execbuf and clear the
      pointer (again while holding the mutex) context_close, but it's
      cleaner to just make the context object actually invariant over its
      _entire_ lifetime. This way any other ioctl that's potentially racing,
      but holding a full reference, can still rely on ctx->syncobj being
      an immutable pointer. Which without this change, is not the case.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Fixes: 00dae4d3 ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
      (cherry picked from commit c238980e)
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      03153666
  26. 06 9月, 2021 7 次提交
    • D
      drm/i915: Drop __rcu from gem_context->vm · 9ec8795e
      Daniel Vetter 提交于
      It's been invariant since
      
          commit ccbc1b97
          Author: Jason Ekstrand <jason@jlekstrand.net>
          Date:   Thu Jul 8 10:48:30 2021 -0500
      
              drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      this just completes the deed. I've tried to split out prep work for
      more careful review as much as possible, this is what's left:
      
      - get_ppgtt gets simplified since we don't need to grab a temporary
        reference - we can rely on the temporary reference for the gem_ctx
        while we inspect the vm. The new vm_id still needs a full
        i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu
      
      - A pile of selftests can now just look at ctx->vm instead of
        rcu_dereference_protected( , true) or similar things.
      
      - All callers of i915_gem_context_vm also disappear.
      
      - I've changed the hugepage selftest to set scrub_64K without any
        locking, because when we inspect that setting we're also not taking
        any locks either. It works because it's a selftests that's careful
        (single threaded gives you nice ordering) and not a live driver
        where races can happen from anywhere.
      
      These can only be split up further if we have some intermediate state
      with a bunch more rcu_dereference_protected(ctx->vm, true), just to
      shut up lockdep and sparse.
      
      The conversion to __rcu happened in
      
      commit a4e7ccda
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:40:09 2019 +0100
      
          drm/i915: Move context management under GEM
      
      Note that we're not breaking the actual bugfix in there: The real
      bugfix is pushing the i915_vm_relase onto a separate worker, to avoid
      locking inversion issues. The rcu conversion was just thrown in for
      entertainment value on top (no vm lookup isn't even close to anything
      that's a hotpath where removing the single spinlock can be measured).
      
      v2: Rebase over the change to move the i915_vm_put() into
      i915_gem_context_release().
      
      v3: Trivial conflict against repainted shed.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch
      9ec8795e
    • D
      drm/i915: Use i915_gem_context_get_eb_vm in intel_context_set_gem · 0483a301
      Daniel Vetter 提交于
      Since
      
      commit ccbc1b97
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:30 2021 -0500
      
          drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      the gem_ctx->vm can't change anymore. Plus we always set the
      intel_context->vm, so might as well use the helper we have for that.
      
      This makes it very clear that we always overwrite intel_context->vm
      for userspace contexts, since the default is gt->vm, which is
      explicitly reserved for kernel context use. It would be good to split
      things up a bit further and avoid any possibility for an accident
      where we run kernel stuff in userspace vm or the other way round.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-8-daniel.vetter@ffwll.ch
      0483a301
    • D
      drm/i915: Add i915_gem_context_is_full_ppgtt · a82a9979
      Daniel Vetter 提交于
      And use it anywhere we have open-coded checks for ctx->vm that really
      only check for full ppgtt.
      
      Plus for paranoia add a GEM_BUG_ON that checks it's really only set
      when we have full ppgtt, just in case. gem_context->vm is different
      since it's NULL in ggtt mode, unlike intel_context->vm or gt->vm,
      which is always set.
      
      v2: 0day found a testcase that I missed.
      
      v3: Repaint shed (Jon, Tvrtko)
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-7-daniel.vetter@ffwll.ch
      a82a9979
    • D
      drm/i915: Use i915_gem_context_get_eb_vm in ctx_getparam · 24fad29e
      Daniel Vetter 提交于
      Consolidates the "which is the vm my execbuf runs in" code a bit. We
      do some get/put which isn't really required, but all the other users
      want the refcounting, and I figured doing a function just for this
      getparam to avoid 2 atomis is a bit much.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-6-daniel.vetter@ffwll.ch
      24fad29e
    • D
      drm/i915: Keep gem ctx->vm alive until the final put · 8cf97637
      Daniel Vetter 提交于
      The comment added in
      
          commit b81dde71
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Tue May 21 22:11:29 2019 +0100
      
              drm/i915: Allow userspace to clone contexts on creation
      
      and moved in
      
          commit 27dbae8f
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Wed Nov 6 09:13:12 2019 +0000
      
              drm/i915/gem: Safely acquire the ctx->vm when copying
      
      suggested that i915_address_space were at least intended to be managed
      through SLAB_TYPESAFE_BY_RCU:
      
                      * This ppgtt may have be reallocated between
                      * the read and the kref, and reassigned to a third
                      * context. In order to avoid inadvertent sharing
                      * of this ppgtt with that third context (and not
                      * src), we have to confirm that we have the same
                      * ppgtt after passing through the strong memory
                      * barrier implied by a successful
                      * kref_get_unless_zero().
      
      But extensive git history search has not brough any such reuse to
      light.
      
      What has come to light though is that ever since
      
      commit 2850748e
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:39:58 2019 +0100
      
          drm/i915: Pull i915_vma_pin under the vm->mutex
      
      (yes this commit is earlier) the final i915_vma_put call has been
      moved from i915_gem_context_free (now called _release) to
      context_close, which means it's not actually safe anymore to access
      the ctx->vm pointer without lock helds, because it might disappear at
      any moment. Note that superficially things all still work, because the
      i915_address_space is RCU protected since
      
          commit b32fa811
          Author: Chris Wilson <chris@chris-wilson.co.uk>
          Date:   Thu Jun 20 19:37:05 2019 +0100
      
              drm/i915/gtt: Defer address space cleanup to an RCU worker
      
      except the very clever macro above (which is designed to protected
      against object reuse due to SLAB_TYPESAFE_BY_RCU or similar tricks)
      results in an endless loop if the refcount of the ctx->vm ever
      permanently drops to 0. Which it totally now can.
      
      Fix that by moving the final i915_vm_put to where it should be.
      
      Note that i915_gem_context is rcu protected, but _only_ the final
      kfree. This means anyone who chases a pointer to a gem ctx solely
      under the protection can pretty only call kref_get_unless_zero(). This
      seems to be pretty much the case, aside from a bunch of cases that
      consult the scheduling information without any further protection.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex")
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-3-daniel.vetter@ffwll.ch
      8cf97637
    • D
      drm/i915: Release ctx->syncobj on final put, not on ctx close · c238980e
      Daniel Vetter 提交于
      gem context refcounting is another exercise in least locking design it
      seems, where most things get destroyed upon context closure (which can
      race with anything really). Only the actual memory allocation and the
      locks survive while holding a reference.
      
      This tripped up Jason when reimplementing the single timeline feature
      in
      
      commit 00dae4d3
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:12 2021 -0500
      
          drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
      
      We could fix the bug by holding ctx->mutex in execbuf and clear the
      pointer (again while holding the mutex) context_close, but it's
      cleaner to just make the context object actually invariant over its
      _entire_ lifetime. This way any other ioctl that's potentially racing,
      but holding a full reference, can still rely on ctx->syncobj being
      an immutable pointer. Which without this change, is not the case.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Fixes: 00dae4d3 ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
      c238980e
    • D
      drm/i915: Release i915_gem_context from a worker · 75eefd82
      Daniel Vetter 提交于
      The only reason for this really is the i915_gem_engines->fence
      callback engines_notify(), which exists purely as a fairly funky
      reference counting scheme for that. Otherwise all other callers are
      from process context, and generally fairly benign locking context.
      
      Unfortunately untangling that requires some major surgery, and we have
      a few i915_gem_context reference counting bugs that need fixing, and
      they blow in the current hardirq calling context, so we need a
      stop-gap measure.
      
      Put a FIXME comment in when this should be removable again.
      
      v2: Fix mock_context(), noticed by intel-gfx-ci.
      Acked-by: NAcked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-1-daniel.vetter@ffwll.ch
      75eefd82