1. 10 12月, 2022 1 次提交
    • A
      drm/i915/pxp: Promote pxp subsystem to top-level of i915 · f67986b0
      Alan Previn 提交于
      Starting with MTL, there will be two GT-tiles, a render and media
      tile. PXP as a service for supporting workloads with protected
      contexts and protected buffers can be subscribed by process
      workloads on any tile. However, depending on the platform,
      only one of the tiles is used for control events pertaining to PXP
      operation (such as creating the arbitration session and session
      tear-down).
      
      PXP as a global feature is accessible via batch buffer instructions
      on any engine/tile and the coherency across tiles is handled implicitly
      by the HW. In fact, for the foreseeable future, we are expecting this
      single-control-tile for the PXP subsystem.
      
      In MTL, it's the standalone media tile (not the root tile) because
      it contains the VDBOX and KCR engine (among the assets PXP relies on
      for those events).
      
      Looking at the current code design, each tile is represented by the
      intel_gt structure while the intel_pxp structure currently hangs off the
      intel_gt structure.
      
      Keeping the intel_pxp structure within the intel_gt structure makes some
      internal functionalities more straight forward but adds code complexity to
      code readability and maintainibility to many external-to-pxp subsystems
      which may need to pick the correct intel_gt structure. An example of this
      would be the intel_pxp_is_active or intel_pxp_is_enabled functionality
      which should be viewed as a global level inquiry, not a per-gt inquiry.
      
      That said, this series promotes the intel_pxp structure into the
      drm_i915_private structure making it a top-level subsystem and the PXP
      subsystem will select the control gt internally and keep a pointer to
      it for internal reference.
      
      This promotion comes with two noteworthy changes:
      
      1. Exported pxp functions that are called by external subsystems
         (such as intel_pxp_enabled/active) will have to check implicitly
         if i915->pxp is valid as that structure will not be allocated
         for HW that doesn't support PXP.
      
      2. Since GT is now considered a soft-dependency of PXP we are
         ensuring that GT init happens before PXP init and vice versa
         for fini. This causes a minor ordering change whereby we previously
         called intel_pxp_suspend after intel_uc_suspend but now is before
         i915_gem_suspend_late but the change is required for correct
         dependency flows. Additionally, this re-order change doesn't
         have any impact because at that point in either case, the top level
         entry to i915 won't observe any PXP events (since the GPU was
         quiesced during suspend_prepare). Also, any PXP event doesn't
         really matter when we disable the PXP HW (global GT irqs are
         already off anyway, so even if there was a bug that generated
         spurious events we wouldn't see it and we would just clean it
         up on resume which is okay since the default fallback action
         for PXP would be to keep the sessions off at this suspend stage).
      
      Changes from prior revs:
        v11: - Reformat a comment (Tvrtko).
        v10: - Change the code flow for intel_pxp_init to make it more
               cleaner and readible with better comments explaining the
               difference between full-PXP-feature vs the partial-teelink
               inits depending on the platform. Additionally, only do
               the pxp allocation when we are certain the subsystem is
               needed. (Tvrtko).
         v9: - Cosmetic cleanups in supported/enabled/active. (Daniele).
             - Add comments for intel_pxp_init and pxp_get_ctrl_gt that
               explain the functional flow for when PXP is not supported
               but the backend-assets are needed for HuC authentication
               (Daniele and Tvrtko).
             - Fix two remaining functions that are accessible outside
               PXP that need to be checking pxp ptrs before using them:
               intel_pxp_irq_handler and intel_pxp_huc_load_and_auth
               (Tvrtko and Daniele).
             - User helper macro in pxp-debugfs (Tvrtko).
         v8: - Remove pxp_to_gt macro (Daniele).
             - Fix a bug in pxp_get_ctrl_gt for the case of MTL and we don't
               support GSC-FW on it. (Daniele).
             - Leave i915->pxp as NULL if we dont support PXP and in line
               with that, do additional validity check on i915->pxp for
               intel_pxp_is_supported/enabled/active (Daniele).
             - Remove unncessary include header from intel_gt_debugfs.c
               and check drm_minor i915->drm.primary (Daniele).
             - Other cosmetics / minor issues / more comments on suspend
               flow order change (Daniele).
         v7: - Drop i915_dev_to_pxp and in intel_pxp_init use 'i915->pxp'
               through out instead of local variable newpxp. (Rodrigo)
             - In the case intel_pxp_fini is called during driver unload but
               after i915 loading failed without pxp being allocated, check
               i915->pxp before referencing it. (Alan)
         v6: - Remove HAS_PXP macro and replace it with intel_pxp_is_supported
               because : [1] introduction of 'ctrl_gt' means we correct this
               for MTL's upcoming series now. [2] Also, this has little impact
               globally as its only used by PXP-internal callers at the moment.
             - Change intel_pxp_init/fini to take in i915 as its input to avoid
               ptr-to-ptr in init/fini calls.(Jani).
             - Remove the backpointer from pxp->i915 since we can use
               pxp->ctrl_gt->i915 if we need it. (Rodrigo).
         v5: - Switch from series to single patch (Rodrigo).
             - change function name from pxp_get_kcr_owner_gt to
               pxp_get_ctrl_gt.
             - Fix CI BAT failure by removing redundant call to intel_pxp_fini
               from driver-remove.
             - NOTE: remaining open still persists on using ptr-to-ptr
               and back-ptr.
         v4: - Instead of maintaining intel_pxp as an intel_gt structure member
               and creating a number of convoluted helpers that takes in i915 as
               input and redirects to the correct intel_gt or takes any intel_gt
               and internally replaces with the correct intel_gt, promote it to
               be a top-level i915 structure.
         v3: - Rename gt level helper functions to "intel_pxp_is_enabled/
               supported/ active_on_gt" (Daniele)
             - Upgrade _gt_supports_pxp to replace what was intel_gtpxp_is
               supported as the new intel_pxp_is_supported_on_gt to check for
               PXP feature support vs the tee support for huc authentication.
               Fix pxp-debugfs-registration to use only the former to decide
               support. (Daniele)
             - Couple minor optimizations.
         v2: - Avoid introduction of new device info or gt variables and use
               existing checks / macros to differentiate the correct GT->PXP
               control ownership (Daniele Ceraolo Spurio)
             - Don't reuse the updated global-checkers for per-GT callers (such
               as other files within PXP) to avoid unnecessary GT-reparsing,
               expose a replacement helper like the prior ones. (Daniele).
         v1: - Add one more patch to the series for the intel_pxp suspend/resume
               for similar refactoring
      
      References: https://patchwork.freedesktop.org/patch/msgid/20221202011407.4068371-1-alan.previn.teres.alexis@intel.comSigned-off-by: NAlan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Acked-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221208180542.998148-1-alan.previn.teres.alexis@intel.com
      f67986b0
  2. 10 11月, 2022 1 次提交
  3. 27 10月, 2022 2 次提交
    • K
      i915/i915_gem_context: Remove debug message in i915_gem_context_create_ioctl · 67f99e34
      Karolina Drobnik 提交于
      We know that as long as GEM context create ioctl succeeds, a context was
      created. There is no need to write about it, especially when such a message
      heavily pollutes dmesg and makes debugging actual errors harder.
      
      Since commit baa89ba3 ("drm/i915/gem: initial conversion to new
      logging macros using coccinelle"), the logging for creating a new user
      context was moved under the driver debug output (for lack of a means for
      per-user logs, and a lack of user-focused drm.debug parameter). This
      only reveals how obnoxious having that spam be part of the driver debug
      logs, so remove it. [ from Chris Wilson ]
      Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NKarolina Drobnik <karolina.drobnik@intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Reviewed-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221025091903.986819-1-karolina.drobnik@intel.com
      67f99e34
    • M
      drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis · 83321094
      Matthew Brost 提交于
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
      
      Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headrooom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      
      Design awareness: Selftest impact.
      As temporary WA disable this feature for the selftests. Selftests are
      very timing sensitive and any change in timing can cause failure. A
      follow up patch will fixup the selftests to understand this delay.
      
      Design awareness: Race between guc_request_alloc and guc_context_close.
      If a context close is issued while there is a request submission in
      flight and a delayed schedule disable is pending, guc_context_close
      and guc_request_alloc will race to cancel the delayed disable.
      To close the race, make sure that guc_request_alloc waits for
      guc_context_close to finish running before checking any state.
      
      Design awareness: GT Reset event.
      If a gt reset is triggered, as preparation steps, add an additional step
      to ensure all contexts that have a pending delay-disable-schedule task
      be flushed of it. Move them directly into the closed state after cancelling
      the worker. This is okay because the existing flow flushes all
      yet-to-arrive G2H's dropping them anyway.
      Signed-off-by: NMatthew Brost <matthew.brost@intel.com>
      Signed-off-by: NAlan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
      83321094
  4. 10 10月, 2022 1 次提交
  5. 05 10月, 2022 1 次提交
  6. 20 9月, 2022 1 次提交
    • C
      drm/i915/gem: Really move i915_gem_context.link under ref protection · d119888b
      Chris Wilson 提交于
      i915_perf assumes that it can use the i915_gem_context reference to
      protect its i915->gem.contexts.list iteration. However, this requires
      that we do not remove the context from the list until after we drop the
      final reference and release the struct. If, as currently, we remove the
      context from the list during context_close(), the link.next pointer may
      be poisoned while we are holding the context reference and cause a GPF:
      
      [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x1fffff ctx_id_mask=0x1fffff
      [ 4070.574881] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP
      [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G            E     5.17.9 #180
      [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
      [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
      [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
      [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202
      [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
      [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68
      [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc
      [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860
      [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc
      [ 4070.575016] FS:  00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
      [ 4070.575021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0
      [ 4070.575029] Call Trace:
      [ 4070.575033]  <TASK>
      [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
      [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
      [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
      [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
      [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
      [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
      [ 4070.575302]  drm_ioctl+0x1d3/0x370
      [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
      [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
      [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
      [ 4070.575456]  do_syscall_64+0x35/0x80
      [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 4070.575467] RIP: 0033:0x7f1ed5c10397
      [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
      [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397
      [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006
      [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005
      [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a
      [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0
      [ 4070.575505]  </TASK>
      [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
      [ 4070.575549] ---[ end trace 0000000000000000 ]---
      
      v3: fix incorrect syntax of spin_lock() replacing spin_lock_irqsave()
      
      v2: irqsave not required in a worker, neither conversion to irq safe
          elsewhere (Tvrtko),
        - perf: it's safe to call gen8_configure_context() even if context has
          been closed, no need to check,
        - drop unrelated cleanup (Andi, Tvrtko)
      Reported-by: NMark Janes <mark.janes@intel.com>
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
      References: a4e7ccda ("drm/i915: Move context management under GEM")
      Fixes: f8246cf4 ("drm/i915/gem: Drop free_work for GEM contexts")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: NJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v5.12+
      Signed-off-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-3-janusz.krzysztofik@linux.intel.com
      (cherry picked from commit ad3aa7c3)
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      d119888b
  7. 19 9月, 2022 1 次提交
    • C
      drm/i915/gem: Really move i915_gem_context.link under ref protection · ad3aa7c3
      Chris Wilson 提交于
      i915_perf assumes that it can use the i915_gem_context reference to
      protect its i915->gem.contexts.list iteration. However, this requires
      that we do not remove the context from the list until after we drop the
      final reference and release the struct. If, as currently, we remove the
      context from the list during context_close(), the link.next pointer may
      be poisoned while we are holding the context reference and cause a GPF:
      
      [ 4070.573157] i915 0000:00:02.0: [drm:i915_perf_open_ioctl [i915]] filtering on ctx_id=0x1fffff ctx_id_mask=0x1fffff
      [ 4070.574881] general protection fault, probably for non-canonical address 0xdead000000000100: 0000 [#1] PREEMPT SMP
      [ 4070.574897] CPU: 1 PID: 284392 Comm: amd_performance Tainted: G            E     5.17.9 #180
      [ 4070.574903] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
      [ 4070.574907] RIP: 0010:oa_configure_all_contexts.isra.0+0x222/0x350 [i915]
      [ 4070.574982] Code: 08 e8 32 6e 10 e1 4d 8b 6d 50 b8 ff ff ff ff 49 83 ed 50 f0 41 0f c1 04 24 83 f8 01 0f 84 e3 00 00 00 85 c0 0f 8e fa 00 00 00 <49> 8b 45 50 48 8d 70 b0 49 8d 45 50 48 39 44 24 10 0f 85 34 fe ff
      [ 4070.574990] RSP: 0018:ffffc90002077b78 EFLAGS: 00010202
      [ 4070.574995] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
      [ 4070.575000] RDX: 0000000000000001 RSI: ffffc90002077b20 RDI: ffff88810ddc7c68
      [ 4070.575004] RBP: 0000000000000001 R08: ffff888103242648 R09: fffffffffffffffc
      [ 4070.575008] R10: ffffffff82c50bc0 R11: 0000000000025c80 R12: ffff888101bf1860
      [ 4070.575012] R13: dead0000000000b0 R14: ffffc90002077c04 R15: ffff88810be5cabc
      [ 4070.575016] FS:  00007f1ed50c0780(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
      [ 4070.575021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4070.575025] CR2: 00007f1ed5590280 CR3: 000000010ef6f005 CR4: 00000000003706e0
      [ 4070.575029] Call Trace:
      [ 4070.575033]  <TASK>
      [ 4070.575037]  lrc_configure_all_contexts+0x13e/0x150 [i915]
      [ 4070.575103]  gen8_enable_metric_set+0x4d/0x90 [i915]
      [ 4070.575164]  i915_perf_open_ioctl+0xbc0/0x1500 [i915]
      [ 4070.575224]  ? asm_common_interrupt+0x1e/0x40
      [ 4070.575232]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575290]  drm_ioctl_kernel+0x85/0x110
      [ 4070.575296]  ? update_load_avg+0x5f/0x5e0
      [ 4070.575302]  drm_ioctl+0x1d3/0x370
      [ 4070.575307]  ? i915_oa_init_reg_state+0x110/0x110 [i915]
      [ 4070.575382]  ? gen8_gt_irq_handler+0x46/0x130 [i915]
      [ 4070.575445]  __x64_sys_ioctl+0x3c4/0x8d0
      [ 4070.575451]  ? __do_softirq+0xaa/0x1d2
      [ 4070.575456]  do_syscall_64+0x35/0x80
      [ 4070.575461]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 4070.575467] RIP: 0033:0x7f1ed5c10397
      [ 4070.575471] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
      [ 4070.575478] RSP: 002b:00007ffd65c8d7a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [ 4070.575484] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007f1ed5c10397
      [ 4070.575488] RDX: 00007ffd65c8d7c0 RSI: 0000000040106476 RDI: 0000000000000006
      [ 4070.575492] RBP: 00005620972f9c60 R08: 000000000000000a R09: 0000000000000005
      [ 4070.575496] R10: 000000000000000d R11: 0000000000000246 R12: 000000000000000a
      [ 4070.575500] R13: 000000000000000d R14: 0000000000000000 R15: 00007ffd65c8d7c0
      [ 4070.575505]  </TASK>
      [ 4070.575507] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) i915(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) aesni_intel(E) crypto_simd(E) intel_gtt(E) cryptd(E) ttm(E) rapl(E) intel_cstate(E) drm_kms_helper(E) cfbfillrect(E) syscopyarea(E) cfbimgblt(E) intel_uncore(E) sysfillrect(E) mei_me(E) sysimgblt(E) i2c_i801(E) fb_sys_fops(E) mei(E) intel_pch_thermal(E) i2c_smbus(E) cfbcopyarea(E) video(E) button(E) efivarfs(E) autofs4(E)
      [ 4070.575549] ---[ end trace 0000000000000000 ]---
      
      v3: fix incorrect syntax of spin_lock() replacing spin_lock_irqsave()
      
      v2: irqsave not required in a worker, neither conversion to irq safe
          elsewhere (Tvrtko),
        - perf: it's safe to call gen8_configure_context() even if context has
          been closed, no need to check,
        - drop unrelated cleanup (Andi, Tvrtko)
      Reported-by: NMark Janes <mark.janes@intel.com>
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/6222
      References: a4e7ccda ("drm/i915: Move context management under GEM")
      Fixes: f8246cf4 ("drm/i915/gem: Drop free_work for GEM contexts")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: NJanusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v5.12+
      Signed-off-by: NAndi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220916092403.201355-3-janusz.krzysztofik@linux.intel.com
      ad3aa7c3
  8. 21 8月, 2022 1 次提交
  9. 19 8月, 2022 1 次提交
    • M
      drm/i915/guc: Add delay to disable scheduling after pin count goes to zero · 6a079903
      Matthew Brost 提交于
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
      
      As temporary WA disable this feature for the selftests. Selftests are
      very timing sensitive and any change in timing can cause failure. A
      follow up patch will fixup the selftests to understand this delay.
      
      Alan Previn: Matt Brost first introduced this series back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headrooom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      Signed-off-by: NMatthew Brost <matthew.brost@intel.com>
      Signed-off-by: NAlan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
      6a079903
  10. 27 6月, 2022 1 次提交
  11. 22 6月, 2022 1 次提交
  12. 17 6月, 2022 1 次提交
    • T
      drm/i915: Improve user experience and driver robustness under SIGINT or similar · 45c64ecf
      Tvrtko Ursulin 提交于
      We have long standing customer complaints that pressing Ctrl-C (or to the
      effect of) causes engine resets with otherwise well behaving programs.
      
      Not only is logging engine resets during normal operation not desirable
      since it creates support incidents, but more fundamentally we should avoid
      going the engine reset path when we can since any engine reset introduces
      a chance of harming an innocent context.
      
      Reason for this undesirable behaviour is that the driver currently does
      not distinguish between banned contexts and non-persistent contexts which
      have been closed.
      
      To fix this we add the distinction between the two reasons for revoking
      contexts, which then allows the strict timeout only be applied to banned,
      while innocent contexts (well behaving) can preempt cleanly and exit
      without triggering the engine reset path.
      
      Note that the added context exiting category applies both to closed non-
      persistent context, and any exiting context when hangcheck has been
      disabled by the user.
      
      At the same time we rename the backend operation from 'ban' to 'revoke'
      which more accurately describes the actual semantics. (There is no ban at
      the backend level since banning is a concept driven by the scheduling
      frontend. Backends are simply able to revoke a running context so that
      is the more appropriate name chosen.)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NAndrzej Hajda <andrzej.hajda@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220527072452.2225610-1-tvrtko.ursulin@linux.intel.com
      45c64ecf
  13. 02 6月, 2022 1 次提交
    • M
      drm/i915/sseu: Disassociate internal subslice mask representation from uapi · b87d3901
      Matt Roper 提交于
      As with EU masks, it's easier to store subslice/DSS masks internally in
      a format that's more natural for the driver to work with, and then only
      covert into the u8[] uapi form when the query ioctl is invoked.  Since
      the hardware design changed significantly with Xe_HP, we'll use a union
      to choose between the old "hsw-style" subslice masks or the newer xehp
      mask.  HSW-style masks will be stored in an array of u8's, indexed by
      slice (there's never more than 6 subslices per slice on older
      platforms).  For Xe_HP and beyond where slices no longer exist, we only
      need a single bitmask.  However we already know that this mask is
      eventually going to grow too large for a simple u64 to hold, so we'll
      represent it in a manner that can be operated on by the utilities in
      linux/bitmap.h.
      
      v2:
       - Fix typo: BIT(s) -> BIT(ss) in gen9_sseu_device_status()
      
      v3:
       - Eliminate sseu->ss_stride and just calculate the stride while
         specifically handling uapi.  (Tvrtko)
       - Use BITMAP_BITS() macro to refer to size of masks rather than
         passing I915_MAX_SS_FUSE_BITS directly.  (Tvrtko)
       - Report compute/geometry DSS masks separately when dumping Xe_HP SSEU
         info.  (Tvrtko)
       - Restore dropped range checks to intel_sseu_has_subslice().  (Tvrtko)
      
      v4:
       - Make the bitmap size macro check the size of the .xehp field rather
         than the containing union.  (Tvrtko)
       - Don't add GEM_BUG_ON() intel_sseu_has_subslice()'s check for whether
         slice or subslice ID exceed sseu->max_[sub]slices; various loops
         in the driver are expected to exceed these, so we should just
         silently return 'false.'
      
      v5:
       - Move XEHP_BITMAP_BITS() to the header so that we can also replace a
         usage of I915_MAX_SS_FUSE_BITS in one of the inline functions.
         (Bala)
       - Change the local variable in intel_slicemask_from_xehp_dssmask() from
         u16 to 'unsigned long' to make it a bit more future-proof.
      
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
      Signed-off-by: NMatt Roper <matthew.d.roper@intel.com>
      Acked-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NBalasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220601150725.521468-6-matthew.d.roper@intel.com
      b87d3901
  14. 05 4月, 2022 3 次提交
  15. 07 3月, 2022 1 次提交
  16. 04 3月, 2022 1 次提交
  17. 02 3月, 2022 1 次提交
  18. 14 2月, 2022 3 次提交
  19. 27 12月, 2021 1 次提交
  20. 24 12月, 2021 2 次提交
  21. 18 12月, 2021 1 次提交
  22. 01 12月, 2021 1 次提交
  23. 18 10月, 2021 1 次提交
  24. 16 10月, 2021 2 次提交
  25. 13 10月, 2021 1 次提交
  26. 08 10月, 2021 1 次提交
    • L
      drm/i915: remove IS_ACTIVE · 1a839e01
      Lucas De Marchi 提交于
      When trying to bring IS_ACTIVE to linux/kconfig.h I thought it wouldn't
      provide much value just encapsulating it in a boolean context. So I also
      added the support for handling undefined macros as the IS_ENABLED()
      counterpart. However the feedback received from Masahiro Yamada was that
      it is too ugly, not providing much value. And just wrapping in a boolean
      context is too dumb - we could simply open code it.
      
      As detailed in commit babaab2f ("drm/i915: Encapsulate kconfig
      constant values inside boolean predicates"), the IS_ACTIVE macro was
      added to workaround a compilation warning. However after checking again
      our current uses of IS_ACTIVE it turned out there is only
      1 case in which it triggers a warning in clang (due
      -Wconstant-logical-operand) and 2 in smatch. All the others
      can simply use the shorter version, without wrapping it in any macro.
      
      So here I'm dialing all the way back to simply removing the macro. That
      single case hit by clang can be changed to make the constant come first,
      so it doesn't think it's mask:
      
      	-       if (context && CONFIG_DRM_I915_FENCE_TIMEOUT)
      	+       if (CONFIG_DRM_I915_FENCE_TIMEOUT && context)
      
      As talked with Dan Carpenter, that logic will be added in smatch as
      well, so it will also stop warning about it.
      Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com>
      Reviewed-by: NJani Nikula <jani.nikula@intel.com>
      Reviewed-by: NMasahiro Yamada <masahiroy@kernel.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20211005171728.3147094-1-lucas.demarchi@intel.com
      1a839e01
  27. 05 10月, 2021 3 次提交
  28. 24 9月, 2021 1 次提交
    • T
      drm/i915: Reduce the number of objects subject to memcpy recover · a259cc14
      Thomas Hellström 提交于
      We really only need memcpy restore for objects that affect the
      operability of the migrate context. That is, primarily the page-table
      objects of the migrate VM.
      
      Add an object flag, I915_BO_ALLOC_PM_EARLY for objects that need early
      restores using memcpy and a way to assign LMEM page-table object flags
      to be used by the vms.
      
      Restore objects without this flag with the gpu blitter and only objects
      carrying the flag using TTM memcpy.
      
      Initially mark the migrate, gt, gtt and vgpu vms to use this flag, and
      defer for a later audit which vms actually need it. Most importantly, user-
      allocated vms with pinned page-table objects can be restored using the
      blitter.
      
      Performance-wise memcpy restore is probably as fast as gpu restore if not
      faster, but using gpu restore will help tackling future restrictions in
      mappable LMEM size.
      
      v4:
      - Don't mark the aliasing ppgtt page table flags for early resume, but
        rather the ggtt page table flags as intended. (Matthew Auld)
      - The check for user buffer objects during early resume is pointless, since
        they are never marked I915_BO_ALLOC_PM_EARLY. (Matthew Auld)
      v5:
      - Mark GuC LMEM objects with I915_BO_ALLOC_PM_EARLY to have them restored
        before we fire up the migrate context.
      
      Cc: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: NThomas Hellström <thomas.hellstrom@linux.intel.com>
      Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-8-thomas.hellstrom@linux.intel.com
      a259cc14
  29. 14 9月, 2021 1 次提交
    • D
      drm/i915: Release ctx->syncobj on final put, not on ctx close · 03153666
      Daniel Vetter 提交于
      gem context refcounting is another exercise in least locking design it
      seems, where most things get destroyed upon context closure (which can
      race with anything really). Only the actual memory allocation and the
      locks survive while holding a reference.
      
      This tripped up Jason when reimplementing the single timeline feature
      in
      
      commit 00dae4d3
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:12 2021 -0500
      
          drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)
      
      We could fix the bug by holding ctx->mutex in execbuf and clear the
      pointer (again while holding the mutex) context_close, but it's
      cleaner to just make the context object actually invariant over its
      _entire_ lifetime. This way any other ioctl that's potentially racing,
      but holding a full reference, can still rely on ctx->syncobj being
      an immutable pointer. Which without this change, is not the case.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Fixes: 00dae4d3 ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Thomas Hellström" <thomas.hellstrom@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-2-daniel.vetter@ffwll.ch
      (cherry picked from commit c238980e)
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      03153666
  30. 06 9月, 2021 2 次提交
    • D
      drm/i915: Drop __rcu from gem_context->vm · 9ec8795e
      Daniel Vetter 提交于
      It's been invariant since
      
          commit ccbc1b97
          Author: Jason Ekstrand <jason@jlekstrand.net>
          Date:   Thu Jul 8 10:48:30 2021 -0500
      
              drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      this just completes the deed. I've tried to split out prep work for
      more careful review as much as possible, this is what's left:
      
      - get_ppgtt gets simplified since we don't need to grab a temporary
        reference - we can rely on the temporary reference for the gem_ctx
        while we inspect the vm. The new vm_id still needs a full
        i915_vm_open ofc. This also removes the final caller of context_get_vm_rcu
      
      - A pile of selftests can now just look at ctx->vm instead of
        rcu_dereference_protected( , true) or similar things.
      
      - All callers of i915_gem_context_vm also disappear.
      
      - I've changed the hugepage selftest to set scrub_64K without any
        locking, because when we inspect that setting we're also not taking
        any locks either. It works because it's a selftests that's careful
        (single threaded gives you nice ordering) and not a live driver
        where races can happen from anywhere.
      
      These can only be split up further if we have some intermediate state
      with a bunch more rcu_dereference_protected(ctx->vm, true), just to
      shut up lockdep and sparse.
      
      The conversion to __rcu happened in
      
      commit a4e7ccda
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Oct 4 14:40:09 2019 +0100
      
          drm/i915: Move context management under GEM
      
      Note that we're not breaking the actual bugfix in there: The real
      bugfix is pushing the i915_vm_relase onto a separate worker, to avoid
      locking inversion issues. The rcu conversion was just thrown in for
      entertainment value on top (no vm lookup isn't even close to anything
      that's a hotpath where removing the single spinlock can be measured).
      
      v2: Rebase over the change to move the i915_vm_put() into
      i915_gem_context_release().
      
      v3: Trivial conflict against repainted shed.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-9-daniel.vetter@ffwll.ch
      9ec8795e
    • D
      drm/i915: Use i915_gem_context_get_eb_vm in intel_context_set_gem · 0483a301
      Daniel Vetter 提交于
      Since
      
      commit ccbc1b97
      Author: Jason Ekstrand <jason@jlekstrand.net>
      Date:   Thu Jul 8 10:48:30 2021 -0500
      
          drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
      the gem_ctx->vm can't change anymore. Plus we always set the
      intel_context->vm, so might as well use the helper we have for that.
      
      This makes it very clear that we always overwrite intel_context->vm
      for userspace contexts, since the default is gt->vm, which is
      explicitly reserved for kernel context use. It would be good to split
      things up a bit further and avoid any possibility for an accident
      where we run kernel stuff in userspace vm or the other way round.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-8-daniel.vetter@ffwll.ch
      0483a301