1. 12 Sep, 2022 5 commits
  2. 08 Sep, 2022 4 commits
  3. 07 Sep, 2022 1 commit
  4. 06 Sep, 2022 1 commit
  5. 05 Sep, 2022 1 commit
  6. 02 Sep, 2022 1 commit
  7. 01 Sep, 2022 1 commit
  8. 31 Aug, 2022 2 commits
  9. 30 Aug, 2022 1 commit
  10. 29 Aug, 2022 1 commit
  11. 26 Aug, 2022 5 commits
  12. 24 Aug, 2022 2 commits
  13. 21 Aug, 2022 1 commit
  14. 19 Aug, 2022 2 commits
    • drm/i915/guc: Add delay to disable scheduling after pin count goes to zero · 6a079903
      Committed by Matthew Brost
      Add a delay, configurable via debugfs (default 34ms), to disable
      scheduling of a context after the pin count goes to zero. Disable
      scheduling is a costly operation as it requires synchronizing with
      the GuC. So the idea is that a delay allows the user to resubmit
      something before doing this operation. This delay is only done if
      the context isn't closed and less than a given threshold
      (default is 3/4) of the guc_ids are in use.
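
The conditions described above can be sketched as a small decision function. This is an illustration only, not the i915 implementation: the struct, its field names, and the function name are hypothetical, and treating a delay of 0 as "feature off" is an assumption. Only the 34ms default, the 3/4 threshold, and the "context isn't closed" condition come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified policy state; not the real i915 structures. */
struct sched_disable_policy {
	uint32_t delay_ms;        /* commit message default: 34 */
	uint32_t guc_ids_total;   /* size of the guc_id space */
	uint32_t guc_ids_in_use;  /* guc_ids currently allocated */
	uint32_t threshold_num;   /* commit message default: 3 */
	uint32_t threshold_den;   /* commit message default: 4 */
};

/*
 * Return true if disabling scheduling should be deferred once the pin
 * count reaches zero, false if it should happen immediately.
 */
static bool should_delay_sched_disable(const struct sched_disable_policy *p,
				       bool context_closed)
{
	/* A closed context cannot be resubmitted, so waiting gains nothing. */
	if (context_closed)
		return false;

	/* Assumption: a zero delay means the feature is switched off. */
	if (p->delay_ms == 0)
		return false;

	/*
	 * Delay only while in_use / total < num / den. Cross-multiply in
	 * 64 bits to avoid both division and 32-bit overflow.
	 */
	return (uint64_t)p->guc_ids_in_use * p->threshold_den <
	       (uint64_t)p->guc_ids_total * p->threshold_num;
}
```

With a hypothetical pool of 1000 guc_ids and the default 3/4 threshold, 749 ids in use still allows the delay, while 750 does not, because the comparison is strict.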
      
      As a temporary workaround (WA), disable this feature for the selftests.
      Selftests are very timing sensitive and any change in timing can cause
      failure. A follow-up patch will fix up the selftests to understand this
      delay.
      
      Alan Previn: Matt Brost first introduced this series back in Oct 2021.
      However no real world workload with measured performance impact was
      available to prove the intended results. Today, this series is being
      republished in response to a real world workload that benefited greatly
      from it along with measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3d game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container to ALWAYS be able to render and
      encode no less than 30fps with a predefined maximum render + encode
      latency time. That means the totality of all 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headroom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization % symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This was causing
      a thrashing of unpinning contexts from GuC at one moment, followed quickly
      by repinning them due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers at the rate of > 30 times per sec per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
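
The figures quoted above can be cross-checked with a few lines of arithmetic. This is a sketch for the reader: the helper names are hypothetical, and the inputs are the rounded values from the commit message (69K and 480 events per 10 seconds, 25 and 10 minute runs).

```c
#include <stdint.h>

/* Percentage reduction from `before` to `after`. */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Total events over `minutes`, given an average count per 10-second window. */
static uint64_t events_over(uint64_t per_10s, uint64_t minutes)
{
	uint64_t windows = minutes * 60 / 10;

	return per_10s * windows;
}
```

Under these inputs, the per-window reduction comes out slightly above 99%, and the extrapolated totals (about 10.35 million over 25 minutes, 28.8K over 10 minutes) agree with the rounded numbers reported.
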
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220817020511.2180747-3-alan.previn.teres.alexis@intel.com
    • 61faec5f
  15. 18 Aug, 2022 9 commits
  16. 17 Aug, 2022 2 commits
  17. 09 Aug, 2022 1 commit
    • drm/i915/ttm: fix CCS handling · 8676145e
      Committed by Matthew Auld
      Crucible + recent Mesa seems to sometimes hit:
      
      GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER)
      
      And it looks like we can also trigger this with gem_lmem_swapping, if we
      modify the test to use slightly larger object sizes.
      
      Looking closer it looks like we have the following issues in
      migrate_copy():
      
        - We are using plain integers in various places, which we can easily
          overflow with a large object.
      
        - We pass the entire object size (when the src is lmem) into
          emit_pte() and then try to copy it, which doesn't work, since we
          only have a few fixed-size windows in which to map the pages and
          perform the copy. With an object > 8M we therefore aren't properly
          copying the pages. And then with an object > 64M we trigger the
          GEM_BUG_ON(num_ccs_blks > NUM_CCS_BLKS_PER_XFER).
      
      So it looks like our copy handling for any object > 8M (which is our
      CHUNK_SZ) is currently broken on DG2.
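
The two issues above suggest a general pattern: track sizes in 64-bit variables and never hand more than one chunk to the copy step at a time. The following is a minimal sketch of that pattern, not the actual migrate_copy() fix. The callback type, the stats struct, and the function names are hypothetical; only the 8M CHUNK_SZ value comes from the commit message.

```c
#include <stdint.h>

#define CHUNK_SZ (8ULL << 20) /* 8M, per the commit message */

/* Hypothetical callback that copies `len` bytes starting at `offset`. */
typedef int (*copy_chunk_fn)(uint64_t offset, uint64_t len, void *ctx);

/*
 * Walk an object of `size` bytes in pieces of at most CHUNK_SZ.
 * All arithmetic is 64-bit so objects larger than 4G do not overflow.
 */
static int copy_in_chunks(uint64_t size, copy_chunk_fn copy, void *ctx)
{
	uint64_t offset = 0;

	while (offset < size) {
		uint64_t len = size - offset;
		int err;

		if (len > CHUNK_SZ)
			len = CHUNK_SZ; /* clamp to one mapping window */

		err = copy(offset, len, ctx);
		if (err)
			return err;

		offset += len;
	}

	return 0;
}

/* Illustrative callback: records what it was asked to copy. */
struct copy_stats {
	uint64_t chunks;
	uint64_t bytes;
	uint64_t max_len;
};

static int record_chunk(uint64_t offset, uint64_t len, void *ctx)
{
	struct copy_stats *s = ctx;

	(void)offset;
	s->chunks++;
	s->bytes += len;
	if (len > s->max_len)
		s->max_len = len;

	return 0;
}
```

Running copy_in_chunks() over a 5G object with record_chunk would issue 640 calls, none larger than CHUNK_SZ, whereas a size held in a 32-bit int could not represent that object at all.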
      
      Fixes: da0595ae ("drm/i915/migrate: Evict and restore the flatccs capable lmem obj")
      Testcase: igt@gem_lmem_swapping
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Reviewed-by: Ramalingam C <ramalingam.c@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220805132240.442747-2-matthew.auld@intel.com