1. 27 Oct 2022, 5 commits
    • drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis · 83321094
      Committed by Matthew Brost
      Add a delay, configurable via debugfs (default 34ms), before disabling
      scheduling of a context after its pin count goes to zero. Disabling
      scheduling is a costly operation because it requires synchronizing with
      the GuC, so the idea is that the delay gives the user a chance to
      resubmit something before that operation is performed. The delay is only
      applied if the context isn't closed and fewer than a given threshold
      (default 3/4) of the guc_ids are in use (see the sketch after this
      entry).
      
      Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
      However, no real-world workload with a measured performance impact was
      available at the time to prove the intended results. Today, this series
      is being republished in response to a real-world workload that benefited
      greatly from it, along with the measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device, each
      performing a combination of 720p 3D game rendering and 30fps video
      encoding. The workload density was configured so that every container
      was ALWAYS able to render and encode at no less than 30fps within a
      predefined maximum render + encode latency. That means the totality of
      all 36 containers and their workloads did not saturate the engines to
      their max (in order to maintain just enough headroom to meet the minimum
      fps and maximum latencies of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were spent in the
      gen11_other_irq_handler() -> guc_irq_handler() code path, and that the
      vast majority of those cycles went to processing a specific G2H IRQ:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent by GuC in
      response to the i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization symptom was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system as a whole was NOT saturated, it was assumed that each context was
      constantly idling between submissions. This caused thrashing: contexts
      were unpinned from GuC at one moment, only to be repinned for incoming
      work the very next moment. These event pairs were being triggered across
      multiple contexts per container, across all containers, at a rate of more
      than 30 times per second per context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds or ~10 million times over ~25+ mins. With this patch, the count
      reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
      improvement observed is ~99% for the average counts per 10 seconds.
      
      Design awareness: Selftest impact.
      As a temporary workaround, disable this feature for the selftests.
      Selftests are very timing sensitive, and any change in timing can cause
      failures. A follow-up patch will fix up the selftests to account for this
      delay.
      
      Design awareness: Race between guc_request_alloc and guc_context_close.
      If a context close is issued while there is a request submission in
      flight and a delayed schedule disable is pending, guc_context_close
      and guc_request_alloc will race to cancel the delayed disable.
      To close the race, make sure that guc_request_alloc waits for
      guc_context_close to finish running before checking any state.
      
      Design awareness: GT reset event.
      If a GT reset is triggered, add a preparation step that flushes the
      pending delayed disable-schedule task from every context that has one,
      moving those contexts directly into the closed state after cancelling the
      worker. This is okay because the existing flow flushes all yet-to-arrive
      G2H messages and drops them anyway.
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
      83321094
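      A minimal sketch of the delayed disable-scheduling idea in this entry,
      assuming hypothetical names (my_guc_context, my_send_sched_disable_h2g,
      my_guc_id_pressure_high, and friends); the real i915/GuC code differs in
      structure, locking, and reference handling:

      /* Sketch only: hypothetical helpers, not the actual i915 implementation. */
      #include <linux/kernel.h>
      #include <linux/workqueue.h>
      #include <linux/jiffies.h>

      #define MY_SCHED_DISABLE_DELAY_MS 34  /* debugfs-tunable default in the patch */

      struct my_guc_context {
          /* Bound via INIT_DELAYED_WORK(..., my_sched_disable_worker) at creation. */
          struct delayed_work sched_disable_work;
          /* ... guc_id, pin count, state flags ... */
      };

      /* Stubs standing in for the real H2G and context-state helpers. */
      static void my_send_sched_disable_h2g(struct my_guc_context *ce) { }
      static bool my_context_is_closed(struct my_guc_context *ce) { return false; }
      static bool my_guc_id_pressure_high(struct my_guc_context *ce) { return false; } /* >= 3/4 guc_ids used */
      static void my_context_set_closed(struct my_guc_context *ce) { }

      static void my_sched_disable_worker(struct work_struct *wrk)
      {
          struct my_guc_context *ce = container_of(to_delayed_work(wrk),
                                                   struct my_guc_context,
                                                   sched_disable_work);

          /* Only now pay for the H2G SCHED_CONTEXT_MODE_SET round trip. */
          my_send_sched_disable_h2g(ce);
      }

      /* Pin count hit zero: defer the disable instead of sending it immediately. */
      static void my_context_unpinned(struct my_guc_context *ce)
      {
          if (my_context_is_closed(ce) || my_guc_id_pressure_high(ce)) {
              my_send_sched_disable_h2g(ce);  /* immediate, as before */
              return;
          }
          schedule_delayed_work(&ce->sched_disable_work,
                                msecs_to_jiffies(MY_SCHED_DISABLE_DELAY_MS));
      }

      /* Resubmission within the delay window: just cancel the pending disable. */
      static void my_context_repinned(struct my_guc_context *ce)
      {
          cancel_delayed_work(&ce->sched_disable_work);
      }

      /* GT reset preparation: flush any pending delayed disable, then mark closed. */
      static void my_context_flush_on_reset(struct my_guc_context *ce)
      {
          if (cancel_delayed_work_sync(&ce->sched_disable_work))
              my_context_set_closed(ce);
      }

      The point of the shape above is that a resubmission arriving within the
      delay window only cancels a pending worker, instead of paying for a
      disable/enable round trip with the GuC on every idle gap.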
    • drm/i915/guc: Fix GuC error capture sizing estimation and reporting · befb231d
      Committed by Alan Previn
      During GuC error capture initialization, we estimate the size needed for
      the error-capture region of the shared GuC log buffer. This calculation
      was incorrect, so fix it. With the corrected calculation we can reduce
      the error-capture region allocation from 4MB to 1MB (see NOTE2 below for
      the reasoning). Additionally, switch from drm_notice to drm_debug for the
      3X spare-size check, since that case would be impossible to hit without
      redesigning the gpu_coredump framework to hold multiple captures (a
      sketch of this kind of sizing check follows this entry).
      
      NOTE1: Even for the 1X minimum-size estimate, actually running out of
      space is a corner case: it can only occur if all engine instances get
      reset at once and i915 isn't able to extract the capture data fast enough
      in the G2H handler worker.
      
      NOTE2: With the corrected calculation, a DG2 part required ~77K and a
      PVC part required ~115K (the 1X minimum estimated size, calculated for
      the one-shot all-engine-reset scenario).
      
      Fixes: d7c15d76 ("drm/i915/guc: Check sizing of guc_capture output")
      Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
      Cc: Matthew Brost <matthew.brost@intel.com>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Chris Wilson <chris.p.wilson@intel.com>
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221026060506.1007830-2-alan.previn.teres.alexis@intel.com
      befb231d
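      A minimal sketch of the 1X-minimum / 3X-spare sizing check described in
      this entry, with hypothetical names (my_check_error_capture_size,
      worst_case_one_shot); the real calculation walks the per-engine register
      lists that GuC reports:

      /* Sketch only: illustrative of the sizing check, not the i915 guc_capture code. */
      #include <drm/drm_print.h>
      #include <linux/errno.h>

      static int my_check_error_capture_size(struct drm_device *drm,
                                             size_t region_size,
                                             size_t worst_case_one_shot)
      {
          /* 1X: every engine instance reset at once, captured back to back. */
          size_t min_size = worst_case_one_shot;
          /* 3X: spare headroom in case the G2H worker is slow to drain the region. */
          size_t spare_size = 3 * worst_case_one_shot;

          if (region_size < min_size) {
              drm_warn(drm, "GuC error-capture region too small: %zu < %zu\n",
                       region_size, min_size);
              return -ENOSPC;
          }

          if (region_size < spare_size) {
              /* Was drm_notice; demoted to debug since gpu_coredump holds one capture. */
              drm_dbg(drm, "GuC error-capture region below 3X spare: %zu < %zu\n",
                      region_size, spare_size);
          }

          return 0;
      }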
    • drm/i915/slpc: Use platform limits for min/max frequency · 37d52e44
      Committed by Vinay Belgaumkar
      GuC will set the min/max frequencies to the theoretical maximum on
      ATS-M. This would break the kernel ABI, so limit the min/max frequency to
      RP0 (the platform maximum) instead (see the sketch after this entry).
      
      Also modify the SLPC selftest to update the min frequency when we have a
      server part, so that we can iterate between the platform min and max.
      
      v2: Check softlimits instead of platform limits (Riana)
      v3: More review comments (Ashutosh)
      v4: No need to use saved_min_freq and other comments (Ashutosh)
      
      Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7030
      Acked-by: Nirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: Riana Tauro <riana.tauro@intel.com>
      Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221024225453.4856-1-vinay.belgaumkar@intel.com
      37d52e44
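      A minimal sketch of the clamping described in this entry, with a
      hypothetical helper (my_slpc_clamp_to_platform) and RP0/RPn passed in as
      parameters; the real logic lives in the SLPC code and reads the platform
      limits from hardware capabilities:

      /* Sketch only: clamp GuC/SLPC limits into the platform's real frequency range. */
      #include <linux/minmax.h>
      #include <linux/types.h>

      static void my_slpc_clamp_to_platform(u32 *min_freq, u32 *max_freq,
                                            u32 rp0_freq, u32 rpn_freq)
      {
          /*
           * Some server parts (e.g. ATS-M) report a theoretical maximum; clamp
           * to RP0 (platform max) and RPn (platform min) so the existing
           * sysfs/kernel ABI keeps reporting reachable frequencies.
           */
          *max_freq = min(*max_freq, rp0_freq);
          *min_freq = clamp(*min_freq, rpn_freq, *max_freq);
      }

      Clamping both ends also keeps the min <= max invariant when the reported
      values fall outside the platform range.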
    • drm/i915/slpc: Optmize waitboost for SLPC · f864a29a
      Committed by Vinay Belgaumkar
      Waitboost (when SLPC is enabled) results in an H2G message. This can
      result in thousands of messages during a stress test and fill up an
      already full CTB. There is no need to request a boost if the min
      softlimit is already equal to or greater than the boost frequency (see
      the sketch after this entry).
      
      v2: Add the tracing back, and check requested freq
      in the worker thread (Tvrtko)
      v3: Check requested freq in dec_waiters as well
      v4: Only check min_softlimit against boost_freq. Limit this
      optimization for server parts for now.
      v5: min_softlimit can be greater than boost (Ashutosh)
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221024171108.14373-1-vinay.belgaumkar@intel.com
      f864a29a
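      A minimal sketch of the short-circuit described in this entry, using
      hypothetical names (my_slpc, my_slpc_send_boost_h2g); the real check sits
      in the SLPC boost and dec_waiters paths:

      /* Sketch only: skip the H2G boost request when it cannot raise the frequency. */
      #include <linux/types.h>

      struct my_slpc {
          u32 min_freq_softlimit;
          u32 boost_freq;
      };

      /* Stub standing in for the real H2G SET_PARAM request. */
      static void my_slpc_send_boost_h2g(struct my_slpc *slpc) { }

      static void my_slpc_waitboost(struct my_slpc *slpc)
      {
          /*
           * If the minimum softlimit is already at or above the boost
           * frequency, the GPU is effectively boosted; another H2G would only
           * add CTB traffic.
           */
          if (slpc->min_freq_softlimit >= slpc->boost_freq)
              return;

          my_slpc_send_boost_h2g(slpc);
      }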
    • drm/i915/xelp: Add Wa_1806527549 · e62f31e1
      Committed by Gustavo Sousa
      Workaround to be applied to platforms using XE_LP graphics.
      
      BSpec: 52890
      Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
      Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221019161334.119885-1-gustavo.sousa@intel.com
      e62f31e1
  2. 25 Oct 2022, 8 commits
  3. 21 Oct 2022, 4 commits
  4. 20 Oct 2022, 1 commit
  5. 19 Oct 2022, 2 commits
  6. 18 Oct 2022, 15 commits
  7. 17 Oct 2022, 5 commits