  1. 09 Nov 2022: 1 commit
  2. 08 Nov 2022: 2 commits
  3. 05 Nov 2022: 1 commit
    • drm/i915/guc: Don't deadlock busyness stats vs reset · 178b8a36
      John Harrison committed
      The engine busyness stats code has a worker function that does things
      like extending the 32-bit hardware counters to 64 bits. The GuC's reset
      prepare function flushes out this worker function to ensure no
      corruption happens during the reset. Unfortunately, the worker function
      has an infinite wait for active resets to finish before doing its work.
      Thus a deadlock would occur if the worker function had actually started
      just as the reset starts.
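
      The 64-bit extension of a 32-bit hardware counter can be sketched as
      follows. This is a minimal userspace illustration, not the i915 code:
      struct busy_counter and busy_counter_update() are hypothetical, and the
      sketch assumes the worker samples the counter at least once per wrap
      period of the 32-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state: last raw 32-bit sample plus the extended total. */
struct busy_counter {
	uint32_t last_raw; /* previous 32-bit hardware reading */
	uint64_t total;    /* software-extended 64-bit value */
};

/*
 * Fold a new 32-bit sample into the 64-bit total. Unsigned subtraction
 * gives the correct delta across a single wrap, which is why a periodic
 * worker must run at least once per 2^32 ticks of the hardware counter.
 */
static uint64_t busy_counter_update(struct busy_counter *c, uint32_t raw)
{
	uint32_t delta = raw - c->last_raw; /* arithmetic modulo 2^32 */

	c->total += delta;
	c->last_raw = raw;
	return c->total;
}
```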
      
      The function being used to lock the reset-in-progress mutex is called
      intel_gt_reset_trylock(). However, as noted, it does not follow the
      standard 'trylock' convention of exiting immediately if the lock is
      already held. So rename the current _trylock function to
      intel_gt_reset_lock_interruptible(), which is the behaviour it actually
      provides. In addition, add a new implementation of _trylock and call
      that from the busyness stats worker instead.
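
      The difference between the two locking behaviours can be sketched with
      a POSIX mutex standing in for the reset-in-progress lock. This is an
      analogy, not how i915 implements its reset lock: reset_lock,
      stats_updates and stats_worker_tick() are hypothetical names. Because
      pthread_mutex_trylock() returns immediately (EBUSY) when the mutex is
      held, the worker skips a tick instead of blocking, so a reset path that
      waits for the worker cannot deadlock with it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GT reset-in-progress lock. */
static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;
static int stats_updates; /* counts completed stats updates */

/*
 * Periodic stats worker using true trylock semantics: if a reset holds
 * the lock, give up for this period rather than waiting for the reset.
 */
static bool stats_worker_tick(void)
{
	if (pthread_mutex_trylock(&reset_lock) != 0)
		return false; /* reset in progress: retry on the next period */

	stats_updates++; /* ... update busyness stats here ... */
	pthread_mutex_unlock(&reset_lock);
	return true;
}
```

      A skipped tick only delays the stats update until the next period,
      whereas blocking inside the worker is what made the deadlock possible.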
      
      v2: Rename existing trylock to interruptible rather than trying to
      preserve the existing (confusing) naming scheme (review comments from
      Tvrtko).
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com
  4. 04 Nov 2022: 1 commit
  5. 03 Nov 2022: 1 commit
  6. 31 Oct 2022: 6 commits
  7. 27 Oct 2022: 4 commits
    • i915/i915_gem_context: Remove debug message in i915_gem_context_create_ioctl · 67f99e34
      Karolina Drobnik committed
      We know that as long as the GEM context create ioctl succeeds, a context
      was created. There is no need to log that, especially when such a
      message heavily pollutes dmesg and makes debugging actual errors harder.
      
      Since commit baa89ba3 ("drm/i915/gem: initial conversion to new
      logging macros using coccinelle"), the logging for creating a new user
      context was moved under the driver debug output (for lack of a means for
      per-user logs, and a lack of user-focused drm.debug parameter). This
      only reveals how obnoxious it is to have that spam in the driver debug
      logs, so remove it. [ from Chris Wilson ]
      Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
      Cc: Andi Shyti <andi.shyti@linux.intel.com>
      Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
      Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221025091903.986819-1-karolina.drobnik@intel.com
    • drm/ttm: rework on ttm_resource to use size_t type · e3c92eb4
      Somalapuram Amaranath committed
      Change the ttm_resource structure to store its size in bytes
      (size_t size) instead of num_pages.
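
      With the size kept in bytes, page counts are derived only where they are
      still needed, which is why the changelog adds PFN_UP() calls. A minimal
      sketch assuming 4 KiB pages; struct resource_sketch, pfn_up(),
      resource_num_pages() and the SKETCH_* macros are hypothetical stand-ins
      for the TTM structure and the kernel's PFN_UP() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Round a byte count up to whole pages, like the kernel's PFN_UP(). */
static size_t pfn_up(size_t bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* Hypothetical, trimmed-down resource: the size is kept in bytes. */
struct resource_sketch {
	size_t size; /* replaces a num_pages field */
};

/* Derive the page count only where a caller still needs it. */
static size_t resource_num_pages(const struct resource_sketch *res)
{
	return pfn_up(res->size);
}
```
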
      v1 -> v2: change PFN_UP(dst_mem->size) to ttm->num_pages
      v1 -> v2: change bo->resource->size to bo->base.size at some places
      v1 -> v2: remove the local variable
      v1 -> v2: cleanup cmp_size_smaller_first()
      v2 -> v3: adding missing PFN_UP in ttm_bo_vm_fault_reserved
      Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221027091237.983582-1-Amaranath.Somalapuram@amd.com
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Christian König <christian.koenig@amd.com>
    • drm/i915: stop abusing swiotlb_max_segment · 78a07fe7
      Robert Beckett committed
      swiotlb_max_segment() used to return either the maximum size that
      swiotlb could bounce or, for Xen PV, PAGE_SIZE even if swiotlb could
      bounce-buffer larger mappings. This made i915 on Xen PV work: i915
      bypasses the coherency aspect of the DMA API and can't cope with bounce
      buffering, and returning PAGE_SIZE avoided bounce buffering in the
      Xen PV case.
      
      So instead of adding this hack back, check for Xen/PV directly in i915
      for the Xen case and otherwise use the proper DMA API helper to query
      the maximum mapping size.
      
      Replace swiotlb_max_segment() calls with dma_max_mapping_size().
      In i915_gem_object_get_pages_internal(), no longer apply max_segment
      only when CONFIG_SWIOTLB is enabled; there can be other (IOMMU-related)
      causes of specific max segment sizes.
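
      The resulting segment-size choice can be sketched as below. This is a
      minimal illustration, not the driver code: sg_segment_size() is a
      hypothetical helper, its dma_max parameter stands in for the value
      returned by dma_max_mapping_size(), xen_pv stands in for a Xen PV guest
      check, and 4 KiB pages are assumed. The result is clamped to an
      unsigned int and rounded down to a whole number of pages.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumption: 4 KiB pages */

/* Hypothetical helper: largest scatter-gather segment to build. */
static unsigned int sg_segment_size(size_t dma_max, bool xen_pv)
{
	/* Clamp to what an unsigned int segment length can describe. */
	size_t max = dma_max < UINT_MAX ? dma_max : UINT_MAX;

	/* Xen PV: force page-sized segments so no bounce buffering occurs. */
	if (xen_pv)
		max = SKETCH_PAGE_SIZE;

	/* Round down to a whole number of pages. */
	return (unsigned int)(max - max % SKETCH_PAGE_SIZE);
}
```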
      
      Fixes: a2daa27c ("swiotlb: simplify swiotlb_max_segment")
      Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [hch: added the Xen hack, rewrote the changelog]
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221020110308.1582518-1-hch@lst.de
    • drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis · 83321094
      Matthew Brost committed
      Add a delay, configurable via debugfs (default 34ms), before disabling
      scheduling of a context after its pin count drops to zero. Disabling
      scheduling is costly because it requires synchronizing with the GuC,
      so the delay gives the user a chance to resubmit work before the
      operation is performed. The delay is applied only if the context isn't
      closed and fewer than a given threshold (default 3/4) of the guc_ids
      are in use.
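
      The condition for taking the delayed path can be sketched as a small
      predicate. The 3/4 default comes from the text above; the function and
      parameter names are hypothetical, and the sketch omits the delay timer
      itself and the debugfs knob.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: delay the costly schedule-disable only while the
 * context is still open and fewer than 3/4 of the guc_ids are in use.
 */
static bool should_delay_sched_disable(bool ctx_closed,
				       unsigned int ids_in_use,
				       unsigned int total_ids)
{
	/* Widen before multiplying so large pools cannot overflow. */
	unsigned long long threshold = (unsigned long long)total_ids * 3 / 4;

	assert(ids_in_use <= total_ids);

	if (ctx_closed)
		return false; /* closed contexts are disabled immediately */
	return ids_in_use < threshold;
}
```

      Falling back to an immediate disable when guc_ids run short keeps the
      hysteresis from starving new contexts of IDs.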
      
      Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
      However, no real-world workload with a measured performance impact was
      available to prove the intended results. This series is now being
      republished in response to a real-world workload that benefited greatly
      from it, with a measured performance improvement.
      
      Workload description: 36 containers were created on a DG2 device where
      each container was performing a combination of 720p 3D game rendering
      and 30fps video encoding. The workload density was configured in a way
      that guaranteed each container would ALWAYS be able to render and
      encode at no less than 30fps with a predefined maximum render + encode
      latency. This means that, in total, the 36 containers and their
      workloads were not saturating the engines to their max (in order to
      maintain just enough headroom to meet the min fps and max latencies
      of incoming container submissions).
      
      Problem statement: It was observed that the CPU core processing the i915
      soft IRQ work was experiencing severe load. Using tracelogs and an
      instrumentation patch to count specific i915 IRQ events, it was confirmed
      that the majority of the CPU cycles were caused by the
      gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
      majority of the cycles was determined to be processing a specific G2H
      IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
      by GuC in response to i915 KMD sending H2G requests:
      INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
      whenever a context goes idle so that we can unpin the context from GuC.
      The high CPU utilization was limiting density scaling.
      
      Root Cause Analysis: Because the incoming execution buffers were spread
      across 36 different containers (each with multiple contexts) but the
      system in totality was NOT saturated to the max, it was assumed that each
      context was constantly idling between submissions. This caused
      thrashing: contexts were unpinned from GuC at one moment and quickly
      repinned due to incoming workload the very next moment. These
      event-pairs were being triggered across multiple contexts per container,
      across all containers, at a rate of more than 30 times per second per
      context.
      
      Metrics: When running this workload without this patch, we measured an
      average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
      seconds, or ~10 million over ~25+ mins. With this patch, the count
      dropped to ~480 every 10 seconds, or ~28K over ~10 mins. This is a ~99%
      reduction in the average count per 10 seconds.
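
      The quoted totals are consistent with the per-10-second averages and
      run durations given above; a quick check using only those figures:

```latex
\begin{aligned}
\text{without patch: } & 69\,000 \times \tfrac{25 \times 60\,\mathrm{s}}{10\,\mathrm{s}} = 69\,000 \times 150 \approx 1.0 \times 10^{7} \\
\text{with patch: } & 480 \times \tfrac{10 \times 60\,\mathrm{s}}{10\,\mathrm{s}} = 480 \times 60 = 28\,800 \approx 28\text{K} \\
\text{reduction: } & 1 - \tfrac{480}{69\,000} \approx 1 - 0.007 = 0.993 \approx 99\%
\end{aligned}
```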
      
      Design awareness: Selftest impact.
      As a temporary workaround, disable this feature for the selftests.
      Selftests are very timing sensitive and any change in timing can cause
      failures. A follow-up patch will fix up the selftests to understand
      this delay.
      
      Design awareness: Race between guc_request_alloc and guc_context_close.
      If a context close is issued while there is a request submission in
      flight and a delayed schedule disable is pending, guc_context_close
      and guc_request_alloc will race to cancel the delayed disable.
      To close the race, make sure that guc_request_alloc waits for
      guc_context_close to finish running before checking any state.
      
      Design awareness: GT Reset event.
      If a GT reset is triggered, add an additional preparation step to ensure
      that all contexts with a pending delayed schedule-disable task are
      flushed of it. Move them directly into the closed state after cancelling
      the worker. This is okay because the existing flow flushes all
      yet-to-arrive G2H messages, dropping them anyway.
      Signed-off-by: Matthew Brost <matthew.brost@intel.com>
      Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
      Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
  8. 21 Oct 2022: 2 commits
  9. 19 Oct 2022: 1 commit
  10. 18 Oct 2022: 1 commit
  11. 15 Oct 2022: 1 commit
    • drm/i915: enable PS64 support for DG2 · 8133a6da
      Matthew Auld committed
      It turns out that production DG2/ATS hardware should support PS64.
      This feature allows providing a 64K TLB hint at the PTE level, which is
      a lot more flexible than the current method of enabling 64K GTT pages
      for the entire page table, since that leads to all kinds of annoying
      restrictions, as documented in:
      
      commit caa574ff
      Author: Matthew Auld <matthew.auld@intel.com>
      Date:   Sat Feb 19 00:17:49 2022 +0530
      
          drm/i915/uapi: document behaviour for DG2 64K support
      
          On discrete platforms like DG2, we need to support a minimum page size
          of 64K when dealing with device local-memory. This is quite tricky for
          various reasons, so try to document the new implicit uapi for this.
      
      With PS64, we can now drop the 2M GTT alignment restriction, and instead
      only require 64K or larger when dealing with lmem. We still use the
      compact-pt layout when possible, but only when we are certain that this
      doesn't interfere with userspace.
      
      Note that this is a change in uAPI behaviour, but hopefully shouldn't be
      a concern (IGT is at least able to autodetect the alignment), since we
      are only making the GTT alignment constraint less restrictive.
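
      The relaxed constraint can be sketched as follows. This is a simplified
      illustration, not the i915 logic: min_gtt_alignment() and align_up()
      are hypothetical helpers, treating non-lmem objects as needing only 4K
      alignment is an assumption made for the example, and the compact-pt
      case the driver still uses when safe is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  0x1000ull
#define SZ_64K 0x10000ull
#define SZ_2M  0x200000ull

/*
 * Hypothetical: minimum GTT alignment for an object on DG2. Without PS64
 * the 64K page size applies to a whole page table, hence 2M for lmem;
 * with PS64 the hint is per PTE, so 64K suffices.
 */
static uint64_t min_gtt_alignment(bool is_lmem, bool has_ps64)
{
	if (!is_lmem)
		return SZ_4K; /* assumption for this sketch */
	return has_ps64 ? SZ_64K : SZ_2M;
}

/* Round addr up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}
```

      For an lmem object requested at 0x12345, the smallest usable GTT offset
      moves from 2M down to 64K, which is the loosening the uAPI note refers to.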
      
      Based on a patch from CQ Tang.
      
      v2: update the comment wrt scratch page
      v3: (Nirmoy)
       - Fix the selftest to actually use the random size, plus some comment
         improvements, also drop the rem stuff.
      Reported-by: Michal Mrozek <michal.mrozek@intel.com>
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      Cc: Stuart Summers <stuart.summers@intel.com>
      Cc: Jordan Justen <jordan.l.justen@intel.com>
      Cc: Yang A Shi <yang.a.shi@intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Acked-by: Michal Mrozek <michal.mrozek@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221004114915.221708-1-matthew.auld@intel.com
  12. 12 Oct 2022: 2 commits
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld committed
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
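
      The script's third rule, which rewrites
      ((u64)(E) * get_random_u32() >> 32) into prandom_u32_max(E), shows the
      multiply-and-shift scaling that avoids a division. A minimal sketch of
      that mapping, with the random input passed in explicitly so the result
      is deterministic; scale_u32() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a uniformly random 32-bit value r into [0, ep_ro) by taking the
 * top 32 bits of the 64-bit product r * ep_ro: one multiply and one
 * shift, no division or modulo.
 */
static uint32_t scale_u32(uint32_t r, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)r * ep_ro) >> 32);
}
```

      Like the % form it replaces, this mapping is slightly non-uniform when
      the range is not a power of two; its advantage is avoiding the division.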
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • drm/i915: allow control over the flags when migrating · 695ddc93
      Matthew Auld committed
      In the next patch we want to move the object to the mappable part of
      lmem for some display buffers (if the current resource is not
      compatible). Currently that requires being able to unset the
      I915_BO_ALLOC_GPU_ONLY hint.
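
      Letting the caller pass flags amounts to clearing a hint bit before the
      move. A minimal sketch with hypothetical names and bit values; the real
      I915_BO_ALLOC_* flags and migration helpers differ.

```c
#include <assert.h>

/* Hypothetical flag bits, not the real I915_BO_ALLOC_* values. */
#define BO_ALLOC_CONTIGUOUS (1u << 0)
#define BO_ALLOC_GPU_ONLY   (1u << 1)

/*
 * Compute the flags to use for the migration: start from the object's
 * current flags and drop whichever hints the caller asks to clear.
 */
static unsigned int migrate_flags(unsigned int obj_flags, unsigned int clear)
{
	return obj_flags & ~clear;
}
```
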
      Signed-off-by: Matthew Auld <matthew.auld@intel.com>
      Cc: Jianshui Yu <jianshui.yu@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Nirmoy Das <nirmoy.das@intel.com>
      Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221004131916.233474-3-matthew.auld@intel.com
      (cherry picked from commit 999f4562)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  13. 10 Oct 2022: 2 commits
  14. 06 Oct 2022: 2 commits
  15. 05 Oct 2022: 3 commits
  16. 04 Oct 2022: 1 commit
  17. 01 Oct 2022: 1 commit
    • drm/i915/mtl: enable local stolen memory · dbb2ffbf
      Aravind Iddamsetty committed
      As an integrated GPU, MTL does not have local memory and HAS_LMEM()
      returns false. However, the platform's stolen memory is presented via
      BAR2 (i.e., the BAR we traditionally consider to be the GMADR on IGFX)
      and should be managed by the driver the same way that local memory is
      on dgpu platforms (which includes setting the "lmem" bit on page table
      entries).  We use the term "local stolen memory" to refer to this
      model.
      
      The major difference from the traditional BAR2 (GMADR) is that here the
      stolen area itself is mapped via BAR2, whereas traditionally BAR2 is an
      aperture into the GTT virtual address space through which accesses
      reach the stolen area.
      
      BSPEC: 53098, 63830
      
      v2:
      1. dropped is_dsm_invalid, updated valid_stolen_size check from Lucas
      (Jani, Lucas)
      2. drop lmembar_is_igpu_stolen
      3. revert to referring GFXMEM_BAR as GEN12_LMEM_BAR (Lucas)
      
      v3:(Jani)
      1. rename get_mtl_gms_size to mtl_get_gms_size
      2. define register for MMIO address
      
      v4:(Matt)
      1. Use REG_FIELD_GET to read GMS value
      2. replace the calculations with SZ_256M/SZ_8M
      
      v5: Include more details to commit message on how it is different from
      earlier platforms (Anshuman)
      
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Cc: Lucas De Marchi <lucas.demarchi@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Signed-off-by: CQ Tang <cq.tang@intel.com>
      Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
      Original-author: CQ Tang
      Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
      Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220929114658.145287-1-aravind.iddamsetty@intel.com
  18. 28 Sep 2022: 1 commit
  19. 27 Sep 2022: 3 commits
  20. 26 Sep 2022: 1 commit
  21. 24 Sep 2022: 1 commit
  22. 22 Sep 2022: 2 commits