1. 20 Sep, 2021 2 commits
  2. 19 Sep, 2021 4 commits
  3. 17 Sep, 2021 1 commit
    • kernel/locking: Add context to ww_mutex_trylock() · 12235da8
      Committed by Maarten Lankhorst
      i915 will soon gain an eviction path that trylocks a large number of
      locks for eviction, producing dmesg failures like the one below:
      
        BUG: MAX_LOCK_DEPTH too low!
        turning off the locking correctness validator.
        depth: 48  max: 48!
        48 locks held by i915_selftest/5776:
         #0: ffff888101a79240 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x88/0x160
         #1: ffffc900009778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
         #2: ffff88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
         #3: ffff88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
         #4: ffff88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
         #5: ffff88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
        ...
         #46: ffff88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
         #47: ffff88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
        INFO: lockdep is turned off.
      
      Fixing eviction to nest into ww_class_acquire is a high priority, but
      it requires a rework of the entire driver, which can only be done one
      step at a time.
      
      As an intermediate solution, add an acquire context to
      ww_mutex_trylock, which allows us to do proper nesting annotations on
      the trylocks, making the above lockdep splat disappear.
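
The idea of passing an acquire context to a trylock can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: the `toy_*` types and functions are hypothetical stand-ins invented for illustration, not the kernel's `struct ww_mutex`, `struct ww_acquire_ctx`, or the real `ww_mutex_trylock()` implementation, and the model is single-threaded with no lockdep involved. It shows an eviction-style scan that trylocks every candidate under one context and skips the contended ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's struct ww_acquire_ctx / struct ww_mutex. */
struct toy_ctx {
    unsigned int acquired;   /* number of locks currently held under this context */
};

struct toy_ww_mutex {
    bool locked;
    struct toy_ctx *ctx;     /* context that took the lock, or NULL */
};

/*
 * Try to take the lock without blocking. Returns true on success.
 * ctx may be NULL, which models a context-free trylock.
 */
static bool toy_ww_mutex_trylock(struct toy_ww_mutex *lock, struct toy_ctx *ctx)
{
    if (lock->locked)
        return false;        /* contended: the caller just moves on */
    lock->locked = true;
    lock->ctx = ctx;
    if (ctx)
        ctx->acquired++;     /* the lock is now accounted to the context */
    return true;
}

static void toy_ww_mutex_unlock(struct toy_ww_mutex *lock)
{
    if (lock->ctx)
        lock->ctx->acquired--;
    lock->ctx = NULL;
    lock->locked = false;
}

/*
 * Eviction-style scan: trylock every candidate under a single context
 * and skip the contended ones. Returns how many locks were taken.
 */
static size_t toy_trylock_all(struct toy_ww_mutex *locks, size_t n,
                              struct toy_ctx *ctx)
{
    size_t taken = 0;

    for (size_t i = 0; i < n; i++)
        if (toy_ww_mutex_trylock(&locks[i], ctx))
            taken++;
    return taken;
}
```

In this model every successful trylock is recorded against the shared context, which is the bookkeeping that lets a validator treat the locks as nested under one acquire context rather than as independent held locks.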
      
      This is also useful in regulator_lock_nested, which may avoid dropping
      regulator_nesting_mutex in the uncontended path, so use it there.
      
      TTM may be another user for this, where we could lock a buffer in a
      fastpath with list locks held, without dropping all locks we hold.
      
      [peterz: rework actual ww_mutex_trylock() implementations]
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/YUBGPdDDjKlxAuXJ@hirez.programming.kicks-ass.net
  4. 16 Sep, 2021 1 commit
  5. 15 Sep, 2021 7 commits
  6. 14 Sep, 2021 23 commits
  7. 11 Sep, 2021 1 commit
  8. 10 Sep, 2021 1 commit
    • drm/i915: Use Transparent Hugepages when IOMMU is enabled · 74388ca4
      Committed by Tvrtko Ursulin
      Usage of Transparent Hugepages was disabled in 9987da4b
      ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
      appears the majority of performance regressions reported with an enabled
      IOMMU can be almost eliminated by turning them on, let's just do that.
      
      To err on the side of safety we keep the current default in cases where
      IOMMU is not active, and default to the "huge=within_size" mode only when
      it is active. Although there would probably be wins from enabling them
      throughout, more extensive testing across benchmarks and platforms would
      need to be done.
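
The default selection described above can be sketched as a small decision helper. This is a minimal userspace illustration, not the driver's code: `pick_huge_option()` and its `thp_available` parameter are hypothetical names introduced here, and the assumption that THP availability is checked alongside the IOMMU state is ours. Only the "huge=within_size" string and the "IOMMU active" condition come from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the mount option string to request,
 * or NULL to keep the existing default (no huge-page option).
 */
static const char *pick_huge_option(bool iommu_active, bool thp_available)
{
    /* Only opt in to THP when the IOMMU is active and THP can be used. */
    if (iommu_active && thp_available)
        return "huge=within_size";

    /* IOMMU inactive (or THP unavailable): leave the default untouched. */
    return NULL;
}
```

Keeping the decision in one helper means the conservative default stays in a single place if later testing justifies enabling THP more widely.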
      
      With the patch and IOMMU enabled my local testing on a small Skylake part
      shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
      IOMMU off) to ~2% (same comparison but with THP on).
      
      More detailed testing done in the below referenced Gitlab issue by Eero:
      
      Skylake GT4e:
      
      Performance drops from enabling IOMMU:
      
          30-35% SynMark CSDof
          20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
          ~20% GLB Egypt  (1/2 screen window)
          10-15% GLB T-Rex (1/2 screen window)
          8-10% GfxBench T-Rex, MemBW GPU blit
          7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
          6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
          5-6% GfxBench CarChase, Unigine Valley
          3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
               SynMark Fill*, Deferred, TerrainPan*
          1-2% Most of the other tests
      
      With the patch drops become:
      
          20-25% SynMark TexMem*
          15-20% GLB Egypt (1/2 screen window)
          10-15% GLB T-Rex (1/2 screen window)
          4-7% GfxBench T-Rex, GpuTest Triangle
          1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
          3% GfxBench Manhattan 3.0, SynMark CSDof
          2-3% Unigine Heaven + Valley, MemBW GPU texture
          1-3% GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins
      
      Broxton:
      
      Performance drops from IOMMU, without patch:
      
          30% MemBW GPU write
          25% SynMark ZBuffer + Fill*
          20% MemBW GPU blit
          15% MemBW GPU blend, GpuTest Triangle
          10-15% MemBW GPU texture
          10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
          7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
               SynMark Deferred* + TexMem*
          6-8% GfxBench CarChase, Unigine Valley,
               SynMark CSCloth + ShMapVsm + TerrainPan*
          5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
               SynMark CSDof + TexFilterTri
          2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
               TexFilterAniso, GpuTest GiMark + 32-bit Julia
      
      And with patch:
      
          15-20% MemBW GPU texture
          10% SynMark TexMem*
          8-9% GLB Egypt (1/2 screen window)
          4-5% GLB T-Rex (1/2 screen window)
          3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
               SynMark Deferred + TexFilterTri
          3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
          2-4% GpuTest Triangle, SynMark DeferredAA
          2-3% Unigine Heaven + Valley
          1-3% SynMark Terrain*
          1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer
      
      Tigerlake-H:
      
          20-25% MemBW GPU texture
          15-20% GpuTest Triangle
          13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
          8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
          6-7% GfxBench Manhattan 3.0, SynMark TexMem*
          4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
               offscreen versions of T-Rex/Egypt)
          4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
               SynMark CSDof + DrvState
          3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
          1-7% Media tests
          2-3% MemBW GPU blit
          1-3% Most of the rest of 3D tests
      
      With the patch:
      
          6-8% MemBW GPU blend => the only regression in these tests (compared
               to IOMMU without THP)
          4-6% SynMark DrvState (not impacted) + HdrBloom (improved)
          3-4% GLB T-Rex
          ~3% GLB Egypt, SynMark DrvRes
          1-3% GfxBench T-Rex + Egypt, SynMark TexFilterTri
          1-2% GfxBench CarChase + GLES AztecRuins, Unigine Valley,
               GpuTest Triangle
          ~1% GfxBench Manhattan 3.0/3.1, Unigine Heaven
      
      Perf of several tests actually improved with IOMMU + THP, compared to no
      IOMMU / no THP:
      
          10-15% SynMark Batch[0-3]
          5-10% MemBW GPU texture, SynMark ShMapVsm
          3-4% SynMark Fill* + Geom*
          2-3% SynMark TexMem512 + CSCloth
          1-2% SynMark TexMem128 + DeferredAA
      
      As a summary across all platforms, these are the benchmarks where enabling
      THP on top of IOMMU enabled brings regressions:
      
       * Skylake GT4e:
         20-25% SynMark TexMem*
         (whereas all MemBW GPU tests either improve or are not affected)
      
       * Broxton J4205:
         7% MemBW GPU texture
         2-3% SynMark TexMem*
      
       * Tigerlake-H:
         7% MemBW GPU blend
      
      Other benchmarks show either lowering of regressions or improvements.
      
      v2:
       * Add Kconfig dependency to transparent hugepages and some help text.
       * Move to helper for easier handling of kernel build options.
      
      v3:
       * Drop Kconfig. (Daniel)
      
      v4:
       * Add some benchmark results to commit message.
      
      v5:
       * Add explicit regression summary to commit message. (Eero)
      
      References: b901bb89 ("drm/i915/gemfs: enable THP")
      References: 9987da4b ("drm/i915: Disable THP until we have a GPU read BW W/A")
      References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
      Co-developed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210909114448.508493-1-tvrtko.ursulin@linux.intel.com