1. 17 February 2017, 2 commits
  2. 15 February 2017, 2 commits
  3. 14 February 2017, 1 commit
    • drm/i915: Emit to ringbuffer directly · 73dec95e
      Authored by Tvrtko Ursulin
      This removes the usage of intel_ring_emit in favour of
      directly writing to the ring buffer.
      
      intel_ring_emit was preventing the compiler from optimising
      the fetch and increment of the current ring buffer pointer,
      and therefore generated very verbose code for every write.
      
      It had no useful purpose since all ringbuffer operations
      begin and end with intel_ring_begin and intel_ring_advance
      respectively, with no possible bail-out in the middle, so it
      is fine to increment the tail in intel_ring_begin and let the
      code manage the pointer itself.
      
      Useless instruction removal amounts to approximately
      two and a half kilobytes of saved text on my build.
      
      Not sure if this has any measurable performance
      implications but executing a ton of useless instructions
      on fast paths cannot be good.
      
      v2:
       * Change return from intel_ring_begin to error pointer by
         popular demand.
       * Move tail increment to intel_ring_advance to enable some
         error checking.
      
      v3:
       * Move tail advance back into intel_ring_begin.
       * Rebase and tidy.
      
      v4:
       * Complete rebase after a few months since v3.
      
      v5:
       * Remove unnecessary cast and fix !debug compile. (Chris Wilson)
      
      v6:
       * Make intel_ring_offset take request as well.
       * Fix recording of request postfix plus a sprinkle of asserts.
         (Chris Wilson)
      
      v7:
       * Use intel_ring_offset to get the postfix. (Chris Wilson)
       * Convert GVT code as well.
      
      v8:
       * Rename *out++ to *cs++.
      
      v9:
       * Fix the out-to-cs conversion in GVT.
      
      v10:
       * Rebase for new intel_ring_begin in selftests.
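      The scheme described above can be sketched in plain, userspace C. This is a minimal model, not the real i915 API: the names, ring size, and command dwords below are illustrative stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the new emit scheme; names are illustrative. */
#define RING_SIZE 64

static uint32_t ring[RING_SIZE];
static unsigned int ring_tail;

/* ring_begin reserves space, bumps the tail up front, and returns a
 * pointer the caller writes through directly (the kernel version
 * returns an ERR_PTR on failure; NULL stands in for that here). */
static uint32_t *ring_begin(unsigned int num_dwords)
{
    if (ring_tail + num_dwords > RING_SIZE)
        return NULL;
    uint32_t *cs = &ring[ring_tail];
    ring_tail += num_dwords;         /* tail incremented up front */
    return cs;
}

/* ring_advance is reduced to a sanity check that the caller wrote
 * exactly as many dwords as it reserved. */
static void ring_advance(const uint32_t *cs)
{
    assert(cs == &ring[ring_tail]);
}

/* Emit a 4-dword command the new way: the compiler can keep 'cs' in
 * a register and fold the post-increments, instead of re-fetching the
 * tail on every intel_ring_emit call. Returns the new tail. */
static unsigned int emit_example(void)
{
    uint32_t *cs = ring_begin(4);

    if (!cs)
        return 0;
    *cs++ = 0x02000000;              /* illustrative command dword */
    *cs++ = 0xdeadbeef;
    *cs++ = 0;
    *cs++ = 0;
    ring_advance(cs);
    return ring_tail;
}
```

      Because the tail moves in ring_begin, ring_advance can cheaply verify that exactly the reserved number of dwords was written, which is the error checking the version notes refer to.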
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Zhi Wang <zhi.a.wang@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
      73dec95e
  4. 11 February 2017, 1 commit
  5. 10 February 2017, 1 commit
  6. 08 February 2017, 1 commit
    • drm/i915: Restore context and pd for ringbuffer submission after reset · c0dcb203
      Authored by Chris Wilson
      Following a reset, the context and page directory registers are lost.
      However, the queue of requests that we resubmit after the reset may
      depend upon them - the registers are restored from a context image, but
      that restore may be inhibited and may simply be absent from the request
      if it was in the middle of a sequence using the same context. If we
      prime the CCID/PD registers with the first request in the queue (even
      for the hung request), we prevent invalid memory access for the
      following requests (and continually hung engines).
      
      v2: Magic BIT(8), reserved for future use but still appears unused.
      v3: Some commentary on handling innocent vs guilty requests
      v4: Add a wait for PD_BASE fetch. The reload appears to be instant on my
      Ivybridge, but this bit probably exists for a reason.
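      The priming described above can be modelled in a few lines of C. This is a toy sketch, not the real i915 reset path: the structures, fields, and register stand-ins are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: after a reset the context (CCID) and page-directory
 * registers are lost, so they are primed from the first request in
 * the queue -- even a hung one -- before resubmission. */
struct request {
    int ctx_id;            /* stand-in for the CCID contents */
    int pd_base;           /* stand-in for the PD base register */
    struct request *next;
};

struct engine {
    int ccid;
    int pd_base;
    struct request *queue; /* requests pending resubmission */
};

static void engine_reset(struct engine *e)
{
    e->ccid = 0;           /* registers lost across reset */
    e->pd_base = 0;
}

static void engine_restart(struct engine *e)
{
    struct request *rq = e->queue;

    if (!rq)
        return;
    /* Prime CCID/PD from the head of the queue so requests whose own
     * context restore is inhibited do not run with invalid state. */
    e->ccid = rq->ctx_id;
    e->pd_base = rq->pd_base;
}
```

      The point of using the head of the queue even for the hung request is that every following request then sees valid context and page-directory state.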
      
      Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170207152437.4252-1-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      c0dcb203
  7. 07 February 2017, 1 commit
  8. 30 January 2017, 1 commit
  9. 19 January 2017, 1 commit
  10. 18 January 2017, 1 commit
    • drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. · 4fc020d8
      Authored by Francisco Jerez
      The WaDisableLSQCROPERFforOCL workaround has the side effect of
      disabling an L3SQ optimization that has huge performance implications
      and is unlikely to be necessary for the correct functioning of usual
      graphic workloads.  Userspace is free to re-enable the workaround on
      demand, and is generally in a better position to determine whether the
      workaround is necessary than the DRM is (e.g. only during the
      execution of compute kernels that rely on both L3 fences and HDC R/W
      requests).
      
      The same workaround seems to apply to BDW (at least to production
      stepping G1) and SKL as well (the internal workaround database claims
      that it does for all steppings, while the BSpec workaround table only
      mentions pre-production steppings), but the DRM doesn't do anything
      beyond whitelisting the L3SQCREG4 register so userspace can enable it
      when it sees fit.  Do the same on KBL platforms.
      
      Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
      and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
      This is followed by a regression of 35% and 10% respectively for the
      same benchmarks and platform caused by my recent patch series
      switching userspace to use the dataport constant cache instead of the
      sampler to implement uniform pull constant loads, which caused us to
      hit more heavily the L3 cache (and on platforms other than KBL had the
      opposite effect of improving performance of the same two benchmarks).
      The overall effect on KBL of this change combined with the recent
      userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
      was affected by the constant cache changes (though it improved as it
      did on other platforms rather than regressing), but is not
      significantly affected by this patch (with statistical significance of
      5% and sample size 20).
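      The whitelisting approach the commit relies on can be sketched as follows. This is a toy model, not the real i915 workaround code: the slot count is an assumption, and the 0xb118 offset for L3SQCREG4 is illustrative and should be checked against the hardware documentation.

```c
#include <assert.h>

/* Toy model of register whitelisting: instead of the kernel setting
 * the workaround bit itself, it adds the register to the engine's
 * whitelist so an unprivileged userspace batch may write it on
 * demand. Offset and slot count are illustrative. */
#define L3SQCREG4     0xb118
#define MAX_WHITELIST 12

static unsigned int whitelist[MAX_WHITELIST];
static unsigned int whitelist_count;

static int whitelist_reg(unsigned int offset)
{
    if (whitelist_count >= MAX_WHITELIST)
        return -1;                      /* no free whitelist slot */
    whitelist[whitelist_count++] = offset;
    return 0;
}

static int reg_is_whitelisted(unsigned int offset)
{
    for (unsigned int i = 0; i < whitelist_count; i++)
        if (whitelist[i] == offset)
            return 1;
    return 0;
}
```

      With the register whitelisted, a compute stack such as Beignet can re-enable the workaround only around the kernels that actually need it, rather than paying the L3SQ penalty globally.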
      
      v2: Drop some more code to avoid unused variable warning.
      
      Fixes: 738fa1b3 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
      Signed-off-by: Francisco Jerez <currojerez@riseup.net>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: beignet@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.7+
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      [Removed double Fixes tag]
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
      (cherry picked from commit 8726f2fa)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      4fc020d8
  11. 12 January 2017, 1 commit
    • drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. · 8726f2fa
      Authored by Francisco Jerez
      The WaDisableLSQCROPERFforOCL workaround has the side effect of
      disabling an L3SQ optimization that has huge performance implications
      and is unlikely to be necessary for the correct functioning of usual
      graphic workloads.  Userspace is free to re-enable the workaround on
      demand, and is generally in a better position to determine whether the
      workaround is necessary than the DRM is (e.g. only during the
      execution of compute kernels that rely on both L3 fences and HDC R/W
      requests).
      
      The same workaround seems to apply to BDW (at least to production
      stepping G1) and SKL as well (the internal workaround database claims
      that it does for all steppings, while the BSpec workaround table only
      mentions pre-production steppings), but the DRM doesn't do anything
      beyond whitelisting the L3SQCREG4 register so userspace can enable it
      when it sees fit.  Do the same on KBL platforms.
      
      Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
      and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
      This is followed by a regression of 35% and 10% respectively for the
      same benchmarks and platform caused by my recent patch series
      switching userspace to use the dataport constant cache instead of the
      sampler to implement uniform pull constant loads, which caused us to
      hit more heavily the L3 cache (and on platforms other than KBL had the
      opposite effect of improving performance of the same two benchmarks).
      The overall effect on KBL of this change combined with the recent
      userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
      was affected by the constant cache changes (though it improved as it
      did on other platforms rather than regressing), but is not
      significantly affected by this patch (with statistical significance of
      5% and sample size 20).
      
      v2: Drop some more code to avoid unused variable warning.
      
      Fixes: 738fa1b3 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
      Signed-off-by: Francisco Jerez <currojerez@riseup.net>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: beignet@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.7+
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      [Removed double Fixes tag]
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
      8726f2fa
  12. 11 January 2017, 1 commit
  13. 07 January 2017, 1 commit
  14. 24 December 2016, 1 commit
  15. 19 December 2016, 2 commits
    • drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc · f73e7399
      Authored by Chris Wilson
      A fairly trivial move of a matching pair of routines (for preparing a
      request for construction) onto an engine vfunc. The ulterior motive is
      to be able to create a mock request implementation.
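      The vfunc pattern in question can be sketched in a few lines. The names below are illustrative, not the real i915 signatures; the point is only that an `if (enable_execlists)` branch becomes an engine function pointer a mock engine can override.

```c
#include <assert.h>

/* Toy model of the vfunc swap; names are illustrative. */
struct request { int prepared; };

struct engine {
    int (*request_alloc)(struct request *rq);
};

static int real_request_alloc(struct request *rq)
{
    rq->prepared = 1;   /* e.g. reserve ring space, pin the context */
    return 0;
}

static int mock_request_alloc(struct request *rq)
{
    rq->prepared = 2;   /* no hardware touched, for unit tests */
    return 0;
}

/* The caller no longer branches on enable_execlists; it just calls
 * through the engine's vfunc. */
static int request_alloc(struct engine *e, struct request *rq)
{
    return e->request_alloc(rq);
}
```

      A mock engine simply installs mock_request_alloc, which is the ulterior motive the commit message names.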
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
      f73e7399
    • drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f
      Authored by Chris Wilson
      The requests conversion introduced a nasty bug where we could generate a
      new request in the middle of constructing a request if we needed to idle
      the system in order to evict space for a context. The request to idle
      would be executed (and waited upon) before the current one,
      creating minor havoc in the seqno accounting, as we will
      consider the current
      request to already be completed (prior to deferred seqno assignment) but
      ring->last_retired_head would have been updated and still could allow
      us to overwrite the current request before execution.
      
      We also employed two different mechanisms to track the active context
      until it was switched out. The legacy method allowed for waiting upon an
      active context (it could forcibly evict any vma, including the
      context's),
      but the execlists method took a step backwards by pinning the vma for
      the entire active lifespan of the context (the only way to evict was to
      idle the entire GPU, not individual contexts). However, to circumvent
      the tricky issue of locking (i.e. we cannot take struct_mutex at the
      time of i915_gem_request_submit(), where we would want to move the
      previous context onto the active tracker and unpin it), we take the
      execlists approach and keep the contexts pinned until retirement.
      The benefit of the execlists approach, more important for execlists than
      legacy, was the reduction in work in pinning the context for each
      request - as the context was kept pinned until idle, it could short
      circuit the pinning for all active contexts.
      
      We introduce new engine vfuncs to pin and unpin the context
      respectively. The context is pinned at the start of the request, and
      only unpinned when the following request is retired (this ensures that
      the context is idle and coherent in main memory before we unpin it). We
      move the engine->last_context tracking into the retirement itself
      (rather than during request submission) in order to allow the submission
      to be reordered or unwound without undue difficulty.
      
      And finally an ulterior motive for unifying context handling was to
      prepare for mock requests.
      
      v2: Rename to last_retired_context, split out legacy_context tracking
      for MI_SET_CONTEXT.
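      The pin-until-next-retirement rule can be modelled compactly. This is a toy sketch with illustrative names, not the real i915 structures: the context is pinned at request construction and only unpinned when the following request retires, so it is idle and coherent in memory first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the unified active-context tracking. */
struct context { int pin_count; };

struct engine { struct context *last_retired_context; };

static void context_pin(struct context *ctx)   { ctx->pin_count++; }
static void context_unpin(struct context *ctx) { ctx->pin_count--; }

/* Request construction: pin the context the request runs in. */
static void request_construct(struct context *ctx)
{
    context_pin(ctx);
}

/* Request retirement: the previously retired context is now known to
 * be idle, so its pin can finally be dropped; the retiring request's
 * context takes its place as last_retired_context. */
static void request_retire(struct engine *e, struct context *ctx)
{
    if (e->last_retired_context)
        context_unpin(e->last_retired_context);
    e->last_retired_context = ctx;
}
```

      Tracking last_retired_context in the retirement path, rather than at submission, is what lets submission be reordered or unwound freely.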
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
      e8a9c58f
  16. 07 December 2016, 1 commit
  17. 02 December 2016, 2 commits
  18. 15 November 2016, 1 commit
  19. 01 November 2016, 1 commit
  20. 29 October 2016, 9 commits
  21. 25 October 2016, 1 commit
    • drm/i915: Add low level set of routines for programming PM IER/IIR/IMR register set · f4e9af4f
      Authored by Akash Goel
      So far the PM IER/IIR/IMR registers were being used only for
      Turbo related interrupts, but interrupts coming from the GuC
      also use the same set. As a precursor to supporting GuC
      interrupts, add new low level routines to allow sharing the
      programming of the PM IER/IIR/IMR registers between Turbo & GuC.
      Also, similar to PM IMR, maintain a bitmask for the PM IER
      register to allow easy sharing of it between Turbo & GuC without
      involving an rmw operation.
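      The cached-mask idea can be shown in miniature. This is a toy model with made-up bit values, not the real i915 interrupt code: a software copy of the enable mask lets Turbo and GuC each flip their own bits with a single register write, avoiding a read-modify-write of the hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit assignments, not real hardware values. */
#define PM_RPS_BITS 0x00000070u  /* stand-in for the Turbo bits */
#define PM_GUC_BITS 0x00000800u  /* stand-in for the GuC bits */

static uint32_t pm_ier;   /* cached copy of the enable register */
static uint32_t hw_ier;   /* stand-in for the MMIO register */

static void pm_irq_enable(uint32_t bits)
{
    pm_ier |= bits;
    hw_ier = pm_ier;      /* one posted write, no read-back */
}

static void pm_irq_disable(uint32_t bits)
{
    pm_ier &= ~bits;
    hw_ier = pm_ier;      /* other clients' bits are preserved */
}
```

      Because the cache holds the full register contents, disabling one client's interrupts never disturbs the other's, which is exactly the sharing problem the new routines solve.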
      
      v2:
      - For appropriateness and to avoid ambiguity, rename the old
        functions enable/disable pm_irq to mask/unmask pm_irq, and
        rename the new functions enable/disable pm_interrupts to
        enable/disable pm_irq. (Tvrtko)
      - Use u32 in place of uint32_t. (Tvrtko)
      
      v3:
      - Rename the fields pm_irq_mask & pm_ier_mask and do some cleanup. (Chris)
      - Rebase.
      
      v4: Fix the inadvertent disabling of User interrupt for VECS ring causing
          failure for certain IGTs.
      
      v5: Use dev_priv with HAS_VEBOX macro. (Tvrtko)
      Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      f4e9af4f
  22. 21 October 2016, 1 commit
  23. 14 October 2016, 1 commit
    • drm/i915: Allocate intel_engine_cs structure only for the enabled engines · 3b3f1650
      Authored by Akash Goel
      With the possibility of many more rings being added in the future,
      the drm_i915_private structure could bloat, as an array of type
      intel_engine_cs is embedded inside it:
      	struct intel_engine_cs engine[I915_NUM_ENGINES];
      Though this is still fine, as generally only a single instance of
      the drm_i915_private structure is used, not all of the possible
      rings are enabled or active on most platforms. Some memory can be
      saved by allocating an intel_engine_cs structure only for the
      enabled/active engines.
      Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
      indexed using the enums defined in intel_engine_id.
      To save memory and continue using the static engine/ring IDs, 'engine' is
      defined as an array of pointers.
      	struct intel_engine_cs *engine[I915_NUM_ENGINES];
      dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
      
      There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
      i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
      
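      The conversion can be sketched as follows. The names echo the commit message but the code is an illustrative model, not the real i915 definitions; the engine list and iteration macro are simplified assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: 'engine' is an array of pointers, populated only for
 * engines present on the platform; iteration skips NULL slots. */
enum engine_id { RCS = 0, BCS, VCS, VECS, NUM_ENGINES };

struct engine { enum engine_id id; };

struct drm_priv {
    struct engine *engine[NUM_ENGINES]; /* NULL when engine absent */
};

/* The iterator takes a local id variable from the caller, and the
 * NULL check replaces the old intel_engine_initialized() helper. */
#define for_each_engine(e, priv, id) \
    for ((id) = 0; (id) < NUM_ENGINES; (id)++) \
        if (((e) = (priv)->engine[(id)]) != NULL)

static int count_engines(struct drm_priv *priv)
{
    struct engine *e;
    enum engine_id id;
    int n = 0;

    for_each_engine(e, priv, id)
        n++;
    return n;
}
```

      Static engine IDs are preserved as array indices, so dev_priv->engine[VCS] either yields the engine or NULL on platforms without it.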
      v2:
      - Remove the engine iterator field added in drm_i915_private structure,
        instead pass a local iterator variable to the for_each_engine**
        macros. (Chris)
      - Do away with intel_engine_initialized() and instead directly use the
        NULL pointer check on engine pointer. (Chris)
      
      v3:
      - Remove for_each_engine_id() macro, as the updated macro for_each_engine()
        can be used in place of it. (Chris)
      - Protect the access to Render engine Fault register with a NULL check, as
        engine specific init is done later in Driver load sequence.
      
      v4:
      - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
      - Kill the superfluous init_engine_lists().
      
      v5:
      - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
        allocation of intel_engine_cs structure. (Chris)
      
      v6:
      - Rebase.
      
      v7:
      - Optimize the for_each_engine_masked() macro. (Chris)
      - Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
      - Rebase.
      
      v8: Rebase.
      
      v9: Rebase.
      
      v10:
      - For index calculation use engine ID instead of pointer based arithmetic in
        intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
      - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
      - Use for_each_engine macro for cleanup in intel_engines_init() and remove
        check for NULL engine pointer in cleanup() routines. (Joonas)
      
      v11: Rebase.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
      3b3f1650
  24. 10 October 2016, 1 commit
  25. 07 October 2016, 1 commit
  26. 05 October 2016, 2 commits
  27. 26 September 2016, 1 commit