1. 01 2月, 2017 2 次提交
  2. 30 1月, 2017 2 次提交
  3. 25 1月, 2017 2 次提交
  4. 24 1月, 2017 4 次提交
  5. 23 1月, 2017 6 次提交
  6. 19 1月, 2017 1 次提交
  7. 18 1月, 2017 1 次提交
    • F
      drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. · 4fc020d8
      Francisco Jerez 提交于
      The WaDisableLSQCROPERFforOCL workaround has the side effect of
      disabling an L3SQ optimization that has huge performance implications
      and is unlikely to be necessary for the correct functioning of usual
      graphic workloads.  Userspace is free to re-enable the workaround on
      demand, and is generally in a better position to determine whether the
      workaround is necessary than the DRM is (e.g. only during the
      execution of compute kernels that rely on both L3 fences and HDC R/W
      requests).
      
      The same workaround seems to apply to BDW (at least to production
      stepping G1) and SKL as well (the internal workaround database claims
      that it does for all steppings, while the BSpec workaround table only
      mentions pre-production steppings), but the DRM doesn't do anything
      beyond whitelisting the L3SQCREG4 register so userspace can enable it
      when it sees fit.  Do the same on KBL platforms.
      
      Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
      and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
      This is followed by a regression of 35% and 10% respectively for the
      same benchmarks and platform caused by my recent patch series
      switching userspace to use the dataport constant cache instead of the
      sampler to implement uniform pull constant loads, which caused us to
      hit more heavily the L3 cache (and on platforms other than KBL had the
      opposite effect of improving performance of the same two benchmarks).
      The overall effect on KBL of this change combined with the recent
      userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
      was affected by the constant cache changes (though it improved as it
      did on other platforms rather than regressing), but is not
      significantly affected by this patch (with statistical significance of
      5% and sample size 20).
      
      v2: Drop some more code to avoid unused variable warning.
      
      Fixes: 738fa1b3 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: beignet@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.7+
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      [Removed double Fixes tag]
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
      (cherry picked from commit 8726f2fa)
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      4fc020d8
  8. 13 1月, 2017 1 次提交
  9. 12 1月, 2017 2 次提交
    • F
      drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. · 8726f2fa
      Francisco Jerez 提交于
      The WaDisableLSQCROPERFforOCL workaround has the side effect of
      disabling an L3SQ optimization that has huge performance implications
      and is unlikely to be necessary for the correct functioning of usual
      graphic workloads.  Userspace is free to re-enable the workaround on
      demand, and is generally in a better position to determine whether the
      workaround is necessary than the DRM is (e.g. only during the
      execution of compute kernels that rely on both L3 fences and HDC R/W
      requests).
      
      The same workaround seems to apply to BDW (at least to production
      stepping G1) and SKL as well (the internal workaround database claims
      that it does for all steppings, while the BSpec workaround table only
      mentions pre-production steppings), but the DRM doesn't do anything
      beyond whitelisting the L3SQCREG4 register so userspace can enable it
      when it sees fit.  Do the same on KBL platforms.
      
      Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
      and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
      This is followed by a regression of 35% and 10% respectively for the
      same benchmarks and platform caused by my recent patch series
      switching userspace to use the dataport constant cache instead of the
      sampler to implement uniform pull constant loads, which caused us to
      hit more heavily the L3 cache (and on platforms other than KBL had the
      opposite effect of improving performance of the same two benchmarks).
      The overall effect on KBL of this change combined with the recent
      userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
      was affected by the constant cache changes (though it improved as it
      did on other platforms rather than regressing), but is not
      significantly affected by this patch (with statistical significance of
      5% and sample size 20).
      
      v2: Drop some more code to avoid unused variable warning.
      
      Fixes: 738fa1b3 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: beignet@lists.freedesktop.org
      Cc: <stable@vger.kernel.org> # v4.7+
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      [Removed double Fixes tag]
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
      8726f2fa
    • Z
      drm/i915: check ppgtt validity when init reg state · 34869776
      Zhenyu Wang 提交于
      Check if ppgtt is valid for context when init reg state. For gvt
      context which has no i915 allocated ppgtt, failed to check that
      would cause kernel null ptr reference error.
      
      v2: remove !48bit ppgtt case as we'll always update before submit (Chris)
      Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170109131453.3943-1-zhenyuw@linux.intel.com
      34869776
  10. 11 1月, 2017 1 次提交
  11. 07 1月, 2017 1 次提交
  12. 05 1月, 2017 2 次提交
  13. 31 12月, 2016 1 次提交
  14. 24 12月, 2016 2 次提交
  15. 20 12月, 2016 1 次提交
  16. 19 12月, 2016 4 次提交
  17. 16 12月, 2016 1 次提交
  18. 06 12月, 2016 1 次提交
  19. 02 12月, 2016 1 次提交
  20. 29 11月, 2016 1 次提交
  21. 17 11月, 2016 2 次提交
  22. 15 11月, 2016 1 次提交
    • C
      drm/i915/scheduler: Execute requests in order of priorities · 20311bd3
      Chris Wilson 提交于
      Track the priority of each request and use it to determine the order in
      which we submit requests to the hardware via execlists.
      
      The priority of the request is determined by the user (eventually via
      the context) but may be overridden at any time by the driver. When we set
      the priority of the request, we bump the priority of all of its
      dependencies to match - so that a high priority drawing operation is not
      stuck behind a background task.
      
      When the request is ready to execute (i.e. we have signaled the submit
      fence following completion of all its dependencies, including third
      party fences), we put the request into a priority sorted rbtree to be
      submitted to the hardware. If the request is higher priority than all
      pending requests, it will be submitted on the next context-switch
      interrupt as soon as the hardware has completed the current request. We
      do not currently preempt any current execution to immediately run a very
      high priority request, at least not yet.
      
      One more limitation, is that this is first implementation is for
      execlists only so currently limited to gen8/gen9.
      
      v2: Replace recursive priority inheritance bumping with an iterative
      depth-first search list.
      v3: list_next_entry() for walking lists
      v4: Explain how the dfs solves the recursion problem with PI.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
      20311bd3