提交 · 5d12fcef4ef15d00ae3228cc1ed80d2d81bc2c06 · openanolis / cloud-kernel

19 1月, 2017 1 次提交

drm/i915: Remove i915_vma_create from VMA API · a01cb37a

由 Chris Wilson 提交于 1月 16, 2017

With the introduce of i915_vma_instance() for obtaining the VMA
singleton for a (obj, vm, view) tuple, we can remove the
i915_vma_create() in favour of a single entry point. We do incur a
lookup onto an empty tree, but the i915_vma_create() were being called
infrequently and during initialisation, so the small overhead is
negligible.

v2: Drop the i915_ prefix from the now static vma_create() function
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170116152131.18089-4-chris@chris-wilson.co.uk

a01cb37a

13 1月, 2017 1 次提交

drm/i915: Invalidate the guc ggtt TLB upon insertion · 7c3f86b6

由 Chris Wilson 提交于 1月 12, 2017

Move the GuC invalidation of its ggtt TLB to where we perform the ggtt
modification rather than proliferate it into all the callers of the
insert (which may or may not in fact have to do the insertion).

v2: Just do the guc invalidate unconditionally, (afaict) it has no impact
without the guc loaded on gen8+
v3: Conditionally invalidate the guc - just in case that register has
not been validated for other modes.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170112110050.25333-1-chris@chris-wilson.co.uk

7c3f86b6

12 1月, 2017 2 次提交

drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround. · 8726f2fa

由 Francisco Jerez 提交于 1月 12, 2017

The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads.  Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).

The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit.  Do the same on KBL platforms.

Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).

v2: Drop some more code to avoid unused variable warning.

Fixes: 738fa1b3 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
[Removed double Fixes tag]
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com

8726f2fa

drm/i915: check ppgtt validity when init reg state · 34869776

由 Zhenyu Wang 提交于 1月 09, 2017

Check if ppgtt is valid for context when init reg state. For gvt
context which has no i915 allocated ppgtt, failed to check that
would cause kernel null ptr reference error.

v2: remove !48bit ppgtt case as we'll always update before submit (Chris)
Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109131453.3943-1-zhenyuw@linux.intel.com

34869776

11 1月, 2017 1 次提交

drm/i915: Replace 4096 with PAGE_SIZE or I915_GTT_PAGE_SIZE · f51455d4

由 Chris Wilson 提交于 1月 10, 2017

Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.

v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>

f51455d4

07 1月, 2017 1 次提交

drm/i915: Simplify testing for am-I-the-kernel-context? · 984ff29f

由 Chris Wilson 提交于 1月 06, 2017

The kernel context (dev_priv->kernel_context) is unique in that it is
not associated with any user filp - it is the only one with
ctx->file_priv == NULL. This is a simpler test than comparing it against
dev_priv->kernel_context which involves some pointer dancing.

In checking that this is true, we notice that the gvt context is
allocating itself a i915_hw_ppgtt it doesn't use and not flagging that
its file_priv should be invalid.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170106152013.24684-5-chris@chris-wilson.co.uk

984ff29f

05 1月, 2017 2 次提交

drm/i915/execlists: Reorder execlists register enabling · f3b8f912

由 Chris Wilson 提交于 1月 05, 2017

Empirically we restart following a GPU reset more successfully if we call
lrc_init_hws() (which contains a posting read) last. (The failure mode
that was observed was that breadcrumb writes into the HWS from the
recovered requests went astray leading to the context-switch maintaining
forward progress, but the requests not being retired/completed.)

For clarity, lrc_init_hws() is inlined (and the unused function then
removed).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-3-chris@chris-wilson.co.uk

f3b8f912

drm/i915: Assert that we do create the deferred context · 56f6e0a7

由 Chris Wilson 提交于 1月 05, 2017

In order to convince static analyzers that the allocation function
returns an error or sets ce->state, assert that it is set afterwards.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170105153023.30575-2-chris@chris-wilson.co.uk

56f6e0a7

31 12月, 2016 1 次提交

drm/i915: Complete kerneldoc for struct i915_gem_context · 6095868a

由 Chris Wilson 提交于 12月 31, 2016

The existing kerneldoc was outdated, so time for a refresh.

v2: Use single line kdoc, mention functions for manipulation
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161231112012.29263-3-chris@chris-wilson.co.uk

6095868a

24 12月, 2016 2 次提交

drm/i915: re-use computed offset bias for context pin · feef2a7c

由 Daniele Ceraolo Spurio 提交于 12月 23, 2016

The context has to obey the same offset requirements as the ring,
so we can re-use the same bias value we computed for the ring instead of
unconditionally using GUC_WOPCM_TOP.
Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-2-git-send-email-daniele.ceraolospurio@intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

feef2a7c

drm/i915: request ring to be pinned above GUC_WOPCM_TOP · d3ef1af6

由 Daniele Ceraolo Spurio 提交于 12月 23, 2016

GuC will validate the ring offset and fail if it is in the
[0, GUC_WOPCM_TOP) range. The bias is conditionally applied only
if GuC loading is enabled (we can't check for guc submission enabled as
in other cases because HuC loading requires this fix).

Note that the default context is processed before enable_guc_loading is
sanitized, so we might still apply the bias to its ring even if it is
not needed.

v2: compute the value during ctx init and pass it to
    intel_ring_pin (Chris), updated commit message
Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1482537382-28584-1-git-send-email-daniele.ceraolospurio@intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

d3ef1af6

20 12月, 2016 1 次提交

drm/i915: Fix use after free in logical_render_ring_init · d8953c83

由 Tvrtko Ursulin 提交于 12月 16, 2016

Commit 3b3f1650 ("drm/i915: Allocate intel_engine_cs
structure only for the enabled engines") introduced the
dynanically allocated engine instances and created an
potential use after free scenario in logical_render_ring_init
where lrc_destroy_wa_ctx_obj could be called after the engine
instance has been freed.

This can only happen during engine setup/init error handling
which luckily does not happen ever in practice.

Fix is to not call lrc_destroy_wa_ctx_obj since it would have
already been executed from the preceding engine cleanup.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 3b3f1650 ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1481894322-2145-1-git-send-email-tvrtko.ursulin@linux.intel.com
(cherry picked from commit d038fc7e)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

d8953c83

19 12月, 2016 4 次提交

drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc · f73e7399

由 Chris Wilson 提交于 12月 18, 2016

A fairly trivial move of a matching pair of routines (for preparing a
request for construction) onto an engine vfunc. The ulterior motive is
to be able to create a mock request implementation.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk

f73e7399

drm/i915/execlists: Request the kernel context be pinned high · 2947e408

由 Chris Wilson 提交于 12月 18, 2016

PIN_HIGH is an expensive operation (in comparison to allocating from the
hole stack) unsuitable for frequent use (such as switching between
contexts). However, the kernel context should be pinned just once for
the lifetime of the driver, and here it is appropriate to keep it out of
the mappable range (in order to maximise mappable space for users).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-6-chris@chris-wilson.co.uk

2947e408

drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f

由 Chris Wilson 提交于 12月 18, 2016

The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk

e8a9c58f

drm/i915: Move intel_lrc_context_pin() to avoid the forward declaration · ef11c01d

由 Chris Wilson 提交于 12月 18, 2016

Just a simple move to avoid a forward declaration, though the diff likes
to present itself as a move of intel_logical_ring_alloc_request_extras()
in the opposite direction.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-2-chris@chris-wilson.co.uk

ef11c01d

16 12月, 2016 1 次提交

drm/i915: Fix use after free in logical_render_ring_init · d038fc7e

由 Tvrtko Ursulin 提交于 12月 16, 2016

Commit 3b3f1650 ("drm/i915: Allocate intel_engine_cs
structure only for the enabled engines") introduced the
dynanically allocated engine instances and created an
potential use after free scenario in logical_render_ring_init
where lrc_destroy_wa_ctx_obj could be called after the engine
instance has been freed.

This can only happen during engine setup/init error handling
which luckily does not happen ever in practice.

Fix is to not call lrc_destroy_wa_ctx_obj since it would have
already been executed from the preceding engine cleanup.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 3b3f1650 ("drm/i915: Allocate intel_engine_cs structure only for the enabled engines")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1481894322-2145-1-git-send-email-tvrtko.ursulin@linux.intel.com

d038fc7e

06 12月, 2016 1 次提交

drm/i915/execlists: Use list_safe_reset_next() instead of opencoding · 0798cff4

由 Chris Wilson 提交于 12月 05, 2016

list.h provides a macro for updating the next element in a safe
list-iter, so let's use it so that it is hopefully clearer to the reader
about the unusual behaviour, and also easier to grep.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-6-chris@chris-wilson.co.uk

0798cff4

02 12月, 2016 1 次提交

drm/i915: Make GEM object create and create from data take dev_priv · 12d79d78

由 Tvrtko Ursulin 提交于 12月 01, 2016

Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)

12d79d78

29 11月, 2016 1 次提交

Revert "drm/i915/execlists: Use a local lock for dfs_link access" · 70cd1476

由 Chris Wilson 提交于 11月 28, 2016

This reverts commit 27745e82 ("drm/i915/execlists: Use a local lock
for dfs_link access") as the struct_mutex was required to prevent
concurrent retiring and freeing, now restored in the previous patch.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161128143649.4289-2-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

70cd1476

17 11月, 2016 2 次提交

drm/i915: fix the dequeue logic for single_port_submission context · d7ab992c

由 Min He 提交于 11月 16, 2016

For a single_port_submission context, GVT expects that it can only be
submitted to port 0, and there shouldn't be any other context in port 1
at the same time. This is required by GVT-g context to have an opportunity
to save/restore some non-hw context render registers.

This patch is to workaround GVT-g.

v2: optimized code by following Chris's advice, and added more comments to
explain the patch.
v3: followed the coding style.
Signed-off-by: NMin He <min.he@intel.com>
Reviewed-by: NZhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1479305104-17049-1-git-send-email-min.he@intel.com

d7ab992c

drm/i915/execlists: Use a local lock for dfs_link access · 27745e82

由 Chris Wilson 提交于 11月 16, 2016

Avoid requiring struct_mutex for exclusive access to the temporary
dfs_link inside the i915_dependency as not all callers may want to touch
struct_mutex. So rather than force them to take a highly contended
lock, introduce a local lock for the execlists schedule operation.
Reported-by: NDavid Weinehall <david.weinehall@linux.intel.com>
Fixes: 9a151987 ("drm/i915: Add execution priority boosting for mmioflips")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161116152721.11053-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

27745e82

15 11月, 2016 3 次提交

drm/i915/scheduler: Execute requests in order of priorities · 20311bd3

由 Chris Wilson 提交于 11月 14, 2016

Track the priority of each request and use it to determine the order in
which we submit requests to the hardware via execlists.

The priority of the request is determined by the user (eventually via
the context) but may be overridden at any time by the driver. When we set
the priority of the request, we bump the priority of all of its
dependencies to match - so that a high priority drawing operation is not
stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit
fence following completion of all its dependencies, including third
party fences), we put the request into a priority sorted rbtree to be
submitted to the hardware. If the request is higher priority than all
pending requests, it will be submitted on the next context-switch
interrupt as soon as the hardware has completed the current request. We
do not currently preempt any current execution to immediately run a very
high priority request, at least not yet.

One more limitation, is that this is first implementation is for
execlists only so currently limited to gen8/gen9.

v2: Replace recursive priority inheritance bumping with an iterative
depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk

20311bd3

drm/i915: Remove engine->execlist_lock · 663f71e7

由 Chris Wilson 提交于 11月 14, 2016

The execlist_lock is now completely subsumed by the engine->timeline->lock,
and so we can remove the redundant layer of locking.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-5-chris@chris-wilson.co.uk

663f71e7

drm/i915: Defer transfer onto execution timeline to actual hw submission · d55ac5bf

由 Chris Wilson 提交于 11月 14, 2016

Defer the transfer from the client's timeline onto the execution
timeline from the point of readiness to the point of actual submission.
For example, in execlists, a request is finally submitted to hardware
when the hardware is ready, and only put onto the hardware queue when
the request is ready. By deferring the transfer, we ensure that the
timeline is maintained in retirement order if we decide to queue the
requests onto the hardware in a different order than fifo.

v2: Rebased onto distinct global/user timeline lock classes.
v3: Play with the position of the spin_lock().
v4: Nesting finally resolved with distinct sw_fence lock classes.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk

d55ac5bf

07 11月, 2016 1 次提交

drm/i915: Make sure engines are idle during GPU idling in LR mode · 0cb5670b

由 Imre Deak 提交于 11月 07, 2016

We assume that the GPU is idle once receiving the seqno via the last
request's user interrupt. In execlist mode the corresponding context
completed interrupt can be delayed though and until this latter
interrupt arrives we consider the request to be pending on the ELSP
submit port. This can cause a problem during system suspend where this
last request will be seen by the resume code as still pending. Such
pending requests are normally replayed after a GPU reset, but during
resume we reset both SW and HW tracking of the ring head/tail pointers,
so replaying the pending request with its stale tail pointer will leave
the ring in an inconsistent state. A subsequent request submission can
lead then to the GPU executing from uninitialized area in the ring
behind the above stale tail pointer.

Fix this by making sure any pending request on the ELSP port is
completed before suspending. I used a polling wait since the completion
time I measured was <1ms and since normally we only need to wait during
system suspend. GPU idling during runtime suspend is scheduled with a
delay (currently 50-100ms) after the retirement of the last request at
which point the context completed interrupt must have arrived already.

The chance of this bug was increased by

commit 1c777c5d
Author: Imre Deak <imre.deak@intel.com>
Date:   Wed Oct 12 17:46:37 2016 +0300

    drm/i915/hsw: Fix GPU hang during resume from S3-devices state

but it could happen even without the explicit GPU reset, since we
disable interrupts afterwards during the suspend sequence.

v2:
- Do an unlocked poll-wait first. (Chris)
v3-4:
- s/intel_lr_engines_idle/intel_execlists_idle/ and move
  i915.enable_execlists check to the new helper. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98470Signed-off-by: NImre Deak <imre.deak@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478510405-11799-3-git-send-email-imre.deak@intel.com

0cb5670b

29 10月, 2016 6 次提交

drm/i915: Defer breadcrumb emission · caddfe71