Commit f3987631 authored by Paulo Zanoni, committed by Daniel Vetter

drm/i915: add workarounds to gen7_render_ring_flush

From Bspec, Vol 2a, Section 1.9.3.4 "PIPE_CONTROL", intro section
detailing the various workarounds:

"[DevIVB {W/A}, DevHSW {W/A}]: Pipe_control with CS-stall bit
set must be issued before a pipe-control command that has the State
Cache Invalidate bit set."

Note that the public Bspec has different numbering; there it's Vol2Part1,
Section 1.10.4.1 "PIPE_CONTROL".

There's also a second workaround for the PIPE_CONTROL command itself:

"[DevIVB, DevVLV, DevHSW] {WA}: Every 4th PIPE_CONTROL command, not
counting the PIPE_CONTROL with only read-cache-invalidate bit(s) set,
must have a CS_STALL bit set"

For simplicity we just set the CS_STALL bit on every pipe_control on
gen7+; a sketch of the counting bookkeeping this avoids follows below.
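
As an illustration of that bookkeeping: a literal implementation of the
"every 4th PIPE_CONTROL" rule would need a per-ring counter. The
following is a hypothetical, self-contained C sketch, not kernel code;
the PIPE_CONTROL_CS_STALL value is copied here only for illustration.

#include <stdbool.h>

#define PIPE_CONTROL_CS_STALL (1u << 20) /* illustrative; see i915_reg.h */

struct pipe_control_wa_state {
	unsigned int since_cs_stall; /* PIPE_CONTROLs counted since last CS_STALL */
};

/* Apply the "every 4th PIPE_CONTROL must CS-stall" rule to a flags word. */
static unsigned int
pipe_control_wa_flags(struct pipe_control_wa_state *wa, unsigned int flags,
		      bool read_cache_invalidate_only)
{
	/* PIPE_CONTROLs with only read-cache-invalidate bits set don't count */
	if (!read_cache_invalidate_only)
		wa->since_cs_stall++;

	/* force the stall at the latest on every 4th counted PIPE_CONTROL */
	if (wa->since_cs_stall >= 4)
		flags |= PIPE_CONTROL_CS_STALL;

	if (flags & PIPE_CONTROL_CS_STALL)
		wa->since_cs_stall = 0;

	return flags;
}

Keeping such a counter correct across every flush path is exactly the
kind of cleverness the unconditional CS_STALL sidesteps.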

Note that this massively helps on some hsw machines; together with the
following patch to unconditionally set the CS_STALL bit on every
pipe_control, it prevents a gpu hang every few seconds.

This is a regression introduced in the pipe_control cleanup:

commit 6c6cf5aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 20 18:02:28 2012 +0100

    drm/i915: Only apply the SNB pipe control w/a to gen6

It looks like the massive snb pipe_control workaround also papered
over any issues on ivb and hsw.
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: squashed both workarounds together, pimped commit message
with Bspec citations, regression commit citation and changed the
comment in the code a bit to clarify that we unconditionally set
CS_STALL to avoid being hurt by trying to be clever.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Parent b3111509
@@ -262,6 +262,25 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
         return 0;
 }
 
+static int
+gen7_render_ring_cs_stall_wa(struct intel_ring_buffer *ring)
+{
+        int ret;
+
+        ret = intel_ring_begin(ring, 4);
+        if (ret)
+                return ret;
+
+        intel_ring_emit(ring, GFX_OP_PIPE_CONTROL(4));
+        intel_ring_emit(ring, PIPE_CONTROL_CS_STALL |
+                              PIPE_CONTROL_STALL_AT_SCOREBOARD);
+        intel_ring_emit(ring, 0);
+        intel_ring_emit(ring, 0);
+        intel_ring_advance(ring);
+
+        return 0;
+}
+
 static int
 gen7_render_ring_flush(struct intel_ring_buffer *ring,
                        u32 invalidate_domains, u32 flush_domains)
@@ -271,6 +290,16 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
         u32 scratch_addr = pc->gtt_offset + 128;
         int ret;
 
+        /*
+         * Ensure that any following seqno writes only happen when the render
+         * cache is indeed flushed.
+         *
+         * Workaround: 4th PIPE_CONTROL command (except the ones with only
+         * read-cache invalidate bits set) must have the CS_STALL bit set. We
+         * don't try to be clever and just set it unconditionally.
+         */
+        flags |= PIPE_CONTROL_CS_STALL;
+
         /* Just flush everything.  Experiments have shown that reducing the
          * number of bits based on the write domains has little performance
          * impact.
@@ -278,11 +307,6 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
         if (flush_domains) {
                 flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
                 flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
-                /*
-                 * Ensure that any following seqno writes only happen
-                 * when the render cache is indeed flushed.
-                 */
-                flags |= PIPE_CONTROL_CS_STALL;
         }
         if (invalidate_domains) {
                 flags |= PIPE_CONTROL_TLB_INVALIDATE;
@@ -295,6 +319,11 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
                  * TLB invalidate requires a post-sync write.
                  */
                 flags |= PIPE_CONTROL_QW_WRITE;
+
+                /* Workaround: we must issue a pipe_control with CS-stall bit
+                 * set before a pipe_control command that has the state cache
+                 * invalidate bit set. */
+                gen7_render_ring_cs_stall_wa(ring);
         }
 
         ret = intel_ring_begin(ring, 4);
...
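
For reference, after this patch an invalidating flush on gen7 results
in two back-to-back PIPE_CONTROL packets: first the stalling one from
gen7_render_ring_cs_stall_wa(), then the invalidating one. Below is a
standalone sketch of that dword stream; the GFX_OP_PIPE_CONTROL
encoding and bit values are reproduced for illustration only and
should be checked against i915_reg.h.

#include <stdint.h>
#include <stdio.h>

/* Illustrative encodings; the authoritative definitions live in i915_reg.h. */
#define GFX_OP_PIPE_CONTROL(len) \
	((0x3u << 29) | (0x3u << 27) | (0x2u << 24) | ((len) - 2))
#define PIPE_CONTROL_CS_STALL			(1u << 20)
#define PIPE_CONTROL_STALL_AT_SCOREBOARD	(1u << 1)

/* Emit one 4-dword PIPE_CONTROL packet; returns the new write index. */
static int emit_pipe_control(uint32_t *buf, int idx, uint32_t flags,
			     uint32_t addr, uint32_t data)
{
	buf[idx++] = GFX_OP_PIPE_CONTROL(4);
	buf[idx++] = flags;
	buf[idx++] = addr; /* post-sync write address, 0 if unused */
	buf[idx++] = data; /* post-sync write data, 0 if unused */
	return idx;
}

int main(void)
{
	uint32_t ring[8];
	int i, n = 0;

	/* 1st packet: the stall, as in gen7_render_ring_cs_stall_wa() */
	n = emit_pipe_control(ring, n,
			      PIPE_CONTROL_CS_STALL |
			      PIPE_CONTROL_STALL_AT_SCOREBOARD, 0, 0);

	/* 2nd packet: the flush/invalidate itself (invalidate bits omitted) */
	n = emit_pipe_control(ring, n, PIPE_CONTROL_CS_STALL /* | ... */,
			      0 /* scratch_addr */, 0);

	for (i = 0; i < n; i++)
		printf("dword %d: 0x%08x\n", i, (unsigned)ring[i]);
	return 0;
}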