提交 · 016fd0c1aee31902d82c1ac32312f1cc32298b66 · bug2833 / cloud-kernel

26 7月, 2012 5 次提交

drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer · 016fd0c1

由 Chris Wilson 提交于 7月 20, 2012

Otherwise once we use the buffer with a BLT command on gen2/3, we will
always regard future command submissions as continuing the fenced
access. However, now that we flush/invalidate between every batch we can
drop this pessimism.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

016fd0c1

drm/i915: Replace the complex flushing logic with simple invalidate/flush all · 6ac42f41

由 Daniel Vetter 提交于 7月 21, 2012

Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.

v2 (danvet): Keep around the flip waiting logic. It's gross and
broken, I know, but we can't just kill that thing ... even if we just
keep it around as a reminder that things are broken.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6ac42f41

drm/i915: Remove the per-ring write list · 69c2fc89

由 Chris Wilson 提交于 7月 20, 2012

This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

69c2fc89

drm/i915: Replace the pending_gpu_write flag with an explicit seqno · 0201f1ec

由 Chris Wilson 提交于 7月 20, 2012

As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

0201f1ec

drm/i915: Allow late allocation of request for i915_add_request() · 3bb73aba

由 Chris Wilson 提交于 7月 20, 2012

Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

3bb73aba

25 7月, 2012 1 次提交

drm/i915: Set the context before setting up regs for the context. · 0da5cec1

由 Eric Anholt 提交于 7月 23, 2012

Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.
Signed-off-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

0da5cec1

20 7月, 2012 1 次提交

drm/i915: Insert a flush between batches if the breadcrumb was dropped · 09cf7c9a

由 Chris Wilson 提交于 7月 13, 2012

If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:

commit de2b9985
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

which allowed intel_ring_begin to return -ERESTARTSYS and

commit cc889e0f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which essentially disabled the flushing list.

The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already.  Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.

As soon as we try to unbind that object, things blow up.

Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.

Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

09cf7c9a

20 6月, 2012 1 次提交

drm/i915: disable flushing_list/gpu_write_list · cc889e0f

由 Daniel Vetter 提交于 6月 13, 2012

This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cc889e0f

14 6月, 2012 1 次提交

drm/i915/context: switch contexts with execbuf2 · 6e0a69db

由 Ben Widawsky 提交于 6月 04, 2012

Use the rsvd1 field in execbuf2 to specify the context ID associated
with the workload. This will allow the driver to do the proper context
switch when/if needed.

v2: Add checks for context switches on rings not supporting contexts.
Before the code would silently ignore such requests.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

6e0a69db

20 5月, 2012 1 次提交

drm/i915: Check whether the ring is initialised prior to dispatch · a15817cf

由 Chris Wilson 提交于 5月 11, 2012

Rather than use the magic feature tests HAS_BLT/HAS_BSD just check
whether the ring we are about to dispatch the execbuffer on is
initialised.

v2: Use intel_ring_initialized()
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a15817cf

08 5月, 2012 1 次提交

drm/i915: Limit calling mark-busy only for potential scanouts · acb87dfb

由 Chris Wilson 提交于 5月 03, 2012

The principle of intel_mark_busy() is that we want to spot the
transition of when the display engine is being used in order to bump
powersaving modes and increase display clocks. As such it is only
important when the display is changing, i.e. when rendering to the
scanout or other sprite/plane, and these are characterised by being
pinned.

v2: Mark the whole device as busy on execbuffer and pageflips as well
and rebase against dinq for the minor bug fix to be immediately
applicable.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: fix compile fail.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

acb87dfb

03 5月, 2012 2 次提交

drm/i915: disallow clip rects on gen5+ · 6ebebc92

由 Daniel Vetter 提交于 4月 26, 2012

Unfortunately there has been dri1 userspace that used gem to manage
the gtt and hence also needed cliprects in the execbuf ioctl. So
we can't ever remove that code without breaking the ioctl abi.

But at least we can disable it on gen5+, because these horrible
versions of mesa have not supported these chips.
Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6ebebc92

drm/i915: remove do_retire from i915_wait_request · b2da9fe5

由 Ben Widawsky 提交于 4月 26, 2012

This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b2da9fe5

24 4月, 2012 2 次提交

drm/i915: fix integer overflow in i915_gem_do_execbuffer() · 44afb3a0

由 Xi Wang 提交于 4月 23, 2012

On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

44afb3a0

drm/i915: fix integer overflow in i915_gem_execbuffer2() · ed8cd3b2

由 Xi Wang 提交于 4月 23, 2012

On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ed8cd3b2

18 4月, 2012 2 次提交

drm/i915: Remove the pipelined parameter from get_fence() · 06d98131

由 Chris Wilson 提交于 4月 17, 2012

We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

06d98131

drm/i915: Always flush tiling changes before accessing through the GTT · 7b09638f

由 Chris Wilson 提交于 4月 14, 2012

As we defer updating the fence register from set-tiling to the point of
use, we need to declare every access through the GTT as either fenced or
unfenced.

This patches fixes an old bug in the execbuffer relocation processing
which could conceivably be hit by a pathological userspace.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7b09638f

13 4月, 2012 2 次提交

drm/i915: use semaphores for the display plane · 2911a35b

由 Ben Widawsky 提交于 4月 05, 2012

In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2911a35b

drm/i915: Reorganise rules for get_fence/put_fence · 9a5a53b3

由 Chris Wilson 提交于 3月 22, 2012

By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9a5a53b3

01 4月, 2012 1 次提交

drm/i915: Mark untiled BLT commands as fenced on gen2/3 · 7dd49065

由 Chris Wilson 提交于 3月 21, 2012

The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.
Reported-by: NJiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7dd49065

27 3月, 2012 2 次提交

mm: extend prefault helpers to fault in more than PAGE_SIZE · f56f821f

由 Daniel Vetter 提交于 3月 25, 2012

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.
Acked-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f56f821f

drm/i915: Avoid using mappable space for relocation processing through the CPU · dabdfe02

由 Chris Wilson 提交于 3月 26, 2012

We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

dabdfe02

26 3月, 2012 1 次提交

drm/i915: Batch copy_from_user for relocation processing · 1d83f442

由 Chris Wilson 提交于 3月 24, 2012

Originally the code tried to allocate a large enough array to perform
the copy using vmalloc, performance wasn't great and throughput was
improved by processing each individual relocation entry separately.
This too is not as efficient as one would desire. A compromise would be
to allocate a single page, or to allocate a few entries on the stack,
and process the copy in batches. The latter gives simpler code and more
consistent performance due to a lack of heuristic.

x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu)
before: 249000 785000 1280000 (80%)
page: 264000 896000 1280000 (65%)
on-stack: 264000 902000 1280000 (67%)

v2: Use 512-bytes of stack for batching rather than allocate a page.
v3: Tidy the code slightly with more descriptive variable names
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1d83f442

21 3月, 2012 1 次提交

drm/i915: implement SNB workaround for lazy global gtt · 149c8407

由 Daniel Vetter 提交于 2月 15, 2012

PIPE_CONTROL on snb needs global gtt mappings in place to workaround a
hw gotcha. No other commands need such a workaround. Luckily we can
detect a PIPE_CONTROL commands easily because they have a write_domain
= I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that).

v2: Binding the target of such a reloc into the global gtt actually
works instead of binding the source, which is rather pointless ...

v3: Kill a superflous has_global_gtt_mapping assignement noticed by
Chris Wilson.
Reviewed-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

149c8407

10 2月, 2012 1 次提交

drm/i915: ppgtt binding/unbinding support · 7bddb01f

由 Daniel Vetter 提交于 2月 09, 2012

This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Tested-by: NChris Wilson <chris@chris-wilson.co.uk>
Tested-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7bddb01f

09 2月, 2012 1 次提交

drm/i915: s/DRM_ERROR/DRM_DEBUG in i915_gem_execbuffer.c · ff240199

由 Daniel Vetter 提交于 1月 31, 2012

These are all user-trigerable, so tune down their loudness a notch.
For some of these we have i-g-t tests (because they prevent
newly-discovered bugs), without this patches running the test suite
leaves behind a dirty dmesg.
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ff240199

30 1月, 2012 3 次提交

drm/i915: reject GTT domain in relocations · 4ca4a250

由 Daniel Vetter 提交于 12月 14, 2011

This confuses our domain tracking and can (for gtt write domains) lead
to a subsequent oops.

Tested by tests/gem_exec_bad_domains from i-g-t.
Reviewed-by: NEric Anholt <eric@anholt.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4ca4a250

drm/i915: Separate fence pin counting from normal bind pin counting · 1690e1eb

由 Chris Wilson 提交于 12月 14, 2011

In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fenced as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: NPaul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1690e1eb

drm/i915: switch ring->id to be a real id · 96154f2f

由 Daniel Vetter 提交于 12月 14, 2011

... and add a helpr function for the places where we want a flag.

This way we can use ring->id to index into arrays.

v2: Resurrect the missing beautification-space Chris Wilson noted.
I'm moving this space around because I'll reuse ring_str in the next
patch.
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

96154f2f

26 1月, 2012 1 次提交

drm/i915: argument to control retiring behavior · b93f9cf1

由 Ben Widawsky 提交于 1月 25, 2012

Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.
Reviewed-by: NKeith Packard <keithp@keithp.com>
Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b93f9cf1

04 1月, 2012 3 次提交

drm/i915: Add support for resetting the SO write pointers on gen7. · ae662d31

由 Eric Anholt 提交于 1月 03, 2012

These registers are automatically incremented by the hardware during
transform feedback to track where the next streamed vertex output
should go.  Unlike the previous generation, which had a packet for
setting the corresponding registers to a defined value, gen7 only has
MI_LOAD_REGISTER_IMM to do so.  That's a secure packet (since it loads
an arbitrary register), so we need to do it from the kernel, and it
needs to be settable atomically with the batchbuffer execution so that
two clients doing transform feedback don't stomp on each others'
state.

Instead of building a more complicated interface involcing setting the
registers to a specific value, just set them to 0 when asked and
userland can tweak its pointers accordingly.
Signed-off-by: NEric Anholt <eric@anholt.net>
Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: NKenneth Graunke <kenneth@whitecape.org>
Signed-off-by: NKeith Packard <keithp@keithp.com>

ae662d31

drm/i915: Force sync command ordering (Gen6+) · 84f9f938

由 Ben Widawsky 提交于 12月 12, 2011

The docs say this is required for Gen7, and since the bit was added for
Gen6, we are also setting it there pit pf paranoia. Particularly as
Chris points out, if PIPE_CONTROL counts as a 3d state packet.

This was found through doc inspection by Ken and applies to Gen6+;
Reported-by: NKenneth Graunke <kenneth@whitecape.org>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NKeith Packard <keithp@keithp.com>

84f9f938

drm/i915: relative_constants_mode race fix · e2971bda

由 Ben Widawsky 提交于 12月 12, 2011

dev_priv keeps track of the current addressing mode that gets set at
execbuffer time. Unfortunately the existing code was doing this before
acquiring struct_mutex which leaves a race with another thread also
doing an execbuffer. If that wasn't bad enough, relocate_slow drops
struct_mutex which opens a much more likely error where another thread
comes in and modifies the state while relocate_slow is being slow.

The solution here is to just defer setting this state until we
absolutely need it, and we know we'll have struct_mutex for the
remainder of our code path.

v2: Keith noticed a bug in the original patch.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NKeith Packard <keithp@keithp.com>

e2971bda

27 12月, 2011 1 次提交

drm/i915: Disable semaphores by default on SNB · ebbd857e

由 Keith Packard 提交于 12月 26, 2011

Semaphores still cause problems on some machines:

> From Udo Steinberg:
>
> With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of
> text scroll in an xterm, such as when extracting a tar archive. Such as this
> one (note the timestamps):
>
>  I can reproduce it fairly easily with something
>  as simple as:
>
>	  while true; do dmesg; done

This patch turns them off on SNB while leaving them on for IVB.
Reported-by: NUdo Steinberg <udo@hypervisor.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Eugeni Dodonov <eugeni@dodonov.net>
Signed-off-by: NKeith Packard <keithp@keithp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ebbd857e

17 12月, 2011 1 次提交

drm/i915: enable semaphores on per-device defaults · f45b5557

由 Eugeni Dodonov 提交于 12月 09, 2011

This adds a default setting for semaphores parameter, and enables
semaphores by default on IVB.

For now, as semaphores interaction with VTd causes random issues on
SNB, we do not enable them by default. But they can still be enabled
via the semaphores=1 kernel parameter.

v2: enables semaphores on SNB when IO remapping is disabled, with base
on Keith Packard patch.

CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Ben Widawsky <ben@bwidawsk.net>
CC: Keith Packard <keithp@keithp.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: NKeith Packard <keithp@keithp.com>

f45b5557

22 9月, 2011 1 次提交

drm/i915: Dumb down the semaphore logic · c8c99b0f

由 Ben Widawsky 提交于 9月 14, 2011

While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.

I don't expect this code to make semaphores better or worse, but you
never know...

v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.

v3:
Ignored one of Keith's suggestions.

v4:
Removed some bloat per Daniel's recommendation.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NKeith Packard <keithp@keithp.com>

c8c99b0f

22 6月, 2011 1 次提交

Revert "drm/i915: Kill GTT mappings when moving from GTT domain" · e92d03bf

由 Eric Anholt 提交于 6月 14, 2011

This reverts commit 4a684a41.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 latop.
Signed-off-by: NKeith Packard <keithp@keithp.com>

e92d03bf

23 3月, 2011 1 次提交

drm/i915: Disable pagefaults along execbuffer relocation fast path · d4aeee77

由 Chris Wilson 提交于 3月 14, 2011

Along the fast path for relocation handling, we attempt to copy directly
from the user data structures whilst holding our mutex. This causes
lockdep to warn about circular lock dependencies if we need to pagefault
the user pages. [Since when handling a page fault on a mmapped bo, we
need to acquire the struct mutex whilst already holding the mm
semaphore, it is then verboten to acquire the mm semaphore when already
holding the struct mutex. The likelihood of the user passing in the
relocations contained in a GTT mmaped bo is low, but conceivable for
extreme pathology.] In order to force the mm to return EFAULT rather
than handle the pagefault, we therefore need to disable pagefaults
across the relocation fast path.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d4aeee77

07 3月, 2011 2 次提交

drm/i915: Only wait on a pending flip if we intend to write to the buffer · c59a333f

由 Chris Wilson 提交于 3月 06, 2011

... as if we are only reading from it, we can do that concurrently with
the queue flip.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

c59a333f

drm/i915: Disable GPU semaphores by default · a1656b90

由 Chris Wilson 提交于 3月 04, 2011

Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)

However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.
Reported-by: NAndi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

a1656b90

bug2833 / cloud-kernel 与 Fork 源项目一致

bug2833 / cloud-kernel
与 Fork 源项目一致