Commits · c8c99b0f0dea1ced5d0e10cdb9143356cc16b484 · openeuler / Kernel

22 Sep, 2011 1 commit

drm/i915: Dumb down the semaphore logic · c8c99b0f

Committed by Ben Widawsky on Sep 14, 2011

While I think the previous code is correct, it was hard to follow and
hard to debug. Since we already have a ring abstraction, might as well
use it to handle the semaphore updates and compares.

I don't expect this code to make semaphores better or worse, but you
never know...

v2:
Remove magic per Keith's suggestions.
Ran Daniel's gem_ring_sync_loop test on this.

v3:
Ignored one of Keith's suggestions.

v4:
Removed some bloat per Daniel's recommendation.

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Keith Packard <keithp@keithp.com>

c8c99b0f

22 Jun, 2011 1 commit

Revert "drm/i915: Kill GTT mappings when moving from GTT domain" · e92d03bf

Committed by Eric Anholt on Jun 14, 2011

This reverts commit 4a684a41.
Userland has always been required to set the object's domain to GTT
before using it through a GTT mapping, it's not something that the
kernel is supposed to enforce.  (The pagefault support is so that we
can handle multiple mappings without userland having to pin across
them, not so that userland can use GTT after GPU domains without
telling the kernel).

Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl
firefox-talos-gfx on my T420 laptop.
Signed-off-by: Keith Packard <keithp@keithp.com>

e92d03bf

23 Mar, 2011 1 commit

drm/i915: Disable pagefaults along execbuffer relocation fast path · d4aeee77

Committed by Chris Wilson on Mar 14, 2011

Along the fast path for relocation handling, we attempt to copy directly
from the user data structures whilst holding our mutex. This causes
lockdep to warn about circular lock dependencies if we need to pagefault
the user pages. [Since when handling a page fault on a mmapped bo, we
need to acquire the struct mutex whilst already holding the mm
semaphore, it is then verboten to acquire the mm semaphore when already
holding the struct mutex. The likelihood of the user passing in the
relocations contained in a GTT mmapped bo is low, but conceivable for
extreme pathology.] In order to force the mm to return EFAULT rather
than handle the pagefault, we therefore need to disable pagefaults
across the relocation fast path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

d4aeee77

07 Mar, 2011 2 commits

drm/i915: Only wait on a pending flip if we intend to write to the buffer · c59a333f

Committed by Chris Wilson on Mar 06, 2011

... as if we are only reading from it, we can do that concurrently with
the queue flip.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

c59a333f

drm/i915: Disable GPU semaphores by default · a1656b90

Committed by Chris Wilson on Mar 04, 2011

Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09
down to the use of GPU semaphores, and we already know that they appear
broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling
GPU semaphores is simply hiding another bug by the latency and
side-effects of the additional device interaction it introduces...)

However, use of semaphores is a massive performance improvement... Only
as long as the system remains stable. Enable at your peril.
Reported-by: Andi Kleen <andi-fd@firstfloor.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

a1656b90

02 Mar, 2011 2 commits

drm/i915: Re-enable GPU semaphores for SandyBridge mobile · e8b2c3c4

Committed by Chris Wilson on Mar 01, 2011

This seems to be running stably on my test laptop, so hopefully the
reported hangs were just symptoms of other bugs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

e8b2c3c4

drm/i915: Allow relocation deltas outside of target bo · 271d81b8

Committed by Chris Wilson on Mar 01, 2011

Userspace has a legitimate requirement to use a delta that points to
outside of the target bo, and so we need to enable this. (As this is an
abi break, albeit a relaxation of the current restrictions, mark the change
with a new flag.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

271d81b8

22 Feb, 2011 3 commits

drm/i915: Use a device flag for non-interruptible phases · ce453d81

Committed by Chris Wilson on Feb 21, 2011

The code paths for modesetting are growing in complexity as we may need
to move the buffers around in order to fit the scanout in the aperture.
Therefore we face a choice as to whether to thread the interruptible status
through the entire pinning and unbinding code paths or to add a flag to
the device when we may not be interrupted by a signal. This does the
latter and so fixes a few instances of modesetting failures under stress.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

ce453d81

drm/i915: First try a normal large kmalloc for the temporary exec buffers · 8408c282

Committed by Chris Wilson on Feb 21, 2011

As we just need a temporary array whilst performing the relocations for
the execbuffer, first attempt to allocate using kmalloc even if it is
not of order page-0. This avoids the overhead of remapping the
discontiguous array and so gives a moderate boost to execution
throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

8408c282

drm/i915: Protect against drm_gem_object not being the first member · c8725226

Committed by Chris Wilson on Feb 19, 2011

Dave Airlie spotted that we had a potential bug should we ever rearrange
the drm_i915_gem_object so that the base drm_gem_object was not its first
member. He noticed that we often convert the return of
drm_gem_object_lookup() immediately into drm_i915_gem_object and then
check the result for nullity. This is only valid when the base object is
the first member and so the superobject has the same address. Play safe
instead and use the compiler to convert back to the original return
address for sanity testing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

c8725226
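
The hazard described in c8725226 can be illustrated with a small userspace sketch. The struct and function names below (`base_obj`, `derived_obj`, `lookup`, `lookup_derived`, `to_derived`) are hypothetical stand-ins, not the driver's own types; the macro only mimics the idea behind the kernel's `container_of()`. The point is that the NULL test is done on the base pointer that the lookup returned, before any conversion to the containing object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for drm_gem_object and drm_i915_gem_object. */
struct base_obj { int handle; };

struct derived_obj {
	int padding;            /* deliberately placed before the base member */
	struct base_obj base;
};

/* Same idea as container_of(): step back from the member to its owner. */
#define to_derived(ptr) \
	((struct derived_obj *)((char *)(ptr) - offsetof(struct derived_obj, base)))

/* Hypothetical lookup that returns NULL when the handle is unknown. */
static struct base_obj *lookup(struct derived_obj *table, int n, int handle)
{
	assert(table != NULL);
	for (int i = 0; i < n; i++)
		if (table[i].base.handle == handle)
			return &table[i].base;
	return NULL;
}

/* Safe pattern: test the base pointer for NULL first, then convert. */
static struct derived_obj *lookup_derived(struct derived_obj *table, int n, int handle)
{
	struct base_obj *obj = lookup(table, n, handle);

	if (obj == NULL)
		return NULL;
	return to_derived(obj);
}
```

Converting first and testing afterwards would only work while `offsetof(struct derived_obj, base)` is zero; with the layout above, a failed lookup converted blindly would no longer compare equal to NULL.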

07 Feb, 2011 1 commit

drm/i915: Refine tracepoints · db53a302

Committed by Chris Wilson on Feb 03, 2011

A lot of minor tweaks to fix the tracepoints, improve the outputting for
ftrace, and to generally make the tracepoints useful again. It is a start
and enough to begin identifying performance issues and gaps in our
coverage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

db53a302

23 Jan, 2011 1 commit

drm/i915: Fix use of invalid array size for ring->sync_seqno · 076e2c0e

Committed by Chris Wilson on Jan 21, 2011

There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we
were clearing I915_NUM_RINGS of them. Oops.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Tested-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

076e2c0e
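
A minimal sketch of the class of bug fixed in 076e2c0e, assuming three rings and a hypothetical `struct ring` (the `guard` field simply stands in for whatever member follows the array; it is not taken from the driver). Deriving the length from the array itself with `sizeof` keeps the clear within bounds, whereas clearing `NUM_RINGS` entries would write one element past the end.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_RINGS 3   /* assumed value, for illustration only */

struct ring {
	uint32_t sync_seqno[NUM_RINGS - 1]; /* one counter per *other* ring */
	uint32_t guard;                     /* stands in for the next member */
};

static void reset_sync_seqno(struct ring *ring)
{
	assert(ring != NULL);
	/* sizeof covers exactly the array, so the neighbour is untouched. */
	memset(ring->sync_seqno, 0, sizeof(ring->sync_seqno));
}
```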

19 Jan, 2011 1 commit

drm/i915: Trivial sparse fixes · 311bd68e

Committed by Chris Wilson on Jan 13, 2011

Move code around and invoke iomem annotation in a few more places in
order to silence sparse. Still a few more iomem annotations to go...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

311bd68e

14 Jan, 2011 3 commits

drm/i915: Disable GPU semaphores on SandyBridge mobile · 1591192d

Committed by Chris Wilson on Jan 14, 2011

Hopefully, this is a temporary measure whilst the root cause is
understood. At the moment, we experience a hard hang whilst looping
urbanterror that has been identified as a result of the use of
semaphores, but so far only on SNB mobile.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752
Tested-by: mengmeng.meng@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

1591192d

drm/i915/execbuffer: Clear domains before beginning reloc processing · 595dad76

Committed by Chris Wilson on Jan 13, 2011

After reordering the sequence of relocating objects, commit 6fe4f140,
we can no longer rely on seeing all reloc targets prior to performing
the relocation. As a result we were ignoring the need to flush objects
from the render cache and invalidate the sampler caches, resulting in
rendering glitches. So we need to clear the relocation domains earlier.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

595dad76

drm/i915/execbuffer: Reorder relocations to match new object order · dd6864a4

Committed by Chris Wilson on Jan 12, 2011

On the fault path, commit 6fe4f140 introduced a regression whereby it
changed the sequence of the objects but continued to use the original
ordering of relocation entries. The result was that incorrect GTT offsets
were being fed into the execbuffer causing lots of misrendering and
potential hangs.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

dd6864a4

12 Jan, 2011 5 commits

drm/i915/execbuffer: Reorder binding of objects to favour restrictions · 6fe4f140

Committed by Chris Wilson on Jan 10, 2011

As the mappable portion of the aperture is always a small subset at the
start of the GTT, it is allocated preferentially by drm_mm. This is
useful in case we ever need to map an object later. However, if you have
a large object that can consume the entire mappable region of the
GTT, this prevents the batchbuffer from fitting and so causes an error.
Instead allocate all those that require a mapping up front in order to
improve the likelihood of finding sufficient space to bind them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

6fe4f140

drm/i915/execbuffer: Correctly clear the current object list upon EFAULT · 36cf1742

Committed by Chris Wilson on Jan 10, 2011

Before releasing the lock in order to copy the relocation list from user
pages, we need to drop all the object references as another thread may
usurp and execute another batchbuffer before we reacquire the lock.
However, the code was buggy and failed to clear the list...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org

36cf1742

drm/i915: Propagate error from flushing the ring · 88241785

Committed by Chris Wilson on Jan 07, 2011

... in order to avoid a BUG() and potential unbounded waits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

88241785

drm/i915: Handle ringbuffer stalls when flushing · b72f3acb

Committed by Chris Wilson on Jan 04, 2011

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

b72f3acb

drm/i915: Enforce write ordering through the GTT · 63256ec5

Committed by Chris Wilson on Jan 04, 2011

We need to ensure that writes through the GTT land before any
modification to the MMIO registers and so must impose a mandatory write
barrier when flushing the GTT domain. This was revealed by relaxing the
write ordering by experimentally mapping the registers and the GATT as
write-combining.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

63256ec5

20 Dec, 2010 1 commit

drm/i915: Allow the application to choose the constant addressing mode · 72bfa19c

Committed by Chris Wilson on Dec 19, 2010

The relative-to-general state default is useless as it means having to
rewrite the streaming kernels for each batch. Relative-to-surface is
more useful, as that stream usually needs to be rewritten for each
batch. And absolute addressing mode, vital if you start streaming
state, is also only available by adjusting the register...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

72bfa19c

10 Dec, 2010 2 commits

drm/i915: Mark the user reloc error paths as unlikely · b8f7ab17

Committed by Chris Wilson on Dec 08, 2010

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

b8f7ab17

drm/i915: Eliminate drm_gem_object_lookup during relocation · 67731b87

Committed by Chris Wilson on Dec 08, 2010

As we provide a list of all objects that will be accessed from the
batchbuffer, we can build a lut of the handles associated with those
objects for this invocation and use that to avoid the overhead of
looking up those objects again for every relocation.

The cost of building and searching a small hash table is much less than
that of acquiring a spinlock, searching a radix tree and manipulating an
atomic refcnt per relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

67731b87
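
The handle lookup table described in 67731b87 might look roughly like the following userspace sketch. Everything here is an assumption made for illustration: the bucket count, the `struct obj` and `struct handle_lut` layouts, and the `lut_*` function names are not taken from the driver. It shows a small chained hash table that is filled once per invocation and then searched once per relocation, with no locking and no reference counting on the lookup path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LUT_BUCKETS 16   /* power of two; assumed size */

struct obj {
	uint32_t handle;
	struct obj *next;    /* chain within one bucket */
};

struct handle_lut {
	struct obj *bucket[LUT_BUCKETS];
};

static void lut_init(struct handle_lut *lut)
{
	for (size_t i = 0; i < LUT_BUCKETS; i++)
		lut->bucket[i] = NULL;
}

/* Called once per object when the execbuffer object list is built. */
static void lut_add(struct handle_lut *lut, struct obj *o)
{
	size_t b = o->handle & (LUT_BUCKETS - 1);

	o->next = lut->bucket[b];
	lut->bucket[b] = o;
}

/* Called once per relocation: hash, then walk a short chain. */
static struct obj *lut_find(const struct handle_lut *lut, uint32_t handle)
{
	assert(lut != NULL);
	for (struct obj *o = lut->bucket[handle & (LUT_BUCKETS - 1)]; o; o = o->next)
		if (o->handle == handle)
			return o;
	return NULL;
}
```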

06 Dec, 2010 1 commit

drm/i915: Ignore fenced commands for gpu access on gen4 · 9b3826bf

Committed by Chris Wilson on Dec 05, 2010

Userspace should not have been declaring that it needed fenced GPU
access with gen4+ as those GPUs have no fenced commands, but to be on
the safe side it is easier to ignore userspace in case they did.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

9b3826bf

05 Dec, 2010 1 commit

drm/i915: Implement GPU semaphores for inter-ring synchronisation on SNB · 1ec14ad3

Committed by Chris Wilson on Dec 04, 2010

The bulk of the change is to convert the growing list of rings into an
array so that the relationship between the rings and the semaphore sync
registers can be easily computed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

1ec14ad3
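
To show why an array of rings makes the ring-to-semaphore relationship "easily computed", here is one possible mapping, sketched under assumptions: three rings identified by their array index, and NUM_RINGS-1 sync slots per ring. The function name `sync_index` and the rotation scheme are illustrative; the commit message does not spell out the exact register assignment used by the driver.

```c
#include <assert.h>

#define NUM_RINGS 3   /* assumed ring count, for illustration only */

/*
 * Index of the slot, among a ring's NUM_RINGS-1 sync slots, that refers
 * to `other`. Slots are numbered by walking forward from `ring` and
 * wrapping around the end of the array.
 */
static int sync_index(int ring, int other)
{
	int idx;

	assert(ring != other);
	assert(ring >= 0 && ring < NUM_RINGS);
	assert(other >= 0 && other < NUM_RINGS);

	idx = (other - ring) - 1;
	if (idx < 0)
		idx += NUM_RINGS;
	return idx;
}
```

With rings stored in an array, the difference of two indices is all that is needed; a linked list of rings would require a search or a per-pair table instead.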

02 Dec, 2010 2 commits

drm/i915: Pipelined fencing [infrastructure] · d9e86c0e

Committed by Chris Wilson on Nov 10, 2010

With this change, every batchbuffer can use all available fences (save
pinned and scanout, of course) without ever stalling the gpu!

In theory. Currently the actual pipelined update of the register is
disabled due to some stability issues. However, just the deferred update
is a significant win.

Based on a series of patches by Daniel Vetter.

The premise is that before every access to a buffer through the GTT we
have to declare whether we need a register or not. If the access is by
the GPU, a pipelined update to the register is made via the ringbuffer,
and we track the last seqno of the batches that access it. If by the
CPU we wait for the last GPU access and update the register (either
to clear or to set it for the current buffer).

One advantage of being able to pipeline changes is that we can defer the
actual updating of the fence register until we first need to access the
object through the GTT, i.e. we can eliminate the stall on set_tiling.
This is important as the userspace bo cache does not track the tiling
status of active buffers which generate frequent stalls on gen3 when
enabling tiling for an already bound buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

d9e86c0e

drm/i915: Prevent stalling for a GTT read back from a read-only GPU target · 87ca9c8a

Committed by Chris Wilson on Dec 02, 2010

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

87ca9c8a

30 Nov, 2010 1 commit

drm/i915/ringbuffer: Handle cliprects in the caller · c4e7a414

Committed by Chris Wilson on Nov 30, 2010

This makes the various rings more consistent by removing the anomalous
handling of the rendering ring execbuffer dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

c4e7a414

28 Nov, 2010 1 commit

drm/i915/execbuffer: On error, starting unwinding from the previous object · 602606a4

Committed by Chris Wilson on Nov 28, 2010

As the error occurred on the current object, it means that its state was
not changed and so it should be excluded from the unwind.
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

602606a4
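
The unwind rule from 602606a4 can be sketched in isolation. The `struct obj`, `pin`, `unpin` and `pin_all` names are hypothetical and only model "change state on success, leave it untouched on failure"; they are not the execbuffer code. Because the failing object was never modified, the rollback starts at the object before it.

```c
#include <assert.h>

/* Hypothetical per-object state: pinned is 1 after a successful pin. */
struct obj {
	int pinned;
	int will_fail;   /* test knob: make pin() fail for this object */
};

static int pin(struct obj *o)
{
	if (o->will_fail)
		return -1;   /* failure leaves the object untouched */
	o->pinned = 1;
	return 0;
}

static void unpin(struct obj *o)
{
	assert(o->pinned);   /* unpinning an unpinned object would be a bug */
	o->pinned = 0;
}

static int pin_all(struct obj *objs, int count)
{
	for (int i = 0; i < count; i++) {
		if (pin(&objs[i]) != 0) {
			/* Object i was never pinned: start the unwind at i - 1. */
			while (--i >= 0)
				unpin(&objs[i]);
			return -1;
		}
	}
	return 0;
}
```

Starting the unwind at the failing object itself would trip the assertion in `unpin()`, which is the sketch's analogue of undoing state that was never set.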

26 Nov, 2010 2 commits

drm/i915: Avoid allocation for execbuffer object list · 432e58ed

Committed by Chris Wilson on Nov 25, 2010

Besides the minimal improvement in reducing the execbuffer overhead, the
real benefit is clarifying a few routines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

432e58ed

drm/i915: Split i915_gem_execbuffer into its own file. · 54cf91dc

Committed by Chris Wilson on Nov 25, 2010

A number of dragons have been seen lurking within the execbuffer code.
The first step is then to isolate them from the rest and begin to
scrutinise them in depth. Suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

54cf91dc

openeuler / Kernel · synced successfully more than 1 year ago
