提交 · be7cb6347e0c3aa1956748a860a2465a7ea128c4 · openeuler / raspberrypi-kernel

22 11月, 2012 1 次提交

drm/i915: Remove bogus test for a present execbuffer · be7cb634

由 Chris Wilson 提交于 11月 19, 2012

The intention of checking obj->gtt_offset!=0 is to verify that the
target object was listed in the execbuffer and had been bound into the
GTT. This is guarranteed by the earlier rearrangement to split the
execbuffer operation into reserve and relocation phases and then
verified by the check that the target handle had been processed during
the reservation phase.

However, the actual checking of obj->gtt_offset==0 is bogus as we can
indeed reference an object at offset 0. For instance, the framebuffer
installed by the BIOS often resides at offset 0 - causing EINVAL as we
legimately try to render using the stolen fb.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

be7cb634

12 11月, 2012 1 次提交

drm/i915: Stop using AGP layer for GEN6+ · e76e9aeb

由 Ben Widawsky 提交于 11月 04, 2012

As a quick hack we make the old intel_gtt structure mutable so we can
fool a bunch of the existing code which depends on elements in that data
structure. We can/should try to remove this in a subsequent patch.

This should preserve the old gtt init behavior which upon writing these
patches seems incorrect. The next patch will fix these things.

The one exception is VLV which doesn't have the preserved flush control
write behavior. Since we want to do that for all GEN6+ stuff, we'll
handle that in a later patch. Mainstream VLV support doesn't actually
exist yet anyway.

v2: Update the comment to remove the "voodoo"
Check that the last pte written matches what we readback

v3: actually kill cache_level_to_agp_type since most of the flags will
disappear in an upcoming patch

v4: v3 was actually not what we wanted (Daniel)
Make the ggtt bind assertions better and stricter (Chris)
Fix some uncaught errors at gtt init (Chris)
Some other random stuff that Chris wanted

v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben)
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a
tad more robust by mapping everything != CACHE_NONE to the cached agp
flag - we have a 1:1 uncached mapping, but different modes of
cacheable (at least on later generations). Suggested by Chris Wilson.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e76e9aeb

18 10月, 2012 1 次提交

drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers · d7d4eedd

由 Chris Wilson 提交于 10月 17, 2012

With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.

v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d7d4eedd

03 10月, 2012 2 次提交

UAPI: (Scripted) Convert #include "..." to #include <path/...> in drivers/gpu/ · 760285e7

由 David Howells 提交于 10月 02, 2012

Convert #include "..." to #include <path/...> in drivers/gpu/.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NDave Airlie <airlied@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NDave Jones <davej@redhat.com>

760285e7

UAPI: (Scripted) Remove redundant DRM UAPI header #inclusions from drivers/gpu/. · 4126d5d6

由 David Howells 提交于 10月 02, 2012

Remove redundant DRM UAPI header #inclusions from drivers/gpu/.

Remove redundant #inclusions of core DRM UAPI headers (drm.h, drm_mode.h and
drm_sarea.h).  They are now #included via drmP.h and drm_crtc.h via a preceding
patch.

Without this patch and the patch to make include the UAPI headers from the core
headers, after the UAPI split, the DRM C sources cannot find these UAPI headers
because the DRM code relies on specific -I flags to make #include "..."  work
on headers in include/drm/ - but that does not work after the UAPI split without
adding more -I flags.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NDave Airlie <airlied@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NDave Jones <davej@redhat.com>

4126d5d6

20 9月, 2012 3 次提交

drm/i915: Assert that the exec object lookup table is a power-of-two · 41783eea

由 Chris Wilson 提交于 9月 18, 2012

As we make the simplification of using a power-of-two size for the
execbuffer handle-to-object TLB, we should validate that this is actually
true and so clarify that premise.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

41783eea

drm/i915: Drop the misleading cast to the wrong user pointer type · ba7a6458

由 Chris Wilson 提交于 9月 14, 2012

The exec_list is of type drm_i915_gem_exec_object2 and so casting it to
a drm_i915_gem_relocation_entry is very confusing!
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJani Nikula <jani.nikula@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ba7a6458

drm/i915: Replace the array of pages with a scatterlist · 9da3da66

由 Chris Wilson 提交于 6月 01, 2012

Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9da3da66

25 8月, 2012 1 次提交

drm/i915: Avoid unbinding due to an interrupted pin_and_fence during execbuffer · 7788a765

由 Chris Wilson 提交于 8月 24, 2012

If we need to stall in order to complete the pin_and_fence operation
during execbuffer reservation, there is a high likelihood that the
operation will be interrupted by a signal (thanks X!). In order to
simplify the cleanup along that error path, the object was
unconditionally unbound and the error propagated. However, being
interrupted here is far more common than I would like and so we can
strive to avoid the extra work by eliminating the forced unbind.

v2: In discussion over the indecent colour of the new functions and
unwind path, we realised that we can use the new unreserve function to
clean up the code even further.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7788a765

24 8月, 2012 2 次提交

drm/i915: Use cpu relocations if the object is in the GTT but not mappable · 504c7267

由 Chris Wilson 提交于 8月 23, 2012

This prevents the case of unbinding the object in order to process the
relocations through the GTT and then rebinding it only to then proceed
to use cpu relocations as the object is now in the CPU write domain. By
choosing to use cpu relocations up front, we can therefore avoid the
rebind penalty.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

504c7267

drm/i915: Only pwrite through the GTT if there is space in the aperture · 86a1ee26

由 Chris Wilson 提交于 8月 11, 2012

Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

86a1ee26

21 8月, 2012 1 次提交

drm/i915: Track unbound pages · 6c085a72

由 Chris Wilson 提交于 8月 20, 2012

When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.

To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.

As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.

Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).

Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.

v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.

v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6c085a72

06 8月, 2012 1 次提交

drm/i915: Don't forget to apply SNB PIPE_CONTROL GTT workaround. · e844b990

由 Eric Anholt 提交于 7月 31, 2012

If a buffer that was the target of a PIPE_CONTROL from userland was a
reused one that hadn't been evicted which had not previously had this
workaround applied, then the early return for a correct
presumed_offset in this function meant we would not bind it into the
GTT and the write would land somewhere else.

Fixes reproducible failures with GL_EXT_timer_query usage in apitrace,
and I also expect it to fix the intermittent OQ issues on snb that
danvet's been working on.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932Signed-off-by: NEric Anholt <eric@anholt.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NCarl Worth <cworth@cworth.org>
Tested-by: NCarl Worth <cworth@cworth.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e844b990

26 7月, 2012 7 次提交

drm/i915: Avoid concurrent access when marking the device as idle/busy · f047e395

由 Chris Wilson 提交于 7月 21, 2012

As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.

v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f047e395

drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs · a7b9761d

由 Chris Wilson 提交于 7月 20, 2012

By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a7b9761d

drm/i915: Clear the pending_gpu_fenced_access flag at the start of execbuffer · 016fd0c1

由 Chris Wilson 提交于 7月 20, 2012

Otherwise once we use the buffer with a BLT command on gen2/3, we will
always regard future command submissions as continuing the fenced
access. However, now that we flush/invalidate between every batch we can
drop this pessimism.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

016fd0c1

drm/i915: Replace the complex flushing logic with simple invalidate/flush all · 6ac42f41

由 Daniel Vetter 提交于 7月 21, 2012

Now that we unconditionally flush and invalidate between every batch
buffer, we no longer need the complex logic to decide which domains
require flushing. Remove it and rejoice.

v2 (danvet): Keep around the flip waiting logic. It's gross and
broken, I know, but we can't just kill that thing ... even if we just
keep it around as a reminder that things are broken.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6ac42f41

drm/i915: Remove the per-ring write list · 69c2fc89

由 Chris Wilson 提交于 7月 20, 2012

This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

69c2fc89

drm/i915: Replace the pending_gpu_write flag with an explicit seqno · 0201f1ec

由 Chris Wilson 提交于 7月 20, 2012

As we always flush the GPU cache prior to emitting the breadcrumb, we no
longer have to worry about the deferred flush causing the
pending_gpu_write to be delayed. So we can instead utilize the known
last_write_seqno to hopefully minimise the wait times.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

0201f1ec

drm/i915: Allow late allocation of request for i915_add_request() · 3bb73aba

由 Chris Wilson 提交于 7月 20, 2012

Request preallocation was added to i915_add_request() in order to
support the overlay. However, not all users care and can quite happily
ignore the failure to allocate the request as they will simply repeat
the request in the future.

By pushing the allocation down into i915_add_request(), we can then
remove some rather ugly error handling in the callers.

v2: Nullify request->file_priv otherwise we chase a garbage pointer
when retiring requests.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

3bb73aba

25 7月, 2012 1 次提交

drm/i915: Set the context before setting up regs for the context. · 0da5cec1

由 Eric Anholt 提交于 7月 23, 2012

Fixes failures in transform feedback on gen7 because our SOL_RESET
flag was setting the transform feedback offsets in the old context
(occasionally happened to be ours) instead of the new context.
Signed-off-by: NEric Anholt <eric@anholt.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

0da5cec1

20 7月, 2012 1 次提交

drm/i915: Insert a flush between batches if the breadcrumb was dropped · 09cf7c9a

由 Chris Wilson 提交于 7月 13, 2012

If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Note that beside simply adding the missing flush, avoiding potential
render corruption, this will also fix at least parts of the problem
introduced by some funny interaction of these two commits:

commit de2b9985
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

which allowed intel_ring_begin to return -ERESTARTSYS and

commit cc889e0f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which essentially disabled the flushing list.

The issue happens when we submit a batch & emit it, but get
interrupted (thanks to the first patch) while trying to emit the
flush. On the next batch we still assume that the full gpu domain
handling is in effect and hence compute the invalidate&flushing
domains. But thanks to the 2nd patch we totally ignore these and only
invalidate all gpu domains, presuming that any required flushes have
been issued already.  Which is wrong and eventually results in us
updating the new write_domain values with the computed
pending_write_domain values, which leaves an object with write_domain
== 0 on the gpu_write_list.

As soon as we try to unbind that object, things blow up.

Fix this by emitting the missing flush according to the new
ring->gpu_caches_dirty flag.

Note that this does _not_ fix all the current cases where we end up
with an object on the flushing_list that can't be flushed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Add bug explanation to commit message.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

09cf7c9a

20 6月, 2012 1 次提交

drm/i915: disable flushing_list/gpu_write_list · cc889e0f

由 Daniel Vetter 提交于 6月 13, 2012

This is just the minimal patch to disable all this code so that we can
do decent amounts of QA before we rip it all out.

The complicating thing is that we need to flush the gpu caches after
the batchbuffer is emitted. Which is past the point of no return where
execbuffer can't fail any more (otherwise we risk submitting the same
batch multiple times).

Hence we need to add a flag to track whether any caches associated
with that ring are dirty. And emit the flush in add_request if that's
the case.

Note that this has a quite a few behaviour changes:
- Caches get flushed/invalidated unconditionally.
- Invalidation now happens after potential inter-ring sync.

I've bantered around a bit with Chris on irc whether this fixes
anything, and it might or might not. The only thing clear is that with
these changes it's much easier to reason about correctness.

Also rip out a lone get_next_request_seqno in the execbuffer
retire_commands function. I've dug around and I couldn't figure out
why that is still there, with the outstanding lazy request stuff it
shouldn't be necessary.

v2: Chris Wilson complained that I also invalidate the read caches
when flushing after a batchbuffer. Now optimized.

v3: Added some comments to explain the new flushing behaviour.

Cc: Eric Anholt <eric@anholt.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cc889e0f

14 6月, 2012 1 次提交

drm/i915/context: switch contexts with execbuf2 · 6e0a69db

由 Ben Widawsky 提交于 6月 04, 2012

Use the rsvd1 field in execbuf2 to specify the context ID associated
with the workload. This will allow the driver to do the proper context
switch when/if needed.

v2: Add checks for context switches on rings not supporting contexts.
Before the code would silently ignore such requests.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

6e0a69db

20 5月, 2012 1 次提交

drm/i915: Check whether the ring is initialised prior to dispatch · a15817cf

由 Chris Wilson 提交于 5月 11, 2012

Rather than use the magic feature tests HAS_BLT/HAS_BSD just check
whether the ring we are about to dispatch the execbuffer on is
initialised.

v2: Use intel_ring_initialized()
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a15817cf

08 5月, 2012 1 次提交

drm/i915: Limit calling mark-busy only for potential scanouts · acb87dfb

由 Chris Wilson 提交于 5月 03, 2012

The principle of intel_mark_busy() is that we want to spot the
transition of when the display engine is being used in order to bump
powersaving modes and increase display clocks. As such it is only
important when the display is changing, i.e. when rendering to the
scanout or other sprite/plane, and these are characterised by being
pinned.

v2: Mark the whole device as busy on execbuffer and pageflips as well
and rebase against dinq for the minor bug fix to be immediately
applicable.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: fix compile fail.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

acb87dfb

03 5月, 2012 2 次提交

drm/i915: disallow clip rects on gen5+ · 6ebebc92

由 Daniel Vetter 提交于 4月 26, 2012

Unfortunately there has been dri1 userspace that used gem to manage
the gtt and hence also needed cliprects in the execbuf ioctl. So
we can't ever remove that code without breaking the ioctl abi.

But at least we can disable it on gen5+, because these horrible
versions of mesa have not supported these chips.
Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6ebebc92

drm/i915: remove do_retire from i915_wait_request · b2da9fe5

由 Ben Widawsky 提交于 4月 26, 2012

This originates from a hack by me to quickly fix a bug in an earlier
patch where we needed control over whether or not waiting on a seqno
actually did any retire list processing. Since the two operations aren't
clearly related, we should pull the parameter out of the wait function,
and make the caller responsible for retiring if the action is desired.

The only function call site which did not get an explicit retire_request call
(on purpose) is i915_gem_inactive_shrink(). That code was already calling
retire_request a second time.

v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel)
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b2da9fe5

24 4月, 2012 2 次提交

drm/i915: fix integer overflow in i915_gem_do_execbuffer() · 44afb3a0

由 Xi Wang 提交于 4月 23, 2012

On 32-bit systems, a large args->num_cliprects from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid
allocation for execbuffer object list").
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

44afb3a0

drm/i915: fix integer overflow in i915_gem_execbuffer2() · ed8cd3b2

由 Xi Wang 提交于 4月 23, 2012

On 32-bit systems, a large args->buffer_count from userspace via ioctl
may overflow the allocation size, leading to out-of-bounds access.

This vulnerability was introduced in commit 8408c282 ("drm/i915:
First try a normal large kmalloc for the temporary exec buffers").
Signed-off-by: NXi Wang <xi.wang@gmail.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ed8cd3b2

18 4月, 2012 2 次提交

drm/i915: Remove the pipelined parameter from get_fence() · 06d98131

由 Chris Wilson 提交于 4月 17, 2012

We never succeeded in getting pipelined fencing to work (unresolved
spurious GPU hangs), so begin the process of dismantling and removal
the broken code.

Step 1 is the removal of the pipeline parameter to get_fence().
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

06d98131

drm/i915: Always flush tiling changes before accessing through the GTT · 7b09638f

由 Chris Wilson 提交于 4月 14, 2012

As we defer updating the fence register from set-tiling to the point of
use, we need to declare every access through the GTT as either fenced or
unfenced.

This patches fixes an old bug in the execbuffer relocation processing
which could conceivably be hit by a pathological userspace.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7b09638f

13 4月, 2012 2 次提交

drm/i915: use semaphores for the display plane · 2911a35b

由 Ben Widawsky 提交于 4月 05, 2012

In theory this will have performance and power improvements. Performance
because we don't need to stall when the scanout BO is busy, and power
because we don't have to stall when the BO is busy (and the ring can
even go to sleep if the HW supports it).

v2:
squash 2 patches into 1 (me)
un-inline the enable_semaphores function (Daniel)
remove comment about SNB hangs from i915_gem_object_sync (Chris)
rename intel_enable_semaphores to i915_semaphore_is_enabled (me)
removed page flip comment; "no why" (Chris)

To address other comments from Daniel (irc):
update the comment to say 'vt-d is crap, don't enable semaphores'
  - I think you misinterpreted Chris' comment, it already exists.
checking out whether we can pageflip on the render ring on ivb (didn't
work on early silicon)
  - We don't want to enable workarounds for early silicon unless we have
    to.
  - I can't find any references in the docs about this.
optionally use it if the fb is already busy on the render ring
  - This should be how the code already worked, unless I am
    misunderstanding your meaning.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2911a35b

drm/i915: Reorganise rules for get_fence/put_fence · 9a5a53b3

由 Chris Wilson 提交于 3月 22, 2012

By simplifying the rules to calling get_fence when writing to the
through the GTT in a tiled manner, and calling put_fence before writing
to the object through the GTT in a linear manner, the code becomes
clearer and there is less chance of making a mistake.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
[danvet: fixed up conflict with ppgtt code and spelling in a new
comment.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9a5a53b3

01 4月, 2012 1 次提交

drm/i915: Mark untiled BLT commands as fenced on gen2/3 · 7dd49065

由 Chris Wilson 提交于 3月 21, 2012

The BLT commands on gen2/3 utilize the fence registers and so we cannot
modify any fences for the object whilst those commands are in flight.
Currently we marked tiled commands as occupying a fence, but forgot to
restrict the untiled commands from preventing a fence being assigned
before they were completed.

One side-effect is that we ten have to double check that a fence was
allocated for a fenced buffer during move-to-active.
Reported-by: NJiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Testcase: i-g-t/tests/gem_tiled_after_untiled_blt
Tested-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7dd49065

27 3月, 2012 2 次提交

mm: extend prefault helpers to fault in more than PAGE_SIZE · f56f821f

由 Daniel Vetter 提交于 3月 25, 2012

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStlye and fix spelling.
Acked-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f56f821f

drm/i915: Avoid using mappable space for relocation processing through the CPU · dabdfe02

由 Chris Wilson 提交于 3月 26, 2012

We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

dabdfe02

26 3月, 2012 1 次提交

drm/i915: Batch copy_from_user for relocation processing · 1d83f442

由 Chris Wilson 提交于 3月 24, 2012

Originally the code tried to allocate a large enough array to perform
the copy using vmalloc, performance wasn't great and throughput was
improved by processing each individual relocation entry separately.
This too is not as efficient as one would desire. A compromise would be
to allocate a single page, or to allocate a few entries on the stack,
and process the copy in batches. The latter gives simpler code and more
consistent performance due to a lack of heuristic.

x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu)
before: 249000 785000 1280000 (80%)
page: 264000 896000 1280000 (65%)
on-stack: 264000 902000 1280000 (67%)

v2: Use 512-bytes of stack for batching rather than allocate a page.
v3: Tidy the code slightly with more descriptive variable names
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1d83f442

21 3月, 2012 1 次提交

drm/i915: implement SNB workaround for lazy global gtt · 149c8407

由 Daniel Vetter 提交于 2月 15, 2012

PIPE_CONTROL on snb needs global gtt mappings in place to workaround a
hw gotcha. No other commands need such a workaround. Luckily we can
detect a PIPE_CONTROL commands easily because they have a write_domain
= I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that).

v2: Binding the target of such a reloc into the global gtt actually
works instead of binding the source, which is rather pointless ...

v3: Kill a superflous has_global_gtt_mapping assignement noticed by
Chris Wilson.
Reviewed-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

149c8407

10 2月, 2012 1 次提交

drm/i915: ppgtt binding/unbinding support · 7bddb01f

由 Daniel Vetter 提交于 2月 09, 2012

This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Tested-by: NChris Wilson <chris@chris-wilson.co.uk>
Tested-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7bddb01f