提交 · 2f745ad3d3ce96cb72d3981570d9e9988442bce8 · openeuler / Kernel

20 9月, 2012 6 次提交

drm/i915: Convert the dmabuf object to use the new i915_gem_object_ops · 2f745ad3

由 Chris Wilson 提交于 9月 04, 2012

By providing a callback for when we need to bind the pages, and then
release them again later, we can shorten the amount of time we hold the
foreign pages mapped and pinned, and importantly the dmabuf objects then
behave as any other normal object with respect to the shrinker and
memory management.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2f745ad3

drm/i915: Replace the array of pages with a scatterlist · 9da3da66

由 Chris Wilson 提交于 6月 01, 2012

Rather than have multiple data structures for describing our page layout
in conjunction with the array of pages, we can migrate all users over to
a scatterlist.

One major advantage, other than unifying the page tracking structures,
this offers is that we replace the vmalloc'ed array (which can be up to
a megabyte in size) with a chain of individual pages which helps reduce
memory pressure.

The disadvantage is that we then do not have a simple array to iterate,
or to access randomly. The common case for this is in the relocation
processing, which will typically fit within a single scatterlist page
and so be almost the same cost as the simple array. For iterating over
the array, the extra function call could be optimised away, but in
reality is an insignificant cost of either binding the pages, or
performing the pwrite/pread.

v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
trivial compile error from rebasing.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9da3da66

drm/i915: Pin backing pages for pread · f60d7f0c

由 Chris Wilson 提交于 9月 04, 2012

By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
beneath us, and so we can simplify the code for iterating over the pages.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f60d7f0c

drm/i915: Pin backing pages for pwrite · 755d2218

由 Chris Wilson 提交于 9月 04, 2012

By using the recently introduced pinning of pages, we can safely drop
the mutex in the knowledge that the pages are not going to disappear
jeneath us, and so we can simplify the code for iterating over the pages.

Note: The old code had such complicated page refcounting since it used
obj->pages as a micro-optimization if it's there, but that could
(before this patch) disappear when we drop the dev->struct_mutex.
Hence some manual page refcounting was required for the slow path,
complicated by the fact that pages returned by shmem_read_mapping_page
already have a pageref, which needs to be dropped again.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
[danvet: Added note to explain the question Ben raised in review.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

755d2218

drm/i915: Pin backing pages whilst exporting through a dmabuf vmap · a5570178

由 Chris Wilson 提交于 9月 04, 2012

We need to refcount our pages in order to prevent reaping them at
inopportune times, such as when they currently vmapped or exported to
another driver. However, we also wish to keep the lazy deallocation of
our pages so we need to take a pin/unpinned approach rather than a
simple refcount.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a5570178

drm/i915: Introduce drm_i915_gem_object_ops · 37e680a1

由 Chris Wilson 提交于 6月 07, 2012

In order to specialise functions depending upon the type of object, we
can attach vfuncs to each object via a new ->ops pointer.

For instance, this will be used in future patches to only bind pages from
a dma-buf for the duration that the object is used by the GPU - and so
prevent them from pinning those pages for the entire of the object.

v2: Bonus comments.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

37e680a1

27 8月, 2012 1 次提交

drm/i915: Remove __GFP_NO_KSWAPD · d7c3b937

由 Sedat Dilek 提交于 8月 27, 2012

When I pulled-in today's drm-intel-next into linux-next (next-20120824)
I saw this build-breakage:

drivers/gpu/drm/i915/i915_gem.c: In function 'i915_gem_object_get_pages_gtt':
drivers/gpu/drm/i915/i915_gem.c:1778:40: error: '__GFP_NO_KSWAPD' undeclared (first use in this function)
drivers/gpu/drm/i915/i915_gem.c:1778:40: note: each undeclared identifier is reported only once for each function it appears in

This is caused by commit ba099ef165f8 ("mm: remove __GFP_NO_KSWAPD")
and commit b6beae2c2014 ("mm: remove __GFP_NO_KSWAPD fixes") in
linux-next (next-20120824).

Fix this by removing __GFP_NO_KSWAPD from drm/i915 driver.
Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d7c3b937

24 8月, 2012 5 次提交

drm/i915: Use a non-blocking wait for set-to-domain ioctl · 3236f57a

由 Chris Wilson 提交于 8月 24, 2012

The principal use for set-to-domain is for userspace to serialise
operations with a particular buffer, for example to maintain coherency
with a CPU map or to ratelimit its rendering by waiting on all previous
operations before continuing. As such we tend to hold the struct_mutex
for long periods during the synchronisation and so cause contention
issues with other users of the graphics device, even for independent
operations as memory management. An example is the contention between
compiz and X which causes jitter in the display and a drop in peak
throughput.

The ultimate solution would be a set of fine grained locks and lockless
operations, but an intermediate step is to first attempt the
synchronisation for set-to-domain without holding the mutex. This
introduces a number of race conditions, so we limit it use to the ioctl
periphery where we have no dependent state and can safely complete with
a locked synchronisation afterwards.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

3236f57a

drm/i915: Juggle code order to ease flow of the next patch · b361237b

由 Chris Wilson 提交于 8月 24, 2012

Move the wait-for-rendering logic around in the file so that we can
group it together with the subsequent variations. The general goal is to
have the lower level routines clustered together and then the higher
level logic building upon those low level routines that came before.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b361237b

drm/i915: Extract general object init routine · 0327d6ba

由 Chris Wilson 提交于 8月 11, 2012

As we wish to create specialised object constructions in the near
future that share the same basic GEM object struct, export the default
initializer.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

0327d6ba

drm/i915: Protect private gem objects from truncate (such as imported dmabuf) · 4d6294bf

由 Chris Wilson 提交于 8月 11, 2012

If the object has no backing shmemfs filp, then we obviously cannot
perform a truncation operation upon it.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4d6294bf

drm/i915: Only pwrite through the GTT if there is space in the aperture · 86a1ee26

由 Chris Wilson 提交于 8月 11, 2012

Avoid stalling and waiting for the GPU by checking to see if there is
sufficient inactive space in the aperture for us to bind the buffer
prior to writing through the GTT. If there is inadequate space we will
have to stall waiting for the GPU, and incur overheads moving objects
about. Instead, only incur the clflush overhead on the target object by
writing through shmem.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

86a1ee26

21 8月, 2012 3 次提交

drm/i915: Try harder to allocate an mmap_offset · d8cb5086

由 Chris Wilson 提交于 8月 11, 2012

Given the persistence of an offset for the lifetime of an object, itis
easy to contemplate how the mmap space becomes badly fragmented to the
point that further allocations fail with ENOSPC. Our only recourse at
this point is to try to purge the objects to release some space and
reattempt the allocation.

References: https://bugs.freedesktop.org/show_bug.cgi?id=39552Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d8cb5086

drm/i915: Add some sanity checks to unbound tracking · c4670ad0

由 Chris Wilson 提交于 8月 20, 2012

A pair of universally true checks that just need to be put in the right
place depending on where in the patch sequence you go. Note that
i915_gem_object_put_pages_gtt() already gains the
BUG_ON(obj->gtt_space), but on reflection that needed to migrate to
put_pages().
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

c4670ad0

drm/i915: Track unbound pages · 6c085a72

由 Chris Wilson 提交于 8月 20, 2012

When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.

To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.

As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.

Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).

Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.

v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.

v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6c085a72

20 8月, 2012 1 次提交

drm/i915: move functions around · 225067ee

由 Daniel Vetter 提交于 8月 20, 2012

Prep work to make Chris Wilson's unbound tracking patch a bit easier
to read. Alas, I'd have preferred that moving the page allocation
retry loop from bind to get_pages would have been a separate patch,
too. But that looks like real work ;-)
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

225067ee

17 8月, 2012 1 次提交

drm/i915/contexts: fix list corruption · b6c7488d

由 Ben Widawsky 提交于 8月 14, 2012

After reset we unconditionally reinitialize lists. If the context switch
hasn't yet completed before the suspend, the default context object will
end up on lists that are going to go away when we resume.

The patch forces the context switch to be synchronous before suspend
assuring that the active/inactive tracking is correct at the time of
resume.

References: https://bugs.freedesktop.org/show_bug.cgi?id=52429Tested-by: NGuang A Yang <guang.a.yang@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b6c7488d

10 8月, 2012 1 次提交

drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8

由 Chris Wilson 提交于 8月 09, 2012

Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirment process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b2eadbc8

26 7月, 2012 12 次提交

drm/i915: Export ability of changing cache levels to userspace · e6994aee

由 Chris Wilson 提交于 7月 10, 2012

By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and more importantly
downloads from the GPU and to able to mix CPU and GPU rendering/activity
efficiently.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific cacheing control bits in the ioctl struct.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e6994aee

drm/i915: Segregate memory domains in the GTT using coloring · 42d6ab48

由 Chris Wilson 提交于 7月 26, 2012

Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.

v2: Rebase ontop of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

42d6ab48

drm/i915: Avoid concurrent access when marking the device as idle/busy · f047e395

由 Chris Wilson 提交于 7月 21, 2012

As suggested by Daniel, rip out the independent timers for device and
crtc busyness and integrate the manual powermanagement of the display
engine into the GEM core and its request tracking. The benefits are that
the code is a lot smaller, fewer moving parts and should fit more neatly
into the overall activity tracking of the driver.

v2: Complete overhaul and removal of the racy timers and workers.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f047e395

drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs · a7b9761d