- 25 8月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
If we need to stall in order to complete the pin_and_fence operation during execbuffer reservation, there is a high likelihood that the operation will be interrupted by a signal (thanks X!). In order to simplify the cleanup along that error path, the object was unconditionally unbound and the error propagated. However, being interrupted here is far more common than I would like and so we can strive to avoid the extra work by eliminating the forced unbind. v2: In discussion over the indecent colour of the new functions and unwind path, we realised that we can use the new unreserve function to clean up the code even further. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 24 8月, 2012 2 次提交
-
-
由 Chris Wilson 提交于
This prevents the case of unbinding the object in order to process the relocations through the GTT and then rebinding it only to then proceed to use cpu relocations as the object is now in the CPU write domain. By choosing to use cpu relocations up front, we can therefore avoid the rebind penalty. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Avoid stalling and waiting for the GPU by checking to see if there is sufficient inactive space in the aperture for us to bind the buffer prior to writing through the GTT. If there is inadequate space we will have to stall waiting for the GPU, and incur overheads moving objects about. Instead, only incur the clflush overhead on the target object by writing through shmem. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 8月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
When dealing with a working set larger than the GATT, or even the mappable aperture when touching through the GTT, we end up with evicting objects only to rebind them at a new offset again later. Moving an object into and out of the GTT requires clflushing the pages, thus causing a double-clflush penalty for rebinding. To avoid having to clflush on rebinding, we can track the pages as they are evicted from the GTT and only relinquish those pages on memory pressure. As usual, if it were not for the handling of out-of-memory condition and having to manually shrink our own bo caches, it would be a net reduction of code. Alas. Note: The patch also contains a few changes to the last-hope evict_everything logic in i916_gem_execbuffer.c - we no longer try to only evict the purgeable stuff in a first try (since that's superflous and only helps in OOM corner-cases, not fragmented-gtt trashing situations). Also, the extraction of the get_pages retry loop from bind_to_gtt (and other callsites) to get_pages should imo have been a separate patch. v2: Ditch the newly added put_pages (for unbound objects only) in i915_gem_reset. A quick irc discussion hasn't revealed any important reason for this, so if we need this, I'd like to have a git blame'able explanation for it. v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Split out code movements and rant a bit in the commit message with a few Notes. Done v2] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 06 8月, 2012 1 次提交
-
-
由 Eric Anholt 提交于
If a buffer that was the target of a PIPE_CONTROL from userland was a reused one that hadn't been evicted which had not previously had this workaround applied, then the early return for a correct presumed_offset in this function meant we would not bind it into the GTT and the write would land somewhere else. Fixes reproducible failures with GL_EXT_timer_query usage in apitrace, and I also expect it to fix the intermittent OQ issues on snb that danvet's been working on. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48019 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52932Signed-off-by: NEric Anholt <eric@anholt.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NCarl Worth <cworth@cworth.org> Tested-by: NCarl Worth <cworth@cworth.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 7月, 2012 7 次提交
-
-
由 Chris Wilson 提交于
As suggested by Daniel, rip out the independent timers for device and crtc busyness and integrate the manual powermanagement of the display engine into the GEM core and its request tracking. The benefits are that the code is a lot smaller, fewer moving parts and should fit more neatly into the overall activity tracking of the driver. v2: Complete overhaul and removal of the racy timers and workers. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
By moving the function to intel_ringbuffer and currying the appropriate parameter, hopefully we make the callsites easier to read and understand. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Otherwise once we use the buffer with a BLT command on gen2/3, we will always regard future command submissions as continuing the fenced access. However, now that we flush/invalidate between every batch we can drop this pessimism. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Now that we unconditionally flush and invalidate between every batch buffer, we no longer need the complex logic to decide which domains require flushing. Remove it and rejoice. v2 (danvet): Keep around the flip waiting logic. It's gross and broken, I know, but we can't just kill that thing ... even if we just keep it around as a reminder that things are broken. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
This is now handled by a global flag to ensure we emit a flush before the next serialisation point (if we failed to queue one previously). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
As we always flush the GPU cache prior to emitting the breadcrumb, we no longer have to worry about the deferred flush causing the pending_gpu_write to be delayed. So we can instead utilize the known last_write_seqno to hopefully minimise the wait times. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Request preallocation was added to i915_add_request() in order to support the overlay. However, not all users care and can quite happily ignore the failure to allocate the request as they will simply repeat the request in the future. By pushing the allocation down into i915_add_request(), we can then remove some rather ugly error handling in the callers. v2: Nullify request->file_priv otherwise we chase a garbage pointer when retiring requests. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 25 7月, 2012 1 次提交
-
-
由 Eric Anholt 提交于
Fixes failures in transform feedback on gen7 because our SOL_RESET flag was setting the transform feedback offsets in the old context (occasionally happened to be ours) instead of the new context. Signed-off-by: NEric Anholt <eric@anholt.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 7月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
If we drop the breadcrumb request after a batch due to a signal for example we aim to fix it up at the next opportunity. In this case we emit a second batchbuffer with no waits upon the first and so no opportunity to insert the missing request, so we need to emit the missing flush for coherency. (Note that that invalidating the render cache is the same as flushing it, so there should have been no observable corruption.) Note that beside simply adding the missing flush, avoiding potential render corruption, this will also fix at least parts of the problem introduced by some funny interaction of these two commits: commit de2b9985 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jul 4 22:52:50 2012 +0200 drm/i915: don't return a spurious -EIO from intel_ring_begin which allowed intel_ring_begin to return -ERESTARTSYS and commit cc889e0f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list which essentially disabled the flushing list. The issue happens when we submit a batch & emit it, but get interrupted (thanks to the first patch) while trying to emit the flush. On the next batch we still assume that the full gpu domain handling is in effect and hence compute the invalidate&flushing domains. But thanks to the 2nd patch we totally ignore these and only invalidate all gpu domains, presuming that any required flushes have been issued already. Which is wrong and eventually results in us updating the new write_domain values with the computed pending_write_domain values, which leaves an object with write_domain == 0 on the gpu_write_list. As soon as we try to unbind that object, things blow up. Fix this by emitting the missing flush according to the new ring->gpu_caches_dirty flag. Note that this does _not_ fix all the current cases where we end up with an object on the flushing_list that can't be flushed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52040Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Add bug explanation to commit message.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 6月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
This is just the minimal patch to disable all this code so that we can do decent amounts of QA before we rip it all out. The complicating thing is that we need to flush the gpu caches after the batchbuffer is emitted. Which is past the point of no return where execbuffer can't fail any more (otherwise we risk submitting the same batch multiple times). Hence we need to add a flag to track whether any caches associated with that ring are dirty. And emit the flush in add_request if that's the case. Note that this has a quite a few behaviour changes: - Caches get flushed/invalidated unconditionally. - Invalidation now happens after potential inter-ring sync. I've bantered around a bit with Chris on irc whether this fixes anything, and it might or might not. The only thing clear is that with these changes it's much easier to reason about correctness. Also rip out a lone get_next_request_seqno in the execbuffer retire_commands function. I've dug around and I couldn't figure out why that is still there, with the outstanding lazy request stuff it shouldn't be necessary. v2: Chris Wilson complained that I also invalidate the read caches when flushing after a batchbuffer. Now optimized. v3: Added some comments to explain the new flushing behaviour. Cc: Eric Anholt <eric@anholt.net> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 14 6月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
Use the rsvd1 field in execbuf2 to specify the context ID associated with the workload. This will allow the driver to do the proper context switch when/if needed. v2: Add checks for context switches on rings not supporting contexts. Before the code would silently ignore such requests. Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
- 20 5月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
Rather than use the magic feature tests HAS_BLT/HAS_BSD just check whether the ring we are about to dispatch the execbuffer on is initialised. v2: Use intel_ring_initialized() Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 5月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
The principle of intel_mark_busy() is that we want to spot the transition of when the display engine is being used in order to bump powersaving modes and increase display clocks. As such it is only important when the display is changing, i.e. when rendering to the scanout or other sprite/plane, and these are characterised by being pinned. v2: Mark the whole device as busy on execbuffer and pageflips as well and rebase against dinq for the minor bug fix to be immediately applicable. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: fix compile fail.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 03 5月, 2012 2 次提交
-
-
由 Daniel Vetter 提交于
Unfortunately there has been dri1 userspace that used gem to manage the gtt and hence also needed cliprects in the execbuf ioctl. So we can't ever remove that code without breaking the ioctl abi. But at least we can disable it on gen5+, because these horrible versions of mesa have not supported these chips. Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
This originates from a hack by me to quickly fix a bug in an earlier patch where we needed control over whether or not waiting on a seqno actually did any retire list processing. Since the two operations aren't clearly related, we should pull the parameter out of the wait function, and make the caller responsible for retiring if the action is desired. The only function call site which did not get an explicit retire_request call (on purpose) is i915_gem_inactive_shrink(). That code was already calling retire_request a second time. v2: don't modify any behavior excepit i915_gem_inactive_shrink(Daniel) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 24 4月, 2012 2 次提交
-
-
由 Xi Wang 提交于
On 32-bit systems, a large args->num_cliprects from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit 432e58ed ("drm/i915: Avoid allocation for execbuffer object list"). Signed-off-by: NXi Wang <xi.wang@gmail.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Xi Wang 提交于
On 32-bit systems, a large args->buffer_count from userspace via ioctl may overflow the allocation size, leading to out-of-bounds access. This vulnerability was introduced in commit 8408c282 ("drm/i915: First try a normal large kmalloc for the temporary exec buffers"). Signed-off-by: NXi Wang <xi.wang@gmail.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 4月, 2012 2 次提交
-
-
由 Chris Wilson 提交于
We never succeeded in getting pipelined fencing to work (unresolved spurious GPU hangs), so begin the process of dismantling and removal the broken code. Step 1 is the removal of the pipeline parameter to get_fence(). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
As we defer updating the fence register from set-tiling to the point of use, we need to declare every access through the GTT as either fenced or unfenced. This patches fixes an old bug in the execbuffer relocation processing which could conceivably be hit by a pathological userspace. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 4月, 2012 2 次提交
-
-
由 Ben Widawsky 提交于
In theory this will have performance and power improvements. Performance because we don't need to stall when the scanout BO is busy, and power because we don't have to stall when the BO is busy (and the ring can even go to sleep if the HW supports it). v2: squash 2 patches into 1 (me) un-inline the enable_semaphores function (Daniel) remove comment about SNB hangs from i915_gem_object_sync (Chris) rename intel_enable_semaphores to i915_semaphore_is_enabled (me) removed page flip comment; "no why" (Chris) To address other comments from Daniel (irc): update the comment to say 'vt-d is crap, don't enable semaphores' - I think you misinterpreted Chris' comment, it already exists. checking out whether we can pageflip on the render ring on ivb (didn't work on early silicon) - We don't want to enable workarounds for early silicon unless we have to. - I can't find any references in the docs about this. optionally use it if the fb is already busy on the render ring - This should be how the code already worked, unless I am misunderstanding your meaning. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
By simplifying the rules to calling get_fence when writing to the through the GTT in a tiled manner, and calling put_fence before writing to the object through the GTT in a linear manner, the code becomes clearer and there is less chance of making a mistake. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> [danvet: fixed up conflict with ppgtt code and spelling in a new comment.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 01 4月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
The BLT commands on gen2/3 utilize the fence registers and so we cannot modify any fences for the object whilst those commands are in flight. Currently we marked tiled commands as occupying a fence, but forgot to restrict the untiled commands from preventing a fence being assigned before they were completed. One side-effect is that we ten have to double check that a fence was allocated for a fenced buffer during move-to-active. Reported-by: NJiri Slaby <jirislaby@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43427 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47990Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Testcase: i-g-t/tests/gem_tiled_after_untiled_blt Tested-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 3月, 2012 2 次提交
-
-
由 Daniel Vetter 提交于
drm/i915 wants to read/write more than one page in its fastpath and hence needs to prefault more than PAGE_SIZE bytes. Add new functions in filemap.h to make that possible. Also kill a copy&pasted spurious space in both functions while at it. v2: As suggested by Andrew Morton, add a multipage parameter to both functions to avoid the additional branch for the pagemap.c hotpath. My gcc 4.6 here seems to dtrt and indeed reap these branches where not needed. v3: Becaus I couldn't find a way around adding a uaddr += PAGE_SIZE to the filemap.c hotpaths (that the compiler couldn't remove again), let's go with separate new functions for the multipage use-case. v4: Adjust comment to CodingStlye and fix spelling. Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 3月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
Originally the code tried to allocate a large enough array to perform the copy using vmalloc, performance wasn't great and throughput was improved by processing each individual relocation entry separately. This too is not as efficient as one would desire. A compromise would be to allocate a single page, or to allocate a few entries on the stack, and process the copy in batches. The latter gives simpler code and more consistent performance due to a lack of heuristic. x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) before: 249000 785000 1280000 (80%) page: 264000 896000 1280000 (65%) on-stack: 264000 902000 1280000 (67%) v2: Use 512-bytes of stack for batching rather than allocate a page. v3: Tidy the code slightly with more descriptive variable names Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 3月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
PIPE_CONTROL on snb needs global gtt mappings in place to workaround a hw gotcha. No other commands need such a workaround. Luckily we can detect a PIPE_CONTROL commands easily because they have a write_domain = I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that). v2: Binding the target of such a reloc into the global gtt actually works instead of binding the source, which is rather pointless ... v3: Kill a superflous has_global_gtt_mapping assignement noticed by Chris Wilson. Reviewed-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 2月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Tested-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 2月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
These are all user-trigerable, so tune down their loudness a notch. For some of these we have i-g-t tests (because they prevent newly-discovered bugs), without this patches running the test suite leaves behind a dirty dmesg. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 30 1月, 2012 3 次提交
-
-
由 Daniel Vetter 提交于
This confuses our domain tracking and can (for gtt write domains) lead to a subsequent oops. Tested by tests/gem_exec_bad_domains from i-g-t. Reviewed-by: NEric Anholt <eric@anholt.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
In order to correctly account for reserving space in the GTT and fences for a batch buffer, we need to independently track whether the fence is pinned due to a fenced GPU access in the batch or whether the buffer is pinned in the aperture. Currently we count the fenced as pinned if the buffer has already been seen in the execbuffer. This leads to a false accounting of available fence registers, causing frequent mass evictions. Worse, if coupled with the change to make i915_gem_object_get_fence() report EDADLK upon fence starvation, the batchbuffer can fail with only one fence required... Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Tested-by: NPaul Neumann <paul104x@yahoo.de> [danvet: Resolve the functional conflict with Jesse Barnes sprite patches, acked by Chris Wilson on irc.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
... and add a helpr function for the places where we want a flag. This way we can use ring->id to index into arrays. v2: Resurrect the missing beautification-space Chris Wilson noted. I'm moving this space around because I'll reuse ring_str in the next patch. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 1月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: NKeith Packard <keithp@keithp.com> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 1月, 2012 3 次提交
-
-
由 Eric Anholt 提交于
These registers are automatically incremented by the hardware during transform feedback to track where the next streamed vertex output should go. Unlike the previous generation, which had a packet for setting the corresponding registers to a defined value, gen7 only has MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads an arbitrary register), so we need to do it from the kernel, and it needs to be settable atomically with the batchbuffer execution so that two clients doing transform feedback don't stomp on each others' state. Instead of building a more complicated interface involcing setting the registers to a specific value, just set them to 0 when asked and userland can tweak its pointers accordingly. Signed-off-by: NEric Anholt <eric@anholt.net> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
由 Ben Widawsky 提交于
The docs say this is required for Gen7, and since the bit was added for Gen6, we are also setting it there pit pf paranoia. Particularly as Chris points out, if PIPE_CONTROL counts as a 3d state packet. This was found through doc inspection by Ken and applies to Gen6+; Reported-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NEric Anholt <eric@anholt.net> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
由 Ben Widawsky 提交于
dev_priv keeps track of the current addressing mode that gets set at execbuffer time. Unfortunately the existing code was doing this before acquiring struct_mutex which leaves a race with another thread also doing an execbuffer. If that wasn't bad enough, relocate_slow drops struct_mutex which opens a much more likely error where another thread comes in and modifies the state while relocate_slow is being slow. The solution here is to just defer setting this state until we absolutely need it, and we know we'll have struct_mutex for the remainder of our code path. v2: Keith noticed a bug in the original patch. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NKeith Packard <keithp@keithp.com>
-