- 27 3月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
We try to avoid writing the relocations through the uncached GTT, if the buffer is currently in the CPU write domain and so will be flushed out to main memory afterwards anyway. Also on SandyBridge we can safely write to the pages in cacheable memory, so long as the buffer is LLC mapped. In either of these cases, we therefore do not need to force the reallocation of the buffer into the mappable region of the GTT, reducing the aperture pressure. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 3月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
Originally the code tried to allocate a large enough array to perform the copy using vmalloc, performance wasn't great and throughput was improved by processing each individual relocation entry separately. This too is not as efficient as one would desire. A compromise would be to allocate a single page, or to allocate a few entries on the stack, and process the copy in batches. The latter gives simpler code and more consistent performance due to a lack of heuristic. x11perf -copywinwin10: n450/pnv i3-330m i5-2520m (cpu) before: 249000 785000 1280000 (80%) page: 264000 896000 1280000 (65%) on-stack: 264000 902000 1280000 (67%) v2: Use 512-bytes of stack for batching rather than allocate a page. v3: Tidy the code slightly with more descriptive variable names Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 3月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
PIPE_CONTROL on snb needs global gtt mappings in place to workaround a hw gotcha. No other commands need such a workaround. Luckily we can detect a PIPE_CONTROL commands easily because they have a write_domain = I915_GEM_DOMAIN_INSTRUCTION (and nothing else has that). v2: Binding the target of such a reloc into the global gtt actually works instead of binding the source, which is rather pointless ... v3: Kill a superflous has_global_gtt_mapping assignement noticed by Chris Wilson. Reviewed-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 2月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
This adds support to bind/unbind objects and wires it up. Objects are only put into the ppgtt when necessary, i.e. at execbuf time. Objects are still unconditionally put into the global gtt. v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind like for the global gtt function. Noticed by Chris Wilson. Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Tested-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 2月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
These are all user-trigerable, so tune down their loudness a notch. For some of these we have i-g-t tests (because they prevent newly-discovered bugs), without this patches running the test suite leaves behind a dirty dmesg. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 30 1月, 2012 3 次提交
-
-
由 Daniel Vetter 提交于
This confuses our domain tracking and can (for gtt write domains) lead to a subsequent oops. Tested by tests/gem_exec_bad_domains from i-g-t. Reviewed-by: NEric Anholt <eric@anholt.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
In order to correctly account for reserving space in the GTT and fences for a batch buffer, we need to independently track whether the fence is pinned due to a fenced GPU access in the batch or whether the buffer is pinned in the aperture. Currently we count the fenced as pinned if the buffer has already been seen in the execbuffer. This leads to a false accounting of available fence registers, causing frequent mass evictions. Worse, if coupled with the change to make i915_gem_object_get_fence() report EDADLK upon fence starvation, the batchbuffer can fail with only one fence required... Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Tested-by: NPaul Neumann <paul104x@yahoo.de> [danvet: Resolve the functional conflict with Jesse Barnes sprite patches, acked by Chris Wilson on irc.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
... and add a helpr function for the places where we want a flag. This way we can use ring->id to index into arrays. v2: Resurrect the missing beautification-space Chris Wilson noted. I'm moving this space around because I'll reuse ring_str in the next patch. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 1月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
Sometimes it may be the case when we idle the gpu or wait on something we don't actually want to process the retiring list. This patch allows callers to choose the behavior. Reviewed-by: NKeith Packard <keithp@keithp.com> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 1月, 2012 3 次提交
-
-
由 Eric Anholt 提交于
These registers are automatically incremented by the hardware during transform feedback to track where the next streamed vertex output should go. Unlike the previous generation, which had a packet for setting the corresponding registers to a defined value, gen7 only has MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads an arbitrary register), so we need to do it from the kernel, and it needs to be settable atomically with the batchbuffer execution so that two clients doing transform feedback don't stomp on each others' state. Instead of building a more complicated interface involcing setting the registers to a specific value, just set them to 0 when asked and userland can tweak its pointers accordingly. Signed-off-by: NEric Anholt <eric@anholt.net> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
由 Ben Widawsky 提交于
The docs say this is required for Gen7, and since the bit was added for Gen6, we are also setting it there pit pf paranoia. Particularly as Chris points out, if PIPE_CONTROL counts as a 3d state packet. This was found through doc inspection by Ken and applies to Gen6+; Reported-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NEric Anholt <eric@anholt.net> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
由 Ben Widawsky 提交于
dev_priv keeps track of the current addressing mode that gets set at execbuffer time. Unfortunately the existing code was doing this before acquiring struct_mutex which leaves a race with another thread also doing an execbuffer. If that wasn't bad enough, relocate_slow drops struct_mutex which opens a much more likely error where another thread comes in and modifies the state while relocate_slow is being slow. The solution here is to just defer setting this state until we absolutely need it, and we know we'll have struct_mutex for the remainder of our code path. v2: Keith noticed a bug in the original patch. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
- 27 12月, 2011 1 次提交
-
-
由 Keith Packard 提交于
Semaphores still cause problems on some machines: > From Udo Steinberg: > > With Linux-3.2-rc6 I'm frequently seeing GPU hangs when large amounts of > text scroll in an xterm, such as when extracting a tar archive. Such as this > one (note the timestamps): > > I can reproduce it fairly easily with something > as simple as: > > while true; do dmesg; done This patch turns them off on SNB while leaving them on for IVB. Reported-by: NUdo Steinberg <udo@hypervisor.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Eugeni Dodonov <eugeni@dodonov.net> Signed-off-by: NKeith Packard <keithp@keithp.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 12月, 2011 1 次提交
-
-
由 Eugeni Dodonov 提交于
This adds a default setting for semaphores parameter, and enables semaphores by default on IVB. For now, as semaphores interaction with VTd causes random issues on SNB, we do not enable them by default. But they can still be enabled via the semaphores=1 kernel parameter. v2: enables semaphores on SNB when IO remapping is disabled, with base on Keith Packard patch. CC: Daniel Vetter <daniel.vetter@ffwll.ch> CC: Ben Widawsky <ben@bwidawsk.net> CC: Keith Packard <keithp@keithp.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: Chris Wilson <chris@chris-wilson.co.uk> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
- 22 9月, 2011 1 次提交
-
-
由 Ben Widawsky 提交于
While I think the previous code is correct, it was hard to follow and hard to debug. Since we already have a ring abstraction, might as well use it to handle the semaphore updates and compares. I don't expect this code to make semaphores better or worse, but you never know... v2: Remove magic per Keith's suggestions. Ran Daniel's gem_ring_sync_loop test on this. v3: Ignored one of Keith's suggestions. v4: Removed some bloat per Daniel's recommendation. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
- 22 6月, 2011 1 次提交
-
-
由 Eric Anholt 提交于
This reverts commit 4a684a41. Userland has always been required to set the object's domain to GTT before using it through a GTT mapping, it's not something that the kernel is supposed to enforce. (The pagefault support is so that we can handle multiple mappings without userland having to pin across them, not so that userland can use GTT after GPU domains without telling the kernel). Fixes 19.2% +/- 0.8% (n=6) performance regression in cairo-gl firefox-talos-gfx on my T420 latop. Signed-off-by: NKeith Packard <keithp@keithp.com>
-
- 23 3月, 2011 1 次提交
-
-
由 Chris Wilson 提交于
Along the fast path for relocation handling, we attempt to copy directly from the user data structures whilst holding our mutex. This causes lockdep to warn about circular lock dependencies if we need to pagefault the user pages. [Since when handling a page fault on a mmapped bo, we need to acquire the struct mutex whilst already holding the mm semaphore, it is then verboten to acquire the mm semaphore when already holding the struct mutex. The likelihood of the user passing in the relocations contained in a GTT mmaped bo is low, but conceivable for extreme pathology.] In order to force the mm to return EFAULT rather than handle the pagefault, we therefore need to disable pagefaults across the relocation fast path. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 3月, 2011 2 次提交
-
-
由 Chris Wilson 提交于
... as if we are only reading from it, we can do that concurrently with the queue flip. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
Andi Kleen narrowed his GPU hangs on his Sugar Bay (SNB desktop) rev 09 down to the use of GPU semaphores, and we already know that they appear broken up to Huron River (mobile) rev 08. (I'm optimistic that disabling GPU semaphores is simply hiding another bug by the latency and side-effects of the additional device interaction it introduces...) However, use of semaphores is a massive performance improvement... Only as long as the system remains stable. Enable at your peril. Reported-by: NAndi Kleen <andi-fd@firstfloor.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33921Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 02 3月, 2011 2 次提交
-
-
由 Chris Wilson 提交于
This seems to be running stably on my test laptop, so hopefully the reported hangs where just symptoms of other bugs. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
Userspace has a legitimate requirement to use a delta that points to outside of the target bo, and so we need to enable this. (As this is an abi break, albeit a relaxation of the current restrictions, mark the change with a new flag.) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 22 2月, 2011 3 次提交
-
-
由 Chris Wilson 提交于
The code paths for modesetting are growing in complexity as we may need to move the buffers around in order to fit the scanout in the aperture. Therefore we face a choice as to whether to thread the interruptible status through the entire pinning and unbinding code paths or to add a flag to the device when we may not be interrupted by a signal. This does the latter and so fixes a few instances of modesetting failures under stress. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
As we just need a temporary array whilst performing the relocations for the execbuffer, first attempt to allocate using kmalloc even if it is not of order page-0. This avoids the overhead of remapping the discontiguous array and so gives a moderate boost to execution throughput. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
Dave Airlie spotted that we had a potential bug should we ever rearrange the drm_i915_gem_object so not the base drm_gem_object was not its first member. He noticed that we often convert the return of drm_gem_object_lookup() immediately into drm_i915_gem_object and then check the result for nullity. This is only valid when the base object is the first member and so the superobject has the same address. Play safe instead and use the compiler to convert back to the original return address for sanity testing. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 07 2月, 2011 1 次提交
-
-
由 Chris Wilson 提交于
A lot of minor tweaks to fix the tracepoints, improve the outputting for ftrace, and to generally make the tracepoints useful again. It is a start and enough to begin identifying performance issues and gaps in our coverage. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 23 1月, 2011 1 次提交
-
-
由 Chris Wilson 提交于
There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops. Reported-by: NJiri Slaby <jirislaby@gmail.com> Tested-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 19 1月, 2011 1 次提交
-
-
由 Chris Wilson 提交于
Move code around and invoke iomem annotation in a few more places in order to silence sparse. Still a few more iomem annotations to go... Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 14 1月, 2011 3 次提交
-
-
由 Chris Wilson 提交于
Hopefully, this is a temporary measure whilst the root cause is understood. At the moment, we experience a hard hang whilst looping urbanterror that has been identified as a result of the use of semaphores, but so far only on SNB mobile. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32752 Tested-by: mengmeng.meng@intel.com Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
After reordering the sequence of relocating objects, commit 6fe4f140, we can no longer rely on seeing all reloc targets prior to performing the relocation. As a result we were ignoring the need to flush objects from the render cache and invalidate the sampler caches, resulting in rendering glitches. So we need to clear the relocation domains earlier. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Tested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
On the fault path, commit 6fe4f140 introduction a regression whereby it changed the sequence of the objects but continued to use the original ordering of relocation entries. The result was that incorrect GTT offsets were being fed into the execbuffer causing lots of misrendering and potential hangs. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Tested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 12 1月, 2011 5 次提交
-
-
由 Chris Wilson 提交于
As the mappable portion of the aperture is always a small subset at the start of the GTT, it is allocated preferentially by drm_mm. This is useful in case we ever need to map an object later. However, if you have a large object that can consume the entire mappable region of the GTT this prevents the batchbuffer from fitting and so causing an error. Instead allocate all those that require a mapping up front in order to improve the likelihood of finding sufficient space to bind them. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
Before releasing the lock in order to copy the relocation list from user pages, we need to drop all the object references as another thread may usurp and execute another batchbuffer before we reacquire the lock. However, the code was buggy and failed to clear the list... Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
-
由 Chris Wilson 提交于
... in order to avoid a BUG() and potential unbounded waits. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
We need to ensure that writes through the GTT land before any modification to the MMIO registers and so must impose a mandatory write barrier when flushing the GTT domain. This was revealed by relaxing the write ordering by experimentally mapping the registers and the GATT as write-combining. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 20 12月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
The relative-to-general state default is useless as it means having to rewrite the streaming kernels for each batch. Relative-to-surface is more useful, as that stream usually needs to be rewritten for each batch. And absolute addressing mode, vital if you start streaming state, is also only available by adjusting the register... Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 10 12月, 2010 2 次提交
-
-
由 Chris Wilson 提交于
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
As we provide a list of all objects that will be accessed from the batchbuffer, we can build a lut of the handles associated with those objects for this invocation and use that to avoid the overhead of looking up those objects again for every relocation. The cost of building and searching a small hash table is much less than that of acquiring a spinlock, searching a radix tree and manipulating an atomic refcnt per relocation. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 06 12月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
Userspace should not have been declaring that it needed fenced GPU access with gen4+ as those GPUs have no fenced commands, but to be on the safe side it is easier to ignore userspace in case they did. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 05 12月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
The bulk of the change is to convert the growing list of rings into an array so that the relationship between the rings and the semaphore sync registers can be easily computed. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-