- 04 9月, 2013 4 次提交
-
-
由 Chris Wilson 提交于
A follow-on to the update of the LLC coherency logic is that we can rely on the LLC being coherent with the CS for rewriting batchbuffers irrespective of their cache domain. (This should have no effect currently as all the batch buffers are expected to be I915_CACHE_LLC and so using the cpu relocation path anyway.) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
In the execbuf code we don't clean up any vmas which ended up not getting bound for code simplicity. To make sure that we don't end up creating multiple vma for the same vm kill the somewhat dangerous vma_create function and inline it into lookup_or_create. This is just a safety measure to prevent surprises in the future. Also update the somewhat confused comment in the execbuf code and clarify what kind of magic is going on with a new one. v2: Keep the function separate as requested by Chris. But give it a __ prefix for paranoia and move it tighter together with the other vma stuff. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
In order to transition more of our code over to using a VMA instead of an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up until now, we've only had a VMA when actually binding an object. The previous patch helped handle the distinction on bound vs. unbound. This patch will help us catch leaks, and other issues before we actually shuffle a bunch of stuff around. This attempts to convert all the execbuf code to speak in vmas. Since the execbuf code is very self contained it was a nice isolated conversion. The meat of the code is about turning eb_objects into eb_vma, and then wiring up the rest of the code to use vmas instead of obj, vm pairs. Unfortunately, to do this, we must move the exec_list link from the obj structure. This list is reused in the eviction code, so we must also modify the eviction code to make this work. WARNING: This patch makes an already hotly profiled path slower. The cost is unavoidable. In reply to this mail, I will attach the extra data. v2: Release table lock early, and two a 2 phase vma lookup to avoid having to use a GFP_ATOMIC. (Chris) v3: s/obj_exec_list/obj_exec_link/ Updates to address commit 6d2b8885 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Aug 7 18:30:54 2013 +0100 drm/i915: List objects allocated from stolen memory in debugfs v4: Use obj = vma->obj for neatness in some places (Chris) need_reloc_mappable() should return false if ppgtt (Chris) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Split out prep patches. Also remove a FIXME comment which is now taken care of.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Somehow we've lost the error handling in the patch split-up between the internal and external patch. This regression has been introduced in commit 5032d871 Author: Rafael Barbalho <rafael.barbalho@intel.com> Date: Wed Aug 21 17:10:51 2013 +0100 drm/i915: Cleaning up the relocate entry function This bug is exercised by igt/gem_reloc_vs_gpu/interruptible. Cc: Rafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 23 8月, 2013 1 次提交
-
-
由 Rafael Barbalho 提交于
As the relocate entry function was getting a bit too big I've moved the code that used to use either the cpu or the gtt to for the relocation into two separate functions. Signed-off-by: NRafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 8月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
Now that we skip clflushes more often, return a boolean indicating whether the clflush was actually performed, and only if it was do the chipset flush. (Though on most of the architectures where the clflush will be skipped, the chipset flush is a no-op!) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 8月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
As mentioned in the previous commit, reads and writes from both the CPU and GPU go through the LLC. This gives us coherency between the CPU and GPU irrespective of the attribute settings either device sets. We can use to avoid having to clflush even uncached memory. Except for the scanout. The scanout resides within another functional block that does not use the LLC but reads directly from main memory. So in order to maintain coherency with the scanout, writes to uncached memory must be flushed. In order to optimize writes elsewhere, we start tracking whether an framebuffer is attached to an object. v2: Use pin_display tracking rather than fb_count (to ensure we flush cursors as well etc) and only force the clflush along explicit writes to the scanout paths (i.e. pin_to_display_plane and pwrite into scanout). v3: Force the flush after hitting the slowpath in pwrite, as after dropping the lock the object's cache domain may be invalidated. (Ville) Based on a patch by Ville Syrjälä. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 8月, 2013 3 次提交
-
-
由 Ben Widawsky 提交于
formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMx . Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discused with Ben. Also drop a needles BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
In some places, we want to know if an object is bound in any address space, and not just the global GTT. This often applies when there is a single global resource (object, pages, etc.) function | reason -------------------------------------------------- i915_gem_object_is_inactive | global object i915_gem_object_put_pages | object's pages 915_gem_object_unpin | global object i915_gem_execbuffer_unreserve_object | temporary until we plumb vma pread/pwrite | see the note below Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering call - but that once only worked if the object is bound. We really should replace this with a plain wait_rendering call, which would have the upside that in pread it would be clearer that we actually only wait for oustanding gpu writes. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer Ben to replace those with wait_rendering calls.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
As alluded to in several patches, and it will be reiterated later... A VMA is an abstraction for a GEM BO bound into an address space. Therefore it stands to reason, that the existing bind, and unbind are the ones which will be the most impacted. This patch implements this, and updates all callers which weren't already updated in the series (because it was too messy). This patch represents the bulk of an earlier, larger patch. I've pulled out a bunch of things by the request of Daniel. The history is preserved for posterity with the email convention of ">" One big change from the original patch aside from a bunch of cropping is I've created an i915_vma_unbind() function. That is because we always have the VMA anyway, and doing an extra lookup is useful. There is a caveat, we retain an i915_gem_object_ggtt_unbind, for the global cases which might not talk in VMAs. > drm/i915: plumb VM into object operations > > This patch was formerly known as: > "drm/i915: Create VMAs (part 3) - plumbing" > > This patch adds a VM argument, bind/unbind, and the object > offset/size/color getters/setters. It preserves the old ggtt helper > functions because things still need, and will continue to need them. > > Some code will still need to be ported over after this. > > v2: Fix purge to pick an object and unbind all vmas > This was doable because of the global bound list change. > > v3: With the commit to actually pin/unpin pages in place, there is no > longer a need to check if unbind succeeded before calling put_pages(). > Make put_pages only BUG() after checking pin count. > > v4: Rebased on top of the new hangcheck work by Mika > plumbed eb_destroy also > Many checkpatch related fixes > > v5: Very large rebase > > v6: > Change BUG_ON to WARN_ON (Daniel) > Rename vm to ggtt in preallocate stolen, since it is always ggtt when > dealing with stolen memory. (Daniel) > list_for_each will short-circuit already (Daniel) > remove superflous space (Daniel) > Use per object list of vmas (Daniel) > Make obj_bound_any() use obj_bound for each vm (Ben) > s/bind_to_gtt/bind_to_vm/ (Ben) > > Fixed up the inactive shrinker. As Daniel noticed the code could > potentially count the same object multiple times. While it's not > possible in the current case, since 1 object can only ever be bound into > 1 address space thus far - we may as well try to get something more > future proof in place now. With a prep patch before this to switch over > to using the bound list + inactive check, we're now able to carry that > forward for every address space an object is bound into. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA in destroy".] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 06 8月, 2013 2 次提交
-
-
由 Ben Widawsky 提交于
This represents the first half of hooking up VMs to execbuf. Here we basically pass an address space all around to the different internal functions. It should be much more readable, and have less risk than the second half, which begins switching over to using VMAs instead of an obj,vm. The overall series echoes this style of, "add a VM, then make it smart later" Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Switch a BUG_ON to WARN_ON.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
To verbalize it, one can say, "pin an object into the given address space." The semantics of pinning remain the same otherwise. Certain objects will always have to be bound into the global GTT. Therefore, global GTT is a special case, and keep a special interface around for it (i915_gem_obj_ggtt_pin). v2: s/i915_gem_ggtt_pin/i915_gem_obj_ggtt_pin Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 7月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 19 7月, 2013 1 次提交
-
-
由 Xiong Zhang 提交于
prefault is stll enabled by default which prevent most of pwrite/pread/reloc from running slow path, in order to verify these slow pathes, prefault need to be disabled. Signed-off-by: NXiong Zhang <xiong.y.zhang@intel.com> [danvet: Make checkpatch happy and bikeshed the module option help text a bit.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 16 7月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
The intent of the check is made more clear if we use the proper name for 0 here. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 7月, 2013 1 次提交
-
-
由 Daniel Vetter 提交于
In kernel modeset driver mode we're in full control of the chip, always. So there's no need at all to set mm.suspended in i915_gem_idle. Hence move that out into the leavevt ioctl. Since i915_gem_idle doesn't suspend gem any more we can also drop the re-enabling for KMS in the thaw function. Also clean up the handling of mm.suspend at driver load by coalescing all the assignments. Stumbled over while reading through our resume code for unrelated reasons. v2: Shovel mm.suspended into the (newly created) ums dungeon as suggested by Chris Wilson. The plan is that once we've completely stopped relying on the register save/restore code we could shovel even that in there. v3: Improve the locking for the entervt/leavevt ioctls a bit by moving the dev->struct_mutex locking outside of i915_gem_idle. Also don't clear dev_priv->ums.mm_suspended for the kms case, we allocate it with kzalloc. Both suggested by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 7月, 2013 1 次提交
-
-
由 Ben Widawsky 提交于
Soon we want to gut a lot of our existing assumptions how many address spaces an object can live in, and in doing so, embed the drm_mm_node in the object (and later the VMA). It's possible in the future we'll want to add more getter/setter methods, but for now this is enough to enable the VMAs. v2: Reworked commit message (Ben) Added comments to the main functions (Ben) sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch] sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch] sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch] sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch] (Daniel) v3: Rebased on new reserve_node patch Changed DRM_DEBUG_KMS to actually work (will need fixing later) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 6月, 2013 2 次提交
-
-
由 Mika Kuoppala 提交于
In order to track down a batch buffer and context which caused the ring to hang, store reference to bo into the request struct. Request can also cause gpu to hang after the batch in the flush section in the ring. To detect this add start of the flush portion offset into the request. v2: Included comment about request vs batch_obj lifetimes (Chris Wilson) Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Mika Kuoppala 提交于
Only execbuffer needed all the parameters on i915_add_request(). By putting __i915_add_request behind macro, all current callsites become cleaner. Following patch will introduce a new parameter for __i915_add_request. With this patch, only the relevant callsite will reflect the change making commit smaller and easier to understand. v2: _i915_add_request as function name (Chris Wilson) v3: change name __i915_add_request and fix ordering of params (Ben Widawsky) Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 6月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
This is required for tracking render damage for use with FBC and will be used in subsequent patches. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NRodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 01 6月, 2013 1 次提交
-
-
由 Xiang, Haihao 提交于
A user can run batchbuffer via VEBOX ring. Signed-off-by: NXiang, Haihao <haihao.xiang@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 3月, 2013 1 次提交
-
-
由 Lauri Kasanen 提交于
ERROR: "__build_bug_on_failed" [drivers/gpu/drm/i915/i915.ko] undefined! Originally reported at http://www.gossamer-threads.com/lists/linux/kernel/1631803 FDO bug #62775 This needs to be backported to both 3.7 and 3.8 stable trees. Doesn't apply straight, but it's a quick change. Signed-off-by: NLauri Kasanen <cand@gmx.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62775 Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 3月, 2013 2 次提交
-
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 14 3月, 2013 2 次提交
-
-
由 Kees Cook 提交于
It is possible to wrap the counter used to allocate the buffer for relocation copies. This could lead to heap writing overflows. CVE-2013-0913 v3: collapse test, improve comment v2: move check into validate_exec_list Signed-off-by: NKees Cook <keescook@chromium.org> Reported-by: Pinkie Pie Cc: stable@vger.kernel.org Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Kees Cook 提交于
This clarifies the comment above the access_ok check so a missing VERIFY_READ doesn't alarm anyone. v2: - rewrote comment, thanks to Chris Wilson Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: add patch history log to commit message.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 3月, 2013 1 次提交
-
-
由 Ville Syrjälä 提交于
to_user_ptr() simply casts a pointer passed as u64 from user space to void __user * correctly. Using this lets us get rid of all the tiresome casts. The idea came from Chris Wilson <chris@chris-wilson.co.uk>. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 1月, 2013 6 次提交
-
-
由 Ben Widawsky 提交于
The purpose of the gtt structure is to help isolate our gtt specific properties from the rest of the code (in doing so it help us finish the isolation from the AGP connection). The following members are pulled out (and renamed): gtt_start gtt_total gtt_mappable_end gtt_mappable gtt_base_addr gsm The gtt structure will serve as a nice place to put gen specific gtt routines in upcoming patches. As far as what else I feel belongs in this structure: it is meant to encapsulate the GTT's physical properties. This is why I've not added fields which track various drm_mm properties, or things like gtt_mtrr (which is itself a pretty transient field). Reviewed-by: NRodrigo Vivi <rodrigo.vivi@gmail.com> [Ben modified commit messages] Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 623000.0/sec. i3-330m: 748000.0/sec to 789000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Userspace is able to hint to the kernel that its command stream and auxiliary state buffers already hold the correct presumed addresses and so the relocation process may be skipped if the kernel does not need to move any buffers in preparation for the execbuffer. Thus for the common case where the allotment of buffers is static between batches, we can avoid the overhead of individually checking the relocation entries. Note that this requires userspace to supply the domain tracking and requests for workarounds itself that would otherwise be computed based upon the relocation entries. Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 632000.0/sec. i3-330m: 748000.0/sec to 830000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> [danvet: Fixup merge conflict in userspace header due to different baseline trees.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Instead of passing around the eb-objects hashtable and a separate object list, we can include the object list into the eb-objects structure for convenience. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
The difference is that the kernel will then know that this memory will be reclaimable in the near future. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 16 1月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
In the slow path, we are forced to copy the relocations prior to acquiring the struct mutex in order to handle pagefaults. We forgo copying the new offsets back into the relocation entries in order to prevent a recursive locking bug should we trigger a pagefault whilst holding the mutex for the reservations of the execbuffer. Therefore, we need to reset the presumed_offsets just in case the objects are rebound back into their old locations after relocating for this exexbuffer - if that were to happen we would assume the relocations were valid and leave the actual pointers to the kernels dangling, instant hang. Fixes regression from commit bcf50e27 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Nov 21 22:07:12 2010 +0000 drm/i915: Handle pagefaults in execbuffer user relocations Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55984Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@fwll.ch> Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 12月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 12月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
Simply use the last write-domain set for the object in the batch, trusting userspace to have correctly flushed the caches between usage as a write target. This check dates back from the golden age of having only a single operation per batch with the kernel repeating it for each cliprect, and conflicts both with userspace trying to efficiently batch multiple operations and with reducing the kernel overhead of relocation processing. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 29 11月, 2012 2 次提交
-
-
由 Ville Syrjälä 提交于
As per Chris Wilson's suggestion make i915_gem_execbuffer_wait_for_flips() go away. This was used to stall the GPU ring while there are pending page flips involving the relevant BO. Ie. while the BO is still being scanned out by the display controller. The recommended alternative is to use the page flip events to wait for the page flips to fully complete before reusing the BO of the old front buffer. Or use more buffers. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NKristian Høgsberg <krh@bitplanet.net> Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org> [danvet: don't remove obj->pending_flips, still required due to reorder patches.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Based on the work by Mika Kuoppala, we realised that we need to handle seqno wraparound prior to committing our changes to the ring. The most obvious point then is to grab the seqno inside intel_ring_begin(), and then to reuse that seqno for all ring operations until the next request. As intel_ring_begin() can fail, the callers must already be prepared to handle such failure and so we can safely add further checks. This patch looks like it should be split up into the interface changes and the tweaks to move seqno wrapping from the execbuffer into the core seqno increment. However, I found no easy way to break it into incremental steps without introducing further broken behaviour. v2: Mika found a silly mistake and a subtle error in the existing code; inside i915_gem_retire_requests() we were resetting the sync_seqno of the target ring based on the seqno from this ring - which are only related by the order of their allocation, not retirement. Hence we were applying the optimisation that the rings were synchronised too early, fortunately the only real casualty there is the handling of seqno wrapping. v3: Do not forget to reset the sync_seqno upon module reinitialisation, ala resume. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861 Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 11月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
The intention of checking obj->gtt_offset!=0 is to verify that the target object was listed in the execbuffer and had been bound into the GTT. This is guarranteed by the earlier rearrangement to split the execbuffer operation into reserve and relocation phases and then verified by the check that the target handle had been processed during the reservation phase. However, the actual checking of obj->gtt_offset==0 is bogus as we can indeed reference an object at offset 0. For instance, the framebuffer installed by the BIOS often resides at offset 0 - causing EINVAL as we legimately try to render using the stolen fb. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NEric Anholt <eric@anholt.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 11月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
As a quick hack we make the old intel_gtt structure mutable so we can fool a bunch of the existing code which depends on elements in that data structure. We can/should try to remove this in a subsequent patch. This should preserve the old gtt init behavior which upon writing these patches seems incorrect. The next patch will fix these things. The one exception is VLV which doesn't have the preserved flush control write behavior. Since we want to do that for all GEN6+ stuff, we'll handle that in a later patch. Mainstream VLV support doesn't actually exist yet anyway. v2: Update the comment to remove the "voodoo" Check that the last pte written matches what we readback v3: actually kill cache_level_to_agp_type since most of the flags will disappear in an upcoming patch v4: v3 was actually not what we wanted (Daniel) Make the ggtt bind assertions better and stricter (Chris) Fix some uncaught errors at gtt init (Chris) Some other random stuff that Chris wanted v5: check for i==0 in gen6_ggtt_bind_object to shut up gcc (Ben) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by [v4]: Chris Wilson <chris@chris-wilson.co.uk> [danvet: Make the cache_level -> agp_flags conversion for pre-gen6 a tad more robust by mapping everything != CACHE_NONE to the cached agp flag - we have a 1:1 uncached mapping, but different modes of cacheable (at least on later generations). Suggested by Chris Wilson.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-