- 16 12月, 2014 1 次提交
-
-
由 Brad Volkin 提交于
This patch sets up all of the tracking and copying necessary to use batch pools with the command parser and dispatches the copied (shadow) batch to the hardware. After this patch, the parser is in 'enabling' mode. Note that performance takes a hit from the copy in some cases and will likely need some work. At a rough pass, the memcpy appears to be the bottleneck. Without having done a deeper analysis, two ideas that come to mind are: 1) Copy sections of the batch at a time, as they are reached by parsing. Might improve cache locality. 2) Copy only up to the userspace-supplied batch length and memset the rest of the buffer. Reduces the number of reads. v2: - Remove setting the capacity of the pool - One global pool instead of per-ring pools - Replace batch_obj with shadow_batch_obj and hook into eb->vmas - Memset any space in the shadow batch beyond what gets copied - Rebased on execlist prep refactoring v3: - Rebase on chained batch handling - Squash in setting the secure dispatch flag - Add a note about the interaction w/secure dispatch pinning - Check for request->batch_obj == NULL in i915_gem_free_request v4: - Fix read domains for shadow_batch_obj - Remove the set_to_gtt_domain call from i915_parse_cmds - ggtt_pin/unpin in the parser block to simplify error handling - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag - Remove i915_gem_batch_pool_put calls v5: - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after the parser (danvet, from v4 0/7 feedback) Issue: VIZ-4719 Signed-off-by: NBrad Volkin <bradley.d.volkin@intel.com> Reviewed-By: NJon Bloomfield <jon.bloomfield@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 12月, 2014 1 次提交
-
-
由 Tvrtko Ursulin 提交于
Things like reliable GGTT mappings and mirrored 2d-on-3d display will need to map objects into the same address space multiple times. Added a GGTT view concept and linked it with the VMA to distinguish between multiple instances per address space. New objects and GEM functions which do not take this new view as a parameter assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the previous behaviour. This now means that objects can have multiple VMA entries so the code which assumed there will only be one also had to be modified. Alternative GGTT views are supposed to borrow DMA addresses from obj->pages which is DMA mapped on first VMA instantiation and unmapped on the last one going away. v2: * Removed per view special casing in i915_gem_ggtt_prepare / finish_object in favour of creating and destroying DMA mappings on first VMA instantiation and last VMA destruction. (Daniel Vetter) * Simplified i915_vma_unbind which does not need to count the GGTT views. (Daniel Vetter) * Also moved obj->map_and_fenceable reset under the same check. * Checkpatch cleanups. v3: * Only retire objects once the last VMA is unbound. v4: * Keep scatter-gather table for alternative views persistent for the lifetime of the VMA. * Propagate binding errors to callers and handle appropriately. v5: * Explicitly look for normal GGTT view in i915_gem_obj_bound to align usage in i915_gem_object_ggtt_unpin. (Michel Thierry) * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry) * Removed stray semi-colon in i915_gem_object_set_cache_level. For: VIZ-4544 Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NMichel Thierry <michel.thierry@intel.com> [danvet: Drop hunk from i915_gem_shrink since it's just prettification but upsets a __must_check warning.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 03 12月, 2014 4 次提交
-
-
由 John Harrison 提交于
All the code above is now using requests not seqnos so it is possible to convert the trace functions across. Note that rather than get into problematic reference counting issues, the trace code only saves the seqno and ring values from the request structure not the structure pointer itself. For: VIZ-4377 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
There is no longer any need to retrieve a seqno value from an i915_add_request() call. The calling code already knows which request structure is being processed (it can only be ring->OLR). And as the request itself is now used in preference to the basic seqno value, the latter is now redundant in this situation. For: VIZ-4377 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The OLS value is now obsolete. Exactly the same value is guarateed to be always available as PLR->seqno. Thus it is safe to remove the OLS completely. And also to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention valid. For: VIZ-4377 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The object structure contains the last read, write and fenced seqno values for use in syncrhonisation operations. These have now been replaced with their request structure counterparts. Note that to ensure that objects do not end up with dangling pointers, the assignments of last_*_req include reference count updates. Thus a request cannot be freed if an object is still hanging on to it for any reason. v2: Corrected 'last_rendering_' to 'last_read_' in a number of comments that did not get updated when 'last_rendering_seqno' became 'last_read|write_seqno' several millenia ago. For: VIZ-4377 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 11月, 2014 1 次提交
-
-
由 Thomas Hellstrom 提交于
It happens on occasion that developers of generic user-space applications abuse the dumb buffer API to get hold of drm buffers that they can both mmap() and use for GPU acceleration, using the assumptions that dumb buffers and buffers available for GPU are a) The same type and can be aribtrarily type-casted. b) fully coherent. This patch makes the most widely used drivers warn nicely when that happens, the next step will be to fail. v2: Move drmP.h changes to drm_gem.h. Fix Radeon dumb mmap breakage. Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 20 11月, 2014 2 次提交
-
-
由 Daniel Vetter 提交于
Again just complicates gem init functions and makes a general mess out of everything. Good riddance! v2: In my enthusiasm to start removing dri1/ums crud I went overboard a bit and killed parts of hangcheck. Resurrect it. Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
由 Chris Wilson 提交于
With the deprecation of UMS, and by association DRI1, we have a tough choice when updating the ring access routines. We either rewrite the DRI1 routines blindly without testing (so likely to be broken) or take the liberty of declaring them no longer supported and remove them entirely. This takes the latter approach. v2: Also remove the DRI1 sarea updates Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Fix rebase conflicts.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 11月, 2014 1 次提交
-
-
由 Chris Wilson 提交于
Always require PIN_GLOBAL when we want a mappable offset (PIN_MAPPABLE). This causes the pin to fixup the global binding in cases were the vma was already bound (and due to the proceeding bug, we considered it to be already mappable). References: https://bugs.freedesktop.org/show_bug.cgi?id=85671Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Add WARN_ON to check that PIN_MAP implies PIN_GLOBAL as discussed on irc.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 11月, 2014 2 次提交
-
-
由 Brad Volkin 提交于
libva uses chained batch buffers in a way that the command parser can't generally handle. Fortunately, libva doesn't need to write registers from batch buffers in the way that mesa does, so this patch causes the driver to fall back to non-secure dispatch if the parser detects a chained batch buffer. Note: The 2nd hunk to munge the error code of the parser looks a bit superflous. At least until we have the batch copy code ready and can run the cmd parser in granting mode. But it isn't since we still need to let existing libva buffers pass (though not with elevated privs ofc!). Testcase: igt/gem_exec_parse/chained-batch Signed-off-by: NBrad Volkin <bradley.d.volkin@intel.com> [danvet: Add note - this confused me in review and Brad clarified things (after a few mails ...).] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Tvrtko Ursulin 提交于
If these flags are on the object level it will be more difficult to allow for multiple VMAs per object. v2: Simplification and cleanup after code review comments (Chris Wilson). Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 8月, 2014 1 次提交
-
-
由 Daniel Vetter 提交于
There's a bit a confusion since we track the global gtt, the aliasing and real ppgtt in the ctx->vm pointer. And not all callers really bother to check for the different cases and just presume that it points to a real ppgtt. Now looking closely we don't actually need ->vm to always point at an address space - the only place that cares actually has fixup code already to decide whether to look at the per-proces or the global address space. So switch to just tracking the ppgtt directly and ditch all the extraneous code. v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt. Also drop the early exit - without aliasing ppgtt we want to dump all the ppgtts of the contexts if we have full ppgtt. v3: Actually git add the compile fix. Reviewed-by: NMichel Thierry <michel.thierry@intel.com> Cc: "Thierry, Michel" <michel.thierry@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> OTC-Jira: VIZ-3724 [danvet: Resolve conflicts with execlist patches while applying.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 8月, 2014 1 次提交
-
-
由 Oscar Mateo 提交于
This is what i915_gem_do_execbuffer calls when it wants to execute some worload in an Execlists world. v2: Check arguments before doing stuff in intel_execlists_submission. Also, get rel_constants parsing right. Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> [danvet: Drop the chipset flush, that's pre-gen6. And appease checkpatch a bit .... again!] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 11 8月, 2014 8 次提交
-
-
由 Oscar Mateo 提交于
As suggested by Daniel Vetter. The idea, in subsequent patches, is to provide an alternative to these vfuncs for the Execlists submission mechanism. v2: Splitted into two and reordered to illustrate our intentions, instead of showing it off. Also, remove the add_request vfunc and added the stop_ring one. Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> [danvet: - Make checkpatch happy. - Be grumpy about the excessive vtable. - Ditch gt->is_ring_initialized.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Oscar Mateo 提交于
The backing objects and ringbuffers for contexts created via open fd are actually empty until the user starts sending execbuffers to them. At that point, we allocate & populate them. We do this because, at create time, we really don't know which engine is going to be used with the context later on (and we don't want to waste memory on objects that we might never use). v2: As contexts created via ioctl can only be used with the render ring, we have enough information to allocate & populate them right away. v3: Defer the creation always, even with ioctl-created contexts, as requested by Daniel Vetter. Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Even though we should not try to use 4+GiB GTTs on 32-bit systems, by using a local variable we can future proof the code whilst making it easier to read. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Appease checkpatch a bit.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Part of the pre-validation for an execbuffer call is that there is at least one object in the execlist. As we bail if we fail to lookup any object, we can be sure that after the eb_lookup_vma() there is at least one object in the vma list and so we do not need to assert. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
We have an implementation requirement that precludes the user from requesting a ggtt entry when the device is operating in ppgtt mode. Move the current check from inside the execbuffer object collation to the prevalidation phase. v2: Roll both invalid flags checks into one Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Based upon a hunk from a patch from Chris Wilson, but augmented to: - Process the batch in the full ppgtt vm so that self-relocations match again with userspace's expectations.. - Add a comment why plain pin for the global gtt binding is safe at that point. v2: Drop local bind_vm variable (Chris). v3: Explain why this works despite the lack of proper active tracking for the ggtt batch vma. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
This migrates the fence tracking onto the existing seqno infrastructure so that the later conversion to tracking via requests is simplified. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Move the decision on whether we need to have a mappable object during execbuffer to the fore and then reuse that decision by propagating the flag through to reservation. As a corollary, before doing the actual relocation through the GTT, we can make sure that we do have a GTT mapping through which to operate. Note that the key to make this work is to ditch the obj->map_and_fenceable unbind optimization - with full ppgtt it doesn't make a lot of sense any more anyway. v2: Revamp and resend to ease future patches. v3: Refresh patch rationale References: https://bugs.freedesktop.org/show_bug.cgi?id=81094Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Explain why obj->map_and_fenceable is key and split out the secure batch fix.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 8月, 2014 1 次提交
-
-
由 Dave Airlie 提交于
This reverts commit 48ba8137. Thanks to Chris: "drm_file->is_master is not synomous with having drm_file->master == drm_file->minor->master. This is because drm_file->master is the same for all drm_files of the same generation and so when there is a master, every drm_file believes itself to be the master. Confusion ensues and things go pear shaped when one file is closed and there is no master anymore." Conflicts: drivers/gpu/drm/drm_drv.c drivers/gpu/drm/drm_stub.c
-
- 05 8月, 2014 1 次提交
-
-
由 David Herrmann 提交于
The drm_file->is_master field is redundant as it's equivalent to: drm_file->master && drm_file->master == drm_file->minor->master 1) "=>" Whenever we set drm_file->is_master, we also set: drm_file->minor->master = drm_file->master; Whenever we clear drm_file->is_master, we also call: drm_master_put(&drm_file->minor->master); which implicitly clears it to NULL. 2) "<=" minor->master cannot be set if it is non-NULL. Therefore, it stays as is unless a file drops it. If minor->master is NULL, it is only set by places that also adjust drm_file->is_master. Therefore, we can safely drop is_master and replace it by an inline helper that matches: drm_file->master && drm_file->master == drm_file->minor->master Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
-
- 08 7月, 2014 2 次提交
-
-
由 Oscar Mateo 提交于
So that we isolate the legacy ringbuffer submission mechanism, which becomes a good candidate to be abstracted away. This is prep-work for Execlists (which will its own workload submission mechanism). No functional changes. Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Oscar Mateo 提交于
This is an Execlists preparatory patch, since they make context ID become an overloaded term: - In the software, it was used to distinguish which context userspace was trying to use. - In the BSpec, the term is used to describe the 20-bits long field the hardware uses to it to discriminate the contexts that are submitted to the ELSP and inform the driver about their current status (via Context Switch Interrupts and Context Status Buffers). Initially, I tried to make the different meanings converge, but it proved impossible: - The software ctx->id is per-filp, while the hardware one needs to be globally unique. - Also, we multiplex several backing states objects per intel_context, and all of them need unique HW IDs. - I tried adding a per-filp ID and then composing the HW context ID as: ctx->id + file_priv->id + ring->id, but the fact that the hardware only uses 20-bits means we have to artificially limit the number of filps or contexts the userspace can create. The ctx->user_handle renaming bits are done with this Cocci patch (plus manual frobbing of the struct declaration): @@ struct intel_context c; @@ - (c).id + c.user_handle @@ struct intel_context *c; @@ - (c)->id + c->user_handle Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and change the type to unsigned 32 bits. v2: s/handle/user_handle and change the type to uint32_t as suggested by Chris Wilson. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1) Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 6月, 2014 1 次提交
-
-
由 Daniel Vetter 提交于
So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scaned out, and when it's flushed again and can be compressed/one-shot-upload. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this track asynchronous invalidations (and also pageflip) state per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in both cases in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the evenutal flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superflous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem object. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane, the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implemenation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: Only when retiring gpu buffers only flush frontbuffer bits we actually invalidated in a batch. Just for safety since before any additional usage/invalidate we should always retire current rendering. Suggested by Chris Wilson. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive, avoids redunancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 6月, 2014 1 次提交
-
-
由 Ville Syrjälä 提交于
CHECK linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces) linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: expected struct drm_i915_gem_exec_object2 *user_exec_list linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: got void [noderef] <asn:1>* linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces) linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: expected void [noderef] <asn:1>*dst linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: got unsigned long long *<noident> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 5月, 2014 2 次提交
-
-
由 Chris Wilson 提交于
This is pure evil. Userspace, I'm looking at you SNA, repacks batch buffers on the fly after generation as they are being passed to the kernel for execution. These batches also contain self-referenced relocations as a single buffer encompasses the state commands, kernels, vertices and sampler. During generation the buffers are placed at known offsets within the full batch, and then the relocation deltas (as passed to the kernel) are tweaked as the batch is repacked into a smaller buffer. This means that userspace is passing negative relocations deltas, which subsequently wrap to large values if the batch is at a low address. The GPU hangs when it then tries to use the large value as a base for its address offsets, rather than wrapping back to the real value (as one would hope). As the GPU uses positive offsets from the base, we can treat the relocation address as the minimum address read by the GPU. For the upper bound, we trust that userspace will not read beyond the end of the buffer. So, how do we fix negative relocations from wrapping? We can either check that every relocation looks valid when we write it, and then position each object such that we prevent the offset wraparound, or we just special-case the self-referential behaviour of SNA and force all batches to be above 256k. Daniel prefers the latter approach. This fixes a GPU hang when it tries to use an address (relocation + offset) greater than the GTT size. The issue would occur quite easily with full-ppgtt as each fd gets its own VM space, so low offsets would often be handed out. However, with the rearrangement of the low GTT due to capturing the BIOS framebuffer, it is already affecting kernels 3.15 onwards. I think only IVB+ is susceptible to this bug, but the workaround should only kick in rarely, so it seems sensible to always apply it. v3: Use a bias for batch buffers to prevent small negative delta relocations from wrapping. v4 from Daniel: - s/BIAS/BATCH_OFFSET_BIAS/ - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions were growing rather cumbersome. - Add a comment to eb_get_batch explaining why we do this. - Apply the batch offset bias everywhere but mention that we've only observed it on gen7 gpus. - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch. v5: Add static to eb_get_batch, spotted by 0-day tester. Testcase: igt/gem_bad_reloc Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3) Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
We only want to modifiy a single field in the userspace view of the execbuffer command buffer, so explicitly change that rather than copy everything back again. This serves two purposes: 1. The single fields are much cheaper to copy (constant size so the copy uses special case code) and much smaller than the whole array. 2. We modify the array for internal use that need to be masked from the user. Note: We need this backported since without it the next bugfix will blow up when userspace recycles batchbuffers and relocations. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 23 5月, 2014 2 次提交
-
-
由 Oscar Mateo 提交于
Up until now, contexts had one (and only one) backing object that was used by the hardware to save/restore render ring contexts (via the MI_SET_CONTEXT command). Other rings did not have or need this, so our i915_hw_context struct had a 1:1 relationship with a a real HW context. With Logical Ring Contexts and Execlists, this is not possible anymore: all rings need a backing object, and it cannot be reused. To prepare for that, rename our contexts to the more generic term intel_context. No functional changes. Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Oscar Mateo 提交于
In the upcoming patches we plan to break the correlation between engine command streamers (a.k.a. rings) and ringbuffers, so it makes sense to refactor the code and make the change obvious. No functional changes. Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 5月, 2014 1 次提交
-
-
由 Daniel Vetter 提交于
Adding stuff at the bottom is really no how this should be done, since that's the place for ums/dri dungeons. This was added in commit a8ebba75 Author: Zhao Yakui <yakui.zhao@intel.com> Date: Thu Apr 17 10:37:40 2014 +0800 drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3 Also add a note to prevent this from happening again - people really should be less lazy and take more time to look for a good home of their new driver-global state. Cc: Imre Deak <imre.deak@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 19 5月, 2014 1 次提交
-
-
由 Chris Wilson 提交于
More fallout from commit c8725f3d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Mar 17 12:21:55 2014 +0000 drm/i915: Do not call retire_requests from wait_for_rendering is that we can completely fill all of memory using small objects, such that we exhaust the filp space, and spend all of our time evicting objects from the aperture. As such, we never fill the ring, and never trigger the last resort flushing in commit 1cf0ba14 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon May 5 09:07:33 2014 +0100 drm/i915: Flush request queue when waiting for ring space and so all the requests are left active and the objects keep that last active reference. Eventually the system comes to a halt as it runs out of memory. The impact is mainly limited to test cases as regular userspace will trigger retirement by manually checking whether an object is active. Testcase: igt/gem_lut_handle Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78724Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NGuo Jinxian <jinxianx.guo@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 5月, 2014 1 次提交
-
-
由 Daniel Vetter 提交于
Somehow UXA submits a completely bogus DR4 value since essentially forever. It was originally introduced in commit bade7d7d2505a10a8a7d24b084aff9742e2d6d64 Author: Eric Anholt <eric@anholt.net> Date: Fri Jun 6 14:03:25 2008 -0700 Use the DRM for submitting batchbuffers when available. and dutifully copied around ever since. Since we want to keep the general dirt catching around just special case the UXA value. This regression was introduced in commit 9cb34664 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Apr 24 08:09:11 2014 +0200 drm/i915: Catch dirt in unused execbuffer fields Comment from Chris' review: "To be fair, it is a sensible value if one supposes a Region style API to cliprects. Under that API, DR[14] define the extents of the clip region, and ((0,0), (0,0)) [DR1==DR4==0] would mean all clipped, do not draw anything." v2: Pimp commit message a bit and remove the double space. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78494 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jörg Otte <jrg.otte@gmail.com> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 05 5月, 2014 5 次提交
-
-
由 Ben Widawsky 提交于
All the rest of the code to enable this is in my branch. Without my branch, hitting > 32b offsets is impossible. The code has always "supported" 64b, but it's never actually been run of tested. This change doesn't actually fix anything. [1] I am not sure why X won't work yet. I do not get hangs or obvious errors. There are 3 fixes grouped together here. First is to remove the hardcoded 0 for the upper dword of the relocation. The next fix is to use a 64b value for target_offset. The final fix is to not directly apply target_offset to reloc->delta. reloc->delta is part of ABI, and so we cannot change it. As it stands, 32b is enough to represent everything we're interested in representing anyway. The main problem is, we cannot add greater than 32b values to it directly. [1] Almost all of intel-gpu-tools is not yet ready to test 64b relocations. There are a few places that expect 32b values for offsets and these all won't work. Cc: Rafael Barbalho <rafael.barbalho@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Previously, our code only had a 32b offset value for where the batchbuffer starts. With full PPGTT, and 64b canonical GPU address space, that is an insufficient value. The code to expand is pretty straight forward, and only one platform needs to do anything with the extra bits. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NRafael Barbalho <rafael.barbalho@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
A common issue we have is that retiring requests causes recursion through GTT manipulation or page table manipulation which we can only handle at very specific points. However, to maintain internal consistency (enforced through our sanity checks on write_domain at various points in the GEM object lifecycle) we do need to retire the object prior to marking it with a new write_domain, and also clear the write_domain for the implicit flush following a batch. Note that this then allows the unbound objects to still be on the active lists, and so care must be taken when removing objects from unbound lists (similar to the caveats we face processing the bound lists). v2: Fix i915_gem_shrink_all() to handle updated object lifetime rules, by refactoring it to call into __i915_gem_shrink(). v3: Missed an object-retire prior to changing cache domains in i915_gem_object_set_cache_leve() v4: Rebase Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NBrad Volkin <bradley.d.volkin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Zhao Yakui 提交于
The BDW GT3 has two independent BSD rings, which can be used to process the video commands. To be simpler, it is transparent to user-space driver/middle. Instead the kernel driver will decide which ring is to dispatch the BSD video command. As every BSD ring is powerful, it is enough to dispatch the BSD video command based on the drm fd. In such case it can play back video stream while encoding another video stream. The coarse ping-pong mechanism is used to determine which BSD ring is used to dispatch the BSD video command. V1->V2: Follow Daniel's comment and use the simple ping-pong mechanism. This is only to add the support of dual BSD rings on BDW GT3 machine. The further optimization will be considered in another patch set. V2->V3: Follow Daniel's comment to use the struct_mutext instead of atomic_t during determining which ring can be used to dispatch Video command. Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NZhao Yakui <yakui.zhao@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Zhao Yakui 提交于
Signed-off-by: NZhao Yakui <yakui.zhao@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-