- 11 11月, 2016 2 次提交
-
-
由 Tvrtko Ursulin 提交于
A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NDavid Weinehall <david.weinehall@linux.intel.com>
-
由 Tvrtko Ursulin 提交于
A small selection of macros which can only accept dev_priv from now on and a resulting trickle of fixups. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NDavid Weinehall <david.weinehall@linux.intel.com>
-
- 08 11月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
On LLC, or even snooped, machines rendering via the GPU ends up in the CPU cache. This cacheline dirt also needs to be flushed to main memory when moving to an incoherent domain, such as the display's scanout engine. Mostly, this happens because either the object is marked as dirty from its first use or is avoided by setting the object into the display domain from the start. v2: Treat WT as not requiring a clflush prior to use on the display engine as well. Fixes: 0f71979a ("drm/i915: Performed deferred clflush inside set-cache-level") References: https://bugs.freedesktop.org/show_bug.cgi?id=95414Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.0+ Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161107165204.7008-1-chris@chris-wilson.co.uk
-
- 03 11月, 2016 1 次提交
-
-
由 Joonas Lahtinen 提交于
Move has_64bit_reloc into dev_priv->info. This will make it visible in the feature listing debug output. v2: - Keep the struct member to keep GCC fragile but happy (Chris) v3: - More detailed commit message (Chris) - Include forgotten CHV and BXT (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1478162386-5018-1-git-send-email-joonas.lahtinen@linux.intel.com
-
- 29 10月, 2016 5 次提交
-
-
由 Chris Wilson 提交于
In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which *i915* engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The plan is to make obtaining the backing storage for the object avoid struct_mutex (i.e. use its own locking). The first step is to update the API so that normal users only call pin/unpin whilst working on the backing storage. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why then pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We are not allowed to touch the GTT entries underneath an atomic section, as they take a rpm wakelock (which is illegal from atomic context) and in the near future acquiring the DMA address for a page within an object may sleep for an allocation. This makes the current shortcircuit in relocation_iomap() for performing a second relocation on an adjacent page illegal, and we need to release the atomic iomapping, lookup the DMA, insert it into the GTT before reentering the atomic iomap section. As it happens, this is precisely what we do on if we are using an iomapping over the full object and not just a single page and by removing the shortcut, we do the right thing. Fixes: 9c870d03 ("drm/i915: Use RPM as the barrier for controlling...") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
-
- 18 10月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
When handling execbuf relocations, we play a delicate dance with pagefault. We first try to access the user pages underneath our struct_mutex. However, if those pages were inside a GEM object, we may trigger a pagefault and deadlock as i915_gem_fault() tries to recursively acquire struct_mutex. Instead, we choose to disable pagefaulting around the copy_from_user whilst inside the struct_mutex and handle the EFAULT by falling back to a copy outside the struct_mutex. We however presumed that disabling pagefaults would be expensive. It is just an operation on the local current task. Cheap enough that we can restrict the disable/enable to the critical section around the copy, and so avoid having to handle the atomic sections within the relocation handling itself. v2: Just illustrate the broken error handling rather than argue why it is safer to ignore it, for now. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161018120251.25043-4-chris@chris-wilson.co.uk
-
- 14 10月, 2016 3 次提交
-
-
由 Michał Winiarski 提交于
We never used any invalid ptes, those were put in place for a possibility of doing gpu faults. However our batchbuffers are not restricted in length, so everything needs to be pointing to something and thus out-of-bounds is pointing to scratch. Remove the valid flag as it is always true. v2: Expand commit msg, patch reorder (Mika) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NMichał Winiarski <michal.winiarski@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1476360162-24062-1-git-send-email-michal.winiarski@intel.com
-
由 Tvrtko Ursulin 提交于
Saves 1416 bytes of .rodata strings. v2: Add parantheses around dev_priv. (Ville Syrjala) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NDavid Weinehall <david.weinehall@linux.intel.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Acked-by: NJani Nikula <jani.nikula@linux.intel.com> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476352990-2504-1-git-send-email-tvrtko.ursulin@linux.intel.com
-
由 Akash Goel 提交于
With the possibility of addition of many more number of rings in future, the drm_i915_private structure could bloat as an array, of type intel_engine_cs, is embedded inside it. struct intel_engine_cs engine[I915_NUM_ENGINES]; Though this is still fine as generally there is only a single instance of drm_i915_private structure used, but not all of the possible rings would be enabled or active on most of the platforms. Some memory can be saved by allocating intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory and continue using the static engine/ring IDs, 'engine' is defined as an array of pointers. struct intel_engine_cs *engine[I915_NUM_ENGINES]; dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for i915.o file (but for i915.ko file text size remain same as 1193131 bytes). v2: - Remove the engine iterator field added in drm_i915_private structure, instead pass a local iterator variable to the for_each_engine** macros. (Chris) - Do away with intel_engine_initialized() and instead directly use the NULL pointer check on engine pointer. (Chris) v3: - Remove for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris) - Protect the access to Render engine Fault register with a NULL check, as engine specific init is done later in Driver load sequence. v4: - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris) - Kill the superfluous init_engine_lists(). v5: - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to allocation of intel_engine_cs structure. (Chris) v6: - Rebase. v7: - Optimize the for_each_engine_masked() macro. (Chris) - Change the type of 'iter' local variable to enum intel_engine_id. (Chris) - Rebase. v8: Rebase. v9: Rebase. v10: - For index calculation use engine ID instead of pointer based arithmetic in intel_engine_sync_index() as engine pointers are not contiguous now (Chris) - For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas) - Use for_each_engine macro for cleanup in intel_engines_init() and remove check for NULL engine pointer in cleanup() routines. (Joonas) v11: Rebase. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NAkash Goel <akash.goel@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
-
- 10 10月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
If we run out of enough aperture space to fit the entire object, we fallback to trying to insert a single page. However, if that also fails, we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means to userspace that their batch could not be fitted within the GTT.) Prior to commit e8cb909a ("drm/i915: Fallback to single page GTT mmappings for relocations") the approach is to fallback to using the slow CPU relocation path in case of iomapping failure, and that is the behaviour we need to restore. Fixes: e8cb909a ("drm/i915: Fallback to single page GTT mmappings...") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk (cherry picked from commit d7f76335) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 07 10月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
If we run out of enough aperture space to fit the entire object, we fallback to trying to insert a single page. However, if that also fails, we currently fail to userspace with an unexpected ENOSPC. (ENOSPC means to userspace that their batch could not be fitted within the GTT.) Prior to commit e8cb909a ("drm/i915: Fallback to single page GTT mmappings for relocations") the approach is to fallback to using the slow CPU relocation path in case of iomapping failure, and that is the behaviour we need to restore. Fixes: e8cb909a ("drm/i915: Fallback to single page GTT mmappings...") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98101Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-2-chris@chris-wilson.co.uk
-
- 04 10月, 2016 1 次提交
-
-
由 Jani Nikula 提交于
drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: warning: incorrect type in argument 1 (different address spaces) drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: expected void [noderef] <asn:2>*vaddr drivers/gpu/drm/i915/i915_gem_execbuffer.c:432:52: got void * drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: warning: incorrect type in assignment (different address spaces) drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: expected void *vaddr drivers/gpu/drm/i915/i915_gem_execbuffer.c:477:15: got void [noderef] <asn:2>* Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NJani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1475574853-4178-2-git-send-email-jani.nikula@intel.com
-
- 28 9月, 2016 1 次提交
-
-
由 Al Viro 提交于
* the only remaining callers of "short" fault-ins are just as happy with generic variants (both in lib/iov_iter.c); switch them to multipage variants, kill the "short" ones * rename the multipage variants to now available plain ones. * get rid of compat macro defining iov_iter_fault_in_multipage_readable by expanding it in its only user. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 9月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
Now that we can wait upon fences before emitting the request, it becomes trivial to wait upon any implicit fence provided by the dma-buf reservation object. To protect against failure, we force any asynchronous waits on a foreign fence to timeout after 10s - so that a stall in another driver does not permanently cripple ourselves. Still unpleasant though! Testcase: igt/prime_vgem/fence-wait Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJohn Harrison <john.c.harrison@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-21-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We are about to specialize object synchronisation to enable nonblocking execbuf submission. First we make a copy of the current object synchronisation for execbuffer. The general i915_gem_object_sync() will be removed following the removal of CS flips in the near future. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJohn Harrison <john.c.harrison@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-16-chris@chris-wilson.co.uk
-
- 01 9月, 2016 1 次提交
-
-
由 Joonas Lahtinen 提交于
Use atomic type and operands for dev_priv->mm.bsd_engine_dispatch_index to avoid one struct_mutex locking scenario. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1472731101-21982-1-git-send-email-joonas.lahtinen@linux.intel.com
-
- 27 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
With full-ppgtt, we want the user to have full control over their memory layout, with a separate instance per context. Forcing them to use a shared memory layout for !RCS not only duplicates the amount of work we have to do, but also defeats the memory segregation on offer. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160822080350.4964-4-chris@chris-wilson.co.ukReviewed-by: NJohn Harrison <john.c.harrison@intel.com> Reviewed-by: NThomas Daniel <thomas.daniel@intel.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
If userspace is asynchronously streaming into the batch or other execobjects, we may not flush those writes along with a change in cache domain (as there is no change). Therefore those writes may end up in internal chipset buffers and not visible to the GPU upon execution. We must issue a flush command or otherwise we encounter incoherency in the batchbuffers and the GPU executing invalid commands (i.e. hanging) quite regularly. v2: Throw a paranoid wmb() into the general flush so that we remain consistent with before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841 Fixes: 1816f923 ("drm/i915: Support creation of unbound wc user...") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Tested-by: NMatti Hämäläinen <ccr@tnsp.org> Cc: stable@vger.kernel.org Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk (cherry picked from commit 600f4368) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 20 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
-
- 19 8月, 2016 9 次提交
-
-
由 Chris Wilson 提交于
The single largest factor in the overhead of parsing the commands is the setup of the virtual mapping to provide a continuous block for the batch buffer. If we keep those vmappings around (against the better judgement of mm/vmalloc.c, which we offset by handwaving and looking suggestively at the shrinker) we can dramatically improve the performance of the parser for small batches (such as media workloads). Furthermore, we can use the prepare shmem read/write functions to determine how best we need to clflush the range (rather than every page of the object). The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt for the throughput on small batches. (Caching just the dst vmap and iterating over the src, doing a page by page copy is roughly 5% slower on both platforms. That may be an acceptable trade-off to eliminate one cached vmapping, and we may be able to reduce the per-page copying overhead further.) For *this* simple test case, the cmdparser is now within a factor of 2 of ideal performance. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Our current practice is to only name the actual list (here dev_priv->fence_list) using "list", and elements upon that list are referred to as "link". Further, the lru nature is of the list and not of the node and including in the name does not disambiguate the link from anything else. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-20-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
By moving map-and-fenceable tracking from the object to the VMA, we gain fine-grained tracking and the ability to track individual fences on the VMA (subsequent patch). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-16-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
As we cannot access the backing pages behind stolen objects, we should not attempt to do so for relocations. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-15-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
If we cannot pin the entire object into the mappable region of the GTT, try to pin a single page instead. This is much more likely to succeed, and prevents us falling back to the clflush slow path. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-14-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
With the introduction of the reloc page cache, we are just one step away from refactoring the relocation write functions into one. Not only does it tidy the code (slightly), but it greatly simplifies the control logic much to gcc's satisfaction. v2: Add selftests to document the relationship between the clflush flags, the KMAP bit and packing into the page-aligned pointer. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-13-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When doing relocations, we have to obtain a mapping to the page containing the target address. This is either a kmap or iomap depending on GPU and its cache coherency. Neighbouring relocation entries are typically within the same page and so we can cache our kmapping between them and avoid those pesky TLB flushes. Note that there is some sleight-of-hand in how the slow relocate works as the reloc_entry_cache implies pagefaults disabled (as we are inside a kmap_atomic section). However, the slow relocate code is meant to be the fallback from the atomic fast path failing. Fortunately it works as we already have performed the copy_from_user for the relocation array (no more pagefaults there) and the kmap_atomic cache is enabled after we have waited upon an active buffer (so no more sleeping in atomic). Magic! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
If userspace is asynchronously streaming into the batch or other execobjects, we may not flush those writes along with a change in cache domain (as there is no change). Therefore those writes may end up in internal chipset buffers and not visible to the GPU upon execution. We must issue a flush command or otherwise we encounter incoherency in the batchbuffers and the GPU executing invalid commands (i.e. hanging) quite regularly. v2: Throw a paranoid wmb() into the general flush so that we remain consistent with before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90841 Fixes: 1816f923 ("drm/i915: Support creation of unbound wc user...") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Tested-by: NMatti Hämäläinen <ccr@tnsp.org> Cc: stable@vger.kernel.org Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-1-chris@chris-wilson.co.uk
-
- 15 8月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searching for the relevant pin later. v2: Joonas' stylistic nitpicks, a fun rebase. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The VMA are unreferenced, they belong to the object and live until they are closed. However, if we want to use the VMA as a cookie and use it to keep the object alive, we want to hold onto a reference to the object for the lifetime of the VMA cookie. To facilitate this, add a couple of simple wrappers for managing the reference count on the object owning the VMA. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-11-git-send-email-chris@chris-wilson.co.uk
-
- 10 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
request->batch_obj is only set by execbuffer for the convenience of debugging hangs. By moving that operation to the callsite, we can simplify all other callers and future patches. We also move the complications of reference handling of the request->batch_obj next to where the active tracking is set up for the request. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470832906-13972-2-git-send-email-chris@chris-wilson.co.uk
-
- 05 8月, 2016 5 次提交
-
-
由 Chris Wilson 提交于
In the previous commit, we moved the obj->tiling_mode out of a bitfield and into its own integer so that we could safely use READ_ONCE(). Let us now repair some of that damage by sharing the tiling_mode with its companion, the fence stride. v2: New magic Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
If the GEM objects being rendered with in this request have been exported via dma-buf to a third party, hook ourselves into the dma-buf reservation object so that the third party can serialise with our rendering via the dma-buf fences. Testcase: igt/prime_busy Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-26-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We are motivated to avoid using a bitfield for obj->active for a couple of reasons. Firstly, we wish to document our lockless read of obj->active using READ_ONCE inside i915_gem_busy_ioctl() and that requires an integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code when presented with a bitfield and that shows up high on the profiles of request tracking (mainly due to excess memory traffic as it converts the bitfield to a register and back and generates frequent AGI in the process). v2: BIT, break up a long line in compute the other engines, new paint for i915_gem_object_is_active (now i915_gem_object_get_active). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In view of adding inline functions into the intel_frontbuffer section, we first split the header into its own file so that we can integrate it more easily with kerneldoc. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. v2: Add a redundant GEM_BUG_ON(!view) to i915_gem_obj_lookup_or_create_ggtt_vma() Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
-