- 29 Mar, 2017 1 commit
-
-
Committed by Lucas Stach
Make sure the GPU lock is taken, so that fence completion order matches seqno order. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 02 Feb, 2017 3 commits
-
-
Committed by Lucas Stach
There are 3 big benefits to suballocating a single big DMA buffer for command submission:
1. Avoid hammering CMA. The old way of allocating and freeing a DMA buffer for each submission was hitting some of the really slow paths in CMA, as this allocator was not designed for a load of many concurrent small buffers.
2. Fewer TLB flushes on IOMMUv2. If a new command buffer is mapped into the GPU address space, the MMU TLBs need to be flushed. By having one big buffer statically mapped to the GPU, a lot of those flushes can be avoided.
3. No funky workarounds for GC3000. The FE TLB flush on GC3000 isn't reliable. To work around that we tried to lay out the cmdbufs in the GPU address space in a way that avoids this issue. This hasn't always worked if the address space is crowded. A single statically mapped buffer avoids the erratum completely.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
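The suballocation idea described in this commit can be illustrated with a small userspace sketch. This is not the driver's code: the buffer size, the 4K granule, and the names `struct suballoc`, `suballoc_get` and `suballoc_put` are all assumptions made for illustration. It hands out runs of granules from one fixed buffer using a first-fit scan, so no per-submission DMA allocation is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBALLOC_SIZE     (256 * 1024) /* assumed size of the one big buffer */
#define SUBALLOC_GRANULE  4096         /* assumed allocation granule */
#define SUBALLOC_GRANULES (SUBALLOC_SIZE / SUBALLOC_GRANULE)

/* Hypothetical bookkeeping: one "in use" flag per granule. */
struct suballoc {
    bool used[SUBALLOC_GRANULES];
};

/*
 * First-fit search for a free run of granules covering `size` bytes.
 * Returns the byte offset into the big buffer, or -1 if nothing fits.
 */
static long suballoc_get(struct suballoc *sa, size_t size)
{
    size_t need = (size + SUBALLOC_GRANULE - 1) / SUBALLOC_GRANULE;
    size_t run = 0;

    if (need == 0 || need > SUBALLOC_GRANULES)
        return -1;

    for (size_t i = 0; i < SUBALLOC_GRANULES; i++) {
        run = sa->used[i] ? 0 : run + 1;
        if (run == need) {
            size_t first = i + 1 - need;

            for (size_t j = first; j <= i; j++)
                sa->used[j] = true;
            return (long)(first * SUBALLOC_GRANULE);
        }
    }
    return -1;
}

/* Release a range previously returned by suballoc_get(). */
static void suballoc_put(struct suballoc *sa, long offset, size_t size)
{
    size_t first = (size_t)offset / SUBALLOC_GRANULE;
    size_t count = (size + SUBALLOC_GRANULE - 1) / SUBALLOC_GRANULE;

    assert(offset >= 0 && first + count <= SUBALLOC_GRANULES);
    for (size_t j = first; j < first + count; j++)
        sa->used[j] = false;
}
```

Because the big buffer would be mapped into the GPU address space once, an offset returned by `suballoc_get()` can be added to a fixed base address without touching the MMU again. A real implementation would also have to block or fail gracefully when the buffer is full.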
-
Committed by Lucas Stach
Don't call the IOMMU directly, but go through the new cmdbuf abstraction. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Committed by Lucas Stach
This will get more complex with the following changes, so move it into its own place. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
- 30 Jan, 2017 1 commit
-
-
Committed by Wladimir J. van der Laan
Set up the PULSE_EATER register (0x0010C) in etnaviv_gpu_hw_init. This ports three mostly undocumented model/revision-specific register overrides from the Vivante kernel driver. This is relevant as at least the "disable internal DFS" override for revisions > 0x5420 has been shown to have a huge impact on shader performance (sped up memory read performance by 7.5x and write performance by 1.5x) on an affected GPU. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 03 Dec, 2016 1 commit
-
-
Committed by Lucas Stach
On i.MX6SX the physical memory is placed above the 2GB mark, so the GPU linear window has to be moved for the GPU to work at all. This doesn't mix with the FAST_CLEAR feature, as the TS unit doesn't take the linear window offset into account and will corrupt memory when used with a non-zero offset. Move the linear window if it's necessary for the GPU to work, but avoid announcing FAST_CLEAR support to userspace in this case. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Marek Vasut <marex@denx.de>
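The decision this commit describes can be sketched roughly as follows. The struct, the feature bit position, and the function name are hypothetical, and the 2 GB window size is inferred from the "2GB mark" in the message rather than taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

#define LINEAR_WINDOW_SIZE 0x80000000u /* assumed: 2 GB linear window */
#define FEATURE_FAST_CLEAR (1u << 0)   /* hypothetical bit position */

/* Hypothetical subset of the per-GPU state. */
struct gpu_cfg {
    uint32_t memory_base; /* linear window offset */
    uint32_t features;    /* feature bits announced to userspace */
};

/*
 * If RAM starts at or above the end of an unmoved window, the window has
 * to be moved so the GPU can reach memory at all. A non-zero offset is
 * not honoured by the TS unit, so FAST_CLEAR is hidden in that case.
 */
static void place_linear_window(struct gpu_cfg *gpu, uint32_t ram_base)
{
    if (ram_base >= LINEAR_WINDOW_SIZE) {
        gpu->memory_base = ram_base;
        gpu->features &= ~FEATURE_FAST_CLEAR;
    } else {
        gpu->memory_base = 0;
    }
}
```

Masking the feature bit, instead of failing the probe, keeps the GPU usable on such systems while userspace simply falls back to paths that don't rely on fast clear.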
-
- 25 Oct, 2016 1 commit
-
-
Committed by Chris Wilson
I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing! (v2...: rebase, rerun spatch) v3: Compile on msm, spotted a manual fixup that I broke. v4: Try again for msm, sorry Daniel coccinelle script: @@ @@ - struct fence + struct dma_fence @@ @@ - struct fence_ops + struct dma_fence_ops @@ @@ - struct fence_cb + struct dma_fence_cb @@ @@ - struct fence_array + struct dma_fence_array @@ @@ - enum fence_flag_bits + enum dma_fence_flag_bits @@ @@ ( - fence_init + dma_fence_init | - fence_release + dma_fence_release | - fence_free + dma_fence_free | - fence_get + dma_fence_get | - fence_get_rcu + dma_fence_get_rcu | - fence_put + dma_fence_put | - fence_signal + dma_fence_signal | - fence_signal_locked + dma_fence_signal_locked | - fence_default_wait + dma_fence_default_wait | - fence_add_callback + dma_fence_add_callback | - fence_remove_callback + dma_fence_remove_callback | - fence_enable_sw_signaling + dma_fence_enable_sw_signaling | - fence_is_signaled_locked + dma_fence_is_signaled_locked | - fence_is_signaled + dma_fence_is_signaled | - fence_is_later + dma_fence_is_later | - fence_later + dma_fence_later | - fence_wait_timeout + dma_fence_wait_timeout | - fence_wait_any_timeout + dma_fence_wait_any_timeout | - fence_wait + dma_fence_wait | - fence_context_alloc + dma_fence_context_alloc | - fence_array_create + dma_fence_array_create | - to_fence_array + to_dma_fence_array | - fence_is_array + dma_fence_is_array | - trace_fence_emit + trace_dma_fence_emit | - FENCE_TRACE + DMA_FENCE_TRACE | - FENCE_WARN + DMA_FENCE_WARN | - FENCE_ERR + DMA_FENCE_ERR ) ( ... 
) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Acked-by: Sumit Semwal <sumit.semwal@linaro.org> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
-
- 15 Sep, 2016 16 commits
-
-
Committed by Lucas Stach
If we reset the GPU to get it back into a usable state, we lose all context, not just the MMU one. Mark the whole context as lost to trigger a restore of the exec and MMU state. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
GC2000+ on the i.MX6QP is just a re-branded GC3000; let's call it by its real name to avoid confusion in other parts of the driver. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
Bit 30 of the interrupt status signals an MMU exception. Handle this condition properly and dump some useful registers. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
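A minimal sketch of the status check, assuming a 32-bit interrupt status word. Only the bit position comes from the commit message; the macro and function names are made up for this example, and the register dump is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_MMU_EXCEPTION (1u << 30) /* bit 30, as stated in the commit */

/* True if the interrupt status word reports an MMU exception. */
static bool intr_is_mmu_exception(uint32_t intr)
{
    return (intr & INTR_MMU_EXCEPTION) != 0;
}

/* Strip the exception bit so it is not processed a second time. */
static uint32_t intr_clear_mmu_exception(uint32_t intr)
{
    return intr & ~INTR_MMU_EXCEPTION;
}
```

An interrupt handler would test the bit first, log the fault details, and then continue with whatever other status bits remain set.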
-
Committed by Lucas Stach
With MMUv2 all buffers need to be mapped through the MMU once it is enabled. Align the buffer size to 4K, as the MMU is only able to map page-aligned buffers. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
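Rounding a size up to a 4K boundary is the usual power-of-two alignment trick. The sketch below is standalone C with hypothetical names, where `GPU_PAGE_SIZE` is assumed to be 4096 as in the message. Kernel code would typically use an existing alignment macro instead.

```c
#include <assert.h>
#include <stddef.h>

#define GPU_PAGE_SIZE ((size_t)4096) /* 4K, as stated in the commit */

/* Round size up to the next multiple of GPU_PAGE_SIZE (a power of two). */
static size_t align_to_gpu_page(size_t size)
{
    return (size + GPU_PAGE_SIZE - 1) & ~(GPU_PAGE_SIZE - 1);
}
```

For example, a 4097-byte command buffer would occupy two pages (8192 bytes) once aligned.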
-
Committed by Lucas Stach
Split out into a new externally visible function, as the IOMMUv2 code needs this functionality, too. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
Split out into a new externally visible function, as the IOMMUv2 code needs this functionality, too. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
The GPU virtual address for the command buffers differs depending on the IOMMU version. Move the calculation of the iova into etnaviv mmu, to enable proper dispatch. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
The GPU code doesn't need to deal with the IOMMU directly; instead it can all be hidden behind the etnaviv mmu interface. Move the last remaining part into etnaviv mmu. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
So we can call the v2 restore code once it is there. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
It is only relevant for the V1 MMU, so we should not do this in the common code. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
This function has external visibility and only handles the Vivante IOMMU version 1. Rename it to make this clearer and to allow a clean separation of the different IOMMU versions. Also drop the domain parameter, as we can infer it from the GPU we are dealing with. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
There is no linear window on MMUv2 and the FE can access the full 4GB address space either directly (as long as the MMU isn't configured) or through the MMU, once it is up. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
The driver doesn't ever enable individual clocks alone, so there is no need to scatter the clock enable/disable sequences through multiple functions. Fold them into the top one. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Fabio Estevam
There is no need to initialize variable 'err' with 0 because it will be properly assigned later on. Signed-off-by: Fabio Estevam <festevam@gmail.com>
-
Committed by Fabio Estevam
In the etnaviv_gpu_platform_probe() error path the 'fail' label is used to just return the error code. This can be simplified by returning the error code immediately, so get rid of the unneeded 'fail' label. Signed-off-by: Fabio Estevam <festevam@gmail.com>
-
Committed by Fabio Estevam
clk_prepare_enable() may fail, so we should check its return value and propagate it in the case of failure. Signed-off-by: Fabio Estevam <festevam@gmail.com>
-
- 15 Aug, 2016 1 commit
-
-
Committed by Lucas Stach
Both the fence and event alloc are safe to be done without holding the GPU lock, as they either don't need any locking (fences) or are protected by their own lock (events). This solves a bad locking interaction between the submit path and the recover worker. If userspace manages to exhaust all available events while the GPU is hung, the submit will wait for events to become available while holding the GPU lock. The recover worker waits for this lock to become available before trying to recover the GPU, which frees up the allocated events. Essentially both paths are deadlocked until the submit path times out waiting for available events, failing a submit that could otherwise be handled just fine if the recover worker had the chance to bring the GPU back into a working state. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
- 05 Jul, 2016 2 commits
-
-
Committed by Lucas Stach
Print error messages that mention the exact cause of the failure on all paths which may fail the GPU init. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
-
Committed by Russell King
Enable GPU module level hardware clock gating, using the conditions found in the galcore v5 driver. v2 lst: Split out clock gating enable into separate function, as there might be more conditions needed for new hardware. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 06 May, 2016 1 commit
-
-
Committed by Lucas Stach
The hangcheck handler is already running with very coarse timeouts, so it doesn't hurt to combine this timer with other wakeups in the system. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 28 Apr, 2016 1 commit
-
-
Committed by Masanari Iida
This patch fixes spelling typos in printk messages in various parts of the code. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
-
- 21 Apr, 2016 1 commit
-
-
Committed by Lucas Stach
On cores with MC1.0 the memory window offset is not properly respected by all engines in the core, leading to different views of the memory if the offset is non-zero. This causes relocs for those engines to be wrong and might lead to other subtle problems. Rather than trying to work around this, just disable the linear memory window offset for those cores. Suggested-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 09 Mar, 2016 1 commit
-
-
Committed by Luis R. Rodriguez
Rename dma_*_writecombine() to dma_*_wc(), so that the naming is coherent across the various write-combining APIs. Keep the old names for compatibility for a while; these can be removed at a later time. A guard is left to enable backporting of the rename, and later removal of the old mapping defines, seamlessly. Build tested successfully with allmodconfig. The following Coccinelle SmPL patch was used for this simple transformation: @ rename_dma_alloc_writecombine @ expression dev, size, dma_addr, gfp; @@ -dma_alloc_writecombine(dev, size, dma_addr, gfp) +dma_alloc_wc(dev, size, dma_addr, gfp) @ rename_dma_free_writecombine @ expression dev, size, cpu_addr, dma_addr; @@ -dma_free_writecombine(dev, size, cpu_addr, dma_addr) +dma_free_wc(dev, size, cpu_addr, dma_addr) @ rename_dma_mmap_writecombine @ expression dev, vma, cpu_addr, dma_addr, size; @@ -dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size) +dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size) We also keep the old names as compatibility helpers, and guard against their definition to make backporting easier. Generated-by: Coccinelle SmPL Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bhelgaas@google.com Cc: bp@suse.de Cc: dan.j.williams@intel.com Cc: daniel.vetter@ffwll.ch Cc: dhowells@redhat.com Cc: julia.lawall@lip6.fr Cc: konrad.wilk@oracle.com Cc: linux-fbdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: luto@amacapital.net Cc: mst@redhat.com Cc: tomi.valkeinen@ti.com Cc: toshi.kani@hp.com Cc: vinod.koul@intel.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1453516462-4844-1-git-send-email-mcgrof@do-not-panic.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 07 Mar, 2016 4 commits
-
-
Committed by Russell King
Currently, we scan the list of mappings each time we want to operate on the vram_mapping struct. Rather than repeatedly scanning these, look them up once in the submission path, and then use _reference and _unreference methods as necessary to manage this object. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Russell King
Add tracking of the current execution state (iow, active GPU pipe). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Lucas Stach
If the end of the system DMA window is farther away from the start of physical RAM than the size of the GPU linear window, move the linear window so that it ends at the same address as the system DMA window. This allows mapping command buffers from CMA, which is likely to reside at the end of the system DMA window, while also overlapping as much RAM as possible, in order to optimize regular buffer mappings through the linear window. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
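The placement rule in this commit is a small piece of address arithmetic. The sketch below assumes 32-bit physical addresses and an inclusive `dma_end` (the last addressable byte); the function and parameter names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the base of the GPU linear window.
 *
 * ram_start:   first byte of physical RAM
 * dma_end:     last byte reachable by system DMA (inclusive)
 * window_size: size of the GPU linear window in bytes
 *
 * If the window, placed at ram_start, already reaches dma_end, keep it
 * there. Otherwise slide it up so that it ends exactly at dma_end.
 */
static uint32_t linear_window_base(uint32_t ram_start, uint32_t dma_end,
                                   uint32_t window_size)
{
    assert(dma_end >= ram_start);
    assert(window_size > 0);

    if (dma_end - ram_start < window_size)
        return ram_start;

    return dma_end - window_size + 1;
}
```

With RAM starting at 0x10000000, a DMA window ending at 0xFFFFFFFF, and a 2 GB window, the base moves to 0x80000000, so the top of memory (where CMA is likely to sit) stays reachable.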
-
Committed by Lucas Stach
The retire worker is kicked for each fence, either the normal way by signaling the fence from the event completion interrupt or by the recover worker if the GPU got stuck. Moving the RPM put into the retire worker allows us to have it in a single place for both cases. This also shaves off quite a bit of the CPU time spent in hardirq context, as arming the autosuspend timer when the RPM refcount drops to 0 is a relatively costly operation. Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
- 27 Jan, 2016 6 commits
-
-
Committed by Lucas Stach
Plug in error handling to free any allocated resources in the IOMMU init path. Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Russell King
Export further minor feature bitmasks and the varyings count from the GPU specifications registers to userspace. Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Russell King
Add and use a helper for comparing the model and revision IDs. Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
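A helper of this kind might look like the sketch below. The struct layout and the name `gpu_is_model_rev` are assumptions for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the identity data read from the GPU. */
struct gpu_identity {
    uint32_t model;
    uint32_t revision;
};

/* True only if both the model and the revision match exactly. */
static bool gpu_is_model_rev(const struct gpu_identity *id,
                             uint32_t model, uint32_t revision)
{
    return id->model == model && id->revision == revision;
}
```

Wrapping the comparison in one helper keeps model/revision quirks readable as a single condition instead of two separately spelled-out field checks.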
-
Committed by Russell King
Add a helper to extract etnaviv bitfields from register values. Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
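One common way to write such a helper, assuming the register headers define a `__MASK` and `__SHIFT` constant per field, is a token-pasting macro. The macro name `reg_field` and the `EXAMPLE_FIELD` definitions below are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field occupying bits 11:8 of a register. */
#define EXAMPLE_FIELD__MASK  0x00000f00u
#define EXAMPLE_FIELD__SHIFT 8

/* Mask out the field and shift it down to bit 0. */
#define reg_field(val, field) \
    (((val) & field##__MASK) >> field##__SHIFT)

/* Small wrapper so the macro can be used like a function. */
static uint32_t example_field_get(uint32_t reg)
{
    return reg_field(reg, EXAMPLE_FIELD);
}
```

The caller names the field once and the macro derives both constants from it, so a mask can't accidentally be paired with the wrong shift.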
-
Committed by Russell King
Use the defined constants in common.xml.h for the chip model rather than coding these as hex numbers. Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-
Committed by Russell King
Ignore GPUs with a 2.0 front end. These have a different register layout for the front end, which provokes imprecise aborts from the register accesses in the 'gpu' debugfs file. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-