- 22 6月, 2015 1 次提交
-
-
由 Rodrigo Vivi 提交于
This patch doesn't have any functional change, but organize fruntbuffer invalidate and busy by removing unecesarry signature argument for ring. It was unsed on mark_fb_busy and only used on fb_obj_invalidate for the same ORIGIN_CS usage. So let's clean it a bit Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 6月, 2015 1 次提交
-
-
由 Ville Syrjälä 提交于
Apparently we can have requests even if though the active list is empty, so do the request retirement regardless of whether there's anything on the active list. The way it happened here is that during suspend intel_ring_idle() notices the olr hanging around and then proceeds to get rid of it by adding a request. However since there was nothing on the active lists i915_gem_retire_requests() didn't clean those up, and so the idle work never runs, and we leave the GPU "busy" during suspend resulting in a WARN later. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 27 5月, 2015 1 次提交
-
-
由 Chris Wilson 提交于
In commit 1854d5ca Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:32 2015 +0100 drm/i915: Deminish contribution of wait-boosting from clients we removed an atomic timer based check for allowing waitboosting and moved it below the mutex taken during RPS. However, that mutex can be held for long periods of time on Vallyview/Cherryview as communication with the PCU is slow. As clients may frequently wait for results (e.g. such as tranform feedback) we introduced contention between the client and the RPS worker. We can take advantage of the RPS worker, by switching the wait boost decision to use spin locks and defer the actual reclocking to the worker. Fixes a regression of up to 45% on Baytrail and Baswell! v2 (Daniel): - Use max_freq_softlimit instead of the not-yet-merged boost frequency. - Don't inject a fake irq into the boost work, instead treat client_boost as just another legit waker. v3: Drop the now unused mask (Chris). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 5月, 2015 3 次提交
-
-
由 Chris Wilson 提交于
As Daniel commented on commit b7ffe1362c5f468b853223acc9268804aa92afc8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Apr 27 13:41:24 2015 +0100 drm/i915: Free RPS boosts for all laggards it is better to be explicit when sharing hardcoded values such as throttle/boost timeouts. Make it so! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
After allocating from the slab cache, we then need to free the request back into the slab cache upon error (and not call kfree as that leads to eventual memory corruption). Fixes regression from commit efab6d8d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:57 2015 +0100 drm/i915: Use a separate slab for requests Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
If the client stalls on a congested request, chosen to be 20ms old to match throttling, allow the client a free RPS boost. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] [danvet: s/0/NULL/ reported by 0-day build] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 5月, 2015 4 次提交
-
-
由 Chris Wilson 提交于
Now that we have internal clients, rather than faking a whole drm_i915_file_private just for tracking RPS boosts, create a new struct intel_rps_client and pass it along when waiting. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Ring switches can occur many times per frame, and are often out of control, causing frequent RPS boosting for no practical benefit. Treat the sw semaphore synchronisation as a separate client and only allow it to boost once per busy/idle cycle. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
The merged seqno->request conversion from John called request variables req, but some (not all) of Chris' recent patches changed those to just rq. We've had a lenghty (and inconclusive) discussion on irc which is the more meaningful name with maybe at most a slight bias towards req. Given that the "don't change names without good reason to avoid conflicts" rule applies, so lets go back to a req everywhere for consistency. I'll sed any patches for which this will cause conflicts before applying. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> [danvet: s/origina/merged/ as pointed out by Chris - the first mass-conversion patch was from Chris, the merged one from John.] Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
- 20 5月, 2015 2 次提交
-
-
由 Chris Wilson 提交于
We no longer interpolate domains in the same manner, and even if we did, we should trust setting either of the other write domains would trigger an invalidation rather than force it. Remove the tweaking of the read_domains since it serves no purpose and use i915_gem_object_wait_rendering() directly. Note that this goes back to commit a8198eea Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Apr 13 22:04:09 2011 +0100 drm/i915: Introduce i915_gem_object_finish_gpu() and gpu domain tracking died in commit cc889e0f Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Jun 13 20:45:19 2012 +0200 drm/i915: disable flushing_list/gpu_write_list which is more than 1 year older. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Add notes with information dug out of git history.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Lost in commit c5ad54cf Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Date: Wed May 6 14:36:09 2015 +0300 drm/i915: Use partial view in mmap fault handler Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
- 08 5月, 2015 6 次提交
-
-
由 Joonas Lahtinen 提交于
We do not yet support tiled objects bigger than the mappable aperture size so reject them. Reported-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Rework the check a bit to avoid warnings.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Joonas Lahtinen 提交于
Use partial view for huge BOs (bigger than half the mappable aperture) in fault handler so that they can be accessed withough trying to make room for them by evicting other objects. v2: - Only use partial views in the case where early rejection was previously done. - Account variable type changes from previous reroll. v3: - Add a comment about overwriting existing page entries. (Tvrtko Ursulin) - Whitespace fixes. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Joonas Lahtinen 提交于
Do not skip special GGTT views when considering whether an object is pinned or not. Wrong behaviour was introduced in; commit ec7adb6e Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Date: Mon Mar 16 14:11:13 2015 +0200 drm/i915: Do not use ggtt_view with (aliasing) PPGTT Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Joonas Lahtinen 提交于
GGTT VMA sizes might be smaller than the whole object size due to different GGTT views. v2: - Separate GGTT view constraint calculations from normal view constraint calculations (Chris Wilson) v3: - Do not bother with debug wording. (Tvrtko Ursulin) v4: - Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin) Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> [danvet: Drop BUG_ON, it's redudant.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Reading a single value from the object, the locking only provides futile protection against userspace races. The locking is useless so remove it. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Mika Kuoppala 提交于
Unbinding doesn't always lead to unconditional destruction of vma. This destruction avoidance happens if vma is part of execbuffer relocation list or if vma is being considered for eviction in i915_gem_evict_something(). For those other users, mark the vma unbound so that the correct state of this vma is preserved. Reported-by: NChris Wilson <chris@chris-wilson.co.ok> Cc: Chris Wilson <chris@chris-wilson.co.ok> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 24 4月, 2015 3 次提交
-
-
由 Michel Thierry 提交于
WaIdleLiteRestore is an execlists-only workaround, and requires the driver to ensure that any context always has HEAD!=TAIL when attempting lite restore. Add two extra MI_NOOP instructions at the end of each request, but keep the requests tail pointing before the MI_NOOPs. We may not need to executed them, and this is why request->tail is sampled before adding these extra instructions. If we submit a context to the ELSP which has previously been submitted, move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL. v2: Move overallocation to gen8_emit_request, and added note about sampling request->tail in commit message (Chris). v3: Remove redundant request->tail assignment in __i915_add_request, in lrc mode this is already set in execlists_context_queue. Do not add wa implementation details inside gem (Chris). v4: Apply the wa whenever the req has been resubmitted and update comment (Chris). Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NThomas Daniel <thomas.daniel@intel.com> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Daniel Vetter 提交于
Currently we have the problem that the decision whether ptes need to be (re)written is splattered all over the codebase. Move all that into i915_vma_bind. This needs a few changes: - Just reuse the PIN_* flags for i915_vma_bind and do the conversion to vma->bound in there to avoid duplicating the conversion code all over. - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if around) explicit, add PIN_USER for that. - Two callers want to update ptes, give them a PIN_UPDATE for that. Of course we still want to avoid double-binding, but that should be taken care of: - A ppgtt vma will only ever see PIN_USER, so no issue with double-binding. - A ggtt vma with aliasing ppgtt needs both types of binding, and we track that properly now. - A ggtt vma without aliasing ppgtt could be bound twice. In the lower-level ->bind_vma functions hence unconditionally set GLOBAL_BIND when writing the ggtt ptes. There's still a bit room for cleanup, but that's for follow-up patches. v2: Fixup fumbles. v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
由 Daniel Vetter 提交于
It's true that we might need to context switch, but both the signalling and implementation of the same are a few source files away. Remove it. Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 4月, 2015 2 次提交
-
-
由 Daniel Vetter 提交于
They change with the address space and not with each vma, so move them into the right pile of vfuncs. Save 2 pointers per vma and clarifies the code. Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
由 Tvrtko Ursulin 提交于
Purpose of this tracking is to know when to flush the cache between the CPU and the non-coherent display engine. Prior to: commit 121920fa Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Mon Mar 23 11:10:37 2015 +0000 drm/i915/skl: Query display address through a wrapper This worked by a mix of direct flag manipulation and checking for existence of a pinned GGTT VMA. With the introduction of rotated display mappings this approach is no longer correct. New simpler approach is to just keep this count over calls which pin and unpin objects to and from display, at the slight cost of extra space in every bo. (Inspired and extracted code from a larger rework by Chris Wilson.) v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä) v3: Commit message corrections. (Chris Wilson) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 16 4月, 2015 1 次提交
-
-
由 Tvrtko Ursulin 提交于
One month passed between posting a patch and it getting merged, and unfortunately even though it still applies, it needs fixing to account for changes in function parameters since: commit d385612e15b8b6eb3db328d83f1872ef8a381788 Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Date: Tue Mar 17 14:45:29 2015 +0000 drm/i915: Log view type when printing warnings Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> [danvet: Squash in fixup from Tvrtko to fix the rebase conflict.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 4月, 2015 2 次提交
-
-
由 Chris Wilson 提交于
The obj->pin_mappable flag only exists for debug purposes and is a hindrance that is mistreated with rotated GGTT views. For debug purposes, it suffices to mark objects with pin_display as being of note. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
This provides a nice boost to mesa in swap bound scenarios (as mesa throttles itself to the previous frame and given the scenario that will complete shortly). It will also provide a good boost to systems running with semaphores disabled and so frequently waiting on the GPU as it switches rings. In the most favourable of microbenchmarks, this can increase performance by around 15% - though in practice improvements will be marginal and rarely noticeable. v2: Account for user timeouts v3: Limit the spinning to a single jiffie (~1us) at most. On an otherwise idle system, there is no scheduler contention and so without a limit we would spin until the GPU is ready. v4: Drop forcewake - the lazy coherent access doesn't require it, and we have no reason to believe that the forcewake itself improves seqno coherency - it only adds delay. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Eero Tamminen <eero.t.tamminen@intel.com> Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 4月, 2015 12 次提交
-
-
由 Mika Kuoppala 提交于
Move to i915_vma_bind as it is part of the binding. Suggested-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
We already assign a unique identifier to every request: seqno. That someone felt like adding a second one without even mentioning why and tweaking ABI smells very fishy. Fixes regression from commit b3a38998 Author: Nick Hoath <nicholas.hoath@intel.com> Date: Thu Feb 19 16:30:47 2015 +0000 drm/i915: Fix a use after free, and unbalanced refcounting v2: Rebase Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Nick Hoath <nicholas.hoath@intel.com> Cc: Thomas Daniel <thomas.daniel@intel.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> [danvet: Fixup because different merge order.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
vma are more frequently allocated than objects and so should equally benefit from having a dedicated slab. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
requests are even more frequently allocated than objects and equally benefit from having a dedicated slab. v2: Rebase Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
When we submit a request to the GPU, we first take the rpm wakelock, and only release it once the GPU has been idle for a small period of time after all requests have been complete. This means that we are sure no new interrupt can arrive whilst we do not hold the rpm wakelock and so can drop the individual get/put around every single request inside execlists. Note: to close one potential issue we should mark the GPU as busy earlier in __i915_add_request. To elaborate: The issue is that we emit the irq signalling sequence before we grab the rpm reference, which means we could miss the resulting interrupt (since that's not set up when suspended). The only bad side effect is a missed interrupt, gt mmio writes automatically wake up the hw itself. But otoh we have an umbrella rpm reference for the entirety of execbuf, as long as that's there we're covered. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Explain a bit more about the add_request issue, which after some irc chatting with Chris turns out to not be an issue really.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Now with the trimmed memcpy before the command parser, we try to allocate many different sizes of batches, predominantly one or two pages. We can therefore speed up searching for a good sized batch by keeping the objects of buckets of roughly the same size. v2: Add a comment about bucket sizes Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
At runtime, this helps ensure that the batch pools are kept trim and fast. Then at suspend, this releases memory that we do not need to restore. It also ties into the oom-notifier to ensure that we recover as much kernel memory as possible during OOM. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
This reverts commit ec5cc0f9 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 12 10:28:55 2014 +0100 drm/i915: Restrict GPU boost to the RCS engine The premise that media/blitter workloads are not affected by boosting is patently false with a trip through igt. The question that remains is what exactly is going wrong with the media workload that prompted this? Hopefully that would be fixed by the missing agressive downclocking, in addition to the extra restrictions imposed on how frequent a process is allowed to boost. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll> Acked-by: NDeepak S <deepak.s@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
With boosting for missed pageflips, we have a much stronger indication of when we need to (temporarily) boost GPU frequency to ensure smooth delivery of frames. So now only allow each client to perform one RPS boost in each period of GPU activity due to stalling on results. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Deepak S <deepak.s@linux.intel.com> Reviewed-by: NDeepak S <deepak.s@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
The biggest user of i915_gem_object_get_page() is the relocation processing during execbuffer. Typically userspace passes in a set of relocations in sorted order. Sadly, we alternate between relocations increasing from the start of the buffers, and relocations decreasing from the end. However the majority of consecutive lookups will still be in the same page. We could cache the start of the last sg chain, however for most callers, the entire sgl is inside a single chain and so we see no improve from the extra layer of caching. v2: Avoid the double increment inside unlikely() References: https://bugs.freedesktop.org/show_bug.cgi?id=88308Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 01 4月, 2015 2 次提交
-
-
由 John Harrison 提交于
The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The submission portion of the execbuffer code path was abstracted into a function pointer indirection as part of the legacy vs execlist work. The two implementation functions are called 'i915_gem_ringbuffer_submission' and 'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is already a 'i915_gem_do_execbuffer' function (which is what calls the pointer indirection). The name of the pointer is therefore considered to be backwards and should be changed. This patch renames it to 'execbuf_submit' which is hopefully a bit clearer. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-