- 23 6月, 2015 7 次提交
-
-
由 John Harrison 提交于
Now that all callers of i915_add_request() have a request pointer to hand, it is possible to update the add request function to take a request pointer rather than pulling it out of the OLR. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
Updated the display page flip code to do explicit request creation and submission rather than relying on the OLR and just hoping that the request actually gets submitted at some random point. The sequence is now to create a request, queue the work to the ring, assign the known request to the flip queue work item then actually submit the work and post the request. Note that every single flip function used to finish with '__intel_ring_advance(ring);'. However, immediately after they return there is now an add request call which will do the advance anyway. Thus the many duplicate advance calls have been removed. v2: Updated commit message with comment about advance removal. v3: The request can now be allocated by the _sync() code earlier on. Thus the page flip path does not necessarily need to allocate a new request, it may be able to re-use one. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
Updated the two render_state_init() functions to take a request pointer instead of a ring. This removes their reliance on the OLR. v2: Rebased to newer tree. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
Now that everything above has been converted to use requests, it is possible to update init_context() to take a request pointer instead of a ring/context pair. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The alloc_request() function does not actually return the newly allocated request. Instead, it must be pulled from ring->outstanding_lazy_request. This patch fixes this so that code can create a request and start using it knowing exactly which request it actually owns. v2: Updated for new i915_gem_request_alloc() scheme. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The i915_add_request() function is called to keep track of work that has been written to the ring buffer. It adds epilogue commands to track progress (seqno updates and such), moves the request structure onto the right list and other such house keeping tasks. However, the work itself has already been written to the ring and will get executed whether or not the add request call succeeds. So no matter what goes wrong, there isn't a whole lot of point in failing the call. At the moment, this is fine(ish). If the add request does bail early on and not do the housekeeping, the request will still float around in the ring->outstanding_lazy_request field and be picked up next time. It means multiple pieces of work will be tagged as the same request and driver can't actually wait for the first piece of work until something else has been submitted. But it all sort of hangs together. This patch series is all about removing the OLR and guaranteeing that each piece of work gets its own personal request. That means that there is no more 'hoovering up of forgotten requests'. If the request does not get tracked then it will be leaked. Thus the add request call _must_ not fail. The previous patch should have already ensured that it _will_ not fail by removing the potential for running out of ring space. This patch enforces the rule by actually removing the early exit paths and the return code. Note that if something does manage to fail and the epilogue commands don't get written to the ring, the driver will still hang together. The request will be added to the tracking lists. And as in the old case, any subsequent work will generate a new seqno which will suffice for marking the old one as complete. v2: Improved WARNings (Tomas Elf review request). For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
It is a bad idea for i915_add_request() to fail. The work will already have been send to the ring and will be processed, but there will not be any tracking or management of that work. The only way the add request call can fail is if it can't write its epilogue commands to the ring (cache flushing, seqno updates, interrupt signalling). The reasons for that are mostly down to running out of ring buffer space and the problems associated with trying to get some more. This patch prevents that situation from happening in the first place. When a request is created, it marks sufficient space as reserved for the epilogue commands. Thus guaranteeing that by the time the epilogue is written, there will be plenty of space for it. Note that a ring_begin() call is required to actually reserve the space (and do any potential waiting). However, that is not currently done at request creation time. This is because the ring_begin() code can allocate a request. Hence calling begin() from the request allocation code would lead to infinite recursion! Later patches in this series remove the need for begin() to do the allocate. At that point, it becomes safe for the allocate to call begin() and really reserve the space. Until then, there is a potential for insufficient space to be available at the point of calling i915_add_request(). However, that would only be in the case where the request was created and immediately submitted without ever calling ring_begin() and adding any work to that request. Which should never happen. And even if it does, and if that request happens to fall down the tiny window of opportunity for failing due to being out of ring space then does it really matter because the request wasn't doing anything in the first place? v2: Updated the 'reserved space too small' warning to include the offending sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added re-initialisation of tracking state after a buffer wrap to keep the sanity checks accurate. v3: Incremented the reserved size to accommodate Ironlake (after finally managing to run on an ILK system). Also fixed missing wrap code in LRC mode. v4: Added extra comment and removed duplicate WARN (feedback from Tomas). For: VIZ-5115 CC: Tomas Elf <tomas.elf@intel.com> Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 03 6月, 2015 3 次提交
-
-
由 Ville Syrjälä 提交于
MI_MODE is saved in the logical context so WaDisableAsyncFlipPerfMode must be applied using LRIs on gen8. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Ville Syrjälä 提交于
INSTPM is saved in the logical context so we should initialize it using LRIs on gen8. It actually defaults to 1 starting from HSW, but let's keep the write around anyway. Also drop the INSTPM_FORCE_ORDERING setup entirely on gen9+ since it's now a reserved bit. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Ville Syrjälä 提交于
commit 65ca7514 Author: Damien Lespiau <damien.lespiau@intel.com> Date: Mon Feb 9 19:33:22 2015 +0000 drm/i915/skl: Implement WaBarrierPerformanceFixDisable got misapplied and the code landed in chv_init_workarounds() instead of the intended skl_init_workarounds(). Move it over to the right place. Cc: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 22 5月, 2015 1 次提交
-
-
由 Ville Syrjälä 提交于
GEN8_L3SQCREG1 isn't saved in the context (verified by going through a context dump), and so we shouldn't be using the ring w/a code to initialize it. Also Bspec explicitly talks about MMIO and writing it with the CPU. Additionally there's another w/a WaTempDisableDOPClkGating:bdw which tells us to disable DOP clock gating around the GEN8_L3SQCREG1 write to make sure everyone notices the change. So let's do that as well. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 5月, 2015 3 次提交
-
-
由 Chris Wilson 提交于
Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Imre Deak 提交于
v2: - set the override disable flag too on stepping F0 (mika) Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Imre Deak 提交于
On B0 and C0 steppings the workaround enable bit would be overriden by default, so the overriding must be disabled. The WA was added in commit 83a24979 Author: Nick Hoath <nicholas.hoath@intel.com> Date: Fri Apr 10 13:12:26 2015 +0100 drm/i915/bxt: Add WaForceContextSaveRestoreNonCoherent Spotted-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 5月, 2015 1 次提交
-
-
由 Imre Deak 提交于
Also make the WA comment consistent with the rest, where the stepping info is not shown. Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 5月, 2015 12 次提交
-
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Note that we also need this for skl. Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> [danvet: Note that we also need this for skl, requested by Imre.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Note that we also need this for skl. Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> [danvet: Note that we also need this for skl, requested by Imre.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Nick Hoath 提交于
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 14 4月, 2015 2 次提交
-
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NNick Hoath <nicholas.hoath@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NNick Hoath <nicholas.hoath@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 4月, 2015 3 次提交
-
-
由 Chris Wilson 提交于
requests are even more frequently allocated than objects and equally benefit from having a dedicated slab. v2: Rebase Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
After the removal of DRI1, all access to the rings are through requests and so we can always be sure that there is a request to wait upon to free up available space. The fallback code only existed so that we could quiesce the GPU following unmediated access by DRI1. v2: Rebase Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
I woke up one morning and found 50k objects sitting in the batch pool and every search seemed to iterate the entire list... Painting the screen in oils would provide a more fluid display. One issue with the current design is that we only check for retirements on the current ring when preparing to submit a new batch. This means that we can have thousands of "active" batches on another ring that we have to walk over. The simplest way to avoid that is to split the pools per ring and then our LRU execution ordering will also ensure that the inactive buffers remain at the front. v2: execlists still requires duplicate code. v3: execlists requires more duplicate code Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 4月, 2015 1 次提交
-
-
由 Nick Hoath 提交于
Adds framework for Broxton HW WAs Signed-off-by: NNick Hoath <nicholas.hoath@intel.com> Signed-off-by: NImre Deak <imre.deak@intel.com> Reviewed-by: NNick Hoath <nicholas.hoath@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 4月, 2015 1 次提交
-
-
由 Rodrigo Vivi 提交于
Program the default initial value of the L3SqcReg1 on BDW for performance v2: Default confirmed and using intel_ring_emit_wa as Mika pointed out. v3: Spec shows now a different value. It tells us to set to 0x784000 instead the 0x610000 that is there already. Also rebased after a long time so using WA_WRITE now. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 01 4月, 2015 2 次提交
-
-
由 John Harrison 提交于
The legacy and LRC code paths have an almost identical procedure for waiting for space in the ring buffer. They both search for a request in the free list that will advance the tail to a point where sufficient space is available. They then wait for that request, retire it and recalculate the free space value. Unfortunately, a bug in the LRC side meant that the resulting free space might not be as large as expected and indeed, might not be sufficient. This is because it was testing against the value of request->tail not request->postfix. Whereas, when a request is retired, ringbuf->tail is updated to req->postfix not req->tail. Another significant difference between the two is that the LRC one did not trust the wait for request to work! It redid the is there enough space available test and would fail the call if insufficient. Whereas, the legacy version just said 'return 0' - it assumed the preceeding code works. This difference meant that the LRC version still worked even with the bug - it just fell back to the polling wait path. For: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NThomas Daniel <thomas.daniel@intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
The request allocation code is largely duplicated between legacy mode and execlist mode. The actual difference between the two versions of the code is pretty minimal. This patch moves the common code out into a separate function. This is then called by the execution specific version prior to setting up the one different value. For: VIZ-5190 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NTomas Elf <tomas.elf@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 3月, 2015 2 次提交
-
-
由 kbuild test robot 提交于
drivers/gpu/drm/i915/intel_ringbuffer.c:435:1-4: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci CC: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Paulo Zanoni 提交于
Kill the blt/render tracking we currently have and use the frontbuffer tracking infrastructure. Don't enable things by default yet. v2: (Rodrigo) Fix small conflict on rebase and typo at subject. v3: (Paulo) Rebase on RENDER_CS change. v4: (Paulo) Rebase. v5: (Paulo) Simplify: flushes don't have origin (Daniel). Also rebase due to patch order changes. Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 2月, 2015 2 次提交
-
-
由 John Harrison 提交于
In execlist mode, the ringbuf is a function of the ring and context whereas in legacy mode, it is derived from the ring alone. Thus the calculation required to determine the ringbuf pointer from the ring (and context) also needs to test execlist mode or not. This is messy. Further, the request structure holds a pointer to both the ring and the context for which it was created. Thus, given a request, it is possible to derive the ringbuf in either legacy or execlist mode. Hence it is necessary to pass just the request in to all the low level functions rather than some combination of request, ring, context and ringbuf. However, rather than recalculating it each time, it is much simpler to just cache the ringbuf pointer in the request structure itself. Caching the pointer means the calculation is done once at request creation time and all further code and simply read it directly from the request structure. OTC-Jira: VIZ-5115 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> [danvet: Drop contentless comment in lrc alloc request entirely. And spelling fix in the commit message.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 John Harrison 提交于
There is a flags word that is passed through the execbuffer code path all the way from initial decoding of the user parameters down to the very final dispatch buffer call. It is simply called 'flags'. Unfortuantely, there are many other flags words floating around in the same blocks of code. Even more once the GPU scheduler arrives. This patch makes it more obvious exactly which flags word is which by renaming 'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word already have an 'I915_DISPATCH_' prefix on them and so are not quite so ambiguous. OTC-Jira: VIZ-1587 Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> [danvet: Resolve conflict with Chris' rework of the bb parsing.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-