- 28 May 2019, 1 commit

Submitted by Chris Wilson

Split the plain old shmem object into its own file to start decluttering i915_gem.c

v2: Lose the confusing, hysterical raisins, suffix of _gtt.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk

- 08 May 2019, 1 commit

Submitted by Chris Wilson

After realising we need to sample RING_START to detect context switches from preemption events that do not allow for the seqno to advance, we can also realise that the seqno itself is just a distance along the ring and so can be replaced by sampling RING_HEAD.

v2: Bonus comment for the mystery separate CS_STALL before MI_USER_INTERRUPT

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190508080704.24223-1-chris@chris-wilson.co.uk

- 01 May 2019, 1 commit

Submitted by Chris Wilson

Make the engine responsible for cleaning itself up! This removes the i915->gt.cleanup vfunc that has been annoying the casual reader and myself for the last several years, and helps keep a future patch to add more cleanup tidy.

v2: Assert that engine->destroy is set after the backend starts allocating its own state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190501103204.18632-1-chris@chris-wilson.co.uk

- 27 April 2019, 1 commit

Submitted by Chris Wilson

In the next patch, we require the engine vfuncs to be set up prior to initialising the pinned kernel contexts, so split the vfunc setup from the engine initialisation and call it earlier.

v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-6-chris@chris-wilson.co.uk

- 26 April 2019, 3 commits

Submitted by Chris Wilson

Broadwater and the rest of gen4 do support being able to save and reload context specific registers between contexts, providing isolation of the basic GPU state (as programmable by userspace). This allows userspace to assume that the GPU retains their state from one batch to the next, minimising the amount of state it needs to reload and manually save across batches.

v2: CONSTANT_BUFFER woes

Running through piglit turned up an interesting issue, a GPU hang inside the context load. The context image includes the CONSTANT_BUFFER command that loads an address into an on-gpu buffer, and the context load was executing that immediately. However, since it was reading from the GTT there is no guarantee that the GTT retains the same configuration as when the context was saved, resulting in stray reads and a GPU hang. Having tried issuing a CONSTANT_BUFFER (to disable the command) from the ring before saving the context to no avail, we resort to patching out the instruction inside the context image before loading. This does impose that gen4 always reissues CONSTANT_BUFFER commands on each batch, but due to the use of a shared GTT that was and will remain a requirement.

v3: ECOSKPD to the rescue

Ville found the magic bit in the ECOSKPD to disable saving and restoring the CONSTANT_BUFFER from the context image, thereby completely avoiding the GPU hangs from chasing invalid pointers. This appears to be the default behaviour for gen5, and so we just need to tweak gen4 to match.

v4: Fix spelling of ECOSKPD and discover it already exists

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419172720.5462-1-chris@chris-wilson.co.uk

Submitted by Chris Wilson

Ironlake does support being able to save and reload context specific registers between contexts, providing isolation of the basic GPU state (as programmable by userspace). This allows userspace to assume that the GPU retains their state from one batch to the next, minimising the amount of state it needs to reload, or manually save and restore.

v2: Fix off-by-one in reading CXT_SIZE, and add a comment that the CXT_SIZE and context-layout do not match in bspec, but the difference is irrelevant as we overallocate the full page anyway (Ville).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-2-chris@chris-wilson.co.uk

Submitted by Chris Wilson

Despite what I think the prm recommends, commit f2253bd9 ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") turned out to be a huge mistake when enabling Ironlake contexts, as the GPU would hang on either a MI_FLUSH or PIPE_CONTROL immediately following the MI_SET_CONTEXT of an active mesa context (more vanilla contexts, e.g. simple rendercopies with igt, do not suffer).

Ville found the following clue:

"[DevCTG+]: For the invalidate operation of the pipe control, the following pointers are affected. The invalidate operation affects the restore of these packets. If the pipe control invalidate operation is completed before the context save, the indirect pointers will not be restored from memory.
1. Pipeline State Pointer
2. Media State Pointer
3. Constant Buffer Packet"

which suggests that by emitting the INVALIDATE prior to the MI_SET_CONTEXT, we prevent the context-restore from chasing the dangling pointers within the image, and explains why this likely prevents the GPU hang.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-1-chris@chris-wilson.co.uk

- 25 April 2019, 4 commits

Submitted by Chris Wilson

In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.)

Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more power management control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate.

Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
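
The wakeref layering described above can be pictured with a small sketch; the structures and helper names below are hypothetical illustrations, not the driver's actual engine/GT power-management API: each request pins its engine, and the first engine pin wakes the chipset beneath it.

```c
#include <linux/atomic.h>

/* placeholder types; names are assumptions for this sketch */
struct gt_pm {
	atomic_t wakeref;
};

struct engine_pm {
	atomic_t wakeref;
	struct gt_pm *gt;
};

static void gt_pm_get(struct gt_pm *gt)
{
	/* the 0 -> 1 transition would wake the device; only refcounting shown */
	atomic_inc(&gt->wakeref);
}

static void engine_pm_get(struct engine_pm *engine)
{
	/* the first user of an engine takes the chipset (GT) wakeref */
	if (atomic_inc_return(&engine->wakeref) == 1)
		gt_pm_get(engine->gt);
}
```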

Submitted by Chris Wilson

Start acquiring the logical intel_context and using that as our primary means for request allocation. This is the initial step to allow us to avoid requiring struct_mutex for request allocation along the perma-pinned kernel context, but it also provides a foundation for breaking up the complex request allocation to handle different scenarios inside execbuf.

For the purpose of emitting a request from inside retirement (see the next patch for engine power management), we also need to lift control over the timeline mutex to the caller.

v2: Note that the request carries the active reference upon construction.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk

Submitted by Chris Wilson

We wish to start segregating the power management into different control domains, both with respect to the hardware and the user interface. The first step is that at the lowest level flow of requests, we want to process a context event (and not a global GEM operation). In this patch, we introduce the context callbacks that in future patches will be redirected to per-engine interfaces leading to global operations as required.

The intent is that this will be guarded by the timeline->mutex, except that retiring has not quite finished transitioning over from being guarded by struct_mutex. So at the moment it is protected by struct_mutex with a reminder to switch.

v2: Rename default handlers to intel_context_enter_engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk

Submitted by Chris Wilson

Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/

One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling).

This patch aims to be simple motion so git can fixup inflight patches with little mess.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk

- 19 April 2019, 2 commits

Submitted by Chris Wilson

We need to set the various ring registers prior to restarting the engine, or else we may restart it after reset/resume in an ill-defined state.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418132720.3716-2-chris@chris-wilson.co.uk

Submitted by Chris Wilson

We store the engine->imr mask and set up the RING_IMR register on restarting the engine. We do not then want to overwrite it with an incomplete mask later as we may then lose interrupts!

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190418132720.3716-1-chris@chris-wilson.co.uk

- 17 April 2019, 1 commit

Submitted by Paulo Zanoni

This discussion started because we use token pasting in the GEN{2,3}_IRQ_INIT and GEN{2,3}_IRQ_RESET macros, so gen2-4 passes an empty argument to those macros, making the code a little weird. The original proposal was to just add a comment as the empty argument, but Ville suggested we just add a prefix to the registers, and that indeed sounds like a more elegant solution.

Now doing this is kinda against our rules for register naming since we only add gens or platform names as register prefixes when the given gen/platform changes a register that already existed before. On the other hand, we have so many instances of IIR/IMR in comments that adding a prefix would make the users of these registers more easily findable, in addition to making our token pasting macros actually readable. So IMHO opening an exception here is worth it.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190410235344.31199-4-paulo.r.zanoni@intel.com

- 11 April 2019, 1 commit

Submitted by Chris Wilson

On resume, we know that the only pinned contexts in danger of seeing corruption are the kernel contexts, and so we do not need to walk the list of all GEM contexts as we tracked them on each engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk

- 30 March 2019, 1 commit

Submitted by Daniele Ceraolo Spurio

Those functions are used on gen4 as well and gen4 does have a non-RCS engine, so remove the BUG_ON and flip back the logic to what it was before the ENGINE_READ/WRITE update.

v2: update the posting read as well (Chris, Ville).

Fixes: baba6e57 ("drm/i915: take a reference to uncore in the engine and use it")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190329165018.32953-1-daniele.ceraolospurio@intel.com

- 27 March 2019, 3 commits

Submitted by Daniele Ceraolo Spurio

A few advantages:

- Prepares us for the planned split of display uncore from GT uncore
- Improves our engine-centric view of the world in the engine code and allows us to avoid jumping back to dev_priv.
- Allows us to wrap accesses to engine registers in nice macros that automatically pick the right mmio base.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-10-daniele.ceraolospurio@intel.com
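
As a minimal sketch of the last point in the list above, an engine-centric accessor can fold the per-engine mmio base into the register offset so callers stay on the engine; the names here are placeholders, not the driver's actual ENGINE_READ/ENGINE_WRITE macros.

```c
#include <linux/types.h>

struct uncore;	/* owner of register access (placeholder) */

struct engine {
	struct uncore *uncore;
	u32 mmio_base;	/* per-engine register block */
};

/* assumed helper: a plain 32-bit mmio read routed through the uncore */
u32 uncore_read32(struct uncore *uncore, u32 reg);

/* engine-centric accessor: the right mmio base is picked automatically */
static inline u32 engine_read(struct engine *engine, u32 offset)
{
	return uncore_read32(engine->uncore, engine->mmio_base + offset);
}
```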

Submitted by Daniele Ceraolo Spurio

The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-9-daniele.ceraolospurio@intel.com

Submitted by Daniele Ceraolo Spurio

The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-8-daniele.ceraolospurio@intel.com

- 22 March 2019, 1 commit

Submitted by Chris Wilson

When we return pages to the system, we ensure that they are marked as being in the CPU domain since any external access is uncontrolled and we must assume the worst. This means that we need to always flush the pages on acquisition if we need to use them on the GPU, and from the beginning have used set-domain. Set-domain is overkill for the purpose as it is a general synchronisation barrier, but our intent is to only flush the pages being swapped in. If we move that flush into the pages acquisition phase, we know then that when we have obj->mm.pages, they are coherent with the GPU and need only maintain that status without resorting to heavy handed use of set-domain.

The principal knock-on effect for userspace is through mmap-gtt pagefaulting. Our uAPI has always implied that the GTT mmap was async (especially as when any pagefault occurs is unpredictable to userspace) and so userspace had to apply explicit domain control itself (set-domain). However, swapping is transparent to the kernel, and so on first fault we need to acquire the pages and make them coherent for access through the GTT. Our use of set-domain here leaks into the uABI that the first pagefault was synchronous. This is unintentional and, barring a few igt, should go unnoticed; nevertheless we bump the uABI version for mmap-gtt to reflect the change in behaviour.

Another implication of the change is that gem_create() is presumed to create an object that is coherent with the CPU and is in the CPU write domain, so a set-domain(CPU) following a gem_create() would be a minor operation that merely checked whether we could allocate all pages for the object. On applying this change, a set-domain(CPU) causes a clflush as we acquire the pages. This will have a small impact on mesa as we move the clflush here on !llc from execbuf time to create, but that should have minimal performance impact as the same clflush exists but is now done early and, because of the clflush issue, userspace recycles bo and so should resist allocating fresh objects.

Internally, the presumption that objects are created in the CPU write-domain and remain so through writes to obj->mm.mapping is more prevalent than I expected; but easy enough to catch and apply a manual flush.

For the future, we should push the page flush from the central set_pages() into the callers so that we can more finely control when it is applied, but for now doing it in one location is easier to validate, at the cost of sometimes flushing when there is no need.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk

- 21 March 2019, 2 commits

Submitted by Chris Wilson

The timeline->name is only used for convenience in pretty printing the i915_request.fence->ops->get_timeline_name() and it is just as convenient to pull it from the gem_context directly. The few instances of its use inside GEM_TRACE() have proven more of a nuisance than helpful, so not worth saving imo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk

Submitted by Daniele Ceraolo Spurio

Now that the internal code all works on intel_uncore, flip the external-facing interface.

v2: fix GVT.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com

- 19 March 2019, 4 commits

Submitted by Chris Wilson

For virtual engines, we need to keep the HW context alive while it remains in use. For regular HW contexts, they are created and kept alive until the end of the GEM context. For simplicity, generalise the requirements and keep an active reference to each HW context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-2-chris@chris-wilson.co.uk

Submitted by Chris Wilson

On unpinning the intel_context, we remove it from the active list inside the GEM context. This list is supposed to be guarded by the GEM context mutex, so remember to take it!

Fixes: 7e3d9a59 ("drm/i915: Track active engines within a context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-1-chris@chris-wilson.co.uk

Submitted by Chris Wilson

As the final request on a ring may hold the reference to this ring (via retiring the last pinned context), we may find ourselves chasing a dangling pointer on completion of the list. A quick solution is to hold a reference to the ring itself as we retire along it so that we only free it after we stop dereferencing it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
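
A sketch of the pattern, with placeholder types rather than the driver's own: take a local reference on the ring before walking its requests, since retiring the last request may drop the ring's final reference.

```c
#include <linux/list.h>

struct ring;

struct request {
	struct list_head link;
	struct ring *ring;
};

/* assumed helpers: reference counting and retirement (placeholders) */
void ring_get(struct ring *ring);
void ring_put(struct ring *ring);
struct list_head *ring_requests(struct ring *ring);
void request_retire(struct request *rq);	/* may drop the ring's last ref */

static void retire_requests(struct ring *ring)
{
	struct request *rq, *rn;

	/*
	 * Pin the ring for the duration of the walk: retiring the final
	 * request could otherwise free the ring while we still dereference
	 * it to advance the iterator.
	 */
	ring_get(ring);
	list_for_each_entry_safe(rq, rn, ring_requests(ring), link)
		request_retire(rq);
	ring_put(ring);
}
```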

Submitted by Chris Wilson

If we use the STORE_DATA_INDEX function we can use a fixed offset and avoid having to look up the engine HWS address. A step closer to being able to emit the final breadcrumb during request_add rather than later in the submission interrupt handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-9-chris@chris-wilson.co.uk

- 12 March 2019, 1 commit

Submitted by Chris Wilson

Move the pair of messages to the common callsite where it makes sense to include a bit more information about which request is being reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190312111146.10662-1-chris@chris-wilson.co.uk

- 08 March 2019, 6 commits

Submitted by Chris Wilson

Introduce a mutex to start locking the HW contexts independently of struct_mutex, with a view to reducing the coarse struct_mutex. The intel_context.pin_mutex is used to guard the transition to and from being pinned on the gpu, and so is required before starting to build any request. The intel_context will then remain pinned until the request completes, but the mutex can be released immediately upon completion of pinning the context.

A slight variant of the above is used by per-context sseu that wants to inspect the pinned status of the context, and requires that it remains stable (either !pinned or pinned) across its operation. By using the pin_mutex to serialise operations while pin_count==0, we can take that pin_mutex to stabilise the boolean pin status.

v2: for Tvrtko!
- Improved commit message.
- Dropped _gpu suffix from gen8_modify_rpcs_gpu.

v3: Repair the locking for sseu selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
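
A rough illustration of the locking rule described above, using placeholder names and an assumed backend hook: the mutex serialises only the pin transition and is dropped as soon as the pin is established.

```c
#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/mutex.h>

/*
 * Placeholder context: pin_mutex guards the pin transition only; the pin
 * itself persists until the request completes and unpin is called.
 */
struct hw_context {
	struct mutex pin_mutex;
	atomic_t pin_count;
};

int hw_context_pin(struct hw_context *ce,
		   int (*backend_pin)(struct hw_context *))
{
	int err = 0;

	if (mutex_lock_interruptible(&ce->pin_mutex))
		return -EINTR;

	if (atomic_inc_return(&ce->pin_count) == 1) {
		/* first pin: bind the HW state while serialised by pin_mutex */
		err = backend_pin(ce);
		if (err)
			atomic_dec(&ce->pin_count);
	}

	/* the mutex is dropped immediately; the pin remains until unpin */
	mutex_unlock(&ce->pin_mutex);
	return err;
}
```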

Submitted by Chris Wilson

Each engine acquires a pin on the kernel contexts (normal and preempt) so that the logical state is always available on demand. Keep track of each engine's pin by storing the returned pointer on the engine for quick access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-6-chris@chris-wilson.co.uk

Submitted by Chris Wilson

Push the intel_context pin callback down from intel_engine_cs onto the context itself by virtue of having a central caller for intel_context_pin() being able to look up the intel_context itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk

Submitted by Chris Wilson

In preparation for an ever growing number of engines, and so an ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
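
A minimal sketch of the per-engine lookup this implies, keyed on the engine pointer; the types are placeholders, and the real tree is allocated on first use as the message notes.

```c
#include <linux/rbtree.h>

struct engine;	/* placeholder */

struct hw_context {
	struct rb_node node;
	const struct engine *engine;	/* tree key */
};

static struct hw_context *
context_lookup(struct rb_root *contexts, const struct engine *engine)
{
	struct rb_node *p = contexts->rb_node;

	while (p) {
		struct hw_context *ce = rb_entry(p, struct hw_context, node);

		if (engine < ce->engine)
			p = p->rb_left;
		else if (engine > ce->engine)
			p = p->rb_right;
		else
			return ce;	/* the per-engine HW context */
	}

	return NULL;	/* caller allocates on first use */
}
```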

Submitted by Chris Wilson

If we place a pointer to the engine specific intel_context_ops in the engine itself, we can assign the ops pointer on initialising the context, and then rely on it being set. This simplifies the code in later patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-3-chris@chris-wilson.co.uk

Submitted by Chris Wilson

For use in the next patch, if we track which engines have been used by the HW, we can reduce the work required to flush our state off the HW to those engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-1-chris@chris-wilson.co.uk

- 06 March 2019, 1 commit

Submitted by Chris Wilson

In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, let's store the full bitmask.

v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)

v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout.

v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk

- 26 February 2019, 3 commits

Submitted by Chris Wilson

Having weaned the interrupt handling off using a single global execution queue, we no longer need to emit a global_seqno. Note that we still have a few assumptions about execution order along engine timelines, but this removes the most obvious artefact!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk

Submitted by Chris Wilson

Stop accessing the HWSP to read the global seqno, and stop tracking the mirror in the engine's execution timeline -- it is unused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk

Submitted by Chris Wilson

To determine whether an engine is 'stuck', we simply check whether or not it is still on the same seqno for several seconds. To keep this simple mechanism intact over the loss of a global seqno, we can simply add a new global heartbeat seqno instead. As we cannot know the sequence in which requests will then be completed, we use a primitive random number generator instead (with a cycle long enough to not matter over an interval of a few thousand requests between hangcheck samples).

The alternative to using a dedicated seqno on every request is to issue a heartbeat request and query its progress through the system. Sadly this requires us to reduce struct_mutex so that we can issue requests without requiring that bkl.

v2: And without the extra CS_STALL for the hangcheck seqno -- we don't need strict serialisation with what comes later, we just need to be sure we don't write the hangcheck seqno before our batch is flushed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-1-chris@chris-wilson.co.uk
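
For illustration, one way to realise a "primitive random number generator" with a full 2^32 cycle is a linear congruential generator; the constants and helper names below are assumptions for this sketch, not necessarily what the driver uses.

```c
#include <linux/types.h>

/*
 * A full-period 32-bit linear congruential generator (classic Numerical
 * Recipes constants): every sample produces a value different from the
 * previous one, which is all the hangcheck comparison needs.
 */
static u32 next_heartbeat_seqno(u32 prev)
{
	return prev * 1664525 + 1013904223;
}

/*
 * Hangcheck sampling (hypothetical helper): if the heartbeat value written
 * by the engine has not changed between samples taken a few seconds apart,
 * the engine is considered stuck.
 */
static bool engine_seems_stuck(u32 last_sample, u32 current_sample)
{
	return last_sample == current_sample;
}
```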

- 08 February 2019, 1 commit

Submitted by Chris Wilson

Apply backpressure to hogs that emit requests faster than the GPU can process them by waiting for their ring to be less than half-full before proceeding with taking the struct_mutex.

This is a gross hack to apply throttling backpressure; the long term goal is to remove the struct_mutex contention so that each client naturally waits, preferably in an asynchronous, nonblocking fashion (pipelined operations for the win), for their own resources and never blocks another client within the driver at least. (Realtime priority goals would extend to ensuring that resource contention favours high priority clients as well.)

This patch only limits excessive request production and does not attempt to throttle clients that block waiting for eviction (either global GTT or system memory) or any other global resources; see above for the long term goal.

No microbenchmarks are harmed (to the best of my knowledge).

Testcase: igt/gem_exec_schedule/pi-ringfull-*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
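
A hedged sketch of the throttle: before taking the heavyweight lock, check whether the client's ring is under half-full and otherwise wait for space. The structure fields and the wait callback are placeholders, not the driver's actual interfaces.

```c
#include <linux/types.h>

/* placeholder view of a client's ring; fields are assumptions */
struct ring_state {
	u32 size;	/* total ring size in bytes */
	u32 space;	/* bytes currently free */
};

static bool ring_below_half_full(const struct ring_state *ring)
{
	return ring->space > ring->size / 2;
}

/* hypothetical throttle: block until at least half the ring is free */
static int throttle_ring(struct ring_state *ring,
			 int (*wait_for_space)(struct ring_state *, u32 bytes))
{
	if (ring_below_half_full(ring))
		return 0;

	return wait_for_space(ring, ring->size / 2);
}
```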

- 30 January 2019, 2 commits

Submitted by Chris Wilson

A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd"), the issue of handling multiple clients waiting in parallel was brought to our attention. The requirement was that every client should be woken immediately upon its request being signaled, without incurring any cpu overhead. To handle certain fragility of our hw meant that we could not do a simple check inside the irq handler (some generations required almost unbounded delays before we could be sure of seqno coherency) and so request completion checking required delegation.

Before commit 688e6c72, the solution was simple. Every client waiting on a request would be woken on every interrupt and each would do a heavyweight check to see if their request was complete. Commit 688e6c72 introduced an rbtree so that only the earliest waiter on the global timeline would be woken, and would wake the next and so on. (Along with various complications to handle requests being reordered along the global timeline, and also a requirement for kthread to provide a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global seqno). Without knowing that order, we must instead check all contexts queued to the HW to see which may have advanced. We trim that list by only checking queued contexts that are being waited on, but still we keep a list of all active contexts and their active signalers that we inspect from inside the irq handler. By moving the waiters onto the fence signal list, we can combine the client wakeup with the dma_fence signaling (a dramatic reduction in complexity, but does require the HW being coherent, the seqno must be visible from the cpu before the interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by inserting delays onto the GPU after each request instead of random delays on the CPU after each interrupt), we can rely on the seqno state to perform direct wakeups from the interrupt handler. This allows us to preserve our single context switch behaviour of the current routine, with the only downside that we lose the RT priority sorting of wakeups. In general, direct wakeup latency of multiple clients is about the same (about 10% better in most cases) with a reduction in total CPU time spent in the waiter (about 20-50% depending on gen). Average herd behaviour is improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk

Submitted by Chris Wilson

To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.)

There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
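
A sketch of the "has this request started?" check implied above, with placeholder field names: compare the seqno the hardware has reported in the timeline's HWSP slot against the breadcrumb emitted at the head of the request, using a wrap-safe comparison.

```c
#include <linux/compiler.h>
#include <linux/types.h>

/* placeholder request layout; the real structures differ */
struct request {
	u32 start_seqno;	/* breadcrumb emitted at the head of the request */
	const u32 *hwsp;	/* this timeline's slot in the hardware status page */
};

/* wrap-safe "a has passed b" comparison for 32-bit seqnos */
static bool seqno_passed(u32 a, u32 b)
{
	return (s32)(a - b) >= 0;
}

static bool request_started(const struct request *rq)
{
	return seqno_passed(READ_ONCE(*rq->hwsp), rq->start_seqno);
}
```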